With AI agents becoming the norm, I’m running into a UI challenge that I don’t see many people talking about: how do you render a multi-step tool-use sequence in a chat interface while it’s still streaming?
Here’s the situation. I’m building a React frontend for an internal AI assistant that uses tool calling (MCP-based). A single user message might trigger the agent to call 3-5 tools before producing a final answer. Right now my UI just shows a spinner until the whole thing resolves, which can take 10-15 seconds. That’s a terrible experience.
What I want is something like what Claude and ChatGPT do: you see each tool call happen in real time, ideally in a collapsible section showing which tool was called and what it returned, and then the final response streams in token by token.
The tricky parts I’m hitting:
1. Parsing the SSE stream mid-flight. The server sends content_block_start, content_block_delta, and content_block_stop events. Tool use blocks and text blocks are interleaved. I need to maintain a state machine that tracks which block is currently open and renders the right component. My current reducer is getting pretty gnarly.
2. Accordion/collapse UX for tool results. Once a tool call completes, I want to collapse it into a summary line (like “Searched database - 12 results”) so the chat doesn’t get overwhelmed with raw JSON. But the user should be able to expand it. The timing of when to auto-collapse is awkward, especially when the next tool call starts immediately.
3. Optimistic rendering vs. waiting. Should I show “Calling search_documents…” as soon as I see the tool_use block start, or wait until I have the full tool input? Showing it early feels more responsive, but sometimes the tool name changes as more tokens stream in (though that’s rare with most providers now).
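For the parsing problem in point 1, it can help to separate SSE framing from block state. Network chunks can end in the middle of an event, so text is buffered until a blank line and only complete data payloads are passed on. This is a minimal sketch under standard SSE framing. SseFramer and parseEvent are hypothetical names. It assumes LF line endings and ignores comment lines plus the id and retry fields.

```typescript
// One complete SSE event: its `event:` name (null if absent) and joined `data:` lines.
interface SseEvent {
  event: string | null;
  data: string;
}

// Hypothetical helper: buffers raw text chunks and returns only complete events.
// Assumes LF ("\n") line endings; CRLF streams would need normalizing first.
class SseFramer {
  private buffer = '';

  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    // A blank line ends an event; a trailing partial event stays in the buffer.
    let boundary = this.buffer.indexOf('\n\n');
    while (boundary !== -1) {
      const parsed = parseEvent(this.buffer.slice(0, boundary));
      if (parsed) events.push(parsed);
      this.buffer = this.buffer.slice(boundary + 2);
      boundary = this.buffer.indexOf('\n\n');
    }
    return events;
  }
}

// Parses one event's lines; comment lines (":...") and id/retry fields are ignored.
function parseEvent(raw: string): SseEvent | null {
  let event: string | null = null;
  const data: string[] = [];
  for (const line of raw.split('\n')) {
    if (line === '' || line.startsWith(':')) continue;
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') event = value;
    else if (field === 'data') data.push(value);
  }
  // An event with no data lines is not dispatched.
  return data.length > 0 ? { event, data: data.join('\n') } : null;
}
```

Each returned data string can then go through JSON.parse and be dispatched as a typed stream event. The framer itself knows nothing about content blocks, so the block-tracking logic never sees half an event.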
Here’s a rough sketch of my current approach:
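```tsx
// One entry per content block in the stream, in stream order.
type StreamBlock =
  | { type: 'text'; content: string; done: boolean }
  | { type: 'tool_use'; name: string; input: string; result?: string; done: boolean };

// ToolUseBlock and TextBlock are presentational components defined elsewhere.
function ChatMessage({ blocks }: { blocks: StreamBlock[] }) {
  return (
    <div className="space-y-2">
      {blocks.map((block, i) => {
        // Blocks are only appended, never reordered, so the index is a stable key.
        if (block.type === 'tool_use') {
          return <ToolUseBlock key={i} block={block} />;
        }
        return <TextBlock key={i} content={block.content} streaming={!block.done} />;
      })}
    </div>
  );
}
```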
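The collapse timing in point 2 gets simpler if collapse state is derived instead of being set by a timer or an effect. A running call stays open. A finished call collapses automatically. An explicit user toggle always wins. This is a minimal sketch under those assumptions. ToolView, Overrides, isFinished, isCollapsed and summaryLine are hypothetical names. It treats a call as finished once its result is attached, because content_block_stop only marks the end of the tool input. The "N results" label assumes the tool returns a JSON array.

```typescript
// Mirrors the tool_use variant of the post's StreamBlock type.
interface ToolView {
  name: string;
  input: string;
  result?: string;
  done: boolean;
}

// Hypothetical per-message UI state: explicit user toggles keyed by block index
// (true = user expanded it, false = user collapsed it).
type Overrides = ReadonlyMap<number, boolean>;

// content_block_stop only means the tool *input* finished streaming; the call
// itself counts as finished once its result has been attached.
function isFinished(block: ToolView): boolean {
  return block.result !== undefined;
}

// Derived rather than stored: running calls stay open, finished calls collapse
// automatically, and an explicit user toggle always wins.
function isCollapsed(block: ToolView, index: number, overrides: Overrides): boolean {
  const userExpanded = overrides.get(index);
  if (userExpanded !== undefined) return !userExpanded;
  return isFinished(block);
}

// Label for the header row, used both while running and once collapsed.
function summaryLine(block: ToolView): string {
  const { name, result } = block;
  if (result === undefined) return `Calling ${name}...`;
  try {
    const parsed: unknown = JSON.parse(result);
    if (Array.isArray(parsed)) return `${name} - ${parsed.length} results`;
  } catch {
    // Non-JSON result: fall through to the generic label.
  }
  return `${name} - done`;
}
```

Nothing is scheduled, so a new tool call that starts right after the previous one finishes cannot race with the collapse. ToolUseBlock would call these helpers on every render and keep the overrides map in component state.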
Managing the block array as deltas arrive is where all the complexity lives. Has anyone built something like this and found a clean pattern? I’m especially interested in how you handle the state management. I’ve looked at Vercel’s AI SDK useChat, but it abstracts away too much of the tool-use rendering for my needs.
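For the state management, one pattern that tends to stay manageable is a single pure reducer keyed by the stream's content-block index. Each event then touches exactly one slot of the array. This is a minimal sketch that assumes Anthropic-style Messages API events: content_block_start carries the tool name and id up front, and content_block_delta carries either a text_delta or an input_json_delta. The sketch makes three further assumptions:

- It extends the post's StreamBlock with an id so that results can be matched to their calls.
- tool_result is a hypothetical client-side action, dispatched when your MCP call returns, because with client-side tool execution the result does not come from the model's stream.
- Block indices are unique within the stream. If the backend concatenates several model turns, you would need to offset the indices per turn.

```typescript
type StreamBlock =
  | { type: 'text'; content: string; done: boolean }
  | { type: 'tool_use'; id: string; name: string; input: string; result?: string; done: boolean };

// Simplified subset of the streaming events (only the fields this reducer reads).
type StreamEvent =
  | { type: 'content_block_start'; index: number;
      content_block: { type: 'text' } | { type: 'tool_use'; id: string; name: string } }
  | { type: 'content_block_delta'; index: number;
      delta: { type: 'text_delta'; text: string } | { type: 'input_json_delta'; partial_json: string } }
  | { type: 'content_block_stop'; index: number }
  // Hypothetical client-side action: dispatched when an MCP tool call returns.
  | { type: 'tool_result'; toolUseId: string; result: string };

// Pure reducer: never mutates its input, so it works directly with useReducer.
function blocksReducer(blocks: StreamBlock[], ev: StreamEvent): StreamBlock[] {
  const next = blocks.slice();
  switch (ev.type) {
    case 'content_block_start': {
      const cb = ev.content_block;
      // The tool name is known at block start, so "Calling ..." can render immediately.
      next[ev.index] = cb.type === 'tool_use'
        ? { type: 'tool_use', id: cb.id, name: cb.name, input: '', done: false }
        : { type: 'text', content: '', done: false };
      return next;
    }
    case 'content_block_delta': {
      const b = next[ev.index];
      if (b?.type === 'text' && ev.delta.type === 'text_delta') {
        next[ev.index] = { ...b, content: b.content + ev.delta.text };
      } else if (b?.type === 'tool_use' && ev.delta.type === 'input_json_delta') {
        // Partial JSON is only concatenated here; parse it after the block stops.
        next[ev.index] = { ...b, input: b.input + ev.delta.partial_json };
      }
      return next;
    }
    case 'content_block_stop': {
      const b = next[ev.index];
      if (b) next[ev.index] = { ...b, done: true };
      return next;
    }
    case 'tool_result': {
      const { toolUseId, result } = ev;
      return next.map((b): StreamBlock =>
        b.type === 'tool_use' && b.id === toolUseId ? { ...b, result } : b);
    }
    default:
      return blocks;
  }
}
```

In React this could be wired up with useReducer(blocksReducer, []), dispatching each parsed SSE payload. ChatMessage can render the resulting array as before.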
Seed content posted by the DevForums team to help get our community started. Have a better answer? Jump in!