ChatGPT (OpenAI)
Pattern: maximum concealment. Tool calls are a single collapsed pill — "Searched 5 sites" or "Used Python". Click to expand and see a flat list of site names or code. The pill sits inline with the response. Deep Research mode shows a scrolling log during execution ("Reading about…", "Analyzing…") but collapses to nothing once done.
Good for you: the inline pill is unobtrusive and lets the prose breathe. Bad for you: almost no provenance visible by default. A toxicologist would have to click into every pill to understand where data came from.
Claude (Anthropic)
Pattern: clear separation. Tool use renders as a distinct collapsible block with a different background. The "thinking" and "analysis" blocks are clearly delineated from response text. Web search shows "Searched X queries" with expandable query list. Artifacts (code, documents) open in a side panel.
Good for you: clean separation between reasoning and tool execution. Bad for you: still a "black box" — you see that a tool was called, but not which parameters were used or which specific source returned the data.
Perplexity
Pattern: inline citations. The most provenance-forward UI. Every claim in the response text has a numbered superscript [1][2][3] linking to a source panel. The panel shows favicon + title + URL for each source. Pro Search mode shows step-by-step progress ("Step 1: Searching…", "Step 2: Reading 8 sources").
Good for you: this is the closest to what toxicologists want — every data point traceable to a source. Bad for you: Perplexity's sources are web pages (URL + title). Your sources are API calls with structured parameters (CAS numbers, SMILES strings, section filters). You need deeper provenance than "here's a link."
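The gap between web citations and the provenance a toxicologist needs can be made concrete with a sketch. The record types below are hypothetical (none of these names come from Perplexity or HumanChemical): a web citation carries only a URL and a title, while an audit-grade record has to preserve the database and the exact structured parameters of the query.

```python
from dataclasses import dataclass, field

@dataclass
class WebCitation:
    # All a Perplexity-style source panel can offer: a link.
    url: str
    title: str

@dataclass
class ToolCallProvenance:
    # Hypothetical audit-grade record: which database was queried,
    # with exactly which structured parameters.
    database: str                       # e.g. "PubChem"
    operation: str                      # e.g. "compound_lookup"
    params: dict = field(default_factory=dict)

web = WebCitation(url="https://example.com/benzene", title="Benzene")
call = ToolCallProvenance(
    database="PubChem",
    operation="compound_lookup",
    params={"cas_number": "71-43-2", "section": "Toxicity"},
)

# The structured record answers the auditor's question directly:
print(call.params["cas_number"])  # 71-43-2
```

The design point: `params` preserves the query verbatim, so "audit depth on demand" is just rendering a field that already exists, not reconstructing what the system might have asked.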
Grok (xAI)
Pattern: live process log. DeepSearch shows a real-time scrolling log: "Searching for X", "Reading Y", "Found Z". The final answer includes inline [N] citations. The log collapses after completion.
Good for you: excellent transparency during execution — you can see exactly what the system is doing. Bad for you: the log is noisy and not useful after completion. No structured audit capability.
What HumanChemical needs that none of these have
- Source-level provenance — not just "we searched" but which database, with what parameters
- Audit depth on demand — a toxicologist should be able to inspect the exact CAS number, section filter, or SMILES string used in any query
- Result summaries without expanding — "23 studies retrieved" or "LogP 3.32" should be visible at a glance, not buried behind a click
- Handling 6–20+ tool calls — consumer AI assistants rarely make more than 3 tool calls per turn; your batches can run an order of magnitude larger.
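The "result summaries without expanding" requirement can be met by computing a one-line label per call at render time. A minimal sketch, assuming a hypothetical result shape (a count or a single named value — nothing here reflects HumanChemical's actual API):

```python
def summary_label(tool: str, result: dict) -> str:
    """Produce an at-a-glance label for a collapsed tool-call pill."""
    if "count" in result:
        # e.g. a literature search returning a hit count
        return f"{result['count']} {result.get('unit', 'results')} retrieved"
    if "value" in result:
        # e.g. a single computed property
        return f"{result['name']}: {result['value']}"
    # Fallback when the result carries nothing summarizable
    return f"{tool} completed"

# A batch of many calls collapses to scannable one-liners:
batch = [
    ("literature_search", {"count": 23, "unit": "studies"}),
    ("property_lookup", {"name": "LogP", "value": 3.32}),
]
for tool, result in batch:
    print(summary_label(tool, result))
# 23 studies retrieved
# LogP: 3.32
```

Because each label is computed from the stored result rather than the raw log, a 20-call batch renders as 20 scannable lines, with full provenance one click deeper.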