ReadPaper Blog
SearchSwarm: Delegation Intelligence for Long-Horizon Deep Research
SearchSwarm studies how agentic large language models can handle deep research tasks whose information needs exceed a finite context window. The paper proposes a delegation-based approach to context management in which a main agent decomposes the research work, dispatches bounded subtasks through a call_sub_agent tool, and integrates the citation-grounded reports that come back. Trajectories produced under this harness are then filtered and used for supervised fine-tuning. The resulting SearchSwarm-30B-A3B model reports leading results among comparable-scale models on BrowseComp, BrowseComp-ZH, GAIA, and xbench-DeepSearch, suggesting that delegation intelligence can be learned rather than only prompted at inference time.
Source: SearchSwarm: Delegation Intelligence for Long-Horizon Deep Research

Finite Bubble, Huge Quest
SearchSwarm addresses a central bottleneck in long-horizon deep research: a model’s context window is finite, while a real research task can keep accumulating searches, pages, observations, hypotheses, and intermediate conclusions without a fixed bound. The paper argues that common context-management strategies, such as summarizing history after a threshold or retaining only selected tool outputs, are passive because they react only after the trajectory has already grown too large. Its alternative is an active main-distributes, sub-executes paradigm in which the model plans ahead, delegates bounded research subtasks, and receives condensed reports instead of carrying every intermediate step in the main context. Under this framing, delegation is not primarily about adding more models. It is content-aware context compression performed by the same model, invoked in fresh, independent contexts. The upshot is that deep research scales better when the model learns to decide what must stay in the main context and what can be explored separately.
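To make the compression argument concrete, here is a rough token-accounting sketch. The per-subtask token counts and the 32,000-token window are hypothetical numbers chosen for illustration; none of them come from the paper.

```python
from typing import List, Tuple

# Hypothetical (raw_trajectory_tokens, report_tokens) per subtask; illustrative only.
SUBTASKS: List[Tuple[int, int]] = [(12_000, 600), (30_000, 800), (9_000, 500)]
CONTEXT_WINDOW = 32_000  # assumed window size, not a figure from the paper


def main_context_cost(subtasks: List[Tuple[int, int]], delegated: bool) -> int:
    """Tokens the main agent must hold: raw steps if inline, reports if delegated."""
    return sum(report if delegated else raw for raw, report in subtasks)


inline_cost = main_context_cost(SUBTASKS, delegated=False)
delegated_cost = main_context_cost(SUBTASKS, delegated=True)
print(inline_cost, inline_cost <= CONTEXT_WINDOW)        # 51000 False
print(delegated_cost, delegated_cost <= CONTEXT_WINDOW)  # 1900 True
```

Each subagent still has to fit its own raw trajectory inside a fresh window, which is why the delegated subtasks need to be bounded.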

The Gap: No Delegation Skill
The paper defines the missing capability as delegation intelligence, meaning the ability to decompose complex tasks, decide when and what to delegate, and integrate returned results into the ongoing workflow. This is a sharper target than generic tool use because a research agent must preserve the global objective while isolating lower-level execution into well-scoped subtasks. In the authors’ ReAct-style formulation, the main agent repeatedly produces thoughts, actions, and observations, with call_sub_agent added to the action space alongside retrieval tools. When delegation occurs, the subagent sees only the generated brief rather than the main agent’s full history, so the quality of the brief determines whether the subtask is useful. The returned report then becomes the main agent’s observation, forcing the main workflow to synthesize compressed evidence while retaining responsibility for final judgment and uncertainty handling.
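The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' harness: the policies are scripted stand-ins for LLM calls, the search tool is a stub, and every name other than call_sub_agent and search (run_agent, make_call_sub_agent, the policy functions) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]            # (thought, action name, action argument)
Policy = Callable[[List[str]], Step]   # stand-in for an LLM call
Tool = Callable[[str], str]


def run_agent(task: str, policy: Policy, tools: Dict[str, Tool],
              max_steps: int = 8) -> Tuple[str, List[str]]:
    """ReAct-style loop. Returns (final output, the context this agent saw)."""
    context = [f"TASK: {task}"]  # fresh context: nothing inherited from a caller
    for _ in range(max_steps):
        thought, action, arg = policy(context)
        context.append(f"THOUGHT: {thought}")
        if action == "finish":
            return arg, context
        observation = tools[action](arg)
        context.append(f"ACTION: {action}({arg!r})")
        context.append(f"OBSERVATION: {observation}")
    return "", context  # step budget exhausted without an answer


def make_call_sub_agent(sub_policy: Policy, sub_tools: Dict[str, Tool]) -> Tool:
    """Wrap a subagent run as a tool; only the brief crosses the boundary."""
    def call_sub_agent(brief: str) -> str:
        report, _sub_context = run_agent(brief, sub_policy, sub_tools)
        return report  # the subagent's intermediate steps are discarded here
    return call_sub_agent


def last_observation(context: List[str]) -> str:
    """Return the most recent observation text, or '' if there is none."""
    prefix = "OBSERVATION: "
    hits = [line[len(prefix):] for line in context if line.startswith(prefix)]
    return hits[-1] if hits else ""


def sub_policy(context: List[str]) -> Step:
    """Scripted subagent: search once using the brief, then report."""
    found = last_observation(context)
    if not found:
        return ("I need evidence first.", "search", context[0][len("TASK: "):])
    return ("Enough to report.", "finish", f"REPORT: {found}")


def main_policy(context: List[str]) -> Step:
    """Scripted main agent: delegate once, then answer from the report."""
    report = last_observation(context)
    if not report:
        return ("This lookup is self-contained.", "call_sub_agent",
                "Find when X was released; it fixes the timeline we are building.")
    return ("The report covers it.", "finish", report)


sub_tools = {"search": lambda query: f"stub hit for {query!r} [source: example.org]"}
main_tools = {"call_sub_agent": make_call_sub_agent(sub_policy, sub_tools)}
answer, main_context = run_agent("Build a timeline for X.", main_policy, main_tools)
print(answer)
print(len(main_context))
```

The main context ends up holding the task, two thoughts, one call_sub_agent action, and one report. The subagent's search step never appears in it, which is the compression effect the paragraph describes.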

SearchSwarm’s Trick
SearchSwarm’s method centers on a harness that elicits high-quality delegation before converting that behavior into training data. The harness equips the main agent with search, visit, google_scholar, python, and call_sub_agent tools, and it guides the agent to delegate lower-level execution while maintaining an independent understanding of the overall research progress. A distinctive design choice is that the main agent must brief each subagent not only with a task description but also with the rationale: why the subtask matters and how it fits into the broader research goal. This requirement is intended to reduce redundant exploration and help the subagent conduct focused research inside its independent context. The paper also constrains subagent reports to include explicit source citations, allowing the main agent to verify conclusions and propagate evidence into the final answer.
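One way to picture the briefing and citation requirements is as a pair of small schemas. The field names and checks below are assumptions made for illustration; the paper describes what a brief and a report must contain, not a concrete data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Brief:
    """Hypothetical brief schema: the task plus the rationale behind it."""
    task: str
    why_it_matters: str
    fit_with_goal: str

    def render(self) -> str:
        """Render the brief as text, refusing to send one with an empty part."""
        parts = {"Task": self.task,
                 "Why it matters": self.why_it_matters,
                 "How it fits the overall goal": self.fit_with_goal}
        missing = [label for label, text in parts.items() if not text.strip()]
        if missing:
            raise ValueError(f"brief is missing: {', '.join(missing)}")
        return "\n".join(f"{label}: {text.strip()}" for label, text in parts.items())


@dataclass(frozen=True)
class Claim:
    """One finding in a subagent report, with the sources that back it."""
    text: str
    sources: List[str] = field(default_factory=list)


def uncited_claims(report: List[Claim]) -> List[str]:
    """Return the text of every claim that has no non-empty source."""
    return [c.text for c in report if not any(s.strip() for s in c.sources)]


brief = Brief(
    task="Find when X was released.",
    why_it_matters="The release date anchors every later event in the timeline.",
    fit_with_goal="The main agent is assembling a sourced timeline for X.",
)
print(brief.render())

report = [
    Claim("X has a documented release date.", ["https://example.org/x"]),
    Claim("X was renamed later."),  # no source attached
]
print(uncited_claims(report))  # ['X was renamed later.']
```

A main agent could use a check like uncited_claims to decide which findings it may carry into the final answer and which it needs to verify or drop.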

From Harness to Training Data
The paper then uses harness-guided trajectories as supervised fine-tuning data to internalize delegation intelligence into model weights. Rather than relying only on prompting at inference time, the authors filter trajectories to retain examples that encode correct delegation decisions, including appropriate timing, subtask scope, and context-rich briefing. These filtered trajectories provide demonstrations of how the main agent should manage a long research workflow through model-generated briefs and reports. Supervised fine-tuning teaches the model to reproduce the delegation patterns that the harness helped elicit, turning a procedural scaffold into a learned behavior. This data-synthesis recipe is presented as a response to the scarcity of naturally occurring text that explicitly demonstrates multi-step delegation and multi-agent coordination.
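The filtering step might look something like the sketch below. The criteria here (correct final answer, a bounded number of delegations, a rationale in every brief, at least one citation in every report) are hypothetical stand-ins for the timing, scope, and briefing qualities the paper mentions; the authors' actual filter is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Delegation:
    brief_has_rationale: bool   # did the brief explain why the subtask matters?
    report_citations: int       # number of sources cited in the returned report


@dataclass
class Trajectory:
    final_answer: str
    reference_answer: str
    delegations: List[Delegation]


def keep_for_sft(traj: Trajectory, max_delegations: int = 8) -> bool:
    """Hypothetical filter; thresholds and criteria are assumptions."""
    answered = traj.final_answer.strip().lower() == traj.reference_answer.strip().lower()
    delegated = 1 <= len(traj.delegations) <= max_delegations
    well_briefed = all(d.brief_has_rationale for d in traj.delegations)
    grounded = all(d.report_citations > 0 for d in traj.delegations)
    return answered and delegated and well_briefed and grounded


pool = [
    Trajectory("2019", "2019", [Delegation(True, 2)]),    # kept
    Trajectory("2019", "2019", [Delegation(False, 2)]),   # brief lacks a rationale
    Trajectory("2018", "2019", [Delegation(True, 1)]),    # wrong final answer
    Trajectory("2019", "2019", []),                       # never delegated
]
kept = [t for t in pool if keep_for_sft(t)]
print(len(kept))  # 1
```

Exact string matching on the answer is a deliberate simplification; a real pipeline would likely need a more tolerant correctness check.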

What the Paper Claims
The paper reports that SearchSwarm-30B-A3B achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, which the authors describe as the best results among models of comparable scale. It also reports 82.5 on GAIA and 80.8 on xbench-DeepSearch, with the model remaining competitive against substantially larger open-source and closed-source systems in the comparisons shown. These results support the paper’s claim that delegation intelligence improves deep research performance by making context management more proactive and semantically structured. The contribution is framed as preliminary but practical: a harness, training data construction process, and model-training path for long-horizon agent tasks. The authors state that they will release the harness, model weights, and training data to facilitate further research on delegation intelligence and agentic LLM coordination.
