Harness: DeepAgents
File: src/junior/harnesses/deepagents.py
Env var: HARNESS=deepagents (BACKEND is a deprecated alias)
Dependencies: deepagents, langchain, langchain-anthropic, langchain-openai
Auth: ANTHROPIC_API_KEY or OPENAI_API_KEY
Harness Contract
Section titled “Harness Contract”The module exposes a single HARNESS instance (DeepAgentsHarness, a Harness
subclass). Its one method is schema-agnostic — the result schema is a parameter:
complete(*, system_prompt: str, user_message: str, output_schema: type[BaseModel], settings: Settings) -> LLMResultThe code-review runbook passes LLMReviewOutput, but the harness works for any
runbook’s result model. The output_schema becomes the args_schema of a
submit_review tool that the orchestrator calls exactly once.
file_access = False — context is inlined into the user message by the runbook;
the FilesystemBackend is for extra exploration beyond the diff.
Architecture
Section titled “Architecture”A single langchain orchestrator (create_deep_agent) with one tool —
submit_review. The orchestrator explores via a FilesystemBackend and submits its
synthesized result by calling submit_review once. There is no per-prompt
subagent fan-out.
complete(output_schema=…) │ ▼submit_tool, captured = _make_submit_tool(output_schema) ← args_schema = output_schemabackend = FilesystemBackend(root_dir=settings.context.project_dir) │ ▼agent = create_deep_agent( model=settings.llm.model_string, tools=[submit_tool], system_prompt=system_prompt + _SUBMIT_INSTRUCTIONS, backend=backend,)agent.invoke({messages: [HumanMessage(user_message)]}, config={callbacks: [_TokenCounter()]}) │ ▼┌──────────────────────────────────────────────────────┐│ Orchestrator LLM ││ ││ Step 1: Read the inlined context + explore repo ││ (read_file, ls, grep, glob via backend) ││ ││ Step 2: Synthesize, dedup, prioritize ││ ││ Step 3: submit_review(**fields) ← exactly once ││ │ ││ ▼ ││ output_schema(**fields) → captured[0] │└──────────────────────────────────────────────────────┘ │ ▼LLMResult(output=captured[0], usage=Usage(input, output, total)) ← from _TokenCounterPrompt Handling
Section titled “Prompt Handling”The runbook assembles system_prompt and user_message (context inlined, since
file_access = False). The harness appends _SUBMIT_INSTRUCTIONS to the system
prompt — telling the orchestrator to explore and then call submit_review exactly
once — and passes the user message as a single HumanMessage. One orchestrator, one
submit call.
Output Format
Section titled “Output Format”A LangChain StructuredTool whose args_schema is the requested output_schema:
def _make_submit_tool(output_schema: type[BaseModel]): captured: list[BaseModel] = []
def submit_review(**kwargs) -> str: obj = output_schema(**kwargs) captured.append(obj) return "Result submitted."
tool = StructuredTool.from_function( func=submit_review, name="submit_review", args_schema=output_schema, handle_validation_error=True, ) return tool, captured- The LLM sees the full schema with its constraints (for code review: severity, category, recommendation enums)
handle_validation_error=True— validation errors are returned as a ToolMessage and the LLM retries- If
submit_reviewis never called:RuntimeError— the review fails
File Access
Section titled “File Access”Provided by FilesystemBackend(root_dir=settings.context.project_dir):
| Tool | Description |
|---|---|
read_file(path) | Read file content |
ls(path) | List directory |
grep(pattern, path) | Regex search |
glob(pattern) | File pattern matching |
The orchestrator decides autonomously which files to read — it can explore beyond the inlined context.
Token Tracking
Section titled “Token Tracking”Via a LangChain BaseCallbackHandler (_TokenCounter) wired in as a callback on
agent.invoke(). It sums input/output tokens across every LLM call and the harness
returns them as a Usage:
class _TokenCounter(BaseCallbackHandler): input_tokens: int = 0 output_tokens: int = 0
def on_llm_end(self, response: LangchainLLMResult, **kwargs): # Prefer message.usage_metadata (newer langchain) # Fallback to generation_info.token_usage (older langchain)Error Handling
Section titled “Error Handling”| Situation | Behavior |
|---|---|
| submit_review not called | RuntimeError — no result produced (tokens wasted) |
| Multiple submit_review calls | Uses first captured result, logs warning |
| LLM API error | Exception propagates to the runner |
| Infinite tool loop | No built-in limit (risk) |
| Token budget exceeded | No limit (risk) |
Pros and Cons
Section titled “Pros and Cons”| Pros | Cons |
|---|---|
| LLM orchestrator — flexible workflow | Highest token usage of the harnesses |
| File exploration — reads only what’s needed | No max_iterations (could loop) |
submit_review args_schema = output_schema | 4 heavy dependencies |
| Can adapt strategy per run | Unpredictable execution time |
| submit_review with validation retry | Least predictable cost |
[!WARNING] DeepAgents is the least reliable harness. The orchestrator sometimes skips
submit_review, especially with sparse prompts or large diffs. Usepydanticfor production.