Skip to content

Harness: DeepAgents

File: src/junior/harnesses/deepagents.py Env var: HARNESS=deepagents (BACKEND is a deprecated alias) Dependencies: deepagents, langchain, langchain-anthropic, langchain-openai Auth: ANTHROPIC_API_KEY or OPENAI_API_KEY

The module exposes a single HARNESS instance (DeepAgentsHarness, a Harness subclass). Its one method is schema-agnostic — the result schema is a parameter:

complete(*, system_prompt: str, user_message: str,
output_schema: type[BaseModel], settings: Settings) -> LLMResult

The code-review runbook passes LLMReviewOutput, but the harness works for any runbook’s result model. The output_schema becomes the args_schema of a submit_review tool that the orchestrator calls exactly once.

file_access = False — context is inlined into the user message by the runbook; the FilesystemBackend is for extra exploration beyond the diff.

A single langchain orchestrator (create_deep_agent) with one tool — submit_review. The orchestrator explores via a FilesystemBackend and submits its synthesized result by calling submit_review once. There is no per-prompt subagent fan-out.

complete(output_schema=…)
submit_tool, captured = _make_submit_tool(output_schema) ← args_schema = output_schema
backend = FilesystemBackend(root_dir=settings.context.project_dir)
agent = create_deep_agent(
model=settings.llm.model_string,
tools=[submit_tool],
system_prompt=system_prompt + _SUBMIT_INSTRUCTIONS,
backend=backend,
)
agent.invoke({messages: [HumanMessage(user_message)]},
config={callbacks: [_TokenCounter()]})
┌──────────────────────────────────────────────────────┐
│ Orchestrator LLM │
│ │
│ Step 1: Read the inlined context + explore repo │
│ (read_file, ls, grep, glob via backend) │
│ │
│ Step 2: Synthesize, dedup, prioritize │
│ │
│ Step 3: submit_review(**fields) ← exactly once │
│ │ │
│ ▼ │
│ output_schema(**fields) → captured[0] │
└──────────────────────────────────────────────────────┘
LLMResult(output=captured[0],
usage=Usage(input, output, total)) ← from _TokenCounter

The runbook assembles system_prompt and user_message (context inlined, since file_access = False). The harness appends _SUBMIT_INSTRUCTIONS to the system prompt — telling the orchestrator to explore and then call submit_review exactly once — and passes the user message as a single HumanMessage. One orchestrator, one submit call.

A LangChain StructuredTool whose args_schema is the requested output_schema:

def _make_submit_tool(output_schema: type[BaseModel]):
captured: list[BaseModel] = []
def submit_review(**kwargs) -> str:
obj = output_schema(**kwargs)
captured.append(obj)
return "Result submitted."
tool = StructuredTool.from_function(
func=submit_review,
name="submit_review",
args_schema=output_schema,
handle_validation_error=True,
)
return tool, captured
  • The LLM sees the full schema with its constraints (for code review: severity, category, recommendation enums)
  • handle_validation_error=True — validation errors are returned as a ToolMessage and the LLM retries
  • If submit_review is never called: RuntimeError — the review fails

Provided by FilesystemBackend(root_dir=settings.context.project_dir):

ToolDescription
read_file(path)Read file content
ls(path)List directory
grep(pattern, path)Regex search
glob(pattern)File pattern matching

The orchestrator decides autonomously which files to read — it can explore beyond the inlined context.

Via a LangChain BaseCallbackHandler (_TokenCounter) wired in as a callback on agent.invoke(). It sums input/output tokens across every LLM call and the harness returns them as a Usage:

class _TokenCounter(BaseCallbackHandler):
input_tokens: int = 0
output_tokens: int = 0
def on_llm_end(self, response: LangchainLLMResult, **kwargs):
# Prefer message.usage_metadata (newer langchain)
# Fallback to generation_info.token_usage (older langchain)
SituationBehavior
submit_review not calledRuntimeError — no result produced (tokens wasted)
Multiple submit_review callsUses first captured result, logs warning
LLM API errorException propagates to the runner
Infinite tool loopNo built-in limit (risk)
Token budget exceededNo limit (risk)
ProsCons
LLM orchestrator — flexible workflowHighest token usage of the harnesses
File exploration — reads only what’s neededNo max_iterations (could loop)
submit_review args_schema = output_schema4 heavy dependencies
Can adapt strategy per runUnpredictable execution time
submit_review with validation retryLeast predictable cost

[!WARNING] DeepAgents is the least reliable harness. The orchestrator sometimes skips submit_review, especially with sparse prompts or large diffs. Use pydantic for production.