Step 4: AI Review (Phase 2)

Module: junior.harnesses.pydantic (via junior.runbook.runner.run_runbook())

Provider: OpenAI | Model: gpt-5.4-mini | Tokens: 35,398

  • User message from Step 2 (12,339 chars)
  • The merged system prompt from Step 3 (4,316 chars total)
  • The pydantic harness makes a single structured LLM call

The call is made with output_type=LLMReviewOutput, so the model returns the summary, the recommendation, and the full comments list directly in that one response. The only thing Junior adds afterward is the measured token usage.

{
  "summary": "The code quality is poor overall, with multiple critical security flaws...",
  "recommendation": "request_changes",
  "comments": [/* 38 findings */],
  "input_tokens": 28174,
  "output_tokens": 7224,
  "tokens_used": 35398
}
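
The shape of that result can be sketched in a few lines. This is only an illustration, written with standard-library dataclasses so it runs without dependencies; the real LLMReviewOutput is presumably a pydantic model. The ReviewComment class, its field names (taken from the findings table columns), and the attach_usage helper are assumptions, not Junior's actual code.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class ReviewComment:
    # Assumed fields, mirroring the File / Line / Category / Issue columns.
    file: str
    line: int
    category: str
    issue: str


@dataclass
class LLMReviewOutput:
    # The three fields the model returns in its single structured response.
    summary: str
    recommendation: str
    comments: list[ReviewComment] = field(default_factory=list)


def attach_usage(review: LLMReviewOutput, input_tokens: int, output_tokens: int) -> dict:
    """Hypothetical helper: add measured token usage to the model's output."""
    result = asdict(review)
    result["input_tokens"] = input_tokens
    result["output_tokens"] = output_tokens
    result["tokens_used"] = input_tokens + output_tokens
    return result


review = LLMReviewOutput(
    summary="The code quality is poor overall...",
    recommendation="request_changes",
    comments=[ReviewComment("api.py", 93, "security", "eval(payload) on untrusted input")],
)
result = attach_usage(review, input_tokens=28174, output_tokens=7224)
print(result["tokens_used"])  # 35398
```

With the counts from this run, 28,174 input tokens plus 7,224 output tokens gives the 35,398 total reported in the header.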

| Severity | Count |
|----------|-------|
| Critical | 5     |
| High     | 20    |
| Medium   | 13    |
| Total    | 38    |

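
A severity breakdown like the one above could be tallied from the comments list as follows. This is a minimal sketch: it assumes each finding carries a lowercase severity string, which the walkthrough does not show, and severity_table is a hypothetical helper name.

```python
from collections import Counter

# Assumed severity labels, listed in display order.
SEVERITY_ORDER = ["critical", "high", "medium"]


def severity_table(severities):
    """Return (label, count) rows in display order, followed by a total row."""
    counts = Counter(severities)
    rows = [(name.capitalize(), counts[name]) for name in SEVERITY_ORDER]
    rows.append(("Total", sum(counts.values())))
    return rows


# Stand-in data matching the counts reported for this run.
severities = ["critical"] * 5 + ["high"] * 20 + ["medium"] * 13
rows = severity_table(severities)
for label, count in rows:
    print(f"{label:<10}{count}")
```
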

| File   | Line | Category | Issue |
|--------|------|----------|-------|
| api.py | 93   | security | eval(payload) on untrusted input (arbitrary code execution) |
| api.py | 59   | security | subprocess.run(..., shell=True) with user input (command injection) |
| api.py | 88   | bug      | eval(payload) in webhook handler |
| api.py | 73   | security | eval(payload), flagged again at a second call site |
| api.py | 50   | security | shell=True with user-controlled command |


| File        | Line | Category | Issue |
|-------------|------|----------|-------|
| auth.py     | 55   | security | check_permission() returns True for unknown roles |
| auth.py     | 14   | security | Deterministic token from timestamp + hardcoded secret |
| auth.py     | 8    | security | Hardcoded SECRET_KEY in source |
| auth.py     | 11   | security | MD5 for password hashing |
| database.py | 33   | security | SQL injection via f-string in find_user() |
| database.py | 61   | bug      | SQL injection in delete_user() |
| database.py | 70   | bug      | SQL injection in update_user_role() |
| database.py | 80   | bug      | SQL injection in list_users() |
| database.py | 94   | bug      | SQL injection in search_users() |
| auth.py     | 27   | logic    | Token validation broken: regenerates with current timestamp |

  • Some issues are flagged more than once (e.g. eval at multiple call sites)
  • The model returns request_changes as the recommendation, given the critical findings
  • 35,398 tokens used by the single structured call
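
Because a single call can flag the same issue at several call sites, a reader may want to group the repeats when scanning the output. The sketch below is one possible way to do that and is not something Junior is described as doing. The MARKERS list and group_repeats helper are hypothetical, and the sample findings are the five api.py rows from the table above.

```python
from collections import defaultdict

# Hypothetical markers used to recognise "the same issue" across lines.
MARKERS = ["eval(payload)", "shell=True"]


def group_repeats(findings):
    """Group line numbers by (file, marker); keep only groups flagged more than once."""
    groups = defaultdict(list)
    for file, line, issue in findings:
        for marker in MARKERS:
            if marker in issue:
                groups[(file, marker)].append(line)
                break  # count each finding under its first matching marker only
    return {key: lines for key, lines in groups.items() if len(lines) > 1}


findings = [
    ("api.py", 93, "eval(payload) on untrusted input"),
    ("api.py", 59, "subprocess.run(..., shell=True) with user input"),
    ("api.py", 88, "eval(payload) in webhook handler"),
    ("api.py", 73, "eval(payload), flagged again at a second call site"),
    ("api.py", 50, "shell=True with user-controlled command"),
]
repeats = group_repeats(findings)
for (file, marker), lines in repeats.items():
    print(f"{file}: {marker} flagged at lines {lines}")
```

Matching on a substring is deliberately crude; it is enough to surface the eval and shell=True repeats in this run, but a real deduplication step would need a more robust notion of equivalence.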