Postmortem: Why DevFix Kept Looping After the Code Passed

The tests passed. PyTest returned exit code 0. The passed field in state was True.

The agent looped again anyway.

This is the postmortem on Artifact 2 - DevFix Auto-Agent, the self-healing coding agent built in AI Engineer HQ Cohort 1. Specifically, it covers the 4 days we lost to a routing bug that should have taken 20 minutes to find. It took longer because we looked in the wrong place first.

What DevFix Does

DevFix takes a natural language coding task and executes a self-correcting loop: plan, generate code, validate syntax with AST parsing, run PyTest inside a sandboxed Docker container, reflect on failures, and try again. Up to 5 attempts. When the code passes, the agent stops and returns the result.
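The control flow described above, with success ending the loop immediately, can be sketched in plain Python. This is not the LangGraph implementation: `generate` and `run_tests` are hypothetical callables standing in for the planner/generator and the Docker PyTest run, planning is folded into `generate`, and reflection is reduced to passing the last error back as feedback. Only the AST check uses a real library call (`ast.parse`).

```python
import ast
from typing import Callable, Tuple

MAX_ATTEMPTS = 5  # the post caps the loop at 5 attempts


def syntax_ok(code: str) -> Tuple[bool, str]:
    # AST validation step: reject code that does not parse before running tests.
    try:
        ast.parse(code)
        return True, ""
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"


def run_loop(
    generate: Callable[[str, str], str],          # hypothetical: (task, feedback) -> code
    run_tests: Callable[[str], Tuple[bool, str]],  # hypothetical: code -> (passed, output)
    task: str,
) -> Tuple[bool, int]:
    # Returns (passed, attempts_used). A passing attempt ends the loop at once.
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code = generate(task, feedback)
        ok, msg = syntax_ok(code)
        if not ok:
            feedback = msg  # reflect on the syntax error, then retry
            continue
        passed, output = run_tests(code)
        if passed:
            return True, attempt  # success stops the loop
        feedback = output  # reflect on the test failure, then retry
    return False, MAX_ATTEMPTS
```

With a generator that emits broken code first and valid code second, and a test runner that always passes, `run_loop` returns `(True, 2)`: the loop stops on the second attempt instead of running to the limit.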

That is the design. The implementation had a gap.

The Failure

During Cohort 1 build week 2, a member ran DevFix on a string manipulation task. The code passed on the second attempt. PyTest output showed 3 passed in 0.12s. The Docker container returned exit code 0.

The agent did not stop. It went back to the planner. It generated the code again. Again the tests passed. Again it looped.

It ran all 5 attempts, and every attempt from the second onward passed. It stopped only because it hit the MAX_ATTEMPTS limit, not because it detected success.

The member reported it in the Thursday session. Two other members had seen the same thing that week and assumed they had done something wrong in their individual setups.

What We Thought Was Wrong

First assumption: the Docker executor was not returning the correct exit code. We added print statements inside execute_code_in_docker(). The exit code was correct (returncode == 0), and state was being updated with passed: True.
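The body of execute_code_in_docker() is not shown in this post, so the following is only a sketch of how such an executor might map PyTest's exit code to the passed field. `docker_pytest_cmd`, `run_and_check`, and the `image` argument are hypothetical; the sketch relies on `docker run` returning the container's exit status and on PyTest exiting 0 only when every collected test passed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def docker_pytest_cmd(workdir: Path, image: str) -> List[str]:
    # Hypothetical sandbox command: no network access, code mounted at /work.
    # `image` is assumed to be an image with Python and pytest installed.
    return [
        "docker", "run", "--rm", "--network", "none",
        "-v", f"{workdir.resolve()}:/work", "-w", "/work",
        image, "python", "-m", "pytest", "-q",
    ]


def run_and_check(cmd: List[str], timeout: float = 120.0) -> Tuple[bool, str]:
    # pytest exits 0 only when all collected tests pass; 1 means test failures
    # and 5 means no tests were collected. Any non-zero code counts as a failure.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Smoke check without Docker: any command that exits 0 is reported as passed.
    print(run_and_check([sys.executable, "-c", "print('ok')"]))
```

Treating "no tests collected" as a failure is a deliberate choice in this sketch: an empty test file should not count as passing code.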

Second assumption: the state update was getting overwritten somewhere. We traced every node. The validator was writing passed: True. The executor was writing passed: True. No node was overwriting it back to False.

Third assumption: the LangFuse traces would show what happened. They did. Every node ran in the correct order. The passed field was True when should_continue was called, and should_continue still returned "reflector".

That is when we looked at should_continue properly.

The Actual Bug

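def should_continue(state: AgentState) -> str:
    # Intended: end once the code has passed and been approved.
    if state.get("passed") and state.get("approved", True):
        return "end"

    if state.get("attempts", 0) >= MAX_ATTEMPTS:
        return "end"

    # Every state that missed the first check lands here,
    # including passed=True with approved=False.
    return "reflector"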

The logic looks correct. If passed is True and approved is True, return "end".

The bug is in the default value: state.get("approved", True).

The approved field in initial state was set to False:

initial_state = {
    ...
    "approved": False,
    ...
}

The executor node set approved to True only on the auto-approve branch. When running with AUTO_APPROVE=false, the executor's approval check ran and the human said yes, but the human approval function returned True without writing anything to state, so the approved field was never updated. The state key stayed False.

So should_continue evaluated state.get("approved", True). The default True in the .get() call only fires when the key does not exist. The key existed. It was False. The condition failed. The agent looped.
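The distinction is easy to confirm in isolation. The snippet below uses plain dicts in place of AgentState, and `passes_gate` is a hypothetical name for the first condition in should_continue.

```python
# dict.get only uses its default when the key is absent.
state_missing_key = {"passed": True}
state_initialized = {"passed": True, "approved": False}

print(state_missing_key.get("approved", True))  # True: key absent, default used
print(state_initialized.get("approved", True))  # False: key present, stored value wins


def passes_gate(state: dict) -> bool:
    # The first routing condition from should_continue, evaluated on its own.
    return bool(state.get("passed") and state.get("approved", True))


print(passes_gate(state_missing_key))  # True
print(passes_gate(state_initialized))  # False
```

The default on .get() is a fallback for a missing key, not a fallback for a falsy value, which is exactly why initializing the key to False silently overrode it.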

The routing default and the state initialization were inconsistent. A 4-day debugging session traced to one line.

The Fix

Two changes, both necessary.

Change 1: The executor node now writes approved to state explicitly after the human approval step, regardless of which path it takes.

def executor_node(state: AgentState) -> dict:
    # Every return path writes "approved" explicitly, so no stale value survives in state.
    code = state["code"]

    is_safe, safety_message = safety_guard(code)
    if not is_safe:
        return {
            "passed": False,
            "approved": False,
            "errors": f"Safety guard blocked execution: {safety_message}"
        }

    approved = human_approval(code)
    if not approved:
        return {
            "passed": False,
            "approved": False,
            "errors": "Human rejected the code. Regenerating."
        }

    success, output = execute_code_in_docker(code)

    return {
        "passed": success,
        "approved": True,
        "errors": output if not success else ""
    }

Change 2: should_continue no longer relies on the approved field as a routing signal. Human approval is handled inside the executor node. The routing function only cares about passed and attempts.

def should_continue(state: AgentState) -> str:
    if state.get("passed"):
        return "end"

    if state.get("attempts", 0) >= MAX_ATTEMPTS:
        return "end"

    return "reflector"

Simpler. No ambiguous defaults.

What the Diagram Shows

[Diagram: DevFix Routing Bug - Before and After]

Before the fix, the routing function had a hidden dependency on a state field whose initialization was inconsistent with the function's default. Unit tests for should_continue did not catch it because they passed approved=True explicitly. Integration tests did not catch it because they ran with AUTO_APPROVE=true. The only path that triggered it was AUTO_APPROVE=false combined with a human approving the code.
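The gap between the unit test and the real path can be shown with two states. `should_continue_buggy` is a hypothetical reconstruction that follows the behavior described in this postmortem (a passing but unapproved state routed to "reflector"); the state dicts contain only the routing keys, not the full AgentState.

```python
MAX_ATTEMPTS = 5


def should_continue_buggy(state: dict) -> str:
    # Hypothetical reconstruction: passed=True with approved=False
    # falls through to "reflector", as described in the postmortem.
    if state.get("passed") and state.get("approved", True):
        return "end"
    if state.get("attempts", 0) >= MAX_ATTEMPTS:
        return "end"
    return "reflector"


# The kind of unit test that passed: approved supplied explicitly.
easy_state = {"passed": True, "approved": True, "attempts": 2}

# AUTO_APPROVE=false, a human "yes", and a passing run: approved still
# holds its initial False because nothing wrote to it.
real_state = {"passed": True, "approved": False, "attempts": 2}

print(should_continue_buggy(easy_state))  # end
print(should_continue_buggy(real_state))  # reflector
```

Both states describe passing code; only the one built the way users actually build it exposes the loop.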

The Principle

State fields that control routing must have their initialization and their default values explicitly agreed on. If your routing function uses state.get("x", True) and your initial state sets x: False, you have a latent bug. It will find the worst possible moment to surface.
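One way to keep the two in agreement is to declare routing fields and their starting values in a single table and derive both the initial state and every routing fallback from it. This is a minimal sketch under that assumption; `ROUTING_DEFAULTS`, `make_initial_state`, `routing_value`, and the "task" key are hypothetical names, not part of DevFix.

```python
from typing import Any, Dict

# Hypothetical single source of truth for routing fields.
# "approved" is absent because, after Change 2, it no longer drives routing.
ROUTING_DEFAULTS: Dict[str, Any] = {"passed": False, "attempts": 0}


def make_initial_state(task: str) -> Dict[str, Any]:
    # Initialization is built from the same table the router reads.
    return {"task": task, **ROUTING_DEFAULTS}


def routing_value(state: Dict[str, Any], key: str) -> Any:
    # The fallback comes from ROUTING_DEFAULTS, never from a literal at the
    # call site, so the default and the initialization cannot drift apart.
    # An unknown key raises KeyError, which flags an undeclared routing field.
    return state.get(key, ROUTING_DEFAULTS[key])


state = make_initial_state("reverse a string")
print(routing_value(state, "passed"))  # False
print(routing_value({}, "passed"))     # False: same value whether or not the key exists
```

With this arrangement, the x: False versus .get("x", True) mismatch cannot be written without editing the shared table.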

For every conditional edge in a LangGraph application, write a test that covers the exact initial state that users will pass in, not the state that makes the test easy to write.
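A test in that spirit for the fixed routing function might look like the sketch below. The function body is copied from Change 2; the three-key `initial_state` is a hypothetical subset, since the post elides the other keys, and the test name is illustrative.

```python
MAX_ATTEMPTS = 5


def should_continue(state: dict) -> str:
    # Fixed routing function from Change 2.
    if state.get("passed"):
        return "end"
    if state.get("attempts", 0) >= MAX_ATTEMPTS:
        return "end"
    return "reflector"


def test_routing_from_realistic_states() -> None:
    # Hypothetical subset of the real initial state users pass in.
    initial_state = {"passed": False, "approved": False, "attempts": 0}

    # After a failed first attempt, the run goes to the reflector.
    assert should_continue({**initial_state, "attempts": 1}) == "reflector"

    # The path that broke: AUTO_APPROVE=false, human approves, tests pass.
    # approved stays at its initial False to mirror the stale-field bug.
    assert should_continue({**initial_state, "passed": True, "attempts": 2}) == "end"

    # Out of attempts without a pass: the run ends anyway.
    assert should_continue({**initial_state, "attempts": MAX_ATTEMPTS}) == "end"


if __name__ == "__main__":
    test_routing_from_realistic_states()
    print("routing tests passed")
```

Because every case starts from `initial_state` and changes only what a real run would change, a stale field inherited from initialization stays in the test instead of being quietly set to a convenient value.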

That is the lesson. Four days to learn it. Twenty minutes to fix it.

What Changed in the Cohort After This

We added a state validation step that runs at graph entry, before any node executes. It checks that every field used in routing has a defined, unambiguous value. If any routing field is missing or has an inconsistent default, the graph raises an error before the first node runs.
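The contents of shared/graph_validators.py are not reproduced here, so the following is only a sketch of what an entry check of this kind could look like. `ROUTING_FIELDS`, `RoutingStateError`, and `validate_routing_state` are hypothetical names; the sketch assumes the check is simply called on the initial state before the compiled graph is invoked.

```python
from typing import Any, Dict, List, Type

# Hypothetical spec: routing field -> exact expected type.
ROUTING_FIELDS: Dict[str, Type[Any]] = {"passed": bool, "attempts": int}


class RoutingStateError(ValueError):
    """Raised when a routing field is missing or ambiguous at graph entry."""


def validate_routing_state(state: Dict[str, Any]) -> None:
    problems: List[str] = []
    for field, expected in ROUTING_FIELDS.items():
        if field not in state:
            problems.append(f"missing routing field {field!r}")
        # Exact type match: rejects None, and rejects True/False for an int
        # field even though bool is a subclass of int.
        elif type(state[field]) is not expected:
            problems.append(
                f"{field!r} is {type(state[field]).__name__}, expected {expected.__name__}"
            )
    if problems:
        raise RoutingStateError("; ".join(problems))


validate_routing_state({"passed": False, "attempts": 0})  # passes silently
```

Collecting every problem before raising means a misconfigured state reports all of its bad routing fields at once rather than one per run.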

Every cohort member building an agent now gets this validator as part of the shared tooling in the MasterDexterAI GitHub org under shared/graph_validators.py.

If you are building AI systems in production and want to be part of the next cohort, applications for AI Engineer HQ Cohort 2 are open.