There’s an old saw in management: What you measure matters. And, typically, you get more of whatever you’re measuring.
Software engineers have debated productivity metrics for decades, starting with lines of code. But as the new generation of AI coding agents delivers more code than ever, what engineering managers ought to be measuring is less clear.
Enormous token budgets — essentially, the amount of AI processing power a developer is authorized to consume — have become a badge of honor among Silicon Valley developers, but that’s a very weird way to think about productivity. Measuring an input to the process makes little sense when you presumably care more about the output. It might make sense if you’re trying to encourage more AI adoption (or selling tokens), but not if you’re trying to become more efficient.
Consider the evidence from a new class of companies operating in the “developer productivity insight” space. They’re finding that developers using tools like Claude Code, Cursor, and Codex generate a lot more accepted code than they did before. But they also find that engineers have to return to revise that accepted code far more often than before, undercutting claims of increased productivity.
Alex Circei, the CEO and founder of Waydev, is building an intelligence layer to track these dynamics; his firm works with 50 different customers that employ more than 10,000 software engineers. (Circei has contributed to TechCrunch in the past, but this reporter had never met him before.)
He says that engineering managers are seeing code acceptance rates of 80% to 90% — meaning the share of AI-generated code that developers approve and keep — but they’re missing the churn that happens when engineers have to revise that code in the following weeks, which drives the real-world acceptance rate down …