Salesforce shipped 151% more code with AI. Here's the architecture.

The token limit was the bottleneck

Salesforce Engineering crossed 90% AI adoption in early 2026, then immediately hit a wall. Developers had Claude Code; they just couldn't use it without rationing tokens. So engineering leadership did the thing that sounds obvious in hindsight but terrifies finance teams in the moment: they removed all token limits.

The result wasn't runaway cost. It was a 151% year-over-year jump in Effective Output, a machine learning score that measures actual value delivered, not just lines committed. PRs merged per developer went up 79%. Work items completed per developer climbed 50.8%. Incidents dropped 5% despite the volume spike. That last number is the one architects miss when they model agentic workflows: quality and velocity moved in the same direction because the agents weren't cutting corners to stay under a cap.
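A back-of-the-envelope check makes the quality claim concrete. The sketch below assumes developer headcount stayed roughly flat and that the 5% figure is a raw incident count. The article states neither, so treat the result as illustrative rather than as a reported metric.

```python
# Illustrative arithmetic only. Assumptions (not from the source):
# headcount is flat, and "incidents dropped 5%" refers to a raw count.
pr_growth = 0.79          # PRs merged per developer, year over year
incident_change = -0.05   # incidents, year over year

# Incidents per merged PR, relative to the prior year (1.0 = unchanged).
relative_incident_rate = (1 + incident_change) / (1 + pr_growth)

print(f"Incidents per merged PR vs. last year: {relative_incident_rate:.2f}x")
print(f"Change in incident rate: {relative_incident_rate - 1:.0%}")
```

Under those assumptions, the incident rate per merged PR roughly halves, which is a much stronger quality signal than the headline 5% suggests.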

This is the first hard proof point that the "copilot tax" mental model is wrong. The mid-market companies still metering Claude API calls per seat are optimizing for the wrong variable. The cost of the tokens is noise compared to the cost of the engineer-hours you're still burning on toil.

The 18x sprint pattern

One Salesforce product team migrated 33 API endpoints to a new cloud-native architecture in 13 days. The traditional path would have taken 231 person-days: seven days per endpoint for schema mapping, testing, and documentation. They compressed it by building a rule-based framework in markdown that Claude ingested as context, then running autonomous LLM loops (build, fix, validate) in parallel across isolated environments.
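The headline multiplier falls straight out of those figures. Note that the ratio compares estimated person-days of effort with elapsed sprint days. The article doesn't say how many engineers worked the sprint, so this is the comparison as the article frames it, not a strict per-person speedup.

```python
endpoints = 33
days_per_endpoint = 7   # traditional estimate quoted above
sprint_days = 13        # elapsed days for the agent-driven migration

baseline_person_days = endpoints * days_per_endpoint
speedup = baseline_person_days / sprint_days

print(baseline_person_days)   # 231
print(round(speedup, 1))      # 17.8, rounded up to "18x" in the headline
```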

Every PR review got folded back into the rule set, so accuracy improved mid-sprint. The largest single PR delivered 21 endpoints with 100% test coverage. Five PRs total. No human in the loop after the framework was locked.

This is the pattern Maple is seeing mid-market architects replicate with Agentforce + Data Cloud + Bedrock stacks. The framework isn't code; it's structured context (markdown, reference implementations, validation schemas) that the agent reads on every loop. The agent doesn't "learn" in the ML sense; it just has better instructions each time. The 18x number isn't magic. It's what happens when you remove the handoff latency between "write the migration script" and "fix the test failure" and "update the docs."
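To make the shape of that loop concrete, here is a minimal sketch. It is not Salesforce's framework: `RuleSet`, `migrate_endpoint`, `fake_agent`, and `fake_validate` are hypothetical names, and the agent and validator are stand-in functions so the example runs without a model or a test suite. Parallel execution across isolated environments is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuleSet:
    """Structured context the agent re-reads on every loop (hypothetical)."""
    rules: list[str] = field(default_factory=list)

    def as_markdown(self) -> str:
        return "\n".join(f"- {rule}" for rule in self.rules)

    def fold_in(self, review_comment: str) -> None:
        # Reviewer feedback becomes a standing rule; skip duplicates.
        if review_comment not in self.rules:
            self.rules.append(review_comment)


def migrate_endpoint(
    endpoint: str,
    rules: RuleSet,
    agent: Callable[[str], str],
    validate: Callable[[str], list[str]],
    max_loops: int = 5,
) -> tuple[str, int]:
    """Build, validate, and fix until validation passes or loops run out."""
    failures: list[str] = []
    for attempt in range(1, max_loops + 1):
        # The full rule set is sent every time: better instructions, no training.
        prompt = (
            f"# Rules\n{rules.as_markdown()}\n\n"
            f"# Task\nMigrate endpoint {endpoint}\n\n"
            f"# Failures from last attempt\n" + "\n".join(failures)
        )
        candidate = agent(prompt)
        failures = validate(candidate)
        if not failures:
            return candidate, attempt
    raise RuntimeError(f"{endpoint}: still failing after {max_loops} loops")


# Stand-ins so the sketch runs end to end.
def fake_agent(prompt: str) -> str:
    # Pretends to fix the output once the failure appears in the prompt.
    return "handler with docs" if "missing docs" in prompt else "handler"


def fake_validate(candidate: str) -> list[str]:
    return [] if "docs" in candidate else ["missing docs"]


rules = RuleSet(["Map every legacy field explicitly"])
rules.fold_in("Document every public handler")  # e.g. a PR review comment
result, loops = migrate_endpoint("/v1/accounts", rules, fake_agent, fake_validate)
print(result, loops)
```

The design choice worth noticing is that nothing persists inside the agent. The improvement lives entirely in the rule set and in the failure list fed into the next prompt.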

The scaffolding nobody talks about

Salesforce didn't get here by flipping a switch. They built Engineering 360, a centralized platform that ingests data from hundreds of internal systems to track security, availability, quality, and developer productivity in real time. That telemetry layer is what let them measure Effective Output instead of guessing from PR counts. It's also what gave them the confidence to remove token limits: they could see the quality impact live.

Most mid-market orgs don't have an Engineering 360. They have Jira, GitHub, and a Slack channel where people argue about cycle time. The gap between "we adopted Agentforce" and "we're running 18x sprints" is the observability stack. If you can't measure output quality independently of volume, you can't trust an agent to write production code unsupervised. If you can't trust the agent, you revert to human-in-the-loop, and the 18x collapses back to 1.2x.

Data Cloud is the wedge product here for mid-market companies that don't want to build a custom 360 platform. It's not positioned as "agent observability," but that's the job it does when you pipe GitHub webhooks, Jira state changes, and PagerDuty incidents into a unified schema and let Einstein or Claude query it. Snowflake Cortex Analyst can do the same thing if your data warehouse is already the system of record. The architecture decision is whether you want the agent runtime and the telemetry layer in the same product family (Agentforce + Data Cloud) or split across vendors (Claude API + Cortex). Both work; the tradeoff is integration surface area vs. best-of-breed flexibility.
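As a rough illustration of what "a unified schema" means in practice, the sketch below normalizes three event sources into one record type. The payload shapes are deliberately simplified assumptions, not the vendors' full webhook schemas, and `EngineeringEvent` and the `from_*` helpers are hypothetical names. A real pipeline would land these rows in Data Cloud or a warehouse rather than a Python list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EngineeringEvent:
    source: str          # "github", "jira", or "pagerduty"
    kind: str            # normalized event type, e.g. "pr_merged"
    entity_id: str       # PR number, issue key, or incident id
    occurred_at: datetime


def _parse_iso(value: str) -> datetime:
    # Accept a trailing "Z", which older datetime.fromisoformat versions reject.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def from_github(payload: dict) -> EngineeringEvent:
    pr = payload["pull_request"]
    kind = "pr_merged" if pr.get("merged") else f"pr_{payload['action']}"
    stamp = pr.get("merged_at") or pr["updated_at"]
    return EngineeringEvent("github", kind, str(pr["number"]), _parse_iso(stamp))


def from_jira(payload: dict) -> EngineeringEvent:
    # Assumed simplified shape: epoch milliseconds plus the new status name.
    occurred = datetime.fromtimestamp(payload["timestamp"] / 1000, tz=timezone.utc)
    kind = "issue_" + payload["to_status"].lower().replace(" ", "_")
    return EngineeringEvent("jira", kind, payload["issue_key"], occurred)


def from_pagerduty(payload: dict) -> EngineeringEvent:
    event = payload["event"]
    kind = event["event_type"].replace(".", "_")
    return EngineeringEvent(
        "pagerduty", kind, event["data"]["id"], _parse_iso(event["occurred_at"])
    )


# Sample payloads, invented for illustration.
events = [
    from_github({"action": "closed", "pull_request": {
        "number": 42, "merged": True,
        "merged_at": "2026-03-01T10:00:00Z", "updated_at": "2026-03-01T10:00:00Z"}}),
    from_jira({"issue_key": "ENG-7", "to_status": "Done", "timestamp": 1772359200000}),
    from_pagerduty({"event": {"event_type": "incident.triggered",
                              "occurred_at": "2026-03-01T11:30:00Z",
                              "data": {"id": "P123"}}}),
]

counts = Counter(event.kind for event in events)
print(dict(counts))
```

Once everything shares one record type, volume questions (merged PRs) and quality questions (incidents triggered) can be answered from the same table, which is the property the telemetry layer needs regardless of vendor.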

What this means for B2B SaaS product teams this quarter

If your engineering org is still treating AI as a per-developer productivity boost, you're solving last year's problem. The new floor is agentic workflows that compress multi-week efforts into multi-day sprints by removing human handoff latency. The companies that ship this pattern in Q3 2026 will outrun competitors still doing code review the old way by a margin that looks unfair because it is.

The stack is Claude Code or Agentforce for the agent runtime, Data Cloud or Cortex for telemetry, and a rule-based context framework (markdown, not a fine-tune) that the agent reads on every loop. The activation energy is removing token limits and building enough observability to trust the output. Salesforce proved both are achievable at enterprise scale. Mid-market has fewer legacy constraints and should move faster.