Mid-Market Teams Should Skip the LLM Gateway for Claude Code

The Gateway Tax No One Talks About

CloudThat's deployment guide for Claude Code on Bedrock mentions an "LLM gateway pattern" for routing requests to multiple providers and enforcing governance. This is the first mistake mid-market teams make: building infrastructure to solve problems they don't yet have. A gateway adds latency, operational surface area, and a month of engineering time to reach feature parity with native Bedrock IAM. Unless you're actively multi-homing between Bedrock and Vertex AI today (not planning to, actually doing it), the gateway is a distraction.

Bedrock ships AgentCore with built-in request routing, usage metering per IAM principal, and CloudWatch integration that logs every invocation with user context when you federate through OIDC. The pattern CloudThat describes as "recommended for production" (direct IdP integration via OpenID Connect) gives you per-developer attribution, temporary credentials, and audit trails without writing a single line of gateway code. A 200-person company does not need a custom proxy to call Bedrock's InvokeModel API.

The Real Decision: SSO Federation vs. Console Sprawl

The only authentication choice that matters at mid-market scale is whether you federate your IdP (Okta, Entra ID) directly to AWS IAM or hand out console credentials. CloudThat lists four options. Two are non-starters (API keys, manual aws login). One (SSO via Identity Center without federation) gives you single sign-on but no user-level cost attribution. The fourth, OIDC federation from your IdP to AWS with assumed roles scoped to individual developers, is the only pattern that survives a SOC 2 audit and gives finance per-seat usage data.
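As a rough illustration of the fourth option, the sketch below exchanges an IdP-issued OIDC token for temporary credentials, using the developer's email as the role session name so it shows up in downstream logs. It assumes a boto3 STS client is passed in and that the role trusts your IdP as an OIDC provider. The function names, the role ARN, and the one-hour duration are hypothetical choices for illustration, not something the CloudThat guide prescribes.

```python
import re


def session_name_for(developer_email: str) -> str:
    """Make an email safe to use as an STS RoleSessionName.

    STS accepts 2 to 64 characters drawn from letters, digits, and _+=,.@-
    so anything else is replaced with a hyphen and the result is truncated.
    """
    cleaned = re.sub(r"[^\w+=,.@-]", "-", developer_email, flags=re.ASCII)
    return cleaned[:64]


def assume_developer_role(sts_client, role_arn, web_identity_token, developer_email):
    """Trade an OIDC token from the IdP for short-lived AWS credentials.

    sts_client is expected to be boto3.client("sts"); it is injected so the
    function can be exercised without AWS access.
    """
    response = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,
        # The session name becomes the last segment of the assumed-role ARN,
        # which is what makes per-developer attribution possible later.
        RoleSessionName=session_name_for(developer_email),
        WebIdentityToken=web_identity_token,
        DurationSeconds=3600,  # assumption: one-hour sessions
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, and Expiration.
    return response["Credentials"]
```

The returned credentials expire on their own, which is the practical difference from handing out long-lived keys.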

Here's the decision tree: if you need to show "Developer X invoked Claude Sonnet 47 times last Tuesday" in an internal dashboard or chargeback report, you must federate. If you're fine with team-level aggregates ("Engineering used 2M tokens this month"), Identity Center suffices. Most Maple customers pick federation because their CFO asks "why did our Bedrock bill jump $4K?" and the only honest answer without user-level logs is "we don't know."
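To make the "Developer X invoked Claude Sonnet 47 times last Tuesday" requirement concrete, here is a minimal sketch of the aggregation a chargeback report would run over invocation log records. It assumes each record is a dict with a UTC `timestamp`, a `modelId`, an `identity.arn` holding the assumed-role ARN, and token counts under `input.inputTokenCount` and `output.outputTokenCount`. Treat that record shape and the function name as assumptions and check them against the logs you actually receive.

```python
from collections import defaultdict
from datetime import date, datetime


def usage_by_developer(records, model_substring: str, day: date):
    """Count invocations and tokens per developer for one model on one UTC day."""
    usage = defaultdict(lambda: {"invocations": 0, "tokens": 0})
    for record in records:
        # Accept a trailing "Z" on older Python versions by normalizing it.
        when = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        if when.date() != day or model_substring not in record["modelId"]:
            continue
        # With federation, the ARN looks like
        # arn:aws:sts::<account>:assumed-role/<role>/<session-name>,
        # and the session name identifies the individual developer.
        developer = record["identity"]["arn"].rsplit("/", 1)[-1]
        tokens = (record.get("input", {}).get("inputTokenCount", 0)
                  + record.get("output", {}).get("outputTokenCount", 0))
        usage[developer]["invocations"] += 1
        usage[developer]["tokens"] += tokens
    return dict(usage)
```

Without federation, every record carries the same shared identity, so this grouping collapses into a single team-level row.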

Agentforce + Claude + Bedrock: Where the Proxy Reflex Comes From

The gateway instinct comes from SaaS platforms where you can't control the client. If you're deploying Agentforce with a Data Cloud Einstein Trust Layer calling Bedrock, you don't get to rewrite Salesforce's HTTPS stack, so a gateway makes sense for logging or failover. But when your developers are the clients, writing Apex that calls Bedrock via Named Credentials, or Python scripts in a Databricks notebook, you own the call site. You can log, retry, and route in application code. A gateway centralizes these concerns at the cost of a new service to operate, a new failure mode (gateway down = all AI down), and complexity that makes onboarding a new engineer take two days instead of two hours.
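Owning the call site can be as small as the wrapper sketched below, which logs each attempt and retries with exponential backoff and jitter. It is deliberately provider-agnostic: `invoke` is any zero-argument callable you supply (for example, a lambda wrapping your Bedrock client call). The helper name, the default retry count, and the set of retryable exception types are assumptions for illustration.

```python
import logging
import random
import time

logger = logging.getLogger("llm_calls")


def call_with_retry(invoke, max_attempts=3, base_delay=0.5,
                    retryable=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Run invoke(), logging every attempt and retrying transient failures."""
    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        try:
            result = invoke()
        except retryable as exc:
            logger.warning("attempt %d/%d failed: %r", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error to the caller
            # Exponential backoff plus jitter so clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
        else:
            elapsed = time.monotonic() - started
            logger.info("attempt %d succeeded in %.3fs", attempt, elapsed)
            return result
```

Routing fits the same shape: the caller decides which callable to pass in, so no separate service has to sit in the request path.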

Maple's reference architecture for mid-market Agentforce + Claude deployments uses Bedrock's public endpoints, federation through the customer's existing IdP, and OpenTelemetry collectors in the VPC for custom metrics if needed. We ship a CloudFormation stack that provisions IAM roles, CloudWatch dashboards, and a Lambda that ingests Bedrock logs into the customer's existing observability stack (Datadog, Splunk, whatever). Total setup time: four hours. A customer who insists on a gateway spends three weeks building it, another two weeks debugging why Claude responses are slower than their proof-of-concept, and then asks us to rip it out when they realize the bottleneck is the EC2 instance they sized for "low traffic."
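For a sense of how little code a log-forwarding Lambda involves, here is a minimal sketch. It is not the Lambda from the stack described above. It assumes the function is triggered by a CloudWatch Logs subscription filter, which delivers a base64-encoded, gzip-compressed JSON payload under `awslogs.data`. The `forward` callable stands in for whatever client your observability vendor provides, and the fields pulled from each record are assumptions about the log shape.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Unpack the log events from a CloudWatch Logs subscription payload."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    # CloudWatch occasionally sends control messages that carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return payload.get("logEvents", [])


def handler(event, context, forward=print):
    """Lambda entry point: forward a trimmed version of each invocation record."""
    forwarded = 0
    for log_event in decode_subscription_event(event):
        try:
            record = json.loads(log_event["message"])
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        forward({
            "timestamp": record.get("timestamp"),
            "model_id": record.get("modelId"),
            "identity_arn": record.get("identity", {}).get("arn"),
        })
        forwarded += 1
    return {"forwarded": forwarded}
```

In a real deployment, `forward` would be replaced with a batched call to the vendor's ingestion API, since the default here only prints.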

When You Actually Need the Gateway

There are two legitimate reasons to run an LLM gateway: (1) you are contractually obligated to log every token in and out for compliance, and your existing logging stack can't ingest CloudWatch at the required durability SLA, or (2) you need to switch between Bedrock Claude and another provider (Vertex, Azure OpenAI) within the same request based on real-time cost or quota signals. Scenario one is rare outside healthcare and defense. Scenario two describes maybe 15 companies globally. If you think you're in scenario two, you're probably solving for a future state that's nine months away. Deploy the simple thing now and add the gateway when the pain is real.

The mid-market mistake is designing for theoretical scale. A gateway that routes to "multiple providers" sounds responsible in an architecture review, but if you're not calling those providers today, you've just built a $60K/year EC2 cluster to forward HTTPS requests. Use Bedrock directly, federate your IdP, ship the feature.