Envoy AI Gateway v0.6.x
v0.6.0
✨ New Features
AWS Bedrock
InvokeModel API for Claude: Send requests to Claude models on Bedrock through Bedrock's native InvokeModel endpoint, complementing the existing Converse API path. Useful when applications already speak the Anthropic Messages format and want a thin translation layer.
Call Amazon Titan embedding models on Bedrock through the standard OpenAI /v1/embeddings contract. Switch embedding providers without changing client code. Cohere and other Bedrock embedding models are not yet covered and will follow in a later release.
Anthropic and Cross-Provider Translation
/v1/messages endpoint on OpenAI backends: Expose any OpenAI-compatible backend through Anthropic's Messages API. Lets Claude-style clients reach OpenAI, Azure OpenAI, or any other OpenAI-compatible provider behind the gateway without rewriting requests.
Pass JSON schema constraints through to Claude so responses conform to your declared shape. Available on Anthropic and AWS Bedrock Claude backends today; GCP Vertex AI Claude is excluded pending upstream provider support.
max_tokens omitted on Anthropic requests: Requests without an explicit max_tokens no longer crash the translator; they're forwarded so the provider returns a normal validation error. Removes a long-standing footgun when forwarding OpenAI-shaped requests through the Anthropic path.
claude-opus-4.6: Translate Claude's new adaptive thinking mode end-to-end. Adaptive thinking lets the model decide thinking depth per request rather than committing to a fixed budget, so callers can opt in without bespoke provider code.
reasoning_effort across Anthropic, OpenAI, and Gemini: A single OpenAI-style reasoning_effort value (low/medium/high/xhigh) now maps onto Anthropic's thinking budgets and Gemini 3's thinking controls. One client knob, three providers.
Gemini Provider
Use Gemini embedding models through the OpenAI /v1/embeddings contract, completing Gemini coverage alongside chat completions and Responses.
Activate Gemini's context caching using the same Anthropic-style cache_control prefix surface already supported elsewhere. Cut input token costs on long, repeated system prompts without a Gemini-specific code path.
Non-streaming Gemini reasoning is now exposed as both string content and structured thinking_blocks, matching the shape clients already use for Anthropic responses. Streaming responses still surface reasoning as string content only.
OpenAI API Compatibility
A second wave of Responses API work fills in context management and improves streaming, bringing the /v1/responses path closer to parity with /v1/chat/completions. If you held off on /v1/responses due to missing features, retest now.
Improved compatibility with non-OpenAI implementations of the Responses API (e.g. open-source inference servers that expose a /v1/responses endpoint), broadening which Responses-aware clients can sit in front of the gateway.
/v1/audio/speech: Route OpenAI text-to-speech requests through the gateway, so audio workloads benefit from the same auth, rate limiting, and observability as chat traffic.
MCP Gateway
MCPRouteBackendRef.forwardHeaders accepts a list of inbound headers to forward to each backend, optionally renaming them on the way out. Each MCP backend can receive its own set of headers (e.g. trace context, tenant identifiers, per-user auth) without a single route-wide rule.
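A sketch of what per-backend forwarding could look like on an MCPRoute. The forwardHeaders field name is from this release; the shape of each entry (a name plus an optional rename key, here called as) is an assumed illustration, not a confirmed schema.

```yaml
# Hypothetical MCPRoute fragment: each backend gets its own forwarded headers.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: MCPRoute
metadata:
  name: tools-route
spec:
  backendRefs:
    - name: mcp-backend-a
      forwardHeaders:
        - name: x-tenant-id                # forwarded unchanged
        - name: x-request-id
          as: x-upstream-request-id        # renamed on the way out (illustrative key)
    - name: mcp-backend-b
      forwardHeaders:
        - name: traceparent                # only this backend receives trace context
```

Consult the v0.6 API reference for the exact entry fields before copying this into a manifest.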
Project verified JWT claims into outbound headers via MCPRouteOAuth.claimToHeaders, enabling identity-aware tool execution at backend MCP servers without re-authenticating downstream.
excludeRegex on tool selectors: MCPToolFilter now supports deny patterns (literal exclude and regex excludeRegex) alongside the existing include rules. Useful when a backend exposes more capabilities than a given route should surface.
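A minimal sketch of combining include and deny patterns. The exclude and excludeRegex keys are named in this release; the surrounding toolSelector container name is an assumption for illustration.

```yaml
# Hypothetical MCPToolFilter fragment: allow read-style tools, deny the rest.
toolSelector:
  include:
    - list_files
    - read_file
  exclude:
    - delete_file          # literal deny
  excludeRegex:
    - "^admin_.*"          # regex deny for anything admin-scoped
```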
Tool invocations now carry the tool name in dynamic metadata (key mcp_tool_name), so per-tool debugging, dashboards, and access-log fields are straightforward to wire up.
The gateway tracks which MCP server feature flags (tools, prompts, resources, logging, completions) each backend supports and merges them across a route. Capability negotiation now reflects what's actually reachable, so clients don't get told a feature is available when no reachable backend implements it.
Authentication and Identity
GCP backends now authenticate using the standard ADC chain when neither credentialsFile nor workloadIdentityFederationConfig is set in the BackendSecurityPolicy. Workloads running on GKE pick up Workload Identity automatically — no static service account JSON secret needed.
Security and Privacy
Strip or mask sensitive fields in request and response bodies before they hit logs, traces, or metrics. Lets you keep observability on while meeting privacy and compliance constraints.
Observability
aigw: Standalone aigw wires up OTLP access logging out of the box when an OTLP endpoint is configured (via OTEL_EXPORTER_OTLP_ENDPOINT), removing a manual step from local-dev and demo paths.
agent-session-id → session.id header mapping: Spans and logs now correlate by session.id automatically when clients send the agent-session-id header, so agent frameworks like Goose get session correlation with zero config. Override or disable via OTEL_AIGW_REQUEST_HEADER_ATTRIBUTES. Metrics never default to session IDs (high cardinality).
ReasoningToken cost type: LLMRequestCostType now includes ReasoningToken, so you can budget and bill against thinking tokens separately from the input, output, and cache cost types.
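A sketch of how the new cost type might be declared, assuming the existing llmRequestCosts entry shape with metadataKey and type:

```yaml
# Sketch: track reasoning (thinking) tokens as their own cost bucket.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
  name: chat-route
spec:
  llmRequestCosts:
    - metadataKey: input_tokens
      type: InputToken
    - metadataKey: reasoning_tokens
      type: ReasoningToken   # new in v0.6
```

Each metadataKey becomes a dynamic-metadata field you can feed into rate limiting or billing pipelines.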
Responses now carry the resolved upstream model in metadata, which clients and downstream tools can read to confirm exactly which model served a request (useful when routes use model aliasing or fallback).
Removed the OTEL span attribute count limit so long-context requests no longer have parts of their trace silently dropped.
Operations and Extensibility
The conversion webhook can now bind to a configurable port (controller.mutatingWebhook.port) and run on the host network (controller.hostNetwork), smoothing installs in clusters with restrictive admission webhook networking such as GKE private clusters.
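For reference, a values.yaml sketch using the two chart keys named above; the port value itself is illustrative, pick one your cluster's firewall rules allow:

```yaml
# Helm values fragment for restrictive admission-webhook networking.
controller:
  hostNetwork: true        # run the controller on the host network
  mutatingWebhook:
    port: 9443             # illustrative; must be reachable from the API server
```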
Lua filters can now be attached after the AI ExtProc stage in the standard filter chain, so you can do last-mile request shaping (header rewrites, body tweaks) without writing a custom EnvoyExtensionPolicy.
Set GatewayConfig.spec.globalLLMRequestCosts for fleet-wide defaults and override per-route at AIGatewayRoute.spec.llmRequestCosts. Makes per-tenant or per-backend cost tracking straightforward without per-route boilerplate.
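A sketch of the fleet-default-plus-override pattern. The globalLLMRequestCosts and llmRequestCosts field paths are from this release; the CEL entry (type: CEL with a cel expression) follows the existing cost-type surface, and the expression shown is a hypothetical weighting.

```yaml
# Fleet-wide default on GatewayConfig...
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: GatewayConfig
metadata:
  name: fleet-config
spec:
  globalLLMRequestCosts:
    - metadataKey: output_tokens
      type: OutputToken
---
# ...overridden on one route that bills output tokens at a premium.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
metadata:
  name: premium-route
spec:
  llmRequestCosts:
    - metadataKey: output_tokens
      type: CEL
      cel: "output_tokens * 2"   # hypothetical per-tenant weighting
```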
🔗 API Updates
- Core CRDs promoted to aigateway.envoyproxy.io/v1beta1: AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, and MCPRoute are now served at v1beta1, signaling that the core API and MCP routing surface are stable enough for production use. v1alpha1 versions remain registered with deprecation warnings so existing manifests continue to apply during the upgrade window.
- MCPRouteBackendRef.forwardHeaders: New per-backend list of headers to forward, with optional rename. Replaces the need for a single route-wide header forwarding rule when backends expect different headers.
- MCPRouteOAuth.claimToHeaders: Configure which verified JWT claims should be projected into outbound headers to MCP backends.
- MCPToolFilter.exclude / excludeRegex: Tool selectors now support exclusion alongside inclusion, with both literal and regex forms.
- LLMRequestCostType.ReasoningToken: New cost type for thinking-token usage, complementing the existing input, output, and cache cost types.
- GatewayConfig.spec.globalLLMRequestCosts: Fleet-wide cost defaults that individual AIGatewayRoute.spec.llmRequestCosts entries can override.
- Preview: QuotaPolicy API (v1alpha1, no runtime enforcement yet): New CRD surface for declaring upstream-provider quota policies, laying the groundwork for quota-aware routing. Currently API-only: no controller reconciliation or enforcement is wired up. Track it for future releases; do not rely on it as a working feature today.
⚠️ Breaking Changes
- AIGatewayRoute.spec.filterConfig removed: The filterConfig field on AIGatewayRoute has been removed. Move external-processor configuration (resources, env vars, image overrides) to a GatewayConfig resource referenced from the Gateway via the aigateway.envoyproxy.io/gateway-config annotation. v0.5 deprecated the resources subfield with a pointer to GatewayConfig; v0.6 removes the entire filterConfig struct, so anything still set there must move now.
- VersionedAPISchema.version no longer acts as an endpoint prefix for OpenAI-schema backends: The legacy behavior deprecated in v0.5 (using the version field as a path prefix for OpenAI-schema backends) has been removed. Use the dedicated prefix field instead (e.g. prefix: /v1beta/openai for Gemini's OpenAI-compatible API, prefix: /compatibility/v1 for Cohere).
🐛 Bug Fixes
- Webhook cache race during extProc injection: Fixes a race where freshly applied AIGatewayRoute resources could miss extProc injection on first reconcile because the conversion webhook read from a stale cache. Scripted apply-then-curl tests should see fewer flakes.
- Field ownership preserved on updates: Controllers no longer claim ownership of fields they don't manage during updates. If you co-deploy the AI Gateway controller alongside other operators that touch adjacent fields (e.g. service mesh injectors, policy controllers), expect fewer reconcile churn loops on shared resources.
- Orphan cleanup for MCPRoute backendrefs: Resources tied to MCPRoute backend references are now cleaned up when the route or reference is removed, fixing a leak that could leave stale config in the cluster.
- Standalone Envoy startup failures surfaced by aigw: aigw now reports standalone Envoy startup failures cleanly instead of hanging or printing an unhelpful trace, making local dev and CI loops much faster to diagnose.
- Bedrock Titan embeddings dataplane route: Restored the Envoy route for Titan embeddings in dataplane tests so Titan workloads exercise the full pipeline.
- Hardened bearer token parsing: Malformed Authorization: Bearer headers used to panic the MCP subject extractor; they now return a clean auth failure and the request falls through to the standard auth failure path.
- Request context propagation in PostTranslateModify: Kubernetes client calls inside PostTranslateModify now honor request cancellation and deadlines, reducing stuck reconciles when the parent request is canceled.
- Case-sensitive JSON marshalling and unmarshalling: JSON encoding now consistently honors case, fixing subtle mismatches when round-tripping fields whose names differ only in case (visible previously as occasional 400s on certain provider payloads).
- Secret rotation propagates to MCPRoute: Updates to a Secret referenced by an MCPBackendRef are now reflected in the live configuration, matching how BackendSecurityPolicy already handled secret updates. Operators rotating MCP backend credentials no longer need to bounce the route.
- MCP proxy handles compressed Accept-Encoding from upstreams: The MCP proxy now correctly handles compressed Accept-Encoding values from upstream requests, fixing failures when MCP backends advertise gzip or other compression schemes.
- aigw standalone accepts IP addresses for endpoints: In standalone mode, aigw previously assumed endpoints were hostnames; you can now point OpenAI and MCP env config at IP addresses (e.g. 127.0.0.1), making local dev against loopback addresses work.
📖 Upgrade Guidance
Migrating from filterConfig to GatewayConfig
The filterConfig field on AIGatewayRoute has been removed in v0.6. If you previously configured the external processor (resources, environment variables, image overrides) via filterConfig on individual routes, move that configuration to a GatewayConfig resource and reference it from the Gateway.
Before (v0.5):
```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: my-route
spec:
  filterConfig:
    externalProcessor:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
```
After (v0.6):
```yaml
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: GatewayConfig
metadata:
  name: my-gateway-config
  namespace: default
spec:
  extProc:
    kubernetes:
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  annotations:
    aigateway.envoyproxy.io/gateway-config: my-gateway-config
```
Migrating VersionedAPISchema.version to prefix
The deprecated v0.5 behavior of using VersionedAPISchema.version as an endpoint path prefix for OpenAI-schema backends has been removed in v0.6. Use the dedicated prefix field instead.
Before (v0.5):
```yaml
schema:
  name: OpenAI
  version: /v1beta/openai  # legacy: version field overloaded as path prefix
```
After (v0.6):
```yaml
schema:
  name: OpenAI
  prefix: /v1beta/openai  # explicit prefix field
```
Adopting v1beta1 APIs
AIGatewayRoute, AIServiceBackend, BackendSecurityPolicy, GatewayConfig, and MCPRoute are now served at aigateway.envoyproxy.io/v1beta1. Existing v1alpha1 manifests continue to work via conversion, but new manifests should target v1beta1 directly:
```yaml
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: AIGatewayRoute
```
Switching GCP backends to Workload Identity
If you're running on GKE, drop static service-account keys and let the gateway pick up Application Default Credentials. Configure your BackendSecurityPolicy for GCP with the appropriate workload identity binding on the controller's service account; no serviceAccountJSON secret is required.
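A sketch of what the ADC-based policy could look like. The key point (leaving both credentialsFile and workloadIdentityFederationConfig unset) is from this release; the surrounding field names (type, gcpCredentials, projectName, region) follow the existing GCP credentials surface but should be checked against the v0.6 API reference, and the project/region values are hypothetical.

```yaml
# Sketch: BackendSecurityPolicy relying on Application Default Credentials.
apiVersion: aigateway.envoyproxy.io/v1beta1
kind: BackendSecurityPolicy
metadata:
  name: gcp-adc
spec:
  type: GCPCredentials
  gcpCredentials:
    projectName: my-project   # hypothetical project
    region: us-central1
  # no credentialsFile, no workloadIdentityFederationConfig -> ADC chain,
  # which resolves GKE Workload Identity automatically
```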
📦 Dependencies Versions
Updated to Go 1.26.2 to pick up the latest security and performance fixes.
Built on Envoy Gateway v1.7.0 for the newest data plane capabilities and stability fixes.
Leveraging Envoy Proxy v1.37.0 for the latest networking and security features.
Support for Gateway API v1.4.1 specifications.
Continued integration with Gateway API Inference Extension v1.0.2 for stable intelligent endpoint selection.
Updated to modelcontextprotocol/go-sdk v1.4.1 for the latest MCP protocol features and fixes.
⏩ Patch Releases
🙏 Acknowledgements
We extend our gratitude to all contributors who made this release possible. Special thanks to:
- The growing community of adopters for their valuable feedback and production insights
- Everyone who reported bugs, submitted PRs, and participated in design discussions
- The Envoy Gateway team for their continued collaboration
🔮 What's Next
We're already working on features for future releases:
- Quota-aware routing — building on the new backend quota policy API to route around rate-limited upstreams automatically
- Deeper MCP authorization — finer-grained policy across tools, resources, and prompts
- Expanded provider coverage — additional embeddings, audio, and image generation backends across cloud providers
- More efficient large-context handling — continued improvements to streaming, memory use, and tracing for long-context workloads