Envoy AI Gateway v0.2.x

Release version introducing Azure OpenAI integration, sidecar architecture, cross-backend failover, and a CLI tool.

v0.2.0

June 5, 2025
Azure OpenAI Integration, Sidecar Architecture, Performance Improvements, CLI Tools, Model Failover and Retry, Certificate Manager Integration
Envoy AI Gateway v0.2.0 builds on the foundation of v0.1.0 with a focus on expanding provider ecosystem support, improving reliability and performance through architectural changes, and adding enterprise-grade authentication support for Azure OpenAI.

✨ New Features

Azure OpenAI Integration

Full Azure OpenAI Support

Complete integration with Azure OpenAI services, with request/response transformation for the unified OpenAI-compatible completions API.

Upstream Authentication for Azure Enterprise Integration

Support for accessing Azure via OIDC tokens and Entra ID, providing secure and compliant enterprise-grade upstream authentication.

Enterprise Proxy URL Support for Azure Authentication

Enhanced Azure authentication with proxy URL configuration options for enterprise proxy support.

Flexible Token Providers

Generalized token provider architecture supporting both client secret and federated token flows.

Architecture Improvements

Sidecar + UDS External Processor

Switched to a sidecar deployment model with Unix Domain Sockets for improved performance and resource efficiency.

Enhanced ExtProc Buffer Limits

Increased external processor buffer limits from 32 KiB to 50 MiB for larger AI requests. Users can now configure CPU and memory resource limits via filterConfig.externalProcessor.resources for better resource management.
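
To give a sense of scale for the limit change above, the arithmetic can be checked in a few lines. This is only an illustrative sketch: the constant and function names are made up for this example and are not part of the gateway's configuration or API.

```python
# Illustrative arithmetic only; these names are hypothetical, not gateway settings.
KIB = 1024
MIB = 1024 * KIB

OLD_LIMIT_BYTES = 32 * KIB  # previous buffer limit: 32,768 bytes
NEW_LIMIT_BYTES = 50 * MIB  # new buffer limit: 52,428,800 bytes


def fits_in_buffer(body_size_bytes: int, limit_bytes: int) -> bool:
    """Return True if a request body of the given size fits within the limit."""
    return body_size_bytes <= limit_bytes


# How many times larger the new limit is than the old one.
growth_factor = NEW_LIMIT_BYTES // OLD_LIMIT_BYTES

# A 1 MiB request body (for example, a long prompt with embedded data).
one_mib_request = 1 * MIB
fits_before = fits_in_buffer(one_mib_request, OLD_LIMIT_BYTES)
fits_now = fits_in_buffer(one_mib_request, NEW_LIMIT_BYTES)
```

Under these numbers the new limit is 1600 times the old one, so a 1 MiB body that exceeded the old limit now fits comfortably.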

Multiple AIGatewayRoute Support

Support for multiple AIGatewayRoute resources per gateway, removing the previous single-route limitation. This enables better organization, scalability, and management of complex routing configurations across teams.

Certificate Manager Integration

Integrated cert-manager for automated TLS certificate provisioning and rotation for the mutating webhook server that injects AI Gateway sidecar containers into Envoy Gateway pods. This enables enterprise-grade certificate management, eliminating manual certificate handling and improving security.

Cross-Backend Failover and Retry

Provider Fallback Logic

Priority-based failover system that automatically routes traffic to lower-priority AI providers when higher-priority endpoints become unhealthy, ensuring high availability and fault tolerance.
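
The idea behind priority-based failover can be sketched in a few lines. This is a simplified illustration, not the gateway's implementation: the Backend class, the pick_backend function, and the provider names are hypothetical, and the sketch assumes a plain healthy/unhealthy flag, whereas a real proxy typically shifts traffic more gradually based on health checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical backend record used only for this sketch."""
    name: str
    priority: int  # 0 is the highest priority; larger numbers are fallbacks
    healthy: bool = True


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the healthy backend with the lowest priority number, or None."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda b: b.priority)


backends = [
    Backend("primary-provider", priority=0),
    Backend("fallback-provider", priority=1),
]

# While the primary is healthy, it receives the traffic.
first_choice = pick_backend(backends).name

# Once the primary is marked unhealthy, traffic moves to the fallback.
backends[0].healthy = False
after_failure = pick_backend(backends).name
```

Here first_choice is the primary provider and after_failure is the fallback, which is the behavior the feature description implies at its simplest.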

Backend Retry Support

Configurable retry policies for improved reliability and resilience against AI provider transient failures. Features include exponential backoff with jitter, configurable retry triggers (5xx errors, connection failures, rate limiting), customizable retry counts and timeouts, and integration with Envoy Gateway's BackendTrafficPolicy.
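
As a rough illustration of exponential backoff with jitter and status-based retry triggers, consider the sketch below. It is an assumption-laden example rather than the gateway's or Envoy's actual algorithm: the function names, the base and cap values, and the exact set of retriable status codes are all chosen for illustration, and it uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
from typing import Callable, List

# Illustrative set of retry triggers: rate limiting (429) and common 5xx errors.
RETRIABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(
    max_retries: int,
    base: float = 0.1,
    cap: float = 2.0,
    rng: Callable[[], float] = random.random,
) -> List[float]:
    """Compute one delay (in seconds) per retry attempt using full jitter.

    The ceiling for attempt i is min(cap, base * 2**i); the actual delay is a
    random fraction of that ceiling, so concurrent clients do not retry in sync.
    """
    return [rng() * min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def should_retry(status: int, attempt: int, max_retries: int) -> bool:
    """Retry only retriable statuses, and only while attempts remain."""
    return attempt < max_retries and status in RETRIABLE_STATUS


# Passing rng=lambda: 1.0 exposes the ceilings themselves, without randomness.
ceilings = backoff_delays(6, rng=lambda: 1.0)
```

With the default base and cap, the ceilings double from 0.1 seconds until they reach the 2-second cap. In a real deployment these knobs would be set through the retry policy rather than in code.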

Weight-Based Routing

Enhanced backend routing with weighted traffic distribution, enabling gradual rollouts, cost optimization, and A/B testing across multiple AI providers.
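
Weighted traffic distribution can be pictured as picking a point on a line whose segments are proportional to each backend's weight. The sketch below illustrates that idea only; pick_weighted and the provider names are hypothetical, and the gateway's actual load balancing is performed by Envoy rather than by code like this.

```python
import random
from typing import Callable, List, Tuple


def pick_weighted(
    backends: List[Tuple[str, int]],
    rng: Callable[[], float] = random.random,
) -> str:
    """Pick a backend name with probability proportional to its weight.

    backends is a list of (name, weight) pairs with non-negative weights.
    """
    if any(weight < 0 for _, weight in backends):
        raise ValueError("weights must be non-negative")
    total = sum(weight for _, weight in backends)
    if total <= 0:
        raise ValueError("at least one backend needs a positive weight")

    point = rng() * total  # a point in [0, total)
    cumulative = 0
    for name, weight in backends:
        cumulative += weight
        if point < cumulative:
            return name
    return backends[-1][0]  # defensive fallback for floating-point edge cases


# A gradual rollout: roughly 90% of requests to the current provider,
# 10% to the one being trialled.
rollout = [("current-provider", 90), ("new-provider", 10)]
sample = [pick_weighted(rollout) for _ in range(1000)]
```

Raising the second weight step by step shifts more traffic to the new provider, which is how weights support gradual rollouts and A/B tests.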

Enhanced CLI Tools

aigw run Command

New CLI command for local development and testing of Envoy AI Gateway resources.

Configuration Translation

aigw translate for translating Envoy AI Gateway Resources to Envoy Gateway and Kubernetes CRDs.

🔗 API Updates

  • AIGatewayRoute Metadata: Added ownedBy and createdAt fields for better resource tracking.
  • Backend Configuration: Moved Backend configuration back to RouteRule for improved flexibility.
  • OIDC Field Types: Specific typing for OIDC-related configuration fields.
  • Weight Type Changes: Updated Weight field type to match Gateway API specifications.

Deprecations

  • AIServiceBackend.Timeouts: Deprecated in favor of more granular timeout configuration.

🐛 Bug Fixes

  • ExtProc Image Syncing: Fixed issue where external processor image wouldn't sync properly.
  • Router Weight Validation: Fixed negative weight validation in routing logic.
  • Content Body Handling: Fixed empty content body issues causing AWS validation errors.
  • First Match Routing: Fixed router logic to ensure first match wins as expected.

⚠️ Breaking Changes

  • Sidecar Architecture: The switch to sidecar + UDS model may require configuration updates for existing deployments.
  • API Field Changes: Some API fields have been moved or renamed. Please review the migration guide for details.
  • Timeout Configuration: Deprecated timeout fields require migration to new configuration format.
  • Routing to Kubernetes Services: Routing to Kubernetes services is not supported in Envoy AI Gateway v0.2.0. This is a known limitation and will be addressed in a future release.

📖 Upgrade Guidance

For users upgrading from v0.1.x to v0.2.0:

  1. Review usage of any deprecated API fields (particularly AIServiceBackend.Timeouts).
  2. Update deployment configurations if using custom replica configurations - the replicas field in AIGatewayFilterConfigExternalProcessor is now deprecated due to the new sidecar architecture.
  3. Remove routing to Kubernetes services - currently, Envoy AI Gateway does not support routing to Kubernetes services. This is a known limitation and will be addressed in a future release.

📦 Dependencies Versions

Go 1.24.2

Updated to latest Go version for improved performance and security.

Envoy Gateway v1.4

Built on Envoy Gateway for proven data plane capabilities.

Envoy v1.34

Leveraging Envoy Proxy's battle-tested networking capabilities.

Gateway API v1.3

Support for latest Gateway API specifications.

⏩ Patch Releases

Future v0.2.x patch releases will be added here as they become available.

🙏 Acknowledgements

This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Google, and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.

We also greatly appreciate those who engage in conversations, provide feedback, and contribute to the project in ways other than code. Ideas, suggestions, and feedback are always welcome.

🔮 What's Next (beyond v0.2)

We're already working on exciting features:

  • Google Gemini & Vertex Integration
  • Anthropic Integration
  • Support for the Gateway API Inference Extension
  • Endpoint picker support for Pod routing
  • What else do you want to see? Get involved: open an issue and let us know!