Envoy AI Gateway v0.2.x
v0.2.0
✨ New Features
Azure OpenAI Integration
Complete integration with Azure OpenAI services, with request/response transformation for the unified OpenAI-compatible completions API.
Support for accessing Azure via OIDC tokens and Entra ID, providing secure, compliant, enterprise-grade upstream authentication.
Enhanced Azure authentication with proxy URL configuration options for enterprise proxy support.
Generalized token provider architecture supporting both client secret and federated token flows.
Architecture Improvements
Switched to a sidecar deployment model with Unix Domain Sockets for improved performance and resource efficiency.
Increased external processor buffer limits from 32 KiB to 50 MiB for larger AI requests. Users can now configure CPU and memory resource limits via filterConfig.externalProcessor.resources for better resource management.
AIGatewayRoute Support: Support for multiple AIGatewayRoute resources per gateway, removing the previous single-route limitation. This enables better organization, scalability, and management of complex routing configurations across teams.
Integrated cert-manager for automated TLS certificate provisioning and rotation for the mutating webhook server that injects AI Gateway sidecar containers into Envoy Gateway pods. This enables enterprise-grade certificate management, eliminating manual certificate handling and improving security.
Cross-Backend Failover and Retry
Priority-based failover system that automatically routes traffic to lower priority AI providers as higher priority endpoints become unhealthy, ensuring high availability and fault tolerance.
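The priority-based selection described above can be sketched in a few lines. This is a simplified illustration only, not the gateway's implementation: the `Backend` class, its field names, and the all-or-nothing tier selection are assumptions made for the example, and the real data plane may shift traffic between priorities more gradually.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str      # hypothetical backend label
    priority: int  # assumption: 0 is the highest priority
    healthy: bool


def pick_backends(backends: list[Backend]) -> list[Backend]:
    """Return the healthy backends in the highest-priority tier that has any."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return []
    best = min(b.priority for b in healthy)
    return [b for b in healthy if b.priority == best]


backends = [
    Backend("primary-provider", priority=0, healthy=False),
    Backend("fallback-provider", priority=1, healthy=True),
]
selected = [b.name for b in pick_backends(backends)]
print(selected)
```

With the primary marked unhealthy, only the lower-priority fallback is selected; once the primary recovers, it takes the traffic again.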
Configurable retry policies for improved reliability and resilience against AI provider transient failures. Features include exponential backoff with jitter, configurable retry triggers (5xx errors, connection failures, rate limiting), customizable retry counts and timeouts, and integration with Envoy Gateway's BackendTrafficPolicy.
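To make "exponential backoff with jitter" concrete, here is a minimal sketch of one common variant (full jitter). The function names, the base and cap values, and the set of retryable status codes are assumptions chosen for illustration; the actual behavior is governed by the retry settings in BackendTrafficPolicy.

```python
import random

# Assumed trigger set for illustration: rate limiting plus common 5xx errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(base: float, cap: float, attempts: int, rng: random.Random) -> list[float]:
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)]."""
    return [rng.uniform(0.0, min(cap, base * (2 ** n))) for n in range(attempts)]


def should_retry(status: int, attempt: int, max_retries: int) -> bool:
    """Retry only while attempts remain and the status is a configured trigger."""
    return attempt < max_retries and status in RETRYABLE_STATUS


delays = backoff_delays(base=0.1, cap=2.0, attempts=6, rng=random.Random(0))
for n, delay in enumerate(delays):
    print(f"attempt {n}: wait up to {min(2.0, 0.1 * 2 ** n):.1f}s, drew {delay:.3f}s")
```

The upper bound doubles on each attempt until it reaches the cap, while the random draw spreads retries out so that many clients do not hit a recovering provider at the same instant.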
Enhanced backend routing with weighted traffic distribution, enabling gradual rollouts, cost optimization, and A/B testing across multiple AI providers.
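As a rough illustration of weighted traffic distribution, the sketch below picks a backend in proportion to its weight and rejects negative weights. The backend names and the 90/10 split are hypothetical, and the gateway's own selection logic may differ.

```python
import random
from collections import Counter


def pick_weighted(weights: dict[str, int], rng: random.Random) -> str:
    """Pick one backend name with probability proportional to its weight."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical gradual rollout: send roughly 10% of requests to a new provider.
weights = {"provider-a": 90, "provider-b": 10}
rng = random.Random(42)
counts = Counter(pick_weighted(weights, rng) for _ in range(10_000))
print(counts)
```

Raising the weight of the new provider step by step shifts traffic gradually without changing the route match itself.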
Enhanced CLI Tools
aigw run Command: New CLI command for local development and testing of Envoy AI Gateway resources.
aigw translate: CLI command for translating Envoy AI Gateway resources to Envoy Gateway and Kubernetes CRDs.
🔗 API Updates
- AIGatewayRoute Metadata: Added ownedBy and createdAt fields for better resource tracking.
- Backend Configuration: Moved Backend configuration back to RouteRule for improved flexibility.
- OIDC Field Types: Added specific typing for OIDC-related configuration fields.
- Weight Type Changes: Updated Weight field type to match Gateway API specifications.
Deprecations
AIServiceBackend.Timeouts: Deprecated in favor of more granular timeout configuration.
🐛 Bug Fixes
- ExtProc Image Syncing: Fixed issue where external processor image wouldn't sync properly.
- Router Weight Validation: Fixed negative weight validation in routing logic.
- Content Body Handling: Fixed empty content body issues causing AWS validation errors.
- First Match Routing: Fixed router logic to ensure first match wins as expected.
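The "first match wins" behavior mentioned in the last fix can be pictured as an ordered scan over rules. This is a minimal sketch under assumed names (`Rule`, `route`, and matching on a single model value are all hypothetical), not the router's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    model: str    # hypothetical match value, e.g. a requested model name
    backend: str  # hypothetical backend the rule routes to


def route(rules: list[Rule], model: str) -> Optional[str]:
    """Return the backend of the first rule that matches, or None if none do."""
    for rule in rules:
        if rule.model == model:
            return rule.backend  # stop at the first match; later rules are ignored
    return None


rules = [
    Rule(model="model-x", backend="backend-one"),
    Rule(model="model-x", backend="backend-two"),  # shadowed by the rule above
    Rule(model="model-y", backend="backend-three"),
]
print(route(rules, "model-x"))
```

Because evaluation stops at the first hit, rule order matters: a broader rule placed earlier shadows any later rule that matches the same request.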
⚠️ Breaking Changes
- Sidecar Architecture: The switch to sidecar + UDS model may require configuration updates for existing deployments.
- API Field Changes: Some API fields have been moved or renamed. Please review the migration guide for details.
- Timeout Configuration: Deprecated timeout fields require migration to new configuration format.
- Routing to Kubernetes Services: Routing to Kubernetes services is not supported in Envoy AI Gateway v0.2.0. This is a known limitation and will be addressed in a future release.
📖 Upgrade Guidance
For users upgrading from v0.1.x to v0.2.0:
- Review usage of any deprecated API fields (particularly AIServiceBackend.Timeouts).
- Update deployment configurations if using custom replica configurations: the replicas field in AIGatewayFilterConfigExternalProcessor is now deprecated due to the new sidecar architecture.
- Remove routing to Kubernetes services: Envoy AI Gateway currently does not support routing to Kubernetes services. This is a known limitation and will be addressed in a future release.
📦 Dependencies Versions
Updated to latest Go version for improved performance and security.
Built on Envoy Gateway for proven data plane capabilities.
Leveraging Envoy Proxy's battle-tested networking capabilities.
Support for latest Gateway API specifications.
⏩ Patch Releases
Future v0.2.x patch releases will be added here as they become available.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Google, and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
We also greatly appreciate everyone who engages in conversations, provides feedback, and contributes to the project in ways other than code. Ideas, suggestions, and feedback are always welcome.
🔮 What's Next (beyond v0.2)
We're already working on exciting features:
- Google Gemini & Vertex Integration
- Anthropic Integration
- Support for the Gateway API Inference Extension
- Endpoint picker support for Pod routing
- What else do you want to see? Get involved: open an issue and let us know!