Data Plane and Traffic Flow

The data plane handles the actual request traffic, with the External Processor (ExtProc) playing a central role in managing AI-specific processing.

Components

The data plane consists of several key components:

1. Envoy Proxy

The core proxy that handles all incoming traffic and integrates with:

  • External Processor for AI-specific processing
  • Rate Limit Service for token-based rate limiting
  • Various AI providers as backends
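
Envoy reaches the External Processor (and the Rate Limit Service) over gRPC. Below is a minimal, illustrative sketch of the service-side entry point for an external processor, assuming the go-control-plane generated bindings for envoy.service.ext_proc.v3; the listen address and the processor type are placeholders, not the project's actual code.

```go
package main

import (
	"log"
	"net"

	extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
	"google.golang.org/grpc"
)

// processor implements the ExternalProcessor service. Its Process
// method is sketched in the External Processor section below.
type processor struct {
	extprocv3.UnimplementedExternalProcessorServer
}

func main() {
	// Envoy's ext_proc filter is configured with this address as the
	// gRPC target for AI-specific processing.
	lis, err := net.Listen("tcp", "127.0.0.1:9002")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	extprocv3.RegisterExternalProcessorServer(srv, &processor{})
	log.Fatal(srv.Serve(lis))
}
```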

2. AI Gateway External Processor

A specialized extension service that Envoy Proxy calls out to for AI-specific processing. It performs three main functions (sketched in code after this list):

  1. Request Processing

    • Routes requests to appropriate AI providers
    • Handles model selection and validation
    • Manages provider-specific authentication
    • Supports different API formats (OpenAI, AWS Bedrock)
  2. Token Management

    • Tracks token usage from AI providers
    • Handles both streaming and non-streaming responses
    • Provides usage data for rate limiting decisions
  3. Provider Integration

    • Transforms requests between different AI provider formats
    • Normalizes responses to a consistent format
    • Manages provider-specific requirements
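
Continuing the sketch above, here is a heavily simplified version of the processing loop, again assuming the go-control-plane bindings. Envoy streams one ProcessingRequest per phase and the processor answers each one; the real implementation performs the routing, translation, authentication, and token accounting that the comments only gesture at.

```go
// Process implements the ExternalProcessor bidirectional stream.
// Envoy sends one ProcessingRequest per processing phase and waits for
// the matching ProcessingResponse before the traffic moves on.
func (p *processor) Process(stream extprocv3.ExternalProcessor_ProcessServer) error {
	for {
		req, err := stream.Recv()
		if err != nil {
			return err // io.EOF when Envoy closes the stream
		}

		var resp *extprocv3.ProcessingResponse
		switch req.Request.(type) {
		case *extprocv3.ProcessingRequest_RequestHeaders:
			// Pick a provider, rewrite the path, attach credentials (elided).
			resp = &extprocv3.ProcessingResponse{Response: &extprocv3.ProcessingResponse_RequestHeaders{
				RequestHeaders: &extprocv3.HeadersResponse{},
			}}
		case *extprocv3.ProcessingRequest_RequestBody:
			// Extract the model name and translate the body into the
			// selected provider's schema (elided).
			resp = &extprocv3.ProcessingResponse{Response: &extprocv3.ProcessingResponse_RequestBody{
				RequestBody: &extprocv3.BodyResponse{},
			}}
		case *extprocv3.ProcessingRequest_ResponseBody:
			// Normalize the provider response and record token usage in
			// dynamic metadata for rate limiting (elided).
			resp = &extprocv3.ProcessingResponse{Response: &extprocv3.ProcessingResponse_ResponseBody{
				ResponseBody: &extprocv3.BodyResponse{},
			}}
		default:
			// A real processor must answer every phase Envoy sends
			// (response headers, trailers, ...); elided here.
			resp = &extprocv3.ProcessingResponse{}
		}

		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}
```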

3. Rate Limit Service

Handles token-based rate limiting by:

  • Tracking token usage across requests
  • Enforcing rate limits based on token consumption
  • Managing rate limit budgets
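
To illustrate what a token-based budget means in practice, here is a small, self-contained sketch of a fixed-window token budget tracker. It is not the Rate Limit Service implementation; the type names, windowing policy, and per-client keying are assumptions made for the example.

```go
package ratelimit

import (
	"sync"
	"time"
)

// TokenBudget enforces a simple fixed-window token budget per client.
// It stands in for the global Rate Limit Service, which Envoy consults
// with descriptors derived from the request.
type TokenBudget struct {
	mu      sync.Mutex
	limit   uint64        // tokens allowed per window
	window  time.Duration // budget window, e.g. one minute
	used    map[string]uint64
	resetAt time.Time
}

func NewTokenBudget(limit uint64, window time.Duration) *TokenBudget {
	return &TokenBudget{
		limit:   limit,
		window:  window,
		used:    map[string]uint64{},
		resetAt: time.Now().Add(window),
	}
}

// Allow records tokens consumed by client and reports whether the
// client is still within its budget for the current window.
func (b *TokenBudget) Allow(client string, tokens uint64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if time.Now().After(b.resetAt) {
		b.used = map[string]uint64{} // new window: reset all counters
		b.resetAt = time.Now().Add(b.window)
	}
	b.used[client] += tokens
	return b.used[client] <= b.limit
}
```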

Request Processing Flow

The data plane processes requests through several key steps:

1. Request Path

  1. Routing: Determines the destination AI provider (see the routing sketch after this list) based on:

    • Request path
    • Headers
    • Model name extracted from the request body
  2. Request Transformation: Prepares the request for the provider:

    • Request body transformation
    • Request path modification
    • Format adaptation
  3. Upstream Authorization: Handles provider authentication:

    • API key management
    • Header modifications
    • Authentication token handling
  4. Token Rate Limiting Check: Consults the Rate Limit Service:

    • Validates token usage
    • Enforces rate limits based on configured budgets
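
The routing sketch below shows the core of step 1, assuming an OpenAI-style request body with a top-level model field; the model-to-backend mapping and function names are illustrative, since the real routes come from the gateway's configuration.

```go
package router

import (
	"encoding/json"
	"fmt"
)

// chatRequest captures only the field the router needs from an
// OpenAI-style request body.
type chatRequest struct {
	Model string `json:"model"`
}

// selectBackend picks an upstream backend from the model name in the
// request body. The mapping here is purely illustrative.
func selectBackend(body []byte) (string, error) {
	var req chatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("parse request body: %w", err)
	}
	switch req.Model {
	case "gpt-4o-mini":
		return "openai-backend", nil
	case "anthropic.claude-3-sonnet":
		return "aws-bedrock-backend", nil
	default:
		return "", fmt.Errorf("no route for model %q", req.Model)
	}
}
```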

2. Response Path

  1. Response Transformation:

    • Transforms the provider's response for client compatibility
    • Normalizes response format
    • Handles streaming responses
  2. Token Usage Management:

    • Extracts token usage from responses
    • Calculates usage based on configuration
    • Stores usage in per-request dynamic metadata
    • Enables rate limiting based on token consumption
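
For the non-streaming case, the sketch below shows how token usage might be parsed from an OpenAI-style response body and attached to the ext_proc response as dynamic metadata, where the rate limit configuration can reference it. The metadata namespace and key names are assumptions, the dynamic_metadata field is taken from the go-control-plane bindings, and aggregation across streaming chunks is omitted.

```go
package extproc

import (
	"encoding/json"

	extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"
	"google.golang.org/protobuf/types/known/structpb"
)

// chatCompletion mirrors only the usage object of an OpenAI-style
// chat completion response.
type chatCompletion struct {
	Usage struct {
		TotalTokens uint64 `json:"total_tokens"`
	} `json:"usage"`
}

// tokenUsageMetadata parses token usage from a non-streaming response
// body and returns a ProcessingResponse that carries it as dynamic
// metadata for the rate limit filter to read. Key names are illustrative.
func tokenUsageMetadata(body []byte) (*extprocv3.ProcessingResponse, error) {
	var c chatCompletion
	if err := json.Unmarshal(body, &c); err != nil {
		return nil, err
	}
	md, err := structpb.NewStruct(map[string]any{
		"token_usage": map[string]any{
			"total_tokens": c.Usage.TotalTokens,
		},
	})
	if err != nil {
		return nil, err
	}
	return &extprocv3.ProcessingResponse{
		Response: &extprocv3.ProcessingResponse_ResponseBody{
			ResponseBody: &extprocv3.BodyResponse{},
		},
		DynamicMetadata: md,
	}, nil
}
```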

Notable Rationale

  • Why the External Processor is separated into two phases (Router-level and Upstream-level):
    • In Envoy, retries and fallbacks happen after the router filter, at the upstream level. When an upstream server returns a 5xx, Envoy does not invoke the router-level filters again; it invokes only the upstream-level filters. In our case, a retry or fallback may send the request to an entirely different AI provider: the first attempt might go to OpenAI and the second to AWS Bedrock, each requiring different request transformations and upstream authorization. That logic therefore has to live in the upstream-level filter.
  • Why the External Processor?
    • The External Processor is the most powerful, battle-tested, and production-ready extension point in Envoy. It allows us to implement complex logic without modifying Envoy's core codebase.
    • Dynamic Modules could be a future alternative, offering better performance and less overall architectural complexity. The work is tracked in envoyproxy/ai-gateway#90.

Next Steps

To learn more: