[Proposal] Streaming Policy Architecture for Policy Engine #706
-
Hi Renuka, overall this looks good. +1 for option 2. My only concern is the user experience: as a user, I feel we should show relevant policies from the list based on the initial API creation. An API can be a REST or a streaming API, and based on that we should show the relevant policies rather than waiting until deployment time. Thanks!
-
1. Problems with existing Policy Implementation
1.1 — The `Policy` interface
Every current policy exposes:
```go
// v1
type Policy interface {
    Mode() ProcessingMode // called once at startup
    OnRequest(ctx, params) RequestAction
    OnResponse(ctx, params) ResponseAction
}
```
The HTTP request-response lifecycle has distinct phases, and what can physically be mutated at each phase is different:
The v1 interface returns the same broad action type for every phase.
1.2 — Response body mode
| Chain composition | Envoy response body mode |
|---|---|
| ALL response-body policies implement `StreamingResponseBodyPolicy` | `FULL_DUPLEX_STREAMED` |
| ANY response-body policy implements only `ResponseBodyPolicy` | `BUFFERED` (forced) |
**Note:** Since we enforce `OnRequestBody` and `OnResponseBody` to be implemented for streaming policies as well, all policies are compatible with each other and their methods can be executed without issue.
2.2 — Return Action types mirror constraints
Each phase returns a distinct action type. The type system encodes what action is possible at that phase — policy authors cannot attempt mutations Envoy cannot honour.
| Phase | Action type | Capabilities |
|---|---|---|
| Request headers | `HeaderAction` | Header mutations, ImmediateResponse |
| Request body (buffered) | `RequestBodyAction` | Body + header + routing mutations, ImmediateResponse |
| Response headers | `HeaderAction` | Header mutations, ImmediateResponse |
| Response body (buffered) | `ResponseBodyAction` | Body + status mutations, ImmediateResponse |
| Request chunk (streaming) | `RequestChunkAction` | Chunk mutation only |
| Response chunk (streaming) | `ResponseChunkAction` | Chunk mutation only |
Key constraint: `ImmediateResponse` is absent from both streaming chunk action types. Once response headers are committed to the downstream client, injecting a new response mid-stream is physically impossible. Encoding this in the type system prevents an entire class of incorrect policy implementations.
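A minimal Go sketch of how phase-specific action types can encode these constraints. All struct names and fields here are illustrative assumptions, not the engine's actual API:

```go
package main

// ImmediateResponse short-circuits the exchange with a gateway-generated reply.
type ImmediateResponse struct {
	StatusCode int
	Body       []byte
}

// HeaderAction is returned from header phases: headers are not yet committed,
// so an ImmediateResponse is still possible.
type HeaderAction struct {
	SetHeaders    map[string]string
	RemoveHeaders []string
	Immediate     *ImmediateResponse
}

// ResponseChunkAction is returned per streaming chunk. It deliberately has
// no Immediate field: once response headers are on the wire, a new response
// cannot be injected, and the compiler enforces that.
type ResponseChunkAction struct {
	ReplaceChunk []byte
}
```

A policy that tries to short-circuit from a chunk hook simply has no field to put the response in, turning a runtime bug into a compile-time impossibility.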
3. Streaming Detection & Mode Override Strategy
This is an operationally critical part of the Policy Engine. The wrong decision at the wrong time causes either unnecessary latency (buffering a streaming response) or broken Content-Length headers (streaming a buffered response).
3.1 — The point of no return
When Envoy processes response headers, it decides whether to send Content-Length or Transfer-Encoding: chunked to the downstream client before the ext_proc response for ResponseHeaders is processed. This means:
- `BUFFERED → FULL_DUPLEX_STREAMED` upgrade at `ResponseHeaders` processing: ✅ works
- `FULL_DUPLEX_STREAMED → BUFFERED` downgrade at `ResponseHeaders` processing: ❌ broken — Envoy has already decided to use chunked encoding for the client, stripping the Content-Length header. This will cause issues for clients expecting a full JSON response with a Content-Length header.
3.2 — The strategy: default BUFFERED, upgrade only when streaming is confirmed
Processing mode can be set at the Envoy listener level and per route. The default mode sets both `request_body_mode` and `response_body_mode` to `BUFFERED`:
```yaml
typed_per_filter_config:
  envoy.filters.http.ext_proc:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExtProcPerRoute
    overrides:
      processing_mode:
        request_header_mode: SEND
        response_header_mode: SEND
        request_body_mode: BUFFERED
        response_body_mode: BUFFERED
        request_trailer_mode: SKIP
        response_trailer_mode: SKIP
```
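Under this strategy the kernel only ever performs the safe upgrade; a minimal sketch (function and type names are hypothetical) of the decision made when response headers arrive:

```go
package main

import "strings"

type BodyMode int

const (
	Buffered BodyMode = iota
	FullDuplexStreamed
)

// responseBodyMode sketches the upgrade-only decision taken at ResponseHeaders:
// stay BUFFERED unless the upstream response is actually streaming AND every
// policy in the chain can process chunks.
func responseBodyMode(headers map[string]string, chainStreams bool) BodyMode {
	te := strings.ToLower(headers["transfer-encoding"])
	ct := strings.ToLower(headers["content-type"])
	isStreaming := strings.Contains(te, "chunked") || strings.HasPrefix(ct, "text/event-stream")
	if isStreaming && chainStreams {
		return FullDuplexStreamed // safe upgrade
	}
	return Buffered // preserve Content-Length for plain responses
}
```

Because the default is `BUFFERED`, the only transition this function can make is the safe upgrade; the broken downgrade path can never occur.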
```mermaid
sequenceDiagram
    participant E as Envoy
    participant K as Kernel
    participant P as Policy Chain
    E->>K: RequestHeaders
    K->>P: OnRequestHeaders()
    K-->>E: ProcessingResponse\n[ResponseBodyMode = BUFFERED] ← safe default
    E->>K: RequestBody
    K->>P: OnRequestBody()
    K-->>E: ProcessingResponse
    Note over E,K: Upstream response arrives
    E->>K: ResponseHeaders
    K->>K: inspect Transfer-Encoding, Content-Type
    alt isChunked OR isSSE
        K->>K: isStreaming = true
        alt chain.StreamResponseBody = true
            K-->>E: ModeOverride = FULL_DUPLEX_STREAMED ✅
        else chain.StreamResponseBody = false
            K-->>E: ModeOverride = BUFFERED\n(chain needs full body)
        end
    else non-streaming
        K-->>E: ModeOverride = BUFFERED\n(preserve Content-Length)
    end
    E->>K: ResponseBody chunk(s)
    K->>P: OnResponseBodyChunk() or OnResponseBody()
    K-->>E: ProcessingResponse
```
3.3 — Response body routing matrix
```mermaid
flowchart TD
    A[ResponseHeaders received] --> B{isStreaming?}
    B -->|No| C[BUFFERED mode<br/>Content-Length preserved]
    B -->|Yes| D{chain.StreamResponseBody?}
    D -->|No — any policy buffered-only| E[BUFFERED mode<br/>Envoy assembles all SSE chunks]
    D -->|Yes — all policies streaming| F[FULL_DUPLEX_STREAMED mode<br/>chunks arrive per event]
    C --> G[OnResponseBody<br/>Content: complete plain JSON<br/>ImmediateResponse available ✅]
    E --> H[OnResponseBody<br/>Content: aggregated SSE string<br/>ImmediateResponse available ✅]
    F --> I[OnResponseBodyChunk per flush<br/>Content: SSE event or JSON chunk<br/>No ImmediateResponse ❌]
```
4. ChunkBuffering — Policies Control Their Own Flush Boundary
4.1 — The interface
```go
type ChunkBuffering interface {
    NeedsMoreData(accumulated []byte) bool
}
```
EOS handling is the policy engine's responsibility. When the stream ends, the kernel flushes unconditionally and never calls `NeedsMoreData`.
4.2 — Accumulation flow
```mermaid
flowchart TD
    A[Envoy delivers chunk] --> B[accumBuf += chunk]
    B --> C{eos?}
    C -->|Yes| G
    C -->|No| D{any policy<br/>NeedsMoreData<br/>accumBuf?}
    D -->|Yes| E[HOLD<br/>Send empty ack to Envoy<br/>Client receives nothing yet]
    E --> A
    D -->|No| G[FLUSH<br/>Run chain on accumBuf<br/>Send mutated result downstream]
    G --> H[Reset accumBuf]
```
Hold = suppress, not echo. During the hold phase, chunks are held — not forwarded downstream. For mutating policies like PII masking this is essential: you never want partially-unmasked content reaching the client.
4.3 — Built-in strategies (no custom logic needed)
We can provide utility functions for common use cases, such as waiting for a minimum context window or waiting for a specific delimiter. Policy authors can call these from their `NeedsMoreData` implementation to ease implementation.
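A sketch of what such helpers could look like; the constructor names are assumptions, not an existing API:

```go
package main

import "bytes"

// MinBytes holds the flush until at least n bytes have accumulated
// (a minimum context window).
func MinBytes(n int) func(accumulated []byte) bool {
	return func(acc []byte) bool { return len(acc) < n }
}

// UntilDelimiter holds the flush until the delimiter appears, e.g.
// []byte("\n\n") for SSE event boundaries.
func UntilDelimiter(delim []byte) func(accumulated []byte) bool {
	return func(acc []byte) bool { return !bytes.Contains(acc, delim) }
}
```

With helpers like these, a policy's `NeedsMoreData` body can be a single delegation, e.g. `return UntilDelimiter([]byte("\n\n"))(accumulated)`.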
5. The Response JSONPath Problem — Detailed Breakdown
5.1 — Why one path cannot work for both response shapes
```mermaid
sequenceDiagram
    participant C as Client
    participant G as Gateway (ext_proc)
    participant L as LLM (OpenAI)
    rect rgb(220, 240, 255)
        Note over C,L: Non-streaming (stream: false)
        C->>G: POST /chat/completions\n{"stream": false}
        G->>L: forwarded request
        L-->>G: HTTP 200 Content-Length: 842\n{"choices":[{"message":{"content":"..."}}]}
        Note over G: Plain JSON\nPath: choices[0].message.content ✅
        G-->>C: response
    end
    rect rgb(255, 235, 220)
        Note over C,L: Streaming (stream: true)
        C->>G: POST /chat/completions\n{"stream": true}
        G->>L: forwarded request
        L-->>G: HTTP 200 Transfer-Encoding: chunked\nContent-Type: text/event-stream
        L-->>G: data: {"choices":[{"delta":{"content":"Hello"}}]}\n\n
        L-->>G: data: {"choices":[{"delta":{"content":" world"}}]}\n\n
        L-->>G: data: [DONE]\n\n
        Note over G: Aggregated SSE (if BUFFERED chain)\nPath: choices[0].delta.content ✅\nbut body is NOT valid JSON ❌
        G-->>C: response
    end
```
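The two response shapes require different extraction logic. A hedged Go sketch (helper names are hypothetical; for brevity it decodes a fixed field rather than evaluating an arbitrary JSONPath):

```go
package main

import (
	"encoding/json"
	"strings"
)

// chatResponse covers both shapes: message.content for plain JSON,
// delta.content for SSE events.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// contentFromJSON handles stream:false, where the body is one complete
// JSON document.
func contentFromJSON(body []byte) string {
	var r chatResponse
	if json.Unmarshal(body, &r) != nil || len(r.Choices) == 0 {
		return ""
	}
	return r.Choices[0].Message.Content
}

// contentFromSSE handles stream:true, where the body is a sequence of
// "data: ..." events and is NOT itself valid JSON.
func contentFromSSE(body string) string {
	var sb strings.Builder
	for _, line := range strings.Split(body, "\n") {
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			continue
		}
		var r chatResponse
		if json.Unmarshal([]byte(payload), &r) == nil && len(r.Choices) > 0 {
			sb.WriteString(r.Choices[0].Delta.Content)
		}
	}
	return sb.String()
}
```

Feeding the aggregated SSE body through `contentFromJSON` would simply fail to parse, which is exactly why a single `jsonPath` cannot serve both shapes.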
5.2 — The fix: two explicit path configs in Policy Definition
```yaml
# v1 — one jsonPath, broken for SSE
response:
  jsonPath: "$.choices[0].message.content"   # fails on SSE body, wrong path anyway

# v2 — format-aware, two separate paths
response:
  min: 10
  max: 1000
  jsonPath: "$.choices[0].message.content"           # plain JSON (stream: false)
  streamingJsonPath: "$.choices[0].delta.content"    # per SSE event (stream: true)
```
6. Summary
For policy authors
- Implement only the sub-interfaces for the phases you care about — unused phases cost nothing at runtime
- The action type hierarchy prevents attempting mutations that are physically impossible at a given phase
- `ChunkBuffering` with utility functions means accumulation logic is one line per policy
- All parameter parsing goes in `GetPolicy` — hook methods are pure, allocation-free, and fast
For operators
- One `policy-definition.yaml` per policy, validated by the control plane before the policy code is ever invoked
- `response.jsonPath` (plain JSON) and `response.streamingJsonPath` (per SSE event) explicitly address both LLM response shapes
- Both path fields default to empty with auto-extraction — zero config required for standard OpenAI usage
For the runtime
- Streaming mode is decided at `ResponseHeaders` — after the upstream response type is known — preventing `Content-Length` stripping on non-streaming responses
- Chunk suppression during the hold phase means no partially-processed data (e.g. unmasked PII) ever reaches the client
- Mixed chains (streaming + buffered policies) degrade gracefully to `BUFFERED`, with all policies receiving the full body through their correct hook
Sample implementation:
- Policy Interface: https://github.com/Thushani-Jayasekera/envoy-ext-proc-body-streaming/blob/feature/body-streaming-sse/policy-v2/policy/interface.go
- Actions: https://github.com/Thushani-Jayasekera/envoy-ext-proc-body-streaming/blob/feature/body-streaming-sse/policy-v2/policy/action.go
- Word count guardrail policy - https://github.com/Thushani-Jayasekera/envoy-ext-proc-body-streaming/tree/feature/body-streaming-sse/policy-v2/policies/word-count-guardrail
-
Design Discussion: Policy Interface Options Considered and Rejected
These are the interface designs we evaluated, and the problem we were solving:
Option 1 — Unified Body Method with Phase Flags in Context
Idea
Collapse all processing into two methods. The same method handles header decisions, streaming chunks, and buffered bodies. The phase is communicated through fields on the context object:
```go
type Policy interface {
    OnRequestBody(ctx *RequestBodyContext, params map[string]interface{}) RequestAction
    // ↑ ctx.IsHeader / ctx.IsStreamingBody / ctx.IsBufferingBody
    // ↑ ctx.Headers always available
    OnResponseBody(ctx *ResponseBodyContext, params map[string]interface{}) ResponseAction
    // ↑ ctx.RequestHeaders + ctx.ResponseHeaders always available
    // ↑ ctx.IsHeader / ctx.IsStreamingBody / ctx.IsBufferingBody
}
```
The policy author switches on the phase flag:
```go
func (p *MyPolicy) OnResponseBody(ctx *ResponseBodyContext, ...) ResponseAction {
    if ctx.IsHeader {
        // header-only decision
    } else if ctx.IsStreamingBody {
        // process one chunk
    } else {
        // process full buffered body
    }
}
```
Pros
Cons and Reason for Rejection
The type system cannot enforce phase-specific constraints. The phase check is an invisible runtime contract. A "header-only" policy is indistinguishable from a "body" policy.
Decision: Rejected.
Option 2 — Single Interface, All Six Methods (Monolithic)
Idea
Separate the methods per phase explicitly, keeping all six methods on a single interface:
```go
type Policy interface {
    OnRequestHeaders(ctx *RequestHeaderContext, params map[string]interface{}) HeaderAction
    OnResponseHeaders(ctx *ResponseHeaderContext, params map[string]interface{}) HeaderAction
    OnRequestBody(ctx *RequestBodyContext, params map[string]interface{}) RequestBodyAction
    // full buffered request body; ctx.Headers available
    OnRequestBodyChunk(ctx *RequestStreamContext, chunk *Body, params map[string]interface{}) RequestBodyChunkAction
    // called once per streaming request chunk
    OnResponseBody(ctx *ResponseBodyContext, params map[string]interface{}) ResponseAction
    // full buffered response body; ctx.RequestHeaders + ctx.ResponseHeaders available
    OnResponseBodyChunk(ctx *ResponseStreamContext, chunk *Body, params map[string]interface{}) ResponseBodyChunkAction
    // called once per streaming response chunk
}
```
Pros
Cons and Reason for Rejection
Every policy must implement all six methods. The kernel cannot skip phases it does not need, and a separate method would still be required to identify each policy's mode and capabilities.
Decision: Rejected.
Option 3 — Sub-Interfaces for Header and Body Processing, Same Action Types for Streaming and Buffered Bodies
Idea
Split the four methods across four separate interfaces. A policy implements only the sub-interfaces for the phases it cares about. The kernel inspects the chain at startup to discover capabilities. However, the `ResponseBodyPolicy` sub-interface's `OnResponseBody` method handles both streaming and non-streaming responses, so the same `ResponseBodyAction` type serves both:
```go
type RequestHeaderPolicy interface {
    OnRequestHeaders(ctx *RequestHeaderContext, params map[string]interface{}) HeaderAction
}
type ResponseHeaderPolicy interface {
    OnResponseHeaders(ctx *ResponseHeaderContext, params map[string]interface{}) HeaderAction
}
type RequestBodyPolicy interface {
    OnRequestBody(ctx *RequestBodyContext, params map[string]interface{}) RequestBodyAction
}
type ResponseBodyPolicy interface {
    OnResponseBody(ctx *ResponseBodyContext, params map[string]interface{}) ResponseBodyAction
}
```
Pros
Cons and Reason for Rejection
The action types are still too broad.
Decision: Rejected. Close, but the action type problem makes this insufficient.

-
1. Summary
This RFC proposes adding streaming policy support to the Policy Engine, enabling chunk-by-chunk processing of LLM responses. This allows AI guardrail policies (PII masking, content moderation, etc.) to process streaming responses from LLM providers while preserving the real-time streaming experience for end users.
2. Motivation
2.1. Primary Use Case: Real-Time LLM Streaming Responses
LLM providers (OpenAI, Azure OpenAI, Anthropic, etc.) return responses via SSE (Server-Sent Events) or HTTP chunked transfer encoding. The LLM generates tokens incrementally, and these tokens are streamed to the client in real-time.
Without streaming policy support:
With streaming policy support:
The streaming experience is critical for AI applications:
2.2. Secondary Considerations
3. Current State
The existing `Policy` interface assumes buffered body processing. When `BodyModeBuffer` is set, the kernel waits for the complete body before invoking `OnRequest` or `OnResponse`.
4. Proposed Design Options
This section presents two interface design options. Option 2 (Independent Composable Interfaces) is recommended.
4.1. Option 1: Split by Buffered vs Streaming (with Mode function)
This option uses four interfaces split by processing mode, each with a
`Mode()` function.
4.1.1. Option 1A: Four Core Interfaces
Drawbacks:
- A `Mode()` function (redundant with interface choice)
4.1.2. Option 1B: With Separate HeaderOnlyPolicy Interface
Add dedicated interfaces for header-only processing:
Drawbacks:
- Still requires a `Mode()` function for body policies
- Two ways to handle headers (separate `OnRequestHeaders` vs embedded in streaming)
4.2. Option 2: Independent Composable Interfaces (Recommended)
Each interface is independent - no embedding, no
`Mode()` function. Policies implement whichever combination they need.
4.3. Interface IS the Mode Declaration
No
`Mode()` function is needed. The kernel determines processing mode by checking which interfaces the policy implements:
- `RequestHeaderPolicy` only
- `RequestBodyPolicy` only
- `RequestHeaderPolicy` + `RequestBodyPolicy`
- `RequestHeaderPolicy` + `StreamingRequestBodyPolicy`
- `RequestBodyPolicy` + `StreamingRequestBodyPolicy`
4.4. Policy Composition Examples
- `RequestHeaderPolicy`
- `ResponseHeaderPolicy`
- `RequestHeaderPolicy` + `RequestBodyPolicy`
- `ResponseHeaderPolicy` + `ResponseBodyPolicy`
- `RequestBodyPolicy`
- `ResponseBodyPolicy` + `StreamingResponseBodyPolicy`
- `RequestHeaderPolicy` + `StreamingRequestBodyPolicy`
- `StreamingResponseBodyPolicy`
4.5. Option 2 Benefits
- No `Mode()` function - the interface IS the declaration
4.6. Comparison of Options
- Options 1A/1B: ambiguous method naming (`OnRequest` vs `OnRequestHeaders`)
- Option 2: consistent naming (`On` + `Request`/`Response` + `Headers`/`Body`/`BodyChunk`)
Why Option 2 is recommended:
- No `Mode()` function - the interface IS the declaration
5. Sample Policy Implementations
5.1. ModifyHeaders Policy
Use Case: Add, remove, or modify HTTP headers without touching the body.
Interfaces:
`RequestHeaderPolicy` + `ResponseHeaderPolicy` (header-only, works in any route)
Note
Clean header-only implementation! With the composable interface design, this policy only implements 2 interfaces and 2 methods. No body-related boilerplate needed.
5.2. JsonToXml Policy
Use Case: Transform JSON request/response bodies to XML format.
Interfaces:
`RequestHeaderPolicy` + `RequestBodyPolicy` + `ResponseHeaderPolicy` + `ResponseBodyPolicy` (buffered only)
Why buffered only? JSON parsing requires the complete document structure. You cannot transform `{"name": "Jo` to XML until you have the complete JSON object.
Buffering Behavior: When this policy is in a route, the Router (Envoy) buffers the body. If the payload exceeds the configured limit (e.g., 10MB), the Router returns HTTP 413 Payload Too Large.
5.3. PII Masking Policy (AI Guardrail)
Use Case: Scan request prompts and LLM responses for PII (emails, phone numbers, SSN, etc.) and mask them before they reach the client.
Interfaces:
- `RequestBodyPolicy` (buffered) - needs the complete prompt to validate
- `ResponseBodyPolicy` + `StreamingResponseBodyPolicy` (dual-support)
Why streaming matters here: When a user asks an LLM "What's John's email?", the LLM might respond with "John's email is [email protected]". With streaming:
Why this combination?
- Implements both `ResponseBodyPolicy` and `StreamingResponseBodyPolicy` to work in any route configuration
Dual-support benefit: When paired with JsonToXml (buffered-only), the policy uses `OnResponseBody`. When in a streaming route, it uses `OnResponseBodyChunk`.
5.4. Virus Scan Policy
Use Case: Scan uploaded files for malware using streaming virus scanner (e.g., ClamAV). This is a non-AI use case where streaming is required due to large file sizes.
Interfaces:
`RequestHeaderPolicy` + `StreamingRequestBodyPolicy` + `StreamingResponseBodyPolicy` (streaming only)
Note: This policy differs from AI guardrail policies. AI policies stream because LLMs produce streaming responses (SSE/chunked). Virus scan streams because files can be very large (100MB+) and cannot be buffered in memory.
Why streaming only? Virus scanning large files (100MB+) cannot buffer the entire file in memory. The scanner processes chunks as they arrive, maintaining internal state, and delivers verdict on the final chunk.
6. Route Compatibility Matrix
6.1. Compatibility Rules
- If any policy implements only `RequestBodyPolicy` (not `StreamingRequestBodyPolicy`), the request path uses buffered mode
- If all body-processing policies implement `StreamingRequestBodyPolicy`, the request path uses streaming mode
- Header-only policies (`RequestHeaderPolicy`) are compatible with both modes
The same rules apply to the response path with corresponding response interfaces.
6.1.1. Mode Determination Priority
When determining the processing mode for a route, the following priority rules apply:
1. If any policy is streaming-only (implements only `StreamingRequestBodyPolicy`, not `RequestBodyPolicy`): the route uses streaming mode. If another policy is buffered-only, the combination is incompatible.
2. If any policy is buffered-only (implements only `RequestBodyPolicy`, not `StreamingRequestBodyPolicy`): the route uses buffered mode. If another policy is streaming-only, the combination is incompatible.
3. If all body-processing policies are dual-support (implement both `RequestBodyPolicy` AND `StreamingRequestBodyPolicy`): the route defaults to streaming mode (preferred for real-time UX, especially in AI Gateway scenarios).
4. Header-only policies do not affect mode determination.
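The priority rules above can be sketched as a small Go function. The `caps` summary and `routeMode` name are illustrative assumptions; a real kernel would derive the flags from type assertions on the chain:

```go
package main

// caps summarizes a policy by the body interfaces it implements:
// Buffered for RequestBodyPolicy, Streaming for StreamingRequestBodyPolicy.
type caps struct {
	Buffered  bool
	Streaming bool
}

// routeMode applies the priority rules to the request path.
// ok=false means the combination is incompatible.
func routeMode(policies []caps) (mode string, ok bool) {
	streamingOnly, bufferedOnly := false, false
	for _, p := range policies {
		if !p.Buffered && !p.Streaming {
			continue // header-only: does not affect mode
		}
		streamingOnly = streamingOnly || (p.Streaming && !p.Buffered)
		bufferedOnly = bufferedOnly || (p.Buffered && !p.Streaming)
	}
	switch {
	case streamingOnly && bufferedOnly:
		return "", false // e.g. VirusScan (streaming-only) + JsonToXml (buffered-only)
	case bufferedOnly:
		return "buffered", true
	case streamingOnly:
		return "streaming", true
	default:
		return "streaming", true // all dual-support: prefer real-time UX
	}
}
```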
6.2. Example Combinations
Note
Why not allow VirusScan in buffered mode when payload is small?
Processing mode is determined at configuration time based on interface implementations, not at runtime based on payload size. When PIIMasking requires buffered mode, the kernel calls
`OnRequestBody()` - but VirusScan only implements `OnRequestBodyChunk()`. The kernel has no method to call.
If the VirusScan policy author wants compatibility with buffered routes, they can implement both `RequestBodyPolicy` AND `StreamingRequestBodyPolicy` (dual-support pattern). The incompatibility exists because the policy author chose streaming-only - the Gateway Controller cannot make a streaming-only policy work in buffered mode.
7. Policy Compatibility Validation
7.1. The Problem
Policy compatibility must be validated at design time, not deployment time:
7.2. Architecture: Policy Definition as Source of Truth
The Policy Definition (with its
`supports` metadata) is the source of truth for compatibility validation. Policy Hub is where built-in policies reside, but users can also add custom policies with their own Policy Definitions.
```mermaid
flowchart TB
    PH[Policy Hub<br/>built-in policies]
    CP[Custom Policies<br/>user-defined]
    PH --> PD[Policy Definition<br/>supports metadata]
    CP --> PD
    PD --> PA[Platform API<br/>design-time validation]
    PD --> GC[Gateway Controller<br/>deploy-time validation]
    PD --> PE[Policy Engine<br/>runtime]
```
7.3. Policy Definition
Each policy definition includes
`supports` metadata for compatibility validation.
7.4. Validation at Each Layer
Each layer validates policies against their `supports` declarations.
Important
Gateway Builder Validation: The Gateway Builder performs build-time validation to ensure that the Policy Definition YAML accurately reflects the interfaces implemented by the policy code. For example, if a policy's YAML declares
`supports.response.streaming: true`, the builder verifies that the policy implements `StreamingResponseBodyPolicy`. This prevents mismatches between declared capabilities and actual implementation.
7.5. Platform API Validation Logic
```mermaid
flowchart TD
    A[User adds policy to route] --> B{Get existing policies<br/>on route}
    B --> C{For each policy pair}
    C --> D{Either policy is<br/>header-only?}
    D -->|Yes| E[✅ Compatible<br/>header-only works with any mode]
    D -->|No| F{Both support<br/>buffered?}
    F -->|Yes| G[Compatible via buffered]
    F -->|No| H{Both support<br/>streaming?}
    H -->|Yes| I[Compatible via streaming]
    H -->|No| J[❌ Incompatible]
    E --> K[✅ Allow]
    G --> K
    I --> K
    J --> L[Return error]
```
7.6. Error Response
```json
{
  "error": {
    "code": "INCOMPATIBLE_POLICIES",
    "message": "Cannot add 'virus-scan' to route '/upload'",
    "details": {
      "conflict": "request",
      "existing": { "name": "json-to-xml", "supports": ["buffered"] },
      "adding": { "name": "virus-scan", "supports": ["streaming"] },
      "suggestion": "Remove 'json-to-xml' or choose a buffered-compatible policy"
    }
  }
}
```
8. Kernel Processing Flow
8.1. Buffered Request Path
8.2. Streaming Request Path
8.3. Mixed Mode (Buffered Request + Streaming Response) - AI Gateway Primary Use Case
This is the most common pattern for AI Gateway: buffer the complete prompt, stream the LLM response.
The client sees the response appearing in real-time, just as if there were no gateway in between.
9. Policies That Require Chunk Accumulation
9.1. The Problem
The Semantic Completeness Challenge
LLM streaming responses arrive as a sequence of small chunks, where each chunk typically contains only 1-2 tokens. This creates a fundamental architectural challenge for certain categories of policies.
Why Some Policies Cannot Process Individual Chunks
Consider AI guardrail policies such as content safety, toxicity detection, or hallucination filtering. These policies:
Require semantic completeness - They need complete sentences or phrases to make meaningful decisions. A toxicity detector cannot determine if "I'm going to kill" is harmful without seeing what follows ("...time at the arcade" vs "...you").
Depend on external AI services - These policies often call external LLM providers or specialized ML models for analysis. Making an API call for every 1-2 token chunk is neither practical nor cost-effective.
May transform the payload - Content safety policies might redact, rephrase, or block content entirely. The decision on how to transform depends on analyzing sufficient context.
The Policy Chain Coordination Problem
This creates a critical coordination challenge in the policy execution pipeline:
When Policy A (content safety) determines it needs to buffer chunks to form a complete sentence:
This means a buffering policy effectively pauses the entire downstream pipeline for the chunks it's accumulating. The policy engine needs a mechanism for policies to signal "I need more data before I can make a decision, hold everything downstream."
Contrast with Pattern-Based Policies
This is distinct from simpler pattern-matching scenarios (like PII detection where an email is split across chunks). Pattern matching has predictable, bounded buffering needs. Semantic analysis policies have variable buffering requirements - they need "enough tokens to form a complete thought," which varies based on content.
The Architectural Question
How should the policy engine handle policies that:
9.2. Proposed Solutions Analysis
This section analyzes three proposed solutions for handling chunk accumulation.
9.2.1. Solution A: Pre-Processing Check (NeedsMoreData Interface)
Concept: Add a function
`NeedsMoreData(accumulated []byte) bool` that streaming policies can implement. Before processing each accumulated chunk cycle, the policy engine calls this function on streaming policies that implement it. Processing only begins when all policies return `false` (ready) or EndOfStream is reached.
Why two separate interfaces?
A policy may support both request and response streaming with different accumulation logic for each:
With separate interfaces, the policy can implement different logic for each path.
Note: These are separate interfaces from the streaming body policy interfaces. Streaming policies that need accumulation implement BOTH their streaming interface AND the corresponding accumulator:
- `StreamingRequestBodyPolicy` + `StreamingRequestAccumulator`
- `StreamingResponseBodyPolicy` + `StreamingResponseAccumulator`
Simple streaming policies (logging, metrics, PII regex) only implement the streaming body interface, without any accumulator.
Flow:
Key Semantics:
- `NeedsMoreData` receives RAW upstream bytes (before any policy processing)
- `NeedsMoreData` is only called BEFORE chain processing, never mid-chain
- At EndOfStream, the kernel flushes regardless of the `NeedsMoreData` result
Issue Analysis:
- `NeedsMoreData` receives raw bytes, but `OnResponseBodyChunk` receives A's modified output. B's "ready" decision was based on different content than what it processes.
Remaining Considerations:
Verdict: ✅ Viable solution with clear semantics. Trade-offs are acceptable and inherent to the problem.
9.2.2. Solution B: Static Minimum Byte/Chunk Count
Concept: Each policy declares upfront a static requirement: "I need at least X bytes or Y chunks". Policy Engine buffers until all requirements are met.
Issue Analysis:
Verdict: ❌ Not viable for semantic analysis use cases. May work only for fixed-format protocols (e.g., "buffer until newline").
9.2.3. Solution C: Action-Based Buffering Instruction
Concept: Policy returns an action like
`BufferMore{}` mid-chain, telling the engine to accumulate more chunks before continuing to downstream policies.
Flow:
Clarification: If Policy A also needs semantic context, A should implement
`StreamingAccumulator` and the kernel would wait for both A and B before processing. The flow above assumes A is a simple streaming policy (logging, transform) that intentionally processes each chunk individually.
Issue Analysis:
Flow Clarification:
Why This Adds Complexity:
Verdict: ❌ Mid-chain buffering requires complex state tracking in the kernel. Solution A avoids this by making the buffering decision BEFORE any processing starts - no mid-chain state to track.
9.3. Solution Comparison
Recommendation: Solution A (Pre-Processing Check) is the recommended approach for policies requiring chunk accumulation.
9.4. Solution A - Detailed Design
9.4.1. Interface Definition
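A minimal sketch of the two accumulator interfaces, based on the names used in this section. The per-path method names are assumptions — the proposal names only `NeedsMoreData`, but distinct names are needed if one policy type is to carry different logic for the request and response paths (Go cannot give one type two implementations of the same method name, which also motivates the two separate interfaces):

```go
package main

import "bytes"

// StreamingRequestAccumulator lets a streaming request policy signal that
// it cannot decide yet on the accumulated raw request bytes.
type StreamingRequestAccumulator interface {
	NeedsMoreRequestData(accumulated []byte) bool
}

// StreamingResponseAccumulator is the response-path counterpart.
type StreamingResponseAccumulator interface {
	NeedsMoreResponseData(accumulated []byte) bool
}

// sentenceGate is an illustrative implementor: request chunks flush
// immediately, response chunks are held until a sentence boundary appears.
type sentenceGate struct{}

func (sentenceGate) NeedsMoreRequestData(acc []byte) bool { return false }

func (sentenceGate) NeedsMoreResponseData(acc []byte) bool {
	return !bytes.ContainsAny(acc, ".!?")
}
```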
9.4.2. Kernel Processing Flow
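The Solution A loop — accumulate raw chunks, poll every accumulator, run the chain only when all are ready or the stream ends — can be sketched as follows. The slice-based framing and names are simplifications of the real gRPC-driven kernel:

```go
package main

// handleChunks simulates the kernel: accumulate raw bytes, HOLD (nil ack)
// while any accumulator wants more data, FLUSH through the chain otherwise.
// The last chunk is treated as EndOfStream, which always flushes.
func handleChunks(chunks [][]byte, accumulators []func([]byte) bool, processChain func([]byte) []byte) [][]byte {
	var out [][]byte
	var accum []byte
	for i, c := range chunks {
		accum = append(accum, c...)
		eos := i == len(chunks)-1
		hold := false
		if !eos {
			for _, needsMore := range accumulators {
				if needsMore(accum) { // raw bytes, before any policy runs
					hold = true
					break
				}
			}
		}
		if hold {
			out = append(out, nil) // HOLD: empty ack, nothing reaches the client
			continue
		}
		out = append(out, processChain(accum)) // FLUSH the mutated result
		accum = nil
	}
	return out
}
```

Note that the accumulators see the same raw buffer and are never consulted mid-chain, matching the key semantics listed earlier.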
9.4.3. Example: Content Safety Policy
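A hypothetical content-safety policy combining the streaming chunk hook with the accumulator: it holds chunks until a sentence boundary, so "I'm going to kill" is never judged without its continuation. A real implementation would call an external moderation service; a trivial redaction stands in for it here, and the simplified method signatures are assumptions:

```go
package main

import "bytes"

// ContentSafety implements both the streaming body hook and NeedsMoreData.
type ContentSafety struct{}

// NeedsMoreData holds accumulation until a complete sentence is visible.
func (ContentSafety) NeedsMoreData(accumulated []byte) bool {
	return !bytes.ContainsAny(accumulated, ".!?")
}

// OnResponseBodyChunk runs once the kernel flushes the accumulated sentence.
func (ContentSafety) OnResponseBodyChunk(chunk []byte) []byte {
	// Placeholder for the moderation verdict on the complete sentence.
	return bytes.ReplaceAll(chunk, []byte("forbidden"), []byte("[redacted]"))
}
```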
9.4.4. Interface Relationship Summary
- `StreamingResponseBodyPolicy`
- `StreamingRequestBodyPolicy`
- `StreamingResponseBodyPolicy` + `StreamingResponseAccumulator`
- `StreamingRequestBodyPolicy` + `StreamingRequestAccumulator`
- `RequestBodyPolicy` / `ResponseBodyPolicy`
- `RequestHeaderPolicy` / `ResponseHeaderPolicy`
10. Design Decisions
10.1. Interface Design Approach
Decision: Use independent composable interfaces (Option 2) without a
`Mode()` function.
Rationale: Option 2 provides maximum composability - policies implement only the interfaces they need. No boilerplate methods, no mode declaration redundancy. The interface a policy implements IS its mode declaration.
Alternatives Considered:
10.2. Mode Declaration Mechanism
Decision: The interface a policy implements declares its processing mode. No explicit
`Mode()` function is needed.
Rationale: The kernel determines mode by type assertion.
This eliminates redundancy - if a policy implements `StreamingRequestBodyPolicy`, it inherently declares streaming support.
10.3. Route Processing Mode Determination
Decision: Infer processing mode from the combined interface implementations of all policies in a route.
Rules:
- If any policy implements only `RequestBodyPolicy` (not `StreamingRequestBodyPolicy`), the request path uses buffered mode
- If all body-processing policies implement `StreamingRequestBodyPolicy`, the request path uses streaming mode
- Header-only policies (`RequestHeaderPolicy`) are compatible with both modes
Decision: Header processing and body processing are separate, independent interfaces.
Rationale: A policy may need to:
Independent interfaces allow any combination without forcing unnecessary implementations.
10.5. Dual-Support Pattern for Body Processing
Decision: Policies can implement both
`ResponseBodyPolicy` (buffered) AND `StreamingResponseBodyPolicy` (streaming) to work in either mode.
Rationale: This enables policies like PIIMasking to:
The kernel selects which method to call based on the route's determined mode.
10.6. Policy Compatibility Validation Timing
Decision: Validate policy compatibility at design time (Platform API), not just deployment time.
Rationale: Better user experience - users learn about incompatibilities immediately when configuring, not after attempting deployment. Gateway Controller performs defense-in-depth validation at deploy time.
10.7. Policy Definition as Source of Truth
Decision: The Policy Definition (YAML with
`supports` metadata) is the source of truth for compatibility validation.
Rationale: Policy Definition is available to both Platform API (design-time validation) and Gateway Controller (deploy-time validation). Policy Hub stores built-in policies; users can add custom policies with their own definitions.
10.8. Build-Time Validation by Gateway Builder
Decision: The Gateway Builder performs build-time validation to ensure Policy Definition YAML matches the interfaces implemented by the policy code.
Rationale: Compile-time validation catches mismatches early, before deployment. The builder checks:
- If `supports.request.streaming: true`, the policy must implement `StreamingRequestBodyPolicy`
- If `supports.response.buffered: true`, the policy must implement `ResponseBodyPolicy`
This ensures the Policy Definition accurately reflects the policy's actual capabilities, preventing runtime surprises.
10.9. Chunk Accumulation Approach
Decision: Use Solution A (Pre-Processing Check) with
the `NeedsMoreData` interface for policies requiring chunk accumulation.
Rationale: Solution A makes the buffering decision BEFORE any processing starts, avoiding complex mid-chain state tracking. The kernel accumulates raw bytes and only processes when all accumulator policies signal readiness (or EndOfStream is reached).
Alternatives Considered:
10.10. Accumulator Interface Design
Decision: Use separate optional interfaces (
`StreamingRequestAccumulator`, `StreamingResponseAccumulator`) that streaming policies implement IN ADDITION to their body policy interface.
Rationale:
- Example pairing: `StreamingResponseBodyPolicy` + `StreamingResponseAccumulator`
10.11. NeedsMoreData Receives Raw Bytes
Decision:
`NeedsMoreData(accumulated []byte)` receives raw upstream bytes, not content modified by earlier policies.
Rationale: The accumulation decision happens BEFORE chain processing. All policies make their "ready" decision based on the same raw content. After all policies are ready, chain processing runs once with the accumulated content.
10.12. Policy Instance Lifecycle
Decision: Policy instance lives for the entire request/response lifecycle.
Rationale: The same policy instance handles all chunks in a stream. This allows policies to maintain internal state (e.g., accumulated data, scan sessions, partial pattern matches) without relying solely on context metadata.
10.13. Error Handling Mid-Stream
Decision: Return
an `ImmediateResponse` action; Envoy handles connection termination.
Rationale: When a streaming policy returns `ImmediateResponse`, the stream is terminated. This is consistent with buffered-mode error handling and leverages Envoy's built-in stream management.
10.14. Memory Protection for Accumulation
Decision: Configurable maximum accumulation buffer size with error response when exceeded.
Rationale: Unbounded accumulation could lead to memory exhaustion. When the buffer exceeds the configured limit, the kernel returns HTTP 413 (Payload Too Large) rather than risking OOM conditions.
10.15. Backpressure Handling via Chunk Batching
Decision: When the Policy Engine receives chunks faster than it can process them, it batches multiple chunks together and processes them as a single accumulated chunk in the next processing cycle.
Rationale: Rather than implementing complex backpressure signaling to Envoy, the kernel absorbs temporary processing delays by accumulating incoming chunks. On the next processing cycle, all accumulated chunks are sent through the policy chain together. This:
- Preserves ordering and stream semantics (including `EndOfStream`)
Behavior:
10.16. Header Modification Limitation in Streaming Mode
Decision: Header modifications via
`OnRequestHeaders` or `OnResponseHeaders` are final and cannot be changed based on body content discovered during streaming.
Rationale: In streaming mode, headers are forwarded to the client/upstream before body processing begins. This is a fundamental constraint of HTTP streaming - headers must be sent before the body. Policies that need to modify headers based on body inspection must use buffered mode.
Limitation: If a policy discovers something during body chunk processing that should have affected headers (e.g., determining
`Content-Type` from body inspection), it cannot retroactively modify those headers. Policies requiring such behavior should use buffered mode instead.
This is an accepted architectural limitation inherent to HTTP streaming semantics.
10.17. Dual-Support Mode Selection
Decision: When all body-processing policies in a route implement dual-support (both buffered and streaming interfaces), the kernel defaults to streaming mode.
Rationale: Streaming mode provides better real-time UX, which is the primary use case for AI Gateway. If a route author wants to force buffered mode, they can add a buffered-only policy or configure the route explicitly (future enhancement).
11. TODO / Future Work
The following items are identified for future design and implementation:
11.1. Testing Strategy
11.2. Observability
12. References
- `sdk/gateway/policy/v1alpha/interface.go`