[Proposal] Python Policy Support: Enable ML/AI-Powered Policies via Python Runtime #1103
Replies: 5 comments
-
Performance Issue & Refinement

After discussions with @renuka-fernando and @HeshanSudarshana, we identified some issues with the proposed method:

Problem: Spawning a new Python process for each request is expensive and would cause significant latency.

Potential approaches: need to find the optimal balance between performance, resource efficiency, and maintainability, while addressing the dependency conflict concern.
-
@sehan-dissanayake The proposal is solid overall. One concern I had is around the disk footprint and runtime cost of maintaining a fully isolated virtual environment per Python policy, especially for ML-related dependencies (e.g., numpy, scikit-learn, torch), which can become quite large. As an example, a venv with the above dependencies may grow to ~800 MB, which in practice means substantial disk consumption, since users generally attach many policies. As a possible refinement, we could consider a hybrid environment approach:
In addition, aligning with the performance discussion already noted, using long-running Python worker processes instead of spawning a new process per request could help reduce latency. This might provide a better balance between isolation, performance, and resource usage.
-
Implementation Update: Python Policy Support

The basic functionality of Python policy support has been implemented. Here's how it actually landed. A few things changed from the original proposal.

What Changed: Venv-per-Policy → Merged
| File | Role |
|---|---|
| `factory.go` | `BridgeFactory` — conforms to `policy.PolicyFactory`, same as Go policy factories. Returns a `PythonBridge` instance. |
| `bridge.go` | `PythonBridge` — implements the `policy.Policy` interface (`OnRequest`, `OnResponse`, `Mode`). Translates Go SDK types → protobuf, calls the stream manager, translates protobuf → Go SDK types back. |
| `client.go` | `StreamManager` — singleton gRPC client. Maintains one persistent bidirectional stream to the Python Executor over UDS. Multiplexes concurrent requests via `request_id` correlation. |
| `translator.go` | Proto ↔ Go SDK type conversion. |
The policy engine's core chain execution is untouched. It iterates the policy chain, calls OnRequest()/OnResponse() on each policy — whether that's a native Go policy or a PythonBridge is invisible to it. The bridge just serializes the call to protobuf, sends it over the gRPC stream, waits for the correlated response, and translates back.
xDS updates are consumed entirely by Go. When the controller pushes a route update, Go's buildPolicyChain() instantiates a PythonBridge per Python policy (holding params, metadata, processing mode as Go structs) — exactly the same lifecycle as Go-native policy instances. The Python Executor is never notified about xDS events. Python doesn't know what routes exist or what params are configured until an actual execution request arrives.
The Python Side: Async gRPC Server + Thread Pool
The Python Executor (python-executor/) is a standalone async gRPC server:
Python Executor Process
├── asyncio event loop (main thread)
│ └── gRPC async server on unix:///var/run/api-platform/python-executor.sock
│ └── ExecuteStream handler (bidi streaming)
│ └── For each request: asyncio.run_in_executor(thread_pool, execute_policy)
├── ThreadPoolExecutor (default 4 workers, configurable via PYTHON_POLICY_WORKERS)
│ ├── Worker thread 1 → policy.on_request() / policy.on_response()
│ ├── Worker thread 2 → ...
│ ├── Worker thread 3 → ...
│ └── Worker thread 4 → ...
├── PolicyLoader — imports policies from generated registry at startup
├── PolicyCache — lazy, content-addressed: key = (name, version, sha256(params))
└── Metrics HTTP server (port 9119)
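The dispatch pattern in the tree above can be sketched as follows. This is an illustrative reduction, not the actual executor code: the gRPC plumbing is omitted, and `execute_policy` is a stand-in for the real cache lookup and policy call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def execute_policy(request: dict) -> dict:
    # Stand-in: the real executor looks up the policy in the PolicyCache
    # and calls policy.on_request() / policy.on_response() here.
    return {"request_id": request["request_id"], "result": "ok"}

class ExecutorSketch:
    def __init__(self, workers: int = 4):
        # Mirrors the default of 4 workers (PYTHON_POLICY_WORKERS).
        self.pool = ThreadPoolExecutor(max_workers=workers)

    async def handle(self, request: dict) -> dict:
        # Blocking policy code runs on a worker thread so the asyncio
        # event loop (and thus the gRPC stream) stays responsive.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self.pool, execute_policy, request)

async def main():
    ex = ExecutorSketch()
    # Concurrent requests are multiplexed onto the shared thread pool.
    return await asyncio.gather(
        ex.handle({"request_id": "r1"}),
        ex.handle({"request_id": "r2"}),
    )
```

In the real handler the awaited result is written back onto the bidirectional stream with its `request_id` attached.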
Why a thread pool, not subprocesses? The original proposal suggested ProcessPoolExecutor with subprocess spawning per execution. That was dropped because:
- Subprocess spawn overhead per request is too high for latency-sensitive API traffic
- Policy instances need to stay alive across requests (model loading, connection pools, etc.)
- Thread pool gives us concurrency with shared policy instances — a loaded ML model is initialized once and reused
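A minimal sketch of the lazy, content-addressed caching described above (`cache_key` and `get_instance` are illustrative names, not the real executor API): hashing a canonical JSON encoding of the params means two lookups with logically equal params share one cached instance.

```python
import hashlib
import json

_cache: dict = {}

def cache_key(name: str, version: str, params: dict) -> tuple:
    # Canonical JSON (sorted keys, no whitespace) so logically-equal
    # params hash to the same digest regardless of key order.
    digest = hashlib.sha256(
        json.dumps(params, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    return (name, version, digest)

def get_instance(name: str, version: str, params: dict, factory):
    # Lazy: the instance is created on the first execution request and
    # reused afterwards, so an ML model loaded in __init__ is paid once.
    key = cache_key(name, version, params)
    if key not in _cache:
        _cache[key] = factory(params)
    return _cache[key]
```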
The Python Policy SDK
The SDK mirrors the Go policy/v1alpha interface:
```python
class Policy(ABC):
    def __init__(self, metadata: PolicyMetadata, params: Dict[str, Any]): ...

    @abstractmethod
    def on_request(self, ctx: RequestContext, params: Dict) -> RequestAction: ...

    @abstractmethod
    def on_response(self, ctx: ResponseContext, params: Dict) -> ResponseAction: ...
```

Action types (`UpstreamRequestModifications`, `ImmediateResponse`, `UpstreamResponseModifications`) are dataclasses that match the Go SDK equivalents. A policy author writes a `policy.py` with a `get_policy(metadata, params)` factory function — the same pattern as Go.
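For illustration, a complete policy module under this shape might look like the following. The dataclass stand-ins here are simplified assumptions, not the real SDK types, and `HeaderStampPolicy` is a made-up example:

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Simplified stand-ins for the SDK types (assumptions, not the real SDK).
@dataclass
class RequestContext:
    headers: Dict[str, str]
    body: bytes = b""

@dataclass
class UpstreamRequestModifications:
    set_headers: Dict[str, str] = field(default_factory=dict)

class HeaderStampPolicy:
    """Hypothetical policy that stamps a configurable header."""
    def __init__(self, metadata: Dict[str, Any], params: Dict[str, Any]):
        self.header = params.get("header", "x-stamped")
        self.value = params.get("value", "true")

    def on_request(self, ctx: RequestContext, params: Dict) -> UpstreamRequestModifications:
        return UpstreamRequestModifications(set_headers={self.header: self.value})

# Factory entry point — same pattern as Go's PolicyFactory.
def get_policy(metadata: Dict[str, Any], params: Dict[str, Any]) -> HeaderStampPolicy:
    return HeaderStampPolicy(metadata, params)
```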
Mixed Go + Python Policy Chain: The Bridge Pattern
The central design point: Python policies are wrapped in Go. The chain executor never knows it's talking to Python — every policy in the chain implements the same Go policy.Policy interface. State (params, metadata, shared context) lives entirely on the Go side. Python is a stateless execution backend.
Here's an example route /foo with three policies — jwt-auth (Go), prompt-compress (Python), and rate-limit (Go):
```mermaid
graph TB
    subgraph GoPE["Go Policy Engine Process — all state lives here"]
        direction TB
        CE["Chain Executor<br/>iterates []policy.Policy"]
        subgraph chain["Route /foo — Policy Chain"]
            direction LR
            subgraph p1["Policy 1"]
                P1I["<b>jwt-auth</b><br/><i>Go native</i>"]
                P1T["implements policy.Policy"]
                P1S["State: params, metadata,<br/>shared context"]
            end
            subgraph p2["Policy 2"]
                P2I["<b>PythonBridge</b><br/><i>wraps prompt-compress</i>"]
                P2T["implements policy.Policy"]
                P2S["State: params, metadata,<br/>shared context, mode"]
            end
            subgraph p3["Policy 3"]
                P3I["<b>rate-limit</b><br/><i>Go native</i>"]
                P3T["implements policy.Policy"]
                P3S["State: params, metadata,<br/>shared context"]
            end
        end
        SM["StreamManager (singleton)<br/>persistent bidi gRPC stream<br/>multiplexed via request_id"]
    end
    subgraph PyExec["Python Executor Process — stateless execution"]
        direction TB
        GRPC["async gRPC server<br/>UDS: python-executor.sock"]
        TP["ThreadPoolExecutor<br/>(4 workers)"]
        PC["PolicyCache<br/>lazy, content-addressed"]
        subgraph workers["Worker Threads"]
            W1["Thread 1<br/>prompt-compress.on_request()"]
            W2["Thread 2<br/>(available)"]
            W3["Thread 3<br/>(available)"]
            W4["Thread 4<br/>(available)"]
        end
    end
    CE -->|"1. OnRequest(ctx, params)"| P1I
    CE -->|"2. OnRequest(ctx, params)"| P2I
    CE -->|"3. OnRequest(ctx, params)"| P3I
    P2I -->|"serialize to protobuf<br/>+ request_id"| SM
    SM -->|"gRPC over UDS"| GRPC
    GRPC --> TP
    TP --> W1
    W1 -.->|"ExecutionResponse<br/>(correlated by request_id)"| SM
    SM -.->|"translate proto → Go SDK<br/>merge metadata back"| P2I
    style GoPE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#1b5e20
    style PyExec fill:#fff8e1,stroke:#f9a825,stroke-width:2px,color:#e65100
    style p1 fill:#c8e6c9,stroke:#66bb6a,color:#1b5e20
    style p2 fill:#ffe0b2,stroke:#ffa726,stroke-width:2px,color:#bf360c
    style p3 fill:#c8e6c9,stroke:#66bb6a,color:#1b5e20
    style chain fill:#f1f8e9,stroke:#aed581,color:#33691e
    style SM fill:#e3f2fd,stroke:#42a5f5,color:#0d47a1
    style workers fill:#fff9c4,stroke:#ffee58,color:#f57f17
```
Key things to notice:
- All three policies implement `policy.Policy`. The chain executor calls `OnRequest(ctx, params)` on each one identically. It doesn't know or care that Policy 2 is Python.
- `PythonBridge` holds the state — params, metadata, shared context, processing mode. These are Go structs. Python never stores per-route state; it receives context as protobuf, executes, and returns a result.
- `StreamManager` is a singleton shared by all `PythonBridge` instances across all routes. It maintains one persistent gRPC bidi stream and multiplexes via `request_id` — so 100 concurrent requests share one stream, with no connection-per-call overhead.
- Go owns the full policy lifecycle. xDS updates create/destroy `PythonBridge` instances in Go. Python is never notified — it only sees execution calls, never configuration events.
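The request_id multiplexing the StreamManager performs can be illustrated with a small sketch. The real StreamManager is Go; this Python version just shows the pattern: park each caller on a future keyed by its request_id, and resolve the future when the stream reader sees the matching response.

```python
import asyncio
import itertools

class CorrelationSketch:
    """Illustrative only: request_id multiplexing on one shared stream."""
    def __init__(self):
        self._pending: dict = {}          # request_id -> waiting Future
        self._ids = itertools.count()

    async def call(self, payload):
        request_id = str(next(self._ids))
        fut = asyncio.get_running_loop().create_future()
        self._pending[request_id] = fut
        # Real code would write {request_id, payload} onto the bidi stream.
        # Here we simulate the remote side echoing the payload back:
        asyncio.get_running_loop().call_soon(
            self.on_response, request_id, {"echo": payload}
        )
        return await fut  # resolves when the correlated response arrives

    def on_response(self, request_id, result):
        # Stream reader: route each response to its waiting caller.
        self._pending.pop(request_id).set_result(result)
```

Because correlation is by id rather than by connection, any number of concurrent calls share the single stream.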
Build-Time Integration
Python policies are declared in build.yaml alongside Go policies:
```yaml
policies:
  # Go policies — remote module reference
  - name: jwt-auth
    gomodule: github.com/wso2/gateway-policies/jwt-auth/v0.1.0
  # Python policies — remote module reference (analogous to gomodule)
  - name: prompt-compress
    pythonmodule: github.com/wso2/gateway-python-policies/[email protected]
  # filePath — local/dev policies only (both Go and Python)
  - name: my-local-policy
    filePath: ./dev-policies/my-local-policy
```

The gateway-builder detects `runtime: python` in the policy's `policy-definition.yaml` and:
- Copies the policy source into the build output under `python-executor/policies/`
- Generates a `python_policy_registry.py` mapping `"name:version"` → `"policies.module.policy"`
- Merges all `requirements.txt` files (base executor deps + per-policy deps) into one
- On the Go side, generates a `BridgeFactory` registration instead of a Go plugin registration
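A sketch of the builder's merge and registry-generation steps, assuming plain requirements.txt files and the `"name:version"` key format described above (function names are illustrative; the real gateway-builder is not shown in this thread):

```python
from pathlib import Path

def merge_requirements(files: list) -> list:
    # Union of base executor deps and per-policy deps; first occurrence wins.
    seen, merged = set(), []
    for f in files:
        for line in Path(f).read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged

def render_registry(policies: list) -> str:
    # policies: [(name, version, import_path), ...] ->
    # source text of a generated python_policy_registry.py-style module.
    lines = ["REGISTRY = {"]
    for name, version, module in policies:
        lines.append(f'    "{name}:{version}": "{module}",')
    lines.append("}")
    return "\n".join(lines)
```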
In the Dockerfile runtime stage, a single `pip3 install --target /app/python-libs` installs all merged dependencies. The Python Executor is started conditionally — only if `main.py` exists in the image (it is always copied, but if there are zero Python policies, the entrypoint skips launching it).
Container Process Model
Same as proposed — three processes under tini, managed by the entrypoint script:
tini (PID 1)
└── docker-entrypoint.sh
├── python3 main.py [pye] ← started first, waits for UDS socket
├── policy-engine [pol] ← started second, connects to Python Executor
└── envoy [rtr] ← started last, connects to Policy Engine
Startup is sequential (Python Executor → Policy Engine → Envoy) with socket readiness checks. If any process dies, the entrypoint tears down the rest and exits.
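The socket readiness check performed before starting the next process can be sketched as a poll-until-connect loop (illustrative only; the actual entrypoint is a shell script):

```python
import socket
import time

def wait_for_uds(path: str, timeout: float = 10.0) -> bool:
    # Poll until the Unix domain socket accepts a connection, or give up.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return True
        except OSError:
            time.sleep(0.1)
        finally:
            s.close()
    return False
```

A successful `connect()` is enough to prove the server is listening; no data needs to be exchanged.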
-
Proto File

syntax = "proto3";
package wso2.gateway.python.v1;
option go_package = "github.com/wso2/api-platform/gateway/gateway-runtime/policy-engine/internal/pythonbridge/proto";
import "google/protobuf/struct.proto";
// PythonExecutorService defines the gRPC contract between Go PE and the Python process.
// The Python process is the gRPC SERVER, Go PE is the CLIENT.
service PythonExecutorService {
// Bidirectional stream for executing policies.
// Go sends ExecutionRequest, Python responds with ExecutionResponse.
// Each request has a unique request_id that the response must echo back.
rpc ExecuteStream (stream ExecutionRequest) returns (stream ExecutionResponse);
// Health check for readiness.
rpc HealthCheck (HealthCheckRequest) returns (HealthCheckResponse);
}
// ---------------------- Request / Response ----------------------
message ExecutionRequest {
// Unique ID per call so responses can be correlated on the single stream.
string request_id = 1;
// Policy to execute (name:version format used for cache lookup, e.g., "my-policy:v1")
string policy_name = 2;
string policy_version = 3;
// Phase: "on_request" or "on_response"
string phase = 4;
// Merged parameters (system + user) for this policy instance.
// The Go side resolves ${config} references in systemParameters and merges
// them with user parameters before sending. Python never sees raw ${config} strings.
google.protobuf.Struct params = 5;
// The request or response context data
oneof context {
RequestContext request_context = 6;
ResponseContext response_context = 7;
}
// Shared context (metadata, API info, auth context)
SharedContext shared_context = 8;
// Policy metadata (route info, API info) for factory creation
PolicyMetadata policy_metadata = 9;
}
message ExecutionResponse {
// Must match request_id from the corresponding ExecutionRequest
string request_id = 1;
oneof result {
RequestActionResult request_result = 2;
ResponseActionResult response_result = 3;
ExecutionError error = 4;
}
// Updated shared metadata (Python may have mutated it).
// Go side merges this back into the SharedContext.
google.protobuf.Struct updated_metadata = 5;
}
// ---------------------- Context Messages ----------------------
message SharedContext {
string project_id = 1;
string request_id = 2;
google.protobuf.Struct metadata = 3; // Inter-policy communication map
string api_id = 4;
string api_name = 5;
string api_version = 6;
string api_kind = 7;
string api_context = 8;
string operation_path = 9;
map<string, string> auth_context = 10;
}
message RequestContext {
map<string, string> headers = 1;
bytes body = 2;
bool body_present = 3;
bool end_of_stream = 4;
string path = 5;
string method = 6;
string authority = 7;
string scheme = 8;
}
message ResponseContext {
// Original request data (immutable)
map<string, string> request_headers = 1;
bytes request_body = 2;
string request_path = 3;
string request_method = 4;
// Response data
map<string, string> response_headers = 5;
bytes response_body = 6;
bool response_body_present = 7;
int32 response_status = 8;
}
message PolicyMetadata {
string route_name = 1;
string api_id = 2;
string api_name = 3;
string api_version = 4;
string attached_to = 5; // "api" or "route"
}
// ---------------------- Action Results ----------------------
message RequestActionResult {
oneof action {
UpstreamRequestModifications continue_request = 1;
ImmediateResponseAction immediate_response = 2;
}
}
message ResponseActionResult {
oneof action {
UpstreamResponseModifications continue_response = 1;
}
}
message UpstreamRequestModifications {
map<string, string> set_headers = 1;
repeated string remove_headers = 2;
map<string, StringList> append_headers = 3;
bytes body = 4;
bool body_present = 5; // false means no body change, true means use body field (even if empty)
string path = 6;
bool path_present = 7;
string method = 8;
bool method_present = 9;
google.protobuf.Struct analytics_metadata = 10;
}
message UpstreamResponseModifications {
map<string, string> set_headers = 1;
repeated string remove_headers = 2;
map<string, StringList> append_headers = 3;
bytes body = 4;
bool body_present = 5;
int32 status_code = 6;
bool status_code_present = 7;
google.protobuf.Struct analytics_metadata = 8;
}
message ImmediateResponseAction {
int32 status_code = 1;
map<string, string> headers = 2;
bytes body = 3;
google.protobuf.Struct analytics_metadata = 4;
}
message ExecutionError {
string message = 1;
string policy_name = 2;
string policy_version = 3;
string error_type = 4; // "init_error", "execution_error", "timeout"
}
// ---------------------- Health Check ----------------------
message HealthCheckRequest {}
message HealthCheckResponse {
bool ready = 1;
int32 loaded_policies = 2;
}
// ---------------------- Utility ----------------------
message StringList {
repeated string values = 1;
}
-
Policy Instance Lifecycle Gap in the Python Bridge

The current Python bridge design describes Python as a "stateless execution backend" — but Go policies are not stateless.

How Go Policy Lifecycle Works Today

Every Go policy exports a factory:

`type PolicyFactory func(metadata PolicyMetadata, params map[string]interface{}) (Policy, error)`

The factory receives the policy metadata and merged params and returns a Policy instance.
Concrete examples from existing Go policies:
- Rate limiting — caches per-route limiter instances
- Semantic cache — creates a new instance per call
- JWT auth — returns a singleton but maintains internal state

The Gap in the Python Bridge

The bridge as described creates a `PythonBridge` instance per policy in Go, but no corresponding lifecycle call ever reaches the Python side.
Proposed Solution: Lifecycle RPCs with Factory-Controlled Instancing

The fix is to add lifecycle RPCs to the proto that mirror the Go policy lifecycle exactly. In Go, the two lifecycle entry points are the factory call (instance creation) and the per-request OnRequest/OnResponse calls.
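To make factory-controlled instancing concrete, here is a sketch of two instancing strategies a Python-side factory could choose, mirroring the Go examples above (singleton vs. per-route). All names here are hypothetical:

```python
class _SingletonPolicy:
    """Factory returns one shared instance (the JWT-auth pattern)."""
    _instance = None

    def __init__(self, params):
        self.params = params

def get_policy_singleton(metadata, params):
    # First call creates the instance; later calls reuse it.
    if _SingletonPolicy._instance is None:
        _SingletonPolicy._instance = _SingletonPolicy(params)
    return _SingletonPolicy._instance

class _PerRoutePolicy:
    """Factory keeps one instance per route key (the rate-limit pattern)."""
    def __init__(self, params):
        self.params = params

_per_route: dict = {}

def get_policy_per_route(metadata, params):
    key = metadata.get("route_name")
    if key not in _per_route:
        _per_route[key] = _PerRoutePolicy(params)
    return _per_route[key]
```

The point is that the factory, not the bridge, decides whether an instance is shared; lifecycle RPCs would simply tell Python when to invoke the factory and when to discard instances.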
-
Summary
Motivation
The API Platform gateway currently supports policies written exclusively in Go. While Go provides excellent performance and is well suited for high-throughput request processing, some policy use cases involving machine learning and AI are more naturally expressed in Python because of its extensive ML and AI ecosystem.
Key drivers:
Example Use Case: Prompt Compression Policy
An AI gateway handling LLM requests can benefit from a prompt compression policy that:
- Relies on specialized Python libraries such as `llmlingua` or `compression-prompt`, which have no Go equivalents

Proposal
Extend the unified gateway-runtime container (which currently manages Router and Go Policy Engine as described in the Unified Gateway Container proposal) to include a Python Policy Executor as a third managed process. The three processes will run under `tini` as PID 1, with the existing entrypoint script extended to manage the Python policy lifecycle in addition to the Router and Go Policy Engine.

Architecture at a glance:
Request flow:
Architecture Diagrams
Process Tree
```mermaid
graph TD
    A[tini PID 1] --> B[docker-entrypoint.sh]
    B --> C[Router / Envoy]
    B --> D[Go Policy Engine]
    B --> E[Python Policy Executor]
    C -->|ext_proc gRPC<br/>UDS socket| D
    D -->|policy execution gRPC<br/>UDS socket| E
    style A fill:#666,stroke:#fff,stroke-width:2px,color:#fff
    style B fill:#666,stroke:#fff,stroke-width:2px,color:#fff
    style C fill:#6cf,stroke:#333,stroke-width:3px,color:#333
    style D fill:#9f9,stroke:#333,stroke-width:3px,color:#333
    style E fill:#ff9,stroke:#333,stroke-width:3px,color:#333
```

Request Flow Sequence
```mermaid
sequenceDiagram
    participant Client
    participant Router as Router<br/>(Envoy)
    participant GoPE as Go Policy Engine
    participant PyPE as Python Policy<br/>Executor
    participant Backend
    Client->>Router: HTTP Request
    Router->>GoPE: ext_proc: Request Headers
    Note over GoPE: Execute policy chain
    GoPE->>GoPE: Execute Go Policy 1<br/>(e.g., JWT Auth)
    GoPE->>PyPE: Execute Python Policy<br/>(e.g., Prompt Compression)
    activate PyPE
    PyPE->>PyPE: Load Python policy module
    PyPE->>PyPE: Execute policy logic<br/>(ML inference, transformations)
    PyPE-->>GoPE: Policy result + metadata
    deactivate PyPE
    GoPE->>GoPE: Execute Go Policy 2<br/>(e.g., Rate Limiting)
    GoPE-->>Router: Modified headers/body
    Router->>Backend: Forwarded request
    Backend-->>Router: Response
    Router->>GoPE: ext_proc: Response
    GoPE-->>Router: Response modifications
    Router-->>Client: HTTP Response
```

Changes Required
Core Components
gateway-runtime Container
Python Policy Executor (New Process)
`ProcessPoolExecutor`

Go Policy Engine Extensions
Gateway Builder
Implementation Details
Dependency Isolation via Virtual Environments
One venv per policy: Each Python policy gets its own virtual environment with isolated dependencies, preventing dependency conflicts between policies.
Build process (multi-stage Docker):
- Builder stage: `FROM python:3.10-slim AS builder`. If needed, add additional builder stages for other Python versions (e.g., `FROM python:3.11-slim AS python311-builder`)
- Each policy declares its `python_version.txt` and `requirements.txt`
- Create the venv: `python3.X -m venv /policies/<policy_name>/venv`
- Install dependencies: `/policies/<policy_name>/venv/bin/pip install -r requirements.txt`
- Runtime stage copies the venvs via `COPY --from=<builder>` directives

Policy execution calls the venv's Python binary directly.
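The per-policy venv creation step can be sketched with the stdlib `venv` module (paths and the function name are illustrative; the real build runs inside Docker and would use `with_pip=True` so the requirements can be installed):

```python
import venv
from pathlib import Path

def build_policy_venv(policy_name: str, root: str = "/policies") -> Path:
    # One isolated environment per policy, as described above.
    # with_pip=False keeps this sketch fast; the real build needs pip
    # to install the policy's requirements.txt into the venv.
    env_dir = Path(root) / policy_name / "venv"
    venv.EnvBuilder(with_pip=False, clear=True).create(env_dir)
    # POSIX layout; Windows uses Scripts/python.exe instead.
    return env_dir / "bin" / "python"
```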
Multi-version support (optional): If policies specify different Python versions (3.10, 3.11, etc.), the builder creates appropriate venvs from the corresponding builder stage.
Process Model and Subprocess Management
Python Policy Executor Architecture:
- Uses `ProcessPoolExecutor` to manage Python subprocess execution
- `ProcessPoolExecutor` gives fine-grained control over the policy execution lifecycle

Execution flow:
`subprocess.run([venv_python, policy_py], input=json_input, capture_output=True)`

Benefits of subprocess model:
Communication Protocol
Go Policy Engine ↔ Python Policy Executor communication uses gRPC over UDS:
- UDS socket at `/app/python-policy.sock` (lowest latency, same pattern as Router ↔ Go Policy Engine)

Drawbacks and Trade-offs
Resource usage: Adding Python runtime increases container memory footprint
Startup latency: Python interpreter initialization and policy module loading add a few seconds to container startup time
Performance considerations: Python policies will have higher per-request latency than Go policies due to:
Language overhead (Python vs compiled Go)
ML model inference time (varies by model complexity)
Dependency management: Python policies may have conflicting dependencies; requires careful environment isolation
Mitigation strategies:
Python support is opt-in: APIs that don't use Python policies don't pay the performance cost
UDS communication minimizes IPC overhead
Use Python policies strategically for AI/ML tasks where Go alternatives don't exist
Document performance characteristics and best practices clearly
Alternatives Considered
Alternative 1: Separate Python Policy Engine Container
Run Python Policy Executor as a sidecar container instead of embedding in gateway-runtime.
Alternative 2: Embed Python Runtime in Go via Cgo
Embed the Python interpreter directly using cgo bindings (e.g., `go-python`).

Open Questions
Python policy repository and distribution: How should Python policies be stored and distributed?
Context: Go policies are stored as Go modules in GitHub because that's the native way Go modules are distributed and consumed (via `go get`). Should Python follow its native distribution approach?

External Repository Options:
Option A: Mono-repo with Go policies (`gateway-controllers` repo with a `python-policies/` subdirectory)

Option B: Separate Python policy repository (e.g., `gateway-python-policies`)

Option C: Distribute Python policies as pip packages (PyPI or a private package index)
Option D: Include in gateway repo (not external)
Discovery mechanism: Use a Python build manifest (`python-build.yaml`) similar to Go's `build.yaml`, referencing either GitHub repos or pip packages.
Log prefix name: Should we use `[py-pol]` or an alternative?