Conversation

@guicassolato
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Defines an AuthScheme API designed for more dynamic agentic scenarios such as: an unbounded number of identities, identity federation, authorization based on common patterns of the targeted resources, and/or authorization decisions offloaded to external systems provided at the platform level.

The new API is intended to be used in conjunction with, or as an alternative to, the AccessPolicy API, depending on the use case.

Does this PR introduce a user-facing change?:

Defines an `AuthScheme` API designed for more dynamic agentic scenarios such as: an unbounded number of identities, identity federation, authorization based on common patterns of the targeted resources, and/or authorization decisions offloaded to external systems provided at the platform level. The API can be used in conjunction with, or as an alternative to, the AccessPolicy CRD, depending on the use case.

Signed-off-by: Guilherme Cassolato <[email protected]>
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 27, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 27, 2025
@guicassolato guicassolato self-assigned this Nov 27, 2025
Signed-off-by: Guilherme Cassolato <[email protected]>
@guicassolato guicassolato changed the title proposal: Dynamic Auth Dynamic Auth proposal Nov 27, 2025
rules:
- apiGroups: ["agentic.networking.x-k8s.io"]
resources: ["backends"]
subresources: ["tools"]
Contributor

Apologies for my lack of knowledge here. I see the 'tools' subresource here, but I don't see a 'tools' field in the Backend spec.

Is this a notional resource for the purpose of leveraging kubernetes rbac to do a check on a specific tool name that's pulled from a request ('request.mcp.tool_name')?
So no actual field or subresource in the Backend resource stored in etcd, just in memory for the duration of a request & rbac check?

Contributor Author

Correct. There's no backends/tools subresource known by the API server. This is a virtual resource kind synthetically created here only for the purpose of leveraging Kube RBAC for data plane authorisation.

This is just an example. Any other completely made-up API group, resource, and subresource kinds would work, although there are benefits to using "real" Kubernetes kinds known by the API server (not the case here).

For access to be granted, there must be at least one Role whose values match the ones specified (statically or dynamically) in the authorization.kubernetes fields of the AuthScheme and one RoleBinding that binds that Role to the user that is also specified in the AuthScheme.
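For illustration, a Role/RoleBinding pair matching the example above might look like the following sketch. The object names, the user name, and the verb are all hypothetical; only the apiGroup, resource, and subresource come from the example. Note that standard RBAC expresses a subresource as "resource/subresource" in the resources list:

```yaml
# Hypothetical Role granting access to the virtual backends/tools subresource
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: mcp-tool-user          # hypothetical name
  namespace: default
rules:
- apiGroups: ["agentic.networking.x-k8s.io"]
  resources: ["backends/tools"]  # RBAC syntax for the tools subresource of backends
  verbs: ["get"]                 # assumed verb; the proposal may map requests differently
---
# Hypothetical RoleBinding tying the Role to the user specified in the AuthScheme
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: mcp-tool-user-binding
  namespace: default
subjects:
- kind: User
  name: agent-a                # hypothetical identity extracted from the request
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: mcp-tool-user
  apiGroup: rbac.authorization.k8s.io
```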

Contributor

Thanks for the explanation. This makes this section of the proposal clearer to me.


An AuthScheme resource defines the enforcement strategy for extracting the identity, identifying the requested agentic resource, and verification methods for an agent-to-Backend request. It allows configuring a policy enforcement point (PEP) for enforcing access control policies (permissions) otherwise declared via AccessPolicy resources or other methods.

The AuthScheme API provides language for expressing extraction, trust anchor definitions, and resource identification patterns that allow integrating external authenticators (e.g. OIDC endpoints) and authorizers (Kubernetes RBAC), with a focus on the auth methods rather than the auth data. It can be used in combination with, or sometimes as an alternative to, the AccessPolicy CRD for more advanced use cases and cases where scale requires offloading auth beyond the limits of a policy resource.
Contributor

I'd be remiss not to point to some discussion in haiyanmeng#1 (comment) around having the kube API server in the path of traffic.

cc @robscott @keithmattix @howardjohn

Is just doing OIDC for identity and a CEL authorization rule the answer here?
Will that prototype provide sufficient example of how to implement the API?
What other types should there be? Would they become part of the API standard here, or is the intention to allow implementors to specify their own types (which the API structure right now doesn't seem to allow for)?

Contributor Author

@guicassolato guicassolato Nov 28, 2025

Thanks for bringing up that conversation, @david-martin. I think it is indeed relevant for clarifying one of the main points of the AuthScheme API.

Envoy-based implementations may choose to convert the Auth policies into Envoy RBAC filter policies.

The AuthScheme API is not so much about giving implementations a choice as it is about giving users one.

A user who chooses to store permissions in Kubernetes RBAC doesn't do it because an agentic networking auth API says so, but the other way around: first, the user chose to store permissions in Kubernetes RBAC, and then picked the AuthScheme API because it supports leveraging that particular authorisation system.

In haiyanmeng#1 (comment), I described it as "bring-your-own" auth component (trusted OIDC server, platform-provided auth systems).

having the kube API server in the path of traffic.

I've been seeing this a lot in production.

A few reasons why users pick Kube RBAC:

  • It's an authz system that comes with the platform, as opposed to implementation-specific, so it's less obscure and more portable, and it uses the platform as the source of truth
  • The user doesn't have to repeat the same permissions over and over across multiple policy objects whenever those permissions can be simplified to a common rule that translates to "check it with this single source of truth" - it's simpler to update 1 RoleBinding than N policy objects
  • It's RBAC, not an ACL, so the user can leverage roles and use user groups as grouping mechanisms, without losing granularity for finer-grained authorisation whenever needed
  • It standardises the language used to manage all access control in your Kubernetes system

Will that prototype provide sufficient example of how to implement the API?

We can leave this discussion for when the actual prototype happens, but I anticipate an implementation less focused on Envoy-specific functionality and more on the native components of the platform. kube-rbac-proxy will probably be a good reference.

What other types should there be, and would they become part of the API standard here, or is the intention to allow implementors to specify their own types (which the API structure right now doesn't seem to allow for)

That I still don't know, but it would be good feedback to get from the community once the first prototype is shown.

My experience with Kube-native authorisation policies has been that OIDC+CEL and Kube SA tokens+RBAC cover >90% of the cases, but no science here.

Contributor Author

Added a section to the doc about this point of using Kube RBAC for agentic networking authorization - motivations and caveats.

https://github.com/guicassolato/kube-agentic-networking/blob/authscheme/docs/proposals/0017-DynamicAuth.md#kubernetes-rbac-for-agentic-networking-authorization

…ntic networking authorization

Signed-off-by: Guilherme Cassolato <[email protected]>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: guicassolato
Once this PR has been reviewed and has the lgtm label, please assign david-martin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 28, 2025
}

// IdentityRuleType defines a type of identity verification rule
// +kubebuilder:validation:Enum=Kubernetes,OIDC


I feel quite strongly that putting these as native things is going to be bad for a general purpose Kubernetes API. Auth is not universal. The 2 types that 1 vendor may want/can support are not the 2 types that another can, nor the two types UserA vs UserB will want.

Why these and not LDAP, SAML, Kerberos, Tailscale, etc?

This must be a vendor extension point.

Contributor Author

@howardjohn, I see your point and I don't disagree with it. However, I also believe this is part of building a standard.

The standard can evolve to supporting more types in the future. We just have to start somewhere.

I guess the issue then is: which types should we start with that are better than having no standard types at all? Or, as you said, why Kube auth and OIDC, and not LDAP, SAML, Kerberos, Tailscale, etc?

My case for these two (namely, Kube auth and OIDC) as a starting point is backed by experience, but that matters less than the following reasons, I suppose:

  • Kubernetes is the platform; it is the only given we have. Using the authz system provided by the platform makes sense to me. You don't need a 3rd-party authz system, you already know the language, etc.
  • OIDC is not only widely popular but also the de facto authentication system in MCP, which, as we know, standardised on OAuth 2.1.


Kubernetes is the platform, it is the only given we have. Using the authz system provided by the platform makes sense to me.

While Kubernetes is a given, and it makes sense to you, it may not make sense to others -- which is my main issue with specifying these explicitly. Of the 40 Gateway API implementations, the only one I have seen that uses this mechanism is Kuadrant, which makes me think it's not as universal as it may seem (there could be more than 1, but the point still stands that it is uncommon).

OIDC I am less worried about, though I do worry it's complex enough that a lowest-common-denominator API may leave things lacking (hard to say, since the definition is not included in the proposal).

Contributor Author

@howardjohn, do you think this design proposed by @david-martin would help attenuate concerns about auth mechanisms being too specific of one or another implementation?

I believe in the end what we all want is not something that makes sense to me, to you, or to anyone else representing an implementation, but something that makes sense to users. I have a strong feeling that users will value anything that feels less implementation-specific and more in line with what the common denominator underneath (aka Kubernetes in this case) has to offer, as long as it solves their use cases. A prototype that is not tied to any particular implementation may be a good way to collect feedback on this idea IMO.


Sorry, I have to agree with @howardjohn on this one. An API that makes sense to users is definitely our shared goal, but Kubernetes as an AuthZ API for network traffic falls short of that IMO. I don't think I've talked to a single user at Azure who has even thought about doing something like that. That's not to say this API shouldn't support such a thing as a vendor extension, but having an explicit API seems like a bit much

Contributor Author

Kubernetes as an AuthZ API for network traffic falls short of that

@keithmattix, to be clear, are you saying Kube auth for network traffic does not make sense to users because you haven't heard of it before from your users? I think what we want to assess here is what users who are presented with this option think about it, especially if it's not a vendor-specific thing.

On the other hand, from the people working on Kubernetes from both perspectives, as users and as maintainers, like yourself, @howardjohn and everybody here, I would definitely factor in any concrete (technical) points against this idea.

I've included at least one of those points at https://github.com/kubernetes-sigs/kube-agentic-networking/blob/cb2ef598990fd133d80abc49fe8d04b9a38c9c9e/docs/proposals/0017-DynamicAuth.md#caveats-and-operational-considerations (increased API Server load) and am now about to include a second one (permissions to write RoleBindings to the cluster are atomic; cluster admins may want to set a ValidatingAdmissionPolicy). The open questions I have for you are:

  • Are these the only technical arguments against using Kube auth for network traffic, or are there others?
  • In your opinion, are these enough reasons not to do it, or will users still appreciate having the option despite warnings about careful usage?


Are these the only technical arguments against using Kube auth for network traffic or there are others?

To the best of my knowledge, this is not something the API Server was designed to handle. In my experience working with customers, a top priority is shielding their API server. Even seemingly small additional loads on the API server are liable to completely kill it (with variations depending on the specific k8s provider). Adding O(data-plane requests) worth of control-plane requests seems highly problematic. In my brief testing with a trivial RBAC setup, on kind on very, very fast hardware (i.e. the best possible scenario), I am capping out at 3,000 RPS, which is orders of magnitude below what would be expected for users.
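For concreteness, this model implies roughly one SubjectAccessReview per data-plane request, along the lines of the sketch below. The resource attribute values are borrowed from the backends/tools example earlier in the thread; the user, namespace, and verb are hypothetical:

```yaml
# One of these is an authenticated round trip to the API server,
# which is where the per-request control-plane load comes from
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: agent-a                  # identity extracted from the request (hypothetical)
  resourceAttributes:
    group: agentic.networking.x-k8s.io
    resource: backends
    subresource: tools
    namespace: default           # hypothetical
    verb: get                    # assumed mapping from the data-plane operation
```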

users still appreciate having the option

Users get options via extensions. Putting something in core means an implementation must (or really really should) support it. There is a big difference. A user should have the option to use LDAP, by picking an implementation that supports LDAP -- but I shouldn't force you to implement it because who wants to implement LDAP 🙂

// AuthorizationRuleKubernetesRBAC defines the type of the authorization scheme as Kubernetes authorization
AuthorizationRuleKubernetesRBAC AuthorizationRuleType = "Kubernetes"
// AuthorizationRuleCEL defines the type of the authorization scheme as Common Expression Language (CEL) expression
AuthorizationRuleCEL AuthorizationRuleType = "CommonExpressionLanguage"


Note: CEL is not a vendor-agnostic portable API given that each implementation has wildly different functions, variables, etc available. Not to mention language-to-language spec differences.

Contributor Author

This is a fair point. The enabled CEL libraries/extensions would have to be well-defined.

I believe a reasonable set to adopt would be the same one used by Kubernetes itself: https://kubernetes.io/docs/reference/using-api/cel/#cel-options-language-features-and-libraries

I do understand though that this is only guaranteed in Golang. So another option could be starting with a minimum set known to exist in most programming languages. E.g. strings.


The functions are the lesser problem; what attributes are available is a bigger one. Unless we define an entire set of attributes each proxy must expose, which is likely to be either too specific or lowest-common-denominator.

Contributor Author

Oh! I see. You mean what in the text I called Context Variables? Agreed. That is even more important to (eventually) reach a consensus on.

I would leave that to the prototype though, and start with only a very minimum set of attributes – ones that are inherent to the protocols (HTTP, MCP, etc), and not something bloated like this.
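As a sketch of what a rule over such a minimal, protocol-inherent attribute set could look like (the only attribute taken from this thread is request.mcp.tool_name, and the CommonExpressionLanguage type name comes from the proposed enum; the remaining field names are hypothetical):

```yaml
# Hypothetical AuthScheme authorization rule restricted to
# protocol-inherent attributes plus the CEL strings extension
authorization:
- type: CommonExpressionLanguage
  expression: |
    request.http.method == "POST" &&
    request.mcp.tool_name.startsWith("get_")
```

Limiting expressions to attributes like these, plus a small function set such as strings, is what would keep rules portable across CEL runtimes.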

@david-martin
Contributor

@guicassolato I've been trying to get my head around what vendors would implement vs. how they might extend this with additional capabilities.
I put together a simple relationship diagram so I could better understand how things are proposed right now:

[image: relationship diagram]

I believe the proposal would require implementors to support Kube & OIDC for identity, and Kube & CEL for authorization. That would align with some potential future conformance.
If an implementer wants to add additional identity rule or authorization rule types, they can, but those wouldn't be part of the standard. At least not without updating the corresponding type (IdentityRuleType/AuthorizationRuleType) and doing a new release of the API.

I've drawn up another version of this diagram, with an identity provider and authorization policy reference from the corresponding fields in the AuthRule. My intent is to abstract the points of contention to a clearer vendor extension point.
In this approach, an IdentityRule no longer has a type. Instead it has a providerRef. Same for AuthorizationRule. This is the extension point where a vendor can implement their own supported providers, like OIDC, Kube, CEL, etc. We could still build these out in a reference implementation, but they wouldn't be part of the standard API.

[image: revised diagram with providerRef extension points]

For simpler cases, there is a provision in the spec for an inline rule. This could be more like the AccessRule in https://github.com/kubernetes-sigs/kube-agentic-networking/blob/main/docs/proposals/0008-ToolAuthAPI.md, where a simple list of tools or identities is provided inline.

@guicassolato
Contributor Author

I've drawn up another version of this diagram, with an identity provider and authorization policy reference from the corresponding fields in the AuthRule. My intent is to abstract the points of contention to a clearer vendor extension point. In this approach, an IdentityRule no longer has a type. Instead it has a providerRef. Same for AuthorizationRule. This is the extension point where a vendor can implement their own supported providers, like OIDC, Kube, CEL etc.. We could still build these out in a reference implementation, but they wouldn't be part of the standard API.

For simpler cases, there is a provision in the spec for an inline rule. This could be more like the AccessRule in https://github.com/kubernetes-sigs/kube-agentic-networking/blob/main/docs/proposals/0008-ToolAuthAPI.md, where a simple list of tools or identities is provided inline.

This looks like a neat design, @david-martin. Curious to see what others think about it.

I'd probably consider having the "inline rules" also as an extension type referenced via providerRef.

Of course, more CRDs translate to more reconciliation and more challenging UX (discoverability, status, etc), which is something to have in mind.

@k8s-ci-robot
Contributor

@guicassolato: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kube-agentic-networking-verify
Commit: b67473e
Required: true
Rerun command: /test pull-kube-agentic-networking-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@LiorLieberman
Member

LiorLieberman commented Dec 3, 2025

I love the diagrams! thanks @david-martin

What would providerRef ref to? A provider specific GVK?

I'd probably consider having the "inline rules" also as an extension type referenced via providerRef.
Why?

Any concerns with "simple" rules that are identified (or could be) as broadly applicable?

Of course, more CRDs translate to more reconciliation and more challenging UX (discoverability, status, etc), which is something to have in mind.

+1, we need to understand what could be a potentially common ground here

@david-martin
Contributor

I love the diagrams! thanks @david-martin

Thanks

What would providerRef ref to? A provider specific GVK?

Yes.

I'd probably consider having the "inline rules" also as an extension type referenced via providerRef.
Why?

Any concerns with "simple" rules that are identified (or could be) as broadly applicable?

This was my thinking. All implementations ship with some simple solution that works for a broad set of use cases, but is perhaps not scalable and has some other limitations.

@hzxuzhonghu
Member

For Envoy, auth (whether JWT authentication or RBAC) is attached to listeners and enforced before routing to the backend. Here there is a big difference: it seems the auth is done right before sending the request to the backend. So what does the complete API config look like?

Gateway --> HttpRoute --> AuthScheme or Backend?

How would a proxy like Envoy support this API scheme? I can't picture it, because Envoy filters are mostly attached to listeners.
