feat(api): add dataset-scoped API key permission validation #31669
Conversation
- Add permission check in validate_dataset_token decorator
- Dataset-scoped keys can only access their bound dataset (403 otherwise)
- Dataset-scoped keys cannot access tenant-level endpoints
- Change DatasetApiKeyListResource to use app_id field for storing dataset_id
- Add app_id.is_(None) filter to tenant-level key list endpoint
- Add unit tests for permission validation logic
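The validation rules listed above can be sketched as follows. This is a minimal, framework-free sketch; `Forbidden`, `ApiToken`, and the handler are illustrative stand-ins, not the PR's actual code:

```python
from functools import wraps

class Forbidden(Exception):
    """Stand-in for the framework's HTTP 403 error."""

class ApiToken:
    # Minimal stand-in for the ORM model: app_id is None for tenant-level keys,
    # and holds the bound dataset_id for dataset-scoped keys.
    def __init__(self, app_id=None):
        self.app_id = app_id

def validate_dataset_token(view):
    """Hypothetical sketch of the decorator-level permission check."""
    @wraps(view)
    def decorated(api_token, *args, dataset_id=None, **kwargs):
        if api_token.app_id is not None:  # dataset-scoped key
            if dataset_id is None:
                # dataset-scoped keys may not access tenant-level endpoints
                raise Forbidden("tenant-level endpoint not allowed")
            if str(api_token.app_id) != str(dataset_id):
                # key is bound to a different dataset -> 403
                raise Forbidden("key not bound to this dataset")
        return view(api_token, *args, dataset_id=dataset_id, **kwargs)
    return decorated

@validate_dataset_token
def get_dataset(api_token, dataset_id=None):
    return f"dataset {dataset_id}"
```

Tenant-level keys (`app_id is None`) pass through unchanged, which is how backward compatibility is preserved.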
Summary of Changes
Hello @kinglisky, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly improves the security and flexibility of API access by introducing dataset-scoped API keys. Previously, all dataset API keys were tenant-level, granting broad access to all datasets within a tenant. The new implementation allows keys to be tied to a single dataset, preventing over-privileged access and enabling more precise control over data access. This change reuses existing database structures and maintains backward compatibility for existing tenant-level keys.
Code Review
This pull request introduces dataset-scoped API keys, a great feature for finer-grained access control. While the core permission validation logic in the validate_dataset_token decorator is sound, significant security issues have been identified. Repurposing the app_id column without a database migration leads to existing keys becoming unmanageable ('ghost keys'), and inconsistent key counting logic in the tenant-level key creation endpoint allows for a denial-of-service scenario where scoped keys can block the creation of tenant-level keys. Additionally, there are minor areas for improvement in the new test code to enhance clarity.
```diff
 resource_model = Dataset
-resource_id_field = "dataset_id"
 token_prefix = "ds-"
+resource_id_field = "app_id"
```
The PR changes the resource_id_field for dataset API keys from dataset_id to app_id to reuse the existing column and avoid a database migration. However, as noted in the PR description, no database migration is performed. This creates a security issue where any existing dataset API keys (which likely have their associated dataset ID stored in the dataset_id column) will no longer be returned by the list endpoint or be deletable by the delete endpoint, as these endpoints now filter by the app_id column. These 'ghost' keys remain active and valid for authentication but cannot be managed or revoked by administrators through the API or console, leading to a broken access control and revocation mechanism.
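One remedy the review implies is a one-time backfill that copies legacy dataset bindings into `app_id`, so existing scoped keys stay visible to the new `app_id`-based endpoints. The sketch below demonstrates the idea with an in-memory SQLite table; the `api_tokens` table and column names are assumptions modeled on the discussion, not the project's actual schema:

```python
import sqlite3

# Hypothetical backfill demo: move legacy dataset bindings from dataset_id
# into app_id for rows of type 'dataset', leaving other rows untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_tokens (id INTEGER, type TEXT, app_id TEXT, dataset_id TEXT)")
conn.executemany(
    "INSERT INTO api_tokens VALUES (?, ?, ?, ?)",
    [
        (1, "dataset", None, "ds-legacy"),  # legacy scoped key: would be a 'ghost' otherwise
        (2, "dataset", None, None),         # tenant-level key: must stay untouched
        (3, "app", None, "ds-x"),           # unrelated token type: must stay untouched
    ],
)
conn.execute(
    """UPDATE api_tokens
       SET app_id = dataset_id
       WHERE type = 'dataset' AND app_id IS NULL AND dataset_id IS NOT NULL"""
)
rows = {row[0]: row[1] for row in conn.execute("SELECT id, app_id FROM api_tokens")}
```

After the backfill, the legacy key is once again listable and deletable through the `app_id`-filtered endpoints.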
```python
select(ApiToken).where(
    ApiToken.type == self.resource_type,
    ApiToken.tenant_id == current_tenant_id,
    ApiToken.app_id.is_(None),
```
While the get method has been correctly updated to filter for tenant-level keys using ApiToken.app_id.is_(None), the post method in the same DatasetApiKeyApi class (lines 770-774) still counts all keys of type 'dataset' for the tenant when enforcing the max_keys limit. Since dataset-scoped keys also use the 'dataset' type, they are included in this count. This allows a user with edit permissions (who can create scoped keys) to reach the limit and effectively block administrators from creating any tenant-level dataset API keys, resulting in a denial of service on a critical management function and a privilege inconsistency.
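The fix the review suggests is to apply the same `app_id IS NULL` filter when counting keys against the limit, so dataset-scoped keys cannot exhaust the tenant-level quota. A pure-Python sketch of that rule (the `MAX_KEYS` value and helper name are assumptions, not the project's actual code):

```python
MAX_KEYS = 10  # assumed limit for illustration

def can_create_tenant_level_key(tokens, tenant_id, resource_type="dataset"):
    """Count only tenant-level keys (app_id is None) toward the limit."""
    tenant_level = [
        t for t in tokens
        if t["type"] == resource_type
        and t["tenant_id"] == tenant_id
        and t["app_id"] is None  # the filter the current post() is missing
    ]
    return len(tenant_level) < MAX_KEYS
```

With this filter in place, a user creating many scoped keys no longer blocks an administrator from creating tenant-level keys.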
```python
# Setup mock database queries
mock_db.session.query.return_value.where.return_value.first.return_value = mock_dataset
query_chain = mock_db.session.query.return_value.where.return_value
query_chain.where.return_value.where.return_value.where.return_value.one_or_none.return_value = (
```
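One way to make the deep mock chain above more readable is to hide it behind a small factory helper, so each test states only the terminal return values. This is a sketch of that refactoring idea, not the PR's actual test code; `make_mock_db` is a hypothetical helper:

```python
from unittest.mock import MagicMock

def make_mock_db(dataset, token):
    """Build a mock db whose query chains resolve to the given objects."""
    mock_db = MagicMock()
    # first .where(...) on any query resolves to query_chain
    query_chain = mock_db.session.query.return_value.where.return_value
    query_chain.first.return_value = dataset
    # deeper chained .where(...) calls end in one_or_none() returning the token
    query_chain.where.return_value.where.return_value.where.return_value.one_or_none.return_value = token
    return mock_db
```

Tests then read as `mock_db = make_mock_db(mock_dataset, mock_token)`, keeping the chain plumbing in one place.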
Description
Important
Summary
This PR introduces dataset-scoped API keys that are bound to specific datasets, providing finer-grained access control compared to existing tenant-level keys.
Resolves #31409
Problem
Currently, Dataset API Keys are tenant-level only (`app_id = NULL`), which means any key grants access to every dataset in the tenant.
Solution
Introduce dataset-scoped API keys that coexist with tenant-level keys:

| Key type | `app_id` value |
| --- | --- |
| Tenant-level | `NULL` |
| Dataset-scoped | `<dataset_id>` |

Changes
- Permission validation (`controllers/service_api/wraps.py`): permission check added to the `validate_dataset_token` decorator
- API key management (`controllers/console/apikey.py`): `DatasetApiKeyListResource.resource_id_field` and `DatasetApiKeyResource.resource_id_field` changed to use `app_id` for storing `dataset_id`
- Tenant-level key listing (`controllers/console/datasets/datasets.py`): `app_id.is_(None)` filter added to ensure the tenant-level endpoint only returns tenant-level keys
- Unit tests (`tests/unit_tests/controllers/service_api/test_dataset_token_validation.py`)

Design Principles
- Reuse the existing `app_id` field to store `dataset_id`
Screenshots
N/A - This is a backend API change with no UI modifications.
Checklist
- Run `make lint` and `make type-check` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods