
Conversation

@kinglisky (Contributor) commented Jan 28, 2026

Description

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: [Feature] Dataset-scoped API Keys for Fine-grained Access Control #31409

Summary

This PR introduces dataset-scoped API keys that are bound to specific datasets, providing finer-grained access control compared to existing tenant-level keys.

Resolves #31409

Problem

Currently, Dataset API Keys are tenant-level only (app_id=NULL), which means:

  • A single key can access all datasets within the tenant
  • No isolation between different datasets
  • Over-privileged access for scenarios requiring single-dataset access

Solution

Introduce dataset-scoped API keys that coexist with tenant-level keys:

| Key Type | app_id Value | Access Scope |
| --- | --- | --- |
| Tenant-level | NULL | All datasets in tenant |
| Dataset-scoped | <dataset_id> | Only the bound dataset |

Changes

  1. Permission validation (controllers/service_api/wraps.py)

    • Added a dataset-scoped key permission check in the validate_dataset_token decorator (a sketch of this check follows the list below)
    • Dataset-scoped keys can only access their bound dataset (403 Forbidden otherwise)
    • Dataset-scoped keys cannot access tenant-level endpoints (e.g., list all datasets)
  2. API key management (controllers/console/apikey.py)

    • Changed DatasetApiKeyListResource.resource_id_field to use app_id for storing dataset_id
    • Changed DatasetApiKeyResource.resource_id_field to use app_id
  3. Tenant-level key listing (controllers/console/datasets/datasets.py)

    • Added app_id.is_(None) filter to ensure tenant-level endpoint only returns tenant-level keys
  4. Unit tests (tests/unit_tests/controllers/service_api/test_dataset_token_validation.py)

    • Added 5 test cases covering permission validation scenarios
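
To make the new permission rules concrete, here is a minimal sketch of the kind of check described in item 1 above. The helper name check_dataset_scope, the argument names, and the use of werkzeug's Forbidden are illustrative assumptions; the actual code in controllers/service_api/wraps.py may be structured differently.

```python
from werkzeug.exceptions import Forbidden


def check_dataset_scope(api_token, dataset_id: str | None) -> None:
    """Enforce the binding between a dataset-scoped key and its dataset."""
    # Tenant-level keys (app_id is NULL) keep their existing behaviour.
    if api_token.app_id is None:
        return
    # Dataset-scoped keys cannot call tenant-level endpoints such as
    # "list all datasets", where no dataset_id appears in the request path.
    if dataset_id is None:
        raise Forbidden("This API key is scoped to a single dataset.")
    # Dataset-scoped keys may only access the dataset they are bound to.
    if str(api_token.app_id) != str(dataset_id):
        raise Forbidden("This API key cannot access the requested dataset.")
```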

Design Principles

  • No database migration: reuse the existing app_id field to store the dataset_id (see the illustrative example after this list)
  • Backward compatible: existing tenant-level keys continue to work unchanged
  • Minimally invasive: only the necessary code paths are modified
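
To illustrate the app_id reuse, the two key types would differ only in that one field. This is not an excerpt from the codebase: the import path and the tenant_id/dataset_id variables are assumed to be in scope.

```python
from models.model import ApiToken  # import path assumed

# Tenant-level key: app_id stays NULL, so it can reach every dataset in the tenant.
tenant_key = ApiToken(
    type="dataset",
    tenant_id=tenant_id,
    app_id=None,
    token="ds-...",  # placeholder; real tokens are generated with the ds- prefix
)

# Dataset-scoped key: the existing app_id column now stores the bound dataset's id.
dataset_key = ApiToken(
    type="dataset",
    tenant_id=tenant_id,
    app_id=dataset_id,
    token="ds-...",
)
```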

Screenshots

N/A - This is a backend API change with no UI modifications.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

   - Add permission check in validate_dataset_token decorator
     - Dataset-scoped keys can only access their bound dataset (403 otherwise)
     - Dataset-scoped keys cannot access tenant-level endpoints
   - Change DatasetApiKeyListResource to use app_id field for storing dataset_id
   - Add app_id.is_(None) filter to tenant-level key list endpoint
   - Add unit tests for permission validation logic
@dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) on Jan 28, 2026
@gemini-code-assist bot commented

Summary of Changes

Hello @kinglisky, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the security and flexibility of API access by introducing dataset-scoped API keys. Previously, all dataset API keys were tenant-level, granting broad access to all datasets within a tenant. The new implementation allows keys to be tied to a single dataset, preventing over-privileged access and enabling more precise control over data access. This change reuses existing database structures and maintains backward compatibility for existing tenant-level keys.

Highlights

  • Dataset-Scoped API Keys: Introduced a new capability to create API keys that are specifically bound to individual datasets, enhancing granular access control.
  • Permission Validation Logic: Implemented new validation rules in validate_dataset_token to enforce that dataset-scoped keys can only access their designated dataset and cannot access tenant-level endpoints.
  • API Key Management Update: Modified apikey.py to utilize the existing app_id field for storing dataset_id for dataset-scoped keys, ensuring backward compatibility without database migrations.
  • Tenant-Level Key Filtering: Updated the API key listing endpoint in datasets.py to explicitly filter for app_id IS NULL, ensuring that only tenant-level keys are displayed for tenant-level requests.
  • Comprehensive Unit Tests: Added a new test file with 5 distinct test cases to thoroughly verify the correct behavior of the new dataset-scoped API key validation and permission checks.


@gemini-code-assist bot left a comment

Code Review

This pull request introduces dataset-scoped API keys, a great feature for finer-grained access control. While the core permission validation logic in the validate_dataset_token decorator is sound, significant security issues have been identified. Repurposing the app_id column without a database migration leads to existing keys becoming unmanageable ('ghost keys'), and inconsistent key counting logic in the tenant-level key creation endpoint allows for a denial-of-service scenario where scoped keys can block the creation of tenant-level keys. Additionally, there are minor areas for improvement in the new test code to enhance clarity.

resource_model = Dataset
resource_id_field = "dataset_id"
token_prefix = "ds-"
resource_id_field = "app_id"
security-medium

The PR changes the resource_id_field for dataset API keys from dataset_id to app_id to reuse the existing column and avoid a database migration. However, as noted in the PR description, no database migration is performed. This creates a security issue where any existing dataset API keys (which likely have their associated dataset ID stored in the dataset_id column) will no longer be returned by the list endpoint or be deletable by the delete endpoint, as these endpoints now filter by the app_id column. These 'ghost' keys remain active and valid for authentication but cannot be managed or revoked by administrators through the API or console, leading to a broken access control and revocation mechanism.
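
If that risk turns out to be real, a one-off data migration would make the legacy keys manageable again. The sketch below is illustrative only: the api_tokens table name and the legacy dataset_id column are assumptions taken from the comment above, not verified against the current schema.

```python
from alembic import op
import sqlalchemy as sa


def upgrade() -> None:
    # Backfill: copy the legacy dataset binding into app_id so the list and
    # delete endpoints (which now filter on app_id) can see these keys again.
    op.execute(
        sa.text(
            "UPDATE api_tokens "
            "SET app_id = dataset_id "
            "WHERE type = 'dataset' AND dataset_id IS NOT NULL AND app_id IS NULL"
        )
    )
```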

select(ApiToken).where(
ApiToken.type == self.resource_type,
ApiToken.tenant_id == current_tenant_id,
ApiToken.app_id.is_(None),
security-medium

While the get method has been correctly updated to filter for tenant-level keys using ApiToken.app_id.is_(None), the post method in the same DatasetApiKeyApi class (lines 770-774) still counts all keys of type 'dataset' for the tenant when enforcing the max_keys limit. Since dataset-scoped keys also use the 'dataset' type, they are included in this count. This allows a user with edit permissions (who can create scoped keys) to reach the limit and effectively block administrators from creating any tenant-level dataset API keys, resulting in a denial of service on a critical management function and a privilege inconsistency.
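
A possible fix, mirroring the select() style of the snippet quoted above, is to restrict the max_keys count to tenant-level keys; the surrounding variable names in DatasetApiKeyApi.post are assumed here rather than taken from the diff.

```python
current_key_count = db.session.scalar(
    select(func.count(ApiToken.id)).where(
        ApiToken.type == self.resource_type,
        ApiToken.tenant_id == current_tenant_id,
        ApiToken.app_id.is_(None),  # exclude dataset-scoped keys from the quota
    )
)
```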

# Setup mock database queries
mock_db.session.query.return_value.where.return_value.first.return_value = mock_dataset
query_chain = mock_db.session.query.return_value.where.return_value
query_chain.where.return_value.where.return_value.where.return_value.one_or_none.return_value = (
medium

This line is redundant because the side_effect set on line 82 for the same mock object (...first) will override this return_value. You can safely remove this line to improve test clarity.

# Setup mock database queries
mock_db.session.query.return_value.where.return_value.first.return_value = mock_dataset
query_chain = mock_db.session.query.return_value.where.return_value
query_chain.where.return_value.where.return_value.where.return_value.one_or_none.return_value = (
medium

This line is redundant because the side_effect set on line 112 for the same mock object (...first) will override this return_value. You can safely remove this line to improve test clarity.

@crazywoola self-assigned this Jan 29, 2026
