CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

About OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance. This is a multi-module project with Java backend services, a React frontend, a Python ingestion framework, and comprehensive Docker infrastructure.

Architecture Overview

Backend: Java 21 + Dropwizard REST API framework, multi-module Maven project
Frontend: React + TypeScript + Ant Design, built with Webpack and Yarn
Ingestion: Python 3.10-3.12 with Pydantic 2.x, 75+ data source connectors
Database: MySQL (default) or PostgreSQL with Flyway migrations
Search: Elasticsearch 7.17+ or OpenSearch 2.6+ for metadata discovery
Infrastructure: Apache Airflow for workflow orchestration

Essential Development Commands

Prerequisites and Setup

make prerequisites              # Check system requirements
make install_dev_env           # Install all development dependencies
make yarn_install_cache        # Install UI dependencies

Frontend Development

cd openmetadata-ui/src/main/resources/ui
yarn start                     # Start development server on localhost:3000
yarn test                      # Run Jest unit tests
yarn test path/to/test.spec.ts # Run a specific test file
yarn test:watch               # Run tests in watch mode
yarn playwright:run            # Run E2E tests
yarn lint                      # ESLint check
yarn lint:fix                  # ESLint with auto-fix
yarn build                     # Production build

Backend Development

mvn clean package -DskipTests  # Build without tests
mvn clean package -DonlyBackend -pl '!openmetadata-ui'  # Backend only (quote the "!" so the shell does not treat it as history expansion)
mvn test                       # Run unit tests
mvn verify                     # Run integration tests
mvn spotless:apply             # Format Java code

Python Ingestion Development

cd ingestion
make install_dev_env           # Install in development mode
make generate                  # Generate Pydantic models from JSON schemas
make unit_ingestion_dev_env    # Run unit tests
make lint                      # Run pylint
make py_format                 # Format with black, isort, pycln
make static-checks             # Run type checking with basedpyright

Full Local Environment

./docker/run_local_docker.sh -m ui -d mysql        # Complete local setup with UI
./docker/run_local_docker.sh -m no-ui -d postgresql # Backend only with PostgreSQL
./docker/run_local_docker.sh -s true               # Skip Maven build step

Testing

make run_e2e_tests             # Full E2E test suite
make unit_ingestion            # Python unit tests with coverage
yarn test:coverage             # Frontend test coverage

Code Generation and Schemas

OpenMetadata uses a schema-first approach with JSON Schema definitions driving code generation:

make generate                  # Generate all models from schemas
make py_antlr                  # Generate Python ANTLR parsers
make js_antlr                  # Generate JavaScript ANTLR parsers
yarn parse-schema              # Parse JSON schemas for frontend (connection and ingestion schemas)

Schema Architecture

Source schemas in openmetadata-spec/ define the canonical data models
Connection schemas are pre-processed at build time via parseSchemas.js to resolve all $ref references
Application schemas in openmetadata-ui/.../ApplicationSchemas/ are resolved at runtime using schemaResolver.ts
JSON schemas with $ref references to external files require resolution before use in forms
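
The $ref resolution step described above can be sketched in a few lines. This is an illustrative Python sketch only: the project performs this step in parseSchemas.js and schemaResolver.ts, whose internals are not shown here. The function name resolve_refs, the in-memory registry standing in for external schema files, and the sample schemas are all assumptions made for the example.

```python
from typing import Any


def resolve_refs(node: Any, registry: dict[str, Any]) -> Any:
    """Return a copy of `node` with every {"$ref": key} replaced by registry[key].

    `registry` is a hypothetical in-memory stand-in for the external schema files.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            # Resolve recursively so refs inside the referenced schema are inlined too.
            return resolve_refs(registry[ref], registry)
        return {key: resolve_refs(value, registry) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, registry) for item in node]
    return node


# Hypothetical schemas, used only to exercise the function.
registry = {
    "ssl.json": {"type": "object", "properties": {"caCert": {"type": "string"}}},
}
connection_schema = {
    "type": "object",
    "properties": {
        "hostPort": {"type": "string"},
        "sslConfig": {"$ref": "ssl.json"},
    },
}

resolved = resolve_refs(connection_schema, registry)
```

The sketch does not handle circular references, and keys that sit next to a $ref in the same object are dropped; a real resolver would need to decide how to treat both cases.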

Key Directories

openmetadata-service/ - Core Java backend services and REST APIs
openmetadata-ui/src/main/resources/ui/ - React frontend application
ingestion/ - Python ingestion framework with connectors
openmetadata-spec/ - JSON Schema specifications for all entities
bootstrap/sql/ - Database schema migrations and sample data
conf/ - Configuration files for different environments
docker/ - Docker configurations for local and production deployment

Development Workflow

Schema Changes: Modify JSON schemas in openmetadata-spec/, then run mvn clean install on openmetadata-spec to update models
Backend: Develop in Java using Dropwizard patterns, test with mvn test, format with mvn spotless:apply
Frontend: Use React/TypeScript with Ant Design components, test with Jest/Playwright
Ingestion: Python connectors follow a plugin pattern; use make install_dev_env to set up the development environment
Full Testing: Use make run_e2e_tests before major changes

Frontend Architecture Patterns

React Component Patterns

File Naming: Components use ComponentName.component.tsx, interfaces use ComponentName.interface.ts
State Management: Use useState with proper typing, avoid any
Side Effects: Use useEffect with proper dependency arrays
Performance: Use useCallback for event handlers, useMemo for expensive computations
Custom Hooks: Prefix with use, place in src/hooks/, return typed objects
Internationalization: Use useTranslation hook from react-i18next, access with t('key')
Component Structure: Functional components only, no class components
Props: Define interfaces for all component props, place in .interface.ts files
Loading States: Use object state for multiple loading states: useState<Record<string, boolean>>({})
Error Handling: Use showErrorToast and showSuccessToast utilities from ToastUtils
Navigation: Use useNavigate from react-router-dom, not direct history manipulation
Data Fetching: Async functions with try-catch blocks, update loading states appropriately

State Management

Use Zustand stores for global state (e.g., useLimitStore, useWelcomeStore)
Keep component state local when possible with useState
Use context providers for feature-specific shared state (e.g., ApplicationsProvider)

Styling

MUI Migration: The project is gradually migrating from Ant Design to Material-UI (MUI) v7.3.1
Preferred Approach: Use MUI components v7.3.1 and styles wherever possible for new features
Theme and Styles: MUI theme data and styles are defined in openmetadata-ui-core-components
Colors and Design Tokens: Always reference theme colors and design tokens from the MUI theme, not hardcoded values
Legacy Components: Ant Design components remain in existing code but should be replaced with MUI equivalents when refactoring
Do not add unnecessary spacing between logs and code.
In Java, avoid wildcard imports (e.g., use import java.util.List; instead of import java.util.*;)
Custom styles in .less files with component-specific naming (legacy pattern)
Follow BEM naming convention for custom CSS classes
Use CSS modules where appropriate

UI considerations

Do not hardcode user-facing strings anywhere in the UI. Get the translation function with const { t } = useTranslation() and render every string through it. For example, to display "Run", use {t('label.run')}, where label.run is defined in the locale files.

Application Configuration

Applications use ApplicationsClassBase for schema loading and configuration
Dynamic imports handle application-specific schemas and assets
Form schemas use React JSON Schema Form (RJSF) with custom UI widgets

Service Utilities

Each service type has dedicated utility files (e.g., DatabaseServiceUtils.tsx)
Connection schemas are imported statically and pre-resolved
Service configurations use switch statements to map types to schemas

Type Safety

All API responses have generated TypeScript interfaces in generated/
Custom types extend base interfaces when needed
Avoid type assertions unless absolutely necessary
Use discriminated unions for action types and state variants

Database and Migrations

Flyway handles schema migrations in bootstrap/sql/migrations/
Use Docker containers for local database setup
MySQL is the default; PostgreSQL is supported as an alternative
Sample data loaded automatically in development environment

Security and Authentication

JWT-based authentication with OAuth2/SAML support
Role-based access control defined in Java entities
Security configurations in conf/openmetadata.yaml
Never commit secrets - use environment variables or secure vaults

Code Generation Standards

Comments Policy

Do NOT add unnecessary comments - write self-documenting code
NEVER add single-line comments that describe what the code obviously does
Only include comments for:
- Complex business logic that isn't obvious
- Non-obvious algorithms or workarounds
- Public API JavaDoc documentation
- TODO/FIXME with ticket references
Bad examples (NEVER do this):
- // Create user before createUser()
- // Get client before SdkClients.adminClient()
- // Verify domain is set before assertNotNull(entity.getDomain())
- // User names are lowercased when a toLowerCase() call already makes it obvious
If the code needs a comment to be understood, refactor the code to be clearer instead

Java Code Requirements

Always mention running mvn spotless:apply when generating/modifying .java files
Use clear, descriptive variable and method names instead of comments
Follow existing project patterns and conventions
Generate production-ready code, not tutorial code
Create integration tests in openmetadata-integration-tests
Do not use fully qualified names in code (e.g., org.openmetadata.schema.type.Status); import the class and refer to it by its simple name
Do not use wildcard imports; import exactly the classes you need

TypeScript/Frontend Code Requirements

NEVER use any type in TypeScript code - always use proper types
Use unknown when the type is truly unknown and add type guards
Import types from existing type definitions (e.g., RJSFSchema from @rjsf/utils)
Follow ESLint rules strictly - the project enforces no-console, proper formatting
Add // eslint-disable-next-line comments only when absolutely necessary
Import Organization (in order):
1. External libraries (React, Ant Design, etc.)
2. Internal absolute imports from generated/, constants/, hooks/, etc.
3. Relative imports for utilities and components
4. Asset imports (SVGs, styles)
5. Type imports grouped separately when needed

Python Code Requirements

Use pytest, not unittest - write tests using pytest style with plain assert statements
Use pytest fixtures for test setup instead of setUp/tearDown methods
Use unittest.mock for mocking (MagicMock, patch) - this is compatible with pytest
Test classes should not inherit from TestCase - use plain classes prefixed with Test
Use assert x == y instead of self.assertEqual(x, y)
Use assert x is None instead of self.assertIsNone(x)
Use assert "text" in string instead of self.assertIn("text", string)
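
The rules above can be combined into one small test module. This is a minimal sketch, not code from the repository: UserLookup and its get_user client method are hypothetical and exist only to give the tests something to exercise.

```python
from unittest.mock import MagicMock

import pytest


class UserLookup:
    """Hypothetical class under test; not part of OpenMetadata."""

    def __init__(self, client):
        self.client = client

    def find_email(self, name):
        user = self.client.get_user(name.lower())
        return user["email"] if user else None


@pytest.fixture
def client():
    # Fixture replaces setUp: each test receives a fresh mock client.
    mock_client = MagicMock()
    mock_client.get_user.side_effect = {"alice": {"email": "alice@example.com"}}.get
    return mock_client


class TestUserLookup:
    # Plain class prefixed with Test; no TestCase inheritance.

    def test_known_user_returns_email(self, client):
        assert UserLookup(client).find_email("Alice") == "alice@example.com"

    def test_unknown_user_returns_none(self, client):
        assert UserLookup(client).find_email("bob") is None

    def test_email_contains_domain(self, client):
        assert "example.com" in UserLookup(client).find_email("alice")
```

pytest injects the client fixture by matching the parameter name, so the test methods need no setup code of their own.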

Python Ingestion Connector Guidelines

Keep connector-specific logic in connector-specific files, not in generic/shared files like builders.py
Example: Redshift IAM auth should be in ingestion/src/metadata/ingestion/source/database/redshift/connection.py, not in ingestion/src/metadata/ingestion/connections/builders.py
This keeps the codebase modular and prevents generic utilities from becoming cluttered with connector-specific edge cases
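
The split between shared and connector-specific code might look like the sketch below. Both functions are hypothetical and are shown in one block only for brevity: build_connection_url stands in for a generic helper of the kind that lives in builders.py, and redshift_connection_url stands in for logic that belongs in the Redshift connector's own connection.py. The fetch_iam_password callable is an assumed placeholder for however IAM credentials are actually obtained.

```python
from typing import Callable
from urllib.parse import quote_plus


# Generic/shared helper (hypothetical): knows nothing about any specific connector.
def build_connection_url(scheme: str, host_port: str, username: str, password: str) -> str:
    return f"{scheme}://{quote_plus(username)}:{quote_plus(password)}@{host_port}"


# Connector-specific helper (hypothetical): the IAM detail stays with Redshift.
def redshift_connection_url(
    host_port: str,
    username: str,
    fetch_iam_password: Callable[[str], str],
) -> str:
    # fetch_iam_password is an assumed stand-in for the real credential lookup.
    password = fetch_iam_password(username)
    return build_connection_url("redshift+psycopg2", host_port, username, password)


url = redshift_connection_url(
    "cluster.example.com:5439", "svc", lambda user: "tok/1"
)
```

The generic helper takes only plain values, so adding another connector with its own authentication quirk does not require touching it.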

Testing Philosophy

Test real behavior, not mock wiring - if a test requires mocking 3+ classes just to verify a method call, it's testing the wrong thing
Prefer integration tests over heavily-mocked unit tests. This project has full integration test infrastructure (OpenMetadataApplicationTest, Docker containers, real OpenSearch). Use it.
Mocks are for boundaries, not internals - mock external services (HTTP clients, third-party APIs), not your own classes. If you're mocking static methods left and right to test internal plumbing, write an integration test instead.
A test that mocks everything proves nothing - it only verifies that your mocks are wired correctly, not that the system works
Ask "what breaks if this test passes but the code is wrong?" - if the answer is "nothing, because everything real is mocked out", delete the test and write a better one
Test the outcome, not the implementation - assert on observable results (API responses, database state, stats values) rather than verifying internal method calls with verify()
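
The last two points can be illustrated with a short sketch. Everything in it is hypothetical: ingest_tables, Stats, and the HTTP client interface are invented for the example and are not OpenMetadata APIs. Only the external HTTP client is mocked, and the assertion is made on the returned stats rather than on which calls were issued.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class Stats:
    success: int = 0
    failed: int = 0


def ingest_tables(http_client, table_names):
    """Hypothetical unit under test: send each table through an external HTTP client."""
    stats = Stats()
    for name in table_names:
        response = http_client.put("/tables", json={"name": name})
        if response.status_code == 200:
            stats.success += 1
        else:
            stats.failed += 1
    return stats


def fake_response(status_code):
    response = MagicMock()
    response.status_code = status_code
    return response


def test_failed_requests_are_counted():
    # Mock only the boundary: the external HTTP client.
    http_client = MagicMock()
    http_client.put.side_effect = [fake_response(200), fake_response(500)]

    stats = ingest_tables(http_client, ["orders", "customers"])

    # Assert on the observable outcome, not on internal method calls.
    assert stats == Stats(success=1, failed=1)
```

If ingest_tables stopped counting failures, this test would fail; a test that only checked that put was called twice would keep passing.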

Response Format

Provide clean code blocks without unnecessary explanations
Assume readers are experienced developers
Focus on functionality over education
