This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance. This is a multi-module project with Java backend services, React frontend, Python ingestion framework, and comprehensive Docker infrastructure.
- Backend: Java 21 + Dropwizard REST API framework, multi-module Maven project
- Frontend: React + TypeScript + Ant Design, built with Webpack and Yarn
- Ingestion: Python 3.10-3.12 with Pydantic 2.x, 75+ data source connectors
- Database: MySQL (default) or PostgreSQL with Flyway migrations
- Search: Elasticsearch 7.17+ or OpenSearch 2.6+ for metadata discovery
- Infrastructure: Apache Airflow for workflow orchestration
make prerequisites # Check system requirements
make install_dev_env # Install all development dependencies
make yarn_install_cache # Install UI dependencies

cd openmetadata-ui/src/main/resources/ui
yarn start # Start development server on localhost:3000
yarn test # Run Jest unit tests
yarn test path/to/test.spec.ts # Run a specific test file
yarn test:watch # Run tests in watch mode
yarn playwright:run # Run E2E tests
yarn lint # ESLint check
yarn lint:fix # ESLint with auto-fix
yarn build # Production build

mvn clean package -DskipTests # Build without tests
mvn clean package -DonlyBackend -pl '!openmetadata-ui' # Backend only (quote the exclusion so the shell does not treat ! as history expansion)
mvn test # Run unit tests
mvn verify # Run integration tests
mvn spotless:apply # Format Java code

cd ingestion
make install_dev_env # Install in development mode
make generate # Generate Pydantic models from JSON schemas
make unit_ingestion_dev_env # Run unit tests
make lint # Run pylint
make py_format # Format with black, isort, pycln
make static-checks # Run type checking with basedpyright

./docker/run_local_docker.sh -m ui -d mysql # Complete local setup with UI
./docker/run_local_docker.sh -m no-ui -d postgresql # Backend only with PostgreSQL
./docker/run_local_docker.sh -s true # Skip Maven build step

make run_e2e_tests # Full E2E test suite
make unit_ingestion # Python unit tests with coverage
yarn test:coverage # Frontend test coverage

OpenMetadata uses a schema-first approach with JSON Schema definitions driving code generation:
make generate # Generate all models from schemas
make py_antlr # Generate Python ANTLR parsers
make js_antlr # Generate JavaScript ANTLR parsers
yarn parse-schema # Parse JSON schemas for frontend (connection and ingestion schemas)

- Source schemas in `openmetadata-spec/` define the canonical data models
- Connection schemas are pre-processed at build time via `parseSchemas.js` to resolve all `$ref` references
- Application schemas in `openmetadata-ui/.../ApplicationSchemas/` are resolved at runtime using `schemaResolver.ts`
- JSON schemas with `$ref` references to external files require resolution before use in forms
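
The idea of resolving `$ref` references before a schema is handed to a form can be sketched as follows. This is a minimal illustration in Python, not the logic of `parseSchemas.js` or `schemaResolver.ts`: the `resolve_refs` helper, the `definitions` lookup table, and the `common.json` file name are all hypothetical, and a real resolver would load the referenced files from disk and guard against cyclic references.

```python
from typing import Any


def resolve_refs(node: Any, definitions: dict[str, Any]) -> Any:
    """Recursively inline {"$ref": "<key>"} nodes from a lookup table.

    Assumption: every reference key is present in `definitions` and the
    references are not cyclic.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            # Replace the reference node with the (resolved) target schema.
            return resolve_refs(definitions[ref], definitions)
        return {key: resolve_refs(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, definitions) for item in node]
    return node


# Hypothetical external schema, keyed by the path used in "$ref".
definitions = {"common.json": {"type": "string"}}
schema = {
    "type": "object",
    "properties": {
        "name": {"$ref": "common.json"},
        "aliases": {"type": "array", "items": {"$ref": "common.json"}},
    },
}
resolved = resolve_refs(schema, definitions)
```

After resolution, `resolved` contains no `$ref` keys, which is the precondition the bullets above describe for form rendering.
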
- `openmetadata-service/` - Core Java backend services and REST APIs
- `openmetadata-ui/src/main/resources/ui/` - React frontend application
- `ingestion/` - Python ingestion framework with connectors
- `openmetadata-spec/` - JSON Schema specifications for all entities
- `bootstrap/sql/` - Database schema migrations and sample data
- `conf/` - Configuration files for different environments
- `docker/` - Docker configurations for local and production deployment

- Schema Changes: Modify JSON schemas in `openmetadata-spec/`, then run `mvn clean install` on openmetadata-spec to update models
- Backend: Develop in Java using Dropwizard patterns, test with `mvn test`, format with `mvn spotless:apply`
- Frontend: Use React/TypeScript with Ant Design components, test with Jest/Playwright
- Ingestion: Python connectors follow a plugin pattern, use `make install_dev_env` for development
- Full Testing: Use `make run_e2e_tests` before major changes

- File Naming: Components use `ComponentName.component.tsx`, interfaces use `ComponentName.interface.ts`
- State Management: Use `useState` with proper typing, avoid `any`
- Side Effects: Use `useEffect` with proper dependency arrays
- Performance: Use `useCallback` for event handlers, `useMemo` for expensive computations
- Custom Hooks: Prefix with `use`, place in `src/hooks/`, return typed objects
- Internationalization: Use `useTranslation` hook from react-i18next, access with `t('key')`
- Component Structure: Functional components only, no class components
- Props: Define interfaces for all component props, place in `.interface.ts` files
- Loading States: Use object state for multiple loading states: `useState<Record<string, boolean>>({})`
- Error Handling: Use `showErrorToast` and `showSuccessToast` utilities from ToastUtils
- Navigation: Use `useNavigate` from react-router-dom, not direct history manipulation
- Data Fetching: Async functions with try-catch blocks, update loading states appropriately

- Use Zustand stores for global state (e.g., `useLimitStore`, `useWelcomeStore`)
- Keep component state local when possible with `useState`
- Use context providers for feature-specific shared state (e.g., `ApplicationsProvider`)

- MUI Migration: The project is gradually migrating from Ant Design to Material-UI (MUI) v7.3.1
- Preferred Approach: Use MUI components v7.3.1 and styles wherever possible for new features
- Theme and Styles: MUI theme data and styles are defined in `openmetadata-ui-core-components`
- Colors and Design Tokens: Always reference theme colors and design tokens from the MUI theme, not hardcoded values
- Legacy Components: Ant Design components remain in existing code but should be replaced with MUI equivalents when refactoring
- Do not add unnecessary spacing between logs and code.
- In Java, avoid wildcard imports (e.g., use `import java.util.List;` instead of `import java.util.*;`)
- Custom styles in `.less` files with component-specific naming (legacy pattern)
- Follow BEM naming convention for custom CSS classes
- Use CSS modules where appropriate
- Do not hardcode user-facing string literals anywhere. Get the translator with `const { t } = useTranslation()` and render text through it: for a "Run" label, use `{t('label.run')}`, where `label.run` is defined in the locale files.

- Applications use `ApplicationsClassBase` for schema loading and configuration
- Dynamic imports handle application-specific schemas and assets
- Form schemas use React JSON Schema Form (RJSF) with custom UI widgets

- Each service type has dedicated utility files (e.g., `DatabaseServiceUtils.tsx`)
- Connection schemas are imported statically and pre-resolved
- Service configurations use switch statements to map types to schemas

- All API responses have generated TypeScript interfaces in `generated/`
- Custom types extend base interfaces when needed
- Avoid type assertions unless absolutely necessary
- Use discriminated unions for action types and state variants

- Flyway handles schema migrations in `bootstrap/sql/migrations/`
- Use Docker containers for local database setup
- MySQL is the default; PostgreSQL is supported as an alternative
- Sample data is loaded automatically in the development environment

- JWT-based authentication with OAuth2/SAML support
- Role-based access control defined in Java entities
- Security configurations in `conf/openmetadata.yaml`
- Never commit secrets - use environment variables or secure vaults

- Do NOT add unnecessary comments - write self-documenting code
- NEVER add single-line comments that describe what the code obviously does
- Only include comments for:
  - Complex business logic that isn't obvious
  - Non-obvious algorithms or workarounds
  - Public API JavaDoc documentation
  - TODO/FIXME with ticket references
- Bad examples (NEVER do this):
  - `// Create user` before `createUser()`
  - `// Get client` before `SdkClients.adminClient()`
  - `// Verify domain is set` before `assertNotNull(entity.getDomain())`
  - `// User names are lowercased` when the code `toLowerCase()` makes it obvious
- If the code needs a comment to be understood, refactor the code to be clearer instead
- Always mention running `mvn spotless:apply` when generating/modifying .java files
- Use clear, descriptive variable and method names instead of comments
- Follow existing project patterns and conventions
- Generate production-ready code, not tutorial code
- Create integration tests in openmetadata-integration-tests
- Do not use fully qualified names in code (e.g., `org.openmetadata.schema.type.Status`); import the class and refer to it by its simple name
- Do not use wildcard imports; import exactly the classes that are required

- NEVER use `any` type in TypeScript code - always use proper types
- Use `unknown` when the type is truly unknown and add type guards
- Import types from existing type definitions (e.g., `RJSFSchema` from `@rjsf/utils`)
- Follow ESLint rules strictly - the project enforces no-console, proper formatting
- Add `// eslint-disable-next-line` comments only when absolutely necessary
- Import Organization (in order):
  1. External libraries (React, Ant Design, etc.)
  2. Internal absolute imports from `generated/`, `constants/`, `hooks/`, etc.
  3. Relative imports for utilities and components
  4. Asset imports (SVGs, styles)
  5. Type imports grouped separately when needed

- Use pytest, not unittest - write tests using pytest style with plain `assert` statements
- Use pytest fixtures for test setup instead of `setUp`/`tearDown` methods
- Use `unittest.mock` for mocking (MagicMock, patch) - this is compatible with pytest
- Test classes should not inherit from `TestCase` - use plain classes prefixed with `Test`
- Use `assert x == y` instead of `self.assertEqual(x, y)`
- Use `assert x is None` instead of `self.assertIsNone(x)`
- Use `assert "text" in string` instead of `self.assertIn("text", string)`
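
A short sketch of a test module written in the style described above. The function under test, `fetch_table_names`, and the `client` fixture are hypothetical names invented for illustration; only `pytest.fixture` and `unittest.mock.MagicMock` are real library APIs.

```python
from unittest.mock import MagicMock

import pytest


def fetch_table_names(client) -> list[str]:
    """Hypothetical function under test: lowercased table names from a client."""
    return [name.lower() for name in client.list_tables()]


@pytest.fixture
def client():
    """Fixture replaces setUp: builds a fresh mocked external client per test."""
    mock_client = MagicMock()
    mock_client.list_tables.return_value = ["Orders", "CUSTOMERS"]
    return mock_client


class TestFetchTableNames:
    """Plain class prefixed with Test, no TestCase inheritance."""

    def test_lowercases_names(self, client):
        result = fetch_table_names(client)
        assert result == ["orders", "customers"]
        assert "orders" in result

    def test_returns_empty_list_without_tables(self, client):
        client.list_tables.return_value = []
        assert fetch_table_names(client) == []
```

pytest injects the `client` fixture by matching the parameter name, so each test method receives its own mock without any shared state.
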
- Keep connector-specific logic in connector-specific files, not in generic/shared files like `builders.py`
- Example: Redshift IAM auth should be in `ingestion/src/metadata/ingestion/source/database/redshift/connection.py`, not in `ingestion/src/metadata/ingestion/connections/builders.py`
- This keeps the codebase modular and prevents generic utilities from becoming cluttered with connector-specific edge cases
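
The separation described above might look roughly like this. Everything here is a hypothetical sketch: `build_url`, `RedshiftConfig`, `redshift_connect_args`, and the returned argument keys are illustrative names, not the project's actual builders or connection code. The two halves are shown in one block only for brevity; in the repository they would live in separate files.

```python
from dataclasses import dataclass
from typing import Optional


# --- generic/shared module (the role of builders.py): no connector names ---
def build_url(scheme: str, host: str, port: int, database: str) -> str:
    """Assemble a connection URL from connector-agnostic parts."""
    return f"{scheme}://{host}:{port}/{database}"


# --- connector-specific module (the role of redshift/connection.py) ---
@dataclass
class RedshiftConfig:
    """Hypothetical config shape used only for this illustration."""

    host: str
    port: int
    database: str
    use_iam: bool = False
    cluster_id: Optional[str] = None


def redshift_connect_args(config: RedshiftConfig) -> dict:
    """Redshift-only IAM handling stays next to the Redshift connector."""
    if not config.use_iam:
        return {}
    if not config.cluster_id:
        raise ValueError("cluster_id is required when use_iam is enabled")
    # Illustrative keys; the real driver arguments may differ.
    return {"iam": True, "cluster_identifier": config.cluster_id}
```

The shared helper never branches on a connector name, so adding another connector's edge case does not touch it.
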
- Test real behavior, not mock wiring - if a test requires mocking 3+ classes just to verify a method call, it's testing the wrong thing
- Prefer integration tests over heavily-mocked unit tests. This project has full integration test infrastructure (OpenMetadataApplicationTest, Docker containers, real OpenSearch). Use it.
- Mocks are for boundaries, not internals - mock external services (HTTP clients, third-party APIs), not your own classes. If you're mocking static methods left and right to test internal plumbing, write an integration test instead.
- A test that mocks everything proves nothing - it only verifies that your mocks are wired correctly, not that the system works
- Ask "what breaks if this test passes but the code is wrong?" - if the answer is "nothing, because everything real is mocked out", delete the test and write a better one
- Test the outcome, not the implementation - assert on observable results (API responses, database state, stats values) rather than verifying internal method calls with `verify()`

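
To illustrate "test the outcome, not the implementation", here is a small Python sketch; the same idea applies to the Java tests this section mostly targets. `InMemoryUserStore` and `register_user` are hypothetical stand-ins: in this project the store would be a real database behind the integration test infrastructure rather than an in-memory class.

```python
class InMemoryUserStore:
    """Hypothetical real-behavior store; stands in for an actual database."""

    def __init__(self) -> None:
        self._emails_by_name: dict[str, str] = {}

    def save(self, name: str, email: str) -> None:
        self._emails_by_name[name] = email

    def get(self, name: str):
        return self._emails_by_name.get(name)


def register_user(store: InMemoryUserStore, name: str, email: str) -> str:
    """Hypothetical code under test: normalizes the name, then persists it."""
    normalized = name.strip().lower()
    store.save(normalized, email)
    return normalized


def test_register_user_persists_normalized_name():
    store = InMemoryUserStore()
    register_user(store, "  Alice ", "alice@example.com")
    # Outcome-based assertions: check what an observer of the system would see,
    # not whether save() was called with particular arguments.
    assert store.get("alice") == "alice@example.com"
    assert store.get("  Alice ") is None
```

If `register_user` stopped normalizing names, this test would fail, whereas a test that only verified a mocked `save` call could keep passing with an equally wrong expectation.
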
- Provide clean code blocks without unnecessary explanations
- Assume readers are experienced developers
- Focus on functionality over education