An AI-powered test flakiness detection and analysis tool that helps identify and resolve flaky tests in your CI/CD pipeline using semantic embeddings and density-based clustering.
This monorepo contains the following published packages:
| Package | Description |
|---|---|
| @flakiness-detective/core | Core detection engine with DBSCAN clustering |
| @flakiness-detective/adapters | Data adapters and AI providers |
| @flakiness-detective/cli | Command-line interface |
| @flakiness-detective/utils | Shared utilities and logger |
- AI-Powered Analysis: Uses semantic embeddings to understand test failure patterns beyond simple text matching
- DBSCAN Clustering: Groups similar failures using density-based clustering with configurable distance metrics (cosine/euclidean)
- Rich Pattern Extraction: Automatically extracts patterns from Playwright error messages, including:
  - Locators and matchers
  - Actual vs. expected values
  - Timeouts and line numbers
  - GitHub Actions run IDs
  - Error snippets and stack traces
- Frequency Analysis: Identifies common patterns across test failures, using a 50% threshold for cluster identification
- Deterministic Cluster IDs: Stable, reproducible cluster identifiers for tracking over time
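For intuition, the default cosine metric measures distance as one minus the cosine similarity between two embedding vectors. A minimal sketch (for illustration only, not the package's internal implementation):

```typescript
// Cosine distance between two embedding vectors: 1 - cosine similarity.
// Sketch for intuition only; not the package's internal implementation.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical embeddings -> distance 0; orthogonal embeddings -> distance 1.
// With the default epsilon of 0.15, two failures can cluster together
// when their embeddings are within 0.15 cosine distance of each other.
```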
- In-Memory: Fast, ephemeral storage for development
- Filesystem: JSON-based persistence with automatic Date serialization
- Firestore: Production-ready Google Cloud Firestore integration
- Playwright Reporter: Direct integration with Playwright JSON reports
- Google Generative AI: Production-ready embeddings using `text-embedding-004`
- Mock Provider: Fast, deterministic embeddings for testing
- `epsilon`: DBSCAN distance threshold (default: 0.15 for cosine)
- `minPoints`: Minimum neighbors for core points (default: 2)
- `minClusterSize`: Minimum failures per cluster (default: 2)
- `distance`: Distance metric, `cosine` (default) or `euclidean`
- `maxClusters`: Maximum clusters to return (default: 5)
- `days`: Number of days to look back for failures (default: 7)
- Input Validation: Comprehensive validation for configs and data
- Type Safety: Full TypeScript support with strict mode
- Error Handling: Graceful degradation and detailed error messages
- Testing: Extensive unit and E2E test coverage
- Documentation: Complete API documentation and examples
```
flakiness-detective-ts/
├── packages/
│   ├── core/                       # Core algorithms and interfaces
│   │   ├── src/
│   │   │   ├── flakiness-detective.ts    # Main detection orchestrator
│   │   │   ├── types.ts                  # Core type definitions
│   │   │   ├── clustering/
│   │   │   │   └── dbscan.ts             # DBSCAN implementation
│   │   │   └── utils/
│   │   │       ├── pattern-extraction.ts # Playwright error parsing
│   │   │       └── validation.ts         # Input validation
│   │   └── README.md               # Core package documentation
│   ├── adapters/                   # Data storage and AI provider adapters
│   │   └── src/
│   │       ├── data-adapters/
│   │       │   ├── filesystem-adapter.ts
│   │       │   ├── firestore-adapter.ts
│   │       │   ├── in-memory-adapter.ts
│   │       │   └── playwright-reporter-adapter.ts
│   │       └── embedding-providers/
│   │           ├── google-genai-provider.ts
│   │           └── mock-provider.ts
│   ├── cli/                        # Command-line interface
│   ├── utils/                      # Shared utilities
│   ├── analyzer/                   # Test analysis (future)
│   └── visualization/              # Visualization tools (future)
├── .github/                        # GitHub Actions CI/CD
├── biome.json                      # Linting and formatting config
└── vitest.config.ts                # Test configuration
```
```bash
# Install core packages
pnpm add @flakiness-detective/core @flakiness-detective/adapters

# Install peer dependencies (if using Firestore and Google AI)
pnpm add @google-cloud/firestore @google/generative-ai

# Or install the CLI globally
pnpm add -g @flakiness-detective/cli
```

```bash
# Clone the repository
git clone https://github.com/prosdev/flakiness-detective-ts.git
cd flakiness-detective-ts

# Install dependencies
pnpm install

# Build all packages
pnpm build
```

- Node.js v22 LTS or higher
- PNPM v8 or higher
- Google Generative AI API key (for production embeddings)
```typescript
import { FlakinessDetective } from '@flakiness-detective/core';
import {
  createDataAdapter,
  createEmbeddingProvider,
} from '@flakiness-detective/adapters';
import { createLogger } from '@flakiness-detective/utils';

// Create logger
const logger = createLogger({ level: 'info' });

// Create data adapter (Firestore example)
const dataAdapter = createDataAdapter(
  {
    type: 'firestore',
    firestoreDb: admin.firestore(), // Your Firestore instance
  },
  logger
);

// Create embedding provider
const embeddingProvider = createEmbeddingProvider(
  {
    type: 'google',
    apiKey: process.env.GOOGLE_AI_API_KEY,
  },
  logger
);

// Create and run detective
const detective = new FlakinessDetective(
  dataAdapter,
  embeddingProvider,
  {
    timeWindow: { days: 7 },
    clustering: {
      epsilon: 0.15,
      minPoints: 2,
      minClusterSize: 2,
      distance: 'cosine',
      maxClusters: 5,
    },
  },
  'info'
);

const clusters = await detective.detect();
console.log(`Found ${clusters.length} flaky test clusters`);
```

```typescript
import { PlaywrightReporterAdapter } from '@flakiness-detective/adapters';
import { createLogger } from '@flakiness-detective/utils';

const logger = createLogger({ level: 'info' });

// Create adapter pointing to Playwright JSON report
const adapter = new PlaywrightReporterAdapter(
  {
    reportPath: './test-results/results.json',
    runId: process.env.GITHUB_RUN_ID,
    reportLink: `https://github.com/org/repo/actions/runs/${process.env.GITHUB_RUN_ID}`,
  },
  logger
);

// Fetch failures from the last 7 days
const failures = await adapter.fetchFailures(7);
console.log(`Found ${failures.length} test failures`);
```

```bash
# Detect flakiness from Playwright reports
flakiness-detective detect \
  --adapter playwright \
  --adapter-path ./test-results/results.json \
  --embedding google \
  --api-key YOUR_API_KEY \
  --max-clusters 10

# Generate report from saved clusters
flakiness-detective report \
  --adapter firestore \
  --output-format json \
  --output-path ./flakiness-report.json

# Enable debug mode for detailed logging and performance metrics
flakiness-detective detect \
  --adapter playwright \
  --adapter-path ./test-results/results.json \
  --embedding google \
  --api-key YOUR_API_KEY \
  --verbose
```

💡 Tip: Use `--verbose` to enable debug mode with timestamps, execution times, API usage stats, and cluster quality metrics.
Flakiness Detective supports configuration files to simplify setup and avoid repetitive CLI arguments. Config files are discovered automatically in the current directory or parent directories.
- `.flakinessrc.json` - JSON configuration (recommended)
- `.flakinessrc.js` - JavaScript configuration
- `flakiness-detective.config.js` - Alternative JS config
- `.flakinessrc.ts` - TypeScript configuration
- `flakiness-detective.config.ts` - Alternative TS config
- `package.json` - Inline config in a `flakinessDetective` field
```json
{
  "timeWindow": {
    "days": 7
  },
  "adapter": {
    "type": "playwright",
    "reportPath": "./test-results/results.json"
  },
  "embedding": {
    "type": "google",
    "apiKey": "${GOOGLE_AI_API_KEY}"
  },
  "clustering": {
    "epsilon": 0.15,
    "minPoints": 2,
    "minClusterSize": 2,
    "distance": "cosine",
    "maxClusters": 5
  },
  "output": {
    "format": "console"
  },
  "verbose": false
}
```

```typescript
import type { FlakinessDetectiveConfigFile } from '@flakiness-detective/cli';

const config: FlakinessDetectiveConfigFile = {
  timeWindow: { days: 14 },
  adapter: {
    type: 'firestore',
    projectId: process.env.GOOGLE_CLOUD_PROJECT_ID,
    failuresCollection: 'test_failures',
    clustersCollection: 'flaky_clusters',
  },
  embedding: {
    type: 'google',
    apiKey: process.env.GOOGLE_AI_API_KEY,
  },
  clustering: {
    epsilon: 0.15,
    distance: 'cosine',
    maxClusters: 10,
  },
  output: {
    format: 'json',
    path: './flakiness-report.json',
  },
  verbose: true,
};

export default config;
```

```json
{
  "name": "my-project",
  "flakinessDetective": {
    "timeWindow": { "days": 7 },
    "adapter": {
      "type": "playwright",
      "reportPath": "./test-results/results.json"
    },
    "embedding": {
      "type": "google",
      "apiKey": "${GOOGLE_AI_API_KEY}"
    },
    "clustering": {
      "epsilon": 0.15,
      "maxClusters": 5
    }
  }
}
```

When both a config file and CLI arguments are provided, CLI arguments take precedence:

```bash
# Uses the config file but overrides epsilon and maxClusters
flakiness-detective detect \
  --epsilon 0.2 \
  --max-clusters 10
```

Config files are validated automatically, with helpful error messages:

```
Config validation error in .flakinessrc.json:
  Invalid clustering.epsilon: must be a positive number
  Details: Got: -0.1
```
- Core Package README: Complete guide to the core package, including:
  - Detailed configuration options
  - Playwright-specific examples
  - Pattern extraction details
  - API reference
  - Migration guide from the internal implementation
- CLI Package README: Command-line interface guide, including:
  - CLI commands and options
  - Configuration file formats and examples
  - CI/CD integration examples
  - Programmatic usage
  - Troubleshooting guide
- ROADMAP.md: Future development plans and features
- AGENTS.md: Repository structure and monorepo guidelines
- CLAUDE.md: AI assistant configuration and project context
- CONTRIBUTING.md: How to contribute to this project
```bash
# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests for a specific package
cd packages/core && pnpm test
```

```bash
# Lint all packages
pnpm lint

# Format all packages
pnpm format

# Type check
pnpm typecheck
```

```bash
# Build all packages
pnpm build

# Build a specific package
pnpm -F "@flakiness-detective/core" build

# Clean build outputs
pnpm clean
```

```bash
# Watch mode for a package
pnpm -F "@flakiness-detective/core" dev

# Add a dependency to a package
cd packages/core
pnpm add package-name
```

1. Fetch Failures: Data adapter retrieves test failures from the last N days
2. Extract Patterns: Parse error messages, stack traces, and metadata
3. Generate Embeddings: Convert rich context into vector representations
4. Cluster Failures: Group similar failures using DBSCAN
5. Analyze Clusters: Calculate frequency thresholds and identify patterns
6. Persist Results: Save clusters with deterministic IDs for tracking
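The six steps above can be sketched end-to-end. This is a simplified illustration with invented helper signatures, not the real orchestrator (which lives in `packages/core/src/flakiness-detective.ts`):

```typescript
// Simplified sketch of the detection pipeline; real types live in
// @flakiness-detective/core. Helper signatures here are illustrative.
interface Failure { id: string; errorMessage: string }
interface Cluster { id: string; failureIds: string[] }

async function runPipeline(
  fetchFailures: (days: number) => Promise<Failure[]>,          // 1. fetch
  generateEmbeddings: (texts: string[]) => Promise<number[][]>, // 3. embed
  clusterIndices: (embeddings: number[][]) => number[][],       // 4. DBSCAN groups
  saveClusters: (clusters: Cluster[]) => Promise<void>,         // 6. persist
  days = 7
): Promise<Cluster[]> {
  const failures = await fetchFailures(days);
  // 2. extract: build a rich text context per failure (simplified here)
  const contexts = failures.map((f) => f.errorMessage);
  const embeddings = await generateEmbeddings(contexts);
  const groups = clusterIndices(embeddings);
  // 5. analyze: map each group of indices back to failure ids
  const clusters = groups.map((indices, i) => ({
    id: `cluster-${i}`,
    failureIds: indices.map((idx) => failures[idx].id),
  }));
  await saveClusters(clusters);
  return clusters;
}
```

In the real package, the adapter and provider objects from the Quick Start fill these roles, and the analysis step also computes the frequency patterns shown in the example output below.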
Main orchestrator that runs the detection pipeline end-to-end.
Pluggable storage backends implementing the DataAdapter interface:
- `fetchFailures(days)`: Retrieve test failures
- `saveClusters(clusters)`: Persist cluster results
- `fetchClusters(limit)`: Retrieve saved clusters
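In TypeScript terms, the contract looks roughly like this (shapes inferred from the method list above, not copied from the package source; see the core package for the authoritative types):

```typescript
// Approximate shape of the DataAdapter contract, inferred from this README.
interface TestFailure { id: string; errorMessage: string }
interface FailureCluster { id: string; failureIds: string[] }

interface DataAdapter {
  fetchFailures(days: number): Promise<TestFailure[]>;
  saveClusters(clusters: FailureCluster[]): Promise<void>;
  fetchClusters(limit: number): Promise<FailureCluster[]>;
}

// A minimal in-memory implementation in the spirit of the built-in adapter:
class SimpleMemoryAdapter implements DataAdapter {
  private clusters: FailureCluster[] = [];
  constructor(private failures: TestFailure[] = []) {}
  async fetchFailures(_days: number): Promise<TestFailure[]> {
    return this.failures;
  }
  async saveClusters(clusters: FailureCluster[]): Promise<void> {
    this.clusters = clusters;
  }
  async fetchClusters(limit: number): Promise<FailureCluster[]> {
    return this.clusters.slice(0, limit);
  }
}
```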
AI services implementing the EmbeddingProvider interface:
- `generateEmbeddings(contexts)`: Convert text to vector embeddings
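A provider can be stubbed out in a few lines. Here is a deterministic mock in the spirit of the package's mock provider (the hashing scheme below is invented for illustration):

```typescript
interface EmbeddingProvider {
  generateEmbeddings(contexts: string[]): Promise<number[][]>;
}

// Deterministic mock: hashes characters into a fixed-length vector, so the
// same input always yields the same embedding. Illustrative only; not the
// package's actual mock provider.
const mockProvider: EmbeddingProvider = {
  async generateEmbeddings(contexts: string[]): Promise<number[][]> {
    return contexts.map((context) => {
      const vector = new Array(8).fill(0);
      for (let i = 0; i < context.length; i++) {
        vector[i % 8] += context.charCodeAt(i) / 1000;
      }
      return vector;
    });
  },
};
```

Determinism matters for testing: repeated runs over the same failures should produce the same clusters.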
Parses Playwright error messages to extract:
- Structured error maps (actual, expected, locator, matcher, timeout)
- Assertion details from code snippets
- GitHub Actions run IDs from report links
- Line numbers, error snippets, and stack traces
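As a simplified illustration of the kind of parsing involved (the regexes below are an approximation invented for this sketch, not the actual code in `pattern-extraction.ts`):

```typescript
// Extract a few fields from a Playwright-style error message.
// Regexes are simplified illustrations of what pattern extraction does.
function extractPattern(message: string): {
  locator?: string;
  matcher?: string;
  timeout?: number;
} {
  const locator = /Locator\(([^)]+)\)/.exec(message)?.[1];
  const matcher = /\.(toBeVisible|toHaveText|toBeEnabled)\b/.exec(message)?.[1];
  const timeoutStr = /Timeout (\d+)ms/.exec(message)?.[1];
  return {
    locator,
    matcher,
    timeout: timeoutStr ? Number(timeoutStr) : undefined,
  };
}
```

Extracted fields like these are what feed the embedding context and the `commonPatterns` summaries shown in the example output.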
This project has comprehensive test coverage:
- Unit Tests: Individual functions and utilities
- Integration Tests: Data adapters and embedding providers
- E2E Tests: Full detection pipeline with mock data
Test files follow the pattern `*.test.ts` and are located next to source files.
```js
{
  id: "2024-W42-0",
  failureCount: 15,
  failurePattern: "Locator(role=button[name='Submit']) (75%)",
  assertionPattern: "toBeVisible on role=button[name='Submit'] (5000ms timeout) (80%)",
  metadata: {
    failureCount: 15,
    firstSeen: "2024-10-14T08:30:00Z",
    lastSeen: "2024-10-20T14:22:00Z",
    failureIds: ["test-1", "test-2", ...],
    runIds: ["123456", "123457", ...],
    failureTimestamps: [...],
    errorMessages: [
      "Locator(role=button[name='Submit']) failed: locator.click: Timeout 5000ms exceeded...",
      ...
    ]
  },
  commonPatterns: {
    filePaths: ["tests/checkout.spec.ts"],
    lineNumbers: [45, 46],
    locators: ["role=button[name='Submit']"],
    matchers: ["toBeVisible"],
    timeouts: [5000]
  }
}
```
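The `id` above combines a week stamp with a cluster index. A deterministic id of that shape could be derived like this (a sketch of the idea only; the package's actual scheme may differ, and the week computation below is simplified rather than strict ISO-8601):

```typescript
// Derive a stable "YYYY-Www-N" id from a reference date and cluster index.
// Simplified week-of-year computation; illustrative, not the package's code.
function clusterId(date: Date, index: number): string {
  const year = date.getUTCFullYear();
  const jan1 = Date.UTC(year, 0, 1);
  const dayOfYear = Math.floor((date.getTime() - jan1) / 86_400_000);
  // Offset by Jan 1's weekday so weeks roughly align to calendar weeks.
  const week = Math.ceil((dayOfYear + new Date(jan1).getUTCDay() + 1) / 7);
  return `${year}-W${String(week).padStart(2, '0')}-${index}`;
}
```

Because the id is a pure function of the time window and cluster index, re-running detection over the same window yields the same ids, which is what makes clusters trackable over time.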
- CI Workflow: Runs on every push and PR
  - Installs dependencies
  - Lints code (Biome)
  - Builds packages (TypeScript)
  - Type checks
  - Runs tests (Vitest)
- Release Workflow: Runs after CI succeeds on main
  - Uses Changesets for version management
  - Publishes to npm (when packages are not private)
1. Create a feature branch:
   ```bash
   git checkout -b feature/my-feature
   ```
2. Make your changes following Conventional Commits:
   ```bash
   git commit -m "feat(core): add new clustering algorithm"
   ```
3. Add a changeset:
   ```bash
   pnpm changeset
   ```
4. Push and create a PR:
   ```bash
   git push origin feature/my-feature
   ```
By default, all packages are `"private": true`. To publish:

1. Set `"private": false` in `package.json`
2. Add `"publishConfig": { "access": "public" }`
3. Add an `NPM_TOKEN` secret in GitHub
4. Merge the changeset PR to publish
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is based on an internal implementation developed at Lytics for detecting flaky Playwright tests in CI/CD pipelines. It has been open-sourced and enhanced with:
- Pluggable adapter architecture
- Multiple distance metrics
- Enhanced pattern extraction
- Comprehensive testing
- Full TypeScript support
- Documentation: See packages/core/README.md
- Issues: GitHub Issues
- Discussions: GitHub Discussions