Flakiness Detective

An AI-powered test flakiness detection and analysis tool that helps identify and resolve flaky tests in your CI/CD pipeline using semantic embeddings and density-based clustering.

📦 Packages

This monorepo contains the following published packages:

Package                          Description
@flakiness-detective/core        Core detection engine with DBSCAN clustering
@flakiness-detective/adapters    Data adapters and AI providers
@flakiness-detective/cli         Command-line interface
@flakiness-detective/utils       Shared utilities and logger

✨ Key Features

🔍 Advanced Flakiness Detection

  • AI-Powered Analysis: Uses semantic embeddings to understand test failure patterns beyond simple text matching
  • DBSCAN Clustering: Groups similar failures using density-based clustering with configurable distance metrics (cosine/euclidean)
  • Rich Pattern Extraction: Automatically extracts patterns from Playwright error messages including:
    • Locators and matchers
    • Actual vs expected values
    • Timeouts and line numbers
    • GitHub Actions run IDs
    • Error snippets and stack traces
  • Frequency Analysis: Identifies common patterns across test failures, using a 50% threshold for cluster identification
  • Deterministic Cluster IDs: Stable, reproducible cluster identifiers for tracking over time

🔌 Flexible Adapters

Data Adapters

  • In-Memory: Fast, ephemeral storage for development
  • Filesystem: JSON-based persistence with automatic Date serialization
  • Firestore: Production-ready Google Cloud Firestore integration
  • Playwright Reporter: Direct integration with Playwright JSON reports

Embedding Providers

  • Google Generative AI: Production-ready embeddings using text-embedding-004
  • Mock Provider: Fast, deterministic embeddings for testing
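
For local development and tests you can swap the Google provider for the mock provider without changing anything else in the pipeline. A minimal sketch (the 'mock' type value is an assumption; check the adapters package README for the exact option names):

import { createEmbeddingProvider } from '@flakiness-detective/adapters';
import { createLogger } from '@flakiness-detective/utils';

const logger = createLogger({ level: 'info' });

// Production: semantic embeddings from Google Generative AI (text-embedding-004)
const googleProvider = createEmbeddingProvider(
  { type: 'google', apiKey: process.env.GOOGLE_AI_API_KEY },
  logger
);

// Development/tests: fast, deterministic embeddings with no API key
// (assumes the mock provider is selected with type: 'mock')
const mockProvider = createEmbeddingProvider({ type: 'mock' }, logger);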

📊 Configuration Options

Clustering Configuration

  • epsilon: DBSCAN distance threshold (default: 0.15 for cosine)
  • minPoints: Minimum neighbors for core points (default: 2)
  • minClusterSize: Minimum failures per cluster (default: 2)
  • distance: Distance metric - cosine (default) or euclidean
  • maxClusters: Maximum clusters to return (default: 5)
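
To make epsilon concrete: with the cosine metric, two failure embeddings are neighbors when their cosine distance (1 minus cosine similarity) is at most epsilon. The helper below is illustrative only, not the package's internal implementation:

// Illustrative only: shows what the epsilon threshold means for the cosine metric.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// DBSCAN treats two points as neighbors when their distance is <= epsilon;
// clusters form from chains of such neighborhoods around core points.
const areNeighbors = (a: number[], b: number[], epsilon = 0.15): boolean =>
  cosineDistance(a, b) <= epsilon;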

Time Window

  • days: Number of days to look back for failures (default: 7)

🎯 Production-Ready

  • Input Validation: Comprehensive validation for configs and data
  • Type Safety: Full TypeScript support with strict mode
  • Error Handling: Graceful degradation and detailed error messages
  • Testing: Extensive unit and E2E test coverage
  • Documentation: Complete API documentation and examples

📦 Project Structure

flakiness-detective-ts/
├── packages/
│   ├── core/            # Core algorithms and interfaces
│   │   ├── src/
│   │   │   ├── flakiness-detective.ts    # Main detection orchestrator
│   │   │   ├── types.ts                  # Core type definitions
│   │   │   ├── clustering/
│   │   │   │   └── dbscan.ts             # DBSCAN implementation
│   │   │   └── utils/
│   │   │       ├── pattern-extraction.ts # Playwright error parsing
│   │   │       └── validation.ts         # Input validation
│   │   └── README.md                     # Core package documentation
│   ├── adapters/        # Data storage and AI provider adapters
│   │   ├── src/
│   │   │   ├── data-adapters/
│   │   │   │   ├── filesystem-adapter.ts
│   │   │   │   ├── firestore-adapter.ts
│   │   │   │   ├── in-memory-adapter.ts
│   │   │   │   └── playwright-reporter-adapter.ts
│   │   │   └── embedding-providers/
│   │   │       ├── google-genai-provider.ts
│   │   │       └── mock-provider.ts
│   ├── cli/             # Command-line interface
│   ├── utils/           # Shared utilities
│   ├── analyzer/        # Test analysis (future)
│   └── visualization/   # Visualization tools (future)
├── .github/             # GitHub Actions CI/CD
├── biome.json           # Linting and formatting config
└── vitest.config.ts     # Test configuration

🚀 Quick Start

Installation

Option 1: Install from npm (Recommended)

# Install core packages
pnpm add @flakiness-detective/core @flakiness-detective/adapters

# Install peer dependencies (if using Firestore and Google AI)
pnpm add @google-cloud/firestore @google/generative-ai

# Or install CLI globally
pnpm add -g @flakiness-detective/cli

Option 2: Development Setup

# Clone the repository
git clone https://github.com/prosdev/flakiness-detective-ts.git
cd flakiness-detective-ts

# Install dependencies
pnpm install

# Build all packages
pnpm build

Prerequisites

  • Node.js v22 LTS or higher
  • pnpm v8 or higher
  • Google Generative AI API key (for production embeddings)

Basic Usage

Using the Core Package

import { Firestore } from '@google-cloud/firestore';
import { FlakinessDetective } from '@flakiness-detective/core';
import {
  createDataAdapter,
  createEmbeddingProvider,
} from '@flakiness-detective/adapters';
import { createLogger } from '@flakiness-detective/utils';

// Create logger
const logger = createLogger({ level: 'info' });

// Create data adapter (Firestore example)
const dataAdapter = createDataAdapter(
  {
    type: 'firestore',
    firestoreDb: new Firestore(), // Your Firestore instance (peer dependency: @google-cloud/firestore)
  },
  logger
);

// Create embedding provider
const embeddingProvider = createEmbeddingProvider(
  {
    type: 'google',
    apiKey: process.env.GOOGLE_AI_API_KEY,
  },
  logger
);

// Create and run detective
const detective = new FlakinessDetective(
  dataAdapter,
  embeddingProvider,
  {
    timeWindow: { days: 7 },
    clustering: {
      epsilon: 0.15,
      minPoints: 2,
      minClusterSize: 2,
      distance: 'cosine',
      maxClusters: 5,
    },
  },
  'info'
);

const clusters = await detective.detect();
console.log(`Found ${clusters.length} flaky test clusters`);
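
Each returned cluster carries the aggregated data shown in the Example Output section below, so you can report on it directly. A quick sketch (property names are taken from that example; the exported type in @flakiness-detective/core is authoritative):

for (const cluster of clusters) {
  console.log(`Cluster ${cluster.id}: ${cluster.failureCount} failures`);
  console.log(`  Pattern: ${cluster.failurePattern}`);
  console.log(`  Files:   ${cluster.commonPatterns.filePaths.join(', ')}`);
}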

Reading Playwright JSON Reports

import { PlaywrightReporterAdapter } from '@flakiness-detective/adapters';
import { createLogger } from '@flakiness-detective/utils';

const logger = createLogger({ level: 'info' });

// Create adapter pointing to Playwright JSON report
const adapter = new PlaywrightReporterAdapter(
  {
    reportPath: './test-results/results.json',
    runId: process.env.GITHUB_RUN_ID,
    reportLink: `https://github.com/org/repo/actions/runs/${process.env.GITHUB_RUN_ID}`,
  },
  logger
);

// Fetch failures from the last 7 days
const failures = await adapter.fetchFailures(7);
console.log(`Found ${failures.length} test failures`);

Using the CLI

# Detect flakiness from Playwright reports
flakiness-detective detect \
  --adapter playwright \
  --adapter-path ./test-results/results.json \
  --embedding google \
  --api-key YOUR_API_KEY \
  --max-clusters 10

# Generate report from saved clusters
flakiness-detective report \
  --adapter firestore \
  --output-format json \
  --output-path ./flakiness-report.json

# Enable debug mode for detailed logging and performance metrics
flakiness-detective detect \
  --adapter playwright \
  --adapter-path ./test-results/results.json \
  --embedding google \
  --api-key YOUR_API_KEY \
  --verbose

💡 Tip: Use --verbose to enable debug mode with timestamps, execution times, API usage stats, and cluster quality metrics.

Configuration Files

Flakiness Detective supports configuration files to simplify setup and avoid repetitive CLI arguments. Config files are discovered automatically in the current directory or parent directories.

Supported Config Files (in priority order)

  1. .flakinessrc.json - JSON configuration (recommended)
  2. .flakinessrc.js - JavaScript configuration
  3. flakiness-detective.config.js - Alternative JS config
  4. .flakinessrc.ts - TypeScript configuration
  5. flakiness-detective.config.ts - Alternative TS config
  6. package.json - Inline config in flakinessDetective field

Example: .flakinessrc.json

{
  "timeWindow": {
    "days": 7
  },
  "adapter": {
    "type": "playwright",
    "reportPath": "./test-results/results.json"
  },
  "embedding": {
    "type": "google",
    "apiKey": "${GOOGLE_AI_API_KEY}"
  },
  "clustering": {
    "epsilon": 0.15,
    "minPoints": 2,
    "minClusterSize": 2,
    "distance": "cosine",
    "maxClusters": 5
  },
  "output": {
    "format": "console"
  },
  "verbose": false
}

Example: flakiness-detective.config.ts

import type { FlakinessDetectiveConfigFile } from '@flakiness-detective/cli';

const config: FlakinessDetectiveConfigFile = {
  timeWindow: { days: 14 },
  adapter: {
    type: 'firestore',
    projectId: process.env.GOOGLE_CLOUD_PROJECT_ID,
    failuresCollection: 'test_failures',
    clustersCollection: 'flaky_clusters',
  },
  embedding: {
    type: 'google',
    apiKey: process.env.GOOGLE_AI_API_KEY,
  },
  clustering: {
    epsilon: 0.15,
    distance: 'cosine',
    maxClusters: 10,
  },
  output: {
    format: 'json',
    path: './flakiness-report.json',
  },
  verbose: true,
};

export default config;

Example: package.json inline config

{
  "name": "my-project",
  "flakinessDetective": {
    "timeWindow": { "days": 7 },
    "adapter": {
      "type": "playwright",
      "reportPath": "./test-results/results.json"
    },
    "embedding": {
      "type": "google",
      "apiKey": "${GOOGLE_AI_API_KEY}"
    },
    "clustering": {
      "epsilon": 0.15,
      "maxClusters": 5
    }
  }
}

CLI Arguments Override Config Files

When both a config file and CLI arguments are provided, CLI arguments take precedence:

# Uses config file but overrides epsilon and maxClusters
flakiness-detective detect \
  --epsilon 0.2 \
  --max-clusters 10

Config Validation

Config files are validated automatically with helpful error messages:

Config validation error in .flakinessrc.json:
  Invalid clustering.epsilon: must be a positive number
  Details: Got: -0.1

📘 Documentation

Package Documentation

  • Core Package README: Complete guide to the core package, including:
    • Detailed configuration options
    • Playwright-specific examples
    • Pattern extraction details
    • API reference
    • Migration guide from internal implementation
  • CLI Package README: Command-line interface guide, including:
    • CLI commands and options
    • Configuration file formats and examples
    • CI/CD integration examples
    • Programmatic usage
    • Troubleshooting guide

Project Documentation

  • ROADMAP.md: Future development plans and features
  • AGENTS.md: Repository structure and monorepo guidelines
  • CLAUDE.md: AI assistant configuration and project context
  • CONTRIBUTING.md: How to contribute to this project

🔧 Development

Running Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm test:watch

# Run tests for a specific package
cd packages/core && pnpm test

Linting and Formatting

# Lint all packages
pnpm lint

# Format all packages
pnpm format

# Type check
pnpm typecheck

Building

# Build all packages
pnpm build

# Build specific package
pnpm -F "@flakiness-detective/core" build

# Clean build outputs
pnpm clean

Package Development

# Watch mode for a package
pnpm -F "@flakiness-detective/core" dev

# Add a dependency to a package
cd packages/core
pnpm add package-name

🏗️ Architecture

Detection Pipeline

  1. Fetch Failures: Data adapter retrieves test failures from the last N days
  2. Extract Patterns: Parse error messages, stack traces, and metadata
  3. Generate Embeddings: Convert rich context into vector representations
  4. Cluster Failures: Group similar failures using DBSCAN
  5. Analyze Clusters: Calculate frequency thresholds and identify patterns
  6. Persist Results: Save clusters with deterministic IDs for tracking

Key Components

FlakinessDetective

Main orchestrator that runs the detection pipeline end-to-end.

Data Adapters

Pluggable storage backends implementing the DataAdapter interface:

  • fetchFailures(days): Retrieve test failures
  • saveClusters(clusters): Persist cluster results
  • fetchClusters(limit): Retrieve saved clusters
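
A rough shape of that interface, inferred from the methods above (the TestFailure and FailureCluster type names here are placeholders; the real definitions live in packages/core/src/types.ts):

// Inferred sketch; see @flakiness-detective/core for the exported definitions.
type TestFailure = { id: string; errorMessage: string; timestamp: Date };
type FailureCluster = { id: string; failureCount: number };

interface DataAdapter {
  fetchFailures(days: number): Promise<TestFailure[]>;      // failures from the last N days
  saveClusters(clusters: FailureCluster[]): Promise<void>;  // persist cluster results
  fetchClusters(limit: number): Promise<FailureCluster[]>;  // retrieve saved clusters
}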

Embedding Providers

AI services implementing the EmbeddingProvider interface:

  • generateEmbeddings(contexts): Convert text to vector embeddings
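
Sketched in the same spirit (the exact exported signature may differ):

// Inferred sketch; the real interface is defined in @flakiness-detective/core.
interface EmbeddingProvider {
  // Converts each rich failure context into a numeric embedding vector.
  generateEmbeddings(contexts: string[]): Promise<number[][]>;
}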

Pattern Extraction

Parses Playwright error messages to extract:

  • Structured error maps (actual, expected, locator, matcher, timeout)
  • Assertion details from code snippets
  • GitHub Actions run IDs from report links
  • Line numbers, error snippets, and stack traces
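
For example, a Playwright assertion timeout like the one in the Example Output below might be distilled into structured fields along these lines (the field names are illustrative; pattern-extraction.ts defines the real output shape):

// Illustrative input/output pair; the real extractor lives in
// packages/core/src/utils/pattern-extraction.ts.
const errorMessage =
  "Timed out 5000ms waiting for expect(locator).toBeVisible()\n" +
  "Locator: getByRole('button', { name: 'Submit' })";

const extracted = {
  locator: "role=button[name='Submit']",
  matcher: 'toBeVisible',
  timeout: 5000,
  lineNumber: 45,  // taken from the failing test's location/stack trace
  runId: '123456', // parsed from the GitHub Actions report link, when present
};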

🧪 Testing

This project has comprehensive test coverage:

  • Unit Tests: Individual functions and utilities
  • Integration Tests: Data adapters and embedding providers
  • E2E Tests: Full detection pipeline with mock data

Test files follow the pattern *.test.ts and are located next to source files.
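
A minimal colocated test might look like this (a generic Vitest sketch; the imported extractPatterns function is hypothetical):

// pattern-extraction.test.ts (hypothetical example of the colocated *.test.ts convention)
import { describe, expect, it } from 'vitest';
import { extractPatterns } from './pattern-extraction'; // hypothetical export name

describe('extractPatterns', () => {
  it('extracts the timeout from a Playwright error message', () => {
    const result = extractPatterns(
      'Timed out 5000ms waiting for expect(locator).toBeVisible()'
    );
    expect(result.timeout).toBe(5000);
  });
});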

📊 Example Output

Cluster Structure

{
  id: "2024-W42-0",
  failureCount: 15,
  failurePattern: "Locator(role=button[name='Submit']) (75%)",
  assertionPattern: "toBeVisible on role=button[name='Submit'] (5000ms timeout) (80%)",
  metadata: {
    failureCount: 15,
    firstSeen: "2024-10-14T08:30:00Z",
    lastSeen: "2024-10-20T14:22:00Z",
    failureIds: ["test-1", "test-2", ...],
    runIds: ["123456", "123457", ...],
    failureTimestamps: [...],
    errorMessages: [
      "Locator(role=button[name='Submit']) failed: locator.click: Timeout 5000ms exceeded...",
      ...
    ]
  },
  commonPatterns: {
    filePaths: ["tests/checkout.spec.ts"],
    lineNumbers: [45, 46],
    locators: ["role=button[name='Submit']"],
    matchers: ["toBeVisible"],
    timeouts: [5000]
  }
}
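
Expressed as a type, the structure above corresponds roughly to the following sketch (derived from the example; the exported type in @flakiness-detective/core is authoritative, and field optionality may differ):

// Hand-written sketch of the cluster shape shown above.
interface FlakyClusterShape {
  id: string;                // deterministic ID, e.g. "2024-W42-0"
  failureCount: number;
  failurePattern: string;
  assertionPattern: string;
  metadata: {
    failureCount: number;
    firstSeen: string;       // ISO timestamp
    lastSeen: string;
    failureIds: string[];
    runIds: string[];
    failureTimestamps: string[];
    errorMessages: string[];
  };
  commonPatterns: {
    filePaths: string[];
    lineNumbers: number[];
    locators: string[];
    matchers: string[];
    timeouts: number[];
  };
}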

🔄 CI/CD

GitHub Actions Workflows

  • CI Workflow: Runs on every push and PR

    • Installs dependencies
    • Lints code (Biome)
    • Builds packages (TypeScript)
    • Type checks
    • Runs tests (Vitest)
  • Release Workflow: Runs after CI succeeds on main

    • Uses Changesets for version management
    • Publishes to npm (when packages are not private)

📝 Making Changes

  1. Create a feature branch

    git checkout -b feature/my-feature
  2. Make your changes following Conventional Commits

    git commit -m "feat(core): add new clustering algorithm"
  3. Add a changeset

    pnpm changeset
  4. Push and create a PR

    git push origin feature/my-feature

🚀 Publishing to npm

By default, all packages are "private": true. To publish:

  1. Set "private": false in package.json
  2. Add "publishConfig": { "access": "public" }
  3. Add NPM_TOKEN secret in GitHub
  4. Merge changeset PR to publish

🀝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

📄 License

MIT

🙏 Acknowledgments

This project is based on an internal implementation developed at Lytics for detecting flaky Playwright tests in CI/CD pipelines. It has been open-sourced and enhanced with:

  • Pluggable adapter architecture
  • Multiple distance metrics
  • Enhanced pattern extraction
  • Comprehensive testing
  • Full TypeScript support

📧 Support
