CrawleeOne

Production-ready web scraping. Out of the box.

CrawleeOne wraps Crawlee with everything production scrapers need -- data transforms, privacy compliance, error tracking, caching, and more -- in a single function call. Write the extraction logic. CrawleeOne handles the rest.

Works seamlessly with Apify, but the storage backend is pluggable -- you're not locked in.

npm install crawlee-one

Quick start

import { crawleeOne } from 'crawlee-one';

await crawleeOne({
  type: 'cheerio',
  routes: {
    mainPage: {
      match: /example\.com\/home/i,
      handler: async (ctx) => {
        const { $, pushData, pushRequests } = ctx;
        await pushData([{ title: $('h1').text() }], {
          privacyMask: { author: true },
        });
        await pushRequests([{ url: 'https://example.com/page/2' }]);
      },
    },
    otherPage: {
      match: (url, ctx) => url.startsWith('/') && ctx.$('.author').length > 0,
      handler: async (ctx) => {
        /* ... */
      },
    },
  },
});

That's it. No Actor.main() boilerplate, no manual router setup, no input wiring. CrawleeOne handles initialization, routing, input resolution, error handling, and teardown.

Why CrawleeOne?

One function. Full crawler.

Replace 100+ lines of Actor + Router + input boilerplate with a single crawleeOne() call.

Switch strategies, not code.

Go from cheerio to playwright by changing one prop. Your route handlers stay the same.

Reshape output without touching scraper code.

Users filter, transform, rename, and limit results via input config -- no code changes needed.

{
  "outputPickFields": ["name", "email"],
  "outputRenameFields": { "photo": "media.photos[0].url" },
  "outputMaxEntries": 500,
  "outputFilter": "(entry) => entry.rating > 4.0"
}
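
To make the idea concrete, here is a minimal sketch of how input fields like these could be applied to scraped entries. It is not CrawleeOne's actual implementation: the function name applyOutputConfig, the ordering (filter, then pick, then limit), and the use of new Function to compile the filter string are all assumptions made for illustration. outputRenameFields is left out because its exact semantics are not spelled out here.

```typescript
// Hypothetical sketch; names and ordering are assumptions, not CrawleeOne internals.
type OutputEntry = Record<string, unknown>;

interface OutputConfig {
  outputPickFields?: string[];
  outputMaxEntries?: number;
  // Source text of a predicate, e.g. "(entry) => entry.rating > 4.0"
  outputFilter?: string;
}

function applyOutputConfig(entries: OutputEntry[], config: OutputConfig): OutputEntry[] {
  let result = entries;

  // Assumption: the filter runs first so it can still see fields that are dropped later.
  if (config.outputFilter) {
    // Compiling a string into a function executes arbitrary code; only do this with trusted input.
    const predicate = new Function(`return (${config.outputFilter});`)() as (
      entry: OutputEntry,
    ) => boolean;
    result = result.filter((entry) => predicate(entry));
  }

  // Keep only the requested fields, in the order they were requested.
  if (config.outputPickFields) {
    const fields = config.outputPickFields;
    result = result.map((entry) => {
      const picked: OutputEntry = {};
      for (const field of fields) {
        if (field in entry) picked[field] = entry[field];
      }
      return picked;
    });
  }

  // Cap the number of entries last.
  if (config.outputMaxEntries !== undefined) {
    result = result.slice(0, config.outputMaxEntries);
  }
  return result;
}

const shaped = applyOutputConfig(
  [
    { name: 'Ann', email: 'ann@example.com', rating: 4.5 },
    { name: 'Bob', email: 'bob@example.com', rating: 3.2 },
  ],
  {
    outputPickFields: ['name', 'email'],
    outputMaxEntries: 500,
    outputFilter: '(entry) => entry.rating > 4.0',
  },
);
console.log(shaped);
```

In this sketch the filter has to run before the field selection, because rating is not among the picked fields.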

Fully typed out of the box.

Route handlers and context objects are typed based on your crawler type. TypeScript knows whether you have ctx.page or ctx.$ -- no extra setup.
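
One way such typing can work is a lookup type keyed by the crawler type. The sketch below uses simplified, hypothetical context interfaces (CheerioLikeContext, PlaywrightLikeContext) and a hypothetical defineRoute helper; the real context objects and CrawleeOne's own types are richer and may be structured differently.

```typescript
// Hypothetical, simplified context shapes used only for this illustration.
interface CheerioLikeContext {
  $: (selector: string) => { text(): string };
}
interface PlaywrightLikeContext {
  page: { title(): Promise<string> };
}

// Map each crawler type to the context its handlers receive.
type ContextByType = {
  cheerio: CheerioLikeContext;
  playwright: PlaywrightLikeContext;
};
type CrawlerType = keyof ContextByType;

// The handler's parameter type is derived from the `type` argument.
function defineRoute<T extends CrawlerType>(
  type: T,
  handler: (ctx: ContextByType[T]) => Promise<string>,
) {
  return { type, handler };
}

const cheerioRoute = defineRoute('cheerio', async (ctx) => {
  // ctx is inferred as CheerioLikeContext; ctx.page would be a compile-time error here.
  return ctx.$('h1').text();
});

const playwrightRoute = defineRoute('playwright', async (ctx) => {
  // ctx is inferred as PlaywrightLikeContext; ctx.$ would be a compile-time error here.
  return ctx.page.title();
});
```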

Privacy compliance, built in.

Mark fields as personal data. CrawleeOne redacts them automatically when includePersonalData is off.
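
As a rough illustration of the idea, the sketch below redacts masked fields unless personal data is explicitly included. The function name applyPrivacyMask and the '<redacted>' placeholder are assumptions; CrawleeOne's actual redaction format may differ.

```typescript
// Hypothetical sketch of mask-based redaction; not CrawleeOne's actual implementation.
type ScrapedItem = Record<string, unknown>;
type PrivacyMask = Record<string, boolean>;

function applyPrivacyMask(
  item: ScrapedItem,
  mask: PrivacyMask,
  includePersonalData: boolean,
): ScrapedItem {
  // Always return a copy so the caller's object is never mutated.
  const result: ScrapedItem = { ...item };
  if (includePersonalData) return result;

  for (const field of Object.keys(mask)) {
    // Only redact fields that are flagged as personal and actually present.
    if (mask[field] && field in result) {
      result[field] = '<redacted>'; // placeholder value is an assumption
    }
  }
  return result;
}

const post = { title: 'Hello world', author: 'Jane Doe' };
const maskedPost = applyPrivacyMask(post, { author: true }, false);
const fullPost = applyPrivacyMask(post, { author: true }, true);
console.log(maskedPost, fullPost);
```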

Incremental scraping.

Only process entries you haven't seen before. Built-in cache with KeyValueStore tracks what's been scraped across runs.
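
The underlying technique can be sketched as a set of seen keys that persists between runs. In the sketch below an in-memory Set stands in for the key-value store, and the names SeenCache and keepNewEntries are hypothetical. A real store would be asynchronous and persistent; this version is synchronous for brevity.

```typescript
// Hypothetical sketch of incremental scraping; a Set stands in for a persistent store.
class SeenCache {
  private readonly seen = new Set<string>();

  has(key: string): boolean {
    return this.seen.has(key);
  }

  add(key: string): void {
    this.seen.add(key);
  }
}

// Returns only entries whose key has not been seen before, and records them as seen.
function keepNewEntries<T>(entries: T[], keyOf: (entry: T) => string, cache: SeenCache): T[] {
  const fresh: T[] = [];
  for (const entry of entries) {
    const key = keyOf(entry);
    if (cache.has(key)) continue; // already scraped in this or an earlier run
    cache.add(key);
    fresh.push(entry);
  }
  return fresh;
}

const cache = new SeenCache();
const firstRun = keepNewEntries([{ id: 'a' }, { id: 'b' }], (e) => e.id, cache);
const secondRun = keepNewEntries([{ id: 'b' }, { id: 'c' }], (e) => e.id, cache);
console.log(firstRun.length, secondRun.map((e) => e.id));
```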

Errors captured, not lost.

Failed requests are saved to a dataset automatically. Plug in Sentry with one line, or implement your own telemetry.

Match routes by URL or content.

Regex, functions, or both. CrawleeOne auto-routes unlabeled requests to the right handler.
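
A minimal sketch of this kind of matching follows. The helper findRouteLabel and the PageContext shape are hypothetical, and treating "both" as an array of matchers is an assumption; the (url, ctx) signature of function matchers follows the quick start example.

```typescript
// Hypothetical sketch of route matching; not CrawleeOne's actual router.
type Matcher<Ctx> = RegExp | ((url: string, ctx: Ctx) => boolean);

interface RouteDef<Ctx> {
  // Assumption: a route may carry one matcher or several.
  match: Matcher<Ctx> | Matcher<Ctx>[];
}

// Returns the label of the first route whose matcher accepts the URL, if any.
function findRouteLabel<Ctx>(
  routes: Record<string, RouteDef<Ctx>>,
  url: string,
  ctx: Ctx,
): string | undefined {
  for (const label of Object.keys(routes)) {
    const route = routes[label];
    if (!route) continue;
    const matchers = Array.isArray(route.match) ? route.match : [route.match];
    const hit = matchers.some((m) => (m instanceof RegExp ? m.test(url) : m(url, ctx)));
    if (hit) return label;
  }
  return undefined;
}

// Stand-in for a page context; a real one would expose the parsed page.
interface PageContext {
  hasAuthor: boolean;
}

const demoRoutes: Record<string, RouteDef<PageContext>> = {
  mainPage: { match: /example\.com\/home/i },
  otherPage: { match: (_url, ctx) => ctx.hasAuthor },
};

const homeLabel = findRouteLabel(demoRoutes, 'https://example.com/HOME', { hasAuthor: false });
const authorLabel = findRouteLabel(demoRoutes, 'https://example.com/post/1', { hasAuthor: true });
const noLabel = findRouteLabel(demoRoutes, 'https://example.com/post/2', { hasAuthor: false });
console.log(homeLabel, authorLabel, noLabel);
```

Avoid the g flag on regular expressions used this way: RegExp.prototype.test keeps state in lastIndex for global patterns, which would make repeated matching unreliable.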

See all features

Before and after

What CrawleeOne replaces

With CrawleeOne:

await crawleeOne({
  type: 'cheerio',
  routes: {
    mainPage: {
      match: /example\.com\/home/i,
      handler: async (ctx) => {
        const data = [
          /* ... */
        ];
        await ctx.pushData(data, { privacyMask: { author: true } });
        await ctx.pushRequests([{ url: 'https://...' }]);
      },
    },
  },
});

Without CrawleeOne (vanilla Crawlee + Apify):

import { Actor } from 'apify';
import { CheerioCrawler, createCheerioRouter } from 'crawlee';
// fetchInput, runFunc, onBeforeHandler, onAfterHandler, transformAndFilterData, and
// transformAndFilterReqs are not defined here; they stand in for helpers you would write yourself.

await Actor.main(async () => {
  const rawInput = await Actor.getInput();
  const input = {
    ...rawInput,
    ...(await fetchInput(rawInput.inputFromUrl)),
    ...(await runFunc(rawInput.inputFromFunc)),
  };

  const router = createCheerioRouter();

  router.addHandler('mainPage', async (ctx) => {
    await onBeforeHandler(ctx);
    const data = [
      /* ... */
    ];
    const finalData = await transformAndFilterData(data, ctx, input);
    const dataset = await Actor.openDataset(input.datasetId);
    await dataset.pushData(finalData);
    const reqs = ['https://...'].map((url) => ({ url }));
    const finalReqs = await transformAndFilterReqs(reqs, ctx, input);
    const queue = await Actor.openRequestQueue(input.requestQueueId);
    await queue.addRequests(finalReqs);
    await onAfterHandler(ctx);
  });

  router.addDefaultHandler(async (ctx) => {
    await onBeforeHandler(ctx);
    const url = ctx.request.loadedUrl || ctx.request.url;
    if (url.match(/example\.com\/home/i)) {
      const req = { url, userData: { label: 'mainPage' } };
      const finalReqs = await transformAndFilterReqs([req], ctx, input);
      const queue = await Actor.openRequestQueue(input.requestQueueId);
      await queue.addRequests(finalReqs);
    }
    await onAfterHandler(ctx);
  });

  const crawler = new CheerioCrawler({ ...input, requestHandler: router });
  await crawler.run(['https://...']);
});

And that's far from everything -- the vanilla version still doesn't include data transforms, privacy masking, error tracking, caching, or input validation.

Common use cases

CrawleeOne scrapers support these out of the box, all configurable via input:

Use case	What it does
Import URLs	Load URLs from databases, datasets, or custom functions.
Data transforms	Rename, select, limit, and reshape output without code changes.
Request filtering	Control what gets scraped to save time and money.
Caching	Incremental scraping -- only process new entries.
Privacy compliance	Redact personal data with a single toggle.
Error capture	Centralized error tracking across scrapers.

See all 12 use cases

Getting started

Installation

npm install crawlee-one

For scraper developers

Read the getting started guide for a full walkthrough of crawleeOne() and its options.
See example projects for real-world usage.
Use crawlee-one gen to generate types, actor.json, actorspec.json, and README from a single config file.

For end users

Scrapers built with CrawleeOne are configurable by end users (via the Apify platform). They can transform, filter, limit, and reshape scraped data and requests -- all through input fields, no code changes needed.

User guide

Documentation

Document	Description
Getting started	Developer guide with full `crawleeOne()` options reference.
Features	Complete feature catalog with code examples.
Use cases	All 12 use cases with links to detailed guides.
Input reference	All available input fields.
Deploying to Apify	Step-by-step Apify deployment guide.
Code generation	Generate types, actor.json, actorspec, and README from config.
Integrations	Custom telemetry and storage backends.
User guide	Guide for end users of CrawleeOne scrapers.
API reference	Auto-generated TypeScript API docs.
Crawlee & Apify overview	Background on how Crawlee and Apify work.

Example projects

SKCRIS Scraper -- Slovak research database scraper.
Profesia.sk Scraper -- Slovak job board scraper.

Contributing

Found a bug or have a feature request? Please open an issue.

When contributing code, please fork the repo and submit a pull request. See CONTRIBUTING.md for dev setup and guidelines.

Development

Want to build, test, or hack on CrawleeOne? The development guide covers prerequisites, all npm scripts, project structure, architecture, and testing strategy.

Supporting CrawleeOne

CrawleeOne is a labour of love. If you find it useful, you can support the project on Buy Me a Coffee.
