A high-performance tool that parses Scrapyd logs and generates detailed statistics, providing deep insights that Scrapyd doesn't offer natively.
- High Performance: Leverages `ProcessPoolExecutor` for parallel log parsing, maximizing CPU utilization.
- Organized Output: Centralizes all parsed JSON data into a dedicated `scrapydlogparser` directory, mirroring your project's structure.
- Loop Mode: Automatically monitors and re-parses logs at configurable intervals.
- Smart Cleanup: Automatically removes orphaned JSON files when their corresponding log files are deleted.
- Incremental Parsing: Only processes new or modified logs by checking file size against existing data.
- Error Detection: Specifically detects critical unhandled errors and crashes, labeling them for easy identification.
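The incremental-parsing idea above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the helper names (`needs_parsing`, `parse_log`, `parse_all`) and the `"log_size"` key are assumptions made for the example.

```python
import json
import os
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def needs_parsing(log_path: Path, json_path: Path) -> bool:
    """Incremental check: re-parse only if the JSON is missing, unreadable,
    or the log's size no longer matches what was recorded last time.
    The "log_size" key is an assumption for this sketch, not the tool's schema."""
    if not json_path.exists():
        return True
    try:
        stored = json.loads(json_path.read_text()).get("log_size", -1)
    except json.JSONDecodeError:
        return True
    return os.path.getsize(log_path) != stored


def parse_log(log_path: str) -> dict:
    """Placeholder parser: records the file size so the next run can skip it."""
    return {"log": log_path, "log_size": os.path.getsize(log_path)}


def parse_all(log_paths):
    """Fan per-file parsing out across CPU cores, as the tool does."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_log, log_paths))
```

A changed file size (growth or truncation) triggers a re-parse; unchanged files are skipped entirely, which is what keeps repeated runs cheap.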
Install in editable mode for development:
```bash
pip install -e .
```

Run the parser by pointing it to your Scrapyd logs directory:

```bash
scrapyd-logparser /path/to/scrapyd/logs
```

By default, this will:

- Create a `scrapydlogparser/` directory inside your logs folder.
- Generate individual `.json` files for every log, mirroring the project/spider structure.
- Save a global `scrapydlogparser.json` summary in the same directory.
| Option | Shorthand | Description | Default |
|---|---|---|---|
| `--interval` | `-i` | Interval in seconds for continuous monitoring (loop mode). | `5` |
| `--output` | `-o` | Path to the summary JSON file. | `logs/scrapydlogparser/scrapydlogparser.json` |
| `--force` | `-f` | Forces a full re-parse of all log files. | Disabled |
| `--json-dir` | | Custom directory to store individual JSON files. | `logs/scrapydlogparser/` |
To keep your statistics updated in real-time (e.g., every 60 seconds):
```bash
scrapyd-logparser ./logs --interval 60
```

The tool transforms your standard Scrapyd logs into a clean, queryable JSON structure:
```
logs/
├── project/
│   └── spider/
│       └── job.log
└── scrapydlogparser/
    ├── scrapydlogparser.json   (Global Summary)
    └── project/
        └── spider/
            └── job.json        (Detailed stats)
```
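Because the output mirrors the `project/spider/job` layout, collecting every per-job stats file is a simple directory walk. A minimal sketch (the `collect_job_stats` helper is hypothetical, and no assumptions are made about the keys inside each stats file):

```python
import json
from pathlib import Path


def collect_job_stats(json_dir: str) -> dict:
    """Gather every per-job JSON under the output directory, keyed as
    "project/spider/job". The global summary file is skipped; each
    job's stats are returned as the raw parsed dict."""
    results = {}
    root = Path(json_dir)
    for path in root.rglob("*.json"):
        if path.name == "scrapydlogparser.json":
            continue  # skip the global summary
        key = path.relative_to(root).with_suffix("").as_posix()
        results[key] = json.loads(path.read_text())
    return results
```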