A high-performance tool that parses Scrapyd logs and generates detailed statistics, providing deep insights that Scrapyd doesn't offer natively.
- High Performance: Leverages `ProcessPoolExecutor` for parallel log parsing, maximizing CPU utilization.
- Organized Output: Centralizes all parsed JSON data into a dedicated `scrapydlogparser` directory, mirroring your project's structure.
- Loop Mode: Automatically monitors and re-parses logs at configurable intervals.
- Smart Cleanup: Automatically removes orphaned JSON files when their corresponding log files are deleted.
- Incremental Parsing: Only processes new or modified logs by checking file size against existing data.
- Error Detection: Specifically detects critical unhandled errors and crashes, labeling them for easy identification.
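The incremental-parsing idea above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the helper names (`needs_parsing`, `parse_log`, `parse_all`) and the `"log_size"` key are assumptions made for the example.

```python
import json
import os
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def needs_parsing(log_path: Path, json_path: Path) -> bool:
    """Incremental check: re-parse only if the JSON is missing, unreadable,
    or the log's size no longer matches what was recorded last time.
    The "log_size" key is an assumption for this sketch, not the tool's schema."""
    if not json_path.exists():
        return True
    try:
        stored = json.loads(json_path.read_text()).get("log_size", -1)
    except json.JSONDecodeError:
        return True
    return os.path.getsize(log_path) != stored


def parse_log(log_path: str) -> dict:
    """Placeholder parser: records the file size so the next run can skip it."""
    return {"log": log_path, "log_size": os.path.getsize(log_path)}


def parse_all(log_paths):
    """Fan per-file parsing out across CPU cores, as the tool does."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_log, log_paths))
```

A changed file size (growth or truncation) triggers a re-parse; unchanged files are skipped entirely, which is what keeps repeated runs cheap.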
Install in editable mode for development:
```bash
pip install -e .
```

Run the parser by pointing it to your Scrapyd logs directory:

```bash
scrapyd-logparser /path/to/scrapyd/logs
```

By default, this will:

- Create a `scrapydlogparser/` directory inside your logs folder.
- Generate individual `.json` files for every log, mirroring the project/spider structure.
- Save a global `scrapydlogparser.json` summary in the same directory.
| Option | Shorthand | Description | Default |
|---|---|---|---|
| `--interval` | `-i` | Interval in seconds for continuous monitoring (loop mode). | `5` |
| `--output` | `-o` | Path to the summary JSON file. | `logs/scrapydlogparser/scrapydlogparser.json` |
| `--force` | `-f` | Forces a full re-parse of all log files. | Disabled |
| `--json-dir` | | Custom directory to store individual JSON files. | `logs/scrapydlogparser/` |
To keep your statistics updated in real-time (e.g., every 60 seconds):
```bash
scrapyd-logparser ./logs --interval 60
```

The tool transforms your standard Scrapyd logs into a clean, queryable JSON structure:
```
logs/
├── project/
│   └── spider/
│       └── job.log
└── scrapydlogparser/
    ├── scrapydlogparser.json   (Global Summary)
    └── project/
        └── spider/
            └── job.json        (Detailed stats)
```
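Because the output mirrors the `project/spider/job` layout, collecting every per-job stats file is a simple directory walk. A minimal sketch (the `collect_job_stats` helper is hypothetical, and no assumptions are made about the keys inside each stats file):

```python
import json
from pathlib import Path


def collect_job_stats(json_dir: str) -> dict:
    """Gather every per-job JSON under the output directory, keyed as
    "project/spider/job". The global summary file is skipped; each
    job's stats are returned as the raw parsed dict."""
    results = {}
    root = Path(json_dir)
    for path in root.rglob("*.json"):
        if path.name == "scrapydlogparser.json":
            continue  # skip the global summary
        key = path.relative_to(root).with_suffix("").as_posix()
        results[key] = json.loads(path.read_text())
    return results
```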