From ec5a9f0b0f1e6c358869d6d5e7fa471b14450135 Mon Sep 17 00:00:00 2001 From: Dominik Polzer Date: Tue, 16 Sep 2025 10:38:29 +0200 Subject: [PATCH 1/2] chore(kepler): update docs for local setup --- README.md | 174 +++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 134 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index 361e119..8a47b7f 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@
Kepler logo - +

Test badge @@ -19,6 +19,8 @@ Kepler is a vulnerability database and lookup store and API currently utilising
# Setup
+When setting up the `kepler` project locally, you can choose either the `podman` or the `docker` container runtime.
+
## [Docker](https://docs.docker.com/engine/install/) (recommended)
We provide a docker bundle with `kepler`, dedicated PostgreSQL database and [Ofelia](https://github.com/mcuadros/ofelia) as job scheduler for continuous update
@@ -43,47 +45,127 @@ podman compose build
podman-compose up
```
-Or just use an alias:
+Or just use an alias (if you're using `podman`):
```
alias docker=podman
```
-### Database migration notes
-When the application starts checks for pending database migrations and automatically applies them. Remove the `--migrate` option to stop when a pending migration is detected
+### Data Import and Local Testing
-## Migration runner (diesel-cli)
+The `/data` directory serves as the source directory for downloading and extracting the CVE JSON files and importing the data into the Kepler DB. When building the `kepler` image with `docker-compose.yaml`, the local `/data` directory is bound to the container:
-If you're interested in adding new migrations you should check out and instal [Diesel-cli](https://diesel.rs/guides/getting-started). 
+```yaml
+volumes:
+  - ./data:/data:Z
+```
-After you have `diesel-cli` [installed](https://diesel.rs/guides/getting-started#installing-diesel-cli), you can run:
+The system supports two scenarios:
-```bash
-diesel migration generate
-```
+- **Pre-populated `/data`**: contains `.gz` files for a faster development setup; the data is extracted and imported directly
+- **Empty `/data`**: triggers an automatic download of the NIST sources before extraction and import (this takes longer, since the files for recent years are large)
-This will generae `up.sql` and `down.sql` files which you can than apply with :
+This flexibility keeps the initial image small in deployed environments, where sources are updated frequently and downloaded as needed.
+
+### Steps taken when testing
+
+#### Scenario 1. Normal import (`/data` is pre-populated)
```bash
-diesel migration run
+# Remove previous volumes
+docker compose down -v
+
+# Re-build a new image
+docker compose build
+
+# Spin up a new kepler + kepler_db cluster
+docker compose up
+
+# Run the import task
+for year in $(seq 2002 2025); do
+  docker exec -it kepler kepler import_nist $year -d /data
+done
```
-- Or by re-starting your kepler conainer (this auto triggers migrations)
+**Note**
+
+- Ensure you have removed the old `/data` contents and that only the v2.0 `.gz` NIST files remain
+- Kepler doesn't automatically populate the database from `.gz` files until you explicitly run the `import_nist` command
+
+### 2025 log output example
+
+```ruby
+[2025-09-15T09:17:46Z INFO domain_db::cve_sources::nist] reading /data/nvdcve-2.0-2025.json ...
+[2025-09-15T09:17:46Z INFO domain_db::cve_sources::nist] loaded 11536 CVEs in 351.54686ms
+[2025-09-15T09:17:46Z INFO kepler] connected to database, importing records ...
+[2025-09-15T09:17:46Z INFO kepler] configured 'KEPLER__BATCH_SIZE' 5000
+[2025-09-15T09:17:46Z INFO kepler] 11536 CVEs pending import
+[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 5000 object records ...
+[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 5000 object records ...
+[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 1536 object records ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 5000 cves ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 10000 cves ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 15000 cves ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 20000 cves ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 25000 cves ...
+[2025-09-15T09:17:48Z INFO kepler] batch imported 30000 cves ...
+[2025-09-15T09:17:49Z INFO kepler] batch imported 35000 cves ...
+[2025-09-15T09:17:49Z INFO kepler] imported 37592 records Total
+[2025-09-15T09:17:49Z INFO kepler] 37592 new records created
+```
+#### Scenario 2. Clean import (`/data` is empty)
-## Build from sources
+Steps:
+
+```bash
+# 1. Delete all `.gz` files from `/data`
-Alternatively you can build `kepler` from sources. To build you need `rust`, `cargo` and `libpg-dev` (or equivalent PostgreSQL library for your Linux distribution)
+# 2. Destroy the existing volume that the populated `/data` directory was bound to.
+docker compose down -v
+# 3. Build a new image with an empty `/data` mount.
+docker compose build
+
+# 4. Re-trigger the import (this time Kepler downloads each year's `.gz` file first, then proceeds with `.json` extraction and the database import)
+
+for year in $(seq 2002 2025); do
+  docker exec -it kepler kepler import_nist $year -d /data
+done
```
-cargo build --release
+
+### Example output
+
+**Notice:** The extra `downloading` step appears here compared to the normal import with a pre-populated `/data`.
+
+```ruby
+for year in $(seq 2002 2025); do podman exec -it kepler kepler import_nist $year -d /data; done
+
+[2025-09-15T09:20:59Z INFO domain_db::cve_sources] downloading https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-2002.json.gz to /data/nvdcve-2.0-2002.json.gz ...
+[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] extracting /data/nvdcve-2.0-2002.json.gz to /data/nvdcve-2.0-2002.json ... +[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] reading /data/nvdcve-2.0-2002.json ... +[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] loaded 6546 CVEs in 92.942702ms +[2025-09-15T09:21:00Z INFO kepler] connected to database, importing records ... +[2025-09-15T09:21:00Z INFO kepler] configured 'KEPLER__BATCH_SIZE' 5000 +[2025-09-15T09:21:00Z INFO kepler] 6546 CVEs pending import +[2025-09-15T09:21:01Z INFO domain_db::db] batch imported 5000 object records ... +[2025-09-15T09:21:01Z INFO domain_db::db] batch imported 1546 object records ... +[2025-09-15T09:21:01Z INFO kepler] batch imported 5000 cves ... +[2025-09-15T09:21:01Z INFO kepler] imported 9159 records Total +[2025-09-15T09:21:01Z INFO kepler] 9159 new records created +[2025-09-15T09:21:01Z INFO domain_db::cve_sources] downloading https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-2003.json.gz to /data/nvdcve-2.0-2003.json.gz ... +[2025-09-15T09:21:02Z INFO domain_db::cve_sources::nist] extracting /data/nvdcve-2.0-2003.json.gz to /data/nvdcve-2.0-2003.json ... ``` +### Database migration notes + +When the application starts, it automatically checks for and applies any pending database migrations. To prevent automatic migration and stop when a pending migration is detected, remove the `--migrate` option. + # Data sources -The system will automatically fetch and import new records every 3 hours if you use our [bundle](#docker-recommended), while historical data must be imported manually. +When using our [Docker bundle](#docker-recommended), the system automatically fetches and imports new vulnerability records every 3 hours. Historical data must be imported manually using the commands below. -Kepler currently supports two data sources, [National Vulnerability Database](https://nvd.nist.gov/) and [NPM Advisories](https://npmjs.org/). 
You can import the data sources historically as follows. +Kepler currently supports two data sources: [National Vulnerability Database](https://nvd.nist.gov/) and [NPM Advisories](https://npmjs.org/). Historical data can be imported using the following methods: ## NIST Data @@ -95,9 +177,9 @@ for year in $(seq 2002 2025); do done ``` -- System will automatically fetch and import new records records every 3 hours. (using schedulled `ofelia` job) +- The system automatically fetches and imports new records every 3 hours using a scheduled `Ofelia` job -- Append `--refresh` argument if you want to refetch from [National Vulnerability Database (NVD)](https://nvd.nist.gov/) source. +- Use the `--refresh` argument to force re-downloading from the [National Vulnerability Database (NVD)](https://nvd.nist.gov/) source Example - Refresh data for 2025 @@ -113,23 +195,7 @@ docker exec -it -e KEPLER__BATCH_SIZE=4500 kepler kepler import_nist 2025 -d /da > NOTE: Postgres supports 65535 params total so be aware when changing the default `KEPLER__BATCH_SIZE=5000` - [Postgres limits](https://www.postgresql.org/docs/current/limits.html) -### Database tear down - -If you want to rebuild your dabase. You would do it in these steps: - -```bash -docker-compose down -v # -v (bring down volumes) -``` - -```bash -docker compose build # optional (if you made some backend changes) -``` - -```bash -docker-compose up -``` - -Than re-trigger the [NIST data Import](#nist-data) step. +--- # APIs @@ -169,6 +235,34 @@ curl \ Responses are cached in memory with a LRU limit of 4096 elements. +## Migration runner (diesel-cli) + +If you're interested in adding new migrations you should check out and install [Diesel-cli](https://diesel.rs/guides/getting-started). 
+
+After you have `diesel-cli` [installed](https://diesel.rs/guides/getting-started#installing-diesel-cli), you can run:
+
+```bash
+diesel migration generate <migration_name>
+```
+
+This will generate `up.sql` and `down.sql` files, which you can then apply with:
+
+```bash
+diesel migration run
+```
+
+- Alternatively, restart your Kepler container (this automatically triggers the migrations)
+
+## Build from sources
+
+Alternatively, you can build Kepler from source. You'll need `rust`, `cargo`, and `libpq-dev` (or the equivalent PostgreSQL library for your Linux distribution):
+
+```
+cargo build --release
+```
+
+---
+
### Troubleshooting
If you get the `linking with cc` error that looks similar to this one, you're likely missing some `c` related tooling or libs.
@@ -180,14 +274,14 @@ error: linking with `cc` failed: exit status: 1
collect2: error: ld returned 1 exit status
```
-This one required Postgres related clib to be added.
+This error requires installing the PostgreSQL-related C libraries:
-Fedora
+**Fedora:**
```bash
sudo dnf install postgresql-devel
```
-Arch
+**Arch:**
```bash
sudo pacman -S postgresql-libs
```

From fb2217bbf83b97dff85c8c1f29f743892f5b7e24 Mon Sep 17 00:00:00 2001
From: Dominik Polzer
Date: Tue, 16 Sep 2025 10:52:16 +0200
Subject: [PATCH 2/2] chore(kepler): mention db constraints for duplicate prevention

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 8a47b7f..debe82d 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,20 @@ done
[2025-09-15T09:21:02Z INFO domain_db::cve_sources::nist] extracting /data/nvdcve-2.0-2003.json.gz to /data/nvdcve-2.0-2003.json ... 
``` +### 📝 Important Note: Duplicate Prevention + +Kepler automatically prevents duplicate data imports through database constraints: + +- **Object table**: Unique constraint on the `cve` field prevents duplicate objects +- **CVEs table**: Composite unique constraint on `(cve, vendor, product)` prevents duplicate vulnerability entries + +This ensures data integrity and prevents redundant imports when running import commands multiple times. + +**Database constraints source code:** +- [Object table constraint](https://github.com/exein-io/kepler/blob/72cfcbdee1f02899fc7e482b7f77cd6b4972bf6d/domain-db/src/db/mod.rs#L105) +- [CVEs table constraint](https://github.com/exein-io/kepler/blob/72cfcbdee1f02899fc7e482b7f77cd6b4972bf6d/domain-db/src/db/mod.rs#L141) +- [Migration file](https://github.com/exein-io/kepler/blob/28d7b8bb67e1b6f58038156fa909839b70965892/migrations/2025-05-15-124616_add_unique_constraint_to_objects_and_cves/up.sql) + ### Database migration notes When the application starts, it automatically checks for and applies any pending database migrations. To prevent automatic migration and stop when a pending migration is detected, remove the `--migrate` option.
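The duplicate-prevention behaviour described in the patch above can be sketched with a tiny, self-contained model. This is purely illustrative: it uses Python's `sqlite3` with simplified table and column names, not Kepler's actual Postgres schema or import code (SQLite's `INSERT OR IGNORE` stands in for the skip-on-conflict effect that the unique constraints produce; the real definitions live in the linked `up.sql`):

```python
import sqlite3

# Schematic tables mirroring the constraints described above
# (hypothetical names; the real schema lives in the linked migration file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (cve TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE cves (cve TEXT, vendor TEXT, product TEXT,"
    " UNIQUE (cve, vendor, product))"
)

def import_record(cve, vendor, product):
    # Rows violating a unique constraint are silently skipped,
    # so re-running an import creates no duplicate entries.
    conn.execute("INSERT OR IGNORE INTO objects (cve) VALUES (?)", (cve,))
    conn.execute(
        "INSERT OR IGNORE INTO cves (cve, vendor, product) VALUES (?, ?, ?)",
        (cve, vendor, product),
    )

# Run the same "import" twice, as when re-running `import_nist` for a year
for _ in range(2):
    import_record("CVE-2025-0001", "acme", "widget")

count = conn.execute("SELECT count(*) FROM cves").fetchone()[0]
print(count)  # one row despite two import runs
```

This is why running the `import_nist` loop repeatedly is safe: the second run simply creates no new rows.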