<div align="center">
<img width="300" src="res/kepler-logo.png" alt="Kepler logo">

<p>
<a href="https://github.com/Exein-io/kepler/actions/workflows/test.yaml">
<img src="https://github.com/Exein-io/kepler/actions/workflows/test.yaml/badge.svg?branch=main" alt="Test badge">
Kepler is a vulnerability database, lookup store, and API.

# Setup

When setting up the `kepler` project locally, you can use either the `docker` or `podman` container runtime.

## [Docker](https://docs.docker.com/engine/install/) (recommended)

We provide a Docker bundle with `kepler`, a dedicated PostgreSQL database, and [Ofelia](https://github.com/mcuadros/ofelia) as a job scheduler for continuous updates:
```bash
podman compose build
podman-compose up
```

Or just use an alias (if you're using podman):

```
alias docker=podman
```

### Data Import and Local Testing

The `/data` directory is the working directory for downloading and extracting CVE JSON files and importing the data into the Kepler DB. When building the `kepler` image with `docker-compose.yaml`, the local `./data` directory is bound into the container (the `:Z` suffix relabels it for SELinux hosts):

```yaml
volumes:
- ./data:/data:Z
```

The system supports two scenarios:

- **Pre-populated `/data`**: Contains `.gz` files for faster development setup - data is extracted and imported directly
- **Empty `/data`**: Triggers automatic download of NIST sources before extraction and import (takes longer as recent years contain large files)
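The decision between the two scenarios can be sketched as a small shell helper (assumed per-year logic for illustration, not Kepler's actual source): if the year's v2.0 feed archive is already in the data directory, it gets extracted; otherwise it must be downloaded first.

```shell
# Sketch (assumed logic): print "extract" when the year's NVD 2.0 feed
# archive is already present in the data dir, "download" otherwise.
data_mode() {
    dir="$1"
    year="$2"
    if [ -f "$dir/nvdcve-2.0-$year.json.gz" ]; then
        echo "extract"
    else
        echo "download"
    fi
}
```

With an empty directory, `data_mode ./data 2025` prints `download`; once `nvdcve-2.0-2025.json.gz` is in place, it prints `extract`.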

This flexibility allows for reduced initial image size in deployed environments, where sources are updated frequently and downloaded as needed.

### Steps taken when testing

#### Scenario 1. Normal import (`/data` is pre-populated)

```bash
# Remove previous volumes
docker-compose down -v

# Re-build a new image
docker compose build

# Spin up a new kepler + kepler_db cluster
docker-compose up

# Run the import task
for year in $(seq 2002 2025); do
    docker exec -it kepler kepler import_nist $year -d /data
done
```

**Note:**

- Ensure you have removed old `/data` contents and only have v2.0 `.gz` NIST files
- Kepler doesn't automatically populate the database from `.gz` files until you explicitly run the `import_nist` command
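A quick way to check the first point - that nothing but v2.0 `.gz` feeds is left in `/data` - is a small filter (a hypothetical helper, not part of Kepler):

```shell
# List anything in the data dir that is NOT a v2.0 NVD feed archive;
# empty output means the directory is clean.
stray_files() {
    ls "$1" | grep -v '^nvdcve-2\.0-[0-9]\{4\}\.json\.gz$'
}
```

Run it against your bound data directory (e.g. `stray_files ./data`) before starting the import.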

### 2025 log output example

```ruby
[2025-09-15T09:17:46Z INFO domain_db::cve_sources::nist] reading /data/nvdcve-2.0-2025.json ...
[2025-09-15T09:17:46Z INFO domain_db::cve_sources::nist] loaded 11536 CVEs in 351.54686ms
[2025-09-15T09:17:46Z INFO kepler] connected to database, importing records ...
[2025-09-15T09:17:46Z INFO kepler] configured 'KEPLER__BATCH_SIZE' 5000
[2025-09-15T09:17:46Z INFO kepler] 11536 CVEs pending import
[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 5000 object records ...
[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 5000 object records ...
[2025-09-15T09:17:47Z INFO domain_db::db] batch imported 1536 object records ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 5000 cves ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 10000 cves ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 15000 cves ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 20000 cves ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 25000 cves ...
[2025-09-15T09:17:48Z INFO kepler] batch imported 30000 cves ...
[2025-09-15T09:17:49Z INFO kepler] batch imported 35000 cves ...
[2025-09-15T09:17:49Z INFO kepler] imported 37592 records Total
[2025-09-15T09:17:49Z INFO kepler] 37592 new records created
```
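The batch sizes in the log (5000 + 5000 + 1536 = 11536 CVEs) follow from the configured `KEPLER__BATCH_SIZE`; the split can be reproduced with a small helper (illustration only):

```shell
# Print the size of each import batch for a given total and batch size.
batches() {
    total="$1"
    size="$2"
    while [ "$total" -gt 0 ]; do
        if [ "$total" -ge "$size" ]; then
            echo "$size"
            total=$((total - size))
        else
            echo "$total"
            total=0
        fi
    done
}
```

`batches 11536 5000` prints `5000`, `5000`, `1536`, matching the three `batch imported ... object records` lines above.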

#### Scenario 2. Clean import (`/data` is empty)

Steps:

```bash
# 1. Delete all .gz files from /data, e.g.: rm -f ./data/*.gz

# 2. Bring the stack down and remove its volumes (drops the existing database)
docker-compose down -v

# 3. Build a new image with an empty `/data` mount.
docker compose build

# 4. Re-trigger import (this time Kepler will download all year `.gz` files first, then proceed with `.json` extraction and database import)

for year in $(seq 2002 2025); do
    docker exec -it kepler kepler import_nist $year -d /data
done
```

### Example output

**Notice:** The extra `downloading` step appears here, compared to the normal import with a pre-populated `/data`.

```ruby
for year in $(seq 2002 2025); do podman exec -it kepler kepler import_nist $year -d /data; done

[2025-09-15T09:20:59Z INFO domain_db::cve_sources] downloading https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-2002.json.gz to /data/nvdcve-2.0-2002.json.gz ...
[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] extracting /data/nvdcve-2.0-2002.json.gz to /data/nvdcve-2.0-2002.json ...
[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] reading /data/nvdcve-2.0-2002.json ...
[2025-09-15T09:21:00Z INFO domain_db::cve_sources::nist] loaded 6546 CVEs in 92.942702ms
[2025-09-15T09:21:00Z INFO kepler] connected to database, importing records ...
[2025-09-15T09:21:00Z INFO kepler] configured 'KEPLER__BATCH_SIZE' 5000
[2025-09-15T09:21:00Z INFO kepler] 6546 CVEs pending import
[2025-09-15T09:21:01Z INFO domain_db::db] batch imported 5000 object records ...
[2025-09-15T09:21:01Z INFO domain_db::db] batch imported 1546 object records ...
[2025-09-15T09:21:01Z INFO kepler] batch imported 5000 cves ...
[2025-09-15T09:21:01Z INFO kepler] imported 9159 records Total
[2025-09-15T09:21:01Z INFO kepler] 9159 new records created
[2025-09-15T09:21:01Z INFO domain_db::cve_sources] downloading https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-2003.json.gz to /data/nvdcve-2.0-2003.json.gz ...
[2025-09-15T09:21:02Z INFO domain_db::cve_sources::nist] extracting /data/nvdcve-2.0-2003.json.gz to /data/nvdcve-2.0-2003.json ...
```

### 📝 Important Note: Duplicate Prevention

Kepler automatically prevents duplicate data imports through database constraints:

- **Object table**: Unique constraint on the `cve` field prevents duplicate objects
- **CVEs table**: Composite unique constraint on `(cve, vendor, product)` prevents duplicate vulnerability entries

This ensures data integrity and prevents redundant imports when running import commands multiple times.

**Database constraints source code:**
- [Object table constraint](https://github.com/exein-io/kepler/blob/72cfcbdee1f02899fc7e482b7f77cd6b4972bf6d/domain-db/src/db/mod.rs#L105)
- [CVEs table constraint](https://github.com/exein-io/kepler/blob/72cfcbdee1f02899fc7e482b7f77cd6b4972bf6d/domain-db/src/db/mod.rs#L141)
- [Migration file](https://github.com/exein-io/kepler/blob/28d7b8bb67e1b6f58038156fa909839b70965892/migrations/2025-05-15-124616_add_unique_constraint_to_objects_and_cves/up.sql)
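The practical effect - re-running `import_nist` adds no duplicate rows - can be sketched with a composite-key set (an illustration of the constraint's behavior, not Kepler's code):

```shell
# Re-importing rows keyed by "cve|vendor|product" into a state file
# leaves the set unchanged, mimicking unique-constraint dedup.
# Prints the resulting row count.
import_rows() {
    state="$1"    # path to the state file; rows arrive on stdin
    sort -u "$state" - > "$state.new" && mv "$state.new" "$state"
    wc -l < "$state" | tr -d ' '
}
```

Piping the same `CVE-2025-0001|acme|router` row in twice leaves the count at 1, just as a second `import_nist` run reports no new records.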

### Database migration notes

When the application starts, it automatically checks for and applies any pending database migrations. To prevent automatic migration and stop when a pending migration is detected, remove the `--migrate` option.

# Data sources

The system will automatically fetch and import new records every 3 hours if you use our [bundle](#docker-recommended), while historical data must be imported manually.
When using our [Docker bundle](#docker-recommended), the system automatically fetches and imports new vulnerability records every 3 hours. Historical data must be imported manually using the commands below.

Kepler currently supports two data sources, [National Vulnerability Database](https://nvd.nist.gov/) and [NPM Advisories](https://npmjs.org/). You can import the data sources historically as follows.
Kepler currently supports two data sources: [National Vulnerability Database](https://nvd.nist.gov/) and [NPM Advisories](https://npmjs.org/). Historical data can be imported using the following methods:

## NIST Data

```bash
for year in $(seq 2002 2025); do
    docker exec -it kepler kepler import_nist $year -d /data
done
```

- The system automatically fetches and imports new records every 3 hours using a scheduled `Ofelia` job

- Use the `--refresh` argument to force re-downloading from the [National Vulnerability Database (NVD)](https://nvd.nist.gov/) source

Example - Refresh data for 2025:

```bash
docker exec -it kepler kepler import_nist 2025 -d /data --refresh
```

To tune the import batch size, set `KEPLER__BATCH_SIZE`:

```bash
docker exec -it -e KEPLER__BATCH_SIZE=4500 kepler kepler import_nist 2025 -d /data
```

> NOTE: PostgreSQL supports at most 65535 bound parameters per statement, so be careful when raising the default `KEPLER__BATCH_SIZE=5000` - [Postgres limits](https://www.postgresql.org/docs/current/limits.html)
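Given that cap, the largest safe batch for a table is roughly `65535 / columns-per-row`. A tiny helper makes the arithmetic concrete (the column count below is a made-up illustration, not Kepler's actual schema):

```shell
# Largest batch that keeps a multi-row INSERT under Postgres's
# 65535 bound-parameter limit, given the number of columns per row.
max_batch() {
    echo $((65535 / $1))
}
```

With a hypothetical 13-column row, `max_batch 13` gives 5041, so the default `KEPLER__BATCH_SIZE=5000` stays just under the limit.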

---

# APIs


Responses are cached in memory with an LRU limit of 4096 elements.

## Migration runner (diesel-cli)

If you're interested in adding new migrations you should check out and install [Diesel-cli](https://diesel.rs/guides/getting-started).

After you have `diesel-cli` [installed](https://diesel.rs/guides/getting-started#installing-diesel-cli), you can run:

```bash
diesel migration generate <name_your_migration>
```

This will generate `up.sql` and `down.sql` files which you can then apply with:

```bash
diesel migration run
```

- Or by restarting your Kepler container (this automatically triggers migrations)

## Build from sources

Alternatively, you can build Kepler from source. You'll need `rust`, `cargo`, and `libpg-dev` (or the equivalent PostgreSQL library for your Linux distribution):

```
cargo build --release
```

---

### Troubleshooting

If you get a `linking with cc` error similar to the one below, you're likely missing C-related tooling or libraries.
```
error: linking with `cc` failed: exit status: 1
collect2: error: ld returned 1 exit status
```

This error requires installing PostgreSQL-related C libraries:

**Fedora:**
```bash
sudo dnf install postgresql-devel
```

**Arch:**
```bash
sudo pacman -S postgresql-libs
```
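After installing the package, you can confirm the linker will now find `libpq` before rebuilding (this assumes `pkg-config` is installed; the check is advisory only):

```shell
# Report whether the PostgreSQL client library is discoverable.
check_libpq() {
    if command -v pkg-config >/dev/null && pkg-config --exists libpq 2>/dev/null; then
        echo "libpq found"
    else
        echo "libpq not found - install the package for your distro"
    fi
}
check_libpq
```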