Normalized bars use 1.00 as a fixed visual baseline for fast scanning. Detailed methodology, raw counters, and reproduction commands stay in Benchmark scores.
Status: RTL simulation, verification, and benchmark automation are active. FPGA bring-up is paused until hardware and a stable implementation flow are back in hand.
A 5-stage in-order RV32IMC RISC-V core in SystemVerilog, with CSR / machine mode, caches, Wishbone, and a small SoC (UART, GPIO, SPI, I2C, timers, PLIC, and more). Built for learning, research, FPGA bring-up, and flow automation - not a minimal toy core.
- It is not a minimal core: the front-end includes RV32C handling, an align buffer, branch prediction, and cache-backed fetch.
- It is built for verification work: Spike comparison, riscv-tests, riscv-arch-test, Imperas flows, and optional riscv-dv / formal hooks are already integrated.
- It is parameterized for experiments: prefetch mode, cache hierarchy, multiplier/divider implementation, and simulation profiles are all configurable.
- It is easy to inspect: commit traces, Konata exports, dashboards, and memory-size reports are first-class workflows.
| Area | What you get |
|---|---|
| ISA | RV32I + M + C, Zicsr, Zifencei, machine mode |
| Frontend | Align buffer, RV32C decode, tournament branch predictor (GShare + bimodal), BTB, RAS, optional next-line prefetch (PREFETCH_TYPE in level_param.sv) |
| Memory | L1 I$/D$ + PMA; optional L2 — non-blocking, dual-pipe (I & D), multi-bank, write-back, MSHR, MESI-style tags (USE_L2_CACHE=1) |
| Execute | ALU, CSR file, selectable multiply/divide implementations |
| Verify | riscv-tests, riscv-arch-test, Imperas flows, Spike trace compare, optional formal / RISC-V DV |
| Observability | Spike-style commit trace, Konata pipeline export, HTML test dashboard (make dashboard) |
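To make the "tournament branch predictor (GShare + bimodal)" entry concrete, here is a minimal behavioural sketch of how such a predictor is commonly organized. It is not derived from the RTL: the table size (`index_bits=10`), the 2-bit saturating counters, the chooser policy, and the class name `TournamentPredictor` are all assumptions chosen for illustration.

```python
"""Toy tournament branch predictor: bimodal + gshare with a chooser.

All sizes and update rules here are illustrative assumptions, not the RTL's.
"""


class TournamentPredictor:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        size = 1 << index_bits
        # 2-bit saturating counters, initialised to weakly not-taken (1).
        self.bimodal = [1] * size
        self.gshare = [1] * size
        # Chooser: >= 2 means "trust gshare", < 2 means "trust bimodal".
        self.chooser = [1] * size
        self.ghr = 0  # global history register, index_bits wide

    def _indices(self, pc):
        pc_idx = (pc >> 1) & self.mask            # RV32C PCs are 2-byte aligned
        gs_idx = (pc_idx ^ self.ghr) & self.mask  # gshare: PC xor global history
        return pc_idx, gs_idx

    def predict(self, pc):
        pc_idx, gs_idx = self._indices(pc)
        bim = self.bimodal[pc_idx] >= 2
        gsh = self.gshare[gs_idx] >= 2
        return gsh if self.chooser[pc_idx] >= 2 else bim

    @staticmethod
    def _bump(counter, up):
        return min(counter + 1, 3) if up else max(counter - 1, 0)

    def update(self, pc, taken):
        pc_idx, gs_idx = self._indices(pc)
        bim_ok = (self.bimodal[pc_idx] >= 2) == taken
        gsh_ok = (self.gshare[gs_idx] >= 2) == taken
        # Train the chooser only when the two components disagree.
        if bim_ok != gsh_ok:
            self.chooser[pc_idx] = self._bump(self.chooser[pc_idx], gsh_ok)
        self.bimodal[pc_idx] = self._bump(self.bimodal[pc_idx], taken)
        self.gshare[gs_idx] = self._bump(self.gshare[gs_idx], taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

In this model a strictly alternating branch defeats the bimodal table but is learned by the gshare table, so the chooser drifts toward gshare for that PC. The real predictor's table sizes and update rules should be read from the RTL.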
Click the badge above to open the live interactive architecture diagram in your browser (via htmlpreview.github.io). Tabs: Pipeline · Cache & MMU · SoC & Peripherals · Branch Predictor · Memory Map
| Block | Role |
|---|---|
| L1 I$ / D$ | Blocking line fills toward L2 or main memory; sizes and associativity from rtl/pkg/level_param.sv. |
| L2 nbmbmp_l2_cache | Non-blocking, multi-bank, multi-port cache: separate I-pipe and D-pipe FSMs, dp_bram arrays per way/bank, shared memory controller, inline MSHR for concurrent misses, write-back evictions to Wishbone. Turn on with USE_L2_CACHE=1 for sim/synth defines. |
| I-Cache Prefetch | next_line_prefetcher + prefetcher_wrapper in the fetch path; arms the line after a demand miss. PREFETCH_TYPE=1 to enable. |
| D-Cache Prefetch | Inline next-line prefetcher in memory.sv: on a D-cache load miss, the subsequent cache line is prefetched automatically (RAM region only, bit31=1). A stride prefetcher (stride_prefetcher.sv, RPT 64-entry) exists but is currently disabled — planned for a future release. |
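The next-line behaviour described for the D-cache can be sketched as a small address calculation. This is an illustrative model only: the 16-byte line size and the function name `next_line_prefetch_addr` are assumptions (the real line size comes from the RTL parameters), while the "RAM region only, bit31=1" rule is taken from the table above.

```python
LINE_BYTES = 16    # ASSUMPTION: cache line size in bytes; check the RTL parameters
RAM_BIT = 1 << 31  # from the table: prefetch applies to the RAM region only (bit31=1)


def next_line_prefetch_addr(miss_addr, line_bytes=LINE_BYTES):
    """Return the line address to prefetch after a load miss, or None."""
    if not (miss_addr & RAM_BIT):
        return None  # miss is outside the RAM region: no prefetch
    line_base = miss_addr & ~(line_bytes - 1)            # align to the missing line
    next_line = (line_base + line_bytes) & 0xFFFF_FFFF   # keep 32-bit wraparound
    # ASSUMPTION: a wrap past the top of the address space is not prefetched.
    return next_line if (next_line & RAM_BIT) else None


print(hex(next_line_prefetch_addr(0x8000_0124)))
```

A miss at 0x80000124 falls in the line starting at 0x80000120, so under these assumptions the line at 0x80000130 is requested next.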
After runs under results/logs/<sim>/, make dashboard builds a browsable HTML report for:
- ISA, benchmark, and regression-family grouping
- pass/fail summaries plus Spike diff drill-down
- quick navigation from failing runs into logs and artifacts
Illustrative preview (stylized). Open the generated index.html after make dashboard for live data.
Tools this repo integrates with day to day. Click a badge to open the upstream project where applicable.
Prerequisites: Verilator 5+, RISC-V GCC (riscv32-unknown-elf-*), Python 3.8+, GNU Make. Optional: Spike, Yosys, ModelSim, GTKWave/Surfer.
git clone https://github.com/kerimturak/level-v.git
cd level-v
# Build the Verilator model
make verilate
# One-shot: fetch + build + import Berkeley ISA tests (needs subrepo / toolchain)
make isa_auto
# Run one test (RTL + Spike compare by default)
make run T=rv32ui-p-add
# Run the ISA regression suite
make isa
# Help
make help

Useful shortcuts: make t T=<isa-test>, make run T=<name>, make quick_test T=<name> (RTL only). See make help_tests and make help_sim.
├── rtl/ # Core, MMU/cache, peripherals, wrappers, pkg, flist.f
├── sim/ # C++ TB, test lists, custom C tests
├── env/ # Per-test link scripts & runtime for each suite
├── script/ # Python tools, shell helpers, JSON/.conf profiles
├── subrepo/ # riscv-tests, arch-test, Imperas, CoreMark, Embench, BEEBS, …
├── docs/ # Deep-dive markdown + MkDocs site source
├── makefile # Single entry point for sim, tests, synth helpers
└── results/ # Logs, waves, dashboards (generated)
| Target | What it does |
|---|---|
| make verilate | Compile RTL → build/obj_dir/Vlevel_wrapper |
| make verilate-fast | Same as make verilate VERILATE_FAST=1 (dev skip heuristic) |
| make run T=<test> | Full flow: prep → RTL → Spike → compare (see USE_PYTHON) |
| make isa / make arch / make imperas | Batch suites (requires imported ELFs under build/tests/) |
| make isa_auto / make arch_auto | Clone/configure/build/import pipelines |
| make run_coremark | CoreMark path (see docs/COREMARK_QUICK_START.md) |
| make lint | Verilator --lint-only pass |
| make dashboard | HTML summary over results/logs/<sim>/ |
| make clean | Clears build artifacts; keeps build/tests/ by default |
| make clean_nuclear | Deletes all of build/ including compiled tests |
| make levelv_memory_report | Prints riscv32-unknown-elf-size for every build/tests/**/*.elf plus per-suite max(dec) |
| make custom_build TEST=<name> | Bare-metal demo C tests → build/tests/custom/<name>.mem (UART; see sim/test/custom/) |
| make beebs_clone / make beebs_build | Git submodule subrepo/beebs (GPL-3.0); beebs_build runs native ./configure && make. RV32 .mem still needs a chip/board port (env/beebs/README.md) |
Configuration: simulator JSON under script/config/verilator.json & modelsim.json; simulation profile keys in script/config/tests/*.conf (merged with default.conf). Override with TEST_CONFIG=..., MAX_CYCLES=..., etc.
For on-chip RAM sizing and env/*/link.ld LENGTH, the relevant figure is the dec column from riscv32-unknown-elf-size (text + data + bss), which includes heap/stack reservations when the linker script places them in the image (e.g. CoreMark .heap / .stack NOLOAD regions).
Refresh numbers any time after (re)building tests:
make levelv_memory_report

These are upper bounds per suite; individual tests can be smaller. riscv-arch-test images are aimed at simulation / compliance flows and can be hundreds of KiB; they are not representative of small FPGA BRAM.
| Suite | Typical max(dec) | ~KiB | Notes |
|---|---|---|---|
| torture | 5988 | ~5.8 | Small randomized fragments |
| imperas | 13028 | ~12.7 | |
| riscv-dv | 13432 | ~13.1 | |
| dhrystone | 19860 | ~19.4 | env/dhrystone/link.ld RAM 20 KiB |
| coremark | 30556 | ~29.8 | env/coremark/levelv/link.ld 32 KiB ceiling |
| embench-IoT | 39928 | ~39.0 | env/embench/link.ld 40 KiB LENGTH, 16 KiB stack (largest: qrduino); RTL WRAPPER_RAM_SIZE_KB must match |
| riscv-arch-test | often much larger than 32 KiB | — | Use levelv_memory_report for exact ELFs |
Sorted by name (one row per .elf under build/tests/embench/elf/ after make embench_build):
| Benchmark | dec (bytes) | ~KiB |
|---|---|---|
| aha-mont64 | 23170 | 22.63 |
| crc32 | 22717 | 22.18 |
| edn | 26193 | 25.58 |
| huffbench | 32798 | 32.03 |
| matmult-int | 31695 | 30.95 |
| md5sum | 26075 | 25.46 |
| nettle-aes | 35699 | 34.86 |
| nettle-sha256 | 27363 | 26.72 |
| nsichneu | 37069 | 36.20 |
| picojpeg | 35669 | 34.83 |
| qrduino | 39928 | 38.99 |
| sglib-combined | 33649 | 32.86 |
| slre | 24990 | 24.40 |
| statemate | 25757 | 25.15 |
| tarfind | 31019 | 30.29 |
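The KiB columns are the dec value divided by 1024, and the same arithmetic tells you whether an image fits a given linker LENGTH. A minimal sketch, using three dec values copied from the Embench-IoT table and the 40 KiB LENGTH quoted for env/embench/link.ld; the helper names `dec_to_kib` and `fits` are hypothetical, not part of the repo's scripts.

```python
def dec_to_kib(dec_bytes):
    """Convert the `dec` column (text + data + bss, in bytes) to KiB."""
    return dec_bytes / 1024


def fits(dec_bytes, ram_kib):
    """True if the image fits in a RAM region of `ram_kib` KiB."""
    return dec_bytes <= ram_kib * 1024


# dec values copied from the Embench-IoT table
images = {"qrduino": 39928, "nsichneu": 37069, "huffbench": 32798}
RAM_KIB = 40  # LENGTH quoted for env/embench/link.ld

for name, dec in sorted(images.items()):
    headroom = RAM_KIB * 1024 - dec
    print(f"{name:10s} {dec_to_kib(dec):6.2f} KiB  "
          f"fits={fits(dec, RAM_KIB)}  headroom={headroom} B")
```

With these numbers qrduino leaves roughly 1 KiB of headroom under the 40 KiB ceiling, and huffbench would overflow a 32 KiB region by 30 bytes, which is why the RAM size in the linker script and the RTL must be kept in step.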
UART / .mem note: .mem file line count is driven by the binary image (+ optional padding, e.g. COREMARK_MEM_PAD_BYTES in the makefile). Smaller linker images yield smaller .mem files for FPGA programming.
Site: kerimturak.github.io/level-v — architecture, tools, sim guides, cache tuning, exception priority, Wishbone, and more.
Local: mkdocs serve if you use MkDocs, or browse docs/ directly. Highlights:
| Topic | Entry |
|---|---|
| Architecture | docs/architecture.md |
| Tools | docs/tools.md |
| Simulation overview | docs/sim/overview.md |
| CoreMark | docs/COREMARK_QUICK_START.md |
| Performance logging | docs/PERF_PIPELINE_LOG.md |
OpenLane flow assets live under asic/openlane/. Example GDS snapshot:
Results below are from Verilator RTL simulation at CPU_CLK_HZ=25_000_000. If you want an apples-to-apples comparison against another core, keep the workload, ISA/ABI, clock, linker constraints, and compiler flags identical. Both the CoreMark and Dhrystone runs use the repo's riscv32-unknown-elf-gcc toolchain; the CoreMark UART banner reported GCC 15.1.0.
| Benchmark | Workload | Verilator / RTL sim | FPGA (target board) | Toolchain + optimization flags | Notes |
|---|---|---|---|---|---|
| CoreMark | 10 iterations | 2.62 CoreMark/MHz; 65.38 CoreMarks @ 25 MHz; 3,824,420 ticks | — | riscv32-unknown-elf-gcc; -O2 -g -march=rv32imc_zicsr -mabi=ilp32 -fno-builtin -fno-common -nostdlib -nostartfiles -DPERFORMANCE_RUN=1 -DITERATIONS=10 -lm -lgcc | Quick comparison run. Runtime is under 10 s, so this is useful for relative comparison but not an official EEMBC-valid CoreMark publication score. |
| Dhrystone 2.1 | 200 iterations | ~66,112 Dhrystones/s; 1.51 DMIPS/MHz; ~37.63 DMIPS @ 25 MHz; 75,629 total cycles | — | riscv32-unknown-elf-gcc; -O3 -march=rv32imc_zicsr -mabi=ilp32 -fno-inline -funroll-loops -static -nostdlib -nostartfiles -DTIME -DDHRY_ITERS=200 -Wl,--gc-sections | Verilator RTL sim at 25 MHz equivalent clock; ~378.15 cycles/iter; UART output reached Dhrystone Complete. |
| Embench-IoT | suite geomean | — | — | varies by benchmark | Use host-side geomean over per-benchmark metrics; keep linker/RAM settings fixed when comparing. |
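A host-side geomean for the Embench-IoT row could look like the sketch below. The per-benchmark ratios are HYPOTHETICAL placeholders, not measured results from this core, and `suite_geomean` is an illustrative helper rather than a script shipped in the repo.

```python
from statistics import geometric_mean  # available from Python 3.8


def suite_geomean(scores):
    """Geometric mean over per-benchmark scores (e.g. speed relative to a baseline)."""
    values = list(scores.values())
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs at least one strictly positive score")
    return geometric_mean(values)


# HYPOTHETICAL placeholder ratios (baseline_time / measured_time), not real data.
relative_speed = {"aha-mont64": 1.10, "crc32": 0.95, "edn": 1.25}
print(f"suite geomean: {suite_geomean(relative_speed):.3f}")
```

The geometric mean is the usual choice for ratio-style scores because no single benchmark dominates the aggregate the way it can with an arithmetic mean.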
| Item | CoreMark | Dhrystone |
|---|---|---|
| Source | subrepo/coremark | env/dhrystone |
| Build command | make coremark COREMARK_ITERATIONS=10 | make dhrystone DHRY_ITERS=200 |
| Run command | make run_coremark COREMARK_ITERATIONS=10 SIM_UART_MONITOR=1 MAX_CYCLES=10000000 | make dhrystone_run DHRY_ITERS=200 SIM_UART_MONITOR=1 MAX_CYCLES=5000000 |
| ISA / ABI | -march=rv32imc_zicsr -mabi=ilp32 | -march=rv32imc_zicsr -mabi=ilp32 |
| Clock define | -DCPU_CLK_HZ=25000000UL | -DCPU_CLK_HZ=25000000UL |
| Raw counter | total_ticks = 3,824,420 | total_cycles = 75,629 |
| Score formula | CoreMark/MHz = iterations * 1e6 / total_ticks | Dhrystones/s = iterations * Fclk / total_cycles; DMIPS/MHz = (Dhrystones/s / 1757) / Fclk_MHz |
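The score formulas can be checked against the raw counters with a few lines of Python. This is a sketch for re-deriving the numbers by hand; the function names are hypothetical and not part of the repo's scripts. The inputs are the iteration counts, raw counters, and 25 MHz clock listed in the table.

```python
def coremark_per_mhz(iterations, total_ticks):
    """CoreMark/MHz = iterations * 1e6 / total_ticks (formula from the table)."""
    return iterations * 1e6 / total_ticks


def dhrystone_metrics(iterations, total_cycles, fclk_hz):
    """Return (Dhrystones/s, DMIPS, DMIPS/MHz) using the table's formulas."""
    dhry_per_s = iterations * fclk_hz / total_cycles
    dmips = dhry_per_s / 1757  # 1757 Dhrystones/s is the 1-DMIPS reference
    return dhry_per_s, dmips, dmips / (fclk_hz / 1e6)


FCLK_HZ = 25_000_000  # CPU_CLK_HZ from the table

cm = coremark_per_mhz(10, 3_824_420)
dps, dmips, dmips_mhz = dhrystone_metrics(200, 75_629, FCLK_HZ)

print(f"CoreMark/MHz  = {cm:.4f}")
print(f"Dhrystones/s  = {dps:.1f}")
print(f"DMIPS         = {dmips:.2f}")
print(f"DMIPS/MHz     = {dmips_mhz:.2f}")
print(f"cycles/iter   = {75_629 / 200:.3f}")
```

The Dhrystone figures reproduce the table (about 66,112 Dhrystones/s, 37.63 DMIPS, 1.51 DMIPS/MHz). The CoreMark formula evaluates to roughly 2.61 CoreMark/MHz from the listed tick count, marginally below the 2.62 shown in the results table; the gap is presumably rounding in the reported score.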
- Fork and branch from main.
- Match RTL style: one module per file, level_param parameters, consistent *_i / *_o suffixes.
- Run make lint before opening a PR.
GPLv3 — see LICENSE.
Kerim Turak
Level — a documented RV32IMC core for simulation, verification, and SoC experiments.


