# Bug Report: Extremely slow inference on LicheeRV Nano (RISC-V C906)
### Device & OS

- Hardware: LicheeRV Nano (SG2002 SoC, RISC-V C906 @ 1 GHz, 256 MB DDR3, of which 128 MB is available to Linux)
- OS: Buildroot (custom minimal Linux)
- Compiler: `riscv64-linux-gnu-gcc` (cross-compiled on Kali Linux) with the `-static` flag
### Model

- Model file: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
- Quantization: Q4_K_M
### What happened?

Inference on RISC-V is extremely slow: roughly 0.013 tok/s (the log rounds this to 0.0) instead of the advertised ~1 tok/s. A simple 10-token generation took over 16 minutes, and the prefill alone took 162.92 s for a 2-token prompt.
### Command you ran

```sh
/root/.picolm/bin/picolm /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p "hi" -n 10 -j 2
```
### Expected output
~1 tok/s as listed in the picolm README for embedded/lightweight devices.
### Actual output

```
Loading model: /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 2 tokens, generating up to 10 (temp=0.80, top_p=0.90, threads=2)
---
ểu như là những ng
---
Prefill: 2 tokens in 162.92s (0.0 tok/s)
Generation: 11 tokens in 815.72s (0.0 tok/s)
Total: 978.64s
Memory: 45.17 MB runtime state (FP16 KV cache)

real	16m 19.55s
user	13m 3.19s
sys	0m 29.07s
```
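As a sanity check, the reported 44.00 MB KV cache is exactly consistent with the printed model config, assuming the usual layout of one FP16 K tensor and one FP16 V tensor per layer:

```sh
# 2 tensors (K,V) x n_layers x max_seq x n_kv_heads x head_dim x 2 bytes (FP16)
echo $(( 2 * 22 * 2048 * 4 * 64 * 2 ))   # 46137344 bytes = 44.00 MiB
```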
### Build output

Cross-compiled on Kali Linux for RISC-V:

```sh
make CC=riscv64-linux-gnu-gcc CFLAGS="-static" riscv
```
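If the `riscv` target does not already set architecture flags, C906-specific scalar tuning might help. The invocation below is a sketch, not picolm's documented usage: `-mcpu=thead-c906` assumes a cross-GCC around version 12 or newer, and whether the Makefile honors `LDFLAGS` is also an assumption.

```sh
# Sketch (unverified against picolm's Makefile): tune scalar codegen for
# the C906 pipeline and pass -static at link time as well.
# On older GCC, substitute: -march=rv64gc -mtune=thead-c906
make CC=riscv64-linux-gnu-gcc \
     CFLAGS="-O3 -static -mcpu=thead-c906" \
     LDFLAGS="-static" riscv
```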
### Additional notes

The binary architecture is confirmed as RISC-V:

```
/root/.picolm/bin/picolm: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0
```

Note, however, that despite `-static` in CFLAGS, `file` reports a dynamically linked PIE, so the flag apparently never reached the link step.

The board runs at 100% CPU during inference. I suspect the build is missing RISC-V vector (RVV) optimizations: the C906 implements only the draft RVV 0.7.1 specification, which mainline GCC does not support, and the `riscv` make target may not be enabling compiler flags tuned for the C906 core specifically.
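For anyone reproducing this, the ISA string the kernel advertises can be checked directly on the board (the exact format of the `isa` line varies with kernel version):

```sh
# Lists the extensions the kernel reports for each hart, e.g. rv64imafdc.
# No 'v' here means no vector extension is advertised to userspace.
grep -i '^isa' /proc/cpuinfo
```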