# Bug Report: Extremely slow inference on LicheeRV Nano (RISC-V C906)
### Device & OS

- Hardware: LicheeRV Nano (SG2002 SoC, RISC-V C906 @ 1 GHz, 256 MB DDR3, of which 128 MB is available to Linux)
- OS: Buildroot (custom minimal Linux)
- Compiler: `riscv64-linux-gnu-gcc` (cross-compiled on Kali Linux) with the `-static` flag
### Model

- Model file: `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf`
- Quantization: Q4_K_M
### What happened?

Inference on RISC-V is extremely slow: roughly 0.013 tok/s (the log rounds this to 0.0) instead of the advertised ~1 tok/s. A simple 10-token generation took over 16 minutes, and the prefill alone took 162.92 s for a 2-token prompt.
### Command you ran

```sh
/root/.picolm/bin/picolm /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p "hi" -n 10 -j 2
```
### Expected output
~1 tok/s as listed in the picolm README for embedded/lightweight devices.
### Actual output

```
Loading model: /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 2 tokens, generating up to 10 (temp=0.80, top_p=0.90, threads=2)
---
ểu như là những ng
---
Prefill: 2 tokens in 162.92s (0.0 tok/s)
Generation: 11 tokens in 815.72s (0.0 tok/s)
Total: 978.64s
Memory: 45.17 MB runtime state (FP16 KV cache)

real	16m 19.55s
user	13m 3.19s
sys	0m 29.07s
```
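As a sanity check, the reported 44.00 MB KV cache is exactly consistent with the printed model config, assuming the usual layout of one FP16 K tensor and one FP16 V tensor per layer:

```sh
# 2 tensors (K,V) x n_layers x max_seq x n_kv_heads x head_dim x 2 bytes (FP16)
echo $(( 2 * 22 * 2048 * 4 * 64 * 2 ))   # 46137344 bytes = 44.00 MiB
```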
### Build output

Cross-compiled on Kali Linux for RISC-V:

```sh
make CC=riscv64-linux-gnu-gcc CFLAGS="-static" riscv
```
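If the `riscv` target does not already set architecture flags, C906-specific scalar tuning might help. The invocation below is a sketch, not picolm's documented usage: `-mcpu=thead-c906` assumes a cross-GCC around version 12 or newer, and whether the Makefile honors `LDFLAGS` is also an assumption.

```sh
# Sketch (unverified against picolm's Makefile): tune scalar codegen for
# the C906 pipeline and pass -static at link time as well.
# On older GCC, substitute: -march=rv64gc -mtune=thead-c906
make CC=riscv64-linux-gnu-gcc \
     CFLAGS="-O3 -static -mcpu=thead-c906" \
     LDFLAGS="-static" riscv
```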
### Additional notes

The binary architecture is confirmed as RISC-V:

```
/root/.picolm/bin/picolm: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0
```

Note, however, that despite `-static` in CFLAGS, `file` reports a dynamically linked PIE, so the flag apparently never reached the link step.

The board runs at 100% CPU during inference. I suspect the build is missing RISC-V vector (RVV) optimizations: the C906 implements only the draft RVV 0.7.1 specification, which mainline GCC does not support, and the `riscv` make target may not be enabling compiler flags tuned for the C906 core specifically.
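For anyone reproducing this, the ISA string the kernel advertises can be checked directly on the board (the exact format of the `isa` line varies with kernel version):

```sh
# Lists the extensions the kernel reports for each hart, e.g. rv64imafdc.
# No 'v' here means no vector extension is advertised to userspace.
grep -i '^isa' /proc/cpuinfo
```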