AROC — Agentic Read-Only Chat

A self-contained agentic terminal chat with 19 tools, powered by Qwen3.5-9B and llama.cpp

AROC is an offline-first, read-only agentic chat application that runs entirely on local hardware. It combines a 2-bit quantized 9B parameter model with 19 built-in tools for filesystem exploration, code analysis, task management, and working memory — all in a single Python file with zero pip dependencies.

Highlights

19 built-in tools — filesystem ops, code analysis, todo tracking, session memory
Single file — one chat.py, pure Python stdlib, no installs
Offline — runs fully local on consumer GPUs (tested: RTX 3060 12GB)
64K context — large conversations with automatic pruning
Agentic — multi-turn tool chains up to 12 rounds per message
Persistent state — scratch pad, todos, and memory survive session save/load

Quick Start

# Prerequisites: llama-server binary + model file
# Get llama.cpp: https://github.com/ggerganov/llama.cpp
# Get model:     https://huggingface.co/g023/g023-Qwen3.5-9B-GGUF

# Place g023-Qwen3.5-9B-IQ2_M.gguf alongside chat.py, then:
python3 chat.py

AROC auto-starts a llama-server, loads the model, and drops you into an interactive chat. To connect to an existing server:

python3 chat.py --no-server --port 19300

Tools

Filesystem (8 tools)

Tool	Description
`read_file`	Read with optional line ranges (default max 200 lines)
`head`	First N lines — quick file previews
`tail`	Last N lines — logs and recent content
`list_dir`	Directory listing with sizes and dates
`find_files`	Glob pattern file search
`grep`	Regex search across files
`grep_context`	Regex search with surrounding context lines
`file_info`	Size, permissions, timestamps, line count

Code Analysis (3 tools)

Tool	Description
`python_outline`	AST-based class/function extraction with line numbers
`diff_files`	Unified diff between two files
`analyze_file`	Subagent delegation for deep file analysis

Task Management (4 tools)

Tool	Description
`todo_add`	Add task with priority (high/medium/low)
`todo_list`	List all todos with completion status
`todo_done`	Mark task as done
`todo_remove`	Remove task

Memory & State (4 tools)

Tool	Description
`scratch_pad`	Overwrite-style working notes
`memory_append`	Append timestamped notes (accumulates)
`memory_read`	Read all session memory
`get_time`	Current date, time, and uptime

Architecture

┌───────────────────────────────────────────────────┐
│  ChatUI  (terminal rendering, slash commands)      │
│    ↕                                               │
│  Agent  (tool dispatch, state, conversation loop)  │
│    ↕                                               │
│  LlamaServer  (HTTP → llama-server process)        │
│    ↕                                               │
│  llama-server  (GGUF inference, CUDA)              │
└───────────────────────────────────────────────────┘

The Agent runs a loop: send messages → stream response → if tool calls, execute and loop (up to 12 turns) → if no tool calls, deliver final answer.

Three in-session state mechanisms complement each other:

State	Behavior	Use Case
Scratch pad	Overwrite	Current plan / working notes
Memory	Append-only	Findings, decisions, facts
Todos	Structured list	Multi-step task tracking

Commands

Command	Action
`/think`	Enable reasoning display
`/nothink`	Disable reasoning (faster)
`/clear`	Clear conversation
`/save FILE`	Save session (messages + state)
`/load FILE`	Restore session
`/pad`	Show scratch pad
`/clearpad`	Clear scratch pad
`/todos`	Show todo list
`/memory`	Show memory notes
`/tokens`	Token usage stats
`/tools`	List tools
`/model`	Model info
`/quit`	Exit

Multi-line input: end a line with \
Interrupt: Ctrl+C during generation

Configuration

CLI Options

--port PORT     Server port (default: 19300)
--model PATH    GGUF model path
--ngl LAYERS    GPU layers (default: 36)
--ctx TOKENS    Context window (default: 64000)
--no-server     Connect to existing server
--think         Start in reasoning mode
--no-color      Disable ANSI colors

Model

Spec	Value
Model	g023-Qwen3.5-9B-IQ2_M.gguf (3.4GB)
Quantization	IQ2_M (2-bit)
Architecture	Qwen3.5-9B Hybrid Mamba2-Attention
Context	64,000 tokens
KV Cache	q4_0 quantized

Sampling

Tuned specifically to prevent repetition loops at 2-bit quantization:

temperature=0.3  top_p=0.9  top_k=40  min_p=0.05
repeat_penalty=1.15  frequency_penalty=0.2  presence_penalty=0.0

Key Technical Detail: reasoning_format

Critical for IQ2_M models: The reasoning_format: "deepseek" server parameter is required for the model to produce actual response content. Without it, all tokens go to internal reasoning and content stays empty. This is a server-level parameter — prompt engineering cannot fix it.

# Applied to all API calls:
extra_sampling = {"reasoning_format": "deepseek"}

Examples

You ▸ What's the structure of chat.py?
  🔧 python_outline(path="chat.py")
AI ▸ chat.py has 1887 lines with 5 main classes: LlamaServer, StreamParser, ...

You ▸ Compare line counts of chat.py and chat_test.py
  🔧 file_info(path="chat.py")
  🔧 file_info(path="chat_test.py")
AI ▸ chat.py: 1,887 lines. chat_test.py: 709 lines.

You ▸ Create a review plan for error handling
  🔧 todo_add(task="Find all try/except blocks", priority="high")
  🔧 todo_add(task="Check error messages", priority="medium")
AI ▸ Created 2 todos. Use /todos to track progress.

Requirements

Requirement	Details
Python	3.8+ (stdlib only)
GPU	CUDA-compatible, 12GB+ VRAM
RAM	16GB+ (64GB recommended)
OS	Linux
llama.cpp	With `reasoning_format` support

Testing

python3 chat_test.py

Version History

Version	Date	Changes
2.0.0	2026-04-19	19 tools, reasoning_format fix, 64K context, in-session state, reasoning fallback
0.1	2026-04-18	Initial release (7 tools, 16K context)

License

MIT License — see LICENSE

Author

g023 — HuggingFace

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
LICENSE		LICENSE
README.md		README.md
chat.py		chat.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AROC — Agentic Read-Only Chat

Highlights

Quick Start

Tools

Architecture

Commands

Configuration

CLI Options

Model

Sampling

Key Technical Detail: reasoning_format

Examples

Requirements

Testing

Version History

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AROC — Agentic Read-Only Chat

Highlights

Quick Start

Tools

Architecture

Commands

Configuration

CLI Options

Model

Sampling

Key Technical Detail: reasoning_format

Examples

Requirements

Testing

Version History

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages