telegram-persona

Extract an executable persona model from Telegram chat exports. Turns your messaging history into structured, machine-readable rules that an AI agent can use to mimic your communication style.

What it does

Telegram Desktop Export (result.json)
        |
        v
  Tier-1: Per-chunk annotation (concurrent LLM calls)
        |
        v
  Tier-1.5: Per-chat summarization
        |
        v
  Tier-2 Phase A: Identity model, contacts, groups, topics, memes (parallel)
        |
        v
  Tier-2 Phase B: Style rules, response policies (depends on Phase A)
        |
        v
  Structured JSON outputs + System prompts

Output files

File	Contents
`identity_model.json`	Core style, Big Five, context-dependent styles, vocabulary
`style_rules.json`	Executable rules with confidence, evidence, and context applicability
`contact_profiles.json`	Per-contact relationship and tone adjustment profiles
`group_profiles.json`	User's role and behavior within each group
`topic_graph.json`	Topics by type (expertise / discussion / avoided) with expression style
`response_policies.json`	Conditional triggers and behavioral boundaries
`memes.json`	Meme timing, grammar patterns, adoption style
`behavior_stats.json`	Quantitative behavioral statistics (non-LLM)
`style_examples.json`	Notable phrases grouped by chat
`core_prompt.md`	Compact system prompt for agent use (~100 lines)
`system_prompt.md`	Full system prompt with all data (reference)

Install

git clone https://github.com/kotodamai/telegram-persona.git
cd telegram-persona
python -m venv .venv
source .venv/bin/activate
pip install -e .

Configure

cp .env.example .env
# Edit .env with your API key and model preferences

Works with any OpenAI-compatible API endpoint.

Usage

Export your Telegram data

Open Telegram Desktop
Settings > Advanced > Export Telegram data
Select JSON format, include all chats you want analyzed
Wait for export, note the path to result.json

Run analysis

# Full run
telegram-persona path/to/result.json -o output/

# Only run Tier-1 annotation (useful for large datasets)
telegram-persona path/to/result.json --only-tier1

# Reuse cached Tier-1, only re-run Tier-2 modeling
telegram-persona path/to/result.json --only-tier2

# Re-render prompts from existing JSON (no API calls)
telegram-persona --render-only -o output/

# Estimate cost before running
telegram-persona path/to/result.json --estimate-cost

# Test with a small sample
telegram-persona path/to/result.json --sample 10

Filtering

# Date range
telegram-persona result.json --since 2024-01-01 --until 2024-12-31

# Specific chats only
telegram-persona result.json --include-chat "Alice" --include-chat "Work Group"

# Exclude chats
telegram-persona result.json --exclude-chat "Spam Group"

# Skip low-activity chats
telegram-persona result.json --min-messages 10

CLI options

positional arguments:
  export_path           Path to Telegram result.json

options:
  -o, --output          Output directory (default: output/)
  --tier1-model         Model for Tier-1 annotation
  --tier2-model         Model for Tier-2 distillation
  --base-url            OpenAI-compatible API base URL
  --api-key             API key
  --concurrency         Concurrent API calls for Tier-1 (default: 5)
  --api-timeout         Per-request timeout in seconds (default: 180)
  --api-max-retries     Max retries per API call (default: 3)
  --only-tier1          Only run Tier-1 annotation
  --only-tier2          Skip Tier-1, reuse cache
  --render-only         Re-render prompts from existing JSON
  --estimate-cost       Dry run: estimate token usage
  --clear-cache         Clear cached results before running
  --sample N            Only process N random chunks
  --since/--until       Date range filter (YYYY-MM-DD)
  --include-chat/--exclude-chat   Chat name filter (repeatable)
  --min-messages N      Skip chats with fewer user messages

How to use the output

The core_prompt.md is a compact system prompt meant to be loaded into an AI agent as its base persona. It contains only the essential style rules, vocabulary, and boundaries.

For context-specific data (contacts, groups, topics), load from the JSON files on demand:

import json

# Always loaded as system prompt
core_prompt = open("output/core_prompt.md").read()

# Injected when talking to a specific person
contacts = json.load(open("output/contact_profiles.json"))
if contact_name in contacts:
    context += format_contact(contacts[contact_name])

# Injected when in a specific group
groups = json.load(open("output/group_profiles.json"))
if group_name in groups:
    context += format_group(groups[group_name])

Caching

All LLM calls are cached in output/.cache/. Cache keys include content hash, model name, prompt version, and source file hash. Changing prompts or input data automatically invalidates relevant caches.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
src/telegram_persona		src/telegram_persona
templates		templates
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

telegram-persona

What it does

Output files

Install

Configure

Usage

Export your Telegram data

Run analysis

Filtering

CLI options

How to use the output

Caching

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

telegram-persona

What it does

Output files

Install

Configure

Usage

Export your Telegram data

Run analysis

Filtering

CLI options

How to use the output

Caching

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages