DiscordianAI features a sophisticated AI orchestration system with intelligent routing, conversation consistency, and three flexible operation modes:
- OpenAI Only - Use GPT-5 for all responses
- Perplexity Only - Use Sonar-Pro with web search for all responses
- Smart Hybrid - Automatically choose the best AI for each message with conversation consistency
DiscordianAI features an advanced conversation consistency system that ensures follow-up questions are routed to the same AI service that handled the initial query, maintaining context and coherence.
- Metadata Tracking: Every AI response includes metadata about which service generated it
- Follow-up Detection: Automatically detects when users ask follow-up questions using advanced pattern matching
- Consistent Routing: Routes follow-ups to the same AI service for conversational continuity
User: "What's the weather in Tokyo today?"
→ 🌐 Routed to Perplexity (time-sensitive, current data needed)
→ Response: "Tokyo is currently 22°C with light rain..." + metadata
User: "Tell me more about the forecast"
→ 🔄 Detected as follow-up, checks recent AI service used
→ 🌐 Routed to Perplexity for consistency
→ User gets detailed forecast from same service with full context
The system recognizes these follow-up indicators:
- Continuation: "continue", "more", "tell me more", "also", "furthermore"
- References: "what about", "how about", "regarding that"
- Connectors: "and", "but", "however" + question mark
- Responses: "yes, but...", "okay, and...", "no, what about..."
The following diagram shows how messages flow through the DiscordianAI system:
```mermaid
flowchart TD
    A[Discord Message Received] --> B{Bot Mentioned?}
    B -->|No| Z[Ignore Message]
    B -->|Yes| C[Rate Limit + Guardrails]
    C -->|Failed| D[Send Rate Limit Message]
    C -->|Passed| E[Smart Orchestrator Analysis]
    E --> L["Model Guard<br/>(GPT-5 + Sonar-Pro)"]
    L --> F{Mode Selector}
    F -->|OpenAI Only| G["OpenAI Processing<br/>(GPT-5 family)"]
    F -->|Perplexity Only| H["Perplexity Processing<br/>(Sonar models)"]
    F -->|Hybrid| I[Check Conversation Context]
    I --> J{Follow-up Detected?}
    J -->|Yes| K[Reuse Previous AI Service]
    J -->|No| M[Analyze Message Content]
    M --> S[Sanitize for Routing]
    S --> SI{Search Intent?}
    SI -->|Yes| H
    SI -->|No| N{URLs Detected?}
    N -->|Yes| H
    N -->|No| O{Time-sensitive or Entities?}
    O -->|Yes| H
    O -->|No| P{Conversational/Creative?}
    P -->|Yes| G
    P -->|No| Q{Factual Query?}
    Q -->|Yes| H
    Q -->|No| G
    K --> T{Previous Service}
    T -->|OpenAI| G
    T -->|Perplexity| H
    G --> R[GPT-5 Response Payload]
    H --> SC[Sonar-Pro Response + Citations]
    %% OpenAI response path includes in-line web-inability detection and fallback
    R --> OW[OpenAI Response]
    OW --> WI{Indicates web-inability?}
    WI -->|Yes| H
    WI -->|No| U[Message Processor Formatting]
    SC --> V[Citation Embed Formatter]
    U --> W[send_formatted_message]
    V --> W
    W --> X{Contains Embeds?}
    X -->|Yes| Y{"Embed > Limits?"}
    X -->|No| Z1{"Message > 2000 chars?"}
    Y -->|Yes| CC[Split Embed + Citations]
    Y -->|No| BB[✅ Response Complete]
    Z1 -->|Yes| DD[Split Plain Message]
    Z1 -->|No| BB
    CC --> BB
    DD --> BB
```
The diagram above shows the critical control-flow fix that prevents duplicate messages:
- Single Entry Point: All responses go through `send_formatted_message`
- Explicit Returns: Each successful send has a `return` statement to exit the function
- Embed Priority: Perplexity responses with citations try an embed first, falling back to a regular message only on failure
- No Fallthrough: Fixed the bug where embed success still continued into the regular-message logic

This ensures exactly one message is sent per user query, eliminating the duplicate-message issue.
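The fix can be sketched as follows. The helper behavior and the 2000-character limit handling are illustrative; only the control-flow shape (explicit returns, no fallthrough) mirrors the fix described above.

```python
DISCORD_CHAR_LIMIT = 2000  # Discord's plain-message length cap

async def send_formatted_message(channel, content, embed=None):
    if embed is not None:
        try:
            await channel.send(embed=embed)
            return  # explicit return: a successful embed send must not fall through
        except Exception:
            pass  # embed failed: fall back to a plain-text message below
    if len(content) <= DISCORD_CHAR_LIMIT:
        await channel.send(content)
        return
    # Oversized plain message: split into limit-sized chunks
    for start in range(0, len(content), DISCORD_CHAR_LIMIT):
        await channel.send(content[start:start + DISCORD_CHAR_LIMIT])
```

The buggy version lacked the first `return`, so a successful embed send continued into the plain-text branch and produced a second message.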
- `api_utils.validate_gpt_model()` rejects anything outside the GPT-5 family and logs actionable warnings.
- Perplexity calls only expose `model` and `max_tokens`; Sonar-Pro is the default and sampling is handled by the API.
- Temperature toggles were removed because GPT-5 and Sonar-Pro ignore external sampling overrides.
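A minimal sketch of such a model guard, assuming a simple prefix allow-list (the actual `api_utils.validate_gpt_model()` may validate differently):

```python
import logging

logger = logging.getLogger(__name__)

# Assumed allow-list: anything in the GPT-5 family passes
ALLOWED_PREFIXES = ("gpt-5",)
DEFAULT_MODEL = "gpt-5"

def validate_gpt_model(model: str) -> str:
    """Return the model if it is in the GPT-5 family, else warn and fall back."""
    if model.startswith(ALLOWED_PREFIXES):
        return model
    logger.warning(
        "Model %r is outside the GPT-5 family; falling back to %s",
        model, DEFAULT_MODEL,
    )
    return DEFAULT_MODEL
```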
The bot uses advanced semantic analysis to automatically determine when web search would be beneficial:
Uses Web Search When:
- Time-sensitive questions ("today", "recently", "latest")
- Factual queries about changeable information (weather, stocks, news)
- Questions about specific companies, people, or current events
- Longer, detailed questions that suggest need for current data
Uses Regular AI When:
- Creative requests (stories, poems, jokes)
- Technical/coding help
- Philosophical discussions
- Personal conversations and greetings
- General knowledge that doesn't change
Automatically Uses Perplexity:
- "What's happening in AI today?" ← Time-sensitive
- "How's Tesla's stock doing?" ← Financial data
- "What's the weather like?" ← Current information
- "Tell me about the recent SpaceX launch" ← Current events
Automatically Uses GPT-5:
- "Write me a poem about robots" ← Creative
- "How do I fix this Python error?" ← Technical
- "What do you think about philosophy?" ← Opinion/conversation
- "Hello! How are you?" ← Greeting
- Converts numbered citations [1], [2] into clickable Discord hyperlinks
- Format: `[1](https://full-url)` preserves the citation number for readability
- No more unclickable numbers in brackets!
- Automatically suppresses link previews when there are multiple citations
- Prevents chat from being cluttered with preview images
- Keeps messages readable and focused
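A minimal sketch of the citation rewrite, assuming the formatter receives the citation URLs as an ordered list (the function name is hypothetical; wrapping a URL in `<...>` is Discord's syntax for suppressing its link preview):

```python
import re

def link_citations(text: str, citations: list[str]) -> str:
    """Rewrite [1], [2], ... into Discord markdown hyperlinks."""
    suppress = len(citations) > 1  # multiple sources: hide preview embeds

    def repl(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        if index < 0 or index >= len(citations):
            return match.group(0)  # no URL for this number: leave as-is
        url = f"<{citations[index]}>" if suppress else citations[index]
        return f"[{match.group(1)}]({url})"

    return re.sub(r"\[(\d+)\]", repl, text)
```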
Uses both APIs and automatically chooses the best one:

```ini
# Both APIs configured
OPENAI_API_KEY=your_openai_api_key_here
PERPLEXITY_API_KEY=your_perplexity_api_key_here
```

Uses only GPT-5 for all responses:

```ini
# Only OpenAI configured
OPENAI_API_KEY=your_openai_api_key_here
PERPLEXITY_API_KEY=
```

Uses only Perplexity with web search for all responses:

```ini
# Only Perplexity configured
OPENAI_API_KEY=
PERPLEXITY_API_KEY=your_perplexity_api_key_here
```

Fine-tune the AI service selection behavior with these advanced configuration options in your config.ini:
```ini
[Orchestrator]
# AI Orchestrator Configuration for advanced routing and optimization

# How many recent messages to check for AI service consistency
LOOKBACK_MESSAGES_FOR_CONSISTENCY=6

# Maximum conversation entries per user before pruning
MAX_HISTORY_PER_USER=50

# How often to clean up inactive user locks (seconds) - default 1 hour
USER_LOCK_CLEANUP_INTERVAL=3600
```

| Setting | Default | Description |
|---|---|---|
| `LOOKBACK_MESSAGES_FOR_CONSISTENCY` | 6 | How many recent messages to check when determining AI service consistency for follow-ups |
| `MAX_HISTORY_PER_USER` | 50 | Maximum conversation entries stored per user (older entries are automatically pruned) |
| `USER_LOCK_CLEANUP_INTERVAL` | 3600 | How often to clean up memory from inactive users (in seconds) |

- Higher `LOOKBACK_MESSAGES_FOR_CONSISTENCY`: more accurate consistency but slightly more processing
- Higher `MAX_HISTORY_PER_USER`: better long-term context but more memory usage
- Lower `USER_LOCK_CLEANUP_INTERVAL`: more frequent memory cleanup but slight CPU overhead
OpenAI API Key (for GPT-5):
- Go to https://platform.openai.com/api-keys
- Create a new API key
Perplexity API Key (for web search):
- Go to https://www.perplexity.ai/settings/api
- Generate an API key
Copy config.ini.example to config.ini and toggle the keys for your desired mode:
- Hybrid: Provide both `OPENAI_API_KEY` and `PERPLEXITY_API_KEY`.
- OpenAI Only: Set `PERPLEXITY_API_KEY` blank.
- Perplexity Only: Set `OPENAI_API_KEY` blank.
Environment variables with the same names override config.ini, so production deployments can keep secrets out of the file system.
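A minimal sketch of that precedence, assuming a `configparser`-based loader (the helper name `get_setting` is hypothetical):

```python
import configparser
import os

def get_setting(config: configparser.ConfigParser, section: str, key: str,
                fallback: str = "") -> str:
    """Environment variable wins over config.ini, keeping secrets off disk."""
    env_value = os.environ.get(key)
    if env_value is not None:
        return env_value
    return config.get(section, key, fallback=fallback)
```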
Start your bot and try these examples:
- "Hello!" ← Uses GPT-5
- "What's the latest AI news?" ← Uses Perplexity
- "Tell me about today's weather" ← Uses Perplexity
- "How do I learn Python?" ← Uses GPT-5
The hybrid orchestrator (smart_orchestrator.py) provides bidirectional fallback between services:
- OpenAI failure → Perplexity: When OpenAI returns no response (quota exhaustion, auth errors, timeouts), the orchestrator automatically retries with Perplexity for web-enabled fallback.
- Perplexity failure → OpenAI: When Perplexity fails (network errors, timeouts), the orchestrator falls back to OpenAI for standard completion.
- Web-inability reroute: If OpenAI responds but indicates it cannot browse the web (e.g., "I can't browse the internet"), the orchestrator detects this and reroutes to Perplexity.
- Both unavailable: Returns a user-friendly error message ("All AI services are temporarily unavailable").
Both services share an identical retry policy via `retry_with_backoff`:
- 2 attempts max (1 initial + 1 retry) for transient errors (network blips, 429, 5xx)
- `insufficient_quota` and `API_TIMEOUT` errors bail immediately, with no wasted retries
- 2.0-4.0s jittered wait between retries to avoid a thundering herd
- SDK-level `max_retries=0` prevents nested retry multiplication
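A hedged sketch of that policy (the real `retry_with_backoff` may be an async decorator and classify errors differently; the error-tag matching here is illustrative):

```python
import random
import time

# Errors that should never be retried
NON_RETRYABLE = ("insufficient_quota", "API_TIMEOUT")

def retry_with_backoff(func, max_attempts=2, wait_range=(2.0, 4.0)):
    """Call func up to max_attempts times with a jittered wait between tries."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as err:
            if any(tag in str(err) for tag in NON_RETRYABLE):
                raise  # bail immediately: no wasted retries on quota/timeout
            last_error = err
            if attempt + 1 < max_attempts:
                time.sleep(random.uniform(*wait_range))  # jitter avoids herding
    raise last_error
```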
- GPT-5: Higher cost per token but excellent reasoning
- Perplexity: Lower cost for web search, includes citations
- The bot automatically chooses the most cost-effective option based on the query type
Check your logs to see the AI orchestration system in action:
```
INFO: Smart orchestrator processing message from user 123456789: What's the weather...
INFO: Running in hybrid mode - analyzing message for optimal routing
INFO: Message analysis suggests web search would be beneficial - trying Perplexity first
INFO: Perplexity response generated successfully (245 chars)
INFO: Smart orchestrator processing message from user 123456789: Tell me more...
INFO: Running in hybrid mode - analyzing message for optimal routing
DEBUG: Follow-up detected, using recent AI service: perplexity
INFO: Message analysis suggests web search would be beneficial - trying Perplexity first
INFO: Perplexity response successful in hybrid mode (312 chars)
DEBUG: Retrieved conversation for user 123456789: 8 messages
DEBUG: Generated conversation summary for user 123456789: 6 messages
INFO: Added assistant message for user 123456789: 245 chars, total history: 9 messages
INFO: Cleaned up 3 inactive user locks for memory optimization
DEBUG: Rate limit check successful for User123 (ID: 123456789)
```
DiscordianAI uses a robust thread-safe architecture that handles concurrent users safely:
- Per-User Locking: Each user has their own conversation lock to prevent race conditions
- Thread-Safe Conversation Management: All conversation operations are atomic and thread-safe
- Memory Management: Automatic cleanup of inactive user locks to prevent memory leaks
- Metadata Tracking: AI service information stored with each message for consistency
- Concurrent Users: Multiple users can chat simultaneously without data corruption
- Consistent State: Conversation state remains consistent even under high load
- Memory Efficiency: Automatic cleanup prevents memory bloat in long-running deployments
- Production Ready: Designed for high-availability production environments
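The per-user locking with inactivity cleanup can be sketched as below. The class name and structure are assumptions for illustration, not the actual DiscordianAI implementation.

```python
import asyncio
import time

class UserLockManager:
    """One asyncio.Lock per user, with cleanup of inactive entries."""

    def __init__(self, cleanup_after: float = 3600.0) -> None:
        self.cleanup_after = cleanup_after
        self._locks: dict[int, asyncio.Lock] = {}
        self._last_used: dict[int, float] = {}

    def get_lock(self, user_id: int) -> asyncio.Lock:
        """Return (creating if needed) the lock for this user."""
        self._last_used[user_id] = time.monotonic()
        return self._locks.setdefault(user_id, asyncio.Lock())

    def cleanup(self) -> int:
        """Drop locks for users inactive longer than cleanup_after; return count."""
        now = time.monotonic()
        stale = [uid for uid, used in self._last_used.items()
                 if now - used > self.cleanup_after and not self._locks[uid].locked()]
        for uid in stale:
            del self._locks[uid]
            del self._last_used[uid]
        return len(stale)
```

A handler would wrap each user's conversation update in `async with manager.get_lock(user_id):` so concurrent messages from the same user serialize while different users proceed in parallel.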
Adjust the AI service selection patterns in src/config.py:
```python
# Time-sensitive patterns (compiled in config.py)
TIME_SENSITIVITY_PATTERNS = [
    re.compile(r"\b(today|yesterday|this week|recently|latest|current)\b", re.IGNORECASE),
]

# Follow-up detection patterns (compiled in config.py)
FOLLOW_UP_PATTERNS = [
    re.compile(r"\b(continue|more|tell me more|also|furthermore)\b", re.IGNORECASE),
]
```

All configuration values can be overridden via environment variables:

```bash
export LOOKBACK_MESSAGES_FOR_CONSISTENCY=8
export MAX_HISTORY_PER_USER=100
```

- Monitor Memory: Watch the logs for cleanup messages to ensure memory management is working
- Adjust History: Increase `MAX_HISTORY_PER_USER` for better long-term context
- Tune Consistency: Adjust `LOOKBACK_MESSAGES_FOR_CONSISTENCY` based on your users' conversation patterns