Smart AI Mode: Advanced GPT-5 + Perplexity Integration

Overview

DiscordianAI features a sophisticated AI orchestration system with intelligent routing, conversation consistency, and three flexible operation modes:

OpenAI Only - Use GPT-5 for all responses
Perplexity Only - Use Sonar-Pro with web search for all responses
Smart Hybrid - Automatically choose the best AI for each message with conversation consistency

AI Service Consistency System 🔄

DiscordianAI features an advanced conversation consistency system that ensures follow-up questions are routed to the same AI service that handled the initial query, maintaining context and coherence.

How It Works

Metadata Tracking: Every AI response includes metadata about which service generated it
Follow-up Detection: Automatically detects when users ask follow-up questions using advanced pattern matching
Consistent Routing: Routes follow-ups to the same AI service for conversational continuity

Example Flow

User: "What's the weather in Tokyo today?"
→ 🌐 Routed to Perplexity (time-sensitive, current data needed)
→ Response: "Tokyo is currently 22°C with light rain..." + metadata

User: "Tell me more about the forecast"  
→ 🔄 Detected as follow-up, checks recent AI service used
→ 🌐 Routed to Perplexity for consistency
→ User gets detailed forecast from same service with full context

Follow-up Detection Patterns

The system recognizes these follow-up indicators:

Continuation: "continue", "more", "tell me more", "also", "furthermore"
References: "what about", "how about", "regarding that"
Connectors: "and", "but", "however" + question mark
Responses: "yes, but...", "okay, and...", "no, what about..."

Message Flow Architecture

The following diagram shows how messages flow through the DiscordianAI system:

flowchart TD
    A[Discord Message Received] --> B{Bot Mentioned?}
    B -->|No| Z[Ignore Message]
    B -->|Yes| C[Rate Limit + Guardrails]
    C -->|Failed| D[Send Rate Limit Message]
    C -->|Passed| E[Smart Orchestrator Analysis]

    E --> L[Model Guard\n(GPT-5 + Sonar-Pro)]
    L --> F{Mode Selector}

    F -->|OpenAI Only| G[OpenAI Processing\n(GPT-5 family)]
    F -->|Perplexity Only| H[Perplexity Processing\n(Sonar models)]
    F -->|Hybrid| I[Check Conversation Context]

    I --> J{Follow-up Detected?}
    J -->|Yes| K[Reuse Previous AI Service]
    J -->|No| M[Analyze Message Content]

    M --> S[Sanitize for routing]
    S --> SI{Search Intent?}
    SI -->|Yes| H
    SI -->|No| N{URLs Detected?}
    N -->|Yes| H
    N -->|No| O{Time-sensitive or Entities?}
    O -->|Yes| H
    O -->|No| P{Conversational/Creative?}
    P -->|Yes| G
    P -->|No| Q{Factual Query?}
    Q -->|Yes| H
    Q -->|No| G

    K --> T{Previous Service}
    T -->|OpenAI| G
    T -->|Perplexity| H

    G --> R[GPT-5 Response Payload]
    H --> S[Sonar-Pro Response + Citations]

    %% OpenAI response path includes in-line web-inability detection and fallback
    R --> OW[OpenAI Response]
    OW --> WI{Indicates web-inability?}
    WI -->|Yes| H
    WI -->|No| U[Message Processor Formatting]

    S --> V[Citation Embed Formatter]

    U --> W[send_formatted_message]
    V --> W

    W --> X{Contains Embeds?}
    X -->|Yes| Y{Embed > Limits?}
    X -->|No| Z1{Message > 2000 chars?}

    Y --> CC[Split Embed + Citations]
    Z1 --> DD[Split Plain Message]
    CC --> BB[✅ Response Complete]
    DD --> BB

Key Anti-Duplication Features

The diagram above shows the critical control flow fix that prevents duplicate messages:

Single Entry Point: All responses go through send_formatted_message
Explicit Returns: Each successful send has a return statement to exit the function
Embed Priority: Perplexity responses with citations try embed first, fallback to regular message only on failure
No Fallthrough: Fixed the bug where embed success still continued to regular message logic

This ensures exactly one message is sent per user query, eliminating the duplicate message issue.

Model Guardrails

api_utils.validate_gpt_model() rejects anything outside the GPT-5 family and logs actionable warnings.
Perplexity calls only expose model and max_tokens; Sonar-Pro is the default and sampling is handled by the API.
Temperature toggles were removed because GPT-5 and Sonar-Pro ignore external sampling overrides.

Smart Detection (No Manual Triggers Required!)

The bot uses advanced semantic analysis to automatically determine when web search would be beneficial:

Smart Detection Criteria

Uses Web Search When:

Time-sensitive questions ("today", "recently", "latest")
Factual queries about changeable information (weather, stocks, news)
Questions about specific companies, people, or current events
Longer, detailed questions that suggest need for current data

Uses Regular AI When:

Creative requests (stories, poems, jokes)
Technical/coding help
Philosophical discussions
Personal conversations and greetings
General knowledge that doesn't change

Example Auto-Detection

Automatically Uses Perplexity:

"What's happening in AI today?" ← Time-sensitive
"How's Tesla's stock doing?" ← Financial data
"What's the weather like?" ← Current information
"Tell me about the recent SpaceX launch" ← Current events

Automatically Uses GPT-5:

"Write me a poem about robots" ← Creative
"How do I fix this Python error?" ← Technical
"What do you think about philosophy?" ← Opinion/conversation
"Hello! How are you?" ← Greeting

Discord Citation Features

Proper Hyperlinks

Converts numbered citations [1], [2] into clickable Discord hyperlinks
Format: [1](https://full-url) - preserves the citation number for readability
No more unclickable numbers in brackets!

Smart Embed Suppression

Automatically suppresses link previews when there are multiple citations
Prevents chat from being cluttered with preview images
Keeps messages readable and focused

Configuration Modes

Mode 1: Smart Hybrid (Recommended)

Uses both APIs and automatically chooses the best one:

# Both APIs configured
OPENAI_API_KEY=your_openai_api_key_here
PERPLEXITY_API_KEY=your_perplexity_api_key_here

Mode 2: OpenAI Only

Uses only GPT-5 for all responses:

# Only OpenAI configured
OPENAI_API_KEY=your_openai_api_key_here
PERPLEXITY_API_KEY=

Mode 3: Perplexity Only

Uses only Perplexity with web search for all responses:

# Only Perplexity configured  
OPENAI_API_KEY=
PERPLEXITY_API_KEY=your_perplexity_api_key_here

Advanced Configuration: AI Orchestrator Settings ⚙️

Fine-tune the AI service selection behavior with these advanced configuration options in your config.ini:

[Orchestrator]
# AI Orchestrator Configuration for advanced routing and optimization
# How many recent messages to check for AI service consistency
LOOKBACK_MESSAGES_FOR_CONSISTENCY=6

# Maximum conversation entries per user before pruning
MAX_HISTORY_PER_USER=50

# How often to clean up inactive user locks (seconds) - default 1 hour
USER_LOCK_CLEANUP_INTERVAL=3600

Configuration Explained

Setting	Default	Description
`LOOKBACK_MESSAGES_FOR_CONSISTENCY`	6	How many recent messages to check when determining AI service consistency for follow-ups
`MAX_HISTORY_PER_USER`	50	Maximum conversation entries stored per user (older entries are automatically pruned)
`USER_LOCK_CLEANUP_INTERVAL`	3600	How often to clean up memory from inactive users (in seconds)

Performance Impact

Higher LOOKBACK_MESSAGES_FOR_CONSISTENCY: More accurate consistency but slightly more processing
Higher MAX_HISTORY_PER_USER: Better long-term context but more memory usage
Lower USER_LOCK_CLEANUP_INTERVAL: More frequent memory cleanup but slight CPU overhead

Setup Instructions

1. Get API Keys (as needed)

OpenAI API Key (for GPT-5):

Go to https://platform.openai.com/api-keys
Create a new API key

Perplexity API Key (for web search):

Go to https://www.perplexity.ai/settings/api
Generate an API key

2. Choose Your Configuration

Copy config.ini.example to config.ini and toggle the keys for your desired mode:

Hybrid: Provide both OPENAI_API_KEY and PERPLEXITY_API_KEY.
OpenAI Only: Set PERPLEXITY_API_KEY blank.
Perplexity Only: Set OPENAI_API_KEY blank.

Environment variables with the same names override config.ini, so production deployments can keep secrets out of the file system.

3. Test It

Start your bot and try these examples:

"Hello!" ← Uses GPT-5
"What's the latest AI news?" ← Uses Perplexity
"Tell me about today's weather" ← Uses Perplexity
"How do I learn Python?" ← Uses GPT-5

Fallback Behavior

The hybrid orchestrator (smart_orchestrator.py) provides bidirectional fallback between services:

OpenAI failure → Perplexity: When OpenAI returns no response (quota exhaustion, auth errors, timeouts), the orchestrator automatically retries with Perplexity for web-enabled fallback.
Perplexity failure → OpenAI: When Perplexity fails (network errors, timeouts), the orchestrator falls back to OpenAI for standard completion.
Web-inability reroute: If OpenAI responds but indicates it cannot browse the web (e.g., "I can't browse the internet"), the orchestrator detects this and reroutes to Perplexity.
Both unavailable: Returns a user-friendly error message ("All AI services are temporarily unavailable").

Both services share identical retry policy via retry_with_backoff:

2 attempts max (1 initial + 1 retry) for transient errors (network blips, 429, 5xx)
insufficient_quota and API_TIMEOUT errors bail immediately — no wasted retries
2.0–4.0s jittered wait between retries to avoid thundering herd
SDK-level max_retries=0 prevents nested retry multiplication

Cost Considerations

GPT-5: Higher cost per token but excellent reasoning
Perplexity: Lower cost for web search, includes citations
The bot automatically chooses the most cost-effective option based on the query type

Monitoring & Debugging 🔍

Check your logs to see the AI orchestration system in action:

AI Service Selection Logs

INFO: Smart orchestrator processing message from user 123456789: What's the weather...
INFO: Running in hybrid mode - analyzing message for optimal routing
INFO: Message analysis suggests web search would be beneficial - trying Perplexity first
INFO: Perplexity response generated successfully (245 chars)

Consistency System Logs

INFO: Smart orchestrator processing message from user 123456789: Tell me more...
INFO: Running in hybrid mode - analyzing message for optimal routing
DEBUG: Follow-up detected, using recent AI service: perplexity
INFO: Message analysis suggests web search would be beneficial - trying Perplexity first
INFO: Perplexity response successful in hybrid mode (312 chars)

Conversation Management

DEBUG: Retrieved conversation for user 123456789: 8 messages
DEBUG: Generated conversation summary for user 123456789: 6 messages
INFO: Added assistant message for user 123456789: 245 chars, total history: 9 messages

Performance Monitoring

INFO: Cleaned up 3 inactive user locks for memory optimization
DEBUG: Rate limit check successful for User123 (ID: 123456789)

Thread-Safe Architecture 🔒

DiscordianAI uses a robust thread-safe architecture that handles concurrent users safely:

Features

Per-User Locking: Each user has their own conversation lock to prevent race conditions
Thread-Safe Conversation Management: All conversation operations are atomic and thread-safe
Memory Management: Automatic cleanup of inactive user locks to prevent memory leaks
Metadata Tracking: AI service information stored with each message for consistency

Benefits

Concurrent Users: Multiple users can chat simultaneously without data corruption
Consistent State: Conversation state remains consistent even under high load
Memory Efficiency: Automatic cleanup prevents memory bloat in long-running deployments
Production Ready: Designed for high-availability production environments

Customization & Advanced Usage 🛠️

Detection Pattern Customization

Adjust the AI service selection patterns in src/config.py:

# Time-sensitive patterns (compiled in config.py)
TIME_SENSITIVITY_PATTERNS = [
    re.compile(r"\b(today|yesterday|this week|recently|latest|current)\b", re.IGNORECASE),
]

# Follow-up detection patterns (compiled in config.py)
FOLLOW_UP_PATTERNS = [
    re.compile(r"\b(continue|more|tell me more|also|furthermore)\b", re.IGNORECASE),
]

Environment Variable Override

All configuration values can be overridden via environment variables:

export LOOKBACK_MESSAGES_FOR_CONSISTENCY=8
export MAX_HISTORY_PER_USER=100

Production Deployment Tips

Monitor Memory: Watch the logs for cleanup messages to ensure memory management is working
Adjust History: Increase MAX_HISTORY_PER_USER for better long-term context
Tune Consistency: Adjust LOOKBACK_MESSAGES_FOR_CONSISTENCY based on your user conversation patterns

Uh oh!

FilesExpand file tree

HybridMode.md

Latest commit

History