Skip to content

Latest commit

Β 

History

History
561 lines (416 loc) Β· 18.1 KB

File metadata and controls

561 lines (416 loc) Β· 18.1 KB

Extension Challenges

You've built the base project. Now make it yours by extending it with new features.

These challenges are ordered by difficulty. Start with the easier ones to build confidence, then tackle the harder ones when you want to dive deeper.

Easy Challenges

Challenge 1: Add Shift-by-Words Support

What to build: Instead of shifting every letter by the same amount, shift the first letter by 1, second letter by 2, third by 3, etc. This is called an autokey variant.

Why it's useful: This breaks frequency analysis. The same plaintext letter encrypts to different ciphertext letters depending on position. Much stronger than basic Caesar.

What you'll learn:

  • How position-dependent encryption works
  • Why polyalphabetic ciphers are harder to crack
  • Modifying the core cipher loop

Hints:

  • Look at cipher.py:43-46 - You'll need to pass both char and its position to _shift_char()
  • Use enumerate() to get character positions
  • The key becomes (self.key + position) % ALPHABET_SIZE

Test it works:

caesar-cipher encrypt "AAA" --key 1
# Should give "ABC" not "BBB"

Challenge 2: Support Custom Alphabets

What to build: Let users specify their own alphabet, like "QWERTYUIOPASDFGHJKLZXCVBNM" (keyboard order) instead of "ABC...Z".

Why it's useful: Custom alphabets are used in actual historical ciphers (like cipher disk devices). This teaches you about cipher flexibility.

What you'll learn:

  • How to validate custom input
  • Why alphabet order doesn't affect Caesar's weakness
  • Interface design for optional parameters

Hints:

  • The CaesarCipher.__init__() already accepts an alphabet parameter in cipher.py:16
  • Add a --alphabet option in main.py
  • Validate that alphabet has 26 unique characters

Test it works:

caesar-cipher encrypt "HELLO" --key 1 --alphabet "ZYXWVUTSRQPONMLKJIHGFEDCBA"
# Should shift using reverse alphabet

Challenge 3: Add Statistics Display

What to build: A stats command that shows letter frequency distribution for any text, with a visual bar chart in the terminal.

Why it's useful: Visualizing frequency makes it obvious why frequency analysis works. You'll see the E and T spikes immediately.

What you'll learn:

  • Data visualization in the terminal
  • Using Rich's progress bars or custom formatting
  • Presenting statistical information clearly

Hints:

  • Use collections.Counter like analyzer.py:30
  • Rich can make bar charts with β–ˆ characters repeated N times
  • Sort letters by frequency to make patterns obvious

Test it works:

caesar-cipher stats "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"
# Should show E and O as most frequent

Intermediate Challenges

Challenge 4: Implement ROT13 Encoding Mode

What to build: A special rot13 command that's optimized for the common shift=13 case. Make it bidirectional (ROT13 of ROT13 is the original).

Real world application: ROT13 is actually used on Reddit, forums, and in email for spoilers. It's not security, just obfuscation.

What you'll learn:

  • Why shift=13 is special (it's its own inverse in a 26-letter alphabet)
  • Command aliasing in CLI tools
  • Specialized implementations vs general ones

Implementation approach:

  1. Add command to main.py

    • Files to create: None (add to existing file)
    • Files to modify: main.py (add @app.command() for rot13)
  2. Optimize for the special case

    • Since key is always 13, you can hardcode it
    • No need for separate encrypt/decrypt (same operation)
  3. Test edge cases:

    • What if text has numbers or punctuation?
    • Does it handle Unicode properly?

Hints:

  • ROT13 is just CaesarCipher(key=13)
  • The command can be simpler than encrypt/decrypt since no key argument
  • Make it work on stdin for piping: echo "SECRET" | caesar-cipher rot13

Extra credit: Support ROT47 which shifts all printable ASCII characters, not just letters.

Challenge 5: Add Dictionary-Based Ranking

What to build: Instead of just frequency analysis, check if the decrypted text contains valid English words from a word list. Rank candidates by number of recognized words.

Why it's useful: Frequency analysis can be fooled by short text or technical jargon. Dictionary checking is more robust for short messages.

What you'll learn:

  • Working with word lists (use /usr/share/dict/words on Unix or download one)
  • Combining multiple scoring methods
  • Performance optimization (loading and searching word lists)

Implementation approach:

  1. Load word list into a set for O(1) lookup

    • Create dictionary.py with a load_words() function
    • Store as a set for fast word in dictionary checks
  2. Score by word matches

    • Split decrypted text into words
    • Count how many are in the dictionary
    • Normalize by total words to get a percentage
  3. Combine with frequency scoring

    • Weight dictionary score and frequency score
    • Tune weights: maybe 70% dictionary, 30% frequency

Hints:

  • Download word list: curl https://github.com/dwyl/english-words/raw/master/words_alpha.txt > words.txt
  • Case normalize: word.lower() in dictionary
  • Handle punctuation: text.replace('!', '').replace('?', '').split()

Challenge 6: Crack Multi-Language Ciphers

What to build: Support cracking ciphers in languages other than English by loading different frequency tables. Add Spanish, French, German support.

Real world application: Frequency analysis is universal. Every language has characteristic letter patterns. Spanish uses Γ‘, German uses Γ„/Γ–/Ü, French has accents.

What you'll learn:

  • How frequency distributions differ across languages
  • Unicode handling in Python
  • Designing multi-language support

Implementation:

  1. Add frequency tables to constants.py

    • Research Spanish letter frequencies online
    • Store in dicts like SPANISH_LETTER_FREQUENCIES
  2. Update analyzer to accept language parameter

   analyzer = FrequencyAnalyzer(language='spanish')
  1. CLI support
   caesar-cipher crack "SADDW FW UYMJYJ" --language spanish

Gotchas:

  • Extended alphabets (Spanish has 27 letters with Γ‘)
  • Case sensitivity for accented characters
  • Do you normalize 'Γ‰' to 'E' or treat them separately?

Advanced Challenges

Challenge 7: Build a Vigenère Cipher Implementation

What to build: A polyalphabetic cipher that uses a keyword to determine different shifts for different positions. Much harder to break than Caesar.

Why this is hard: Vigenère was called "le chiffre indéchiffrable" (the indecipherable cipher) for centuries. Breaking it requires finding the key length first, then frequency analysis on each position.

What you'll learn:

  • Polyalphabetic substitution
  • Kasiski examination for finding key length
  • Index of coincidence statistical test
  • Multi-step cryptanalysis

Architecture changes needed:

Add to cipher layer:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  VigenereCipher     β”‚
β”‚  - repeating key    β”‚
β”‚  - position logic   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Add to analysis layer:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  VigenereAnalyzer   β”‚
β”‚  - Kasiski method   β”‚
β”‚  - IC calculation   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Implementation steps:

  1. Research phase

    • Read about VigenΓ¨re cipher algorithm
    • Understand Kasiski examination
    • Look at index of coincidence formula
  2. Design phase

    • Should VigenereCipher inherit from CaesarCipher? Probably not, different enough.
    • How to handle key wrapping when key is shorter than text?
    • Store key as string or list of shifts?
  3. Implementation phase

    • Start with encryption: repeat key to match text length
    • Add decryption: same but subtract shifts
    • Add Kasiski exam: find repeated sequences, calculate GCD of distances
  4. Testing phase

    • Unit test encryption with various key lengths
    • Test with known ciphertext (VigenΓ¨re challenges online)
    • Benchmark: how long to crack 100-char message?

Gotchas:

  • Key "CAT" means shifts of [2, 0, 19] (C=2, A=0, T=19)
  • Non-letters shouldn't advance key position
  • Cracking needs at least 100-200 characters of ciphertext

Resources:

  • Wikipedia: VigenΓ¨re cipher - Good overview of algorithm
  • "Breaking the VigenΓ¨re Cipher" paper - Detailed cryptanalysis methods

Challenge 8: Implement Frequency Analysis Visualization

What to build: A web-based tool that shows live frequency charts as you type ciphertext. Use Flask for backend, Chart.js for visualization.

Estimated time: 1-2 days including learning frontend stuff.

Prerequisites: You should have completed the statistics display challenge first. This builds on that concept.

What you'll learn:

  • Building web UIs for crypto tools
  • Real-time data visualization
  • Connecting Python analysis to JavaScript charts

Planning this feature:

Before you code, think through:

  • How does frontend send text to backend? (POST request with JSON)
  • How fast can you compute frequencies for 10kb of text? (Should be instant)
  • Do you need websockets or is HTTP enough? (HTTP fine for this)

High level architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Browser    │◄─────────   Flask      β”‚
β”‚  Chart.js    β”‚  JSON   β”‚  analyzer.py β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β–²                          β”‚
      β”‚                          β”‚
      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            Frequencies

Implementation phases:

Phase 1: Flask Backend (2-3 hours)

  • Create web.py with Flask app
  • Add /analyze endpoint that takes text, returns frequencies
  • Reuse FrequencyAnalyzer from existing code

Phase 2: Frontend (3-4 hours)

  • HTML page with textarea for ciphertext
  • JavaScript to send text on keyup
  • Chart.js bar chart to display frequencies

Phase 3: Add Caesar Decryption (2-3 hours)

  • Button to crack the ciphertext
  • Display all 26 candidates with scores
  • Highlight best match

Phase 4: Polish (2-3 hours)

  • Add loading indicators
  • Show character count
  • Dark mode toggle

Testing strategy:

  • Manual testing: Type various ciphertexts, verify charts update
  • Check with very long text (10kb+) to ensure no lag
  • Test on different browsers

Known challenges:

  1. Chart redrawing performance

    • Problem: Recreating chart on every keystroke is slow
    • Hint: Update chart data instead of destroying and recreating
  2. Handling empty input

    • Problem: Empty text causes division by zero
    • Hint: Return empty array when text is empty

Success criteria: Your implementation should:

  • Update frequency chart in real-time as you type
  • Show all 26 crack candidates with scores
  • Handle 10,000 character inputs smoothly
  • Work on mobile browsers
  • Display within 100ms of text input

Mix and Match

Combine features for bigger projects:

Project Idea 1: Multi-Cipher Tool

  • Combine Challenge 7 (VigenΓ¨re) + Challenge 6 (multi-language)
  • Add Playfair cipher
  • Result: Swiss Army knife for classical cryptography

Project Idea 2: Cryptanalysis Suite

  • Combine Challenge 5 (dictionary) + Challenge 4 (ROT13) + Challenge 8 (visualization)
  • Add automated decryption (try all methods)
  • Result: Tool that takes any ciphertext and figures out what cipher was used

Real World Integration Challenges

Integrate with Online Cipher Challenges

The goal: Make the tool able to fetch ciphertext from CryptoPals or other online CTF challenges and automatically crack them.

What you'll need:

  • HTTP client (requests library)
  • Parsing HTML or JSON responses
  • Handling rate limits

Implementation plan:

  1. Add --url option to crack command
  2. Fetch ciphertext from URL
  3. Run crack and submit answer back

Watch out for:

  • CAPTCHA on challenge sites
  • Different encoding (base64, hex)
  • Throttling after too many requests

Deploy as a Telegram Bot

The goal: Let users encrypt/decrypt messages via Telegram chat.

What you'll learn:

  • Building chatbots
  • Stateless conversation handling
  • API integration

Steps:

  1. Register bot with BotFather on Telegram
  2. Use python-telegram-bot library
  3. Parse commands: /encrypt <text> <key>
  4. Return formatted results

Production checklist:

  • Handle errors gracefully (show user-friendly messages)
  • Rate limit per user (prevent spam)
  • Log usage for debugging
  • Deploy to server (Heroku, AWS Lambda)

Performance Challenges

Challenge: Handle 1MB Files Instantly

The goal: Make crack command work on huge files without noticeable delay.

Current bottleneck: Frequency analysis runs on full text 26 times. For 1MB, that's 26MB of processing.

Optimization approaches:

Approach 1: Sample Instead of Full Analysis

  • How: Only analyze first 10,000 characters
  • Gain: 100x speedup on large files
  • Tradeoff: Less accurate if beginning isn't representative

Approach 2: Parallel Processing

  • How: Use multiprocessing to test all 26 shifts simultaneously
  • Gain: Near-linear speedup with CPU cores
  • Tradeoff: Overhead for small files makes it slower

Approach 3: Optimize Chi-Squared Calculation

  • How: Cache expected frequencies, use numpy for math
  • Gain: 2-3x speedup
  • Tradeoff: Adds numpy dependency

Benchmark it:

# Generate large file
python -c "import random, string; print(''.join(random.choices(string.ascii_uppercase, k=1000000)))" > large.txt

# Time it
time caesar-cipher crack --input-file large.txt

Target metrics:

  • 1MB file: Under 1 second
  • 10MB file: Under 10 seconds

Challenge: Reduce Memory Usage

The goal: Crack huge files without loading them entirely into memory.

Profile first:

python -m memory_profiler main.py crack --input-file huge.txt

Common optimization areas:

  • Streaming file read instead of read_text()
  • Generator expressions instead of lists
  • Don't store all 26 full decryptions, just the letter frequencies

Security Challenges

Challenge: Add HMAC for Message Authentication

What to implement: Generate an HMAC (keyed hash) alongside the ciphertext so recipients can verify messages weren't tampered with.

Threat model: This protects against:

  • Message modification (attacker changing ciphertext)
  • Forgery (attacker creating fake messages)

Implementation:

import hmac
import hashlib

def encrypt_and_mac(plaintext: str, key: int, mac_key: bytes) -> tuple[str, str]:
    cipher = CaesarCipher(key=key)
    ciphertext = cipher.encrypt(plaintext)
    mac = hmac.new(mac_key, ciphertext.encode(), hashlib.sha256).hexdigest()
    return ciphertext, mac

Testing the security:

  • Try to modify ciphertext without updating MAC
  • Attempt to forge MAC with wrong key
  • Verify legitimate messages pass validation

Challenge: Implement Timing Attack Protection

The goal: Make decryption take constant time regardless of whether key is correct. Prevents attackers from using timing to guess keys.

Threat model: Currently, incorrect keys might fail faster than correct ones (early exit on validation). An attacker measuring response times could exploit this.

Remediation:

  • Always decrypt completely even if you know it's wrong
  • Add random delay to normalize timing
  • Use hmac.compare_digest() for MAC comparison (constant time)

Contribution Ideas

Finished a challenge? Share it back:

  1. Fork the repo
  2. Implement your extension in a branch like feature/vigenere-cipher
  3. Document it - Add to learn folder showing how your extension works
  4. Submit a PR with:
    • Your implementation
    • Tests covering edge cases
    • Documentation in learn/
    • Example usage in README

Good extensions might get merged into the main project.

Challenge Yourself Further

Build Something New

Use the concepts you learned here to build:

  • Playfair cipher - Digraph substitution using 5x5 key square
  • Bifid cipher - Combines substitution and transposition
  • Four-square cipher - Uses four 5x5 matrices
  • Hill cipher - Matrix multiplication based polyalphabetic cipher

Study Real Implementations

Compare your implementation to production tools:

  • CyberChef - How do they structure multi-cipher support?
  • John the Ripper - How do they optimize brute force?
  • Cryptool - How do they present educational content?

Read their code, understand their tradeoffs, steal their good ideas.

Write About It

Document your extension:

  • Blog post: "How I Built a VigenΓ¨re Cracker"
  • Tutorial: "Breaking Classical Ciphers with Python"
  • Comparison: "Caesar vs VigenΓ¨re vs Enigma: Complexity Analysis"

Teaching others is the best way to verify you understand it.

Getting Help

Stuck on a challenge?

  1. Debug systematically

    • What did you expect? "Frequency analysis should rank 'HELLO' first"
    • What actually happened? "It ranked gibberish higher"
    • Smallest test case? "Input: 'KHOOR', Expected shift: 3, Got shift: 7"
  2. Read the existing code

    • The analyzer uses chi-squared - how does that work?
    • Look at test_analyzer.py - what cases are tested?
  3. Search for similar problems

    • Google: "python frequency analysis not working short text"
    • StackOverflow: [cryptography] [python] tags
  4. Ask for help

    • Post in GitHub discussions
    • Include: what you tried, what happened, what you expected
    • Show code: "Here's my _shift_char() modification, it doesn't wrap correctly"

Challenge Completion

Track your progress:

  • Easy Challenge 1: Autokey variant
  • Easy Challenge 2: Custom alphabets
  • Easy Challenge 3: Stats display
  • Intermediate Challenge 4: ROT13 mode
  • Intermediate Challenge 5: Dictionary ranking
  • Intermediate Challenge 6: Multi-language
  • Advanced Challenge 7: VigenΓ¨re cipher
  • Advanced Challenge 8: Web visualization

Completed all of them? You've mastered classical cryptography. Time to learn modern cryptography (AES, RSA, elliptic curves) or contribute back to this project with your extensions.