A C# .NET service that scrapes theater calendars from venues across the Kansas City Metro area and aggregates them into a single calendar.
- Multi-venue Scraping: Scrapes dozens of theater websites in the KC Metro area
- Flexible Architecture: Modular scraper system that can handle different website structures
- Calendar Export: Generates iCalendar (.ics) files compatible with Google Calendar, Outlook, and other calendar apps
- Scheduling: Runs automatically every 6 hours to keep event data current
- JSON Export: Also saves detailed event data in JSON format
- Statistics: Generates scraping statistics and venue analytics
- Command Line Interface: Can be run manually or as a scheduled service
The scraper is configured to work with major Kansas City theater venues including:
- Kauffman Center for the Performing Arts
- Kansas City Repertory Theatre
- The Unicorn Theatre
- Kansas City Young Audiences
- The Coterie Theatre
- Starlight Theatre
- Music Hall Kansas City
- The Folly Theater
- Uptown Theater
- KC Melting Pot Theatre
- The Living Room Theatre
- Quality Hill Playhouse
- Theatre in the Park
- New Theatre Restaurant
- Johnson County Community College Theatre
- .NET 8.0 SDK or later
- Windows, macOS, or Linux
-
Clone the repository:
git clone <repository-url> cd KCTheaterScraper
-
Restore dependencies:
dotnet restore
-
Build the project:
dotnet build
# Run scraping once and exit
dotnet run scrape
# Test all configured venues
dotnet run test
# List all configured venues
dotnet run list
# Show help
dotnet run help
# Run as a scheduled service (default)
dotnet runEdit appsettings.json to:
- Add or modify venue configurations
- Adjust scraping settings (delays, timeouts, etc.)
- Change output directory and file names
Example venue configuration:
{
"Name": "Example Theater",
"Url": "https://example-theater.com/events",
"Address": "123 Main St, Kansas City, MO 64111",
"ScraperType": "generic",
"IsActive": true,
"ScraperConfig": {
"EventSelector": ".event-item",
"TitleSelector": "h3.title",
"DateSelector": ".date"
}
}The scraper generates several output files in the configured output directory:
kc-theater-events.ics- iCalendar file with all eventskc-theater-events.json- Detailed event data in JSON formatscraping-statistics.json- Statistics about the scraping runlogs/- Log files with detailed scraping information
-
Specific Scrapers: Custom scrapers for venues with known website structures
KauffmanCenterScraperKCRepScraper
-
Generic Scraper: Fallback scraper that attempts to extract events from any theater website using common HTML patterns
- ScrapingService: Orchestrates the scraping process across all venues
- CalendarService: Handles iCalendar generation and event filtering
- ScheduledScrapingService: Background service that runs scraping on a schedule
- TheaterEvent: Represents a single theater performance with all relevant details
- TheaterVenue: Configuration for a theater venue including scraping parameters
- Add the venue configuration to
appsettings.jsonin theVenuesarray - Set
ScraperTypeto "generic" for most venues - For complex sites, create a custom scraper implementing
ITheaterScraper - Test the new venue with
dotnet run test
The application uses Serilog for logging with output to:
- Console (for immediate feedback)
- File (rolling daily logs in the
logs/directory)
Log levels can be configured in appsettings.json.
KCTheaterScraper/
├── Configuration/ # Configuration classes
├── Models/ # Data models
├── Scrapers/ # Scraper implementations
├── Services/ # Business logic services
├── appsettings.json # Configuration file
└── Program.cs # Application entry point
To create a scraper for a specific venue:
- Inherit from
BaseTheaterScraper - Implement the required methods:
CanScrape()- Determines if this scraper can handle a venueScrapeEventsAsync()- Extracts events from the venue's website
- Register the scraper in
Program.cs
Example:
public class CustomVenueScraper : BaseTheaterScraper
{
public override string ScraperName => "Custom Venue";
public override bool CanScrape(TheaterVenue venue)
{
return venue.ScraperType == "custom" ||
venue.Url.Contains("customvenue.com");
}
public override async Task<List<TheaterEvent>> ScrapeEventsAsync(
TheaterVenue venue, CancellationToken cancellationToken = default)
{
// Implementation here
}
}- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational and personal use. Please respect the terms of service of the websites being scraped and implement appropriate delays and rate limiting. The authors are not responsible for any misuse of this software.