Add match entity support (Big 5 leagues, 2014-2023) #3

@rahulkeerthi

Summary

Reep currently maps player, team, and coach IDs. This issue tracks adding match as a fourth entity type, starting with a community dataset of 16,330 Big 5 league matches (2014–2023).

Dataset

| Field | Details |
| --- | --- |
| Source | match_dictionary_2014_2023_public by u/consolationgoal |
| Matches | 16,330 |
| Period | Aug 2014 – May 2023 |
| Leagues | Premier League (3,420), La Liga (3,420), Serie A (3,419), Ligue 1 (3,318), Bundesliga (2,754) |
| ID coverage | 100% across all providers |

Provider IDs mapped

| Provider | Column | Notes |
| --- | --- | --- |
| FBref | MatchURL | URL contains match ID |
| FotMob | fotmob_match_id | Numeric |
| Understat | understat_match_id | Numeric |
| SofaScore | sofascore_match_id | Numeric |
| ESPN | espn_match_id | Numeric |
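
FBref is the only provider here whose ID arrives inside a URL, so ingestion would need to extract it. A minimal sketch, assuming FBref match URLs have the shape `https://fbref.com/en/matches/<8-character hex ID>/<slug>`. The function name `fbref_match_id` is hypothetical and the URL shape should be checked against the actual MatchURL column before relying on it.

```python
import re
from typing import Optional

# Assumed URL shape: .../matches/<8 hex characters>/<optional slug>
_FBREF_MATCH_RE = re.compile(r"/matches/([0-9a-f]{8})(?:/|$)")


def fbref_match_id(match_url: str) -> Optional[str]:
    """Return the match ID embedded in a MatchURL value, or None if absent."""
    found = _FBREF_MATCH_RE.search(match_url.strip().lower())
    return found.group(1) if found else None
```

Returning `None` rather than raising lets the ingestion step count and report rows whose URL does not match the assumed shape.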

Bonus metadata per match

  • Date, time, competition, round
  • Home/away team names (FBref convention)
  • Home/away xG, goals
  • Stadium lat/long

Wikidata coverage

Notable matches do have Wikidata items (e.g. Q754483, the 2005 Champions League Final), so the existing Wikidata pipeline can be applied to that subset. However, most regular league matches won't have Wikidata entries, which makes this a hybrid entity type: Wikidata-backed where available, community-dataset-backed for the rest.

This is the first entity type where the primary data source is not Wikidata: the community CSV provides the canonical ID mappings, and Wikidata supplements them with QIDs for notable matches.
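
To make the hybrid model concrete, here is a rough sketch of how the two sources could be combined: every match comes from the community CSV, and a QID is attached only where Wikidata has one. The names `MatchRecord`, `merge_sources` and `match_key` are hypothetical, and how a Wikidata item gets paired with a CSV row is left open.

```python
from typing import Dict, Iterable, List, Optional, TypedDict


class MatchRecord(TypedDict):
    match_key: str               # hypothetical internal key for a CSV row
    source: str                  # primary source of the ID mappings
    wikidata_qid: Optional[str]  # set only for matches Wikidata knows about


def merge_sources(
    community_keys: Iterable[str], qids_by_key: Dict[str, str]
) -> List[MatchRecord]:
    """Build one record per community match, adding a QID where one exists."""
    records: List[MatchRecord] = []
    for key in community_keys:
        records.append(
            {
                "match_key": key,
                "source": "community_csv",
                "wikidata_qid": qids_by_key.get(key),
            }
        )
    return records
```

The community dataset drives the loop, so a match without a Wikidata item is still kept, just with `wikidata_qid` left as `None`.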

Design considerations

  1. New entity type match — extends the existing player/team/coach model
  2. Schema: likely a matches table (date, competition, home/away team refs) + rows in external_ids with entity_type = 'match'
  3. Team linkage: home/away teams use FBref names — need to resolve these to existing Reep entity IDs
  4. Hybrid sourcing: community CSV as primary source; Wikidata SPARQL for notable matches (finals, derbies) to add QIDs where they exist
  5. API surface: new endpoints or extend existing ones (e.g. /lookup?provider=fotmob&id=1709697&type=match)
  6. Future expansion: dataset covers 2014–2023; could extend with newer seasons or additional leagues
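
Points 2 and 5 could look roughly like the sketch below. It is only a starting point for discussion: the column names, the shape of `external_ids`, the integer team references and the `lookup` helper are all assumptions (the real Reep schema is not shown in this issue), and SQLite is used only to keep the example runnable.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: every column name here is an assumption.
SCHEMA = """
CREATE TABLE matches (
    id            INTEGER PRIMARY KEY,
    match_date    TEXT NOT NULL,   -- ISO 8601 date
    competition   TEXT NOT NULL,
    home_team_id  INTEGER,         -- reference to an existing team entity
    away_team_id  INTEGER
);
CREATE TABLE external_ids (
    entity_type  TEXT NOT NULL,    -- 'player', 'team', 'coach' or 'match'
    entity_id    INTEGER NOT NULL,
    provider     TEXT NOT NULL,
    external_id  TEXT NOT NULL,
    PRIMARY KEY (entity_type, provider, external_id)
);
"""


def lookup(
    conn: sqlite3.Connection, provider: str, external_id: str, entity_type: str
) -> Optional[int]:
    """Resolve a provider ID to an internal entity ID, or None if unknown."""
    row = conn.execute(
        "SELECT entity_id FROM external_ids "
        "WHERE entity_type = ? AND provider = ? AND external_id = ?",
        (entity_type, provider, external_id),
    ).fetchone()
    return row[0] if row else None
```

Including `entity_type` in the primary key means a numeric ID that a provider reuses across players and matches cannot collide, which is the main argument for filtering on `type` in the `/lookup` example above.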

Tasks

  • Design matches table schema
  • Decide how matches fit into external_ids (new entity_type vs separate table)
  • Write ingestion script for the CSV
  • Resolve FBref team names → Reep entity IDs
  • Explore Wikidata SPARQL for notable match QIDs to supplement the dataset
  • Update Worker API to support match lookups
  • Update /stats endpoint to include match counts
  • Document match data in README
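
For the ingestion and team-resolution tasks, a first pass might look like the sketch below. The four numeric ID column names are taken from the table above; `home_team` and `away_team` are hypothetical column names, the row number stands in for whatever match key is eventually chosen, and FBref is left out because its ID has to be parsed from MatchURL.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# CSV column -> provider name (numeric ID columns listed in this issue).
PROVIDER_COLUMNS = {
    "fotmob_match_id": "fotmob",
    "understat_match_id": "understat",
    "sofascore_match_id": "sofascore",
    "espn_match_id": "espn",
}
HOME_COL = "home_team"  # hypothetical column name
AWAY_COL = "away_team"  # hypothetical column name


def ingest(
    lines: Iterable[str], team_ids: Dict[str, int]
) -> Tuple[List[Tuple[int, str, str]], List[str]]:
    """Return (row_no, provider, external_id) tuples and unresolved team names."""
    reader = csv.DictReader(lines)
    id_rows: List[Tuple[int, str, str]] = []
    unresolved = set()
    for row_no, row in enumerate(reader, start=1):
        # Collect FBref team names that have no known entity ID yet.
        for col in (HOME_COL, AWAY_COL):
            name = row[col].strip()
            if name not in team_ids:
                unresolved.add(name)
        # Emit one external-ID row per provider column that has a value.
        for column, provider in PROVIDER_COLUMNS.items():
            value = (row.get(column) or "").strip()
            if value:
                id_rows.append((row_no, provider, value))
    return id_rows, sorted(unresolved)
```

Collecting the unresolved names instead of failing on the first one gives a complete list to map by hand in a single pass.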
