Summary
Reep currently maps player, team, and coach IDs. This issue tracks adding match as a fourth entity type, starting with a community dataset of 16,330 Big 5 league matches (2014–2023).
Dataset
| Field | Details |
| --- | --- |
| Source | `match_dictionary_2014_2023_public` by u/consolationgoal |
| Matches | 16,330 |
| Period | Aug 2014 – May 2023 |
| Leagues | Premier League (3,420), La Liga (3,420), Serie A (3,419), Ligue 1 (3,318), Bundesliga (2,754) |
| ID coverage | 100% across all providers |
Provider IDs mapped
| Provider | Column | Notes |
| --- | --- | --- |
| FBref | `MatchURL` | URL contains match ID |
| FotMob | `fotmob_match_id` | Numeric |
| Understat | `understat_match_id` | Numeric |
| SofaScore | `sofascore_match_id` | Numeric |
| ESPN | `espn_match_id` | Numeric |
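
Since FBref is the only provider whose ID is embedded in a URL rather than stored as a plain column value, the import step would need to extract it. A minimal sketch, assuming FBref match URLs follow the shape `/en/matches/<8-character hex ID>/<slug>`; the function name and the example URL below are hypothetical, and the pattern should be checked against the actual `MatchURL` values in the CSV:

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shape: /<lang>/matches/<8 hex chars>/<optional slug>
_FBREF_MATCH_PATH = re.compile(r"^/[a-z]{2}/matches/([0-9a-f]{8})(?:/|$)")


def fbref_match_id(match_url: str) -> Optional[str]:
    """Return the match ID embedded in an FBref match URL, or None if absent."""
    path = urlparse(match_url).path
    found = _FBREF_MATCH_PATH.match(path)
    return found.group(1) if found else None


# Hypothetical URL, for illustration only (not taken from the dataset).
example_url = "https://fbref.com/en/matches/0a1b2c3d/Example-Home-Example-Away-Premier-League"
example_id = fbref_match_id(example_url)
```

Returning `None` instead of raising lets the importer count and report rows whose URL does not fit the assumed pattern.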
Bonus metadata per match
- Date, time, competition, round
- Home/away team names (FBref convention)
- Home/away xG, goals
- Stadium lat/long
Wikidata coverage
Notable matches do have Wikidata items (e.g. Q754483 — 2005 Champions League Final), so the existing Wikidata pipeline can apply for a subset. However, the bulk of regular league matches won't have Wikidata entries, making this a hybrid entity type: Wikidata-backed where available, community-dataset-backed for the rest.
This is the first entity type where the primary data source is not Wikidata — the community CSV provides the canonical ID mappings, and Wikidata supplements with QIDs for notable matches.
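
The hybrid model described above can be pictured as a record that always carries community-sourced provider IDs and only sometimes a QID. This is a sketch only: `MatchRecord`, `attach_qids`, the `reep_id` field, and the source labels are hypothetical names, not part of the existing Reep code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class MatchRecord:
    reep_id: str                        # hypothetical internal match ID
    provider_ids: Dict[str, str]        # e.g. {"fotmob": "1709697"}, from the community CSV
    wikidata_qid: Optional[str] = None  # set only for matches that have a Wikidata item

    @property
    def source(self) -> str:
        """Label which sources back this record."""
        return "community+wikidata" if self.wikidata_qid else "community"


def attach_qids(records: Iterable[MatchRecord], qid_by_reep_id: Dict[str, str]) -> List[MatchRecord]:
    """Add QIDs where a Wikidata item is known; leave other records unchanged."""
    out = []
    for rec in records:
        rec.wikidata_qid = qid_by_reep_id.get(rec.reep_id, rec.wikidata_qid)
        out.append(rec)
    return out


# Placeholder records; the IDs here are illustrative, not real mappings.
records = attach_qids(
    [MatchRecord("m1", {"fotmob": "1709697"}), MatchRecord("m2", {"fotmob": "0000000"})],
    {"m2": "Q754483"},
)
```

Keeping the QID optional means the community CSV remains the canonical source, with Wikidata acting purely as enrichment.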
Design considerations
- New entity type `match`: extends the existing `player`/`team`/`coach` model
- Schema: likely a `matches` table (date, competition, home/away team refs) + rows in `external_ids` with `entity_type = 'match'`
- Team linkage: home/away teams use FBref names — need to resolve these to existing Reep entity IDs
- Hybrid sourcing: community CSV as primary source; Wikidata SPARQL for notable matches (finals, derbies) to add QIDs where they exist
- API surface: new endpoints or extend existing ones (e.g. `/lookup?provider=fotmob&id=1709697&type=match`)
- Future expansion: dataset covers 2014–2023; could extend with newer seasons or additional leagues
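
To make the schema and lookup options above concrete, here is a rough sketch using an in-memory SQLite database. The column names, the layout of `external_ids`, and the `lookup` helper are all assumptions made for illustration; the real Reep schema may differ. The inserted row is placeholder data, apart from the FotMob ID reused from the example query string.

```python
import sqlite3
from typing import Optional

# Assumed schema: one row per match, plus provider IDs in a shared external_ids table.
SCHEMA = """
CREATE TABLE matches (
    id           INTEGER PRIMARY KEY,
    match_date   TEXT NOT NULL,
    competition  TEXT NOT NULL,
    home_team_id INTEGER NOT NULL,  -- reference to an existing team entity
    away_team_id INTEGER NOT NULL
);
CREATE TABLE external_ids (
    entity_type TEXT NOT NULL,      -- 'player', 'team', 'coach', or 'match'
    entity_id   INTEGER NOT NULL,
    provider    TEXT NOT NULL,
    external_id TEXT NOT NULL,
    PRIMARY KEY (entity_type, provider, external_id)
);
"""


def lookup(conn: sqlite3.Connection, provider: str, external_id: str, entity_type: str) -> Optional[int]:
    """Resolve a provider ID to an internal entity ID, or None if unmapped."""
    row = conn.execute(
        "SELECT entity_id FROM external_ids "
        "WHERE entity_type = ? AND provider = ? AND external_id = ?",
        (entity_type, provider, external_id),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder match row; date, competition, and team IDs are illustrative only.
conn.execute("INSERT INTO matches VALUES (1, '2014-08-16', 'Premier League', 10, 20)")
conn.execute("INSERT INTO external_ids VALUES ('match', 1, 'fotmob', '1709697')")

match_id = lookup(conn, "fotmob", "1709697", "match")
```

Including `entity_type` in the primary key keeps a numeric match ID from colliding with a player or team ID that happens to share the same value at the same provider.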
Tasks
- `matches` table schema
- `external_ids` (new entity_type vs separate table)
- `/stats` endpoint to include match counts