Scrapes potential twitter handles of various athletic organizations.

Robbitron16/twitter_handle_manager

Twitter Handle Manager

This program provides a semi-autonomous method of extracting Twitter handles from rosters of professional organizations, like the NFL.

Application Basics

First, find a reputable organization or website that keeps track of professional/collegiate rosters. For example, ourlads.com was used for the NFL and NCAA rosters. Then, add the team names and codes to the config.py file (see the file for examples for the NFL and NCAA team names and codes).
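
As a rough illustration of the configuration step above, the sketch below shows one way team names and codes could be laid out. The names NFL_TEAMS, ROSTER_URL_TEMPLATE, and roster_url, the team codes, and the example.com URL pattern are all assumptions made for this example; check config.py for the actual layout.

```python
# Hypothetical sketch of team entries; the real variable names and codes
# in config.py may differ.
NFL_TEAMS = {
    "Seattle Seahawks": "SEA",
    "Green Bay Packers": "GB",
}

# Placeholder URL pattern. This is NOT the real ourlads.com layout.
ROSTER_URL_TEMPLATE = "https://example.com/{league}/roster/{code}"


def roster_url(league, team_name, teams=NFL_TEAMS):
    """Build the roster page URL for one team from its configured code."""
    code = teams[team_name]
    return ROSTER_URL_TEMPLATE.format(league=league, code=code)
```

Keeping the name-to-code mapping in one dictionary means that adding a league is mostly a matter of adding another mapping and URL pattern.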

Next, the program scrapes the roster website you provided and Twitter. It writes a CSV file containing the relevant metadata for each player, a candidate Twitter handle, and a confidence level indicating how likely that handle is to be the player's actual account.
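
To make the idea of a confidence level concrete, here is a minimal sketch that rates a candidate by comparing the roster name with the profile's display name. This is not the rating system in get_our_lads.py; the function names, the 0.8 threshold, the bonus for verified accounts, and the sample names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(player_name, profile_name):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    a = player_name.lower().strip()
    b = profile_name.lower().strip()
    return SequenceMatcher(None, a, b).ratio()


def rate_candidate(player_name, profile_name, verified=False, threshold=0.8):
    """Label a candidate handle 'high' or 'low' confidence.

    Hypothetical criteria: name similarity, plus a small bonus when the
    account is verified. The real criteria may be quite different.
    """
    score = name_similarity(player_name, profile_name)
    if verified:
        score = min(1.0, score + 0.1)
    label = "high" if score >= threshold else "low"
    return label, score
```

For example, rate_candidate("Jordan Example", "Jordan Example") would be labeled high, while a profile with an unrelated display name would fall below the threshold and be left for manual review.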

Finally, the program prompts you to review each Twitter handle in the output that was rated low confidence and to answer y/n to indicate whether the handle looks legitimate.
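
The manual review step might look roughly like the loop below. It is a sketch only: the row keys (name, handle, confidence), the "confirmed" and "rejected" labels, and the function name are assumptions, and the actual prompt in clean_up_low_confidence.py may behave differently. The example is written for Python 3; under Python 2 you would pass raw_input as ask.

```python
def review_low_confidence(rows, ask=input):
    """Ask the user to confirm or reject each low-confidence handle.

    rows is a list of dicts with hypothetical keys 'name', 'handle',
    and 'confidence'. ask is the prompt function, injectable for testing.
    """
    reviewed = []
    for row in rows:
        if row["confidence"] != "low":
            reviewed.append(row)  # nothing to review for this player
            continue
        answer = ""
        while answer not in ("y", "n"):  # re-prompt on anything else
            prompt = "Is @{} correct for {}? [y/n] ".format(
                row["handle"], row["name"])
            answer = ask(prompt).strip().lower()
        if answer == "y":
            reviewed.append(dict(row, confidence="confirmed"))
        else:
            reviewed.append(dict(row, handle="", confidence="rejected"))
    return reviewed
```

Passing the prompt function as a parameter keeps the loop easy to exercise without a terminal.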

File Layout

  • config.py: Stores Twitter authentication keys, team names & codes, directory names, league names, and website URLs.
  • main.py: The main program that will be executed.
  • get_our_lads.py: The set of methods that are used to get rosters from the websites provided and to scrape Twitter handles. You will also find the rating system criteria in this file. Please note that some of these methods are specific to ourlads.com and will have to be changed to use other websites or made more generic for other leagues/websites.
  • csv_parser.py: Simple file that reads in a CSV file generated by the main program. This is used to cross-reference old outputs against new rosters so that players who are still in their league are not rescraped. (In other words, once we find a Twitter handle for a player, we don't want to look it up again.)
  • combine_csvs.py: Simple file that combines multiple CSV files into one. This is used to aggregate team outputs into a single league output.
  • clean_up_low_confidence.py: This file provides the method that prompts the user for manual input to verify low-confidence Twitter handles.
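
The cross-referencing described for csv_parser.py can be sketched as follows. The column names (name, handle) and both function names are assumptions for this example rather than the repository's actual interface, and the code is written for Python 3.

```python
import csv
import io


def load_known_handles(csv_text):
    """Map player name -> handle for rows that already have a handle."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: row["handle"] for row in reader if row["handle"]}


def split_roster(roster_names, known):
    """Separate players with a known handle from those still to scrape."""
    already_found = {n: known[n] for n in roster_names if n in known}
    to_scrape = [n for n in roster_names if n not in known]
    return already_found, to_scrape
```

Only the names returned in to_scrape would need a fresh Twitter lookup; everyone else carries over from the previous output.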

Usage

Twitter Account and APIs

Several Python libraries are available for accessing the Twitter API. You can find a good introduction here: https://stackabuse.com/accessing-the-twitter-api-with-python/ (this also shows you how to create a developer account and generate authentication keys). I decided to use twapi and twitter.

You can see how the keys are stored in config.py. I invoke both APIs in main.py.

Installation

  1. Download or clone this repository.
  2. Verify that you have a recent version of Python 2 (developed with Python 2.7.15) and pip (developed with pip 9.0.1) installed.
  3. Enter the repository and run pip install -r requirements.txt to install all package dependencies.

Running

  1. Verify that you have a compiled directory within the repository. This is where the most recent output of each league will be stored. You can change the name of this directory in config.py.
  2. Verify that you have a directory with the name of the league you want to find handles for. For example, to find handles for the NFL, you need a directory named nfl/ in the repository. This is where team outputs will be stored and aggregated.
  3. Run python main.py <league_name>. For example, the NFL command is python main.py nfl. The process may take some time (~30 minutes for the NFL) before you get to the manual part. Once each team has been scraped, the program goes through every low-confidence rating in the CSV. If the program is aborted during this step, your progress is saved, so you can run it again and it will speed through everything that was completed before the early termination.
  4. Move the output file into compiled/ when finished.
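
Before starting a run, the directory requirements in steps 1 and 2 can be checked with a small helper like the one below. This is a sketch under assumptions: the function name is hypothetical, and the default directory name compiled simply mirrors the text above (the real name is whatever config.py specifies).

```python
import os


def missing_directories(repo_root, league, compiled_dir="compiled"):
    """Return the required directories that do not exist yet."""
    wanted = [
        os.path.join(repo_root, compiled_dir),  # latest output per league
        os.path.join(repo_root, league),        # per-team outputs
    ]
    return [path for path in wanted if not os.path.isdir(path)]
```

An empty list means both directories are in place; otherwise, create the listed paths before running main.py.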

Contact

If you have any questions, you can reach me at [email protected]
