selenium-youtube-scraper

A slow YouTube scraper built with Selenium and Python. It stores channels, comments, and transcripts in a database file. It uses Firefox so that the script can run with an adblocker.

Usage

This project currently has only a command-line interface, so run the program from a terminal, like so:

python main.py auto "https://www.youtube.com/watch?v=jNQXAC9IVRw"

CLI Commands

auto: Tries to detect what kind of link you passed in and works from there.
comments: Takes a video URL and tries to save all the comments under it to the database.
video: Takes a video URL and tries to save its generic content, such as its title and URL.
search: Takes a YouTube video search and tries to save the resulting videos up to a certain depth.
playlist: Takes a playlist URL and tries to save all the videos on the playlist.
--headless: Toggles whether the script opens a visible browser window.
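
The command list above can be sketched as a small argument parser plus a link classifier for auto. This is a minimal illustration, not the code in main.py: the names classify_url, build_parser, and resolve_command are hypothetical, treating --headless as an on/off flag is an assumption, and the heuristics rely only on the usual YouTube URL shapes (/watch?v=, /playlist?list=, /results?search_query=).

```python
import argparse
from urllib.parse import parse_qs, urlparse

# Command names taken from the README; everything else here is illustrative.
COMMANDS = ("auto", "comments", "video", "search", "playlist")


def classify_url(target: str) -> str:
    """Guess which command a link belongs to (hypothetical heuristic)."""
    parsed = urlparse(target)
    query = parse_qs(parsed.query)
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path == "/results" and "search_query" in query:
        return "search"
    if parsed.path == "/watch" and "v" in query:
        return "video"
    # Short links such as https://youtu.be/<id> also point at a video.
    if parsed.netloc.endswith("youtu.be") and parsed.path.strip("/"):
        return "video"
    return "unknown"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: main.py <command> <target> [--headless]."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("command", choices=COMMANDS)
    parser.add_argument("target", help="URL or search text to scrape")
    # Assumption: the flag simply switches headless mode on.
    parser.add_argument("--headless", action="store_true")
    return parser


def resolve_command(argv):
    """Parse argv and replace 'auto' with the detected command."""
    args = build_parser().parse_args(argv)
    command = args.command
    if command == "auto":
        command = classify_url(args.target)
    return command, args.target, args.headless
```

Called with the arguments from the usage example, resolve_command(["auto", "https://www.youtube.com/watch?v=jNQXAC9IVRw"]) resolves auto to the video command in this sketch. The real script may handle a video link differently, for example by also saving its comments.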

Current Database Schema

The schema diagram is stored in dbSchema.png in the repository.

Comment Saving

Comment scraping tends to crash around 10,000 comments deep, and the script slows down substantially along the way. I believe this is due in part to clone comments being loaded, which increases the time it takes for replies to load.
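
One possible mitigation, sketched here under several assumptions, is to skip comments that were already stored and to commit in batches so a crash deep into a thread does not lose earlier work. This is not the script's current behavior. It assumes the database file is SQLite, that each comment has a stable ID, and that "clone comments" means the same comment being loaded more than once. The comments table, its columns, and the save_comments function are hypothetical and do not reflect the real schema.

```python
import sqlite3


def save_comments(conn, comments, batch_size=500):
    """Store (comment_id, body) pairs, ignoring IDs that already exist.

    Returns the number of rows actually inserted.
    """
    # Hypothetical table; the project's real schema is not shown here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "comment_id TEXT PRIMARY KEY, body TEXT)"
    )
    before = conn.total_changes
    pending = 0
    for comment_id, body in comments:
        # The primary key makes a repeated comment_id a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO comments (comment_id, body) VALUES (?, ?)",
            (comment_id, body),
        )
        pending += 1
        if pending >= batch_size:
            # Commit periodically so a later crash keeps earlier batches.
            conn.commit()
            pending = 0
    conn.commit()
    return conn.total_changes - before
```

For example, passing three pairs where two share the same ID inserts two rows. This only keeps duplicates out of the database; it does not stop the browser from loading them, so it would not by itself address the slowdown in loading replies.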
