content-extractor

Extract article content from a given URL.

This project is hosted on a free Heroku dyno, so you can try it at no cost: https://contentextractor.yan.dev.br/

It supports JavaScript-rendered websites using Chrome and Selenium, but most websites can be crawled using requests.
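As a rough illustration of that trade-off, the sketch below tries a plain HTTP fetch first and only falls back to headless Chrome when the static HTML looks empty. This is not the project's actual code: the function names (`fetch_with_requests`, `fetch_with_selenium`, `has_paragraphs`, `fetch_html`) and the "no `<p>` tag means not rendered" rule are assumptions made for the example.

```python
import re
from typing import Callable


def fetch_with_requests(url: str) -> str:
    """Fetch raw HTML over HTTP; no JavaScript is executed."""
    import requests  # third-party; imported lazily so the sketch loads without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def fetch_with_selenium(url: str) -> str:
    """Fetch HTML after JavaScript has run, using headless Chrome."""
    from selenium import webdriver  # third-party; requires a local Chrome install

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def has_paragraphs(html: str) -> bool:
    """Hypothetical check: a page without any <p> tag is assumed not rendered yet."""
    return re.search(r"<p[\s>]", html, re.IGNORECASE) is not None


def fetch_html(
    url: str,
    static_fetch: Callable[[str], str] = fetch_with_requests,
    rendered_fetch: Callable[[str], str] = fetch_with_selenium,
    is_usable: Callable[[str], bool] = has_paragraphs,
) -> str:
    """Try the cheap static fetch first; use the browser only when needed."""
    html = static_fetch(url)
    if is_usable(html):
        return html
    return rendered_fetch(url)
```

Passing the fetchers in as parameters keeps the fallback logic easy to exercise without a network connection or a browser.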

It is built to get information from articles with at least 1 heading and 2 paragraphs. This application will not work if you're trying to crawl an e-commerce site or a social network.
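The "at least 1 heading and 2 paragraphs" rule can be pictured with a small standard-library check like the one below. It is only a sketch of the idea, not how content_extractor.py is known to work: `ArticleShapeCounter` and `looks_like_article` are hypothetical names, and counting `h1` to `h6` and `p` start tags is an assumption about what counts as a heading or paragraph.

```python
from html.parser import HTMLParser


class ArticleShapeCounter(HTMLParser):
    """Counts heading and paragraph start tags in an HTML document."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.headings = 0
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in self.HEADING_TAGS:
            self.headings += 1
        elif tag == "p":
            self.paragraphs += 1


def looks_like_article(html: str, min_headings: int = 1, min_paragraphs: int = 2) -> bool:
    """Return True when the page has enough headings and paragraphs."""
    counter = ArticleShapeCounter()
    counter.feed(html)
    counter.close()
    return counter.headings >= min_headings and counter.paragraphs >= min_paragraphs


if __name__ == "__main__":
    article = "<h1>Title</h1><p>First paragraph.</p><p>Second paragraph.</p>"
    product_page = "<div class='price'>$10</div><button>Buy</button>"
    print(looks_like_article(article))
    print(looks_like_article(product_page))
```

A product listing or a social feed typically lacks this heading-plus-paragraphs shape, which is consistent with the limitation described above.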

Repository files:

static
templates
tmp
.gitignore
LICENSE
Procfile
README.md
app.py
content_extractor.py
requirements.txt
start.py
vercel.json

gbretas/content-extractor
