gbretas/content-extractor

 
 

content-extractor

Extract article content from a given URL.

A demo is hosted on a free Heroku dyno, so you can try it at no cost: https://contentextractor.yan.dev.br/

It supports JavaScript-rendered websites using Chrome and Selenium, but most websites can be crawled using requests.
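
As a rough illustration of that trade-off, the sketch below tries a plain requests fetch first and falls back to headless Chrome via Selenium only when the static HTML looks incomplete. It is not taken from this project's code: the names fetch_static, fetch_rendered and fetch_html, the looks_complete callback, and the 10-second timeout are all assumptions made for the example.

```python
from typing import Callable


def fetch_static(url: str) -> str:
    """Fetch the page with a plain HTTP request (no JavaScript executed)."""
    import requests  # imported lazily so the module loads without it

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.text


def fetch_rendered(url: str) -> str:
    """Fetch the page with headless Chrome so JavaScript can run first."""
    from selenium import webdriver  # imported lazily, only when needed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def fetch_html(
    url: str,
    looks_complete: Callable[[str], bool],
    static: Callable[[str], str] = fetch_static,
    rendered: Callable[[str], str] = fetch_rendered,
) -> str:
    """Return static HTML when it passes the check, else rendered HTML."""
    html = static(url)
    if looks_complete(html):
        return html
    return rendered(url)
```

The cheap request runs first because starting a browser is much slower. What counts as "complete" is left to the caller, for example a check that the page contains article-like markup.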

It is built to extract information from articles with at least one heading and two paragraphs. It will not work on e-commerce sites or social networks.
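
To make the heading-and-paragraph requirement concrete, here is a minimal sketch of such a check using only Python's standard library. It is an assumption about how the rule could be tested, not the project's actual implementation: the function name looks_like_article and its parameters are hypothetical, and it counts tags without checking whether they contain meaningful text.

```python
from html.parser import HTMLParser


class _ArticleCounter(HTMLParser):
    """Counts heading (h1-h6) and paragraph (p) start tags."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.headings = 0
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in self.HEADINGS:
            self.headings += 1
        elif tag == "p":
            self.paragraphs += 1


def looks_like_article(html: str, min_headings: int = 1, min_paragraphs: int = 2) -> bool:
    """Return True if the HTML has enough headings and paragraphs."""
    counter = _ArticleCounter()
    counter.feed(html)
    counter.close()
    return counter.headings >= min_headings and counter.paragraphs >= min_paragraphs
```

For example, looks_like_article("<h1>Title</h1><p>One</p><p>Two</p>") passes the check, while a page made only of div elements, as is common on product listings, does not.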

Languages

  • Python 84.7%
  • HTML 15.2%
  • Procfile 0.1%