Projects

At Zyte we maintain and contribute to a wide variety of open source projects. See below for a list of projects we can mentor this year.

Scrapy

Popular Python framework for web crawling and scraping, used to write spiders that crawl websites and extract data from them.

Ideas Contribute

Dateparser

Python library to easily parse localized dates in almost any string format commonly found on web pages.

Ideas Contribute

Parsel

Python library to extract data from HTML and XML using XPath and CSS selectors.

Ideas Contribute

cssselect

Python library to translate CSS3 selectors into XPath 1.0 expressions.

Ideas Contribute

number-parser

Python library to parse natural language numbers.

Ideas Contribute

Splash

Headless browser framework for web crawling and scraping, designed as a companion to Scrapy crawlers, though it can also be used as a standalone tool.

Contribute

Scrapy-Splash

Scrapy plugin for transparent integration with Splash.

Contribute

Extruct

Python library for extracting embedded metadata from HTML markup.

Contribute

w3lib

Python library of web-related functions.

Contribute

queuelib

Collection of persistent, disk-based queues for Python.

Contribute

price-parser

Python library for extracting price and currency from raw text strings.

Contribute

js2xml

Python library to convert JavaScript code into an XML document.

Contribute

HTML to Text

Python library for extracting text from HTML.

Contribute

Playwright integration for Scrapy

Scrapy download handler that performs requests using Playwright.

Contribute

itemloaders

Python library that helps you collect data from HTML and XML sources.

Contribute

Protego

robots.txt parser with support for modern conventions.

Contribute

itemadapter

Common interface to handle objects of different types in a uniform manner, regardless of their underlying implementation.

Contribute