Trafilatura

Latest version: v1.9.0

Page 6 of 8

0.5.0

- faster and more robust text and metadata extraction
- more efficient batch processing (parallel processing, URL queues)
- extraction and processing of ATOM/RSS feeds
- complete command-line tool with corresponding options
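
The batch-processing item above (URL queues combined with parallel processing) can be illustrated with a small standard-library sketch. This is not Trafilatura's implementation: `build_queues`, `next_batch`, `process_all`, and the `worker` callable are hypothetical names chosen for illustration. The assumption is that URLs are grouped per host and drained round-robin so that no single host is requested in bursts.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def build_queues(urls):
    """Group URLs by host so that each host gets its own FIFO queue."""
    queues = defaultdict(deque)
    for url in urls:
        queues[urlsplit(url).netloc].append(url)
    return queues


def next_batch(queues):
    """Pop at most one URL per host; drop hosts whose queue is now empty."""
    batch = []
    for host in list(queues):  # copy the keys, since entries may be deleted
        batch.append(queues[host].popleft())
        if not queues[host]:
            del queues[host]
    return batch


def process_all(urls, worker, max_workers=4):
    """Apply worker(url) to every URL, one round-robin batch at a time."""
    queues = build_queues(urls)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while queues:
            batch = next_batch(queues)
            # pool.map preserves input order, so zip pairs URLs with results
            for url, result in zip(batch, pool.map(worker, batch)):
                results[url] = result
    return results
```

In practice `worker` would download and extract one page; any callable taking a URL works, which keeps the scheduling logic testable without network access.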

0.4.1

- better metadata extraction and integration (XML & XML-TEI)
- more efficient processing
- output directory as CLI option

0.4

- improved "fast" mode (accuracy and speed)
- better fallbacks with readability-lxml and justext
- metadata extraction added
- more robust processing (tests, encoding handling)
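
The fallback idea mentioned above can be sketched generically: try several extractors in order and keep the first result that looks long enough. The function name `extract_with_fallbacks`, the `min_length` threshold, and the extractor callables are assumptions made for illustration; the release notes do not describe how the library actually chains readability-lxml and justext.

```python
def extract_with_fallbacks(html, extractors, min_length=200):
    """Return the first extraction of at least min_length characters.

    If no extractor reaches the threshold, return the longest text seen,
    or None when every extractor failed or returned nothing.
    """
    best = ""
    for extractor in extractors:
        try:
            text = extractor(html) or ""
        except Exception:
            # A failing extractor should not abort the whole chain.
            continue
        if len(text) >= min_length:
            return text
        if len(text) > len(best):
            best = text
    return best or None
```

Each entry in `extractors` is any callable that takes an HTML string and returns text, so third-party extractors can be wrapped and passed in without changing the chain itself.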

0.3.1

- support for Python 3.4 reactivated
- bugs in XML output and discarding sections solved
- new tests and documentation

0.3.0

- code base re-structured for clarity and readability
- streamlined HTML processing and conversion
- internal least-recently-used (LRU) cache for deduplication
- export as CSV
- better test coverage, extraction recall and precision
- further documentation (trafilatura.readthedocs.org)
- optional processing of text formatting
- more complete settings file
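
As an illustration of LRU-based deduplication, here is a minimal sketch built on `collections.OrderedDict`. The class name `LRUSeen`, the whitespace normalization, and the `threshold` parameter are assumptions for the example and are not taken from Trafilatura's code.

```python
from collections import OrderedDict


class LRUSeen:
    """Bounded memory of recently seen text segments."""

    def __init__(self, maxsize=1024):
        self.maxsize = maxsize
        self._counts = OrderedDict()

    def is_duplicate(self, segment, threshold=1):
        """Record the segment; return True if it was already seen
        at least `threshold` times while still in the cache."""
        key = " ".join(segment.split())  # collapse runs of whitespace
        count = self._counts.pop(key, 0)
        self._counts[key] = count + 1  # re-insert at the end: most recent
        if len(self._counts) > self.maxsize:
            self._counts.popitem(last=False)  # evict least recently used
        return count >= threshold
```

Bounding the cache keeps memory use constant over long crawls, at the cost of forgetting segments that have not recurred recently.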

0.2.1

- added metadata to the XML output
- production of valid XML-TEI for simple documents
