Wpull

Latest version: v2.0.1

0.1003
======

* Fixed FTP fetch where code 125 was not recognized as valid.
* Fixed FTP 12 o'clock AM/PM time logic.
* Fixed URLs fetched as lowercase URLs when scheme and authority separator is not provided.
* Added ``--database-uri`` option to specify a SQLAlchemy URI.
* Added ``none`` as a choice to ``--progress``.
* Added ``--user``/``--password`` support.

* Scripting:

  * Fixed missing response callback during redirects. Regression introduced in v0.1002.
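
For reference, the 12 o'clock fix concerns a common pitfall when converting 12-hour times, such as those in some FTP directory listings, to 24-hour time: 12 AM is hour 0 and 12 PM is hour 12. The sketch below shows the correct mapping; ``to_24_hour`` is a hypothetical helper for illustration, not Wpull's FTP listing parser.

```python
def to_24_hour(hour: int, meridiem: str) -> int:
    """Convert a 12-hour clock hour (1-12) plus 'AM'/'PM' to a 24-hour hour (0-23).

    Hypothetical helper for illustration; not Wpull's actual FTP listing parser.
    """
    if not 1 <= hour <= 12:
        raise ValueError("hour must be in 1..12")
    meridiem = meridiem.upper()
    if meridiem == "AM":
        # 12 AM is midnight (hour 0); other AM hours are unchanged.
        return 0 if hour == 12 else hour
    if meridiem == "PM":
        # 12 PM is noon (hour 12); other PM hours are shifted by 12.
        return 12 if hour == 12 else hour + 12
    raise ValueError("meridiem must be 'AM' or 'PM'")


print(to_24_hour(12, "AM"), to_24_hour(12, "PM"), to_24_hour(1, "PM"))  # prints: 0 12 13
```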

0.1002
======

* Fixed control characters printed without escaping.
* Fixed cookie size not limited correctly per domain name.
* Fixed URL parsing incorrectly allowing spaces in hostnames.
* Fixed ``--sitemaps`` option not respecting ``--no-parent``.
* Fixed "Content overrun" error on broken web servers. A warning is logged instead.
* Fixed SSL verification error despite ``--no-check-certificate`` being specified.
* Fixed crash on IPv6 URLs containing consecutive dots.
* Fixed crash attempting to connect to IPv6 addresses.
* Consecutive slashes in URL paths are now flattened.
* Fixed crash when fetching IPv6 robots.txt file.
* Added experimental FTP support.
* Switched default HTML parser to html5lib.

* Scripting:

  * Added ``handle_pre_response`` callback hook.

* API:

  * Fixed ``ConnectionPool`` ignoring its ``max_host_count`` argument.
  * Moved document scraping concerns from ``WebProcessorSession`` to ``ProcessingRule``.
  * Renamed ``SSLVerficationError`` to ``SSLVerificationError``.
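
Flattening consecutive slashes means a path such as ``/a//b///c`` is treated as ``/a/b/c``. The following is a minimal standard-library sketch, assuming only the path component is rewritten while the scheme separator, query and fragment are left alone; ``flatten_path_slashes`` is a hypothetical name and not Wpull's implementation.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def flatten_path_slashes(url: str) -> str:
    """Collapse runs of '/' in the path component of a URL.

    Hypothetical helper: the '//' after the scheme, the query string and the
    fragment are not modified.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(flatten_path_slashes("http://example.com//a///b/c?x=//y"))
# prints: http://example.com/a/b/c?x=//y
```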

0.1001.2
========

* Fixed ValueError crash on HTTP redirects with bad IPv6 URLs.
* Fixed AssertionError on link extraction with non-absolute URLs in "codebase" attribute.
* Fixed premature exit when an error occurred while fetching robots.txt.
* Fixed executable filename problem in setup.py for cx_Freeze builds.
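
The "codebase" attribute of ``<applet>`` and ``<object>`` elements is itself a URL that may be relative, so it has to be made absolute against the document URL before other references are resolved against it. A minimal sketch using ``urllib.parse.urljoin``; ``resolve_with_codebase`` is hypothetical, and treating the codebase as a directory (appending a trailing slash) is an assumption, not documented Wpull behavior.

```python
from urllib.parse import urljoin


def resolve_with_codebase(document_url: str, codebase: str, reference: str) -> str:
    """Resolve `reference` against a possibly relative `codebase` attribute.

    Hypothetical helper: first make codebase absolute relative to the document,
    then resolve the reference against that absolute codebase.
    """
    base = urljoin(document_url, codebase)
    if not base.endswith("/"):
        # Assumption: codebase names a directory, so keep its last segment.
        base += "/"
    return urljoin(base, reference)


print(resolve_with_codebase("http://example.com/page/index.html", "applets", "Main.class"))
# prints: http://example.com/page/applets/Main.class
```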

0.1001.1
========

* Fixed URLs with IPv6 addresses not including brackets when using them in host strings.
* Fixed an AssertionError crash that occurred when PhantomJS crashed.
* Fixed database slowness over time.
* Cookies are now synchronized and shared with PhantomJS.

* Scripting:

  * Fixed mismatched ``queued_url`` and ``dequeued_url`` causing negative values in a counter. The issue was caused by requeued items in "error" status.
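
IPv6 literals contain colons, so RFC 3986 requires them to be enclosed in brackets in the host part of a URL or a ``host:port`` string; otherwise the port separator is ambiguous. A minimal sketch using the standard ``ipaddress`` module; ``format_host_port`` is a hypothetical helper, not Wpull's API.

```python
import ipaddress


def format_host_port(host: str, port: int) -> str:
    """Build a 'host:port' string, bracketing IPv6 literals as RFC 3986 requires."""
    try:
        is_ipv6 = isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        # Not an IP literal (e.g. a hostname), so no brackets are needed.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


print(format_host_port("::1", 8080))  # prints: [::1]:8080
```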

0.1001
======

* Fixed ``--warc-move`` option which had no effect.
* Fixed JavaScript scraper to not accept URLs with backslashes.
* Fixed CSS scraper to not accept URLs longer than 500 characters.
* Fixed ValueError crash in Cache when two URLs are added sequentially at the same time due to bad LinkedList key comparison.
* Fixed crash formatting text when sizes reach terabytes.
* Fixed a hang that could occur with many connections across many hostnames.
* Added support for HTTP/HTTPS proxies, but without HTTPS tunnelling support. Wpull will refuse to start without the insecure override option. Note that if authentication and WARC output are both enabled, the username and password are recorded in the WARC file.
* Improved database performance.
* Added ``--ignore-fatal-errors`` option.
* Added ``--html-parser`` option. You can now use html5lib as the HTML parser.
* Added support for PyPy 2.3.1 running the Python 3.2 implementation.
* URL parsing is now consistent across Python versions.
* Added ``--link-extractors`` option.
* Added ``--debug-manhole`` option.

* API:

  * ``document`` and ``scraper`` were put into their own packages.
  * HTML parsing was put into ``document.htmlparse`` package.
  * ``url.URLInfo`` no longer supports normalizing URLs by percent decoding unreserved/safe characters.

* Scripting:

  * Dropped support for Scripting API version 1.

* Database schema:

  * Column ``url_encoding`` is removed from ``urls`` table.
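
A common way a size formatter crashes on very large values is by indexing past the end of a fixed list of unit names. The sketch below assumes that failure mode (the changelog does not state the actual cause) and clamps at the largest unit instead; ``format_size`` is a hypothetical helper.

```python
def format_size(num_bytes: float) -> str:
    """Format a byte count with binary units, clamping at the largest unit.

    Hypothetical helper: stopping at the last unit avoids the IndexError that
    a fixed unit list would raise once sizes outgrow it.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


print(format_size(3 * 1024 ** 4))  # prints: 3.0 TiB
```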

0.1000
======

* Dropped support for Python 2. Please file an issue if this is a problem.
* Fixed possible crash on empty content with deflate compression.
* Fixed document encoding detection on documents larger than 4096 bytes where an encoded character may have been truncated.
* IRIs are now always percent-encoded as UTF-8 to match de facto web browser behavior.
* HTTP headers are consistently decoded as Latin-1.
* Scripting API:

  * New ``queued_url`` and ``dequeued_url`` hooks contributed by mback2k.

* API:

  * Switched to Trollius instead of Tornado. Please use Trollius 1.0.2 alpha or greater.
  * Most of the internals related to the HTTP protocol were rewritten; as a result, major components are not backwards compatible. If you use Wpull's API and do not want to migrate, please pin your requirements to ``<0.1000``. Please file an issue if this is a problem.
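
The two encoding rules for IRIs and HTTP headers in this release can be illustrated with the standard library: non-ASCII characters in an IRI are percent-encoded from their UTF-8 bytes, and raw header bytes are decoded as Latin-1, which maps every byte value to exactly one character and therefore never fails. Both helper names are hypothetical, and keeping ``/`` and existing ``%`` escapes unencoded is an assumption of this sketch.

```python
from urllib.parse import quote


def encode_iri_path(path: str) -> str:
    """Percent-encode non-ASCII characters in an IRI path from their UTF-8 bytes.

    Hypothetical helper: '/' and existing '%' escapes are left unencoded.
    """
    return quote(path, safe="/%", encoding="utf-8")


def decode_header_value(raw: bytes) -> str:
    """Decode raw HTTP header bytes as Latin-1 (every byte maps to one character)."""
    return raw.decode("latin-1")


print(encode_iri_path("/café/naïve"))  # prints: /caf%C3%A9/na%C3%AFve
```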

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.