Wpull


0.26
====

* Fixes crash when URLs like ``http://example.com]`` were encountered.
* Implements ``--sitemaps``.
* Implements ``--max-filename-length``.
* Implements ``--span-hosts-allow`` (experimental, see issues 61, 66).
* Query strings items like ``?a&b`` are now preserved and no longer normalized to ``?a=&b=``.
* API:

  * ``url.URLInfo.normalize()`` was removed since it was mainly used internally.
  * Added ``url.normalize()`` convenience function.
  * writer: ``safe_filename()``, ``url_to_filename()``, ``url_to_dir_path()`` were modified.
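The query-string change above can be illustrated with Python's standard library (a sketch of the behavior, not Wpull's own code): round-tripping through a parse/encode cycle normalizes valueless items into empty-valued ones, which is exactly what Wpull now avoids.

```python
from urllib.parse import parse_qsl, urlencode

query = "a&b"

# Round-tripping through parse/encode normalizes valueless items,
# producing the unwanted "?a=&b=" form:
normalized = urlencode(parse_qsl(query, keep_blank_values=True))
print(normalized)  # a=&b=

# Preserving the original items means carrying them through verbatim
# instead of re-encoding them:
preserved = "&".join(query.split("&"))
print(preserved)  # a&b
```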

0.25
====

* Fixes link converter not operating on the correct files when ``.N`` files were written.
* Fixes an apparent hang when Wpull is almost finished on documents with many links.

  * Previously, Wpull added every URL to the database, incurring processing overhead there. Now, only requisite URLs are added to the database.

* Implements ``--restrict-file-names``.
* Implements ``--quota``.
* Implements ``--warc-max-size``. As in Wget, the "max size" is not a hard limit on each WARC file but the threshold at which a new file is started. Unlike Wget, ``request`` and ``response`` records are not split across WARC files.
* Implements ``--content-on-error``.
* Supports recording scrolling actions in WARC file when PhantomJS is enabled.
* Adds the ``wpull`` command to ``bin/``.
* Database schema change: ``filename`` column was added.
* API:

  * converter.py: Converters no longer use PathNamer.
  * writer.py: ``sanitize_file_parts()`` was removed in favor of new ``safe_filename()``. ``save_document()`` returns a filename.
  * WebProcessor now requires a root path to be specified.
  * WebProcessor initializer now takes "parameter objects".

* Install requires new dependency: ``namedlist``.
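The ``--warc-max-size`` rollover rule can be sketched as follows (a hypothetical illustration of the threshold behavior described above, not Wpull's actual implementation; the function name and sizes are made up):

```python
def assign_warc_files(record_sizes, max_size):
    """Group record sizes into WARC files, rolling over to a new file
    only after the current file has crossed the threshold."""
    files = []
    current = []
    current_size = 0
    for size in record_sizes:
        # Roll over once the threshold is crossed -- it is a trigger,
        # not a hard cap, so a file may exceed max_size by one record.
        if current and current_size >= max_size:
            files.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size
    if current:
        files.append(current)
    return files

# With a 100-byte threshold, a 70-byte and a 60-byte record share one
# file (crossing the threshold), and the next record starts a new file:
print(assign_warc_files([70, 60, 50], 100))  # [[70, 60], [50]]
```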

0.24
====

* Fixes crash when document encoding could not be detected. Thanks to DopefishJustin for reporting.
* Fixes non-index files being saved with an extra directory inserted into their path.
* URL path escaping is relaxed. This helps with servers that don't handle percent-encoding correctly.
* ``robots.txt`` now bypasses the filters. Use ``--no-strong-robots`` to disable this behavior.
* Redirects implicitly span hosts. Use ``--no-strong-redirects`` to disable this behavior.
* Scripting: ``should_fetch()`` info dict now contains ``reason`` as a key.
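The relaxed path escaping can be illustrated with Python's standard library (a sketch only; the exact set of characters Wpull leaves unescaped is not specified here, so the ``safe`` sets below are assumptions):

```python
from urllib.parse import quote

path = "/files/a|b[1].txt"

# Strict escaping percent-encodes every reserved or unsafe character:
strict = quote(path, safe="/")
print(strict)   # /files/a%7Cb%5B1%5D.txt

# Relaxed escaping leaves more characters as-is, which helps servers
# that mishandle percent-encoded paths:
relaxed = quote(path, safe="/|[]")
print(relaxed)  # /files/a|b[1].txt
```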

0.23.1
======

* Important: Fixes issue where URLs were downloaded repeatedly.

0.23
====

* Fixes incorrect logic when fetching a robots.txt that redirects to another URL.
* Fixes the port number not being included in the HTTP ``Host`` header.
* Fixes an occasional ``RuntimeError`` when pressing CTRL+C.
* Fixes fetching of URL paths containing dot segments. They are now resolved appropriately.
* Fixes the ASCII progress bar occasionally not reaching 100% when a download finishes.
* Fixes crash and improves handling of unusual document encodings and settings.
* Improves handling of links with newlines and whitespace intermixed.
* Requires beautifulsoup4 as a dependency.
* API:

  * ``util.detect_encoding()`` arguments were modified to accept only a single fallback and to accept ``is_html``.
  * ``document.get_encoding()`` accepts ``is_html`` and ``peek`` arguments.
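The dot-segment resolution mentioned above follows RFC 3986 and can be reproduced with Python's standard library (an illustration of the expected behavior, not Wpull's code):

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c.html"

# ".." removes the preceding path segment; "." is dropped:
print(urljoin(base, "../d.html"))  # http://example.com/a/d.html
print(urljoin(base, "./e.html"))   # http://example.com/a/b/e.html
```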

0.22.5
======

* The 'Refresh' HTTP header is now scraped for URLs.
* When an error occurs while writing a WARC file, the file is truncated back to the last good state before Wpull crashes.
* Works around the "Reached maximum read buffer size" error when downloading on fast connections. A side effect is intensive CPU usage.
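Scraping the ``Refresh`` header means extracting the target URL from a value such as ``5; url=/next``. A minimal sketch (the helper name and regex are illustrative, not Wpull's implementation):

```python
import re

def refresh_url(header_value):
    """Extract the URL from an HTTP Refresh header value like '5; url=/next'."""
    match = re.search(r"url\s*=\s*(\S+)", header_value, re.IGNORECASE)
    return match.group(1) if match else None

print(refresh_url("5; url=http://example.com/next"))  # http://example.com/next
print(refresh_url("30"))                              # None (no URL present)
```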
