Wpull

Latest version: v2.0.1

0.32 (Not secure)
==================

* Fixes crash when HTML meta refresh URL is empty.
* Fixes crash when decoding a document that is malformed later in the document. These invalid documents are not searched for links.
* Reduces CPU usage when ``--debug`` logging is not enabled.
* Better support for detecting and differentiating XHTML and XML documents.
* Fixes converting XHTML documents where it did not write XHTML syntax.
* RSS/Atom feed ``link``, ``url``, ``icon`` elements are searched for links.

* API:

  * ``document.detect_response_encoding()`` default peek argument is lowered to reduce hanging.
  * ``document.BaseDocumentDetector`` is now a base class for document type detection.
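
The feed link search noted for this release can be illustrated with a minimal sketch. It is not Wpull's scraper: ``feed_links`` and ``LINK_TAGS`` are hypothetical names, and the standard library's ``xml.etree.ElementTree`` is assumed as a stand-in parser.

```python
import xml.etree.ElementTree as ET

# Element names taken from the release note; the set name is hypothetical.
LINK_TAGS = {"link", "url", "icon"}


def feed_links(xml_text):
    """Yield candidate URLs from RSS/Atom link, url and icon elements."""
    for element in ET.fromstring(xml_text).iter():
        # Drop any "{namespace}" prefix so Atom and RSS tags compare equally.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name not in LINK_TAGS:
            continue
        # Atom uses <link href="...">; RSS puts the URL in the element text.
        url = element.get("href") or (element.text or "").strip()
        if url:
            yield url
```

Checking ``href`` before the element text is an assumption made here so that one function covers both feed flavors.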

0.31 (Not secure)
==================

* Fixes issue where an early ``</html>`` breaks link discovery and leaves converted documents missing elements.
* Fixes ``--no-parent`` which did not behave like Wget. This issue was noticeable with options such as ``--span-hosts-allow linked-pages``.
* Fixes ``--level`` where page requisites were mistakenly not fetched if they exceeded the recursion level.
* Includes PhantomJS version string in WARC warcinfo record.
* User-agent string no longer includes Mozilla reference.
* Implements ``--force-html`` and ``--base``.
* Cookies now are limited to approximately 4 kilobytes and a maximum of 50 cookies per domain.
* Document parsing is now streamed for better handling of large documents.

* Scripting:

  * Ability to set a scripting API version.
  * Scripting API version 2: Adds ``record_info`` argument to ``handle_error`` and ``handle_response``.

* API:

  * WARCRecorder uses new parameter object WARCRecorderParams.
  * ``document``, ``scraper``, ``converter`` modules heavily modified to accommodate streaming readers. ``document.BaseDocumentReader.parse`` was removed and replaced with ``read_links``.
  * ``version.version_info`` available.
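
The cookie limits in this release could be enforced along the following lines. This is a sketch under stated assumptions, not Wpull's cookie policy: the note does not say whether the 4 kilobyte limit applies per cookie or whether excess cookies are rejected or evicted, so the code assumes a per-cookie size limit and rejection. ``accept_cookie`` and both constants are hypothetical names.

```python
MAX_COOKIE_BYTES = 4096        # assumed reading of "approximately 4 kilobytes"
MAX_COOKIES_PER_DOMAIN = 50    # from the release note


def accept_cookie(jar, domain, name, value):
    """Store a cookie in ``jar`` if it is within the limits.

    ``jar`` maps a domain to a dict of cookie names and values.
    Returns True if the cookie was stored, False if it was rejected.
    """
    size = len(name.encode("utf-8")) + len(value.encode("utf-8"))
    if size > MAX_COOKIE_BYTES:
        return False
    cookies = jar.setdefault(domain, {})
    # Replacing an existing cookie never increases the per-domain count.
    if name not in cookies and len(cookies) >= MAX_COOKIES_PER_DOMAIN:
        return False
    cookies[name] = value
    return True
```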

0.30 (Not secure)
==================

* Fixes crash on SSL handshake if connection is broken.
* DNS entries are periodically removed from cache instead of held for long times.
* Experimental cx_freeze support.

* PhantomJS:

  * Fixes proxy errors with requests containing a body.
  * Fixes proxy errors with occasional FileNotFoundError.
  * Adds timeouts to calls.
  * Viewport size is now 1200 × 1920.
  * Default ``--phantomjs-scroll`` is now 10.
  * Scrolls to top of page before taking snapshot.

* API:

  * URL filters moved into urlfilter module.
  * Engine uses and exposes interface to AdjustableSemaphore for issue 93.

0.29 (Not secure)
==================

* Fixes SSLVerficationError mistakenly raised during connection errors.
* ``--span-hosts`` no longer implicitly enabled on non-recursive downloads. This behavior is superseded by strong redirect logic. (Use ``--span-hosts-allow`` to guarantee fetching of page-requisites.)
* Fixes URL query strings normalized with unnecessary percent-encoding escapes. Some servers do not handle percent-encoded URLs well.
* Fixes crash handling directory paths that may contain a filename or a filename that is a directory. This crash occurs when URLs like ``/blog`` and ``/blog/`` both exist. If a directory path contains a filename, that part of the directory path is suffixed with ``.d``. If a filename is an existing directory, the filename is suffixed with ``.f``.
* Fixes crash when URL's hostname contains characters that decompose to dots.
* Fixes crash when HTML document declares encoding name unknown to Python.
* Fixes getting stuck in a loop if the server returns errors on robots.txt.
* Implements ``--warc-dedup``.
* Implements ``--ignore-length``.
* Implements ``--output-document``.
* Implements ``--http-compression``.
* Supports reading HTTP compression "deflate" encoding (both zlib and raw deflate).
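
Handling both "deflate" variants can be sketched with the standard ``zlib`` module. This illustrates the general technique rather than Wpull's implementation. ``decode_deflate`` is a hypothetical name, and the sketch decodes a complete body in one call instead of streaming it.

```python
import zlib


def decode_deflate(data: bytes) -> bytes:
    """Decode an HTTP "deflate" body that is zlib-wrapped or raw deflate."""
    try:
        # zlib-wrapped stream: deflate data with a header and checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Raw deflate stream: a negative wbits value tells zlib to expect
        # no header and no trailing checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

Trying the zlib-wrapped form first is a design choice made for this sketch. A raw stream normally fails the header check straight away, so the fallback costs little.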

* Scripting:

  * Adds ``engine_run()`` callback.
  * Exposes the instance factory.

* API:

  * connection: ``Connection`` arguments changed. Uses ``ConnectionParams`` as a parameter object. ``HostConnectionPool`` arguments also changed.
  * database: ``URLDBRecord`` renamed to ``URL``. ``URLStrDBRecord`` renamed to ``URLString``.

* Schema change:

  * New ``visits`` table.
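
The ``.d`` and ``.f`` suffix rule from this release's notes can be sketched as follows. This is an illustration of the rule as described, not Wpull's path-naming code. ``anti_clobber_path`` and its parameters are hypothetical, and the existence checks are passed in as callables so the logic can be tried without touching the filesystem.

```python
import os


def anti_clobber_path(dir_parts, filename,
                      is_file=os.path.isfile, is_dir=os.path.isdir):
    """Build a save path that avoids file/directory name clashes."""
    current = ""
    for part in dir_parts:
        candidate = os.path.join(current, part)
        # A file already sits where a directory is needed: use "<part>.d".
        if is_file(candidate):
            candidate += ".d"
        current = candidate
    target = os.path.join(current, filename)
    # A directory already sits where the file should go: use "<filename>.f".
    if is_dir(target):
        target += ".f"
    return target
```

For example, if ``/blog`` was saved first as a file named ``blog``, a later ``/blog/`` index page would land in ``blog.d``.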

0.28 (Not secure)
==================

* Fixes crash when redirected to malformed URL.
* Fixes ``--directory-prefix`` not being honored.
* Fixes unnecessarily high CPU usage when determining the encoding of a document.
* Fixes crash (GeneratorExit exception) when exiting on Python 3.4.
* Uses new internal socket connection stream system.
* Updates bundled certificates (Tue Jan 28 09:38:07 2014).
* PhantomJS:

  * Fixes things not appearing in WARC files. This regression was introduced in 0.26 where PhantomJS's disk cache was enabled. It is now disabled again.
  * Fixes HTTPS proxy URL rewriting where relative URLs were not properly rewritten.
  * Fixes proxy URL rewriting not working for localhost.
  * Fixes unwanted ``Accept-Language`` header picked up from environment. The value has been overridden to ``*``.
  * Fixes ``--header`` options left out in requests.

* API:

  * New ``iostream`` module.
  * ``extended`` module is deprecated.

0.27 (Not secure)
==================

* Fixes URLs on the command line (if any) being ignored when ``--input-file`` is specified.
* Fixes crash when redirected to a URL that is not HTTP.
* Fixes crash if lxml does not recognize the document encoding name. Falls back to Latin1 if lxml does not support the encoding after massaging the encoding name.
* Fixes crash on IPv6 addresses when using scripting or external API calls.
* Fixes speed shown as "0.0 B/s" instead of "-- B/s" when speed cannot be calculated.
* Implements ``--local-encoding``, ``--remote-encoding``, ``--no-iri``.
* Implements ``--https-only``.
* Prints bandwidth speed statistics when exiting.
* PhantomJS:

  * Implements "smart scrolling" that avoids unnecessary scrolling.
  * Adds ``--no-phantomjs-smart-scroll``.

* API:

  * ``WebProcessorSession._parse_url()`` renamed to ``WebProcessorSession.parse_url()``.
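
The Latin1 fallback for unrecognized encoding names might look like the sketch below. The release note refers to lxml's encoding support. As an assumption, this sketch checks Python's own codec registry instead, and its "massaging" step (trimming whitespace and quotes, lowercasing) is only a guess at the kind of cleanup involved. ``normalize_encoding`` is a hypothetical name.

```python
import codecs


def normalize_encoding(name, fallback="latin-1"):
    """Return a codec name Python recognizes, or the fallback."""
    # "Massage" the declared name: trim whitespace and quotes, lowercase it.
    candidate = (name or "").strip().strip("'\"").lower()
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        # Unknown encoding: fall back instead of crashing.
        return fallback
```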
