Pywb

Latest version: v2.8.3



0.9.2

Not secure
~~~~~~~~~~~~~~~~~~~~~

* Collections Manager: Allow adding any templates to shared directory, fix adding WARCs with relative path.

* Replay: Remove limit by HTTP ``Content-Length`` as it may be invalid (only using the record length).

* WARC Revisit-Resolution Improvements: Support indexes and WARCs without any ``digest`` field. If no digest is found, attempt to look up
the original WARC record from the ``WARC-Refers-To-Target-URI`` and ``WARC-Refers-To-Date`` only, even for same-url revisits.
(Previously, this lookup was used only when the original url was different from the revisit url.)
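
The lookup order described in the revisit-resolution entry can be sketched roughly as follows. This is a simplified illustration, not pywb's actual code: the ``resolve_revisit`` function, the dict-based records, and the ``url``/``date``/``is_revisit`` keys are all assumptions made for the example.

```python
def resolve_revisit(revisit, records):
    """Return the original record for a revisit entry, or None.

    `revisit` and every item in `records` are plain dicts with a
    hypothetical shape; real pywb index entries look different.
    """
    originals = [rec for rec in records if not rec.get("is_revisit")]

    digest = revisit.get("digest")
    if digest:
        # A digest is available: match an earlier capture by payload digest.
        for rec in originals:
            if rec.get("digest") == digest:
                return rec
        return None

    # No digest: fall back to the WARC-Refers-To-* headers only,
    # even when the target url is the same as the revisit url.
    target_uri = revisit.get("WARC-Refers-To-Target-URI")
    target_date = revisit.get("WARC-Refers-To-Date")
    if target_uri and target_date:
        for rec in originals:
            if rec.get("url") == target_uri and rec.get("date") == target_date:
                return rec
    return None
```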

0.9.1

Not secure
~~~~~~~~~~~~~~~~~~~~~

* Implement pagination support for zipnum cluster and add it to the cdx server api:

https://github.com/ikreymer/pywb/wiki/CDX-Server-API

* cdx server query: add support for ``url=*.host`` and ``url=host/*`` as shortcuts for ``matchType=domain`` and ``matchType=prefix``

* zipnum cdx cluster: support loading index shards from a prefix path instead of a separate location file.

The ``shard_index_loc`` config property may contain match and replace properties.
Regex replacement is then used to obtain path prefix from the shard prefix path.

* wombat: fix ``document.write()`` rewriting to rewrite one element at a time and use the underlying write for better compatibility.
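
The ``url=*.host`` and ``url=host/*`` shortcuts from the cdx server query entry can be illustrated with a short sketch. The ``expand_url_shortcut`` helper is hypothetical and is not taken from pywb; it returns ``None`` as the match type when no shortcut applies, leaving the server default in effect.

```python
def expand_url_shortcut(url):
    """Map a cdx query url to (url, match_type), expanding wildcard shortcuts.

    Hypothetical helper for illustration only.
    """
    if url.startswith("*."):
        # url=*.host is shorthand for matchType=domain
        return url[2:], "domain"
    if url.endswith("/*"):
        # url=host/* is shorthand for matchType=prefix
        return url[:-1], "prefix"
    # No shortcut used: leave the match type unset.
    return url, None
```

For example, ``expand_url_shortcut("*.example.com")`` yields ``("example.com", "domain")``, which would correspond to the query ``url=example.com&matchType=domain``.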

0.9.0

Not secure
~~~~~~~~~~~~~~~~~~~~~

* New directory-based configuration-less init system! ``config.yaml`` no longer required.

* New ``wb-manager`` collection manager for adding warcs, indexing, adding/removing templates, setting metadata.

More details at: `Auto-Configuration and Wayback Collections Manager <https://github.com/ikreymer/pywb/wiki/Auto-Configuration-and-Wayback-Collections-Manager>`_

* Support for user metadata via per-collection ``metadata.yaml``

* Templates: improved/simplified home page and collection search page, show user metadata by default.

* Support for writing and reading new cdx JSON format (.cdxj), with searchable key followed by json dictionary: ``urlkey timestamp { ... }`` on each line

* ``cdx-indexer -j``: support for generating cdxj format

* ``cdx-indexer -mj``: support for a minimal cdx format (available in JSON format only), which skips reading the HTTP record.

Fields included in minimal format are: urlkey, timestamp, original url, record length, digest, offset, and filename

* ``cdx-indexer --root-dir <dir>``: option for custom root dir for cdx filenames to be relative to this directory.

* ``wb-manager cdx-convert``: option to convert any existing cdx to the new cdxj format, including ensuring that the cdx key is SURT-canonicalized.

* ``wb-manager autoindex`` / ``wayback -a`` -- Support for auto-updating the cdx indexes whenever any WARC/ARC files are modified or created.

* Switch default ``wayback``, ``cdx-server``, ``live-rewrite-server`` cli apps to use the ``waitress`` WSGI container instead of ``wsgiref``.

New cli options, including ``-p`` (port), ``-t`` (num threads), and ``-d`` (working directory)

* url rewrite: fixes to JS url rewrite (some urls with unencoded chars were not being rewritten),
fix WbUrl parsing of urls starting with digits (e.g. 1234.example.com), which were not being parsed properly.

* framed replay: update frame_insert.html to be html5 compliant.

* wombat: fix to ``WB_wombat_location.href`` assignment, which now properly redirects to the destination page even if the url is already rewritten.

* static paths: static content included with pywb moved from ``static/default`` -> ``static/__pywb`` to free up default as possible collection name
and avoid any naming conflicts. For example, wombat.js can be accessed via ``/static/__pywb/wombat.js``

* default to replay with framed mode enabled: ``framed_replay: true``
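
The cdxj line layout mentioned in this release (``urlkey timestamp { ... }``) can be read with a few lines of Python. This is a minimal sketch, not pywb's own parser; ``parse_cdxj_line`` is a hypothetical name, and the JSON keys in the sample line are illustrative only.

```python
import json


def parse_cdxj_line(line):
    """Split one cdxj line into (urlkey, timestamp, fields).

    The searchable key and timestamp come first, separated by single
    spaces; the remainder of the line is a JSON dictionary.
    """
    urlkey, timestamp, json_part = line.rstrip("\n").split(" ", 2)
    return urlkey, timestamp, json.loads(json_part)


# Illustrative line; the JSON keys here are assumptions for the example.
sample = 'com,example)/ 20140102030405 {"url": "http://example.com/", "filename": "example.warc.gz"}'
urlkey, timestamp, fields = parse_cdxj_line(sample)
```

Because the key and timestamp lead each line as plain text, a sorted cdxj file can still be searched by prefix without parsing the JSON portion.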

0.8.3

Not secure
~~~~~~~~~~~~~~~~~~~~~

* cookie rewrite: all cookie rewriters remove ``secure`` flag to allow equivalent replay of sites with cookies via HTTP and HTTPS.

* html rewrite: fix ``<base>`` tag rewriting to add a trailing slash to the url if it is a hostname with no path, ex:

``<base href="http://example.com" />`` -> ``<base href="http://localhost:8080/rewrite/http://example.com/" />``

* framed replay: fix double slash that remained when rewriting top frame url.
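
The ``<base>`` tag behavior from the html rewrite entry can be sketched as below. This is an illustration only: ``rewrite_base_href`` is a hypothetical function, and the default prefix is simply the one used in the example from the entry.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_base_href(href, prefix="http://localhost:8080/rewrite/"):
    """Rewrite a <base href> value, adding a trailing slash to bare hostnames."""
    parts = urlsplit(href)
    if parts.netloc and not parts.path:
        # Hostname with no path: normalize the path to "/" so that
        # relative urls resolve against the site root.
        href = urlunsplit(parts._replace(path="/"))
    return prefix + href
```

With this sketch, ``rewrite_base_href("http://example.com")`` returns ``http://localhost:8080/rewrite/http://example.com/``, while a url that already has a path is only prefixed.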

0.8.2

Not secure
~~~~~~~~~~~~~~~~~~~~~

* rewrite: fix for redirect loop related to pages with 'www.' prefix. Since canonicalization removes the prefix, treat redirect to 'www.' as self-redirect (for now).

* memento: ensure rel=memento url matches timegate redirect exactly (urls may differ due to canonicalization, use actual instead of requested for both)
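
The 'www.' self-redirect check from the rewrite entry can be approximated as follows. This is a heavily simplified sketch: pywb's real canonicalization does more than strip the prefix, and both helper names are hypothetical. Ignoring the scheme in the comparison is an assumption of this example.

```python
from urllib.parse import urlsplit


def strip_www(url):
    """Reduce a url to a (host, path, query) key with any 'www.' prefix removed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path or "/", parts.query


def is_self_redirect(request_url, location_url):
    """True if a redirect target differs from the request only by 'www.'."""
    return strip_www(request_url) == strip_www(location_url)
```

A replay layer could use such a check to stop following a redirect from ``http://example.com/`` to ``http://www.example.com/``, since both resolve to the same canonical key.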

0.8.1

Not secure
~~~~~~~~~~~~~~~~~~~~~

* wb.js top frame notification: use ``window.__orig_parent`` when referencing the actual parent, as ``window.parent`` is now overridden.

* live proxy security: enable ssl verification for live proxy by default, for use with python 2.7.9 ssl improvements. It was previously disabled
due to incomplete ssl support in earlier versions of python. Can be disabled via ``verify_ssl: False`` per collection.

* cdx-indexer: add recursive option to index warcs in all subdirectories with ``cdx-indexer -r <dir_name>``
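
The recursive traversal behind an option like ``cdx-indexer -r`` can be pictured with a short sketch. This is not pywb's implementation: ``find_archive_files`` is a hypothetical helper, and the list of file extensions is an assumption.

```python
import os

# Assumed set of archive extensions for this example.
ARCHIVE_EXTS = (".warc", ".warc.gz", ".arc", ".arc.gz")


def find_archive_files(root_dir):
    """Return a sorted list of WARC/ARC paths under root_dir, including subdirectories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(ARCHIVE_EXTS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```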

