
Advertools

Latest version: v0.14.2


Page 1 of 7

0.14.2
-------------------

* Changed
- Allow ``sitemap_to_df`` to work on offline sitemaps.
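
As a rough illustration of what reading an offline sitemap involves, the sketch below parses a sitemap file from disk using only the standard library. It is not how ``sitemap_to_df`` is implemented; the helper name ``read_local_sitemap`` and the sample file contents are assumptions made for this example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_local_sitemap(path):
    """Return one dict per <url> entry of a sitemap file on disk (hypothetical helper)."""
    root = ET.parse(path).getroot()
    rows = []
    for url in root.findall("sm:url", SITEMAP_NS):
        rows.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS),
        })
    return rows


# Made-up sample sitemap, written to a temporary file for the demo.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

with tempfile.TemporaryDirectory() as tmp:
    sitemap_path = Path(tmp) / "sitemap.xml"
    sitemap_path.write_text(SAMPLE, encoding="utf-8")
    rows = read_local_sitemap(sitemap_path)

for row in rows:
    print(row)
```

Entries without a ``lastmod`` tag get ``None`` for that field, so every row has the same keys.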

0.14.1
-------------------

* Fixed
- Preserve the order of supplied URLs in the output of ``url_to_df``.

0.14.0
-------------------

* Added
- New module ``crawlytics`` for analyzing crawl DataFrames. It includes analysis
functions (``images``, ``redirects``, and ``links``) and helpers for handling
large files (``jl_to_parquet``, ``jl_subset``, ``parquet_columns``).
- New ``encoding`` option for ``logs_to_df``.
- Option to save the output of ``url_to_df`` to a parquet file.

* Changed
- Remove the requirement to delete existing log output and error files beforehand.
The function now overwrites them if they exist.
- Autothrottling is enabled by default in ``crawl_headers`` to reduce the risk of being blocked.

* Fixed
- Always get absolute path for img src while crawling.
- Handle NA src attributes when extracting images.
- Replace the deprecated ``fillna(method="ffill")`` call with ``ffill`` in ``url_to_df``.
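
The two image-related fixes in this list can be pictured with a small standard-library sketch: resolve each ``src`` against the page URL, and skip entries whose ``src`` is missing. The function name ``absolute_img_srcs`` and the sample values are assumptions for illustration, not advertools internals.

```python
from urllib.parse import urljoin


def absolute_img_srcs(page_url, srcs):
    """Resolve img src values against page_url, skipping missing ones (hypothetical helper)."""
    resolved = []
    for src in srcs:
        # Treat None, NaN-like non-strings, and empty strings as missing.
        if not isinstance(src, str) or not src.strip():
            continue
        resolved.append(urljoin(page_url, src))
    return resolved


page = "https://example.com/blog/post.html"
srcs = [
    "img/a.png",                      # relative to the page's directory
    "/static/b.png",                  # relative to the site root
    None,                             # missing src attribute
    "https://cdn.example.com/c.png",  # already absolute
]
result = absolute_img_srcs(page, srcs)
print(result)
```

``urljoin`` leaves already-absolute URLs unchanged, so the same call is safe for every non-missing value.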

0.13.5
-------------------

* Added
- Initial experimental functionality for ``crawl_images``.

* Changed
- Enable autothrottling by default for ``crawl_headers``.

0.13.4
-------------------

* Fixed
- Make img attributes consistent in length, and support all attributes.

0.13.3
-------------------

* Changed
- Allow an optional trailing space in log file lines (contributed by andypayne).
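
To show what tolerating a trailing space can look like in a log-line pattern, here is a minimal sketch with a simplified common-log-format regex. The pattern, the name ``LINE_RE``, and the sample lines are assumptions for this example and are not the expressions advertools uses.

```python
import re

# Simplified common-log-format pattern. The final " ?$" accepts a line
# that ends either right after the size field or with one trailing space.
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) ?$'
)

# Made-up sample lines: identical except for the trailing space.
line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
line_with_space = line + " "

match_plain = LINE_RE.match(line)
match_space = LINE_RE.match(line_with_space)
print(match_plain.groupdict())
print(match_space.groupdict())
```

Both lines yield the same captured fields, because the optional space sits outside every named group.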

* Fixed
- Replace newlines with spaces while parsing JSON-LD; the newlines were causing
errors in some cases.
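
The kind of failure this fix addresses can be reproduced with the standard ``json`` module: a raw newline inside a JSON string is rejected, while the same text with newlines replaced by spaces parses. This is a sketch of the idea only; the sample JSON-LD snippet is made up and the code is not taken from advertools.

```python
import json

# Made-up JSON-LD snippet with a raw newline inside a string value, as it
# might appear in a page's <script type="application/ld+json"> tag.
raw_jsonld = '{"@type": "Article", "headline": "First line\nsecond line"}'

try:
    json.loads(raw_jsonld)
    parsed_raw = True
except json.JSONDecodeError:
    # Raw control characters are not allowed inside JSON strings.
    parsed_raw = False

# Replacing newlines with spaces makes the same text valid JSON.
cleaned = raw_jsonld.replace("\n", " ")
data = json.loads(cleaned)
print(parsed_raw, data["headline"])
```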
