Transmogrify.webcrawler

Latest version: v1.2.1

1.2.2
------------------

- support whitelisting of URLs [djay]
- post-only option to send GET URLs as POST requests [djay]
- experimental ZODB cache for items to handle larger crawls [djay]
- support Python 2.6 [djay]
- clean up code and improve logging [djay]
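URL whitelisting in a crawler usually means following a link only if it matches at least one allowed pattern. The sketch below shows that idea with regular expressions. It is a hypothetical illustration: the helper name `is_whitelisted` is invented, and both the way transmogrify.webcrawler spells its whitelist option and its exact matching rules are assumptions.

```python
import re
from typing import Iterable


def is_whitelisted(url: str, patterns: Iterable[str]) -> bool:
    """Return True if url matches at least one whitelist regex.

    Hypothetical helper; the real option name and matching rules in
    transmogrify.webcrawler may differ.
    """
    compiled = [re.compile(p) for p in patterns]
    # Assumption: an empty whitelist means "allow every URL".
    if not compiled:
        return True
    # re.search matches anywhere, so anchor patterns with ^ when needed.
    return any(c.search(url) for c in compiled)


if __name__ == "__main__":
    allowed = [r"^https?://example\.com/docs/"]
    print(is_whitelisted("http://example.com/docs/intro", allowed))  # True
    print(is_whitelisted("http://example.com/blog/post", allowed))   # False
```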

1.2.1
-----------------

- setuptools-git wasn't installed so release was missing files [djay]

1.2
----------------

- fix cache check to prevent overwriting cache [djay]
- turn redirects into Link objects [djay]
- summary stats of which mimetypes were crawled [djay]
- fixed bug where redirected pages weren't getting uploaded [djay]
- fixed bugs with storing default pages in cache [djay]
- fixed bug with space chars in urls [ivanteoh]
- better handling of charset detection [djay]
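Space characters are not valid in a URL and are normally percent-encoded (a space becomes `%20`) before the URL is requested or stored. The sketch below shows one generic way to do this with Python 3's `urllib.parse`. It is not the fix that went into the package, which targeted Python 2. The helper name `escape_spaces` is hypothetical, and the choice of which characters to leave unescaped is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_spaces(url: str) -> str:
    """Percent-encode spaces and other unsafe characters in path and query.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Leave "/" and existing "%" escapes alone so the URL is not double-encoded.
    path = quote(parts.path, safe="/%")
    # Keep the query delimiters "=" and "&" intact.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    print(escape_spaces("http://example.com/my page.html?q=a b"))
    # http://example.com/my%20page.html?q=a%20b
```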

1.1
----------------

- add start-urls option [djay]
- add ignore_robots option [djay]
- fixed bug in http-equiv refresh handling [djay]
- fixes to disk caching [djay]
- better logging [djay]
- default maxsize is unlimited [djay]
- Provide ability for the reformat function to substitute patterns with
  empty strings. Buildout does not support empty lines within
  configuration, so a substitution of <EMPTYSTRING> becomes an empty
  string. [davidjb]
- Provide a logger in the LXMLPage class so the reformat function can
  succeed [davidjb]
- Reformat spacing in webcrawler reformat function [davidjb]
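The <EMPTYSTRING> entry describes a placeholder convention: buildout cannot hold a blank value line, so a sentinel token stands in for "replace with nothing". The sketch below shows that convention applied to a list of regex substitutions. It is a minimal illustration, not the package's reformat function. The names `EMPTY_TOKEN` and `apply_substitutions`, and the representation of rules as (pattern, replacement) pairs, are assumptions.

```python
import re
from typing import List, Tuple

# Sentinel used in configuration where a literal empty value is impossible.
EMPTY_TOKEN = "<EMPTYSTRING>"


def apply_substitutions(text: str, rules: List[Tuple[str, str]]) -> str:
    """Apply regex substitutions in order (hypothetical helper).

    A replacement equal to EMPTY_TOKEN is treated as an empty string.
    """
    for pattern, replacement in rules:
        if replacement == EMPTY_TOKEN:
            replacement = ""
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    rules = [(r"<br\s*/?>\s*", EMPTY_TOKEN)]
    print(apply_substitutions("Hello <br/> world", rules))  # Hello world
```

Mapping the token before calling `re.sub` keeps the substitution logic unaware of the configuration quirk; only the config-facing layer knows about the sentinel.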

1.0
----------------

- many fixes for importing from a local directory with many languages [simahawk]
- fix UnicodeEncodeError when file name/language is not English [simahawk]
- fix iterating over a non-sequence [simahawk]
- fix missing import for MyStringIO [simahawk]

1.0b7
------------------

- fix bug in cache check [djay]
