Pyspider

Latest version: v0.3.10


0.3.10

New features:
* Add phantomjs proxy support (#692, thanks to volvofixthis)
* Support redis 3.x in cluster mode for message queue (thanks to hackty)

Fix several bugs:
* Improve the performance of counter.to_dict
* Fixed issue of counter changed during read
* Fix tornado version dependency in setup.py

0.3.9

New features:
* Support for Python 3.6.
* Auto Pause: the project is paused for `scheduler.PAUSE_TIME` (default: 5 min) when the last `scheduler.FAIL_PAUSE_NUM` (default: 10) tasks have failed. After `scheduler.PAUSE_TIME`, `scheduler.UNPAUSE_CHECK_NUM` (default: 3) tasks are dispatched, and the project resumes if any one of them succeeds.
* Each callback now has a default 30s processing time limit (platform support required). Thanks to beader.
* New Javascript render engine - Splash support: Enabled by fetch argument `--splash-endpoint=http://splash:8050/execute`
* Python3 webdav support.
* Python3 `from projects import project` support.
* A link to the corresponding task is added to the webui debug page when debugging an existing task in the webui.
* New `user_agent` parameter in `self.crawl`; you can still set the user-agent via headers.
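
The Auto Pause rule described in this list can be modeled as a small state machine. The following is a toy sketch in plain Python, not pyspider's scheduler code: the `AutoPause` class, its method names, and the choice to ignore results that arrive during the pause are assumptions made for illustration. Only the three constants and their defaults come from the release note, and the sketch assumes that "the last N tasks failed" means N consecutive failures.

```python
from collections import deque

# Defaults quoted in the release note.
FAIL_PAUSE_NUM = 10      # consecutive failures that trigger a pause
PAUSE_TIME = 5 * 60      # seconds the project stays paused
UNPAUSE_CHECK_NUM = 3    # probe tasks dispatched after the pause


class AutoPause:
    """Toy model of the auto-pause rule (hypothetical, for illustration)."""

    def __init__(self):
        self.recent = deque(maxlen=FAIL_PAUSE_NUM)  # True means success
        self.paused_at = None                       # time the pause began
        self.failed_probes = 0

    def state(self, now):
        if self.paused_at is None:
            return "running"
        if now - self.paused_at < PAUSE_TIME:
            return "paused"
        return "checking"  # pause elapsed, probe tasks decide what happens

    def record(self, success, now):
        state = self.state(now)
        if state == "paused":
            return  # assumption: results arriving mid-pause are ignored
        if state == "checking":
            if success:
                # Any successful probe resumes the project.
                self.paused_at = None
                self.recent.clear()
                self.failed_probes = 0
                return
            self.failed_probes += 1
            if self.failed_probes >= UNPAUSE_CHECK_NUM:
                # All probes failed: start another pause.
                self.paused_at = now
                self.failed_probes = 0
            return
        self.recent.append(success)
        if len(self.recent) == FAIL_PAUSE_NUM and not any(self.recent):
            self.paused_at = now
```

In this model, ten failures in a row pause the project, and a single successful probe after `PAUSE_TIME` returns it to the running state.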


Fix several bugs:
* New webui dashboard frontend framework, [vue.js](https://vuejs.org/), which improves performance with a large number of tasks (e.g. http://demo.pyspider.org/)
* Fix crawl_config not working in the webui while debugging a script.
* Fix CSS Selector Helper not working (thanks to ackalker).
* Fix `connection_timeout` not working.
* Fix `need_auth` option not being applied to webdav.
* Fix "fix can't dump counter to file: scheduler.all" error.
* Some other fixes

0.3.8

New features:
- Now you can use [`cancel`](http://docs.pyspider.org/en/latest/apis/self.crawl/cancel) to stop an active task or a task with `auto_recrawl` enabled.
- `Handler.crawl_config` is now applied to the task when fetching. (Previously it was applied when the task was created; this means proxy/headers can now be changed afterward.) See [http://docs.pyspider.org/en/latest/apis/self.crawl/handlercrawl_config](http://docs.pyspider.org/en/latest/apis/self.crawl/handlercrawl_config)

Fix several bugs:
- Fixed a global config object thread interference issue, which may cause a `connect to scheduler rpc error: error(10061, '')` error when running `all --run-in=thread` (the default on Windows)
- Fix `response.save` being lost when a fetch fails
- Fix potential scheduler failure caused by an old version of six
- Fix result dump returning nothing when using the mongodb backend

0.3.7

- ThreadBaseScheduler added to improve the performance of scheduler
- robots.txt supported!
- elasticsearch database backend supported!
- new script callback `on_finished`, http://docs.pyspider.org/en/latest/About-Projects/on_finished-callback
- you can now set the delay time between retries:

> retry_delay is a dict to specify retry intervals. The items in the dict
> are {retried: seconds}, and a special key: '' (empty string) is used to
> specify the default retry delay if not specified.
- dict parameters in crawl_config (e.g. headers) are now merged, thanks to ihipop
- add parameter `max_redirects` in `self.crawl` to control the maximum number of redirects when doing the fetch, thanks to AtaLuZiK
- add parameter `validate_cert` in `self.crawl` to allow ignoring server certificate errors.
- new property `etree` for Response, `etree` is a cached lxml.html.HtmlElement object, thanks to waveyeung
- you can now pass arguments to phantomjs from command line or config file.
- support for pymongo 3.0
- local.projectdb now accepts a glob path (e.g. `script/*.py`) to load multiple projects from the local filesystem.
- fix queue size in the dashboard not working on OS X, thanks to xyb
- counters in the dashboard are now shown for stopped projects
- other bug fixes
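
The `retry_delay` lookup quoted in this list can be sketched in a few lines of plain Python. This is a minimal illustration of the rule as described ({retried: seconds}, with `''` as the default key), not pyspider's own code: the function name `next_retry_delay`, the `fallback` parameter, and the sample values are all hypothetical.

```python
def next_retry_delay(retry_delay, retried, fallback=60):
    """Return the seconds to wait before the next retry.

    retry_delay maps the number of retries already made to a delay in
    seconds; the special key '' (empty string) holds the default delay.
    `fallback` is a hypothetical safety net for dicts without a '' key.
    """
    return retry_delay.get(retried, retry_delay.get('', fallback))


# Illustrative values only: 30s after the first failure, 1h after the
# second, and 1 day for every later retry.
retry_delay = {0: 30, 1: 60 * 60, '': 24 * 60 * 60}

for retried in range(4):
    print(retried, next_retry_delay(retry_delay, retried))
```

With these sample values, retries 0 and 1 use their own entries, while retries 2 and 3 fall through to the `''` default.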

0.3.6

- NEW: webdav mode, now you can use [webdav](http://www.webdav.org/) to mount the project folder to your local filesystem and edit scripts with your favorite editor! (Python 3 is not supported; wsgidav is required and is not listed in setup.py)
- bug fixes for Python 3 compatibility, Postgresql, flask-Login>=0.3.0, typo and more, thanks for the help of lushl9301 hitjackma exoticknight d0ugal qiang.luo twinmegami jttoday machinewu littlezz yaokaige
- fix Queue.qsize NotImplementedError on Mac OS X, thanks xyb

0.3.5

- New parameter: auto_recrawl - auto restart task every `age`.
- New parameter: js_viewport_width/js_viewport_height to set viewport size for phantomjs engine.
- New command line option to set different message queue backends with URI scheme.
- New task level storage mechanism: `self.save`
- New redis taskdb
- New redis message queue.
- New high level message queue interface kombu.
- Fix bugs related to mongodb (keyword missing if not set).
- Fix phantomjs not working in `all` mode.
- Fix a potential deadlock in processor send_message.
- Default log level of scheduler is changed to INFO
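
To show what selecting a message queue backend by URI scheme can look like, here is a small sketch that splits a queue URI into its parts using only the Python standard library. It does not reproduce pyspider's option parsing: the helper `describe_queue_uri` is hypothetical, and the sample URIs are assumptions chosen for illustration rather than values taken from these release notes.

```python
from urllib.parse import urlsplit


def describe_queue_uri(uri):
    """Split a message queue URI into the parts a backend would need.

    Hypothetical helper for illustration; the scheme is what a caller
    would use to pick the backend (e.g. redis, amqp, or a kombu transport).
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


# Sample URIs (assumed formats, not taken from the release notes).
for uri in (
    "redis://localhost:6379/0",
    "amqp://guest:guest@localhost:5672/%2F",
    "kombu+redis://localhost:6379/0",
):
    print(describe_queue_uri(uri))
```

Keeping the backend choice in the URI scheme means a single option can carry the backend type, host, port, and database or virtual host together.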
