Tosixinch

Latest version: v0.9.0


0.9.0
-------------------

Many filename-related API changes.

The terms themselves (both in code and docs) have changed.
Basically::

url -> rsrc (resource)
fname, download_file -> dfile
fnew, extracted_file -> efile

**Change:**

* **!!** Change name syntax of dfile, using suffix '.f'

Given URL::

https://en.wikipedia.org/wiki/XPath

Old dfile::

_htmls/en.wikipedia.org/wiki/XPath/_

Now::

_htmls/en.wikipedia.org/wiki/XPath

NOTE: You can no longer use the old html cache in the '_htmls' folder,
since the names are different.

In particular, there are probably many now-redundant directories,
which cause download errors.

E.g. if there is a directory 'XPath', we cannot write a file 'XPath'.

* **!!** Change name syntax of efile using suffix '.orig'

The efile reuses the same name as the dfile,
and the dfile is renamed with the suffix.

Given URL::

https://en.wikipedia.org/wiki/XPath

Old::

_htmls/en.wikipedia.org/wiki/XPath (dfile)
_htmls/en.wikipedia.org/wiki/XPath~.html (efile)

Now::

_htmls/en.wikipedia.org/wiki/XPath.orig (dfile)
_htmls/en.wikipedia.org/wiki/XPath (efile)

* **!!** Change default resource filename 'urls.txt' -> 'rsrcs.txt'
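
The naming in the second example above can be sketched roughly as follows. This is only an illustration, not the program's actual code: the function name ``names_for`` is hypothetical, and the sketch ignores queries, trailing slashes, escaping and long names.

```python
from urllib.parse import urlsplit

DOWNLOAD_DIR = "_htmls"  # cache folder named in the changelog
ORIG_SUFFIX = ".orig"    # suffix given to the renamed dfile


def names_for(url):
    """Return (dfile, efile) for a URL, as they look after extraction.

    Hypothetical helper: it only joins host and path under DOWNLOAD_DIR.
    """
    parts = urlsplit(url)
    base = f"{DOWNLOAD_DIR}/{parts.netloc}{parts.path}"
    # The efile takes the plain name; the dfile keeps the content under '.orig'.
    return base + ORIG_SUFFIX, base


dfile, efile = names_for("https://en.wikipedia.org/wiki/XPath")
print(dfile)  # _htmls/en.wikipedia.org/wiki/XPath.orig
print(efile)  # _htmls/en.wikipedia.org/wiki/XPath
```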

**Add:**

* Use temporary file when downloading (with suffix '.part')
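
The general idea of a '.part' temporary file can be sketched like this. It is a minimal sketch under assumptions: ``save_via_part`` is a hypothetical name, and the data is passed in directly instead of being downloaded.

```python
import os
import tempfile


def save_via_part(path, data):
    """Write data to path + '.part', then rename it to path."""
    part = path + ".part"
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(part, "wb") as f:
        f.write(data)
    # The rename happens only after the write has finished,
    # so an interrupted write leaves only the '.part' file behind.
    os.replace(part, path)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "_htmls", "example.org", "page")
    save_via_part(target, b"<html></html>")
    print(os.path.exists(target), os.path.exists(target + ".part"))  # True False
```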

0.8.0
-------------------

Cut dependencies I haven't used myself for years.

**Change:**

* **!!** Cut lib: chardet

* **!!** Cut lib: Qt (webkit and webengine options)

* **!!** Cut lib: readability

* **!!** Cut lib: wkhtmltopdf

* **!!** Cut support: Windows

0.7.0
-------------------

Incremental Improvements

**Change:**

* Change page and <body> margin slightly (for weasyprint)

* Add --inspect action, cutting --link and --news

* Add --headless option, cutting --javascript

* Add '//article' to guess option defaults

* Change '_add_index' from URL to Map class

Very slightly change the handling of '/' and '?' in dfile name

**Add:**

* Add github discussions to sample

* Add loose heuristic for too big images (for inside <table>)

* Add more exclusion of secondary boxes (sample wikipedia)

* Update and Change slugify, add more word separation ('-')

* Add minus number feature to --trimdirs option

* Add --timeout option

* Add --interval option

* Add KeepExtract (no-selection extractor, '--keep-html')

* Add IDTable (fragment link re-resolution when merging htmls)

* Add loc_index and loc_appendix options to commandline

* Add reading tosixinch.ini and site.ini from current dir

* Add download_dir option

* Add overwrite_html option

* Add current directory search for css and css2 options

**Fix:**

* Fix BLANK_HTML (add charset, for weasyprint)

* Fix toc title (able to use unicode)

* Fix link when baseurl (<base> tag) is provided

* Fix merge_htmls (no css references for weasyprint)

0.6.0
-------------------

I had refrained from uploading local changes,
waiting for a time when I could look into them closely.
But that time has not come for a long while,
so let's publish the update now.

**Change:**

* When the name of a dfile is too long
(if it has a path segment of more than 255 characters),
the filename is hashed,
and it now takes the form '_htmls/_hash/<sha1-hexdigit>'.

* Change browser_engine option default: webkit -> selenium-firefox

* Skip cleaning possible MathJax tag attributes
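
The long-filename rule above might look something like the following. This is a sketch under assumptions, not the actual implementation: ``shorten_if_needed`` is a hypothetical name, and hashing the whole original name (UTF-8 encoded) is a guess, since the changelog does not say what exactly is hashed.

```python
import hashlib

MAX_SEGMENT = 255  # limit named in the changelog


def shorten_if_needed(dfile):
    """Return dfile unchanged, or a hashed name if a path segment is too long."""
    if all(len(seg) <= MAX_SEGMENT for seg in dfile.split("/")):
        return dfile
    # Assumption: the hash input is the whole original name, UTF-8 encoded.
    digest = hashlib.sha1(dfile.encode("utf-8")).hexdigest()
    return "_htmls/_hash/" + digest


print(shorten_if_needed("_htmls/example.org/short"))
print(shorten_if_needed("_htmls/example.org/" + "a" * 300))
```

The first call prints the name unchanged; the second prints '_htmls/_hash/' followed by 40 hexadecimal characters.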

**Add:**

* Add 'html5prescan' encoding option

* Add elements_to_keep_attrs option

* Add selenium downloading (browser_engine option)

* Add dprocess option

**Fix:**

* Fix ignore any errors in component download

* Fix --browser option error (url was not percent escaped)

0.5.0
-------------------

Most changes are just internal refactorings.

**Change:**

* Add latin_1 to default encoding option

From::

utf-8, cp1252

To::

utf-8, cp1252, latin_1

This means input never raises encoding errors
(latin_1 can decode any byte sequence),
which should be preferable in general.

* Rename method ``_get_relpath`` to ``_get_relative_url``
in ``tosixinch.pcode._pygments.PygmentsCode``.

* Change application config data files to normal INI format

(``data/tosixinch.ini`` and ``data/site.ini``)

Previously the program exposed foreign FINI format files,
which are specific to the configfetch library.

* Cut Python 3.5
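
Why adding latin_1 to the encoding defaults removes decoding errors can be shown with a small sketch. The function name ``decode_with_fallback`` is hypothetical, and trying the encodings strictly in order is an assumption about how the option is applied.

```python
ENCODINGS = ("utf-8", "cp1252", "latin_1")  # the new default list


def decode_with_fallback(data, encodings=ENCODINGS):
    """Try each encoding in order; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding could decode the input")


# Byte 0x81 is invalid in utf-8 and undefined in Python's cp1252,
# but latin_1 maps every byte to a character.
text, used = decode_with_fallback(b"\x81")
print(used)  # latin_1
```

Since latin_1 accepts any byte sequence, it only makes sense as the last entry: anything listed after it would never be tried.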

**Add:**

* Add urlno.py and urlmap.py (internal)

('urlno' means url normalization)

* Add lxml_html.py (internal)

* Add action.py (internal)

**Fix:**

* Fix and change user package import

Previously, if the user's Python environment included some library
that also had, say, a 'script' package,
the program aborted.

0.4.0
-------------------

In this version,
I concentrated on the many gratuitous API changes I've been thinking about,
while trying not to add new features.

So be careful when upgrading.

**Change:**

* Cut head data inclusion

Previously, the program kept the original <head> content in the extracted file.
Now it includes only minimal <head> content.
(This shouldn't affect end-user usage.)

* **!!** Change default intermediary filenames to '-' and '~'

Previously::

https://en.wikipedia.org/wiki/Xpath
_htmls/en.wikipedia.org/wiki/Xpath/index--tosixinch
_htmls/en.wikipedia.org/wiki/Xpath/index--tosixinch--extracted.html

Now::

https://en.wikipedia.org/wiki/Xpath
_htmls/en.wikipedia.org/wiki/Xpath/_
_htmls/en.wikipedia.org/wiki/Xpath/_~.html

To use the old (or other) names, edit the new config options::

loc_index= index--tosixinch
loc_appendix= --extracted

* Cut 'use_sample' option

* Cut 'use_urlreplace' option

* Cut '--sample-urls' option

* Move css from commandline to html link

Previously they were just passed as the converter's commandline arguments.

Now they are referenced in each html file as external css.

So you can now specify css files for each site configuration like this::

[wikipedia]
...
css= sample, my_wikipedia.css

(Note: unlike ``auto_css``,
all css files must be specified explicitly. They are not additions to the default.)

* **!!** Cut auto_css

It is now redundant. Just use 'css' option instead (see the above change).

* **!!** Cut auto glob feature (for 'match' option)

Sometimes we need an exact match at the end (like: '\*.html').

But since '\*' was automatically added to the end of the string,
it was impossible.

Now you have to add '\*' explicitly.

This means you have to edit existing config files extensively,
like I did for 'site.sample.ini'.
Sorry.

From::

[wikipedia]
...
match= https://*.wikipedia.org/wiki/

To::

match= https://*.wikipedia.org/wiki/*

* Update configfetch (v0.1.0)

It is incompatible with the previous configfetch versions.
Code and config files have changed considerably.
It shouldn't affect tosixinch behavior.

* **!!** Rename tosixinch-complete.bash

From::

tosixinch/script/tosixinch-complete.bash

To::

tosixinch/data/_tosixinch.bash

If you are sourcing this bash completion file in e.g. .bashrc,
you have to edit it.

* **!!** Rename pre_percmds and post_percmds to pre_each_cmds and post_each_cmds. ::

pre_percmd1 -> pre_each_cmd1
post_percmd1 -> post_each_cmd1
pre_percmd2 -> pre_each_cmd2
post_percmd2 -> post_each_cmd2

You have to edit user config files if you are using them.

* Rename 'qt' option to 'browser_engine'.

* Move 'javascript' option from (general) site.ini to tosixinch.ini.

You can now specify 'javascript' on commandline, tosixinch.ini, or some site sections.

* **!!** Cut util.py, gen.py and site.py and create sample.py (tosixinch.process directory)

Combined three sample files into one.

You have to edit user config files if you are using them. e.g.::

gen.youtube_video_to_thumbnail -> sample.youtube_video_to_thumbnail

or just (See below: 'Add no-dot function name..')::

gen.youtube_video_to_thumbnail -> youtube_video_to_thumbnail

* **!!** Change syntax: from comma to line (defaultprocess and process options)

From::

process= aaa, bbb, ccc

To::

process= aaa
    bbb
    ccc

You have to edit user config files if you are using them.

* **!!** Rename many process functions (process/sample.py) ::

check_parents_tag -> check_parent_tag
transform_xpath -> build_class_xpath
add_title -> add_h1
add_title_force -> add_h1_force
make_ahref_visible -> show_href
decrease_heading -> lower_heading
decrease_heading_order -> lower_heading_from_order
split_h1_string -> split_h1
replace_h1_string -> replace_h1
change_tagname -> replace_tags
add_noscript_img -> add_noscript_image

You have to edit user config files if you are using them.

* **!!** Rename script/open_viewer.py

From::

open_viewer.py

To::

_view.py

You have to edit user config files if you are using them.
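
The effect of cutting the auto glob feature (the 'match' option change above) can be illustrated with the standard ``fnmatch`` module. That the program matches in exactly this way is an assumption, and ``matches`` is a hypothetical helper; the sketch only shows why an explicit trailing '\*' is now needed, and why an exact match of the end became possible.

```python
from fnmatch import fnmatchcase


def matches(url, pattern):
    """Shell-style match of the whole URL (no '*' is appended implicitly)."""
    return fnmatchcase(url, pattern)


url = "https://en.wikipedia.org/wiki/XPath"

# Old-style pattern: without the implicit trailing '*', it no longer matches.
print(matches(url, "https://*.wikipedia.org/wiki/"))   # False
# New-style pattern with an explicit '*'.
print(matches(url, "https://*.wikipedia.org/wiki/*"))  # True

# An exact match of the end is now possible.
print(matches("https://example.org/a.html", "*.html"))      # True
print(matches("https://example.org/a.html?x=1", "*.html"))  # False
```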

**Add:**

* Add Python 3.8

* Add css2 option (and fix misplaced css option)

* Add no-dot function name in process option

Previously the option only accepted one-dot name form
(``<module name>.<function name>``).

Now this form is optional.
The program searches all modules for the function name.
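
The lookup described above can be sketched as follows. It is not the program's actual code: ``resolve`` is a hypothetical name, the search order is an assumption, and standard-library modules stand in for the process modules.

```python
import importlib


def resolve(name, module_names):
    """Find a function by 'module.function' or by a bare 'function' name."""
    if "." in name:
        # One-dot form: the module is given explicitly.
        module_name, func_name = name.rsplit(".", 1)
        return getattr(importlib.import_module(module_name), func_name)
    # No-dot form: search each module in order for the function name.
    for module_name in module_names:
        func = getattr(importlib.import_module(module_name), name, None)
        if callable(func):
            return func
    raise LookupError(f"function not found: {name}")


# 'json' and 'base64' are stand-ins for process modules.
SEARCH = ["json", "base64"]
print(resolve("b64encode", SEARCH) is resolve("base64.b64encode", SEARCH))  # True
```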
