Beautifulsoup4

Latest version: v4.12.3

4.5.1

* Fixed a crash when passing Unicode markup that contained a
processing instruction into the lxml HTML parser on Python
3. [bug=1608048]

4.5.0

* Beautiful Soup is no longer compatible with Python 2.6. This
actually happened a few releases ago, but it's now official.

* Beautiful Soup will now work with versions of html5lib greater than
0.99999999. [bug=1603299]

* If a search against each individual value of a multi-valued
attribute fails, the search will be run one final time against the
complete attribute value considered as a single string. That is, if
a tag has class="foo bar" and neither "foo" nor "bar" matches, but
"foo bar" does, the tag is now considered a match.

This happened in previous versions, but only when the value being
searched for was a string. Now it also works when that value is
a regular expression, a list of strings, etc. [bug=1476868]

* Fixed a bug that deranged the tree when a whitespace element was
reparented into a tag that contained an identical whitespace
element. [bug=1505351]

* Added support for CSS selector values that contain quoted spaces,
such as tag[style="display: foo"]. [bug=1540588]

* Corrected handling of XML processing instructions. [bug=1504393]

* Corrected an encoding error that happened when a BeautifulSoup
object was copied. [bug=1554439]

* The contents of <textarea> tags will no longer be modified when the
tree is prettified. [bug=1555829]

* When a BeautifulSoup object is pickled but its tree builder cannot
be pickled, its .builder attribute is set to None instead of being
destroyed. This avoids a performance problem once the object is
unpickled. [bug=1523629]

* Specify the file and line number when warning about a
BeautifulSoup object being instantiated without a parser being
specified. [bug=1574647]

* The `limit` argument to `select()` now works correctly, though it's
not implemented very efficiently. [bug=1520530]

* Fixed a Python 3 BytesWarning when a URL was passed in as though it
were markup. Thanks to James Salter for a patch and
test. [bug=1533762]

* We don't run the check for a filename passed in as markup if the
'filename' contains a less-than character; the less-than character
indicates it's most likely a very small document. [bug=1577864]
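
A minimal sketch of three of the 4.5.0 changes (multi-valued attribute matching, quoted spaces in selectors, and the `limit` argument to select()). It assumes a current bs4 4.x release and the standard-library "html.parser"; the markup is invented for illustration. In current releases select() is implemented by the soupsieve package rather than by the selector code these entries describe.

```python
import re

from bs4 import BeautifulSoup

# Illustrative markup, not taken from the changelog.
html = """
<p class="foo bar">first</p>
<p class="baz">second</p>
<span style="display: foo">styled</span>
<a href="#1">one</a><a href="#2">two</a><a href="#3">three</a>
"""
soup = BeautifulSoup(html, "html.parser")

# class is multi-valued, so the first tag's class is ["foo", "bar"]. The regex
# matches neither "foo" nor "bar" alone, but it does match the joined value
# "foo bar", so that tag is still returned (bug 1476868).
whole_value = soup.find_all("p", class_=re.compile(r"^foo bar$"))

# An attribute selector whose quoted value contains a space (bug 1540588).
styled = soup.select('span[style="display: foo"]')

# limit stops select() after the first N matches (bug 1520530).
first_two = soup.select("a", limit=2)

print([p.get_text() for p in whole_value])  # ['first']
print([s.get_text() for s in styled])       # ['styled']
print([a.get_text() for a in first_two])    # ['one', 'two']
```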

4.4.1

* Fixed a bug that deranged the tree when part of it was
removed. Thanks to Eric Weiser for the patch and John Wiseman for a
test. [bug=1481520]

* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
Kramer for the patch. [bug=1483781]

* Improved the implementation of CSS selector grouping. Thanks to
Orangain for the patch. [bug=1484543]

* Fixed the test_detect_utf8 test so that it works when chardet is
installed. [bug=1471359]

* Corrected the output of Declaration objects. [bug=1477847]

4.4.0

Especially important changes:

* Added a warning when you instantiate a BeautifulSoup object without
explicitly naming a parser. [bug=1398866]

* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
string in Python 3, instead of a UTF8-encoded bytestring in both
versions. In Python 3, __str__ now returns a Unicode string instead
of a bytestring. [bug=1420131]

* The `text` argument to the find_* methods is now called `string`,
which is more accurate. `text` still works, but `string` is the
argument described in the documentation. `text` may eventually
change its meaning, but not for a very long time. [bug=1366856]

* Changed the way soup objects work under copy.copy(). Copying a
NavigableString or a Tag will give you a new object that's equal to
the old one but not connected to the parse tree. Patch by Martijn
Pieters. [bug=1307490]

* Started using a standard MIT license. [bug=1294662]

* Added a Chinese translation of the documentation by Delong .w.
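
The sketch below exercises three of these changes with a current bs4 4.x release: naming the parser explicitly (so the warning added for bug 1398866 is not triggered), searching with `string=`, and calling copy.copy() on a Tag. The markup is invented, and the warning check assumes the message still begins "No parser was explicitly specified", as it does in recent releases.

```python
import copy
import warnings

from bs4 import BeautifulSoup

markup = "<p>Hello</p><p>World</p>"

# Record warnings raised while parsing. Because a parser is named
# explicitly, the "no parser specified" warning should not appear.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soup = BeautifulSoup(markup, "html.parser")

parser_warnings = [
    w for w in caught
    if "No parser was explicitly specified" in str(w.message)
]

# `string` is the documented name for what used to be the `text` argument.
hello = soup.find("p", string="Hello")

# copy.copy() gives an equal Tag that is not attached to the parse tree.
clone = copy.copy(hello)

print(len(parser_warnings))   # 0
print(clone == hello)         # True: same name, attributes and contents
print(clone is hello)         # False: a distinct object
print(clone.parent is None)   # True: detached from the tree
```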

New features:

* Introduced the select_one() method, which uses a CSS selector but
only returns the first match, instead of a list of
matches. [bug=1349367]

* You can now create a Tag object without specifying a
TreeBuilder. Patch by Martijn Pieters. [bug=1307471]

* You can now create a NavigableString or a subclass just by invoking
the constructor. [bug=1294315]

* Added an `exclude_encodings` argument to UnicodeDammit and to the
Beautiful Soup constructor, which lets you prohibit the detection of
an encoding that you know is wrong. [bug=1469408]

* The select() method now supports selector grouping. Patch by
Francisco Canas. [bug=1191917]
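
A minimal sketch of the new features above, assuming a current bs4 4.x release with "html.parser". The markup and the "été" bytes are made up. To make the `exclude_encodings` demonstration deterministic, the sketch also passes `known_definite_encodings`, an argument added to UnicodeDammit in a later release (not part of 4.4.0), purely to fix the order in which candidate encodings are tried.

```python
from bs4 import BeautifulSoup, NavigableString, UnicodeDammit
from bs4.element import Tag

soup = BeautifulSoup(
    "<h1>Title</h1><h2>Sub</h2><p class='x'>a</p><p class='x'>b</p>",
    "html.parser",
)

# select_one() returns the first match (or None) instead of a list.
first_p = soup.select_one("p.x")

# Selector grouping: "h1, h2" matches tags that satisfy either selector.
headings = soup.select("h1, h2")

# A NavigableString can be created by calling its constructor directly...
note = NavigableString("standalone text")

# ...and a Tag can be created without passing a TreeBuilder.
span = Tag(name="span")
span.append(note)

# UTF-8 bytes for "été". windows-1252 would also decode them (as mojibake),
# so it is excluded; the next candidate, utf-8, is then used.
data = "\u00e9t\u00e9".encode("utf-8")
dammit = UnicodeDammit(
    data,
    known_definite_encodings=["windows-1252", "utf-8"],
    exclude_encodings=["windows-1252"],
)

print(first_p.get_text())            # a
print([h.name for h in headings])    # ['h1', 'h2']
print(str(span))                     # <span>standalone text</span>
print(dammit.unicode_markup)         # été
```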

Bug fixes:

* Fixed yet another problem that caused the html5lib tree builder to
create a disconnected parse tree. [bug=1237763]

* Force object_was_parsed() to keep the tree intact even when an element
from later in the document is moved into place. [bug=1430633]

* Fixed yet another bug that caused a disconnected tree when html5lib
copied an element from one part of the tree to another. [bug=1270611]

* Fixed a bug where Element.extract() could create an infinite loop in
the remaining tree.

* The select() method can now find tags whose names contain
dashes. Patch by Francisco Canas. [bug=1276211]

* The select() method can now find tags with attributes whose names
contain dashes. Patch by Marek Kapolka. [bug=1304007]

* Improved the lxml tree builder's handling of processing
instructions. [bug=1294645]

* Restored the helpful syntax error that happens when you try to
import the Python 2 edition of Beautiful Soup under Python
3. [bug=1213387]

* In Python 3.4 and above, set the new convert_charrefs argument to
the html.parser constructor to avoid a warning and future
failures. Patch by Stefano Revera. [bug=1375721]

* The warning when you pass in a filename or URL as markup will now be
displayed correctly even if the filename or URL is a Unicode
string. [bug=1268888]

* If the initial <html> tag contains a CDATA list attribute such as
'class', the html5lib tree builder will now turn its value into a
list, as it would with any other tag. [bug=1296481]

* Fixed an import error in Python 3.5 caused by the removal of the
HTMLParseError class. [bug=1420063]

* Improved docstring for encode_contents() and
decode_contents(). [bug=1441543]

* Fixed a crash in Unicode, Dammit's encoding detector when the name
of the encoding itself contained invalid bytes. [bug=1360913]

* Improved the exception raised when you call .unwrap() or
.replace_with() on an element that's not attached to a tree.

* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
is used in select(). Previously some cases did not result in a
NotImplementedError.

* It's now possible to pickle a BeautifulSoup object no matter which
tree builder was used to create it. However, the only tree builder
that survives the pickling process is the HTMLParserTreeBuilder
('html.parser'). If you unpickle a BeautifulSoup object created with
some other tree builder, soup.builder will be None. [bug=1231545]
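
A short sketch of two of these fixes, assuming a current bs4 4.x release with "html.parser": CSS selectors that use dashed tag and attribute names, and a pickle round trip. The `my-widget` tag and `data-role` attribute are invented for illustration. The exact state of `soup.builder` after unpickling may differ in later releases, so the sketch only compares the restored markup.

```python
import pickle

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<my-widget data-role="main">inside</my-widget>', "html.parser"
)

# Tag names and attribute names containing dashes work in CSS selectors.
by_name = soup.select("my-widget")
by_attr = soup.select('[data-role="main"]')

# A soup built with html.parser survives a pickle round trip.
restored = pickle.loads(pickle.dumps(soup))

print([t.name for t in by_name])         # ['my-widget']
print(by_attr[0]["data-role"])           # main
print(restored.decode() == soup.decode())  # True
```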

4.3.2

* Fixed a bug in which short Unicode input was improperly encoded to
ASCII when checking whether or not it was the name of a file on
disk. [bug=1227016]

* Fixed a crash when a short input contains data not valid in
filenames. [bug=1232604]

* Fixed a bug that caused Unicode data put into UnicodeDammit to
return None instead of the original data. [bug=1214983]

* Combined two tests to stop a spurious test failure when tests are
run by nosetests. [bug=1212445]
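
The UnicodeDammit fix (bug 1214983) can be checked with a couple of lines, assuming a current bs4 4.x release; the sample string is arbitrary. Unicode input should come back unchanged, with no encoding recorded because nothing was decoded.

```python
from bs4 import UnicodeDammit

text = "Sacr\u00e9 bleu!"
dammit = UnicodeDammit(text)

# Unicode input is passed through as-is instead of coming back as None.
print(dammit.unicode_markup)     # Sacré bleu!
print(dammit.original_encoding)  # None
```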

4.3.1

* Fixed yet another problem with the html5lib tree builder, caused by
html5lib's tendency to rearrange the tree during
parsing. [bug=1189267]

* Fixed a bug that caused the optimized version of find_all() to
return nothing. [bug=1212655]
