Beautifulsoup4

Latest version: v4.12.3

3.0.2

Previously, Beautiful Soup correctly handled attribute values that
contained embedded quotes (sometimes by escaping them), but not other
special XML characters. Now it correctly handles or escapes all
special XML characters in attribute values.
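
To make the escaping concrete, here is a minimal sketch using only the Python 3 standard library. It is not Beautiful Soup's own code; render_attr is a hypothetical helper introduced for illustration.

```python
from xml.sax.saxutils import quoteattr


def render_attr(name, value):
    """Render name=value with special XML characters escaped.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    # quoteattr escapes &, < and >, and picks a quoting style that
    # keeps embedded quotes valid.
    return "%s=%s" % (name, quoteattr(value))


print(render_attr("title", "Tom & Jerry <live>"))
print(render_attr("alt", 'say "hi"'))
```

The second call shows the embedded-quote case: the value is wrapped in single quotes so the double quotes inside it need no escaping.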

I aliased methods to the 2.x names (fetch, find, findText, etc.) for
backwards compatibility purposes. Those names are deprecated and if I
ever do a 4.0 I will remove them. I will, I tell you!

Fixed a bug where the findAll method wasn't passing along any keyword
arguments.

When run from the command line, Beautiful Soup now acts as an HTML
pretty-printer, not an XML pretty-printer.

3.0.1

Reintroduced the "fetch by CSS class" shortcut. I thought keyword
arguments would replace it, but they don't. You can't call soup('a',
class='foo') because class is a Python keyword.
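
The keyword problem is easy to reproduce in plain Python 3. In this sketch, search is a hypothetical stand-in for a search method and only echoes its arguments; it is not Beautiful Soup's API.

```python
import keyword

# "class" is a reserved word, so it cannot be a keyword-argument name.
print(keyword.iskeyword("class"))


def search(name, **attrs):
    """Hypothetical stand-in for a search method; echoes its arguments."""
    return name, attrs


# Source code containing search('a', class='foo') does not even compile.
try:
    compile("search('a', class='foo')", "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)

# Unpacking a dictionary gets past the parser, but it is clumsy, which
# is why a dedicated shortcut is convenient.
print(search("a", **{"class": "foo"}))
```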

If Beautiful Soup encounters a meta tag that declares the encoding,
but a SoupStrainer tells it not to parse that tag, Beautiful Soup will
no longer try to rewrite the meta tag to mention the new
encoding. Basically, this makes SoupStrainers work in real-world
applications instead of crashing the parser.

3.0.0

This release is not backward-compatible with previous releases. If
you've got code written with a previous version of the library, go
ahead and keep using it, unless one of the features mentioned here
really makes your life easier. Since the library is self-contained,
you can include an old copy of the library in your old applications,
and use the new version for everything else.

The documentation has been rewritten and greatly expanded with many
more examples.

Beautiful Soup autodetects the encoding of a document (or uses the one
you specify), and converts it from its native encoding to
Unicode. Internally, it only deals with Unicode strings. When you
print out the document, it converts to UTF-8 (or another encoding you
specify). [Doc reference]
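
The decode-on-input, encode-on-output flow can be sketched with the Python 3 standard library. The to_unicode helper and its candidate list are assumptions made for this example; Beautiful Soup's real detection is more involved.

```python
def to_unicode(raw, declared=None, candidates=("utf-8", "windows-1252")):
    """Decode bytes to str, trying a declared encoding first.

    Hypothetical helper for illustration only.
    """
    tried = ([declared] if declared else []) + list(candidates)
    for encoding in tried:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never fail, replace undecodable bytes.
    return raw.decode("utf-8", errors="replace"), "utf-8"


raw = "caf\u00e9".encode("windows-1252")   # bytes in the native encoding
text, used = to_unicode(raw)               # Unicode from here on
print(used)
print(text.encode("utf-8"))                # UTF-8 on the way out
```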

It's now easy to make large-scale changes to the parse tree without
screwing up the navigation members. The methods are extract,
replaceWith, and insert. [Doc reference. See also Improving Memory
Usage with extract]

Passing True in as an attribute value gives you tags that have any
value for that attribute. You don't have to create a regular
expression. Passing None for an attribute value gives you tags that
don't have that attribute at all.
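
The matching rules for True and None can be written out as a small sketch. Here attr_matches is a hypothetical function and tags are modelled as plain attribute dictionaries; this is not how Beautiful Soup represents them.

```python
def attr_matches(tag_attrs, name, wanted):
    """Hypothetical matcher mirroring the rules described above."""
    if wanted is True:
        return name in tag_attrs          # any value will do
    if wanted is None:
        return name not in tag_attrs      # attribute must be absent
    return tag_attrs.get(name) == wanted  # exact value


tags = [
    {"href": "/home", "id": "nav"},
    {"href": "/about"},
    {"name": "top"},
]

with_id = [t for t in tags if attr_matches(t, "id", True)]
without_href = [t for t in tags if attr_matches(t, "href", None)]
print(with_id)
print(without_href)
```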

Tag objects now know whether or not they're self-closing. This avoids
the problem where Beautiful Soup thought that tags like <BR /> were
self-closing even in XML documents. You can customize the self-closing
tags for a parser object by passing them in as a list of
selfClosingTags: you don't have to subclass anymore.

There's a new built-in parser, MinimalSoup, which has most of
BeautifulSoup's HTML-specific rules, but no tag nesting rules. [Doc
reference]

You can use a SoupStrainer to tell Beautiful Soup to parse only part
of a document. This saves time and memory, often making Beautiful Soup
about as fast as a custom-built SGMLParser subclass. [Doc reference,
SoupStrainer reference]
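
The idea of parsing only part of a document can be sketched with the standard library's HTMLParser (Python 3). This is an analogy, not SoupStrainer itself; LinkOnlyParser is a hypothetical class that keeps only the a tags and builds nothing for the rest.

```python
from html.parser import HTMLParser


class LinkOnlyParser(HTMLParser):
    """Hypothetical stand-in for the strainer idea: keep only <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anything that is not an <a> tag is skipped, so no objects
        # are created for the rest of the document.
        if tag == "a":
            self.links.append(dict(attrs).get("href"))


parser = LinkOnlyParser()
parser.feed('<html><body><p>Intro</p><a href="/one">1</a>'
            '<div><a href="/two">2</a></div></body></html>')
parser.close()
print(parser.links)
```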

You can (usually) use keyword arguments instead of passing a
dictionary of attributes to a search method. That is, you can replace
soup(attrs={"id" : "5"}) with soup(id="5"). You can still use attrs if
(for instance) you need to find an attribute whose name clashes with
the name of an argument to findAll. [Doc reference: **kwargs attrs]
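
Here is a rough sketch of how keyword arguments and an attribute dictionary can coexist, and why the dictionary is still needed when names clash. This find_all is a hypothetical function over (tag name, attribute dictionary) pairs, with the dictionary parameter called attrs in this sketch; it is not the library's implementation.

```python
def find_all(tags, name=None, attrs=None, **kwargs):
    """Hypothetical search over (tag_name, attribute_dict) pairs."""
    wanted = dict(attrs or {})
    wanted.update(kwargs)  # keyword arguments are extra attribute filters
    return [
        (tag_name, tag_attrs)
        for tag_name, tag_attrs in tags
        if (name is None or tag_name == name)
        and all(tag_attrs.get(k) == v for k, v in wanted.items())
    ]


tags = [
    ("div", {"id": "5"}),
    ("input", {"name": "email"}),
    ("a", {"id": "6"}),
]

by_kwarg = find_all(tags, id="5")
by_dict = find_all(tags, attrs={"id": "5"})
# "name" is already the tag-name parameter, so an attribute called
# "name" has to go through the dictionary.
by_name_attr = find_all(tags, attrs={"name": "email"})
print(by_kwarg, by_dict, by_name_attr)
```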

The method names have changed to the better method names used in
Rubyful Soup. Instead of find methods and fetch methods, there are
only find methods. Instead of a scheme where you can't remember which
method finds one element and which one finds them all, we have find
and findAll. In general, if the method name mentions All or a plural
noun (e.g. findNextSiblings), then it finds many elements.
Otherwise, it only finds one element. [Doc reference]
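
The one-versus-many convention can be sketched as a singular search built on top of a plural one. Both functions below are hypothetical and work on plain lists, not on a parse tree.

```python
def find_all(items, predicate, limit=None):
    """Hypothetical plural search: return every match, up to limit."""
    results = []
    for item in items:
        if predicate(item):
            results.append(item)
            if limit is not None and len(results) >= limit:
                break
    return results


def find(items, predicate):
    """Hypothetical singular search: first match, or None."""
    results = find_all(items, predicate, limit=1)
    return results[0] if results else None


numbers = [3, 8, 12, 5, 20]


def is_big(n):
    return n > 7


print(find_all(numbers, is_big))
print(find(numbers, is_big))
```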

Some arguments have been renamed for clarity. For instance,
avoidParserProblems is now parserMassage.

Beautiful Soup no longer implements its own feed method. Pass a
string or a filehandle into the soup constructor instead of calling
feed after the soup has been created. There is still a feed method,
but it's the one implemented by SGMLParser, and calling it will bypass
Beautiful Soup and cause problems.

The NavigableText class has been renamed to NavigableString. There is
no NavigableUnicodeString anymore, because every string inside a
Beautiful Soup parse tree is a Unicode string.

findText and fetchText are gone. Just pass a text argument into find
or findAll.

Null was more trouble than it was worth, so I got rid of it. Anything
that used to return Null now returns None.

Special XML constructs like comments and CDATA now have their own
NavigableString subclasses, instead of being treated as oddly-formed
data. If you parse a document that contains CDATA and write it back
out, the CDATA will still be there.
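
One way to picture string subclasses that survive a round trip is the sketch below. The class names echo the ones mentioned here, but the PREFIX, SUFFIX and output members are assumptions made for this example, not a description of the library's code.

```python
class NavigableString(str):
    """Hypothetical sketch: a string that knows how to write itself out."""
    PREFIX = ""
    SUFFIX = ""

    def output(self):
        return self.PREFIX + self + self.SUFFIX


class CData(NavigableString):
    PREFIX = "<![CDATA["
    SUFFIX = "]]>"


class Comment(NavigableString):
    PREFIX = "<!--"
    SUFFIX = "-->"


nodes = [NavigableString("plain "), CData("x < y"), Comment(" note ")]
rendered = "".join(node.output() for node in nodes)
print(rendered)
```

Because each node carries its own type, writing the document back out can restore the original markers instead of flattening everything to plain text.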

When you're parsing a document, you can get Beautiful Soup to convert
XML or HTML entities into the corresponding Unicode characters. [Doc
reference]
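
For a sense of what the conversion does, the Python 3 standard library has a comparable function, html.unescape. It is used here as an illustration only; it is not the Beautiful Soup option this entry refers to.

```python
import html

# Named and numeric entities become the corresponding Unicode characters.
decoded = html.unescape("caf&eacute; &amp; cr&#232;me")
print(decoded)
```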

2.1.1

Fixed a serious performance bug in BeautifulStoneSoup which was
causing parsing to be incredibly slow.

Corrected several entities that were previously being incorrectly
translated from Microsoft smart-quote-like characters.

Fixed a bug that was breaking text fetch.

Fixed a bug that crashed the parser when text chunks that look like
HTML tag names showed up within a SCRIPT tag.

THEAD, TBODY, and TFOOT tags are now nestable within TABLE
tags. Nested tables should parse more sensibly now.

BASE is now considered a self-closing tag.

2.1.0

Added a wide variety of new search methods which, given a starting
point inside the tree, follow a particular navigation member (like
nextSibling) over and over again, looking for Tag and NavigableText
objects that match certain criteria. The new methods are findNext,
fetchNext, findPrevious, fetchPrevious, findNextSibling,
fetchNextSiblings, findPreviousSibling, fetchPreviousSiblings,
findParent, and fetchParents. All of these use the same basic code
used by first and fetch, so you can pass your weird ways of matching
things into these methods.
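
The "follow one navigation member over and over" idea can be sketched with a generator. Node, following, find_next_sibling and fetch_next_siblings are hypothetical names written in Python 3 style; they are not the library's code.

```python
class Node:
    """Hypothetical tree node with a single navigation member."""

    def __init__(self, name, next_sibling=None):
        self.name = name
        self.next_sibling = next_sibling


def following(start, member):
    """Yield each node reached by repeatedly following `member`."""
    node = getattr(start, member)
    while node is not None:
        yield node
        node = getattr(node, member)


def find_next_sibling(start, predicate):
    """First matching sibling after start, or None."""
    for node in following(start, "next_sibling"):
        if predicate(node):
            return node
    return None


def fetch_next_siblings(start, predicate, limit=None):
    """All matching siblings after start, optionally capped at limit."""
    matches = [n for n in following(start, "next_sibling") if predicate(n)]
    return matches if limit is None else matches[:limit]


# Build the sibling chain p -> b -> i -> b.
last = Node("b")
chain = Node("p", Node("b", Node("i", last)))
print([n.name for n in fetch_next_siblings(chain, lambda n: n.name == "b")])
```

Swapping the member name (for a previous-sibling or parent pointer, say) would give the other variants with the same matching code.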

The fetch method and its derivatives now accept a limit argument.

You can now pass keyword arguments when calling a Tag object as though
it were a method.

Fixed a bug that caused all hand-created tags to share a single set of
attributes.
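
The changelog doesn't say what caused this bug. One common way to get this symptom in Python is a mutable default argument, sketched below with hypothetical BuggyTag and FixedTag classes. This is an assumption about the general pattern, not a claim about the actual code.

```python
class BuggyTag:
    def __init__(self, name, attrs={}):   # one dict shared by every call
        self.name = name
        self.attrs = attrs


class FixedTag:
    def __init__(self, name, attrs=None):
        self.name = name
        # A fresh dictionary per instance unless the caller supplies one.
        self.attrs = {} if attrs is None else attrs


a, b = BuggyTag("a"), BuggyTag("b")
a.attrs["href"] = "/x"
print(b.attrs)   # b picked up a's attribute

c, d = FixedTag("a"), FixedTag("b")
c.attrs["href"] = "/x"
print(d.attrs)   # d is unaffected
```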

2.0.3

Fixed Python 2.2 support for iterators.

Fixed a bug that gave the wrong representation to tags within quote
tags like <script>.

Took some code from Mark Pilgrim that treats CDATA declarations as
data instead of ignoring them.

Beautiful Soup's setup.py will now do an install even if the unit
tests fail. It won't build a source distribution if the unit tests
fail, so I can't release a new version unless they pass.
