Wordfreq

Latest version: v3.1.1


1.4

- Add large lists in English, German, Spanish, French, and Portuguese
- Add `zipf_frequency` function

[Announcement blog post](https://blog.conceptnet.io/2016/06/02/wordfreq-1-4-more-words-plus-word-frequencies-from-reddit/)
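
The Zipf scale behind `zipf_frequency` is the base-10 logarithm of a word's frequency per billion words, so a word that occurs once per million words scores 3.0. The conversion can be sketched as below. `proportion_to_zipf` is a hypothetical helper written for illustration, not wordfreq's own code, and rounding to two decimal places is an assumption made here for readability.

```python
import math


def proportion_to_zipf(proportion: float) -> float:
    """Convert a word frequency given as a proportion (0 < p <= 1) to the Zipf scale.

    Hypothetical helper for illustration only; not part of wordfreq.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in the interval (0, 1]")
    # Frequency per billion words is p * 1e9, and log10(p * 1e9) == log10(p) + 9.
    return round(math.log10(proportion) + 9, 2)


# A word seen once per million words, and one seen once per thousand words.
print(proportion_to_zipf(1e-6))
print(proportion_to_zipf(1e-3))
```

With wordfreq installed, the library call would look like `zipf_frequency('word', 'en')`.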

1.3

- Add Reddit comments as an English source

1.2

- Add SUBTLEX data
- Better support for Chinese, using Jieba for tokenization, and mapping
  Traditional Chinese characters to Simplified
- Improve Greek
- Add Polish, Swedish, and Turkish
- Tokenizer can optionally preserve punctuation
- Detect when sources stripped "'t" off of English words, and repair their
  frequencies

[Announcement blog post](https://blog.luminoso.com/2015/10/29/wordfreq-1-2-is-better-at-chinese-english-greek-polish-swedish-and-turkish/)

1.1

- Use the 'regex' package to implement Unicode tokenization that's mostly
  consistent across languages
- Use NFKC normalization in Japanese and Arabic
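
As an illustration of what NFKC normalization does to Japanese and Arabic text, here is a minimal sketch using Python's standard `unicodedata` module. The `nfkc` wrapper and the sample strings are chosen for this example and are not taken from wordfreq's code.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", text)


# Halfwidth katakana (U+FF76 U+FF80 U+FF76 U+FF85) fold to their standard forms.
halfwidth = "\uff76\uff80\uff76\uff85"
print(nfkc(halfwidth) == "\u30ab\u30bf\u30ab\u30ca")

# The Arabic lam-alef ligature (U+FEFB) is split into its two letters.
ligature = "\ufefb"
print([hex(ord(ch)) for ch in nfkc(ligature)])
```

Folding these compatibility variants together means that different encodings of the same word are counted as a single entry in a frequency list.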

1.0

- Create compact word frequency lists in English, Arabic, German, Spanish,
  French, Indonesian, Japanese, Malay, Dutch, Portuguese, and Russian
- Marginal support for Greek, Korean, and Chinese
- Fresh start, dropping compatibility with wordfreq 0.x and its unreasonably
  large downloads
