Hazm

Latest version: v0.10.0

1.0

Compatible with [Hazm 0.1](https://github.com/sobhe/hazm/tree/v0.1)
- Text cleaning
- Sentence and word tokenizer
- Word lemmatizer
- POS tagger (Stanford POSTagger)
- Dependency parser (MaltParser [v1.8](http://www.maltparser.org/))
- Corpus readers for Hamshahri and Bijankhan

You can download the [pre-trained tagger](http://dl.dropboxusercontent.com/u/90405495/resources-extra.zip) and [parser models](http://dl.dropboxusercontent.com/u/90405495/resources-extra.zip) for Persian and put them in the `resources` folder of your project.

0.10.0

- Added `SpacyPOSTagger` class for utilizing the hazm deep learning transformer-based model in POS tagging. MortezaMahdaviMortazavi
- Added `SpacyChunker` class for leveraging the hazm deep learning transformer-based model in chunking. MortezaMahdaviMortazavi
- Added `SpacyDependencyParser` class for employing the hazm deep learning transformer-based model in dependency parsing. MortezaMahdaviMortazavi
- Added 160,000 new words to improve `normalizer` and `lemmatizer`. sir-kokabi
- Added `FaSpellReader` to read [FAspell corpus](https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1547). sir-kokabi
- Added `ArmanReader` to read [ArmanPersoNERCorpus](https://github.com/HaniehP/PersianNER). sir-kokabi
- Added `PnSummaryReader` to read [pn-summary corpus](https://github.com/hooshvare/pn-summary). sir-kokabi
- Removed unnecessary old Stanford dependencies. sir-kokabi

**[Download pretrained-models](https://github.com/roshan-research/hazm#pretrained-models)**

**Full Changelog**: https://github.com/roshan-research/hazm/compare/v0.9.4...v0.10.0

0.9.4

- Added `join_abbreviations` to skip tokenizing abbreviations, using [ParsiNorm's abbreviation lists](https://github.com/haraai/ParsiNorm). #216 optimopium sir-kokabi.
- Added `MizanReader` to read [Mizan corpus](https://github.com/omidkashefi/Mizan). sir-kokabi.
- Added `NaabReader` to read [Naab corpus](https://huggingface.co/datasets/SLPL/naab). sir-kokabi.
- Added `NerReader` to read [NER corpus](https://github.com/Text-Mining/Persian-NER). sir-kokabi.
- Improved `Normalizer` by adding support for normalizing words with the suffix 'هایی'. sir-kokabi.
- Fixed incompatibility issues with NumPy (298). mhdi707 sir-kokabi
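
To illustrate what skipping abbreviation tokenization means, here is a minimal standalone sketch. It does not use Hazm or ParsiNorm's lists: the `tokenize` function and the two-entry abbreviation set are hypothetical stand-ins, and Hazm's actual behavior may differ.

```python
import re

# Hypothetical two-entry list for illustration; the real lists come from ParsiNorm.
ABBREVIATIONS = frozenset({"ه.ش", "ق.م"})


def tokenize(text, abbreviations=frozenset()):
    """Split text into words and punctuation, keeping listed abbreviations whole."""
    parts = []
    if abbreviations:
        # Longest first, so a longer abbreviation wins over a shorter prefix.
        alts = "|".join(
            re.escape(a) for a in sorted(abbreviations, key=len, reverse=True)
        )
        # Only match an abbreviation that is not glued to other word characters.
        parts.append(rf"(?<!\w)(?:{alts})(?!\w)")
    parts += [r"\w+", r"[^\w\s]"]  # ordinary words, then single punctuation marks
    return re.findall("|".join(parts), text)


text = "سال ۱۴۰۲ ه.ش بود."
plain = tokenize(text)                  # the abbreviation is split at its dot
joined = tokenize(text, ABBREVIATIONS)  # the abbreviation survives as one token
```

Trying the abbreviation alternatives before the generic word pattern is what keeps the dotted form together; without the list, the same text yields two extra tokens.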

**[Download pretrained-models](https://github.com/roshan-research/hazm#pretrained-models)**

**Full Changelog**: https://github.com/roshan-research/hazm/compare/v0.9.3...v0.9.4

0.9.3

Fixed
- Fix a critical bug in `Lemmatizer` that caused incorrect lemmatization of certain words. sir-kokabi.
- Fix a bug that stopped `WikipediaReader` from working as before (287). sir-kokabi.
- Fix missing imports for `WikipediaReader` and `PersianPlainTextReader` (286). sir-kokabi.
- Fix some issues in the [demo](https://www.roshan-ai.ir/hazm/demo/) to make it compatible with the latest version of Hazm. sir-kokabi.
- Fix a few issues related to tests and mkdocs build. sir-kokabi.
- Improve [documentation](https://www.roshan-ai.ir/hazm/docs/). sir-kokabi.
- Improve dependency tree visualization on the demo page. sir-kokabi.

**[Download pretrained-models](https://github.com/roshan-research/hazm#pretrained-models)**

**Full Changelog**: https://github.com/roshan-research/hazm/compare/v0.9.2...v0.9.3

0.9.2

Added
- Add pretrained `DependencyParser` models. E-Ghafour.
- Add `UniversalDadeganReader` class to process and read the [Universal Persian Dependency Treebank corpus](https://github.com/phsfr/UD_Persian-PerDT). E-Ghafour, imani.
- Add 400+ new words to improve `Normalizer`, `Lemmatizer` and `Tokenizer`. sir-kokabi.

Fixed
- Fix `DependencyParser` issue 282. E-Ghafour, imani.
- Fix some test issues. E-Ghafour.

**[Download pretrained-models](https://github.com/roshan-research/hazm#pretrained-models)**

**Full Changelog**: https://github.com/roshan-research/hazm/compare/v0.9...v0.9.2

0.9

Added
- Windows compatibility by using `Python-crfsuite` instead of `Wapiti`. E-Ghafour.
- Pretrained `Chunker` and `POSTagger` models with `Python-crfsuite`. E-Ghafour.
- New parameters in `Normalizer` for better text processing. sir-kokabi.
- Three regex patterns in `Normalizer` to fix ZWNJs and spacing issues. sir-kokabi.
- 400 non-standard Unicode characters to be replaced in `Normalizer`. sir-kokabi.
- 40,000+ new words to improve `Lemmatizer` and `Tokenizer`. sir-kokabi.
- `train` function for `Word2vec` and `Sent2vec` modules in `Embedding`. E-Ghafour.
- Implement `keywordExtraction` with the `embedRank` approach as a sample of Hazm usage. E-Ghafour.
- Support `Universal tags` in `POSTagger`. E-Ghafour.
- Support universal POS mapper in `PeykareReader` & `DadeganReader` (239). phsfr.
- `PersianPlainTextReader` to process raw text datasets (120). mhbashari.
- Support `EZ` tag in `PeykareReader`. E-Ghafour.
- Slash & back-slash (/ \) support in `Tokenizer` (102). elahimanesh.
- `Conjugation` class to handle verb conjugation. sir-kokabi.
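
The `keywordExtraction` sample listed above is described as following the `embedRank` approach. As a rough illustration of that idea only (rank candidate phrases by the cosine similarity between each candidate's embedding and the document's embedding), here is a standalone sketch. The vectors are made-up toy values and `embed_rank` is a hypothetical helper, not Hazm's code; in practice the vectors would come from a trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_rank(doc_vec, candidates, top_k=3):
    """Return the top_k candidate phrases most similar to the document vector.

    candidates maps each phrase to its embedding vector.
    """
    ranked = sorted(
        candidates, key=lambda phrase: cosine(doc_vec, candidates[phrase]), reverse=True
    )
    return ranked[:top_k]


# Toy 2-d embeddings, invented for this example.
doc_vec = [1.0, 0.0]
candidates = {
    "language processing": [1.0, 0.1],
    "weather": [0.0, 1.0],
    "text tools": [0.5, 0.5],
}
keywords = embed_rank(doc_vec, candidates, top_k=2)
```

This sketch covers only the similarity ranking step; candidate phrase extraction and any diversity re-ranking are left out.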

Fixed
- Improve the accuracy of `POSTagger` and `Chunker`. E-Ghafour.
- Improve `InformalNormalizer` (219). riasati.
- Fix PEP 8 issues (135). hadifar.
- Fix some test issues. sir-kokabi, E-Ghafour.
- Fix `Stemmer` issues with multiple suffixes. sir-kokabi.
- Fix various reported issues.

Changed
- Drop Python 2 support and migrate all code to Python 3. sir-kokabi.
- Use `data_maker` function instead of `patterns` in `SequenceTagger`. E-Ghafour.
- Refactor `IOBTagger` and `POSTagger` to be compatible with `data_maker`. E-Ghafour.
- Change می روم to می‌روم in example (203). SMSadegh19.
- Overhaul the project structure and GitHub repo. sir-kokabi.
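
Several items in this release concern the zero-width non-joiner (ZWNJ, U+200C), including the regex patterns added to `Normalizer` and the می روم to می‌روم fix in the example. The sketch below shows the kind of regex-based cleanup involved. The three patterns are illustrative assumptions, not the patterns Hazm ships, and `fix_zwnj` is a hypothetical helper.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner


def fix_zwnj(text):
    """Apply three illustrative ZWNJ and spacing fixes to Persian text."""
    # 1. Collapse a run of ZWNJs into a single one.
    text = re.sub(f"{ZWNJ}{{2,}}", ZWNJ, text)
    # 2. Drop a ZWNJ that touches whitespace or the start/end of the text.
    text = re.sub(f"(?<!\\S){ZWNJ}|{ZWNJ}(?!\\S)", "", text)
    # 3. Join a word-initial 'می' or 'نمی' prefix to the next word with a ZWNJ.
    text = re.sub(r"(?<!\S)(ن?می) +(?=\S)", r"\1" + ZWNJ, text)
    return text


example = "من می روم"
fixed = fix_zwnj(example)  # the space after the prefix becomes a ZWNJ
```

The third rule is deliberately naive: it would also join a standalone word that happens to be spelled like the prefix, which is one reason a real normalizer needs more context than a single pattern.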

**[Download Pretrained models](https://github.com/roshan-research/hazm#modules-accuracy)**

**Full Changelog**: https://github.com/roshan-research/hazm/compare/v0.8.2...v0.9
