Sparknlp

Latest version: v1.0.0

Safety actively analyzes 630169 Python packages for vulnerabilities to keep your Python projects secure.

Page 11 of 12

1.2.6

========
---------------
Enhancements
---------------
* https://github.com/JohnSnowLabs/spark-nlp/pull/82
Vivekn Sentiment Analysis improved memory consumption and training performance
Parameter pruneCorpus is an adjustable value now, defaults to 1. Higher values lead to better performance
but are meant on larger corpora. tokenPattern params are meant to allow different tokenization regex
within the corpora provided on Vivekn and Norvig models.
* https://github.com/JohnSnowLabs/spark-nlp/pull/81
Serialization improvements. New default format (parquet lasted little) is RDD objects. Proved to be lighter on
heap memory. Also added lazier default values for Feature containers. New application.conf performance tunning
settings allow to customize whether we want to Feature broadcast or not, and use parquet or objects in serialization.

========

1.2.5

========
IMPORTANT: Pipelines from 1.2.4 or older cannot be loaded from 1.2.5
---------------
New features
---------------
* https://github.com/JohnSnowLabs/spark-nlp/pull/70
Word embeddings parameter for CRF NER annotator
* https://github.com/JohnSnowLabs/spark-nlp/pull/78
Annotator features replace params and are now serialized using KRYO and partitioned files, increases performance and smaller
memory consumption in Driver for saving and loading pipelines with large corpora. Such features are now also broadcasted
for better performance in distributed environments. This enhancement is a breaking change, does not allow to load older pipelines

----------------
Bug fixes
----------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/cb9aa4366f3e2c9863482df39e07b7bacff13049
Stemmer was not capable of being deserialized (Implements DefaultParamsReadable)
* https://github.com/JohnSnowLabs/spark-nlp/pull/75
Sentence Boundary detector was not properly setting bounds

----------------
Documentation (thanks maziyarpanahi)
----------------
* https://github.com/JohnSnowLabs/spark-nlp/pull/79
Typo in code
* https://github.com/JohnSnowLabs/spark-nlp/pull/74
Bad description

========

1.2.4

========
---------------
New features
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/c17ddac7a5a9e775cddc18d672e80e60f0040e38
ResourceHelper now allows input files to be read in the shape of Spark Dataset, implicitly enabling HDFS paths, allowing larger annotator input files. Needs to set 'TXTDS' as input format Param to let annotators read this way. Allowed in: Lemmatizer, EntityExtractor, RegexMatcher, Sentiment Analysis models, Spell Checker and Dependency Parser.

---------------
Enhancements and progress
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/4920e5ce394b25937969cc4cab1d81172be722a3
CRF NER Benchmarking progress
* https://github.com/JohnSnowLabs/spark-nlp/pull/64
EntityExtractor refactored. This annotator uses an input file containing a list of entities to look for inside target text. This annotator has been refactored to be of better use and specifically faster, by using a Trie search algorithm. Proper examples included in python notebooks.

---------------
Bug fixes
---------------
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/41 <> https://github.com/JohnSnowLabs/spark-nlp/commit/d3b9086e834233f3281621d7c82e32195479fc82
Fixed default resources not being loaded properly when using the library through --spark-packages. Improved input reading from resources and folder resources, and falling back to disk, with better error handling.
* https://github.com/JohnSnowLabs/spark-nlp/commit/08405858c6186e6c3e8b668233e30df12fa50374
Corrected param names in DocumentAssembler
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/58 <> https://github.com/JohnSnowLabs/spark-nlp/commit/5a533952cdacf67970c5a8042340c8a4c9416b13
Deleted a left-over deprecated function which was misleading.
* https://github.com/JohnSnowLabs/spark-nlp/commit/c02591bd683db3f615150d7b1d121ffe5d9e4535
Added a filtering to ensure no empty sentences arrive to unnormalized Vivekn Sentiment Analysis

---------------
Documentation and examples
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/b81e95ce37ed3c4bd7b05e9f9c7b63b31d57e660
Added additional resources into FAQ page.
* https://github.com/JohnSnowLabs/spark-nlp/commit/0c3f43c0d3e210f3940f7266fe84426900a6294e
Added Spark Summit example notebook with full Pipeline use case
* Issue https://github.com/JohnSnowLabs/spark-nlp/issues/53 <> https://github.com/JohnSnowLabs/spark-nlp/commit/20efe4a3a5ffbceedac7bf775466b7a8cde5044f
Fixed scala python documentation mistakes
* https://github.com/JohnSnowLabs/spark-nlp/commit/782eb8dce171b69a615887b3defaf8b729b735f2
Typos fix

---------------
Other
---------------
* https://github.com/JohnSnowLabs/spark-nlp/commit/91d8acb1f0f4840dad86db3319d0b062bd63b8c6
Removed Regex NER due to slowness and little use. CRF NER to replace NER.

---------------
Other
---------------
https://github.com/JohnSnowLabs/spark-nlp/commit/91d8acb1f0f4840dad86db3319d0b062bd63b8c6
Removed Regex NER due to slowness and little use. CRF NER to replace NER.

========

1.2.3

========
---------------
Bugfixes
---------------
* Sentence detection not properly bounding punctuation marks
* Sentence detection abbreviations feature disabled by default, since it is a slow feature and not necessary benefitial
* Sentence detection punctuation bounds improved algorithm performance by removing redundant formatting
* Fixed a bug causing sentiment analysis not to work when using normalized tokens that may cause tokens to be deleted from sentence
* Fixed Resource Helper text reading missing first line from text stream. More improvements to come later.

---------------
Other
---------------
CRF NER progress, word embeddings support. Not yet officially released.

========

1.2.2

========
---------------
New features
---------------
* Finisher parameter to export output as Array

========

1.2.1

========
---------------
Bugfixes
---------------
* Finisher not properly deleting annotation columns even when set to true explicitly

========

Page 11 of 12

Releases

Has known vulnerabilities

Previous Next

Sparknlp

Page 11 of 12

1.2.6

1.2.5

1.2.4

1.2.3

1.2.2

1.2.1

Page 11 of 12

Links

Releases