Spark-nlp

Latest version: v5.3.3

3.2.2

========

----------------
New Features
----------------
* A new RoBertaSentenceEmbeddings annotator for sentence embeddings used in SentimentDL, ClassifierDL, and MultiClassifierDL annotators
* A new XlmRoBertaSentenceEmbeddings annotator for sentence embeddings used in SentimentDL, ClassifierDL, and MultiClassifierDL annotators
* Add support for AWS MFA via Spark NLP configuration
* Add new AWS configs to the Spark NLP configuration for using a private S3 bucket to store training logs or to access the TF graphs needed by NerDLApproach:
  * `spark.jsl.settings.aws.credentials.access_key_id`
  * `spark.jsl.settings.aws.credentials.secret_access_key`
  * `spark.jsl.settings.aws.credentials.session_token`
  * `spark.jsl.settings.aws.s3_bucket`
  * `spark.jsl.settings.aws.region`
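The new AWS settings are plain key/value pairs. Here is a minimal sketch, assuming you want to assemble them in Python before building a Spark session. The helper `build_aws_conf` and the placeholder values are hypothetical. Treating the session token as optional, needed only for temporary credentials such as MFA-issued ones, is also our assumption.

```python
from typing import Dict, Optional

# Config keys exactly as listed in the 3.2.2 release notes.
ACCESS_KEY_ID = "spark.jsl.settings.aws.credentials.access_key_id"
SECRET_ACCESS_KEY = "spark.jsl.settings.aws.credentials.secret_access_key"
SESSION_TOKEN = "spark.jsl.settings.aws.credentials.session_token"
S3_BUCKET = "spark.jsl.settings.aws.s3_bucket"
REGION = "spark.jsl.settings.aws.region"


def build_aws_conf(
    access_key_id: str,
    secret_access_key: str,
    s3_bucket: str,
    region: str,
    session_token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect the AWS settings as Spark conf key/value pairs.

    The session token is optional here; we assume it is only needed for
    temporary credentials (for example, ones issued after an MFA check).
    """
    conf = {
        ACCESS_KEY_ID: access_key_id,
        SECRET_ACCESS_KEY: secret_access_key,
        S3_BUCKET: s3_bucket,
        REGION: region,
    }
    if session_token is not None:
        conf[SESSION_TOKEN] = session_token
    return conf


if __name__ == "__main__":
    # Placeholder values only; real credentials should come from a secure source.
    conf = build_aws_conf("EXAMPLE_KEY_ID", "EXAMPLE_SECRET", "my-private-bucket", "us-east-1")
    for key, value in sorted(conf.items()):
        print(f"{key}={value}")
```

Each resulting pair could then be passed to PySpark's `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. Reading the real values from environment variables or a secrets manager is safer than hardcoding them.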

----------------
Bug Fixes & Enhancements
----------------
* Improve loading merges file for RoBERTa tokenizer
* Remove the batchSize param from the broadcast in XlmRoBertaEmbeddings so it can be set after the model is created
* Preserve previously generated metadata in BertSentenceEmbeddings annotator
* Set `elmo` as the default poolingLayer in ElmoEmbeddings
* Fix special token IDs in XlmRoBertaEmbeddings annotator
* Fix distilbert_base_token_classifier_ontonotes model
* Fix distilbert_base_token_classifier_conll03 model
* Fix distilbert_base_token_classifier_few_nerd model
* Fix distilbert_token_classifier_persian_ner model
* Fix ner_conll_longformer_base_4096 model

========

3.2.1

========
----------------
Patch release
----------------
* Fix "unsupported model" error in pretrained function for LongformerEmbeddings, BertForTokenClassification, and DistilBertForTokenClassification

========

3.2.0

========

----------------
Major features and improvements
----------------

* **NEW:** Introducing **LongformerEmbeddings** annotator
* **NEW:** Introducing **BertForTokenClassification** annotator
* **NEW:** Introducing **DistilBertForTokenClassification** annotator
* **NEW:** Introducing **GraphExtraction** and **GraphFinisher** annotators
* **NEW:** Introducing support for multilingual **DateMatcher** and **MultiDateMatcher** annotators. Both annotators support **English**, **French**, **Italian**, **Spanish**, **German**, and **Portuguese**
* **NEW:** Introducing new **Python APIs** and fully documented **Pydoc**
* **NEW:** Introducing new **Spark NLP configurations** via `spark.conf()`, deprecating `application.conf`
* Add S3 support to the `log_folder` Spark NLP config and the `outputLogsPath` param in `NerDLApproach`, `ClassifierDLApproach`, `MultiClassifierDLApproach`, and `SentimentDLApproach` annotators
* Added examples to all Spark NLP Scaladoc
* Added examples to all Spark NLP Pydoc
* Welcoming new Databricks runtimes to our Spark NLP family:
  * Databricks 8.4 ML & GPU
* Fix `sparknlp.version()` returning the wrong version

========

3.1.3

========

----------------
Bug Fixes & Enhancements
----------------
* Fix serialization issue in NorvigSweetingModel
* Fix the issue with BertSentenceEmbeddings model in TF v2
* Update ArrayType structure to fix Finisher failing to clean up some annotators

========

3.1.2

========

----------------
New Features
----------------
* Migrate XlnetEmbeddings to TensorFlow v2. This allows importing HuggingFace XLNet models into Spark NLP
* Migrate XlnetEmbeddings to BatchAnnotate to allow better performance on accelerated hardware such as GPU
* Dynamically extract special tokens from SentencePiece model in XlmRoBertaEmbeddings
* Add the `includeAllConfidenceScores` param to NerDLModel (set via `setIncludeAllConfidenceScores`) to choose between keeping the confidence scores of all labels or only the score of the predicted label
* Sync Python params with Scala params in ContextSpellCheckerApproach, WordSegmenterApproach, RegexMatcher, and ViveknSentimentApproach
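The NerDLModel confidence-score option is easier to picture with per-label scores in hand. The sketch below is plain Python, not Spark NLP code. The function name, label set, and score values are hypothetical. It only mirrors the choice between keeping every label's score and keeping only the predicted label's score.

```python
from typing import Dict


def confidence_metadata(scores: Dict[str, float], include_all: bool) -> Dict[str, float]:
    """Illustrative only: choose which per-label confidence scores to keep for one token.

    include_all=True keeps the score of every label; include_all=False keeps
    only the score of the predicted (highest-scoring) label.
    """
    if not scores:
        raise ValueError("scores must contain at least one label")
    if include_all:
        return dict(scores)
    predicted = max(scores, key=scores.get)
    return {predicted: scores[predicted]}


if __name__ == "__main__":
    # Hypothetical scores for a single token.
    token_scores = {"B-PER": 0.91, "O": 0.07, "B-LOC": 0.02}
    print(confidence_metadata(token_scores, include_all=False))  # {'B-PER': 0.91}
    print(confidence_metadata(token_scores, include_all=True))
```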

----------------
Bug Fixes & Enhancements
----------------
* Fix issue with SymmetricDeleteModel
* Fix issue with encoding unknown bytes in RoBertaEmbeddings
* Fix issue with multi-lingual UniversalSentenceEncoder models

----------------
Backward compatibility
----------------

We have migrated XlnetEmbeddings to TensorFlow v2, so models created before 3.1.2 won't work with this release.
We have already updated the models and uploaded them to the Models Hub. Use `pretrained()`, which handles this automatically, or make sure you download the new models manually.

========

3.1.1

========

----------------
New Features
----------------
* Migrate AlbertEmbeddings to TensorFlow v2. This allows importing HuggingFace ALBERT models into Spark NLP
* Migrate AlbertEmbeddings to BatchAnnotate to allow better performance on accelerated hardware such as GPU
* Enable real-time stdout/stderr from child processes in `sparknlp.start()`. With PySpark 3.x, `sparknlp.start(real_time_output=True)` shows Spark NLP output (such as metrics during training) directly in your Jupyter, Colab, and Kaggle notebooks.
* Complete examples for all annotators in Scaladoc APIs https://github.com/JohnSnowLabs/spark-nlp/pull/5668

----------------
Bug Fixes & Enhancements
----------------
* Fix YakeModel issue with empty token https://github.com/JohnSnowLabs/spark-nlp/pull/5683 thanks to shaddoxac
* Fix getAnchorDateMonth method in DateMatcher and MultiDateMatcher https://github.com/JohnSnowLabs/spark-nlp/pull/5693
* Fix the broken PubTator class in Python https://github.com/JohnSnowLabs/spark-nlp/pull/5702
* Fix relative dates in DateMatcher and MultiDateMatcher such as `day after tomorrow` or `day before yesterday` https://github.com/JohnSnowLabs/spark-nlp/pull/5706
* Add isPaddedToken param to PubTator https://github.com/JohnSnowLabs/spark-nlp/pull/5702
* Fix issue with `logger` inside the session on some setups https://github.com/JohnSnowLabs/spark-nlp/pull/5715
* Add signatures to TF session to handle inputs/outputs more dynamically in BertEmbeddings, DistilBertEmbeddings, RoBertaEmbeddings, and XlmRoBertaEmbeddings https://github.com/JohnSnowLabs/spark-nlp/pull/5715
* Fix XlmRoBertaEmbeddings issue with `init_all_tables` https://github.com/JohnSnowLabs/spark-nlp/pull/5715
* Add missing random seed param to ClassifierDLApproach, MultiClassifierDLApproach, and SentimentDLApproach https://github.com/JohnSnowLabs/spark-nlp/pull/5697
* Make the Java Exceptions appear before Py4J exceptions for ease of debugging in Python https://github.com/JohnSnowLabs/spark-nlp/pull/5709
* Make sure the batchSize set in NerDLModel is the same one used internally to feed TensorFlow https://github.com/JohnSnowLabs/spark-nlp/pull/5716
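For readers unfamiliar with how phrases such as `day after tomorrow` resolve, here is a plain-Python sketch that resolves them against an anchor date, the same notion DateMatcher's anchor-date settings refer to. This is not DateMatcher's implementation. The phrase table, function name, and anchor value are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical phrase table: day offsets relative to the anchor date.
# Whole phrases are matched, so "day after tomorrow" never falls back to "tomorrow".
RELATIVE_DAY_OFFSETS = {
    "day before yesterday": -2,
    "yesterday": -1,
    "today": 0,
    "tomorrow": 1,
    "day after tomorrow": 2,
}


def resolve_relative_date(phrase: str, anchor: date) -> Optional[date]:
    """Resolve a relative day phrase against an anchor date; return None if unknown."""
    key = " ".join(phrase.lower().split())  # normalize case and whitespace
    if key.startswith("the "):
        key = key[len("the "):]
    offset = RELATIVE_DAY_OFFSETS.get(key)
    if offset is None:
        return None
    return anchor + timedelta(days=offset)


if __name__ == "__main__":
    anchor = date(2021, 7, 30)
    print(resolve_relative_date("day after tomorrow", anchor))  # 2021-08-01
    print(resolve_relative_date("The day before yesterday", anchor))  # 2021-07-28
```

Matching the whole normalized phrase is a deliberate choice in this sketch. Matching single words would let `day after tomorrow` fall back to the offset for `tomorrow`.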

----------------
Backward compatibility
----------------

We have migrated AlbertEmbeddings to TensorFlow v2, so models created before 3.1.1 won't work with this release.
We have already updated the models and uploaded them to the Models Hub. Use `pretrained()`, which handles this automatically, or make sure you download the new models manually.

========
