spark-nlp

Latest version: v5.3.3


3.3.4

Not secure
========
----------------
Patch release
----------------
* Fix "ClassCastException" error in pretrained function for DistilBertForSequenceClassification in Python

========

3.3.3

Not secure
========
----------------
New Features & Enhancements
----------------
* **NEW:** Introducing **DistilBertForSequenceClassification** annotator in Spark NLP 🚀. `DistilBertForSequenceClassification` can load DistilBERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks. This annotator is compatible with all the models trained/fine-tuned by using `DistilBertForSequenceClassification` or `TFDistilBertForSequenceClassification` in HuggingFace 🤗 (a usage sketch follows this list)
* **NEW:** Introducing trainable and distributed **Doc2Vec** annotators based on Word2Vec in Spark ML
* Improving BertEmbeddings performance for DataFrames with a single document/sentence per row on a single machine with a GPU device
* Improving BertSentenceEmbeddings performance for DataFrames with a single document/sentence per row on a single machine with a GPU device
* Add a new feature to the CoNLL() class, allowing it to read multiple CoNLL files at the same time into a single DataFrame
* Add support for Long type in label column for ClassifierDLApproach and SentimentDLApproach
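
As an illustration, here is a minimal Python sketch of how the new `DistilBertForSequenceClassification` annotator might be wired into a pipeline. The pretrained model name `distilbert_base_sequence_classifier_imdb` is an assumption for illustration; check the Models Hub for the names actually published.

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification
from pyspark.ml import Pipeline

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load a pretrained sequence-classification model (model name assumed for illustration)
seq_classifier = DistilBertForSequenceClassification \
    .pretrained("distilbert_base_sequence_classifier_imdb", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[document_assembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["A thoroughly enjoyable film."]]).toDF("text")
pipeline.fit(data).transform(data).select("class.result").show(truncate=False)
```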

----------------
Bug Fixes
----------------
* Improve model and pipeline resolution in Spark NLP to avoid downloading models/pipelines that do not match the Apache Spark version in use
* Fix MarianTransformer bug on empty sequences
* Fix TFInvalidArgumentException in MarianTransformer for sequences longer than 512
* Fix MarianTransformer multi-lingual models and pipelines such as `opus_mt_mul_en`


========

3.3.2

Not secure
========
----------------
New Features
----------------
* Comet.ml integration with Spark NLP
* Introducing BertForSequenceClassification annotator
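
A minimal sketch of loading the new `BertForSequenceClassification` annotator in Python; the pretrained model name below is an assumption, so check the Models Hub for available names.

```python
from sparknlp.annotator import BertForSequenceClassification

# Load a pretrained BERT sequence-classification model (model name assumed for illustration)
classifier = BertForSequenceClassification \
    .pretrained("bert_base_sequence_classifier_imdb", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)
```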

----------------
Bug Fixes
----------------
* Fix the name used to import EntityRulerApproach
* Fix missing EntityRulerModel in ResourceDownloader
* Fix NerDLApproach logs format on Databricks
* Fix a missing batchSize param in NerDLModel that degraded GPU performance


========

3.3.1

Not secure
========
----------------
New Features
----------------
* Introducing the EntityRuler annotator, which accepts either a JSON or CSV ontology file that maps entities to patterns. You can implement a purely rule-based entity recognition system by using EntityRuler; it can be saved as a Model and reused in other pipelines to annotate your documents against your knowledge base (a usage sketch follows).
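
A minimal Python sketch of a rule-based pipeline with `EntityRulerApproach`; the `patterns.json` file, its schema, and the column names are illustrative assumptions, so consult the EntityRuler documentation for the exact format.

```python
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, EntityRulerApproach
from pyspark.ml import Pipeline

# patterns.json (illustrative schema):
# [{"label": "PERSON", "patterns": ["John", "John Snow"]},
#  {"label": "LOCATION", "patterns": ["Winterfell"]}]

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

entity_ruler = EntityRulerApproach() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("entity") \
    .setPatternsResource("patterns.json")

pipeline = Pipeline(stages=[document_assembler, tokenizer, entity_ruler])
# Fitting produces an EntityRulerModel that can be saved and reused in other pipelines.
```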

----------------
Bug Fixes
----------------
* Fix compatibility issue between NerOverwriter and AlbertForTokenClassification, BertForTokenClassification, DistilBertForTokenClassification, LongformerForTokenClassification, RoBertaForTokenClassification, XlmRoBertaForTokenClassification, XlnetForTokenClassification annotators
* Fix a bug in ContextSpellCheckerApproach annotator failing to find an appropriate TF graph
* Fix a bug in ContextSpellCheckerModel not being able to load a trained model
* Fix token alignment with token pieces in BertEmbeddings that resulted in missing vectors for text with Unicode characters
* Add the missing pretrained NER models for the XlmRoBertaForTokenClassification annotator
* Add the missing pretrained NER models for the LongformerForTokenClassification annotator

----------------
Backward compatibility
----------------
* Renaming YakeModel to YakeKeywordExtraction to represent the actual purpose of this annotator more clearly.


========

3.3.0

Not secure
========

----------------
Major features and improvements
----------------
* **NEW:** Starting with the Spark NLP 3.3.0 release, there is no size limit when you import TensorFlow models! You can now import TF Hub & HuggingFace models larger than 2 GB.
* **NEW:** Up to 50x faster saving of Spark NLP models and pipelines! 🚀 We have improved the way we package the TensorFlow SavedModel when saving Spark NLP models & pipelines. For instance, saving the `xlm_roberta_base` model used to take up to 10 minutes prior to Spark NLP 3.3.0; now it takes up to 15 seconds!
* **NEW:** Introducing **AlbertForTokenClassification** annotator in Spark NLP 🚀. `AlbertForTokenClassification` can load ALBERT Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `AlbertForTokenClassification` or `TFAlbertForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **XlnetForTokenClassification** annotator in Spark NLP 🚀. `XlnetForTokenClassification` can load XLNet Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `XLNetForTokenClassification` or `TFXLNetForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **RoBertaForTokenClassification** annotator in Spark NLP 🚀. `RoBertaForTokenClassification` can load RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `RobertaForTokenClassification` or `TFRobertaForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **XlmRoBertaForTokenClassification** annotator in Spark NLP 🚀. `XlmRoBertaForTokenClassification` can load XLM-RoBERTa Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `XLMRobertaForTokenClassification` or `TFXLMRobertaForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing **LongformerForTokenClassification** annotator in Spark NLP 🚀. `LongformerForTokenClassification` can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This annotator is compatible with all the models trained/fine-tuned by using `LongformerForTokenClassification` or `TFLongformerForTokenClassification` in HuggingFace 🤗
* **NEW:** Introducing new ResourceDownloader functions to easily look for pretrained models & pipelines inside Spark NLP (Python and Scala). You can filter models or pipelines via `language`, `version`, or the name of the `annotator` (a usage sketch follows this list)
* Welcoming [Databricks Runtime 9.1 LTS](https://docs.databricks.com/release-notes/runtime/9.1.html), 9.1 ML, and 9.1 ML with GPU
* Fix `sparknlp.version()` returning the wrong version
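
A minimal sketch of the new ResourceDownloader helpers in Python; the filter values are illustrative, and the argument order should be verified against the installed version's documentation.

```python
from sparknlp.pretrained import ResourceDownloader

# Show pretrained models filtered by annotator, language, and Spark NLP version
# (filter values are illustrative).
ResourceDownloader.showPublicModels("NerDLModel", "en", "3.3.0")

# Show pretrained pipelines for a language, and all annotators with pretrained models.
ResourceDownloader.showPublicPipelines("en")
ResourceDownloader.showAvailableAnnotators()
```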

----------------
Bug Fixes
----------------
* Fix a bug in RoBertaEmbeddings when all special tokens were identical
* Fix a bug in RoBertaEmbeddings when a special token contained a valid regex
* Fix a bug that led to a memory leak inside the NorvigSweeting spell checker. This issue caused failures in pretrained pipelines such as `explain_document_ml` and `explain_document_dl` on some inputs
* Fix the wrong types being assigned to `minCount` and `classCount` in Python for `ContextSpellCheckerApproach` annotator
* Fix `explain_document_ml` pretrained pipeline for Spark NLP 3.x on Apache Spark 2.x


========

3.2.3

Not secure
========

----------------
Bug Fixes & Enhancements
----------------
* Add a delimiter parameter to the `CoNLL()` class to support other delimiters in CoNLL files (see the sketch after this list) https://github.com/JohnSnowLabs/spark-nlp/pull/5934
* Add support for the IOB format in addition to IOB2 in GraphExtraction https://github.com/JohnSnowLabs/spark-nlp/pull/6101
* Change the YakeModel output type from KEYWORD to CHUNK so that more features, such as Chunk2Doc or ChunkEmbeddings, are available after the YakeModel annotator https://github.com/JohnSnowLabs/spark-nlp/pull/6065
* Fix the default language for the XlmRoBertaSentenceEmbeddings pretrained model in Python https://github.com/JohnSnowLabs/spark-nlp/pull/6057
* Fix a SentenceEmbeddings issue where sentences were concatenated instead of embedding each corresponding sentence separately https://github.com/JohnSnowLabs/spark-nlp/pull/6060
* Fix GraphExtraction usage in LightPipeline https://github.com/JohnSnowLabs/spark-nlp/pull/6101
* Fix compatibility issue in `explain_document_ml` pipeline
* Better import process for corrupted merges file in Longformer tokenizer https://github.com/JohnSnowLabs/spark-nlp/pull/6083
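
A minimal sketch of reading a CoNLL file with a custom column delimiter via the new parameter; the keyword name `delimiter` follows the note above, and the file path is illustrative.

```python
import sparknlp
from sparknlp.training import CoNLL

spark = sparknlp.start()

# Read a CoNLL file whose columns are separated by tabs instead of spaces
# (the delimiter keyword follows PR 5934; the path is illustrative).
training_data = CoNLL(delimiter="\t").readDataset(spark, "path/to/eng.train")
training_data.selectExpr("text", "label.result").show(3, truncate=False)
```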


========
