
fastcoref

Latest version: v2.1.1

2.1.1

Patch to disable the progress bar at inference (thanks to radandreicristian)

2.1.0

NEW FEATURE (thanks to aryehgigi):

The `predict` function signature changed to support tokenized text as input:
python
texts: Union[str, List[str], List[List[str]]],  # similar to Hugging Face tokenizer inputs
is_split_into_words: bool = False


If you pass tokenized text to the `predict` function, set `is_split_into_words=True`.

To pass a **single** tokenized sequence, you must set `is_split_into_words=True`; otherwise a flat list of strings is ambiguous with a batch of sequences.
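
The ambiguity described above can be illustrated with a minimal sketch. The helper `normalize_texts` is hypothetical and is not fastcoref's implementation; it only assumes the input conventions stated in this release note, and shows why a flat list of strings needs the flag to be read as one tokenized sequence.

```python
from typing import List, Union

TextInput = Union[str, List[str], List[List[str]]]


def normalize_texts(
    texts: TextInput, is_split_into_words: bool = False
) -> List[Union[str, List[str]]]:
    """Hypothetical helper: turn any accepted input into a batch (list).

    Each batch item is a raw string, or a list of tokens when
    is_split_into_words=True.
    """
    if isinstance(texts, str):
        # A single raw text becomes a batch of one.
        return [texts]
    if not is_split_into_words:
        # A flat list of strings is read as a batch of raw texts.
        return list(texts)
    if texts and isinstance(texts[0], str):
        # With the flag set, a flat list of strings is ONE tokenized sequence.
        return [list(texts)]
    # A list of lists is a batch of tokenized sequences.
    return [list(tokens) for tokens in texts]


tokens = ["Alice", "goes", "down", "the", "rabbit", "hole", "."]
print(len(normalize_texts(tokens)))                            # 7: seven separate texts
print(len(normalize_texts(tokens, is_split_into_words=True)))  # 1: one tokenized sequence
```

Without the flag, the seven tokens would be treated as seven independent documents, which is why the flag is required for a single tokenized sequence.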

2.0.3

Fix: disable all spaCy components except the tokenizer; see https://github.com/shon-otmazgin/fastcoref/issues/13 (thanks to radandreicristian)

2.0.2

Use the existing spaCy instance when using the spaCy component

2.0.1

Adding the following features:
1. spaCy component (thanks to mlostar)
python
from fastcoref import spacy_component
import spacy


texts = ['Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.']

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("fastcoref")

docs = list(nlp.pipe(texts))  # nlp.pipe handles a list of texts; nlp() expects a single string
docs[0]._.coref_clusters
# > [[(0, 5), (39, 42), (79, 82)]]


2. Trainer
python
from fastcoref import TrainingArgs, CorefTrainer

args = TrainingArgs(
    output_dir='test-trainer',
    overwrite_output_dir=True,
    model_name_or_path='distilroberta-base',
    device='cuda:2',
    epochs=129,
    logging_steps=100,
    eval_steps=100
)  # you can control other arguments such as learning head and others.

trainer = CorefTrainer(
    args=args,
    train_file='train_file_with_clusters.jsonlines',
    dev_file='path-to-dev-file',  # optional
    test_file='path-to-test-file'  # optional
)
trainer.train()
trainer.evaluate(test=True)

trainer.push_to_hub('your-fast-coref-model-path')



3. `predict` now supports an output file:
python
from fastcoref import LingMessCoref

model = LingMessCoref()
preds = model.predict(texts=texts, output_file='train_file_with_clusters.jsonlines')
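
The cluster output in the spaCy component example is a list of clusters, each a list of `(start, end)` pairs. Assuming these are character offsets into the input text with an exclusive end (which is consistent with the example values), the mention strings can be recovered by slicing. A minimal sketch in plain Python, with the text and clusters copied from that example; `resolve_mentions` is a hypothetical helper name, not part of fastcoref:

```python
from typing import List, Tuple

Cluster = List[Tuple[int, int]]

# Text and cluster offsets copied from the spaCy component example.
text = ('Alice goes down the rabbit hole. Where she would discover '
        'a new reality beyond her expectations.')
clusters: List[Cluster] = [[(0, 5), (39, 42), (79, 82)]]


def resolve_mentions(text: str, clusters: List[Cluster]) -> List[List[str]]:
    """Map each (start, end) character span to the substring it covers."""
    return [[text[start:end] for start, end in cluster] for cluster in clusters]


print(resolve_mentions(text, clusters))  # [['Alice', 'she', 'her']]
```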
