This release adds the following features:
1. spaCy component (thanks to mlostar)
```python
from fastcoref import spacy_component  # importing registers the "fastcoref" pipe with spaCy
import spacy

texts = ['Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.']

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("fastcoref")

# nlp.pipe() processes a batch of texts; nlp() expects a single string
docs = list(nlp.pipe(texts))
docs[0]._.coref_clusters
# > [[(0, 5), (39, 42), (79, 82)]]
```
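Each cluster is a list of `(start, end)` character offsets into the input text. A minimal sketch for turning those offsets back into the mention strings, using only plain Python and the example output shown above; `clusters_to_mentions` is a hypothetical helper for illustration, not part of fastcoref:

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def clusters_to_mentions(text: str, clusters: List[List[Span]]) -> List[List[str]]:
    """Map every (start, end) span in every cluster to the substring it covers."""
    return [[text[start:end] for start, end in cluster] for cluster in clusters]


text = 'Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.'
clusters = [[(0, 5), (39, 42), (79, 82)]]  # the output from the spaCy example
print(clusters_to_mentions(text, clusters))  # [['Alice', 'she', 'her']]
```

Because the offsets index characters of the original string, slicing is enough; no tokenizer is needed to recover the mentions.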
2. Trainer
```python
from fastcoref import TrainingArgs, CorefTrainer

args = TrainingArgs(
    output_dir='test-trainer',
    overwrite_output_dir=True,
    model_name_or_path='distilroberta-base',
    device='cuda:2',
    epochs=129,
    logging_steps=100,
    eval_steps=100
)  # you can also control other arguments, such as the learning rate of the head

trainer = CorefTrainer(
    args=args,
    train_file='train_file_with_clusters.jsonlines',
    dev_file='path-to-dev-file',    # optional
    test_file='path-to-test-file'   # optional
)
trainer.train()
trainer.evaluate(test=True)  # evaluate on test_file

# upload the trained model to the Hugging Face Hub
trainer.push_to_hub('your-fast-coref-model-path')
```
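`dev_file` and `test_file` are optional. If you only have a single JSON Lines file of documents, one way to hold out a dev set is a seeded random split. A minimal sketch using only the standard library; `split_jsonlines`, the 10% ratio, and the output file names are illustrative assumptions. Lines are copied verbatim, so the sketch makes no assumption about the per-document schema the trainer expects:

```python
import random
from pathlib import Path
from typing import Tuple


def split_jsonlines(src: str, train_out: str, dev_out: str,
                    dev_ratio: float = 0.1, seed: int = 42) -> Tuple[int, int]:
    """Shuffle the non-empty lines of `src` and write them to two files.

    Returns (n_train, n_dev). Lines are not parsed, only copied unchanged.
    """
    text = Path(src).read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    random.Random(seed).shuffle(lines)  # seeded so the split is reproducible
    # keep at least one dev document, unless there is only one document in total
    n_dev = max(1, int(len(lines) * dev_ratio)) if len(lines) > 1 else 0
    dev, train = lines[:n_dev], lines[n_dev:]
    Path(train_out).write_text("".join(ln + "\n" for ln in train), encoding="utf-8")
    Path(dev_out).write_text("".join(ln + "\n" for ln in dev), encoding="utf-8")
    return len(train), len(dev)


# Hypothetical file names; pass the results as train_file / dev_file:
# split_jsonlines('train_file_with_clusters.jsonlines', 'train.jsonlines', 'dev.jsonlines')
```

Splitting at the line level keeps each document intact, which is what a document-level coreference dev set needs.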
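3. `predict` now supports an `output_file` argument, so predictions can be written to a JSON Lines file and reused as the `train_file` for the trainer above: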
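```python
from fastcoref import LingMessCoref

texts = ['Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.']
model = LingMessCoref()
preds = model.predict(texts=texts, output_file='train_file_with_clusters.jsonlines')
```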
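Before training on the written file, it can help to read it back and check its contents. A minimal sketch with the standard `json` module; it assumes one JSON object per line and, as an unverified assumption about the schema, that each object stores its clusters under a `clusters` key (inspect one line of your own file and pass a different `clusters_key` if needed). `read_jsonlines` and `summarize_jsonlines` are hypothetical helpers:

```python
import json
from typing import Dict, List


def read_jsonlines(path: str) -> List[Dict]:
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_jsonlines(path: str, clusters_key: str = "clusters") -> Dict[str, int]:
    """Count documents and clusters; `clusters_key` is an assumed field name."""
    records = read_jsonlines(path)
    # records without the key count as having zero clusters
    n_clusters = sum(len(rec.get(clusters_key, [])) for rec in records)
    return {"documents": len(records), "clusters": n_clusters}


# print(summarize_jsonlines('train_file_with_clusters.jsonlines'))
```

A document count that does not match `len(texts)`, or a cluster count of zero, suggests the key name or the file itself should be checked.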