
Text Analysis




Sentence-BERT (training overview): https://www.sbert.net/docs/training/overview.html#training-data

A minimal example with CosineSimilarityLoss is the following:

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

#Define the model. Either from scratch or by loading a pre-trained model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

#Define your train examples. You need more than just two examples...
train_examples = [InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
    InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]

#Define your train dataset, the dataloader and the train loss
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

#Tune the model
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
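
To make the loss above concrete: CosineSimilarityLoss takes the cosine similarity of the two sentence embeddings and, by default, compares it to the label with mean squared error. The sketch below reproduces that arithmetic in plain Python. The helper names and the 3-dimensional vectors are made up for illustration (real embeddings come from the model and have hundreds of dimensions); this is not the library's implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [(cosine_similarity(u, v) - y) ** 2
              for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Hypothetical toy "embeddings": one identical pair, one orthogonal pair.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # cosine similarity 1.0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # cosine similarity 0.0
]
labels = [0.8, 0.3]  # same label values as the training example above

loss = cosine_similarity_loss(pairs, labels)
print(loss)  # roughly 0.065: the mean of (1.0 - 0.8)^2 and (0.0 - 0.3)^2
```

Training pushes the embeddings so that each pair's cosine similarity moves toward its label, which is why the labels should be similarity scores rather than class ids.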
http://nealcaren.web.unc.edu/an-introduction-to-text-analysis-with-python-part-1/
http://www.kdnuggets.com/software/text.html


NER





