Experimental roberta-base model pretrained from scratch on factual information
Thanks to LAION contributors and Stability.ai for help building datasets and compute resources.
Using the model:
This is a BERT-style model pretrained with the masked language modeling (MLM) task.
The model was trained like a standard roberta-base model, except it was only trained on sequences of 128 tokens.
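Because the model only saw 128-token sequences during training, longer inputs probably need to be split before they are scored. The sketch below shows one way to do that on a plain list of token ids. `chunk_token_ids` is a hypothetical helper, not something shipped with the model, and it assumes that two of the 128 positions are taken by the start and end tokens the tokenizer adds.

```python
from typing import List

MAX_LEN = 128    # sequence length used in training, per the model card
NUM_SPECIAL = 2  # assumption: one start and one end token are added per sequence


def chunk_token_ids(token_ids: List[int], max_len: int = MAX_LEN,
                    num_special: int = NUM_SPECIAL) -> List[List[int]]:
    """Split content token ids (no special tokens) into windows that fit max_len."""
    window = max_len - num_special
    if window <= 0:
        raise ValueError("max_len must be larger than num_special")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


# Usage with dummy ids standing in for tokenizer output
chunks = chunk_token_ids(list(range(300)))
print([len(c) for c in chunks])  # [126, 126, 48]
```

Non-overlapping windows are the simplest choice; overlapping windows would give tokens near a chunk boundary more context, at the cost of extra forward passes.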
I believe a bug in the training script prevents the <BOS> and <EOS> tokens from being assigned properly, so make sure to manually remove those two positions from your outputs.
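One way to follow that advice is to drop the first and last positions of the per-position predictions before comparing them with the input tokens. The sketch below works on plain lists of token ids. `strip_special_positions` and `find_disagreements` are hypothetical helper names, and the ids in the usage lines are made up for illustration.

```python
from typing import List, Tuple


def strip_special_positions(predicted_ids: List[int]) -> List[int]:
    """Drop the first and last positions, where <BOS> and <EOS> would sit."""
    return predicted_ids[1:-1]


def find_disagreements(original_ids: List[int],
                       predicted_ids: List[int]) -> List[Tuple[int, int, int]]:
    """Return (position, original_id, predicted_id) wherever the two differ.

    original_ids: content tokens only (special tokens already removed).
    predicted_ids: one prediction per input position, special positions included.
    """
    content = strip_special_positions(predicted_ids)
    if len(content) != len(original_ids):
        raise ValueError("prediction and input lengths do not line up")
    return [(pos, o, p)
            for pos, (o, p) in enumerate(zip(original_ids, content))
            if o != p]


# Made-up ids: the first and last predictions are the unreliable special positions
original = [11, 22, 33, 44]
predicted = [7, 11, 99, 33, 44, 7]
print(find_disagreements(original, predicted))  # [(1, 22, 99)]
```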
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

checkpoint = "Rallio67/roberta-base-128-factchecker"
MASK_ID = 50264  # id of the <mask> token in the roberta-base vocabulary

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# Half precision on the first GPU; a CUDA device is required as written.
model = AutoModelForMaskedLM.from_pretrained(checkpoint, torch_dtype=torch.float16).cuda(0)
print(checkpoint)

text = """Neurospora crassa is a species of filamentous fungi that is widely used in genetic and molecular biology research. It is known for its fast growth rate, relatively simple life cycle, and ability to undergo sexual and asexual reproduction. Here are some specific areas of research where Neurospora crassa is commonly used: Genetics and Genomics: Neurospora crassa was one of the first organisms to have its entire genome sequenced, making it a model organism for studying gene expression, gene regulation, and functional genomics."""

input_ids = tokenizer.encode(text)
original_tokens = input_ids[1:-1]  # drop the start and end tokens
print("Text length:", len(input_ids))

# Build one copy of the text per position, each with a single token masked.
# (As written, the range stops one short of the last content token.)
sequences = []
for i in range(1, len(input_ids) - 2):
    newentry = input_ids.copy()
    newentry[i] = MASK_ID
    sequences.append(tokenizer.decode(newentry[1:-1]))

# Predict every position of each masked copy and report where the
# model's prediction disagrees with the original text.
for sequence in sequences:
    encoded = tokenizer(sequence, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        output = model(encoded["input_ids"].cuda(0))
    predicted_token_id = output[0][0].argmax(dim=-1)
    predicted_fixed = predicted_token_id[1:-1]  # remove the <BOS>/<EOS> positions
    for z, i in enumerate(encoded["input_ids"][0]):
        if i.item() == MASK_ID:
            # Compare a ten-token window around the masked position.
            original = tokenizer.decode(original_tokens[z-5:z+5])
            replace = tokenizer.decode(predicted_fixed[z-5:z+5])
            if original != replace:
                print("Original: "+original, "\n", "Replace: "+replace, "\n"+"-"*8)
Outputs:
Rallio67/roberta-base-128-factchecker
Text length: 112
Original: species of filamentous fungi that is widely used in
Replace: species of filamentous fungus that is widely used in
--------
Original: ous fungi that is widely used in genetic and molecular
Replace: ous fungi that is commonly used in genetic and molecular
--------
Original: is widely used in genetic and molecular biology research.
Replace: is widely used in evolutionary and molecular biology research.
--------
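The example prints a ten-token window around each masked position using slices of the form `[z-5:z+5]`. For positions near the start of the text the start index becomes negative, which Python reads as counting from the end, so both windows can come out empty and a disagreement there goes unreported. A minimal sketch of a clamped window follows, using a hypothetical `context_window` helper on a plain list.

```python
from typing import List, Sequence


def context_window(tokens: Sequence, center: int, radius: int = 5) -> List:
    """Return up to 2*radius items around `center`, clamping the start at 0."""
    start = max(0, center - radius)
    return list(tokens[start:center + radius])


tokens = list("abcdefghijkl")
print(tokens[2 - 5:2 + 5])        # [] because -3 counts from the end
print(context_window(tokens, 2))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

Swapping a helper like this into the comparison step would likely surface extra Original/Replace pairs for the first few tokens, so the printed output may then differ from the listing above.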