---
library_name: transformers
base_model:
- google/mt5-small
license: apache-2.0
language:
- gl
---
# Model Card for mt5-scan-gl-cx
Fine-tuned mT5 for metrical scansion in Galician (lexical-to-metrical syllabification).
The model uses the previous and following input lines as context, as in this example from "Á moda" by Filomena Dato:
Input format: `PREV: sin / *fe / nin / cre- / *en- / zas | CUR: *ten / *cen- / tos / de / al- / *ta- / res | NEXT: *che- / os / de / ri- / *que- / zas | OUTPUT: `
Output for the above: `*ten / *cen- / tos / de al- / *ta- / res`
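The input string for a given line can be assembled from the three pre-syllabified lines. A minimal sketch, assuming only the `PREV`/`CUR`/`NEXT`/`OUTPUT` field layout shown above (the helper name is illustrative, not part of the released code):

```python
def build_input(prev_line: str, cur_line: str, next_line: str) -> str:
    # Assemble the PREV/CUR/NEXT/OUTPUT prompt format shown above.
    # Note the trailing space after "OUTPUT:", matching the example.
    return f"PREV: {prev_line} | CUR: {cur_line} | NEXT: {next_line} | OUTPUT: "

text = build_input(
    "sin / *fe / nin / cre- / *en- / zas",
    "*ten / *cen- / tos / de / al- / *ta- / res",
    "*che- / os / de / ri- / *que- / zas",
)
```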
Use the code below to get started with the model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = "compellit/mt5-scan-gl-cx"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
text = "PREV: sin / *fe / nin / cre- / *en- / zas | CUR: *ten / *cen- / tos / de / al- / *ta- / res | NEXT: *che- / os / de / ri- / *que- / zas | OUTPUT: "
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=256,
        num_beams=1,  # greedy decoding
        do_sample=False,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
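The decoded string can be split back into syllables. A small post-processing sketch, assuming (based on the example above, not on documented behavior) that syllables are separated by `" / "` and that a leading `*` marks a stressed syllable:

```python
def parse_scansion(decoded: str) -> list[tuple[str, bool]]:
    # Split the model output on the " / " separator and read the
    # leading "*" as a stress flag; this interpretation is inferred
    # from the example in this card, not from a formal spec.
    syllables = []
    for token in decoded.split(" / "):
        token = token.strip()
        stressed = token.startswith("*")
        syllables.append((token.lstrip("*"), stressed))
    return syllables

parsed = parse_scansion("*ten / *cen- / tos / de al- / *ta- / res")
# -> [('ten', True), ('cen-', True), ('tos', False),
#     ('de al-', False), ('ta-', True), ('res', False)]
```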