SciBETO-IMRaD

Fine-tuned SciBETO-large for classifying segments of Spanish-language scientific papers into an 8-class IMRaD structure: INTRO, BACK, METH, RES, DISC, CONC, CONTR, LIM.

Training

  • Base: Flaglab/SciBETO-large
  • Dataset: manually annotated Spanish-language scientific papers (1264 training segments)
  • Strategy: Head+Tail truncation (256+256 tokens), lr=2e-5, 5 epochs
  • Split: 80/10/10 by documento_id, seed=42
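The Head+Tail strategy keeps the first 256 and last 256 tokens of each segment, so long segments still fit the 512-token encoder window without losing their opening or closing sentences. A minimal sketch of that truncation, operating on a plain list of token ids (the helper name `head_tail_truncate` is illustrative, not from the training code):

```python
def head_tail_truncate(token_ids, head=256, tail=256):
    """Keep the first `head` and last `tail` tokens of a segment.

    Segments shorter than head + tail are returned unchanged; longer
    ones are cut in the middle, preserving both ends.
    """
    if len(token_ids) <= head + tail:
        return token_ids
    return token_ids[:head] + token_ids[-tail:]


# A 600-token segment is reduced to exactly 512 tokens: ids 0..255 and 344..599.
ids = list(range(600))
print(len(head_tail_truncate(ids)))
```

In practice the special tokens ([CLS]/[SEP]) also count toward the 512-token budget, so the real head/tail sizes may be slightly smaller.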

Results (test set, 165 segments)

Metric     4 classes   8 classes
F1-macro   0.5781      0.7083
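The 4-class score is presumably computed by collapsing the 8 fine-grained labels into the classic IMRaD macro-sections; the card does not state the mapping, so the one sketched below is an assumption for illustration only:

```python
# Hypothetical collapse of the 8 fine-grained labels into 4 IMRaD
# macro-sections (not documented in the model card).
TO_4_CLASS = {
    'INTRO': 'INTRO', 'BACK': 'INTRO', 'CONTR': 'INTRO',
    'METH': 'METH',
    'RES': 'RES',
    'DISC': 'DISC', 'CONC': 'DISC', 'LIM': 'DISC',
}

def collapse(label):
    """Map a fine-grained 8-class label to its 4-class macro-section."""
    return TO_4_CLASS[label]

print(collapse('BACK'))   # background folds into the introduction section
```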

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("wiflore/SciBETO-IMRaD")
model = AutoModelForSequenceClassification.from_pretrained("wiflore/SciBETO-IMRaD")

# Tokenize a Spanish segment, truncated to the model's 512-token limit
text = "En este estudio proponemos un método para..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Forward pass without gradient tracking, then take the highest-scoring class
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()

labels = ['INTRO','BACK','METH','RES','DISC','CONC','CONTR','LIM']
print(labels[pred])
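If you want a confidence score per class rather than just the argmax, apply a softmax over the logits. A dependency-free sketch working on a plain list of logit values (the helper `label_probs` is illustrative; with PyTorch you would use `logits.softmax(-1)` instead):

```python
import math

LABELS = ['INTRO','BACK','METH','RES','DISC','CONC','CONTR','LIM']

def label_probs(logits):
    """Numerically stable softmax over raw logits, keyed by class label."""
    m = max(logits)                                # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {lab: e / total for lab, e in zip(LABELS, exps)}

probs = label_probs([2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
print(max(probs, key=probs.get))  # class with the highest probability
```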
Model size: 0.4B params (F32, safetensors)
