Use with the Transformers library
```shell
# Gated model: log in with a HF token that has gated-access permission
hf auth login
```

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Bei0001/span_role_classifier")

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Bei0001/span_role_classifier")
model = AutoModelForSequenceClassification.from_pretrained("Bei0001/span_role_classifier")
```

Span Role Classifier v10 (ONNX INT8)

A 12-class text classifier that assigns a discourse / rhetorical role to a span of academic text. Fine-tuned from KISTI-AI/Scideberta-full and dynamically quantized to INT8 via ONNX Runtime for fast CPU inference.

Labels

| id | label | description |
|----|-------|-------------|
| 0 | background_context | prior work, setting, motivation |
| 1 | definition | formal definition of a term/concept |
| 2 | fact_property | factual statement or inherent property |
| 3 | classification | taxonomy / type grouping |
| 4 | cause_mechanism | how/why something happens |
| 5 | compare_contrast | comparison between two things |
| 6 | procedure_step | step in a procedure or method |
| 7 | worked_example | worked calculation/derivation |
| 8 | claim_conclusion | claim or inference |
| 9 | evidence_result | empirical data or experimental result |
| 10 | condition_exception | precondition, hypothesis, or limit of validity |
| 11 | counterexample_misconception | refutation or debunked belief |
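For programmatic use, the table above translates directly into an id-to-label mapping; a minimal sketch (the dict mirrors the table, the `decode` helper name is illustrative):

```python
# id -> label mapping, copied from the label table above
ID2LABEL = {
    0: "background_context",
    1: "definition",
    2: "fact_property",
    3: "classification",
    4: "cause_mechanism",
    5: "compare_contrast",
    6: "procedure_step",
    7: "worked_example",
    8: "claim_conclusion",
    9: "evidence_result",
    10: "condition_exception",
    11: "counterexample_misconception",
}
LABEL2ID = {v: k for k, v in ID2LABEL.items()}

def decode(idx: int) -> str:
    """Turn a raw argmax index into its label string."""
    return ID2LABEL[idx]
```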

Validation performance (macro F1 = 0.714)

Evaluated on a 10% stratified held-out split of 28,398 LLM-relabeled academic spans across 24 academic domains.

| class | F1 |
|-------|-----|
| procedure_step | 0.812 |
| condition_exception | 0.788 |
| evidence_result | 0.776 |
| definition | 0.759 |
| classification | 0.755 |
| worked_example | 0.745 |
| cause_mechanism | 0.711 |
| background_context | 0.696 |
| compare_contrast | 0.676 |
| claim_conclusion | 0.642 |
| counterexample_misconception | 0.637 |
| fact_property | 0.577 |
| **macro F1** | **0.714** |
| **val accuracy** | **0.706** |
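Macro F1 is the unweighted mean of the per-class F1 scores, so the headline number can be sanity-checked directly against the table:

```python
# Per-class validation F1 scores from the table above
per_class_f1 = [0.812, 0.788, 0.776, 0.759, 0.755, 0.745,
                0.711, 0.696, 0.676, 0.642, 0.637, 0.577]

# Macro F1 = plain average, ignoring class frequencies
macro_f1 = sum(per_class_f1) / len(per_class_f1)
# macro_f1 is close to the reported 0.714 (up to rounding of the per-class values)
```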

Training progression (macro F1 never regressed between epochs, thanks to the anti-overfit config):

| epoch | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|---|---|---|
| macro F1 | 0.635 | 0.666 | 0.699 | 0.704 | 0.707 | 0.714 |

Quantization

| | FP32 PyTorch | FP32 ONNX | INT8 ONNX (this file) |
|---|---|---|---|
| file size | 738 MB | 739 MB | 244 MB |
| compression vs FP32 | 1.00x | 1.00x | 3.03x |
| CPU latency (batch=1, max_len=128) | ~60 ms | ~60 ms | ~60 ms |
| macro F1 | 0.714 | 0.714 (identical) | 0.714 (identical) |
| max logit diff vs FP32 | 0 | 0 | 0.20 |
| live-test agreement with FP32 | n/a | 100% | 100% |

INT8 predictions match FP32 on every sample in the held-out live-test set. The quantization is lossless for classification purposes.
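A max logit drift of 0.20 can only flip a prediction when the gap between the top two logits is smaller than twice that drift, which is rare for a confident classifier. A toy numpy illustration of this robustness (the logit values below are made up for the example, not taken from the model):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax, same as in the usage snippet below
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical FP32 logits for the 12 classes; class 2 wins by a wide margin
fp32_logits = np.array([1.2, -0.5, 4.1, 0.3, -1.0, 2.2,
                        0.0, -0.7, 1.5, 0.9, -0.2, 0.4])

# Simulate INT8 drift bounded by the observed max logit diff of 0.20
int8_logits = fp32_logits + rng.uniform(-0.2, 0.2, size=12)

# The argmax cannot flip: the top-2 margin (4.1 - 2.2 = 1.9) far exceeds 2 * 0.20
assert softmax(fp32_logits).argmax() == softmax(int8_logits).argmax()
```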

Usage

With ONNX Runtime (recommended for production)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

MODEL_DIR = "span-role-classifier-v10-int8-onnx"
LABELS = [
    "background_context","definition","fact_property","classification",
    "cause_mechanism","compare_contrast","procedure_step","worked_example",
    "claim_conclusion","evidence_result","condition_exception","counterexample_misconception",
]

tok = AutoTokenizer.from_pretrained(MODEL_DIR)
sess = ort.InferenceSession(f"{MODEL_DIR}/model.onnx", providers=["CPUExecutionProvider"])

def classify(text: str) -> dict:
    enc = tok(text, return_tensors="np", truncation=True, max_length=512, padding=True)
    inputs = {
        "input_ids":      enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    }
    logits = sess.run(None, inputs)[0][0]
    # Numerically stable softmax over the 12 class logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = int(probs.argmax())
    return {"label": LABELS[idx], "confidence": float(probs[idx])}

print(classify("The central limit theorem applies only when observations are independent and the population variance is finite."))
# -> {'label': 'condition_exception', 'confidence': 0.99}
```

With HuggingFace Optimum

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained("span-role-classifier-v10-int8-onnx")
model = ORTModelForSequenceClassification.from_pretrained(
    "span-role-classifier-v10-int8-onnx", file_name="model.onnx"
)
pipe = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)
print(pipe("A common misconception holds that humans evolved from modern chimpanzees."))
```

Training details

  • Base model: KISTI-AI/Scideberta-full (DeBERTa-v3 pretrained on scientific text)
  • Dataset: 28,398 academic spans across 24 academic domains (Biology, Physics, Mathematics, Medicine, Philosophy, Computer Science, Law, History, etc.), all labels LLM-relabeled for quality
  • Anti-overfit config:
    • LR 1.5e-5 with linear warmup (15%) + decay
    • Weight decay 0.02
    • Classifier + pooler dropout 0.2
    • Label smoothing 0.05
    • Inverse-frequency class weights on CrossEntropyLoss
    • Early stopping patience 2 on macro F1
    • Batch size 32, max 6 epochs (used all 6; macro F1 never regressed)
  • Hardware: RTX 5090, ~4 hours wall time
  • Quantization: ONNX Runtime dynamic quantization (INT8 weights for MatMul + embeddings; activations in FP32)
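The inverse-frequency class weights mentioned in the anti-overfit config can be derived from per-class sample counts; a minimal sketch with hypothetical counts (the real per-class distribution is not published, only the 28,398 total):

```python
import numpy as np

# Hypothetical per-class span counts summing to the dataset's 28,398 total
counts = np.array([5200, 1800, 4100, 1500, 2600, 1900,
                   3100, 1200, 2800, 2200, 1300, 698])

# Inverse-frequency weights: w_i = N / (K * c_i).
# A perfectly balanced dataset (c_i = N/K) would give every class weight 1.0,
# so rare classes end up above 1 and frequent classes below 1.
weights = counts.sum() / (len(counts) * counts)
```

These weights would then be passed as the `weight` argument of `CrossEntropyLoss` so that errors on rare classes cost proportionally more.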

Limitations

  • Labels are LLM-relabeled (Claude Sonnet 4.6), not human-annotated; true human-gold F1 is likely a few points lower (~0.65-0.68 estimated).
  • Trained on academic English only; performance on other domains (news, fiction, social media) is untested and likely lower.
  • The fact_property class is a semantic catch-all that overlaps with background_context, definition, and cause_mechanism; its F1 is the lowest, and its errors are often defensible rubric edge cases rather than true mistakes.
  • The model predicts per-span; it does not segment long documents into spans, so you must supply pre-segmented input (typically 1-3 sentence chunks).
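Because the model expects pre-segmented input, callers need a chunking step upstream. A naive sketch (illustrative only; `chunk_spans` is a hypothetical helper, and real pipelines should use a proper sentence splitter rather than this regex):

```python
import re

def chunk_spans(text: str, sentences_per_span: int = 2) -> list[str]:
    """Naively split text on sentence-final punctuation, then group
    consecutive sentences into spans of the requested size."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_span])
        for i in range(0, len(sentences), sentences_per_span)
    ]
```

Each resulting span can then be passed to the `classify()` function from the Usage section individually.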

License

Apache 2.0 (follows base model KISTI-AI/Scideberta-full).

Citation

If you use this model, please cite the base SciDeBERTa paper as well:

@article{Jeong2022SciDeBERTa,
  title={SciDeBERTa: Learning DeBERTa for Scientific Domain},
  author={Jeong, Yeon-Ju and Kim, Eunhui},
  journal={IEEE Access},
  year={2022}
}