---
language: en
license: apache-2.0
library_name: transformers
base_model: KISTI-AI/Scideberta-full
tags:
- text-classification
- span-classification
- discourse
- rhetorical-role
- academic-text
- scientific-text
- onnx
- int8
- quantized
pipeline_tag: text-classification
---

# Span Role Classifier v10 (ONNX INT8)

A 12-class text classifier that assigns a **discourse / rhetorical role** to a span of academic text. Fine-tuned from [`KISTI-AI/Scideberta-full`](https://huggingface.co/KISTI-AI/Scideberta-full) and dynamically quantized to INT8 with ONNX Runtime, yielding a ~3x smaller model whose predictions are identical to FP32 on CPU.

## Labels

| id | label | description |
|---|---|---|
| 0 | `background_context` | prior work, setting, motivation |
| 1 | `definition` | formal definition of a term/concept |
| 2 | `fact_property` | factual statement or inherent property |
| 3 | `classification` | taxonomy / type grouping |
| 4 | `cause_mechanism` | how/why something happens |
| 5 | `compare_contrast` | comparison between two things |
| 6 | `procedure_step` | step in a procedure or method |
| 7 | `worked_example` | worked calculation/derivation |
| 8 | `claim_conclusion` | claim or inference |
| 9 | `evidence_result` | empirical data or experimental result |
| 10 | `condition_exception` | precondition, hypothesis, or limit of validity |
| 11 | `counterexample_misconception` | refutation or debunked belief |

## Validation performance (macro F1 = 0.714)

Evaluated on a stratified 10% held-out split of the 28,398 LLM-relabeled academic spans (24 academic domains).

| class | F1 |
|---|---|
| procedure_step | 0.812 |
| condition_exception | 0.788 |
| evidence_result | 0.776 |
| definition | 0.759 |
| classification | 0.755 |
| worked_example | 0.745 |
| cause_mechanism | 0.711 |
| background_context | 0.696 |
| compare_contrast | 0.676 |
| claim_conclusion | 0.642 |
| counterexample_misconception | 0.637 |
| fact_property | 0.577 |
| **macro F1** | **0.714** |
| val accuracy | 0.706 |

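Macro F1 is the unweighted mean of the twelve per-class F1 scores, so every class counts equally regardless of support. A quick arithmetic check against the table above:

```python
# Per-class validation F1 scores, copied from the table above.
f1 = {
    "procedure_step": 0.812, "condition_exception": 0.788, "evidence_result": 0.776,
    "definition": 0.759, "classification": 0.755, "worked_example": 0.745,
    "cause_mechanism": 0.711, "background_context": 0.696, "compare_contrast": 0.676,
    "claim_conclusion": 0.642, "counterexample_misconception": 0.637, "fact_property": 0.577,
}

# Macro F1 = unweighted mean over classes.
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 4))  # 0.7145, reported above as 0.714
```
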
Training progression (macro F1 improved every epoch; no regression under the anti-overfit configuration described below):

| epoch | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| macro F1 | 0.635 | 0.666 | 0.699 | 0.704 | 0.707 | **0.714** |

## Quantization

| | FP32 PyTorch | FP32 ONNX | **INT8 ONNX (this file)** |
|---|---|---|---|
| file size | 738 MB | 739 MB | **244 MB** |
| compression vs FP32 | 1.00x | 1.00x | **3.03x** |
| CPU latency (batch=1, max_len=128) | ~60 ms | ~60 ms | ~60 ms |
| macro F1 | 0.714 | 0.714 (identical) | **0.714 (identical)** |
| max logit diff vs FP32 | 0 | 0 | 0.20 |
| live-test agreement with FP32 | — | 100% | **100%** |

INT8 predictions match FP32 on every sample in the held-out live-test set: despite a maximum logit difference of 0.20, the argmax never flips, so the quantization is lossless for classification purposes.

## Usage

### With ONNX Runtime (recommended for production)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

MODEL_DIR = "span-role-classifier-v10-int8-onnx"
LABELS = [
    "background_context", "definition", "fact_property", "classification",
    "cause_mechanism", "compare_contrast", "procedure_step", "worked_example",
    "claim_conclusion", "evidence_result", "condition_exception", "counterexample_misconception",
]

tok = AutoTokenizer.from_pretrained(MODEL_DIR)
sess = ort.InferenceSession(f"{MODEL_DIR}/model.onnx", providers=["CPUExecutionProvider"])

def classify(text: str) -> dict:
    enc = tok(text, return_tensors="np", truncation=True, max_length=512, padding=True)
    inputs = {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
    }
    logits = sess.run(None, inputs)[0][0]
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    idx = int(probs.argmax())
    return {"label": LABELS[idx], "confidence": float(probs[idx])}

print(classify("The central limit theorem applies only when observations are independent and the population variance is finite."))
# -> {'label': 'condition_exception', 'confidence': 0.99}
```

### With Hugging Face Optimum

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained("span-role-classifier-v10-int8-onnx")
model = ORTModelForSequenceClassification.from_pretrained(
    "span-role-classifier-v10-int8-onnx", file_name="model.onnx"
)
pipe = pipeline("text-classification", model=model, tokenizer=tok, top_k=None)
print(pipe("A common misconception holds that humans evolved from modern chimpanzees."))
```

## Training details

- **Base model:** `KISTI-AI/Scideberta-full` (DeBERTa-v3 pretrained on scientific text)
- **Dataset:** 28,398 academic spans across 24 academic domains (Biology, Physics, Mathematics, Medicine, Philosophy, Computer Science, Law, History, etc.), all labels LLM-relabeled for quality
- **Anti-overfit config:**
  - LR 1.5e-5 with linear warmup (15%) + decay
  - Weight decay 0.02
  - Classifier + pooler dropout 0.2
  - Label smoothing 0.05
  - Inverse-frequency class weights on CrossEntropyLoss
  - Early stopping patience 2 on macro F1
  - Batch size 32, max 6 epochs (all 6 used; macro F1 never regressed)
- **Hardware:** RTX 5090, ~4 hours wall time
- **Quantization:** ONNX Runtime dynamic quantization (INT8 weights for MatMul + embeddings; activations in FP32)

## Limitations

- Labels are LLM-relabeled (Claude Sonnet 4.6), not human-annotated; F1 measured against human gold labels would likely be a few points lower (estimated ~0.65-0.68).
- Trained on academic English only; performance on other domains (news, fiction, social media) is untested and likely lower.
- The `fact_property` class acts as a semantic catch-all that overlaps with `background_context`, `definition`, and `cause_mechanism`; its F1 is the lowest, and many of its errors are defensible rubric edge cases rather than outright mistakes.
- The model classifies one span at a time; it does not segment long documents, so you must supply pre-segmented input (typically 1-3 sentence chunks).

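Because the model expects pre-segmented input, longer documents need a chunking step first. A minimal sketch of one (a naive regex splitter; the `segment` helper is hypothetical, and a production pipeline should use a proper sentence tokenizer such as nltk, spaCy, or pysbd):

```python
import re

def segment(text: str, max_sents: int = 3) -> list[str]:
    """Split text into spans of at most `max_sents` sentences.

    Naive split on . ! ? followed by whitespace and an uppercase letter;
    it will mis-split abbreviations like "e.g. The", so treat this as a
    placeholder for a real sentence tokenizer.
    """
    sents = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [" ".join(sents[i:i + max_sents]) for i in range(0, len(sents), max_sents)]
```

Each returned chunk can then be passed to the `classify` function from the usage example above.
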

## License

Apache 2.0 (same as the base model `KISTI-AI/Scideberta-full`).

## Citation

If you use this model, please also cite the base SciDeBERTa paper:

```
@article{Jeong2022SciDeBERTa,
  title={SciDeBERTa: Learning DeBERTa for Scientific Domain},
  author={Jeong, Yeon-Ju and Kim, Eunhui},
  journal={IEEE Access},
  year={2022}
}
```