REBEL Quantum Physics (Mixed Training)
This is a fine-tuned REBEL model for relation extraction and knowledge graph triplet generation, specialized for the quantum physics domain while maintaining general knowledge extraction capabilities.
Model Description
REBEL (Relation Extraction By End-to-end Language generation) is a seq2seq model that performs end-to-end relation extraction for more than 200 different relation types. This model has been fine-tuned on a mixed dataset combining domain-specific quantum physics triplets with general knowledge triplets.
- Base Model: Babelscape/rebel-large
- Fine-tuned on: Mixed dataset (quantum physics + general REBEL data)
- Training Data: ~203k examples (191k train, 6k val, 6k test)
- Language: English
- Task: Relation Extraction / Knowledge Graph Triplet Generation
Training Data
The model was fine-tuned on a carefully curated mixed dataset:
- Domain-specific data: ~48,000 quantum physics triplets
- General data: ~144,000 general knowledge triplets (1:3 ratio)
- Validation: ~6,000 domain-only quantum physics examples
- Test: ~6,000 domain-only quantum physics examples
This mixed training approach allows the model to:
- Excel at quantum physics domain extraction
- Maintain strong general knowledge extraction capabilities
- Avoid catastrophic forgetting of general relations
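The 1:3 domain-to-general mixing described above can be sketched as follows. This is an illustrative reconstruction, not the actual data pipeline; the `mix_datasets` helper and the toy example strings are assumptions.

```python
import random

def mix_datasets(domain, general, ratio=3, seed=42):
    """Combine domain examples with up to `ratio` times as many general
    examples, then shuffle so each batch sees both distributions."""
    rng = random.Random(seed)
    n_general = min(len(general), ratio * len(domain))
    mixed = list(domain) + rng.sample(general, n_general)
    rng.shuffle(mixed)
    return mixed

# Toy illustration of the 1:3 domain:general ratio
domain = [f"quantum_{i}" for i in range(4)]
general = [f"general_{i}" for i in range(20)]
train = mix_datasets(domain, general)
print(len(train))  # 4 domain + 12 general = 16
```

Shuffling after concatenation matters: interleaving the two sources is what helps prevent the catastrophic forgetting mentioned above.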
Usage
Direct Inference
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "konsman/rebel-quantum-mixed"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Quantum entanglement is a physical phenomenon that occurs when pairs of particles interact."

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Generate triplets (pass the attention mask so padding is handled correctly)
outputs = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=256,
    num_beams=5,
    early_stopping=True,
)

# Decode, keeping the special <triplet>/<subj>/<obj> tokens needed for parsing
triplets_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(triplets_text)
```
Parsing Triplets
```python
import re

def extract_triplets(text):
    """Extract (subject, object, relation) tuples from linearized REBEL output.

    The model emits: <triplet> SUBJECT <subj> OBJECT <obj> RELATION
    so the relation label comes last, after the <obj> token.
    """
    triplets = []
    pattern = r'<triplet> (.+?) <subj> (.+?) <obj> (.+?)(?=<triplet>|</s>|$)'
    for match in re.finditer(pattern, text):
        subject = match.group(1).strip()
        obj = match.group(2).strip()
        relation = match.group(3).strip()
        triplets.append((subject, obj, relation))
    return triplets

# Parse the output
triplets = extract_triplets(triplets_text)
for subj, obj, rel in triplets:
    print(f"({subj} ; {obj} ; {rel})")
```
Output Format
The model linearizes each triplet as subject, then object, then relation, delimited by the special tokens `<triplet>`, `<subj>`, and `<obj>`:

```
<triplet> SUBJECT <subj> OBJECT <obj> RELATION <triplet> ...
```

Example output:

```
<triplet> Albert Einstein <subj> German <obj> country of citizenship <triplet> theory of relativity <subj> Albert Einstein <obj> discoverer or inventor
```
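Running the parser above on this example output recovers both triplets. The check below repeats the regex inline so the snippet runs on its own:

```python
import re

output = ("<triplet> Albert Einstein <subj> German <obj> country of citizenship "
          "<triplet> theory of relativity <subj> Albert Einstein <obj> discoverer or inventor")

# Same pattern as extract_triplets: the lookahead stops each match at the
# next <triplet> marker, the </s> end token, or the end of the string.
pattern = r'<triplet> (.+?) <subj> (.+?) <obj> (.+?)(?=<triplet>|</s>|$)'
triplets = [(m.group(1).strip(), m.group(2).strip(), m.group(3).strip())
            for m in re.finditer(pattern, output)]
print(triplets)
# [('Albert Einstein', 'German', 'country of citizenship'),
#  ('theory of relativity', 'Albert Einstein', 'discoverer or inventor')]
```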
Evaluation
The model achieves strong performance on both domain-specific and general relation extraction tasks due to the mixed training approach.
Intended Use
- Knowledge graph construction from scientific texts
- Relation extraction from quantum physics literature
- General purpose triplet extraction
- Domain adaptation for information extraction
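For knowledge graph construction, extracted triplets can be loaded into a simple adjacency structure. The sketch below uses a plain dict rather than a graph library; the example triplets and relation labels are illustrative, not model output.

```python
from collections import defaultdict

def build_graph(triplets):
    """Map each subject entity to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subject, obj, relation in triplets:
        graph[subject].append((relation, obj))
    return graph

# Illustrative triplets in the model's (subject, object, relation) order
triplets = [
    ("quantum entanglement", "phenomenon", "instance of"),
    ("quantum entanglement", "particle pairs", "involves"),
]
graph = build_graph(triplets)
print(dict(graph))
```

The same adjacency dict can later be exported to a graph library or a triple store if richer querying is needed.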
Limitations
- Primarily trained on English text
- May have reduced performance on domains very different from quantum physics and general Wikipedia-style text
- Triplet extraction quality depends on input text quality and clarity
Training Details
- Base Model: Babelscape/rebel-large
- Training Framework: PyTorch Lightning
- Training Hardware: NVIDIA H200 GPU
- Batch Size: 1 (with gradient accumulation)
- Optimizer: AdamW
- Learning Rate: 3e-5
- Epochs: 3
- Mixed Precision: bf16
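A minimal sketch of how these settings might map onto a PyTorch Lightning `Trainer`. The training script is not published, so treat this as a config fragment under stated assumptions; in particular, the `accumulate_grad_batches` value is an assumption chosen to compensate for the per-device batch size of 1.

```python
import lightning as L

trainer = L.Trainer(
    max_epochs=3,                # Epochs: 3
    precision="bf16-mixed",      # Mixed Precision: bf16
    accumulate_grad_batches=8,   # illustrative; actual value not published
    accelerator="gpu",
    devices=1,
)
```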
Citation
If you use this model, please cite:
```bibtex
@misc{rebel_quantum_mixed,
  author    = {Konsman},
  title     = {REBEL Quantum Physics (Mixed Training)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/konsman/rebel-quantum-mixed}
}
```
Also cite the original REBEL paper:
```bibtex
@inproceedings{huguet-cabot-navigli-2021-rebel-relation,
    title = "{REBEL}: Relation Extraction By End-to-end Language generation",
    author = "Huguet Cabot, Pere-Llu{\'i}s and Navigli, Roberto",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.204",
    pages = "2370--2381",
}
```
License
CC-BY-4.0
Contact
For questions or issues, please open an issue on the model repository.