# BIBFRAME-OLMo-1B-v2
Fine-tuned OLMo-1B for BIBFRAME RDF/XML correction. Trained on ~8,500 Library of Congress BIBFRAME records.
## Model Details

### Model Description
This model corrects malformed or incomplete BIBFRAME RDF/XML to produce valid, well-formed output following Library of Congress conventions. It was trained using LoRA (Low-Rank Adaptation) on real BIBFRAME records from id.loc.gov.
- Developed by: Jim Hahn
- Model type: Causal Language Model with LoRA adapter
- Language(s): RDF/XML (BIBFRAME vocabulary)
- License: Apache 2.0
- Finetuned from model: amd/AMD-OLMo-1B (native transformers format for ONNX/WebGPU compatibility)
### Model Sources
- Repository: https://github.com/jimfhahn/bibframe-olmo
- Training Dataset: https://huggingface.co/datasets/jimfhahn/bibframe-corrections
- Previous version: https://huggingface.co/jimfhahn/bibframe-olmo-1b
## Uses

### Direct Use
Correcting malformed BIBFRAME RDF/XML records to valid Library of Congress format.
### Downstream Use
- Integration with BIBFRAME validation pipelines
- Post-processing AI-generated BIBFRAME records
- Cleaning bulk catalog imports
- Part of the mcp4rdf-core validation and correction service
### Out-of-Scope Use
- Generating BIBFRAME from natural language descriptions (not trained for this)
- Non-BIBFRAME RDF vocabularies (Schema.org, Dublin Core, etc.)
- MARC record processing
## Bias, Risks, and Limitations
- Trained exclusively on Library of Congress records; may not generalize to other BIBFRAME implementations
- Cannot fix semantic errors (e.g., wrong subject headings), only structural/syntactic issues
- Large RDF documents may exceed context length (4096 tokens)
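The context-length limit can be screened for cheaply before invoking the model. A minimal sketch using a characters-per-token heuristic (the ratio and the headroom factor are assumptions; for an exact count, run the model's tokenizer on the text instead):

```python
def likely_fits_context(rdf_xml: str, context_limit: int = 4096,
                        chars_per_token: float = 3.0) -> bool:
    """Rough pre-check against the 4096-token context limit.

    The chars-per-token ratio is a heuristic assumption (RDF/XML is
    URI-heavy, so tokens tend to be short). Leaves headroom for the
    instruction prompt and the corrected output.
    """
    estimated_tokens = len(rdf_xml) / chars_per_token
    return estimated_tokens < context_limit * 0.4
```

Documents that fail this check can be split or routed to a human reviewer rather than truncated mid-record.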
### Recommendations
- Validate model output with SHACL shapes before use in production
- Use as part of a pipeline with human review for critical cataloging
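Before running full SHACL validation, a cheap structural pre-check can filter out obviously broken output. A sketch using only the standard library (the helper name is an assumption; this complements, and does not replace, SHACL shape validation):

```python
import xml.etree.ElementTree as ET

BF_NS = "http://id.loc.gov/ontologies/bibframe/"

def quick_check(rdf_xml: str) -> bool:
    """Return True if the output parses as XML and contains at least
    one element in the BIBFRAME namespace; otherwise False."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError:
        return False
    return any(el.tag.startswith("{" + BF_NS + "}") for el in root.iter())
```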
## How to Get Started with the Model
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model + LoRA adapter
model = AutoModelForCausalLM.from_pretrained('amd/AMD-OLMo-1B')
model = PeftModel.from_pretrained(model, 'jimfhahn/bibframe-olmo-1b-v2')
tokenizer = AutoTokenizer.from_pretrained('amd/AMD-OLMo-1B')

# Example: correct malformed BIBFRAME
prompt = """<|system|>
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.
<|user|>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
<bf:Work>
<bf:title>Example Book</bf:title>
</bf:Work>
</rdf:RDF>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
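Since `generate()` echoes the prompt, the decoded string contains the system/user turns followed by the completion. A small helper to keep only the correction (a sketch; if the chat markers are registered as special tokens, `skip_special_tokens=True` may already strip them, in which case slice by input length instead):

```python
def extract_correction(decoded: str, marker: str = "<|assistant|>") -> str:
    """Return only the text after the last assistant marker.

    Falls back to the full string when the marker is absent.
    """
    return decoded.rsplit(marker, 1)[-1].strip()
```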
## Training Details

### Training Data
- Source: Library of Congress (id.loc.gov)
- Records: ~4,100 Works + ~5,000 Instances
- Diversity: 102 facets covering subjects, languages, time periods, formats, and genres
Training pairs were generated by:
- Collecting valid BIBFRAME Works and Instances from id.loc.gov
- Applying synthetic corruptions (missing elements, invalid URIs, syntax errors)
- Training the model to restore the original valid RDF/XML
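The corruption step above can be sketched as a small function that applies one random perturbation to a valid record; the (corrupted, original) pair then becomes a training example. The specific corruption functions here are illustrative assumptions, not the actual pipeline:

```python
import random
import re

def corrupt(rdf_xml: str, rng: random.Random) -> str:
    """Apply one synthetic corruption to a valid BIBFRAME record."""
    corruptions = [
        # Missing element: drop one closing bf: tag.
        lambda s: re.sub(r"</bf:\w+>", "", s, count=1),
        # Invalid URI: break a namespace URI scheme.
        lambda s: s.replace("http://id.loc.gov", "http//id.loc.gov", 1),
        # Syntax error: damage a self-closing tag, or truncate.
        lambda s: s.replace("/>", "/", 1) if "/>" in s else s[:-1],
    ]
    return rng.choice(corruptions)(rdf_xml)
```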
### Training Procedure

#### Training Hyperparameters
- Training regime: bf16 mixed precision
- Optimizer: AdamW
- Learning rate: 2e-4
- Batch size: 4 (with gradient accumulation 4, effective batch 16)
- Epochs: 3
- LoRA rank: 64
- LoRA alpha: 128
- LoRA target modules: att_proj, attn_out, ff_proj, ff_out
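The LoRA hyperparameters above map onto a `peft.LoraConfig` roughly as follows (a sketch; `task_type` and the default dropout are assumptions not stated in this card):

```python
from peft import LoraConfig

# Adapter configuration reconstructed from the hyperparameters above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["att_proj", "attn_out", "ff_proj", "ff_out"],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)
```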
#### Speeds, Sizes, Times
- Training time: ~7.5 hours
- Hardware: NVIDIA A100-SXM4-80GB
- Final loss: 0.118
- Adapter size: 168 MB
## Evaluation

### Metrics
- Training loss: 0.118 (final)
- Additional evaluation with SHACL validation pending
## Environmental Impact
- Hardware Type: NVIDIA A100-SXM4-80GB
- Hours used: 7.5
- Cloud Provider: Illinois Campus Cluster (NCSA)
- Compute Region: Illinois, USA
## Technical Specifications

### Model Architecture and Objective

OLMo-1B base model with LoRA adapters, trained with a causal language modeling objective on the BIBFRAME correction task.
### Compute Infrastructure

#### Hardware
NVIDIA A100-SXM4-80GB (1 GPU)
#### Software
- PyTorch 2.9.1
- Transformers 4.57.5
- PEFT 0.7.0
- ai2-olmo
## Citation

BibTeX:

```bibtex
@misc{bibframe-olmo-2026,
  author    = {Hahn, Jim},
  title     = {BIBFRAME-OLMo-1B-v2: Fine-tuned OLMo for BIBFRAME Correction},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jimfhahn/bibframe-olmo-1b-v2}
}
```
## Model Card Authors
Jim Hahn
## Model Card Contact