BIBFRAME-OLMo-1B-v2

Fine-tuned OLMo-1B for BIBFRAME RDF/XML correction. Trained on ~8,500 Library of Congress BIBFRAME records.

Model Details

Model Description

This model corrects malformed or incomplete BIBFRAME RDF/XML to produce valid, well-formed output following Library of Congress conventions. It was trained using LoRA (Low-Rank Adaptation) on real BIBFRAME records from id.loc.gov.

  • Developed by: Jim Hahn
  • Model type: Causal Language Model with LoRA adapter
  • Language(s): RDF/XML (BIBFRAME vocabulary)
  • License: Apache 2.0
  • Finetuned from model: amd/AMD-OLMo-1B (native transformers format for ONNX/WebGPU compatibility)

Uses

Direct Use

Correcting malformed BIBFRAME RDF/XML records to valid Library of Congress format.

Downstream Use

  • Integration with BIBFRAME validation pipelines
  • Post-processing AI-generated BIBFRAME records
  • Cleaning bulk catalog imports
  • Part of the mcp4rdf-core validation and correction service

Out-of-Scope Use

  • Generating BIBFRAME from natural language descriptions (not trained for this)
  • Non-BIBFRAME RDF vocabularies (Schema.org, Dublin Core, etc.)
  • MARC record processing

Bias, Risks, and Limitations

  • Trained exclusively on Library of Congress records; may not generalize to other BIBFRAME implementations
  • Cannot fix semantic errors (e.g., wrong subject headings), only structural/syntactic issues
  • Large RDF documents may exceed context length (4096 tokens)
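For the context-length limitation, one workaround is to correct each top-level resource separately and reassemble the results. A minimal stdlib sketch (this only helps when the input parses as XML at all; `split_rdf` is an illustrative helper, not part of this model's tooling):

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def split_rdf(rdf_xml: str) -> list[str]:
    """Split an rdf:RDF document into one document per top-level
    resource, so each chunk fits within the 4096-token context."""
    root = ET.fromstring(rdf_xml)
    chunks = []
    for child in root:
        # Wrap each top-level resource in its own rdf:RDF envelope
        wrapper = ET.Element(f"{{{RDF_NS}}}RDF")
        wrapper.append(child)
        chunks.append(ET.tostring(wrapper, encoding="unicode"))
    return chunks
```

Each chunk can then be sent through the correction prompt independently.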

Recommendations

  • Validate model output with SHACL shapes before use in production
  • Use as part of a pipeline with human review for critical cataloging
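For the SHACL step, a library such as pySHACL can check conformance against Library of Congress shapes. As a cheap first gate before full SHACL validation, a stdlib well-formedness check catches outright syntax failures in model output (`is_well_formed` is an illustrative helper, not part of the model's tooling):

```python
import xml.etree.ElementTree as ET

def is_well_formed(rdf_xml: str) -> bool:
    """First-pass gate: reject model output that is not even
    well-formed XML before running full SHACL validation."""
    try:
        ET.fromstring(rdf_xml)
        return True
    except ET.ParseError:
        return False
```

Output failing this check can be retried or routed to human review without spending a SHACL validation pass on it.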

How to Get Started with the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model + LoRA adapter
model = AutoModelForCausalLM.from_pretrained('amd/AMD-OLMo-1B')
model = PeftModel.from_pretrained(model, 'jimfhahn/bibframe-olmo-1b-v2')
tokenizer = AutoTokenizer.from_pretrained('amd/AMD-OLMo-1B')

# Example: correct malformed BIBFRAME
prompt = """<|system|>
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.
<|user|>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
  <bf:Work>
    <bf:title>Example Book</bf:title>
  </bf:Work>
</rdf:RDF>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

jimfhahn/bibframe-corrections

  • Source: Library of Congress (id.loc.gov)
  • Records: ~4,100 Works + ~5,000 Instances
  • Diversity: 102 facets covering subjects, languages, time periods, formats, and genres

Training pairs were generated by:

  1. Collecting valid BIBFRAME Works and Instances from id.loc.gov
  2. Applying synthetic corruptions (missing elements, invalid URIs, syntax errors)
  3. Training the model to restore the original valid RDF/XML
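The corruption step (2) can be sketched as follows. This is illustrative only: the function name and the specific mutations are assumptions; the actual generation script is not published in this card.

```python
import random
import re

def corrupt(rdf_xml: str, rng: random.Random) -> str:
    """Apply one synthetic corruption of the kinds described above:
    a missing element, an invalid URI, or a syntax error."""
    kind = rng.choice(["drop_element", "mangle_uri", "break_tag"])
    if kind == "drop_element":
        # Remove the first simple element pair, e.g. <bf:title>...</bf:title>
        return re.sub(r"<(bf:\w+)>[^<]*</\1>\s*", "", rdf_xml, count=1)
    if kind == "mangle_uri":
        # Introduce an invalid URI scheme
        return rdf_xml.replace("http://id.loc.gov", "htp:/id.loc.gov", 1)
    # break_tag: corrupt a closing tag to create a syntax error
    return rdf_xml.replace("</bf:Work>", "<bf:Work>", 1)
```

Each (corrupted, original) pair then becomes one training example: corrupted input, valid target.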

Training Procedure

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • Batch size: 4 (with gradient accumulation 4, effective batch 16)
  • Epochs: 3
  • LoRA rank: 64
  • LoRA alpha: 128
  • LoRA target modules: att_proj, attn_out, ff_proj, ff_out
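Assembled as a PEFT configuration, the LoRA settings above would look roughly like this (a sketch, not the exact training script; `lora_dropout` is not stated in this card and is an assumption):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                      # LoRA rank, as listed above
    lora_alpha=128,            # LoRA alpha, as listed above
    target_modules=["att_proj", "attn_out", "ff_proj", "ff_out"],
    lora_dropout=0.05,         # assumption: dropout is not stated in this card
    task_type="CAUSAL_LM",
)
```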

Speeds, Sizes, Times

  • Training time: ~7.5 hours
  • Hardware: NVIDIA A100-SXM4-80GB
  • Final loss: 0.118
  • Adapter size: 168 MB

Evaluation

Metrics

  • Training loss: 0.118 (final)
  • Additional evaluation with SHACL validation pending

Environmental Impact

  • Hardware Type: NVIDIA A100-SXM4-80GB
  • Hours used: 7.5
  • Cloud Provider: Illinois Campus Cluster (NCSA)
  • Compute Region: Illinois, USA

Technical Specifications

Model Architecture and Objective

OLMo-1B base model with LoRA adapters, trained with a causal language modeling objective on the BIBFRAME correction task.

Compute Infrastructure

Hardware

NVIDIA A100-SXM4-80GB (1 GPU)

Software

  • PyTorch 2.9.1
  • Transformers 4.57.5
  • PEFT 0.7.0
  • ai2-olmo

Citation

BibTeX:

@misc{bibframe-olmo-2026,
  author = {Hahn, Jim},
  title = {BIBFRAME-OLMo-1B-v2: Fine-tuned OLMo for BIBFRAME Correction},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jimfhahn/bibframe-olmo-1b-v2}
}

Model Card Authors

Jim Hahn

Model Card Contact

https://github.com/jimfhahn
