BIBFRAME-OLMo 1B
A fine-tuned 1B-parameter language model that corrects malformed BIBFRAME RDF/XML into valid, well-formed output following Library of Congress conventions.
Model Details
| Property | Value |
|---|---|
| Base Model | amd/AMD-OLMo-1B |
| Parameters | 1.2B |
| Training | LoRA fine-tuning, merged for deployment |
| Training Data | ~8,500 Library of Congress BIBFRAME records |
| Task | BIBFRAME RDF/XML correction |
| License | Apache 2.0 |
Quick Start
VS Code Extension (Recommended)
The easiest way to use this model is through the BIBFRAME Vibe VS Code extension:
- Install the extension from the VS Code marketplace
- Configure in VS Code settings:
{ "bf.huggingFaceModel": "jimfhahn/bibframe-olmo-1b", "bf.huggingFaceToken": "hf_your_token_here" }
- Use `@bf-vibe /correct` in GitHub Copilot Chat to fix BIBFRAME records
Inference Endpoints (Production)
Deploy your own endpoint for production use:
- Click Deploy → Inference Endpoints above
- Select Text Generation Inference (TGI)
- Choose instance: `nvidia-t4` (recommended) or `cpu-xlarge`
- Configure in VS Code:
{ "bf.huggingFaceEndpoint": "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud", "bf.huggingFaceToken": "hf_your_token_here" }
Python API
from transformers import pipeline
pipe = pipeline("text-generation", model="jimfhahn/bibframe-olmo-1b")
prompt = """<|system|>
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.
<|user|>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
<bf:Work>
<bf:title>Example Book</bf:title>
</bf:Work>
</rdf:RDF>
<|assistant|>
"""
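# Enable sampling explicitly so the temperature is applied; return only the newly generated text, not the echoed prompt
result = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.1, return_full_text=False)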
print(result[0]["generated_text"])
cURL (Inference API)
curl https://router.huggingface.co/hf-inference/models/jimfhahn/bibframe-olmo-1b \
-X POST \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"inputs": "<|system|>\nFix the BIBFRAME RDF/XML.\n<|user|>\n<your-rdf-here>\n<|assistant|>\n",
"parameters": {"max_new_tokens": 1024, "temperature": 0.1}
}'
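Escaping RDF/XML by hand inside a shell-quoted JSON string is error-prone, because every double quote and newline in the record has to be escaped. The following is a minimal sketch that builds the same `inputs`/`parameters` body as the cURL call above; `build_payload` is a hypothetical helper introduced here for illustration, not part of any library.

```python
import json


def build_payload(rdf_xml: str, max_new_tokens: int = 1024, temperature: float = 0.1) -> str:
    """Return a JSON request body mirroring the cURL example (hypothetical helper)."""
    prompt = (
        "<|system|>\n"
        "Fix the BIBFRAME RDF/XML.\n"
        "<|user|>\n"
        f"{rdf_xml}\n"
        "<|assistant|>\n"
    )
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    # json.dumps escapes the double quotes and newlines inside the record
    return json.dumps(body)


record = '<bf:Work xmlns:bf="http://id.loc.gov/ontologies/bibframe/"/>'
payload = build_payload(record)
print(payload)
```

The printed string can be saved to a file such as `payload.json` and sent with `curl -d @payload.json` in place of the inline `-d '{...}'` argument.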
What It Fixes
This model corrects common BIBFRAME errors:
- Missing required properties (`bf:title`, `bf:adminMetadata`)
- Wrong namespace prefixes (`bibframe:` → `bf:`)
- Literal values where resources expected
- Missing `rdf:type` declarations
- Invalid property nesting
- Malformed URIs
Prompt Format
The model expects this chat format:
<|system|>
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.
<|user|>
[Your invalid RDF/XML here]
<|assistant|>
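The format above can be assembled and parsed programmatically. This is a small sketch, assuming each marker sits on its own line followed by a newline, as in the Python API example; `build_prompt` and `extract_completion` are hypothetical helper names.

```python
SYSTEM_MESSAGE = (
    "You are a BIBFRAME expert. "
    "Fix the following malformed RDF/XML to produce valid BIBFRAME."
)


def build_prompt(rdf_xml: str, system: str = SYSTEM_MESSAGE) -> str:
    """Wrap a record in the system/user/assistant markers the model expects."""
    return f"<|system|>\n{system}\n<|user|>\n{rdf_xml.strip()}\n<|assistant|>\n"


def extract_completion(generated_text: str) -> str:
    """Keep only the text after the last assistant marker.

    If the marker is absent (the prompt was not echoed back),
    the whole text is returned, stripped of surrounding whitespace.
    """
    _, _, completion = generated_text.rpartition("<|assistant|>")
    return completion.strip()


prompt = build_prompt("<rdf:RDF/>")
print(prompt)
```

`extract_completion` is useful when the generation call returns the prompt together with the model's answer.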
Training Data
Trained on jimfhahn/bibframe-corrections:
- Source: Library of Congress (id.loc.gov)
- Records: ~4,100 Works + ~5,000 Instances
- Diversity: 102 facets (subjects, languages, time periods, formats, genres)
- Method: Synthetic corruptions → model learns to restore valid RDF/XML
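To illustrate the idea of synthetic corruption, here is a hedged sketch of how a valid record might be damaged to form a training pair. It is not the actual pipeline behind the dataset: the two corruption functions and the `input`/`target`/`corruption` field names are assumptions chosen to mirror error types listed under What It Fixes.

```python
import random
import re


def wrong_prefix(rdf_xml: str) -> str:
    """Rewrite bf: element tags to the undeclared bibframe: prefix."""
    return re.sub(r"(</?)bf:", r"\1bibframe:", rdf_xml)


def drop_title(rdf_xml: str) -> str:
    """Delete the first bf:title element, simulating a missing required property."""
    return re.sub(r"\s*<bf:title>.*?</bf:title>", "", rdf_xml, count=1, flags=re.DOTALL)


CORRUPTIONS = [wrong_prefix, drop_title]


def make_pair(valid_xml: str, rng: random.Random) -> dict:
    """Pick one corruption at random and pair the damaged record with the original."""
    corrupt = rng.choice(CORRUPTIONS)
    return {"input": corrupt(valid_xml), "target": valid_xml, "corruption": corrupt.__name__}


valid = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">\n'
    "  <bf:Work>\n"
    "    <bf:title>Example Book</bf:title>\n"
    "  </bf:Work>\n"
    "</rdf:RDF>"
)
pair = make_pair(valid, random.Random(0))
print(pair["corruption"])
print(pair["input"])
```

Passing a seeded `random.Random` keeps the generated pairs reproducible across runs.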
Limitations
- Trained exclusively on Library of Congress BIBFRAME; may not generalize to other implementations
- Cannot fix semantic errors (wrong subject headings), only structural/syntactic issues
- Large RDF documents may exceed context length (4096 tokens)
- Recommendation: Validate output with SHACL shapes before production use
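Before running SHACL validation, a cheap structural pre-check can reject output that is not even parseable. The sketch below uses only the Python standard library and is not a substitute for SHACL: it checks well-formedness, the `rdf:RDF` root, and the presence of at least one element in the BIBFRAME namespace. The function name `precheck` and the choice of checks are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BF_NS = "http://id.loc.gov/ontologies/bibframe/"


def precheck(rdf_xml: str) -> list[str]:
    """Return a list of problems; an empty list means the cheap checks passed."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError as exc:
        # Covers unclosed tags and undeclared prefixes such as bibframe:
        return [f"not well-formed XML: {exc}"]
    problems = []
    # ElementTree expands prefixes, so tags look like "{namespace}LocalName"
    if root.tag != f"{{{RDF_NS}}}RDF":
        problems.append(f"unexpected root element: {root.tag}")
    if not any(el.tag.startswith(f"{{{BF_NS}}}") for el in root.iter()):
        problems.append("no elements in the BIBFRAME namespace")
    return problems


good = f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:bf="{BF_NS}"><bf:Work/></rdf:RDF>'
print(precheck(good))
```

Output that passes this pre-check should still go through SHACL validation, which tests the BIBFRAME constraints this sketch does not cover.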
Ecosystem
| Project | Description |
|---|---|
| BIBFRAME Vibe | VS Code extension for BIBFRAME cataloging |
| mcp4rdf-core | SHACL validation service |
| bibframe-corrections | Training dataset |
| bibframe-olmo-1b-v2 | Original LoRA adapter |
Citation
@misc{bibframe-olmo-2026,
author = {Hahn, Jim},
title = {BIBFRAME-OLMo-1B: Fine-tuned OLMo for BIBFRAME Correction},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/jimfhahn/bibframe-olmo-1b}
}
License
Apache 2.0