# BIBFRAME-OLMo 1B

A fine-tuned 1B-parameter language model that corrects malformed BIBFRAME RDF/XML, producing valid, well-formed output that follows Library of Congress conventions.
## Model Details
| Property | Value |
|---|---|
| Base Model | amd/AMD-OLMo-1B |
| Parameters | 1.2B |
| Training | LoRA fine-tuning, merged for deployment |
| Training Data | ~8,500 Library of Congress BIBFRAME records |
| Task | BIBFRAME RDF/XML correction |
| License | Apache 2.0 |
## Quick Start

### VS Code Extension (Recommended)

The easiest way to use this model is through the BIBFRAME Vibe VS Code extension:

1. Install the extension from the VS Code marketplace.
2. Configure in VS Code settings:

   ```json
   {
     "bf.huggingFaceModel": "jimfhahn/bibframe-olmo-1b",
     "bf.huggingFaceToken": "hf_your_token_here"
   }
   ```

3. Use `@bf-vibe /correct` in GitHub Copilot Chat to fix BIBFRAME records.
### Inference Endpoints (Production)

Deploy your own endpoint for production use:

1. Click **Deploy → Inference Endpoints** above.
2. Select Text Generation Inference (TGI).
3. Choose an instance: `nvidia-t4` (recommended) or `cpu-xlarge`.
4. Configure in VS Code:

   ```json
   {
     "bf.huggingFaceEndpoint": "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud",
     "bf.huggingFaceToken": "hf_your_token_here"
   }
   ```
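A deployed TGI endpoint can also be called directly from Python. The sketch below only constructs the JSON request body TGI expects; the `build_tgi_request` helper and the endpoint URL are illustrative, not part of any published client — substitute your own endpoint and send the payload with your HTTP library of choice.

```python
import json

# Placeholder -- replace with your own deployed Inference Endpoint URL.
ENDPOINT = "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud"


def build_tgi_request(prompt: str, max_new_tokens: int = 1024,
                      temperature: float = 0.1) -> dict:
    """Return the JSON body expected by Text Generation Inference."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }


payload = build_tgi_request(
    "<|im_start|>system\nFix the BIBFRAME RDF/XML.<|im_end|>\n"
)
print(json.dumps(payload, indent=2))
```

POST this payload to `ENDPOINT` with an `Authorization: Bearer <token>` header, exactly as in the cURL example further down.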
### Python API

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="jimfhahn/bibframe-olmo-1b")

prompt = (
    "<|im_start|>system\n"
    "You are a BIBFRAME expert. Fix the following malformed RDF/XML "
    "to produce valid BIBFRAME.<|im_end|>\n"
    "<|im_start|>user\n"
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">\n'
    "  <bf:Work>\n"
    "    <bf:title>Example Book</bf:title>\n"
    "  </bf:Work>\n"
    "</rdf:RDF><|im_end|>\n"
    "<|im_start|>assistant\n"
)

result = pipe(prompt, max_new_tokens=1024, temperature=0.1)
print(result[0]["generated_text"])
```
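By default, `text-generation` pipelines return the prompt echoed back followed by the completion in `generated_text`. A small helper can isolate just the corrected RDF/XML — a sketch, and `extract_completion` is our own name, not part of transformers:

```python
def extract_completion(generated_text: str, prompt: str) -> str:
    """Strip the echoed prompt and any trailing <|im_end|> token,
    leaving only the model's corrected RDF/XML."""
    completion = generated_text
    if completion.startswith(prompt):
        completion = completion[len(prompt):]
    # The model ends its turn with <|im_end|>; keep only what precedes it.
    return completion.split("<|im_end|>")[0].strip()


# Demonstration with a mocked pipeline result:
prompt = "<|im_start|>assistant\n"
generated = prompt + "<rdf:RDF>...</rdf:RDF><|im_end|>"
print(extract_completion(generated, prompt))
```

Alternatively, pass `return_full_text=False` to the pipeline call so only the completion is returned in the first place.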
### cURL (Inference API)

```bash
curl https://router.huggingface.co/hf-inference/models/jimfhahn/bibframe-olmo-1b \
  -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "<|im_start|>system\nFix the BIBFRAME RDF/XML.<|im_end|>\n<|im_start|>user\n<your-rdf-here><|im_end|>\n<|im_start|>assistant\n",
    "parameters": {"max_new_tokens": 1024, "temperature": 0.1}
  }'
```
## What It Fixes

This model corrects common BIBFRAME errors:

- ❌ Missing required properties (`bf:title`, `bf:adminMetadata`)
- ❌ Wrong namespace prefixes (`bibframe:` → `bf:`)
- ❌ Literal values where resources are expected
- ❌ Missing `rdf:type` declarations
- ❌ Invalid property nesting
- ❌ Malformed URIs
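As an illustration of the first two fixes, here is a hypothetical before/after pair. The exact output depends on the record, but Library of Congress convention models titles as `bf:Title` resources carrying `bf:mainTitle`, rather than bare literals:

```xml
<!-- Before: wrong prefix, title as a bare literal -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bibframe="http://id.loc.gov/ontologies/bibframe/">
  <bibframe:Work>
    <bibframe:title>Example Book</bibframe:title>
  </bibframe:Work>
</rdf:RDF>

<!-- After: conventional bf: prefix, title modeled as a resource -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">
  <bf:Work>
    <bf:title>
      <bf:Title>
        <bf:mainTitle>Example Book</bf:mainTitle>
      </bf:Title>
    </bf:title>
  </bf:Work>
</rdf:RDF>
```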
## Prompt Format

The model was trained on the ChatML format. Use these exact tokens:

```
<|im_start|>system
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.<|im_end|>
<|im_start|>user
[Your invalid RDF/XML here]<|im_end|>
<|im_start|>assistant
```

**Note:** The `<|im_start|>`/`<|im_end|>` tokens are required. Using other formats (e.g., `<|system|>`) will produce poor results.
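The template above can be assembled programmatically. A minimal sketch — the `build_chatml_prompt` helper is our own, not shipped with the model:

```python
def build_chatml_prompt(rdf_xml: str) -> str:
    """Wrap invalid RDF/XML in the exact ChatML template the model
    was trained on, ending with an open assistant turn."""
    system = ("You are a BIBFRAME expert. Fix the following malformed "
              "RDF/XML to produce valid BIBFRAME.")
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{rdf_xml}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("<rdf:RDF>...</rdf:RDF>")
print(prompt)
```

Because generation continues from the end of the prompt, the string must end with `<|im_start|>assistant\n` and no trailing `<|im_end|>`.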
## Training Data

Trained on `jimfhahn/bibframe-corrections`:

- Source: Library of Congress (id.loc.gov)
- Records: ~4,100 Works + ~5,000 Instances
- Diversity: 102 facets (subjects, languages, time periods, formats, genres)
- Method: Synthetic corruptions → the model learns to restore valid RDF/XML
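To give a flavor of the synthetic-corruption method, here is one hypothetical corruption function in the spirit of the "wrong namespace prefix" error class. This is a sketch only; the dataset's actual corruption pipeline may differ:

```python
def corrupt_prefix(rdf_xml: str) -> str:
    """Illustrative corruption: swap the conventional bf: prefix for
    bibframe:, producing the kind of malformed input the model is
    then trained to restore to the original valid record."""
    return (rdf_xml
            .replace("xmlns:bf=", "xmlns:bibframe=")
            .replace("<bf:", "<bibframe:")
            .replace("</bf:", "</bibframe:"))


valid = '<bf:Work xmlns:bf="http://id.loc.gov/ontologies/bibframe/"/>'
print(corrupt_prefix(valid))
```

Each (corrupted, original) pair then serves as an (input, target) training example.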
## Limitations
- Trained exclusively on Library of Congress BIBFRAME; may not generalize to other implementations
- Cannot fix semantic errors (wrong subject headings), only structural/syntactic issues
- Large RDF documents may exceed context length (4096 tokens)
- Recommendation: Validate output with SHACL shapes before production use
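Before running SHACL shapes, a cheap stdlib pre-check can reject output that is not even well-formed XML. A sketch — this complements, and does not replace, SHACL validation of BIBFRAME shape constraints:

```python
import xml.etree.ElementTree as ET


def is_well_formed(rdf_xml: str) -> bool:
    """First-pass sanity check: does the model output parse as XML?
    SHACL validation is still needed for BIBFRAME-level correctness."""
    try:
        ET.fromstring(rdf_xml)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b/></a>"))  # True
print(is_well_formed("<a><b></a>"))   # False: unclosed <b>
```

Outputs that fail this check can be retried or flagged before any shape validation is attempted.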
## Ecosystem
| Project | Description |
|---|---|
| BIBFRAME Vibe | VS Code extension for BIBFRAME cataloging |
| mcp4rdf-core | SHACL validation service |
| bibframe-corrections | Training dataset |
| bibframe-olmo-1b-v2 | Original LoRA adapter |
## Citation

```bibtex
@misc{bibframe-olmo-2026,
  author    = {Hahn, Jim},
  title     = {BIBFRAME-OLMo-1B: Fine-tuned OLMo for BIBFRAME Correction},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/jimfhahn/bibframe-olmo-1b}
}
```
## License
Apache 2.0