BIBFRAME-OLMo 1B

A fine-tuned 1B parameter language model for correcting malformed BIBFRAME RDF/XML to produce valid, well-formed output following Library of Congress conventions.

Model Details

Property        Value
Base Model      amd/AMD-OLMo-1B
Parameters      1.2B
Training        LoRA fine-tuning, merged for deployment
Training Data   ~8,500 Library of Congress BIBFRAME records
Task            BIBFRAME RDF/XML correction
License         Apache 2.0

Quick Start

VS Code Extension (Recommended)

The easiest way to use this model is through the BIBFRAME Vibe VS Code extension:

  1. Install the extension from the VS Code marketplace
  2. Configure in VS Code settings:
    {
      "bf.huggingFaceModel": "jimfhahn/bibframe-olmo-1b",
      "bf.huggingFaceToken": "hf_your_token_here"
    }
    
  3. Use @bf-vibe /correct in GitHub Copilot Chat to fix BIBFRAME records

Inference Endpoints (Production)

Deploy your own endpoint for production use:

  1. Click Deploy → Inference Endpoints above
  2. Select Text Generation Inference (TGI)
  3. Choose instance: nvidia-t4 (recommended) or cpu-xlarge
  4. Configure in VS Code:
    {
      "bf.huggingFaceEndpoint": "https://your-endpoint.us-east-1.aws.endpoints.huggingface.cloud",
      "bf.huggingFaceToken": "hf_your_token_here"
    }
    

Python API

from transformers import pipeline

pipe = pipeline("text-generation", model="jimfhahn/bibframe-olmo-1b")

prompt = (
    "<|im_start|>system\n"
    "You are a BIBFRAME expert. Fix the following malformed RDF/XML "
    "to produce valid BIBFRAME.<|im_end|>\n"
    "<|im_start|>user\n"
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    '         xmlns:bf="http://id.loc.gov/ontologies/bibframe/">\n'
    "  <bf:Work>\n"
    "    <bf:title>Example Book</bf:title>\n"
    "  </bf:Work>\n"
    "</rdf:RDF><|im_end|>\n"
    "<|im_start|>assistant\n"
)

# do_sample=True is needed for temperature to take effect in transformers
result = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.1)
# generated_text includes the prompt; the corrected RDF/XML follows
# the final <|im_start|>assistant marker
print(result[0]["generated_text"])

cURL (Inference API)

curl https://router.huggingface.co/hf-inference/models/jimfhahn/bibframe-olmo-1b \
  -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "<|im_start|>system\nFix the BIBFRAME RDF/XML.<|im_end|>\n<|im_start|>user\n<your-rdf-here><|im_end|>\n<|im_start|>assistant\n",
    "parameters": {"max_new_tokens": 1024, "temperature": 0.1}
  }'

What It Fixes

This model corrects common BIBFRAME errors:

  • ❌ Missing required properties (bf:title, bf:adminMetadata)
  • ❌ Wrong namespace prefixes (e.g., bibframe: instead of bf:)
  • ❌ Literal values where resources expected
  • ❌ Missing rdf:type declarations
  • ❌ Invalid property nesting
  • ❌ Malformed URIs
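
As a minimal illustration of the literal-vs-resource fix above (the RDF/XML snippets here are hand-written examples, not model output):

```python
# Illustrative sketch of one correction class: a bf:title literal is
# rewritten as a nested bf:Title resource, following LC conventions.
import xml.etree.ElementTree as ET

malformed = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:bf="http://id.loc.gov/ontologies/bibframe/">'
    '<bf:Work><bf:title>Example Book</bf:title></bf:Work></rdf:RDF>'
)

corrected = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"'
    ' xmlns:bf="http://id.loc.gov/ontologies/bibframe/">'
    '<bf:Work><bf:title><bf:Title>'
    '<rdfs:label>Example Book</rdfs:label>'
    '</bf:Title></bf:title></bf:Work></rdf:RDF>'
)

# Both strings parse as XML; the difference the model learns is
# structural: the title value must live inside a bf:Title resource.
for doc in (malformed, corrected):
    ET.fromstring(doc)  # raises ParseError if not well-formed
```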

Prompt Format

The model was trained on the ChatML format. Use these exact tokens:

<|im_start|>system
You are a BIBFRAME expert. Fix the following malformed RDF/XML to produce valid BIBFRAME.<|im_end|>
<|im_start|>user
[Your invalid RDF/XML here]<|im_end|>
<|im_start|>assistant

Note: The <|im_start|> / <|im_end|> tokens are required. Using other formats (e.g., <|system|>) will produce poor results.
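
A small helper (illustrative, not part of any shipped API) that assembles this template:

```python
# System message copied from the prompt format above.
SYSTEM_MSG = (
    "You are a BIBFRAME expert. Fix the following malformed RDF/XML "
    "to produce valid BIBFRAME."
)

def build_prompt(rdf_xml: str) -> str:
    """Wrap RDF/XML in the ChatML template the model was trained on."""
    return (
        f"<|im_start|>system\n{SYSTEM_MSG}<|im_end|>\n"
        f"<|im_start|>user\n{rdf_xml}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("<rdf:RDF>...</rdf:RDF>")
```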

Training Data

Trained on jimfhahn/bibframe-corrections:

  • Source: Library of Congress (id.loc.gov)
  • Records: ~4,100 Works + ~5,000 Instances
  • Diversity: 102 facets (subjects, languages, time periods, formats, genres)
  • Method: Synthetic corruptions → model learns to restore valid RDF/XML
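
The corruption step can be sketched roughly as follows (a hypothetical illustration; the actual corruption functions used to build the dataset are not published here):

```python
import random

def corrupt(rdf_xml: str, rng: random.Random) -> str:
    """Introduce one structural error so the model can learn to undo it."""
    corruptions = [
        lambda s: s.replace("bf:", "bibframe:"),  # wrong namespace prefix
        lambda s: s.replace("</", "<", 1),        # break a closing tag
        lambda s: s.replace(">", ">>", 1),        # malform the markup
    ]
    return rng.choice(corruptions)(rdf_xml)

valid = "<bf:Work><bf:title>Example</bf:title></bf:Work>"
corrupted = corrupt(valid, random.Random(0))
# Training pairs are (corrupted, valid): the model sees the corrupted
# form as input and the original valid record as the target.
```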

Limitations

  • Trained exclusively on Library of Congress BIBFRAME; may not generalize to other implementations
  • Cannot fix semantic errors (wrong subject headings), only structural/syntactic issues
  • Large RDF documents may exceed context length (4096 tokens)
  • Recommendation: Validate output with SHACL shapes before production use
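
Before a full SHACL pass (e.g. with a tool such as pySHACL, assumed tooling rather than a project dependency), a cheap stdlib well-formedness gate catches truncated or broken output early:

```python
import xml.etree.ElementTree as ET

def is_well_formed(rdf_xml: str) -> bool:
    """Quick gate: reject output that is not even well-formed XML."""
    try:
        ET.fromstring(rdf_xml)
        return True
    except ET.ParseError:
        return False

# Run this before handing the output to a SHACL validator; output that
# exceeded the context window typically fails here with unclosed tags.
ok = is_well_formed('<a xmlns:bf="http://id.loc.gov/ontologies/bibframe/"/>')
```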

Ecosystem

Project                Description
BIBFRAME Vibe          VS Code extension for BIBFRAME cataloging
mcp4rdf-core           SHACL validation service
bibframe-corrections   Training dataset
bibframe-olmo-1b-v2    Original LoRA adapter

Citation

@misc{bibframe-olmo-2026,
  author = {Hahn, Jim},
  title = {BIBFRAME-OLMo-1B: Fine-tuned OLMo for BIBFRAME Correction},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jimfhahn/bibframe-olmo-1b}
}

License

Apache 2.0
