Llama-3.1-8B-Poster-Extraction

Model Description

This model powers the extraction pipeline for posters.science, a platform for making scientific conference posters Findable, Accessible, Interoperable, and Reusable (FAIR).

The model converts raw poster text into structured JSON metadata conforming to the poster-json-schema—a DataCite-based schema extended for poster-specific metadata including conference information, content sections, and figure/table captions.

Developed by the FAIR Data Innovations Hub at the California Medical Innovations Institute (CalMI²).

poster2json Library

This model is the core of the poster2json Python library:

Quick Install

pip install poster2json

Python Usage

from poster2json import extract_poster

result = extract_poster("path/to/poster.pdf")
print(result["titles"][0]["title"])
print(result["creators"])
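Based on the usage above, `extract_poster` appears to return a plain Python dict matching the schema below (an assumption, not confirmed by the library docs), so persisting it needs only the standard `json` module. The field values here are hypothetical placeholders:

```python
import json

# A minimal dict of the shape extract_poster() returns
# (hypothetical values; a real call parses an actual PDF).
result = {
    "titles": [{"title": "Example Poster Title"}],
    "creators": [{"name": "Doe, Jane", "nameType": "Personal"}],
    "publicationYear": 2025,
}

# Persist the extracted metadata alongside the source PDF
with open("poster_metadata.json", "w", encoding="utf-8") as fh:
    json.dump(result, fh, indent=2, ensure_ascii=False)

# Round-trip check: the file parses back to the same dict
with open("poster_metadata.json", encoding="utf-8") as fh:
    assert json.load(fh) == result
```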

Output Schema

Output conforms to the poster-json-schema, based on the DataCite Metadata Schema with poster-specific extensions:

{
  "$schema": "https://posters.science/schema/v0.1/poster_schema.json",
  "creators": [
    {
      "name": "Garcia, Sofia",
      "givenName": "Sofia",
      "familyName": "Garcia",
      "nameType": "Personal",
      "affiliation": ["University of California, San Diego"]
    }
  ],
  "titles": [
    { "title": "Machine Learning Approaches to Diabetic Retinopathy Detection" }
  ],
  "publicationYear": 2025,
  "subjects": [
    { "subject": "Machine Learning" },
    { "subject": "Diabetic Retinopathy" }
  ],
  "descriptions": [
    {
      "description": "This poster presents machine learning methods for automated diabetic retinopathy screening...",
      "descriptionType": "Abstract"
    }
  ],
  "conference": {
    "conferenceName": "AMIA 2025 Annual Symposium",
    "conferenceLocation": "San Francisco, CA"
  },
  "content": {
    "sections": [
      { "sectionTitle": "Introduction", "sectionContent": "..." },
      { "sectionTitle": "Methods", "sectionContent": "..." },
      { "sectionTitle": "Results", "sectionContent": "..." },
      { "sectionTitle": "Conclusions", "sectionContent": "..." }
    ]
  },
  "imageCaptions": [
    { "caption": "Figure 1. ROC curves showing model performance across datasets" }
  ],
  "tableCaptions": [
    { "caption": "Table 1. Summary of demographic characteristics" }
  ],
  "rightsList": [
    { "rights": "Creative Commons Attribution 4.0 International" }
  ],
  "formats": ["PDF"]
}
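Full validation would run the published JSON Schema through a library such as `jsonschema`; for a quick stdlib-only sanity check of the core fields, a sketch like the following (hypothetical helper, not part of poster2json) can suffice:

```python
def check_core_fields(poster: dict) -> list[str]:
    """Return a list of missing or invalid core fields (empty list = OK)."""
    problems = []
    # List-valued fields that should be present and non-empty
    for field in ("creators", "titles"):
        if not poster.get(field):
            problems.append(f"missing or empty: {field}")
    # publicationYear should be a plausible integer year
    year = poster.get("publicationYear")
    if not isinstance(year, int) or not 1900 <= year <= 2100:
        problems.append("publicationYear is not a valid year")
    return problems

sample = {
    "creators": [{"name": "Garcia, Sofia"}],
    "titles": [{"title": "Machine Learning Approaches to Diabetic Retinopathy Detection"}],
    "publicationYear": 2025,
}
assert check_core_fields(sample) == []
```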

Key Schema Fields (DataCite-based)

| Field | Description |
|---|---|
| creators | Authors with name, affiliation, ORCID identifiers |
| titles | Main title and alternative/translated titles |
| subjects | Keywords and classification codes (MeSH, LCSH) |
| descriptions | Abstract, methods, technical information |
| conference | Conference name, location, dates, URI |
| content.sections | Extracted poster sections with titles and content |
| imageCaptions | Figure captions extracted from the poster |
| tableCaptions | Table captions extracted from the poster |
| fundingReferences | Grant information (funder, award number) |
| rightsList | License information (CC-BY, etc.) |
| relatedIdentifiers | DOIs, URLs to related resources |

Model Specifications

| Attribute | Value |
|---|---|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Parameters | 8 billion |
| Context Length | 128K tokens |
| Architecture | LLaMA 3.1 |
| Precision | bfloat16 |
| License | Llama 3.1 Community License |

Performance

Validated on 10 manually annotated scientific posters:

| Metric | Score | Threshold |
|---|---|---|
| Word Capture | 0.96 | ≥0.75 |
| ROUGE-L | 0.89 | ≥0.75 |
| Number Capture | 0.93 | ≥0.75 |
| Field Proportion | 0.99 | 0.50–2.00 |

Pass Rate: 10/10 (100%)
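The word-capture metric is not defined in detail here. One plausible reading (an assumption for illustration, not poster2json's actual implementation) is the fraction of unique words from the source poster text that survive into the serialized JSON output:

```python
import json
import re

def word_capture(source_text: str, extracted: dict) -> float:
    """Fraction of unique words from the poster text found in the JSON output.

    Assumed definition for illustration; not the project's exact metric.
    """
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    source_words = tokenize(source_text)
    if not source_words:
        return 1.0
    json_words = tokenize(json.dumps(extracted))
    return len(source_words & json_words) / len(source_words)

score = word_capture(
    "Deep learning detects diabetic retinopathy",
    {"titles": [{"title": "Deep learning detects diabetic retinopathy"}]},
)
assert score == 1.0
```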

Direct Usage (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jimnoneill/Llama-3.1-8B-Poster-Extraction"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

prompt = """Extract structured metadata from the following scientific poster.
Return valid JSON conforming to the poster-json-schema with fields:
creators, titles, publicationYear, subjects, descriptions, conference, content, imageCaptions, tableCaptions.

Poster Content:
[Your poster text here]
"""

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
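The model's reply may wrap the JSON in explanatory text or a code fence. A defensive way to recover the first JSON object from the decoded response (a generic stdlib sketch, not a poster2json API) is to locate the opening brace and use `json.JSONDecoder.raw_decode`, which tolerates trailing text:

```python
import json

def extract_json_object(text: str) -> dict:
    """Parse the first top-level JSON object found in model output."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in response")
    decoder = json.JSONDecoder()
    # raw_decode returns (object, end_index) and ignores trailing text
    obj, _ = decoder.raw_decode(text[start:])
    return obj

reply = 'Here is the metadata:\n{"titles": [{"title": "Example"}]}\nDone.'
metadata = extract_json_object(reply)
assert metadata["titles"][0]["title"] == "Example"
```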

System Requirements

  • GPU: NVIDIA CUDA-capable, ≥16GB VRAM (RTX 4090 recommended)
  • RAM: ≥32GB
  • Supports 8-bit quantization for memory-constrained environments
  • Compatible with vLLM and other inference optimization frameworks
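For the memory-constrained case above, 8-bit loading can be enabled through `bitsandbytes` via the standard Transformers quantization config. This is an untested configuration sketch (requires the `bitsandbytes` package and a CUDA GPU):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the model with 8-bit weights to roughly halve VRAM usage
model = AutoModelForCausalLM.from_pretrained(
    "jimnoneill/Llama-3.1-8B-Poster-Extraction",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```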

Citation

@software{poster2json2026,
  title = {poster2json: Scientific Poster to JSON Metadata Extraction},
  author = {O'Neill, James and Soundarajan, Sanjay and Portillo, Dorian and Patel, Bhavesh},
  year = {2026},
  url = {https://github.com/fairdataihub/poster2json},
  doi = {10.5281/zenodo.18320010}
}

License

This model is released under the Llama 3.1 Community License.

Acknowledgments

  • FAIR Data Innovations Hub at California Medical Innovations Institute (CalMI²)
  • posters.science platform
  • Meta AI for the Llama 3.1 base model
  • HuggingFace for model hosting infrastructure
  • Funded by The Navigation Fund (10.71707/rk36-9x79) — "Poster Sharing and Discovery Made Easy"