Llama-3.1-8B-Poster-Extraction

Model Description

This repository hosts Meta's Llama 3.1 8B Instruct model configured for Machine Actionable Poster Extraction. The model extracts structured, machine-readable information from scientific conference posters, enabling automated processing and analysis of poster content.

Intended Use Cases

Machine Actionable Poster Extraction

  • Structured Data Extraction: Convert unstructured poster content into structured JSON/XML formats (a minimal Python sketch of such a target structure follows this list)
  • Section Identification: Identify and segment poster sections (Title, Authors, Abstract, Methods, Results, Conclusions)
  • Entity Recognition: Extract key scientific entities including:
    • Author names and affiliations
    • Research methodologies
    • Statistical findings
    • Citations and references
  • Semantic Understanding: Interpret relationships between extracted elements
  • Metadata Generation: Create machine-readable metadata for poster cataloging
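
As an illustration of what "machine-readable" output can look like downstream, here is a minimal Python sketch (not part of this repository) that models the extraction targets as typed structures. The field names mirror the extraction schema example shown later in this card and are illustrative assumptions, not a fixed contract.

from typing import Dict, List, TypedDict

class Author(TypedDict):
    name: str
    affiliation: str

class PosterExtraction(TypedDict):
    title: str
    authors: List[Author]
    abstract: str
    sections: Dict[str, str]        # e.g. background, methods, results, conclusions
    entities: Dict[str, List[str]]  # e.g. methods, metrics, findings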

Scientific Document Processing

  • Conference poster digitization
  • Research content aggregation
  • Automated poster summarization
  • Cross-poster comparative analysis

Model Specifications

Attribute         Value
Base Model        meta-llama/Llama-3.1-8B-Instruct
Parameters        8 Billion
Context Length    128K tokens
Architecture      LLaMA 3.1
Precision         bfloat16
License           Llama 3.1 Community License

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jimnoneill/Llama-3.1-8B-Poster-Extraction"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Example: Extract structured data from poster text
prompt = """Extract structured information from the following scientific poster content.
Return the extracted information in JSON format with fields: title, authors, affiliations, 
abstract, methods, results, conclusions.

Poster Content:
[Your poster text here]
"""

messages = [{"role": "user", "content": prompt}]
# Add the generation prompt so the model replies as the assistant
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
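
The fine-tune's exact output format is not specified in this card, so the snippet below is a hedged sketch that assumes the model follows the prompt and emits a single JSON object. It locates the JSON in the response, parses it, and falls back gracefully if parsing fails.

import json

# Parse the model's response as JSON (assumes one JSON object in the output)
try:
    start = response.index("{")
    end = response.rindex("}") + 1
    extracted = json.loads(response[start:end])
    print(extracted.get("title"))
except (ValueError, json.JSONDecodeError) as e:
    print(f"Could not parse structured output: {e}")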

Extraction Schema Example

{
  "title": "Extracted poster title",
  "authors": [
    {"name": "Author Name", "affiliation": "Institution"}
  ],
  "abstract": "Extracted abstract text",
  "sections": {
    "background": "Background content",
    "methods": "Methodology description",
    "results": "Key findings",
    "conclusions": "Summary conclusions"
  },
  "entities": {
    "methods": ["method1", "method2"],
    "metrics": ["metric1", "metric2"],
    "findings": ["finding1", "finding2"]
  }
}
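
To check that a parsed response actually matches this shape, one option is the third-party jsonschema package (an assumption, not a stated dependency of this repository). The sketch below assumes `extracted` is the parsed output from the Usage section above.

from jsonschema import ValidationError, validate

# Minimal JSON Schema covering the top-level fields shown above
POSTER_SCHEMA = {
    "type": "object",
    "required": ["title", "authors", "abstract", "sections", "entities"],
    "properties": {
        "title": {"type": "string"},
        "authors": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "affiliation"],
                "properties": {
                    "name": {"type": "string"},
                    "affiliation": {"type": "string"},
                },
            },
        },
        "abstract": {"type": "string"},
        "sections": {"type": "object"},
        "entities": {"type": "object"},
    },
}

try:
    validate(instance=extracted, schema=POSTER_SCHEMA)
    print("Extraction matches the expected schema")
except ValidationError as e:
    print(f"Schema violation: {e.message}")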

Performance Considerations

  • Optimized for GPU inference (recommended: NVIDIA RTX 4090 or equivalent)
  • Supports quantization for memory-constrained environments (see the sketch after this list)
  • Compatible with vLLM and other inference optimization frameworks
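
As one example of memory-constrained loading, the sketch below uses 4-bit quantization via bitsandbytes through transformers' BitsAndBytesConfig. This is a sketch, not a recommendation specific to this model: bitsandbytes is an extra dependency, a CUDA GPU is assumed, and the quantization settings are illustrative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "jimnoneill/Llama-3.1-8B-Poster-Extraction"

# 4-bit NF4 quantization with bfloat16 compute; roughly quarters the memory footprint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)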

Citation

If you use this model for poster extraction research, please cite:

@misc{oneill2025poster,
  title={Machine Actionable Poster Extraction with Llama 3.1},
  author={O'Neill, James},
  year={2025},
  publisher={HuggingFace}
}

License

This model is released under the Llama 3.1 Community License.

Acknowledgments

  • Meta AI for the Llama 3.1 base model
  • HuggingFace for the model hosting infrastructure