---
language:
- en
license: mit
library_name: transformers
tags:
- question-answering
- extractive-qa
- snippet-extraction
- text-extraction
pipeline_tag: question-answering
widget:
- text: What is the most compelling or interesting snippet from this text?
  context: >-
    The Crash at Crush was a one-day publicity stunt in the U.S. state of
    Texas that took place on September 15, 1896, in which two uncrewed
    locomotives were crashed into each other head-on at high speed. William
    George Crush conceived the idea to demonstrate a staged train wreck as a
    public spectacle. An estimated 40,000 people attended the event.
    Unexpectedly, the impact caused both engine boilers to explode, resulting
    in a shower of flying debris that killed two people and caused numerous
    injuries among the spectators.
  example_title: Train Crash Example
- text: What is the most compelling or interesting snippet from this text?
  context: >-
    TempleOS is a biblical-themed lightweight operating system designed to be
    the Third Temple from the Hebrew Bible. It was created by American
    computer programmer Terry A. Davis, who developed it alone over the course
    of a decade after a series of manic episodes that he later described as a
    revelation from God. The system was characterized as a modern x86-64
    Commodore 64, using an interface similar to a mixture of DOS and Turbo C.
  example_title: TempleOS Example
- text: What is the most compelling or interesting snippet from this text?
  context: >-
    Lina Marcela Medina de Jurado is a Peruvian woman who became the youngest
    confirmed mother in history when she gave birth to her son Gerardo on 14
    May 1939 when she was five years, seven months, and 21 days of age. Based
    on the medical assessments of her pregnancy, she was four years old when
    she became pregnant, which was biologically possible due to precocious
    puberty.
  example_title: Medical Record Example
datasets:
- custom
base_model: answerdotai/ModernBERT-large
---
# Snippet Extractor Model

This model extracts the most compelling or interesting snippets from text passages. It is fine-tuned for extractive question answering where the "question" is always:

> "What is the most compelling or interesting snippet from this text?"
## Model Description

- Task: Extractive Question Answering / Snippet Extraction
- Base Model: [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)
- Training Data: Wikipedia article snippets curated for interesting/compelling content
- Language: English
## Usage

### Quick Start with Pipeline (Recommended)
```python
from transformers import pipeline

# Load the model
qa_pipeline = pipeline("question-answering", model="derenrich/snippet-extractor")

# Extract a compelling snippet
context = """
The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas
that took place on September 15, 1896, in which two uncrewed locomotives were
crashed into each other head-on at high speed. An estimated 40,000 people
attended the event. Unexpectedly, the impact caused both engine boilers to
explode, resulting in a shower of flying debris that killed two people.
"""

result = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context,
)

print(f"Snippet: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
```
### Manual Loading

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("derenrich/snippet-extractor")
model = AutoModelForQuestionAnswering.from_pretrained("derenrich/snippet-extractor")

# Prepare inputs
question = "What is the most compelling or interesting snippet from this text?"
context = "Your text here..."
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Decode the answer span: take the most likely start and end positions
start_idx = outputs.start_logits.argmax()
end_idx = outputs.end_logits.argmax()
answer_tokens = inputs.input_ids[0][start_idx : end_idx + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

print(f"Extracted snippet: {answer}")
```
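Unlike the pipeline, the manual-loading path above does not report a confidence score. You can approximate one from the logits yourself: softmax each logit vector and multiply the probabilities of the chosen start and end positions. A minimal pure-Python sketch (`span_confidence` is an illustrative helper, not part of transformers; with real model outputs you would pass `start_logits[0].tolist()` and `end_logits[0].tolist()`):

```python
import math

def span_confidence(start_logits, end_logits, start_idx, end_idx):
    """Approximate a QA answer score: the product of the softmax
    probabilities of the chosen start and end positions."""
    def softmax(logits):
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    return start_probs[start_idx] * end_probs[end_idx]

# Toy logits: position 2 is the clear start, position 4 the clear end.
start_logits = [0.1, 0.2, 5.0, 0.3, 0.1]
end_logits = [0.0, 0.1, 0.2, 0.4, 6.0]
print(round(span_confidence(start_logits, end_logits, 2, 4), 4))  # close to 1.0 for this confident toy example
```

Note that the real pipeline also masks out positions belonging to the question and invalid spans (end before start) before scoring; this sketch skips those details.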
## Training Details
- Epochs: 3
- Learning Rate: 2e-5
- Batch Size: 8
- Max Sequence Length: 384
- Optimizer: AdamW with weight decay 0.01
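The hyperparameters above map directly onto a standard `transformers` `Trainer` setup. A minimal configuration sketch, assuming the usual `Trainer` workflow (the `output_dir` name is illustrative; dataset loading and tokenization are omitted):

```python
from transformers import TrainingArguments

# Hyperparameters from the list above. AdamW is the Trainer's default
# optimizer; the max sequence length of 384 is applied at tokenization time.
training_args = TrainingArguments(
    output_dir="snippet-extractor",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    weight_decay=0.01,
)
```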
## Intended Use
This model is designed to:
- Extract interesting/compelling snippets from text for summaries
- Highlight the most notable information in articles
- Generate "hook" text for content previews
## Limitations
- Works best on English text
- Trained primarily on Wikipedia-style content
- May not perform as well on highly technical or domain-specific text
- The concept of "compelling" is subjective; results may vary
## Citation
If you use this model, please cite:
```bibtex
@misc{snippet-extractor,
  title={Snippet Extractor: Extracting Compelling Text Snippets},
  author={Daniel Erenrich},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/derenrich/snippet-extractor}
}
```