# Snippet Extractor Model

This model extracts the most compelling or interesting snippets from text passages. It is fine-tuned for extractive question answering where the "question" is always:

> "What is the most compelling or interesting snippet from this text?"

## Model Description

- **Task:** Extractive Question Answering / Snippet Extraction
- **Base Model:** answerdotai/ModernBERT-large
- **Training Data:** Wikipedia article snippets curated for interesting/compelling content
- **Language:** English

## Usage

### Quick Start with Pipeline (Recommended)
```python
from transformers import pipeline

# Load the model
qa_pipeline = pipeline("question-answering", model="derenrich/snippet-extractor")

# Extract a compelling snippet
context = """
The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas
that took place on September 15, 1896, in which two uncrewed locomotives were
crashed into each other head-on at high speed. An estimated 40,000 people
attended the event. Unexpectedly, the impact caused both engine boilers to
explode, resulting in a shower of flying debris that killed two people.
"""

result = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context,
)

print(f"Snippet: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
```
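Because the question never changes, it can be baked into a thin wrapper. The sketch below assumes a pipeline object like the one created above; `extract_snippet` is a hypothetical helper, and `top_k` is a standard question-answering pipeline parameter for returning several candidate spans ranked by score:

```python
# Hypothetical convenience wrapper around any question-answering pipeline
# (or any callable with the same keyword interface).
SNIPPET_QUESTION = "What is the most compelling or interesting snippet from this text?"

def extract_snippet(qa_pipeline, text, top_k=1):
    # top_k > 1 returns several candidate spans ranked by score
    # instead of only the single best one.
    return qa_pipeline(question=SNIPPET_QUESTION, context=text, top_k=top_k)
```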
### Manual Loading
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("derenrich/snippet-extractor")
model = AutoModelForQuestionAnswering.from_pretrained("derenrich/snippet-extractor")

# Prepare inputs
question = "What is the most compelling or interesting snippet from this text?"
context = "Your text here..."
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Decode answer from the most likely start/end token positions
start_idx = outputs.start_logits.argmax()
end_idx = outputs.end_logits.argmax()
answer_tokens = inputs.input_ids[0][start_idx : end_idx + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

print(f"Extracted snippet: {answer}")
```
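Note that inputs longer than `max_length=384` are truncated, so a compelling passage late in a long document would never be seen. One workaround is to split the text into overlapping windows and run the model on each, keeping the highest-scoring answer. The helper below is a rough, word-based sketch (not part of the model); token-exact windowing would instead use the tokenizer's `stride` and `return_overflowing_tokens` options:

```python
def chunk_text(text, max_words=250, overlap=50):
    """Split `text` into overlapping word-window chunks.

    250 words is a rough heuristic that usually stays under the
    384-token limit once the question and special tokens are added.
    Overlap keeps snippets that straddle a chunk boundary intact.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Run the model on each chunk and keep the answer with the highest score.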
## Training Details

- **Epochs:** 3
- **Learning Rate:** 2e-5
- **Batch Size:** 8
- **Max Sequence Length:** 384
- **Optimizer:** AdamW with weight decay 0.01
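The hyperparameters above map onto a `TrainingArguments` configuration along these lines (a sketch only; the actual training script, dataset preparation, and span labeling are not shown here, and `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="snippet-extractor",   # placeholder path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    weight_decay=0.01,  # AdamW is the Trainer's default optimizer
)
```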
## Intended Use

This model is designed to:

- Extract interesting/compelling snippets from text for summaries
- Highlight the most notable information in articles
- Generate "hook" text for content previews
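For the preview use case, an extracted snippet may still exceed the display budget. A small post-processing helper (illustrative only, not part of the model) can trim it at a word boundary:

```python
def make_hook(snippet, max_chars=80):
    """Trim `snippet` to at most roughly `max_chars`, cutting at a
    word boundary and appending an ellipsis when text was removed."""
    if len(snippet) <= max_chars:
        return snippet
    cut = snippet[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```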
## Limitations

- Works best on English text
- Trained primarily on Wikipedia-style content
- May not perform as well on highly technical or domain-specific text
- The concept of "compelling" is subjective; results may vary
## Citation

If you use this model, please cite:

```bibtex
@misc{snippet-extractor,
  title={Snippet Extractor: Extracting Compelling Text Snippets},
  author={Daniel Erenrich},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/derenrich/snippet-extractor}
}
```