Snippet Extractor Model

This model extracts the most compelling or interesting snippets from text passages. It's fine-tuned for extractive question answering where the "question" is always:

"What is the most compelling or interesting snippet from this text?"

Model Description

  • Task: Extractive Question Answering / Snippet Extraction
  • Base Model: answerdotai/ModernBERT-large
  • Training Data: Wikipedia article snippets curated for interesting/compelling content
  • Language: English

Usage

Quick Start with Pipeline (Recommended)

from transformers import pipeline

# Load the model
qa_pipeline = pipeline("question-answering", model="derenrich/snippet-extractor")

# Extract a compelling snippet
context = """
The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas 
that took place on September 15, 1896, in which two uncrewed locomotives were 
crashed into each other head-on at high speed. An estimated 40,000 people 
attended the event. Unexpectedly, the impact caused both engine boilers to 
explode, resulting in a shower of flying debris that killed two people.
"""

result = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context
)

print(f"Snippet: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
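To process a collection of passages, you can run the same fixed question against each one and keep only the snippets the model is confident about. A minimal sketch (the `snippets_for` helper and the 0.5 threshold are illustrative, not part of the model; it only assumes the pipeline's standard output dict with `answer` and `score` keys):

```python
QUESTION = "What is the most compelling or interesting snippet from this text?"

def snippets_for(passages, qa_pipeline, min_score=0.5):
    """Extract one snippet per passage, keeping only confident results.

    `qa_pipeline` is any callable with the transformers question-answering
    pipeline interface; the 0.5 threshold is an illustrative default,
    not a calibrated value.
    """
    snippets = []
    for passage in passages:
        result = qa_pipeline(question=QUESTION, context=passage)
        if result["score"] >= min_score:
            snippets.append(result["answer"])
    return snippets
```

Low-scoring passages are simply skipped, which is usually preferable to surfacing a snippet the model itself considers weak.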

Manual Loading

from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("derenrich/snippet-extractor")
model = AutoModelForQuestionAnswering.from_pretrained("derenrich/snippet-extractor")

# Prepare inputs
question = "What is the most compelling or interesting snippet from this text?"
context = "Your text here..."

inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Decode answer (note: taking each argmax independently can occasionally
# yield end_idx < start_idx, in which case the decoded span is empty)
start_idx = outputs.start_logits.argmax()
end_idx = outputs.end_logits.argmax()
answer_tokens = inputs.input_ids[0][start_idx:end_idx + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

print(f"Extracted snippet: {answer}")
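Because the two argmaxes above are taken independently, the predicted end can land before the predicted start, producing an empty or invalid span. A more robust approach (a sketch, not part of the model's API; `best_span` and `max_answer_len` are illustrative names) scores every valid (start, end) pair jointly:

```python
import torch

def best_span(start_logits, end_logits, max_answer_len=50):
    """Return the (start, end) pair maximizing start_logit + end_logit,
    restricted to valid spans (start <= end) of bounded length.

    Taking the argmax of each logit vector independently can yield
    end < start; scoring pairs jointly avoids that.
    """
    # scores[s, e] = start_logits[s] + end_logits[e]
    scores = start_logits[:, None] + end_logits[None, :]
    # Valid spans: end >= start ...
    valid = torch.triu(torch.ones_like(scores, dtype=torch.bool))
    # ... and end - start < max_answer_len
    valid &= ~torch.triu(torch.ones_like(scores, dtype=torch.bool),
                         diagonal=max_answer_len)
    scores = scores.masked_fill(~valid, float("-inf"))
    idx = scores.argmax()
    return int(idx // scores.size(1)), int(idx % scores.size(1))
```

This mirrors what the question-answering pipeline does internally when it post-processes logits into answer spans.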

Training Details

  • Epochs: 3
  • Learning Rate: 2e-5
  • Batch Size: 8
  • Max Sequence Length: 384
  • Optimizer: AdamW with weight decay 0.01
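Fine-tuning for extractive QA requires token-level start/end labels, which means mapping each training example's character-level answer span onto token indices. A sketch of that mapping, assuming the tokenizer's `offset_mapping` format ((char_start, char_end) per token, with (0, 0) for special tokens); the function name is illustrative and not taken from this repository:

```python
def char_span_to_token_span(offsets, char_start, char_end):
    """Map a character-level answer span to token start/end indices.

    `offsets` is a tokenizer offset_mapping: one (char_start, char_end)
    pair per token, with (0, 0) marking special tokens. Returns
    (None, None) components if the span falls outside the offsets
    (e.g. truncated away).
    """
    token_start = token_end = None
    for i, (s, e) in enumerate(offsets):
        if s == e:  # special token ([CLS], [SEP], ...), skip
            continue
        if token_start is None and s <= char_start < e:
            token_start = i
        if s < char_end <= e:
            token_end = i
    return token_start, token_end
```

Offset mappings are available from fast tokenizers via `return_offsets_mapping=True`.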

Intended Use

This model is designed to:

  • Extract interesting/compelling snippets from text for summaries
  • Highlight the most notable information in articles
  • Generate "hook" text for content previews

Limitations

  • Works best on English text
  • Trained primarily on Wikipedia-style content
  • May not perform as well on highly technical or domain-specific text
  • The concept of "compelling" is subjective; results may vary

Citation

If you use this model, please cite:

@misc{snippet-extractor,
  title={Snippet Extractor: Extracting Compelling Text Snippets},
  author={Daniel Erenrich},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/derenrich/snippet-extractor}
}