Snippet Extractor Model

This model extracts the most compelling or interesting snippets from text passages. It's fine-tuned for extractive question answering where the "question" is always:

"What is the most compelling or interesting snippet from this text?"

Model Description

  • Task: Extractive Question Answering / Snippet Extraction
  • Base Model: answerdotai/ModernBERT-large
  • Training Data: Wikipedia article snippets curated for interesting/compelling content
  • Language: English

Usage

Quick Start with Pipeline (Recommended)

from transformers import pipeline

# Load the model
qa_pipeline = pipeline("question-answering", model="derenrich/snippet-extractor")

# Extract a compelling snippet
context = """
The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas 
that took place on September 15, 1896, in which two uncrewed locomotives were 
crashed into each other head-on at high speed. An estimated 40,000 people 
attended the event. Unexpectedly, the impact caused both engine boilers to 
explode, resulting in a shower of flying debris that killed two people.
"""

result = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context
)

print(f"Snippet: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
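To process a collection of passages, you can run the same fixed question against each one and keep only the snippets the model is confident about. A minimal sketch (the `snippets_for` helper and the 0.5 threshold are illustrative, not part of the model; it only assumes the pipeline's standard output dict with `answer` and `score` keys):

```python
QUESTION = "What is the most compelling or interesting snippet from this text?"

def snippets_for(passages, qa_pipeline, min_score=0.5):
    """Extract one snippet per passage, keeping only confident results.

    `qa_pipeline` is any callable with the transformers question-answering
    pipeline interface; the 0.5 threshold is an illustrative default,
    not a calibrated value.
    """
    snippets = []
    for passage in passages:
        result = qa_pipeline(question=QUESTION, context=passage)
        if result["score"] >= min_score:
            snippets.append(result["answer"])
    return snippets
```

Low-scoring passages are simply skipped, which is usually preferable to surfacing a snippet the model itself considers weak.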

Manual Loading

from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("derenrich/snippet-extractor")
model = AutoModelForQuestionAnswering.from_pretrained("derenrich/snippet-extractor")

# Prepare inputs
question = "What is the most compelling or interesting snippet from this text?"
context = "Your text here..."

inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Decode answer (note: taking each argmax independently can occasionally
# yield end_idx < start_idx, in which case the decoded span is empty)
start_idx = outputs.start_logits.argmax()
end_idx = outputs.end_logits.argmax()
answer_tokens = inputs.input_ids[0][start_idx:end_idx + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

print(f"Extracted snippet: {answer}")
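Because the two argmaxes above are taken independently, the predicted end can land before the predicted start, producing an empty or invalid span. A more robust approach (a sketch, not part of the model's API; `best_span` and `max_answer_len` are illustrative names) scores every valid (start, end) pair jointly:

```python
import torch

def best_span(start_logits, end_logits, max_answer_len=50):
    """Return the (start, end) pair maximizing start_logit + end_logit,
    restricted to valid spans (start <= end) of bounded length.

    Taking the argmax of each logit vector independently can yield
    end < start; scoring pairs jointly avoids that.
    """
    # scores[s, e] = start_logits[s] + end_logits[e]
    scores = start_logits[:, None] + end_logits[None, :]
    # Valid spans: end >= start ...
    valid = torch.triu(torch.ones_like(scores, dtype=torch.bool))
    # ... and end - start < max_answer_len
    valid &= ~torch.triu(torch.ones_like(scores, dtype=torch.bool),
                         diagonal=max_answer_len)
    scores = scores.masked_fill(~valid, float("-inf"))
    idx = scores.argmax()
    return int(idx // scores.size(1)), int(idx % scores.size(1))
```

This mirrors what the question-answering pipeline does internally when it post-processes logits into answer spans.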

Training Details

  • Epochs: 3
  • Learning Rate: 2e-5
  • Batch Size: 8
  • Max Sequence Length: 384
  • Optimizer: AdamW with weight decay 0.01
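Fine-tuning for extractive QA requires token-level start/end labels, which means mapping each training example's character-level answer span onto token indices. A sketch of that mapping, assuming the tokenizer's `offset_mapping` format ((char_start, char_end) per token, with (0, 0) for special tokens); the function name is illustrative and not taken from this repository:

```python
def char_span_to_token_span(offsets, char_start, char_end):
    """Map a character-level answer span to token start/end indices.

    `offsets` is a tokenizer offset_mapping: one (char_start, char_end)
    pair per token, with (0, 0) marking special tokens. Returns
    (None, None) components if the span falls outside the offsets
    (e.g. truncated away).
    """
    token_start = token_end = None
    for i, (s, e) in enumerate(offsets):
        if s == e:  # special token ([CLS], [SEP], ...), skip
            continue
        if token_start is None and s <= char_start < e:
            token_start = i
        if s < char_end <= e:
            token_end = i
    return token_start, token_end
```

Offset mappings are available from fast tokenizers via `return_offsets_mapping=True`.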

Intended Use

This model is designed to:

  • Extract interesting/compelling snippets from text for summaries
  • Highlight the most notable information in articles
  • Generate "hook" text for content previews

Limitations

  • Works best on English text
  • Trained primarily on Wikipedia-style content
  • May not perform as well on highly technical or domain-specific text
  • The concept of "compelling" is subjective; results may vary

Citation

If you use this model, please cite:

@misc{snippet-extractor,
  title={Snippet Extractor: Extracting Compelling Text Snippets},
  author={Daniel Erenrich},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/derenrich/snippet-extractor}
}