---
language:
- en
license: mit
library_name: transformers
tags:
- question-answering
- extractive-qa
- snippet-extraction
- text-extraction
pipeline_tag: question-answering
widget:
- text: "What is the most compelling or interesting snippet from this text?"
  context: "The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas that took place on September 15, 1896, in which two uncrewed locomotives were crashed into each other head-on at high speed. William George Crush conceived the idea to demonstrate a staged train wreck as a public spectacle. An estimated 40,000 people attended the event. Unexpectedly, the impact caused both engine boilers to explode, resulting in a shower of flying debris that killed two people and caused numerous injuries among the spectators."
  example_title: "Train Crash Example"
- text: "What is the most compelling or interesting snippet from this text?"
  context: "TempleOS is a biblical-themed lightweight operating system designed to be the Third Temple from the Hebrew Bible. It was created by American computer programmer Terry A. Davis, who developed it alone over the course of a decade after a series of manic episodes that he later described as a revelation from God. The system was characterized as a modern x86-64 Commodore 64, using an interface similar to a mixture of DOS and Turbo C."
  example_title: "TempleOS Example"
- text: "What is the most compelling or interesting snippet from this text?"
  context: "Lina Marcela Medina de Jurado is a Peruvian woman who became the youngest confirmed mother in history when she gave birth to her son Gerardo on 14 May 1939 when she was five years, seven months, and 21 days of age. Based on the medical assessments of her pregnancy, she was four years old when she became pregnant, which was biologically possible due to precocious puberty."
  example_title: "Medical Record Example"
datasets:
- custom
base_model: answerdotai/ModernBERT-large
---

# Snippet Extractor Model

This model extracts the most compelling or interesting snippet from a text passage. It is fine-tuned for extractive question answering where the "question" is always:

> **"What is the most compelling or interesting snippet from this text?"**

## Model Description

- **Task**: Extractive Question Answering / Snippet Extraction
- **Base Model**: `answerdotai/ModernBERT-large`
- **Training Data**: Wikipedia article snippets curated for interesting/compelling content
- **Language**: English

## Usage

### Quick Start with Pipeline (Recommended)

```python
from transformers import pipeline

# Load the model
qa_pipeline = pipeline("question-answering", model="derenrich/snippet-extractor")

# Extract a compelling snippet
context = """
The Crash at Crush was a one-day publicity stunt in the U.S. state of Texas that took place on September 15, 1896, in which two uncrewed locomotives were crashed into each other head-on at high speed. An estimated 40,000 people attended the event. Unexpectedly, the impact caused both engine boilers to explode, resulting in a shower of flying debris that killed two people.
"""

result = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context
)

print(f"Snippet: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
```
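The pipeline also accepts the standard question-answering arguments, which can be useful here: for example, `top_k` returns several candidate snippets, `max_answer_len` lifts the pipeline's fairly short default answer length, and `doc_stride` controls the overlap when a long context is split into windows. A minimal sketch (the specific values are illustrative, not tuned recommendations):

```python
# Ask for the top 3 candidate snippets instead of a single answer.
# top_k, max_answer_len and doc_stride are standard question-answering
# pipeline call arguments in transformers.
candidates = qa_pipeline(
    question="What is the most compelling or interesting snippet from this text?",
    context=context,
    top_k=3,
    max_answer_len=100,
    doc_stride=128,
)
for candidate in candidates:
    print(f"{candidate['score']:.4f}  {candidate['answer']}")
```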
""" result = qa_pipeline( question="What is the most compelling or interesting snippet from this text?", context=context ) print(f"Snippet: {result['answer']}") print(f"Confidence: {result['score']:.4f}") ``` ### Manual Loading ```python from transformers import AutoModelForQuestionAnswering, AutoTokenizer import torch # Load model and tokenizer tokenizer = AutoTokenizer.from_pretrained("derenrich/snippet-extractor") model = AutoModelForQuestionAnswering.from_pretrained("derenrich/snippet-extractor") # Prepare inputs question = "What is the most compelling or interesting snippet from this text?" context = "Your text here..." inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=384) # Get predictions with torch.no_grad(): outputs = model(**inputs) # Decode answer start_idx = outputs.start_logits.argmax() end_idx = outputs.end_logits.argmax() answer_tokens = inputs.input_ids[0][start_idx:end_idx+1] answer = tokenizer.decode(answer_tokens) print(f"Extracted snippet: {answer}") ``` ## Training Details - **Epochs**: 3 - **Learning Rate**: 2e-5 - **Batch Size**: 8 - **Max Sequence Length**: 384 - **Optimizer**: AdamW with weight decay 0.01 ## Intended Use This model is designed to: - Extract interesting/compelling snippets from text for summaries - Highlight the most notable information in articles - Generate "hook" text for content previews ## Limitations - Works best on English text - Trained primarily on Wikipedia-style content - May not perform as well on highly technical or domain-specific text - The concept of "compelling" is subjective; results may vary ## Citation If you use this model, please cite: ```bibtex @misc{snippet-extractor, title={Snippet Extractor: Extracting Compelling Text Snippets}, author={Daniel Erenrich}, year={2024}, publisher={Hugging Face}, url={https://huggingface.co/derenrich/snippet-extractor} } ```