title: Grounding Snippet Generator
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Streamlit template space

title: Snippet Generator
emoji: ✂️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit

✂️ Snippet Generator

Recreates Google Vertex AI / Gemini grounding-style extractive snippets.

How it works

  1. Sentence Segmentation - Splits the document on sentence boundaries and newlines, then filters out noise (URLs, questions, and text with few alphabetic characters)
  2. Cross-Encoder Scoring - Scores each query-sentence pair for relevance with cross-encoder/ms-marco-electra-base
  3. Budget Selection - Picks the top-scoring sentences that fit within the character and sentence-count limits
  4. Document-Order Stitching - Reassembles the selected sentences in their original order, with "..." marking gaps
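
Steps 1, 3, and 4 above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the function names, the regular expressions, and the thresholds (`min_chars`, `min_alpha_ratio`, and the default limits) are all assumptions. The trailing "..." is also an assumption, read off the example output further down this README.

```python
import re

# Hypothetical URL pattern; the app's real noise filters are not documented here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def split_sentences(text, min_chars=20, min_alpha_ratio=0.6):
    """Split on newlines and sentence-ending punctuation, then drop noisy pieces."""
    sentences = []
    for line in text.splitlines():
        # Break after ., ! or ? when whitespace follows.
        for piece in re.split(r"(?<=[.!?])\s+", line.strip()):
            piece = piece.strip()
            if len(piece) < min_chars:
                continue  # too short to be a useful sentence
            if URL_RE.search(piece) or piece.endswith("?"):
                continue  # skip URLs and questions
            alpha = sum(ch.isalpha() or ch.isspace() for ch in piece)
            if alpha / len(piece) < min_alpha_ratio:
                continue  # skip text that is mostly digits or symbols
            sentences.append(piece)
    return sentences


def select_within_budget(sentences, scores, max_chars=500, max_sentences=3):
    """Greedily take the highest-scoring sentences that fit both limits.

    Returns the chosen indices sorted into document order.
    """
    chosen, used = [], 0
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        if used + len(sentences[i]) > max_chars:
            continue  # this sentence would exceed the character budget
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)


def stitch(sentences, chosen, gap="..."):
    """Join the chosen sentences in document order, marking skipped text with `gap`."""
    parts, prev = [], None
    for i in chosen:
        if prev is not None and i != prev + 1:
            parts.append(gap)  # sentences were skipped between prev and i
        parts.append(sentences[i])
        prev = i
    if chosen and chosen[-1] != len(sentences) - 1:
        parts.append(gap)  # more text follows the last chosen sentence
    return " ".join(parts)
```

The scores passed to `select_within_budget` would come from step 2. Note that gaps here are detected on the filtered sentence list, so a sentence dropped as noise does not produce a "..." on its own.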

Model

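Uses cross-encoder/ms-marco-electra-base, a cross-encoder trained on the MS MARCO passage-ranking data. Ranking passages by how well they answer a search query is closely related to choosing the sentences that belong in a snippet.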
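
A minimal sketch of the scoring step, assuming the model is loaded through the `CrossEncoder` class of the sentence-transformers library. The `score_sentences` helper and its `predict` parameter are hypothetical names introduced here; the parameter exists only so that a stub scorer can stand in for the model.

```python
def score_sentences(query, sentences, predict=None):
    """Return one relevance score per sentence for the given query."""
    if not sentences:
        return []
    if predict is None:
        # Imported lazily so the helper also works with a stub scorer.
        # A real app would load the model once and reuse it across calls.
        from sentence_transformers import CrossEncoder

        predict = CrossEncoder("cross-encoder/ms-marco-electra-base").predict
    # A cross-encoder reads the query and the sentence together as one pair.
    pairs = [(query, sentence) for sentence in sentences]
    return [float(score) for score in predict(pairs)]
```

Called without `predict`, the helper downloads the model on first use. Higher scores mean the sentence is judged more relevant to the query, and only their relative order matters for selection.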

Usage

  1. Enter a search query
  2. Paste document content
  3. Adjust settings (max chars, max sentences)
  4. Click "Generate Snippet"

Example

Query: best prostate cancer treatment in the world

Output:

This makes Asklepios one of the best prostate cancer treatment centers for foreigners. ... Spain is a leader in prostate cancer treatment, with top clinics like Centro Médico Teknon, Quironsalud Madrid, and Hospital Quiron Barcelona. ...