---
license: mit
datasets:
- rungalileo/ragbench
language:
- en
metrics:
- f1
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---
# ChiliGround - A verbatim RAG framework

A sentence classification model for extracting relevant spans from documents based on a question.

## Model Details
- Base model: answerdotai/ModernBERT-base
- Hidden dimension: 768
- Number of labels: 2

## Usage

```python
from verbatim_rag.extractors import ModelSpanExtractor
from verbatim_rag.document import Document

# Initialize the extractor
extractor = ModelSpanExtractor(
    model_path="KRLabsOrg/chiliground-base-modernbert-v1",
    threshold=0.5,
)

# Create documents
documents = [
    Document(
        content="""
        Climate change is a significant and lasting change in the statistical distribution of weather patterns.
        Global warming is the observed increase in the average temperature of the Earth's atmosphere and oceans.
        Greenhouse gases include water vapor, carbon dioxide, methane, nitrous oxide, and ozone.
        Human activities since the beginning of the Industrial Revolution have increased greenhouse gas levels.
        """,
        metadata={"source": "example_doc_1", "id": "climate_1"},
    ),
    Document(
        content="""
        Renewable energy comes from sources that are naturally replenished on a human timescale.
        Solar power is the conversion of energy from sunlight into electricity.
        Wind power is the use of wind to provide mechanical power or electricity.
        Hydropower is electricity generated from the energy of falling water.
        """,
        metadata={"source": "example_doc_2", "id": "energy_1"},
    ),
]

# Extract relevant spans
question = "What causes climate change?"
results = extractor.extract_spans(question, documents)

# Print the results
for doc_content, spans in results.items():
    for span in spans:
        print(span)
```
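
The `threshold=0.5` argument suggests that each sentence receives a relevance score and that only sentences at or above the cutoff are returned. The sketch below illustrates that reading in plain Python. It is an assumption about the behavior, not the library's implementation: `select_relevant` is a hypothetical helper and the scores are made up for illustration.

```python
def select_relevant(sentences, scores, threshold=0.5):
    """Keep sentences whose (hypothetical) relevance score meets the threshold."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    return [s for s, p in zip(sentences, scores) if p >= threshold]


sentences = [
    "Greenhouse gases include water vapor, carbon dioxide, methane, nitrous oxide, and ozone.",
    "Solar power is the conversion of energy from sunlight into electricity.",
]
# Made-up scores for illustration only; a real model would produce these.
scores = [0.91, 0.07]

selected = select_relevant(sentences, scores, threshold=0.5)
print(selected)
```

Under this reading, a higher threshold would return fewer, higher-confidence sentences, and a lower one would return more sentences at the cost of precision.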

## Training Data

This model was trained on a question-answering dataset (the card metadata lists `rungalileo/ragbench`) to classify each sentence as relevant or not relevant to a given question.
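
The preprocessing is not described in this card, so the following is only a sketch of how question-answering data could be turned into sentence-level classification examples with two labels. The `SentenceExample` class, the `build_examples` helper, and the `relevant_indices` annotation format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentenceExample:
    question: str
    sentence: str
    label: int  # 1 = relevant, 0 = not relevant


def build_examples(question, sentences, relevant_indices):
    """Pair the question with every sentence and attach a binary label."""
    relevant = set(relevant_indices)
    return [
        SentenceExample(question, sentence, int(i in relevant))
        for i, sentence in enumerate(sentences)
    ]


examples = build_examples(
    "What causes climate change?",
    [
        "Human activities since the beginning of the Industrial Revolution have increased greenhouse gas levels.",
        "Hydropower is electricity generated from the energy of falling water.",
    ],
    relevant_indices=[0],  # assumed annotation: only the first sentence is relevant
)
for ex in examples:
    print(ex.label, ex.sentence)
```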

## Limitations

- The model works at the sentence level and may miss relevant spans that cross sentence boundaries
- Performance depends on the quality and relevance of the training data
- The model is designed for English text only
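
The sentence-level limitation can be illustrated with a small sketch. Here `split_sentences` is a naive, hypothetical splitter (not the one the library uses), and the example text is invented. The second sentence only makes sense with the first, yet a sentence-level classifier would score the two units independently.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Invented example: "This" in the second sentence refers back to the first.
text = (
    "Greenhouse gas levels have risen since the Industrial Revolution. "
    "This is the main driver of recent warming."
)
units = split_sentences(text)
for unit in units:
    print(unit)
```

If only one of the two units passes the relevance cutoff, the extracted span loses the context that links them.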