KRLabsOrg/chiliground-base-modernbert-v1

	---
	license: mit
	datasets:
	- rungalileo/ragbench
	language:
	- en
	metrics:
	- f1
	base_model:
	- answerdotai/ModernBERT-base
	pipeline_tag: text-classification
	---
	# ChiliGround - A verbatim RAG framework

	A sentence classification model for extracting relevant spans from documents based on a question.

	## Model Details
	- Base model: answerdotai/ModernBERT-base
	- Hidden dimension: 768
	- Number of labels: 2
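
With two labels, the classifier head yields two scores per sentence. The following is a minimal sketch of how such a pair of logits could be turned into a relevance probability. It assumes, which this card does not state, that index 1 is the "relevant" class and that a softmax is applied; `relevance_probability` is a hypothetical helper, not part of any library.

```python
import math


def relevance_probability(logits):
    """Softmax over two logits; returns the probability of index 1.

    Assumption: index 0 = not relevant, index 1 = relevant.
    """
    not_relevant, relevant = logits
    # Subtract the max for numerical stability before exponentiating.
    m = max(not_relevant, relevant)
    e0 = math.exp(not_relevant - m)
    e1 = math.exp(relevant - m)
    return e1 / (e0 + e1)


print(relevance_probability([0.0, 0.0]))  # equal logits -> 0.5
```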

	## Usage

	```python
	from verbatim_rag.extractors import ModelSpanExtractor
	from verbatim_rag.document import Document

	# Initialize the extractor
	extractor = ModelSpanExtractor(
	    model_path="KRLabsOrg/chiliground-base-modernbert-v1",
	    threshold=0.5,
	)

	# Create documents
	documents = [
	    Document(
	        content="""
	        Climate change is a significant and lasting change in the statistical distribution of weather patterns.
	        Global warming is the observed increase in the average temperature of the Earth's atmosphere and oceans.
	        Greenhouse gases include water vapor, carbon dioxide, methane, nitrous oxide, and ozone.
	        Human activities since the beginning of the Industrial Revolution have increased greenhouse gas levels.
	        """,
	        metadata={"source": "example_doc_1", "id": "climate_1"},
	    ),
	    Document(
	        content="""
	        Renewable energy comes from sources that are naturally replenished on a human timescale.
	        Solar power is the conversion of energy from sunlight into electricity.
	        Wind power is the use of wind to provide mechanical power or electricity.
	        Hydropower is electricity generated from the energy of falling water.
	        """,
	        metadata={"source": "example_doc_2", "id": "energy_1"},
	    ),
	]


	# Extract relevant spans
	question = "What causes climate change?"
	results = extractor.extract_spans(question, documents)

	# Print the results
	for doc_content, spans in results.items():
	    for span in spans:
	        print(span)
	```
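
The `threshold=0.5` argument suggests that each sentence receives a relevance score and only sentences that reach the threshold are returned. The sketch below illustrates that idea in isolation. It does not call the model: `overlap_score` is a toy stand-in scorer, all function names are hypothetical, and whether the library compares with `>=` or `>` is an assumption.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def overlap_score(question, sentence):
    """Toy stand-in for the model: fraction of question words found in the sentence."""
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def select_sentences(question, text, score_fn, threshold=0.5):
    """Keep sentences whose score reaches the threshold (assumed to be >=)."""
    return [s for s in split_sentences(text) if score_fn(question, s) >= threshold]


text = (
    "Climate change is a significant and lasting change in weather patterns. "
    "Solar power is the conversion of energy from sunlight into electricity."
)
print(select_sentences("What is climate change?", text, overlap_score))
```

Raising the threshold trades recall for precision: fewer sentences pass, but those that do carry higher scores.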

	## Training Data

	This model was trained on a QA dataset (the metadata above lists `rungalileo/ragbench`) to classify each sentence as relevant or not relevant to a given question.
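
To make the sentence-level training objective concrete, here is a minimal sketch of how one QA record could be expanded into binary-labelled (question, sentence) pairs. The record layout and the `build_training_pairs` helper are hypothetical and for illustration only; they are not taken from the dataset's schema or from the actual training code.

```python
def build_training_pairs(record):
    """Turn one QA record into (question, sentence, label) triples.

    Hypothetical record layout (an assumption, not the dataset's schema):
      {"question": str, "sentences": [str, ...], "relevant": [int, ...]}
    where "relevant" holds the indices of sentences that support the answer.
    """
    relevant = set(record["relevant"])
    return [
        (record["question"], sentence, 1 if i in relevant else 0)
        for i, sentence in enumerate(record["sentences"])
    ]


record = {
    "question": "What is hydropower?",
    "sentences": [
        "Wind power is the use of wind to provide mechanical power.",
        "Hydropower is electricity generated from the energy of falling water.",
    ],
    "relevant": [1],
}
for question, sentence, label in build_training_pairs(record):
    print(label, sentence)
```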

	## Limitations

	- The model classifies one sentence at a time, so it may miss relevant spans that cross sentence boundaries.
	- Performance depends on the quality and relevance of the training data.
	- The model is designed for English text only.