Instructions to use connect211/RRULE_Extractor with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use connect211/RRULE_Extractor with llama-cpp-python:

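```
# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the quantized GGUF from the Hub and load it
llm = Llama.from_pretrained(
    repo_id="connect211/RRULE_Extractor",
    filename="model-q4_k_m.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
# Print only the assistant's reply text
print(response["choices"][0]["message"]["content"])
```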
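
The model card for this repository says the model returns a JSON array of schedule objects (with fields such as `opens_at`, `closes_at`, `freq`, `interval`, `byday`) and recommends validating the output with a JSON parser. A minimal sketch of that check might look like the following. The helper name `parse_schedules`, the `KNOWN_FIELDS` set, and the sample string are all hypothetical, and the full output schema is not documented here, so only the overall shape is verified.

```python
import json

# Field names mentioned in the model card; the real schema may contain more,
# so this set is illustrative rather than exhaustive (assumption).
KNOWN_FIELDS = {"opens_at", "closes_at", "freq", "interval", "byday"}


def parse_schedules(raw_output: str) -> list[dict]:
    """Parse model output and check that it is a JSON array of objects.

    Raises ValueError when the text is not valid JSON or has the wrong shape.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of schedule objects")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
    return data


# Hypothetical output string, invented for illustration only.
sample = '[{"freq": "WEEKLY", "byday": "MO,TU", "opens_at": "09:00", "closes_at": "17:00"}]'
schedules = parse_schedules(sample)

# Keys outside the documented examples are worth logging, not rejecting.
unknown = set(schedules[0]) - KNOWN_FIELDS
```

In practice the string passed to `parse_schedules` would be the assistant reply text from the chat completion, and a `ValueError` would be the signal to retry or flag the record for review.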

Local Apps

llama.cpp

How to use connect211/RRULE_Extractor with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf connect211/RRULE_Extractor:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf connect211/RRULE_Extractor:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf connect211/RRULE_Extractor:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf connect211/RRULE_Extractor:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf connect211/RRULE_Extractor:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf connect211/RRULE_Extractor:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf connect211/RRULE_Extractor:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf connect211/RRULE_Extractor:Q4_K_M
```

Use Docker

```
docker model run hf.co/connect211/RRULE_Extractor:Q4_K_M
```

vLLM

How to use connect211/RRULE_Extractor with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "connect211/RRULE_Extractor"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "connect211/RRULE_Extractor",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
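
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL and model name). The helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical, and the example prompt is an invented schedule string; whether the model expects a particular system prompt or input wrapper is not stated here, so treat the bare user message as an assumption.

```python
import json
import urllib.request

# Endpoint and model name are taken from the curl example above.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "connect211/RRULE_Extractor"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of a chat completion response body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        return extract_reply(response.read())


# Example call (requires the vLLM server started above to be running):
# print(ask("Open Monday through Friday, 9am to 5pm"))
```

Keeping request construction and response parsing in separate functions makes both easy to check without a live server.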

Use Docker

```
docker model run hf.co/connect211/RRULE_Extractor:Q4_K_M
```

Ollama
How to use connect211/RRULE_Extractor with Ollama:
```
ollama run hf.co/connect211/RRULE_Extractor:Q4_K_M
```

Unsloth Studio

How to use connect211/RRULE_Extractor with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for connect211/RRULE_Extractor to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for connect211/RRULE_Extractor to start chatting
```

Using HuggingFace Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for connect211/RRULE_Extractor to start chatting
```

Docker Model Runner
How to use connect211/RRULE_Extractor with Docker Model Runner:
```
docker model run hf.co/connect211/RRULE_Extractor:Q4_K_M
```

Lemonade

How to use connect211/RRULE_Extractor with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull connect211/RRULE_Extractor:Q4_K_M
```

Run and chat with the model

```
lemonade run user.RRULE_Extractor-Q4_K_M
```

List all available models

```
lemonade list
```

CheetoBandito committed on Feb 27

Commit

4767b2f

verified ·

1 Parent(s): 2842319

Update README.md

README.md +31 -0

@@ -40,6 +40,29 @@ The result is a highly accurate extraction model that:
 _(The model will output a valid JSON array of schedule objects containing fields such as `opens_at`, `closes_at`, `freq`, `interval`, `byday`, etc.)_
 ## Training Data
 The model was fine-tuned on examples of 211 schedule data consisting of unstructured schedule strings annotated with corresponding valid ICAL compliant RRULES.
@@ -48,3 +71,11 @@ The model was fine-tuned on examples of 211 schedule data consisting of unstruct
 - Designed primarily for English language schedule descriptions.
 - Output generation should be validated by a JSON parser to ensure strict downstream compatibility, though the model is highly trained to output valid JSON formats natively.

 _(The model will output a valid JSON array of schedule objects containing fields such as `opens_at`, `closes_at`, `freq`, `interval`, `byday`, etc.)_
+## Available Model Files
+This repository includes both the full-precision safetensors weights and a quantized GGUF for flexible usage:
+| File(s) | Format | Precision | Size | Use Case |
+|---|---|---|---|---|
+| `model-0000[1-4]-of-00004.safetensors` | SafeTensors | BF16 | ~21 GB | Further fine-tuning / research |
+| `model-q4_k_m.gguf` | GGUF | Q4_K_M | ~4.7 GB | Inference / deployment |
+### Full-Precision Weights (SafeTensors)
+The four `.safetensors` shards contain the complete model weights at **16-bit (F16) precision**, the native precision at which the model was trained and fine-tuned. F16 was chosen over BF16 to keep the door open for future reinforcement learning techniques, should I get time to revisit this project when we look to deploy this model, or a smaller QAT fine-tuned 4B-parameter model, at scale. Either way, the safetensors are provided for researchers and practitioners who want to:
+- Continue fine-tuning on additional or domain-specific schedule data.
+- Experiment with alternative quantization schemes.
+- Run evaluations at full precision.
+### Quantized GGUF (Q4_K_M)
+The `model-q4_k_m.gguf` file is the recommended file for most inference use cases. It was produced using **Quantization-Aware Training (QAT)** — a technique that simulates the effects of quantization *during* the training process, allowing the model to adapt its weights to minimize accuracy loss before the final quantization step is applied. This is in contrast to post-training quantization (PTQ), which quantizes a fully trained model without any opportunity for weight adaptation.
+The practical result is that the Q4_K_M model retains the vast majority of the full-precision model's accuracy at a fraction of the memory footprint, making it well-suited for local inference and production deployment. For a deeper technical explanation of how QAT enables low-precision accuracy recovery, see [NVIDIA's overview here](https://developer.nvidia.com/blog/how-quantization-aware-training-enables-low-precision-accuracy-recovery/).
 ## Training Data
 The model was fine-tuned on examples of 211 schedule data consisting of unstructured schedule strings annotated with corresponding valid ICAL compliant RRULES.
 - Designed primarily for English language schedule descriptions.
 - Output generation should be validated by a JSON parser to ensure strict downstream compatibility, though the model is highly trained to output valid JSON formats natively.
+## Future Work & Significance
+At 8B parameters, this model carries capabilities well beyond what schedule extraction requires. Its generalized architecture handles the task reliably, but we believe similar extraction quality is achievable with a much smaller model — potentially in the 1–4B range — trained on the same labeled dataset. A purpose-built smaller model would dramatically reduce inference cost, latency, and memory footprint, which matters at the scale 211 networks operate at.
+Looking ahead, the ~10k labeled examples used here could be expanded to an estimated 80k+ training examples, and the same fine-tuning methodology applied to other structured fields across 211 datasets beyond schedules.
+More broadly, this model serves as a working proof of concept for a meaningful hypothesis: that the semantically rich, human-stewarded data maintained by 211 networks is not limited by its lack of machine-readable structure. With targeted fine-tuning, AI can bridge that gap — preserving the accuracy and nuance of human curation while producing the structured outputs that governments, hospitals, researchers, and software providers need to build effective solutions for society's biggest problems. The bottleneck is not the data. The bottleneck is tooling, and that bottleneck is solvable.