---
license: mit
base_model: Qwen/Qwen2.5-3B-Instruct
language:
- en
pipeline_tag: text-generation
tags:
- football
- sports
- json-extraction
- gguf
- lora
- qwen2.5
library_name: llama-cpp
---

# forecast-extractor

A fine-tuned version of [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) for extracting structured JSON from football prediction messages (e.g. Telegram tip channels).

## What it does

Given a raw football prediction message, it returns a structured JSON array:

```json
[
  {
    "league": "La Liga",
    "team_1": "Real Madrid",
    "team_2": "Barcelona",
    "prediction": "1X",
    "date": "25/03/2026",
    "odds": 1.42
  }
]
```

Handles:

- Single and multi-tip messages (up to 4 tips)
- Bold Unicode text (Telegram formatting)
- Missing fields → null
- Varied formats, emojis, noise

## Models

| File | Size | Description |
|---|---|---|
| `football-extractor-q4.gguf` | 1.8 GB | Q4_K_M quantized (recommended) |
| `football-extractor-f16.gguf` | 5.8 GB | Full f16 precision |

## Quick start

### With llama-cpp-python (recommended)

```python
from llama_cpp import Llama
import json

# Load the quantized model; n_gpu_layers=-1 offloads all layers to the GPU when llama.cpp is built with GPU support
llm = Llama(model_path="football-extractor-q4.gguf", n_ctx=2048, n_gpu_layers=-1)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Extract structured data and return ONLY a valid JSON array. Keys: league, team_1, team_2, prediction, date, odds. Use null for missing fields."},
        {"role": "user", "content": "YOUR TIP TEXT HERE"},
    ],
    temperature=0.0,  # deterministic decoding for extraction
    max_tokens=512,
)

# The assistant reply is expected to be the JSON array as plain text
print(json.loads(response["choices"][0]["message"]["content"]))
```

### With Ollama

```bash
ollama pull philippotiger/forecast-extractor
ollama run philippotiger/forecast-extractor
```

## Training details

- **Base model:** Qwen/Qwen2.5-3B-Instruct
- **Method:** QLoRA (4-bit NF4) with LoRA r=8
- **Dataset:** 300 synthetic examples generated from real team data
  - 70% single-tip, 30% multi-tip (2-4 events)
  - 10 message templates with emoji injection, typos, missing fields
- **Epochs:** 3
- **Final validation loss:** ~0.24

## Intended use

Parsing football prediction messages from Telegram channels or similar sources into structured data for further processing or storage.
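Before storing or acting on the extracted data, it is usually worth validating the model's output. The following is a minimal sketch in plain Python, not part of this model or any library: `Tip` and `parse_tips` are hypothetical names, the key set is taken from the Quick start system prompt, and the `DD/MM/YYYY` date format is an assumption based on the example output. Dates or odds that are missing or cannot be parsed become `None`, in line with the "missing fields → null" convention.

```python
import datetime as dt
import json
from dataclasses import dataclass
from typing import Any, List, Optional

# Hypothetical post-processing helper (illustration only).
# Keys mirror the system prompt in the Quick start example.
EXPECTED_KEYS = ("league", "team_1", "team_2", "prediction", "date", "odds")


@dataclass
class Tip:
    league: Optional[str]
    team_1: Optional[str]
    team_2: Optional[str]
    prediction: Optional[str]
    date: Optional[dt.date]
    odds: Optional[float]


def _parse_date(value: Any) -> Optional[dt.date]:
    """Parse a DD/MM/YYYY string (assumed format); None if missing or unparseable."""
    if not value:
        return None
    try:
        return dt.datetime.strptime(value, "%d/%m/%Y").date()
    except (TypeError, ValueError):
        return None


def _parse_odds(value: Any) -> Optional[float]:
    """Coerce odds to float (numbers or numeric strings); None otherwise."""
    if value is None or isinstance(value, bool):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_tips(raw: str) -> List[Tip]:
    """Turn the model's JSON text into a list of Tip records."""
    data = json.loads(raw)
    if isinstance(data, dict):
        # Tolerate a bare object instead of an array (defensive assumption).
        data = [data]
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of tips")

    tips = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object per tip, got {type(item).__name__}")
        # Absent keys are treated like explicit nulls; unexpected keys are ignored.
        fields = {key: item.get(key) for key in EXPECTED_KEYS}
        fields["date"] = _parse_date(fields["date"])
        fields["odds"] = _parse_odds(fields["odds"])
        tips.append(Tip(**fields))
    return tips


if __name__ == "__main__":
    raw = (
        '[{"league": "La Liga", "team_1": "Real Madrid", "team_2": "Barcelona", '
        '"prediction": "1X", "date": "25/03/2026", "odds": 1.42}]'
    )
    tips = parse_tips(raw)
    print(tips[0].date)  # 2026-03-25
    print(tips[0].odds)  # 1.42
```

In a real pipeline you would pass `response["choices"][0]["message"]["content"]` from the Quick start example to `parse_tips` instead of a hard-coded string. Raising on a structurally wrong payload while nulling individual bad fields keeps one malformed value from discarding an otherwise usable tip.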