philippotiger committed on
Commit fd5712e · verified · 1 Parent(s): a4a41f5

Update README.md

Files changed (1)
  1. README.md +90 -3
README.md CHANGED
@@ -1,3 +1,90 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - football
+ - sports
+ - json-extraction
+ - gguf
+ - lora
+ - qwen2.5
+ library_name: llama-cpp
+ ---
+
+ # forecast-extractor
+
+ A fine-tuned version of [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)
+ for extracting structured JSON from football prediction messages (e.g. Telegram tip channels).
+
+ ## What it does
+
+ Given a raw football prediction message, the model returns a JSON array with one object per tip:
+ ```json
+ [
+   {
+     "league": "La Liga",
+     "team_1": "Real Madrid",
+     "team_2": "Barcelona",
+     "prediction": "1X",
+     "date": "25/03/2026",
+     "odds": 1.42
+   }
+ ]
+ ```
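Because downstream code usually depends on the exact shape shown above, it can help to check the model output before storing it. The following is a minimal sketch, not part of the model or its tooling: the helper name `validate_tips` is hypothetical, and the only assumptions are the six keys from the example and that missing values arrive as `null`.

```python
import json

# Keys taken from the example output above.
EXPECTED_KEYS = {"league", "team_1", "team_2", "prediction", "date", "odds"}


def validate_tips(raw: str) -> list:
    """Parse `raw` and check that it is a JSON array of tip objects.

    Hypothetical helper: raises ValueError on any structural problem.
    """
    tips = json.loads(raw)
    if not isinstance(tips, list):
        raise ValueError("expected a JSON array")
    for index, tip in enumerate(tips):
        if not isinstance(tip, dict):
            raise ValueError(f"tip {index} is not an object")
        if set(tip) != EXPECTED_KEYS:
            raise ValueError(f"tip {index} has unexpected keys: {sorted(tip)}")
        odds = tip["odds"]
        # odds may be null (None) or a number; bool is excluded explicitly
        # because bool is a subclass of int in Python.
        if odds is not None and (
            isinstance(odds, bool) or not isinstance(odds, (int, float))
        ):
            raise ValueError(f"tip {index} has non-numeric odds")
    return tips
```

Raising on the first problem keeps the sketch short; a production pipeline might instead collect all errors or drop only the invalid tips.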
+
+ Handles:
+ - Single and multi-tip messages (up to 4 tips)
+ - Bold Unicode text (Telegram formatting)
+ - Missing fields (returned as `null`)
+ - Varied formats, emojis, noise
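The model is described as handling bold Unicode text on its own, so no preprocessing is required. If plain-text input is preferred anyway (for logging or deduplication, say), one optional approach is Unicode NFKC normalization, which folds the "mathematical bold" letters often used for Telegram styling back to ordinary letters. This is a sketch of that optional step, not something the model card prescribes; `normalize_styled_text` is a hypothetical helper name.

```python
import unicodedata


def normalize_styled_text(text: str) -> str:
    """Fold compatibility characters (e.g. mathematical bold letters) to plain ones.

    Note: NFKC also rewrites other compatibility characters such as
    full-width digits, so it is not a lossless transformation.
    """
    return unicodedata.normalize("NFKC", text)


# "Real" written with mathematical bold letters (U+1D411, U+1D41E, U+1D41A, U+1D425)
styled = "\U0001D411\U0001D41E\U0001D41A\U0001D425 Madrid"
print(normalize_styled_text(styled))  # prints "Real Madrid"
```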
+
+ ## Models
+
+ | File | Size | Description |
+ |---|---|---|
+ | `football-extractor-q4.gguf` | 1.8 GB | Q4_K_M quantized (recommended) |
+ | `football-extractor-f16.gguf` | 5.8 GB | Full f16 precision |
+
+ ## Quick start
+
+ ### With llama-cpp-python (recommended)
+ ```python
+ from llama_cpp import Llama
+ import json
+
+ # n_gpu_layers=-1 offloads all layers to the GPU when one is available
+ llm = Llama(model_path="football-extractor-q4.gguf", n_ctx=2048, n_gpu_layers=-1)
+
+ response = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "Extract structured data and return ONLY a valid JSON array. Keys: league, team_1, team_2, prediction, date, odds. Use null for missing fields."},
+         {"role": "user", "content": "YOUR TIP TEXT HERE"}
+     ],
+     temperature=0.0,  # deterministic decoding for extraction
+     max_tokens=512,
+ )
+ # json.loads raises json.JSONDecodeError if the model output is not valid JSON
+ print(json.loads(response["choices"][0]["message"]["content"]))
+ ```
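The `json.loads` call in the example above fails if the reply contains anything besides the array, for instance a Markdown code fence or a short preamble. The sketch below shows one defensive way to recover the array in those cases. It is an illustration only: `parse_json_array` is a hypothetical helper, and whether this model ever produces such wrapped output is not stated in the README.

```python
import json
import re


def parse_json_array(text: str):
    """Return the JSON array found in `text`, or None if there is none."""
    cleaned = text.strip()
    # Drop a surrounding Markdown code fence such as ```json ... ```
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", cleaned)
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost [...] span in the text
        start, end = cleaned.find("["), cleaned.rfind("]")
        if start == -1 or end <= start:
            return None
        try:
            data = json.loads(cleaned[start:end + 1])
        except json.JSONDecodeError:
            return None
    return data if isinstance(data, list) else None
```

Returning `None` instead of raising lets the caller decide whether to retry, log, or skip the message.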
+
+ ### With Ollama
+ ```bash
+ ollama pull philippotiger/forecast-extractor
+ ollama run philippotiger/forecast-extractor
+ ```
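Once the model has been pulled, it can also be queried programmatically through Ollama's local HTTP API rather than the interactive prompt. The sketch below makes several assumptions that the README does not confirm: that the Ollama server runs at its default address `http://localhost:11434`, and that the system prompt from the llama-cpp-python example should be sent explicitly (the Ollama model may already embed one). The function names are hypothetical.

```python
import json
import urllib.request

# System prompt copied from the llama-cpp-python example; resending it here
# is an assumption, since the Ollama model may already include it.
SYSTEM_PROMPT = (
    "Extract structured data and return ONLY a valid JSON array. "
    "Keys: league, team_1, team_2, prediction, date, odds. "
    "Use null for missing fields."
)


def build_chat_payload(tip_text: str, model: str = "philippotiger/forecast-extractor") -> dict:
    """Build the request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": tip_text},
        ],
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"temperature": 0.0},
    }


def extract_with_ollama(tip_text: str, host: str = "http://localhost:11434"):
    """Send one tip message to a locally running Ollama server (assumed address)."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_payload(tip_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The non-streaming reply carries the assistant text in message.content
    return json.loads(body["message"]["content"])
```

Calling `extract_with_ollama("YOUR TIP TEXT HERE")` requires the Ollama server to be running with the model already pulled.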
+
+ ## Training details
+
+ - **Base model:** Qwen/Qwen2.5-3B-Instruct
+ - **Method:** QLoRA (4-bit NF4 quantization) with LoRA rank r=8
+ - **Dataset:** 300 synthetic examples generated from real team data
+   - 70% single-tip, 30% multi-tip (2-4 events)
+   - 10 message templates with emoji injection, typos, missing fields
+ - **Epochs:** 3
+ - **Final validation loss:** ~0.24
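To make the dataset description concrete, here is a rough sketch of how a synthetic set with the stated 70/30 single/multi-tip split could be generated. It is not the author's generation script: the team list, the two templates (the README mentions 10), the emoji set, and all function names are hypothetical, and typo injection is omitted for brevity. Fields that a template does not mention are labelled `null`, which mirrors the "missing fields" behaviour.

```python
import json
import random
from string import Formatter

# Hypothetical sample data; the real dataset used "real team data".
TEAMS = [
    ("La Liga", "Real Madrid", "Barcelona"),
    ("Serie A", "Inter", "Juventus"),
    ("Premier League", "Arsenal", "Chelsea"),
]
# Two illustrative templates; the first omits league and date on purpose.
TEMPLATES = [
    "{team_1} vs {team_2} | {prediction} @ {odds}",
    "{league} {date}: {team_1} - {team_2}, tip {prediction}, odds {odds}",
]
EMOJIS = ["\u26bd", "\U0001F525", "\u2705"]
KEYS = ("league", "team_1", "team_2", "prediction", "date", "odds")


def make_tip(rng: random.Random) -> dict:
    league, team_1, team_2 = rng.choice(TEAMS)
    return {
        "league": league,
        "team_1": team_1,
        "team_2": team_2,
        "prediction": rng.choice(["1", "X", "2", "1X", "X2"]),
        "date": "25/03/2026",
        "odds": round(rng.uniform(1.2, 3.5), 2),
    }


def render(tip: dict, rng: random.Random):
    """Render one tip as text and return (text, label)."""
    template = rng.choice(TEMPLATES)
    used = {name for _, name, _, _ in Formatter().parse(template) if name}
    # Fields absent from the template become None (null in JSON)
    label = {key: (tip[key] if key in used else None) for key in KEYS}
    text = template.format(**tip)
    if rng.random() < 0.5:  # emoji injection
        text = f"{rng.choice(EMOJIS)} {text}"
    return text, label


def build_dataset(n: int = 300, single_share: float = 0.7, seed: int = 0) -> list:
    rng = random.Random(seed)
    n_single = round(n * single_share)
    sizes = [1] * n_single + [rng.randint(2, 4) for _ in range(n - n_single)]
    rng.shuffle(sizes)
    dataset = []
    for size in sizes:
        rendered = [render(make_tip(rng), rng) for _ in range(size)]
        message = "\n".join(text for text, _ in rendered)
        labels = [label for _, label in rendered]
        dataset.append({"input": message, "output": json.dumps(labels)})
    return dataset
```

With the defaults this yields 210 single-tip and 90 multi-tip examples; the fixed seed keeps the output reproducible.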
+
+ ## Intended use
+
+ Parsing football prediction messages from Telegram channels or similar
+ sources into structured data for further processing or storage.