kalki-sambhal committed on
Commit
8eae3ec
·
verified ·
1 Parent(s): 50262bd

Initial upload of We-Math Phi-4 (multimodal) with model card

README.md CHANGED
## We-Math Phi-4 (Multimodal)

A fine-tuned variant of `Phi-4-multimodal-instruct` for image-based math reasoning on the We-Math dataset.

- **Base model**: `Phi-4-multimodal-instruct`
- **Repository**: `kalkiai3000/we-math-phi4`
- **Modalities**: Vision + Text → Text
- **Intended use**: Solve math problems that include an image (diagrams, handwritten notes, printed problems) with short, stepwise answers.

### Training
- **Data**: We-Math processed with captions; 1,000 matched samples
  - Train: 800, Eval: 200 (random split)
- **Prompting format** (Phi-4 MM): `<|user|><|image_1|>Please solve this math problem: {question}\nImage description: {caption}<|end|><|assistant|>{answer}<|end|>`
- **Hyperparameters** (Trainer):
  - `num_train_epochs`: 20
  - `per_device_train_batch_size`: 1 (effective batch size 8 via `gradient_accumulation_steps=8`)
  - `per_device_eval_batch_size`: 1
  - `warmup_steps`: 50
  - `fp16`: true, `gradient_checkpointing`: true
  - `eval_strategy`: steps (every 25), `save_steps`: 50, `save_total_limit`: 3
  - `remove_unused_columns`: false, `label_names`: `["labels"]`
- **Implementation notes**:
  - Attention implementation forced to eager (flash attention disabled)
  - KV cache disabled during training (`use_cache=False`)
  - Minor compatibility shims for recent Transformers cache utilities
- **Hardware**: NVIDIA A100 GPUs
- **Seed**: 42
- **Local paths**: fine-tuned at `/data-mount-large/models/phi4-wemath-finetuned`, merged at `/data-mount-large/models/phi4-wemath-merged`
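
The training-time prompt format above can be assembled with a small helper. This is a sketch for illustration; `build_prompt` is a hypothetical function, not part of the released code, and the token layout is taken verbatim from the prompting format documented above:

```python
def build_prompt(question, caption=None):
    # Phi-4 MM chat layout used during fine-tuning (per the model card).
    # The caption line was present in training data; it is optional at inference.
    text = f"<|user|><|image_1|>Please solve this math problem: {question}"
    if caption:
        text += f"\nImage description: {caption}"
    # End the user turn and open the assistant turn for generation.
    return text + "<|end|><|assistant|>"
```

At generation time the `{answer}<|end|>` suffix is omitted, since the model produces it.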
29

### Evaluation
- **Eval split size**: 200
- **Metric**: cross-entropy loss on the eval split
- **Observed eval losses**:
  - Best eval loss ≈ **0.6948** (around step 550)
  - Later eval losses trend toward ~0.78 as training proceeds
- **Best model selection**: `load_best_model_at_end=True` using `eval_loss`, so the exported model corresponds to the best checkpoint.
- **Approximate perplexity**: exp(0.6948) ≈ **2.00**; exp(0.7794) ≈ **2.18**
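
The perplexity figures follow directly from the eval loss; a minimal sketch of the conversion:

```python
import math

def perplexity(ce_loss):
    # Perplexity is exp() of the mean cross-entropy loss (in nats).
    return math.exp(ce_loss)

best = perplexity(0.6948)   # best checkpoint, ~2.00
late = perplexity(0.7794)   # later checkpoints, ~2.18
```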
38

### Data processing
- Images loaded from `images/`; captions loaded from `captions-sentence.json`
- Each conversation example is matched to an image filename; the question and answer are extracted from structured messages
- Images resized so the longer side is at most 1024 px, and converted to RGB if needed
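
The resize rule (longer side capped at 1024, aspect ratio preserved) can be sketched as a pure function. `target_size` is a hypothetical helper shown for illustration, not part of the released pipeline:

```python
def target_size(width, height, max_side=1024):
    # Compute the resized (width, height) so the longer side is at most
    # max_side while preserving aspect ratio; smaller images are unchanged.
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return round(width * scale), round(height * scale)
```

With PIL this would be applied as `image.resize(target_size(*image.size), Image.LANCZOS)` after `image.convert("RGB")`.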
43

### Usage
Minimal example with the Hub model:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "kalkiai3000/we-math-phi4"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",
)
model.config.use_cache = False

image = Image.open("/path/to/problem_image.png").convert("RGB")
question = "What is the area of the triangle?"
prompt = f"<|user|><|image_1|>Please solve this math problem: {question}<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = {k: (v.to(model.device) if hasattr(v, "to") else v) for k, v in inputs.items()}
inputs["input_mode"] = torch.tensor([1], dtype=torch.long, device=model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.7,
        do_sample=True,
        pad_token_id=processor.tokenizer.eos_token_id,
        num_logits_to_keep=1,
    )

print(processor.tokenizer.decode(output[0], skip_special_tokens=True))
```
83

### Limitations and bias
- Trained on a small subset (1k) of We-Math; may not generalize broadly
- Sensitive to prompt structure; include `<|image_1|>` and the Phi-4 conversation tokens
- Hallucinations are possible; verify results for critical use cases

### Citation
If you use this model, please cite the base model and this fine-tune.
chat_template.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}"
+ }
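
For illustration, the Jinja chat template above renders roughly as this pure-Python equivalent (a sketch only; in practice, use `processor.tokenizer.apply_chat_template`):

```python
def render(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    # Mirrors the Jinja template: each message becomes <|role|>content<|end|>,
    # with a special <|tool|>...<|/tool|> wrapper for system messages carrying tools.
    out = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            out.append(f"<|{m['role']}|>{m['content']}<|tool|>{m['tools']}<|/tool|><|end|>")
        else:
            out.append(f"<|{m['role']}|>{m['content']}<|end|>")
    # Open the assistant turn for generation, or close with EOS otherwise.
    out.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(out)
```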
config.json CHANGED
@@ -207,8 +207,8 @@
   "r": 320
 },
 "tie_word_embeddings": true,
- "torch_dtype": "float16",
- "transformers_version": "4.55.4",
+ "torch_dtype": "float32",
+ "transformers_version": "4.51.3",
 "use_cache": false,
 "vision_lora": {
   "dp": 0.0,
generation_config.json CHANGED
@@ -6,5 +6,5 @@
   199999
 ],
 "pad_token_id": 199999,
- "transformers_version": "4.55.4"
+ "transformers_version": "4.51.3"
 }
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d814457a67f12032e31f17b55ce51771f25798e53419a0d4927102378f74a8d4
+ size 4989600976
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fbd8beaa84c1972e3f5b6e3b6fa58d536cbee9c3395a480bf7dafae4e04e2645
+ size 4988391504
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4bf416eb90c4ae287f9fa1ba428a7e84c5b982f58e98876777e6b7017e027c00
+ size 4856491072
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:46c7d79e1dc207decab5ed00e2a9704d57c8ae7c394d1c884a2aa281aca4c5e5
+ size 4983106416
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50fc6423cec56ec15294e98818dea7b67502a815ab28d49afc29e78bec575479
+ size 2480555192
model.safetensors.index.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -118,7 +118,9 @@
   "AutoProcessor": "processing_phi4mm.Phi4MMProcessor"
 },
 "bos_token": "<|endoftext|>",
+ "chat_template": "{% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}",
 "clean_up_tokenization_spaces": false,
+ "dynamic_hd": 36,
 "eos_token": "<|endoftext|>",
 "extra_special_tokens": {},
 "model_max_length": 131072,