hanzlajavaid PRO
hanzla
AI & ML interests
Direct Preference Optimization, Supervised Finetuning, Stable Diffusion
Recent Activity
posted an update 10 days ago
Reinforcement learning can sometimes produce emergent behavior with much simpler training setups than large-scale pre-training.
I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting.
Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images.
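For readers new to GRPO, here is a minimal sketch of its core idea: several answers are sampled for the same prompt, and each answer's reward is normalized against the rest of its group. This is not the training code from the experiment. The function name, the binary reward scheme, the group size of four, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper for illustration; eps avoids division by zero
    when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed rewards for four sampled answers to one visual math prompt
# (1.0 = correct final answer, 0.0 = incorrect). Illustrative values only.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that beat their group's average get a positive advantage and are reinforced, while below-average answers get a negative one. No separate value model is needed, which is part of why the setup stays simple.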
I wrote a short breakdown of the experiment here:
https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/
updated a model 18 days ago
hanzla/Qwen3.5-4B-mathvista-GRPO
published a model 18 days ago
hanzla/Qwen3.5-4B-mathvista-GRPO