
hanzlajavaid PRO

hanzla


AI & ML interests

Direct Preference Optimization, Supervised Finetuning, Stable Diffusion

Recent Activity

posted an update 2 months ago

Reinforcement learning can sometimes produce emergent behavior from much simpler training setups than large-scale pre-training. I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting. Hypothesis: improving visual mathematical reasoning may also improve the model's ability to transcribe LaTeX from images. I wrote a short breakdown of the experiment here: https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/

updated a model 2 months ago

hanzla/Qwen3.5-4B-mathvista-GRPO

published a model 2 months ago

hanzla/Qwen3.5-4B-mathvista-GRPO



upvoted a paper over 1 year ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146 upvotes

upvoted a paper almost 2 years ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29, 2024 • 53 upvotes