Alexander Gurung PRO

agurung · alex-gurung

AI & ML interests

None yet

Recent Activity

updated a model about 8 hours ago

agurung/cobalt-ft-qwen3-4b-dpo-mixed-12-mc-correct-only-v1-lora-r128-a32-lr5e-6-const-lr5e-6-gb64-ep2-be

published a model about 9 hours ago

agurung/cobalt-ft-qwen3-4b-dpo-mixed-12-mc-correct-only-v1-lora-r128-a32-lr5e-6-const-lr5e-6-gb64-ep2-be

published a model 4 days ago

agurung/cobalt-ft-qwen3-4b-dpo-mixed-12-mc-correct-v1-lora-r128-a32-lr5e-6-const-lr5e-6-gb64-ep2-beta0p1

Organizations

agurung's models 103

agurung/flawed-fictions-qwen3-4b

Reinforcement Learning • 4B • Updated Mar 21 • 1

agurung/colar-qwen25-7b-ff-post-sft

8B • Updated Mar 15

agurung/qwen-coconut-ff-v2

8B • Updated Mar 15 • 1

agurung/ncp-qwen25-7b-lengthpenalty

Reinforcement Learning • 8B • Updated Mar 11

agurung/flawed-fictions-qwen3-4b-lengthpenalty-litereason

Reinforcement Learning • 4B • Updated Mar 10

agurung/colar-qwen3-4b-ff-sft

4B • Updated Mar 9

agurung/flawed-fictions-gemma-3-4b-lengthpenalty

Reinforcement Learning • 4B • Updated Feb 25 • 1

agurung/flawed-fictions-qwen3-4b-lengthpenalty

Reinforcement Learning • 4B • Updated Feb 24 • 3

agurung/qwen3-4b-ff-grpo-lengthpenalty

4B • Updated Feb 24 • 2

agurung/colar-ff-qwen3-4b

4B • Updated Feb 23 • 1

agurung/flawed-fictions-qwen25-7b-lengthpenalty-litereason

Reinforcement Learning • 8B • Updated Feb 22 • 4

agurung/flawed-fictions-qwen25-7b-lengthpenalty

Reinforcement Learning • 8B • Updated Feb 20 • 2

agurung/flawed-fictions-olmo-3-7b

Reinforcement Learning • 7B • Updated Feb 16 • 2

agurung/flawed-fictions-gemma-3-4b

Reinforcement Learning • 4B • Updated Feb 15 • 5

agurung/qwen3-4b-lcb-dapo-correctness

agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo

Text Generation • 8B • Updated Feb 7 • 11

agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e3_bptt_offset

Text Generation • 8B • Updated Feb 5 • 2

agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated Feb 5 • 3

agurung/v3sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated Feb 5 • 3

agurung/v1ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated Feb 5 • 2

agurung/v2ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset

Text Generation • 8B • Updated Feb 5 • 3

agurung/v3ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset_newprompt

Text Generation • 8B • Updated Feb 5 • 9

agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo-nosft

Text Generation • 8B • Updated Feb 5 • 2

agurung/olmo3-7b-lcb-mc-nosum-gspo

agurung/olmo3-7b-lcb-mc-gspo

agurung/olmo3-7b-lcb-standard-rl-gspo

agurung/olmo3-7b-lcb-mc-rl

agurung/olmo3-7b-lcb-standard-rl

Reinforcement Learning • Updated Jan 11

agurung/qwen34b-context-kd

Text Generation • 4B • Updated Dec 13, 2025 • 1

agurung/Qwen2.5-7B-Instruct-flawedfiction-grpo-impdata

Text Generation • 8B • Updated Oct 29, 2025 • 3