AI & ML interests
Reinforcement Learning
Organizations
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v6-Train-NoKL
Updated
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-NoKL-Marg-NormAdv
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-NoKL-Marg
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-Marg-NormAdv
Updated
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v5-Train-Marg
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v4-Train
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v4-Train-NoKL
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v4-Train-ConstLR
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v3-AdamEps6
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v3-AdamEps8
8B • Updated • 1
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v3
8B • Updated
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4-v2
8B • Updated • 1
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast-4
8B • Updated • 3
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Fast
8B • Updated • 1
luckeciano/Qwen-2.5-7B-RL-Baseline
8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3-Eval
8B • Updated • 1
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv3
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-RL-AC-NoBaseline
Updated
luckeciano/Qwen-2.5-7B-RL-Baseline-BigLR
Updated
luckeciano/Qwen-2.5-7B-RL-AC-BigLRv2
Updated
luckeciano/Qwen-2.5-7B-RL-AC-BigLR
Updated
luckeciano/Qwen-2.5-7B-RL-Baseline-NoKL
Text Generation
• 8B • Updated • 4
luckeciano/Qwen-2.5-7B-RL-AC
Updated
luckeciano/DeepSeek-R1-Distill-Qwen-1.5B-RL-Baseline
Updated
luckeciano/Qwen-2.5-1.5B-RL-Baseline-Long
Updated
luckeciano/Qwen-2.5-7B-Embedding-Entropy-0.45-Missing-Response
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-DrGRPO-Baseline
Updated
luckeciano/Qwen-2.5-7B-Embedding-Entropy-0.5-Missing-Response
Updated
luckeciano/Qwen-2.5-7B-Len-Penalty-Baseline-v2
Text Generation
• 8B • Updated • 2
luckeciano/Qwen-2.5-7B-Embedding-Entropy-Missing-Response
Updated