jayzou3773/qwen3_5-moe-neuron_structure_drop-p50-s1k-128samples-thinking-sft 19B • Updated 2 days ago • 20
jayzou3773/qwen3_5-moe-expert_drop-pure_gradient_pruning-r128-s1k-128samples-thinking-sft 19B • Updated 2 days ago • 11
jayzou3773/qwen3_5-moe-expert_drop-pure_expert_gradient_pruning-r128-s1k-128samples-thinking-sft 19B • Updated 2 days ago • 12
jayzou3773/qwen3_5-moe-expert_drop-layerwise_pruning-r128-s1k-128samples-thinking-sft 19B • Updated 2 days ago • 12
jayzou3773/qwen3_5-moe-expert_drop-bias_pruning-r128-s1k-128samples-thinking-sft 19B • Updated 2 days ago • 12
jayzou3773/qwen3-moe-neuron_structure_drop-p50-s1k-128samples-thinking-sft 211k • Updated 2 days ago • 10
jayzou3773/qwen3-moe-expert_drop-pure_gradient_pruning-r64-s1k-128samples-thinking-sft 211k • Updated 2 days ago • 11
jayzou3773/qwen3-moe-expert_drop-pure_expert_gradient_pruning-r64-s1k-128samples-thinking-sft 211k • Updated 2 days ago • 12