Active filters: architecture-search
phanerozoic/segmentation-heads
kshitijthakkar/moe-312m-114m-16x2-12L-baseline-390m
kshitijthakkar/moe-225m-108m-12x2-10L-baseline-330m
kshitijthakkar/moe-99m-70m-8x2-8L-tiny-200m-8exp
kshitijthakkar/moe-141m-89m-8x2-10L-small-250m-8exp
kshitijthakkar/moe-202m-104m-12x2-10L-medium-300m-12exp
kshitijthakkar/moe-241m-111m-12x2-12L-balanced-350m-12exp
kshitijthakkar/moe-353m-130m-16x2-12L-large-400m-16exp
kshitijthakkar/moe-415m-147m-16x2-12L-xlarge-450m-16exp
kshitijthakkar/moe-161m-123m-4x2-12L-4exp-large-experts
kshitijthakkar/moe-198m-114m-8x2-12L-8exp-balanced
kshitijthakkar/moe-340m-107m-24x2-12L-24exp-specialized
kshitijthakkar/moe-350m-102m-16x1-12L-top1-routing
kshitijthakkar/moe-274m-132m-16x4-12L-top4-routing
kshitijthakkar/moe-240m-103m-12x2-16L-deep-narrow-16l
kshitijthakkar/moe-270m-132m-12x2-8L-shallow-wide-8l
kshitijthakkar/moe-229m-111m-12x2-10L-full-attention-no-gqa
kshitijthakkar/moe-284m-119m-12x2-14L-aggressive-gqa-1kv
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-06
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-05
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr3e-05
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-05
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-04
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr2e-04
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr3e-04
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr5e-04
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-lr1e-03
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-bs2-ctx512
kshitijthakkar/moe-255m-114m-12x2-12L-full-attention-no-gqa-bs2-ctx1024
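The MoE checkpoint names above appear to follow a consistent scheme: `moe-<total params>-<active params>-<experts>x<top-k>-<layers>L-<tag>` (e.g. `moe-350m-102m-16x1-12L-top1-routing` would be a 350M-parameter model with 102M active parameters, 16 experts, top-1 routing, and 12 layers). This is a hedged sketch of a parser for that scheme; the field meanings are inferred from the names alone, not confirmed by the model cards.

```python
import re

# Assumed naming scheme, inferred from the list above:
#   moe-<total>m-<active>m-<experts>x<topk>-<layers>L-<tag>
PATTERN = re.compile(
    r"^moe-(?P<total>\d+m)-(?P<active>\d+m)-"
    r"(?P<experts>\d+)x(?P<topk>\d+)-(?P<layers>\d+)L-(?P<tag>.+)$"
)

def parse_moe_name(name: str) -> dict:
    """Split a checkpoint name into its apparent config fields."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unrecognized name: {name}")
    return {
        "total_params": m["total"],      # e.g. "350m" (total parameters)
        "active_params": m["active"],    # e.g. "102m" (active per token)
        "num_experts": int(m["experts"]),
        "top_k": int(m["topk"]),         # experts routed per token
        "num_layers": int(m["layers"]),
        "tag": m["tag"],                 # free-form variant label
    }

print(parse_moe_name("moe-350m-102m-16x1-12L-top1-routing"))
```

Grouping parsed names by `tag` prefix makes the sweep structure visible: a size sweep, an expert-count sweep, a routing top-k sweep, depth/width variants, attention variants, and a learning-rate sweep over the `255m-114m-12x2-12L-full-attention-no-gqa` base configuration.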