Quite an interesting find, and very similar to what this team reports: https://www.reddit.com/r/LocalLLaMA/comments/1l44lw8/sparse_transformers_run_2x_faster_llm_with_30/
llama-quantize now supports layer pruning via the --prune-layers flag!