Dataset generation and curation
Model quantization, pruning, and distillation
TinyML and edge AI deployment
Co-design of hardware (FPGA/ASIC) and ML pipelines
Lightweight LLMs and domain-specific fine-tuning
Inspired by the heroes of day-zero quants (@TheBloke, @danielhanchen, @shimmyshimmer, @bartowski), I decided to join the race by releasing the first FP8 quant of glm-4.7-flash! It wasn't as easy as I expected, but I'm happy I still got it working within a few hours of the original model's release! Feedback is welcome if anyone wants to try it out!
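For anyone curious what FP8 quantization actually does to the weights, here is a minimal pure-Python sketch of per-tensor FP8 (E4M3) rounding. This is an illustration only, with assumed simple round-to-nearest and saturation at the format's max; real day-zero quants use library kernels and calibrated scales, not this loop.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value (fn variant, no infinities)

def quantize_e4m3(x: float) -> float:
    """Round a float to the nearest representable FP8 E4M3 value (sketch)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    e = max(math.floor(math.log2(mag)), -6)  # exponent floor -> subnormals
    step = 2.0 ** (e - 3)            # 3 mantissa bits => 8 steps per binade
    return sign * round(mag / step) * step

def quantize_tensor(weights):
    """Per-tensor scaling: map max|w| onto the FP8 range, round, rescale."""
    scale = max(abs(w) for w in weights) / FP8_E4M3_MAX
    return [quantize_e4m3(w / scale) * scale for w in weights]
```

The scale factor is what makes per-tensor FP8 work in practice: the raw E4M3 grid is coarse, so each tensor is stretched to fill the format's dynamic range before rounding, and the scale is stored alongside the quantized weights.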