Model Introduction
EPT-ZeRo is a small language model (SLM) designed by the Research Project ICT I team at Singapore Korean International School for on-device/edge environments, prioritizing low memory usage and efficient inference.
To achieve this, the EPT series implements Rotary Positional Embeddings (RoPE), SwiGLU activation combined with causal-convolution-based FFNs, weight tying, and RMS layer normalization (RMSNorm), along with Multi-Head Latent Attention (MLA) for greater expressive capability per parameter and a lower memory footprint.
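As a rough illustration of two of the components above, here is a minimal NumPy sketch of RMSNorm and a SwiGLU FFN. This is not EPT-ZeRo's actual code; the function names, shapes, and the toy dimensions are assumptions for demonstration only.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square; no mean subtraction.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: SiLU(x @ W_gate) gates (x @ W_up) elementwise, then projects down.
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU / swish activation
    return (silu * (x @ w_up)) @ w_down

# Toy sizes; real model dimensions would differ.
rng = np.random.default_rng(0)
d, d_ff = 8, 16
x = rng.standard_normal((2, d))
out = swiglu_ffn(rms_norm(x, np.ones(d)),
                 rng.standard_normal((d, d_ff)),
                 rng.standard_normal((d, d_ff)),
                 rng.standard_normal((d_ff, d)))
print(out.shape)  # (2, 8)
```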
EPT-ZeRo and its derivatives (e.g. EPT-I) were created by modifying DeepSeek-V3's modeling code: converting the model from a Mixture-of-Experts (MoE) model into a dense model, reducing the total parameter count to match the original model's active parameter count, and adjusting the configuration to suit the model's new architecture.
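The parameter-count relationship can be sketched with toy arithmetic. The numbers below are purely illustrative assumptions, not EPT-ZeRo's or DeepSeek-V3's real sizes:

```python
# Toy parameter accounting (all numbers are illustrative assumptions).
n_experts = 64            # experts stored in the hypothetical MoE FFN
experts_per_tok = 4       # experts routed per token (i.e. "active")
expert_params = 1_000_000 # parameters per expert

total_moe = n_experts * expert_params         # all experts must be stored
active_moe = experts_per_tok * expert_params  # parameters used per token
dense_ffn = active_moe    # dense conversion: size the FFN to the active count

print(total_moe, active_moe, dense_ffn)  # 64000000 4000000 4000000
```

This is why the dense conversion shrinks total parameters while keeping per-token compute comparable: only the routed (active) parameters did work per token in the MoE model.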
EPT-ZeRo is the prototype of the EPT family: a base model that was only pretrained and did not undergo post-training such as SFT or alignment.