
Model Introduction

EPT-ZeRo is a small language model (sLM) designed by the Research Project ICT I team at Singapore Korean International School for on-device/edge environments, prioritizing low memory usage and efficient inference. To achieve this, the EPT series implements Rotary Positional Embeddings (RoPE), SwiGLU activation combined with causal-convolution-based FFNs, weight tying, and RMS Layer Normalization, along with Multi-Head Latent Attention (MLA) for greater expressive capability per parameter and a lower memory footprint.
EPT-ZeRo and its derivatives (e.g. EPT-I) were created by modifying DeepSeek-V3's modeling code: converting the model into a dense model instead of a Mixture of Experts (MoE) model, reducing the total parameter count to match the original model's active parameter count, and adjusting the configuration to suit the new architecture.
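As a rough illustration of two of the components named above, here is a minimal NumPy sketch of RMS Layer Normalization and a SwiGLU feed-forward block. The dimensions and weights are arbitrary placeholders, not EPT-ZeRo's actual configuration, and the real model applies these inside full transformer layers.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMS Layer Normalization: rescale by the root-mean-square of x with a
    # learned per-dimension gain (no mean subtraction, no bias term).
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: a SiLU-gated linear unit, then a down projection.
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU (a.k.a. swish)
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16          # toy sizes for illustration only
x = rng.standard_normal((2, d_model))
h = rms_norm(x, gain=np.ones(d_model))
y = swiglu_ffn(h,
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_ff, d_model)))
print(y.shape)  # (2, 8): the FFN preserves the model dimension
```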
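The MoE-to-dense conversion can be pictured as a configuration change along the following lines. This is a hypothetical sketch: the field names follow DeepSeek-V3's config format, but the values and the sizing rule shown here are illustrative assumptions, not EPT-ZeRo's actual settings.

```python
# Hypothetical sketch: drop the MoE routing fields from a DeepSeek-V3-style
# config and size one dense FFN to the experts that were active per token.
# Field names mimic DeepSeek-V3's config; the values are placeholders.
moe_config = {
    "hidden_size": 2048,
    "moe_intermediate_size": 1408,  # per-expert FFN width
    "n_routed_experts": 64,
    "num_experts_per_tok": 6,
}

# Remove MoE-specific fields to obtain a dense-model config.
dense_config = {k: v for k, v in moe_config.items()
                if not k.startswith(("moe_", "n_routed", "num_experts"))}

# One dense FFN roughly matching the active (per-token) expert capacity.
dense_config["intermediate_size"] = (
    moe_config["moe_intermediate_size"] * moe_config["num_experts_per_tok"]
)
print(dense_config)  # {'hidden_size': 2048, 'intermediate_size': 8448}
```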

EPT-ZeRo is the prototype of the EPT family: a base model that was only pretrained and has not undergone post-training such as SFT or alignment.
