Update README.md
README.md
CHANGED
```diff
@@ -4,18 +4,11 @@ datasets:
 - nvidia/Nemotron-Post-Training-Dataset-v1
 pipeline_tag: text-generation
 library_name: transformers
-base_model:
-- Qwen/Qwen3-Next-80B-A3B-Instruct
 ---
 **Model Introduction**
 
 EPT-ZeRo is an sLM designed by the Research Project ICT I team from Singapore Korean International School for on-device/edge environments, prioritizing lower memory usage and efficient inference.
-To achieve this, the
-EPT-ZeRo and its derivatives(i.g. EPT-I) are created by modifying
+To achieve this, the EPT series implements Rotary Positional Embeddings (RoPE), SwiGLU activation combined with causal-convolution-based FFNs, weight tying, and RMS layer normalization (RMSNorm), along with Multi-Head Latent Attention (MLA) for better expressive capability per parameter and a lower memory footprint.
+EPT-ZeRo and its derivatives (e.g. EPT-I) are created by modifying DeepSeek-V3's modeling code: converting the model into a dense model instead of a Mixture-of-Experts (MoE) model, reducing the total parameter count to the original model's number of active parameters, and adjusting the configuration to suit the new architecture.
 
-EPT-ZeRo is the prototype of the EPT family, which is the base model that was only pretrained and did not undergo post-training including SFT and Alignment.
-
-**Caution**
-
-Note that the EPT series may not support conventional optimization kernels such as FlashAttention, because it implements Power Retention instead of Scaled Dot-Product Attention.
-Therefore, users should not pass the `attn_implementation` parameter when loading the model with `AutoModelForCausalLM`. Though not tested, using FlashAttention or SDPA to load ISAC may cause an error.
+EPT-ZeRo is the prototype of the EPT family: the base model that was only pretrained and did not undergo post-training, including SFT and alignment.
```
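Two of the standard components the updated card names, RMSNorm and the SwiGLU feed-forward block, can be sketched in a few lines. This is an illustrative NumPy sketch with assumed shapes, not the EPT modeling code.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the reciprocal root-mean-square of the features;
    # unlike LayerNorm there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU(x @ W_gate) element-wise gates (x @ W_up),
    # then W_down projects back to the model dimension.
    silu = lambda z: z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16          # toy sizes, not EPT-ZeRo's
x = rng.standard_normal((2, d_model))
h = rms_norm(x, np.ones(d_model))
y = swiglu_ffn(h,
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_model, d_ff)),
               rng.standard_normal((d_ff, d_model)))
print(y.shape)  # (2, 8)
```

In a real decoder layer these would sit around the attention block, with the normalization applied pre-attention and pre-FFN.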
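The MoE-to-dense conversion the card describes amounts to dropping the expert-routing machinery from the configuration. In this sketch the field names mirror DeepSeek-V3's `config.json` (`n_routed_experts`, `num_experts_per_tok`, `moe_intermediate_size`, `first_k_dense_replace`), but the numeric values and the `densify` helper are illustrative assumptions, not EPT-ZeRo's real settings.

```python
# Toy DeepSeek-V3-style config; values are illustrative only.
moe_config = {
    "hidden_size": 1024,
    "intermediate_size": 4096,      # dense FFN width
    "moe_intermediate_size": 512,   # per-expert FFN width
    "n_routed_experts": 64,
    "num_experts_per_tok": 6,
    "first_k_dense_replace": 1,     # layers that are dense even in the MoE
}

def densify(cfg):
    # Remove every MoE routing field so all layers take the dense FFN path;
    # total parameters then shrink toward the MoE's active-parameter count.
    moe_keys = ("moe_intermediate_size", "n_routed_experts",
                "num_experts_per_tok", "first_k_dense_replace")
    return {k: v for k, v in cfg.items() if k not in moe_keys}

dense_config = densify(moe_config)
```

The modeling code would need the matching change: a single SwiGLU FFN per layer instead of the router plus expert bank.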
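The caution removed in this commit advised against forcing an attention kernel, since the EPT series implements Power Retention rather than scaled dot-product attention. A minimal sketch of that advice follows; the repo id is a hypothetical placeholder, as the card does not give one.

```python
# Do NOT pass attn_implementation ("flash_attention_2" or "sdpa") when
# loading: models with a custom attention mechanism may error if a
# kernel is forced. The kwargs below deliberately omit it.
# "EPT-org/EPT-ZeRo" is a hypothetical placeholder repo id.

def load_kwargs():
    # Keyword arguments for from_pretrained; no "attn_implementation" key.
    return {"trust_remote_code": True}

# Usage (requires the transformers library and network access):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model = AutoModelForCausalLM.from_pretrained("EPT-org/EPT-ZeRo", **load_kwargs())
# tokenizer = AutoTokenizer.from_pretrained("EPT-org/EPT-ZeRo", **load_kwargs())
```

`trust_remote_code=True` is needed because the custom modeling code ships with the checkpoint rather than with the transformers library.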