Text Generation
Transformers
Safetensors
qwen3_next
conversational
Entity-27th committed
Commit 4decc35 · verified · 1 Parent(s): b143b2f

Update README.md

Files changed (1)
  1. README.md +3 -10
README.md CHANGED
@@ -4,18 +4,11 @@ datasets:
 - nvidia/Nemotron-Post-Training-Dataset-v1
 pipeline_tag: text-generation
 library_name: transformers
-base_model:
-- Qwen/Qwen3-Next-80B-A3B-Instruct
 ---
 **Model Introduction**
 
 EPT-ZeRo is an sLM designed by the Research Project ICT I team from Singapore Korean International School for on-device/edge environments, prioritizing lower memory usage and efficient inference.
-To achieve this, the ISAC series implements Rotary Positional Embeddings (RoPE), SwiGLU activation, weight tying, and RMS layer normalization, along with Gated DeltaNet and Gated Grouped Query Power Retention (GGQRP) via token routing for better expressive capability per parameter and a lower memory footprint.
-EPT-ZeRo and its derivatives (e.g. EPT-I) are created by modifying Qwen3-Next-80B-A3B's modeling code: replacing full scaled dot-product attention with Power Retention, converting the model into a dense model instead of a Mixture of Experts (MoE) model, reducing the total parameters to the original model's active-parameter count, and modifying the configuration to suit the new architecture.
+To achieve this, the EPT series implements Rotary Positional Embeddings (RoPE), SwiGLU activation combined with causal-convolution-based FFNs, weight tying, and RMS layer normalization, along with Multi-Head Latent Attention (MLA) for better expressive capability per parameter and a lower memory footprint.
+EPT-ZeRo and its derivatives (e.g. EPT-I) are created by modifying DeepSeek-V3's modeling code: converting the model into a dense model instead of a Mixture of Experts (MoE) model, reducing the total parameters to the original model's active-parameter count, and modifying the configuration to suit the new architecture.
 
-EPT-ZeRo is the prototype of the EPT family: the base model that was only pretrained and did not undergo post-training, including SFT and alignment.
-
-**Caution**
-
-Note that the EPT series may not support conventional optimization kernels such as FlashAttention, because it implements Power Retention instead of scaled dot-product attention.
-Therefore, users should not pass the `attn_implementation` parameter when loading the model with `AutoModelForCausalLM`. Though untested, loading ISAC with FlashAttention or SDPA may cause an error.
+EPT-ZeRo is the prototype of the EPT family: the base model that was only pretrained and did not undergo post-training, including SFT and alignment.
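The caution removed in this commit warns users not to pass `attn_implementation` when loading EPT models with `AutoModelForCausalLM`, since the series does not use standard scaled dot-product attention. A minimal sketch of a kwargs guard under that constraint — the helper name `ept_load_kwargs` and the hub id in the usage comment are illustrative assumptions, not part of the repository:

```python
# Kwargs the EPT series reportedly does not support when loading.
UNSUPPORTED_KWARGS = {"attn_implementation"}

def ept_load_kwargs(**kwargs):
    """Return kwargs safe to forward to from_pretrained, rejecting
    attention-backend overrides (hypothetical helper, not in the repo)."""
    bad = UNSUPPORTED_KWARGS & kwargs.keys()
    if bad:
        raise ValueError(f"EPT models may not support: {sorted(bad)}")
    return kwargs

# Usage sketch (downloads weights, so not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "Entity-27th/EPT-ZeRo",   # hypothetical hub id
#     trust_remote_code=True,   # custom modeling code
#     **ept_load_kwargs(torch_dtype="auto"),
# )
```

Rejecting the keyword early gives a clear error instead of a possible failure deep inside the attention kernels the warning describes.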