---
language:
- vi
- en
license: apache-2.0
library_name: transformers
tags:
- moe
- mixture-of-experts
- text-generation
- decode-series
- llm
- vietnamese-llm
datasets:
- markov-ai/computer-use-large
metrics:
- loss
- perplexity
model-index:
- name: Decode-12B-MoE
  results: []
---

# Decode-12B-MoE: High-Performance Mixture of Experts Model
|

**Decode-12B-MoE** is a Large Language Model (LLM) built on a **Sparse Mixture of Experts (MoE)** architecture with **12.5 billion total parameters**. The model is designed to bridge the gap between a large parameter count and computational efficiency: only a fraction of its weights (~2.5B parameters) is activated per token during inference.

**Note: this is an untrained model.**
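
To illustrate how a sparse MoE layer activates only a subset of its weights, here is a minimal, self-contained sketch of top-k expert routing. The expert count, top-k value, and layer sizes below are illustrative placeholders, not the actual Decode-12B-MoE configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k MoE feed-forward layer (not the actual Decode-12B-MoE code)."""

    def __init__(self, hidden_size=512, ffn_size=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        # The router scores every token against every expert, but only the
        # top-k experts per token are actually executed.
        gate_logits = self.router(x)
        weights, expert_ids = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

This top-k routing pattern is what keeps the number of active parameters per token (~2.5B here) far below the total parameter count (12.5B).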

## Technical Specifications

| Attribute | Value |
| :--- | :--- |
| **Total Parameters** | 12,500,340,736 (12.5B) |
| **Active Parameters** | ~2.5B per token |
| **Architecture** | Sparse MoE (Decoder-only) |
| **Context Window** | 4096 tokens |
| **Format** | Bfloat16 / Float16 |
| **Training Hardware** | NVIDIA Tesla T4 (Prototyping) / [Your_Main_GPU] |
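
As a sanity check against the table above, the total parameter count can be read off the checkpoint by summing every parameter tensor. This is a minimal sketch: the repo ID is a placeholder, and loading the full checkpoint in bfloat16 requires roughly 25 GB of memory (12.5B parameters × 2 bytes).

```python
from transformers import AutoModelForCausalLM
import torch

# Placeholder repo ID; replace with the actual Hugging Face repo.
model_id = "your-username/decode-12b-moe"

# Note: this loads the full checkpoint into memory.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Sums every parameter tensor; for an MoE model this is the *total*
# count (12.5B), not the ~2.5B active per token.
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")
```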

## Training Methodology

The model was trained with memory-optimization techniques that keep training stable on both consumer and enterprise-grade hardware (a minimal configuration sketch follows this list):
- **8-bit optimizer:** `bitsandbytes` AdamW, reducing the optimizer-state memory footprint by 75%.
- **Gradient checkpointing:** enabled to manage activation memory for the deep MoE layers.
- **Dataset:** fine-tuned on a diverse corpus of Vietnamese and English text, focusing on reasoning, logic, and natural conversation.
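
The sketch below shows how these two techniques are typically wired up with `transformers` and `bitsandbytes`. The repo ID, learning rate, and training loop are illustrative assumptions, not the actual training script.

```python
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

# Placeholder repo ID; replace with the actual checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/decode-12b-moe",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Gradient checkpointing: recompute activations during the backward pass
# instead of storing them, trading compute for memory.
model.gradient_checkpointing_enable()

# 8-bit AdamW from bitsandbytes: optimizer states are stored in 8 bits
# instead of 32, cutting their memory footprint by ~75%.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)

# Illustrative training step (batch preparation omitted):
# outputs = model(**batch)
# outputs.loss.backward()
# optimizer.step()
# optimizer.zero_grad()
```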

## Quick Start (Usage)

To use this model, make sure `transformers` and `accelerate` are installed (e.g. `pip install transformers accelerate`).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Replace with your actual Hugging Face repo ID
model_id = "your-username/decode-12b-moe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Required for custom MoE architectures
)

# Test prompt
prompt = "Explain the concept of Quantum Computing in simple terms."
# Send the inputs to the same device the model was dispatched to.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```