Minh2508/Decode

	---
	language:
	- vi
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- moe
	- mixture-of-experts
	- text-generation
	- decode-series
	- llm
	- vietnamese-llm
	datasets:
	- markov-ai/computer-use-large
	metrics:
	- loss
	- perplexity
	model-index:
	- name: Decode-12B-MoE
	  results: []
	---

	# 🚀 Decode-12B-MoE: High-Performance Mixture of Experts Model

	Decode-12B-MoE is a large language model (LLM) built on a sparse Mixture of Experts (MoE) architecture with 12.5 billion total parameters. It is designed to combine a large parameter count with low inference cost: only about 2.5B parameters are active for each token.
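
The sparse activation described above can be illustrated with a generic top-k gating sketch. This card does not document the router, so the number of experts, `k=2`, and the softmax over the selected logits are assumptions made for illustration, not the model's confirmed design. `top_k_route`, `moe_forward`, and the toy experts are hypothetical names.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum over the selected experts; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Four toy experts (hypothetical scalar functions standing in for feed-forward blocks).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.1, 2.0, 2.0, -1.0]

routes = top_k_route(logits, k=2)
output = moe_forward(3.0, experts, logits, k=2)
print(routes, output)
```

Only the experts chosen by the router are evaluated for a given token, which is why the per-token compute tracks the active parameter count rather than the total.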

	## 📌 Technical Specifications

	| Attribute | Value |
	| :--- | :--- |
	| Total Parameters | 12,500,340,736 (12.5B) |
	| Active Parameters | ~2.5B per token |
	| Architecture | Sparse MoE (Decoder-only) |
	| Context Window | 4096 tokens |
	| Format | Bfloat16 / Float16 |
	| Training Hardware | NVIDIA Tesla T4 (Prototyping) / [Your_Main_GPU] |
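
As a rough sanity check on these figures, the sketch below derives the active fraction and the weight memory from the table. It assumes 2 bytes per parameter (true for both Bfloat16 and Float16) and counts weights only, ignoring the KV cache, activations, and framework overhead.

```python
TOTAL_PARAMS = 12_500_340_736   # total parameters, from the table
ACTIVE_PARAMS = 2.5e9           # approximate active parameters per token
BYTES_PER_PARAM = 2             # bfloat16 and float16 are both 16-bit

# Share of the weights that participate in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Memory needed just to hold the weights.
weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9
weight_gib = weight_bytes / 2**30

print(f"active fraction: {active_fraction:.1%}")
print(f"weights: {weight_gb:.1f} GB ({weight_gib:.1f} GiB)")
```

Although only about a fifth of the parameters are used per token, all experts generally need to be resident in memory (or offloaded) for inference, so the memory requirement follows the total parameter count.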

	## 🛠 Training Methodology

	The model was trained with memory optimizations that keep training stable on consumer and enterprise-grade hardware:
	- 8-bit Optimizer: Used the `bitsandbytes` 8-bit AdamW optimizer, which cuts optimizer-state memory by about 75% relative to 32-bit states.
	- Gradient Checkpointing: Enabled to reduce activation memory in the deep MoE layers by recomputing activations during the backward pass.
	- Dataset: Fine-tuned on a diverse corpus of Vietnamese and English text, focusing on reasoning, logic, and natural conversation.
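
The 75% figure for the 8-bit optimizer follows from simple arithmetic, sketched below. It assumes that every parameter is trained, that AdamW keeps two state tensors per parameter, and that the small per-block quantization constants used by 8-bit optimizers can be ignored. `optimizer_state_bytes` is a hypothetical helper, not part of any library.

```python
TOTAL_PARAMS = 12_500_340_736
ADAM_STATES_PER_PARAM = 2  # first and second moment estimates

def optimizer_state_bytes(n_params, bytes_per_state):
    """Memory taken by AdamW's moment estimates at a given precision."""
    return n_params * ADAM_STATES_PER_PARAM * bytes_per_state

fp32_bytes = optimizer_state_bytes(TOTAL_PARAMS, 4)  # 32-bit states
int8_bytes = optimizer_state_bytes(TOTAL_PARAMS, 1)  # 8-bit states
reduction = 1 - int8_bytes / fp32_bytes

print(f"32-bit states: {fp32_bytes / 1e9:.1f} GB")
print(f"8-bit states:  {int8_bytes / 1e9:.1f} GB")
print(f"reduction:     {reduction:.0%}")
```

In practice these two techniques are usually enabled through `bitsandbytes.optim.AdamW8bit` and `model.gradient_checkpointing_enable()` in `transformers`. Whether this model's training script used exactly those entry points is not stated in the card.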

	## 💻 Quick Start (Usage)

	To use this model, install `torch`, `transformers`, and `accelerate` (required for `device_map="auto"`).

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Replace with your actual Hugging Face repo ID
	model_id = "your-username/decode-12b-moe"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",  # let accelerate place the weights on available devices
	    trust_remote_code=True,  # Required for custom MoE architectures
	)

	# Test prompt; move the inputs to the device that holds the model's first layer
	prompt = "Explain the concept of Quantum Computing in simple terms."
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	# Sample up to 512 new tokens without tracking gradients
	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=512,
	        temperature=0.7,
	        top_p=0.9,
	        do_sample=True,
	    )

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))