Deeplm: 108M BitNet MoE Language Model
Deeplm is a language model of about 105M parameters that applies BitNet b1.58 ternary quantization from the start, with an architecture inspired by DeepSeek V4, Kimi K2.6, and MiniMax M2.7.
Architecture
| Component | Detail |
|---|---|
| Total Parameters | ~104.7M |
| Architecture | Decoder-only Transformer |
| Layers | 10 |
| Hidden Size | 512 |
| Vocab Size | 32,000 (BPETokenizer) |
| Max Seq Length | 4,096 |
| Attention Heads | 8 (MQA, 1 KV head) |
| Quantization | BitNet b1.58 ternary {-1, 0, +1}, absmean |
| Dtype | float32 (weights quantized to ternary values) |
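The "absmean" entry in the table refers to the scaling rule used by BitNet b1.58: divide the weights by their mean absolute value, round, and clip to {-1, 0, +1}. The following is a minimal sketch of that rule in plain Python. The function names and the per-matrix scale are assumptions for illustration, not code from the Deeplm repository.

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a 2D list of floats to {-1, 0, +1} with one absmean scale.

    Illustrative only: the per-matrix scale and the eps value are assumptions.
    """
    flat = [w for row in weights for w in row]
    # Scale is the mean absolute value over the whole matrix.
    scale = sum(abs(w) for w in flat) / len(flat)
    quantized = [
        # Round to the nearest integer, then clip into [-1, 1].
        [max(-1, min(1, round(w / (scale + eps)))) for w in row]
        for row in weights
    ]
    return quantized, scale


def dequantize(quantized, scale):
    """Map ternary values back to floats by multiplying with the scale."""
    return [[q * scale for q in row] for row in quantized]


weights = [[0.9, -0.05, 0.4], [-1.2, 0.02, -0.3]]
q, s = absmean_ternary_quantize(weights)
print(q, round(s, 4))
```

Small weights collapse to 0 and large ones saturate at ±1, so the stored float32 tensor holds only three distinct values times the scale.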
Innovative Features
| Feature | Source | Description |
|---|---|---|
| MLA | DeepSeek V4 | Multi-head Latent Attention, KV cache compression 24x |
| MoE | DeepSeek V4 + Kimi K2.6 | 4 routed + 1 shared expert, top-k=2 |
| Hybrid Attention | MiniMax M2.7 | Softmax + Lightning v2 linear attention |
| Hyper-Connections | DeepSeek V4 | Sinkhorn routing, replaces the standard residual connection |
| MTP | DeepSeek V4 | Multi-Token Prediction, depth=2 |
| BitNet b1.58 | BitNet | Ternary quantization {-1, 0, +1} from initialization |
| AutoTuner | Deeplm | Adaptive LR, GN, WD, momentum, revive, trajectory prediction |
| Curriculum Router | Deeplm | Phase-based category weighting |
| Self-Evolution | MiniMax M2.7 | Autonomous hypothesis → experiment → decision loop |
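The MoE row (4 routed experts, 1 shared expert, top-k=2) can be illustrated with a toy router for a single token. This is a sketch under stated assumptions: softmax gating, renormalization of the selected gate weights, and scalar "experts" are all hypothetical choices made for readability and may differ from what `moe.py` does.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, routed_experts, shared_expert, top_k=2):
    """Combine the shared expert with the top-k routed experts for one token."""
    probs = softmax(router_logits)
    # Indices of the k routed experts with the highest gate probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: selected gate weights are renormalized to sum to 1.
    norm = sum(probs[i] for i in top)
    out = shared_expert(x)  # the shared expert runs for every token
    for i in top:
        out += (probs[i] / norm) * routed_experts[i](x)
    return out, top


# Toy scalar experts: routed expert i multiplies its input by (i + 1).
routed = [lambda x, k=i + 1: k * x for i in range(4)]
shared = lambda x: 0.5 * x
y, chosen = moe_forward(2.0, [0.1, 2.0, -1.0, 1.0], routed, shared)
print(chosen, round(y, 4))
```

With these settings each token passes through 3 of the 5 experts: the shared one plus the two selected by the router.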
Model Specifications
{
"architectures": ["DeeplmModel"],
"model_type": "deeplm",
"vocab_size": 32000,
"hidden_size": 512,
"intermediate_size": 2048,
"num_hidden_layers": 10,
"num_attention_heads": 8,
"num_key_value_heads": 1,
"max_position_embeddings": 4096,
"rms_norm_eps": 1e-06,
"rope_theta": 50000.0,
"rope_dim": 64,
"tie_word_embeddings": true,
"num_routed_experts": 4,
"num_shared_experts": 1,
"expert_topk": 2,
"q_lora_rank": 192,
"kv_lora_rank": 64,
"qk_rope_head_dim": 64,
"qk_nope_head_dim": 64,
"v_head_dim": 128,
"mtp_depth": 2,
"mtp_num_layers": 2,
"bitnet_quantized": true,
"bitnet_scale": "absmean"
}
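A few quantities follow directly from this config. The sketch below parses a verbatim subset of it and derives them. The consistency checks are hypothetical sanity checks chosen for illustration and are not taken from the Deeplm source.

```python
import json

# Subset of the config shown above, copied verbatim.
CONFIG_JSON = """
{
  "hidden_size": 512,
  "num_attention_heads": 8,
  "num_key_value_heads": 1,
  "rope_dim": 64,
  "num_routed_experts": 4,
  "num_shared_experts": 1,
  "expert_topk": 2,
  "qk_rope_head_dim": 64,
  "qk_nope_head_dim": 64,
  "v_head_dim": 128
}
"""


def summarize(cfg):
    """Derive a few head and expert counts from the config dictionary."""
    # Hypothetical sanity checks, not part of the Deeplm code base.
    assert cfg["num_attention_heads"] % cfg["num_key_value_heads"] == 0
    assert cfg["expert_topk"] <= cfg["num_routed_experts"]
    assert cfg["rope_dim"] == cfg["qk_rope_head_dim"]
    return {
        # Each query/key head concatenates a non-rotary and a rotary part.
        "qk_head_dim": cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"],
        "query_heads_per_kv_head": cfg["num_attention_heads"] // cfg["num_key_value_heads"],
        "active_experts_per_token": cfg["expert_topk"] + cfg["num_shared_experts"],
        "total_experts": cfg["num_routed_experts"] + cfg["num_shared_experts"],
    }


summary = summarize(json.loads(CONFIG_JSON))
print(summary)
```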
Usage
Inference
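import sys

import torch
from safetensors.torch import load_file

# Make the repository's deeplm package importable (adjust the path to your checkout)
sys.path.insert(0, "deeplm")
from deeplm.config import DeeplmConfig
from deeplm.model.deeplm import DeeplmModel

# Load the default config
config = DeeplmConfig()
# Build the model
model = DeeplmModel(config)
# Load the BitNet quantized weights; strict=False tolerates key mismatches, so report them
state_dict = load_file("model.safetensors")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing, "unexpected keys:", unexpected)
model.eval()  # switch to inference mode
# Generate
input_ids = torch.tensor([[1, 2, 3]])  # bos + tokens
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=100, temperature=0.7)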
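The `temperature=0.7` argument above usually means the logits are divided by 0.7 before sampling, which sharpens the distribution relative to temperature 1.0. The sketch below shows that mechanism in plain Python. Whether `DeeplmModel.generate` samples exactly this way is an assumption, and the helper names are hypothetical.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.7, rng=random):
    """Draw one token id from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # toy next-token scores
sharp = temperature_probs(logits, 0.7)
plain = temperature_probs(logits, 1.0)
token_id = sample_token(logits, temperature=0.7, rng=random.Random(0))
print([round(p, 3) for p in sharp], [round(p, 3) for p in plain], token_id)
```

Lower temperatures put more mass on the highest-scoring token, so output becomes more repeatable and less varied.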
Training
# Install dependencies
pip install torch datasets tokenizers pyyaml einops huggingface-hub safetensors
# Train with all features
python train.py --batch_size 3 --grad_accum 2 --max_steps 31250
# Custom config
python train.py \
--max_steps 100000 \
--batch_size 4 \
--seq_len 512 \
--lr 3e-4 \
--no_auto_tuner
Project Structure
deeplm-108m/
├── config.json                     # Model config
├── generation_config.json          # Generation params
├── model.safetensors               # BitNet quantized weights (419MB)
├── tokenizer.json                  # BPETokenizer
├── tokenizer_config.json           # Tokenizer config
├── train.py                        # Training script (all features)
├── init_model.py                   # Model initialization script
├── deeplm_modal.py                 # Modal.com build script
└── deeplm/                         # Source code
    ├── config.py                   # Dataclass configs
    ├── model/
    │   ├── deeplm.py               # Main model
    │   ├── mla.py                  # Multi-head Latent Attention
    │   ├── moe.py                  # Mixture of Experts
    │   ├── hybrid_attention.py     # Softmax + Lightning
    │   ├── hyper_connections.py    # Sinkhorn routing
    │   ├── mtp.py                  # Multi-Token Prediction
    │   └── transformer_block.py
    ├── training/
    │   ├── trainer.py              # Training loop
    │   ├── auto_tuner.py           # Adaptive training controller
    │   ├── curriculum_router.py    # Phase-based routing
    │   ├── data_pipeline.py        # Bucket dataset + sampler
    │   ├── logger.py               # SmartLogger + anomaly detection
    │   └── control/                # TrainingControl plane
    ├── self_evolution/
    │   └── framework.py            # Autonomous evolution loop
    └── quantization/
        ├── bitnet_quantize.py      # BitNet b1.58
        └── gguf_export.py
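The 419MB listed for `model.safetensors` is consistent with storing about 104.7M parameters as float32, as the Dtype row of the architecture table suggests. The arithmetic below is a rough sketch that assumes every parameter takes 4 bytes and ignores file header overhead. The ternary figure is a theoretical lower bound for ideal packing, not a size reported by the author.

```python
import math

PARAMS = 104.7e6  # "~104.7M" from the architecture table


def size_mb(num_params, bits_per_param):
    """Storage in decimal megabytes for a given bit width per parameter."""
    return num_params * bits_per_param / 8 / 1e6


fp32_mb = size_mb(PARAMS, 32)
# A ternary value carries log2(3), roughly 1.58 bits of information.
ternary_mb = size_mb(PARAMS, math.log2(3))
print(f"float32: {fp32_mb:.1f} MB")                # float32: 418.8 MB
print(f"ideal ternary packing: {ternary_mb:.1f} MB")  # ideal ternary packing: 20.7 MB
```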
Training Configuration
| Parameter | Value |
|---|---|
| Dataset | afrizalha/KamusOne-28M-Indonesian |
| Optimizer | AdamW (β1=0.9, β2=0.95, ε=1e-8) |
| LR | 6e-4 (cosine, warmup=150) |
| Batch Size | 3 x grad_accum=2 = 6 effective |
| Weight Decay | 0.1 |
| Max Grad Norm | 1.0 |
| Max Steps | 31,250 |
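The LR row describes a cosine schedule with a 150-step warmup and a peak of 6e-4 over 31,250 steps. A minimal sketch of such a schedule follows. Linear warmup, decay to a minimum of 0, and warmup counted in optimizer steps are assumptions, since the table does not specify them.

```python
import math


def lr_at(step, peak_lr=6e-4, warmup_steps=150, max_steps=31_250, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Effective batch size from the table: 3 sequences x 2 accumulation steps.
effective_batch = 3 * 2

for step in (0, 149, 150, 15_700, 31_250):
    print(step, f"{lr_at(step):.2e}")
```

Step 15,700 is the midpoint of the decay phase, where the rate is half the peak.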
License
Apache 2.0
Acknowledgments
The architecture is inspired by:
- DeepSeek V4: MLA, Hyper-Connections, MTP, MoE routing
- Kimi K2.6: Shared Expert, Agent Swarm
- MiniMax M2.7: Self-Evolution Framework, Hybrid Attention, Agent Harness
- BitNet: b1.58 ternary quantization