---
language: id
license: apache-2.0
library_name: transformers
tags:
- pytorch
- safetensors
- deeplm
- bitnet
- moe
- mla
- mtp
- indonesian
pipeline_tag: text-generation
---

# Deeplm: 108M BitNet MoE Language Model

Deeplm is an Indonesian language model with ~105M parameters, trained with **BitNet b1.58 ternary quantization** from the start and inspired by the **DeepSeek V4**, **Kimi K2.6**, and **MiniMax M2.7** architectures. It is published on the Hugging Face Hub as `samcheng0/deeplm-108m`.

## 🏗️ Architecture

| Component | Detail |
|---|---|
| **Total Parameters** | ~104.7M |
| **Architecture** | Decoder-only Transformer |
| **Layers** | 10 |
| **Hidden Size** | 512 |
| **Vocab Size** | 32,000 (BPETokenizer) |
| **Max Seq Length** | 4,096 |
| **Attention Heads** | 8 (MQA, 1 KV head) |
| **Quantization** | BitNet b1.58 ternary {-1, 0, +1}, absmean scaling |
| **Dtype** | float32 (weight values quantized to ternary) |

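The Quantization row refers to BitNet b1.58's absmean scheme: a weight matrix is divided by the mean absolute value of its entries, γ = mean(|W|), then rounded and clipped to {-1, 0, +1}. The sketch below implements that published formula in plain Python for illustration only; it is not the code in `deeplm/quantization/bitnet_quantize.py`, and the function name `absmean_ternary` and the `eps` value are assumptions.

```python
def absmean_ternary(weights, eps=1e-5):
    """Quantize a 2-D weight matrix (list of rows) to {-1, 0, +1} with absmean scaling.

    Returns (ternary, gamma); ternary[i][j] * gamma approximates weights[i][j].
    Hypothetical helper, not part of the Deeplm codebase.
    """
    flat = [w for row in weights for w in row]
    # gamma: mean absolute value over the whole matrix (one scale per tensor)
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        # scale, round to the nearest integer (Python rounds exact .5 ties to even),
        # then clip to the ternary range [-1, 1]
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma


W = [[0.9, -0.05, 0.4], [-1.2, 0.02, -0.3]]
q, gamma = absmean_ternary(W)
print(q)  # [[1, 0, 1], [-1, 0, -1]]
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight needs only log2(3) ≈ 1.58 bits of information; the float32 gamma is kept to rescale the layer output.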
## ✨ Innovative Features

| Feature | Source | Description |
|---|---|---|
| **MLA** | DeepSeek V4 | Multi-head Latent Attention, 24x KV cache compression |
| **MoE** | DeepSeek V4 + Kimi K2.6 | 4 routed + 1 shared expert, top-k = 2 |
| **Hybrid Attention** | MiniMax M2.7 | Softmax + Lightning v2 linear attention |
| **Hyper-Connections** | DeepSeek V4 | Sinkhorn routing, replacing standard residual connections |
| **MTP** | DeepSeek V4 | Multi-Token Prediction, depth = 2 |
| **BitNet b1.58** | BitNet | Ternary quantization {-1, 0, +1} from initialization |
| **AutoTuner** | Deeplm | Adaptive LR, gradient norm (GN), weight decay (WD), momentum, revive, trajectory prediction |
| **Curriculum Router** | Deeplm | Phase-based category weighting |
| **Self-Evolution** | MiniMax M2.7 | Autonomous hypothesis → experiment → decision loop |

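Hyper-Connections replace the single residual stream with several parallel streams that are mixed by a small learned matrix. The table only says this mixing uses Sinkhorn routing; a common reading, assumed here, is Sinkhorn-Knopp normalization: exponentiate the raw mixing logits, then alternately normalize rows and columns until the matrix is approximately doubly stochastic. The function names, iteration count, and three-stream example are illustrative assumptions, not taken from `deeplm/model/hyper_connections.py`.

```python
import math


def sinkhorn(logits, n_iters=20):
    """Map a square matrix of logits to an approximately doubly stochastic matrix."""
    # exp() makes every entry strictly positive, which Sinkhorn-Knopp requires
    m = [[math.exp(z) for z in row] for row in logits]
    n = len(m)
    for _ in range(n_iters):
        # Rows: divide each row by its sum
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]
        # Columns: divide each column by its sum
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def mix_streams(mixing, streams):
    """Mix n residual streams (equal-length lists of floats) with an n x n matrix."""
    dim = len(streams[0])
    return [
        [sum(row[j] * streams[j][d] for j in range(len(streams))) for d in range(dim)]
        for row in mixing
    ]


# Hypothetical example: 3 streams, each a 2-dimensional hidden vector
H = sinkhorn([[2.0, 0.1, -1.0], [0.3, 1.5, 0.0], [-0.5, 0.2, 1.0]])
mixed = mix_streams(H, [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
# Columns of H sum to 1, so the per-dimension sum over streams is preserved
print([sum(col) for col in zip(*mixed)])  # approximately [4.5, 1.0]
```

Because rows also sum to 1, each output stream is a weighted average of the input streams, so repeated mixing across layers cannot blow up or shrink the total residual signal.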
## Model Specification

```json
{
  "architectures": ["DeeplmModel"],
  "model_type": "deeplm",
  "vocab_size": 32000,
  "hidden_size": 512,
  "intermediate_size": 2048,
  "num_hidden_layers": 10,
  "num_attention_heads": 8,
  "num_key_value_heads": 1,
  "max_position_embeddings": 4096,
  "rms_norm_eps": 1e-06,
  "rope_theta": 50000.0,
  "rope_dim": 64,
  "tie_word_embeddings": true,
  "num_routed_experts": 4,
  "num_shared_experts": 1,
  "expert_topk": 2,
  "q_lora_rank": 192,
  "kv_lora_rank": 64,
  "qk_rope_head_dim": 64,
  "qk_nope_head_dim": 64,
  "v_head_dim": 128,
  "mtp_depth": 2,
  "mtp_num_layers": 2,
  "bitnet_quantized": true,
  "bitnet_scale": "absmean"
}
```

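The MoE fields above (`num_routed_experts: 4`, `num_shared_experts: 1`, `expert_topk: 2`) describe a layer where every token goes through the shared expert plus the 2 highest-scoring of the 4 routed experts. The sketch below shows that combination for a single token in plain Python. The softmax gate, the renormalization of the top-2 weights, and the toy linear experts are assumptions for illustration and may differ from `deeplm/model/moe.py`, for example in how gate scores are computed or normalized.

```python
import math


def moe_forward(x, gate_weights, routed_experts, shared_experts, top_k=2):
    """Combine shared experts and the top_k routed experts for one token vector x."""
    # Gate logits: one dot product per routed expert
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Softmax over all routed experts (max subtracted for numerical stability)
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k experts and renormalize their gate weights to sum to 1
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / kept
        out = [o + weight * y for o, y in zip(out, routed_experts[i](x))]
    # Shared experts process every token and are added without gating
    for expert in shared_experts:
        out = [o + y for o, y in zip(out, expert(x))]
    return out, top


# Hypothetical toy setup matching the config: 4 routed experts, 1 shared, top-2
routed = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
shared = [lambda v: [vi + 1.0 for vi in v]]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, chosen = moe_forward([2.0, 1.0], gate, routed, shared, top_k=2)
print(chosen)  # [0, 3]
```

Only 2 of the 4 routed experts run per token, so compute per token stays close to that of a dense layer with 3 experts' worth of FFN (2 routed + 1 shared), while total capacity covers all 5.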
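## Usage

### Inference

```python
import sys

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

# Run from the repository root (deeplm-108m/) so the `deeplm/` package is importable
sys.path.insert(0, ".")
from deeplm.config import DeeplmConfig
from deeplm.model.deeplm import DeeplmModel

# Load config (defaults are assumed to match config.json)
config = DeeplmConfig()

# Build model
model = DeeplmModel(config)

# Load BitNet quantized weights; strict=False silently skips mismatched keys, so report them
state_dict = load_file("model.safetensors")
result = model.load_state_dict(state_dict, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)
model.eval()

# Encode a prompt with the bundled tokenizer (BOS handling depends on tokenizer.json)
tokenizer = Tokenizer.from_file("tokenizer.json")
input_ids = torch.tensor([tokenizer.encode("Indonesia adalah").ids])

# Generate (assumes generate returns a (batch, seq) tensor of token ids)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=100, temperature=0.7)
print(tokenizer.decode(output[0].tolist()))
```
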
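`model.generate` is Deeplm's own method, and its sampling code is not shown in this card. As a minimal sketch of what `temperature=0.7` usually means, assuming plain temperature sampling without top-k or top-p filtering: logits are divided by the temperature before the softmax, so values below 1 shift probability mass toward the most likely token. `softmax_with_temperature` and `sample_next_token` are hypothetical helpers.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.7, rng=random):
    """Draw one token id from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.7))  # more mass on token 0 than at 1.0
print(sample_next_token(logits, rng=random.Random(0)))
```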
### Training

```bash
# Install dependencies
pip install torch datasets tokenizers pyyaml einops huggingface-hub safetensors

# Train with all features
python train.py --batch_size 3 --grad_accum 2 --max_steps 31250

# Custom config
python train.py \
    --max_steps 100000 \
    --batch_size 4 \
    --seq_len 512 \
    --lr 3e-4 \
    --no_auto_tuner
```

## Project Structure

```
deeplm-108m/
├── config.json                  # Model config
├── generation_config.json       # Generation params
├── model.safetensors            # BitNet quantized weights (419MB)
├── tokenizer.json               # BPETokenizer
├── tokenizer_config.json        # Tokenizer config
├── train.py                     # Training script (all features)
├── init_model.py                # Model initialization script
├── deeplm_modal.py              # Modal.com build script
└── deeplm/                      # Source code
    ├── config.py                # Dataclass configs
    ├── model/
    │   ├── deeplm.py            # Main model
    │   ├── mla.py               # Multi-head Latent Attention
    │   ├── moe.py               # Mixture of Experts
    │   ├── hybrid_attention.py  # Softmax + Lightning
    │   ├── hyper_connections.py # Sinkhorn routing
    │   ├── mtp.py               # Multi-Token Prediction
    │   └── transformer_block.py
    ├── training/
    │   ├── trainer.py           # Training loop
    │   ├── auto_tuner.py        # Adaptive training controller
    │   ├── curriculum_router.py # Phase-based routing
    │   ├── data_pipeline.py     # Bucket dataset + sampler
    │   ├── logger.py            # SmartLogger + anomaly detection
    │   └── control/             # TrainingControl plane
    ├── self_evolution/
    │   └── framework.py         # Autonomous evolution loop
    └── quantization/
        ├── bitnet_quantize.py   # BitNet b1.58
        └── gguf_export.py
```

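The 419MB size of `model.safetensors` follows from the Dtype row: ~104.7M parameters stored as float32 take 4 bytes each, even though their values are ternary. The arithmetic below compares that with a simple 2-bit packing and with the theoretical minimum of log2(3) ≈ 1.58 bits per ternary weight. It assumes every parameter is ternary; BitNet-style models usually keep embeddings and norm weights in higher precision, so a real packed file would be somewhat larger than the lower figures.

```python
import math

NUM_PARAMS = 104.7e6  # "~104.7M" from the architecture table


def size_mb(num_params, bits_per_param):
    """Storage in megabytes (10**6 bytes) for num_params at a given bit width."""
    return num_params * bits_per_param / 8 / 1e6


for label, bits in [
    ("float32 (as shipped)", 32),
    ("2-bit packing", 2),
    ("ideal ternary, log2(3) bits", math.log2(3)),
]:
    # prints roughly 418.8, 26.2 and 20.7 MB respectively
    print(f"{label:>28}: {size_mb(NUM_PARAMS, bits):7.1f} MB")
```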
## Training

| Parameter | Value |
|---|---|
| **Dataset** | afrizalha/KamusOne-28M-Indonesian |
| **Optimizer** | AdamW (β1=0.9, β2=0.95, ε=1e-8) |
| **LR** | 6e-4 (cosine, warmup=150) |
| **Batch Size** | 3 × grad_accum 2 = 6 effective |
| **Weight Decay** | 0.1 |
| **Max Grad Norm** | 1.0 |
| **Max Steps** | 31,250 |

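The LR row gives a 6e-4 peak with a cosine schedule and warmup=150 over 31,250 steps. A minimal sketch of such a schedule, assuming linear warmup over the first 150 optimizer steps and cosine decay to zero afterwards; the floor value and warmup shape are not stated in this card and may differ in `train.py`. `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, peak_lr=6e-4, warmup_steps=150, max_steps=31_250, min_lr=0.0):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 75, 150, 15_700, 31_250):
    print(step, f"{lr_at(step):.2e}")
```

In PyTorch, a schedule like this could be attached with `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: lr_at(step) / 6e-4)` on an optimizer built as `torch.optim.AdamW(params, lr=6e-4, betas=(0.9, 0.95), eps=1e-8, weight_decay=0.1)`, matching the table's hyperparameters.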
## License

Apache 2.0

## Acknowledgments

The architecture draws on:

- **DeepSeek V4**: MLA, Hyper-Connections, MTP, MoE routing
- **Kimi K2.6**: Shared Expert, Agent Swarm
- **MiniMax M2.7**: Self-Evolution Framework, Hybrid Attention, Agent Harness
- **BitNet**: b1.58 ternary quantization