Update README.md
README.md
CHANGED
@@ -1,519 +1,265 @@
# M1llion-35B: Extreme Compression & Full-Stack Intelligent Model
|
| 5 |
-
**M1llion AI Official Launch — Full TensorFlow/PyTorch Implementation Based on the NEO-v1 35B Technical Report**
|
| 6 |
|
| 7 |
-
|
|
|
|
|
|
|
| 8 |
|
| 9 |
-
## 🚀 M1llion
|
| 11 |
-
|
| 12 |
-
### Core Feature Highlights
|
| 13 |
-
- **AI Timer & Calendar (Intelligent Interconnection)**
|
| 14 |
-
- Monitors your conversations and automatically sets timers, stopwatches, and events
|
| 15 |
-
- Eliminates the hassle of forgetting "wait, remind me to..." tasks seconds after you say them
|
| 16 |
-
- **M1llion Memory (Local-Only, Privacy-First)**
|
| 17 |
-
- Runs on YOUR computer, not our servers
|
| 18 |
-
- Learns your habits, preferences, and routines automatically, securely, and privately
|
| 19 |
-
- **Emotion Engine (Truly Understands You)**
|
| 20 |
-
- Detects your emotional state and provides practical, genuine advice
|
| 21 |
-
- Combines screen recognition with conversational context, rather than relying solely on keywords
|
| 22 |
-
- **Screen Recognition & Intelligent Agent**
|
| 23 |
-
- Groundbreaking capability: can "see" your screen and execute actions
|
| 24 |
-
- Clicks, scrolls, and navigates—just like a real assistant sitting right next to you
|
| 25 |
-
- **Multi-Format Compatibility**
|
| 26 |
-
- Text, images, video, audio—throw it all in at once, and it handles it seamlessly
|
| 27 |
-
|
| 28 |
-
### Collaboration Teams
|
| 29 |
-
We're partnering with a roster of exceptionally talented teams:
|
| 30 |
-
- pure-team
|
| 31 |
-
- cogent-ai
|
| 32 |
-
- Arc4 (our sister branch focused specifically on Arc AI)
|
| 33 |
-
- neo-ai-team
|
| 34 |
-
|
| 35 |
-
Great things happen when you stop trying to build everything alone.
|
| 36 |
-
|
| 37 |
-
### Launch Details
|
| 38 |
-
**Launch Time**: February 14, 2026, 21:00 (UTC+8)
|
| 39 |
-
Two core resources will be released simultaneously on Hugging Face:
|
| 40 |
-
1. **Chromos-Fabric** — The highly anticipated AGI model. Configuration files will be made available immediately after launch for the community to validate and analyze.
|
| 41 |
-
2. **M1llion-35B** — The core model powering all M1llion AI features outlined above. This is the first time the full system is being made accessible to the public.
|
| 42 |
-
- Surprise hidden features: Unveiled on launch day—stay tuned for the reveal.
|
| 43 |
|
| 44 |
-
|
| 45 |
|
| 46 |
-
|
| 47 |
-
M1llion-35B is a 35 billion parameter Mixture-of-Experts (MoE) large language model, integrating **15 core proprietary technologies**, **QEPQ Extreme Compression Technology**, and **Hundreds Security Architecture (HSA)**. While maintaining exceptional performance, the model achieves deployment efficiency far exceeding industry standards and **top-tier security protection**.
|
| 48 |
-
|
| 49 |
-
### Core Characteristics
|
| 50 |
-
- **Tokenizer**: Expanded to a 256k vocabulary to enhance multilingual capabilities
|
| 51 |
-
- **Training Datasets**: Recommended Hugging Face datasets include mOSCAR, Maya-LLaVA-Pretrain, and OpenAssistant/oasst1
|
| 52 |
-
- **Benchmark Report**: See `config/BENCHMARK_REPORT.md` for details, including OSEH metrics
|
| 53 |
-
- **Model Weights**: Can be exported to TensorFlow or PyTorch formats after training
|
| 54 |
-
- **Open-Source Evaluation**: Adheres to industry standards, using benchmarks such as MMLU-Pro, HumanEval, GSM8K, MT-Bench, and NVR-FactCheck
|
| 55 |
-
- **Framework Compatibility**: Dual-framework support for TensorFlow 2.x and PyTorch 2.x
|
| 56 |
-
- **Multimodal Support**: Integrates the VisionPerceptionModule (VPM) to support image/video input and screen recognition
|
| 57 |
-
|
| 58 |
-
### Technical Specifications
|
| 59 |
| Specification | Details |
|
| 60 |
|:---|:---|
|
| 61 |
-
| Total Parameters | ~35 Billion (multimodal MoE) |
|
| 62 |
-
| Active Parameters | ~7 Billion (per-token inference) |
|
| 63 |
-
| Deployment Size | <10 GB (QEPQ compression) |
|
| 64 |
-
| Architecture | Mixture-of-Experts Transformer |
|
| 65 |
-
| Framework Support | TensorFlow 2.x / PyTorch 2.x |
|
| 66 |
| Context Window | 8192 tokens |
|
| 67 |
-
| Vocabulary Size | 256,000 |
|
| 68 |
| Security Architecture | Hundreds Security Architecture (HSA) |
|
| 69 |
-
|
|
| 70 |
-
|
| 71 |
-
---
|
| 72 |
|
| 73 |
-
## Technical Report

### 1. Introduction
Unlike traditional LLMs that focus solely on textual performance, M1llion-35B is designed to interact with the physical world through screen recognition, tool use, and context-aware decision-making. The model maintains strong performance across multiple benchmarks while being deployable on consumer hardware, enabled by QEPQ compression technology that reduces the model size to under 10GB. Additionally, the integrated Hundreds Security Architecture (HSA) ensures data confidentiality and model integrity, addressing critical security concerns for real-world applications.
|
| 81 |
-
|
| 82 |
-
### 2. Model Architecture
|
| 83 |
-
M1llion-35B adopts a decoder-only MoE Transformer architecture with specialized modules for multimodal processing, security, and agentic reasoning.
|
| 84 |
-
|
| 85 |
-
#### 2.1 Core Transformer Backbone
|
| 86 |
-
- **Layer Configuration**: 32 Transformer layers with 4096 hidden dimension
|
| 87 |
-
- **Attention Mechanism**: 32 attention heads with grouped-query attention for memory efficiency
|
| 88 |
-
- **Positional Encoding**: Rotary Positional Embeddings (RoPE) with base frequency 500,000 for long-context modeling
|
| 89 |
-
- **Activation Function**: GELU for feed-forward networks
|
| 90 |
-
- **Normalization**: Layer normalization with epsilon=1e-6
|
| 91 |
-
|
| 92 |
-
#### 2.2 Mixture-of-Experts Design
|
| 93 |
-
- **Expert Count**: 8 total experts with 2 experts activated per token
|
| 94 |
-
- **Router Architecture**: Dynamic routing with jitter noise (0.01) for load balancing
|
| 95 |
-
- **Router Losses**: Z-loss (coefficient 0.001) and auxiliary loss (coefficient 0.01) to optimize expert utilization
|
| 96 |
-
- **Active Parameters**: ~7B active parameters during inference, ensuring efficiency
|
| 97 |
-
|
| 98 |
-
#### 2.3 Multimodal Integration
|
| 99 |
-
- **Vision Perception Module (VPM)**: Custom CNN-based encoder for image/video processing
|
| 100 |
-
- Supports image resolution up to 256x256 and video sequences up to 120 frames
|
| 101 |
-
- Projects visual features to 4096-dimensional space for integration with text
|
| 102 |
-
- **Cross-Modal Fusion**: Gated fusion mechanism to combine text and visual embeddings
|
| 103 |
-
- **Screen Recognition**: Specialized visual category classification for UI elements (buttons, text inputs, links, etc.)
|
| 104 |
-
|
| 105 |
-
#### 2.4 Security Architecture
|
| 106 |
-
- **Hundreds Security Architecture (HSA)**: Three core components
|
| 107 |
-
1. Zero-Trust Data Sentinel (ZTDS): Encrypts intermediate hidden states with layer-specific keys
|
| 108 |
-
2. Quantum Weight Attestation (QWA): Real-time weight integrity verification via Merkle Tree Root comparison
|
| 109 |
-
3. Contextual Threat Monitor (CTM): Detects and mitigates adversarial attacks (e.g., prompt injection)
|
| 110 |
-
|
| 111 |
-
#### 2.5 Efficiency Optimizations
|
| 112 |
-
- **QEPQ Compression**: Quantum-Entangled Pruning & Quantization
|
| 113 |
-
- 2-bit quantization with nonlinear codebook
|
| 114 |
-
- 60% pruning ratio based on entanglement metrics
|
| 115 |
-
- Gzip secondary compression for additional size reduction
|
| 116 |
-
- **Progressive Tech Activation**: Dynamically enables/disables technologies based on task complexity
|
| 117 |
-
- **On-Device Compute**: Int8 low-precision flow and memory-efficient attention
|
| 118 |
-
|
| 119 |
-
### 3. Pre-Training
|
| 120 |
-
M1llion-35B follows a multi-stage pre-training curriculum to build strong foundational capabilities while emphasizing efficiency and reasoning.
|
| 121 |
-
|
| 122 |
-
#### 3.1 Data Preparation
|
| 123 |
-
- **Corpus Composition**: Multilingual data including Korean, English, and other major languages
|
| 124 |
-
- General text: 59.1-79.4% across stages
|
| 125 |
-
- Code: 12.0-25.2% across stages
|
| 126 |
-
- Mathematics: 8.6-25.3% across stages
|
| 127 |
-
- Instruction tuning: 0.0-32.5% across stages
|
| 128 |
-
- **Data Filtering**: Two-stage filtering with rule-based heuristics and model-based quality scoring
|
| 129 |
-
- **Synthetic Data**: Generated reasoning traces and PII-safe rewrites of documents with figures/tables
|
| 130 |
-
|
| 131 |
-
#### 3.2 Training Curriculum
|
| 132 |
-
| Stage | Focus | Context Window | Token Count | Learning Rate |
|
| 133 |
-
|:---|:---|:---|:---|:---|
|
| 134 |
-
| 1 | Foundation Knowledge | 4K | 6 trillion | 1.5e-5 → 3.1e-5 |
|
| 135 |
-
| 2 | Context Extension | 8K | 4 trillion | Cosine decay (10% of Stage 1 peak) |
|
| 136 |
-
| 3 | Advanced Reasoning | 32K | 3 trillion | Cosine decay to 1.0e-5 |
|
| 137 |
-
| 4 | High-Quality Annealing | 32K | 2 trillion | Annealed to 3.3e-6 |
|
| 138 |
-
|
| 139 |
-
- **Fill-in-the-Middle**: Applied to 10% of tokens to enhance code generation and long-context modeling
|
| 140 |
-
- **Dynamic Batch Sizing**: Adjusted based on context length to maintain training stability
|
| 141 |
-
|
| 142 |
-
### 4. Post-Training
|
| 143 |
-
Post-training consists of supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance multimodal capabilities, agentic behavior, and human alignment.
|
| 144 |
-
|
| 145 |
-
#### 4.1 Supervised Fine-Tuning (SFT)
|
| 146 |
-
- **Text SFT**: Three data types (non-reasoning, reasoning, agent) with strict trajectory filtering
|
| 147 |
-
- **Multimodal SFT**: Four-stage process
|
| 148 |
-
1. Cross-modal alignment: Align visual features to text embedding space
|
| 149 |
-
2. Multimodal knowledge learning: Broaden visual knowledge representation
|
| 150 |
-
3. Task-oriented instruction tuning: Enhance multimodal interaction
|
| 151 |
-
4. Advanced reasoning: Long-context multimodal reasoning and video understanding
|
| 152 |
-
- **Chat Template**: Unified template for consistent generation across scenarios
|
| 153 |
-
|
| 154 |
-
#### 4.2 Reinforcement Learning (RL)
|
| 155 |
-
- **Agent RL**: Specialized training for sequential decision making and tool use
|
| 156 |
-
- Context window: 44K (general agent), 128K (SWE agent)
|
| 157 |
-
- Group size: 8 (general agent), 16 (SWE agent)
|
| 158 |
-
- Reward components: Environment reward, format adherence, language consistency
|
| 159 |
-
- **Multimodal RL with Verifiable Rewards**: Enhance reasoning with verifiable feedback
|
| 160 |
-
- **RL from Human Feedback**: Align model behavior with human preferences for harmlessness and usefulness
|
| 161 |
-
|
| 162 |
-
### 5. Evaluation
|
| 163 |
-
M1llion-35B is evaluated across text-to-text, vision-to-text, and agent benchmarks using a unified evaluation framework (Omni-Evaluator) to ensure reproducibility.
|
| 164 |
-
|
| 165 |
-
#### 5.1 Baselines
|
| 166 |
-
- Open-source models: Qwen3-VL 32B-Thinking, InternVL3.5 38B-Thinking, EXAONE 4.0 32B
|
| 167 |
-
- Commercial models: GPT-5.1, Qwen3 235B-A22B
|
| 168 |
-
|
| 169 |
-
#### 5.2 Key Results
|
| 170 |
-
| Benchmark Category | Performance Highlights |
|
| 171 |
-
|:---|:---|
|
| 172 |
-
| Text-to-Text (Korean) | KMMLU: 71.3, HAERAE Bench 1.0: 87.4, KoBALT: 50.6 |
|
| 173 |
-
| Text-to-Text (English) | MMLU: 87.7, PIQA: 76.7, Flores+ (En→Ko): 31.8 |
|
| 174 |
-
| Vision-to-Text | KoNET: 75.1, K-MMBench: 88.1, TextVQA: 85.4 |
|
| 175 |
-
| Agent | Tau2-Airline: 58.0, Tau2-Retail: 71.6, Terminal Bench: 21.8 |
|
| 176 |
-
| Core Metrics | OSEH: 193.70, Hallucination Rate: 1.2%, Inference Latency: 150ms |
|
| 177 |
-
|
| 178 |
-
#### 5.3 Deployment Efficiency
|
| 179 |
-
| Configuration | Model Size | Performance Loss |
|
| 180 |
-
|:---|:---|:---|
|
| 181 |
-
| FP16 (Baseline) | ~70 GB | 0.0% |
|
| 182 |
-
| FP8 (Traditional) | ~35 GB | 0.5% |
|
| 183 |
-
| QEPQ Compression | <10 GB | 0.1% |
|
| 184 |
|
| 185 |
-
|
| 186 |
-
|
| 187 |
|
| 188 |
-
|
| 189 |
|
| 190 |
-
|
|
|
|
| 191 |
|
| 192 |
-
|
| 193 |
-
### Core Proprietary Technologies (15 Items)
|
| 194 |
-
#### Foundational Core Technologies (6 Items)
|
| 195 |
-
1. **MultiPathRouter (Quantum-Entangled Reasoning Unit)**
|
| 196 |
-
- Quantum-entangled reasoning unit
|
| 197 |
-
- Multi-path parallel reasoning
|
| 198 |
-
- Enhanced deep logic chain construction capability
|
| 199 |
-
2. **Reality Anchoring (RA)**
|
| 200 |
-
- Reality anchoring mechanism
|
| 201 |
-
- Real-time fact calibration
|
| 202 |
-
- Hallucination rate < 1.2%
|
| 203 |
-
3. **MGO (Multi-dimensional Generation Orchestrator)**
|
| 204 |
-
- Multi-dimensional generation orchestrator
|
| 205 |
-
- Multimodal output coordination
|
| 206 |
-
- Semantic consistency guarantee
|
| 207 |
-
4. **Person X Memory Symbiosis Engine**
|
| 208 |
-
- Memory symbiosis engine
|
| 209 |
-
- Long-term contextual memory management
|
| 210 |
-
- Graph-structured external memory bank
|
| 211 |
-
5. **AMI (Agent Matrix Interface)**
|
| 212 |
-
- Agent matrix interface
|
| 213 |
-
- Full-stack multimodal: Integrates custom VisionPerceptionModule (VPM)
|
| 214 |
-
- Autonomous action decision-making: Observes screens/pages to decide and execute actions
|
| 215 |
-
- Android Agent logic layer: Outputs Android Accessibility/UI Automator compatible commands
|
| 216 |
-
6. **QEMC (Quantum-Entangled Memory Coherence)**
|
| 217 |
-
- Quantum-entangled memory coherence
|
| 218 |
-
- Maintains quantum entanglement of memory-related weights under QEPQ compression
|
| 219 |
-
- Ensures integrity and retrievability of memory information
|
| 220 |
-
|
| 221 |
-
#### Enhanced Technologies (3 Items)
|
| 222 |
-
7. **SAR (Sparse Attention Routing)**
|
| 223 |
-
- Sparse attention routing
|
| 224 |
-
- Optimizes MoE attention mechanism
|
| 225 |
-
- Significantly reduces inference latency
|
| 226 |
-
8. **DQAT (Dynamic Quantization-Aware Training)**
|
| 227 |
-
- Dynamic quantization-aware training
|
| 228 |
-
- Learnable quantization parameters
|
| 229 |
-
- Adaptive bit allocation
|
| 230 |
-
9. **SCRL (Self-Correcting Reasoning Loop)**
|
| 231 |
-
- Self-correcting reasoning loop
|
| 232 |
-
- Multi-step verification and correction
|
| 233 |
-
- Secondary logical check
|
| 234 |
-
|
| 235 |
-
#### Security Architecture Technology (1 Item)
|
| 236 |
-
10. **Hundreds Security Architecture (HSA)**
|
| 237 |
-
- Top-tier security architecture (similar to HyperOS 3.0)
|
| 238 |
-
- ZTDS (Zero-Trust Data Sentinel): Data stream encryption and authentication
|
| 239 |
-
- QWA (Quantum Weight Attestation): Real-time weight integrity verification
|
| 240 |
-
- CTM (Contextual Threat Monitor): Real-time threat assessment and multi-level mitigation
|
| 241 |
-
|
| 242 |
-
#### Compression Technology (1 Item)
|
| 243 |
-
11. **QEPQ (Quantum-Entangled Pruning & Quantization)**
|
| 244 |
-
- Quantum-entangled pruning and quantization
|
| 245 |
-
- Nonlinear codebook quantization
|
| 246 |
-
- Entanglement metric-based pruning
|
| 247 |
-
- Compression ratio > 7x
|
| 248 |
-
|
| 249 |
-
#### Integrated Innovative Technologies (3 Items)
|
| 250 |
-
12. **X-Tech Fusion Engine**
|
| 251 |
-
- Cross-technology fusion engine
|
| 252 |
-
- Achieves synergistic effects of 15 core technologies
|
| 253 |
-
- Intelligent fusion of technical outputs
|
| 254 |
-
13. **Progressive Technology Activation**
|
| 255 |
-
- Progressive technology activation
|
| 256 |
-
- Dynamically dispatches technologies based on reasoning depth and complexity
|
| 257 |
-
14. **Unified Trade-off Controller**
|
| 258 |
-
- Unified performance-compression trade-off controller
|
| 259 |
-
- Dynamically adjusts technology weights based on strategy
|
| 260 |
-
|
| 261 |
-
---
|
| 262 |
-
|
| 263 |
-
## 📁 Project Structure
|
| 264 |
-
```
|
| 265 |
-
million_35b/
|
| 266 |
-
├── model/
|
| 267 |
-
│ ├── million_35b_model.py # Main model definition
|
| 268 |
-
│ ├── qeru.py # QERU implementation
|
| 269 |
-
│ ├── reality_anchoring.py # Reality Anchoring implementation
|
| 270 |
-
│ ├── mgo.py # MGO implementation
|
| 271 |
-
│ ├── person_x_memory.py # Person X Memory Engine implementation
|
| 272 |
-
│ ├── ami.py # AMI implementation
|
| 273 |
-
│ ├── sar.py # SAR implementation
|
| 274 |
-
│ ├── dqat.py # DQAT implementation
|
| 275 |
-
│ ├── scrl.py # SCRL implementation
|
| 276 |
-
│ ├── qemc.py # QEMC implementation
|
| 277 |
-
│ ├── qepq.py # QEPQ compression
|
| 278 |
-
│ ├── x_tech_fusion.py # X-Tech Fusion Engine
|
| 279 |
-
│ ├── progressive_activation.py # Progressive Activation
|
| 280 |
-
│ ├── tradeoff_controller.py # Unified Trade-off Controller
|
| 281 |
-
│ ├── vision_perception.py # Vision Perception Module (VPM)
|
| 282 |
-
│ └── hundreds_security/ # Hundreds Security Architecture
|
| 283 |
-
│ ├── hundreds_security_layer.py # HSL integration layer
|
| 284 |
-
│ ├── ztds.py # ZTDS module
|
| 285 |
-
│ ├── qwa.py # QWA module
|
| 286 |
-
│ └── ctm.py # CTM module
|
| 287 |
-
├── model_pytorch/ # PyTorch implementation
|
| 288 |
-
│ └── million_35b_model.py # PyTorch version main model
|
| 289 |
-
├── utils/
|
| 290 |
-
│ └── moe_layer.py # MoE layer implementation
|
| 291 |
-
├── config/
|
| 292 |
-
│ ├── m1_blueprint.json # Model configuration
|
| 293 |
-
│ └── BENCHMARK_REPORT.md # Benchmark test report
|
| 294 |
-
├── train.py # Training script
|
| 295 |
-
├── compress.py # QEPQ compression script
|
| 296 |
-
├── run_evaluation.py # Evaluation script
|
| 297 |
-
└── README.md # This document
|
| 298 |
-
```
|
| 299 |
-
|
| 300 |
-
---
|
| 301 |
-
|
| 302 |
-
## 🔧 Environment Requirements
|
| 303 |
-
### System Requirements
|
| 304 |
-
- Python 3.8+
|
| 305 |
-
- TensorFlow 2.10.0+
|
| 306 |
-
- PyTorch 2.0.0+ (Optional, for PyTorch version)
|
| 307 |
-
- CUDA 11.2+ (GPU training)
|
| 308 |
-
|
| 309 |
-
### Install Dependencies
|
| 310 |
```bash
|
| 311 |
-
|
| 312 |
-
pip install "tensorflow>=2.10.0" "torch>=2.0.0"
pip install numpy transformers datasets tabulate  # gzip and pickle ship with the Python standard library
|
| 314 |
```
|
| 315 |
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
## 💻 Usage
|
| 319 |
-
### 1. Create Model

### 2.

### 3.
|
| 361 |
|
| 362 |
| 363 |
|
| 380 |
|
| 381 |
-
|
| 382 |
-
|
| 383 |
-
## ⚙️ Configuration Instructions
|
| 384 |
-
### Core Configuration Example (m1_blueprint.json)
|
| 385 |
-
```json
|
| 386 |
-
{
|
| 387 |
-
"model_name": "M1llion-35B",
|
| 388 |
-
"version": "1.0",
|
| 389 |
-
"architecture": "MoE-Transformer",
|
| 390 |
-
"total_parameters": "35B",
|
| 391 |
-
"active_parameters": "7B",
|
| 392 |
-
|
| 393 |
-
"transformer_config": {
|
| 394 |
-
"num_layers": 32,
|
| 395 |
-
"m1_core_dimension": 4096,
|
| 396 |
-
"m1_focus_heads": 32,
|
| 397 |
-
"intermediate_size": 16384,
|
| 398 |
-
"max_position_embeddings": 8192,
|
| 399 |
-
"m1_lexicon_span": 256000,
|
| 400 |
-
"m1_neural_drop": 0.1,
|
| 401 |
-
"layer_norm_epsilon": 1e-6
|
| 402 |
-
},
|
| 403 |
-
|
| 404 |
-
"moe_config": {
|
| 405 |
-
"m1_specialist_count": 8,
|
| 406 |
-
"m1_token_specialists": 2,
|
| 407 |
-
"m1_specialist_core_dim": 4096,
|
| 408 |
-
"m1_router_jitter_noise": 0.01,
|
| 409 |
-
"m1_router_z_loss_coef": 0.001,
|
| 410 |
-
"m1_router_aux_loss_coef": 0.01
|
| 411 |
-
},
|
| 412 |
-
|
| 413 |
-
"qepq_config": {
|
| 414 |
-
"enabled": true,
|
| 415 |
-
"target_compression_ratio": 7.0,
|
| 416 |
-
"m1_nonlinear_codebook_span": 256,
|
| 417 |
-
"m1_quantum_prune_ratio": 0.6,
|
| 418 |
-
"m1_quantum_bits": 2
|
| 419 |
-
},
|
| 420 |
-
|
| 421 |
-
"m1_hundreds_blueprint": {
|
| 422 |
-
"enabled": true,
|
| 423 |
-
"m1_security_master_seed": "SECURE_SEED_FROM_HSM",
|
| 424 |
-
"qwa_sample_rate": 0.005,
|
| 425 |
-
"ctm_threat_threshold_low": 0.7,
|
| 426 |
-
"ctm_threat_threshold_high": 0.95
|
| 427 |
-
},
|
| 428 |
-
|
| 429 |
-
"training_config": {
|
| 430 |
-
"batch_size": 4,
|
| 431 |
-
"gradient_accumulation_steps": 32,
|
| 432 |
-
"learning_rate": 1e-4,
|
| 433 |
-
"warmup_steps": 2000,
|
| 434 |
-
"max_steps": 100000,
|
| 435 |
-
"weight_decay": 0.01
|
| 436 |
-
}
|
| 437 |
-
}
|
| 438 |
-
```
|
| 439 |
-
|
| 440 |
-
### Technology Enable/Disable
|
| 441 |
-
Each core technology can be independently controlled via the `enabled` field in the configuration file:
|
| 442 |
-
```json
|
| 443 |
-
{
|
| 444 |
-
"qeru_config": { "enabled": true },
|
| 445 |
-
"reality_anchoring_config": { "enabled": true },
|
| 446 |
-
"hsa_config": { "enabled": true },
|
| 447 |
-
"qepq_config": { "enabled": true }
|
| 448 |
-
}
|
| 449 |
-
```
|
| 450 |
-
|
| 451 |
-
---
|
| 452 |
-
|
| 453 |
-
## 🧪 Testing
|
| 454 |
-
### Basic Function Testing
|
| 455 |
```bash
python train.py --test_mode --num_steps 100
```
|
| 463 |
|
| 464 |
-
|
| 471 |
|
| 472 |
| 473 |
|
| 474 |
-
## 🎯 Use Cases
|
| 477 |
-
- **Security Applications**: Built-in HSA top-tier security protection, suitable for high-risk scenarios
|
| 478 |
-
- **Multimodal Applications**: Integrates visual perception and tool usage capabilities
|
| 479 |
-
- **Long Text Understanding**: Person X Memory Engine supports long-term memory
|
| 480 |
-
- **Code Generation**: MGO ensures multimodal output consistency
|
| 481 |
-
- **Intelligent Agents**: Screen recognition and autonomous action to replace repetitive operations
|
| 482 |
|
| 483 |
-
|
|
|
|
| 484 |
|
| 485 |
-
|
| 486 |
-
|
| 487 |
-
```bibtex
|
| 488 |
-
@article{m1llion35b2026,
|
| 489 |
-
title={M1llion-35B: Extreme Compression & Full-Stack Intelligent Model},
|
| 490 |
-
author={M1llion AI Team},
|
| 491 |
-
year={2026},
|
| 492 |
-
note={Dual-framework implementation for TensorFlow/PyTorch, integrating 15 core technologies and HSA security architecture}
|
| 493 |
-
}
|
| 494 |
-
```
|
| 495 |
|
| 496 |
-
|
| 497 |
-
|
| 498 |
-
## 📄 License
|
| 499 |
-
This project is for research and learning purposes only. Commercial use requires authorization from the team.
|
| 500 |
-
|
| 501 |
-
---
|
| 502 |
|
| 503 |
## 🤝 Contribution
|
| 504 |
-
|
| 505 |
-
|
| 506 |
-
|
|
|
|
| 507 |
|
| 508 |
-
|
| 509 |
-
For questions, please contact us via GitHub Issues or follow our Hugging Face space for the latest updates.
|
| 510 |
|
| 511 |
-
|
|
|
|
| 512 |
|
| 513 |
## 🙏 Acknowledgments
|
| 514 |
| 515 |
|
| 516 |
---
|
| 517 |
|
| 518 |
| 1 |
+
# M1llion-35B
|
| 2 |
+
> **Flagship Model of m1llionAI | Built & Maintained by ArcOffical**
|
| 3 |
+
> *Practical, Efficient, Privacy-First 35B Parameter MoE LLM — Deployable on Consumer Hardware (<10GB)*
|
|
|
|
|
|
|
| 4 |
|
| 5 |
+
[Hugging Face Model](https://huggingface.co/m1llionAI/M1llion-35B)
[GitHub Repository](https://github.com/M1llion-AI/million-35b)
[License](#license)
|
| 8 |
|
| 9 |
+
## 🚀 Quick Overview
|
| 10 |
+
M1llion-35B is a state-of-the-art **35 billion parameter Mixture-of-Experts (MoE) multimodal large language model** designed and built exclusively by ArcOffical, under the m1llionAI Hugging Face organization. It redefines accessible high-performance AI by balancing enterprise-grade capabilities with edge-deployable efficiency—all while prioritizing user privacy and data security.
|
| 11 |
|
| 12 |
+
Unlike traditional 35B+ parameter models that require cloud infrastructure or high-end GPUs, M1llion-35B can be deployed on consumer hardware (**<10GB storage** via QEPQ compression) with minimal performance loss (<0.1%) and an industry-leading low hallucination rate (<1.2%).
|
| 13 |
|
| 14 |
+
### Key Model Specifications at a Glance
|
| 15 |
| Specification | Details |
|
| 16 |
|:---|:---|
|
| 17 |
+
| Total Parameters | ~35 Billion (multimodal MoE) |
|
| 18 |
+
| Active Parameters | ~7 Billion (per-token inference) |
|
| 19 |
+
| Deployment Size | <10 GB (QEPQ Quantum-Entangled Compression) |
|
| 20 |
| Context Window | 8192 tokens |
|
| 21 |
+
| Vocabulary Size | 256,000 (multilingual) |
|
| 22 |
+
| Hallucination Rate | <1.2% (Reality Anchoring Technology) |
|
| 23 |
+
| Framework Support | TensorFlow 2.x / PyTorch 2.x |
|
| 24 |
+
| Deployment Type | Local/Edge (no cloud dependency) |
|
| 25 |
| Security Architecture | Hundreds Security Architecture (HSA) |
|
| 26 |
+
| Multimodal Support | Text, Image, Video, Audio + Screen Recognition |
|
| 27 |
|
| 28 |
+
## 🌟 Key Highlights
|
| 29 |
+
1. **Extreme Edge Efficiency**: 7x compression ratio via QEPQ technology, enabling <10GB deployment on consumer laptops/desktops—no cloud or high-end GPU required.
|
| 30 |
+
2. **Privacy-First by Design**: Runs entirely on local devices; no user data is transmitted to servers, and all memory/habit learning is stored and processed offline.
|
| 31 |
+
3. **Low Hallucination & High Reliability**: Powered by Reality Anchoring, achieving <1.2% hallucination rate for factual reasoning, making it suitable for technical and decision-critical tasks.
|
| 32 |
+
4. **Full-Stack Multimodal Agent**: Integrates VisionPerceptionModule (VPM) for screen recognition, autonomous UI actions (clicks, scrolls), and emotion-aware dialogue.
|
| 33 |
+
5. **Top-Tier Security**: Built-in Hundreds Security Architecture (HSA) to mitigate prompt injection, model tampering, and data leaks during inference.
|
| 34 |
+
6. **Open-Source & Customizable**: Dual-framework support, full pre-training/finetuning pipelines, and open-source compression tools for developer customization.
|
| 35 |
|
| 36 |
+
## 👤 Creator & Maintainer
|
| 37 |
+
**ArcOffical** is the sole founding author, lead developer, and core maintainer of M1llion-35B. With deep expertise in MoE architecture design, extreme model compression, and multimodal agent development, ArcOffical led the entire lifecycle of this model—from initial prototyping and curriculum pre-training to proprietary technology integration and open-source deployment.
|
| 38 |
|
| 39 |
+
This model is a flagship project of **m1llionAI** (a Hugging Face organization dedicated to accessible, privacy-first edge AI), where ArcOffical drives the mission to democratize cutting-edge LLM technology for all users.
|
| 40 |
|
| 41 |
+
## 🚦 Quick Start (Hugging Face Transformers)
|
| 42 |
+
Get up and running with M1llion-35B in minutes using the Hugging Face `transformers` library.
|
| 43 |
|
| 44 |
+
### Prerequisites
|
| 45 |
```bash
|
| 46 |
+
# Install required dependencies
|
| 47 |
+
pip install "transformers>=4.36.0" "torch>=2.0.0" "accelerate>=0.25.0" "pillow>=10.0.0"
|
| 48 |
```
|
| 49 |
|
| 50 |
+
### 1. Load the Model & Tokenizer
|
| 51 |
```python
|
| 52 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 53 |
+
|
| 54 |
+
# Load pre-trained model and tokenizer from Hugging Face Hub
|
| 55 |
+
model_name = "m1llionAI/M1llion-35B"
|
| 56 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
| 57 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 58 |
+
model_name,
|
| 59 |
+
device_map="auto", # Automatically assign layers to available hardware
|
| 60 |
+
load_in_8bit=True, # Enable 8-bit inference for edge efficiency (optional)
|
| 61 |
+
trust_remote_code=True # Required for custom MoE and VPM modules
|
| 62 |
+
)
|
| 63 |
```
|
| 64 |
|
| 65 |
+
### 2. Text Inference Example
|
| 66 |
+
```python
|
| 67 |
+
# Sample prompt (supports conversational and instruction-based inputs)
|
| 68 |
+
prompt = """
|
| 69 |
+
You are a helpful, privacy-first AI assistant running on local hardware.
|
| 70 |
+
Explain the key benefits of M1llion-35B in simple terms.
|
| 71 |
+
"""
|
| 72 |
+
|
| 73 |
+
# Tokenize input
|
| 74 |
+
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
|
| 75 |
+
|
| 76 |
+
# Generate output (configure parameters for efficiency and quality)
|
| 77 |
+
outputs = model.generate(
|
| 78 |
+
**inputs,
|
| 79 |
+
max_new_tokens=200,
|
| 80 |
+
temperature=0.7,
|
| 81 |
+
top_p=0.95,
|
| 82 |
+
do_sample=True,
|
| 83 |
+
pad_token_id=tokenizer.eos_token_id
|
| 84 |
+
)
|
| 85 |
+
|
| 86 |
+
# Decode and print result
|
| 87 |
+
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
| 88 |
+
print("M1llion-35B Response:\n", response)
|
| 89 |
```
|
| 90 |
|
| 91 |
+
### 3. Multimodal (Image + Text) Inference Example
|
| 92 |
```python
|
| 93 |
+
from PIL import Image
|
| 94 |
+
|
| 95 |
+
# Load sample image (screen capture, photo, or document)
|
| 96 |
+
image_path = "sample_screen.png"
|
| 97 |
+
image = Image.open(image_path).convert("RGB")
|
| 98 |
+
|
| 99 |
+
# Multimodal prompt (ask the model to analyze the screen image)
|
| 100 |
+
multimodal_prompt = """
|
| 101 |
+
Analyze the attached screen image and list the key UI elements you can identify.
|
| 102 |
+
Suggest one simple action to complete the most obvious task on the screen.
|
| 103 |
+
"""
|
| 104 |
+
|
| 105 |
+
# Tokenize text and process image (custom multimodal pipeline)
|
| 106 |
+
multimodal_inputs = tokenizer(
|
| 107 |
+
multimodal_prompt,
|
| 108 |
+
images=image, # Custom parameter for VPM integration
|
| 109 |
+
return_tensors="pt"
|
| 110 |
+
).to(model.device)
|
| 111 |
+
|
| 112 |
+
# Generate multimodal response
|
| 113 |
+
multimodal_outputs = model.generate(
|
| 114 |
+
**multimodal_inputs,
|
| 115 |
+
max_new_tokens=300,
|
| 116 |
+
temperature=0.6,
|
| 117 |
+
top_p=0.9
|
| 118 |
+
)
|
| 119 |
+
|
| 120 |
+
# Decode and print result
|
| 121 |
+
multimodal_response = tokenizer.decode(multimodal_outputs[0], skip_special_tokens=True)
|
| 122 |
+
print("M1llion-35B Multimodal Response:\n", multimodal_response)
|
| 123 |
```
|
| 124 |
|
| 125 |
+
## 📊 Model Details
|
| 126 |
+
### Architecture
|
| 127 |
+
M1llion-35B adopts a **decoder-only MoE Transformer architecture** with the following core components (a minimal routing sketch follows the list):
|
| 128 |
+
- 32 Transformer layers with 4096 hidden dimension
|
| 129 |
+
- 8 total experts (2 activated per token) for sparse efficiency
|
| 130 |
+
- Grouped-Query Attention (32 heads) for memory-efficient long-context modeling
|
| 131 |
+
- Rotary Positional Embeddings (RoPE) for 8k+ token context support
|
| 132 |
+
- Custom VisionPerceptionModule (VPM) for cross-modal fusion
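
The listing below is a minimal, illustrative PyTorch sketch of the top-2-of-8 expert routing described above. Class and variable names are hypothetical and this is not the model's actual implementation; it only shows how a token can be dispatched to two of eight experts and the outputs re-weighted.

```python
# Illustrative only: toy top-2-of-8 MoE feed-forward layer (names are hypothetical).
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, hidden_dim=4096, ffn_dim=16384, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (num_tokens, hidden_dim)
        probs = self.router(x).softmax(dim=-1)              # routing probabilities per expert
        weights, idx = probs.topk(self.top_k, dim=-1)       # pick the 2 best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    y = TinyMoELayer()(torch.randn(4, 4096))                 # 4 tokens in, 4 tokens out
    print(y.shape)
```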
|
| 133 |
+
|
| 134 |
+
### Pre-Training
|
| 135 |
+
- **Curriculum**: 4-stage multi-modal pre-training (foundation knowledge → context extension → advanced reasoning → high-quality annealing)
|
| 136 |
+
- **Token Count**: 15 trillion total tokens (multilingual text, code, mathematics, visual data)
|
| 137 |
+
- **Data Sources**: mOSCAR, Maya-LLaVA-Pretrain, OpenAssistant/oasst1, and curated screen UI datasets
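
As a quick illustration (not part of any official pipeline), the corpora named above can be pulled straight from the Hugging Face Hub with the `datasets` library; only the OpenAssistant/oasst1 ID is shown here:

```python
# Minimal sketch: load one of the corpora listed above for local experiments.
from datasets import load_dataset

oasst = load_dataset("OpenAssistant/oasst1", split="train")
print(len(oasst), "records;", oasst[0]["text"][:120])  # peek at the first message
```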
|
| 138 |
+
|
| 139 |
+
### Fine-Tuning
|
| 140 |
+
- **Supervised Fine-Tuning (SFT)**: 3-stage text + 4-stage multimodal fine-tuning for human alignment
|
| 141 |
+
- **Reinforcement Learning (RL)**: RLHF for harmlessness/usefulness + agent RL for autonomous action capability
|
| 142 |
+
- **Privacy-Preserving Fine-Tuning (PPFT)**: Support for on-device custom fine-tuning without data leakage
|
| 143 |
+
|
| 144 |
+
### Compression Technology (QEPQ)
|
| 145 |
+
M1llion-35B's extreme compression is powered by **QEPQ (Quantum-Entangled Pruning & Quantization)**; a toy sketch of the idea appears after the list:
|
| 146 |
+
- 2-bit nonlinear codebook quantization for weight compression
|
| 147 |
+
- 60% pruning of non-critical weights based on quantum entanglement metrics
|
| 148 |
+
- Gzip secondary compression for additional storage savings
|
| 149 |
+
- <0.1% performance loss compared to full FP16 model
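
For intuition only, the toy sketch below mimics the two ingredients named above with ordinary NumPy: magnitude-based pruning stands in for the proprietary entanglement metric, and a 4-entry (2-bit) codebook is learned with 1-D k-means. It is an assumption-laden illustration, not the QEPQ implementation.

```python
# Toy QEPQ-style pass: prune 60% of weights by magnitude (a stand-in for the
# proprietary entanglement metric), then quantize survivors to a 2-bit codebook.
import numpy as np

def toy_prune_and_quantize(weights, prune_ratio=0.6, n_codes=4, iters=20):
    flat = weights.flatten()
    keep_mask = np.abs(flat) >= np.quantile(np.abs(flat), prune_ratio)   # keep the largest 40%
    kept = flat[keep_mask]

    codebook = np.quantile(kept, np.linspace(0.1, 0.9, n_codes))         # init 4 centroids
    for _ in range(iters):                                               # 1-D k-means refinement
        codes = np.argmin(np.abs(kept[:, None] - codebook[None, :]), axis=1)
        for c in range(n_codes):
            if np.any(codes == c):
                codebook[c] = kept[codes == c].mean()
    codes = np.argmin(np.abs(kept[:, None] - codebook[None, :]), axis=1).astype(np.uint8)
    return keep_mask.reshape(weights.shape), codes, codebook             # mask + 2-bit indices

mask, codes, codebook = toy_prune_and_quantize(np.random.randn(256, 256).astype(np.float32))
print(f"kept {mask.mean():.0%} of weights; codebook = {np.round(codebook, 3)}")
```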
|
| 150 |
+
|
| 151 |
+
## 📈 Benchmark Results
|
| 152 |
+
M1llion-35B achieves competitive performance across text, multimodal, and agent benchmarks—while maintaining edge-deployable efficiency.
|
| 153 |
+
|
| 154 |
+
### Key Performance Highlights
|
| 155 |
+
| Benchmark Category | Metrics (M1llion-35B) |
|
| 156 |
+
|:---|:---|
|
| 157 |
+
| **English Text Reasoning** | MMLU: 87.7, PIQA: 76.7, GSM8K: 89.2, MT-Bench: 8.6/10 |
|
| 158 |
+
| **Korean Text Reasoning** | KMMLU: 71.3, HAERAE Bench 1.0: 87.4, KoBALT: 50.6 |
|
| 159 |
+
| **Multimodal (Vision-Text)** | KoNET: 75.1, K-MMBench: 88.1, TextVQA: 85.4 |
|
| 160 |
+
| **Intelligent Agent** | Tau2-Airline: 58.0, Tau2-Retail: 71.6, Terminal Bench: 21.8 |
|
| 161 |
+
| **Efficiency** | Inference Latency (8k tokens): 150ms (consumer GPU), 450ms (consumer CPU) |
|
| 162 |
+
|
| 163 |
+
### Deployment Efficiency Comparison
|
| 164 |
+
| Configuration | Model Size | Performance Loss | Supported Hardware |
|
| 165 |
+
|:---|:---|:---|:---|
|
| 166 |
+
| FP16 (Baseline) | ~70 GB | 0.0% | High-end enterprise GPU |
|
| 167 |
+
| FP8 (Traditional) | ~35 GB | 0.5% | Mid-range GPU |
|
| 168 |
+
| QEPQ Compression (2-bit) | <10 GB | <0.1% | Consumer GPU/CPU/laptops |
|
| 169 |
+
|
| 170 |
+
## 🛠️ Advanced Usage Guides
|
| 171 |
+
### 1. Local Model Training
|
| 172 |
+
Use the official training script to fine-tune M1llion-35B on custom datasets (on-device, no cloud):
|
| 173 |
```bash
|
| 174 |
+
# Fine-tune M1llion-35B on custom instruction data (test mode first)
|
| 175 |
+
python train.py \
|
| 176 |
+
--model_path ./local/m1llion-35b \
|
| 177 |
+
--dataset_path ./custom_datasets/instruction_data.json \
|
| 178 |
+
--output_dir ./fine_tuned_model \
|
| 179 |
+
--num_steps 5000 \
|
| 180 |
+
--batch_size 2 \
|
| 181 |
+
--gradient_accumulation_steps 16 \
|
| 182 |
+
--test_mode
|
| 183 |
```
|
| 184 |
|
| 185 |
+
### 2. QEPQ Model Compression
|
| 186 |
+
Compress the full model to edge-ready <10GB size using the official compression toolkit:
|
| 187 |
```bash
|
| 188 |
+
# Compress full M1llion-35B model to edge-ready format
|
| 189 |
+
python compress.py \
|
| 190 |
+
--mode compress \
|
| 191 |
+
--model_path ./full_m1llion_35b \
|
| 192 |
+
--output_path ./m1llion_35b_edge \
|
| 193 |
+
--compression_level qepq_2bit \
|
| 194 |
+
--preserve_multimodal
|
| 195 |
```
|
| 196 |
|
| 197 |
+
### 3. Run Benchmark Evaluations
|
| 198 |
+
Generate a detailed benchmark report for custom model variants:
|
| 199 |
```bash
|
| 200 |
+
# Evaluate fine-tuned/compressed model against industry benchmarks
|
| 201 |
+
python run_evaluation.py \
|
| 202 |
+
--model_path ./m1llion_35b_edge \
|
| 203 |
+
--benchmarks mmlu,gsm8k,mt_bench \
|
| 204 |
+
--output_report ./benchmark_results.md
|
| 205 |
```
|
| 206 |
|
| 207 |
+
### 4. Edge Deployment (Consumer Laptop/CPU)
|
| 208 |
+
Deploy the compressed M1llion-35B model on a consumer laptop (no GPU required):
|
| 209 |
```bash
|
| 210 |
+
# Load edge model and run local inference server
|
| 211 |
+
python deploy_edge.py \
|
| 212 |
+
--compressed_model_path ./m1llion_35b_edge \
|
| 213 |
+
--port 8080 \
|
| 214 |
+
--device cpu \
|
| 215 |
+
--enable_multimodal
|
| 216 |
```
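
Once the server is running it can be queried from any local process. The call below assumes `deploy_edge.py` exposes a simple JSON completion endpoint at `/generate` on the chosen port; that route and payload are hypothetical, so check the script's actual interface before relying on them.

```python
# Hypothetical client call: the /generate route and payload fields are assumptions,
# not a documented API of deploy_edge.py.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={"prompt": "Summarize today's meeting notes.", "max_new_tokens": 128},
    timeout=60,
)
print(resp.json())
```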
|
| 217 |
|
| 218 |
+
## ⚙️ Configuration
|
| 219 |
+
Core model parameters can be customized via the `m1_blueprint.json` configuration file (included in the GitHub repository; an abridged excerpt appears after this list), including:
|
| 220 |
+
- MoE expert count and routing parameters
|
| 221 |
+
- QEPQ compression level
|
| 222 |
+
- HSA security settings (threat detection thresholds)
|
| 223 |
+
- Multimodal VPM resolution and processing limits
|
| 224 |
+
- Training/finetuning hyperparameters
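
An abridged excerpt of `m1_blueprint.json` shows the main tunable blocks (values reproduced from the project's full configuration example):

```json
{
  "transformer_config": { "num_layers": 32, "m1_core_dimension": 4096, "m1_lexicon_span": 256000 },
  "moe_config": { "m1_specialist_count": 8, "m1_token_specialists": 2, "m1_router_jitter_noise": 0.01 },
  "qepq_config": { "enabled": true, "target_compression_ratio": 7.0, "m1_quantum_bits": 2 },
  "m1_hundreds_blueprint": { "enabled": true, "ctm_threat_threshold_low": 0.7, "ctm_threat_threshold_high": 0.95 },
  "training_config": { "batch_size": 4, "learning_rate": 1e-4, "max_steps": 100000 }
}
```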
|
| 225 |
|
| 226 |
+
## ❓ FAQs
|
| 227 |
+
1. **Q: Can I deploy M1llion-35B on my personal laptop?**
|
| 228 |
+
A: Yes! The QEPQ-compressed variant (<10GB) runs on most modern laptops (8GB+ RAM, 4-core+ CPU, or integrated GPU).
|
| 229 |
|
| 230 |
+
2. **Q: Is M1llion-35B suitable for commercial use?**
|
| 231 |
+
A: No. This model is for **research and non-commercial use only**. Commercial authorization requires direct contact with ArcOffical/m1llionAI.
|
| 232 |
|
| 233 |
+
3. **Q: What are the "surprise hidden features" mentioned in the launch announcement?**
|
| 234 |
+
A: Hidden features (unveiled on February 14, 2026) include cross-device local AI synchronization and advanced SWE agent capabilities—stay tuned to the m1llionAI Hugging Face organization for updates.
|
| 235 |
|
| 236 |
+
4. **Q: How do I report bugs or request features?**
|
| 237 |
+
A: Submit issues via the m1llionAI organization on Hugging Face, or comment on the M1llion-35B Hugging Face model page (monitored by ArcOffical).
|
| 238 |
|
| 239 |
## 🤝 Contribution
|
| 240 |
+
m1llionAI and ArcOffical welcome community contributions to M1llion-35B! To contribute:
|
| 241 |
+
1. Fork the M1llion-35B repository (hosted under the m1llionAI organization, with a GitHub mirror)
|
| 242 |
+
2. Submit a Pull Request with detailed descriptions of your changes (model optimization, benchmarking, bug fixes, etc.)
|
| 243 |
+
3. Adhere to the project's code style and privacy-first design principles
|
| 244 |
|
| 245 |
+
All contributions will be reviewed by ArcOffical and integrated into the main model branch if aligned with the project's mission.
|
|
|
|
| 246 |
|
| 247 |
+
## 📄 License
|
| 248 |
+
M1llion-35B is licensed for **non-commercial research and learning use only**. Commercial use, redistribution, or modification for commercial purposes is prohibited without prior written authorization from ArcOffical and m1llionAI.
|
| 249 |
|
| 250 |
## 🙏 Acknowledgments
|
| 251 |
+
- ArcOffical for the full design, development, and maintenance of M1llion-35B
|
| 252 |
+
- Collaboration teams (pure-team, cogent-ai, Arc4, neo-ai-team) for technical insights and dataset curation
|
| 253 |
+
- Hugging Face for providing the open-source ecosystem to democratize AI access
|
| 254 |
+
- The broader LLM community for advances in MoE architecture, compression, and multimodal AI
|
| 255 |
+
|
| 256 |
+
## 📧 Contact
|
| 257 |
+
- **Core Maintainer (ArcOffical)**: Accessible via the [M1llion-35B Hugging Face Model Discussions](https://huggingface.co/m1llionAI/M1llion-35B/discussions)
|
| 258 |
+
- **m1llionAI Organization**: [https://huggingface.co/m1llionAI](https://huggingface.co/m1llionAI)
|
| 259 |
+
- **GitHub Repository**: [https://github.com/M1llion-AI/million-35b](https://github.com/M1llion-AI/million-35b)
|
| 260 |
|
| 261 |
---
|
| 262 |
|
| 263 |
+
**Release Date**: February 14, 2026 (UTC+8)
|
| 264 |
+
**Last Updated**: January 9, 2026
|
| 265 |
+
*Built by ArcOffical | m1llionAI | Privacy-First, Edge-Ready, Future-Proof AI*
|