---
license: apache-2.0
datasets:
- HuggingFaceFW/finewiki
metrics:
- accuracy
base_model:
- PaddlePaddle/PaddleOCR-VL
new_version: OpenTrouter/Trouter-Terminus-20b
pipeline_tag: text-generation
library_name: adapter-transformers
tags:
- agent
- code
---
# Trouter-20B
<div align="center">

*A powerful 20-billion-parameter language model for advanced natural language processing*

[🤗 Model Card](https://huggingface.co/Trouter-Library/Trouter-20B) | [📖 Documentation](./USAGE_GUIDE.md)

</div>
---
## 📋 Table of Contents
- [Overview](#overview)
- [Key Features](#key-features)
- [Quick Start](#quick-start)
- [Model Details](#model-details)
- [Performance](#performance)
- [Use Cases](#use-cases)
- [System Requirements](#system-requirements)
- [Training Details](#training-details)
- [Limitations & Bias](#limitations--bias)
- [License](#license)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)
## 🎯 Overview
Trouter-20B is a state-of-the-art decoder-only transformer language model with 20 billion parameters. Designed for versatility and performance, it excels at a wide range of natural language understanding and generation tasks including reasoning, question answering, creative writing, code generation, and conversational AI.
## ✨ Key Features
- **20B Parameters**: Optimal balance between performance and computational efficiency
- **4K Context Length**: Process and generate longer sequences with 4096 token context window
- **Apache 2.0 License**: Fully open for commercial and research use
- **Optimized Architecture**: Efficient attention mechanisms with GQA (Grouped Query Attention)
- **Multilingual**: Strong performance on English, with support for additional languages
- **Quantization Ready**: Compatible with 8-bit and 4-bit quantization for reduced memory footprint
- **Chat Optimized**: Built-in chat template for conversational applications
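The chat-optimized design means conversations are passed as role-tagged messages rather than raw strings, and `tokenizer.apply_chat_template` renders them into the model's prompt format. A pure-Python sketch of the idea (the special tokens below are illustrative, not the model's actual template):

```python
# Role-tagged messages, in the format expected by tokenizer.apply_chat_template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a transformer?"},
]

def render_chat(messages, add_generation_prompt=True):
    """Illustrative renderer: the real special tokens come from the model's own template."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = render_chat(messages)
print(prompt)
```

In practice you would call `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` and pass the result straight to `model.generate`.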
## 🚀 Quick Start
### Installation
```bash
pip install transformers>=4.38.0 torch>=2.0.0 accelerate bitsandbytes
```
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "Trouter-Library/Trouter-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate text (do_sample=True so temperature takes effect)
prompt = "Explain the concept of neural networks:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Memory-Efficient Loading (4-bit)
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "Trouter-Library/Trouter-20B"

# Configure 4-bit quantization (NF4 weights, bfloat16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```
For more detailed usage examples, see the [Usage Guide](./USAGE_GUIDE.md).
## 📊 Model Details
| Specification | Value |
|--------------|-------|
| **Parameters** | 20 billion |
| **Architecture** | Decoder-only Transformer |
| **Layers** | 48 |
| **Hidden Size** | 5120 |
| **Attention Heads** | 40 (8 KV heads with GQA) |
| **Context Length** | 4096 tokens |
| **Vocabulary Size** | 32,000 tokens |
| **Activation** | SiLU (Swish) |
| **Positional Encoding** | RoPE (Rotary Position Embedding) |
| **Normalization** | RMSNorm |
| **Precision** | BFloat16 |
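The GQA configuration in the table (40 query heads sharing 8 KV heads) shrinks the KV cache fivefold. A back-of-the-envelope calculation from the listed specs, assuming head dimension = hidden size / query heads = 128:

```python
# Specs from the table above
layers, hidden, q_heads, kv_heads = 48, 5120, 40, 8
seq_len, bytes_per_elem = 4096, 2          # 4K context, bf16 = 2 bytes
head_dim = hidden // q_heads               # 128 (assumed: hidden_size / num_heads)

def kv_cache_bytes(n_kv_heads):
    # K and V tensors per layer, each of shape [n_kv_heads, seq_len, head_dim]
    return 2 * layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

gqa = kv_cache_bytes(kv_heads)   # grouped-query attention (8 KV heads)
mha = kv_cache_bytes(q_heads)    # hypothetical full multi-head attention (40 KV heads)
print(f"GQA KV cache: {gqa / 2**30:.2f} GiB vs MHA: {mha / 2**30:.2f} GiB")
# → GQA KV cache: 0.75 GiB vs MHA: 3.75 GiB (one full-length sequence)
```

This is per-sequence cache memory only; it scales linearly with batch size and comes on top of the model weights.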
## 📈 Performance
### Benchmark Results
| Benchmark | Score | Notes |
|-----------|-------|-------|
| MMLU (5-shot) | TBD | Multitask Language Understanding |
| HellaSwag | TBD | Commonsense Reasoning |
| TruthfulQA | TBD | Truthfulness & Accuracy |
| HumanEval | TBD | Code Generation |
| GSM8K | TBD | Mathematical Reasoning |
| BBH | TBD | Big Bench Hard |
*Benchmarks to be updated after comprehensive evaluation*
### Inference Speed
| Configuration | Tokens/Second | Memory Usage |
|--------------|---------------|--------------|
| BF16 (A100 80GB) | ~XX tokens/s | ~40GB |
| 8-bit (A100 40GB) | ~XX tokens/s | ~20GB |
| 4-bit (RTX 4090) | ~XX tokens/s | ~10GB |
## 💡 Use Cases
### ✅ Recommended Uses
- **Text Generation**: Articles, stories, creative writing
- **Question Answering**: Information retrieval and explanation
- **Code Assistance**: Code completion, debugging, explanation
- **Summarization**: Document and conversation summarization
- **Translation**: Multi-language translation tasks
- **Dialogue Systems**: Chatbots and conversational AI
- **Content Analysis**: Sentiment analysis, classification
- **Educational Tools**: Tutoring and learning assistance
### ⚠️ Limitations
- May generate incorrect or nonsensical information (hallucinations)
- Not suitable for high-stakes decision making without human oversight
- Performance may vary on specialized or domain-specific tasks
- Requires careful prompt engineering for optimal results
- May reflect biases present in training data
### ❌ Out of Scope
- Real-time medical diagnosis or treatment recommendations
- Legal advice or binding interpretations
- Financial investment decisions
- Safety-critical systems without human verification
- Generating harmful, illegal, or unethical content
## 💻 System Requirements
### Minimum Requirements
- **GPU**: 24GB VRAM (with 4-bit quantization)
- **RAM**: 32GB system memory
- **Storage**: 50GB free space
- **CUDA**: 11.8 or higher
### Recommended Specifications
- **GPU**: A100 (40GB/80GB) or H100
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD
- **Multi-GPU**: Supported via `device_map="auto"`
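Multi-GPU sharding via `device_map="auto"` can be steered with a `max_memory` map so each device keeps headroom for activations and the KV cache. A configuration sketch (the per-device caps are illustrative, not tuned values):

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative caps: leave a few GiB free per GPU; overflow layers spill to CPU RAM
max_memory = {0: "20GiB", 1: "20GiB", "cpu": "64GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    torch_dtype=torch.bfloat16,
    device_map="auto",       # Accelerate places layers across the listed devices
    max_memory=max_memory,
)
```

Layers offloaded to CPU are noticeably slower, so prefer caps that keep the whole model on GPU when VRAM allows.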
## 🏋️ Training Details
### Training Data
Trouter-20B was trained on a diverse corpus of high-quality text data including:
- Web documents and articles
- Books and academic papers
- Code repositories
- Conversational data
- Multilingual text
**Total Training Tokens**: [Specify total tokens]
**Data Mix**: [Provide breakdown of data sources]
**Cutoff Date**: January 2025
### Training Infrastructure
- **Framework**: PyTorch 2.0+ with FSDP
- **Hardware**: [Specify GPU cluster details]
- **Training Time**: [Specify duration]
- **Optimizer**: AdamW
- **Learning Rate**: Cosine schedule with warmup
- **Batch Size**: [Specify effective batch size]
- **Sequence Length**: 4096 tokens
### Training Objective
Causal language modeling with next-token prediction using cross-entropy loss.
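The objective can be made concrete with a toy example: at each position the model outputs a distribution over the next token, and the loss is the mean negative log-probability it assigned to the token that actually follows. A minimal pure-Python sketch (the vocabulary and probabilities are made up):

```python
import math

# Toy 4-token training sequence
seq = ["the", "cat", "sat", "down"]

# Hypothetical model outputs: P(next token | prefix) at each position
predicted = [
    {"cat": 0.7, "dog": 0.3},   # after "the"
    {"sat": 0.5, "ran": 0.5},   # after "the cat"
    {"down": 0.8, "up": 0.2},   # after "the cat sat"
]

# Cross-entropy: mean negative log-probability of the true next token
targets = seq[1:]
loss = -sum(math.log(p[t]) for p, t in zip(predicted, targets)) / len(targets)
print(f"loss = {loss:.4f}")  # → loss = 0.4243
```

Training drives this quantity down by pushing probability mass onto the observed continuations; perplexity is simply `exp(loss)`.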
## ⚖️ Limitations & Bias
### Known Limitations
1. **Hallucinations**: May generate plausible-sounding but incorrect information
2. **Temporal Knowledge**: Training data cutoff is January 2025
3. **Mathematical Reasoning**: May struggle with complex multi-step calculations
4. **Multilingual Performance**: Optimized for English; other languages may have reduced quality
5. **Context Window**: Limited to 4096 tokens
### Bias Considerations
Like all large language models, Trouter-20B may exhibit biases including:
- Gender, racial, and cultural biases from training data
- Western/English-centric perspective
- Potential stereotyping in generated content
**Mitigation Efforts**: We encourage users to:
- Implement appropriate content filtering
- Use diverse evaluation datasets
- Apply bias detection tools
- Provide human oversight for production deployments
## 📄 License
Trouter-20B is released under the **Apache 2.0 License**. You are free to:
- ✅ Use commercially
- ✅ Modify and distribute
- ✅ Use privately
- ✅ Use for patent purposes

See the [LICENSE](./LICENSE) file for full terms.
## 📝 Citation
If you use Trouter-20B in your research or applications, please cite:
```bibtex
@software{trouter20b2025,
  title={Trouter-20B: A 20 Billion Parameter Language Model},
  author={Trouter-Library},
  year={2025},
  month={10},
  url={https://huggingface.co/Trouter-Library/Trouter-20B},
  version={1.0},
  license={Apache-2.0}
}
```
## 🙏 Acknowledgments
We thank the open-source community and the following projects that made this work possible:
- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [LLaMA](https://ai.meta.com/llama/) architecture inspiration
- [EleutherAI](https://www.eleuther.ai/) for evaluation frameworks
---
<div align="center">
**Built with ❤️ for the AI community**

[⬆ Back to Top](#trouter-20b)

</div>