---
license: apache-2.0
datasets:
- HuggingFaceFW/finewiki
metrics:
- accuracy
base_model:
- PaddlePaddle/PaddleOCR-VL
new_version: OpenTrouter/Trouter-Terminus-20b
pipeline_tag: text-generation
library_name: adapter-transformers
tags:
- agent
- code
---

# Trouter-20B

<div align="center">

*A powerful 20-billion-parameter language model for advanced natural language processing*

[Model Card](https://huggingface.co/Trouter-Library/Trouter-20B) | [Documentation](./USAGE_GUIDE.md)

</div>

---

## Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Quick Start](#quick-start)
- [Model Details](#model-details)
- [Performance](#performance)
- [Use Cases](#use-cases)
- [System Requirements](#system-requirements)
- [Training Details](#training-details)
- [Limitations & Bias](#limitations--bias)
- [License](#license)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)

## Overview

Trouter-20B is a state-of-the-art decoder-only transformer language model with 20 billion parameters. Designed for versatility and performance, it excels at a wide range of natural language understanding and generation tasks including reasoning, question answering, creative writing, code generation, and conversational AI.

## Key Features

- **20B Parameters**: A strong balance between capability and computational cost
- **4K Context Length**: Processes and generates sequences within a 4096-token context window
- **Apache 2.0 License**: Fully open for commercial and research use
- **Optimized Architecture**: Efficient attention via Grouped Query Attention (GQA)
- **Multilingual Capable**: Strongest on English, with support for several other languages
- **Quantization Ready**: Compatible with 8-bit and 4-bit quantization for a reduced memory footprint
- **Chat Optimized**: Built-in chat template for conversational applications (see the sketch just below this list)
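
Because the model card advertises a built-in chat template, multi-turn prompts can be assembled with `apply_chat_template`. A minimal sketch, assuming the standard `role`/`content` message format and that the template accepts a system turn:

```python
from transformers import AutoTokenizer

model_id = "Trouter-Library/Trouter-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A short conversation in the standard chat-message format.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a decoder-only transformer does."},
]

# Render the conversation into a single prompt string using the model's
# chat template; add_generation_prompt appends the assistant turn marker.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```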

## Quick Start

### Installation

```bash
pip install "transformers>=4.38.0" "torch>=2.0.0" accelerate bitsandbytes
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "Trouter-Library/Trouter-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Generate text (do_sample=True is required for temperature to take effect)
prompt = "Explain the concept of neural networks:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Memory-Efficient Loading (4-bit)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "Trouter-Library/Trouter-20B"

# Configure 4-bit (NF4) quantization with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto"
)
```
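
The 8-bit configuration mentioned in the Key Features and in the inference-speed table below can be loaded the same way. A minimal sketch using `BitsAndBytesConfig` in 8-bit mode (memory figures will vary by hardware):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "Trouter-Library/Trouter-20B"

# 8-bit weight quantization roughly halves memory versus bf16.
bnb_config_8bit = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config_8bit,
    device_map="auto"
)
```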

For more detailed usage examples, see the [Usage Guide](./USAGE_GUIDE.md).

## Model Details

| Specification | Value |
|---------------|-------|
| **Parameters** | 20 billion |
| **Architecture** | Decoder-only Transformer |
| **Layers** | 48 |
| **Hidden Size** | 5120 |
| **Attention Heads** | 40 (8 KV heads with GQA) |
| **Context Length** | 4096 tokens |
| **Vocabulary Size** | 32,000 tokens |
| **Activation** | SiLU (Swish) |
| **Positional Encoding** | RoPE (Rotary Position Embedding) |
| **Normalization** | RMSNorm |
| **Precision** | BFloat16 |
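
These values can be cross-checked against the repository's configuration file. A minimal sketch, assuming a LLaMA-style config with the usual `num_attention_heads` / `num_key_value_heads` field names:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Trouter-Library/Trouter-20B")

# Print the architecture fields listed in the table above.
print("layers:          ", config.num_hidden_layers)
print("hidden size:     ", config.hidden_size)
print("attention heads: ", config.num_attention_heads)
print("kv heads (GQA):  ", config.num_key_value_heads)
print("context length:  ", config.max_position_embeddings)
print("vocab size:      ", config.vocab_size)
```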

## Performance

### Benchmark Results

| Benchmark | Score | Notes |
|-----------|-------|-------|
| MMLU (5-shot) | TBD | Multitask Language Understanding |
| HellaSwag | TBD | Commonsense Reasoning |
| TruthfulQA | TBD | Truthfulness & Accuracy |
| HumanEval | TBD | Code Generation |
| GSM8K | TBD | Mathematical Reasoning |
| BBH | TBD | BIG-Bench Hard |

*Benchmarks to be updated after comprehensive evaluation.*

### Inference Speed

| Configuration | Tokens/Second | Memory Usage |
|---------------|---------------|--------------|
| BF16 (A100 80GB) | ~XX tokens/s | ~40GB |
| 8-bit (A100 40GB) | ~XX tokens/s | ~20GB |
| 4-bit (RTX 4090) | ~XX tokens/s | ~10GB |
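
Throughput depends heavily on hardware, batch size, and quantization, so figures like those above are best reproduced locally. A minimal timing sketch, assuming `model` and `tokenizer` are already loaded as in the Quick Start:

```python
import time

prompt = "Explain the concept of neural networks:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time a single greedy generation and report decode throughput.
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```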

## Use Cases

### Recommended Uses

- **Text Generation**: Articles, stories, creative writing
- **Question Answering**: Information retrieval and explanation
- **Code Assistance**: Code completion, debugging, explanation
- **Summarization**: Document and conversation summarization
- **Translation**: Multi-language translation tasks
- **Dialogue Systems**: Chatbots and conversational AI
- **Content Analysis**: Sentiment analysis, classification
- **Educational Tools**: Tutoring and learning assistance

### Limitations

- May generate incorrect or nonsensical information (hallucinations)
- Not suitable for high-stakes decision making without human oversight
- Performance may vary on specialized or domain-specific tasks
- Requires careful prompt engineering for optimal results
- May reflect biases present in training data

### Out of Scope

- Real-time medical diagnosis or treatment recommendations
- Legal advice or binding interpretations
- Financial investment decisions
- Safety-critical systems without human verification
- Generating harmful, illegal, or unethical content

## System Requirements

### Minimum Requirements

- **GPU**: 24GB VRAM (with 4-bit quantization)
- **RAM**: 32GB system memory
- **Storage**: 50GB free space
- **CUDA**: 11.8 or higher

### Recommended Specifications

- **GPU**: A100 (40GB/80GB) or H100
- **RAM**: 64GB+ system memory
- **Storage**: 100GB+ SSD
- **Multi-GPU**: Supported via `device_map="auto"`
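
With multiple GPUs, `device_map="auto"` shards the weights across the visible devices; a per-device memory cap can also be passed when a GPU hosts other workloads. A minimal sketch, assuming two visible GPUs (the limits shown are illustrative, not requirements):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Optional: cap how much memory the loader may plan to use per device.
    max_memory={0: "38GiB", 1: "38GiB", "cpu": "64GiB"},
)
```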

## Training Details

### Training Data

Trouter-20B was trained on a diverse corpus of high-quality text data including:

- Web documents and articles
- Books and academic papers
- Code repositories
- Conversational data
- Multilingual text

**Total Training Tokens**: [Specify total tokens]
**Data Mix**: [Provide breakdown of data sources]
**Cutoff Date**: January 2025

### Training Infrastructure

- **Framework**: PyTorch 2.0+ with FSDP
- **Hardware**: [Specify GPU cluster details]
- **Training Time**: [Specify duration]
- **Optimizer**: AdamW
- **Learning Rate**: Cosine schedule with warmup
- **Batch Size**: [Specify effective batch size]
- **Sequence Length**: 4096 tokens
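
For reference, the optimizer and schedule named in this list correspond to a standard PyTorch/Transformers setup. A minimal sketch with illustrative hyperparameters (the actual training values are not published here), assuming `model` is already loaded:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Illustrative values only; not the original pre-training hyperparameters.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,
    num_training_steps=50_000,
)
```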

### Training Objective

Causal language modeling with next-token prediction using cross-entropy loss.
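
In Transformers this objective can be reproduced by passing the input ids as labels; the library shifts them internally and averages the token-level cross-entropy. A minimal sketch, assuming `model` and `tokenizer` from the Quick Start are loaded:

```python
import torch

text = "Neural networks learn hierarchical representations of data."
batch = tokenizer(text, return_tensors="pt").to(model.device)

# For causal LM training the labels are the input ids themselves; the model
# shifts them by one position before computing the cross-entropy loss.
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])

print("cross-entropy loss:", out.loss.item())
print("perplexity:", torch.exp(out.loss).item())
```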

## Limitations & Bias

### Known Limitations

1. **Hallucinations**: May generate plausible-sounding but incorrect information
2. **Temporal Knowledge**: Training data cutoff is January 2025
3. **Mathematical Reasoning**: May struggle with complex multi-step calculations
4. **Multilingual Performance**: Optimized for English; other languages may have reduced quality
5. **Context Window**: Limited to 4096 tokens

### Bias Considerations

Like all large language models, Trouter-20B may exhibit biases including:

- Gender, racial, and cultural biases from training data
- A Western/English-centric perspective
- Potential stereotyping in generated content

**Mitigation Efforts**: We encourage users to:

- Implement appropriate content filtering
- Use diverse evaluation datasets
- Apply bias detection tools
- Provide human oversight for production deployments

## License

Trouter-20B is released under the **Apache 2.0 License**. You are free to:

- Use it commercially
- Modify and distribute it
- Use it privately
- Use it for patent purposes

See the [LICENSE](./LICENSE) file for full terms.

## Citation

If you use Trouter-20B in your research or applications, please cite:

```bibtex
@software{trouter20b2025,
  title={Trouter-20B: A 20 Billion Parameter Language Model},
  author={Trouter-Library},
  year={2025},
  month={10},
  url={https://huggingface.co/Trouter-Library/Trouter-20B},
  version={1.0},
  license={Apache-2.0}
}
```

## Acknowledgments

We thank the open-source community and the following projects that made this work possible:

- [Hugging Face Transformers](https://github.com/huggingface/transformers)
- [PyTorch](https://pytorch.org/)
- [LLaMA](https://ai.meta.com/llama/) for architecture inspiration
- [EleutherAI](https://www.eleuther.ai/) for evaluation frameworks

---

<div align="center">

**Built with ❤️ for the AI community**

[⬆ Back to Top](#trouter-20b)

</div>