---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- open-source
- gpt
- transformer
- causal-lm
datasets:
- squad
metrics:
- perplexity
- loss
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 7K
results:
- task:
type: text-generation
dataset:
type: squad
name: Wikipedia passages from SQuAD
metrics:
- type: loss
value: 2.1
- type: perplexity
value: 8.2
---
# OpenLLM Small Extended 7K Model
<!-- Copyright (C) 2024 Louis Chua Bean Chong -->
<!-- This file is part of OpenLLM - dual-licensed under GPLv3 and Commercial License -->
## Model Overview
This is the **OpenLLM Small Extended 7K** model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with extended training.
### **Model Specifications**
- **Architecture**: GPT-style Transformer
- **Parameters**: 35,823,616 (35.8M; a quick consistency check follows this list)
- **Layers**: 6 transformer layers
- **Heads**: 8 attention heads
- **Embedding Dimension**: 512
- **Vocabulary Size**: 32,000 tokens
- **Context Length**: 1,024 tokens
- **Training Steps**: 7,000
- **Model Size**: Small
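The stated parameter count is consistent with these dimensions. A minimal sanity check, assuming a standard GPT-2-style layout (tied input/output embeddings, learned positional embeddings, and biases enabled, as in the architecture config below):

```python
# Back-of-the-envelope parameter count for a GPT-2-style model
# (assumes a tied LM head and learned positional embeddings).
vocab_size, n_embd, n_layer, block_size = 32_000, 512, 6, 1_024

token_emb = vocab_size * n_embd        # 16,384,000 (shared with the LM head)
pos_emb = block_size * n_embd          # 524,288
weights_per_layer = 12 * n_embd ** 2   # QKV, attention proj, two MLP matrices
biases_per_layer = 13 * n_embd         # attention/MLP biases + two LayerNorms
final_ln = 2 * n_embd                  # final LayerNorm weight + bias

total = token_emb + pos_emb + n_layer * (weights_per_layer + biases_per_layer) + final_ln
print(f"{total:,}")  # 35,823,616 -- matches the stated count
```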
### **Training Details**
- **Dataset**: Wikipedia passages from the SQuAD dataset (~41k passages)
- **Tokenization**: SentencePiece with a 32k vocabulary
- **Training Objective**: Next-token prediction (causal language modeling)
- **Optimizer**: AdamW with learning rate scheduling (sketched after this list)
- **Hardware**: Trained on consumer GPU with gradient accumulation
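The training loop itself is not part of this export. As a rough sketch of the objective and optimizer described above (the learning rate, weight decay, and schedule here are illustrative assumptions, not the values used in the actual run):

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hypothetical hyperparameters -- the real run's values are not documented here.
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=7_000)

def train_step(batch):
    """One next-token-prediction step on a (batch, seq_len) tensor of token ids."""
    input_ids, targets = batch[:, :-1], batch[:, 1:]  # shift targets by one position
    logits = model(input_ids).logits                  # assumes an HF-style output
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```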
### **Model Files**
```
huggingface/
├── config.json             # Model configuration
├── generation_config.json  # Generation parameters
├── pytorch_model.bin       # Model weights (161MB)
├── tokenizer_config.json   # Tokenizer configuration
├── tokenizer.model         # SentencePiece tokenizer
└── load_hf_model.py        # Loading script
```
## Usage
### **Loading with Hugging Face Transformers**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### **Using the Custom Loader**
```python
from load_hf_model import load_openllm_model

# Load the model using the custom loader script shipped with the export
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text (do_sample=True so temperature and top_p take effect)
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### **Inference Server**
```bash
# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
          "prompt": "The future of renewable energy",
          "max_tokens": 100,
          "temperature": 0.7
        }'
```
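The same endpoint can be called from Python; a minimal sketch assuming the `/generate` route and JSON fields from the curl example above:

```python
import requests

# Assumes the inference server above is running on localhost:8000.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7,
    },
)
response.raise_for_status()
print(response.json())  # response schema is defined by the server
```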
## Performance
### **Training Metrics**
- **Final Loss**: ~2.1 (cross-entropy; see the perplexity note after this list)
- **Training Time**: ~7 hours on consumer GPU
- **Memory Usage**: ~2GB VRAM during training
- **Inference Speed**: ~50 tokens/second on CPU, ~200 tokens/second on GPU
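The perplexity of 8.2 reported in the model metadata follows directly from the final loss, since perplexity is the exponential of the per-token cross-entropy:

```python
import math

print(math.exp(2.1))  # ~8.17, matching the reported perplexity of ~8.2
```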
### **Model Capabilities**
- **Text Generation**: Coherent paragraph generation
- **Question Answering**: Basic factual responses
- **Summarization**: Short text summarization
- **Language Understanding**: Context-aware responses
## Configuration
### **Generation Parameters**
```json
{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
```
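These defaults ship in `generation_config.json` and are applied by `model.generate()` automatically; they can also be inspected or overridden explicitly:

```python
from transformers import GenerationConfig

# Reads generation_config.json from the model directory.
gen_config = GenerationConfig.from_pretrained("path/to/huggingface")
print(gen_config.temperature, gen_config.top_k, gen_config.top_p)

# Keyword arguments passed to generate() take precedence over these defaults.
```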
### **Model Architecture**
```json
{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
```
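For reference, the same fields can be read straight from `config.json` (assuming the file matches the architecture config shown above), for example to check the context window before tokenizing long inputs:

```python
import json

# Reads the architecture config shown above from the model directory.
with open("path/to/huggingface/config.json") as f:
    config = json.load(f)

print(config["block_size"])  # 1024-token context window
print(config["vocab_size"])  # 32000
```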
## Testing
### **Quick Test**
```python
# Test the model with a simple prompt (assumes model and tokenizer
# are already loaded as in the Usage section)
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")
```
## Limitations
- **Context Length**: Limited to 1,024 tokens (see the truncation sketch after this list)
- **Training Data**: Only Wikipedia passages (limited domain)
- **Model Size**: Small model with limited reasoning capabilities
- **Bias**: May inherit biases from training data
- **Factual Accuracy**: Not guaranteed for current events
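In practice, the context limit means long inputs should be truncated before generation, e.g.:

```python
# Guard against the 1,024-token context window; `long_prompt` is a
# hypothetical long input string.
inputs = tokenizer(
    long_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=1024 - 100,  # reserve room for 100 generated tokens
)
```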
## Model Comparison
| Model | Parameters | Training Steps | Context Length | Use Case |
|-------|------------|----------------|----------------|----------|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
| **Small 7K** | **35.8M** | **7,000** | **1,024** | **Extended training** |
## License
This model is dual-licensed:
- **Open Source**: GNU General Public License v3.0
- **Commercial**: Commercial License (contact for details)
See `LICENSE` and `docs/LICENSES.md` for full license information.
## Contributing
We welcome contributions to improve the model! Please see:
- `docs/CONTRIBUTING.md` for contribution guidelines
- `docs/CODE_OF_CONDUCT.md` for community standards
## Support
For questions, issues, or commercial licensing:
- **GitHub Issues**: Report bugs and feature requests
- **Documentation**: Check `docs/` directory
- **Commercial License**: Contact for enterprise use
---
**Author**: Louis Chua Bean Chong
**Project**: OpenLLM - Open Source Large Language Model
**Version**: 0.1.0
**Last Updated**: 2024