---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- open-source
- gpt
- transformer
- causal-lm
datasets:
- squad
metrics:
- perplexity
- loss
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 7K
results:
- task:
type: text-generation
dataset:
type: squad
name: Wikipedia passages from SQuAD
metrics:
- type: loss
value: 2.1
- type: perplexity
value: 8.2
---
# OpenLLM Small Extended 7K Model
<!-- Copyright (C) 2024 Louis Chua Bean Chong -->
<!-- This file is part of OpenLLM - dual-licensed under GPLv3 and Commercial License -->
## 🌟 Model Overview
This is the **OpenLLM Small Extended 7K** model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with longer training than the earlier 4K and 6K checkpoints.
### **πŸ“Š Model Specifications**
- **Architecture**: GPT-style Transformer
- **Parameters**: 35,823,616 (35.8M)
- **Layers**: 6 transformer layers
- **Heads**: 8 attention heads
- **Embedding Dimension**: 512
- **Vocabulary Size**: 32,000 tokens
- **Context Length**: 1,024 tokens
- **Training Steps**: 7,000
- **Model Size**: Small
### **🎯 Training Details**
- **Dataset**: Wikipedia passages from the SQuAD dataset (~41k passages)
- **Tokenization**: SentencePiece with 32k vocabulary
- **Training Objective**: Next-token prediction (causal language modeling); a minimal training-step sketch follows below
- **Optimizer**: AdamW with learning rate scheduling
- **Hardware**: Trained on consumer GPU with gradient accumulation
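
The exact training script lives in the OpenLLM repository; as an illustration of the objective and optimizer named above, here is a minimal PyTorch sketch of a single training step. Names such as `model` and `batch` are placeholders, the learning rate and schedule are assumptions, and gradient accumulation is omitted for brevity:

```python
import torch
import torch.nn.functional as F

# `model` is the GPT-style network and `batch` a LongTensor of token ids with
# shape (batch_size, block_size + 1); both are placeholders for this sketch.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=7000)

def training_step(batch):
    inputs, targets = batch[:, :-1], batch[:, 1:]   # next-token prediction: shift by one
    logits = model(inputs)                          # (B, T, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),        # flatten to (B*T, vocab_size)
        targets.reshape(-1),                        # flatten to (B*T,)
    )
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```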
### **πŸ“ Model Files**
```
huggingface/
β”œβ”€β”€ config.json # Model configuration
β”œβ”€β”€ generation_config.json # Generation parameters
β”œβ”€β”€ pytorch_model.bin # Model weights (161MB)
β”œβ”€β”€ tokenizer_config.json # Tokenizer configuration
β”œβ”€β”€ tokenizer.model # SentencePiece tokenizer
└── load_hf_model.py # Loading script
```
## πŸš€ Usage
### **Loading with Hugging Face Transformers**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### **Using the Custom Loader**
```python
from load_hf_model import load_openllm_model
# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")
# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    temperature=0.8,
    top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### **Inference Server**
```bash
# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7
      }'
```
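
The same endpoint can be called from Python. A small sketch using `requests` (the response schema is an assumption; the FastAPI docs at `http://localhost:8000/docs` list the actual fields):

```python
import requests

# Same request as the curl example above.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())  # field names depend on the server's response schema
```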
## πŸ“ˆ Performance
### **Training Metrics**
- **Final Loss**: ~2.1 (cross-entropy; see the perplexity check below)
- **Training Time**: ~7 hours on consumer GPU
- **Memory Usage**: ~2GB VRAM during training
- **Inference Speed**: ~50 tokens/second on CPU, ~200 tokens/second on GPU
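
The perplexity reported in the model card metadata (8.2) follows directly from this loss, since perplexity is the exponential of the mean cross-entropy (in nats):

```python
import math

final_loss = 2.1                     # cross-entropy, from the metrics above
perplexity = math.exp(final_loss)    # exp(2.1) ~= 8.2
print(f"perplexity ~= {perplexity:.1f}")
```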
### **Model Capabilities**
- **Text Generation**: Coherent paragraph generation
- **Question Answering**: Basic factual responses
- **Summarization**: Short text summarization
- **Language Understanding**: Context-aware responses
## πŸ”§ Configuration
### **Generation Parameters**
```json
{
"max_length": 512,
"max_new_tokens": 256,
"temperature": 0.7,
"top_k": 40,
"top_p": 0.9,
"do_sample": true,
"pad_token_id": 0,
"eos_token_id": 1,
"bos_token_id": 2
}
```
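
These defaults ship in `generation_config.json`, so `model.generate()` picks them up automatically when the model is loaded with `from_pretrained`. To inspect or override them explicitly (reusing `model` and `inputs` from the loading example above; the path is a placeholder):

```python
from transformers import GenerationConfig

# Load the shipped defaults, then override individual fields for one call.
gen_cfg = GenerationConfig.from_pretrained("path/to/huggingface")
gen_cfg.temperature = 0.9
gen_cfg.max_new_tokens = 64

outputs = model.generate(**inputs, generation_config=gen_cfg)
```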
### **Model Architecture**
```json
{
"vocab_size": 32000,
"n_layer": 6,
"n_head": 8,
"n_embd": 512,
"block_size": 1024,
"dropout": 0.1,
"bias": true
}
```
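
As a sanity check, the reported parameter count can be reproduced from this config, assuming a GPT-2-style layout with learned positional embeddings, biases as configured, and a weight-tied LM head (the tying is an assumption about this export):

```python
# Parameter count derived from the architecture config above.
vocab_size, n_layer, n_embd, block_size = 32000, 6, 512, 1024

embeddings = vocab_size * n_embd + block_size * n_embd          # token + position tables
per_layer = (
    2 * 2 * n_embd                       # two LayerNorms (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd   # fused qkv projection + bias
    + n_embd * n_embd + n_embd           # attention output projection + bias
    + n_embd * 4 * n_embd + 4 * n_embd   # MLP up-projection + bias
    + 4 * n_embd * n_embd + n_embd       # MLP down-projection + bias
)
total = embeddings + n_layer * per_layer + 2 * n_embd            # + final LayerNorm
print(f"{total:,}")  # 35,823,616 -- matches the reported parameter count
```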
## πŸ§ͺ Testing
### **Quick Test**
```python
# Test the model with a simple prompt (reusing `model` and `tokenizer` loaded above)
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")
```
## πŸ“‹ Limitations
- **Context Length**: Limited to 1,024 tokens
- **Training Data**: Only Wikipedia passages (limited domain)
- **Model Size**: Small model with limited reasoning capabilities
- **Bias**: May inherit biases from training data
- **Factual Accuracy**: Not guaranteed, particularly for recent or current events
## πŸ”„ Model Comparison
| Model | Parameters | Training Steps | Context Length | Use Case |
|-------|------------|----------------|----------------|----------|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
| **Small 7K** | **35.8M** | **7,000** | **1,024** | **Extended training** |
## πŸ“„ License
This model is dual-licensed:
- **Open Source**: GNU General Public License v3.0
- **Commercial**: Commercial License (contact for details)
See `LICENSE` and `docs/LICENSES.md` for full license information.
## 🀝 Contributing
We welcome contributions to improve the model! Please see:
- `docs/CONTRIBUTING.md` for contribution guidelines
- `docs/CODE_OF_CONDUCT.md` for community standards
## πŸ“ž Support
For questions, issues, or commercial licensing:
- **GitHub Issues**: Report bugs and feature requests
- **Documentation**: Check `docs/` directory
- **Commercial License**: Contact for enterprise use
---
**Author**: Louis Chua Bean Chong
**Project**: OpenLLM - Open Source Large Language Model
**Version**: 0.1.0
**Last Updated**: 2024