|
|
---
|
|
|
language:
|
|
|
- en
|
|
|
license:
|
|
|
- gpl-3.0
|
|
|
- other
|
|
|
tags:
|
|
|
- text-generation
|
|
|
- language-model
|
|
|
- open-source
|
|
|
- gpt
|
|
|
- transformer
|
|
|
- causal-lm
|
|
|
datasets:
|
|
|
- squad
|
|
|
metrics:
|
|
|
- perplexity
|
|
|
- loss
|
|
|
library_name: transformers
|
|
|
pipeline_tag: text-generation
|
|
|
model-index:
|
|
|
- name: OpenLLM Small Extended 7K
|
|
|
results:
|
|
|
- task:
|
|
|
type: text-generation
|
|
|
dataset:
|
|
|
type: squad
|
|
|
name: Wikipedia passages from SQuAD
|
|
|
metrics:
|
|
|
- type: loss
|
|
|
value: 2.1
|
|
|
- type: perplexity
|
|
|
value: 8.2
|
|
|
---
|
|
|
|
|
|
# OpenLLM Small Extended 7K Model
|
|
|
|
|
|
<!-- Copyright (C) 2024 Louis Chua Bean Chong -->
|
|
|
<!-- This file is part of OpenLLM - dual-licensed under GPLv3 and Commercial License -->
|
|
|
|
|
|
## π Model Overview
|
|
|
|
|
|
This is the **OpenLLM Small Extended 7K** model, a 35.8M parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. This model represents the latest iteration of our small model architecture with extended training.
|
|
|
|
|
|
### **π Model Specifications**
|
|
|
|
|
|
- **Architecture**: GPT-style Transformer
|
|
|
- **Parameters**: 35,823,616 (35.8M)
|
|
|
- **Layers**: 6 transformer layers
|
|
|
- **Heads**: 8 attention heads
|
|
|
- **Embedding Dimension**: 512
|
|
|
- **Vocabulary Size**: 32,000 tokens
|
|
|
- **Context Length**: 1,024 tokens
|
|
|
- **Training Steps**: 7,000
|
|
|
- **Model Size**: Small
|
|
|
|
|
|
### **π― Training Details**
|
|
|
|
|
|
- **Dataset**: Wikipedia passages from SQuAD dataset (~41k passages)
|
|
|
- **Tokenization**: SentencePiece with 32k vocabulary
|
|
|
- **Training Objective**: Next token prediction (causal language modeling)
|
|
|
- **Optimizer**: AdamW with learning rate scheduling
|
|
|
- **Hardware**: Trained on consumer GPU with gradient accumulation
|
|
|
|
|
|
### **π Model Files**
|
|
|
|
|
|
```
|
|
|
huggingface/
|
|
|
βββ config.json # Model configuration
|
|
|
βββ generation_config.json # Generation parameters
|
|
|
βββ pytorch_model.bin # Model weights (161MB)
|
|
|
βββ tokenizer_config.json # Tokenizer configuration
|
|
|
βββ tokenizer.model # SentencePiece tokenizer
|
|
|
βββ load_hf_model.py # Loading script
|
|
|
```
|
|
|
|
|
|
## π Usage
|
|
|
|
|
|
### **Loading with Hugging Face Transformers**
|
|
|
|
|
|
```python
|
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
|
import torch
|
|
|
|
|
|
# Load model and tokenizer
|
|
|
model_name = "path/to/huggingface"
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
|
|
model = AutoModelForCausalLM.from_pretrained(model_name)
|
|
|
|
|
|
# Generate text
|
|
|
prompt = "The history of artificial intelligence"
|
|
|
inputs = tokenizer(prompt, return_tensors="pt")
|
|
|
|
|
|
with torch.no_grad():
|
|
|
outputs = model.generate(
|
|
|
inputs.input_ids,
|
|
|
max_new_tokens=100,
|
|
|
temperature=0.7,
|
|
|
do_sample=True,
|
|
|
pad_token_id=tokenizer.pad_token_id
|
|
|
)
|
|
|
|
|
|
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
|
|
print(generated_text)
|
|
|
```
|
|
|
|
|
|
### **Using the Custom Loader**
|
|
|
|
|
|
```python
|
|
|
from load_hf_model import load_openllm_model
|
|
|
|
|
|
# Load the model using our custom loader
|
|
|
model, tokenizer = load_openllm_model("path/to/huggingface")
|
|
|
|
|
|
# Generate text
|
|
|
prompt = "Explain quantum computing in simple terms"
|
|
|
inputs = tokenizer(prompt, return_tensors="pt")
|
|
|
|
|
|
outputs = model.generate(
|
|
|
inputs.input_ids,
|
|
|
max_new_tokens=150,
|
|
|
temperature=0.8,
|
|
|
top_p=0.9
|
|
|
)
|
|
|
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
|
|
```
|
|
|
|
|
|
### **Inference Server**
|
|
|
|
|
|
```bash
|
|
|
# Start the FastAPI inference server
|
|
|
python core/src/inference_server.py \
|
|
|
--model_path exports/huggingface-7k/huggingface \
|
|
|
--port 8000
|
|
|
|
|
|
# Make API calls
|
|
|
curl -X POST "http://localhost:8000/generate" \
|
|
|
-H "Content-Type: application/json" \
|
|
|
-d '{
|
|
|
"prompt": "The future of renewable energy",
|
|
|
"max_tokens": 100,
|
|
|
"temperature": 0.7
|
|
|
}'
|
|
|
```
|
|
|
|
|
|
## π Performance
|
|
|
|
|
|
### **Training Metrics**
|
|
|
|
|
|
- **Final Loss**: ~2.1 (cross-entropy)
|
|
|
- **Training Time**: ~7 hours on consumer GPU
|
|
|
- **Memory Usage**: ~2GB VRAM during training
|
|
|
- **Inference Speed**: ~50 tokens/second on CPU, ~200 tokens/second on GPU
|
|
|
|
|
|
### **Model Capabilities**
|
|
|
|
|
|
- **Text Generation**: Coherent paragraph generation
|
|
|
- **Question Answering**: Basic factual responses
|
|
|
- **Summarization**: Short text summarization
|
|
|
- **Language Understanding**: Context-aware responses
|
|
|
|
|
|
## π§ Configuration
|
|
|
|
|
|
### **Generation Parameters**
|
|
|
|
|
|
```json
|
|
|
{
|
|
|
"max_length": 512,
|
|
|
"max_new_tokens": 256,
|
|
|
"temperature": 0.7,
|
|
|
"top_k": 40,
|
|
|
"top_p": 0.9,
|
|
|
"do_sample": true,
|
|
|
"pad_token_id": 0,
|
|
|
"eos_token_id": 1,
|
|
|
"bos_token_id": 2
|
|
|
}
|
|
|
```
|
|
|
|
|
|
### **Model Architecture**
|
|
|
|
|
|
```json
|
|
|
{
|
|
|
"vocab_size": 32000,
|
|
|
"n_layer": 6,
|
|
|
"n_head": 8,
|
|
|
"n_embd": 512,
|
|
|
"block_size": 1024,
|
|
|
"dropout": 0.1,
|
|
|
"bias": true
|
|
|
}
|
|
|
```
|
|
|
|
|
|
## π§ͺ Testing
|
|
|
|
|
|
### **Quick Test**
|
|
|
|
|
|
```python
|
|
|
# Test the model with a simple prompt
|
|
|
test_prompt = "Hello, how are you today?"
|
|
|
inputs = tokenizer(test_prompt, return_tensors="pt")
|
|
|
|
|
|
with torch.no_grad():
|
|
|
outputs = model.generate(
|
|
|
inputs.input_ids,
|
|
|
max_new_tokens=20,
|
|
|
temperature=0.7
|
|
|
)
|
|
|
|
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
|
|
|
print(f"Input: {test_prompt}")
|
|
|
print(f"Output: {response}")
|
|
|
```
|
|
|
|
|
|
## π Limitations
|
|
|
|
|
|
- **Context Length**: Limited to 1,024 tokens
|
|
|
- **Training Data**: Only Wikipedia passages (limited domain)
|
|
|
- **Model Size**: Small model with limited reasoning capabilities
|
|
|
- **Bias**: May inherit biases from training data
|
|
|
- **Factual Accuracy**: Not guaranteed for current events
|
|
|
|
|
|
## π Model Comparison
|
|
|
|
|
|
| Model | Parameters | Training Steps | Context Length | Use Case |
|
|
|
|-------|------------|----------------|----------------|----------|
|
|
|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
|
|
|
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
|
|
|
| **Small 7K** | **35.8M** | **7,000** | **1,024** | **Extended training** |
|
|
|
|
|
|
## π License
|
|
|
|
|
|
This model is dual-licensed:
|
|
|
- **Open Source**: GNU General Public License v3.0
|
|
|
- **Commercial**: Commercial License (contact for details)
|
|
|
|
|
|
See `LICENSE` and `docs/LICENSES.md` for full license information.
|
|
|
|
|
|
## π€ Contributing
|
|
|
|
|
|
We welcome contributions to improve the model! Please see:
|
|
|
- `docs/CONTRIBUTING.md` for contribution guidelines
|
|
|
- `docs/CODE_OF_CONDUCT.md` for community standards
|
|
|
|
|
|
## π Support
|
|
|
|
|
|
For questions, issues, or commercial licensing:
|
|
|
- **GitHub Issues**: Report bugs and feature requests
|
|
|
- **Documentation**: Check `docs/` directory
|
|
|
- **Commercial License**: Contact for enterprise use
|
|
|
|
|
|
---
|
|
|
|
|
|
**Author**: Louis Chua Bean Chong
|
|
|
**Project**: OpenLLM - Open Source Large Language Model
|
|
|
**Version**: 0.1.0
|
|
|
**Last Updated**: 2024
|
|
|
|