---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- pytorch
- causal-lm
- openllm
- gpt
- language-model
datasets:
- squad
metrics:
- perplexity
- loss
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 10k
results:
- task:
type: text-generation
dataset:
type: squad
        name: SQuAD
metrics:
- type: loss
value: 5.22
- type: perplexity
value: 184.5
---
# OpenLLM Small Extended 10k
This is the OpenLLM small model, trained from scratch for 10,000 steps on the SQuAD dataset.
## Model Details
- **Model Type**: GPT-style transformer (decoder-only)
- **Training Steps**: 10,000
- **Parameters**: 35.8M
- **Vocabulary Size**: 32,000
- **Context Length**: 1,024 tokens
- **Architecture**: 6 layers, 8 attention heads, 512 embedding dimension
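The usage examples further down load these hyperparameters from `config.json` under a `model_config` key. As a rough, hypothetical sketch (the actual field names are defined by the OpenLLM framework and are not confirmed here), that configuration might look like:
```python
# Hypothetical model_config mirroring the numbers above.
# Field names are assumptions; consult the OpenLLM framework for the real schema.
model_config = {
    "vocab_size": 32000,   # SentencePiece BPE vocabulary
    "n_layer": 6,          # transformer blocks
    "n_head": 8,           # attention heads per block
    "n_embd": 512,         # embedding / hidden dimension
    "block_size": 1024,    # maximum context length in tokens
    "dropout": 0.1,        # dropout used during training
}
```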
## Training Information
- **Dataset**: SQuAD (Stanford Question Answering Dataset)
- **Training Data**: ~41k Wikipedia passages
- **Tokenizer**: SentencePiece BPE with 32k vocabulary
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Batch Size**: 4 (with gradient accumulation)
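A tokenizer with these properties can be trained with the standard SentencePiece API. The following is a minimal sketch, assuming the training passages have been dumped to a plain-text file; `squad_passages.txt` is a hypothetical filename, not the project's actual preprocessing output:
```python
import sentencepiece as spm

# Train a 32k-vocabulary BPE tokenizer on one passage per line.
# "squad_passages.txt" is a placeholder name for the extracted SQuAD text.
spm.SentencePieceTrainer.train(
    input="squad_passages.txt",
    model_prefix="tokenizer",   # writes tokenizer.model and tokenizer.vocab
    vocab_size=32000,
    model_type="bpe",
    character_coverage=1.0,     # English-only corpus
)
```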
## Performance
- **Final Loss**: ~5.22
- **Inference Speed**: ~8.3 tokens/second (CPU)
- **Memory Usage**: ~143MB for inference
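The two reported metrics are consistent with the usual relationship between mean cross-entropy loss and perplexity, perplexity ≈ exp(loss):
```python
import math

# Perplexity is the exponential of the mean cross-entropy loss.
print(math.exp(5.22))  # ≈ 184.9, in line with the reported 184.5 given the rounded loss
```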
## Usage
### Using the Model
This model uses a custom configuration format and requires the OpenLLM framework to load properly.
```python
# Load using the OpenLLM framework
import json

import sentencepiece as spm
import torch

from core.src.model import GPTModel

# Load configuration
with open("config.json", "r") as f:
    config = json.load(f)

# Create model instance and load the trained weights
model = GPTModel(config["model_config"])
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Load the SentencePiece tokenizer
tokenizer = spm.SentencePieceProcessor()
tokenizer.load("tokenizer.model")

# Generate text
prompt = "The future of artificial intelligence"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
    )

generated_text = tokenizer.decode(outputs[0].tolist())
print(generated_text)
```
### Using the Custom Loader
```python
import torch

from load_hf_model import load_model_and_tokenizer

# Load model and tokenizer using the custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
    )

print(tokenizer.decode(outputs[0].tolist()))
```
## Model Architecture
This model follows the standard GPT architecture:
- **Token Embeddings**: Maps token IDs to dense vectors
- **Positional Embeddings**: Adds position information
- **Transformer Blocks**: 6 layers with multi-head attention and feed-forward networks
- **Layer Normalization**: Pre-norm placement for training stability
- **Output Head**: Linear projection to vocabulary for next-token prediction
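As an illustration only (this is a generic pre-norm decoder-only sketch in PyTorch, not the OpenLLM source), the layout described above corresponds roughly to:
```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: LayerNorm -> causal self-attention, LayerNorm -> MLP."""
    def __init__(self, n_embd=512, n_head=8, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd), nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyGPT(nn.Module):
    """Token + positional embeddings, 6 blocks, final LayerNorm, vocabulary projection."""
    def __init__(self, vocab_size=32000, block_size=1024, n_layer=6, n_embd=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.ModuleList(Block(n_embd) for _ in range(n_layer))
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # next-token logits
```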
## Training Details
The model was trained using:
- **Framework**: PyTorch
- **Hardware**: CPU training with gradient accumulation
- **Regularization**: Dropout (0.1), weight decay
- **Optimization**: AdamW with cosine learning rate scheduling
- **Gradient Clipping**: 1.0
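A minimal sketch of that optimization setup (the weight-decay value, accumulation factor, and variable names are assumptions for illustration; `model` and `train_loader` are presumed to exist):
```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)  # decay value assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
accum_steps = 8  # gradient accumulation factor (assumed)

for step, (inputs, targets) in enumerate(train_loader):
    logits = model(inputs)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    (loss / accum_steps).backward()  # accumulate gradients across micro-batches
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```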
## Limitations
- This is a small model (35.8M parameters) with limited capacity
- Training was done on CPU, which limited the training steps
- Model quality is basic and suitable for educational/research purposes
- Not suitable for production use without further training
## License
This model is dual-licensed:
- **Open Source**: GPLv3 License
- **Commercial**: Commercial License available
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{openllm2024,
title={OpenLLM: Open Source Large Language Model Framework},
author={Louis Chua Bean Chong},
year={2024},
url={https://github.com/louischua/openllm}
}
```
## Model Card
- **Developed by**: Louis Chua Bean Chong
- **Model type**: Language Model
- **Language(s)**: English
- **License**: GPLv3 / Commercial
- **Finetuned from model**: None (trained from scratch)
- **Training data**: SQuAD dataset
- **Training procedure**: Supervised learning
- **Evaluation results**: Basic text generation capability
## Related Models
- [lemms/openllm-small-extended-4k](https://huggingface.co/lemms/openllm-small-extended-4k)
- [lemms/openllm-small-extended-6k](https://huggingface.co/lemms/openllm-small-extended-6k)
- [lemms/openllm-small-extended-7k](https://huggingface.co/lemms/openllm-small-extended-7k)
- [lemms/openllm-small-extended-8k](https://huggingface.co/lemms/openllm-small-extended-8k)
- [lemms/openllm-small-extended-9k](https://huggingface.co/lemms/openllm-small-extended-9k)