---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- pytorch
- causal-lm
- openllm
- gpt
- language-model
datasets:
- squad
metrics:
- perplexity
- loss
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 10k
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: SQUAD
    metrics:
    - type: loss
      value: 5.22
    - type: perplexity
      value: 184.5
---

# OpenLLM Small Extended 10k

This is the OpenLLM small model, trained from scratch for 10,000 steps on the SQuAD dataset.

## Model Details

- **Model Type**: GPT-style transformer (decoder-only)
- **Training Steps**: 10,000
- **Parameters**: 35.8M (sanity-checked in the sketch after this list)
- **Vocabulary Size**: 32,000
- **Context Length**: 1,024 tokens
- **Architecture**: 6 layers, 8 attention heads, 512 embedding dimension

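The reported parameter count can be reproduced from these hyperparameters with a quick back-of-the-envelope sketch. It assumes a 4× feed-forward expansion, learned positional embeddings, biases on the linear layers, and an output head tied to the token embeddings; none of these details are stated above, so treat them as assumptions rather than a description of the actual implementation.

```python
# Back-of-the-envelope parameter count for the hyperparameters listed above.
# Assumptions (not confirmed by the checkpoint): 4x FFN expansion, learned
# positional embeddings, biases on linear layers, weight-tied output head.
vocab_size, n_ctx, d_model, n_layers = 32_000, 1_024, 512, 6
d_ff = 4 * d_model

tok_emb = vocab_size * d_model                    # token embeddings
pos_emb = n_ctx * d_model                         # positional embeddings
attn = 4 * (d_model * d_model + d_model)          # q, k, v, output projections
ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model
norms = 2 * 2 * d_model                           # two LayerNorms per block
per_layer = attn + ffn + norms
final_norm = 2 * d_model
lm_head = 0                                       # tied to token embeddings

total = tok_emb + pos_emb + n_layers * per_layer + final_norm + lm_head
print(f"{total / 1e6:.1f}M parameters")           # ~35.8M, matching the list above
```
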
## Training Information

- **Dataset**: SQuAD (Stanford Question Answering Dataset)
- **Training Data**: ~41k Wikipedia passages
- **Tokenizer**: SentencePiece BPE with a 32k vocabulary (a training sketch follows this list)
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Batch Size**: 4 (with gradient accumulation)

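For reference, a 32k-vocabulary SentencePiece BPE tokenizer like the bundled `tokenizer.model` can be trained with a call along these lines; the corpus file name and the extra flags are illustrative assumptions, not the exact settings used for this checkpoint.

```python
import sentencepiece as spm

# Hypothetical reconstruction of how a 32k BPE tokenizer such as the bundled
# tokenizer.model could be trained; the corpus path and flag values are
# illustrative, not the exact settings used for this checkpoint.
spm.SentencePieceTrainer.train(
    input="squad_passages.txt",   # one passage per line (assumed corpus layout)
    model_prefix="tokenizer",     # writes tokenizer.model / tokenizer.vocab
    vocab_size=32_000,
    model_type="bpe",
    character_coverage=1.0,       # reasonable default for English text
)
```
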
## Performance

- **Final Loss**: ~5.22 (see the perplexity check below)
- **Inference Speed**: ~8.3 tokens/second (CPU)
- **Memory Usage**: ~143MB for inference

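The loss and perplexity figures are consistent with each other: for a causal language model, perplexity is the exponential of the mean cross-entropy loss in nats. The snippet below is a sanity check, not the official evaluation script.

```python
import math

# Perplexity is exp(mean cross-entropy loss in nats); the small gap to the
# reported 184.5 is expected from rounding the loss to two decimals.
final_loss = 5.22
print(f"perplexity ≈ {math.exp(final_loss):.1f}")  # ≈ 184.9
```
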
## Usage
### Using the Model

This model uses a custom configuration format and requires the OpenLLM framework to load properly.

```python
# Load using the OpenLLM framework
import json

import sentencepiece as spm
import torch

from core.src.model import GPTModel

# Load configuration
with open("config.json", "r") as f:
    config = json.load(f)

# Create model instance
model = GPTModel(config["model_config"])

# Load trained weights
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()  # inference mode (disables dropout)

# Load tokenizer
tokenizer = spm.SentencePieceProcessor()
tokenizer.load("tokenizer.model")

# Generate text
prompt = "The future of artificial intelligence"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
    )

generated_text = tokenizer.decode(outputs[0].tolist())
print(generated_text)
```

### Using the Custom Loader

```python
import torch

from load_hf_model import load_model_and_tokenizer

# Load model and tokenizer using the custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
tokens = tokenizer.encode(prompt)
inputs = torch.tensor([tokens], dtype=torch.long)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_length=100,
        temperature=0.7,
    )

print(tokenizer.decode(outputs[0].tolist()))
```
## Model Architecture

This model follows the standard GPT architecture (a minimal skeleton is sketched after the list):

- **Token Embeddings**: Maps token IDs to dense vectors
- **Positional Embeddings**: Adds position information
- **Transformer Blocks**: 6 layers with multi-head attention and feed-forward networks
- **Layer Normalization**: Pre-norm placement for training stability
- **Output Head**: Linear projection to vocabulary for next-token prediction

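The list above maps onto a fairly conventional pre-norm decoder. The sketch below is a minimal illustration of that layout, assuming a 4× feed-forward expansion and a weight-tied output head; class and attribute names are hypothetical and do not mirror the actual `GPTModel` code.

```python
import torch
import torch.nn as nn

# Minimal sketch of the layer layout described above; module names and the
# internals of Block are illustrative, not the actual GPTModel implementation.
class Block(nn.Module):
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # pre-norm before attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)   # pre-norm before the MLP
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = x.size(1)
        # Causal mask: -inf above the diagonal blocks attention to future tokens
        mask = torch.full((t, t), float("-inf"), device=x.device).triu(1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=mask, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self, vocab=32_000, n_ctx=1_024, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)        # token embeddings
        self.pos_emb = nn.Embedding(n_ctx, d_model)        # learned positional embeddings
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)                  # final layer norm
        self.head = nn.Linear(d_model, vocab, bias=False)  # next-token logits
        self.head.weight = self.tok_emb.weight             # weight tying (assumed)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))
```
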
## Training Details

The model was trained using:

- **Framework**: PyTorch
- **Hardware**: CPU training with gradient accumulation
- **Regularization**: Dropout (0.1), weight decay
- **Optimization**: AdamW with cosine learning rate scheduling (see the loop sketch below)
- **Gradient Clipping**: 1.0

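A hypothetical reconstruction of that recipe is sketched below, reusing the `TinyGPT` illustration from the architecture section. The weight-decay value, the accumulation factor, and the random stand-in batches are placeholders; only the learning rate, step count, clipping threshold, and cosine schedule come from the list above.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder reconstruction of the optimization setup; not the actual
# training script for this checkpoint.
model = TinyGPT()
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)  # weight decay assumed
scheduler = CosineAnnealingLR(optimizer, T_max=10_000)             # 10k optimizer steps
accum_steps = 8                                                    # effective batch = 4 * accum_steps

model.train()
for step in range(10_000 * accum_steps):
    batch = torch.randint(0, 32_000, (4, 256))             # stand-in for real SQuAD batches
    logits = model(batch)                                  # (B, T, vocab) next-token logits
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),       # predict token t+1 from token t
        batch[:, 1:].reshape(-1),
    ) / accum_steps                                         # scale for gradient accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
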
## Limitations

- This is a small model (35.8M parameters) with limited capacity
- Training was done on CPU, which limited the number of training steps
- Output quality is basic; the model is suited to educational and research use
- Not suitable for production use without further training

## License

This model is dual-licensed:

- **Open Source**: GPLv3 License
- **Commercial**: Commercial License available

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model Framework},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```
## Model Card

- **Developed by**: Louis Chua Bean Chong
- **Model type**: Language Model
- **Language(s)**: English
- **License**: GPLv3 / Commercial
- **Finetuned from model**: Trained from scratch
- **Training data**: SQuAD dataset
- **Training procedure**: Causal language modeling (next-token prediction)
- **Evaluation results**: Basic text generation capability

## Related Models

- [lemms/openllm-small-extended-4k](https://huggingface.co/lemms/openllm-small-extended-4k)
- [lemms/openllm-small-extended-6k](https://huggingface.co/lemms/openllm-small-extended-6k)
- [lemms/openllm-small-extended-7k](https://huggingface.co/lemms/openllm-small-extended-7k)
- [lemms/openllm-small-extended-8k](https://huggingface.co/lemms/openllm-small-extended-8k)
- [lemms/openllm-small-extended-9k](https://huggingface.co/lemms/openllm-small-extended-9k)