OpenLLM Small Extended 10k
This is the OpenLLM small model, trained from scratch for 10,000 steps on the SQuAD dataset.
Model Details
- Model Type: GPT-style transformer (decoder-only)
- Training Steps: 10,000
- Parameters: 35.8M
- Vocabulary Size: 32,000
- Context Length: 1,024 tokens
- Architecture: 6 layers, 8 attention heads, 512 embedding dimension
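For orientation, these hyperparameters map onto a GPT-2 style configuration roughly as follows. This is a sketch for illustration only; the repository's actual configuration class may differ.

```python
from transformers import GPT2Config

# Illustrative only: the hyperparameters above expressed as a GPT-2 style
# config. The repo's own configuration class may differ.
config = GPT2Config(
    vocab_size=32000,   # SentencePiece BPE vocabulary
    n_positions=1024,   # context length
    n_embd=512,         # embedding dimension
    n_layer=6,          # transformer blocks
    n_head=8,           # attention heads
)
```

With tied input/output embeddings, these settings work out to roughly 35.8M parameters (16.4M token embeddings + 0.5M positional embeddings + ~3.1M per block × 6 blocks), matching the figure above.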
Training Information
- Dataset: SQuAD (Stanford Question Answering Dataset)
- Training Data: ~41k Wikipedia passages
- Tokenizer: SentencePiece BPE with 32k vocabulary
- Optimizer: AdamW
- Learning Rate: 3e-4
- Batch Size: 4 (with gradient accumulation)
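As a rough sketch of how a tokenizer like this is produced (the corpus path and model prefix below are placeholders, not the repository's actual files):

```python
import sentencepiece as spm

# Hypothetical reproduction of the 32k SentencePiece BPE tokenizer;
# the input corpus path and model_prefix are placeholders.
spm.SentencePieceTrainer.train(
    input="squad_passages.txt",  # one training passage per line (assumed)
    model_prefix="openllm_sp",
    vocab_size=32000,
    model_type="bpe",
)
```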
Performance
- Final Loss: ~5.22
- Inference Speed: ~8.3 tokens/second (CPU)
- Memory Usage: ~143MB for inference
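The exact benchmarking method behind these numbers is not documented; a simple way to measure tokens per second on your own CPU would be something like this sketch:

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Rough CPU throughput measurement; results will vary with hardware.
model_name = "lemms/openllm-small-extended-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The future of artificial intelligence", return_tensors="pt")
start = time.time()
with torch.no_grad():
    outputs = model.generate(inputs["input_ids"], max_new_tokens=50, do_sample=False)
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")
```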
Usage
Using the Model with Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
Using the Custom Loader
This assumes the `load_hf_model.py` helper from the OpenLLM repository is on your Python path:
```python
import torch

from load_hf_model import load_model_and_tokenizer

# Load model and tokenizer using the custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        temperature=0.7,
        do_sample=True,  # sampling must be enabled for temperature to take effect
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Model Architecture
This model follows the standard GPT architecture:
- Token Embeddings: Maps token IDs to dense vectors
- Positional Embeddings: Adds position information
- Transformer Blocks: 6 layers with multi-head attention and feed-forward networks
- Layer Normalization: Pre-norm placement for training stability
- Output Head: Linear projection to vocabulary for next-token prediction
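A minimal sketch of one such pre-norm transformer block in PyTorch, using the dimensions above (512-dim embeddings, 8 heads). Names and details are illustrative, not the repository's actual implementation:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm decoder block: 512-dim embeddings, 8 attention heads."""

    def __init__(self, d_model=512, n_heads=8, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Pre-norm: LayerNorm is applied before each sub-layer,
        # and the residual is added afterwards.
        h = self.ln1(x)
        # Causal mask so each position attends only to earlier tokens
        causal = torch.triu(
            torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1
        )
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x
```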
Training Details
The model was trained using:
- Framework: PyTorch
- Hardware: CPU training with gradient accumulation
- Regularization: Dropout (0.1), weight decay
- Optimization: AdamW with cosine learning rate scheduling
- Gradient Clipping: 1.0
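Putting these settings together, one optimization step might look roughly like the following. The weight-decay value, accumulation factor, and `loader` are assumptions; the repository's actual training loop may differ.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)  # decay value assumed
scheduler = CosineAnnealingLR(optimizer, T_max=10_000)  # cosine decay over 10k steps
ACCUM_STEPS = 8  # assumed gradient-accumulation factor

for step, batch in enumerate(loader):  # `loader` yields batches of token IDs (placeholder)
    outputs = model(batch["input_ids"], labels=batch["input_ids"])
    (outputs.loss / ACCUM_STEPS).backward()  # accumulate scaled gradients
    if (step + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```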
Limitations
- This is a small model (35.8M parameters) with limited capacity
- Training was done on CPU, which limited the number of training steps
- Output quality is basic; the model is intended for educational and research purposes
- Not suitable for production use without further training
License
This model is dual-licensed:
- Open Source: GPLv3 License
- Commercial: Commercial License available
Citation
If you use this model in your research, please cite:
```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model Framework},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```
Model Card
- Developed by: Louis Chua Bean Chong
- Model type: Causal language model (decoder-only transformer)
- Language(s): English
- License: GPLv3 / Commercial
- Finetuned from model: None (trained from scratch)
- Training data: SQuAD dataset
- Training procedure: Causal language modeling (next-token prediction)
- Evaluation results: Basic text generation capability