
OpenLLM Small Extended 10k

This is the OpenLLM small model, trained for 10,000 steps on the SQuAD dataset.

Model Details

  • Model Type: GPT-style transformer (decoder-only)
  • Training Steps: 10,000
  • Parameters: 35.8M (see the rough breakdown after this list)
  • Vocabulary Size: 32,000
  • Context Length: 1,024 tokens
  • Architecture: 6 layers, 8 attention heads, 512 embedding dimension
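
The figures above are mutually consistent. As a rough check, the parameter count can be reconstructed from the architecture, assuming tied input/output embeddings and the standard 4x feed-forward expansion (both assumptions, not read from the checkpoint):

# Rough parameter count for a 6-layer, 8-head, 512-dim GPT-style model.
# Assumes tied input/output embeddings and a 4x MLP expansion; layer norms
# and biases are ignored, so the result is approximate.
vocab_size, n_positions, d_model, n_layers = 32_000, 1_024, 512, 6

token_emb = vocab_size * d_model              # ~16.4M, shared with the output head if tied
pos_emb = n_positions * d_model               # ~0.5M
attn_per_layer = 4 * d_model * d_model        # Q, K, V and output projections
mlp_per_layer = 2 * d_model * (4 * d_model)   # up- and down-projections
total = token_emb + pos_emb + n_layers * (attn_per_layer + mlp_per_layer)

print(f"~{total / 1e6:.1f}M parameters")      # ~35.8M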

Training Information

  • Dataset: SQuAD (Stanford Question Answering Dataset)
  • Training Data: ~41k Wikipedia passages
  • Tokenizer: SentencePiece BPE with 32k vocabulary (a training sketch follows this list)
  • Optimizer: AdamW
  • Learning Rate: 3e-4
  • Batch Size: 4 (with gradient accumulation)
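
A 32k BPE vocabulary of the kind listed above can be trained with the sentencepiece library. This is only a sketch: the corpus file and model prefix below are placeholders, not the files actually used for this model.

import sentencepiece as spm

# Train a 32k-token BPE model on the raw passages (file names are placeholders).
spm.SentencePieceTrainer.train(
    input="squad_passages.txt",   # one passage per line
    model_prefix="openllm_bpe",
    vocab_size=32_000,
    model_type="bpe",
)

# Load the trained model and tokenize a sample sentence.
sp = spm.SentencePieceProcessor(model_file="openllm_bpe.model")
print(sp.encode("The future of artificial intelligence", out_type=str))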

Performance

  • Final Loss: ~5.22
  • Inference Speed: ~8.3 tokens/second (CPU; see the timing sketch after this list)
  • Memory Usage: ~143MB for inference
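
The throughput number above can be approximated with a simple wall-clock measurement. Results depend heavily on hardware, prompt length, and generation settings, so treat this as a sketch rather than a benchmark harness.

import time

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lemms/openllm-small-extended-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The future of artificial intelligence", return_tensors="pt")

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")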

Usage

Using the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # passes input_ids and attention_mask together
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Using the Custom Loader

import torch

from load_hf_model import load_model_and_tokenizer

# Load model using custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        temperature=0.7,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Architecture

This model follows the standard GPT architecture (a configuration sketch appears after the list):

  • Token Embeddings: Maps token IDs to dense vectors
  • Positional Embeddings: Adds position information
  • Transformer Blocks: 6 layers with multi-head attention and feed-forward networks
  • Layer Normalization: Pre-norm placement for training stability
  • Output Head: Linear projection to vocabulary for next-token prediction
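
For reference, a GPT-2-style configuration in transformers with the same dimensions would look like the sketch below. This is illustrative only; the released checkpoint may use its own architecture class rather than GPT2LMHeadModel.

from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2-style configuration matching the sizes listed above (illustrative;
# dropout values follow the Training Details section).
config = GPT2Config(
    vocab_size=32_000,
    n_positions=1_024,  # context length
    n_embd=512,         # embedding dimension
    n_layer=6,
    n_head=8,
    embd_pdrop=0.1,
    attn_pdrop=0.1,
    resid_pdrop=0.1,
)
model = GPT2LMHeadModel(config)
print(sum(p.numel() for p in model.parameters()))  # ~35.8M parameters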

Training Details

The model was trained using the following setup (a training-step sketch follows the list):

  • Framework: PyTorch
  • Hardware: CPU training with gradient accumulation
  • Regularization: Dropout (0.1), weight decay
  • Optimization: AdamW with cosine learning rate scheduling
  • Gradient Clipping: 1.0
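
A minimal training-step sketch consistent with these settings is shown below. The accumulation factor, weight-decay value, train_loader, and total_steps are illustrative assumptions; only the AdamW optimizer, the 3e-4 learning rate, the cosine schedule, and gradient clipping at 1.0 come from the list above.

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# `model` is the causal LM and `train_loader` yields dicts of input_ids; both
# are assumed to exist. accum_steps and weight_decay are illustrative values.
accum_steps, total_steps = 8, 10_000
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

model.train()
step = 0
for i, batch in enumerate(train_loader):
    # Causal-LM loss: labels are the inputs, shifted internally by the model.
    loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
    (loss / accum_steps).backward()  # accumulate gradients over several micro-batches

    if (i + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        step += 1
        if step >= total_steps:
            break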

Limitations

  • This is a small model (35.8M parameters) with limited capacity
  • Training was done on CPU, which limited the training steps
  • Output quality is basic; the model is intended for educational and research purposes
  • Not suitable for production use without further training

License

This model is dual-licensed:

  • Open Source: GPLv3 License
  • Commercial: Commercial License available

Citation

If you use this model in your research, please cite:

@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model Framework},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}

Model Card

  • Developed by: Louis Chua Bean Chong
  • Model type: Language Model
  • Language(s): English
  • License: GPLv3 / Commercial
  • Finetuned from model: None (trained from scratch)
  • Training data: SQuAD dataset
  • Training procedure: Supervised learning
  • Evaluation results: Basic text generation capability

Related Models