# OpenLLM Small Extended 10k

This is the OpenLLM small model trained for 10,000 steps on the SQuAD dataset.

## Model Details

- **Model Type**: GPT-style transformer (decoder-only)
- **Training Steps**: 10,000
- **Parameters**: 35.8M
- **Vocabulary Size**: 32,000
- **Context Length**: 1,024 tokens
- **Architecture**: 6 layers, 8 attention heads, 512 embedding dimension

## Training Information

- **Dataset**: SQuAD (Stanford Question Answering Dataset)
- **Training Data**: ~41k Wikipedia passages
- **Tokenizer**: SentencePiece BPE with a 32k vocabulary
- **Optimizer**: AdamW
- **Learning Rate**: 3e-4
- **Batch Size**: 4 (with gradient accumulation)

## Performance

- **Final Loss**: ~5.22
- **Inference Speed**: ~8.3 tokens/second (CPU)
- **Memory Usage**: ~143 MB for inference

## Usage

### Using the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-10k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### Using the Custom Loader

```python
import torch

from load_hf_model import load_model_and_tokenizer

# Load model and tokenizer using the custom loader
model, tokenizer = load_model_and_tokenizer("lemms/openllm-small-extended-10k")

# Generate text
prompt = "The history of machine learning"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture

This model follows the standard GPT architecture:

- **Token Embeddings**: Map token IDs to dense vectors
- **Positional Embeddings**: Add position information to each token
- **Transformer Blocks**: 6 layers with multi-head attention and feed-forward networks
- **Layer Normalization**: Pre-norm placement for training stability
- **Output Head**: Linear projection to the vocabulary for next-token prediction

## Training Details

The model was trained using:

- **Framework**: PyTorch
- **Hardware**: CPU training with gradient accumulation
- **Regularization**: Dropout (0.1), weight decay
- **Optimization**: AdamW with cosine learning rate scheduling
- **Gradient Clipping**: 1.0
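For illustration, here is a minimal sketch of how the hyperparameters above could be wired together in PyTorch. This is not the project's training code (that lives in the OpenLLM repository): it approximates the architecture with a same-shaped GPT-2-style config, and the weight-decay value, gradient-accumulation step count, and dummy data loader are assumptions made for the example.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2-style stand-in with the dimensions listed under Model Details
# (6 layers, 8 heads, 512-dim embeddings, 32k vocab, 1,024-token context);
# this config comes out to roughly 36M parameters.
config = GPT2Config(
    vocab_size=32_000,
    n_positions=1_024,
    n_embd=512,
    n_layer=6,
    n_head=8,
    embd_pdrop=0.1,   # dropout 0.1, as listed under Regularization
    attn_pdrop=0.1,
    resid_pdrop=0.1,
)
model = GPT2LMHeadModel(config)

# AdamW at 3e-4 with cosine decay over the 10,000 training steps.
# The weight-decay value is an assumption, not taken from the model card.
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=10_000)

# Dummy batches of random token IDs stand in for the tokenized SQuAD passages.
data_loader = [
    {"input_ids": torch.randint(0, config.vocab_size, (4, 256))}  # batch size 4
    for _ in range(16)
]

accumulation_steps = 8  # illustrative; the exact value is not documented

model.train()
for step, batch in enumerate(data_loader):
    # Causal LM loss: labels are the inputs, shifted internally by the model
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    (outputs.loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```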
## Limitations

- This is a small model (35.8M parameters) with limited capacity
- Training was done on CPU, which limited the number of training steps
- Output quality is basic; the model is intended for educational and research purposes
- Not suitable for production use without further training

## License

This model is dual-licensed:

- **Open Source**: GPLv3 License
- **Commercial**: Commercial License available

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model Framework},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```

## Model Card

- **Developed by**: Louis Chua Bean Chong
- **Model type**: Language model
- **Language(s)**: English
- **License**: GPLv3 / Commercial
- **Finetuned from model**: None (trained from scratch)
- **Training data**: SQuAD dataset
- **Training procedure**: Supervised next-token prediction
- **Evaluation results**: Basic text generation capability

## Related Models

- [lemms/openllm-small-extended-4k](https://huggingface.co/lemms/openllm-small-extended-4k)
- [lemms/openllm-small-extended-6k](https://huggingface.co/lemms/openllm-small-extended-6k)
- [lemms/openllm-small-extended-7k](https://huggingface.co/lemms/openllm-small-extended-7k)
- [lemms/openllm-small-extended-8k](https://huggingface.co/lemms/openllm-small-extended-8k)
- [lemms/openllm-small-extended-9k](https://huggingface.co/lemms/openllm-small-extended-9k)