---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- gpt
- transformer
- open-source
- squad
- wikipedia
datasets:
- squad
metrics:
- perplexity
- text-generation-quality
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 6k
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: SQuAD Wikipedia Passages
    metrics:
    - type: perplexity
      value: 816.04
    - type: training_loss
      value: 5.4302
---

# OpenLLM Small Extended 6k

This is the OpenLLM Small Extended model, trained for 6,000 steps on Wikipedia passages from the SQuAD dataset.

## Model Details

- **Model Type:** GPT-style Transformer
- **Architecture:** Small (35.8M parameters)
- **Training Steps:** 6,000
- **Training Data:** ~41k Wikipedia passages from the SQuAD dataset
- **Tokenizer:** SentencePiece BPE (32k vocabulary)
- **License:** GPL-3.0 (open source) / commercial license available

## Model Performance

- **Final Training Loss:** 5.4302
- **Perplexity:** 816.04
- **Model Parameters:** 35,823,616
- **Context Length:** 512 tokens
- **Training Hardware:** CPU/GPU compatible

## Usage

### Using Transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer from the Hugging Face Hub
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text from a prompt
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.7,
        top_k=40,
        do_sample=True,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### Using the Custom Loader

```python
# Use the load_hf_model.py script provided in the repository
from load_hf_model import load_model_and_tokenizer

model, tokenizer = load_model_and_tokenizer()
# ... rest of usage
```

## Training Details

This model was trained using the OpenLLM training pipeline:

1. **Data Preparation:** SQuAD dataset processing (~41k passages)
2. **Tokenizer Training:** SentencePiece BPE with a 32k vocabulary
3. **Model Training:** GPT-style transformer trained for 6,000 steps
4. **Evaluation:** Perplexity and text-generation quality assessment (a minimal perplexity sketch appears at the end of this card)

## Model Architecture

- **Layers:** 12 transformer layers
- **Attention Heads:** 12
- **Hidden Size:** 768
- **Intermediate Size:** 3072
- **Activation:** GELU
- **Layer Norm:** Pre-norm

## Limitations

- **Training Data:** Limited to Wikipedia passages from SQuAD
- **Context Length:** 512 tokens maximum
- **Model Size:** Small model with only 35.8M parameters
- **Performance:** Basic text-generation capabilities

## License

This model is dual-licensed:

- **Open Source:** GPL-3.0 for research and community use
- **Commercial:** A commercial license is available for enterprise use

For commercial licensing, contact: louischua@gmail.com

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```

## Links

- **Repository:** https://github.com/louischua/openllm
- **Documentation:** https://github.com/louischua/openllm/docs
- **Training Pipeline:** https://github.com/louischua/openllm/docs/training_pipeline.md
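
## Evaluating Perplexity

The perplexity reported above (816.04) was produced by the OpenLLM evaluation pipeline. As a rough sanity check, perplexity on an arbitrary piece of text can be estimated with the standard `transformers` pattern below. This is a minimal sketch: the sample sentence is a placeholder rather than the SQuAD evaluation passages, so its output will not match the reported figure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer (same repository as in the Usage section)
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Placeholder text; replace with the passages you want to evaluate on
text = "The history of artificial intelligence began long before modern computers."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy loss over the sequence
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = torch.exp(outputs.loss)
print(f"Perplexity: {perplexity.item():.2f}")
```

For a result comparable to the reported metric, the same held-out passages and the project's evaluation script should be used instead of this snippet.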