---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- open-source
- gpt
- transformer
- causal-lm
datasets:
- squad
metrics:
- perplexity
- loss
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 7K
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: Wikipedia passages from SQuAD
    metrics:
    - type: loss
      value: 2.1
    - type: perplexity
      value: 8.2
---

# OpenLLM Small Extended 7K Model

## 🌟 Model Overview

This is the **OpenLLM Small Extended 7K** model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with extended training.

### **📊 Model Specifications**

- **Architecture**: GPT-style Transformer
- **Parameters**: 35,823,616 (35.8M)
- **Layers**: 6 transformer layers
- **Heads**: 8 attention heads
- **Embedding Dimension**: 512
- **Vocabulary Size**: 32,000 tokens
- **Context Length**: 1,024 tokens
- **Training Steps**: 7,000
- **Model Size**: Small

### **🎯 Training Details**

- **Dataset**: Wikipedia passages from the SQuAD dataset (~41k passages)
- **Tokenization**: SentencePiece with a 32k vocabulary
- **Training Objective**: Next-token prediction (causal language modeling)
- **Optimizer**: AdamW with learning-rate scheduling
- **Hardware**: Trained on a consumer GPU with gradient accumulation

### **📁 Model Files**

```
huggingface/
├── config.json              # Model configuration
├── generation_config.json   # Generation parameters
├── pytorch_model.bin        # Model weights (161MB)
├── tokenizer_config.json    # Tokenizer configuration
├── tokenizer.model          # SentencePiece tokenizer
└── load_hf_model.py         # Loading script
```

## 🚀 Usage

### **Loading with Hugging Face Transformers**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### **Using the Custom Loader**

```python
from load_hf_model import load_openllm_model

# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    temperature=0.8,
    top_p=0.9,
    do_sample=True  # sampling must be enabled for temperature/top_p to take effect
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### **Inference Server**

```bash
# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7
    }'
```

## 📈 Performance

### **Training Metrics**

- **Final Loss**: ~2.1 (cross-entropy)
- **Training Time**: ~7 hours on a consumer GPU
- **Memory Usage**: ~2GB VRAM during training
- **Inference Speed**: ~50 tokens/second on CPU, ~200 tokens/second on GPU
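The reported perplexity follows directly from the loss: perplexity is the exponential of the mean per-token cross-entropy, and exp(2.1) ≈ 8.17, consistent with the ~8.2 in the metadata. Below is a minimal sketch of how you might verify this on held-out text, assuming `model` and `tokenizer` are loaded as shown above; the `evaluate_perplexity` helper and its inputs are illustrative, not part of this repository.

```python
import math
import torch

# Sanity check: perplexity = exp(cross-entropy loss).
print(math.exp(2.1))  # ≈ 8.166, matching the reported ~8.2

def evaluate_perplexity(model, tokenizer, texts):
    """Corpus perplexity = exp(total NLL / total predicted tokens)."""
    total_nll, total_tokens = 0.0, 0
    model.eval()
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=1024)
            # With labels=input_ids, Hugging Face causal LMs return the
            # mean next-token cross-entropy over the shifted sequence.
            out = model(input_ids=enc.input_ids, labels=enc.input_ids)
            n_pred = enc.input_ids.size(1) - 1  # tokens actually predicted
            total_nll += out.loss.item() * n_pred
            total_tokens += n_pred
    return math.exp(total_nll / total_tokens)
```

Weighting each passage's loss by its token count before exponentiating keeps the result comparable across passages of different lengths.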
### **Model Capabilities**

- **Text Generation**: Coherent paragraph generation
- **Question Answering**: Basic factual responses
- **Summarization**: Short text summarization
- **Language Understanding**: Context-aware responses

## 🔧 Configuration

### **Generation Parameters**

```json
{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
```

### **Model Architecture**

```json
{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
```

## 🧪 Testing

### **Quick Test**

```python
# Test the model with a simple prompt
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        temperature=0.7,
        do_sample=True
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")
```

## 📋 Limitations

- **Context Length**: Limited to 1,024 tokens
- **Training Data**: Only Wikipedia passages (limited domain coverage)
- **Model Size**: Small model with limited reasoning capabilities
- **Bias**: May inherit biases from the training data
- **Factual Accuracy**: Not guaranteed, particularly for current events

## 🔄 Model Comparison

| Model | Parameters | Training Steps | Context Length | Use Case |
|-------|------------|----------------|----------------|----------|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
| **Small 7K** | **35.8M** | **7,000** | **1,024** | **Extended training** |

## 📄 License

This model is dual-licensed:

- **Open Source**: GNU General Public License v3.0
- **Commercial**: Commercial License (contact for details)

See `LICENSE` and `docs/LICENSES.md` for full license information.

## 🤝 Contributing

We welcome contributions to improve the model! Please see:

- `docs/CONTRIBUTING.md` for contribution guidelines
- `docs/CODE_OF_CONDUCT.md` for community standards

## 📞 Support

For questions, issues, or commercial licensing:

- **GitHub Issues**: Report bugs and feature requests
- **Documentation**: Check the `docs/` directory
- **Commercial License**: Contact for enterprise use

---

**Author**: Louis Chua Bean Chong
**Project**: OpenLLM - Open Source Large Language Model
**Version**: 0.1.0
**Last Updated**: 2024