---
title: VelocityLM
emoji: 🚀
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
- gpt2
datasets:
- tiiuae/falcon-refinedweb
tags:
- text-generation
- transformer
- pytorch
- custom-model
- llm
- foundational-model
short_description: FoundationalLM for fast text-generation
---

# 🤖 Custom LLM - Foundational Language Model

A custom-trained foundational language model with approximately **2 billion parameters**, built on a modern transformer architecture and deployed with streaming text generation.

## 🚀 Features

- **Custom Architecture**: Modern transformer with RoPE (Rotary Position Embeddings), RMSNorm, and SwiGLU activation
- **Streaming Generation**: Real-time text generation with token-by-token streaming
- **Flexible Sampling**: Configurable temperature, top-p, top-k, and repetition penalty
- **ZeroGPU Integration**: Optimized for Hugging Face Spaces with GPU acceleration
- **Responsive UI**: Clean, intuitive Gradio interface

## 📊 Model Details

| Specification | Value |
|---------------|-------|
| **Parameters** | ~2 billion |
| **Architecture** | Custom Transformer |
| **Context Length** | 2,048 tokens |
| **Vocab Size** | 50,257 (GPT-2 tokenizer) |
| **Layers** | 24 |
| **Attention Heads** | 32 |
| **Hidden Size** | 2,048 |
| **Intermediate Size** | 8,192 |

## 🏗️ Architecture Components

- **RMSNorm**: Root Mean Square Layer Normalization for better training stability
- **RoPE**: Rotary Position Embeddings for better length extrapolation
- **SwiGLU**: Swish-gated linear unit activation for improved performance
- **Causal Attention**: Standard autoregressive attention mechanism

## 🎯 Training Details

- **Dataset**: Falcon RefinedWeb (curated web text)
- **Training Steps**: 100,000 steps
- **Learning Rate**: 6e-4 with warmup and decay
- **Batch Size**: 32 (4 per device × 8 gradient accumulation steps)
- **Optimization**: AdamW with β1=0.9, β2=0.95
- **Precision**: Mixed precision (FP16)

## 🛠️ Generation Parameters

- **Max Tokens**: Controls the length of the generated text (1-1024)
- **Temperature**: Sampling randomness (0.1-2.0; higher = more creative)
- **Top-p**: Nucleus sampling threshold (0.1-1.0)
- **Top-k**: Top-k sampling limit (0-200; 0 = disabled)
- **Repetition Penalty**: Reduces repetitive text (1.0-2.0)

## 💡 Usage Tips

1. **Creative Writing**: Use a higher temperature (1.0-1.5) and top-p (0.9-0.95)
2. **Factual Content**: Use a lower temperature (0.3-0.7) and top-p (0.8-0.9)
3. **Code Generation**: Use a temperature around 0.2 with top-k filtering
4. **Longer Context**: The model handles up to 2,048 tokens of context

## 🚨 Limitations

- **Knowledge Cutoff**: The training data's knowledge cutoff varies by source
- **Biases**: May reflect biases present in the training data
- **Factuality**: Generated content should be verified for factual accuracy
- **Context Window**: Limited to 2,048 tokens (approximately 1,500 words)

## 🔧 Technical Implementation

The model uses a custom PyTorch implementation with:

- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling
- GPU acceleration via ZeroGPU

Minimal, illustrative sketches of the normalization, feed-forward, and sampling logic appear in the code sketches section below.

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
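## 🧪 Illustrative Code Sketches

The snippets below are minimal, self-contained sketches of the components described above, not the Space's actual `app.py` source. Module and function names (`RMSNorm`, `SwiGLUMLP`, `sample_next_token`) and the default values are assumptions chosen to match the specifications in this README.

A normalization and feed-forward block using the listed hidden/intermediate sizes (2,048 / 8,192):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root Mean Square LayerNorm: rescale by the RMS of the activations,
    with a learned gain but no mean-centering and no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLUMLP(nn.Module):
    """SwiGLU feed-forward block: SiLU(x W_gate) * (x W_up), projected back down."""

    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```

And a single decoding step showing how temperature, top-k, top-p, and the repetition penalty interact during streaming generation:

```python
import torch
import torch.nn.functional as F


def sample_next_token(
    logits: torch.Tensor,      # (vocab_size,) logits for the next position
    generated: torch.Tensor,   # (seq_len,) token ids produced so far
    temperature: float = 0.8,
    top_p: float = 0.9,
    top_k: int = 50,
    repetition_penalty: float = 1.1,
) -> int:
    logits = logits.clone()

    # Repetition penalty: push already-generated tokens toward lower probability.
    for token_id in set(generated.tolist()):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose mass reaches top_p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    to_drop = cumulative > top_p
    to_drop[1:] = to_drop[:-1].clone()  # shift so the token crossing the threshold is kept
    to_drop[0] = False                  # always keep the single most likely token
    logits[sorted_idx[to_drop]] = float("-inf")

    probs = F.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())
```

In a streaming loop, the model's forward pass produces the logits for the last position, a function like this picks the next token id, and the decoded token is yielded to the Gradio interface before the loop continues.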
## 🙏 Acknowledgments

- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset

---

**Note**: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.