---
title: VelocityLM
emoji: πŸš€
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
- gpt2
datasets:
- tiiuae/falcon-refinedweb
tags:
- text-generation
- transformer
- pytorch
- custom-model
- llm
- foundational-model
short_description: FoundationalLM for fast text-generation
---

# πŸ€– VelocityLM - Custom Foundational Language Model

A custom-trained foundational language model with **2 billion parameters**, built on a modern transformer architecture and deployed with streaming text generation.

## πŸš€ Features

- **Custom Architecture**: Modern transformer with RoPE (Rotary Position Embedding), RMSNorm, and SwiGLU activation
- **Streaming Generation**: Real-time text generation with token-by-token streaming
- **Flexible Sampling**: Configurable temperature, top-p, top-k, and repetition penalty
- **ZeroGPU Integration**: Optimized for Hugging Face Spaces with GPU acceleration
- **Responsive UI**: Clean, intuitive Gradio interface

## πŸ“Š Model Details

| Specification | Value |
|---------------|-------|
| **Parameters** | ~2 billion |
| **Architecture** | Custom Transformer |
| **Context Length** | 2,048 tokens |
| **Vocab Size** | 50,257 (GPT-2 tokenizer) |
| **Layers** | 24 |
| **Attention Heads** | 32 |
| **Hidden Size** | 2,048 |
| **Intermediate Size** | 8,192 |
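
Taken together, the table corresponds to a configuration object roughly like the following (a minimal sketch; the field names are illustrative, not the Space's actual code):

```python
from dataclasses import dataclass

@dataclass
class VelocityLMConfig:
    """Configuration mirroring the specification table above.
    Field names are illustrative, not taken from the actual source."""
    vocab_size: int = 50257        # GPT-2 tokenizer vocabulary
    max_seq_len: int = 2048        # context length in tokens
    n_layers: int = 24
    n_heads: int = 32              # 2048 hidden / 32 heads = 64-dim heads
    hidden_size: int = 2048
    intermediate_size: int = 8192  # SwiGLU feed-forward width
```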

## πŸ—οΈ Architecture Components

- **RMSNorm**: Root Mean Square Layer Normalization for better training stability
- **RoPE**: Rotary Position Embeddings for better length extrapolation
- **SwiGLU**: Swish-Gated Linear Unit activation for improved feed-forward performance
- **Causal Attention**: Standard autoregressive attention mechanism
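
A minimal PyTorch sketch of these pieces, using the dimensions from the table above (illustrative, not the Space's actual source):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the activations (no mean centering)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def apply_rope(x, theta: float = 10000.0):
    """Rotate query/key pairs by position-dependent angles (RoPE).
    x: (batch, seq, heads, head_dim), head_dim even."""
    _, s, _, d = x.shape
    freqs = theta ** (-torch.arange(0, d, 2, device=x.device).float() / d)
    angles = torch.outer(torch.arange(s, device=x.device).float(), freqs)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class SwiGLU(nn.Module):
    """Swish-gated feed-forward block, as used in PaLM/LLaMA-style models."""
    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```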

## 🎯 Training Details

- **Dataset**: Falcon RefinedWeb (curated web text)
- **Training Steps**: 100,000 steps
- **Learning Rate**: 6e-4 with warmup and decay
- **Effective Batch Size**: 32 (4 per device Γ— 8 gradient-accumulation steps)
- **Optimization**: AdamW with Ξ²1=0.9, Ξ²2=0.95
- **Precision**: Mixed precision (FP16)
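
Those hyperparameters correspond to an optimizer setup along these lines (a sketch: the warmup length and the cosine decay shape are assumptions, since only "warmup and decay" is stated above):

```python
import math
import torch

model = torch.nn.Linear(2048, 2048)  # stand-in for the actual 2B-parameter model

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95))

def lr_lambda(step, warmup_steps=2_000, total_steps=100_000):
    # Linear warmup to the 6e-4 peak, then cosine decay. The warmup length
    # is an assumption; the training details only state "warmup and decay".
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Each optimizer step would then accumulate gradients over 8 micro-batches of 4 sequences under FP16 autocast, giving the effective batch size of 32.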

## πŸ› οΈ Generation Parameters

- **Max Tokens**: Control the length of generated text (1-1024)
- **Temperature**: Sampling randomness (0.1-2.0, higher = more creative)
- **Top-p**: Nucleus sampling threshold (0.1-1.0)
- **Top-k**: Top-k sampling limit (0-200, 0 = disabled)
- **Repetition Penalty**: Reduce repetitive text (1.0-2.0, 1.0 = no penalty)
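
As a rough illustration of how these knobs interact, here is one conventional way to apply them to a single decoding step's logits (a sketch of standard sampling logic, not the Space's actual sampler):

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.8,
                      top_p=0.9, top_k=50, repetition_penalty=1.1):
    """Apply repetition penalty, temperature, top-k, then top-p to a
    1-D logits tensor and sample one token id. Illustrative sketch."""
    logits = logits.clone()

    # Repetition penalty: dampen logits of tokens already generated.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    logits = logits / temperature

    # Top-k: keep only the k highest logits (0 disables the filter).
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits[logits < kth] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches top_p; the top token is always kept.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    probs = torch.zeros_like(probs).scatter_(0, sorted_idx, sorted_probs)

    # multinomial renormalizes the remaining mass internally.
    return torch.multinomial(probs, num_samples=1).item()

# Example: sample one token from random logits over the 50,257-token vocab
# (the generated_ids here are arbitrary illustrative token ids).
next_id = sample_next_token(torch.randn(50257), generated_ids=[464, 3290])
```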

## πŸ’‘ Usage Tips

1. **For Creative Writing**: Use higher temperature (1.0-1.5) and top-p (0.9-0.95)
2. **For Factual Content**: Use lower temperature (0.3-0.7) and top-p (0.8-0.9)
3. **For Code Generation**: Use temperature ~0.2 with top-k filtering
4. **Longer Context**: The model handles up to 2,048 tokens of context
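
Expressed as presets for the controls above (the temperature and top-p values come from the tips; the top-k and repetition-penalty values are illustrative assumptions):

```python
# Illustrative presets; top_k and repetition_penalty values are assumptions.
PRESETS = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "top_k": 0,  "repetition_penalty": 1.1},
    "factual":  {"temperature": 0.5, "top_p": 0.85, "top_k": 0,  "repetition_penalty": 1.1},
    "code":     {"temperature": 0.2, "top_p": 1.0,  "top_k": 40, "repetition_penalty": 1.05},
}
```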

## 🚨 Limitations

- **Knowledge Cutoff**: The training data's knowledge cutoff varies by source
- **Biases**: May reflect biases present in training data
- **Factuality**: Generated content should be verified for factual accuracy
- **Context Window**: Limited to 2,048 tokens (approximately 1,500 words)

## πŸ”§ Technical Implementation

The model uses a custom PyTorch implementation with:
- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling
- GPU acceleration via ZeroGPU
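
The streaming loop can be sketched as a Python generator (assumptions: `model(input_ids)` returns per-position logits, `sample_next_token` is the sampler sketched earlier, and on a ZeroGPU Space the function would carry the `@spaces.GPU` decorator):

```python
import torch
# import spaces  # on a ZeroGPU Space, decorate the function with @spaces.GPU

@torch.no_grad()
def generate_stream(model, tokenizer, prompt, max_tokens=256, **sampling_kwargs):
    """Yield the growing completion one sampled token at a time."""
    device = next(model.parameters()).device
    generated = tokenizer(prompt).input_ids
    prompt_len = len(generated)

    for _ in range(max_tokens):
        # Keep only the last 2,048 tokens: the model's context window.
        context = torch.tensor([generated[-2048:]], device=device)
        logits = model(context)[0, -1]  # assumed: model returns raw logits
        next_id = sample_next_token(logits, generated, **sampling_kwargs)
        if next_id == tokenizer.eos_token_id:
            break
        generated.append(next_id)
        # Re-decode the whole completion so multi-byte BPE tokens render cleanly.
        yield tokenizer.decode(generated[prompt_len:])
```

In the Gradio app, each yielded string would update the output textbox, producing the token-by-token streaming effect.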

## πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ™ Acknowledgments

- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset

---

**Note**: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.