---

language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- open-source
- gpt
- transformer
- causal-lm
datasets:
- squad
metrics:
- perplexity
- loss
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 7K
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: Wikipedia passages from SQuAD
    metrics:
      - type: loss
        value: 2.1
      - type: perplexity
        value: 8.2
---


# OpenLLM Small Extended 7K Model

<!-- Copyright (C) 2024 Louis Chua Bean Chong -->
<!-- This file is part of OpenLLM - dual-licensed under GPLv3 and Commercial License -->

## 🌟 Model Overview

This is the **OpenLLM Small Extended 7K** model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with extended training.

### **πŸ“Š Model Specifications**

- **Architecture**: GPT-style Transformer
- **Parameters**: 35,823,616 (35.8M)
- **Layers**: 6 transformer layers
- **Heads**: 8 attention heads
- **Embedding Dimension**: 512
- **Vocabulary Size**: 32,000 tokens
- **Context Length**: 1,024 tokens
- **Training Steps**: 7,000
- **Model Size**: Small

### **🎯 Training Details**

- **Dataset**: Wikipedia passages from SQuAD dataset (~41k passages)
- **Tokenization**: SentencePiece with 32k vocabulary
- **Training Objective**: Next token prediction (causal language modeling)
- **Optimizer**: AdamW with learning rate scheduling
- **Hardware**: Trained on consumer GPU with gradient accumulation
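
For reference, a common form of the learning-rate scheduling mentioned above is cosine decay with linear warmup. The exact schedule and hyperparameters used for this model are not documented here; the function and all values below are illustrative placeholders, not the actual training configuration:

```python
import math

def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=200, max_steps=7000):
    """Cosine learning-rate schedule with linear warmup (illustrative sketch).

    All hyperparameters are placeholder values, not the ones used to train
    this model.
    """
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In practice such a schedule would be queried once per optimizer step (after each gradient-accumulation cycle) to set the AdamW learning rate.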

### **πŸ“ Model Files**

```
huggingface/
├── config.json              # Model configuration
├── generation_config.json   # Generation parameters
├── pytorch_model.bin        # Model weights (161MB)
├── tokenizer_config.json    # Tokenizer configuration
├── tokenizer.model          # SentencePiece tokenizer
└── load_hf_model.py         # Loading script
```

## πŸš€ Usage

### **Loading with Hugging Face Transformers**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### **Using the Custom Loader**

```python
from load_hf_model import load_openllm_model

# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.8,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### **Inference Server**

```bash
# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7
    }'
```

## πŸ“ˆ Performance

### **Training Metrics**

- **Final Loss**: ~2.1 (cross-entropy)
- **Training Time**: ~7 hours on consumer GPU
- **Memory Usage**: ~2GB VRAM during training
- **Inference Speed**: ~50 tokens/second on CPU, ~200 tokens/second on GPU
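
The reported perplexity follows directly from the final loss: perplexity is the exponential of the mean cross-entropy loss (in nats), so a loss of 2.1 corresponds to exp(2.1) ≈ 8.17, consistent with the 8.2 reported in the model metadata:

```python
import math

# Perplexity is exp(mean cross-entropy loss), with the loss measured in nats.
final_loss = 2.1
perplexity = math.exp(final_loss)
print(f"perplexity = {perplexity:.2f}")  # ≈ 8.17, reported as 8.2
```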

### **Model Capabilities**

- **Text Generation**: Coherent paragraph generation
- **Question Answering**: Basic factual responses
- **Summarization**: Short text summarization
- **Language Understanding**: Context-aware responses

## πŸ”§ Configuration

### **Generation Parameters**

```json
{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
```
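
As a rough illustration of what `temperature`, `top_k`, and `top_p` control, here is a minimal pure-Python sketch of the sampling pipeline. This is a simplification of what `model.generate` actually does (no batching, masking, or tensor ops); the `sample_next_token` helper and its dict-of-logits interface are hypothetical, for exposition only:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.9, rng=random):
    """Illustrative temperature + top-k + top-p (nucleus) sampling.

    `logits` maps token ids to raw scores. Hypothetical helper, not the
    transformers implementation.
    """
    # Temperature scaling: <1 sharpens the distribution, >1 flattens it
    scaled = {t: l / temperature for t, l in logits.items()}
    # top-k: keep only the k highest-scoring tokens
    candidates = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the surviving candidates (shifted by max for stability)
    m = max(l for _, l in candidates)
    exps = [(t, math.exp(l - m)) for t, l in candidates]
    z = sum(e for _, e in exps)
    probs = [(t, e / z) for t, e in exps]
    # top-p: keep the smallest prefix whose cumulative probability >= top_p
    kept, cum = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize over the nucleus and draw one token
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```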

### **Model Architecture**

```json
{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
```
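
As a sanity check, the reported 35,823,616 parameters can be reproduced from this configuration, assuming a GPT-2-style decoder with tied input/output embeddings, learned positional embeddings, and biases on the linear layers and LayerNorms (consistent with `bias: true`):

```python
# Parameter count for a GPT-2-style decoder under the assumptions above.
vocab_size, n_layer, n_embd, block_size = 32000, 6, 512, 1024

tok_emb = vocab_size * n_embd  # token embedding (weight-tied with the LM head)
pos_emb = block_size * n_embd  # learned positional embedding
# Attention: fused QKV projection + output projection, each with bias
attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
# MLP: 4x expansion and contraction, each with bias
mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
# Two LayerNorms per block, each with weight and bias
lns = 2 * (2 * n_embd)
per_block = attn + mlp + lns
final_ln = 2 * n_embd

total = tok_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total:,}")  # 35,823,616
```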

## πŸ§ͺ Testing

### **Quick Test**

```python
# Test the model with a simple prompt (assumes `model` and `tokenizer`
# are already loaded as shown above)
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,  # required for temperature to take effect
        temperature=0.7,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")
```

## πŸ“‹ Limitations

- **Context Length**: Limited to 1,024 tokens
- **Training Data**: Only Wikipedia passages (limited domain)
- **Model Size**: Small model with limited reasoning capabilities
- **Bias**: May inherit biases from training data
- **Factual Accuracy**: Not guaranteed for current events

## πŸ”„ Model Comparison

| Model | Parameters | Training Steps | Context Length | Use Case |
|-------|------------|----------------|----------------|----------|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
| **Small 7K** | **35.8M** | **7,000** | **1,024** | **Extended training** |

## πŸ“„ License

This model is dual-licensed:
- **Open Source**: GNU General Public License v3.0
- **Commercial**: Commercial License (contact for details)

See `LICENSE` and `docs/LICENSES.md` for full license information.

## 🀝 Contributing

We welcome contributions to improve the model! Please see:
- `docs/CONTRIBUTING.md` for contribution guidelines
- `docs/CODE_OF_CONDUCT.md` for community standards

## πŸ“ž Support

For questions, issues, or commercial licensing:
- **GitHub Issues**: Report bugs and feature requests
- **Documentation**: Check `docs/` directory
- **Commercial License**: Contact for enterprise use

---

**Author**: Louis Chua Bean Chong  
**Project**: OpenLLM - Open Source Large Language Model  
**Version**: 0.1.0  
**Last Updated**: 2024