---
language: en
license: mit
library_name: pytorch
pipeline_tag: text-generation
tags:
- deepseek
- cpu-optimized
- transformer
- language-model
- tinystories
- grouped-query-attention
- rotary-position-embeddings
- rmsnorm
- swiglu
datasets:
- roneneldan/TinyStories
---

# Shoonya Model v0.2 - DeepSeek CPU-Optimized

This model is a CPU-optimized version of the Shoonya language model, incorporating techniques from the DeepSeek team for efficient inference on CPU hardware.

## Model Description

**Shoonya Model v0.2** is a lightweight transformer-based language model designed for efficient CPU inference. It incorporates architectural optimizations inspired by DeepSeek's research to achieve better performance on CPU hardware while maintaining good generation quality.

### Model Details

- **Developed by:** VaidhyaMegha
- **Model type:** Transformer-based language model
- **Language(s):** English
- **Training Data:** TinyStories dataset
- **Parameters:** 16.41M
- **Context Length:** 512 tokens
- **Hidden Size:** 256
- **Attention Heads:** 8
- **Key-Value Heads:** 4
- **Hidden Layers:** 6
- **License:** MIT
- **Repository:** [GitHub - VaidhyaMegha/Shoonya](https://github.com/VaidhyaMegha/Shoonya)
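The architectural hyperparameters above can be captured in a small configuration object. This is an illustrative sketch, not the repository's actual config class; the field names are assumptions, but the values come from the Model Details list.

```python
from dataclasses import dataclass

@dataclass
class ShoonyaConfig:
    """Illustrative config mirroring the Model Details above (names are hypothetical)."""
    hidden_size: int = 256
    num_attention_heads: int = 8
    num_key_value_heads: int = 4       # GQA: every 2 query heads share one KV head
    num_hidden_layers: int = 6
    max_position_embeddings: int = 512
    sliding_window: int = 256

    @property
    def head_dim(self) -> int:
        # Per-head dimension: 256 / 8 = 32
        return self.hidden_size // self.num_attention_heads

cfg = ShoonyaConfig()
print(cfg.head_dim)  # 32
```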

## DeepSeek CPU Optimizations

This model incorporates the following optimizations from the DeepSeek team:

1. **Grouped-Query Attention (GQA)** with a 2:1 query-to-key/value head ratio (8 query heads, 4 KV heads) - Reduces memory usage and computational cost by sharing each key/value projection across two query heads
2. **Rotary Position Embeddings (RoPE)** - Provides better positional encoding with improved extrapolation to longer sequences
3. **RMSNorm** - Offers improved training stability compared to LayerNorm
4. **SwiGLU activation** - Provides better performance in feed-forward networks compared to standard GELU
5. **Sliding Window Attention** with window size 256 - Reduces memory usage for longer sequences by limiting attention to a local window
6. **ONNX export** - Enables optimized runtime on various hardware platforms
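To make the GQA idea above concrete, here is a minimal PyTorch sketch of grouped-query attention with the card's shapes (hidden size 256, 8 query heads, 4 KV heads). It is not the repository's implementation, and it omits RoPE and the sliding-window mask for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA sketch: 8 query heads share 4 key/value heads (2:1 ratio)."""
    def __init__(self, hidden_size=256, num_heads=8, num_kv_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, num_heads * self.head_dim, bias=False)
        # K/V projections are half the size of Q: only num_kv_heads worth of output
        self.k_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it serves num_heads // num_kv_heads query heads
        rep = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(1, 16, 256)
attn = GroupedQueryAttention()
print(attn(x).shape)  # torch.Size([1, 16, 256])
```

The savings come from the K/V projections and cache being half the size of a standard multi-head layout, which matters most for CPU inference where memory bandwidth dominates.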

## Intended Uses & Limitations

**Intended Uses:**
- Educational purposes to understand transformer architecture and optimizations
- Research on efficient language model deployment
- Text generation for simple creative writing tasks
- Baseline for further fine-tuning on specific tasks

**Limitations:**
- The model is trained on a limited dataset (TinyStories) and has a relatively small parameter count
- It may not perform well on complex reasoning tasks or specialized domains
- The model has not been extensively evaluated for biases or harmful outputs

## Training Procedure

### Training Data

The model was trained on the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories), which contains simple stories suitable for young children, generated by GPT-3.5/4.

### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 5e-5
- **Batch Size:** 4
- **Weight Decay:** 0.01
- **Warmup Steps:** 100
- **Gradient Accumulation Steps:** 4
- **Training Device:** CPU (Mac Mini M4)
- **Training Epochs:** 5
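The hyperparameters above imply a loop roughly like the following. This is a hedged sketch with a stand-in model and random data, not the actual training script in the Shoonya repository; the warmup schedule shown (linear ramp over 100 steps) is one common interpretation of "warmup steps".

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Stand-in model and data; the real script trains the Shoonya transformer on TinyStories.
model = torch.nn.Linear(256, 256)
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

warmup_steps = 100
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

accum_steps = 4  # gradient accumulation: effective batch = 4 (batch) x 4 = 16
for step in range(8):  # stands in for iterating the TinyStories dataloader
    x = torch.randn(4, 256)          # batch size 4
    loss = ((model(x) - x) ** 2).mean()
    (loss / accum_steps).backward()  # scale so accumulated grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```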

## Note on Quantization

The quantized version of this model is not included due to PyTorch quantization limitations on Mac M-series chips. See `quantization_note.md` for instructions on quantizing the model on a compatible system.
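On a compatible system, dynamic int8 quantization of the model's linear layers might look like the sketch below. The model here is a stand-in, and this is not the repository's quantization script; `torch.quantization.quantize_dynamic` requires a supported backend (e.g. fbgemm on x86).

```python
import torch
import torch.nn as nn

# Stand-in for the Shoonya model; dynamic quantization targets nn.Linear layers.
model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))
model.eval()

# Weights become int8; activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
with torch.no_grad():
    y = quantized(x)
```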

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("VaidhyaMegha/Shoonya")
tokenizer = AutoTokenizer.from_pretrained("VaidhyaMegha/Shoonya")

# Generate text (do_sample=True is required for temperature/top_p to take effect)
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

## Evaluation Results

The model reached the following metrics at the end of training:
- **Final Loss:** 7.21
- **Final Perplexity:** 1358.28
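These two numbers are consistent: perplexity is the exponential of the cross-entropy loss, and the small gap from the reported value comes from rounding the logged loss.

```python
import math

final_loss = 7.21                  # reported cross-entropy loss (nats/token)
perplexity = math.exp(final_loss)  # ≈ 1352.9; the reported 1358.28 corresponds
                                   # to the unrounded loss (~7.214)
```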

## Ethical Considerations

This model is trained on the TinyStories dataset, which was designed to be suitable for children and contains simple, non-harmful content. However, as with any language model, it may still produce unexpected or potentially problematic outputs. Users should exercise caution and implement appropriate content filtering if deploying this model in production environments.

## Citations

```bibtex
@article{eldan2023tinystories,
  title={{TinyStories: How Small Can Language Models Be and Still Speak Coherent English?}},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}
```

## License

This model is released under the MIT License.