---
language:
- en
tags:
- text-generation
- pytorch
- deepseek
- mixture-of-experts
- moe
- tinystories
- language-model
- multi-head-latent-attention
datasets:
- roneneldan/TinyStories
---

# Deepseek-inspired TinyStories Model

This is a Deepseek-inspired model trained on the TinyStories dataset, featuring a Mixture of Experts (MoE) architecture.

GitHub: https://github.com/sky-2002/Generative-Modelling/tree/master/deepseek

## Model Details

- **Model Type**: Autoregressive Language Model with Mixture of Experts
- **Architecture**: Deepseek-inspired, with Multi-Head Latent Attention (MLA) and MoE layers using auxiliary-loss-free load balancing
- **Parameters**: ~60M
- **Training Data**: TinyStories dataset
- **License**: MIT

## Model Architecture

- **Attention Heads**: 8
- **Embedding Dimension**: 512
- **Max Sequence Length**: 512
- **MoE Configuration**:
  - Shared Experts: 2
  - Routed Experts: 4
  - Top-K Routing: 2
  - Expert Intermediate Dimension: 1536

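As a rough sketch of how the MoE configuration above fits together (2 shared experts applied to every token, plus a top-2 selection over the 4 routed experts), a toy PyTorch layer might look like the following. Class and variable names here are illustrative, not the actual `deepseek_tinystories` implementation, and the auxiliary-loss-free load balancing (a per-expert bias on the gate scores) is omitted for brevity.

```python
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Illustrative MoE layer: shared experts + top-k routed experts."""

    def __init__(self, dim=512, inter_dim=1536, n_shared=2, n_routed=4, top_k=2):
        super().__init__()
        make_expert = lambda: nn.Sequential(
            nn.Linear(dim, inter_dim), nn.SiLU(), nn.Linear(inter_dim, dim)
        )
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.gate = nn.Linear(dim, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        out = sum(e(x) for e in self.shared)            # shared experts see every token
        scores = self.gate(x).softmax(dim=-1)           # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-2 experts per token
        for t in range(x.shape[0]):                     # naive per-token dispatch
            for w, i in zip(weights[t], idx[t]):
                out[t] = out[t] + w * self.routed[int(i)](x[t])
        return out
```

A production implementation would batch the dispatch per expert instead of looping per token, but the routing logic is the same.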
## Usage Example

### Method 1: Direct imports from the package

```python
import json

import torch

from deepseek_tinystories import (
    DeepseekInspiredModel,
    DeepSeekModelConfig,
    TinyStoriesProcesssor,
    generate_text,
)

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```
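The `temperature` and `top_k` arguments control how the next token is sampled at each generation step. A minimal sketch of what top-k sampling with temperature typically looks like (illustrative only, not the actual `generate_text` internals):

```python
import torch


def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 40) -> int:
    """Sample one token id from logits, keeping only the top_k candidates."""
    scaled = logits / temperature                # <1 sharpens, >1 flattens the distribution
    values, indices = torch.topk(scaled, top_k)  # restrict to the k most likely tokens
    probs = torch.softmax(values, dim=-1)        # renormalize over the kept tokens
    choice = torch.multinomial(probs, num_samples=1)
    return int(indices[choice])
```

Lower `temperature` and smaller `top_k` make the stories more repetitive but more coherent; higher values make them more varied.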

### Method 2: Using snapshot_download

```python
import json
import sys

import torch
from huggingface_hub import snapshot_download

# Download the entire repository
repo_id = "sky-2002/deepseek-tinystories-60M"
repo_dir = snapshot_download(repo_id)

# Import the local package from the downloaded repo
sys.path.append(str(repo_dir))

from deepseek_tinystories import (
    DeepseekInspiredModel,
    DeepSeekModelConfig,
    TinyStoriesProcesssor,
    generate_text,
)

# Load config & model
config_path = f"{repo_dir}/config.json"
model_path = f"{repo_dir}/pytorch_model.bin"

with open(config_path) as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```

### Method 3: Module-specific imports

```python
import json

import torch

from deepseek_tinystories.modeling_deepseek import DeepseekInspiredModel, DeepSeekModelConfig
from deepseek_tinystories.processor import TinyStoriesProcesssor
from deepseek_tinystories.utils import generate_text

# Load config & model
with open("config.json") as f:
    config = DeepSeekModelConfig(**json.load(f))
model = DeepseekInspiredModel(config)

# Load clean model weights (not a training checkpoint)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```