---
language:
- en
tags:
- text-generation
- pytorch
- deepseek
- mixture-of-experts
- moe
- tinystories
- language-model
- multi-head-latent-attention
datasets:
- roneneldan/TinyStories
---

# Deepseek-inspired TinyStories Model

This is a Deepseek-inspired model trained on the TinyStories dataset, featuring a Mixture of Experts (MoE) architecture.

GitHub: https://github.com/sky-2002/Generative-Modelling/tree/master/deepseek

## Model Details

- **Model Type**: Autoregressive Language Model with Mixture of Experts
- **Architecture**: Deepseek-inspired, with Multi-Head Latent Attention (MHLA) and MoE layers using auxiliary-loss-free load balancing
- **Parameters**: ~60M
- **Training Data**: TinyStories dataset
- **License**: MIT

## Model Architecture

- **Attention Heads**: 8
- **Embedding Dimension**: 512
- **Max Sequence Length**: 512
- **MoE Configuration**:
  - Shared Experts: 2
  - Routed Experts: 4
  - Top-K Routing: 2
  - Expert Intermediate Dimension: 1536

## Usage Example

### Method 1: Direct imports from package

```python
from deepseek_tinystories import DeepseekInspiredModel, DeepSeekModelConfig, TinyStoriesProcesssor, generate_text
import torch, json

# Load config & model
config = DeepSeekModelConfig(**json.load(open("config.json")))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```

### Method 2: Using snapshot_download

```python
from huggingface_hub import snapshot_download
import sys
import torch, json

# Download the entire repository
repo_id = "sky-2002/deepseek-tinystories-60M"
repo_dir = snapshot_download(repo_id)

# Import the local package from the downloaded repo
sys.path.append(str(repo_dir))
from deepseek_tinystories import DeepseekInspiredModel, DeepSeekModelConfig, TinyStoriesProcesssor, generate_text

# Load config & model
config_path = f"{repo_dir}/config.json"
model_path = f"{repo_dir}/pytorch_model.bin"
config = DeepSeekModelConfig(**json.load(open(config_path)))
model = DeepseekInspiredModel(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```

### Method 3: Module-specific imports

```python
from deepseek_tinystories.modeling_deepseek import DeepseekInspiredModel, DeepSeekModelConfig
from deepseek_tinystories.processor import TinyStoriesProcesssor
from deepseek_tinystories.utils import generate_text
import torch, json

# Load config & model
config = DeepSeekModelConfig(**json.load(open("config.json")))
model = DeepseekInspiredModel(config)
# Load clean model weights (not checkpoint)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Initialize processor
processor = TinyStoriesProcesssor()

# Generate text using utils
prompt = "Once upon a time, there was a little girl..."
generated_text = generate_text(
    model=model,
    data_processor=processor,
    prompt=prompt,
    max_new_tokens=50,
    temperature=0.8,
    top_k=40,
    device="cpu",
)
print(generated_text)
```
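## Architecture Sketch

To illustrate the MoE configuration listed above (2 shared experts that process every token, 4 routed experts with top-2 routing, and a routing bias adjusted outside the loss for auxiliary-loss-free load balancing), here is a minimal, self-contained PyTorch sketch. It is an illustration of the technique only, not this repository's implementation; all names (`ToyMoELayer`, `expert_bias`, etc.) are made up for the example, and the routed experts are computed densely for clarity rather than dispatched sparsely.

```python
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    """Illustrative shared + routed expert layer (not the repo's code)."""

    def __init__(self, dim=512, expert_dim=1536, n_shared=2, n_routed=4, top_k=2):
        super().__init__()
        def make_expert():
            return nn.Sequential(
                nn.Linear(dim, expert_dim), nn.SiLU(), nn.Linear(expert_dim, dim)
            )
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.gate = nn.Linear(dim, n_routed, bias=False)
        # Per-expert bias nudged up/down based on expert load during training
        # (auxiliary-loss-free balancing): it changes WHICH experts are
        # selected but is excluded from the mixing weights and the loss.
        self.register_buffer("expert_bias", torch.zeros(n_routed))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        # Shared experts see every token unconditionally
        out = sum(expert(x) for expert in self.shared)
        scores = torch.sigmoid(self.gate(x))                    # (tokens, n_routed)
        # Select top-k experts using biased scores...
        _, top_idx = torch.topk(scores + self.expert_bias, self.top_k, dim=-1)
        # ...but mix with the ORIGINAL (unbiased) scores
        weights = torch.gather(scores, -1, top_idx)             # (tokens, top_k)
        mix = torch.zeros_like(scores).scatter(-1, top_idx, weights)
        # Dense expert computation for clarity; real MoE dispatches sparsely
        routed = torch.stack([e(x) for e in self.routed], dim=1)  # (tokens, n_routed, dim)
        return out + (mix.unsqueeze(-1) * routed).sum(dim=1)
```

With the model-card values (`dim=512`, `expert_dim=1536`, `n_shared=2`, `n_routed=4`, `top_k=2`), each token is processed by both shared experts plus its two highest-scoring routed experts.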