updated for safetensors
Browse files
README.md
CHANGED
|
@@ -3,13 +3,17 @@ license: apache-2.0
|
|
| 3 |
tags:
|
| 4 |
- text-generation
|
| 5 |
- language-model
|
| 6 |
-
-
|
| 7 |
-
-
|
| 8 |
-
-
|
| 9 |
- transformer
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
language: en
|
| 11 |
datasets:
|
| 12 |
-
- CosmicSet-
|
| 13 |
- akkiisfrommars/TreeCorpusCleanedmodel
|
| 14 |
model_type: CosmicFish
|
| 15 |
pipeline_tag: text-generation
|
|
@@ -28,7 +32,7 @@ A 90M parameter language model with modern architecture improvements developed by Mistyoz AI
|
|
| 28 |
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py
|
| 29 |
|
| 30 |
# Install dependencies
|
| 31 |
-
pip install transformers huggingface-hub termcolor
|
| 32 |
|
| 33 |
# Run the chat interface (automatically downloads model)
|
| 34 |
python chat.py
|
|
@@ -42,16 +46,17 @@ The `chat.py` script handles all model loading, generation, and provides the best experience
|
|
| 42 |
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
|
| 43 |
- **Context Length**: 512 tokens
|
| 44 |
- **Vocabulary**: 50,257 tokens
|
| 45 |
-
- **Training Data**: CosmicSet
|
| 46 |
- **Developer**: Mistyoz AI
|
| 47 |
- **Repository**: MistyozAI/CosmicFish-90M
|
|
|
|
| 48 |
|
| 49 |
## Usage
|
| 50 |
|
| 51 |
### Installation
|
| 52 |
|
| 53 |
```bash
|
| 54 |
-
pip install transformers huggingface-hub termcolor
|
| 55 |
```
|
| 56 |
|
| 57 |
### Quick Chat Interface
|
|
@@ -59,6 +64,7 @@ pip install transformers huggingface-hub termcolor
|
|
| 59 |
```python
|
| 60 |
from transformers import GPT2Tokenizer
|
| 61 |
from huggingface_hub import snapshot_download
|
|
|
|
| 62 |
import torch
|
| 63 |
import json
|
| 64 |
import os
|
|
@@ -73,8 +79,8 @@ tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
|
|
| 73 |
with open(os.path.join(cache_dir, "config.json")) as f:
|
| 74 |
config_dict = json.load(f)
|
| 75 |
|
| 76 |
-
# Load model weights
|
| 77 |
-
state_dict =
|
| 78 |
|
| 79 |
# Note: Full model class available in the repository
|
| 80 |
print("Model downloaded and ready for use!")
|
|
@@ -83,7 +89,7 @@ print("Model downloaded and ready for use!")
|
|
| 83 |
### Advanced Generation with Repetition Penalty
|
| 84 |
|
| 85 |
```python
|
| 86 |
-
def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.
|
| 87 |
input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
|
| 88 |
generated = input_ids.clone()
|
| 89 |
|
|
@@ -112,6 +118,49 @@ def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, t
|
|
| 112 |
return tokenizer.decode(generated[0], skip_special_tokens=True)
|
| 113 |
```
|
| 114 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 115 |
### Chat Interface
|
| 116 |
|
| 117 |
```python
|
|
@@ -154,21 +203,23 @@ CosmicFish uses several modern improvements over standard transformers:
|
|
| 154 |
|
| 155 |
## Training
|
| 156 |
|
| 157 |
-
- **Dataset**: CosmicSet
|
| 158 |
- **Sequence Length**: 512 tokens
|
| 159 |
-
- **Training Steps**: ~
|
| 160 |
- **Hardware**: Nvidia A40 x1
|
| 161 |
|
| 162 |
## Performance
|
| 163 |
|
| 164 |
- **Speed**: Varies by hardware (not benchmarked)
|
| 165 |
-
- **Memory**: ~
|
| 166 |
- **File Size**: 175MB
|
|
|
|
| 167 |
|
| 168 |
## Limitations
|
| 169 |
|
| 170 |
- Small model size (90M parameters) may produce less accurate responses
|
| 171 |
- 512 token context limit
|
|
|
|
| 172 |
- Training data cutoff applies
|
| 173 |
- May generate incorrect information
|
| 174 |
- Cannot browse internet or access real-time data
|
|
|
|
| 3 |
tags:
|
| 4 |
- text-generation
|
| 5 |
- language-model
|
| 6 |
+
- causal-lm
|
| 7 |
+
- cosmicfish
|
| 8 |
+
- 90m
|
| 9 |
- transformer
|
| 10 |
+
- rope
|
| 11 |
+
- gqa
|
| 12 |
+
- swiglu
|
| 13 |
+
- rmsnorm
|
| 14 |
language: en
|
| 15 |
datasets:
|
| 16 |
+
- CosmicSet-2.0-mini
|
| 17 |
- akkiisfrommars/TreeCorpusCleanedmodel
|
| 18 |
model_type: CosmicFish
|
| 19 |
pipeline_tag: text-generation
|
|
|
|
| 32 |
wget https://huggingface.co/MistyozAI/CosmicFish-90M/resolve/main/chat.py
|
| 33 |
|
| 34 |
# Install dependencies
|
| 35 |
+
pip install transformers huggingface-hub termcolor safetensors
|
| 36 |
|
| 37 |
# Run the chat interface (automatically downloads model)
|
| 38 |
python chat.py
|
|
|
|
| 46 |
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
|
| 47 |
- **Context Length**: 512 tokens
|
| 48 |
- **Vocabulary**: 50,257 tokens
|
| 49 |
+
- **Training Data**: CosmicSet 2.0 mini
|
| 50 |
- **Developer**: Mistyoz AI
|
| 51 |
- **Repository**: MistyozAI/CosmicFish-90M
|
| 52 |
+
- **Format**: Safetensors
|
| 53 |
|
| 54 |
## Usage
|
| 55 |
|
| 56 |
### Installation
|
| 57 |
|
| 58 |
```bash
|
| 59 |
+
pip install transformers huggingface-hub termcolor safetensors
|
| 60 |
```
|
| 61 |
|
| 62 |
### Quick Chat Interface
|
|
|
|
| 64 |
```python
|
| 65 |
from transformers import GPT2Tokenizer
|
| 66 |
from huggingface_hub import snapshot_download
|
| 67 |
+
from safetensors.torch import load_file
|
| 68 |
import torch
|
| 69 |
import json
|
| 70 |
import os
|
|
|
|
| 79 |
with open(os.path.join(cache_dir, "config.json")) as f:
|
| 80 |
config_dict = json.load(f)
|
| 81 |
|
| 82 |
+
# Load model weights from safetensors
|
| 83 |
+
state_dict = load_file(os.path.join(cache_dir, "model.safetensors"))
|
| 84 |
|
| 85 |
# Note: Full model class available in the repository
|
| 86 |
print("Model downloaded and ready for use!")
|
|
|
|
| 89 |
### Advanced Generation with Repetition Penalty
|
| 90 |
|
| 91 |
```python
|
| 92 |
+
def generate_with_repetition_penalty(model, tokenizer, prompt, max_tokens=100, temperature=0.5, penalty=1.2):
|
| 93 |
input_ids = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
|
| 94 |
generated = input_ids.clone()
|
| 95 |
|
|
|
|
| 118 |
return tokenizer.decode(generated[0], skip_special_tokens=True)
|
| 119 |
```
|
| 120 |
|
| 121 |
+
### Loading Model with Safetensors
|
| 122 |
+
|
| 123 |
+
```python
|
| 124 |
+
from safetensors.torch import load_file
|
| 125 |
+
from modeling_cosmicfish import CosmicFish, CosmicConfig
|
| 126 |
+
import json
|
| 127 |
+
|
| 128 |
+
def load_cosmicfish_model(model_path):
|
| 129 |
+
# Load config
|
| 130 |
+
with open(os.path.join(model_path, "config.json")) as f:
|
| 131 |
+
config_dict = json.load(f)
|
| 132 |
+
|
| 133 |
+
# Create model config
|
| 134 |
+
config = CosmicConfig(
|
| 135 |
+
vocab_size=config_dict["vocab_size"],
|
| 136 |
+
block_size=config_dict["block_size"],
|
| 137 |
+
n_layer=config_dict["n_layer"],
|
| 138 |
+
n_head=config_dict["n_head"],
|
| 139 |
+
n_embd=config_dict["n_embd"],
|
| 140 |
+
bias=config_dict["bias"],
|
| 141 |
+
dropout=0.0,
|
| 142 |
+
use_rotary=config_dict["use_rotary"],
|
| 143 |
+
use_swiglu=config_dict["use_swiglu"],
|
| 144 |
+
use_gqa=config_dict["use_gqa"],
|
| 145 |
+
n_query_groups=config_dict["n_query_groups"]
|
| 146 |
+
)
|
| 147 |
+
|
| 148 |
+
# Create model
|
| 149 |
+
model = CosmicFish(config)
|
| 150 |
+
|
| 151 |
+
# Load weights from safetensors (secure format)
|
| 152 |
+
state_dict = load_file(os.path.join(model_path, "model.safetensors"))
|
| 153 |
+
|
| 154 |
+
# Handle weight sharing (lm_head.weight shares with transformer.wte.weight)
|
| 155 |
+
if 'lm_head.weight' not in state_dict and 'transformer.wte.weight' in state_dict:
|
| 156 |
+
state_dict['lm_head.weight'] = state_dict['transformer.wte.weight']
|
| 157 |
+
|
| 158 |
+
model.load_state_dict(state_dict)
|
| 159 |
+
model.eval()
|
| 160 |
+
|
| 161 |
+
return model
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
### Chat Interface
|
| 165 |
|
| 166 |
```python
|
|
|
|
| 203 |
|
| 204 |
## Training
|
| 205 |
|
| 206 |
+
- **Dataset**: CosmicSet 2.0 mini
|
| 207 |
- **Sequence Length**: 512 tokens
|
| 208 |
+
- **Training Steps**: ~250K iterations
|
| 209 |
- **Hardware**: Nvidia A40 x1
|
| 210 |
|
| 211 |
## Performance
|
| 212 |
|
| 213 |
- **Speed**: Varies by hardware (not benchmarked)
|
| 214 |
+
- **Memory**: ~256MB RAM
|
| 215 |
- **File Size**: 175MB
|
| 216 |
+
- **Loading**: Fast and secure with safetensors
|
| 217 |
|
| 218 |
## Limitations
|
| 219 |
|
| 220 |
- Small model size (90M parameters) may produce less accurate responses
|
| 221 |
- 512 token context limit
|
| 222 |
+
- English only
|
| 223 |
- Training data cutoff applies
|
| 224 |
- May generate incorrect information
|
| 225 |
- Cannot browse internet or access real-time data
|