+---
+license: apache-2.0
+tags:
+- text-generation
+- language-model
+- LLM
+- CosmicFish
+- 120M
+- transformer
+language: en
+datasets: CosmicSet-1.0
+model_type: CosmicFish
+---
+# CosmicFish-120M
+A 120M-class causal language model (121M parameters) with modern architectural improvements: RoPE, GQA, SwiGLU, and RMSNorm.
+## Model Details
+- **Parameters**: 121M
+- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
+- **Context Length**: 512 tokens
+- **Vocabulary**: 50,257 tokens
+- **Training Data**: CosmicSet 1.0
+- **Developer**: Mistyoz AI
+## Usage
+### Installation
+```bash
+pip install torch transformers
+```
+### Loading the Model
+```python
+import torch
+import json
+from transformers import GPT2Tokenizer
+from modeling_cosmicfish import CosmicFish, CosmicConfig
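+# Load model
+# Assumes modeling_cosmicfish.py, config.json, and pytorch_model.bin are in the working directory
+with open("config.json") as f:
+    config_dict = json.load(f)
+# Pass only the architecture keys listed below to CosmicConfig
+config = CosmicConfig(**{k: v for k, v in config_dict.items() if k in [
+    'vocab_size', 'block_size', 'n_layer', 'n_head', 'n_embd', 'bias',
+    'use_rotary', 'use_swiglu', 'use_gqa', 'n_query_groups'
+]})
+config.dropout = 0.0  # Disable dropout for inference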
+model = CosmicFish(config)
+model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
+model.eval()
+# Load tokenizer (GPT-2 tokenizer, matching the 50,257-token vocabulary)
+tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+```
+### Basic Generation
+```python
+def generate_text(prompt, max_tokens=100):
+    inputs = tokenizer.encode(prompt, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model.generate(
+            inputs,
+            max_new_tokens=max_tokens,
+            temperature=0.7,
+            top_k=40,
+            do_sample=True
+        )
+    return tokenizer.decode(outputs[0], skip_special_tokens=True)
+# Example
+text = generate_text("The future of AI is")
+print(text)
+```
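The `temperature=0.7` and `top_k=40` arguments above control how the next token is sampled. As a rough illustration of the standard technique (not taken from `modeling_cosmicfish`, whose `generate` implementation is not shown here), the sketch below scales the logits by the temperature, keeps the `top_k` largest, and renormalizes them into probabilities. The function name `top_k_probs` and the toy logits are hypothetical.

```python
import math

def top_k_probs(logits, temperature=0.7, top_k=40):
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Indices of the k largest scaled logits; all other tokens get probability 0.
    k = min(top_k, len(scaled))
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the kept logits (subtract the max for numerical stability).
    m = max(scaled[i] for i in keep)
    exps = {i: math.exp(scaled[i] - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scaled))]

# Toy example with a 5-token vocabulary (hypothetical logits)
probs = top_k_probs([2.0, 1.0, 0.5, -1.0, -3.0], temperature=0.7, top_k=3)
print([round(p, 3) for p in probs])
```

A token index can then be drawn from these probabilities, for example with `random.choices(range(len(probs)), weights=probs)`.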
+### Chat Interface
+```python
+def chat_with_model():
+    conversation = []
+    while True:
+        user_input = input("You: ")
+        if user_input.lower() in ['quit', 'exit']:
+            break
+        context = "Below is a conversation between a human and an AI assistant.\n\n"
+        for human, ai in conversation:
+            context += f"Human: {human}\nAssistant: {ai}\n\n"
+        context += f"Human: {user_input}\nAssistant:"
+        # Tokenize the prompt, keeping only the most recent 400 tokens
+        inputs = tokenizer.encode(context, return_tensors="pt")
+        if inputs.shape[1] > 400:
+            inputs = inputs[:, -400:]
+        with torch.no_grad():
+            outputs = model.generate(
+                inputs,
+                max_new_tokens=150,
+                temperature=0.7,
+                top_k=40,
+                do_sample=True,
+                pad_token_id=tokenizer.eos_token_id
+            )
+        # Decode only the newly generated tokens and keep the first non-empty line
+        response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
+        response = response.strip().split('\n')[0].strip()
+        print(f"CosmicFish: {response}")
+        conversation.append((user_input, response))
+chat_with_model()
+```
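The chat loop above keeps up to 400 prompt tokens and then generates up to 150 more, which can total 550 tokens against a 512-token context. How `model.generate` handles that overflow is not shown here. A conservative sketch, assuming the prompt plus the new tokens should fit inside the context window, is to derive the prompt budget from the two limits. The helper name `fit_prompt` is hypothetical, and it works on a plain list of token ids.

```python
def fit_prompt(token_ids, context_len=512, max_new_tokens=150):
    """Keep the most recent tokens so that prompt + generation fits in the context."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_len")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]

prompt = list(range(600))   # stand-in for 600 token ids
trimmed = fit_prompt(prompt)
print(len(trimmed))         # 362 tokens kept, leaving room for 150 new ones
```

With the tensor used in the chat loop, the equivalent slice would be `inputs[:, -362:]`.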
+## Architecture
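+CosmicFish uses several modern improvements over the original Transformer architecture:
+- **RoPE (Rotary Position Embeddings)**: Encodes position by rotating query and key vectors, so attention scores depend on relative position instead of learned absolute position embeddings
+- **GQA (Grouped-Query Attention)**: Shares key/value heads across 4 query groups, reducing attention memory usage compared with full multi-head attention
+- **SwiGLU**: A gated feed-forward activation used in place of ReLU/GELU
+- **RMSNorm**: Normalizes by the root mean square only, without mean-centering, which makes it simpler than LayerNorm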
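Two of the components listed above are small enough to sketch in plain Python. This is a minimal illustration of the general techniques, not the code in `modeling_cosmicfish`: `rms_norm` divides a vector by its root mean square and applies a scale (no mean subtraction, unlike LayerNorm), and `kv_group_for_head` shows one common way query heads are assigned to shared key/value groups. The head count of 12 is an assumption for illustration only; the 4 groups come from the list above.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then apply a learned scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def kv_group_for_head(head, n_head, n_query_groups):
    """GQA: map a query head index to the key/value group it shares."""
    if n_head % n_query_groups != 0:
        raise ValueError("n_head must be divisible by n_query_groups")
    heads_per_group = n_head // n_query_groups
    return head // heads_per_group

# After RMSNorm with unit weights, the mean of the squared outputs is about 1.
normed = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
print([round(v, 3) for v in normed])

N_HEAD = 12  # hypothetical head count, not taken from the model's config.json
groups = [kv_group_for_head(h, N_HEAD, 4) for h in range(N_HEAD)]
print(groups)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Under that assumed head count, 12 query heads would read from 4 sets of keys and values instead of 12, which is where the memory saving comes from.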
+## Training
+- **Dataset**: CosmicSet 1.0
+- **Sequence Length**: 512 tokens
+- **Training Steps**: ~300K iterations
+- **Hardware**: 1x NVIDIA A40
+## Performance
+- **Speed**: Varies by hardware (not benchmarked)
+- **Memory**: ~500MB RAM (FP16)
+- **File Size**: 243MB
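The file size above can be sanity-checked against the parameter count with simple arithmetic. The sketch assumes 2 bytes per parameter (FP16) and decimal megabytes; it covers the weights only and does not attempt to reproduce the runtime memory figure.

```python
PARAMS = 121_000_000       # parameter count from Model Details
BYTES_PER_PARAM_FP16 = 2   # assumption: weights stored as 16-bit floats

weights_mb = PARAMS * BYTES_PER_PARAM_FP16 / 1_000_000
print(f"FP16 weights: {weights_mb:.0f} MB")  # FP16 weights: 242 MB
```

The result of about 242 MB is consistent with the listed 243MB file.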
+## Limitations
+- Small model size (about 120M parameters) may produce less accurate responses than larger models
+- 512-token context limit
+- Knowledge is limited to the training data and ends at its cutoff date
+- May generate incorrect information
+- Cannot browse internet or access real-time data
+## License
+Apache 2.0 - see LICENSE file.
+## Credit
+If you use CosmicFish-120M, please credit Mistyoz AI.