Commit c4e758f by akkiisfrommars (verified) · Parent(s): caef92e

Update README.md

Files changed (1): README.md (+161 −3)
README.md CHANGED
@@ -1,3 +1,161 @@
- ---
- license: apache-2.0
- ---

---
license: apache-2.0
tags:
- text-generation
- language-model
- LLM
- CosmicFish
- 120M
- transformer
language: en
datasets: CosmicSet-1.0
model_type: CosmicFish
---

# CosmicFish-120M

A 120M-parameter causal language model with modern architectural improvements.

## Model Details

- **Parameters**: 121M
- **Architecture**: CosmicFish (RoPE, GQA, SwiGLU, RMSNorm)
- **Context Length**: 512 tokens
- **Vocabulary**: 50,257 tokens (GPT-2 vocabulary)
- **Training Data**: CosmicSet 1.0
- **Developer**: Mistyoz AI

## Usage

### Installation

```bash
pip install torch transformers
```

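The loading example below imports `CosmicFish` and `CosmicConfig` from `modeling_cosmicfish.py` and reads `config.json` and `pytorch_model.bin` from the working directory, so those files need to be fetched from the model repository first. Here is a minimal sketch using `huggingface_hub` (requires `pip install huggingface_hub`); the repo id shown is an assumption, substitute the actual repository path:

```python
from huggingface_hub import hf_hub_download

# Assumed repo id -- replace with the actual repository path.
REPO_ID = "akkiisfrommars/CosmicFish-120M"

# Download the model definition, config, and weights into the current directory
# so the loading example below can open them by filename.
for filename in ["modeling_cosmicfish.py", "config.json", "pytorch_model.bin"]:
    hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=".")
```
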
### Loading the Model

```python
import torch
import json
from transformers import GPT2Tokenizer
from modeling_cosmicfish import CosmicFish, CosmicConfig

# Load the model configuration
with open("config.json") as f:
    config_dict = json.load(f)

# Keep only the architecture fields CosmicConfig expects
config = CosmicConfig(**{k: v for k, v in config_dict.items() if k in [
    'vocab_size', 'block_size', 'n_layer', 'n_head', 'n_embd', 'bias',
    'use_rotary', 'use_swiglu', 'use_gqa', 'n_query_groups'
]})
config.dropout = 0.0  # Inference mode

# Build the model and load the pretrained weights
model = CosmicFish(config)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

# Load the tokenizer (CosmicFish uses the GPT-2 vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
```
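A quick sanity check after loading, confirming the figures listed in Model Details. This assumes `CosmicConfig` exposes its fields as attributes, which the snippet above already relies on when it sets `config.dropout`:

```python
# Total parameters -- should come out around 121M
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.1f}M")

# Context length and vocabulary size as recorded in the config
print(f"Context length: {config.block_size} tokens")
print(f"Vocabulary size: {config.vocab_size} tokens")
```
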
### Basic Generation

```python
def generate_text(prompt, max_tokens=100):
    # Encode the prompt into token ids
    inputs = tokenizer.encode(prompt, return_tensors="pt")

    # Sample a continuation
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            top_k=40,
            do_sample=True
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
text = generate_text("The future of AI is")
print(text)
```

### Chat Interface

```python
def chat_with_model():
    conversation = []

    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit']:
            break

        # Build the prompt from the conversation history
        context = "Below is a conversation between a human and an AI assistant.\n\n"
        for human, ai in conversation:
            context += f"Human: {human}\nAssistant: {ai}\n\n"
        context += f"Human: {user_input}\nAssistant:"

        # Generate a response, keeping the prompt within the 512-token context window
        inputs = tokenizer.encode(context, return_tensors="pt")
        if inputs.shape[1] > 400:
            inputs = inputs[:, -400:]

        with torch.no_grad():
            outputs = model.generate(
                inputs,
                max_new_tokens=150,
                temperature=0.7,
                top_k=40,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        # Decode only the newly generated tokens and keep the first line as the reply
        response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
        response = response.split('\n')[0].strip()

        print(f"CosmicFish: {response}")
        conversation.append((user_input, response))

chat_with_model()
```

## Architecture

CosmicFish uses several modern improvements over standard transformers (a minimal sketch of two of these blocks follows the list):

- **RoPE (Rotary Position Embeddings)**: encodes relative positions by rotating query/key vectors instead of adding learned absolute position embeddings
- **GQA (Grouped-Query Attention)**: shares key/value heads across 4 query groups, reducing attention memory and KV-cache size
- **SwiGLU**: gated feed-forward activation that typically trains better than ReLU/GELU
- **RMSNorm**: normalizes by the root-mean-square of the activations only (no mean-centering or bias), simpler and more stable than LayerNorm

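For intuition, here is a minimal PyTorch sketch of the RMSNorm and SwiGLU building blocks. It is illustrative only, not the actual CosmicFish implementation; the dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale activations by their root-mean-square (no mean-centering, no bias)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

class SwiGLU(nn.Module):
    """Gated feed-forward block: SiLU(x W_gate) * (x W_up), projected back down."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Toy example: a batch of 4 token embeddings of width 768 (placeholder sizes)
x = torch.randn(1, 4, 768)
y = SwiGLU(768, 2048)(RMSNorm(768)(x))
print(y.shape)  # torch.Size([1, 4, 768])
```
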
## Training

- **Dataset**: CosmicSet 1.0
- **Sequence Length**: 512 tokens
- **Training Steps**: ~300K iterations
- **Hardware**: Nvidia A40 x1

## Performance

- **Speed**: Varies by hardware (not benchmarked)
- **Memory**: ~500 MB RAM (FP16); see the sketch after this list
- **File Size**: 243 MB

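The ~500 MB figure is consistent with 121M parameters at 2 bytes each (about 242 MB of weights, matching the 243 MB file size) plus runtime buffers and activations. Here is a minimal sketch of casting to half precision and estimating the weight footprint, assuming the model tolerates standard PyTorch FP16 casting:

```python
# Cast weights to FP16 to roughly halve memory relative to FP32
# (assumes CosmicFish works with standard PyTorch half-precision casting).
model_fp16 = model.half()

# Back-of-envelope estimate of the weight memory footprint
bytes_per_param = 2  # FP16
weight_mb = sum(p.numel() for p in model_fp16.parameters()) * bytes_per_param / 1e6
print(f"Weights alone: ~{weight_mb:.0f} MB")  # ~242 MB; the ~500 MB figure adds runtime overhead
```
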
## Limitations

- Small model size (120M parameters) limits response quality and factual accuracy
- 512-token context limit
- Knowledge is limited to the training data cutoff
- May generate incorrect information
- Cannot browse the internet or access real-time data

## License

Apache 2.0 - see LICENSE file.

## Credit

If you use CosmicFish-120M, please credit Mistyoz AI.