dill-dev committed
Commit 9228d09 · verified · 1 Parent(s): 1ccec9b

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +45 -0
  2. tokenizer.json +0 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language: en
+ tags:
+ - causal-lm
+ - chat
+ - reasoning
+ - momo
+ license: apache-2.0
+ ---
+
+ # 🌸 {MOMO_VERSION}
+
+ Momo is a friendly 336M-parameter language model trained from scratch,
+ designed to feel like chatting with a warm, knowledgeable friend.
+
+ ## Model Details
+ - **Parameters:** ~336M
+ - **Architecture:** Transformer (RoPE + RMSNorm + GQA + SwiGLU)
+ - **Trained on:** WikiText-103 + Alpaca + Custom reasoning data
+ - **Context length:** {MAX_SEQ_LEN} tokens
+ - **Vocabulary:** {VOCAB_FINAL} tokens
+ ## Capabilities
+ - 💬 Friendly, casual chat
+ - 🧠 Reasoning with `<think>` tags
+ - ❓ Question answering
+ - 🤗 Emotional support
+
+ ## Quick Start
+ ```python
+ # Load and chat with Momo
+ from transformers import AutoTokenizer
+ # MomoForCausalLM is the project's custom model class; import it from the Momo code.
+
+ model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
+ tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')
+
+ messages = [{'role': 'user', 'content': 'Hey Momo! How are you?'}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors='pt')
+ output = model.generate(**inputs, max_new_tokens=200, temperature=0.75)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
+
+ ## Training Setup
+ - GPU: 2× NVIDIA T4 (Kaggle)
+ - Precision: float16 AMP
+ - Gradient checkpointing: enabled
+ - Training stages: Pretrain → SFT → Reasoning
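
The README's Capabilities list mentions reasoning with `<think>` tags but does not describe the output format. The following is a minimal sketch of separating the reasoning from the visible reply, assuming the model emits a single `<think>...</think>` span in its generated text. `split_reasoning` is a hypothetical helper name, not part of the Momo code.

```python
import re

# Assumption: reasoning is wrapped in one <think>...</think> span.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tags are present."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched span is treated as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

If `<think>` and `</think>` are registered as special tokens in the tokenizer, decoding with `skip_special_tokens=True` (as in the Quick Start) would likely strip them, so a helper like this would need the output decoded with `skip_special_tokens=False`.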
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
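
Since the tokenizer.json diff is not rendered, one way to check the vocabulary size that the README leaves as `{VOCAB_FINAL}` is to read the file directly. This is a sketch, assuming the file follows the Hugging Face `tokenizers` JSON layout, where `model.vocab` is a token-to-id mapping (BPE, WordPiece) or a list of entries (Unigram) and `added_tokens` is an optional list of objects with an `id` field. The function names are hypothetical.

```python
import json


def vocab_size(tokenizer_config: dict) -> int:
    """Count distinct token ids in a parsed tokenizer.json-style dict."""
    vocab = tokenizer_config.get("model", {}).get("vocab", {})
    if isinstance(vocab, dict):
        ids = set(vocab.values())        # BPE / WordPiece: token -> id
    else:
        ids = set(range(len(vocab)))     # Unigram: list of [token, score]
    # Added tokens may reuse ids already in the base vocab, so merge by id.
    ids.update(t["id"] for t in tokenizer_config.get("added_tokens", []))
    return len(ids)


def vocab_size_from_file(path: str) -> int:
    """Load a tokenizer.json file from disk and count its token ids."""
    with open(path, encoding="utf-8") as f:
        return vocab_size(json.load(f))
```

For example, `vocab_size_from_file('path/to/Momo-336M/tokenizer.json')` would report the count for a local copy of the file.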