---
language: en
tags:
- causal-lm
- chat
- reasoning
- momo
license: apache-2.0
---
# 🌸 {MOMO_VERSION}
Momo is a friendly 336M-parameter language model trained from scratch,
designed to feel like chatting with a warm, knowledgeable friend.
## Model Details
- **Parameters:** ~336M
- **Architecture:** Transformer (RoPE + RMSNorm + GQA + SwiGLU)
- **Trained on:** WikiText-103 + Alpaca + Custom reasoning data
- **Context length:** {MAX_SEQ_LEN} tokens
- **Vocabulary:** {VOCAB_FINAL} tokens
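
To make two of the architecture terms above concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward block. It is not Momo's actual implementation: the function names, the `eps` value, and the plain nested-list weights are assumptions chosen for illustration only.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-feature weight.

    eps is an assumed stabiliser value; the model card does not state one.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy usage with hypothetical 2-dimensional weights.
x = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
identity = [[1.0, 0.0], [0.0, 1.0]]
print(swiglu_ffn(x, identity, identity, [[1.0, 1.0]]))
```

Unlike LayerNorm, RMSNorm skips mean subtraction and rescales only by the root-mean-square. In SwiGLU, one projection gates the other before the down-projection.
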
## Capabilities
- 💬 Friendly, casual chat
- 🧠 Reasoning with `<think>` tags
- ❓ Question answering
- 🤗 Emotional support
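
If you want to show the final answer separately from the reasoning, a small helper like the one below may be useful. It is a sketch under one assumption: that the reasoning is wrapped in a `<think>...</think>` pair with a closing tag. The card only mentions `<think>` tags, so adjust the pattern if the real output differs. `split_reasoning` is a hypothetical helper name, not part of Momo.

```python
import re

# Assumed format: reasoning enclosed in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) extracted from a generated string."""
    # Collect the contents of every <think> block.
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

sample = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Text without any `<think>` block comes back unchanged as the answer, with an empty reasoning string.
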
## Quick Start
```python
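# Load and chat with Momo
from transformers import AutoTokenizer

# MomoForCausalLM is the custom model class from this project's code;
# import it from wherever that class is defined before running this snippet.
model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')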
messages = [{{'role': 'user', 'content': 'Hey Momo! How are you?'}}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')
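# Assuming the standard Hugging Face generate(): temperature only applies when do_sample=True
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.75)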
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Training Setup
- GPU: 2× NVIDIA T4 (Kaggle)
- Precision: float16 automatic mixed precision (AMP)
- Gradient checkpointing: enabled
- Training stages: Pretrain → SFT → Reasoning
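
Training in float16 usually relies on loss scaling, because very small gradients round to zero in half precision. The standard-library sketch below illustrates the effect. The scale factor and the example gradient are assumptions, since the card does not state the scaler settings used for Momo. For simplicity it scales the gradient directly, whereas real AMP scales the loss.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest float16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

LOSS_SCALE = 65536.0  # assumed scale factor (2**16); not stated in the card
tiny_grad = 1e-8      # hypothetical gradient below float16's smallest subnormal

# Stored directly in float16, the gradient underflows to zero.
unscaled_fp16 = to_fp16(tiny_grad)

# Scaled first, it lands in float16's representable range.
scaled_fp16 = to_fp16(tiny_grad * LOSS_SCALE)

# Unscaling in higher precision recovers a close approximation.
recovered = scaled_fp16 / LOSS_SCALE

print(unscaled_fp16, recovered)
```

In a PyTorch training loop this bookkeeping is normally handled by a gradient scaler rather than written by hand.
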