Upload folder using huggingface_hub

Files changed (2)

README.md ADDED

+---
+language: en
+tags:
+  - causal-lm
+  - chat
+  - reasoning
+  - momo
+license: apache-2.0
+---
+# 🌸 {MOMO_VERSION}
+Momo is a friendly 336M-parameter language model trained from scratch,
+designed to feel like chatting with a warm, knowledgeable friend.
+## Model Details
+- **Parameters:** ~336M
+- **Architecture:** Decoder-only Transformer with RoPE, RMSNorm, grouped-query attention (GQA), and SwiGLU
+- **Trained on:** WikiText-103, Alpaca, and custom reasoning data
+- **Context length:** {MAX_SEQ_LEN} tokens
+- **Vocabulary:** {VOCAB_FINAL} tokens
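The Architecture line names four standard Transformer components. Purely as an illustration, here is a framework-free Python sketch of each one, operating on plain lists. This card does not state Momo's actual implementation, hidden sizes, `eps`, RoPE base, or head counts, so every value and helper name below is an assumption. The RoPE function uses the interleaved-pair layout; some implementations rotate split halves of the vector instead.

```python
import math

# Hypothetical, framework-free sketch of the components named under "Architecture".
# eps, base, and all sizes are illustrative assumptions, not Momo's real settings.

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x (no mean subtraction), then scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def matvec(w, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in w]

def silu(v):
    return v / (1.0 + math.exp(-v))

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: W_down(silu(W_gate x) * (W_up x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)

def rope(x, pos, base=10000.0):
    """RoPE (interleaved pairs): rotate (x[i], x[i+1]) by pos * base**(-i/d) for even i."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def gqa_kv_head(q_head, n_q_heads, n_kv_heads):
    """GQA: each group of n_q_heads // n_kv_heads query heads shares one K/V head."""
    return q_head // (n_q_heads // n_kv_heads)
```

For example, with 8 query heads and 2 K/V heads, `gqa_kv_head(5, 8, 2)` maps query head 5 to K/V head 1. Sharing K/V heads shrinks the KV cache, and RoPE's rotations preserve vector norms while encoding position.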
+## Capabilities
+- 💬 Friendly, casual chat
+- 🧠 Reasoning with `<think>` tags
+- ❓ Question answering
+- 🤗 Emotional support
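The Capabilities list mentions reasoning in `<think>` tags. This card does not document the exact output format. Assuming the model writes its reasoning as `<think>...</think>` somewhere in the reply, a small helper like the hypothetical `split_reasoning` below can separate the reasoning from the answer. If `<think>` and `</think>` are registered as special tokens, decoding with `skip_special_tokens=True` would strip them, so you may need `skip_special_tokens=False` before parsing.

```python
import re

# Assumed format: reasoning wrapped in a single <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no <think> block is present."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is treated as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

For instance, `split_reasoning("<think>2+2=4</think> The answer is 4.")` gives `("2+2=4", "The answer is 4.")`.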
+## Quick Start
+```python
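+# Load and chat with Momo
+from transformers import AutoTokenizer
+# MomoForCausalLM is Momo's own model class, not part of transformers;
+# import it from the code that defines the model before running this.
+model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
+tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')
+messages = [{'role': 'user', 'content': 'Hey Momo! How are you?'}]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors='pt')
+# temperature only takes effect when sampling is enabled (do_sample=True)
+output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.75)
+# Decode only the newly generated tokens, not the echoed prompt
+new_tokens = output[0][inputs['input_ids'].shape[1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))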
+```
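For multi-turn chat, the same `messages` list can grow with each turn. The sketch below is minimal and assumes you want to keep the rendered prompt within the model's context length. `add_turn` and `trim_history` are hypothetical helpers, and `count_tokens` is any function you supply that measures a message list in tokens.

```python
def add_turn(messages, role, content):
    """Append one message in the {'role': ..., 'content': ...} chat format."""
    messages.append({'role': role, 'content': content})
    return messages

def trim_history(messages, count_tokens, max_tokens):
    """Drop the oldest messages until count_tokens(history) <= max_tokens.

    count_tokens is caller-supplied (hypothetical here). The newest message is
    always kept, even if it alone exceeds the budget. The input list is not modified.
    """
    trimmed = list(messages)
    while len(trimmed) > 1 and count_tokens(trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed
```

With the Quick Start objects, one possible `count_tokens` renders the messages with `tokenizer.apply_chat_template(msgs, add_generation_prompt=True)` and counts the resulting token ids. Some chat templates expect the history to begin with a user turn. In that case, drop messages in user/assistant pairs instead.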
+## Training Setup
+- GPU: 2× NVIDIA T4 (Kaggle)
+- Precision: float16 automatic mixed precision (AMP)
+- Gradient checkpointing: enabled
+- Training stages: Pretraining → supervised fine-tuning (SFT) → Reasoning
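The setup lists float16 AMP. Gradients in float16 can overflow or underflow, so PyTorch's mixed-precision recipe usually pairs `autocast` with `GradScaler`, which applies dynamic loss scaling. This card does not say whether Momo's training used `GradScaler`, so treat that as an assumption. The framework-free sketch below shows only the scale-update rule. The class name is hypothetical, and its defaults mirror `GradScaler`'s documented defaults.

```python
import math

class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling as used in fp16 AMP training.

    The loss is multiplied by `scale` before backward; gradients are divided by it
    before the optimizer step. Defaults follow torch GradScaler's documented ones.
    """

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, scaled_grads):
        """Return unscaled gradients, or None if this step must be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow (inf/nan): skip the optimizer step and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        unscaled = [g / self.scale for g in scaled_grads]
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # A long run without overflow: try a larger scale.
            self.scale *= self.growth_factor
        return unscaled
```

The scale grows only after a run of overflow-free steps and shrinks immediately on overflow. This keeps small fp16 gradients representable without letting large ones become `inf`.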

tokenizer.json ADDED

The diff for this file is too large to render.