---
language: en
tags:
- causal-lm
- chat
- reasoning
- momo
license: apache-2.0
---

# 🌸 {MOMO_VERSION}

Momo is a friendly 336M-parameter language model trained from scratch and
designed to feel like chatting with a warm, knowledgeable friend.

## Model Details
- **Parameters:** ~336M
- **Architecture:** Transformer (RoPE + RMSNorm + GQA + SwiGLU)
- **Trained on:** WikiText-103 + Alpaca + custom reasoning data
- **Context length:** {MAX_SEQ_LEN} tokens
- **Vocabulary:** {VOCAB_FINAL} tokens

## Capabilities
- 💬 Friendly, casual chat
- 🧠 Reasoning with `<think>` tags
- ❓ Question answering
- 🤗 Emotional support

## Quick Start
```python
from transformers import AutoTokenizer
# MomoForCausalLM is the custom model class from the Momo code; import it from wherever it is defined.

# Load and chat with Momo
model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')

messages = [{{'role': 'user', 'content': 'Hey Momo! How are you?'}}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')
# do_sample=True lets temperature take effect (assuming the standard Hugging Face generate API)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.75)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
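
Because Momo is listed as reasoning with `<think>` tags, it can be handy to separate the reasoning from the final reply after decoding. The snippet below is a minimal sketch, not part of the Momo code: the helper name `split_thinking` is hypothetical, and it assumes the reasoning is closed with a matching `</think>` tag. If `<think>` is registered as a special token in the tokenizer, decoding with `skip_special_tokens=True` may strip the tags, in which case decode with `skip_special_tokens=False` before splitting.

```python
import re

# Hypothetical helper (an assumption, not part of the Momo code): split generated
# text into the reasoning inside <think>...</think> and the remaining answer.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    match = THINK_BLOCK.search(text)
    if match is None:
        # No complete <think> block: treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove only the first block so any surrounding text is preserved.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
print(reasoning)  # 2 + 2 is 4.
print(answer)     # The answer is 4.
```

Text without a complete `<think>` block is returned unchanged as the answer, so the helper is safe to call on plain chat replies too.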

## Training Setup
- GPU: 2× NVIDIA T4 (Kaggle)
- Precision: float16 AMP
- Gradient checkpointing: enabled
- Training stages: Pretrain → SFT → Reasoning