metadata
language: en
tags:
  - causal-lm
  - chat
  - reasoning
  - momo
license: apache-2.0

🌸 {MOMO_VERSION}

Momo is a 336M-parameter language model trained from scratch and designed to feel like chatting with a warm, knowledgeable friend.

Model Details

  • Parameters: ~336M
  • Architecture: Transformer with RoPE (rotary position embeddings), RMSNorm, GQA (grouped-query attention), and SwiGLU feed-forward layers
  • Trained on: WikiText-103 + Alpaca + custom reasoning data
  • Context length: {MAX_SEQ_LEN} tokens
  • Vocabulary: {VOCAB_FINAL} tokens
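The list above names the architectural pieces but not how they fit together. Below is a minimal PyTorch sketch of one decoder block that combines them. It is not Momo's code: the class names, the pre-norm residual layout, the interleaved RoPE variant with base 10000, repeating K/V heads with `repeat_interleave`, and all sizes are illustrative assumptions. It needs PyTorch 2.0 or newer for `F.scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Normalize by the root-mean-square of the features (no mean subtraction, no bias)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps) * self.weight


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs (2i, 2i+1) by angle pos * base^(-2i/d).

    x has shape (batch, heads, seq, head_dim); head_dim must be even.
    """
    _, _, seq_len, head_dim = x.shape
    inv_freq = base ** (-torch.arange(0, head_dim, 2, device=x.device).float() / head_dim)
    angles = torch.outer(torch.arange(seq_len, device=x.device).float(), inv_freq)
    cos, sin = angles.cos().to(x.dtype), angles.sin().to(x.dtype)  # (seq, head_dim / 2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)  # re-interleave the rotated pairs


class GQAttention(nn.Module):
    """Causal attention where groups of query heads share one key/value head."""

    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv_proj = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).view(b, s, 2, self.n_kv_heads, self.head_dim).unbind(2)
        k, v = k.transpose(1, 2), v.transpose(1, 2)
        q, k = apply_rope(q), apply_rope(k)
        # Query head i reads KV head i // group_size.
        group_size = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, s, -1))


class SwiGLU(nn.Module):
    """Feed-forward layer: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class DecoderBlock(nn.Module):
    """Pre-norm block: x + attn(norm(x)), then x + ffn(norm(x)). Sizes are illustrative."""

    def __init__(self, dim: int = 256, n_heads: int = 8, n_kv_heads: int = 2, hidden: int = 688):
        super().__init__()
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
        self.attn = GQAttention(dim, n_heads, n_kv_heads)
        self.ffn = SwiGLU(dim, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))
        return x + self.ffn(self.ffn_norm(x))


block = DecoderBlock()
print(block(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```

With these illustrative sizes, 8 query heads share 2 key/value heads, so only a quarter as many K/V projections (and cache entries at inference time) are needed as with standard multi-head attention.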

Capabilities

  • 💬 Friendly, casual chat
  • 🧠 Reasoning with <think> tags
  • ❓ Question answering
  • 🤗 Emotional support
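The card says Momo reasons inside `<think>` tags but does not show the exact output layout. Assuming the reasoning is wrapped as `<think>...</think>` ahead of the final answer (the closing tag is an assumption), a small helper like the hypothetical `split_think` below separates the two so an app can show or hide the reasoning.

```python
import re

# Assumed format: an optional "<think> ... </think>" block plus the answer text.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' when no <think> tag is present."""
    match = _THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        # Everything outside the think block is treated as the user-facing answer.
        answer = (text[: match.start()] + text[match.end():]).strip()
        return reasoning, answer
    if "<think>" in text:
        # Generation stopped (e.g. max_new_tokens reached) before </think>:
        # treat everything after the opening tag as unfinished reasoning.
        before, _, after = text.partition("<think>")
        return after.strip(), before.strip()
    return "", text.strip()


reasoning, answer = split_think("<think>2 apples + 3 apples = 5 apples</think>You have 5 apples!")
print(reasoning)  # 2 apples + 3 apples = 5 apples
print(answer)     # You have 5 apples!
```

If `<think>` is registered as a special token in the tokenizer, decoding with `skip_special_tokens=True` may strip the tags, so decode with `skip_special_tokens=False` before splitting.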

Quick Start

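# Load and chat with Momo
from transformers import AutoTokenizer
from momo_model import MomoForCausalLM  # placeholder: import from wherever Momo's model class is defined

model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')

messages = [{'role': 'user', 'content': 'Hey Momo! How are you?'}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')
# temperature only takes effect when sampling is enabled
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.75)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs['input_ids'].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))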

Training Setup

  • GPU: 2× NVIDIA T4 (Kaggle)
  • Precision: float16 automatic mixed precision (AMP)
  • Gradient checkpointing: enabled
  • Training stages: Pretraining → SFT (supervised fine-tuning) → Reasoning