dill-dev/Momo-336M-sft

	---
	language: en
	tags:
	- causal-lm
	- chat
	- reasoning
	- momo
	license: apache-2.0
	---

	# 🌸 {MOMO_VERSION}

	Momo is a friendly 336M-parameter language model trained from scratch,
	designed to feel like chatting with a warm, knowledgeable friend.

	## Model Details
	- Parameters: ~336M
	- Architecture: Transformer (RoPE + RMSNorm + GQA + SwiGLU)
	- Trained on: WikiText-103 + Alpaca + custom reasoning data
	- Context length: {MAX_SEQ_LEN} tokens
	- Vocabulary: {VOCAB_FINAL} tokens
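
Two of the architecture components listed above, RMSNorm and SwiGLU, can be illustrated with a minimal pure-Python sketch. This is not Momo's actual implementation: the function names, the `eps` default, and the list-of-lists weight layout are assumptions made for illustration, and RoPE and GQA are left out.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-feature gain.

    eps is an assumed stabiliser value, not taken from the Momo config.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy usage with 2-dimensional features and identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
normed = rms_norm([3.0, 4.0], [1.0, 1.0])
print(normed)
print(swiglu_ffn([1.0, 0.0], identity, identity, identity))
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, which is why only a gain vector appears here.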

	## Capabilities
	- 💬 Friendly, casual chat
	- 🧠 Reasoning with `<think>` tags
	- ❓ Question answering
	- 🤗 Emotional support
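
When the model reasons inside `<think>` tags, it can be handy to separate the reasoning from the final reply. The sketch below assumes the output wraps reasoning in a matching `<think>...</think>` pair; the closing tag and the helper name `split_reasoning` are assumptions, not something this card specifies.

```python
import re

# Assumed format: reasoning is wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) extracted from a generated string."""
    # Collect every reasoning span, then strip them from the visible answer.
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print("reasoning:", reasoning)
print("answer:", answer)
```

If the tags are registered as special tokens in the tokenizer, decoding with `skip_special_tokens=True` would drop them, so decode with `skip_special_tokens=False` before splitting in that case.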

	## Quick Start
	```python
	# Load and chat with Momo
	from transformers import AutoTokenizer

	# MomoForCausalLM is not part of transformers; import it from the
	# Momo model code (its location is not given in this card).
	model = MomoForCausalLM.from_pretrained('path/to/Momo-336M')
	tokenizer = AutoTokenizer.from_pretrained('path/to/Momo-336M')

	messages = [{'role': 'user', 'content': 'Hey Momo! How are you?'}]
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(prompt, return_tensors='pt')
	# temperature only takes effect when sampling is enabled
	output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.75)
	print(tokenizer.decode(output[0], skip_special_tokens=True))
	```

	## Training Setup
	- GPU: 2× NVIDIA T4 (Kaggle)
	- Precision: float16 AMP
	- Gradient checkpointing: enabled
	- Training stages: Pretrain → SFT → Reasoning
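
float16 mixed-precision training is usually paired with loss scaling, because small gradients underflow to zero in half precision. Whether Momo's training used a gradient scaler is not stated here, so the following is only a standard-library illustration of the idea; the `to_float16` helper and the example gradient value are hypothetical, and the scale factor is assumed (2^16 matches the default initial scale of PyTorch's `GradScaler`).

```python
import struct

def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]

LOSS_SCALE = 2.0 ** 16  # assumed scale factor

tiny_grad = 1e-8                              # hypothetical gradient value
unscaled = to_float16(tiny_grad)              # too small for float16, becomes 0.0
scaled = to_float16(tiny_grad * LOSS_SCALE)   # representable once scaled up
recovered = scaled / LOSS_SCALE               # unscale in higher precision

print("without scaling:", unscaled)
print("with scaling:", recovered)
```

Multiplying the loss by the scale factor before the backward pass scales every gradient by the same amount, and dividing the gradients afterwards in float32 restores their true magnitude.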