---
datasets:
- HuggingFaceFW/fineweb-edu
license: apache-2.0
---

# RSCaLM-138M-LLaMA

**RSCaLM** (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for **20,000 steps**. This run was conducted purely for **experimental and benchmarking purposes**, with **no high expectations** for downstream task quality.

---

## 📌 Experiment Summary

* **Architecture:** LLaMA-style causal decoder
  * Rotary positional embeddings (RoPE)
  * Pre-normalization with RMSNorm
  * SwiGLU feed-forward layers
  * Multi-head self-attention with key-value caching support
* **Parameter Count:** ~138M
* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
* **Training Framework:** PyTorch + Hugging Face Transformers
* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
* **Scheduler:** Cosine decay with warmup
* **Precision:** Mixed precision (FP16/BF16)
* **Batching:** Gradient accumulation to simulate a large effective batch size
* **Dataset:** General text corpus (FineWeb-Edu) used for pipeline validation, not domain-specific
* **Steps Completed:** 20,000 (~32% of the planned total)

An illustrative training-configuration sketch is included at the end of this card.

---

## 📉 Validation Loss Progress

| Step  | Val Loss |
| ----- | -------- |
| 1000  | 5.5968   |
| 2000  | 4.8513   |
| 5000  | 4.2105   |
| 10000 | 3.9603   |
| 15000 | 3.8497   |
| 20000 | 3.7891   |

Validation loss improved steadily over the limited training period.

---

## ⚠️ Notes

* This is an **early prototype**, not tuned for production use.
* Training stopped after ~32% of the planned total steps.
* Repetition loops may appear in generation; this is expected for low-step runs.
* Intended as a research reference, not for deployment in critical tasks.

---

## 🔧 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to have any effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 🔧 Example Usage (with repetition control)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,          # Limit length of output
    do_sample=True,              # Enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,             # Lower temperature = more focused
    top_p=0.9,                   # Nucleus sampling
    top_k=50,                    # Top-K filtering
    repetition_penalty=1.2,      # Penalize repeating tokens
    no_repeat_ngram_size=3,      # Prevent repeating trigrams
    eos_token_id=tokenizer.eos_token_id,  # End generation at EOS
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

### 💡 Tips for controlling repetition

1. **`repetition_penalty`** – Increase slightly above `1.0` (e.g., `1.2–1.5`) to discourage repeated phrases.
2. **`no_repeat_ngram_size`** – Set to `3` or `4` to avoid repeated n-grams.
3. **`top_k` + `top_p`** – Combine both for finer control over sampling randomness.
4. **Lower `temperature`** – Keeps outputs focused and less chaotic.
5. **Stop sequences** – Add specific words/phrases to halt generation early if needed (see the stop-sequence sketch at the end of this card).

---

## 📜 License

apache-2.0
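
---

## 🧪 Illustrative Training Configuration

The snippet below is a minimal sketch of the setup described in the Experiment Summary (AdamW with β1=0.9, β2=0.95, weight decay 0.1, cosine decay with warmup, mixed precision, gradient accumulation). The learning rate, warmup length, and batch sizes were not published for this run, so every value marked "placeholder" is an assumption, not the actual configuration; the released architecture settings can be inspected with `AutoConfig`.

```python
from transformers import AutoConfig, TrainingArguments

# Inspect the released architecture settings (RoPE, RMSNorm, SwiGLU sizes, 2048-token context, ...)
config = AutoConfig.from_pretrained("yasserrmd/RSCaLM-138M-LLaMA")
print(config)

# Hypothetical TrainingArguments mirroring the settings listed in the Experiment Summary.
# Values marked "placeholder" are guesses; they were not published for this run.
args = TrainingArguments(
    output_dir="rscalm-138m-llama",
    max_steps=20_000,                 # steps actually completed (~32% of the planned total)
    per_device_train_batch_size=8,    # placeholder
    gradient_accumulation_steps=16,   # simulates a larger effective batch size
    learning_rate=3e-4,               # placeholder
    adam_beta1=0.9,
    adam_beta2=0.95,
    weight_decay=0.1,
    lr_scheduler_type="cosine",       # cosine decay with warmup
    warmup_steps=1_000,               # placeholder
    bf16=True,                        # mixed precision (use fp16=True on older GPUs)
    logging_steps=100,
)
```

These arguments would be passed to a standard `Trainer` (or an equivalent training loop) together with the model and a tokenized text dataset.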
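
---

## 🛑 Stop Sequences (sketch)

Tip 5 above mentions stop sequences. The snippet below shows one way to implement them with a custom `StoppingCriteria`; the `StopOnSubstring` class and the `"\n\n"` stop string are illustrative choices, not part of this model's release. Newer `transformers` versions may also accept a `stop_strings` argument to `generate`.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

class StopOnSubstring(StoppingCriteria):
    """Stop generation once a given substring appears in the decoded output."""

    def __init__(self, tokenizer, stop_string):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return self.stop_string in text

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
    # Halt as soon as a blank line appears in the generated text (illustrative stop string)
    stopping_criteria=StoppingCriteriaList([StopOnSubstring(tokenizer, "\n\n")]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```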