---
datasets:
- HuggingFaceFW/fineweb-edu
license: apache-2.0
---

# RSCaLM-138M-LLaMA

**RSCaLM** (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for **20,000 steps**. This run was conducted purely for **experimental and benchmarking purposes**, with **no high expectations** for downstream task quality.

---

## 📌 Experiment Summary

* **Architecture:** LLaMA-style causal decoder
  * Rotary positional embeddings (RoPE)
  * Pre-normalization with RMSNorm
  * SwiGLU feed-forward layers
  * Multi-head self-attention with key-value caching support
* **Parameter Count:** ~138M
* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
* **Training Framework:** PyTorch + Hugging Face Transformers
* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
* **Scheduler:** Cosine decay with warmup
* **Precision:** Mixed precision (FP16/BF16)
* **Batching:** Gradient accumulation to simulate a large effective batch size
* **Dataset:** General text corpus (FineWeb-Edu) used for pipeline validation, not domain-specific
* **Steps Completed:** 20,000 (~32% of the planned total)

An illustrative training-configuration sketch is included at the end of this card.

---

## 📉 Validation Loss Progress

| Step  | Val Loss |
| ----- | -------- |
| 1000  | 5.5968   |
| 2000  | 4.8513   |
| 5000  | 4.2105   |
| 10000 | 3.9603   |
| 15000 | 3.8497   |
| 20000 | 3.7891   |

Validation loss improved steadily over the limited training period.

---

## ⚠️ Notes

* This is an **early prototype**, not tuned for production use.
* Training stopped after ~32% of the planned total steps.
* Repetition loops may appear in generation; this is expected for low-step runs.
* Intended as a research reference, not for deployment in critical tasks.

---

## 🔧 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to have any effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 🔧 Example Usage (with repetition control)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,          # Limit length of output
    do_sample=True,              # Enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,             # Lower temperature = more focused
    top_p=0.9,                   # Nucleus sampling
    top_k=50,                    # Top-K filtering
    repetition_penalty=1.2,      # Penalize repeating tokens
    no_repeat_ngram_size=3,      # Prevent repeating trigrams
    eos_token_id=tokenizer.eos_token_id,  # End generation at EOS
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

### 💡 Tips for controlling repetition

1. **`repetition_penalty`** – Increase slightly above `1.0` (e.g., `1.2–1.5`) to discourage repeated phrases.
2. **`no_repeat_ngram_size`** – Set to `3` or `4` to avoid repeated n-grams.
3. **`top_k` + `top_p`** – Combine both for finer control over sampling randomness.
4. **Lower `temperature`** – Keeps outputs focused and less chaotic.
5. **Stop sequences** – Add specific words/phrases to halt generation early if needed (see the stop-sequence sketch at the end of this card).

---

## 📜 License

apache-2.0
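
---

## 🧪 Illustrative Training Configuration

The snippet below is a minimal sketch of the setup described in the Experiment Summary (AdamW with β1=0.9, β2=0.95, weight decay 0.1, cosine decay with warmup, mixed precision, gradient accumulation). The learning rate, warmup length, and batch sizes were not published for this run, so every value marked "placeholder" is an assumption, not the actual configuration; the released architecture settings can be inspected with `AutoConfig`.

```python
from transformers import AutoConfig, TrainingArguments

# Inspect the released architecture settings (RoPE, RMSNorm, SwiGLU sizes, 2048-token context, ...)
config = AutoConfig.from_pretrained("yasserrmd/RSCaLM-138M-LLaMA")
print(config)

# Hypothetical TrainingArguments mirroring the settings listed in the Experiment Summary.
# Values marked "placeholder" are guesses; they were not published for this run.
args = TrainingArguments(
    output_dir="rscalm-138m-llama",
    max_steps=20_000,                 # steps actually completed (~32% of the planned total)
    per_device_train_batch_size=8,    # placeholder
    gradient_accumulation_steps=16,   # simulates a larger effective batch size
    learning_rate=3e-4,               # placeholder
    adam_beta1=0.9,
    adam_beta2=0.95,
    weight_decay=0.1,
    lr_scheduler_type="cosine",       # cosine decay with warmup
    warmup_steps=1_000,               # placeholder
    bf16=True,                        # mixed precision (use fp16=True on older GPUs)
    logging_steps=100,
)
```

These arguments would be passed to a standard `Trainer` (or an equivalent training loop) together with the model and a tokenized text dataset.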
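
---

## 🛑 Stop Sequences (sketch)

Tip 5 above mentions stop sequences. The snippet below shows one way to implement them with a custom `StoppingCriteria`; the `StopOnSubstring` class and the `"\n\n"` stop string are illustrative choices, not part of this model's release. Newer `transformers` versions may also accept a `stop_strings` argument to `generate`.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

class StopOnSubstring(StoppingCriteria):
    """Stop generation once a given substring appears in the decoded output."""

    def __init__(self, tokenizer, stop_string):
        self.tokenizer = tokenizer
        self.stop_string = stop_string

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return self.stop_string in text

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
    # Halt as soon as a blank line appears in the generated text (illustrative stop string)
    stopping_criteria=StoppingCriteriaList([StopOnSubstring(tokenizer, "\n\n")]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```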