---
datasets:
- HuggingFaceFW/fineweb-edu
license: apache-2.0
---
# RSCaLM-138M-LLaMA
**RSCaLM** (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for **20,000 steps**.
This run was conducted purely for **experimental and benchmarking purposes**, with **no high expectations** for downstream task quality.
---
## 📌 Experiment Summary
* **Architecture:** LLaMA-style causal decoder
  * Rotary positional embeddings (RoPE)
  * Pre-normalization with RMSNorm
  * SwiGLU feed-forward layers
  * Multi-head self-attention with key-value caching support
* **Parameter Count:** ~138M (see the sketch below for a quick way to verify this)
* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
* **Training Framework:** PyTorch + Hugging Face Transformers
* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
* **Scheduler:** Cosine decay with warmup
* **Precision:** Mixed-precision (FP16/BF16)
* **Batching:** Gradient accumulation to simulate a large batch size
* **Dataset:** General text corpus for pipeline validation (not domain-specific)
* **Steps Completed:** 20,000 (~32% of planned total)
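The exact hidden size and layer count are not listed above; as a quick check, the sketch below (using only the public model ID) loads the published config and counts parameters to confirm the ~138M figure.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

# Inspect architecture hyperparameters without downloading the weights
config = AutoConfig.from_pretrained(model_id)
print(config)

# Load the model and count parameters (~138M expected)
model = AutoModelForCausalLM.from_pretrained(model_id)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameter count: {num_params / 1e6:.1f}M")
```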
---
## 📉 Validation Loss Progress
| Step | Val Loss |
| ----- | -------- |
| 1000 | 5.5968 |
| 2000 | 4.8513 |
| 5000 | 4.2105 |
| 10000 | 3.9603 |
| 15000 | 3.8497 |
| 20000 | 3.7891 |
Loss shows steady improvement over the limited training period.
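For a quick visual check, the numbers from the table can be plotted directly; the snippet below is a minimal sketch using matplotlib, with the values copied from above.

```python
import matplotlib.pyplot as plt

# Validation loss values copied from the table above
steps = [1000, 2000, 5000, 10000, 15000, 20000]
val_loss = [5.5968, 4.8513, 4.2105, 3.9603, 3.8497, 3.7891]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("RSCaLM-138M-LLaMA validation loss (first 20k steps)")
plt.show()
```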
---
## ⚠️ Notes
* This is an **early prototype**, not tuned for production use.
* Training stopped after ~32% of planned total steps.
* Possible repetition loops were observed in generation; this is expected for low-step runs.
* Intended for research reference, not for deployment in critical tasks.
---
## 🔧 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
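Since training used mixed precision, the weights can also be loaded in bfloat16 to reduce memory use at inference time. The variant below is a sketch of the same loading call, assuming bf16-capable hardware.

```python
import torch
from transformers import AutoModelForCausalLM

# Optional: load in bfloat16 to cut memory use (requires bf16-capable hardware)
model = AutoModelForCausalLM.from_pretrained(
    "yasserrmd/RSCaLM-138M-LLaMA",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```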
---
## 🔧 Example Usage (with repetition control)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,                   # Limit length of output
    do_sample=True,                       # Enable sampling so temperature/top-p/top-k apply
    temperature=0.7,                      # Lower temperature = more focused
    top_p=0.9,                            # Nucleus sampling
    top_k=50,                             # Top-K filtering
    repetition_penalty=1.2,               # Penalize repeating tokens
    no_repeat_ngram_size=3,               # Prevent repeating trigrams
    eos_token_id=tokenizer.eos_token_id,  # End generation at EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
### 💡 Tips for controlling repetition:
1. **`repetition_penalty`** – Increase slightly above `1.0` (e.g., `1.2–1.5`) to discourage repeated phrases.
2. **`no_repeat_ngram_size`** – Set to `3` or `4` to avoid repeated n-grams.
3. **`top_k` + `top_p`** – Combine both for better control over sampling randomness.
4. **Lower `temperature`** – Keeps outputs focused and less chaotic.
5. **Stop sequences** – Add specific words/phrases to halt generation early if needed (see the sketch below).
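As a rough illustration of tip 5, the sketch below uses transformers' `StoppingCriteria` API to halt generation once a chosen substring appears in the output; the `"\n\n"` stop string is just an example choice, not something the model was trained around.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")


class StopOnSubstring(StoppingCriteria):
    """Stop generation once the newly generated text contains a given substring."""

    def __init__(self, tokenizer, stop_string, prompt_length):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_length = prompt_length

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the generated continuation, then look for the stop string
        generated = self.tokenizer.decode(
            input_ids[0][self.prompt_length:], skip_special_tokens=True
        )
        return self.stop_string in generated


prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Example stop sequence: halt at the first blank line
criteria = StoppingCriteriaList(
    [StopOnSubstring(tokenizer, "\n\n", inputs["input_ids"].shape[1])]
)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    stopping_criteria=criteria,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```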
---
## 📜 License
apache-2.0