---
datasets:
- HuggingFaceFW/fineweb-edu
license: apache-2.0
---
# RSCaLM-138M-LLaMA
**RSCaLM** (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for **20,000 steps**.
This run was conducted purely for **experimental and benchmarking purposes**, with **no high expectations** for downstream task quality.
---
## 📌 Experiment Summary
* **Architecture:** LLaMA-style causal decoder
  * Rotary positional embeddings (RoPE)
  * Pre-normalization with RMSNorm
  * SwiGLU feed-forward layers
  * Multi-head self-attention with key-value caching support
* **Parameter Count:** ~138M (see the sketch below for a quick way to verify this)
* **Context Length:** 2048 tokens
* **Tokenizer:** LLaMA tokenizer
* **Training Framework:** PyTorch + Hugging Face Transformers
* **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
* **Scheduler:** Cosine decay with warmup
* **Precision:** Mixed-precision (FP16/BF16)
* **Batching:** Gradient accumulation to simulate a large batch size
* **Dataset:** General text corpus for pipeline validation (not domain-specific)
* **Steps Completed:** 20,000 (~32% of planned total)
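The exact hidden size and layer count are not listed above; as a quick check, the sketch below (using only the public model ID) loads the published config and counts parameters to confirm the ~138M figure.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

# Inspect architecture hyperparameters without downloading the weights
config = AutoConfig.from_pretrained(model_id)
print(config)

# Load the model and count parameters (~138M expected)
model = AutoModelForCausalLM.from_pretrained(model_id)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameter count: {num_params / 1e6:.1f}M")
```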
---
## 📉 Validation Loss Progress
| Step | Val Loss |
| ----- | -------- |
| 1000 | 5.5968 |
| 2000 | 4.8513 |
| 5000 | 4.2105 |
| 10000 | 3.9603 |
| 15000 | 3.8497 |
| 20000 | 3.7891 |
Loss shows steady improvement over the limited training period.
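For a quick visual check, the numbers from the table can be plotted directly; the snippet below is a minimal sketch using matplotlib, with the values copied from above.

```python
import matplotlib.pyplot as plt

# Validation loss values copied from the table above
steps = [1000, 2000, 5000, 10000, 15000, 20000]
val_loss = [5.5968, 4.8513, 4.2105, 3.9603, 3.8497, 3.7891]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Training step")
plt.ylabel("Validation loss")
plt.title("RSCaLM-138M-LLaMA validation loss (first 20k steps)")
plt.show()
```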
---
## ⚠️ Notes
* This is an **early prototype**, not tuned for production use.
* Training stopped after ~32% of planned total steps.
* Possible repetition loops were observed in generation; this is expected for low-step runs.
* Intended for research reference, not for deployment in critical tasks.
---
## 🔧 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
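Since training used mixed precision, the weights can also be loaded in bfloat16 to reduce memory use at inference time. The variant below is a sketch of the same loading call, assuming bf16-capable hardware.

```python
import torch
from transformers import AutoModelForCausalLM

# Optional: load in bfloat16 to cut memory use (requires bf16-capable hardware)
model = AutoModelForCausalLM.from_pretrained(
    "yasserrmd/RSCaLM-138M-LLaMA",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```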
---
## 🔧 Example Usage (with repetition control)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,                   # Limit length of output
    do_sample=True,                       # Enable sampling so temperature/top-p/top-k apply
    temperature=0.7,                      # Lower temperature = more focused
    top_p=0.9,                            # Nucleus sampling
    top_k=50,                             # Top-K filtering
    repetition_penalty=1.2,               # Penalize repeating tokens
    no_repeat_ngram_size=3,               # Prevent repeating trigrams
    eos_token_id=tokenizer.eos_token_id,  # End generation at EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
### 💡 Tips for controlling repetition:
1. **`repetition_penalty`** – Increase slightly above `1.0` (e.g., `1.2–1.5`) to discourage repeated phrases.
2. **`no_repeat_ngram_size`** – Set to `3` or `4` to avoid repeated n-grams.
3. **`top_k` + `top_p`** – Combine both for better control over sampling randomness.
4. **Lower `temperature`** – Keeps outputs focused and less chaotic.
5. **Stop sequences** – Add specific words/phrases to halt generation early if needed (see the sketch below).
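As a rough illustration of tip 5, the sketch below uses transformers' `StoppingCriteria` API to halt generation once a chosen substring appears in the output; the `"\n\n"` stop string is just an example choice, not something the model was trained around.

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")


class StopOnSubstring(StoppingCriteria):
    """Stop generation once the newly generated text contains a given substring."""

    def __init__(self, tokenizer, stop_string, prompt_length):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_length = prompt_length

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the generated continuation, then look for the stop string
        generated = self.tokenizer.decode(
            input_ids[0][self.prompt_length:], skip_special_tokens=True
        )
        return self.stop_string in generated


prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Example stop sequence: halt at the first blank line
criteria = StoppingCriteriaList(
    [StopOnSubstring(tokenizer, "\n\n", inputs["input_ids"].shape[1])]
)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    stopping_criteria=criteria,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```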
---
## 📜 License
apache-2.0