---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---
# Roy
## Model Overview
**Roy** is a fine-tuned large language model based on
[`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
The model was trained using **QLoRA** with a resumable streaming pipeline and later **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).
This model is optimized for:
- Instruction following
- Conversational responses
- General reasoning and explanation tasks
---
## Base Model
- **Base:** Mistral-7B-Instruct-v0.2
- **Architecture:** Decoder-only Transformer
- **Parameters:** ~7B
- **Context Length:** 2048 tokens
---
## Training Dataset
The model was trained on a custom tokenized dataset:
- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:**
https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens
### Dataset Processing
- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training
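The cleaning steps above can be sketched as a per-sample filter. This is illustrative only; the field name, the fixed length of 2048, and Mistral's vocabulary size of 32000 are assumptions, not the actual preprocessing script:

```python
# Illustrative sketch of the validation described above (not the real script).
SEQ_LEN = 2048

def is_valid(sample, vocab_size=32000):
    ids = sample.get("input_ids")
    if not isinstance(ids, list) or len(ids) != SEQ_LEN:
        return False  # malformed, or wrongly padded/truncated
    # every token id must be a plain int inside the vocab (no NaN/overflow)
    return all(isinstance(t, int) and 0 <= t < vocab_size for t in ids)

samples = [
    {"input_ids": [1] * SEQ_LEN},                   # well-formed -> kept
    {"input_ids": [1] * (SEQ_LEN - 1)},             # wrong length -> dropped
    {"input_ids": [1] * (SEQ_LEN - 1) + [10**9]},   # overflowed id -> dropped
]
clean = [s for s in samples if is_valid(s)]
print(len(clean))  # -> 1
```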
---
## Training Method
- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:**
`q_proj`, `k_proj`, `v_proj`, `o_proj`,
`gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** Enabled
- **Training style:** Streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)
After training, the LoRA adapter was **merged into the base model weights** to create this final model.
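A toy numeric sketch of what that merge does: the low-rank update `B @ A`, scaled by `alpha / r`, is folded into the frozen weight matrix, so the merged checkpoint gives the same outputs as base-plus-adapter with no adapter needed at inference. The dimensions here are tiny for illustration; the real model applies this with `r=32` to each listed projection:

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 8        # toy sizes; the actual model uses r=32
W = torch.randn(d, d)        # frozen base weight
A = torch.randn(r, d) * 0.1  # LoRA down-projection
B = torch.randn(d, r) * 0.1  # LoRA up-projection

# Merging folds the scaled low-rank update into the base weight.
W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the base + adapter forward pass exactly.
x = torch.randn(d)
adapter_out = W @ x + (alpha / r) * (B @ (A @ x))
assert torch.allclose(W_merged @ x, adapter_out, atol=1e-5)
```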
---
## Inference
This model can be used **directly** without any LoRA adapter.
### Example (Transformers)
Pin compatible versions first (shell; in a notebook, prefix each line with `!`):

```bash
pip uninstall -y transformers peft accelerate torch safetensors numpy
pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16 # use float16 for GPU
# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=DTYPE,
device_map="auto"
)
model.eval()
print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")
# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"\nRoy: {response}\n")
```