---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---

# Roy

## Model Overview

**Roy** is a fine-tuned large language model based on  
[`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

The model was trained using **QLoRA** with a resumable streaming pipeline and later **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).

This model is optimized for:
- Instruction following
- Conversational responses
- General reasoning and explanation tasks

---

## Base Model

- **Base:** Mistral-7B-Instruct-v0.2  
- **Architecture:** Decoder-only Transformer  
- **Parameters:** ~7B  
- **Context Length:** 2048 tokens  

---

## Training Dataset

The model was trained on a custom tokenized dataset:

- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:**  
  https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens

### Dataset Processing
- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training
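
As an illustration, the validation steps above could be sketched like this (a hypothetical helper, not the actual preprocessing script; `SEQ_LEN` matches the card, while the vocabulary size is an assumption based on the Mistral tokenizer):

```python
# Sketch of sample validation for a pre-tokenized dataset:
# fixed length, integer token ids, and no out-of-range values.
SEQ_LEN = 2048
VOCAB_SIZE = 32000  # assumed Mistral tokenizer vocabulary size

def is_valid_sample(input_ids, seq_len=SEQ_LEN, vocab_size=VOCAB_SIZE):
    """Return True if a pre-tokenized sample is well-formed."""
    if len(input_ids) != seq_len:                        # fixed padding/truncation
        return False
    if any(not isinstance(t, int) for t in input_ids):   # malformed values
        return False
    if any(t < 0 or t >= vocab_size for t in input_ids): # overflow check
        return False
    return True

good = [1] * SEQ_LEN
bad = [1] * (SEQ_LEN - 1) + [VOCAB_SIZE + 5]
print(is_valid_sample(good), is_valid_sample(bad))  # True False
```

Samples failing any check would simply be dropped before training.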

---

## Training Method

- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:**  
  `q_proj`, `k_proj`, `v_proj`, `o_proj`,  
  `gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** Enabled
- **Training style:** Streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)

After training, the LoRA adapter was **merged into the base model weights** to create this final model.
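
For illustration, the configuration above could be expressed with the standard `peft` and `bitsandbytes` APIs. This is a hedged sketch built from the hyperparameters listed on this card, not the actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter configuration matching the card
lora_config = LoraConfig(
    r=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# After training, the adapter is folded into the base weights, e.g.:
#   merged = peft_model.merge_and_unload()
#   merged.save_pretrained("Roy")
```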

---

## Inference

This model can be used **directly** without any LoRA adapter.

### Example (Transformers)

Install the pinned dependencies first (run in a shell, or prefix each line with `!` in a notebook):

```bash
pip uninstall -y transformers peft accelerate torch safetensors numpy
pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 \
    peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16   # use float16 for GPU

# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto"
)
model.eval()

print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")

# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()

    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"

    inputs = tokenizer(
        prompt,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(f"\nRoy: {response}\n")
```