---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---

# Roy

## Model Overview

**Roy** is a fine-tuned large language model based on  
[`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

The model was trained using **QLoRA** with a resumable streaming pipeline and later **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).

This model is optimized for:
- Instruction following
- Conversational responses
- General reasoning and explanation tasks

---

## Base Model

- **Base:** Mistral-7B-Instruct-v0.2  
- **Architecture:** Decoder-only Transformer  
- **Parameters:** ~7B  
- **Context Length:** 2048 tokens  

---

## Training Dataset

The model was trained on a custom tokenized dataset:

- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:**  
  https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens

### Dataset Processing
- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training
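
As an illustration, the validation steps above could be sketched like this (a hypothetical helper, not the actual preprocessing script; `SEQ_LEN` matches the card, while the vocabulary size is an assumption based on the Mistral tokenizer):

```python
# Sketch of sample validation for a pre-tokenized dataset:
# fixed length, integer token ids, and no out-of-range values.
SEQ_LEN = 2048
VOCAB_SIZE = 32000  # assumed Mistral tokenizer vocabulary size

def is_valid_sample(input_ids, seq_len=SEQ_LEN, vocab_size=VOCAB_SIZE):
    """Return True if a pre-tokenized sample is well-formed."""
    if len(input_ids) != seq_len:                        # fixed padding/truncation
        return False
    if any(not isinstance(t, int) for t in input_ids):   # malformed values
        return False
    if any(t < 0 or t >= vocab_size for t in input_ids): # overflow check
        return False
    return True

good = [1] * SEQ_LEN
bad = [1] * (SEQ_LEN - 1) + [VOCAB_SIZE + 5]
print(is_valid_sample(good), is_valid_sample(bad))  # True False
```

Samples failing any check would simply be dropped before training.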

---

## Training Method

- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:**  
  `q_proj`, `k_proj`, `v_proj`, `o_proj`,  
  `gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** Enabled
- **Training style:** Streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)

After training, the LoRA adapter was **merged into the base model weights** to create this final model.
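
For illustration, the configuration above could be expressed with the standard `peft` and `bitsandbytes` APIs. This is a hedged sketch built from the hyperparameters listed on this card, not the actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter configuration matching the card
lora_config = LoraConfig(
    r=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# After training, the adapter is folded into the base weights, e.g.:
#   merged = peft_model.merge_and_unload()
#   merged.save_pretrained("Roy")
```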

---

## Inference

This model can be used **directly** without any LoRA adapter.

### Example (Transformers)

Install the pinned dependencies first (run in a shell, or prefix each line with `!` in a notebook):

```bash
pip uninstall -y transformers peft accelerate torch safetensors numpy
pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 \
    peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16   # use float16 for GPU

# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto"
)
model.eval()

print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")

# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()

    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"

    inputs = tokenizer(
        prompt,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(f"\nRoy: {response}\n")
```