---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---

# Roy

## Model Overview

**Roy** is a fine-tuned large language model based on
[`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).

The model was trained using **QLoRA** with a resumable streaming pipeline, and the adapter was later **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).

This model is optimized for:
- Instruction following
- Conversational responses
- General reasoning and explanation tasks
|
---

## Base Model

- **Base:** Mistral-7B-Instruct-v0.2
- **Architecture:** Decoder-only Transformer
- **Parameters:** ~7B
- **Context length:** 2048 tokens (fine-tuning sequence length)
|
---

## Training Dataset

The model was trained on a custom pre-tokenized dataset:

- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:**
  https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens

### Dataset Processing
- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training
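The processing steps above can be sketched as a simple validation pass over pre-tokenized sequences. This is an illustrative sketch, not the actual pipeline code: the 2048 sequence length comes from this card, the 32000-entry vocabulary is the standard Mistral tokenizer size, and reusing the EOS token (id 2) as padding mirrors the inference example below.

```python
SEQ_LEN = 2048      # sequence length used for this dataset
VOCAB_SIZE = 32000  # Mistral tokenizer vocabulary size
PAD_ID = 2          # assumption: EOS token reused as pad, as in the inference example

def is_valid(input_ids):
    """Reject malformed pre-tokenized samples before training."""
    if len(input_ids) != SEQ_LEN:
        return False  # wrong padding/truncation
    if any(not (0 <= t < VOCAB_SIZE) for t in input_ids):
        return False  # out-of-range token id (overflow/corruption)
    return True

def pad_or_truncate(input_ids):
    """Fix a sample to exactly SEQ_LEN tokens."""
    input_ids = input_ids[:SEQ_LEN]
    return input_ids + [PAD_ID] * (SEQ_LEN - len(input_ids))

sample = pad_or_truncate(list(range(100)))
print(len(sample), is_valid(sample))  # 2048 True
```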
|
---

## Training Method

- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:**
  `q_proj`, `k_proj`, `v_proj`, `o_proj`,
  `gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** Enabled
- **Training style:** Streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)

After training, the LoRA adapter was **merged into the base model weights** to create this final model.
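As a rough sanity check, the trainable-parameter count implied by r = 32 over these target modules can be estimated from Mistral-7B's layer shapes (hidden size 4096, grouped-query KV dim 1024, MLP dim 14336, 32 layers; these dimensions come from the base model's config, not from this card):

```python
# Mistral-7B projection shapes (in_features, out_features) per layer
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
R = 32       # LoRA rank from the training config above
LAYERS = 32  # transformer blocks in Mistral-7B

# Each LoRA pair adds A (r x in) plus B (out x r) parameters per module.
per_layer = sum(R * (fan_in + fan_out) for fan_in, fan_out in SHAPES.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters (~{total / 7e9:.1%} of the 7B base)")
```

So the adapter trained roughly 84M parameters, about 1.2% of the base model, before being merged back in.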
|
---

## Inference

This model can be used **directly**, without any LoRA adapter.

### Example (Transformers)
|
```bash
pip install "numpy==1.26.4" "torch==2.2.2" "transformers==4.41.2" "accelerate==0.30.1" "safetensors==0.4.3"
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16  # use float16 on GPU

# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto",
)
model.eval()

print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")

# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()

    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    # Mistral-Instruct prompt format
    prompt = f"[INST] {user_input} [/INST]"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(f"\nRoy: {response}\n")
```
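The chat loop above is single-turn: no history is carried between prompts. A multi-turn prompt in the Mistral-Instruct format can be sketched with a small helper (an illustrative sketch, assuming the standard `[INST] ... [/INST]` template with `</s>` closing each completed assistant turn; the tokenizer adds the leading `<s>` itself):

```python
def build_prompt(history, user_message):
    """history: list of (user, assistant) turns already completed."""
    prompt = ""
    for user, assistant in history:
        prompt += f"[INST] {user} [/INST] {assistant}</s>"
    prompt += f"[INST] {user_message} [/INST]"
    return prompt

print(build_prompt([("Hi", "Hello! How can I help?")], "What is QLoRA?"))
# [INST] Hi [/INST] Hello! How can I help?</s>[INST] What is QLoRA? [/INST]
```

Passing the result of `build_prompt` to the tokenizer in place of the single-turn `prompt` string lets the model condition on earlier turns, at the cost of a growing context (bounded here by the 2048-token training length).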