---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---

# Roy

## Model Overview

**Roy** is a fine-tuned large language model based on [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). It was trained with **QLoRA** using a resumable streaming pipeline, and the LoRA adapter was then **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).

This model is optimized for:

- Instruction following
- Conversational responses
- General reasoning and explanation tasks

---

## Base Model

- **Base:** Mistral-7B-Instruct-v0.2
- **Architecture:** Decoder-only Transformer
- **Parameters:** ~7B
- **Context length:** 2048 tokens

---

## Training Dataset

The model was trained on a custom pre-tokenized dataset:

- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:** https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens

### Dataset Processing

- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training

---

## Training Method

- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** enabled
- **Training style:** streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)

After training, the LoRA adapter was **merged into the base model weights** to create this final
model.

---

## Inference

This model can be used **directly**, without any LoRA adapter.

### Install dependencies

In a fresh environment (e.g. Colab), pin compatible versions first:

```bash
pip uninstall -y transformers peft accelerate torch safetensors numpy
pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 \
    peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3
```

### Example (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16  # use float16 on GPU

# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("šŸ”¹ Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

print("šŸ”¹ Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto",
)
model.eval()

print("\nāœ… Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")

# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("šŸ§‘ You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        print("šŸ‘‹ Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"\nRoy: {response}\n")
```
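### Prompt format

The chat loop above formats a single turn by hand. For multi-turn conversations, the canonical template is the one shipped with the tokenizer (`tokenizer.apply_chat_template`); the helper below is only a rough sketch of what that template produces for Mistral-Instruct models. The function name and the exact `</s>` placement are assumptions based on Mistral's documented format, not taken from this card.

```python
def build_mistral_prompt(turns):
    """Build a multi-turn Mistral-Instruct prompt string.

    `turns` is a list of (user, assistant) pairs; pass None as the
    assistant reply for the final turn the model should answer.
    Note: the BOS token (<s>) is added by the tokenizer, not here.
    """
    prompt = ""
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Close each completed assistant turn with the EOS marker
            prompt += f" {assistant}</s>"
    return prompt

history = [
    ("What is QLoRA?", "QLoRA fine-tunes a 4-bit quantized model via LoRA adapters."),
    ("And why merge the adapter afterwards?", None),
]
prompt = build_mistral_prompt(history)
```

In practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` so the exact template bundled with the model checkpoint is used.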