---
language: en
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
datasets:
- souvik18/mistral_tokenized_2048_fixed_v2
pipeline_tag: text-generation
library_name: transformers
tags:
- mistral
- lora
- qlora
- instruction-tuning
- causal-lm
metrics:
- accuracy
---
# Roy
## Model Overview
**Roy** is a fine-tuned large language model based on
[`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
The model was trained using **QLoRA** with a resumable streaming pipeline and later **merged into the base model** to produce a **single standalone checkpoint** (no LoRA adapter required at inference time).
This model is optimized for:
- Instruction following
- Conversational responses
- General reasoning and explanation tasks
---
## Base Model
- **Base:** Mistral-7B-Instruct-v0.2
- **Architecture:** Decoder-only Transformer
- **Parameters:** ~7B
- **Context Length:** 2048 tokens
---
## Training Dataset
The model was trained on a custom tokenized dataset:
- **Dataset name:** `mistral_tokenized_2048_fixed_v2`
- **Dataset repository:**
https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
- **Owner:** souvik18
- **Format:** Pre-tokenized `input_ids`
- **Sequence length:** 2048
- **Tokenizer:** Mistral tokenizer
- **Dataset size:** ~10.7M tokens
### Dataset Processing
- Fixed padding and truncation
- Removed malformed / corrupted samples
- Validated against NaN and overflow issues
- Optimized for streaming-based training
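The cleaning steps above can be sketched as a per-sample filter. This is illustrative only; the field name, the fixed length of 2048, and Mistral's vocabulary size of 32000 are assumptions, not the actual preprocessing script:

```python
# Illustrative sketch of the validation described above (not the real script).
SEQ_LEN = 2048

def is_valid(sample, vocab_size=32000):
    ids = sample.get("input_ids")
    if not isinstance(ids, list) or len(ids) != SEQ_LEN:
        return False  # malformed, or wrongly padded/truncated
    # every token id must be a plain int inside the vocab (no NaN/overflow)
    return all(isinstance(t, int) and 0 <= t < vocab_size for t in ids)

samples = [
    {"input_ids": [1] * SEQ_LEN},                   # well-formed -> kept
    {"input_ids": [1] * (SEQ_LEN - 1)},             # wrong length -> dropped
    {"input_ids": [1] * (SEQ_LEN - 1) + [10**9]},   # overflowed id -> dropped
]
clean = [s for s in samples if is_valid(s)]
print(len(clean))  # -> 1
```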
---
## Training Method
- **Fine-tuning method:** QLoRA
- **Quantization:** 4-bit (NF4)
- **Optimizer:** AdamW
- **Learning rate:** 2e-4
- **LoRA rank (r):** 32
- **Target modules:**
`q_proj`, `k_proj`, `v_proj`, `o_proj`,
`gate_proj`, `up_proj`, `down_proj`
- **Gradient checkpointing:** Enabled
- **Training style:** Streaming + resumable
- **Checkpointing:** Hugging Face Hub (HF-only)
After training, the LoRA adapter was **merged into the base model weights** to create this final model.
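A toy numeric sketch of what that merge does: the low-rank update `B @ A`, scaled by `alpha / r`, is folded into the frozen weight matrix, so the merged checkpoint gives the same outputs as base-plus-adapter with no adapter needed at inference. The dimensions here are tiny for illustration; the real model applies this with `r=32` to each listed projection:

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 8        # toy sizes; the actual model uses r=32
W = torch.randn(d, d)        # frozen base weight
A = torch.randn(r, d) * 0.1  # LoRA down-projection
B = torch.randn(d, r) * 0.1  # LoRA up-projection

# Merging folds the scaled low-rank update into the base weight.
W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the base + adapter forward pass exactly.
x = torch.randn(d)
adapter_out = W @ x + (alpha / r) * (B @ (A @ x))
assert torch.allclose(W_merged @ x, adapter_out, atol=1e-5)
```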
---
## Inference
This model can be used **directly** without any LoRA adapter.
### Example (Transformers)
Pin compatible versions first (shell; in a notebook, prefix each line with `!`):

```bash
pip uninstall -y transformers peft accelerate torch safetensors numpy
pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16 # use float16 for GPU
# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=DTYPE,
device_map="auto"
)
model.eval()
print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")
# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()
    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print(f"\nRoy: {response}\n")
```