Instructions to use souvik18/Roy with libraries and local apps.

Libraries

How to use souvik18/Roy with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="souvik18/Roy")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("souvik18/Roy")
model = AutoModelForCausalLM.from_pretrained("souvik18/Roy")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use souvik18/Roy with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "souvik18/Roy"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "souvik18/Roy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
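
The same OpenAI-compatible endpoint can also be called from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server above is running on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000", model="souvik18/Roy"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example, with the server running:
# print(ask("What is the capital of France?"))
```

For the SGLang server described in the next section, the same helper should work with `base_url="http://localhost:30000"`.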

Use Docker

docker model run hf.co/souvik18/Roy

SGLang

How to use souvik18/Roy with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "souvik18/Roy" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "souvik18/Roy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "souvik18/Roy" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "souvik18/Roy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use souvik18/Roy with Docker Model Runner:
```
docker model run hf.co/souvik18/Roy
```

Roy

Model Overview

Roy is a fine-tuned large language model based on mistralai/Mistral-7B-Instruct-v0.2.

The model was trained with QLoRA using a resumable streaming pipeline. The resulting LoRA adapter was then merged into the base model to produce a single standalone checkpoint, so no LoRA adapter is required at inference time.

This model is optimized for:

Instruction following
Conversational responses
General reasoning and explanation tasks

Base Model

Base: Mistral-7B-Instruct-v0.2
Architecture: Decoder-only Transformer
Parameters: ~7B
Context Length: 2048 tokens (the sequence length used for fine-tuning; the base model itself supports a longer context)

Training Dataset

The model was trained on a custom tokenized dataset:

Dataset name: mistral_tokenized_2048_fixed_v2
Dataset repository: https://huggingface.co/datasets/souvik18/mistral_tokenized_2048_fixed_v2
Owner: souvik18
Format: Pre-tokenized input_ids
Sequence length: 2048
Tokenizer: Mistral tokenizer
Dataset size: ~10.7M tokens

Dataset Processing

Fixed padding and truncation
Removed malformed / corrupted samples
Validated against NaN and overflow issues
Optimized for streaming-based training
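
The preprocessing script is not included in this card. As a rough illustration only, fixed-length padding/truncation and sample filtering for pre-tokenized input_ids could look like the sketch below. The function names `fix_length` and `is_valid_sample`, the pad id, and the vocabulary size are assumptions chosen for the example, not the author's code.

```python
SEQ_LEN = 2048       # sequence length stated in this card
VOCAB_SIZE = 32000   # assumption: Mistral tokenizer vocabulary size
PAD_ID = 2           # assumption: reuse the EOS id as the padding id


def fix_length(input_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Truncate to seq_len, then right-pad with pad_id up to seq_len."""
    ids = list(input_ids[:seq_len])
    return ids + [pad_id] * (seq_len - len(ids))


def is_valid_sample(input_ids, vocab_size=VOCAB_SIZE):
    """Reject empty samples and ids that are not integers inside the vocabulary."""
    return len(input_ids) > 0 and all(
        isinstance(t, int) and 0 <= t < vocab_size for t in input_ids
    )


# Toy data: one short sample, one empty, one with an out-of-range id, one too long
raw = [[1, 733, 16289], [], [1, 99999], [1] * 3000]
clean = [fix_length(s) for s in raw if is_valid_sample(s)]
```

Here the empty sample and the out-of-range sample are dropped, and the two remaining samples are both brought to exactly 2048 ids.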

Training Method

Fine-tuning method: QLoRA
Quantization: 4-bit (NF4)
Optimizer: AdamW
Learning rate: 2e-4
LoRA rank (r): 32
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Gradient checkpointing: Enabled
Training style: Streaming + resumable
Checkpointing: Hugging Face Hub (HF-only)
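
To give a sense of scale for these settings, the sketch below estimates how many trainable parameters a rank-32 LoRA adds across the listed target modules. The layer dimensions are assumptions taken from the published Mistral-7B configuration, not from this card. Each adapted linear layer adds `r * (in_features + out_features)` parameters.

```python
# Assumed dimensions from the published Mistral-7B config
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
RANK = 32            # LoRA rank stated in this card

# (in_features, out_features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """Parameters in the A (rank x in) and B (out x rank) matrices of one adapter."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")  # prints 83,886,080
```

Under these assumptions the adapter holds roughly 84M parameters, a little over 1% of the ~7B parameters in the frozen base model.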

After training, the LoRA adapter was merged into the base model weights to create this final model.
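
Merging follows the standard LoRA formulation: each adapted weight becomes `W + (alpha / r) * B @ A`. The toy example below illustrates this with plain Python lists. The scaling factor `alpha` is not stated in this card, so the value used here is purely illustrative, and `merge_lora` is a hypothetical helper rather than the code used to build this checkpoint. With the PEFT library, this step is typically done with `merge_and_unload()`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: 2x2 base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features)
B = [[0.5], [1.0]]    # shape (out_features, rank)

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

After merging, a single matrix multiply with `merged` gives the same result as running the base weight and the adapter separately, which is why no adapter is needed at inference time.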

Inference

This model can be used directly without any LoRA adapter.

Example (Transformers)

# Notebook setup (Colab/Jupyter): the leading "!" runs each line as a shell command.
# Reinstalls pinned versions; a runtime restart may be needed afterwards.
!pip uninstall -y transformers peft accelerate torch safetensors numpy
!pip install numpy==1.26.4 torch==2.2.2 transformers==4.41.2 peft==0.11.1 accelerate==0.30.1 safetensors==0.4.3

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# -----------------------------
# CONFIG
# -----------------------------
MODEL_ID = "souvik18/Roy"
DTYPE = torch.float16   # use float16 for GPU

# -----------------------------
# LOAD TOKENIZER & MODEL
# -----------------------------
print("🔹 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

print("🔹 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=DTYPE,
    device_map="auto"
)
model.eval()

print("\n✅ Model loaded successfully")
print("Type 'exit' or 'quit' to stop\n")

# -----------------------------
# CHAT LOOP
# -----------------------------
while True:
    user_input = input("🧑 You: ").strip()

    if user_input.lower() in ["exit", "quit"]:
        print("👋 Bye!")
        break

    prompt = f"[INST] {user_input} [/INST]"

    inputs = tokenizer(
        prompt,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,  # set explicitly to avoid the missing-pad-token warning
        )

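    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_len = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
    print(f"\n Roy: {response}\n")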
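
The chat loop above sends each message on its own, so the model does not see earlier turns. A minimal sketch of carrying history forward in the same `[INST] ... [/INST]` format is shown below. The helper `build_prompt` is hypothetical, and the exact multi-turn layout is an assumption modeled on the Mistral instruct format. The tokenizer's own chat template (via `apply_chat_template`) remains the authoritative source.

```python
def build_prompt(history, user_input, eos_token="</s>"):
    """Build a multi-turn prompt.

    history: list of (user_message, assistant_reply) pairs from earlier turns.
    The BOS token is left out because the tokenizer adds it when encoding.
    """
    parts = []
    for user, assistant in history:
        # Each finished turn: instruction, reply, then end-of-sequence marker
        parts.append(f"[INST] {user} [/INST]{assistant}{eos_token}")
    # The new message is left open for the model to answer
    parts.append(f"[INST] {user_input} [/INST]")
    return "".join(parts)


history = [("Who are you?", "I am Roy.")]
prompt = build_prompt(history, "What can you do?")
print(prompt)
# [INST] Who are you? [/INST]I am Roy.</s>[INST] What can you do? [/INST]
```

In the loop, this would replace the single-turn `prompt` line, with each `(user_input, response)` pair appended to `history` after generation.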
