Libraries

How to use santhosh/GRMR-2B-Instruct-openvino with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="santhosh/GRMR-2B-Instruct-openvino")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("santhosh/GRMR-2B-Instruct-openvino")
model = AutoModelForCausalLM.from_pretrained("santhosh/GRMR-2B-Instruct-openvino")

Local Apps

vLLM

How to use santhosh/GRMR-2B-Instruct-openvino with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "santhosh/GRMR-2B-Instruct-openvino"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "santhosh/GRMR-2B-Instruct-openvino",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
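
For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the server is running locally on vLLM's default port 8000 as in the curl call above; `build_completion_request` and `send` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the previous step listens on localhost:8000.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "santhosh/GRMR-2B-Instruct-openvino"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=BASE_URL):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; call send(request) once one is up.
request = build_completion_request("Once upon a time,")
print(request.full_url)
print(request.get_method())
```

Keeping request construction separate from sending makes it possible to inspect the payload before a server is available.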

Use Docker

docker model run hf.co/santhosh/GRMR-2B-Instruct-openvino

SGLang

How to use santhosh/GRMR-2B-Instruct-openvino with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "santhosh/GRMR-2B-Instruct-openvino" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "santhosh/GRMR-2B-Instruct-openvino",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "santhosh/GRMR-2B-Instruct-openvino" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "santhosh/GRMR-2B-Instruct-openvino",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
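
Once the server replies, the generated text has to be pulled out of the JSON body. The sketch below assumes the response follows the OpenAI completions shape (a `choices` list whose entries carry a `text` field), which is what an OpenAI-compatible endpoint is expected to return. `completion_texts` is a hypothetical helper and the sample body is illustrative, not real model output.

```python
def completion_texts(response):
    """Return the generated text of every choice in a completions response."""
    return [choice.get("text", "") for choice in response.get("choices", [])]


# Illustrative response body (assumed shape), not real model output.
sample_response = {
    "object": "text_completion",
    "model": "santhosh/GRMR-2B-Instruct-openvino",
    "choices": [
        {"index": 0, "text": " there was a sentence.", "finish_reason": "stop"},
    ],
}

print(completion_texts(sample_response))
```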

Unsloth Studio

How to use santhosh/GRMR-2B-Instruct-openvino with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for santhosh/GRMR-2B-Instruct-openvino to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for santhosh/GRMR-2B-Instruct-openvino to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for santhosh/GRMR-2B-Instruct-openvino to start chatting

Load model with FastModel

# Install Unsloth from pip (run this in a shell):
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="santhosh/GRMR-2B-Instruct-openvino",
    max_seq_length=2048,
)

This model was converted to OpenVINO from qingy2024/GRMR-2B-Instruct using optimum-intel via the export space.

First make sure you have optimum-intel installed:

pip install optimum[openvino]

To load the model and run it on a few sample inputs:

from transformers import AutoTokenizer, AutoConfig, pipeline
from optimum.intel.openvino import OVModelForCausalLM
import time

model_id = "santhosh/GRMR-2B-Instruct-openvino"
# GRMR-2B-Instruct is a Gemma 2 fine-tune (decoder-only), so it loads as a causal LM
model = OVModelForCausalLM.from_pretrained(
    model_id,
    config=AutoConfig.from_pretrained(model_id),
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=256,
)

texts = [
    "Most of the course is about semantic or  content of language but there are also interesting topics to be learned from the servicefeatures except statistics in characters in documents.",
    "At this point, He introduces herself as his native English speaker and goes on to say that if you contine to work on social scnce",
    "He come after the event.",
    "When I grew up, I start to understand what he said is quite right",
    "Write this more formally: omg! i love that song im listening to right now",
    "Improve the grammaticality: As the number of people grows, the need of habitable environment is unquestionably essential.",
]
start_time = time.time()
for result in pipe(texts):
    print(result)
end_time = time.time()
duration = end_time - start_time
print(f"Correction completed in {duration:.2f} seconds.")
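
The loop above prints raw pipeline results. Depending on the pipeline task, each result is either a dict or a list of dicts with a `generated_text` key, and a text-generation pipeline echoes the prompt by default. The following is a minimal sketch of a helper that flattens either shape to one string per input; `extract_texts` is a hypothetical name, and the sample data is a stand-in rather than real model output.

```python
def extract_texts(results, prompts=None):
    """Flatten pipeline output into one string per input.

    Handles both shapes: a dict per input, or a list of dicts per input.
    If prompts are given, a leading copy of the prompt is stripped.
    """
    texts = []
    for i, item in enumerate(results):
        first = item[0] if isinstance(item, list) else item
        text = first["generated_text"]
        if prompts is not None and text.startswith(prompts[i]):
            text = text[len(prompts[i]):]
        texts.append(text.strip())
    return texts


# Stand-in results (assumed shape), not real model output.
fake_prompts = ["He come after the event."]
fake_results = [[{"generated_text": "He come after the event. He came after the event."}]]

print(extract_texts(fake_results, fake_prompts))
```

With the real pipeline this would be called as `extract_texts(pipe(texts), texts)`.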

Model tree for santhosh/GRMR-2B-Instruct-openvino

Base model: google/gemma-2-2b
Quantized: unsloth/gemma-2-2b-bnb-4bit
Finetuned: qingy2024/GRMR-2B-Instruct
Finetuned: this model