Text Generation
Transformers
PyTorch
Safetensors
OpenVINO
English
gemma2
text-generation-inference
unsloth
trl
openvino-export
This model was converted to OpenVINO format from qingy2024/GRMR-2B-Instruct using optimum-intel via the export space.
First make sure you have optimum-intel installed:
pip install optimum[openvino]
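For reference, a conversion like the one described above could be reproduced locally along the following lines. This is a minimal sketch, not necessarily what the export space runs: the helper name `export_to_openvino` and the output directory are hypothetical, and it assumes the source checkpoint is a decoder-only (Gemma 2) model, so `OVModelForCausalLM` is the matching class.

```python
from pathlib import Path


def export_to_openvino(source_id: str, output_dir: str) -> Path:
    """Convert a Hugging Face causal LM checkpoint to OpenVINO IR and save it.

    Hypothetical helper for illustration; requires optimum[openvino].
    """
    # Imported lazily so that defining the helper does not require optimum-intel.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    out = Path(output_dir)
    # export=True asks optimum-intel to convert the original weights on load.
    model = OVModelForCausalLM.from_pretrained(source_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(source_id)
    # Write the OpenVINO model files and the tokenizer side by side.
    model.save_pretrained(out)
    tokenizer.save_pretrained(out)
    return out
```

Calling `export_to_openvino("qingy2024/GRMR-2B-Instruct", "GRMR-2B-Instruct-openvino")` would download the source model and write the converted files to the given local directory.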
To load your model you can do as follows:
from transformers import AutoTokenizer, AutoConfig, pipeline
from optimum.intel.openvino import OVModelForCausalLM
import time

model_id = "santhosh/GRMR-2B-Instruct-openvino"

# Gemma 2 is a decoder-only model, so it is loaded with the causal LM class
# rather than a seq2seq (encoder-decoder) class.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    config=AutoConfig.from_pretrained(model_id),
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=256,
)

# Sample inputs containing grammatical errors and rewrite requests
texts = [
    "Most of the course is about semantic or content of language but there are also interesting topics to be learned from the servicefeatures except statistics in characters in documents.",
    "At this point, He introduces herself as his native English speaker and goes on to say that if you contine to work on social scnce",
    "He come after the event.",
    "When I grew up, I start to understand what he said is quite right",
    "Write this more formally: omg! i love that song im listening to right now",
    "Improve the grammaticality: As the number of people grows, the need of habitable environment is unquestionably essential.",
]

# Time the whole batch
start_time = time.time()
for result in pipe(texts):
    print(result)
end_time = time.time()

duration = end_time - start_time
print(f"Correction completed in {duration:.2f} seconds.")
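Each item printed by the loop above is a raw pipeline result, not a plain string. Depending on the task, it is either a dict with a `generated_text` key or a list of such dicts, and text-generation pipelines include the prompt in `generated_text` by default. A small helper along these lines could pull out just the generated text. The function name `extract_generated_text` is hypothetical, and the sample result is a hand-written stand-in with the same shape, not actual model output.

```python
def extract_generated_text(result, prompt=None):
    """Return the generated string from one pipeline result.

    Accepts a dict ({"generated_text": ...}) or a list of such dicts.
    """
    if isinstance(result, list):
        result = result[0]  # keep only the first returned sequence
    text = result["generated_text"]
    # Drop the echoed prompt if the output starts with it.
    if prompt is not None and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Stand-in result with the same shape as pipeline output (not real model output).
prompt = "He come after the event."
fake_result = [{"generated_text": prompt + " He came after the event."}]
print(extract_generated_text(fake_result, prompt))
```

In the loop, this would be used as `extract_generated_text(result, text)` while iterating over `zip(texts, pipe(texts))`.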
Model tree for santhosh/GRMR-2B-Instruct-openvino
- Base model: google/gemma-2-2b
- Quantized: unsloth/gemma-2-2b-bnb-4bit
- Finetuned: qingy2024/GRMR-2B-Instruct