Instructions to use GenVRadmin/AryaBhatta-GemmaOrca with libraries and local apps.

Libraries

How to use GenVRadmin/AryaBhatta-GemmaOrca with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GenVRadmin/AryaBhatta-GemmaOrca")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("GenVRadmin/AryaBhatta-GemmaOrca")
model = AutoModelForCausalLM.from_pretrained("GenVRadmin/AryaBhatta-GemmaOrca")

Local Apps

vLLM

How to use GenVRadmin/AryaBhatta-GemmaOrca with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GenVRadmin/AryaBhatta-GemmaOrca"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GenVRadmin/AryaBhatta-GemmaOrca",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
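
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the response shape (`choices[0].text`) is assumed to follow the OpenAI completions format that the server advertises compatibility with.

```python
import json
import urllib.request

MODEL_ID = "GenVRadmin/AryaBhatta-GemmaOrca"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the POST request that mirrors the curl call above (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage, once the server is running:
#   print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` should target a server listening on port 30000 instead.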

Use Docker

docker model run hf.co/GenVRadmin/AryaBhatta-GemmaOrca

SGLang

How to use GenVRadmin/AryaBhatta-GemmaOrca with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GenVRadmin/AryaBhatta-GemmaOrca" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GenVRadmin/AryaBhatta-GemmaOrca",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GenVRadmin/AryaBhatta-GemmaOrca" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GenVRadmin/AryaBhatta-GemmaOrca",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use GenVRadmin/AryaBhatta-GemmaOrca with Docker Model Runner:
```
docker model run hf.co/GenVRadmin/AryaBhatta-GemmaOrca
```

This model is finetuned from HuggingFaceH4/zephyr-7b-gemma-v0.1 on 9 Indian languages (Hindi, Tamil, Punjabi, Bengali, Gujarati, Oriya, Telugu, Kannada, Malayalam) plus English. To improve its reasoning and maths skills, we first apply supervised finetuning (SFT) to the Gemma model on Microsoft's Orca datasets.

We use the Orca maths Hindi dataset: GenVRadmin/Aryabhatta-Orca-Maths-Hindi
and the original Orca maths dataset: microsoft/orca-math-word-problems-200k

This pushes the MATHS score from 24.3 in Gemma-7B to 25.5 in Zephyr-Gemma and 31.6 in GemmaOrca.

The model is then finetuned on GenVR's Samvaad datasets (GenVRadmin/Samvaad-Indic-Positive, GenVRadmin/Samvaad-Tamil-Mixtral, and a subset of GenVRadmin/Samvaad-Mixed-Language-3).

It is then further finetuned on various open-source datasets, including:

Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized
Telugu-LLM-Labs/teknium_GPTeacher_general_instruct_telugu_filtered_and_romanized
abhinand/tamil-alpaca
Tensoic/airoboros-3.2_kn
Tensoic/gpt-teacher_kn
Tensoic/Alpaca-Gujarati
HydraIndicLM/bengali_alpaca_dolly_67k
Open-Orca/OpenOrca
pankajmathur/alpaca_orca
OdiaGenAI/Odia_Alpaca_instructions_52k
OdiaGenAI/gpt-teacher-roleplay-odia-3k
GenVRadmin/Samvaad-Punjabi-Mini
pankajmathur/WizardLM_Orca

The model achieves the following scores on benchmarks:

| Model | AGIEval | GPT4All | TruthfulQA | BigBench | Average ⬇️ |
|---|---|---|---|---|---|
| AryaBhatta-GemmaOrca | 35.9 | 72.26 | 53.85 | 40.35 | 50.59 |
| zephyr-7b-beta | 37.52 | 71.77 | 55.26 | 39.77 | 51.08 |
| zephyr-7b-gemma-v0.1 | 34.22 | 66.37 | 52.19 | 37.10 | 47.47 |
| mlabonne/Gemmalpaca-7B | 21.6 | 40.87 | 44.85 | 30.49 | 34.45 |
| google/gemma-7b-it | 21.33 | 40.84 | 41.70 | 30.25 | 33.53 |
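
The Average column appears to be the unweighted arithmetic mean of the four benchmark scores, rounded to two decimals. The sketch below recomputes it from the values in the table; the variable and function names are illustrative only.

```python
# Scores copied from the table, in the order: AGIEval, GPT4All, TruthfulQA, BigBench
scores = {
    "AryaBhatta-GemmaOrca": [35.9, 72.26, 53.85, 40.35],
    "zephyr-7b-beta": [37.52, 71.77, 55.26, 39.77],
    "zephyr-7b-gemma-v0.1": [34.22, 66.37, 52.19, 37.10],
    "mlabonne/Gemmalpaca-7B": [21.6, 40.87, 44.85, 30.49],
    "google/gemma-7b-it": [21.33, 40.84, 41.70, 30.25],
}


def average(values):
    """Unweighted mean, rounded to two decimals as in the table."""
    return round(sum(values) / len(values), 2)


averages = {name: average(values) for name, values in scores.items()}

# Print the models from highest to lowest average
for name, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {avg:.2f}")
```

Sorted this way, zephyr-7b-beta (51.08) ranks just ahead of AryaBhatta-GemmaOrca (50.59), which in turn leads the other Gemma-based models listed.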

How to use:

import os

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hugging Face access token, assumed here to be stored in the HF_TOKEN environment variable
hf_token = os.environ.get("HF_TOKEN")

model = AutoPeftModelForCausalLM.from_pretrained(
    "GenVRadmin/AryaBhatta-GemmaOrca",
    load_in_4bit=False,
    token=hf_token,
)
tokenizer = AutoTokenizer.from_pretrained("GenVRadmin/AryaBhatta-GemmaOrca", token=hf_token)

# Prompt template with instruction, input, and response slots
input_prompt = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

input_text = input_prompt.format(
    "Answer this question about India.",  # instruction
    "Who is the Prime Minister of India?",  # input
    "",  # output - leave this blank for generation!
)

# Place the inputs on the same device as the model to avoid a device mismatch
inputs = tokenizer([input_text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=300, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
print(response)
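
The decoded output contains the full prompt followed by the generated answer. As a minimal sketch, the prompt construction and answer extraction can be wrapped in two small helpers. The names `build_prompt` and `extract_response` are hypothetical, and the default end-of-sequence string `<eos>` is an assumption about the tokenizer; in practice `tokenizer.eos_token` can be passed instead.

```python
PROMPT_TEMPLATE = """
### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction, input_text=""):
    """Fill the template, leaving the response slot empty for generation."""
    return PROMPT_TEMPLATE.format(instruction, input_text, "")


def extract_response(decoded, eos_token="<eos>"):
    """Return only the text after the last response marker.

    eos_token defaults to "<eos>" as an assumption; pass tokenizer.eos_token
    to match the tokenizer actually in use.
    """
    # rpartition yields the whole string as the tail when the marker is absent
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.replace(eos_token, "").strip()


prompt = build_prompt(
    "Answer this question about India.",
    "Who is the Prime Minister of India?",
)
```

With these helpers, `extract_response(tokenizer.batch_decode(outputs)[0], tokenizer.eos_token)` would return just the model's answer rather than the echoed prompt.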

Model size: 9B params (Safetensors)

Tensor type: F16
