Instructions to use mshojaei77/Gemma-2-2b-fa with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use mshojaei77/Gemma-2-2b-fa with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mshojaei77/Gemma-2-2b-fa")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mshojaei77/Gemma-2-2b-fa")
model = AutoModelForCausalLM.from_pretrained("mshojaei77/Gemma-2-2b-fa")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- llama-cpp-python
How to use mshojaei77/Gemma-2-2b-fa with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mshojaei77/Gemma-2-2b-fa",
    filename="Gemma_fa_2b_q8_0.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use mshojaei77/Gemma-2-2b-fa with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mshojaei77/Gemma-2-2b-fa:Q8_0

# Run inference directly in the terminal:
llama-cli -hf mshojaei77/Gemma-2-2b-fa:Q8_0
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mshojaei77/Gemma-2-2b-fa:Q8_0

# Run inference directly in the terminal:
llama-cli -hf mshojaei77/Gemma-2-2b-fa:Q8_0
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf mshojaei77/Gemma-2-2b-fa:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf mshojaei77/Gemma-2-2b-fa:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf mshojaei77/Gemma-2-2b-fa:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf mshojaei77/Gemma-2-2b-fa:Q8_0
Use Docker
docker model run hf.co/mshojaei77/Gemma-2-2b-fa:Q8_0
- LM Studio
- Jan
- vLLM
How to use mshojaei77/Gemma-2-2b-fa with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "mshojaei77/Gemma-2-2b-fa"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mshojaei77/Gemma-2-2b-fa",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/mshojaei77/Gemma-2-2b-fa:Q8_0
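Because the vLLM server is OpenAI-compatible, it can also be called from Python with the openai client instead of curl. The snippet below is a minimal sketch assuming the default port used by the serve command above; adjust base_url if you changed it.

from openai import OpenAI

# Point the OpenAI client at the local vLLM server (OpenAI-compatible API)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mshojaei77/Gemma-2-2b-fa",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)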
- SGLang
How to use mshojaei77/Gemma-2-2b-fa with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mshojaei77/Gemma-2-2b-fa" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mshojaei77/Gemma-2-2b-fa",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "mshojaei77/Gemma-2-2b-fa" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "mshojaei77/Gemma-2-2b-fa",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Ollama
How to use mshojaei77/Gemma-2-2b-fa with Ollama:
ollama run hf.co/mshojaei77/Gemma-2-2b-fa:Q8_0
- Unsloth Studio
How to use mshojaei77/Gemma-2-2b-fa with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for mshojaei77/Gemma-2-2b-fa to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for mshojaei77/Gemma-2-2b-fa to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for mshojaei77/Gemma-2-2b-fa to start chatting
- Docker Model Runner
How to use mshojaei77/Gemma-2-2b-fa with Docker Model Runner:
docker model run hf.co/mshojaei77/Gemma-2-2b-fa:Q8_0
- Lemonade
How to use mshojaei77/Gemma-2-2b-fa with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull mshojaei77/Gemma-2-2b-fa:Q8_0
Run and chat with the model
lemonade run user.Gemma-2-2b-fa-Q8_0
List all available models
lemonade list
Persian Gemma 2b - Conversational AI Experiment (Early Stage)
This repository presents Persian Gemma 2b, an early-stage experimental model derived from Google's Gemma-2-2b-it. It has been fine-tuned using QLoRA on the mshojaei77/Persian_sft dataset to explore its capabilities in Persian language conversational tasks.
1. Model Architecture
- Base Model: google/gemma-2-2b-it
- Architecture Type: Gemma2ForCausalLM
- Model Size: 2 billion parameters.
- Description: Persian Gemma 2b inherits the architecture of Gemma-2-2b-it, a lightweight yet capable model known for its efficiency and strong performance for its size. It is designed for text generation tasks and is particularly suited for conversational applications. The model uses standard transformer layers with attention mechanisms, enabling it to process and generate text in Persian.
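Because the architecture is inherited unchanged from the base model, a quick sanity check is to load only the configuration and confirm the Gemma-2 settings. This is an optional sketch, assuming the repository hosts a full merged model rather than only an adapter:

from transformers import AutoConfig

# Load only the configuration to inspect the inherited Gemma-2 architecture
config = AutoConfig.from_pretrained("mshojaei77/Gemma-2-2b-fa")
print(config.model_type)  # expected: "gemma2"
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)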
2. Training Details
- Fine-tuning Method: QLoRA (Quantized Low-Rank Adaptation)
- QLoRA is used for parameter-efficient fine-tuning, allowing adaptation of the base model with reduced computational resources and memory footprint.
- LoRA Rank (r): 32
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- LoRA Target Modules: ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj'] (linear projection layers)
- Training Dataset: mshojaei77/Persian_sft
- Training Steps: 20 (Extremely limited - Proof of Concept)
- Hardware: Kaggle Notebook, T4 GPU
- Software: Axolotl library
- Optimizer: paged_adamw_32bit
- Learning Rate Scheduler: cosine
- Learning Rate: 0.0002
- Micro Batch Size: 1
- Gradient Accumulation Steps: 1
- Sequence Length: 2048
- Sample Packing: Enabled (sample_packing: true)
- Mixed Precision: FP16 (fp16: true), Load in 4-bit (load_in_4bit: true), BF16: Disabled (bf16: false)
- Gradient Checkpointing: Enabled (gradient_checkpointing: true)
- Attention Implementation: SDPA (default; Flash Attention explicitly disabled - flash_attention: false)
- Tokenizer: Uses the tokenizer from the base model google/gemma-2-2b-it
- Chat Template: gemma
- Training Objective: Supervised Fine-tuning (SFT) to adapt the base model for Persian conversational responses, guided by the Persian_sft dataset
- Validation Set: None used in this preliminary experiment
Critical Note: The model was trained for an exceptionally short duration (20 steps). This is insufficient for robust learning and generalization. Expect significantly under-optimized performance.
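The run itself was configured with Axolotl rather than written by hand. For readers more familiar with the Hugging Face stack, the following is a rough peft/bitsandbytes sketch of an equivalent QLoRA setup using the hyperparameters listed above; it is illustrative only and is not the actual training script.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit base model, matching load_in_4bit: true and fp16: true
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA hyperparameters from the Training Details list above
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["down_proj", "gate_proj", "k_proj", "o_proj", "q_proj", "up_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable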
3. Dataset Information
- Dataset Name: mshojaei77/Persian_sft
- Dataset Description: The Persian_sft dataset is a collection of Persian conversations designed for instruction fine-tuning of language models. It likely contains examples of user queries and desired model responses in Persian, formatted for conversational fine-tuning.
- Dataset Type: Supervised Fine-tuning (SFT) dataset for conversational AI.
- Language: Primarily Persian (fa).
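To inspect the training data yourself, the dataset can be loaded directly from the Hub with the datasets library. The split and column names are not documented here, so treat the access pattern below as an assumption and check the dataset card:

from datasets import load_dataset

# Download the SFT dataset from the Hugging Face Hub
ds = load_dataset("mshojaei77/Persian_sft")
print(ds)              # available splits and columns
print(ds["train"][0])  # first example (assumes a "train" split)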
4. Intended Use
Intended Use Cases:
- Research & Experimentation: Primary use is to investigate the feasibility of fine-tuning Gemma-2-2b-it for Persian language conversational tasks and to serve as a starting point for further research.
- Educational Purposes: Demonstration of QLoRA fine-tuning techniques using Axolotl, and a practical example for learning about Persian language model development.
- Community Development: To encourage community contributions towards building better Persian language models and resources.
- Prototyping (with caution): For rapid prototyping and exploring potential applications of Persian conversational AI, strictly acknowledging the model's limitations and preliminary state.
5. Limitations
- Severe Under-training: Trained for only 20 steps, leading to significantly sub-optimal performance across all aspects.
- Lack of Validation: Absence of a validation set hinders monitoring of generalization and increases the risk of overfitting.
- Limited Fluency and Coherence: May produce grammatically incorrect, disfluent, or incoherent Persian text, especially in complex or lengthy conversations.
- Hallucinations and Factual Errors: Prone to generating factually incorrect or nonsensical information. Verification of output is crucial.
- Bias: Likely inherits and potentially amplifies biases from the base model and the fine-tuning dataset, leading to biased or unfair outputs.
- Poor Generalization: Performance is expected to degrade significantly on data outside the training distribution (different conversational styles, topics, or domains).
- Limited Conversational Abilities: May struggle with complex conversational turns, context maintenance, and nuanced understanding of user intent.
- Ethical Concerns: Potential for biased, inaccurate, or inappropriate output raises ethical concerns, especially in sensitive applications.
6. Performance Metrics
Current Evaluation:
- No formal evaluation has been conducted for this preliminary model due to its extremely limited training. Performance is expected to be significantly below optimal.
7. How to Use
import torch
from transformers import pipeline

# Initialize the text generation pipeline
pipe = pipeline(
    "text-generation",
    model="mshojaei77/Gemma-2-2b-fa",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # Or "mps" for Macs with Apple Silicon
)

# Prepare input messages (the gemma chat template is applied automatically)
messages = [
    {"role": "user", "content": "سلام چطوری؟"},  # "Hi, how are you?"
]

# Generate a response with a maximum of 512 new tokens
outputs = pipe(messages, max_new_tokens=512)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)

# Example Output (Illustrative - Output quality may vary significantly):
# سلام! من خوبم، ممنون. شما چطوری؟ 😊  ("Hi! I'm fine, thanks. How are you?")
Important Usage Notes:
- library_name: transformers and pipeline_tag: text-generation: specified in the metadata and Model Details for discoverability and clarity.
- Chat Template: Gemma models use the gemma chat template; when a list of chat messages is passed, the pipeline applies the tokenizer's template automatically.
- Hardware Recommendations: a CUDA GPU is recommended; use device="mps" for Apple Silicon (performance may vary).
- Output Quality: Expect highly variable and often suboptimal output due to limited training. Critical evaluation of generated text is essential.