#!/usr/bin/env python3
"""
Dataset preparation for Gemma 4 fine-tuning.

Converts raw datasets into Gemma 4 chat format and saves to data/processed/.

Usage:
    python scripts/prepare_data.py --dataset <name> --output data/processed/train.jsonl
"""
import argparse
import json
import os

from datasets import load_dataset


def convert_to_gemma4_chat(example, system_prompt=None):
    """Convert a single example to Gemma 4 chat format.

    Gemma 4 uses "model" (not "assistant") as the role name.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # Handle different dataset formats
    if "conversations" in example:
        # ShareGPT-style turns: either role/content or from/value keys
        for turn in example["conversations"]:
            role = turn.get("role", turn.get("from", ""))
            content = turn.get("content", turn.get("value", ""))
            # Normalize roles
            if role in ("assistant", "gpt", "bot"):
                role = "model"
            elif role in ("human", "user"):
                role = "user"
            messages.append({"role": role, "content": content})
    elif "messages" in example:
        # OpenAI-style messages: only the assistant role needs renaming
        for msg in example["messages"]:
            role = msg["role"]
            if role == "assistant":
                role = "model"
            messages.append({"role": role, "content": msg["content"]})
    elif "instruction" in example:
        # Alpaca-style records: the optional input is appended to the instruction
        user_content = example["instruction"]
        if example.get("input"):
            user_content += f"\n\nInput: {example['input']}"
        messages.append({"role": "user", "content": user_content})
        messages.append({"role": "model", "content": example["output"]})
    elif "question" in example and "answer" in example:
        messages.append({"role": "user", "content": example["question"]})
        messages.append({"role": "model", "content": example["answer"]})
    elif "prompt" in example and "response" in example:
        messages.append({"role": "user", "content": example["prompt"]})
        messages.append({"role": "model", "content": example["response"]})
    else:
        raise ValueError(f"Unknown dataset format. Keys: {list(example.keys())}")

    return {"messages": messages}


def load_and_convert(dataset_name, split="train", system_prompt=None, max_samples=None):
    """Load a HuggingFace dataset and convert to Gemma 4 format."""
    print(f"Loading dataset: {dataset_name} (split={split})")
    if max_samples:
        # Slice the split so only the first max_samples rows are loaded
        dataset = load_dataset(dataset_name, split=f"{split}[:{max_samples}]")
    else:
        dataset = load_dataset(dataset_name, split=split)
    print(f"Loaded {len(dataset)} examples")

    converted = []
    errors = 0
    for example in dataset:
        try:
            converted.append(convert_to_gemma4_chat(example, system_prompt))
        except ValueError as e:
            # Print only the first failure; the rest are counted and summarized
            if errors == 0:
                print(f"  Warning: {e}")
            errors += 1

    if errors:
        print(f"  Skipped {errors} examples due to format errors")
    print(f"Converted {len(converted)} examples to Gemma 4 chat format")
    return converted


def save_jsonl(data, output_path):
    """Save data as JSONL file."""
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    # Explicit encoding keeps the output identical across platforms
    with open(output_path, "w", encoding="utf-8") as f:
        for item in data:
            f.write(json.dumps(item) + "\n")
    print(f"Saved {len(data)} examples to {output_path}")


def main():
    parser = argparse.ArgumentParser(description="Prepare dataset for Gemma 4 fine-tuning")
    parser.add_argument("--dataset", type=str, required=True,
                        help="HuggingFace dataset name (e.g., 'mlabonne/FineTome-100k')")
    parser.add_argument("--split", type=str, default="train",
                        help="Dataset split to use")
    parser.add_argument("--output", type=str, default="data/processed/train.jsonl",
                        help="Output JSONL file path")
    parser.add_argument("--system-prompt", type=str, default=None,
                        help="System prompt to prepend to every conversation")
    parser.add_argument("--max-samples", type=int, default=None,
                        help="Maximum number of samples to use")
    parser.add_argument("--eval-split", type=float, default=0.05,
                        help="Fraction of data to hold out for evaluation (0 to disable)")
    args = parser.parse_args()

    data = load_and_convert(
        args.dataset,
        split=args.split,
        system_prompt=args.system_prompt,
        max_samples=args.max_samples,
    )

    if args.eval_split > 0 and len(data) > 20:
        # Hold out the last eval_size examples (no shuffling) for evaluation
        eval_size = max(1, int(len(data) * args.eval_split))
        train_data = data[:-eval_size]
        eval_data = data[-eval_size:]
        save_jsonl(train_data, args.output)
        # Build the eval path from the file stem so that an output name
        # without a ".jsonl" suffix cannot overwrite the training file.
        root, ext = os.path.splitext(args.output)
        eval_path = f"{root}_eval{ext}"
        save_jsonl(eval_data, eval_path)
    else:
        save_jsonl(data, args.output)


if __name__ == "__main__":
    main()
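
To make the role normalization concrete, here is a minimal sketch of what the conversion does to two common record layouts. It does not import the script; `to_gemma4_messages` and `ROLE_MAP` are hypothetical, condensed stand-ins for `convert_to_gemma4_chat` that cover only the ShareGPT-style and Alpaca-style branches, and the sample rows are made up for illustration rather than taken from any real dataset.

```python
import json

# Hypothetical condensed stand-in for convert_to_gemma4_chat. It handles only
# the "conversations" (ShareGPT-style) and "instruction" (Alpaca-style) layouts.
ROLE_MAP = {"assistant": "model", "gpt": "model", "bot": "model", "human": "user"}


def to_gemma4_messages(example, system_prompt=None):
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if "conversations" in example:
        for turn in example["conversations"]:
            role = turn.get("role", turn.get("from", ""))
            content = turn.get("content", turn.get("value", ""))
            # Roles missing from ROLE_MAP (e.g. "user", "system") pass through.
            messages.append({"role": ROLE_MAP.get(role, role), "content": content})
    elif "instruction" in example:
        user_content = example["instruction"]
        if example.get("input"):
            user_content += f"\n\nInput: {example['input']}"
        messages.append({"role": "user", "content": user_content})
        messages.append({"role": "model", "content": example["output"]})
    else:
        raise ValueError(f"Unknown dataset format. Keys: {list(example.keys())}")
    return {"messages": messages}


# Made-up sample rows, one per layout.
sharegpt_row = {
    "conversations": [
        {"from": "human", "value": "Translate 'hello' to Vietnamese."},
        {"from": "gpt", "value": "Xin chào."},
    ]
}
alpaca_row = {
    "instruction": "Translate to English.",
    "input": "Cảm ơn",
    "output": "Thank you",
}

sharegpt_out = to_gemma4_messages(sharegpt_row)
alpaca_out = to_gemma4_messages(alpaca_row, system_prompt="You are a translator.")

# Each converted record corresponds to one line of the JSONL output.
for record in (sharegpt_out, alpaca_out):
    print(json.dumps(record))
```

Both rows end up in the same `{"messages": [...]}` shape with `user` and `model` roles, which is why a single JSONL file can mix examples that started out in different source layouts.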