---
library_name: peft
license: apache-2.0
base_model: google/gemma-4-31B-it
tags:
- axolotl
- base_model:adapter:google/gemma-4-31B-it
- lora
- transformers
datasets:
- ConicCat/Mura_Books
pipeline_tag: text-generation
model-index:
- name: Writer-Stage-2
  results: []
---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.16.0.dev0`
```yaml
base_model: google/gemma-4-31B-it

load_in_8bit: false
load_in_4bit: false

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
- axolotl.integrations.liger.LigerPlugin
torch_compile: false
liger_layer_norm: true
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_rms_norm_gated: true
strict: false

sequence_len: 2048
max_sample_length: 2048

flash_attention: false
sdp_attention: true

sample_packing: true
gradient_checkpointing: true
activation_offloading: true

bf16: true
tf32: true

lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false

datasets:
- path: ConicCat/Mura_Books
  type: chat_template

chat_template_jinja: >
  {%- macro strip_thinking(text) -%}
    {%- set ns = namespace(result='') -%}
    {%- for part in text.split('<channel|>') -%}
      {%- if '<|channel>' in part -%}
        {%- set ns.result = ns.result + part.split('<|channel>')[0] -%}
      {%- else -%}
        {%- set ns.result = ns.result + part -%}
      {%- endif -%}
    {%- endfor -%}
    {{- ns.result | trim -}}
  {%- endmacro -%}

  {%- set loop_messages = messages -%}
  {{ bos_token }}

  {#- Handle System Definitions Block -#}
  {%- if (enable_thinking is defined and enable_thinking) or messages[0]['role'] in ['system', 'developer'] -%}
    {{- '<|turn>system\n' -}}

    {#- Inject Thinking token at the very top of the FIRST system turn -#}
    {%- if enable_thinking is defined and enable_thinking -%}
      {{- '<|think|>' -}}
    {%- endif -%}

    {%- if messages[0]['role'] in ['system', 'developer'] -%}
      {{- messages[0]['content'] | trim -}}
      {%- set loop_messages = messages[1:] -%}
    {%- endif -%}

    {{- '<turn|>\n' -}}
  {%- endif %}

  {#- Loop through messages -#}
  {%- for message in loop_messages -%}
    {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
    {{- '<|turn>' + role + '\n' -}}

    {#- Flag to identify the final SFT turn -#}
    {%- set is_final_sft_turn = loop.last and not add_generation_prompt -%}

    {%- if message['content'] is string -%}
      {%- if role == 'model' -%}
        {%- if is_final_sft_turn and '<|channel>thought' not in message['content'] -%}
          {{- '<|channel>thought\n<channel|>' -}}
        {%- endif -%}
        {{- strip_thinking(message['content']) -}}
      {%- else -%}
        {{- message['content'] | trim -}}
      {%- endif -%}
    {%- elif message['content'] is sequence -%}
      {%- set ns = namespace(has_thinking=false) -%}
      {%- for item in message['content'] -%}
        {%- if item['type'] == 'text' and '<|channel>thought' in item['text'] -%}
          {%- set ns.has_thinking = true -%}
        {%- endif -%}
      {%- endfor -%}

      {%- if role == 'model' and is_final_sft_turn and not ns.has_thinking -%}
        {{- '<|channel>thought\n<channel|>' -}}
      {%- endif -%}

      {%- for item in message['content'] -%}
        {%- if item['type'] == 'text' -%}
          {%- if role == 'model' -%}
            {{- strip_thinking(item['text']) -}}
          {%- else -%}
            {{- item['text'] | trim -}}
          {%- endif -%}
        {%- endif -%}
      {%- endfor -%}
    {%- endif -%}

    {{- '<turn|>\n' -}}
  {%- endfor -%}

  {#- Generation Prompt handled as normal (serves as the final turn when true) -#}
  {%- if add_generation_prompt -%}
    {{- '<|turn>model\n' -}}
    {%- if not enable_thinking | default(false) -%}
      {{- '<|channel>thought\n<channel|>' -}}
    {%- endif -%}
  {%- endif -%}

adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.0
lora_bias: None
lora_target_modules: 'model.language_model.layers.[\d]+.(_checkpoint_wrapped_module.)?(mlp|self_attn).(up|down|gate|q|k|v|o)_proj'
use_tensorboard: true

optimizer: paged_adamw_8bit
learning_rate: 2.5e-5 # 1e-4 / 4
loraplus_lr_ratio: 16

# Training arguments
output_dir: ./Writer-Stage-2
num_epochs: 11
micro_batch_size: 2
gradient_accumulation_steps: 4
save_strategy: 'no'
warmup_ratio: 0.05
lr_scheduler: 'cosine'
max_grad_norm: 1
logging_steps: 1
seed: 42

eot_tokens:
- "<turn|>"

push_dataset_to_hub: ConicCat/Gemma4-Mura
hf_use_auth_token: true
```

</details><br>
# Writer-Stage-2

This model is a LoRA adapter for [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it), trained on the ConicCat/Mura_Books dataset.
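
To run the adapter, load the base model and attach the PEFT weights on top. A minimal sketch, assuming the adapter is published under the (hypothetical) repo id `ConicCat/Writer-Stage-2` taken from `output_dir`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-4-31B-it"
adapter_id = "ConicCat/Writer-Stage-2"  # hypothetical repo id, inferred from output_dir

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    dtype=torch.bfloat16,  # training ran in bf16 (`torch_dtype` on older transformers)
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter
```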
## Model description

Writer-Stage-2 is a rank-32 LoRA adapter (alpha 64, dropout 0.0) over the attention and MLP projections of gemma-4-31B-it, trained with axolotl. As the name suggests, it is the second stage of a writing-focused fine-tune. Conversations use the custom chat template embedded in the config above, with `<|turn>`/`<turn|>` turn delimiters and an optional `<|channel>thought` reasoning channel gated by `enable_thinking`.
## Intended uses & limitations

The adapter is intended for chat-style text generation, with prompts rendered through the custom template above and generation stopped at the `<turn|>` end-of-turn token (`eot_tokens` in the config). It has not been evaluated here on other prompt formats or tasks, and it inherits the limitations of the base model.
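
Continuing from the loading sketch above, prompts should be rendered through the custom chat template. This sketch assumes `chat_template` holds the `chat_template_jinja` string from the config, and that `<turn|>` maps to a single tokenizer id:

```python
chat_template = "..."  # paste the chat_template_jinja string from the config above

messages = [
    {"role": "system", "content": "You are a novelist."},
    {"role": "user", "content": "Write an opening paragraph about a lighthouse."},
]

# Renders roughly:
#   <bos><|turn>system\n<|think|>...<turn|>\n<|turn>user\n...<turn|>\n<|turn>model\n
input_ids = tokenizer.apply_chat_template(
    messages,
    chat_template=chat_template,
    add_generation_prompt=True,
    enable_thinking=True,  # injects <|think|>; the model opens its own thought channel
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=tokenizer.convert_tokens_to_ids("<turn|>"),  # stop at end of turn
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
```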
## Training and evaluation data

Training used the [ConicCat/Mura_Books](https://huggingface.co/datasets/ConicCat/Mura_Books) dataset, formatted with the chat template above and sample-packed to a sequence length of 2048. The config defines no evaluation split, so no held-out metrics are available.
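
The dataset is referenced by Hub id in the config and can be inspected directly; the `train` split name is an assumption:

```python
from datasets import load_dataset

ds = load_dataset("ConicCat/Mura_Books", split="train")  # split name assumed
print(ds)
print(ds[0])
```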
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 12 (warmup_ratio 0.05 of 242 training steps)
- training_steps: 242
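
For reference, the LoRA block in the axolotl config maps onto a `peft.LoraConfig` roughly as below; this is a sketch (axolotl builds the config internally), and it cannot express `loraplus_lr_ratio: 16`, which under the LoRA+ scheme trains the LoRA B matrices at 16 × 2.5e-5 = 4e-4.

```python
from peft import LoraConfig

# Approximate peft equivalent of the axolotl LoRA settings above (sketch only).
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    bias="none",
    # peft treats a plain string as a regex over module names,
    # mirroring lora_target_modules from the config.
    target_modules=(
        r"model.language_model.layers.[\d]+."
        r"(_checkpoint_wrapped_module.)?(mlp|self_attn)."
        r"(up|down|gate|q|k|v|o)_proj"
    ),
    task_type="CAUSAL_LM",
)
```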

### Training results

No evaluation dataset was configured for this run, so no validation metrics are reported here; train loss was logged to TensorBoard (`logging_steps: 1`).
### Framework versions

- PEFT 0.19.1
- Transformers 5.5.0
- Pytorch 2.8.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.2
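
To approximate the training environment, the pins below follow the versions listed above; the CUDA 12.8 wheel index for PyTorch is an assumption about how that build was installed:

```bash
pip install "peft==0.19.1" "transformers==5.5.0" "datasets==4.5.0" "tokenizers==0.22.2"
pip install "torch==2.8.0" --index-url https://download.pytorch.org/whl/cu128
```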