Instructions for using Guilherme34/Firefly-v4 with libraries and local apps.

Libraries

How to use Guilherme34/Firefly-v4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Guilherme34/Firefly-v4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Guilherme34/Firefly-v4")
model = AutoModelForMultimodalLM.from_pretrained("Guilherme34/Firefly-v4", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Guilherme34/Firefly-v4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Guilherme34/Firefly-v4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-v4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
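
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is reachable at http://localhost:8000. The helper names `build_chat_payload` and `chat` are illustrative and are not part of vLLM, and the `max_tokens` value is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: the vLLM server from the command above listens locally on port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Guilherme34/Firefly-v4"


def build_chat_payload(prompt, image_url, max_tokens=128):
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to /chat/completions and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat(build_chat_payload("Describe this image in one sentence.", image_url)))` should print the model's reply, where `image_url` is any publicly reachable image address.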

Use Docker

docker model run hf.co/Guilherme34/Firefly-v4

SGLang

How to use Guilherme34/Firefly-v4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Guilherme34/Firefly-v4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-v4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
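
The request above points at a remote image. For a file on disk, a common approach with OpenAI-compatible servers is to embed the image as a base64 data URL in the same `image_url` field. The sketch below assumes the server accepts data URLs. The helpers `image_to_data_url` and `image_part` are hypothetical names introduced here for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_part(path):
    """Wrap a local image as an `image_url` content part for a chat request."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The dictionary returned by `image_part` can replace the `image_url` entry in the `content` list of the request body.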

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Guilherme34/Firefly-v4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Guilherme34/Firefly-v4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Guilherme34/Firefly-v4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Guilherme34/Firefly-v4 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Guilherme34/Firefly-v4 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Guilherme34/Firefly-v4 to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Guilherme34/Firefly-v4",
    max_seq_length=2048,
)

Docker Model Runner

How to use Guilherme34/Firefly-v4 with Docker Model Runner:

```
docker model run hf.co/Guilherme34/Firefly-v4
```

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

🔥

Firefly-v4

Uncensored multimodal roleplay model with switchable reasoning — sees, hears, and creates without limits.

text-generation vision audio unsloth gemma4

What is Firefly-v4

Firefly-v4 is an uncensored, multimodal roleplay and creative writing model fine-tuned from gemma-4-E2B-it-heretic-ara using Unsloth. It inherits Gemma 4's native vision and audio capabilities — it can see images and process audio as part of a conversation, not as a bolted-on pipeline.

Built for immersive, unfiltered RP and storytelling. No refusals, no moralizing, no character breaks. Works equally well for general creative tasks and instruction following.

🔓

Uncensored

No refusals or alignment guardrails. Full creative freedom for any scenario.

👁️

Vision

Natively understands images — describe scenes, react to photos, analyze reference art mid-RP.

🎧

Audio

Processes audio input natively via Gemma 4's multimodal architecture.

🧠

Switchable Reasoning

Toggle chain-of-thought on or off with a single tag — you control the depth.

🧠 Reasoning Toggle

Prefix your system prompt with <|think|> and the model thinks before it speaks. Leave it out and it responds directly.

⚡ THINKING ON

<|think|>You are in a roleplay as a Furry named Blaze.

Better coherence for complex scenes.

○ THINKING OFF

You are in a roleplay as a Furry named Blaze.

Faster, more spontaneous output.
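
The toggle can be wrapped in a small helper so the same persona works in both modes. This is a sketch, not an official API: `build_system_prompt` and `build_messages` are hypothetical names, and the message layout follows the chat format used in the Transformers examples, on the assumption that the chat template accepts a system turn in that form.

```python
THINK_TAG = "<|think|>"


def build_system_prompt(persona, thinking=False):
    """Return the system prompt, prefixed with the think tag when reasoning is on."""
    # Strip an existing tag first so the prefix is never doubled.
    if persona.startswith(THINK_TAG):
        persona = persona[len(THINK_TAG):]
    return THINK_TAG + persona if thinking else persona


def build_messages(persona, user_text, thinking=False):
    """Build a system + user message list in the chat format used by the examples."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": build_system_prompt(persona, thinking)}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": user_text}],
        },
    ]
```

Calling `build_messages(persona, user_text, thinking=True)` yields a system prompt that starts with `<|think|>`. With `thinking=False` the persona text is passed through unchanged.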

Training

base_model	p-e-w/gemma-4-E2B-it-heretic-ara
method	Fine-tuned with Unsloth
modalities	Text + Vision + Audio (native Gemma 4)
focus	Uncensored roleplay, creative writing, hybrid reasoning

📜 License

Firefly-v4 Attribution License

1. Free for personal and non-commercial use. You may use, modify, merge, and distribute this model freely for any personal, educational, or non-commercial purpose.

2. Commercial use requires attribution. Any commercial product, service, API, or offering that uses this model — or any derivative, merge, or quantization of it — must prominently include the following attribution:

Based on Guilherme34/Firefly-v4

Must include the model name (Firefly-v4) and author (Guilherme34), linking to the original HuggingFace repo when possible.

3. Derivatives inherit this license. Any fine-tune, merge, distillation, or quantization must carry forward the same attribution terms.

4. No implied endorsement. Attribution does not imply endorsement by the original author.

5. No warranty. Provided "as-is" without warranty. The author is not liable for any damages arising from use.

⚠️ Disclaimer

This model is uncensored and will generate content without built-in refusals. It is intended for creative fiction and roleplay between consenting adults. The creator is not responsible for how the model is used. Do not use it to produce content that is illegal in your jurisdiction.

Made with 🔥 by Guilherme34