Instructions to use Guilherme34/Firefly-v4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Guilherme34/Firefly-v4 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Guilherme34/Firefly-v4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Guilherme34/Firefly-v4")
model = AutoModelForImageTextToText.from_pretrained("Guilherme34/Firefly-v4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Guilherme34/Firefly-v4 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Guilherme34/Firefly-v4"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-v4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/Guilherme34/Firefly-v4
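The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_payload` and `post_chat` are hypothetical (they are not part of vLLM or any client library), and the sketch assumes a server is already listening at the base URL you pass in.

```python
import json
import urllib.request


def build_chat_payload(model, prompt, image_url):
    """Build an OpenAI-style chat body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url, payload, timeout=60.0):
    """POST the payload to <base_url>/v1/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_chat("http://localhost:8000", build_chat_payload("Guilherme34/Firefly-v4", "Describe this image in one sentence.", image_url))` should return a response whose text is typically found under `["choices"][0]["message"]["content"]`, as in other OpenAI-compatible APIs.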
- SGLang
How to use Guilherme34/Firefly-v4 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Guilherme34/Firefly-v4" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-v4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Guilherme34/Firefly-v4" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Guilherme34/Firefly-v4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Unsloth Studio
How to use Guilherme34/Firefly-v4 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Guilherme34/Firefly-v4 to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Guilherme34/Firefly-v4 to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser, then search for Guilherme34/Firefly-v4 to start chatting.
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Guilherme34/Firefly-v4",
    max_seq_length=2048,
)
```
- Docker Model Runner
How to use Guilherme34/Firefly-v4 with Docker Model Runner:
docker model run hf.co/Guilherme34/Firefly-v4
Firefly-v4
Uncensored multimodal roleplay model with switchable reasoning — sees, hears, and creates without limits.
What is Firefly-v4
Firefly-v4 is an uncensored, multimodal roleplay and creative writing model fine-tuned from gemma-4-E2B-it-heretic-ara using Unsloth. It inherits Gemma 4's native vision and audio capabilities — it can see images and process audio as part of a conversation, not as a bolted-on pipeline.
Built for immersive, unfiltered RP and storytelling. No refusals, no moralizing, no character breaks. Works equally well for general creative tasks and instruction following.
- No refusals or alignment guardrails. Full creative freedom for any scenario.
- Natively understands images: describe scenes, react to photos, analyze reference art mid-RP.
- Processes audio input natively via Gemma 4's multimodal architecture.
- Toggle chain-of-thought on or off with a single tag; you control the depth.
🧠 Reasoning Toggle
Prefix your system prompt with <|think|> and the model thinks before it speaks. Leave it out and it responds directly.
Reasoning on (better coherence for complex scenes):
<|think|>You are in a roleplay as a Furry named Blaze.
Reasoning off (faster, more spontaneous output):
You are in a roleplay as a Furry named Blaze.
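The toggle can be handled in code with a small prompt helper. This is a sketch, not an official API: `THINK_TAG`, `build_system_prompt`, and `build_messages` are hypothetical names, and the message layout (a list of typed content parts) is assumed to match the Transformers chat format shown in the usage snippets.

```python
THINK_TAG = "<|think|>"


def build_system_prompt(persona, think):
    """Return the persona prompt, prefixed with the think tag only when reasoning is wanted."""
    # Strip an existing tag first so it is never doubled and can be switched off.
    if persona.startswith(THINK_TAG):
        persona = persona[len(THINK_TAG):]
    return THINK_TAG + persona if think else persona


def build_messages(persona, user_text, think=False):
    """Build a system + user message pair in the typed-content chat format."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": build_system_prompt(persona, think)}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": user_text}],
        },
    ]
```

Passing `think=True` yields a system prompt starting with `<|think|>`. The resulting list can then be handed to `processor.apply_chat_template(...)` or `pipe(text=...)` in place of the `messages` variable, assuming the model's chat template accepts a system role.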
Training
| Field | Value |
|---|---|
| base_model | p-e-w/gemma-4-E2B-it-heretic-ara |
| method | Fine-tuned with Unsloth |
| modalities | Text + Vision + Audio (native Gemma 4) |
| focus | Uncensored roleplay, creative writing, hybrid reasoning |
📜 License
Firefly-v4 Attribution License
Based on Guilherme34/Firefly-v4
Attribution must include the model name (Firefly-v4) and the author (Guilherme34), with a link to the original Hugging Face repo when possible.
This model is uncensored and will generate content without built-in refusals. It is intended for creative fiction and roleplay between consenting adults. The creator is not responsible for how the model is used. Do not use it to produce content that is illegal in your jurisdiction.