Instructions for using Wizcoderr/qwen-flutter-fused with libraries and local apps.

Libraries

How to use Wizcoderr/qwen-flutter-fused with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Wizcoderr/qwen-flutter-fused")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Wizcoderr/qwen-flutter-fused")
model = AutoModelForCausalLM.from_pretrained("Wizcoderr/qwen-flutter-fused")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use Wizcoderr/qwen-flutter-fused with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Wizcoderr/qwen-flutter-fused")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

vLLM

How to use Wizcoderr/qwen-flutter-fused with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Wizcoderr/qwen-flutter-fused"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
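
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "Wizcoderr/qwen-flutter-fused"


def build_chat_request(prompt, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing a different `base_url` (for example `http://localhost:30000/v1` for an SGLang server) should work the same way, since both servers expose the OpenAI-compatible chat endpoint.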

Use Docker

docker model run hf.co/Wizcoderr/qwen-flutter-fused

SGLang

How to use Wizcoderr/qwen-flutter-fused with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Wizcoderr/qwen-flutter-fused" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Wizcoderr/qwen-flutter-fused" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use Wizcoderr/qwen-flutter-fused with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Wizcoderr/qwen-flutter-fused"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Wizcoderr/qwen-flutter-fused with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Wizcoderr/qwen-flutter-fused

Run Hermes

hermes

MLX LM

How to use Wizcoderr/qwen-flutter-fused with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Wizcoderr/qwen-flutter-fused"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Wizcoderr/qwen-flutter-fused",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner
How to use Wizcoderr/qwen-flutter-fused with Docker Model Runner:
```
docker model run hf.co/Wizcoderr/qwen-flutter-fused
```

GenMobiAi — Qwen2.5-Coder-14B Flutter Specialist

GenMobiAi is a fine-tuned version of Qwen2.5-Coder-14B-Instruct specialized for Flutter and Dart development. It is optimized for agentic code generation, mobile development, and multi-framework orchestration.

Overview

Type: Code Generation + Agentic AI
Parameters: 14.77B
Architecture: Qwen2ForCausalLM (48 layers)
Context Length: 128,000 tokens
Quantization: 4-bit MLX (group_size=64)
Training Method: QLoRA fine-tuning via MLX-LM
Training Data: 311 Flutter/Dart samples from flutter.dev + pub.dev
License: Apache 2.0

Key Features

Flutter Code Generation

Widgets: StatelessWidget, StatefulWidget, custom widgets, Material 3 design
State Management: Provider, Riverpod, GetX, BLoC, MobX patterns
Async Dart: Futures, Streams, isolates, error handling
Architecture: MVVM, Clean Architecture, Repository pattern

Pub.dev Package Intelligence

HTTP clients (Dio, http with interceptors)
Local storage (hive, shared_preferences)
Animations (flutter_animate, lottie)
Testing (widget tests, unit tests with mockito)

Agentic Capabilities

ChatML format with tool-call support (LangGraph-compatible)
Multi-message context preservation
Structured JSON tool responses

Quick Start

Transformers (CPU/GPU)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("your-org/genmobiai-qwen2.5-coder-14b-flutter")
model = AutoModelForCausalLM.from_pretrained(
    "your-org/genmobiai-qwen2.5-coder-14b-flutter",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
    {"role": "user", "content": "Create a Riverpod provider for a shopping cart."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# do_sample=True ensures temperature and top_p take effect
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))

MLX-LM (Apple Silicon, recommended)

python -m mlx_lm.generate \
  --model path/to/genmobiai-qwen2.5-coder-14b-flutter \
  --prompt "Write a Flutter Counter widget with SharedPreferences persistence" \
  --max-tokens 1024 \
  --temp 0.3

vLLM (High-Throughput)

from vllm import LLM, SamplingParams

llm = LLM("path/to/genmobiai-qwen2.5-coder-14b-flutter", max_model_len=8192)
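outputs = llm.generate(
    # End the prompt with the assistant header so the model answers the user turn
    ["<|im_start|>user\nWrite a Flutter auth provider<|im_end|>\n<|im_start|>assistant\n"],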
    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024)
)
print(outputs[0].outputs[0].text)

Ollama

# Serve a GGUF build with llama-cpp-python (convert the model to GGUF first)
python -m llama_cpp.server --model path/genmobiai-q4_k_m.gguf --port 8000

# Or create an Ollama model from a Modelfile
ollama create genmobiai -f - <<EOF
FROM ./genmobiai-q4_k_m.gguf
SYSTEM "You are GenMobiAi, an expert Flutter developer."
PARAMETER temperature 0.3
PARAMETER top_p 0.9
EOF

ollama run genmobiai "Build a Flutter provider for authentication"

Recommended Sampling Parameters

Use Case	Temperature	Top-P	Top-K	Repetition Penalty
Code Generation	0.3	0.9	40	1.05
Complex Logic	0.5	0.95	50	1.0
Agentic Output	0.2	0.85	40	1.1
Creative Patterns	0.7	0.95	50	0.95
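
The presets in the table can be kept in one place and passed to a generation call. This is a minimal sketch: the preset keys and the `sampling_kwargs` helper are hypothetical names, and the values are copied from the table above.

```python
# Presets copied from the table above; the dictionary keys are hypothetical labels.
SAMPLING_PRESETS = {
    "code_generation": {
        "temperature": 0.3, "top_p": 0.9, "top_k": 40, "repetition_penalty": 1.05,
    },
    "complex_logic": {
        "temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0,
    },
    "agentic_output": {
        "temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1,
    },
    "creative_patterns": {
        "temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95,
    },
}


def sampling_kwargs(use_case, max_new_tokens=1024):
    """Return keyword arguments for a sampling-based generate() call."""
    if use_case not in SAMPLING_PRESETS:
        raise KeyError(f"Unknown use case: {use_case!r}")
    return {
        "do_sample": True,
        "max_new_tokens": max_new_tokens,
        **SAMPLING_PRESETS[use_case],
    }
```

With Transformers, the result could be used as `model.generate(**inputs, **sampling_kwargs("code_generation"))`.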

Model Specifications

Architecture

Model Type: Qwen2ForCausalLM
Hidden Size: 5,120
Intermediate Size: 13,824
Num Layers: 48
Num Attention Heads: 40
Num KV Heads: 8
RoPE Theta: 1,000,000
Max Position Embeddings: 128,000

Tokenizer

Type: Qwen2Tokenizer
Vocab Size: 152,064
EOS Token: <|im_end|> (151645)
PAD Token: <|endoftext|> (151643)
Special Tokens: ChatML (<|im_start|>, <|im_end|>) + tool-call markers
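
To illustrate how the ChatML markers frame a conversation, here is a simplified sketch of the prompt layout. The `render_chatml` helper is hypothetical; the tokenizer's own chat template (via `apply_chat_template`) remains authoritative and may add a default system prompt or tool definitions that this sketch omits.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
    {"role": "user", "content": "Create a Riverpod provider for a shopping cart."},
])
print(prompt)
```

Generation normally stops when the model emits `<|im_end|>`, which is why it is listed as the EOS token.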

Quantization (MLX)

Bits: 4
Group Size: 64
Size Reduction: ~28GB (BF16) → ~8.3GB (4-bit)
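
The size figures can be approximated with a back-of-the-envelope calculation. This sketch assumes every parameter is quantized and that each group of 64 weights stores one 16-bit scale and one 16-bit bias; real checkpoints keep some tensors unquantized, so treat the result as an estimate.

```python
PARAMS = 14.77e9   # parameter count from the Overview
BITS = 4           # quantized weight width
GROUP_SIZE = 64    # weights per quantization group
SCALE_BITS = 16    # assumption: one 16-bit scale per group
BIAS_BITS = 16     # assumption: one 16-bit bias per group


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE):
    """Effective bits per weight, including per-group scale and bias overhead."""
    return bits + (SCALE_BITS + BIAS_BITS) / group_size


def size_gb(params, bits):
    """Storage in decimal gigabytes for `params` values at `bits` bits each."""
    return params * bits / 8 / 1e9


bf16_gb = size_gb(PARAMS, 16)
q4_gb = size_gb(PARAMS, bits_per_weight())
print(f"BF16: {bf16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```

Under these assumptions the effective width is 4.5 bits per weight, which gives roughly 8.3 GB for the quantized model and about 29.5 GB (around 27.5 GiB) for BF16.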

Training Configuration

Dataset: 311 Flutter/Dart samples (279 train / 32 eval)
Method: QLoRA via MLX-LM on Apple Silicon
LoRA Rank: 8
Trainable Layers: 16 of 48
Batch Size: 1 | Grad Accumulation: 2
Learning Rate: 1e-5
Max Seq Length: 1,024
Iterations: 1,000
Estimated Training Time: 4–8 hours (M3/M4 24GB)
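
The configuration above implies a small amount of training relative to the dataset. The arithmetic below is a sketch under two assumptions that the card does not state: each iteration processes one batch, and an optimizer update happens once per accumulation cycle.

```python
TRAIN_SAMPLES = 279      # training split size
BATCH_SIZE = 1
GRAD_ACCUMULATION = 2
ITERATIONS = 1000

# Assumption: one batch of BATCH_SIZE sequences is processed per iteration.
sequences_seen = ITERATIONS * BATCH_SIZE
passes_over_data = sequences_seen / TRAIN_SAMPLES

# Gradients from GRAD_ACCUMULATION batches are combined before each update.
effective_batch = BATCH_SIZE * GRAD_ACCUMULATION

# Assumption: one optimizer update per accumulation cycle.
optimizer_steps = ITERATIONS // GRAD_ACCUMULATION

print(f"Passes over the training split: {passes_over_data:.2f}")
print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps: {optimizer_steps}")
```

If the trainer instead counts an iteration as a full accumulation cycle, the number of sequences seen and passes over the data would double.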

Hardware Requirements

Hardware	Memory	Inference Speed	Use Case
Apple M3/M4 (MLX)	16GB+	100+ tok/s @ 4K	Development
RTX 4090 (BF16)	24GB	200+ tok/s	Production
H100 (batched)	80GB	1000+ tok/s	Server
CPU (GGUF Q4)	32GB	10–15 tok/s	Edge

Capabilities & Use Cases

Flutter Development

✅ Widget scaffolding (Material 3, Cupertino, adaptive)
✅ State management patterns (Provider, Riverpod, GetX, BLoC)
✅ REST API integration (Dio, http, interceptors)
✅ Local storage (hive, shared_preferences, file I/O)
✅ Testing (widget tests, unit tests, integration tests)
✅ Platform channels & native integration

Code Quality

Null safety best practices
MVVM + Clean Architecture patterns
Error handling & logging
Performance optimization tips
Documentation & inline comments

Agentic Features

Tool-call support via XML-wrapped JSON
Multi-message context preservation
Chat template integration (ChatML)
LangGraph workflow compatibility
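
On the client side, the XML-wrapped JSON tool calls can be pulled out of the generated text with a small parser. This is a minimal sketch: `extract_tool_calls` and the `search_pub_dev` tool are hypothetical, and the `name`/`arguments` payload shape follows the usual Qwen2.5 convention, which is assumed here rather than confirmed by this card.

```python
import json
import re

# Matches the body of each <tool_call>...</tool_call> block, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the parsed JSON payload of every well-formed tool call in text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
    return calls


# Hypothetical model output containing one tool call.
sample = (
    "I'll look that up.\n"
    "<tool_call>\n"
    '{"name": "search_pub_dev", "arguments": {"query": "riverpod"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
```

Malformed payloads are skipped rather than raised, which may or may not suit a given agent loop; a stricter caller could surface the error to the model instead.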

Limitations

Dataset Size: With only 311 training samples, the model may hallucinate APIs for less-documented packages
Quantization Artifacts: 4-bit weight quantization introduces rounding error relative to the BF16 weights
Vision Tokens: Vocabulary includes inactive image tokens inherited from the Qwen2.5 tokenizer
Context in Practice: MLX 4-bit inference works best at 4K–8K tokens on a 24GB machine
No Formal Benchmarks: Performance was validated empirically, not on standard evals
Dart 3+ Features: Records and sealed classes are only partially covered

Special Tokens

<|endoftext|>      (ID: 151643)  → Padding / Fallback EOS
<|im_start|>       (ID: 151644)  → ChatML message start
<|im_end|>         (ID: 151645)  → ChatML message end (Primary EOS)
<tool_call>        (Custom)       → Agentic tool invocation start (XML wrapper)
</tool_call>       (Custom)       → Agentic tool invocation end

Citation

@misc{genmobiai2025,
  title   = {GenMobiAi: Qwen2.5-Coder-14B Fine-tuned for Flutter/Dart Development},
  author  = {GenMobiAi Contributors},
  year    = {2025},
  url     = {https://huggingface.co/your-org/genmobiai-qwen2.5-coder-14b-flutter},
  license = {Apache 2.0}
}

@misc{qwen2_5_coder,
  title  = {Qwen2.5-Coder: A Capable Code Language Model},
  author = {Alibaba Cloud},
  year   = {2024},
  url    = {https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct}
}

License

This model is licensed under the Apache License 2.0.

Base Model: Qwen2.5-Coder-14B-Instruct by Alibaba Cloud (Apache 2.0)
Fine-tuning & Specialization: GenMobiAi Contributors (Apache 2.0)
Training Data: flutter.dev (BSD 3-Clause), pub.dev packages (per-package), Flutter GitHub (BSD 3-Clause)

See LICENSE for full text.

Contributing

Issues or improvements?

Report on GitHub or HF Hub
Submit Flutter patterns to expand the training dataset
Improve documentation

Last Updated: 2025-05-25
Status: Production-Ready
Framework Support: Transformers, MLX-LM, vLLM, llama.cpp, Ollama

Downloads last month: 657

Safetensors
Model size: 15B params
Tensor type: F16, U32
Format: MLX, 4-bit

Model tree for Wizcoderr/qwen-flutter-fused
Base model: Qwen/Qwen2.5-14B
Finetuned: Qwen/Qwen2.5-Coder-14B
Finetuned: Qwen/Qwen2.5-Coder-14B-Instruct
Quantized (102): this model