GenMobiAi — Qwen2.5-Coder-14B Flutter Specialist
GenMobiAi is a fine-tuned version of Qwen2.5-Coder-14B-Instruct specialized for Flutter and Dart development. It is optimized for agentic code generation, mobile development, and multi-framework orchestration.
Overview
Type: Code Generation + Agentic AI
Parameters: 14.77B
Architecture: Qwen2ForCausalLM (48 layers)
Context Length: 128,000 tokens
Quantization: 4-bit MLX (group_size=64)
Training Method: QLoRA fine-tuning via MLX-LM
Training Data: 311 Flutter/Dart samples from flutter.dev + pub.dev
License: Apache 2.0
Key Features
Flutter Code Generation
- Widgets: StatelessWidget, StatefulWidget, custom widgets, Material 3 design
- State Management: Provider, Riverpod, GetX, BLoC, MobX patterns
- Async Dart: Futures, Streams, isolates, error handling
- Architecture: MVVM, Clean Architecture, Repository pattern
Pub.dev Package Intelligence
- HTTP clients (Dio, http with interceptors)
- Local storage (hive, shared_preferences)
- Animations (flutter_animate, lottie)
- Testing (widget tests, unit tests with mockito)
Agentic Capabilities
- ChatML format with tool-call support (LangGraph-compatible)
- Multi-message context preservation
- Structured JSON tool responses
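As a rough illustration of the ChatML format listed above, the sketch below renders a message list into a prompt string by hand. The helper name `render_chatml` is hypothetical, and the tokenizer's own chat template (used through `apply_chat_template`) may differ in details such as a default system prompt or tool definitions, so this is only meant to show the layout.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages as ChatML text (hypothetical helper, for illustration only)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # An open assistant header signals that the model should write the next turn.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


demo_messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
    {"role": "user", "content": "Create a Riverpod provider for a shopping cart."},
]
print(render_chatml(demo_messages))
```

For real inference, prefer `tokenizer.apply_chat_template`, which uses the template shipped with the model.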
Quick Start
Transformers (CPU/GPU)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace the placeholder repo ID with the actual model path.
tokenizer = AutoTokenizer.from_pretrained("your-org/genmobiai-qwen2.5-coder-14b-flutter")
model = AutoModelForCausalLM.from_pretrained(
    "your-org/genmobiai-qwen2.5-coder-14b-flutter",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
    {"role": "user", "content": "Create a Riverpod provider for a shopping cart."},
]

# Render the ChatML prompt, ending with the assistant header.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
MLX-LM (Apple Silicon, recommended)
python -m mlx_lm.generate \
  --model path/to/genmobiai-qwen2.5-coder-14b-flutter \
  --prompt "Write a Flutter Counter widget with SharedPreferences persistence" \
  --max-tokens 1024 \
  --temp 0.3
vLLM (High-Throughput)
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/genmobiai-qwen2.5-coder-14b-flutter", max_model_len=8192)

# ChatML prompt; the trailing assistant header tells the model to start its reply.
prompt = "<|im_start|>user\nWrite a Flutter auth provider<|im_end|>\n<|im_start|>assistant\n"
outputs = llm.generate(
    [prompt],
    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024),
)
print(outputs[0].outputs[0].text)
Ollama
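# Both options need a GGUF file; convert the model to GGUF first (conversion step not shown).
# Option 1: serve the GGUF file with llama-cpp-python
python -m llama_cpp.server --model path/genmobiai-q4_k_m.gguf --port 8000
# Option 2: register the GGUF file with Ollama using an inline Modelfile
ollama create genmobiai -f - <<EOF
FROM ./genmobiai-q4_k_m.gguf
SYSTEM "You are GenMobiAi, an expert Flutter developer."
PARAMETER temperature 0.3
PARAMETER top_p 0.9
EOF
ollama run genmobiai "Build a Flutter provider for authentication"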
Recommended Sampling Parameters
| Use Case | Temperature | Top-P | Top-K | Repetition Penalty |
|---|---|---|---|---|
| Code Generation | 0.3 | 0.9 | 40 | 1.05 |
| Complex Logic | 0.5 | 0.95 | 50 | 1.0 |
| Agentic Output | 0.2 | 0.85 | 40 | 1.1 |
| Creative Patterns | 0.7 | 0.95 | 50 | 0.95 |
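One way to keep these recommendations in code is a small preset lookup, sketched below. The names `SAMPLING_PRESETS` and `get_sampling_params` are hypothetical; the keys follow the parameter names that Transformers `generate` and vLLM `SamplingParams` both use (`temperature`, `top_p`, `top_k`, `repetition_penalty`), and the values are copied from the table.

```python
from typing import Dict

# Values copied from the "Recommended Sampling Parameters" table.
SAMPLING_PRESETS: Dict[str, Dict[str, float]] = {
    "code_generation": {"temperature": 0.3, "top_p": 0.9, "top_k": 40, "repetition_penalty": 1.05},
    "complex_logic": {"temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0},
    "agentic_output": {"temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1},
    "creative_patterns": {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95},
}


def get_sampling_params(use_case: str) -> Dict[str, float]:
    """Return a copy of the preset for a use case (hypothetical helper)."""
    if use_case not in SAMPLING_PRESETS:
        known = ", ".join(sorted(SAMPLING_PRESETS))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}")
    return dict(SAMPLING_PRESETS[use_case])


print(get_sampling_params("code_generation"))
```

With Transformers, the returned dictionary could be passed as `model.generate(**inputs, do_sample=True, **params)`.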
Model Specifications
Architecture
- Model Type: Qwen2ForCausalLM
- Hidden Size: 5,120
- Intermediate Size: 13,824
- Num Layers: 48
- Num Attention Heads: 40
- Num KV Heads: 8
- RoPE Theta: 1,000,000
- Max Position Embeddings: 128,000
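These figures are enough for a back-of-the-envelope estimate of KV-cache memory at a given context length. The sketch below assumes a 16-bit cache, a head dimension of hidden size divided by attention heads, and no cache quantization or paging; the function name is hypothetical and real runtimes may allocate differently.

```python
# Architecture values from the list above.
HIDDEN_SIZE = 5120
NUM_LAYERS = 48
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size in bytes (assumes a 16-bit cache by default)."""
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 5120 / 40 = 128
    # Factor 2 covers keys and values; with grouped-query attention only KV heads are cached.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * bytes_per_value
    return per_token * num_tokens


for tokens in (4096, 8192, 128_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.2f} GiB of KV cache")
```

Under these assumptions the cache costs 192 KiB per token, so an 8K context needs about 1.5 GiB on top of the weights, while the full 128,000-token window would need roughly 23 GiB.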
Tokenizer
- Type: Qwen2Tokenizer
- Vocab Size: 152,064
- EOS Token: <|im_end|> (151645)
- PAD Token: <|endoftext|> (151643)
- Special Tokens: ChatML (<|im_start|>, <|im_end|>) + tool-call markers
Quantization (MLX)
- Bits: 4
- Group Size: 64
- Size Reduction: ~28GB (BF16) → ~8.3GB (4-bit)
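The size figures can be roughly reproduced from the parameter count. The sketch below assumes that every parameter is quantized and that each group of 64 weights carries one 16-bit scale and one 16-bit bias, which works out to 4.5 bits per weight. Both points are assumptions, not details confirmed by this card, and some tensors may in practice stay in higher precision.

```python
PARAMS = 14.77e9           # parameter count from the Overview
BITS = 4                   # quantized bits per weight
GROUP_SIZE = 64            # weights sharing one scale/bias pair
SCALE_BIAS_BITS = 2 * 16   # assumption: one 16-bit scale plus one 16-bit bias per group


def bf16_size_bytes(params: float) -> float:
    """BF16 stores each parameter in 2 bytes."""
    return params * 2


def quantized_size_bytes(params: float, bits: int, group_size: int, overhead_bits: int) -> float:
    """Quantized size including per-group scale/bias overhead."""
    bits_per_weight = bits + overhead_bits / group_size  # 4 + 32/64 = 4.5
    return params * bits_per_weight / 8


bf16 = bf16_size_bytes(PARAMS)
q4 = quantized_size_bytes(PARAMS, BITS, GROUP_SIZE, SCALE_BIAS_BITS)
print(f"BF16 : {bf16 / 1e9:.2f} GB ({bf16 / 2**30:.2f} GiB)")
print(f"4-bit: {q4 / 1e9:.2f} GB ({q4 / 2**30:.2f} GiB)")
```

Under these assumptions the BF16 weights come to about 29.5 GB (27.5 GiB) and the 4-bit weights to about 8.3 GB, in line with the figures quoted above.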
Training Configuration
Dataset: 311 Flutter/Dart samples (279 train / 32 eval)
Method: QLoRA via MLX-LM on Apple Silicon
LoRA Rank: 8
Trainable Layers: 16 of 48
Batch Size: 1 | Grad Accumulation: 2
Learning Rate: 1e-5
Max Seq Length: 1,024
Iterations: 1,000
Estimated Training Time: 4–8 hours (M3/M4 24GB)
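To put the iteration count in perspective, the sketch below estimates how many passes over the 279 training samples the run makes. The card does not say whether an iteration is a single micro-batch or a full optimizer step after gradient accumulation, so both readings are computed; the function name is hypothetical.

```python
TRAIN_SAMPLES = 279
BATCH_SIZE = 1
GRAD_ACCUMULATION = 2
ITERATIONS = 1000


def epochs_seen(iterations: int, batch_size: int, samples: int, accumulation: int = 1) -> float:
    """Approximate number of passes over the training set."""
    return iterations * batch_size * accumulation / samples


# Reading 1 (assumption): one iteration = one micro-batch.
per_micro_batch = epochs_seen(ITERATIONS, BATCH_SIZE, TRAIN_SAMPLES)
# Reading 2 (assumption): one iteration = one optimizer step of 2 accumulated micro-batches.
per_optimizer_step = epochs_seen(ITERATIONS, BATCH_SIZE, TRAIN_SAMPLES, GRAD_ACCUMULATION)

print(f"Effective batch size: {BATCH_SIZE * GRAD_ACCUMULATION}")
print(f"Epochs if iteration = micro-batch   : {per_micro_batch:.2f}")
print(f"Epochs if iteration = optimizer step: {per_optimizer_step:.2f}")
```

Either way the model sees each sample several times (about 3.6 or 7.2 passes), which is worth keeping in mind given the small dataset.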
Hardware Requirements
| Hardware | Memory | Inference Speed | Use Case |
|---|---|---|---|
| Apple M3/M4 (MLX) | 16GB+ | 100+ tok/s @ 4K | Development |
| RTX 4090 (BF16) | 24GB | 200+ tok/s | Production |
| H100 (batched) | 80GB | 1000+ tok/s | Server |
| CPU (GGUF Q4) | 32GB | 10–15 tok/s | Edge |
Capabilities & Use Cases
Flutter Development
- ✅ Widget scaffolding (Material 3, Cupertino, adaptive)
- ✅ State management patterns (Provider, Riverpod, GetX, BLoC)
- ✅ REST API integration (Dio, http, interceptors)
- ✅ Local storage (hive, shared_preferences, file I/O)
- ✅ Testing (widget tests, unit tests, integration tests)
- ✅ Platform channels & native integration
Code Quality
- Null safety best practices
- MVVM + Clean Architecture patterns
- Error handling & logging
- Performance optimization tips
- Documentation & inline comments
Agentic Features
- Tool-call support via XML-wrapped JSON
- Multi-message context preservation
- Chat template integration (ChatML)
- LangGraph workflow compatibility
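A minimal sketch of how XML-wrapped JSON tool calls might be pulled out of generated text is shown below. It assumes each call is a JSON object between `<tool_call>` and `</tool_call>` with `name` and `arguments` keys; the function name `extract_tool_calls` and the example tool `search_pub_dev` are hypothetical, and the exact payload shape should be checked against the model's chat template.

```python
import json
import re
from typing import Any, Dict, List

# Non-greedy match so multiple tool calls in one response are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed JSON object found inside <tool_call> tags."""
    calls: List[Dict[str, Any]] = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the whole response
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


sample_output = (
    "Let me look that up.\n"
    "<tool_call>\n"
    '{"name": "search_pub_dev", "arguments": {"query": "riverpod"}}\n'
    "</tool_call>"
)
for call in extract_tool_calls(sample_output):
    print(call["name"], call["arguments"])
```

Skipping malformed payloads keeps an agent loop running when the model emits broken JSON; a stricter caller could raise instead.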
Limitations
- Dataset Size: With only 311 training samples, the model may hallucinate APIs for less-documented packages
- Quantization Artifacts: 4-bit weight quantization introduces rounding error relative to the BF16 weights
- Vision Tokens: Vocabulary includes image tokens (inactive) from multimodal base
- Context in Practice: MLX 4-bit inference works best at 4K–8K tokens on a 24GB machine
- No Formal Benchmarks: Performance validated empirically, not on standard evals
- Dart 3+ Features: Records and sealed classes are only partially covered
Special Tokens
<|endoftext|>  (ID: 151643)  → Padding / fallback EOS
<|im_start|>   (ID: 151644)  → ChatML message start
<|im_end|>     (ID: 151645)  → ChatML message end (primary EOS)
<tool_call>    (Custom)      → Opens an agentic tool invocation (XML wrapper)
</tool_call>   (Custom)      → Closes an agentic tool invocation
Citation
@misc{genmobiai2025,
title = {GenMobiAi: Qwen2.5-Coder-14B Fine-tuned for Flutter/Dart Development},
author = {GenMobiAi Contributors},
year = {2025},
url = {https://huggingface.co/your-org/genmobiai-qwen2.5-coder-14b-flutter},
license = {Apache 2.0}
}
@misc{qwen2_5_coder,
title = {Qwen2.5-Coder: A Capable Code Language Model},
author = {Alibaba Cloud},
year = {2024},
url = {https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct}
}
License
This model is licensed under the Apache License 2.0.
- Base Model: Qwen2.5-Coder-14B-Instruct by Alibaba Cloud (Apache 2.0)
- Fine-tuning & Specialization: GenMobiAi Contributors (Apache 2.0)
- Training Data: flutter.dev (BSD 3-Clause), pub.dev packages (per-package), Flutter GitHub (BSD 3-Clause)
See LICENSE for full text.
Contributing
Issues or improvements?
- Report on GitHub or HF Hub
- Submit Flutter patterns to expand the training dataset
- Improve documentation
Last Updated: 2025-05-25
Status: Production-Ready
Framework Support: Transformers, MLX-LM, vLLM, llama.cpp, Ollama
Model tree for Wizcoderr/qwen-flutter-fused
Base model
Qwen/Qwen2.5-14B