Instructions for using Wizcoderr/qwen-flutter-fused with libraries and local apps.

Libraries

How to use Wizcoderr/qwen-flutter-fused with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Wizcoderr/qwen-flutter-fused")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Wizcoderr/qwen-flutter-fused")
model = AutoModelForCausalLM.from_pretrained("Wizcoderr/qwen-flutter-fused")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use Wizcoderr/qwen-flutter-fused with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Wizcoderr/qwen-flutter-fused")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

vLLM

How to use Wizcoderr/qwen-flutter-fused with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Wizcoderr/qwen-flutter-fused"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Wizcoderr/qwen-flutter-fused

SGLang

How to use Wizcoderr/qwen-flutter-fused with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Wizcoderr/qwen-flutter-fused" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Wizcoderr/qwen-flutter-fused" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Wizcoderr/qwen-flutter-fused",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use Wizcoderr/qwen-flutter-fused with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Wizcoderr/qwen-flutter-fused"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Wizcoderr/qwen-flutter-fused with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Wizcoderr/qwen-flutter-fused

Run Hermes

hermes

MLX LM

How to use Wizcoderr/qwen-flutter-fused with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Wizcoderr/qwen-flutter-fused"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Wizcoderr/qwen-flutter-fused"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default):
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Wizcoderr/qwen-flutter-fused",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner
How to use Wizcoderr/qwen-flutter-fused with Docker Model Runner:
```
docker model run hf.co/Wizcoderr/qwen-flutter-fused
```



	---
	license: apache-2.0
	language:
	- en
	tags:
	- flutter
	- dart
	- code-generation
	- mobile-development
	- qwen
	- qwen2.5-coder
	- mlx
	- transformers
	- vllm
	- text-generation
	- agentic
	- agent
	library_name: transformers
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-Coder-14B-Instruct
	datasets:
	- flutter_docs_alpaca
	---

	# GenMobiAi — Qwen2.5-Coder-14B Flutter Specialist

	GenMobiAi is a fine-tuned version of [Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) specialized for Flutter and Dart development. It is optimized for agentic code generation, mobile development, and multi-framework orchestration.

	## Overview

	- Type: Code Generation + Agentic AI
	- Parameters: 14.77B
	- Architecture: Qwen2ForCausalLM (48 layers)
	- Context Length: 128,000 tokens
	- Quantization: 4-bit MLX (group_size=64)
	- Training Method: QLoRA fine-tuning via MLX-LM
	- Training Data: 311 Flutter/Dart samples from flutter.dev + pub.dev
	- License: Apache 2.0

	## Key Features

	### Flutter Code Generation
	- Widgets: StatelessWidget, StatefulWidget, custom widgets, Material 3 design
	- State Management: Provider, Riverpod, GetX, BLoC, MobX patterns
	- Async Dart: Futures, Streams, isolates, error handling
	- Architecture: MVVM, Clean Architecture, Repository pattern

	### Pub.dev Package Intelligence
	- HTTP clients (Dio, http with interceptors)
	- Local storage (hive, shared_preferences)
	- Animations (flutter_animate, lottie)
	- Testing (widget tests, unit tests with mockito)

	### Agentic Capabilities
	- ChatML format with tool-call support (LangGraph-compatible)
	- Multi-message context preservation
	- Structured JSON tool responses

	## Quick Start

	### Transformers (CPU/GPU)

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	tokenizer = AutoTokenizer.from_pretrained("your-org/genmobiai-qwen2.5-coder-14b-flutter")
	model = AutoModelForCausalLM.from_pretrained(
	    "your-org/genmobiai-qwen2.5-coder-14b-flutter",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	messages = [
	    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
	    {"role": "user", "content": "Create a Riverpod provider for a shopping cart."},
	]

	# Render the ChatML prompt, then tokenize it
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer([text], return_tensors="pt").to(model.device)
	output = model.generate(
	    **inputs,
	    max_new_tokens=1024,
	    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
	    temperature=0.3,
	    top_p=0.9,
	)
	# Decode only the newly generated tokens, skipping the prompt
	new_tokens = output[0][inputs["input_ids"].shape[-1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))
	```

	### MLX-LM (Apple Silicon, recommended)

	```bash
	python -m mlx_lm.generate \
	    --model path/to/genmobiai-qwen2.5-coder-14b-flutter \
	    --prompt "Write a Flutter Counter widget with SharedPreferences persistence" \
	    --max-tokens 1024 \
	    --temp 0.3
	```

	### vLLM (High-Throughput)

	```python
	from vllm import LLM, SamplingParams

	llm = LLM("path/to/genmobiai-qwen2.5-coder-14b-flutter", max_model_len=8192)
	# ChatML prompt; the trailing assistant header cues the model to start its reply
	prompt = "<|im_start|>user\nWrite a Flutter auth provider<|im_end|>\n<|im_start|>assistant\n"
	outputs = llm.generate(
	    [prompt],
	    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024),
	)
	print(outputs[0].outputs[0].text)
	```

	### Ollama

	```bash
	# Convert the model to GGUF first, then serve the GGUF file with the llama-cpp-python server
	python -m llama_cpp.server --model path/genmobiai-q4_k_m.gguf --port 8000

	# Or create an Ollama model from a Modelfile
	ollama create genmobiai -f - <<EOF
	FROM ./genmobiai-q4_k_m.gguf
	SYSTEM "You are GenMobiAi, an expert Flutter developer."
	PARAMETER temperature 0.3
	PARAMETER top_p 0.9
	EOF

	ollama run genmobiai "Build a Flutter provider for authentication"
	```

	## Recommended Sampling Parameters

	| Use Case | Temperature | Top-P | Top-K | Repetition Penalty |
	|----------|-------------|-------|-------|--------------------|
	| Code Generation | 0.3 | 0.9 | 40 | 1.05 |
	| Complex Logic | 0.5 | 0.95 | 50 | 1.0 |
	| Agentic Output | 0.2 | 0.85 | 40 | 1.1 |
	| Creative Patterns | 0.7 | 0.95 | 50 | 0.95 |
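
The table above can be turned into reusable presets. The following is a minimal sketch: the preset keys and the `sampling_kwargs` helper are hypothetical names introduced here, while the keyword names follow the Transformers `generate()` convention.

```python
# Sampling presets copied from the table above; the dictionary keys are illustrative names.
SAMPLING_PRESETS = {
    "code_generation": {"temperature": 0.3, "top_p": 0.9, "top_k": 40, "repetition_penalty": 1.05},
    "complex_logic": {"temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0},
    "agentic_output": {"temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1},
    "creative_patterns": {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95},
}


def sampling_kwargs(use_case: str, max_new_tokens: int = 1024) -> dict:
    """Return generate()-style keyword arguments for a named use case (hypothetical helper)."""
    if use_case not in SAMPLING_PRESETS:
        raise KeyError(f"unknown use case {use_case!r}; choose from {sorted(SAMPLING_PRESETS)}")
    # do_sample=True is required for temperature/top_p/top_k to have any effect.
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING_PRESETS[use_case]}
```

With a loaded model this could be used as `model.generate(**inputs, **sampling_kwargs("code_generation"))`.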

	## Model Specifications

	### Architecture
	- Model Type: Qwen2ForCausalLM
	- Hidden Size: 5,120
	- Intermediate Size: 13,824
	- Num Layers: 48
	- Num Attention Heads: 40
	- Num KV Heads: 8
	- RoPE Theta: 1,000,000
	- Max Position Embeddings: 128,000
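
The architecture values above determine how much memory the KV cache needs per token. The sketch below is a rough estimate under two assumptions not stated in the card: the head dimension equals hidden size divided by the number of attention heads, and the cache is stored unquantized at 16 bits per value.

```python
# Architecture values taken from the list above.
HIDDEN_SIZE = 5120
NUM_LAYERS = 48
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8


def kv_cache_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size for one sequence (assumes a 16-bit, unquantized cache by default)."""
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 5120 / 40 = 128
    # Factor 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * bytes_per_value
    return per_token * num_tokens


print(kv_cache_bytes(1))               # 196608 bytes (192 KiB) per token
print(kv_cache_bytes(8192) / 2**30)    # 1.5 GiB for an 8K-token context
```

Because only 8 of the 40 heads carry keys and values, the cache is one fifth of what full multi-head attention would require at the same size.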

	### Tokenizer
	- Type: Qwen2Tokenizer
	- Vocab Size: 152,064
	- EOS Token: `<|im_end|>` (151645)
	- PAD Token: `<|endoftext|>` (151643)
	- Special Tokens: ChatML (`<|im_start|>`, `<|im_end|>`) + tool-call markers
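
To illustrate how the ChatML markers frame a conversation, here is a simplified sketch. The `to_chatml` helper is hypothetical; the tokenizer's own chat template may add a default system prompt or tool sections, so `tokenizer.apply_chat_template` remains the safer choice in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string (simplified sketch)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # An open assistant header cues the model to begin its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```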

	### Quantization (MLX)
	- Bits: 4
	- Group Size: 64
	- Size Reduction: ~28GB (BF16) → ~8.3GB (4-bit)
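
The size figures above can be approximated from the parameter count. This is a back-of-the-envelope sketch that assumes every parameter is quantized and that each group of 64 weights carries a 16-bit scale and a 16-bit bias (32 bits of overhead per group); the actual checkpoint may keep some tensors at higher precision.

```python
NUM_PARAMS = 14.77e9  # parameter count from the Overview section


def quantized_size_bytes(num_params, bits=4, group_size=64, overhead_bits_per_group=32):
    """Estimate weight storage; the per-group overhead is an assumption, not a measured value."""
    bits_per_weight = bits + overhead_bits_per_group / group_size  # 4 + 32/64 = 4.5
    return num_params * bits_per_weight / 8


four_bit_gb = quantized_size_bytes(NUM_PARAMS) / 1e9
bf16_gb = quantized_size_bytes(NUM_PARAMS, bits=16, overhead_bits_per_group=0) / 1e9
print(round(four_bit_gb, 2))  # about 8.31 GB, in line with the ~8.3GB above
print(round(bf16_gb, 2))      # about 29.54 GB, i.e. roughly 27.5 GiB
```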

	## Training Configuration

	- Dataset: 311 Flutter/Dart samples (279 train / 32 eval)
	- Method: QLoRA via MLX-LM on Apple Silicon
	- LoRA Rank: 8
	- Trainable Layers: 16 of 48
	- Batch Size: 1
	- Gradient Accumulation: 2
	- Learning Rate: 1e-5
	- Max Seq Length: 1,024
	- Iterations: 1,000
	- Estimated Training Time: 4–8 hours (M3/M4 24GB)
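
The configuration above implies an effective batch size and an approximate number of passes over the training split. The card does not say whether an iteration counts one micro-batch or one optimizer step, so this sketch computes both readings rather than asserting either.

```python
# Values taken from the training configuration above.
TRAIN_SAMPLES = 279
BATCH_SIZE = 1
GRAD_ACCUMULATION = 2
ITERATIONS = 1000

# Samples contributing to each optimizer update.
effective_batch = BATCH_SIZE * GRAD_ACCUMULATION

# Reading 1 (assumption): each iteration processes one micro-batch.
epochs_if_micro_batch = ITERATIONS * BATCH_SIZE / TRAIN_SAMPLES
# Reading 2 (assumption): each iteration is a full optimizer step.
epochs_if_optimizer_step = ITERATIONS * effective_batch / TRAIN_SAMPLES

print(effective_batch)                     # 2
print(round(epochs_if_micro_batch, 2))     # 3.58
print(round(epochs_if_optimizer_step, 2))  # 7.17
```

Either way the model sees each of the 279 training samples several times, which is relevant to the dataset-size limitation noted later in the card.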

	## Hardware Requirements

	| Hardware | Memory | Inference Speed | Use Case |
	|----------|--------|-----------------|----------|
	| Apple M3/M4 (MLX) | 16GB+ | 100+ tok/s @ 4K | Development |
	| RTX 4090 (BF16) | 24GB | 200+ tok/s | Production |
	| H100 (batched) | 80GB | 1000+ tok/s | Server |
	| CPU (GGUF Q4) | 32GB | 10–15 tok/s | Edge |

	## Capabilities & Use Cases

	### Flutter Development
	- ✅ Widget scaffolding (Material 3, Cupertino, adaptive)
	- ✅ State management patterns (Provider, Riverpod, GetX, BLoC)
	- ✅ REST API integration (Dio, http, interceptors)
	- ✅ Local storage (hive, shared_preferences, file I/O)
	- ✅ Testing (widget tests, unit tests, integration tests)
	- ✅ Platform channels & native integration

	### Code Quality
	- Null safety best practices
	- MVVM + Clean Architecture patterns
	- Error handling & logging
	- Performance optimization tips
	- Documentation & inline comments

	### Agentic Features
	- Tool-call support via XML-wrapped JSON
	- Multi-message context preservation
	- Chat template integration (ChatML)
	- LangGraph workflow compatibility
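
A client consuming the XML-wrapped JSON tool calls listed above needs to pull the payloads out of the generated text. The sketch below shows one way to do that. The `extract_tool_calls` helper and the `pub_search` tool are hypothetical, and the `{"name", "arguments"}` payload shape is an assumption based on the usual Qwen2.5 convention rather than something the card specifies.

```python
import json
import re

# Non-greedy match so that several tool calls in one reply are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the JSON payloads found inside <tool_call>...</tool_call> blocks."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the whole reply
    return calls


# Hypothetical model output containing a single tool call.
sample = (
    "Let me check.\n"
    '<tool_call>\n{"name": "pub_search", "arguments": {"query": "riverpod"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

Skipping malformed payloads keeps an agent loop running when the model emits invalid JSON; a stricter client might raise instead so the failure can be retried.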

	## Limitations

	1. Dataset Size: With only 311 training samples, the model may hallucinate APIs for less-documented packages
	2. Quantization Artifacts: 4-bit weight quantization introduces rounding error that can reduce output quality relative to BF16
	3. Vision Tokens: The vocabulary includes inactive image tokens inherited from the shared Qwen2.5 tokenizer
	4. Context in Practice: MLX 4-bit inference works best with 4K–8K-token contexts on a 24GB machine
	5. No Formal Benchmarks: Performance was validated empirically, not on standard evals
	6. Dart 3+ Features: Records and sealed classes are only partially covered

	## Special Tokens

	```
	<|endoftext|> (ID: 151643) → Padding / Fallback EOS
	<|im_start|> (ID: 151644) → ChatML message start
	<|im_end|> (ID: 151645) → ChatML message end (Primary EOS)
	<tool_call> (Custom) → Agentic tool invocation start (XML wrapper)
	</tool_call> (Custom) → Agentic tool invocation end
	```

	## Citation

	```bibtex
	@misc{genmobiai2025,
	title = {GenMobiAi: Qwen2.5-Coder-14B Fine-tuned for Flutter/Dart Development},
	author = {GenMobiAi Contributors},
	year = {2025},
	url = {https://huggingface.co/your-org/genmobiai-qwen2.5-coder-14b-flutter},
	license = {Apache 2.0}
	}

	@misc{qwen2_5_coder,
	title = {Qwen2.5-Coder: A Capable Code Language Model},
	author = {Alibaba Cloud},
	year = {2024},
	url = {https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct}
	}
	```

	## License

	This model is licensed under the Apache License 2.0.

	- Base Model: Qwen2.5-Coder-14B-Instruct by Alibaba Cloud (Apache 2.0)
	- Fine-tuning & Specialization: GenMobiAi Contributors (Apache 2.0)
	- Training Data: flutter.dev (BSD 3-Clause), pub.dev packages (per-package), Flutter GitHub (BSD 3-Clause)

	See [LICENSE](./LICENSE) for full text.

	## Contributing

	Issues or improvements?
	- Report on [GitHub](https://github.com/your-org/genmobiai) or [HF Hub](https://huggingface.co/your-org/genmobiai-qwen2.5-coder-14b-flutter)
	- Submit Flutter patterns to expand the training dataset
	- Improve documentation

	---

	- Last Updated: 2025-05-25
	- Status: Production-Ready
	- Framework Support: Transformers, MLX-LM, vLLM, llama.cpp, Ollama