Instructions to use yarenty/qwen2.5-3B-datafusion-small with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use yarenty/qwen2.5-3B-datafusion-small with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="yarenty/qwen2.5-3B-datafusion-small", filename="qwen2.5-3B-datafusion.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use yarenty/qwen2.5-3B-datafusion-small with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf yarenty/qwen2.5-3B-datafusion-small # Run inference directly in the terminal: llama-cli -hf yarenty/qwen2.5-3B-datafusion-small
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf yarenty/qwen2.5-3B-datafusion-small # Run inference directly in the terminal: llama-cli -hf yarenty/qwen2.5-3B-datafusion-small
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf yarenty/qwen2.5-3B-datafusion-small # Run inference directly in the terminal: ./llama-cli -hf yarenty/qwen2.5-3B-datafusion-small
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf yarenty/qwen2.5-3B-datafusion-small # Run inference directly in the terminal: ./build/bin/llama-cli -hf yarenty/qwen2.5-3B-datafusion-small
Use Docker
docker model run hf.co/yarenty/qwen2.5-3B-datafusion-small
- LM Studio
- Jan
- Ollama
How to use yarenty/qwen2.5-3B-datafusion-small with Ollama:
ollama run hf.co/yarenty/qwen2.5-3B-datafusion-small
- Unsloth Studio new
How to use yarenty/qwen2.5-3B-datafusion-small with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for yarenty/qwen2.5-3B-datafusion-small to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for yarenty/qwen2.5-3B-datafusion-small to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for yarenty/qwen2.5-3B-datafusion-small to start chatting
- Pi new
How to use yarenty/qwen2.5-3B-datafusion-small with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf yarenty/qwen2.5-3B-datafusion-small
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "yarenty/qwen2.5-3B-datafusion-small" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use yarenty/qwen2.5-3B-datafusion-small with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf yarenty/qwen2.5-3B-datafusion-small
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default yarenty/qwen2.5-3B-datafusion-small
Run Hermes
hermes
- Docker Model Runner
How to use yarenty/qwen2.5-3B-datafusion-small with Docker Model Runner:
docker model run hf.co/yarenty/qwen2.5-3B-datafusion-small
- Lemonade
How to use yarenty/qwen2.5-3B-datafusion-small with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull yarenty/qwen2.5-3B-datafusion-small
Run and chat with the model
lemonade run user.qwen2.5-3B-datafusion-small-{{QUANT_TAG}}List all available models
lemonade list
Qwen2.5-3B-DataFusion-Instruct Quantized Model
Model Card: Quantized Version
Model Name: Qwen2.5-3B-DataFusion-Instruct (Quantized)
File: qwen2.5-3B-datafusion.gguf
Size: 1.8GB
Type: Quantized GGUF Model
Base Model: Qwen2.5-3B
Specialization: DataFusion SQL Engine and Rust Programming
License: Apache 2.0
Model Overview
This is the quantized version of the Qwen2.5-3B-DataFusion-Instruct model, optimized for production deployment and resource-constrained environments. The quantization process reduces memory usage while maintaining high accuracy for DataFusion and Rust programming tasks.
Quantization Details
Quantization Method
- Format: GGUF (GGML Universal Format)
- Quantization Level: Optimized for inference speed and memory efficiency
- Precision: Reduced from full precision to quantized representation
- Memory Reduction: ~69% reduction from 5.8GB to 1.8GB
Performance Characteristics
- Inference Speed: Faster than full precision model
- Memory Usage: Significantly reduced memory footprint
- Accuracy: Minimal degradation in specialized domain knowledge
- Deployment: Optimized for production environments
Technical Specifications
Model Architecture
- Base Architecture: Qwen2.5-3B transformer model
- Fine-tuning: Specialized on DataFusion ecosystem data
- Context Handling: Optimized for technical Q&A format
- Output Format: Structured responses with stop sequences
Inference Parameters
- Temperature: 0.7 (balanced creativity vs consistency)
- Top-p: 0.9 (nucleus sampling for quality)
- Repeat Penalty: 1.2 (prevents repetitive output)
- Max Tokens: 1024 (controlled response length)
Performance Metrics
Memory Efficiency
- Original Size: 5.8GB
- Quantized Size: 1.8GB
- Memory Reduction: 69%
- RAM Usage: Significantly lower during inference
Speed Improvements
- Inference Speed: 20-40% faster than full precision
- Loading Time: Reduced model loading time
- Response Generation: Faster token generation
- Batch Processing: Improved throughput
Accuracy Trade-offs
- Domain Knowledge: Maintained DataFusion expertise
- Code Generation: High quality Rust and SQL output
- Technical Explanations: Clear and accurate responses
- Edge Cases: Slight degradation in complex scenarios
Deployment Guidelines
System Requirements
- Minimum RAM: 4GB (vs 8GB+ for full model)
- CPU: Modern multi-core processor
- Storage: 2GB available space
- OS: Linux, macOS, or Windows
Recommended Configurations
- Development: 8GB RAM, modern CPU
- Production: 16GB+ RAM, dedicated CPU cores
- High-Throughput: 32GB+ RAM, GPU acceleration (optional)
Integration Options
- Ollama: Native support with optimized performance
- llama.cpp: Direct GGUF file usage
- Custom Applications: REST API integration
- Batch Processing: High-volume inference pipelines
Comparison with Full Model
| Metric | Quantized Model | Full Model |
|---|---|---|
| File Size | 1.8GB | 5.8GB |
| Memory Usage | Lower | Higher |
| Inference Speed | Faster | Standard |
| Accuracy | High | Highest |
| Deployment | Production-ready | Development/Production |
| Resource Efficiency | High | Standard |
Best Practices
For Production Use
- Load Testing: Validate performance under expected load
- Memory Monitoring: Track RAM usage during operation
- Response Validation: Implement quality checks for outputs
- Fallback Strategy: Plan for model switching if needed
For Development
- Iterative Testing: Test with various input types
- Performance Profiling: Monitor inference times
- Quality Assessment: Compare outputs with full model
- Integration Testing: Validate in target environment
This quantized model provides an excellent balance of performance, accuracy, and resource efficiency, making it ideal for production deployment of DataFusion-specialized AI assistance.
- Downloads last month
- 1
We're not able to determine the quantization variants.