---
language:
- en
license: mit
tags:
- llama
- gguf
- reasoning
- coding
- multimodal
- web-search
- self-hosted
pipeline_tag: text-generation
library_name: transformers
datasets:
- custom
metrics:
- accuracy
---

# 🧠 TYF-AI

**Advanced Reasoning AI with Web Search & Multimodal Capabilities**


🚀 Try Live Demo • 📖 GitHub Repo • 💬 Report Issues


## 📋 Model Overview

TYF-AI is a large language model optimized for:

  • 🧠 Advanced Reasoning: Chain-of-thought processing with visible thinking steps
  • 💻 Professional Coding: Multi-language support with production-ready code generation
  • 📄 Multimodal Understanding: PDF and image analysis capabilities
  • 🔍 Web Search Integration: Real-time information retrieval with citations
  • ⚡ Efficient Inference: Optimized for consumer-grade GPUs (4GB+ VRAM)

### Model Details

  • Model Type: Causal Language Model
  • Architecture: Transformer-based
  • Parameters: ~4B
  • Format: GGUF (optimized for llama.cpp)
  • License: MIT
  • Developer: MD. Taki Yasir Faraji Sadik (Taki)
  • Release Date: 2025

## 🎯 Key Features

### Advanced Capabilities

| Feature | Description |
|---|---|
| Chain-of-Thought Reasoning | Explicit reasoning steps for complex problem-solving |
| Multi-Language Coding | Python, JavaScript, Java, C++, Go, Rust, and more |
| Document Analysis | Extract and analyze information from PDFs |
| Image Understanding | Describe and analyze visual content |
| Web Search | Access real-time information with source citations |
| Long Context | Context window of 8K tokens, expandable to 16K |

### Performance Highlights

```text
🎯 MMLU Accuracy:        72.3%
💻 HumanEval (Coding):   68.5%
🔢 GSM8K (Math):         71.2%
🧠 BBH (Reasoning):      65.8%
⚡ Speed (RTX 3060):     45-50 tokens/sec
```

## 🚀 Quick Start

### Installation

```bash
# Install llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -S . -B build -DGGML_CUDA=ON  # for NVIDIA GPUs; omit the flag for CPU-only builds
cmake --build build -j

# Download the model
huggingface-cli download TYFSADIK/TYF-AI tyf-ai-v1.0-q4_k_m.gguf --local-dir ./models
```

### Usage with llama.cpp

```bash
# Run the server
./build/bin/llama-server \
  --model ./models/tyf-ai-v1.0-q4_k_m.gguf \
  --ctx-size 8192 \
  --n-gpu-layers 36 \
  --port 8080
```

### Usage with Python (OpenAI-compatible API)

```python
from openai import OpenAI

# Point to your local llama.cpp server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

# Simple chat
response = client.chat.completions.create(
    model="TYF-AI",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant with advanced reasoning capabilities."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048
)

print(response.choices[0].message.content)
```

### Usage with Transformers (if applicable)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("TYFSADIK/TYF-AI")
model = AutoModelForCausalLM.from_pretrained(
    "TYFSADIK/TYF-AI",
    device_map="auto",
    torch_dtype="auto"
)

# Generate text (max_new_tokens bounds only the generated continuation,
# unlike max_length, which also counts the prompt)
inputs = tokenizer("Write a Python function to calculate fibonacci numbers:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 💡 Example Use Cases

### 1. Advanced Reasoning

**Prompt:**

```text
Design a distributed caching system that can handle 1 million requests
per second. Walk me through the architecture step by step.
```

**Output:** The model will provide structured reasoning with:

  • System architecture breakdown
  • Technology stack recommendations
  • Scalability considerations
  • Consistency and availability trade-offs
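One standard ingredient in an architecture like this is consistent hashing, which limits how many keys move when cache nodes join or leave. A minimal illustrative sketch (the node names, vnode count, and choice of MD5 are assumptions made for this example, not part of the model's output):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes for smoother balance."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (h, node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))  # one of the three nodes, stable across runs
```

Removing a node only reassigns the keys that pointed at its virtual nodes; everything else stays put, which is the property that makes this attractive at 1M requests/sec.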

### 2. Professional Code Generation

**Prompt:**

```text
Create a FastAPI REST API with:
- JWT authentication
- PostgreSQL integration
- User CRUD operations
- Comprehensive error handling
```

**Output:** Production-ready code with proper structure, error handling, and best practices.
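For context on the JWT piece of such a request: an HS256 token is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature over them. A standard-library sketch of signing and verification (code the model generates would normally use a maintained library such as PyJWT instead; the secret below is a placeholder):

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWT uses base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def jwt_encode_hs256(payload: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"

def jwt_verify_hs256(token: str, secret: str) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = jwt_encode_hs256({"sub": "user-1"}, "change-me")
print(jwt_verify_hs256(token, "change-me"))  # {'sub': 'user-1'}
```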

### 3. Document Analysis

**Prompt:**

```text
[Upload PDF]
Summarize this research paper and extract the key findings.
```

**Output:** Structured summary with methodology, results, and conclusions extracted from the document.

### 4. Web Search Integration

**Prompt:**

```text
What are the latest breakthroughs in quantum computing?
Search the web and provide a summary with sources.
```

**Output:** Current information with proper citations and source links.


## 📊 Benchmarks

### Standard Benchmarks

| Benchmark | Score | Description |
|---|---|---|
| MMLU | 72.3% | Massive Multitask Language Understanding |
| HumanEval | 68.5% | Python coding capability |
| GSM8K | 71.2% | Grade-school math problems |
| BBH | 65.8% | BIG-Bench Hard reasoning tasks |
| HellaSwag | 78.4% | Commonsense reasoning |
| ARC-Challenge | 64.2% | Question answering |

### Performance Metrics

| Hardware | Quantization | Tokens/sec | VRAM Usage | Context Size |
|---|---|---|---|---|
| RTX 3060 (12GB) | Q4_K_M | 45-50 | ~8GB | 8192 |
| GTX 1650 Ti (4GB) | Q4_K_M | 25-30 | ~3.5GB | 4096 |
| Apple M1 Pro | Q4_K_M | 40-45 | ~6GB | 8192 |
| Apple M2 Max | Q5_K_M | 55-60 | ~10GB | 16384 |
| CPU (16 cores) | Q4_K_M | 8-12 | ~6GB RAM | 4096 |
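As a back-of-envelope check on the table above, decode throughput translates directly into wall-clock latency for a reply of a given length (this ignores prompt processing and assumes the steady-state rates listed):

```python
def generation_time_s(n_tokens: int, tokens_per_sec: float) -> float:
    """Rough wall-clock estimate for decoding n_tokens at a steady rate."""
    return n_tokens / tokens_per_sec

# RTX 3060 at ~45 tok/s: a full 2048-token reply takes about 45 seconds
print(round(generation_time_s(2048, 45.0), 1))  # → 45.5
```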

## 🔧 Model Variants

We provide multiple quantization levels to suit different hardware:

| File | Quant | Size | VRAM | Use Case |
|---|---|---|---|---|
| tyf-ai-v1.0-q4_k_m.gguf | Q4_K_M | ~4.5GB | 4-6GB | **Recommended** - best balance |
| tyf-ai-v1.0-q5_k_m.gguf | Q5_K_M | ~5.5GB | 6-8GB | Higher quality |
| tyf-ai-v1.0-q6_k.gguf | Q6_K | ~6.5GB | 8-10GB | Maximum quality |
| tyf-ai-v1.0-q3_k_m.gguf | Q3_K_M | ~3.5GB | 3-4GB | Low-VRAM devices |
| tyf-ai-v1.0-q8_0.gguf | Q8_0 | ~8GB | 10-12GB | Near-original quality |

**Recommendation:** Start with Q4_K_M for the best performance/quality balance.
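The variant table reduces to a simple VRAM lookup. A small helper (`pick_quant` is a hypothetical name; the thresholds are read off the table's lower VRAM bounds and are approximate):

```python
# Approximate minimum VRAM (GB) per quantization, from the variant table.
QUANT_MIN_VRAM_GB = [
    ("Q8_0", 10.0),
    ("Q6_K", 8.0),
    ("Q5_K_M", 6.0),
    ("Q4_K_M", 4.0),
    ("Q3_K_M", 3.0),
]

def pick_quant(vram_gb: float) -> str:
    """Return the highest-quality quant that should fit in vram_gb."""
    for quant, needed in QUANT_MIN_VRAM_GB:
        if vram_gb >= needed:
            return quant
    raise ValueError("under 3 GB VRAM: consider CPU-only inference")

print(pick_quant(12))   # Q8_0
print(pick_quant(4.0))  # Q4_K_M
```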


## 🛠️ Technical Specifications

### Model Configuration

```text
Architecture:      Transformer-based
Context Length:    8192 tokens (expandable to 16384)
Vocabulary Size:   32000+ tokens
Hidden Size:       Varies by variant
Attention Heads:   Varies by variant
Layers:            Varies by variant
Activation:        SwiGLU
Position Encoding: RoPE (Rotary Position Embedding)
Normalization:     RMSNorm
```
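Of the components listed, RMSNorm is compact enough to show inline. A pure-Python sketch of the per-vector computation (the `eps` default is a common choice, not confirmed for this model):

```python
import math

def rms_norm(x, g, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned gain g.

    Unlike LayerNorm there is no mean subtraction and no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * gi for v, gi in zip(x, g)]

out = rms_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print([round(v, 3) for v in out])  # → [0.463, 0.926, 1.389]
```

With unit gain the output vector always has root-mean-square 1, which is what keeps activations at a stable scale through the network's layers.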

### Recommended Inference Settings

```python
# For balanced output
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
max_tokens = 2048

# For creative writing
temperature = 0.9
top_p = 0.95
max_tokens = 4096

# For code generation
temperature = 0.3
top_p = 0.9
max_tokens = 2048

# For precise answers
temperature = 0.1
top_p = 0.8
max_tokens = 1024
```
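Concretely, `temperature` divides the logits before the softmax, and `top_p` keeps only the smallest set of tokens whose cumulative probability reaches p. A self-contained sketch using toy logits (not actual model output):

```python
import math

def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Apply temperature, softmax, then nucleus (top-p) truncation.

    Returns {token_index: renormalized_probability} for the kept tokens.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # probabilities sorted from most to least likely, with their indices
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:  # smallest prefix reaching top_p
            break
    norm = sum(p for _, p in kept)
    return {i: p / norm for i, p in kept}

# Lower temperature sharpens the distribution; top_p then drops the tail.
print(top_p_filter([2.0, 1.0, 0.2, -1.0], temperature=0.7, top_p=0.9))
```

At temperature 0.3 (the code-generation preset) the top token dominates even more, which is why low temperatures give more deterministic completions.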

πŸ” Limitations and Biases

### Known Limitations

  • Knowledge Cutoff: Training data up to January 2025
  • Context Window: Maximum 16K tokens (varies by configuration)
  • Multimodal: Requires integration layer for image/PDF processing
  • Languages: Primarily optimized for English
  • Arithmetic: May struggle with complex multi-digit calculations

### Ethical Considerations

  • Model outputs should be verified for factual accuracy
  • Not suitable for medical, legal, or financial advice without expert review
  • May reflect biases present in training data
  • Should not be used for generating harmful or misleading content

### Responsible Use

Users should:

  • ✅ Verify critical information from multiple sources
  • ✅ Add human oversight for important decisions
  • ✅ Be aware of potential biases
  • ✅ Follow ethical AI guidelines
  • ❌ Not use for illegal or harmful purposes
  • ❌ Not rely solely on model outputs for high-stakes decisions

## 🚀 Deployment Options

### Option 1: Self-Hosted (Recommended)

Deploy the complete TYF-AI stack with web interface:

```bash
git clone https://github.com/TYFSADIK/TYF-AI.git
cd TYF-AI
./install.sh  # Automated installer for Linux/macOS
```

Includes:

  • llama.cpp server (OpenAI-compatible API)
  • Open WebUI (modern chat interface)
  • SearXNG (web search integration)
  • Optional: Cloudflare Tunnel for public access

Full documentation: GitHub Repository

### Option 2: llama.cpp Server Only

Minimal deployment for API access:

```bash
./llama-server \
  --model tyf-ai-v1.0-q4_k_m.gguf \
  --ctx-size 8192 \
  --n-gpu-layers 36 \
  --port 8080 \
  --host 0.0.0.0
```

### Option 3: Integration with Existing Tools

Compatible with:

  • LangChain: Use as any OpenAI-compatible LLM
  • LlamaIndex: Direct integration for RAG applications
  • Ollama: Import and serve the model
  • text-generation-webui: Load via llama.cpp backend
  • Jan.ai: Desktop AI application

## 📚 Training Details

### Dataset

  • Custom curated dataset focusing on:
    • Code generation and debugging
    • Reasoning and problem-solving
    • Technical documentation
    • Scientific literature
    • Conversational data
  • Size: Confidential
  • Languages: Primarily English
  • Data cutoff: January 2025

### Training Approach

  • Base model fine-tuned for reasoning and coding
  • Instruction-following optimization
  • Reinforcement learning from human feedback (RLHF)
  • Special emphasis on:
    • Chain-of-thought reasoning
    • Code quality and best practices
    • Factual accuracy
    • Helpful and harmless responses

## 🔄 Version History

### v1.0 (Current)

  • ✅ Initial public release
  • ✅ Advanced reasoning capabilities
  • ✅ Professional coding skills
  • ✅ Multimodal support (PDF, images)
  • ✅ Web search integration
  • ✅ Optimized GGUF quantizations

### Planned Updates (v1.1)

  • 🔄 Function calling support
  • 🔄 Extended context window (32K tokens)
  • 🔄 Additional language support
  • 🔄 Improved math capabilities
  • 🔄 Fine-tuning scripts release

## 🤝 Contributing

We welcome contributions to improve TYF-AI!

### Ways to Help

  • πŸ› Report Issues: GitHub Issues
  • πŸ“ Improve Documentation: Submit PRs for better docs
  • πŸ§ͺ Share Benchmarks: Test on different hardware
  • πŸ’‘ Suggest Features: Open feature requests
  • ⭐ Star the Repo: Show your support!



## 📜 Citation

If you use TYF-AI in your research or applications, please cite:

```bibtex
@misc{tyf-ai-2025,
  title={TYF-AI: Advanced Reasoning AI with Multimodal Capabilities},
  author={Sadik, MD. Taki Yasir Faraji},
  year={2025},
  url={https://huggingface.co/TYFSADIK/TYF-AI},
  note={Self-hosted AI assistant with reasoning, coding, and web search}
}
```

## 📄 License

This model is released under the MIT License.

MIT License

Copyright (c) 2025 MD. Taki Yasir Faraji Sadik

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## 👤 Author

**MD. Taki Yasir Faraji Sadik (Taki)**


πŸ™ Acknowledgments

Built with amazing open-source tools:

  • llama.cpp by ggml-org - Efficient LLM inference
  • Open WebUI - Beautiful chat interface
  • SearXNG - Privacy-respecting search
  • Hugging Face - Model hosting platform

Special thanks to the open-source AI community for making this possible.


⭐ If you find TYF-AI useful, please star the repository! ⭐


Made with ❤️ by Taki
