---
language:
- en
license: mit
tags:
- llama
- gguf
- reasoning
- coding
- multimodal
- web-search
- self-hosted
pipeline_tag: text-generation
library_name: transformers
datasets:
- custom
metrics:
- accuracy
---
# TYF-AI

**Advanced Reasoning AI with Web Search & Multimodal Capabilities**

[Try Live Demo](https://ai.tyfsadik.org) • [GitHub Repo](https://github.com/TYFSADIK/TYF-AI) • [Report Issues](https://github.com/TYFSADIK/TYF-AI/issues)
## Model Overview

TYF-AI is a state-of-the-art large language model optimized for:

- **Advanced Reasoning**: Chain-of-thought processing with visible thinking steps
- **Professional Coding**: Multi-language support with production-ready code generation
- **Multimodal Understanding**: PDF and image analysis capabilities
- **Web Search Integration**: Real-time information retrieval with citations
- **Efficient Inference**: Optimized for consumer-grade GPUs (4GB+ VRAM)

### Model Details
- Model Type: Causal Language Model
- Architecture: Transformer-based
- Format: GGUF (optimized for llama.cpp)
- License: MIT
- Developer: MD. Taki Yasir Faraji Sadik (Taki)
- Release Date: 2025
## Key Features

### Advanced Capabilities
| Feature | Description |
|---|---|
| Chain-of-Thought Reasoning | Explicit reasoning steps for complex problem-solving |
| Multi-Language Coding | Python, JavaScript, Java, C++, Go, Rust, and more |
| Document Analysis | Extract and analyze information from PDFs |
| Image Understanding | Describe and analyze visual content |
| Web Search | Access real-time information with source citations |
| Long Context | Support for up to 8K-16K tokens context window |
### Performance Highlights

- MMLU Accuracy: 72.3%
- HumanEval (Coding): 68.5%
- GSM8K (Math): 71.2%
- BBH (Reasoning): 65.8%
- Speed (RTX 3060): 45-50 tokens/sec
## Quick Start

### Installation

```bash
# Install llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
mkdir -p build
cmake -S . -B build -DGGML_CUDA=ON  # For NVIDIA GPUs
cmake --build build -j

# Download the model
huggingface-cli download TYFSADIK/TYF-AI tyf-ai-v1.0-q4_k_m.gguf --local-dir ./models
```
### Usage with llama.cpp

```bash
# Run the server
./build/bin/llama-server \
    --model ./models/tyf-ai-v1.0-q4_k_m.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 36 \
    --port 8080
```
### Usage with Python (OpenAI-compatible API)

```python
from openai import OpenAI

# Point to your local llama.cpp server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

# Simple chat
response = client.chat.completions.create(
    model="TYF-AI",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant with advanced reasoning capabilities."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    temperature=0.7,
    max_tokens=2048
)

print(response.choices[0].message.content)
```
### Usage with Transformers (if applicable)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("TYFSADIK/TYF-AI")
model = AutoModelForCausalLM.from_pretrained(
    "TYFSADIK/TYF-AI",
    device_map="auto",
    torch_dtype="auto"
)

# Generate text (cap new tokens rather than total length)
inputs = tokenizer("Write a Python function to calculate fibonacci numbers:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Example Use Cases

### 1. Advanced Reasoning

**Prompt:**

```
Design a distributed caching system that can handle 1 million requests
per second. Walk me through the architecture step by step.
```

**Output:** The model provides structured reasoning with:
- System architecture breakdown
- Technology stack recommendations
- Scalability considerations
- Consistency and availability trade-offs
### 2. Professional Code Generation

**Prompt:**

```
Create a FastAPI REST API with:
- JWT authentication
- PostgreSQL integration
- User CRUD operations
- Comprehensive error handling
```

**Output:** Production-ready code with proper structure, error handling, and best practices.
### 3. Document Analysis

**Prompt:**

```
[Upload PDF]
Summarize this research paper and extract the key findings.
```

**Output:** Structured summary with methodology, results, and conclusions extracted from the document.
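As noted under Limitations, PDF analysis goes through an integration layer: the document's text is extracted first and then inlined into the prompt. A minimal sketch of that prompt-assembly step, assuming the text has already been extracted (`build_summary_prompt` is a hypothetical helper, not part of TYF-AI):

```python
# Sketch of the prompt-assembly step behind the PDF use case above.
# Assumes the integration layer has already extracted the PDF's text.

def build_summary_prompt(doc_text: str, question: str, max_chars: int = 24000) -> str:
    """Inline extracted document text into a single prompt, truncating
    to stay within the model's context window (~8K tokens by default)."""
    if len(doc_text) > max_chars:
        doc_text = doc_text[:max_chars] + "\n[...document truncated...]"
    return (
        "You are given the text of a document.\n\n"
        f"--- DOCUMENT START ---\n{doc_text}\n--- DOCUMENT END ---\n\n"
        f"{question}"
    )

prompt = build_summary_prompt("Example extracted text.", "Summarize the key findings.")
```

The character budget here is a rough proxy for tokens; a real deployment would count tokens with the model's tokenizer instead.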
### 4. Web Search Integration

**Prompt:**

```
What are the latest breakthroughs in quantum computing?
Search the web and provide a summary with sources.
```

**Output:** Current information with proper citations and source links.
## Benchmarks

### Standard Benchmarks
| Benchmark | Score | Description |
|---|---|---|
| MMLU | 72.3% | Massive Multitask Language Understanding |
| HumanEval | 68.5% | Python coding capability |
| GSM8K | 71.2% | Grade school math problems |
| BBH | 65.8% | Big-Bench Hard reasoning tasks |
| HellaSwag | 78.4% | Commonsense reasoning |
| ARC-Challenge | 64.2% | Question answering |
### Performance Metrics
| Hardware | Quantization | Tokens/sec | VRAM Usage | Context Size |
|---|---|---|---|---|
| RTX 3060 (12GB) | Q4_K_M | 45-50 | ~8GB | 8192 |
| GTX 1650 Ti (4GB) | Q4_K_M | 25-30 | ~3.5GB | 4096 |
| Apple M1 Pro | Q4_K_M | 40-45 | ~6GB | 8192 |
| Apple M2 Max | Q5_K_M | 55-60 | ~10GB | 16384 |
| CPU (16 cores) | Q4_K_M | 8-12 | ~6GB RAM | 4096 |
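The tokens/sec figures above can be reproduced with a short timing script against a running llama-server (from Quick Start). A sketch with the network call shown as a comment, since it requires a live server; the endpoint and `usage` field names follow llama.cpp's OpenAI-compatible API:

```python
# Timing helper for measuring generation throughput against a local server.

import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generation throughput in tokens/sec; guards against zero elapsed time."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

# Typical measurement loop (requires a running server on localhost:8080):
# import requests
# start = time.time()
# r = requests.post("http://localhost:8080/v1/completions",
#                   json={"model": "TYF-AI", "prompt": "Hello", "max_tokens": 256})
# n = r.json()["usage"]["completion_tokens"]
# print(f"{tokens_per_second(n, time.time() - start):.1f} tok/s")
```

Note this measures end-to-end time including prompt processing; for longer prompts, subtract the time to first token for a purer generation figure.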
## Model Variants

We provide multiple quantization levels to suit different hardware:

| File | Quant | Size | VRAM | Use Case |
|---|---|---|---|---|
| `tyf-ai-v1.0-q4_k_m.gguf` | Q4_K_M | ~4.5GB | 4-6GB | **Recommended** - best balance |
| `tyf-ai-v1.0-q5_k_m.gguf` | Q5_K_M | ~5.5GB | 6-8GB | Higher quality |
| `tyf-ai-v1.0-q6_k.gguf` | Q6_K | ~6.5GB | 8-10GB | Maximum quality |
| `tyf-ai-v1.0-q3_k_m.gguf` | Q3_K_M | ~3.5GB | 3-4GB | Low-VRAM devices |
| `tyf-ai-v1.0-q8_0.gguf` | Q8_0 | ~8GB | 10-12GB | Near-original quality |

**Recommendation:** Start with Q4_K_M for the best performance/quality ratio.
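One simple selection policy is "largest quant that fits in VRAM". A hypothetical helper encoding the table's VRAM column under that policy (not an official tool, and note the Q4_K_M recommendation above still applies when speed matters):

```python
# Hypothetical helper: pick a quantization file from available VRAM (GB),
# using the thresholds from the variant table above.

def pick_variant(vram_gb: float) -> str:
    if vram_gb >= 10:
        return "tyf-ai-v1.0-q8_0.gguf"      # near-original quality
    if vram_gb >= 8:
        return "tyf-ai-v1.0-q6_k.gguf"      # maximum quality
    if vram_gb >= 6:
        return "tyf-ai-v1.0-q5_k_m.gguf"    # higher quality
    if vram_gb >= 4:
        return "tyf-ai-v1.0-q4_k_m.gguf"    # recommended balance
    return "tyf-ai-v1.0-q3_k_m.gguf"        # low-VRAM fallback

print(pick_variant(12))  # RTX 3060 class -> tyf-ai-v1.0-q8_0.gguf
```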
## Technical Specifications

### Model Configuration

```
Architecture: Transformer-based
Context Length: 8192 tokens (expandable to 16384)
Vocabulary Size: 32000+ tokens
Hidden Size: Varies by variant
Attention Heads: Varies by variant
Layers: Varies by variant
Activation: SwiGLU
Position Encoding: RoPE (Rotary Position Embedding)
Normalization: RMSNorm
```
### Recommended Inference Settings

```python
# For balanced output
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
max_tokens = 2048

# For creative writing
temperature = 0.9
top_p = 0.95
max_tokens = 4096

# For code generation
temperature = 0.3
top_p = 0.9
max_tokens = 2048

# For precise answers
temperature = 0.1
top_p = 0.8
max_tokens = 1024
```
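These presets can be packaged for the OpenAI-compatible API from Quick Start. A sketch, with one assumption flagged: `temperature`, `top_p`, and `max_tokens` are standard OpenAI fields, while `top_k` and `repeat_penalty` are llama.cpp extensions that would typically travel via the client's `extra_body` (verify against your server version):

```python
# Sampling presets from the section above, split into standard OpenAI
# kwargs plus llama.cpp-specific extras (assumed to go via extra_body).

PROFILES = {
    "balanced": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 2048,
                 "extra": {"top_k": 40, "repeat_penalty": 1.1}},
    "creative": {"temperature": 0.9, "top_p": 0.95, "max_tokens": 4096, "extra": {}},
    "code":     {"temperature": 0.3, "top_p": 0.9, "max_tokens": 2048, "extra": {}},
    "precise":  {"temperature": 0.1, "top_p": 0.8, "max_tokens": 1024, "extra": {}},
}

def request_kwargs(profile: str) -> dict:
    """Return kwargs for client.chat.completions.create(**kwargs)."""
    p = dict(PROFILES[profile])
    extra = p.pop("extra")
    return {**p, "extra_body": extra} if extra else p

# Usage with the client from Quick Start:
# client.chat.completions.create(model="TYF-AI", messages=msgs,
#                                **request_kwargs("code"))
```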
## Limitations and Biases

### Known Limitations
- Knowledge Cutoff: Training data up to January 2025
- Context Window: Maximum 16K tokens (varies by configuration)
- Multimodal: Requires integration layer for image/PDF processing
- Languages: Primarily optimized for English
- Arithmetic: May struggle with complex multi-digit calculations
### Ethical Considerations
- Model outputs should be verified for factual accuracy
- Not suitable for medical, legal, or financial advice without expert review
- May reflect biases present in training data
- Should not be used for generating harmful or misleading content
### Responsible Use

Users should:

- ✅ Verify critical information from multiple sources
- ✅ Add human oversight for important decisions
- ✅ Be aware of potential biases
- ✅ Follow ethical AI guidelines
- ❌ Not use the model for illegal or harmful purposes
- ❌ Not rely solely on model outputs for high-stakes decisions
## Deployment Options

### Option 1: Self-Hosted (Recommended)

Deploy the complete TYF-AI stack with web interface:

```bash
git clone https://github.com/TYFSADIK/TYF-AI.git
cd TYF-AI
./install.sh  # Automated installer for Linux/macOS
```
Includes:
- llama.cpp server (OpenAI-compatible API)
- Open WebUI (modern chat interface)
- SearXNG (web search integration)
- Optional: Cloudflare Tunnel for public access
Full documentation: [GitHub Repository](https://github.com/TYFSADIK/TYF-AI)
### Option 2: llama.cpp Server Only

Minimal deployment for API access:

```bash
./llama-server \
    --model tyf-ai-v1.0-q4_k_m.gguf \
    --ctx-size 8192 \
    --n-gpu-layers 36 \
    --port 8080 \
    --host 0.0.0.0
```
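Once the server is up, it can be exercised with nothing but the Python standard library. A sketch (`build_chat_request` and `send` are hypothetical helpers; the endpoint follows llama.cpp's OpenAI-compatible API, and the network call is left commented so the snippet also serves as documentation):

```python
# Minimal stdlib client for the server started above - no third-party packages.

import json
import urllib.request

def build_chat_request(host: str, port: int, prompt: str, max_tokens: int = 256):
    """Assemble the URL and JSON payload for a chat completion request."""
    url = f"http://{host}:{port}/v1/chat/completions"
    payload = {
        "model": "TYF-AI",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, payload

def send(url: str, payload: dict) -> dict:
    """POST the payload and decode the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

url, payload = build_chat_request("localhost", 8080, "Say hello.")
# reply = send(url, payload)["choices"][0]["message"]["content"]
```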
### Option 3: Integration with Existing Tools
Compatible with:
- LangChain: Use as any OpenAI-compatible LLM
- LlamaIndex: Direct integration for RAG applications
- Ollama: Import and serve the model
- text-generation-webui: Load via llama.cpp backend
- Jan.ai: Desktop AI application
## Training Details

### Dataset

- Custom curated dataset focusing on:
  - Code generation and debugging
  - Reasoning and problem-solving
  - Technical documentation
  - Scientific literature
  - Conversational data
- Size: Confidential
- Languages: Primarily English
- Data cutoff: January 2025
### Training Approach

- Base model fine-tuned for reasoning and coding
- Instruction-following optimization
- Reinforcement learning from human feedback (RLHF)
- Special emphasis on:
  - Chain-of-thought reasoning
  - Code quality and best practices
  - Factual accuracy
  - Helpful and harmless responses
## Version History

### v1.0 (Current)

- ✅ Initial public release
- ✅ Advanced reasoning capabilities
- ✅ Professional coding skills
- ✅ Multimodal support (PDF, images)
- ✅ Web search integration
- ✅ Optimized GGUF quantizations
### Planned Updates (v1.1)

- Function calling support
- Extended context window (32K tokens)
- Additional language support
- Improved math capabilities
- Fine-tuning scripts release
## Contributing

We welcome contributions to improve TYF-AI!

### Ways to Help

- **Report Issues**: Open a GitHub issue
- **Improve Documentation**: Submit PRs for better docs
- **Share Benchmarks**: Test on different hardware
- **Suggest Features**: Open feature requests
- **Star the Repo**: Show your support!
### Community

- Live Demo: [ai.tyfsadik.org](https://ai.tyfsadik.org)
- GitHub: [TYFSADIK/TYF-AI](https://github.com/TYFSADIK/TYF-AI)
- Website: [tyfsadik.org](https://tyfsadik.org)
## Citation

If you use TYF-AI in your research or applications, please cite:

```bibtex
@misc{tyf-ai-2025,
  title={TYF-AI: Advanced Reasoning AI with Multimodal Capabilities},
  author={Sadik, MD. Taki Yasir Faraji},
  year={2025},
  url={https://huggingface.co/TYFSADIK/TYF-AI},
  note={Self-hosted AI assistant with reasoning, coding, and web search}
}
```
## License
This model is released under the MIT License.
MIT License
Copyright (c) 2025 MD. Taki Yasir Faraji Sadik
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
## Author

**MD. Taki Yasir Faraji Sadik (Taki)**

- Website: [tyfsadik.org](https://tyfsadik.org)
- Live Demo: [ai.tyfsadik.org](https://ai.tyfsadik.org)
- GitHub: [TYFSADIK](https://github.com/TYFSADIK)
- LinkedIn: MD. Taki Yasir Faraji Sadik
- Email: taki@tyfsadik.org
## Acknowledgments
Built with amazing open-source tools:
- llama.cpp by ggml-org - Efficient LLM inference
- Open WebUI - Beautiful chat interface
- SearXNG - Privacy-respecting search
- Hugging Face - Model hosting platform
Special thanks to the open-source AI community for making this possible.