Spaces:

ihtesham0345
/

key_word_Fast_API

Sleeping

App Files Files Community

key_word_Fast_API / PROJECT_DOCS.md

ihtesham0345

Add documentation for running on RTX 4050 6GB

2caec2a 5 months ago

preview code

Raw

History Blame Contribute Delete

5.36 kB

📘 SEO Keyword Analyzer API - Complete Development Guide

1. Project Overview

This project is an AI-Powered Microservice built with FastAPI. It serves as an intelligent SEO consultant that accepts a topic (e.g., "Digital Marketing") and generates a comprehensive strategy including:

High-volume Keywords
Viral Hashtags
Competition Analysis
Strategic Tips

Key Feature: This project runs the Qwen2.5-0.5B-Instruct model LOCALLY inside the container.

Zero External Dependencies: It does NOT use an external API. The brain lives inside the app.
100% Free: No rate limits, no credit usage.
Privacy: Data never leaves your container.

2. Technology Stack used

We used the following technologies to build this application from scratch:

Component	Technology	Purpose
Framework	FastAPI	High-performance web framework for building the API endpoints.
AI Model	Qwen2.5-0.5B-Instruct	The "Nano" model. Ultra-lightweight (0.5B) for maximum speed and zero timeouts.
Connection	Local Inference (CPU)	No API calls. The model lives inside your app. Zero external dependencies.
Container	Docker + PyTorch	Includes Torch/Transformers to run the AI engine self-contained.
Deployment	Hugging Face Spaces	The cloud platform hosting the Docker container.

3. Directory Structure Explaination

Here is how the project files are organized:

SEO_Analyzer_FastAPI/
├── main.py                 # 🚦 Entry Point: Defines the API routes & server.
├── requirements.txt        # 📦 Dependencies: Lists libraries (torch, transformers, fastapi).
├── Dockerfile              # 🐳 Deployment: Instructions to build the Linux container.
├── models/
│   └── schemas.py          # 📝 Data Models: Pydantic classes to validate input/output.
└── services/
    └── analyzer.py         # 🧠 The Brain: Loads the Local Model and handles inference.

4. How It Was Built (A to Z)

Step 1: Defining the Data Structure (`models/schemas.py`)

Before writing code, we defined what the "Input" and "Output" should look like using Pydantic.

Input: A simple JSON object {"content": "..."}.
Output: A strict JSON schema ensuring the UI always receives core_keywords, hashtags, relevance scores, etc.

Step 2: Building the Logic Core (`services/analyzer.py`)

This is the heart of the "Local AI" engine:

Loading: On startup, we use transformers.pipeline to download Qwen2.5-0.5B (approx 1GB).
Inference: When a request comes in, the CPU runs the mathematical calculations to generate text.
Optimization: We use torch_dtype=bfloat16 to make it run faster and use less RAM.
Temperature Control: We set temperature=0.3 to make the AI strict and reliable for JSON.

Step 3: Creating the API Endpoints (`main.py`)

We created a FastAPI app with two routes:

GET /: A health check.
POST /analyze-seo: The main worker. It includes a Safety Net that auto-fills missing data if the AI makes a mistake.

Step 4: Dockerization (`Dockerfile`)

To make this run on the cloud:

Base Image: python:3.9
Dependency: We install torch (PyTorch) so the AI can run mathematically.
Port: Exposes port 7860 for Hugging Face Spaces.

5. How It Works (The Flow)

User Action: Sends a request: POST {"content": "dropshipping"}.
API Layer: FastAPI receives it.
Local Inference:
- The server passes the text to the loaded Qwen model.
- The CPU generates the response token-by-token.
- This takes ~10-20 seconds.
Parsing & Repair: The app cleans the JSON and fixes any syntax errors automatically.
Response: The user receives the data.

6. How to Run Locally

Install Requirements:
```
pip install -r requirements.txt
```
No Keys Needed: You do NOT need an API key. It runs locally.
Run the Server:
```
python -m uvicorn main:app --reload
```
Note: The first run will download the model (1GB).
Access Documentation: Open http://localhost:8000/docs.

7. Configuration Limitations

CPU Speed: Since it runs on a free CPU, we limit generation to 30 keywords to ensure it finishes quickly.
Model Choice: We used the 0.5B (Nano) model because it is the only modern LLM that fits comfortably in the free tier RAM while remaining fast.

Developed by Ihtesham | Powered by Open Source AI

8. Local Hardware Recommendations (RTX 4050)

If you run this application on an RTX 4050 (6GB VRAM) + 16GB RAM:

Model	Size	VRAM Usage	Speed (Tokens/s)	Recommendation
Qwen-0.5B	0.5B	~0.8 GB	100+ (Instant)	⚡ Overkill Speed. Low Memory usage.
Qwen-1.5B	1.5B	~2.5 GB	70+ (Very Fast)	✅ Perfect Balance. Best for 6GB cards.
Qwen-7B (4-bit)	7B	~5.5 GB	30+ (Fast)	🧠 Smartest. Maxes out your VRAM.

Conclusion: Your RTX 4050 is 10x more powerful than the Free Tier CPU. You should upgrade to the 1.5B Model locally for better intelligence without sacrificing speed.