Spaces:

ihtesham0345
/

key_word_Fast_API

Sleeping

App Files Files Community

key_word_Fast_API / PROJECT_DOCS.md

ihtesham0345

Add documentation for running on RTX 4050 6GB

2caec2a 5 months ago

preview code

Raw

History Blame Contribute Delete

5.36 kB

	# 📘 SEO Keyword Analyzer API - Complete Development Guide

	## 1. Project Overview
	This project is an AI-Powered Microservice built with FastAPI. It serves as an intelligent SEO consultant that accepts a topic (e.g., "Digital Marketing") and generates a comprehensive strategy including:
	- High-volume Keywords
	- Viral Hashtags
	- Competition Analysis
	- Strategic Tips

	Key Feature: This project runs the Qwen2.5-0.5B-Instruct model LOCALLY inside the container.
	- Zero External Dependencies: It does NOT use an external API. The brain lives inside the app.
	- 100% Free: No rate limits, no credit usage.
	- Privacy: Data never leaves your container.

	---

	## 2. Technology Stack used
	We used the following technologies to build this application from scratch:

	\| Component \| Technology \| Purpose \|
	\| :--- \| :--- \| :--- \|
	\| Framework \| FastAPI \| High-performance web framework for building the API endpoints. \|
	\| AI Model \| Qwen2.5-0.5B-Instruct \| The "Nano" model. Ultra-lightweight (0.5B) for maximum speed and zero timeouts. \|
	\| Connection \| Local Inference (CPU) \| No API calls. The model lives inside your app. Zero external dependencies. \|
	\| Container \| Docker + PyTorch \| Includes Torch/Transformers to run the AI engine self-contained. \|
	\| Deployment \| Hugging Face Spaces \| The cloud platform hosting the Docker container. \|

	---

	## 3. Directory Structure Explaination
	Here is how the project files are organized:

	```
	SEO_Analyzer_FastAPI/
	├── main.py # 🚦 Entry Point: Defines the API routes & server.
	├── requirements.txt # 📦 Dependencies: Lists libraries (torch, transformers, fastapi).
	├── Dockerfile # 🐳 Deployment: Instructions to build the Linux container.
	├── models/
	│ └── schemas.py # 📝 Data Models: Pydantic classes to validate input/output.
	└── services/
	└── analyzer.py # 🧠 The Brain: Loads the Local Model and handles inference.
	```

	---

	## 4. How It Was Built (A to Z)

	### Step 1: Defining the Data Structure (`models/schemas.py`)
	Before writing code, we defined what the "Input" and "Output" should look like using Pydantic.
	- Input: A simple JSON object `{"content": "..."}`.
	- Output: A strict JSON schema ensuring the UI always receives `core_keywords`, `hashtags`, `relevance` scores, etc.

	### Step 2: Building the Logic Core (`services/analyzer.py`)
	This is the heart of the "Local AI" engine:
	1. Loading: On startup, we use `transformers.pipeline` to download `Qwen2.5-0.5B` (approx 1GB).
	2. Inference: When a request comes in, the CPU runs the mathematical calculations to generate text.
	3. Optimization: We use `torch_dtype=bfloat16` to make it run faster and use less RAM.
	4. Temperature Control: We set `temperature=0.3` to make the AI strict and reliable for JSON.

	### Step 3: Creating the API Endpoints (`main.py`)
	We created a FastAPI app with two routes:
	- `GET /`: A health check.
	- `POST /analyze-seo`: The main worker. It includes a Safety Net that auto-fills missing data if the AI makes a mistake.

	### Step 4: Dockerization (`Dockerfile`)
	To make this run on the cloud:
	- Base Image: `python:3.9`
	- Dependency: We install `torch` (PyTorch) so the AI can run mathematically.
	- Port: Exposes port 7860 for Hugging Face Spaces.

	---

	## 5. How It Works (The Flow)

	1. User Action: Sends a request: `POST {"content": "dropshipping"}`.
	2. API Layer: FastAPI receives it.
	3. Local Inference:
	- The server passes the text to the loaded Qwen model.
	- The CPU generates the response token-by-token.
	- This takes ~10-20 seconds.
	4. Parsing & Repair: The app cleans the JSON and fixes any syntax errors automatically.
	5. Response: The user receives the data.

	---

	## 6. How to Run Locally

	1. Install Requirements:
	```bash
	pip install -r requirements.txt
	```

	2. No Keys Needed: You do NOT need an API key. It runs locally.

	3. Run the Server:
	```bash
	python -m uvicorn main:app --reload
	```
	Note: The first run will download the model (1GB).

	4. Access Documentation:
	Open `http://localhost:8000/docs`.

	---

	## 7. Configuration Limitations
	- CPU Speed: Since it runs on a free CPU, we limit generation to 30 keywords to ensure it finishes quickly.
	- Model Choice: We used the 0.5B (Nano) model because it is the only modern LLM that fits comfortably in the free tier RAM while remaining fast.

	---
	Developed by Ihtesham \| Powered by Open Source AI

	---

	## 8. Local Hardware Recommendations (RTX 4050)
	If you run this application on an RTX 4050 (6GB VRAM) + 16GB RAM:

	\| Model \| Size \| VRAM Usage \| Speed (Tokens/s) \| Recommendation \|
	\| :--- \| :--- \| :--- \| :--- \| :--- \|
	\| Qwen-0.5B \| 0.5B \| ~0.8 GB \| 100+ (Instant) \| ⚡ Overkill Speed. Low Memory usage. \|
	\| Qwen-1.5B \| 1.5B \| ~2.5 GB \| 70+ (Very Fast) \| ✅ Perfect Balance. Best for 6GB cards. \|
	\| Qwen-7B (4-bit) \| 7B \| ~5.5 GB \| 30+ (Fast) \| 🧠 Smartest. Maxes out your VRAM. \|

	Conclusion: Your RTX 4050 is 10x more powerful than the Free Tier CPU. You should upgrade to the 1.5B Model locally for better intelligence without sacrificing speed.