key_word_Fast_API / PROJECT_DOCS.md
ihtesham0345's picture
Add documentation for running on RTX 4050 6GB
2caec2a
|
Raw
History Blame Contribute Delete
5.36 kB
# πŸ“˜ SEO Keyword Analyzer API - Complete Development Guide
## 1. Project Overview
This project is an **AI-Powered Microservice** built with **FastAPI**. It serves as an intelligent SEO consultant that accepts a topic (e.g., "Digital Marketing") and generates a comprehensive strategy including:
- High-volume Keywords
- Viral Hashtags
- Competition Analysis
- Strategic Tips
**Key Feature:** This project runs the **Qwen2.5-0.5B-Instruct** model **LOCALLY** inside the container.
- **Zero External Dependencies**: It does NOT use an external API. The brain lives inside the app.
- **100% Free**: No rate limits, no credit usage.
- **Privacy**: Data never leaves your container.
---
## 2. Technology Stack used
We used the following technologies to build this application from scratch:
| Component | Technology | Purpose |
| :--- | :--- | :--- |
| **Framework** | **FastAPI** | High-performance web framework for building the API endpoints. |
| **AI Model** | **Qwen2.5-0.5B-Instruct** | The "Nano" model. Ultra-lightweight (0.5B) for maximum speed and zero timeouts. |
| **Connection** | **Local Inference (CPU)** | No API calls. The model lives inside your app. Zero external dependencies. |
| **Container** | **Docker + PyTorch** | Includes Torch/Transformers to run the AI engine self-contained. |
| **Deployment** | **Hugging Face Spaces** | The cloud platform hosting the Docker container. |
---
## 3. Directory Structure Explaination
Here is how the project files are organized:
```
SEO_Analyzer_FastAPI/
β”œβ”€β”€ main.py # 🚦 Entry Point: Defines the API routes & server.
β”œβ”€β”€ requirements.txt # πŸ“¦ Dependencies: Lists libraries (torch, transformers, fastapi).
β”œβ”€β”€ Dockerfile # 🐳 Deployment: Instructions to build the Linux container.
β”œβ”€β”€ models/
β”‚ └── schemas.py # πŸ“ Data Models: Pydantic classes to validate input/output.
└── services/
└── analyzer.py # 🧠 The Brain: Loads the Local Model and handles inference.
```
---
## 4. How It Was Built (A to Z)
### Step 1: Defining the Data Structure (`models/schemas.py`)
Before writing code, we defined what the "Input" and "Output" should look like using **Pydantic**.
- **Input**: A simple JSON object `{"content": "..."}`.
- **Output**: A strict JSON schema ensuring the UI always receives `core_keywords`, `hashtags`, `relevance` scores, etc.
### Step 2: Building the Logic Core (`services/analyzer.py`)
This is the heart of the "Local AI" engine:
1. **Loading**: On startup, we use `transformers.pipeline` to download `Qwen2.5-0.5B` (approx 1GB).
2. **Inference**: When a request comes in, the **CPU** runs the mathematical calculations to generate text.
3. **Optimization**: We use `torch_dtype=bfloat16` to make it run faster and use less RAM.
4. **Temperature Control**: We set `temperature=0.3` to make the AI strict and reliable for JSON.
### Step 3: Creating the API Endpoints (`main.py`)
We created a FastAPI app with two routes:
- `GET /`: A health check.
- `POST /analyze-seo`: The main worker. It includes a **Safety Net** that auto-fills missing data if the AI makes a mistake.
### Step 4: Dockerization (`Dockerfile`)
To make this run on the cloud:
- **Base Image**: `python:3.9`
- **Dependency**: We install `torch` (PyTorch) so the AI can run mathematically.
- **Port**: Exposes port **7860** for Hugging Face Spaces.
---
## 5. How It Works (The Flow)
1. **User Action**: Sends a request: `POST {"content": "dropshipping"}`.
2. **API Layer**: FastAPI receives it.
3. **Local Inference**:
- The server passes the text to the loaded Qwen model.
- The **CPU** generates the response token-by-token.
- This takes ~10-20 seconds.
4. **Parsing & Repair**: The app cleans the JSON and fixes any syntax errors automatically.
5. **Response**: The user receives the data.
---
## 6. How to Run Locally
1. **Install Requirements**:
```bash
pip install -r requirements.txt
```
2. **No Keys Needed**: You do NOT need an API key. It runs locally.
3. **Run the Server**:
```bash
python -m uvicorn main:app --reload
```
*Note: The first run will download the model (1GB).*
4. **Access Documentation**:
Open `http://localhost:8000/docs`.
---
## 7. Configuration Limitations
- **CPU Speed**: Since it runs on a free CPU, we limit generation to **30 keywords** to ensure it finishes quickly.
- **Model Choice**: We used the **0.5B (Nano)** model because it is the only modern LLM that fits comfortably in the free tier RAM while remaining fast.
---
**Developed by Ihtesham | Powered by Open Source AI**
---
## 8. Local Hardware Recommendations (RTX 4050)
If you run this application on an **RTX 4050 (6GB VRAM) + 16GB RAM**:
| Model | Size | VRAM Usage | Speed (Tokens/s) | Recommendation |
| :--- | :--- | :--- | :--- | :--- |
| **Qwen-0.5B** | 0.5B | ~0.8 GB | **100+** (Instant) | ⚑ Overkill Speed. Low Memory usage. |
| **Qwen-1.5B** | 1.5B | ~2.5 GB | **70+** (Very Fast) | βœ… **Perfect Balance.** Best for 6GB cards. |
| **Qwen-7B (4-bit)** | 7B | ~5.5 GB | **30+** (Fast) | 🧠 **Smartest.** Maxes out your VRAM. |
**Conclusion**: Your RTX 4050 is **10x more powerful** than the Free Tier CPU. You should upgrade to the **1.5B Model** locally for better intelligence without sacrificing speed.