key_word_Fast_API / PROJECT_DOCS.md
ihtesham0345's picture
Add documentation for running on RTX 4050 6GB
2caec2a
|
Raw
History Blame Contribute Delete
5.36 kB

πŸ“˜ SEO Keyword Analyzer API - Complete Development Guide

1. Project Overview

This project is an AI-Powered Microservice built with FastAPI. It serves as an intelligent SEO consultant that accepts a topic (e.g., "Digital Marketing") and generates a comprehensive strategy including:

  • High-volume Keywords
  • Viral Hashtags
  • Competition Analysis
  • Strategic Tips

Key Feature: This project runs the Qwen2.5-0.5B-Instruct model LOCALLY inside the container.

  • Zero External Dependencies: It does NOT use an external API. The brain lives inside the app.
  • 100% Free: No rate limits, no credit usage.
  • Privacy: Data never leaves your container.

2. Technology Stack used

We used the following technologies to build this application from scratch:

Component Technology Purpose
Framework FastAPI High-performance web framework for building the API endpoints.
AI Model Qwen2.5-0.5B-Instruct The "Nano" model. Ultra-lightweight (0.5B) for maximum speed and zero timeouts.
Connection Local Inference (CPU) No API calls. The model lives inside your app. Zero external dependencies.
Container Docker + PyTorch Includes Torch/Transformers to run the AI engine self-contained.
Deployment Hugging Face Spaces The cloud platform hosting the Docker container.

3. Directory Structure Explaination

Here is how the project files are organized:

SEO_Analyzer_FastAPI/
β”œβ”€β”€ main.py                 # 🚦 Entry Point: Defines the API routes & server.
β”œβ”€β”€ requirements.txt        # πŸ“¦ Dependencies: Lists libraries (torch, transformers, fastapi).
β”œβ”€β”€ Dockerfile              # 🐳 Deployment: Instructions to build the Linux container.
β”œβ”€β”€ models/
β”‚   └── schemas.py          # πŸ“ Data Models: Pydantic classes to validate input/output.
└── services/
    └── analyzer.py         # 🧠 The Brain: Loads the Local Model and handles inference.

4. How It Was Built (A to Z)

Step 1: Defining the Data Structure (models/schemas.py)

Before writing code, we defined what the "Input" and "Output" should look like using Pydantic.

  • Input: A simple JSON object {"content": "..."}.
  • Output: A strict JSON schema ensuring the UI always receives core_keywords, hashtags, relevance scores, etc.

Step 2: Building the Logic Core (services/analyzer.py)

This is the heart of the "Local AI" engine:

  1. Loading: On startup, we use transformers.pipeline to download Qwen2.5-0.5B (approx 1GB).
  2. Inference: When a request comes in, the CPU runs the mathematical calculations to generate text.
  3. Optimization: We use torch_dtype=bfloat16 to make it run faster and use less RAM.
  4. Temperature Control: We set temperature=0.3 to make the AI strict and reliable for JSON.

Step 3: Creating the API Endpoints (main.py)

We created a FastAPI app with two routes:

  • GET /: A health check.
  • POST /analyze-seo: The main worker. It includes a Safety Net that auto-fills missing data if the AI makes a mistake.

Step 4: Dockerization (Dockerfile)

To make this run on the cloud:

  • Base Image: python:3.9
  • Dependency: We install torch (PyTorch) so the AI can run mathematically.
  • Port: Exposes port 7860 for Hugging Face Spaces.

5. How It Works (The Flow)

  1. User Action: Sends a request: POST {"content": "dropshipping"}.
  2. API Layer: FastAPI receives it.
  3. Local Inference:
    • The server passes the text to the loaded Qwen model.
    • The CPU generates the response token-by-token.
    • This takes ~10-20 seconds.
  4. Parsing & Repair: The app cleans the JSON and fixes any syntax errors automatically.
  5. Response: The user receives the data.

6. How to Run Locally

  1. Install Requirements:

    pip install -r requirements.txt
    
  2. No Keys Needed: You do NOT need an API key. It runs locally.

  3. Run the Server:

    python -m uvicorn main:app --reload
    

    Note: The first run will download the model (1GB).

  4. Access Documentation: Open http://localhost:8000/docs.


7. Configuration Limitations

  • CPU Speed: Since it runs on a free CPU, we limit generation to 30 keywords to ensure it finishes quickly.
  • Model Choice: We used the 0.5B (Nano) model because it is the only modern LLM that fits comfortably in the free tier RAM while remaining fast.

Developed by Ihtesham | Powered by Open Source AI


8. Local Hardware Recommendations (RTX 4050)

If you run this application on an RTX 4050 (6GB VRAM) + 16GB RAM:

Model Size VRAM Usage Speed (Tokens/s) Recommendation
Qwen-0.5B 0.5B ~0.8 GB 100+ (Instant) ⚑ Overkill Speed. Low Memory usage.
Qwen-1.5B 1.5B ~2.5 GB 70+ (Very Fast) βœ… Perfect Balance. Best for 6GB cards.
Qwen-7B (4-bit) 7B ~5.5 GB 30+ (Fast) 🧠 Smartest. Maxes out your VRAM.

Conclusion: Your RTX 4050 is 10x more powerful than the Free Tier CPU. You should upgrade to the 1.5B Model locally for better intelligence without sacrificing speed.