match-api / README.md
JermaineAI's picture
changed readme file
3407011
metadata
title: CineMatch API
emoji: ๐ŸŽฌ
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860

๐ŸŽฌ CineMatch API

CineMatch is an intelligent, content-based movie recommendation engine powered by cutting-edge AI. It combines semantic search, vector embeddings, and personalization to deliver highly accurate movie recommendations tailored to user preferences.

๐Ÿ“‹ Table of Contents


โœจ Features

1. Semantic Search ๐Ÿ”

Search for movies using natural language queries. The system converts your text into a vector embedding and finds semantically similar movies.

  • Example: "A romantic movie about a sinking ship" โ†’ Returns Titanic

2. Vibe-Based Recommendations ๐ŸŽฏ

Search by combining tags (genres, themes) and descriptions for more refined results.

  • Example: Tags: ["Sci-Fi", "Action"], Description: "Robots fighting in space" โ†’ Returns relevant matches

3. Personalized Recommendations ๐Ÿ‘ค

Provide a list of movies you've liked, and the system averages their vectors to create a personalized profile, then recommends similar movies.

  • Example: Liked: ["The Matrix", "Inception"] โ†’ Get similar mind-bending films

4. Content-Based Similarity ๐Ÿ”—

Find movies similar to a specific title already in the database.

  • Example: Similar to "Inception" โ†’ Returns "Interstellar", "The Matrix", etc.

5. Rich Movie Metadata ๐Ÿ“Š

Each movie includes:

  • Director information
  • Top 4 cast members
  • Keywords (e.g., "time travel", "dystopia")
  • Genres
  • Plot overview
  • IMDB ratings

6. Incremental Learning ๐Ÿ“ˆ

Add new movies to the system without retrainingโ€”updates are instant!


๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   User Request  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚   FastAPI Server         โ”‚
    โ”‚  (Endpoint Handler)       โ”‚
    โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚  MovieRecommender Engine      โ”‚
    โ”‚  (FAISS + Vector Search)      โ”‚
    โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚  Embedding Model          โ”‚
    โ”‚  (SentenceTransformers)   โ”‚
    โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚  FAISS Index              โ”‚
    โ”‚  (movie_index.faiss)      โ”‚
    โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚  Movie Metadata           โ”‚
    โ”‚  (metadata.pkl)           โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ› ๏ธ Tech Stack

Component Technology Purpose
Backend Framework FastAPI High-performance async API
Vector Search FAISS Fast similarity search on embeddings
Embeddings SentenceTransformers (MiniLM-L6-v2) Convert text to 384-dim vectors
Data Source TMDB API Movie metadata (titles, cast, genres, etc.)
Data Processing Pandas, NumPy Data cleaning & preprocessing
Deployment Docker Containerized deployment
Python Version 3.9+ Modern async/await support

๐Ÿ“ฆ Installation

Prerequisites

  • Python 3.9 or higher
  • TMDB API Key (free, get it at themoviedb.org)
  • ~2GB free disk space (for models and indices)

Step 1: Clone & Setup

# Navigate to project directory
cd CineMatch

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Windows:
.\.venv\Scripts\Activate.ps1

# On macOS/Linux:
source .venv/bin/activate

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Configure Environment

Create a .env file in the project root:

TMDB_API_KEY=your_api_key_here

โš™๏ธ Configuration

Environment Variables

Variable Description Example
TMDB_API_KEY Your TMDB API key abc123xyz...

Model Configuration

The default embedding model is all-MiniLM-L6-v2 from SentenceTransformers:

  • Embedding Dimension: 384
  • Speed: Very fast (optimized for CPU)
  • Quality: High for semantic similarity
  • Memory: ~80MB

To use a different model, modify recommender.py in the MovieRecommender.__init__() method.


๐Ÿš€ Usage

Running the Server

# Make sure your virtual environment is activated
python app.py

The server will start at http://localhost:8000

API Documentation (auto-generated Swagger UI):

  • Swagger: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Data Ingestion

Before using the API, you need to populate the FAISS index with movies:

python src/ingest.py

This will:

  1. Fetch ~50 high-quality movies from TMDB (popularity โ‰ฅ 7.0, votes โ‰ฅ 500)
  2. Extract director, cast, and keywords for each movie
  3. Generate embeddings
  4. Save to models/movie_index.faiss and models/metadata.pkl

To reset and rebuild the index:

# In src/ingest.py, modify the last line:
ingest_high_quality_movies(target_count=100, reset=True)  # reset=True to rebuild

๐Ÿ“ก API Endpoints

1. Health Check

GET /

Response:

{
  "status": "online and active!!!",
  "model_loaded": true
}

2. Semantic Search ๐Ÿ”

POST /search

Request:

{
  "query": "A romantic movie about a sinking ship",
  "k": 5
}

Response:

[
  {
    "movie_id": 597,
    "title": "Titanic",
    "score": 0.856
  },
  {
    "movie_id": 285,
    "title": "The Poseidon Adventure",
    "score": 0.743
  }
]

3. Vibe-Based Search ๐ŸŽฏ

POST /recommend/vibe

Request:

{
  "tags": ["Sci-Fi", "Action", "Space"],
  "description": "Robots fighting in space with stunning visuals",
  "k": 10
}

Response:

{
  "interpreted_query": "Sci-Fi Action Space Sci-Fi Action Space Robots fighting in space with stunning visuals",
  "results": [
    {
      "movie_id": 58,
      "title": "The Fifth Element",
      "score": 0.912
    }
  ]
}

4. Personalized Recommendations ๐Ÿ‘ค

POST /recommend/user

Request:

{
  "liked_movies": ["The Matrix", "Inception", "Interstellar"],
  "k": 5
}

Response:

[
  {
    "movie_id": 27205,
    "title": "Oblivion",
    "score": 0.834
  },
  {
    "movie_id": 284054,
    "title": "Doctor Strange",
    "score": 0.798
  }
]

5. Similar Movies ๐Ÿ”—

GET /recommend/movie/{title}

Example:

GET /recommend/movie/Inception

Response:

[
  {
    "movie_id": 38372,
    "title": "Interstellar",
    "score": 0.891
  },
  {
    "movie_id": 603,
    "title": "The Matrix",
    "score": 0.867
  }
]

6. Admin: Trigger Background Update ๐Ÿ”„

POST /admin/trigger-update

Response:

{
  "message": "Update process started in background. Check server logs for progress."
}

This endpoint triggers background ingestion without blocking the API.


๐Ÿ“ Examples

Example 1: Find Movies Similar to Your Favorite

import requests

BASE_URL = "http://localhost:8000"

# Get movies similar to "The Matrix"
response = requests.get(f"{BASE_URL}/recommend/movie/The Matrix")
recommendations = response.json()

for movie in recommendations:
    print(f"{movie['title']} (Score: {movie['score']:.2f})")

Example 2: Semantic Search with Natural Language

response = requests.post(
    f"{BASE_URL}/search",
    json={
        "query": "A thrilling space adventure with amazing visuals",
        "k": 5
    }
)

for movie in response.json():
    print(f"โœ“ {movie['title']}")

Example 3: Personalized Recommendations Based on History

response = requests.post(
    f"{BASE_URL}/recommend/user",
    json={
        "liked_movies": ["Dune", "Blade Runner 2049", "Arrival"],
        "k": 10
    }
)

for movie in response.json():
    print(f"โ˜… {movie['title']}")

๐Ÿ“‚ Project Structure

CineMatch/
โ”œโ”€โ”€ app.py                    # Main FastAPI application
โ”œโ”€โ”€ main.py                   # (Optional) Alternative entry point
โ”œโ”€โ”€ Dockerfile                # Docker configuration
โ”œโ”€โ”€ requirements.txt          # Python dependencies
โ”œโ”€โ”€ .env                      # API keys (create this)
โ”‚
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ recommender.py        # Core FAISS-based recommendation engine
โ”‚   โ”œโ”€โ”€ ingest.py             # TMDB data ingestion pipeline
โ”‚   โ””โ”€โ”€ preprocessing.py      # Data cleaning & feature engineering
โ”‚
โ”œโ”€โ”€ models/
โ”‚   โ”œโ”€โ”€ movie_index.faiss     # FAISS index (generated after ingestion)
โ”‚   โ””โ”€โ”€ metadata.pkl          # Movie metadata dataframe (generated)
โ”‚
โ”œโ”€โ”€ eda/
โ”‚   โ””โ”€โ”€ Untitled.ipynb        # Exploratory data analysis notebook
โ”‚
โ””โ”€โ”€ README.md                 # This file

๐Ÿง  How It Works

The Embedding Pipeline

Raw Text Input (Movie Title + Metadata)
          โ†“
    [SentenceTransformers]
          โ†“
    384-Dimensional Vector
          โ†“
    [L2 Normalization]
          โ†“
    Normalized Vector (Unit Length)
          โ†“
    [FAISS IndexFlatIP]
          โ†“
    Stored in Index

Recommendation Flow

  1. User provides query (text, tags, or movie titles)
  2. Convert to vector using SentenceTransformers
  3. Normalize vector (for cosine similarity)
  4. FAISS search finds K nearest neighbors in index
  5. Return results with similarity scores

Why This Approach?

  • Fast: FAISS is optimized for billion-scale vector search
  • Accurate: Semantic embeddings capture meaning, not just keywords
  • Scalable: Can handle millions of movies
  • CPU-Friendly: MiniLM model is tiny but effective
  • Incremental: Add movies without retraining

โšก Performance Considerations

Indexing Speed

  • MiniLM Model: ~100-200 movies/second on modern CPU
  • FAISS Indexing: Instant for additions
  • Memory: ~384 bytes per movie embedding

Search Speed

  • Single Query: 1-5ms
  • Batch Queries: Linear time complexity O(n)
  • Max Practical Size: 10+ million movies

Optimization Tips

  1. Use Batch Processing: Send multiple queries at once
  2. Tune k Parameter: Lower k = faster results (typically k=5-10 is good)
  3. CPU: The MiniLM model leverages BLAS libraries for speed
  4. GPU: Optionalโ€”can speed up embedding generation 10x

๐Ÿณ Deployment

Docker Build & Run

# Build image
docker build -t cinematch:latest .

# Run container
docker run -p 8000:8000 \
  -e TMDB_API_KEY=your_key \
  cinematch:latest

Production Deployment

The project includes a Dockerfile configured for production use:

  • Base Image: Python 3.9+
  • Port: 8000 (configurable)
  • Entry: python app.py

For production, consider:

  • Using Gunicorn or Uvicorn with multiple workers
  • Adding Nginx reverse proxy
  • Implementing authentication (API keys)
  • Using cloud storage for models (S3, GCS)

๐Ÿ› Troubleshooting

Issue: "No model found" Error

Solution: Run data ingestion first:

python src/ingest.py

Issue: TMDB API Key Invalid

Solution: Verify your .env file:

cat .env  # Check the key is there

Issue: Out of Memory

Solution: Reduce batch size in recommender.py:

batch_size = 32  # Lower from 64

Issue: Slow Embedding Generation

Solution:

  • The MiniLM model is already optimized for CPU
  • For GPU support, install PyTorch with CUDA:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Issue: CORS Errors

Solution: Already handled in app.py. The API allows all origins (allow_origins=["*"]). For production, restrict this:

allow_origins=["https://yourdomain.com"]

๐Ÿ“Š Dataset Information

Movie Source: The Movie Database (TMDB) API

Filtering Criteria:

  • Minimum Rating: 7.0 / 10.0
  • Minimum Vote Count: 500 votes
  • Sorted by: Popularity (descending)

Metadata Included:

  • Title
  • Director
  • Cast (top 4 actors)
  • Keywords
  • Genres
  • Overview / Plot
  • Vote Average

๐Ÿ”ฎ Future Enhancements

  • User authentication & API key management
  • Collaborative filtering (user-user similarity)
  • Real-time model updates with webhooks
  • Advanced filtering (year, rating, runtime)
  • Movie rating & feedback loop for model improvement
  • Multi-language support
  • Mobile app integration

๐Ÿ“„ License

This project is open source. Feel free to modify and extend it!


๐Ÿ’ฌ Support

For issues, questions, or contributions:

  1. Check the Troubleshooting section
  2. Review the API Documentation
  3. Examine the source code in src/ directory

Enjoy discovering your next favorite movie! ๐Ÿฟ๐ŸŽฌ