title: CineMatch API
emoji: 🎬
colorFrom: purple
colorTo: pink
sdk: docker
app_port: 7860
🎬 CineMatch API
CineMatch is a content-based movie recommendation engine. It combines semantic search, vector embeddings, and personalization to recommend movies tailored to a user's preferences.
Table of Contents
- Features
- Architecture
- Tech Stack
- Installation
- Configuration
- Usage
- Project Structure
- How It Works
- Performance Considerations
- Deployment
- Troubleshooting
✨ Features
1. Semantic Search
Search for movies using natural language queries. The system converts your text into a vector embedding and finds semantically similar movies.
- Example: "A romantic movie about a sinking ship" → Returns Titanic
2. Vibe-Based Recommendations 🎯
Search by combining tags (genres, themes) and descriptions for more refined results.
- Example: Tags: ["Sci-Fi", "Action"], Description: "Robots fighting in space" → Returns relevant matches
3. Personalized Recommendations 👤
Provide a list of movies you've liked, and the system averages their vectors to create a personalized profile, then recommends similar movies.
- Example: Liked: ["The Matrix", "Inception"] → Get similar mind-bending films
4. Content-Based Similarity
Find movies similar to a specific title already in the database.
- Example: Similar to "Inception" → Returns "Interstellar", "The Matrix", etc.
5. Rich Movie Metadata
Each movie includes:
- Director information
- Top 4 cast members
- Keywords (e.g., "time travel", "dystopia")
- Genres
- Plot overview
- TMDB rating (vote average)
6. Incremental Learning
Add new movies to the index without retraining; new entries are searchable as soon as they are added.
🏗️ Architecture
┌──────────────────────────────┐
│         User Request         │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│        FastAPI Server        │
│      (Endpoint Handler)      │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│   MovieRecommender Engine    │
│   (FAISS + Vector Search)    │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│       Embedding Model        │
│    (SentenceTransformers)    │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│         FAISS Index          │
│     (movie_index.faiss)      │
└──────────────┬───────────────┘
               │
┌──────────────▼───────────────┐
│        Movie Metadata        │
│        (metadata.pkl)        │
└──────────────────────────────┘
🛠️ Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Backend Framework | FastAPI | High-performance async API |
| Vector Search | FAISS | Fast similarity search on embeddings |
| Embeddings | SentenceTransformers (all-MiniLM-L6-v2) | Convert text to 384-dim vectors |
| Data Source | TMDB API | Movie metadata (titles, cast, genres, etc.) |
| Data Processing | Pandas, NumPy | Data cleaning & preprocessing |
| Deployment | Docker | Containerized deployment |
| Python Version | 3.9+ | Modern async/await support |
📦 Installation
Prerequisites
- Python 3.9 or higher
- TMDB API Key (free, get it at themoviedb.org)
- ~2GB free disk space (for models and indices)
Step 1: Clone & Setup
# Navigate to project directory
cd CineMatch
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On Windows:
.\.venv\Scripts\Activate.ps1
# On macOS/Linux:
source .venv/bin/activate
Step 2: Install Dependencies
pip install -r requirements.txt
Step 3: Configure Environment
Create a .env file in the project root:
TMDB_API_KEY=your_api_key_here
⚙️ Configuration
Environment Variables
| Variable | Description | Example |
|---|---|---|
| TMDB_API_KEY | Your TMDB API key | abc123xyz... |
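One way to read the key at startup is sketched below. This is a standard-library stand-in and not necessarily how app.py loads it (the project may rely on a helper package instead). The names parse_env_text and require_tmdb_key are hypothetical.

```python
import os


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_tmdb_key(env_path: str = ".env") -> str:
    """Return TMDB_API_KEY from the environment, falling back to the .env file."""
    key = os.environ.get("TMDB_API_KEY")
    if not key and os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as fh:
            key = parse_env_text(fh.read()).get("TMDB_API_KEY")
    if not key:
        raise RuntimeError("TMDB_API_KEY is not set; add it to .env")
    return key
```

Failing fast with a clear error avoids discovering a missing key only when the first TMDB request is rejected.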
Model Configuration
The default embedding model is all-MiniLM-L6-v2 from SentenceTransformers:
- Embedding Dimension: 384
- Speed: Very fast (optimized for CPU)
- Quality: High for semantic similarity
- Memory: ~80MB
To use a different model, change the model name in the MovieRecommender.__init__() method in src/recommender.py.
Usage
Running the Server
# Make sure your virtual environment is activated
python app.py
The server will start at http://localhost:8000
API documentation (auto-generated by FastAPI):
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Data Ingestion
Before using the API, you need to populate the FAISS index with movies:
python src/ingest.py
This will:
- Fetch ~50 high-quality movies from TMDB (rating ≥ 7.0, votes ≥ 500)
- Extract director, cast, and keywords for each movie
- Generate embeddings
- Save to models/movie_index.faiss and models/metadata.pkl
To reset and rebuild the index:
# In src/ingest.py, modify the last line:
ingest_high_quality_movies(target_count=100, reset=True) # reset=True to rebuild
📡 API Endpoints
1. Health Check
GET /
Response:
{
  "status": "online and active!!!",
  "model_loaded": true
}
2. Semantic Search
POST /search
Request:
{
  "query": "A romantic movie about a sinking ship",
  "k": 5
}
Response:
[
  {
    "movie_id": 597,
    "title": "Titanic",
    "score": 0.856
  },
  {
    "movie_id": 285,
    "title": "The Poseidon Adventure",
    "score": 0.743
  }
]
3. Vibe-Based Search 🎯
POST /recommend/vibe
Request:
{
  "tags": ["Sci-Fi", "Action", "Space"],
  "description": "Robots fighting in space with stunning visuals",
  "k": 10
}
Response:
{
  "interpreted_query": "Sci-Fi Action Space Sci-Fi Action Space Robots fighting in space with stunning visuals",
  "results": [
    {
      "movie_id": 58,
      "title": "The Fifth Element",
      "score": 0.912
    }
  ]
}
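The interpreted_query in the sample response repeats the tags twice before the description, which suggests the tags are weighted by duplication before embedding. The sketch below reproduces that string. It is inferred from the sample response only, so the function name build_vibe_query and the tag_repeat parameter are assumptions rather than the project's actual code.

```python
def build_vibe_query(tags, description, tag_repeat=2):
    """Join the tags (repeated tag_repeat times) with the free-text description."""
    tag_text = " ".join(tags)
    # Repeating the tag text gives it more weight in the resulting embedding.
    parts = [tag_text] * tag_repeat if tag_text else []
    if description:
        parts.append(description.strip())
    return " ".join(parts)


query = build_vibe_query(
    ["Sci-Fi", "Action", "Space"],
    "Robots fighting in space with stunning visuals",
)
print(query)
```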
4. Personalized Recommendations 👤
POST /recommend/user
Request:
{
  "liked_movies": ["The Matrix", "Inception", "Interstellar"],
  "k": 5
}
Response:
[
  {
    "movie_id": 27205,
    "title": "Oblivion",
    "score": 0.834
  },
  {
    "movie_id": 284054,
    "title": "Doctor Strange",
    "score": 0.798
  }
]
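The profile behind this endpoint is described as the average of the liked movies' vectors. A minimal sketch of that averaging step follows, using plain Python lists and toy 2-D vectors in place of the real 384-dimensional embeddings. The names build_profile and vectors_by_title are hypothetical, and skipping unknown titles is an assumed behavior.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_profile(liked_titles, vectors_by_title):
    """Average the vectors of the liked titles found in the index, then re-normalize."""
    known = [vectors_by_title[t] for t in liked_titles if t in vectors_by_title]
    if not known:
        raise ValueError("none of the liked movies are in the index")
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return normalize(mean)


# Toy vectors for illustration only.
vectors = {"The Matrix": [1.0, 0.0], "Inception": [0.0, 1.0]}
profile = build_profile(["The Matrix", "Inception", "Unknown Title"], vectors)
```

Re-normalizing the mean keeps the profile on the unit sphere, so inner-product scores against indexed movies stay comparable to cosine similarity. A real handler would probably also drop the liked titles themselves from the results.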
5. Similar Movies
GET /recommend/movie/{title}
Example:
GET /recommend/movie/Inception
Response:
[
  {
    "movie_id": 38372,
    "title": "Interstellar",
    "score": 0.891
  },
  {
    "movie_id": 603,
    "title": "The Matrix",
    "score": 0.867
  }
]
6. Admin: Trigger Background Update
POST /admin/trigger-update
Response:
{
  "message": "Update process started in background. Check server logs for progress."
}
This endpoint triggers background ingestion without blocking the API.
Examples
Example 1: Find Movies Similar to Your Favorite
import requests

BASE_URL = "http://localhost:8000"

# Get movies similar to "The Matrix"
response = requests.get(f"{BASE_URL}/recommend/movie/The Matrix")
response.raise_for_status()  # Fail early on HTTP errors
recommendations = response.json()
for movie in recommendations:
    print(f"{movie['title']} (Score: {movie['score']:.2f})")
Example 2: Semantic Search with Natural Language
# Reuses requests and BASE_URL from Example 1
response = requests.post(
    f"{BASE_URL}/search",
    json={
        "query": "A thrilling space adventure with amazing visuals",
        "k": 5,
    },
)
for movie in response.json():
    print(f"→ {movie['title']}")
Example 3: Personalized Recommendations Based on History
# Reuses requests and BASE_URL from Example 1
response = requests.post(
    f"{BASE_URL}/recommend/user",
    json={
        "liked_movies": ["Dune", "Blade Runner 2049", "Arrival"],
        "k": 10,
    },
)
for movie in response.json():
    print(f"✅ {movie['title']}")
Project Structure
CineMatch/
├── app.py                  # Main FastAPI application
├── main.py                 # (Optional) Alternative entry point
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
├── .env                    # API keys (create this)
│
├── src/
│   ├── __init__.py
│   ├── recommender.py      # Core FAISS-based recommendation engine
│   ├── ingest.py           # TMDB data ingestion pipeline
│   └── preprocessing.py    # Data cleaning & feature engineering
│
├── models/
│   ├── movie_index.faiss   # FAISS index (generated after ingestion)
│   └── metadata.pkl        # Movie metadata dataframe (generated)
│
├── eda/
│   └── Untitled.ipynb      # Exploratory data analysis notebook
│
└── README.md               # This file
🧠 How It Works
The Embedding Pipeline
Raw Text Input (Movie Title + Metadata)
↓
[SentenceTransformers]
↓
384-Dimensional Vector
↓
[L2 Normalization]
↓
Normalized Vector (Unit Length)
↓
[FAISS IndexFlatIP]
↓
Stored in Index
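The normalization step matters because an inner-product index only returns cosine similarity when every vector has unit length. The sketch below checks that equivalence on two toy vectors using only the standard library. It illustrates the math and is not the project's FAISS code; all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a, b = [3.0, 4.0], [4.0, 3.0]
ip_of_units = inner_product(l2_normalize(a), l2_normalize(b))
# Both values agree: the inner product of unit vectors is the cosine similarity.
print(round(ip_of_units, 4), round(cosine(a, b), 4))
```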
Recommendation Flow
- User provides query (text, tags, or movie titles)
- Convert to vector using SentenceTransformers
- Normalize vector (for cosine similarity)
- FAISS search finds K nearest neighbors in index
- Return results with similarity scores
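The flow above can be sketched as an exhaustive top-k search, which is conceptually what a flat inner-product index does. This is a standard-library stand-in with made-up 3-D vectors, not a call into FAISS. The search function and the (movie_id, title, vector) tuple layout are assumptions chosen to mirror the response format shown in the API section.

```python
import heapq
import math


def unit(vec):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(query_vec, index, k=5):
    """Score every indexed unit vector against the query and keep the k best."""
    q = unit(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), movie_id, title)
        for movie_id, title, vec in index
    )
    top = heapq.nlargest(k, scored)
    return [{"movie_id": m, "title": t, "score": round(s, 3)} for s, m, t in top]


# Toy index: the vectors are invented for illustration.
index = [
    (597, "Titanic", unit([0.9, 0.1, 0.0])),
    (603, "The Matrix", unit([0.0, 0.2, 0.9])),
    (38372, "Interstellar", unit([0.1, 0.3, 0.8])),
]
results = search([0.0, 0.1, 1.0], index, k=2)
for row in results:
    print(row)
```

In the real service the query vector would come from the embedding model, and the scan would be delegated to the FAISS index instead of a Python loop.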
Why This Approach?
- Fast: FAISS is optimized for billion-scale vector search
- Accurate: Semantic embeddings capture meaning, not just keywords
- Scalable: Can handle millions of movies
- CPU-Friendly: MiniLM model is tiny but effective
- Incremental: Add movies without retraining
⚡ Performance Considerations
Indexing Speed
- MiniLM Model: ~100-200 movies/second on modern CPU
- FAISS Indexing: Instant for additions
- Memory: ~1.5 KB per movie embedding (384 float32 values × 4 bytes)
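A back-of-the-envelope estimate of raw vector storage is sketched below, assuming the 384-dimensional embeddings are stored as 4-byte float32 values, as FAISS flat indexes do. Metadata in metadata.pkl is not counted, and flat_index_bytes is a hypothetical helper.

```python
def flat_index_bytes(num_movies, dim=384, bytes_per_value=4):
    """Raw vector storage for a flat float32 index (metadata excluded)."""
    return num_movies * dim * bytes_per_value


per_movie = flat_index_bytes(1)  # 384 * 4 = 1536 bytes
for n in (50, 100_000, 10_000_000):
    print(f"{n:>10,} movies -> {flat_index_bytes(n) / 1e6:,.1f} MB")
```

At ten million movies the vectors alone need about 15 GB under these assumptions, which is worth keeping in mind when sizing a flat index.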
Search Speed
- Single Query: 1-5ms
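- Batch Queries: Cost grows linearly with the number of queries; IndexFlatIP performs an exhaustive scan, so each query is O(n) in the number of indexed movies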
- Max Practical Size: 10+ million movies
Optimization Tips
- Use Batch Processing: Send multiple queries at once
- Tune k Parameter: Lower k = faster results (typically k=5-10 is good)
- CPU: The MiniLM model leverages BLAS libraries for speed
- GPU: Optional; can speed up embedding generation by roughly 10x
🐳 Deployment
Docker Build & Run
# Build image
docker build -t cinematch:latest .
# Run container
docker run -p 8000:8000 \
-e TMDB_API_KEY=your_key \
cinematch:latest
Production Deployment
The project includes a Dockerfile configured for production use:
- Base Image: Python 3.9+
- Port: 8000 (configurable)
- Entry: python app.py
For production, consider:
- Using Gunicorn or Uvicorn with multiple workers
- Adding Nginx reverse proxy
- Implementing authentication (API keys)
- Using cloud storage for models (S3, GCS)
Troubleshooting
Issue: "No model found" Error
Solution: Run data ingestion first:
python src/ingest.py
Issue: TMDB API Key Invalid
Solution: Verify your .env file:
cat .env # Check the key is there
Issue: Out of Memory
Solution: Reduce batch size in recommender.py:
batch_size = 32 # Lower from 64
Issue: Slow Embedding Generation
Solution:
- The MiniLM model is already optimized for CPU
- For GPU support, install PyTorch with CUDA:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Issue: CORS Errors
Solution: Already handled in app.py. The API allows all origins (allow_origins=["*"]). For production, restrict this:
allow_origins=["https://yourdomain.com"]
Dataset Information
Movie Source: The Movie Database (TMDB) API
Filtering Criteria:
- Minimum Rating: 7.0 / 10.0
- Minimum Vote Count: 500 votes
- Sorted by: Popularity (descending)
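The filtering criteria above can be sketched as a small selection function. The field names vote_average, vote_count, and popularity follow TMDB's movie objects. The helper names and sample records are illustrative and not taken from src/ingest.py.

```python
def is_high_quality(movie, min_rating=7.0, min_votes=500):
    """Keep movies that meet both the rating and the vote-count thresholds."""
    return (
        movie.get("vote_average", 0.0) >= min_rating
        and movie.get("vote_count", 0) >= min_votes
    )


def select_movies(candidates, target_count=50):
    """Filter candidates, then return the most popular ones first."""
    kept = [m for m in candidates if is_high_quality(m)]
    kept.sort(key=lambda m: m.get("popularity", 0.0), reverse=True)
    return kept[:target_count]


# Made-up records for illustration.
candidates = [
    {"title": "A", "vote_average": 8.1, "vote_count": 1200, "popularity": 40.0},
    {"title": "B", "vote_average": 6.9, "vote_count": 9000, "popularity": 95.0},  # rating too low
    {"title": "C", "vote_average": 7.4, "vote_count": 300, "popularity": 60.0},   # too few votes
    {"title": "D", "vote_average": 7.0, "vote_count": 500, "popularity": 55.0},
]
selected = [m["title"] for m in select_movies(candidates)]
```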
Metadata Included:
- Title
- Director
- Cast (top 4 actors)
- Keywords
- Genres
- Overview / Plot
- Vote Average
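Before embedding, these fields have to be combined into a single piece of text per movie. One plausible way to do that is sketched below. The exact template used by the project is not documented here, so the field labels, the separator, and the function name build_embedding_text are assumptions.

```python
def build_embedding_text(movie, max_cast=4):
    """Join the metadata fields into one string for the embedding model."""
    parts = [movie.get("title", "")]
    if movie.get("director"):
        parts.append(f"Directed by {movie['director']}")
    if movie.get("cast"):
        # Only the top-billed cast members are kept.
        parts.append("Cast: " + ", ".join(movie["cast"][:max_cast]))
    if movie.get("keywords"):
        parts.append("Keywords: " + ", ".join(movie["keywords"]))
    if movie.get("genres"):
        parts.append("Genres: " + ", ".join(movie["genres"]))
    if movie.get("overview"):
        parts.append(movie["overview"])
    return ". ".join(p for p in parts if p)


# Placeholder record for illustration.
example = {
    "title": "Example Movie",
    "director": "Jane Doe",
    "cast": ["Actor A", "Actor B", "Actor C", "Actor D", "Actor E"],
    "keywords": ["time travel", "dystopia"],
    "genres": ["Science Fiction"],
    "overview": "A placeholder plot summary",
}
text = build_embedding_text(example)
print(text)
```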
🔮 Future Enhancements
- User authentication & API key management
- Collaborative filtering (user-user similarity)
- Real-time model updates with webhooks
- Advanced filtering (year, rating, runtime)
- Movie rating & feedback loop for model improvement
- Multi-language support
- Mobile app integration
License
This project is open source. Feel free to modify and extend it!
💬 Support
For issues, questions, or contributions:
- Check the Troubleshooting section
- Review the API Documentation
- Examine the source code in the src/ directory
Enjoy discovering your next favorite movie! 🍿🎬