# Project Gemini: YouTube Dual-Language Subtitle Backend Service

This document outlines the development guidelines, architecture, and technology stack for the YouTube Dual-Language Subtitle translation service.

## 1. Core Mission

To create a free, open-source, and self-hosted backend service that provides English-to-Chinese translation for a companion browser extension. The service is designed for efficient deployment on **Hugging Face Spaces**.

## 2. Architecture Overview

The project is a standalone Python-based microservice built with FastAPI. It exposes a single API endpoint to receive text, translates it using a pre-loaded Hugging Face model, and returns the result. This design allows the heavy lifting of AI translation to be handled by a dedicated, scalable server.

## 3. Technology Stack

### Backend (Translation Service)

- **Framework**: Python with FastAPI (for creating a high-performance API)
- **AI/ML Library**: Hugging Face `transformers` and `torch`
- **Translation Model**: `Helsinki-NLP/opus-mt-en-zh` (A lightweight, high-quality model for English-to-Chinese translation)
- **Server**: Uvicorn

## 4. Deployment & Development on Hugging Face Spaces

### Primary Hosting & Version Control

- **Platform**: **Hugging Face Spaces** is used for both hosting the service and the Git repository.
- **Deployment Trigger**: Pushing code to the `main` branch of the Hugging Face repository automatically triggers a new build and deployment on the Space.

### Hugging Face Spaces Best Practices

1. **`app.py`**: The main application file must be named `app.py` and located at the root of the repository. It will contain the FastAPI application logic.
2. **`requirements.txt`**: All Python dependencies must be listed in a `requirements.txt` file. The Space will automatically install these dependencies upon deployment.
3. **Secrets Management**: Use Hugging Face Space Secrets for storing any sensitive information. Do not hardcode secrets in the source code.
   For local development, `huggingface-cli login` can be used to manage credentials.
4. **Resource Configuration**: The `README.md` file's metadata block (YAML front matter) is used to configure the Space's hardware. For this project, a CPU instance is sufficient and should be specified to optimize resource allocation.
5. **Health Checks**: FastAPI provides a default `/docs` endpoint which serves as a basic health check to verify the service is running.

### API Workflow

1. **Request**: The service waits for a POST request to its `/translate` endpoint. The request body should contain the English text to be translated.
2. **Translate**: The service utilizes the `Helsinki-NLP/opus-mt-en-zh` model to translate the received text into Chinese.
3. **Respond**: The service returns a JSON object containing the translated text.

### Error Handling

- The backend will include robust error handling for invalid requests, translation failures, or other server-side issues. It will return appropriate HTTP status codes and clear error messages in the response body.

## 5. Code Quality & Conventions

- **Naming**: Use descriptive and clear names for variables and functions (e.g., `translate_text`, `translation_router`).
- **Comments**: Add comments to explain the "why" behind complex logic, not the "what".
- **Style**: Follow standard Python (PEP 8) and FastAPI best practices.
- **Commit Messages**: Keep commit titles concise, lowercase, and under 70 characters.