# Project Gemini: YouTube Dual-Language Subtitle Backend Service

This document outlines the development guidelines, architecture, and technology stack for the YouTube Dual-Language Subtitle translation service.

## 1. Core Mission

To create a free, open-source, and self-hosted backend service that provides English-to-Chinese translation for a companion browser extension. The service is designed for efficient deployment on **Hugging Face Spaces**.

## 2. Architecture Overview

The project is a standalone Python-based microservice built with FastAPI. It exposes a single API endpoint to receive text, translates it using a pre-loaded Hugging Face model, and returns the result. This design allows the heavy lifting of AI translation to be handled by a dedicated, scalable server.

## 3. Technology Stack

### Backend (Translation Service)

- **Framework**: Python with FastAPI (for creating a high-performance API)
- **AI/ML Libraries**: Hugging Face `transformers`, running on PyTorch (`torch`)
- **Translation Model**: `Helsinki-NLP/opus-mt-en-zh` (a lightweight, high-quality model for English-to-Chinese translation)
- **Server**: Uvicorn

## 4. Deployment & Development on Hugging Face Spaces

### Primary Hosting & Version Control

- **Platform**: **Hugging Face Spaces** hosts both the running service and its Git repository.
- **Deployment Trigger**: Pushing code to the `main` branch of the Hugging Face repository automatically triggers a new build and deployment on the Space.

### Hugging Face Spaces Best Practices

1. **`app.py`**: The main application file must be named `app.py` and located at the root of the repository. It will contain the FastAPI application logic.
2. **`requirements.txt`**: All Python dependencies must be listed in a `requirements.txt` file. The Space will automatically install these dependencies upon deployment.
3. **Secrets Management**: Use Hugging Face Space Secrets for storing any sensitive information. Do not hardcode secrets in the source code. For local development, `huggingface-cli login` can be used to manage credentials.
4. **Resource Configuration**: The `README.md` file's metadata block (YAML front matter) configures the Space, including a `suggested_hardware` hint; the hardware tier actually assigned is selected in the Space settings. For this project, a CPU instance is sufficient and should be specified to optimize resource allocation.
5. **Health Checks**: FastAPI serves interactive API documentation at `/docs` by default. A successful response from this endpoint can serve as a basic health check to verify the service is running.
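
As an illustration of the secrets guideline above: Space Secrets are exposed to the running application as environment variables, so they can be read at startup instead of being hardcoded. The following is a minimal sketch; the helper `read_secret` and the secret name `TRANSLATION_API_KEY` are hypothetical and not part of this project.

```python
import os


def read_secret(name: str) -> str:
    """Return the value of a required secret stored as an environment variable.

    Hypothetical helper: fails fast with a clear message instead of letting
    a missing secret surface later as an obscure error.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value


if __name__ == "__main__":
    # TRANSLATION_API_KEY is an illustrative name only.
    os.environ.setdefault("TRANSLATION_API_KEY", "local-dev-placeholder")
    print(len(read_secret("TRANSLATION_API_KEY")) > 0)
```

For local development, the same variable can be exported in the shell before starting the server, so the code path is identical in both environments.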

### API Workflow

1. **Request**: The service waits for a POST request to its `/translate` endpoint. The request body should contain the English text to be translated.
2. **Translate**: The service utilizes the `Helsinki-NLP/opus-mt-en-zh` model to translate the received text into Chinese.
3. **Respond**: The service returns a JSON object containing the translated text.

### Error Handling

- The backend will include robust error handling for invalid requests, translation failures, or other server-side issues. It will return appropriate HTTP status codes and clear error messages in the response body.
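
One way to picture this mapping from failure type to status code is the framework-independent sketch below. The function names, the `error` and `translation` field names, and the specific choice of 400 and 500 are assumptions for illustration; in the FastAPI app the same decisions would usually be expressed by raising `HTTPException`.

```python
from http import HTTPStatus
from typing import Callable, Optional, Tuple

Result = Tuple[int, dict]


def validate_text(text: object) -> Optional[Result]:
    """Return an error result for an invalid request, or None if the text is usable."""
    if not isinstance(text, str) or not text.strip():
        return int(HTTPStatus.BAD_REQUEST), {
            "error": "Field 'text' must be a non-empty string."
        }
    return None


def handle_translation(text: object, translate: Callable[[str], str]) -> Result:
    """Run validation, then translation, and map each outcome to a status code."""
    problem = validate_text(text)
    if problem is not None:
        return problem
    try:
        return int(HTTPStatus.OK), {"translation": translate(text)}
    except Exception:
        # Keep internal details out of the response body.
        return int(HTTPStatus.INTERNAL_SERVER_ERROR), {"error": "Translation failed."}
```

Separating validation from the translation call keeps client mistakes (4xx) distinct from server-side failures (5xx), which is what allows the extension to decide whether retrying makes sense.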

## 5. Code Quality & Conventions

- **Naming**: Use descriptive and clear names for variables and functions (e.g., `translate_text`, `translation_router`).
- **Comments**: Add comments to explain the "why" behind complex logic, not the "what".
- **Style**: Follow standard Python (PEP 8) and FastAPI best practices.
- **Commit Messages**: Keep commit titles concise, lowercase, and under 70 characters.
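
The commit title rule is mechanical enough to check automatically, for example in a Git hook. The function below is a hypothetical sketch of such a check and is not part of the project.

```python
def is_valid_commit_title(title: str) -> bool:
    """Check a commit title: single line, lowercase, and under 70 characters."""
    return (
        bool(title.strip())
        and "\n" not in title
        and len(title) < 70
        and title == title.lower()
    )
```

For instance, `is_valid_commit_title("add /translate endpoint")` passes, while a title that starts with a capital letter or reaches 70 characters does not.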