# Project Gemini: YouTube Dual-Language Subtitle Backend Service
This document outlines the development guidelines, architecture, and technology stack for the YouTube Dual-Language Subtitle translation service.
## 1. Core Mission
To create a free, open-source, and self-hosted backend service that provides English-to-Chinese translation for a companion browser extension. The service is designed for efficient deployment on **Hugging Face Spaces**.
## 2. Architecture Overview
The project is a standalone Python-based microservice built with FastAPI. It exposes a single API endpoint to receive text, translates it using a pre-loaded Hugging Face model, and returns the result. This design allows the heavy lifting of AI translation to be handled by a dedicated, scalable server.
## 3. Technology Stack
### Backend (Translation Service)
- **Framework**: Python with FastAPI (for creating a high-performance API)
- **AI/ML Library**: Hugging Face `transformers` and `torch`
- **Translation Model**: `Helsinki-NLP/opus-mt-en-zh` (a lightweight MarianMT model for English-to-Chinese translation)
- **Server**: Uvicorn
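As a minimal sketch of how this stack might load and call the model, the following assumes the `transformers` translation pipeline. The names `get_translator` and `translate_text` are illustrative, and the lazy import and the injectable `translator` parameter are choices made here so the function can be exercised without downloading the model.

```python
from functools import lru_cache
from typing import Callable, Optional

MODEL_NAME = "Helsinki-NLP/opus-mt-en-zh"


@lru_cache(maxsize=1)
def get_translator() -> Callable:
    """Load the translation pipeline once and reuse it for every request."""
    # Imported lazily so that importing this module does not load the
    # heavy ML dependencies until a translation is actually needed.
    from transformers import pipeline

    return pipeline("translation", model=MODEL_NAME)


def translate_text(text: str, translator: Optional[Callable] = None) -> str:
    """Translate English text to Chinese.

    `translator` is an assumption of this sketch: any callable that mimics
    the pipeline's return shape can be passed in, e.g. for testing.
    """
    translator = translator or get_translator()
    # Translation pipelines return a list of dicts keyed by "translation_text".
    return translator(text)[0]["translation_text"]
```

Caching the pipeline keeps the model in memory after the first call, so the cost of loading it is paid once rather than per request. Marian tokenizers also depend on the `sentencepiece` package, which would likely need to be listed alongside `transformers` and `torch`.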
## 4. Deployment & Development on Hugging Face Spaces
### Primary Hosting & Version Control
- **Platform**: **Hugging Face Spaces** is used for both hosting the service and the Git repository.
- **Deployment Trigger**: Pushing code to the `main` branch of the Hugging Face repository automatically triggers a new build and deployment on the Space.
### Hugging Face Spaces Best Practices
1. **`app.py`**: The main application file must be named `app.py` and located at the root of the repository. It will contain the FastAPI application logic.
2. **`requirements.txt`**: All Python dependencies must be listed in a `requirements.txt` file. The Space will automatically install these dependencies upon deployment.
3. **Secrets Management**: Use Hugging Face Space Secrets for storing any sensitive information. Do not hardcode secrets in the source code. For local development, `huggingface-cli login` can be used to manage credentials.
4. **Resource Configuration**: The `README.md` file's metadata block (YAML front matter) configures the Space itself, such as its SDK and, for Docker Spaces, the `app_port`. Hardware is selected in the Space settings; the front matter's `suggested_hardware` field only records a recommendation. For this project, a CPU instance is sufficient.
5. **Health Checks**: FastAPI serves interactive API documentation at `/docs` by default. A successful response from that endpoint is a basic check that the service is running.
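Space Secrets are exposed to the running application as environment variables, so the secrets guideline above can be sketched as follows. The helper name `read_secret` and the variable name in the usage note are hypothetical; this project may not need any secret at all.

```python
import os
from typing import Optional


def read_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return a secret from the environment, or `default` if it is not set."""
    value = os.environ.get(name)
    # Treat an empty string the same as a missing variable.
    return value if value else default
```

For example, `read_secret("HF_TOKEN")` would return the token when a secret of that name is configured on the Space and `None` otherwise, which keeps credentials out of the source code.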
### API Workflow
1. **Request**: The service waits for a POST request to its `/translate` endpoint. The request body should contain the English text to be translated.
2. **Translate**: The service utilizes the `Helsinki-NLP/opus-mt-en-zh` model to translate the received text into Chinese.
3. **Respond**: The service returns a JSON object containing the translated text.
### Error Handling
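- The backend will validate incoming requests and handle translation failures and other server-side errors, returning appropriate HTTP status codes with clear error messages in the response body. For example, FastAPI rejects a malformed request body with `422 Unprocessable Entity` by default, and an unexpected failure during translation should map to a `5xx` response.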
## 5. Code Quality & Conventions
- **Naming**: Use descriptive and clear names for variables and functions (e.g., `translate_text`, `translation_router`).
- **Comments**: Add comments to explain the "why" behind complex logic, not the "what".
- **Style**: Follow standard Python (PEP 8) and FastAPI best practices.
- **Commit Messages**: Keep commit titles concise, lowercase, and under 70 characters.