# Project Gemini: YouTube Dual-Language Subtitle Backend Service

This document outlines the development guidelines, architecture, and technology stack for the YouTube Dual-Language Subtitle translation service.

## 1. Core Mission

To create a free, open-source, and self-hosted backend service that provides English-to-Chinese translation for a companion browser extension. The service is designed for efficient deployment on **Hugging Face Spaces**.

## 2. Architecture Overview

The project is a standalone Python-based microservice built with FastAPI. It exposes a single API endpoint to receive text, translates it using a pre-loaded Hugging Face model, and returns the result. This design allows the heavy lifting of AI translation to be handled by a dedicated, scalable server.

## 3. Technology Stack

### Backend (Translation Service)
-   **Framework**: Python with FastAPI (for creating a high-performance API)
-   **AI/ML Library**: Hugging Face `transformers` and `torch`
-   **Translation Model**: `Helsinki-NLP/opus-mt-en-zh` (A lightweight, high-quality model for English-to-Chinese translation)
-   **Server**: Uvicorn
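
The translation step of this stack could be sketched as follows. This is a minimal sketch, assuming the `transformers` `pipeline` API with the `translation` task. The helper names `load_translator` and `translate_text` are illustrative and not taken from the project. The heavy import is deferred so the wrapper can be imported and tested without downloading the model.

```python
from typing import Callable, Dict, List

MODEL_NAME = "Helsinki-NLP/opus-mt-en-zh"

# A translator takes a list of strings and returns one dict per input,
# each holding the result under the "translation_text" key (the shape
# produced by transformers translation pipelines).
Translator = Callable[[List[str]], List[Dict[str, str]]]


def load_translator(model_name: str = MODEL_NAME) -> Translator:
    """Build the translation pipeline once; intended to run at startup."""
    # Imported lazily because transformers and torch are heavy dependencies.
    from transformers import pipeline

    return pipeline("translation", model=model_name)


def translate_text(text: str, translator: Translator) -> str:
    """Translate one English string and unwrap the pipeline output."""
    results = translator([text])
    return results[0]["translation_text"]
```

Passing the translator in as an argument keeps `translate_text` independent of the model, so it can be exercised with a stub. Note that the tokenizer for this model family typically also needs the `sentencepiece` package listed in `requirements.txt`.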

## 4. Deployment & Development on Hugging Face Spaces

### Primary Hosting & Version Control
-   **Platform**: **Hugging Face Spaces** hosts both the service and its Git repository.
-   **Deployment Trigger**: Pushing code to the `main` branch of the Hugging Face repository automatically triggers a new build and deployment on the Space.

### Hugging Face Spaces Best Practices
1.  **`app.py`**: The main application file must be named `app.py` and located at the root of the repository. It will contain the FastAPI application logic.
2.  **`requirements.txt`**: All Python dependencies must be listed in a `requirements.txt` file. The Space will automatically install these dependencies upon deployment.
3.  **Secrets Management**: Use Hugging Face Space Secrets for storing any sensitive information. Do not hardcode secrets in the source code. For local development, `huggingface-cli login` can be used to manage credentials.
4.  **Resource Configuration**: The `README.md` file's metadata block (YAML front matter) is used to configure the Space's hardware. For this project, a CPU instance is sufficient and should be specified to optimize resource allocation.
5.  **Health Checks**: FastAPI serves interactive API documentation at `/docs` by default. A successful response from that endpoint acts as a basic health check that the service is running.
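
The health check in item 5 could be automated with a small probe. This is a sketch using only the standard library. The function name `is_service_up`, the `opener` parameter (added so the probe can be tested without a network), and the example base URL are assumptions, not part of the project.

```python
import urllib.error
import urllib.request
from typing import Callable


def is_service_up(base_url: str, timeout: float = 5.0,
                  opener: Callable = urllib.request.urlopen) -> bool:
    """Return True if GET {base_url}/docs answers with HTTP 200."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Hypothetical local address; replace with the Space's URL.
    print(is_service_up("http://localhost:7860"))
```

A 200 from `/docs` only shows that the web application started. It does not prove that the translation model loaded correctly.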

### API Workflow
1.  **Request**: The service waits for a POST request to its `/translate` endpoint. The request body should contain the English text to be translated.
2.  **Translate**: The service utilizes the `Helsinki-NLP/opus-mt-en-zh` model to translate the received text into Chinese.
3.  **Respond**: The service returns a JSON object containing the translated text.

### Error Handling
-   The backend handles invalid requests, translation failures, and other server-side errors. It returns an appropriate HTTP status code (4xx for client errors, 5xx for server errors) and a clear error message in the response body.

## 5. Code Quality & Conventions

-   **Naming**: Use descriptive and clear names for variables and functions (e.g., `translate_text`, `translation_router`).
-   **Comments**: Add comments to explain the "why" behind complex logic, not the "what".
-   **Style**: Follow standard Python (PEP 8) and FastAPI best practices.
-   **Commit Messages**: Keep commit titles concise, lowercase, and under 70 characters.
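
The commit title rule could be checked automatically, for example from a Git `commit-msg` hook. The following is a sketch only. The function name `check_commit_title` and the wording of the messages are hypothetical.

```python
from typing import List

MAX_TITLE_LENGTH = 70  # titles must be shorter than this


def check_commit_title(title: str) -> List[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    if not title.strip():
        problems.append("title is empty")
    if len(title) >= MAX_TITLE_LENGTH:
        problems.append(f"title must be under {MAX_TITLE_LENGTH} characters")
    if title != title.lower():
        problems.append("title must be lowercase")
    return problems
```

For example, `check_commit_title("add translate endpoint")` returns an empty list, while a title starting with a capital letter yields the lowercase violation.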