# Categorization Module 🏷️

## Responsibility

This module handles **automatic categorization of notes**.

## Functionality

1. Receive the summary text.
2. Use **Google Gemini** to analyze its content.
3. Return a single category (e.g., Programming, Medicine, History).
## Files

### 1. `categorizer.py`

- **Purpose:** Categorize text using AI.
- **Main Class:** `CategorizationService`
- **Key Method:** `categorize_text(text)` - Asynchronous (must be awaited); returns the category name as a string.
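To show the shape of the async interface and the "Uncategorized" fallback, here is a minimal sketch under stated assumptions. It is not the module's actual class. The real `CategorizationService()` is constructed with no arguments in the Testing example below. This sketch instead takes a hypothetical `generate` parameter, an async callable standing in for the Gemini call, so that it runs offline. The prompt wording and the `fake_generate` helper are also illustrative.

```python
import asyncio
from typing import Awaitable, Callable

FALLBACK_CATEGORY = "Uncategorized"


class CategorizationService:
    """Hypothetical skeleton of the service interface (not the real class)."""

    def __init__(self, generate: Callable[[str], Awaitable[str]]) -> None:
        # Assumption: `generate` is an async callable wrapping the Gemini call;
        # injecting it keeps this sketch runnable without network access.
        self._generate = generate

    async def categorize_text(self, text: str) -> str:
        # Too-short input (<10 chars) falls back immediately.
        if len(text.strip()) < 10:
            return FALLBACK_CATEGORY
        prompt = (
            "Determine a one or two-word category for the following text. "
            "Return ONLY the category name.\n\n"
            f"Text: {text[:2000]}\n\nCategory:"
        )
        try:
            category = (await self._generate(prompt)).strip()
        except Exception:
            # Any model or network error degrades to the fallback category.
            return FALLBACK_CATEGORY
        return category or FALLBACK_CATEGORY


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for Gemini that always answers "Programming".
    return " Programming\n"


async def main() -> None:
    service = CategorizationService(fake_generate)
    print(await service.categorize_text("Build a REST API with FastAPI and Python"))
    print(await service.categorize_text("hi"))


asyncio.run(main())
```

Injecting the model call is just one way to keep the sketch testable. A real implementation would more likely create its Gemini client internally.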
## How It Works

1. **Receive Text:** Take the first 2000 characters of the summary.
2. **Send Prompt:** Ask Gemini for a one- or two-word category.
3. **Clean Result:** Remove periods and capitalize the first letter.
4. **Validate:** If the result is longer than 30 characters, truncate it.
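Steps 1, 3 and 4 above involve no model call, so they can be sketched as pure helper functions. This is a minimal sketch, not the code in `categorizer.py`. The names `prepare_input` and `clean_category` are hypothetical. Returning "Uncategorized" for an empty answer is an assumption.

```python
MAX_INPUT_CHARS = 2000   # step 1: only the start of the summary is sent
MAX_CATEGORY_CHARS = 30  # step 4: longer answers are truncated


def prepare_input(summary: str) -> str:
    """Step 1: keep only the first 2000 characters of the summary."""
    return summary[:MAX_INPUT_CHARS]


def clean_category(raw: str) -> str:
    """Steps 3-4: remove periods, capitalize the first letter, cap the length."""
    category = raw.replace(".", "").strip()
    if not category:
        return "Uncategorized"  # assumption: empty answers fall back
    # Uppercase only the first letter; str.capitalize() would lowercase the rest.
    category = category[0].upper() + category[1:]
    return category[:MAX_CATEGORY_CHARS].rstrip()


print(clean_category("programming."))          # Programming
print(clean_category("personal development"))  # Personal development
print(len(prepare_input("x" * 5000)))          # 2000
```

Uppercasing only the first character preserves answers like "Personal Development". `str.capitalize()` would turn that into "Personal development".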
## Category Examples

- **Programming** - Coding and development tutorials.
- **Medicine** - Health and medical content.
- **Business** - Business management and entrepreneurship.
- **Science** - Physics, chemistry, biology.
- **History** - Historical events and civilizations.
- **Personal Development** - Self-improvement content.
- **Uncategorized** - If categorization fails.
## Proposed Enhancements

- [ ] Add a predefined list of allowed categories.
- [ ] Use embeddings to improve categorization accuracy.
- [ ] Add support for sub-categories.
- [ ] Store categorization results in a database for future analysis.
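The first proposed enhancement, a predefined list of allowed categories, could be implemented as a post-processing step. This step would map Gemini's free-form answer onto a fixed list and fall back to "Uncategorized" otherwise. The following is a minimal sketch of that idea. `ALLOWED_CATEGORIES` is a hypothetical constant, copied from the category list in the prompt under *Improving the Prompt*. `normalize_category` is also a hypothetical name.

```python
ALLOWED_CATEGORIES = (
    "Programming", "Medicine", "Business", "Science", "History",
    "Personal Development", "Education", "Technology",
)
FALLBACK = "Uncategorized"

# Case-insensitive lookup: "personal development" -> "Personal Development"
_LOOKUP = {name.lower(): name for name in ALLOWED_CATEGORIES}


def normalize_category(raw: str) -> str:
    """Map a free-form model answer onto the allowed list, else fall back."""
    # Drop periods and collapse runs of whitespace before the lookup.
    key = " ".join(raw.replace(".", "").split()).lower()
    return _LOOKUP.get(key, FALLBACK)


print(normalize_category("personal  development."))  # Personal Development
print(normalize_category("Cooking"))                 # Uncategorized
```

An exact-match lookup is deliberately strict. Near-misses such as "Coding" fall back rather than being guessed, which keeps stored categories consistent for later analysis.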
## Testing

`categorize_text` is a coroutine, so run it inside an event loop:

```python
import asyncio
from src.ai_modules.categorization.categorizer import CategorizationService

async def main():
    categorizer = CategorizationService()
    # Categorize text (the call must be awaited)
    text = "This video explains how to build a REST API using FastAPI and Python..."
    category = await categorizer.categorize_text(text)
    print(f"Category: {category}")  # e.g. Category: Programming

asyncio.run(main())
```
## Libraries Used

- `google-genai` - Communicate with Google Gemini.
## Important Notes

- The module currently uses the `gemini-1.5-flash` model.
- If the text is too short (fewer than 10 characters), `categorize_text` returns "Uncategorized".
- Accuracy can be improved by adding example texts with their expected categories to the prompt.
## Improving the Prompt

To improve categorization accuracy, you can modify the prompt in `categorizer.py`, for example to restrict the answer to a fixed list of categories:

```python
prompt = (
    # Constrain the model to a fixed set of categories
    "Analyze the following text and categorize it into ONE of these categories: "
    "Programming, Medicine, Business, Science, History, Personal Development, Education, Technology. "
    "Return ONLY the category name.\n\n"
    f"Text: {text[:2000]}\n\n"
    "Category:"
)
```
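The Important Notes above suggest that adding examples to the prompt can improve accuracy. One way to do this is to extend the fixed-list prompt into a few-shot prompt, where labeled examples come before the text to classify. This is a minimal sketch. `build_few_shot_prompt` and `EXAMPLES` are hypothetical names, and the two example texts are illustrative placeholders rather than data from this project.

```python
from typing import Sequence, Tuple

CATEGORIES = (
    "Programming", "Medicine", "Business", "Science", "History",
    "Personal Development", "Education", "Technology",
)

# Hypothetical labeled examples; replace with real summaries from your own notes.
EXAMPLES: Sequence[Tuple[str, str]] = (
    ("A walkthrough of Python decorators and closures.", "Programming"),
    ("An overview of the causes of the French Revolution.", "History"),
)


def build_few_shot_prompt(
    text: str,
    categories: Sequence[str] = CATEGORIES,
    examples: Sequence[Tuple[str, str]] = EXAMPLES,
    max_chars: int = 2000,
) -> str:
    """Build the categorization prompt with labeled examples before the input."""
    # Each example uses the same "Text: ... Category: ..." layout as the real query.
    shots = "".join(f"Text: {ex}\n\nCategory: {label}\n\n" for ex, label in examples)
    return (
        "Analyze the following text and categorize it into ONE of these categories: "
        f"{', '.join(categories)}. "
        "Return ONLY the category name.\n\n"
        f"{shots}"
        f"Text: {text[:max_chars]}\n\n"
        "Category:"
    )


print(build_few_shot_prompt("How to build a REST API using FastAPI."))
```

The examples use the same layout as the final query, which helps the model continue the pattern with a bare category name.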