Ahmed Mostafa
# Categorization Module 🏷️
## Responsibility
This module handles **automatic categorization of notes**.
## Functionality
1. Receive the summary text of a note.
2. Use **Google Gemini** to analyze the content.
3. Return a single category name (e.g., Programming, Medicine, History).
## Files
### 1. `categorizer.py`
- **Purpose:** Categorize text using AI.
- **Main Class:** `CategorizationService`
- **Key Method:** `categorize_text(text)` - Async method (call it with `await`); returns the category name.
## How It Works
1. **Receive Text:** Take the first 2000 characters of the summary.
2. **Send Prompt:** Ask Gemini for a one- or two-word category.
3. **Clean Result:** Remove periods and capitalize the first letter.
4. **Validate:** If the result is longer than 30 characters, truncate it.
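
The four steps above can be sketched as a small pipeline. This is a minimal illustration, not the actual contents of `categorizer.py`: the helper names `clean_category` and `categorize`, the constants, and the injected `ask_model` callable (standing in for the Gemini call) are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable

MAX_INPUT_CHARS = 2000    # step 1: only the first 2000 characters are sent
MAX_CATEGORY_CHARS = 30   # step 4: longer results are truncated
FALLBACK = "Uncategorized"


def clean_category(raw: str) -> str:
    """Steps 3 and 4: remove periods, capitalize the first letter, truncate."""
    cleaned = raw.replace(".", "").strip()
    if not cleaned:
        return FALLBACK
    cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned[:MAX_CATEGORY_CHARS].rstrip()


async def categorize(text: str, ask_model: Callable[[str], Awaitable[str]]) -> str:
    """Steps 1 and 2: trim the input, build the prompt, and query the model."""
    snippet = text[:MAX_INPUT_CHARS]
    prompt = (
        "Categorize the following text with a one- or two-word category. "
        "Return ONLY the category name.\n\n"
        f"Text: {snippet}\n\nCategory:"
    )
    try:
        raw = await ask_model(prompt)
    except Exception:
        # Any model failure falls back to the default category.
        return FALLBACK
    return clean_category(raw)


async def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the Gemini call, used only for this demo."""
    return "programming."


print(asyncio.run(categorize("How to build a REST API with FastAPI", fake_model)))
```

Passing the model call in as a parameter keeps the cleaning and validation logic testable without network access or an API key.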
## Category Examples
- **Programming** - Coding and development tutorials.
- **Medicine** - Health and medical content.
- **Business** - Business management and entrepreneurship.
- **Science** - Physics, chemistry, biology.
- **History** - Historical events and civilizations.
- **Personal Development** - Self-improvement content.
- **Uncategorized** - If categorization fails.
## Proposed Enhancements
- [ ] Add predefined list of allowed categories.
- [ ] Use embeddings to improve categorization accuracy.
- [ ] Add support for sub-categories.
- [ ] Store categorization results in database for future analysis.
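
As one possible shape for the first enhancement, the sketch below maps a raw model answer onto a predefined list and falls back to "Uncategorized" when nothing is close enough. The list contents, the `normalize_category` name, and the 0.8 similarity cutoff are illustrative assumptions; the fuzzy matching uses `difflib.get_close_matches` from the standard library.

```python
from difflib import get_close_matches

# Hypothetical allow-list, taken from the category examples in this README.
ALLOWED_CATEGORIES = [
    "Programming", "Medicine", "Business", "Science",
    "History", "Personal Development",
]
FALLBACK = "Uncategorized"


def normalize_category(raw: str, cutoff: float = 0.8) -> str:
    """Return the allowed category closest to `raw`, or the fallback."""
    lookup = {name.lower(): name for name in ALLOWED_CATEGORIES}
    candidate = raw.strip().rstrip(".").lower()
    matches = get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else FALLBACK


print(normalize_category("programing"))  # misspelled answer maps to "Programming"
print(normalize_category("Cooking"))     # no close match, so "Uncategorized"
```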
## Testing
```python
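import asyncio

from src.ai_modules.categorization.categorizer import CategorizationService


async def main() -> None:
    categorizer = CategorizationService()
    # Categorize a short sample text
    text = "This video explains how to build a REST API using FastAPI and Python..."
    category = await categorizer.categorize_text(text)
    print(f"Category: {category}")  # Expected output: Category: Programming


# `await` is only valid inside a coroutine, so run it through an event loop
asyncio.run(main())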
```
## Libraries Used
- `google-genai` - Communicate with Google Gemini.
## Important Notes
- The module currently uses the `gemini-1.5-flash` model.
- If the text is too short (fewer than 10 characters), the method returns "Uncategorized".
- Accuracy can be improved by adding labeled examples to the prompt.
## Improving the Prompt
To improve categorization accuracy, modify the prompt in `categorizer.py`, for example by restricting the model to a fixed set of categories:
```python
prompt = (
    "Analyze the following text and categorize it into ONE of these categories: "
    "Programming, Medicine, Business, Science, History, Personal Development, Education, Technology. "
    "Return ONLY the category name.\n\n"
    f"Text: {text[:2000]}\n\n"
    "Category:"
)
```
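
Adding a few labeled examples to the prompt is the other option mentioned in the notes. A hedged sketch of how such a prompt might be assembled follows; the `build_prompt` helper and the example pairs are hypothetical and not taken from `categorizer.py`.

```python
# Hypothetical (text, label) pairs shown to the model before the real input.
FEW_SHOT_EXAMPLES = [
    ("A walkthrough of sorting algorithms in Python.", "Programming"),
    ("Symptoms and treatment of type 2 diabetes.", "Medicine"),
    ("The causes of the fall of the Roman Empire.", "History"),
]


def build_prompt(text: str, max_chars: int = 2000) -> str:
    """Assemble a few-shot prompt that ends with an open 'Category:' slot."""
    shots = "\n\n".join(
        f"Text: {sample}\nCategory: {label}" for sample, label in FEW_SHOT_EXAMPLES
    )
    return (
        "Categorize the text into ONE category. "
        "Return ONLY the category name.\n\n"
        f"{shots}\n\n"
        f"Text: {text[:max_chars]}\n"
        "Category:"
    )


print(build_prompt("How to deploy a FastAPI service with Docker."))
```

Ending the prompt with an unfilled `Category:` line mirrors the format of the examples, which encourages the model to answer with the category name only.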