# Comment Processing with Agentic Workflow
A scalable, modular system for processing comments from multiple data sources using OpenAI API, LangChain, and LangGraph. The system performs language detection, translation, and context-aware sentiment analysis using an agentic workflow architecture.
## Data Sources Supported
- **Social Media Comments**: External platforms (Facebook, Instagram, YouTube, etc.)
- **Musora Internal Comments**: Comments from Musora internal applications
- **Extensible Architecture**: Easily add new data sources via configuration
## Features
- **Multi-Source Support**: Process comments from multiple data sources with a single codebase
- **Configuration-Driven**: Add new data sources without code changes
- **Parent Comment Context**: Automatically includes parent comment text for reply analysis
- **Modular Agent Architecture**: Extensible base classes for easy addition of new agents
- **Language Detection**: Hybrid approach using lingua library for fast English detection, with LLM fallback for non-English languages
- **Translation**: High-quality translation for non-English comments using OpenAI models
- **Context-Aware Sentiment Analysis**:
- Uses content description for context
- Includes parent comment text when analyzing replies
- Multi-label intent classification
- **LangGraph Workflow**: Flexible graph-based orchestration of agent operations
- **Snowflake Integration**: Seamless data fetching and storage with source-specific tables
- **Parallel Processing**: Multiprocessing support for high-performance batch processing
- **Dynamic Batch Sizing**: Intelligent batch size calculation based on workload and available resources
- **Independent Batch Execution**: Each batch processes and stores results independently
- **Comprehensive Logging**: Detailed logging for monitoring and debugging
- **Scalable Configuration**: Easy-to-modify sentiment categories and intents via JSON config
## Project Structure
```
musora-sentiment-analysis/
├── agents/
│   ├── __init__.py
│   ├── base_agent.py                       # Base class for all agents
│   ├── language_detection_agent.py         # Language detection agent
│   ├── translation_agent.py                # Translation agent
│   └── sentiment_analysis_agent.py         # Sentiment analysis agent (parent context support)
├── workflow/
│   ├── __init__.py
│   └── comment_processor.py                # LangGraph workflow orchestrator
├── sql/
│   ├── fetch_comments.sql                  # Query for social media comments (with parent join)
│   ├── fetch_musora_comments.sql           # Query for Musora internal comments (with parent join)
│   ├── create_ml_features_table.sql        # Schema for social media table (with parent fields)
│   ├── init_musora_table.sql               # Initialize empty Musora table (run first!)
│   └── create_musora_ml_features_table.sql # Full Musora schema with views (optional)
├── config_files/
│   ├── data_sources_config.json            # Data source configuration (NEW)
│   ├── sentiment_config.json               # Configuration for agents and workflow
│   └── sentiment_analysis_config.json      # Sentiment categories and intents
├── logs/                                   # Processing logs (auto-created)
├── LLM.py                                  # LLM utility class
├── SnowFlakeConnection.py                  # Snowflake connection handler
├── main.py                                 # Main execution script (multi-source support)
├── requirements.txt                        # Python dependencies
├── .env                                    # Environment variables (not in git)
├── README.md                               # This file
└── CLAUDE.md                               # Detailed technical documentation
```
## Setup
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Configure Environment Variables
Ensure your `.env` file contains the required credentials:
```env
# Snowflake
SNOWFLAKE_USER=your_user
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_ROLE=your_role
SNOWFLAKE_DATABASE=SOCIAL_MEDIA_DB
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_SCHEMA=ML_FEATURES
# OpenAI
OPENAI_API_KEY=your_openai_key
```
### 3. Create Snowflake Tables
Run the SQL scripts to create the output tables:
```bash
# Execute the SQL files in Snowflake
# For social media comments (if not already exists)
sql/create_ml_features_table.sql
# For Musora internal comments - INITIAL SETUP (First time only)
# This creates the empty table structure
sql/init_musora_table.sql
```
**Note**: Run `init_musora_table.sql` before processing Musora comments for the first time. Afterwards, you can optionally run `create_musora_ml_features_table.sql` to create the additional views.
## Usage
### Basic Usage (Process All Data Sources)
Process unprocessed comments from all enabled data sources:
```bash
python main.py
```
This will:
- Process all enabled data sources (social media and Musora comments)
- Fetch only comments that haven't been processed yet
- Process them through the workflow using parallel workers (CPU count - 2, max 5)
- Each batch processes and stores to Snowflake independently
- Append new results to the existing tables (no overwrite)
### Process Specific Data Source
Process only social media comments:
```bash
python main.py --data-source social_media
```
Process only Musora internal comments:
```bash
python main.py --data-source musora_comments
```
### Process Limited Number of Comments
Limit applies per data source:
```bash
# Process 100 comments from each enabled data source
python main.py --limit 100
# Process 100 comments from only Musora source
python main.py --limit 100 --data-source musora_comments
```
### Sequential Processing (Debug Mode)
For debugging purposes, use sequential processing:
```bash
python main.py --limit 100 --sequential
```
This processes all comments in a single batch, making it easier to debug issues.
### First Run for New Data Source
For the first run of Musora comments:
1. **First**: Run the initialization SQL script in Snowflake:
```sql
-- Execute in Snowflake
sql/init_musora_table.sql
```
2. **Then**: Run the processing with overwrite flag:
```bash
python main.py --overwrite --data-source musora_comments --limit 100
```
**Why two steps?**
- The fetch query checks for already-processed comments by querying the output table
- On first run, that table doesn't exist, causing an error
- The init script creates the empty table structure first
- Then processing can run normally
**Warning**: Overwrite will replace all existing data in the output table. Only use for initial table creation or when reprocessing from scratch.
### Custom Configuration File
```bash
python main.py --config path/to/custom_config.json
```
### Command-Line Arguments
- `--limit N`: Process only N comments per data source (default: 10000)
- `--overwrite`: Overwrite existing Snowflake table (default: append mode)
- `--config PATH`: Custom configuration file path
- `--sequential`: Use sequential processing instead of parallel (for debugging)
- `--data-source SOURCE`: Process only specific data source (e.g., social_media, musora_comments)
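The flags above could be wired up with `argparse` roughly as follows. This is a sketch, not the actual `main.py`; the helper name `build_arg_parser` is an assumption:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    """Sketch of the CLI described above; defaults mirror the documented flags."""
    parser = argparse.ArgumentParser(
        description="Process comments through the sentiment workflow"
    )
    parser.add_argument("--limit", type=int, default=10000,
                        help="Max comments to process per data source")
    parser.add_argument("--overwrite", action="store_true",
                        help="Overwrite the output table instead of appending")
    parser.add_argument("--config", default=None,
                        help="Path to a custom configuration file")
    parser.add_argument("--sequential", action="store_true",
                        help="Sequential processing (for debugging)")
    parser.add_argument("--data-source", default=None,
                        help="Process only this data source")
    return parser
```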
### Parallel Processing
The system uses multiprocessing to process comments in parallel:
**Worker Calculation**:
- Number of workers: `CPU count - 2` (max 5 workers)
- Leaves CPU cores available for system operations
- Example: 8-core system → 5 workers (capped at max)
**Dynamic Batch Sizing**:
- Batch size calculated as: `total_comments / num_workers`
- Minimum batch size: 20 comments
- Maximum batch size: 1000 comments
- Batches of 20 comments or fewer are not split
**Independent Execution**:
- Each batch runs in a separate process
- Batches store to Snowflake immediately upon completion
- No waiting for all batches to complete
- Failed batches don't affect successful ones
**Performance**:
- Expected speedup: ~1.8-4.5x depending on number of workers
- Real-time progress reporting as batches complete
- Processing time and average per comment displayed in summary
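The worker and batch-size rules above can be expressed in a few lines. This is a sketch under the stated rules; the helper names are assumptions, and the real logic lives in `main.py`:

```python
import os

def calculate_workers(max_workers: int = 5) -> int:
    # CPU count - 2, capped at max_workers, never below 1
    return max(1, min((os.cpu_count() or 1) - 2, max_workers))

def calculate_batch_size(total_comments: int, num_workers: int,
                         min_size: int = 20, max_size: int = 1000) -> int:
    # Workloads at or below min_size stay in a single batch
    if total_comments <= min_size:
        return total_comments
    # Otherwise split evenly across workers, clamped to [min_size, max_size]
    return max(min_size, min(total_comments // num_workers, max_size))
```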
### Incremental Processing
The pipeline is designed for incremental processing:
- **Automatic deduplication**: SQL query excludes comments already in `COMMENT_SENTIMENT_FEATURES`
- **Append-only by default**: New results are added without overwriting existing data
- **Failed comment retry**: Comments with `success=False` are not stored and will be retried in future runs
- **Run regularly**: Safe to run daily/weekly to process new comments
## Configuration
### Data Sources Configuration
The `config_files/data_sources_config.json` file defines available data sources:
```json
{
  "data_sources": {
    "social_media": {
      "name": "Social Media Comments",
      "enabled": true,
      "sql_query_file": "sql/fetch_comments.sql",
      "output_config": {
        "table_name": "COMMENT_SENTIMENT_FEATURES",
        "database": "SOCIAL_MEDIA_DB",
        "schema": "ML_FEATURES"
      }
    },
    "musora_comments": {
      "name": "Musora Internal Comments",
      "enabled": true,
      "sql_query_file": "sql/fetch_musora_comments.sql",
      "output_config": {
        "table_name": "MUSORA_COMMENT_SENTIMENT_FEATURES",
        "database": "SOCIAL_MEDIA_DB",
        "schema": "ML_FEATURES"
      },
      "additional_fields": [
        "PERMALINK_URL",
        "THUMBNAIL_URL"
      ]
    }
  }
}
```
**To add a new data source**: Simply add a new entry to this config file and create the corresponding SQL query file.
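Consuming this file might look like the following sketch; the function name `load_enabled_sources` is illustrative, not the actual API:

```python
import json

def load_enabled_sources(config_path: str = "config_files/data_sources_config.json") -> dict:
    """Return only the data sources flagged as enabled in the config."""
    with open(config_path) as f:
        config = json.load(f)
    return {key: source for key, source in config["data_sources"].items()
            if source.get("enabled", False)}
```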
### Agent Configuration
The `config_files/sentiment_config.json` file controls agent behavior:
```json
{
  "agents": {
    "language_detection": {
      "model": "gpt-5-nano",
      "temperature": 0.0,
      "max_retries": 3
    },
    "translation": {
      "model": "gpt-5-nano",
      "temperature": 0.3,
      "max_retries": 3
    },
    "sentiment_analysis": {
      "model": "gpt-5-nano",
      "temperature": 0.2,
      "max_retries": 3
    }
  },
  "workflow": {
    "description": "Batch size is calculated dynamically based on number of workers (min: 20, max: 1000)",
    "parallel_processing": {
      "enabled": true,
      "worker_calculation": "CPU count - 2, max 5 workers",
      "min_batch_size": 20,
      "max_batch_size": 1000
    }
  },
  "snowflake": {
    "output_table": "COMMENT_SENTIMENT_FEATURES",
    "database": "SOCIAL_MEDIA_DB",
    "schema": "ML_FEATURES"
  }
}
```
**Note**: Batch size is now calculated dynamically and no longer needs to be configured manually.
### Sentiment Categories Configuration
The `config_files/sentiment_analysis_config.json` file defines sentiment categories and intents (easily extensible):
```json
{
  "sentiment_polarity": {
    "categories": [
      {"value": "very_positive", "label": "Very Positive", "description": "..."},
      {"value": "positive", "label": "Positive", "description": "..."},
      {"value": "neutral", "label": "Neutral", "description": "..."},
      {"value": "negative", "label": "Negative", "description": "..."},
      {"value": "very_negative", "label": "Very Negative", "description": "..."}
    ]
  },
  "intent": {
    "categories": [
      {"value": "praise", "label": "Praise", "description": "..."},
      {"value": "question", "label": "Question", "description": "..."},
      {"value": "request", "label": "Request", "description": "..."},
      {"value": "feedback_negative", "label": "Negative Feedback", "description": "..."},
      {"value": "suggestion", "label": "Suggestion", "description": "..."},
      {"value": "humor_sarcasm", "label": "Humor/Sarcasm", "description": "..."},
      {"value": "off_topic", "label": "Off Topic", "description": "..."},
      {"value": "spam_selfpromo", "label": "Spam/Self-Promotion", "description": "..."}
    ]
  },
  "reply_policy": {
    "requires_reply_intents": ["question", "request"],
    "description": "Comments with these intents should be flagged for reply"
  },
  "intent_settings": {
    "multi_label": true,
    "description": "Intent can have multiple labels as a comment can express multiple intents"
  }
}
```
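A minimal sketch of how these config sections might be consumed at runtime (the helper names are assumptions, not the project's actual API):

```python
def allowed_values(section: dict) -> list:
    """Extract the machine-readable values from a category section."""
    return [category["value"] for category in section["categories"]]

def requires_reply(intents: list, reply_policy: dict) -> bool:
    """Flag a comment for reply when any detected intent is in the policy list."""
    return any(intent in reply_policy["requires_reply_intents"] for intent in intents)
```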
## Adding New Agents
The system is designed for easy extensibility. To add a new agent:
### 1. Create Agent Class
```python
from agents.base_agent import BaseAgent
from typing import Dict, Any

class MyNewAgent(BaseAgent):
    def __init__(self, config: Dict[str, Any], api_key: str):
        super().__init__("MyNewAgent", config)
        # Initialize your agent-specific components

    def validate_input(self, input_data: Dict[str, Any]) -> bool:
        # Validate input data
        return True

    def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        # Implement your agent logic
        return {"success": True, "result": "..."}
```
### 2. Update Workflow
Add the agent to `workflow/comment_processor.py`:
```python
# Add to CommentState TypedDict
new_agent_result: str

# Add node
workflow.add_node("my_new_agent", self._my_new_agent_node)

# Add edges
workflow.add_edge("translation", "my_new_agent")
workflow.add_edge("my_new_agent", END)
```
### 3. Update Configuration
Add agent config to `sentiment_config.json`:
```json
{
  "agents": {
    "my_new_agent": {
      "name": "MyNewAgent",
      "model": "gpt-4o-mini",
      "temperature": 0.5,
      "max_retries": 3
    }
  }
}
```
## Output Schema
### Social Media Comments Table
Stored in `SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES`
### Musora Comments Table
Stored in `SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES`
### Common Columns (Both Tables)
| Column | Type | Description |
|--------|------|-------------|
| COMMENT_SK | NUMBER | Surrogate key from source |
| COMMENT_ID | VARCHAR | Platform comment ID |
| ORIGINAL_TEXT | VARCHAR | Original comment text |
| **PARENT_COMMENT_ID** | **VARCHAR** | **ID of parent comment if this is a reply** |
| **PARENT_COMMENT_TEXT** | **VARCHAR** | **Text of parent comment for context** |
| DETECTED_LANGUAGE | VARCHAR | Detected language name |
| LANGUAGE_CODE | VARCHAR | ISO 639-1 code |
| IS_ENGLISH | BOOLEAN | Is comment in English |
| TRANSLATED_TEXT | VARCHAR | English translation |
| TRANSLATION_PERFORMED | BOOLEAN | Was translation performed |
| SENTIMENT_POLARITY | VARCHAR | Sentiment (very_positive, positive, neutral, negative, very_negative) |
| INTENT | VARCHAR | Multi-label intents (comma-separated) |
| REQUIRES_REPLY | BOOLEAN | Does comment need a response |
| SENTIMENT_CONFIDENCE | VARCHAR | Analysis confidence (high, medium, low) |
| PROCESSING_SUCCESS | BOOLEAN | Processing status |
| PROCESSED_AT | TIMESTAMP | Processing timestamp |
### Musora-Specific Additional Columns
| Column | Type | Description |
|--------|------|-------------|
| PERMALINK_URL | VARCHAR | Web URL path of the content |
| THUMBNAIL_URL | VARCHAR | Thumbnail URL of the content |
### Available Views
**Social Media:**
- `VW_COMMENTS_REQUIRING_REPLY`: Comments that need responses (includes parent comment info)
- `VW_SENTIMENT_DISTRIBUTION`: Sentiment and intent statistics by channel (includes reply comment count)
- `VW_NON_ENGLISH_COMMENTS`: Filtered view of non-English comments
**Musora:**
- `VW_MUSORA_COMMENTS_REQUIRING_REPLY`: Musora comments needing responses
- `VW_MUSORA_SENTIMENT_DISTRIBUTION`: Musora sentiment and intent statistics
- `VW_MUSORA_NON_ENGLISH_COMMENTS`: Non-English Musora comments
## Workflow Architecture
The system uses LangGraph to create a flexible, state-based workflow:
```
┌─────────────────────┐
│   Fetch Comments    │
│   from Snowflake    │
│ (Unprocessed Only)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Language Detection  │
│        Agent        │
└──────────┬──────────┘
           │
           ▼
      ┌────┴────┐
      │English? │
      └────┬────┘
           │
     ┌─────┴─────┐
     │           │
    Yes          No
     │           │
     │           ▼
     │    ┌─────────────┐
     │    │ Translation │
     │    │    Agent    │
     │    └──────┬──────┘
     │           │
     └─────┬─────┘
           │
           ▼
┌─────────────────────┐
│      Sentiment      │
│   Analysis Agent    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Store Results     │
│   to Snowflake      │
│   (Append Mode)     │
└─────────────────────┘
```
**Note**: The fetch step automatically excludes comments already present in `COMMENT_SENTIMENT_FEATURES`, enabling incremental processing.
## Logging
Logs are automatically created in the `logs/` directory with timestamps:
```
logs/comment_processing_20251001_143022.log
```
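A setup along these lines would produce the timestamped file name shown above. This is a sketch, assuming a hypothetical `setup_logging` helper rather than the project's actual logging code:

```python
import logging
import os
from datetime import datetime

def setup_logging(log_dir: str = "logs") -> str:
    """Create the logs/ directory and configure a timestamped log file."""
    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(
        log_dir, f"comment_processing_{datetime.now():%Y%m%d_%H%M%S}.log"
    )
    logging.basicConfig(
        filename=log_file,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return log_file
```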
## Adding New Data Sources
The system is designed to make adding new data sources easy:
### Steps to Add a New Source:
1. **Update Configuration** (`config_files/data_sources_config.json`):
```json
"your_new_source": {
  "name": "Your New Source Name",
  "enabled": true,
  "sql_query_file": "sql/fetch_your_source.sql",
  "output_config": {
    "table_name": "YOUR_SOURCE_SENTIMENT_FEATURES",
    "database": "SOCIAL_MEDIA_DB",
    "schema": "ML_FEATURES"
  },
  "additional_fields": ["FIELD1", "FIELD2"]
}
```
The `additional_fields` key is optional.
2. **Create SQL Query File** (`sql/fetch_your_source.sql`):
- Fetch comments with consistent column names
- Include self-join for parent comments if available
- Exclude already-processed comments (LEFT JOIN with output table)
3. **Create Table Initialization Script** (`sql/init_your_source_table.sql`):
- Creates empty table structure
- Base schema on `init_musora_table.sql`
- Add source-specific fields as needed
- **Run this in Snowflake FIRST before processing**
4. **Create Full Schema** (optional):
- Base schema on `create_musora_ml_features_table.sql`
- Include views and indexes
5. **Run First Time**:
```bash
# Step 1: Run init script in Snowflake
sql/init_your_source_table.sql
# Step 2: Process first batch
python main.py --overwrite --data-source your_new_source --limit 100
```
**No code changes required!**
## Best Practices
1. **Testing**: Always test with `--limit` flag first (e.g., `--limit 100`)
2. **New Data Sources**: Test new sources with `--sequential --limit 100` first
3. **Debugging**: Use `--sequential` flag for easier debugging of processing issues
4. **Incremental Processing**: Run regularly without `--overwrite` to process only new comments
5. **Monitoring**: Check logs for processing errors and batch completion
6. **Performance**: Use default parallel mode for production workloads
7. **Extensibility**: Follow the base agent pattern for consistency
8. **Error Handling**: All agents include robust error handling
9. **Failed Comments**: Review logs for failed comments - they'll be automatically retried in future runs
10. **Resource Management**: System automatically adapts to available CPU resources
11. **Parent Comments**: Ensure SQL queries include parent comment joins for best accuracy
## Sentiment Analysis Features
### Multi-Label Intent Classification
The system supports **multi-label intent classification**, meaning a single comment can have multiple intents:
- **Example**: "This is amazing! What scale are you using?" → `["praise", "question"]`
- **Example**: "Love this but can you make a tutorial on it?" → `["praise", "request"]`
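Since the `INTENT` column stores multi-label results as a comma-separated VARCHAR, round-tripping between list and column value might look like this sketch (helper names are illustrative):

```python
def serialize_intents(intents: list) -> str:
    """Join multi-label intents for storage in the INTENT VARCHAR column."""
    return ",".join(intents)

def parse_intents(raw: str) -> list:
    """Split a stored INTENT value back into a list of labels."""
    return [label for label in raw.split(",") if label]
```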
### Context-Aware Analysis with Parent Comment Support
The sentiment analysis agent provides rich context understanding:
1. **Content Context**: Uses the `content_description` field to understand what the comment is about
2. **Parent Comment Context** (NEW): When analyzing reply comments, the system:
- Automatically detects when a comment is a reply
- Fetches the parent comment text from the database
- Includes parent comment in the LLM prompt
- Explicitly instructs the LLM that this is a reply comment
- Results in more accurate sentiment and intent classification
**Example**:
- Parent Comment: "Does anyone know how to play this riff?"
- Reply Comment: "Yes!"
- Without parent context: Might be classified as unclear/off-topic
- With parent context: Correctly classified as answering a question
This dramatically improves accuracy for:
- Short reply comments ("Yes", "Thanks!", "Agreed")
- Sarcastic replies (context crucial for understanding)
- Continuation of discussions
- Agreement/disagreement comments
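One way the parent-aware prompt assembly could look, as a sketch; the actual prompt wording lives in the sentiment analysis agent, and `build_sentiment_prompt` is a hypothetical name:

```python
from typing import Optional

def build_sentiment_prompt(comment_text: str, content_description: str,
                           parent_text: Optional[str] = None) -> str:
    """Assemble the analysis prompt, injecting parent context for replies."""
    parts = [f"Content description: {content_description}"]
    if parent_text:
        # Explicitly tell the LLM this is a reply and provide the parent text
        parts.append("This comment is a REPLY to the following parent comment:")
        parts.append(f"Parent comment: {parent_text}")
    parts.append(f"Comment to analyze: {comment_text}")
    return "\n".join(parts)
```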
### Failure Handling & Reprocessing
Comments that fail sentiment analysis (missing critical fields like sentiment_polarity or intents) are:
- Marked as `success=False` in the workflow
- **NOT stored in Snowflake**
- **Automatically available for reprocessing** in future runs
This ensures only successfully processed comments are stored, while failed comments remain available for retry.
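The store-only-successes rule amounts to a simple filter before the Snowflake write; a sketch with an illustrative helper name:

```python
def rows_to_store(results: list) -> list:
    """Keep only successfully processed comments; failures stay unstored
    and are picked up again by the next run's fetch query."""
    return [row for row in results if row.get("success") is True]
```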
### Incremental Processing & Deduplication
The pipeline automatically handles incremental processing:
- **SQL-level deduplication**: Query excludes comments already in `COMMENT_SENTIMENT_FEATURES` using `LEFT JOIN`
- **Automatic retry**: Failed comments (not stored) are automatically retried on next run
- **Append-only mode**: Default behavior appends new records without overwriting
- **Production-ready**: Safe to run daily/weekly/monthly to process new comments
### Scalable Configuration
To add or modify sentiment categories or intents:
1. Edit `config_files/sentiment_analysis_config.json`
2. Add/modify categories in the `sentiment_polarity` or `intent` sections
3. Update `reply_policy.requires_reply_intents` if needed
4. No code changes required!
## Future Extensions
The modular architecture supports easy addition of:
- Topic classification agent
- Entity extraction agent
- Engagement score prediction agent
- Named entity recognition agent
Simply create a new agent inheriting from `BaseAgent` and add it to the workflow graph.
## Troubleshooting
### Issue: "Object does not exist or not authorized" on First Run
**Error**: `Object 'SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES' does not exist or not authorized`
**Cause**: The fetch query tries to check for already-processed comments, but the output table doesn't exist yet on first run.
**Solution**:
1. Run the initialization script first:
```sql
-- Execute in Snowflake
sql/init_musora_table.sql
```
2. Then run the processing:
```bash
python main.py --overwrite --data-source musora_comments --limit 100
```
### Issue: API Rate Limits
If hitting API rate limits, reduce the number of parallel workers or process fewer comments:
```bash
# Process fewer comments at a time
python main.py --limit 500
# Or use sequential mode
python main.py --sequential --limit 100
```
### Issue: Memory Issues
Process in smaller batches using `--limit`:
```bash
python main.py --limit 500
```
### Issue: Debugging Processing Errors
Use sequential mode to debug issues more easily:
```bash
python main.py --sequential --limit 50
```
This processes all comments in a single batch with clearer error messages.
### Issue: Connection Timeouts
Check Snowflake credentials in `.env` and network connectivity.
### Issue: Parallel Processing Not Working
If multiprocessing issues occur, use sequential mode:
```bash
python main.py --sequential
```
## Performance
### Expected Speedup
Parallel processing provides significant performance improvements:
- **Sequential**: 1x (baseline)
- **2 workers**: ~1.8-1.9x faster
- **5 workers**: ~4-4.5x faster
Speedup isn't perfectly linear due to:
- Snowflake connection overhead
- LLM API rate limits (shared across workers)
- I/O operations
### Monitoring Performance
The processing summary includes:
- Total processing time
- Average time per comment
- Number of workers used
- Batch size calculations
- Failed batches (if any)
## License
Internal use only - Musora sentiment analysis project. |