aaron-rae-nicolas committed
Commit 3f63255 · verified · 1 Parent(s): 2d20e0c

Upload 6 files
Setup Instructions for All Techniques/[SETUP] Fine-Tuning (Gemma) - General Models.txt ADDED

Project Dependencies and Setup Instructions

1. Python Environment
This project requires Python 3.10 or higher.

2. Required External Libraries
The following Python libraries are required to run the data processing and fine-tuning scripts. You can install them using pip:

pip install pandas torch transformers peft scikit-learn numpy matplotlib accelerate huggingface_hub

Library Descriptions:
- pandas: Used for data manipulation and loading CSV files.
- torch: PyTorch framework used for deep learning model training.
- transformers: Hugging Face library used to load the Gemma tokenizer and model.
- peft: Parameter-Efficient Fine-Tuning (LoRA) library.
- scikit-learn: Used for calculating metrics (F1, precision, recall) and splitting data.
- numpy: Used for numerical operations and array manipulation.
- matplotlib: Used for generating training loss plots.
- accelerate: Helper library often required by Transformers for model loading.
- huggingface_hub: Required for authenticating with the Hugging Face Hub.

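For orientation, here is a minimal sketch of how these libraries fit together when preparing Gemma for LoRA fine-tuning. The model name comes from this guide; the label count, target modules, and LoRA hyperparameters are illustrative assumptions, not the project's actual configuration:

    # Minimal sketch: load Gemma and wrap it with a LoRA adapter via peft.
    # The hyperparameters and target modules below are illustrative assumptions.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    MODEL_NAME = "google/gemma-3-1b-pt"  # gated; requires Hugging Face authentication

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=4,  # e.g., Product, Delivery, Service, Price
        problem_type="multi_label_classification",
    )

    lora_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        task_type="SEQ_CLS",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # sanity check: only LoRA weights are trainable
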
3. Hugging Face Authentication (Gemma Model Access)
The scripts use the Google Gemma model (e.g., 'google/gemma-3-1b-pt'), which is a gated model. To access it, you must follow these steps:

Step A: Grant Access
1. Go to the Hugging Face model page (https://huggingface.co/google/gemma-3-1b-pt).
2. Log in to your Hugging Face account.
3. Review and accept the license terms to gain access.

Step B: Authenticate in the Environment
You must provide a valid Hugging Face Access Token. You can generate one at https://huggingface.co/settings/tokens.

Option 1 (Command Line / Local):
Run the following command in your terminal before starting the script:
huggingface-cli login
(Paste your token when prompted.)

Option 2 (Google Colab / Jupyter Notebook):
If running in a notebook, add a cell at the very top with the following code:

from huggingface_hub import login
login("YOUR_HUGGING_FACE_TOKEN_HERE")

4. Hardware Requirements
The training scripts are configured to use CUDA (NVIDIA GPU). Ensure you have a GPU-enabled environment (e.g., Google Colab with a T4/A100 GPU selected in Runtime settings) and the appropriate CUDA drivers installed.
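Before launching a long training run, it can help to confirm that PyTorch actually sees the GPU. A quick check (not part of the provided scripts):

    # Quick sanity check that CUDA is available before training.
    import torch

    if torch.cuda.is_available():
        print("CUDA device:", torch.cuda.get_device_name(0))
    else:
        print("No CUDA device found; training would fall back to the CPU (very slow).")
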
Setup Instructions for All Techniques/[SETUP] Fine-Tuning (Gemma) - Hierarchical.txt ADDED

PROJECT SETUP GUIDE - RULE-BASED OR LLM (GEMINI) HIERARCHICAL MODEL EVALUATION

The setup steps below are the same for both hierarchical notebooks.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SYSTEM REQUIREMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Python 3.8 or higher
16GB RAM minimum (32GB recommended)
NVIDIA GPU with 8GB+ VRAM (required for model evaluation)
Windows, Linux, or macOS
15GB free disk space
Jupyter Notebook or JupyterLab

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: INSTALL PYTHON
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Download and install Python from: https://www.python.org/downloads/
During installation:

Check "Add Python to PATH"
Check "Install pip"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: INSTALL JUPYTER NOTEBOOK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Open Command Prompt (Windows) or Terminal (Mac/Linux) and run:
pip install jupyter notebook
Or if you prefer JupyterLab:
pip install jupyterlab

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: INSTALL REQUIRED PACKAGES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Copy and paste these commands into Command Prompt, Terminal, or a notebook cell:
pip install pandas numpy scikit-learn transformers peft huggingface-hub

For GPU support (NVIDIA only):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 4: SET UP HUGGING FACE ACCOUNT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The scripts use the Google Gemma model (e.g., 'google/gemma-3-1b-pt'), which is a gated model. To access it, you must follow these steps:

Step A: Grant Access
1. Go to the Hugging Face model page (https://huggingface.co/google/gemma-3-1b-pt).
2. Log in to your Hugging Face account.
3. Review and accept the license terms to gain access.

Step B: Authenticate in the Environment
You must provide a valid Hugging Face Access Token. You can generate one at https://huggingface.co/settings/tokens.

Option 1 (Command Line / Local):
Go to: https://huggingface.co/
Log in or create a free account
Go to Settings > Access Tokens
Create a new token
Install the Hugging Face CLI:
pip install huggingface-hub
Log in with your token:
huggingface-cli login
Paste your token when prompted

Option 2 (Google Colab / Jupyter Notebook):
If running in a notebook, add a cell at the very top with the following code:

from huggingface_hub import login
login("YOUR_HUGGING_FACE_TOKEN_HERE")

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 5: CREATE PROJECT FOLDERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Create this folder structure anywhere on your computer:
your_project/
├── datasets/
│   ├── Boolean23.csv
│   ├── test_product_dataset.csv
│   ├── test_delivery_dataset.csv
│   ├── test_service_dataset.csv
│   ├── test_price_dataset.csv
│   └── test_hierarchy.csv
├── models/
│   ├── gemini/
│   │   ├── gemini_general.pth
│   │   ├── gemma_product_classifier.pth
│   │   ├── gemma_delivery_classifier.pth
│   │   ├── gemma_service_classifier.pth
│   │   └── gemma_price_classifier.pth
│   └── rule-based/
│       ├── rule-based_general.pth
│       ├── gemma_product_classifier.pth
│       ├── gemma_delivery_classifier.pth
│       ├── gemma_service_classifier.pth
│       └── gemma_price_classifier.pth
├── gemini_hierarchical.ipynb
└── rule-based_hierarchical.ipynb

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 6: PREPARE YOUR DATA FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
You need 6 CSV files with specific columns:

Boolean23.csv - General aspects test data
Required columns: Review, Product, Delivery, Price, Service

test_product_dataset.csv - Product-specific test data
Required columns: Review, Color_PRO, Condition_PRO, Correctness_PRO,
Durability_PRO, Effectiveness_PRO, Functionality_PRO,
Material_PRO, Sensory_PRO, Size_PRO, General_PRO

test_delivery_dataset.csv - Delivery-specific test data
Required columns: Review, Condition_DEL, Correctness_DEL, Timeliness_DEL,
General_DEL

test_service_dataset.csv - Service-specific test data
Required columns: Review, Handling_SER, Responsiveness_SER,
Trustworthiness_SER, General_SER

test_price_dataset.csv - Price-specific test data
Required columns: Review, Affordability_PRICE, Value_for_Money_PRICE,
General_PRICE

test_hierarchy.csv - Complete hierarchical test data
Required columns: Review + ALL 25 aspect columns from above

Important notes:
Review column: text of the customer feedback
All label columns: 0 or 1 (binary labels)
Column names must match exactly (case-sensitive)

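Because a single misnamed column will break a run, it is worth validating the CSVs up front. A minimal sketch (file and column names are taken from this guide; the helper itself is not part of the notebooks):

    # Quick column check for the test CSVs before running the notebooks.
    import pandas as pd

    REQUIRED = {
        "datasets/Boolean23.csv":
            ["Review", "Product", "Delivery", "Price", "Service"],
        "datasets/test_delivery_dataset.csv":
            ["Review", "Condition_DEL", "Correctness_DEL", "Timeliness_DEL", "General_DEL"],
        # ...add the product, service, price, and hierarchy files the same way.
    }

    for path, columns in REQUIRED.items():
        missing = [c for c in columns if c not in pd.read_csv(path).columns]
        print(path, "OK" if not missing else f"MISSING: {missing}")
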
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 7: PREPARE YOUR TRAINED MODELS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
You need 10 trained model files (.pth format) in the 'models/rule-based/' and 'models/gemini/' folders:

For gemini:
gemini_general.pth - General aspect classifier
gemma_product_classifier.pth - Product-specific classifier
gemma_delivery_classifier.pth - Delivery-specific classifier
gemma_service_classifier.pth - Service-specific classifier
gemma_price_classifier.pth - Price-specific classifier

For rule-based:
rule-based_general.pth - General aspect classifier
gemma_product_classifier.pth - Product-specific classifier
gemma_delivery_classifier.pth - Delivery-specific classifier
gemma_service_classifier.pth - Service-specific classifier
gemma_price_classifier.pth - Price-specific classifier

These should be the trained models from your previous training sessions.
Important: each model file must contain the following keys (a quick way to verify this is sketched below):

model_state_dict: the trained model weights
optimal_thresholds OR optimized_thresholds: decision thresholds for each label

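To confirm a checkpoint has the expected keys before running the full evaluation, something like the following works (an illustrative sketch; depending on your PyTorch version you may need to pass weights_only=False to torch.load):

    # Inspect a saved checkpoint for the keys the notebooks expect.
    import torch

    ckpt = torch.load("models/gemini/gemini_general.pth", map_location="cpu")

    print("keys:", list(ckpt.keys()))
    assert "model_state_dict" in ckpt, "missing trained weights"
    assert "optimal_thresholds" in ckpt or "optimized_thresholds" in ckpt, \
        "missing per-label decision thresholds"
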
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 8: LAUNCH JUPYTER NOTEBOOK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Open Command Prompt or Terminal
Navigate to your project folder:

cd C:\path\to\your_project

Launch Jupyter Notebook:

jupyter notebook
Or if using JupyterLab:
jupyter lab

Your browser will open automatically
Click on '[rule-based/gemini]_hierarchical.ipynb' to open it

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 9: RUN THE NOTEBOOK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Click "Cell" in the top menu
Click "Run All"
Wait for all cells to complete

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT THE CODE DOES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PHASE 1: Individual Model Evaluation

Loads each of the 5 trained models one at a time
Evaluates each model on its specific test dataset
Calculates metrics (accuracy, precision, recall, F1-score, etc.)
Saves predictions to separate CSV files
Cleans up memory after each model

PHASE 2: Hierarchical Model Evaluation

Loads the general model and predicts the 4 main aspects (Product, Delivery, Service, Price)
Loads each specific model and predicts the detailed sub-aspects
Applies hierarchical constraints (if a general aspect = 0, all its sub-aspects = 0; see the sketch after this list)
Combines all predictions into complete 25-label predictions
Evaluates the combined hierarchical model's performance
Calculates per-aspect and overall metrics

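The hierarchical constraint in PHASE 2 amounts to a simple masking operation. A minimal sketch with pandas (column names follow this guide; the notebooks' actual implementation may differ):

    # Hierarchical constraint: if a general aspect is predicted 0, force all
    # of its sub-aspect predictions to 0.
    import pandas as pd

    SUB_ASPECTS = {
        "Delivery": ["Condition_DEL", "Correctness_DEL", "Timeliness_DEL", "General_DEL"],
        "Service": ["Handling_SER", "Responsiveness_SER", "Trustworthiness_SER", "General_SER"],
        "Price": ["Affordability_PRICE", "Value_for_Money_PRICE", "General_PRICE"],
        # "Product" maps to its ten *_PRO columns in the same way.
    }

    def apply_hierarchy(preds: pd.DataFrame) -> pd.DataFrame:
        out = preds.copy()
        for general, subs in SUB_ASPECTS.items():
            absent = out[general] == 0   # reviews where the general aspect is absent
            out.loc[absent, subs] = 0    # zero out all of its sub-aspects
        return out
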
PHASE 3: Results and Reports

Displays a comprehensive metrics summary in the notebook
Shows sample predictions with ground truth
Saves detailed results to CSV and text files

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
OUTPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Individual Model Predictions:
'[rule-based/gemini]_general_test_predictions.csv'
'[rule-based/gemini]_product_test_predictions.csv'
'[rule-based/gemini]_delivery_test_predictions.csv'
'[rule-based/gemini]_service_test_predictions.csv'
'[rule-based/gemini]_price_test_predictions.csv'

Each contains: the original review, predicted labels, probabilities, and an exact match indicator

Hierarchical Model Results:
'[rule-based/gemini]_hierarchical_evaluation_results.csv'
Complete predictions with hierarchical constraints applied
'[rule-based/gemini]_hierarchical_metrics_summary.txt'

Comprehensive metrics report including:
Overall accuracy and F1 scores
Per-aspect metrics
Confusion matrices
Exact match statistics

All files will be saved in your project folder.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
UNDERSTANDING THE OUTPUT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
In Jupyter Notebook, you'll see output directly below each cell.

PHASE 1 Output:
Each model cell shows:
✓ Model loading progress
✓ Inference progress (samples processed)
✓ Metrics summary table
✓ Per-aspect performance breakdown
✓ Memory cleanup confirmation

PHASE 2 Output:
The hierarchical evaluation shows:
✓ Step-by-step progress (7 steps)
✓ General aspect predictions
✓ Specific aspect predictions
✓ Hierarchical constraint application
✓ Per-aspect metrics
✓ Sample predictions (first 23 reviews)
✓ Overall performance summary

Final Summary Cell:
A comprehensive table showing:
✓ Individual model results
✓ Hierarchical model results
✓ General aspects performance
✓ Specific aspects performance
✓ Overall 25-label performance

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
QUICK START CHECKLIST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Python 3.8+ installed
□ Jupyter Notebook installed
□ All required packages installed via pip
□ GPU with 8GB+ VRAM available
□ Hugging Face account created and logged in
□ Project folders created
□ '[rule-based/gemini]_hierarchical.ipynb' file in the project folder
□ All 6 CSV test files in the 'datasets/' folder
□ All 10 trained model files in the 'models/gemini/' and 'models/rule-based/' folders
□ CSV files have correct column names
□ Ready to launch: jupyter notebook

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
END OF SETUP GUIDE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Setup Instructions for All Techniques/[SETUP] Fine-Tuning (Gemma) - Specifc Models.txt ADDED

PROJECT SETUP GUIDE - SPECIFIC MODELS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SYSTEM REQUIREMENTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Python 3.8 or higher
16GB RAM minimum (32GB recommended)
NVIDIA GPU with 8GB+ VRAM (recommended for faster training)
Windows, Linux, or macOS
10GB free disk space

You may also use Google Colab with a T4/A100 GPU selected in Runtime settings

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: INSTALL PYTHON
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Download and install Python from: https://www.python.org/downloads/
During installation:

Check "Add Python to PATH"
Check "Install pip"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: INSTALL REQUIRED PACKAGES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Open Command Prompt (Windows) or Terminal (Mac/Linux) and copy-paste these commands:

pip install pandas numpy scikit-learn matplotlib transformers peft huggingface-hub

For CPU-only (slower training):
pip install torch torchvision torchaudio

For GPU support (NVIDIA only - faster training):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: SET UP HUGGING FACE ACCOUNT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The scripts use the Google Gemma model (e.g., 'google/gemma-3-1b-pt'), which is a gated model. To access it, you must follow these steps:

Step A: Grant Access
1. Go to the Hugging Face model page (https://huggingface.co/google/gemma-3-1b-pt).
2. Log in to your Hugging Face account.
3. Review and accept the license terms to gain access.

Step B: Authenticate in the Environment
You must provide a valid Hugging Face Access Token. You can generate one at https://huggingface.co/settings/tokens.

Option 1 (Command Line / Local):
Go to: https://huggingface.co/
Log in or create a free account
Go to Settings > Access Tokens
Create a new token
Install the Hugging Face CLI:
pip install huggingface-hub
Log in with your token:
huggingface-cli login
Paste your token when prompted

Option 2 (Google Colab / Jupyter Notebook):
If running in a notebook, add a cell at the very top with the following code:

from huggingface_hub import login
login("YOUR_HUGGING_FACE_TOKEN_HERE")

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 4: CREATE PROJECT FOLDERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Create this folder structure anywhere on your computer:
your_project/
├── datasets/
│   ├── [rule-based/gemini]/
│   │   └── [specific aspect]_train_dataset.csv
│   └── test_[specific aspect]_dataset.csv
└── [rule-based/gemini]_[specific aspect]_model.py

The project directory should look like this:
your_project/
├── datasets/
│   ├── rule-based/
│   │   ├── product_train_dataset.csv
│   │   ├── delivery_train_dataset.csv
│   │   ├── price_train_dataset.csv
│   │   └── service_train_dataset.csv
│   ├── gemini/
│   │   ├── product_train_dataset.csv
│   │   ├── delivery_train_dataset.csv
│   │   ├── price_train_dataset.csv
│   │   └── service_train_dataset.csv
│   ├── test_product_dataset.csv
│   ├── test_delivery_dataset.csv
│   ├── test_price_dataset.csv
│   └── test_service_dataset.csv
├── rule-based_product_model.py
├── rule-based_delivery_model.py
├── rule-based_price_model.py
├── rule-based_service_model.py
├── gemini_product_model.py
├── gemini_delivery_model.py
├── gemini_price_model.py
└── gemini_service_model.py

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 5: PREPARE YOUR DATA FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Each CSV file must be in its respective directory.

Training set: datasets/[rule-based/gemini]/[specific aspect]_train_dataset.csv

Test set/Ground truth: datasets/test_[specific aspect]_dataset.csv

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 6: UPDATE MODEL SAVE LOCATION (OPTIONAL)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
By default, the model saves to: C:\temp\new_models
If you want to save it somewhere else:

Open the corresponding [rule-based/gemini]_[specific aspect]_model.py in a text editor
Find the SAVE_DIR assignment (around line 416):

SAVE_DIR = r"C:\temp\new_models"

Change it to your preferred location:

Windows example:
SAVE_DIR = r"C:\Users\YourName\Documents\my_models"
Mac/Linux example:
SAVE_DIR = "/home/username/my_models"
Note: The folder will be created automatically if it doesn't exist.

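For reference, the automatic folder creation typically comes down to a single call like this (a sketch of the pattern, not necessarily the scripts' exact code):

    # Create the save directory on demand; a no-op if it already exists.
    import os

    SAVE_DIR = r"C:\temp\new_models"
    os.makedirs(SAVE_DIR, exist_ok=True)
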
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 7: RUN THE CODE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Open Command Prompt or Terminal
Navigate to your project folder:

cd C:\path\to\your_project

Run the script:

python [rule-based/gemini]_[specific aspect]_model.py
For example, for the Gemini-annotated product-specific model: python gemini_product_model.py

Wait for training to complete (1-4 hours depending on hardware)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT THE CODE DOES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Loads the training and test datasets
Splits the training data into 80% train / 20% validation
Trains the respective technique-annotated model for specific-aspect classification
Optimizes the classification thresholds (see the sketch after this list)
Evaluates model performance
Saves the trained model and generates reports

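Threshold optimization here usually means sweeping candidate cutoffs per label and keeping the one with the best validation F1. A minimal sketch with scikit-learn (illustrative; the scripts' actual search may differ):

    # Per-label threshold sweep: pick the cutoff that maximizes validation F1.
    import numpy as np
    from sklearn.metrics import f1_score

    def best_thresholds(y_true, y_prob):
        """y_true, y_prob: (n_samples, n_labels) arrays of labels/probabilities."""
        candidates = np.linspace(0.05, 0.95, 19)
        thresholds = np.empty(y_true.shape[1])
        for j in range(y_true.shape[1]):
            scores = [f1_score(y_true[:, j], (y_prob[:, j] >= t).astype(int),
                               zero_division=0) for t in candidates]
            thresholds[j] = candidates[int(np.argmax(scores))]
        return thresholds
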
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
OUTPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
gemma_[specific aspect]_specific.pt
Main trained model file

gemma_[specific aspect]_classifier.pth
Model checkpoint with training metadata

training_loss_plot_[specific aspect].png
Training progress visualization

training_loss_per_batch_detailed_[specific aspect].png
Detailed batch-level training curves

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONSOLE OUTPUT EXPLANATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
While running, you'll see:
✓ Dataset loading confirmation
✓ Class imbalance analysis (positive/negative ratios)
✓ Training progress for each epoch
✓ Validation loss after each epoch
✓ Early stopping notifications
✓ Optimal threshold calculations
✓ Classification reports (precision, recall, F1-score)
✓ Sample predictions vs ground truth

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
QUICK START CHECKLIST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
□ Python 3.8+ installed
□ All packages installed via pip
□ Hugging Face account created and logged in
□ Project folders created
□ CSV files placed in the correct locations
□ (Optional) Updated the model save directory "SAVE_DIR" in [rule-based/gemini]_[specific aspect]_model.py
□ Ready to run: python [rule-based/gemini]_[specific aspect]_model.py

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
END OF SETUP GUIDE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Setup Instructions for All Techniques/[SETUP] LLM.txt ADDED

# SETUP INSTRUCTIONS FOR LLM MODEL

Option 1: Quick Start (Google Colab)
---------------------------------------------------------
The easiest way to run these notebooks is Google Colab, which requires no local installation.

1. Go to https://colab.research.google.com/
2. Click "File" > "Upload notebook"
3. Upload the .ipynb file you wish to run: gemini_pipeline.ipynb
4. Upload the required data files (CSVs, JSONs) to the Colab "Files" sidebar.
   - the CSVs, JSONs, and PDFs are found in the SOURCE > Data folder
5. Run `!pip install -r requirements.txt` in a cell to install dependencies (in Colab, shell commands are prefixed with !).

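If you prefer to upload the data files from code rather than dragging them into the sidebar, Colab ships a small file-picker helper (optional; step 4 above works just as well):

    # Optional alternative to the "Files" sidebar (only works inside Colab).
    from google.colab import files

    uploaded = files.upload()           # opens a browser file picker
    print("Uploaded:", list(uploaded))  # the files now exist in /content
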
Option 2: Local Installation (Run on your computer)
---------------------------------------------------------
Prerequisites: Python 3.8 or higher

1. Install Jupyter Notebook (if not already installed):
   Open your terminal/command prompt and run:
   pip install notebook

2. Create a Virtual Environment (Recommended):
   python -m venv venv

   # Windows:
   venv\Scripts\activate
   # Mac/Linux:
   source venv/bin/activate

3. Install Project Dependencies:
   Navigate to the SOURCE > Data folder and run:
   pip install -r requirements.txt

4. Start the Application:
   Run the following command to open the interface:
   jupyter notebook
Setup Instructions for All Techniques/[SETUP] Rule-Based.txt ADDED

Rule-Based Keyword Annotator Dependencies and Setup

1. Python Environment
This script requires a standard Python 3 installation.

2. Required External Libraries
The following Python libraries are required. You can install them using pip:

pip install pandas nltk

Library Descriptions:
- pandas: Used for loading the dataset (CSV) and handling data frames.
- nltk: (Natural Language Toolkit) Used for tokenization and accessing standard stopword lists.

Note: Other imported modules (re, csv, collections, string, warnings) are part of the standard Python library and do not need installation.

3. Required Data Files
Ensure the following files are present in the same directory as the notebook before running:

a. Input Dataset: 'SentiTaglish_ProductsAndServices.csv'
The script expects this CSV file to contain the reviews to be processed.

b. Stopwords File: 'stopwords-new.txt'
The script attempts to load a custom list of Filipino stopwords from this file.
Ensure this text file exists in the directory.

4. NLTK Data Downloads
The script includes automated commands to download the necessary NLTK data.
On the first run, ensure you have an internet connection so the script can download:
- 'punkt' (tokenizer models)
- 'stopwords' (standard stopword corpora)

If you are running in an offline environment, you must download these NLTK packages beforehand using `nltk.download()`.
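For orientation, here is a minimal sketch of the kind of preprocessing this setup supports. The file names come from this guide; the assumption that the review text sits in the first column, and the processing itself, are illustrative and may differ from the notebook's actual logic:

    # Minimal preprocessing sketch: load the reviews, merge NLTK's English
    # stopwords with the custom Filipino list, and tokenize one review.
    import nltk
    import pandas as pd
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")       # tokenizer models
    nltk.download("stopwords")   # standard stopword corpora

    df = pd.read_csv("SentiTaglish_ProductsAndServices.csv")

    with open("stopwords-new.txt", encoding="utf-8") as f:
        custom_stops = {line.strip().lower() for line in f if line.strip()}
    all_stops = set(stopwords.words("english")) | custom_stops

    review = str(df.iloc[0, 0])  # assumes review text is in the first column
    tokens = [t for t in word_tokenize(review.lower())
              if t.isalpha() and t not in all_stops]
    print(tokens)
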
Setup Instructions for All Techniques/[SETUP] Topic Modeling.txt ADDED

Topic Modeling Project Setup (LDA & BERTopic)

1. Python Environment
These scripts require Python 3.8 or higher.

2. Required External Libraries
Install the following libraries to run both the LDA and BERTopic notebooks. You can install them using pip:

pip install pandas gensim nltk pyldavis bertopic plotly scikit-learn

Library Descriptions:
- pandas: Data manipulation and CSV loading.
- gensim: Core library for LDA topic modeling.
- nltk: Natural Language Toolkit for stopword removal and tokenization.
- pyldavis: Interactive visualization for LDA models.
- bertopic: Advanced topic modeling technique that leverages transformers (BERTopic notebook).
- plotly: Visualization library used by BERTopic.
- scikit-learn: Required dependency for BERTopic (and general ML utilities).

3. Required Data Files
Ensure the following files are present in the same directory as the notebooks before running:

a. Input Dataset: 'SentiTaglish_ProductsAndServices.csv'
Both notebooks require this CSV file containing the reviews to be processed.

b. Stopwords File: 'stopwords-new.txt'
The LDA script specifically looks for this file to load custom Tagalog/Filipino stopwords.
Ensure this text file exists in the directory.

4. NLTK Data Downloads
The scripts include automated commands (`nltk.download('stopwords')`) to download the necessary NLTK data.
On the first run, ensure you have an internet connection.

5. Hardware Note (BERTopic)
The BERTopic notebook uses transformer models, which can be computationally intensive. A GPU is recommended for faster processing, though it will run on a standard CPU (just slower).

If running on Google Colab:
- Go to Runtime > Change runtime type > select T4 GPU for better performance.
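To make the moving parts concrete, here is a minimal end-to-end LDA sketch with gensim under the file-naming assumptions above (illustrative only; the notebooks' actual preprocessing, column names, and parameters may differ):

    # Minimal LDA sketch with gensim: tokenize reviews, build a dictionary
    # and bag-of-words corpus, then fit and print a small topic model.
    import pandas as pd
    from gensim import corpora
    from gensim.models import LdaModel

    df = pd.read_csv("SentiTaglish_ProductsAndServices.csv")

    with open("stopwords-new.txt", encoding="utf-8") as f:
        stops = {line.strip().lower() for line in f if line.strip()}

    # Assumes the review text is in the first column.
    texts = [[w for w in str(doc).lower().split() if w.isalpha() and w not in stops]
             for doc in df.iloc[:, 0]]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5,
                   passes=5, random_state=42)
    for topic_id, words in lda.print_topics(num_words=8):
        print(topic_id, words)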