# Correlation Analysis with Plotting

## Overview

The correlation analysis script now includes plotting functionality to visualize the relationships between axiia scores (`problem_score`, `ability_score`) and KPI ratings (FY23/24, FY24/25).

## Usage

### Basic usage with plotting:

```bash
python3 analyze_correlations_v2.py -k <kpi_csv> -s <scores_csv> -p
```

### Example:

```bash
python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p
```

### Command line options:

- `-k, --kpi`: Path to the KPI CSV file (required)
- `-s, --scores`: Path to the scores CSV file (required)
- `-o, --output`: Output YAML file name (default: `score_corr.yaml`)
- `-p, --plot`: Enable plotting (optional)

## Output Files

When plotting is enabled (`-p` flag), the script generates:

1. **correlation_plots.png**: A 2x2 grid showing all four correlations:
   - AC: problem_score vs FY23/24 Rating
   - AD: problem_score vs FY24/25 Rating
   - BC: ability_score vs FY23/24 Rating
   - BD: ability_score vs FY24/25 Rating
2. **Individual plots**:
   - correlation_AC.png
   - correlation_AD.png
   - correlation_BC.png
   - correlation_BD.png

## Plot Features

Each plot includes:

- Scatter plot of data points
- Linear trend line (red dashed)
- Correlation statistics box showing:
  - Pearson correlation coefficient (r)
  - Spearman correlation coefficient (ρ)
  - Number of valid data points (n)
- Proper axis labels
- IPM values formatted as percentages

## Test Script

Use `test_plotting.py` to quickly test the plotting functionality:

```bash
python3 test_plotting.py
```

This automatically finds the data files and runs the analysis with plotting enabled.

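To make the statistics box listed under Plot Features more concrete, here is a minimal, dependency-free sketch of how r, ρ, and n could be computed for one score/rating pairing. It is not the script's own code: the function names (`pearson`, `average_ranks`, `spearman`, `stats_box`) and the box layout are hypothetical, and the script itself presumably relies on the packages listed under Requirements rather than hand-written formulas.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def stats_box(pairs):
    """Build the statistics text from (score, rating) pairs.

    Pairs with a missing value (None) are dropped first, so n counts
    only the valid data points. The text layout is an assumption.
    """
    valid = [(x, y) for x, y in pairs if x is not None and y is not None]
    xs = [x for x, _ in valid]
    ys = [y for _, y in valid]
    return f"r = {pearson(xs, ys):.3f}\nρ = {spearman(xs, ys):.3f}\nn = {len(valid)}"


if __name__ == "__main__":
    # Made-up sample values; one pair is incomplete and gets excluded.
    print(stats_box([(1, 2), (2, 4), (3, 6), (None, 5), (4, 8)]))
```

Ranking before correlating is what lets ρ capture any monotonic relationship, whereas r only measures how close the points are to a straight line.
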
## Requirements

Make sure you have the required packages installed:

```bash
pip install pandas numpy scipy pyyaml matplotlib seaborn
```

## Notes

- The script handles missing data by only using matched emails with complete data
- Empty or invalid values are excluded from correlation calculations
- IPM percentage values (e.g., "120%") are automatically converted to decimals
- The script provides detailed data quality reports showing match rates and missing data
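
The data-handling rules in the notes above can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual implementation: `parse_ipm`, `normalize_email`, and `match_complete` are hypothetical names, and the case-insensitive email comparison is a guess about how matching might work.

```python
import math


def parse_ipm(raw):
    """Convert an IPM cell to a decimal, e.g. "120%" -> 1.2.

    Returns None for empty or invalid input so the value can be
    excluded from the correlation calculations.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    if not math.isfinite(value):
        return None  # treat "nan" / "inf" as invalid
    return value / 100 if is_percent else value


def normalize_email(email):
    """Assumed normalization: trim whitespace, compare case-insensitively."""
    return email.strip().lower()


def match_complete(scores, ratings):
    """Pair score and rating for emails present in both mappings.

    Both arguments map email -> value (or None when missing). Only
    emails found in both mappings with two non-missing values are kept.
    """
    ratings_by_email = {normalize_email(e): v for e, v in ratings.items()}
    pairs = []
    for email, score in scores.items():
        rating = ratings_by_email.get(normalize_email(email))
        if score is not None and rating is not None:
            pairs.append((score, rating))
    return pairs


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    scores = {"A@example.com": 3.5, "b@example.com": None, "c@example.com": 4.0}
    ratings = {
        "a@example.com": parse_ipm("120%"),
        "c@example.com": parse_ipm(""),
        "d@example.com": parse_ipm("95%"),
    }
    pairs = match_complete(scores, ratings)
    print(pairs, f"match rate: {len(pairs)}/{len(scores)}")
```

Returning `None` for anything unparseable keeps the exclusion logic in one place: the matching step only has to check for missing values, regardless of why they are missing.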