| # Correlation Analysis with Plotting | |
| ## Overview | |
| The correlation analysis script now includes plotting functionality to visualize the relationships between axiia scores (problem_score, ability_score) and KPI ratings (FY23/24, FY24/25). | |
| ## Usage | |
| ### Basic usage with plotting: | |
| ```bash | |
| python3 analyze_correlations_v2.py -k <kpi_file> -s <scores_file> -p | |
| ``` | |
| ### Example: | |
| ```bash | |
| python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p | |
| ``` | |
| ### Command line options: | |
| - `-k, --kpi`: Path to the KPI CSV file (required) | |
| - `-s, --scores`: Path to the scores CSV file (required) | |
| - `-o, --output`: Output YAML file name (default: score_corr.yaml) | |
| - `-p, --plot`: Enable plotting (optional) | |
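The options above map naturally onto an `argparse` parser. The following is a minimal sketch, not the script's actual source: the function name `build_parser`, the description, and the help strings are assumptions; only the flags, their required/optional status, and the default output name come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Correlate scores with KPI ratings."
    )
    parser.add_argument("-k", "--kpi", required=True,
                        help="Path to the KPI CSV file")
    parser.add_argument("-s", "--scores", required=True,
                        help="Path to the scores CSV file")
    parser.add_argument("-o", "--output", default="score_corr.yaml",
                        help="Output YAML file name")
    # store_true makes -p a switch: absent means False, present means True.
    parser.add_argument("-p", "--plot", action="store_true",
                        help="Enable plotting")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere; the paths are taken from the example above.
args = build_parser().parse_args(
    ["-k", "../../data/lenovo_kpi.csv",
     "-s", "../../data/lenovo-scores-0603.csv",
     "-p"]
)
print(args.kpi, args.scores, args.output, args.plot)
```

Because `-o` is omitted in this call, `args.output` falls back to the documented default.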
| ## Output Files | |
| When plotting is enabled (`-p` flag), the script generates: | |
| 1. **correlation_plots.png**: A 2x2 grid showing all four correlations: | |
|    - AC: problem_score vs FY23/24 Rating | |
|    - AD: problem_score vs FY24/25 Rating | |
|    - BC: ability_score vs FY23/24 Rating | |
|    - BD: ability_score vs FY24/25 Rating | |
| 2. **Individual plots**: | |
|    - correlation_AC.png | |
|    - correlation_AD.png | |
|    - correlation_BC.png | |
|    - correlation_BD.png | |
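To illustrate how the five files listed above could be produced, here is a rough sketch using matplotlib and numpy. It is not the script's actual code: `draw_panel`, `save_plots`, the figure sizes, and the synthetic demo data (random scores and ratings on an invented 1 to 5 scale) are all assumptions. Only the pair codes, the file names, and the red dashed trend line come from this document.

```python
import itertools
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Pair codes as documented: A/B are scores, C/D are ratings.
SCORES = {"A": "problem_score", "B": "ability_score"}
RATINGS = {"C": "FY23/24 Rating", "D": "FY24/25 Rating"}


def draw_panel(ax, x, y, xlabel, ylabel):
    """Scatter the points and overlay a least-squares line (red dashed)."""
    ax.scatter(x, y, alpha=0.7)
    slope, intercept = np.polyfit(x, y, 1)
    xs = np.linspace(x.min(), x.max(), 50)
    ax.plot(xs, slope * xs + intercept, "r--")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)


def save_plots(data, out_dir):
    """Write the 2x2 grid plus one file per pair; return the paths."""
    pairs = list(itertools.product(SCORES, RATINGS))  # AC, AD, BC, BD
    written = []

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, (s, r) in zip(axes.flat, pairs):
        draw_panel(ax, data[SCORES[s]], data[RATINGS[r]], SCORES[s], RATINGS[r])
        ax.set_title(s + r)
    fig.tight_layout()
    path = os.path.join(out_dir, "correlation_plots.png")
    fig.savefig(path)
    plt.close(fig)
    written.append(path)

    for s, r in pairs:
        fig, ax = plt.subplots(figsize=(5, 4))
        draw_panel(ax, data[SCORES[s]], data[RATINGS[r]], SCORES[s], RATINGS[r])
        path = os.path.join(out_dir, f"correlation_{s}{r}.png")
        fig.savefig(path)
        plt.close(fig)
        written.append(path)
    return written


# Synthetic demo data (invented values, for illustration only).
rng = np.random.default_rng(0)
demo = {
    "problem_score": rng.uniform(0, 100, 30),
    "ability_score": rng.uniform(0, 100, 30),
    "FY23/24 Rating": rng.integers(1, 6, 30).astype(float),
    "FY24/25 Rating": rng.integers(1, 6, 30).astype(float),
}
out_dir = tempfile.mkdtemp()
files = save_plots(demo, out_dir)
print([os.path.basename(f) for f in files])
```

Iterating `axes.flat` in row-major order places AC and AD on the top row and BC and BD on the bottom row, which is one plausible reading of the grid layout described above.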
| ## Plot Features | |
| Each plot includes: | |
| - Scatter plot of data points | |
| - Linear trend line (red dashed) | |
| - Correlation statistics box showing: | |
|   - Pearson correlation coefficient (r) | |
|   - Spearman correlation coefficient (ρ) | |
|   - Number of valid data points (n) | |
| - Proper axis labels | |
| - IPM values formatted as percentages | |
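The three numbers in the statistics box can be sketched without any dependencies. The requirements list includes scipy, whose `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients, and the script presumably relies on them; that is an assumption. The standard-library version below only shows what r, ρ, and n mean. The function names, the box layout, and the sample data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson r: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks of the data."""
    return pearson(average_ranks(x), average_ranks(y))


def stats_box(x, y):
    """Format the box text (layout is an assumption)."""
    return f"r = {pearson(x, y):.3f}\nρ = {spearman(x, y):.3f}\nn = {len(x)}"


# Illustrative data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(stats_box(x, y))
```

For monotonic but nonlinear data like this, ρ reaches 1 while r stays below it, which is why reporting both coefficients is informative.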
| ## Test Script | |
| Use `test_plotting.py` to quickly test the plotting functionality: | |
| ```bash | |
| python3 test_plotting.py | |
| ``` | |
| The test script locates the data files automatically and runs the analysis with plotting enabled. | |
| ## Requirements | |
| Install the required packages: | |
| ```bash | |
| pip install pandas numpy scipy pyyaml matplotlib seaborn | |
| ``` | |
| ## Notes | |
| - The script handles missing data by using only records whose email appears in both input files and whose values are complete | |
| - Empty or invalid values are excluded from correlation calculations | |
| - IPM percentage values (e.g., "120%") are automatically converted to decimals | |
| - The script provides detailed data quality reports showing match rates and missing data |
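The cleaning rules in these notes can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's implementation: the helpers `parse_value` and `matched_pairs`, the `email` column name, the case-insensitive email comparison, the `kpi_value` column, and the sample rows are all hypothetical.

```python
import math


def parse_value(raw):
    """Return a float, or None for empty or invalid input.

    Percentage strings such as "120%" become decimals (1.2).
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    try:
        if text.endswith("%"):
            value = float(text[:-1]) / 100.0
        else:
            value = float(text)
    except ValueError:
        return None
    return None if math.isnan(value) else value


def matched_pairs(kpi_rows, score_rows, kpi_col, score_col):
    """Join on email and keep only pairs where both values parse.

    Assumes each row is a dict with an "email" key (hypothetical schema).
    """
    scores = {row["email"].strip().lower(): row for row in score_rows}
    xs, ys = [], []
    for row in kpi_rows:
        other = scores.get(row["email"].strip().lower())
        if other is None:
            continue  # email missing from the scores file
        x = parse_value(other.get(score_col))
        y = parse_value(row.get(kpi_col))
        if x is not None and y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys


# Invented sample rows: one clean match, one empty KPI value,
# and one email on each side with no counterpart.
kpi_rows = [
    {"email": "a@example.com", "kpi_value": "120%"},
    {"email": "b@example.com", "kpi_value": ""},
    {"email": "c@example.com", "kpi_value": "95%"},
]
score_rows = [
    {"email": "A@example.com", "problem_score": "7.5"},
    {"email": "b@example.com", "problem_score": "6"},
    {"email": "d@example.com", "problem_score": "9"},
]
xs, ys = matched_pairs(kpi_rows, score_rows, "kpi_value", "problem_score")
print(xs, ys)
```

Only the first row survives here: the second has an empty KPI value, and the remaining emails appear in just one of the two inputs.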