# Correlation Analysis with Plotting
## Overview
The correlation analysis script (`analyze_correlations_v2.py`) can plot the relationships between the axiia scores (`problem_score`, `ability_score`) and the KPI ratings (FY23/24, FY24/25).
## Usage
### Basic usage with plotting:
```bash
python3 analyze_correlations_v2.py -k <kpi_file> -s <scores_file> -p
```
### Example:
```bash
python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p
```
### Command line options:
- `-k, --kpi`: Path to the KPI CSV file (required)
- `-s, --scores`: Path to the scores CSV file (required)
- `-o, --output`: Output YAML file name (default: score_corr.yaml)
- `-p, --plot`: Enable plotting (optional)
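
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch rather than the script's actual code: the `build_parser` name, the help strings, and the description are assumptions. Only the flags and the default output name come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(
        description="Correlate scores with KPI ratings, optionally plotting."
    )
    parser.add_argument("-k", "--kpi", required=True, help="Path to the KPI CSV file")
    parser.add_argument("-s", "--scores", required=True, help="Path to the scores CSV file")
    parser.add_argument("-o", "--output", default="score_corr.yaml", help="Output YAML file name")
    # store_true makes -p a switch: absent means False, present means True.
    parser.add_argument("-p", "--plot", action="store_true", help="Enable plotting")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-k", "kpi.csv", "-s", "scores.csv", "-p"])
print(args.kpi, args.scores, args.output, args.plot)
```

Omitting `-p` leaves `args.plot` as `False`, which matches the flag being optional.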
## Output Files
When plotting is enabled (-p flag), the script generates:
1. **correlation_plots.png**: A 2x2 grid showing all four correlations:
   - AC: problem_score vs FY23/24 Rating
   - AD: problem_score vs FY24/25 Rating
   - BC: ability_score vs FY23/24 Rating
   - BD: ability_score vs FY24/25 Rating
2. **Individual plots**:
   - correlation_AC.png
   - correlation_AD.png
   - correlation_BC.png
   - correlation_BD.png
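
The two-letter codes appear to pair a score (A or B) with a rating (C or D). The sketch below spells out that reading and derives the individual file names from it. The `SCORES` and `RATINGS` dictionaries and the `expected_outputs` helper are illustrative names, not part of the script.

```python
# Assumed meaning of the letter codes, inferred from the list above:
# A/B are the score columns, C/D are the rating columns.
SCORES = {"A": "problem_score", "B": "ability_score"}
RATINGS = {"C": "FY23/24 Rating", "D": "FY24/25 Rating"}


def expected_outputs() -> dict[str, tuple[str, str]]:
    """Map each individual plot file name to its (x, y) column pair."""
    return {
        f"correlation_{s}{r}.png": (SCORES[s], RATINGS[r])
        for s in SCORES
        for r in RATINGS
    }


for filename, (x, y) in expected_outputs().items():
    print(f"{filename}: {x} vs {y}")
```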
## Plot Features
Each plot includes:
- Scatter plot of data points
- Linear trend line (red dashed)
- Correlation statistics box showing:
  - Pearson correlation coefficient (r)
  - Spearman correlation coefficient (ρ)
  - Number of valid data points (n)
- Proper axis labels
- IPM values formatted as percentages
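
A single plot with the features listed above might be produced along these lines. This is a sketch under stated assumptions, not the script's implementation: the `plot_correlation` function, the figure size, the styling, and the sample numbers are all made up for illustration.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def plot_correlation(x, y, xlabel, ylabel, path):
    """Draw one scatter plot with a trend line and a statistics box (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    r, _ = stats.pearsonr(x, y)
    rho, _ = stats.spearmanr(x, y)

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(x, y, alpha=0.7)

    # Least-squares line of degree 1, drawn red and dashed.
    slope, intercept = np.polyfit(x, y, 1)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, "r--")

    # Statistics box in the top-left corner, positioned in axes coordinates.
    ax.text(
        0.05, 0.95,
        f"r = {r:.3f}\nρ = {rho:.3f}\nn = {len(x)}",
        transform=ax.transAxes, va="top",
        bbox=dict(boxstyle="round", facecolor="white", alpha=0.8),
    )
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return r, rho


# Made-up numbers purely for illustration; the file goes to the temp directory.
out_path = os.path.join(tempfile.gettempdir(), "correlation_AC_example.png")
r, rho = plot_correlation(
    [55, 60, 72, 80, 91], [2, 3, 3, 4, 5],
    "problem_score", "FY23/24 Rating", out_path,
)
```

Pearson measures linear association while Spearman works on ranks, so showing both helps when ratings are coarse or ordinal.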
## Test Script
Use `test_plotting.py` to quickly test the plotting functionality:
```bash
python3 test_plotting.py
```
This will automatically find the data files and run the analysis with plotting enabled.
## Requirements
Make sure you have the required packages installed:
```bash
pip install pandas numpy scipy pyyaml matplotlib seaborn
```
## Notes
- The script handles missing data by using only rows whose email appears in both files and whose values are complete
- Empty or invalid values are excluded from correlation calculations
- IPM percentage values are automatically converted to decimals (e.g., "120%" becomes 1.2)
- The script provides detailed data quality reports showing match rates and missing data
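
The matching and cleaning steps described in these notes can be sketched with pandas as below. The column names (`email`, `ipm`), the `percent_to_decimal` helper, and the sample rows are assumptions for illustration. The real CSV headers and the script's internals may differ.

```python
import pandas as pd


def percent_to_decimal(series: pd.Series) -> pd.Series:
    """Convert strings like "120%" to 1.2; empty or invalid values become NaN."""
    cleaned = series.astype(str).str.strip().str.rstrip("%")
    return pd.to_numeric(cleaned, errors="coerce") / 100.0


# Made-up sample data standing in for the two CSV files.
kpi = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "ipm": ["120%", "", "95%"],
})
scores = pd.DataFrame({
    "email": ["a@example.com", "c@example.com", "d@example.com"],
    "problem_score": [71.0, None, 80.0],
})

kpi["ipm"] = percent_to_decimal(kpi["ipm"])

# Inner join keeps only emails present in both files.
merged = kpi.merge(scores, on="email", how="inner")

# Rows with any missing value are excluded before computing correlations.
complete = merged.dropna(subset=["ipm", "problem_score"])

print(f"matched {len(merged)} of {len(kpi)} KPI rows; {len(complete)} complete")
```

Counting rows before and after each step is one simple way to build the kind of match-rate and missing-data report the last note mentions.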