# Correlation Analysis with Plotting

## Overview
The correlation analysis script (`analyze_correlations_v2.py`) can plot the relationships between the axiia scores (`problem_score`, `ability_score`) and the KPI ratings (FY23/24, FY24/25).

## Usage

### Basic usage with plotting:
```bash
python3 analyze_correlations_v2.py -k <kpi_file> -s <scores_file> -p
```

### Example:
```bash
python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p
```

### Command line options:
- `-k, --kpi`: Path to the KPI CSV file (required)
- `-s, --scores`: Path to the scores CSV file (required)
- `-o, --output`: Output YAML file name (default: `score_corr.yaml`)
- `-p, --plot`: Enable plotting (optional)
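
For reference, the options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: the function name `build_parser` and the help strings are assumptions, and only the flags, the required/optional status, and the default output name come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line options."""
    parser = argparse.ArgumentParser(
        description="Correlate axiia scores with KPI ratings."
    )
    parser.add_argument("-k", "--kpi", required=True,
                        help="Path to the KPI CSV file")
    parser.add_argument("-s", "--scores", required=True,
                        help="Path to the scores CSV file")
    parser.add_argument("-o", "--output", default="score_corr.yaml",
                        help="Output YAML file name")
    # store_true makes -p a switch: absent means False, present means True.
    parser.add_argument("-p", "--plot", action="store_true",
                        help="Enable plotting")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["-k", "kpi.csv", "-s", "scores.csv", "-p"])
print(args.output, args.plot)
```

Because `-o` is omitted in the example call, `args.output` falls back to the documented default.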

## Output Files

When plotting is enabled (`-p` flag), the script generates:

1. **correlation_plots.png**: A 2x2 grid showing all four correlations:
   - AC: problem_score vs FY23/24 Rating
   - AD: problem_score vs FY24/25 Rating
   - BC: ability_score vs FY23/24 Rating
   - BD: ability_score vs FY24/25 Rating

2. **Individual plots**:
   - correlation_AC.png
   - correlation_AD.png
   - correlation_BC.png
   - correlation_BD.png
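
The two-letter codes pair one score (A, B) with one rating (C, D). A small sketch of how the four file names could be derived is shown below; the dictionaries and the `plot_targets` helper are hypothetical, and the rating column names are assumed to match the labels used above.

```python
from itertools import product

# Letter codes as used in the plot file names (column names are assumptions).
SCORES = {"A": "problem_score", "B": "ability_score"}
RATINGS = {"C": "FY23/24 Rating", "D": "FY24/25 Rating"}


def plot_targets():
    """Yield (code, score column, rating column, file name) for each pair."""
    for (s_code, score), (r_code, rating) in product(SCORES.items(), RATINGS.items()):
        code = s_code + r_code
        yield code, score, rating, f"correlation_{code}.png"


for code, score, rating, filename in plot_targets():
    print(f"{code}: {score} vs {rating} -> {filename}")
```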

## Plot Features

Each plot includes:
- Scatter plot of data points
- Linear trend line (red dashed)
- Correlation statistics box showing:
  - Pearson correlation coefficient (r)
  - Spearman correlation coefficient (ρ)
  - Number of valid data points (n)
- Descriptive axis labels
- IPM values formatted as percentages
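
The features listed above can be reproduced with matplotlib, NumPy, and SciPy along these lines. This is an illustrative sketch under stated assumptions, not the script's own plotting code: the function name `plot_correlation`, its parameters, and the styling values are all hypothetical, and percentage formatting of IPM values is left out.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def plot_correlation(x, y, xlabel, ylabel, path=None):
    """Scatter plot with a linear trend line and a statistics box."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only pairs where both values are present.
    valid = ~(np.isnan(x) | np.isnan(y))
    x, y = x[valid], y[valid]

    r, _ = stats.pearsonr(x, y)
    rho, _ = stats.spearmanr(x, y)

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x, y, alpha=0.7)

    # Least-squares line, drawn red and dashed.
    slope, intercept = np.polyfit(x, y, 1)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, "r--")

    # Statistics box in the top-left corner, in axes coordinates.
    ax.text(0.05, 0.95, f"r = {r:.3f}\nρ = {rho:.3f}\nn = {len(x)}",
            transform=ax.transAxes, va="top",
            bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

    if path is not None:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig


# Made-up values for illustration; the NaN pair is dropped before computing statistics.
fig = plot_correlation([1, 2, 3, 4, float("nan")], [2, 1, 4, 3, 5],
                       "problem_score", "FY23/24 Rating")
```

Passing a `path` such as `correlation_AC.png` would write the figure to disk; leaving it as `None` only returns the figure object.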

## Test Script

Use `test_plotting.py` to quickly test the plotting functionality:

```bash
python3 test_plotting.py
```

It locates the data files automatically and runs the analysis with plotting enabled.

## Requirements

Make sure you have the required packages installed:
```bash
pip install pandas numpy scipy pyyaml matplotlib seaborn
```

## Notes

- The script matches records between the two files by email and uses only matched records with complete data.
- Empty or invalid values are excluded from the correlation calculations.
- IPM percentage values (e.g., "120%") are automatically converted to decimals (1.2).
- The script prints a data quality report showing match rates and missing data.
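
The cleaning rules above can be illustrated with a short standard-library sketch. It is not the script's implementation: the helper names `parse_percent` and `match_by_email`, the column names `email` and `IPM`, and the case-insensitive email comparison are all assumptions made for the example.

```python
import csv
import io


def parse_percent(value):
    """Convert '120%' to 1.2; return None for empty or invalid values."""
    text = (value or "").strip()
    if not text.endswith("%"):
        return None
    try:
        return float(text[:-1]) / 100
    except ValueError:
        return None


def match_by_email(kpi_rows, score_rows, kpi_field, score_field):
    """Return (score, kpi) pairs for emails found in both inputs with valid values."""
    scores = {}
    for row in score_rows:
        email = (row.get("email") or "").strip().lower()
        try:
            scores[email] = float(row[score_field])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with missing or non-numeric scores
    pairs = []
    for row in kpi_rows:
        email = (row.get("email") or "").strip().lower()
        kpi = parse_percent(row.get(kpi_field))
        if email in scores and kpi is not None:
            pairs.append((scores[email], kpi))
    return pairs


# Tiny in-memory CSVs with made-up rows, standing in for the real data files.
kpi = list(csv.DictReader(io.StringIO("email,IPM\na@x.com,120%\nb@x.com,\n")))
scores = list(csv.DictReader(io.StringIO("email,problem_score\nA@x.com,7.5\nc@x.com,6\n")))
pairs = match_by_email(kpi, scores, "IPM", "problem_score")
print(pairs)
```

Only `a@x.com` appears in both inputs with a valid IPM value, so it is the only record that survives the matching step.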