---
title: KPI Score Correlation Analysis
emoji: 📊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "5.33.0"
app_file: kpi_correlation_app.py
pinned: false
---

# KPI Score Correlation Analysis

This directory contains tools for analyzing correlations between IPM scores and axiia scores.

## Architecture

The code has been refactored to share common functionality:

- **`correlation_analysis_core.py`**: Core analysis module with shared functions
- **`csv_utils.py`**: Utilities for loading CSV/Excel files
- **`analyze_correlations_v2.py`**: Command-line interface (CLI)
- **`kpi_correlation_app.py`**: Gradio web interface

## Installation

Install the required dependencies:

```bash
pip install pandas numpy scipy matplotlib seaborn gradio pyyaml openpyxl xlrd
```

## Usage

### Command-Line Interface

The CLI tool is best for batch processing and automation:

```bash
# Basic usage
python3 analyze_correlations_v2.py -k kpi_file.csv -s scores_file.csv

# With custom output file
python3 analyze_correlations_v2.py -k kpi_file.csv -s scores_file.csv -o results.yaml

# With plots
python3 analyze_correlations_v2.py -k kpi_file.csv -s scores_file.csv -p

# Full example
python3 analyze_correlations_v2.py \
  -k ../../data/lenovo_kpi.csv \
  -s ../../data/lenovo-scores-0603.csv \
  -o score_corr.yaml \
  -p
```

Options:
- `-k, --kpi`: Path to KPI file (CSV or Excel)
- `-s, --scores`: Path to scores file (CSV)
- `-o, --output`: Output YAML file (default: score_corr.yaml)
- `-p, --plot`: Generate correlation plots
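
The flags above map naturally onto `argparse`. The following is a minimal sketch of such a parser, not the actual code in `analyze_correlations_v2.py`: the function name `build_parser` is hypothetical, and treating `-k` and `-s` as required is an assumption based on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI flags."""
    parser = argparse.ArgumentParser(
        description="Correlate IPM scores with axiia scores."
    )
    # Assumption: both input files are mandatory.
    parser.add_argument("-k", "--kpi", required=True,
                        help="Path to KPI file (CSV or Excel)")
    parser.add_argument("-s", "--scores", required=True,
                        help="Path to scores file (CSV)")
    parser.add_argument("-o", "--output", default="score_corr.yaml",
                        help="Output YAML file")
    parser.add_argument("-p", "--plot", action="store_true",
                        help="Generate correlation plots")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-k", "kpi_file.csv", "-s", "scores_file.csv", "-p"])
print(args.kpi, args.scores, args.output, args.plot)
```

Omitting `-o` falls back to the documented default, `score_corr.yaml`.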

### Gradio Web Interface

The Gradio app provides an interactive UI with parameterized analysis:

```bash
# Basic usage (uses default scores file in same directory)
python3 kpi_correlation_app.py

# With custom scores file
python3 kpi_correlation_app.py --scores-file path/to/scores.csv

# Share publicly
python3 kpi_correlation_app.py --share

# Custom port
python3 kpi_correlation_app.py --port 8080

# Full example
python3 kpi_correlation_app.py \
  --scores-file ../../data/lenovo-scores-0603.csv \
  --port 7860
```

Options:
- `--scores-file`: Path to scores CSV file (default: lenovo-scores-0603.csv)
- `--share`: Create a public link
- `--port`: Port to run on (default: 7860)
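
As a rough illustration of how these flags could reach Gradio, the sketch below parses them and translates them into keyword arguments for `launch()`. The helper names `build_app_parser` and `launch_kwargs` are hypothetical, and the real app may wire this differently. `share` and `server_port` are existing parameters of Gradio's `launch()`.

```python
import argparse


def build_app_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented app flags."""
    parser = argparse.ArgumentParser(
        description="Launch the KPI correlation Gradio app."
    )
    parser.add_argument("--scores-file", default="lenovo-scores-0603.csv",
                        help="Path to scores CSV file")
    parser.add_argument("--share", action="store_true",
                        help="Create a public link")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port to run on")
    return parser


def launch_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into keyword arguments for Gradio's launch()."""
    return {"share": args.share, "server_port": args.port}


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_app_parser().parse_args(["--port", "8080"])
print(args.scores_file, launch_kwargs(args))

# With a Gradio Blocks/Interface object named `demo` (hypothetical), the app
# would then be started with: demo.launch(**launch_kwargs(args))
```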

The Gradio interface provides the following parameterized features:

1. **Data Selection**:
   - Choose between different KPI files (CSV/Excel)
   - Select specific score columns for analysis
   - Filter data by manager status and other criteria

2. **Analysis Parameters**:
   - Correlation method selection (Pearson/Spearman)
   - Confidence level adjustment
   - Sample size requirements
   - Outlier detection thresholds

3. **Visualization Options**:
   - Plot type selection (scatter, regression, etc.)
   - Color scheme customization
   - Figure size and DPI settings
   - Trend line display options

4. **Output Configuration**:
   - Export format selection (YAML/CSV/Excel)
   - Custom output file naming
   - Detailed vs. summary report options
   - Plot export settings

All parameters can be adjusted in real time through the web interface; the analysis results and visualizations update immediately.

## Input File Requirements

### KPI File
- Must contain an email column (case-insensitive)
- Must contain IPM columns for FY23/24 and FY24/25
- Supports CSV and Excel formats

### Scores File
- Must contain columns: `email`, `problem_score`, `ability_score`
- CSV format
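
The header checks implied by these requirements can be sketched with the standard library alone. This is an illustration, not the logic in `csv_utils.py`: the helper names are hypothetical, the IPM header names in the sample are invented, and "case-insensitive" is assumed to mean a header equal to `email` ignoring case.

```python
import csv
import io

REQUIRED_SCORE_COLUMNS = {"email", "problem_score", "ability_score"}


def find_email_column(fieldnames):
    """Return the first header equal to 'email' ignoring case, else None."""
    for name in fieldnames:
        if name.strip().lower() == "email":
            return name
    return None


def missing_score_columns(fieldnames):
    """Return the required scores-file columns absent from the header."""
    return sorted(REQUIRED_SCORE_COLUMNS - set(fieldnames))


# In-memory samples stand in for real files; the header names are illustrative.
kpi_header = csv.DictReader(
    io.StringIO("Name,Email,FY23/24 IPM,FY24/25 IPM\n")
).fieldnames
scores_header = csv.DictReader(io.StringIO("email,problem_score\n")).fieldnames

print(find_email_column(kpi_header))         # Email
print(missing_score_columns(scores_header))  # ['ability_score']
```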

## Output

### CLI Output
- Console output with data quality report and correlation analysis
- YAML file with detailed results
- Optional PNG plots (individual and combined)

### Gradio Output
- Interactive web interface
- Real-time analysis results
- Interactive scatter plots with trend lines
- Data quality statistics

## Correlation Pairs Analyzed

- **AC**: problem_score vs FY23/24 IPM
- **AD**: problem_score vs FY24/25 IPM
- **BC**: ability_score vs FY23/24 IPM
- **BD**: ability_score vs FY24/25 IPM

Each pair shows:
- Pearson correlation coefficient
- Spearman correlation coefficient
- Number of valid samples
- Data quality metrics
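
To make the four pairs concrete, here is a dependency-free sketch that computes both coefficients and the valid sample count per pair. It is not the code in `correlation_analysis_core.py`, which presumably relies on pandas and SciPy. The column keys `ipm_fy2324` and `ipm_fy2425`, the function names, and the sample rows are all hypothetical. Spearman is computed as Pearson on average ranks.

```python
import math

# Pair label -> (score column, IPM column); the IPM keys are hypothetical.
PAIRS = {
    "AC": ("problem_score", "ipm_fy2324"),
    "AD": ("problem_score", "ipm_fy2425"),
    "BC": ("ability_score", "ipm_fy2324"),
    "BD": ("ability_score", "ipm_fy2425"),
}


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def analyze(rows):
    """Compute both coefficients per pair, skipping rows with missing values."""
    results = {}
    for label, (score_col, ipm_col) in PAIRS.items():
        valid = [(r[score_col], r[ipm_col]) for r in rows
                 if r.get(score_col) is not None and r.get(ipm_col) is not None]
        if len(valid) < 2:
            results[label] = {"pearson": None, "spearman": None, "n": len(valid)}
            continue
        xs, ys = zip(*valid)
        results[label] = {"pearson": pearson(xs, ys),
                          "spearman": spearman(xs, ys),
                          "n": len(valid)}
    return results


# Made-up rows for illustration only; None marks a missing value.
rows = [
    {"problem_score": 3.0, "ability_score": 2.0, "ipm_fy2324": 1.0, "ipm_fy2425": 2.0},
    {"problem_score": 4.0, "ability_score": 3.5, "ipm_fy2324": 2.0, "ipm_fy2425": 2.0},
    {"problem_score": None, "ability_score": 4.0, "ipm_fy2324": 3.0, "ipm_fy2425": 3.0},
    {"problem_score": 5.0, "ability_score": 1.0, "ipm_fy2324": 2.0, "ipm_fy2425": 4.0},
    {"problem_score": 2.0, "ability_score": 3.0, "ipm_fy2324": 1.0, "ipm_fy2425": 1.0},
]
results = analyze(rows)
for label, stats in results.items():
    print(label, stats)
```

Rows are filtered per pair rather than globally, so a missing `problem_score` reduces the sample count for AC and AD but not for BC and BD.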

## HF Deployment

Deploy to the Hugging Face Space by pushing this subdirectory to the `hfspace` remote: `git subtree push --prefix=data-analysis/kpi_score_analysis hfspace main`