kpi_analysis / docs /plotting_readme.md

zh3036

Deploy KPI snapshot 2025-06-12

4e67a93 10 months ago

	# Correlation Analysis with Plotting

	## Overview
	The correlation analysis script (`analyze_correlations_v2.py`) can now plot the relationships between axiia scores (`problem_score`, `ability_score`) and KPI ratings (FY23/24, FY24/25).

	## Usage

	### Basic usage with plotting:
	```bash
	python3 analyze_correlations_v2.py -k <kpi_file> -s <scores_file> -p
	```

	### Example:
	```bash
	python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p
	```

	### Command line options:
	- `-k, --kpi`: Path to the KPI CSV file (required)
	- `-s, --scores`: Path to the scores CSV file (required)
	- `-o, --output`: Output YAML file name (default: `score_corr.yaml`)
	- `-p, --plot`: Enable plotting (optional)
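
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch based only on the flags listed here, not the script's actual source; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Correlate axiia scores with KPI ratings."
    )
    # Both input files are required.
    parser.add_argument("-k", "--kpi", required=True, help="Path to the KPI CSV file")
    parser.add_argument("-s", "--scores", required=True, help="Path to the scores CSV file")
    # The output name falls back to the documented default.
    parser.add_argument("-o", "--output", default="score_corr.yaml", help="Output YAML file name")
    # Plotting is off unless -p is given.
    parser.add_argument("-p", "--plot", action="store_true", help="Enable plotting")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-k", "kpi.csv", "-s", "scores.csv", "-p"])
print(args.kpi, args.scores, args.output, args.plot)
```

Using `action="store_true"` means `-p` takes no value, which matches the usage lines above.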

	## Output Files

	When plotting is enabled (-p flag), the script generates:

	1. `correlation_plots.png`: A 2x2 grid showing all four correlations:
	   - AC: problem_score vs FY23/24 Rating
	   - AD: problem_score vs FY24/25 Rating
	   - BC: ability_score vs FY23/24 Rating
	   - BD: ability_score vs FY24/25 Rating

	2. Individual plots:
	   - `correlation_AC.png`
	   - `correlation_AD.png`
	   - `correlation_BC.png`
	   - `correlation_BD.png`
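
The two-letter codes appear to combine one score column (A or B) with one rating column (C or D). The sketch below spells out that naming scheme; the dictionaries and the `plot_targets` helper are illustrative assumptions, not names taken from the script.

```python
# Letter codes as implied by the file names above:
# A/B are the score columns, C/D are the rating columns.
SCORES = {"A": "problem_score", "B": "ability_score"}
RATINGS = {"C": "FY23/24 Rating", "D": "FY24/25 Rating"}


def plot_targets() -> dict[str, tuple[str, str, str]]:
    """Map each pair code to (score column, rating column, output file name)."""
    targets = {}
    for s_code, s_col in SCORES.items():
        for r_code, r_col in RATINGS.items():
            code = s_code + r_code
            targets[code] = (s_col, r_col, f"correlation_{code}.png")
    return targets


for code, (score_col, rating_col, filename) in plot_targets().items():
    print(f"{code}: {score_col} vs {rating_col} -> {filename}")
```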

	## Plot Features

	Each plot includes:
	- Scatter plot of data points
	- Linear trend line (red dashed)
	- Correlation statistics box showing:
	  - Pearson correlation coefficient (r)
	  - Spearman correlation coefficient (ρ)
	  - Number of valid data points (n)
	- Proper axis labels
	- IPM values formatted as percentages
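
To make the statistics box and trend line concrete, here is a dependency-free sketch of the quantities involved. The script itself presumably relies on `scipy` and `numpy` (both are listed under Requirements); the function names and the sample values below are hypothetical and chosen only for illustration.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def trend_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def stats_box(x: list[float], y: list[float]) -> str:
    """Text for the statistics box: r, rho, and n on separate lines."""
    return f"r = {pearson(x, y):.3f}\nρ = {spearman(x, y):.3f}\nn = {len(x)}"


# Hypothetical sample values, not taken from the real data files.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
ratings = [2.0, 4.0, 5.0, 4.0, 5.0]
print(stats_box(scores, ratings))
print(trend_line(scores, ratings))
```

Spearman's ρ depends only on rank order, so it is less sensitive to outliers than Pearson's r, which is one reason to report both.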

	## Test Script

	Use `test_plotting.py` to quickly test the plotting functionality:

	```bash
	python3 test_plotting.py
	```

	This will automatically find the data files and run the analysis with plotting enabled.

	## Requirements

	Make sure you have the required packages installed:
	```bash
	pip install pandas numpy scipy pyyaml matplotlib seaborn
	```
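
Before running the analysis, a quick check like the one below can confirm that the packages are importable. It is a sketch using only the standard library; the `REQUIRED` mapping and `missing_packages` helper are hypothetical names. Note that `pyyaml` is installed under that name but imported as `yaml`.

```python
import importlib.util

# Name passed to pip -> module name used in import statements.
# Only pyyaml differs: it is imported as "yaml".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "pyyaml": "yaml",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```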

	## Notes

	- The script handles missing data by using only records whose email appears in both files and whose values are complete
	- Empty or invalid values are excluded from correlation calculations
	- IPM percentage values (e.g., "120%") are automatically converted to decimals
	- The script provides detailed data quality reports showing match rates and missing data
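
The notes above can be sketched in plain Python as follows. This illustrates the described behavior and is not the script's actual code: the helper names, the report keys, and the sample records are assumptions, and the sketch matches emails exactly because the notes do not say whether they are normalized.

```python
import math


def parse_value(raw):
    """Convert '120%' to 1.2 and '7.5' to 7.5; return None for empty or invalid input."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    try:
        value = float(text[:-1]) / 100.0 if text.endswith("%") else float(text)
    except ValueError:
        return None
    # Treat NaN as missing so it never reaches the correlation step.
    return None if math.isnan(value) else value


def match_records(kpi: dict[str, str], scores: dict[str, str]):
    """Pair values by email, keeping only rows that are complete on both sides."""
    shared = sorted(set(kpi) & set(scores))
    pairs = []
    for email in shared:
        rating = parse_value(kpi[email])
        score = parse_value(scores[email])
        if rating is not None and score is not None:
            pairs.append((email, score, rating))
    # A small data quality report: how many rows matched and survived.
    report = {
        "kpi_rows": len(kpi),
        "score_rows": len(scores),
        "matched_emails": len(shared),
        "complete_pairs": len(pairs),
    }
    return pairs, report


# Hypothetical records keyed by email.
kpi = {"a@example.com": "120%", "b@example.com": "", "c@example.com": "95%"}
scores = {"a@example.com": "7.5", "b@example.com": "6.0", "d@example.com": "8.0"}
pairs, report = match_records(kpi, scores)
print(pairs)
print(report)
```

In this example only `a@example.com` survives: `b@example.com` has an empty rating, and the other two emails appear in just one file.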