Correlation Analysis with Plotting

Overview

The correlation analysis script (analyze_correlations_v2.py) can now plot the relationships between the axiia scores (problem_score, ability_score) and the KPI ratings (FY23/24, FY24/25).

Usage

Basic usage with plotting:

python3 analyze_correlations_v2.py -k <kpi_file> -s <scores_file> -p

Example:

python3 analyze_correlations_v2.py -k ../../data/lenovo_kpi.csv -s ../../data/lenovo-scores-0603.csv -o score_corr.yaml -p

Command line options:

  • -k, --kpi: Path to the KPI CSV file (required)
  • -s, --scores: Path to the scores CSV file (required)
  • -o, --output: Output YAML file name (default: score_corr.yaml)
  • -p, --plot: Enable plotting (optional)
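
For reference, the options listed above map naturally onto an argparse parser. The sketch below is an assumption about how such a parser could be declared, not the script's actual code; `build_parser` is a hypothetical name and the help strings are copied from the list.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Correlate axiia scores with KPI ratings."
    )
    parser.add_argument("-k", "--kpi", required=True,
                        help="Path to the KPI CSV file")
    parser.add_argument("-s", "--scores", required=True,
                        help="Path to the scores CSV file")
    parser.add_argument("-o", "--output", default="score_corr.yaml",
                        help="Output YAML file name")
    # store_true makes -p a plain switch: absent means False, present means True.
    parser.add_argument("-p", "--plot", action="store_true",
                        help="Enable plotting")
    return parser


# Usage: pass an explicit argument list (or omit it to read sys.argv).
# args = build_parser().parse_args(["-k", "kpi.csv", "-s", "scores.csv", "-p"])
```

With this layout, leaving out -k or -s makes argparse exit with a usage error, and omitting -o falls back to score_corr.yaml.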

Output Files

When plotting is enabled (-p flag), the script generates:

  1. correlation_plots.png: A 2x2 grid showing all four correlations:

    • AC: problem_score vs FY23/24 Rating
    • AD: problem_score vs FY24/25 Rating
    • BC: ability_score vs FY23/24 Rating
    • BD: ability_score vs FY24/25 Rating
  2. Individual plots:

    • correlation_AC.png
    • correlation_AD.png
    • correlation_BC.png
    • correlation_BD.png

Plot Features

Each plot includes:

  • Scatter plot of data points
  • Linear trend line (red dashed)
  • Correlation statistics box showing:
    • Pearson correlation coefficient (r)
    • Spearman correlation coefficient (ρ)
    • Number of valid data points (n)
  • Descriptive axis labels
  • IPM values formatted as percentages
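
The features listed above can be reproduced with matplotlib, numpy and scipy, all of which appear under Requirements. The following is a minimal sketch under stated assumptions, not the script's own plotting code: the function name `plot_correlation`, its parameters, the figure size and the choice of the y axis for percentage formatting are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter
from scipy import stats


def plot_correlation(x, y, x_label, y_label, out_path, y_is_percent=False):
    """Draw one scatter plot with a trend line and a statistics box.

    Hypothetical helper: name and parameters are assumptions for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    r, _ = stats.pearsonr(x, y)      # Pearson correlation coefficient
    rho, _ = stats.spearmanr(x, y)   # Spearman rank correlation coefficient

    fig, ax = plt.subplots(figsize=(6, 5))
    ax.scatter(x, y, alpha=0.7)

    # Least-squares line, drawn red and dashed.
    slope, intercept = np.polyfit(x, y, 1)
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, "r--")

    # Statistics box in the top-left corner, positioned in axes coordinates.
    ax.text(0.05, 0.95, f"r = {r:.3f}\nρ = {rho:.3f}\nn = {len(x)}",
            transform=ax.transAxes, va="top",
            bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))

    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    if y_is_percent:
        # Values stored as decimals (1.2) are displayed as percentages (120%).
        ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))

    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return r, rho, len(x)


# Usage with made-up numbers:
# plot_correlation([60, 72, 81], [0.9, 1.1, 1.2],
#                  "problem_score", "FY23/24 Rating", "demo.png", y_is_percent=True)
```

Returning r, ρ and n alongside saving the figure keeps the numbers in the box and any numbers written elsewhere computed from the same arrays.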

Test Script

Use test_plotting.py to quickly test the plotting functionality:

python3 test_plotting.py

The test script locates the data files automatically and runs the analysis with plotting enabled.

Requirements

Make sure you have the required packages installed:

pip install pandas numpy scipy pyyaml matplotlib seaborn

Notes

  • The script handles missing data by using only records whose email appears in both files and whose values are complete
  • Empty or invalid values are excluded from correlation calculations
  • IPM percentage values (e.g., "120%") are automatically converted to decimals (e.g., 1.2)
  • The script provides detailed data quality reports showing match rates and missing data
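
To make the data handling in these notes concrete, here is a rough standard-library sketch of percentage parsing, email matching and a small quality summary. It is an illustration under assumptions, not the script's implementation: `parse_value`, `match_records`, the dictionary inputs and the report keys are all hypothetical, and the real script may normalize emails or report different figures.

```python
import math


def parse_value(raw):
    """Return a float, or None for empty or invalid input (hypothetical helper).

    A trailing percent sign is converted to a decimal, so "120%" becomes 1.2.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    if not math.isfinite(value):  # reject "nan" and "inf"
        return None
    return value / 100 if is_percent else value


def match_records(scores_by_email, ratings_by_email):
    """Keep only emails present in both inputs with two valid values."""
    shared = sorted(scores_by_email.keys() & ratings_by_email.keys())
    pairs = []
    for email in shared:
        score = parse_value(scores_by_email[email])
        rating = parse_value(ratings_by_email[email])
        if score is not None and rating is not None:
            pairs.append((email, score, rating))
    # Minimal data quality summary: how many rows matched and survived.
    report = {
        "scores_rows": len(scores_by_email),
        "kpi_rows": len(ratings_by_email),
        "matched_emails": len(shared),
        "complete_pairs": len(pairs),
    }
    return pairs, report
```

For example, with scores `{"a@example.com": "80", "b@example.com": ""}` and ratings `{"a@example.com": "120%", "b@example.com": "100%"}`, both emails match but only the first record is kept, because the second has an empty score.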