Spaces: Gmrock/engineer-impact


Gmrock committed on May 29

Commit fef9b59 (verified) · 1 Parent(s): 3dfd68e

Upload 5 files


Files changed (5)

README.md +245 -9
app.py +338 -0
fetch_data.py +178 -0
posthog_impact_data.csv +107 -0
requirements.txt +4 -0

README.md CHANGED

@@ -1,12 +1,248 @@
 ---
-title: Engineer Impact
-emoji: 👁
-colorFrom: blue
-colorTo: green
-sdk: docker
-pinned: false
-license: afl-3.0
-short_description: Most impactful engineer in public repo
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🏛️ Engineering Impact Dashboard
+A hybrid quantitative–qualitative engineering leadership engine that moves beyond naive developer analytics (such as counting commits or lines of code) and instead measures **engineering leverage, intent, and team citizenship**.
+This framework scales telemetry relative to active team baselines, filters out low-signal automated activity, rewards high-leverage structural work, and incorporates qualitative leadership impact.
+---
+# 🧭 Core Philosophy & Pillars
+Traditional engineering trackers are often easy to game and can alienate developers. This project evaluates engineering value across four strategic pillars:
+## 📦 Execution Baseline
+Measures operational scope, complex feature delivery, and high-priority issue resolution.
+The engine scans pull request metadata for:
+* Critical indicators
+* Bug labels and fix signals
+* Architectural modifications
+* Scope and delivery patterns
+---
+## 💬 Collaboration & Mentorship
+Quantifies engineering leverage and team citizenship.
+The framework analyzes code review behavior using a **Substantive Word Filter (>15 words)** to isolate meaningful engineering feedback from low-signal approvals such as:
+* "LGTM"
+* "Looks good"
+* Rubber-stamp reviews
+This helps surface engineers contributing thoughtful mentorship and review depth.
+---
+## 🛑 System Quality
+Tracks production stability and defensive engineering practices.
+The system introduces a structural accountability layer by applying deduction penalties for:
+* Triggered Git reverts
+* Avoidable regressions
+* Stability-related disruptions
+---
+## 🤝 Human Touch
+A qualitative layer completed by engineering managers to capture high-value leadership signals that repositories cannot measure directly, including:
+* Architectural planning
+* Team leadership
+* Mentorship
+* Incident responsiveness
+* Availability during unscripted operational escalations
+---
+# 📐 How the Scoring Engine Works
+The scoring model avoids rigid quotas by using **Peer Cohort Normalization**.
+Instead of evaluating engineers against fixed thresholds, raw metrics are scaled relative to the strongest contributor (**Peak**) inside a rolling **90-day window**.
+This ensures performance expectations adapt naturally to:
+* Team velocity
+* Product lifecycle stage
+* Organizational priorities
+### Pillar Component Ratio
+```math
+\text{Pillar Component Ratio} = \frac{\text{Individual Raw Value}}{\text{Cohort Max Ceiling (90-day Peak)}}
+```
+### Impact Score Formula
+An engineer's final score is the weighted sum of the four pillar strengths. Each strength lies between 0 and 1 and the strategy weights are normalized to sum to 1, so the score cannot exceed **100 points**.
+```math
+\text{Impact Score} = \sum \left( \text{Normalized Pillar Strength} \times \text{Strategy Weight} \right) \times 100
+```
+---
+# 🛠️ System Architecture
+The ecosystem consists of a lightweight two-tier telemetry pipeline:
+```text
+        [ GitHub API Engine ]
+                 │
+                 ▼
+ (Extracts Raw Telemetry & Text Filters)
+       ┌──────────────────────────┐
+       │      fetch_data.py       │
+       └──────────────────────────┘
+                 │
+                 ▼
+      (Persists Metrics Matrix)
+       ┌──────────────────────────┐
+       │ posthog_impact_data.csv  │
+       └──────────────────────────┘
+                 │
+                 ▼
+(Dynamic Weights & Normalization Engine)
+       ┌──────────────────────────┐
+       │   app.py (Streamlit UI)  │
+       └──────────────────────────┘
+```
+## `fetch_data.py`
+The ingestion pipeline.
+Responsibilities include:
+* Connecting to repository APIs
+* Parsing pull request labels
+* Tracking merge timelines
+* Measuring code review comment depth
+* Detecting revert activity
+* Persisting telemetry into:
+```text
+posthog_impact_data.csv
+```
+## `app.py`
+The interactive leadership dashboard built using Streamlit.
+Responsibilities include:
+* Reading telemetry matrices
+* Applying normalization logic
+* Dynamically adjusting strategy weights
+* Re-scoring engineers in real time based on business priorities
+---
+# 🚀 Quick Start & Installation
+## 1. Clone the Repository
+```bash
+git clone https://github.com/gmrock/engineer-impact.git
+cd engineer-impact
+```
+## 2. Install Dependencies
+Ensure you have **Python 3.9+** installed.
+Then install the required packages:
+```bash
+pip install -r requirements.txt
+```
+## 3. Generate Telemetry Cache
+Run the ingestion pipeline:
+```bash
+python fetch_data.py
+```
+This step populates:
+```text
+posthog_impact_data.csv
+```
+with the underlying telemetry baseline variables.
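+Set a `GITHUB_TOKEN` environment variable (or add it to a `.env` file) before running the script. Without it, `fetch_data.py` falls back to unauthenticated GitHub API requests, which are heavily rate-limited. The target repository is set by the `REPO` constant in `fetch_data.py`.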
+## 4. Launch the Dashboard
+Start the Streamlit application locally:
+```bash
+streamlit run app.py
+```
+---
+# ⚙️ Strategic Priority Alignment in Practice
+Instead of enforcing a rigid definition of engineering impact, the dashboard gives leadership dynamic control through adjustable strategy weights.
+## 🚀 Feature Shipping Sprint
+Increase **Execution Weight** (`0.50+`) to prioritize:
+* Feature throughput
+* Fast iteration cycles
+* Delivery velocity
+---
+## 🛡️ System Stability Freeze
+Increase **System Quality Weight** (`0.40+`) when reliability becomes the top priority.
+This shifts rewards toward engineers who:
+* Stabilize production systems
+* Reduce regressions
+* Prevent reverts
+* Slow feature development to improve reliability
 ---
+## 👥 Mentorship & Onboarding Focus
+Increase **Collaboration Weight** to recognize engineers investing time in:
+* Detailed code reviews
+* Technical mentoring
+* Structural engineering guidance
+* Onboarding support
 ---
+# 🎯 Why This Exists
+Most engineering metrics systems optimize for **activity**.
+This framework optimizes for **impact**.
+Rather than rewarding sheer output volume, it attempts to surface engineers who:
+* Create leverage
+* Improve system reliability
+* Mentor teammates
+* Make thoughtful architectural contributions
+* Increase overall engineering effectiveness
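
Reviewer sketch: the two README formulas above (Pillar Component Ratio and Impact Score) can be illustrated in a few lines of Python. This is a minimal example, not the code in `app.py`; the function names `cohort_ratio` and `impact_score` and all sample numbers are hypothetical, and it assumes the strategy weights are rescaled to sum to 1.

```python
from typing import Dict


def cohort_ratio(raw_value: float, cohort_peak: float) -> float:
    """Scale a raw metric against the cohort's 90-day peak (0.0 to 1.0)."""
    # Guard against an all-zero cohort so the ratio stays defined.
    if cohort_peak <= 0:
        return 0.0
    return raw_value / cohort_peak


def impact_score(pillars: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of pillar strengths, scaled to a 0-100 range."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one strategy weight must be positive")
    # Rescale the weights to sum to 1 so the score cannot exceed 100.
    return sum(
        pillars[name] * (weight / total_weight) * 100
        for name, weight in weights.items()
    )


# Hypothetical engineer: half the cohort peak on execution, at the peak elsewhere.
pillars = {
    "execution": cohort_ratio(10, 20),
    "collaboration": cohort_ratio(8, 8),
    "quality": 1.0,
    "human": 1.0,
}
weights = {"execution": 0.5, "collaboration": 0.25, "quality": 0.125, "human": 0.125}
score = impact_score(pillars, weights)
print(score)
```

With these made-up inputs, halving the execution ratio costs exactly half of the execution weight's share of the 100 points, which is the behaviour the README describes for adjustable strategy weights.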

app.py ADDED

	@@ -0,0 +1,338 @@

+import streamlit as st
+import pandas as pd
+import numpy as np
+from datetime import datetime, timedelta
+# Set page layout to wide for dashboard tracking
+st.set_page_config(layout="wide", page_title="PostHog Engineering Impact Dashboard")
+# -------------------------------------------------------------
+# 🎯 INJECTED CSS: HIDES STREAMLIT ROW-SELECTION BUTTONS COLUMN
+# -------------------------------------------------------------
+st.html("""
+    <style>
+        /* Target and completely hide the data grid's row-selection column wrapper */
+        div[data-testid="stDataFrame"] [class*="gdg-row-header"],
+        div[data-testid="stDataFrame"] .glide-data-grid-row-header-container,
+        div[data-testid="stDataFrame"] th[class*="row-header"] {
+            display: none !important;
+            width: 0px !important;
+        }
+    </style>
+""")
+# Load the data generated by fetch_data.py
+try:
+    df = pd.read_csv("posthog_impact_data.csv")
+except FileNotFoundError:
+    st.error("❌ Data file 'posthog_impact_data.csv' not found. Please run 'python fetch_data.py' first to collect telemetry.")
+    st.stop()
+# -------------------------------------------------------------
+# DISPLAY WINDOW (computed from the current time, not read from the CSV)
+# -------------------------------------------------------------
+end_date = datetime.now()
+start_date = end_date - timedelta(days=90)
+date_string = f"🗓️ Duration: {start_date.strftime('%b %d, %Y')} – {end_date.strftime('%b %d, %Y')} (Past 90 Days)"
+# -------------------------------------------------------------
+# SIDEBAR: CORE PILLARS PHILOSOPHY & CONTROLS
+# -------------------------------------------------------------
+st.sidebar.title("🏛️ Impact Framework Definitions")
+st.sidebar.markdown("""
+**📦 1. Execution:**
+Measures operational scope and the handling of complex features. Blends bug-fix labels with title and label matches for architectural, library, infrastructure, core, critical, P0, and P1 work.
+***
+**💬 2. Collaboration:**
+Quantifies engineering leverage and team citizenship. Blends *Review Actions* with a *Rubber-Stamp Filter* (>15 words) to isolate meaningful mentorship.
+***
+**🛑 3. System Quality:**
+Tracks production stability and defensive coding. Evaluates long-term stability by applying a deduction penalty for triggered *Git Reverts*.
+***
+**🤝 4. Human Touch:**
+Captures critical qualitative value provided through direct team leadership, presence during incident escalation and triage, and guidance in design and planning syncs.
+""")
+st.sidebar.markdown("---")
+st.sidebar.header("⚖️ Strategic Priority Weights")
+st.sidebar.markdown("Adjust macro priorities based on organizational needs:")
+# Default weights: 0.35, 0.35, 0.20, 0.10
+exec_w = st.sidebar.slider("Execution Weight", 0.0, 1.0, 0.35, 0.05)
+collab_w = st.sidebar.slider("Collaboration Weight", 0.0, 1.0, 0.35, 0.05)
+quality_w = st.sidebar.slider("System Quality Weight", 0.0, 1.0, 0.20, 0.05)
+human_w = st.sidebar.slider("Human Touch Weight", 0.0, 1.0, 0.10, 0.05)
+# Defensive Zero-Weight Divide-by-Zero Guard
+total_weight = exec_w + collab_w + quality_w + human_w
+if np.isclose(total_weight, 0.0):
+    exec_w_norm = 0.25
+    collab_w_norm = 0.25
+    quality_w_norm = 0.25
+    human_w_norm = 0.25
+    st.sidebar.info("ℹ️ All weights set to 0. Defaulting to an equal split (25% each) to prevent math errors.")
+else:
+    exec_w_norm = exec_w / total_weight
+    collab_w_norm = collab_w / total_weight
+    quality_w_norm = quality_w / total_weight
+    human_w_norm = human_w / total_weight
+# -------------------------------------------------------------
+# CORE METRICS ENGINE: Peer Cohort Normalization
+# -------------------------------------------------------------
+max_prs = df['prs_merged'].max() if df['prs_merged'].max() > 0 else 1
+max_bugs = df['bug_fixes'].max() if df['bug_fixes'].max() > 0 else 1
+max_mult = df['multiplier_impact'].max() if df['multiplier_impact'].max() > 0 else 1
+max_actions = df['review_actions'].max() if df['review_actions'].max() > 0 else 1
+max_words = df['review_words_written'].max() if df['review_words_written'].max() > 0 else 1
+max_reverts = df['reverts_triggered'].max() if df['reverts_triggered'].max() > 0 else 1
+# Synthesize normalized values (0.0 to 1.0)
+df['norm_prs'] = df['prs_merged'] / max_prs
+df['norm_bugs'] = df['bug_fixes'] / max_bugs
+df['norm_mult'] = df['multiplier_impact'] / max_mult
+df['norm_actions'] = df['review_actions'] / max_actions
+df['norm_words'] = df['review_words_written'] / max_words
+df['norm_reverts'] = df['reverts_triggered'] / max_reverts
+# Human Touch placeholder: a constant mock rating applied to every engineer
+df['human_touch_baseline'] = 0.85
+# Calculate Internal Pillar Strengths
+df['Execution_Pillar'] = (df['norm_prs'] * 0.4) + (df['norm_bugs'] * 0.3) + (df['norm_mult'] * 0.3)
+df['Collaboration_Pillar'] = (df['norm_actions'] * 0.5) + (df['norm_words'] * 0.5)
+df['Quality_Pillar'] = 1.0 - df['norm_reverts']
+df['Human_Pillar'] = df['human_touch_baseline']
+# Calculate final component contribution points
+df['Exec_Contribution'] = df['Execution_Pillar'] * exec_w_norm * 100
+df['Collab_Contribution'] = df['Collaboration_Pillar'] * collab_w_norm * 100
+df['Quality_Contribution'] = df['Quality_Pillar'] * quality_w_norm * 100
+df['Human_Contribution'] = df['Human_Pillar'] * human_w_norm * 100
+# Calculate Final Aggregated Impact Score
+df['Impact_Score'] = df['Exec_Contribution'] + df['Collab_Contribution'] + df['Quality_Contribution'] + df['Human_Contribution']
+# Sort dataset by absolute overall impact
+df = df.sort_values(by="Impact_Score", ascending=False).reset_index(drop=True)
+# -------------------------------------------------------------
+# MAIN DISPLAY: LEADERBOARD MATRIX WITH DIRECT ROW SELECTION
+# -------------------------------------------------------------
+st.title("🏛️ PostHog Engineering Impact Leaderboard")
+st.markdown(f"**{date_string}**")
+st.caption("💡 Click the checkbox on an engineer's row below to instantly update their deep-dive profile.")
+# Dynamic row count limiter dropdown
+view_option = st.selectbox(
+    "Set Leaderboard Depth Range:",
+    options=["Top 5", "Top 10", "Top 20", "Top 30", "View All Teams"],
+    index=0
+)
+# Map each dropdown label to a row limit; any other option shows every engineer
+depth_limits = {"Top 5": 5, "Top 10": 10, "Top 20": 20, "Top 30": 30}
+limit = depth_limits.get(view_option, len(df))
+# Prepare clean dataframe containing active slice data
+leaderboard_slice = df.head(limit).copy()
+# Dynamically calculate the maximum points possible per column based on weights
+max_exec_possible = exec_w_norm * 100
+max_collab_possible = collab_w_norm * 100
+max_quality_possible = quality_w_norm * 100
+max_human_possible = human_w_norm * 100
+# Construct display dataframe with explicit Max Point indicators in headers
+display_df = pd.DataFrame({
+    'Engineer Username': leaderboard_slice['engineer'],
+    '🏅 Total Impact Score (out of 100)': leaderboard_slice['Impact_Score'].round(1),
+    f'📦 Execution (Max {max_exec_possible:.1f} pts)': leaderboard_slice['Exec_Contribution'].round(1),
+    f'💬 Collaboration (Max {max_collab_possible:.1f} pts)': leaderboard_slice['Collab_Contribution'].round(1),
+    f'🛑 System Quality (Max {max_quality_possible:.1f} pts)': leaderboard_slice['Quality_Contribution'].round(1),
+    f'🤝 Human Touch (Max {max_human_possible:.1f} pts)': leaderboard_slice['Human_Contribution'].round(1)
+})
+# Dynamically calculate optimal table height to eliminate empty rows
+row_height = 35
+header_height = 40
+calculated_height = min(header_height + (len(display_df) * row_height), 450)
+# Render interactive table with selection tracking active
+selection = st.dataframe(
+    display_df.style.format({
+        '🏅 Total Impact Score (out of 100)': '{:.1f}',
+        f'📦 Execution (Max {max_exec_possible:.1f} pts)': '{:.1f}',
+        f'💬 Collaboration (Max {max_collab_possible:.1f} pts)': '{:.1f}',
+        f'🛑 System Quality (Max {max_quality_possible:.1f} pts)': '{:.1f}',
+        f'🤝 Human Touch (Max {max_human_possible:.1f} pts)': '{:.1f}'
+    }),
+    use_container_width=True,
+    height=calculated_height,
+    hide_index=True,
+    on_select="rerun",
+    selection_mode="single-row-required"
+)
+# -------------------------------------------------------------
+# MASTER-DETAIL VIEW: DYNAMIC METRICS AUDITOR
+# -------------------------------------------------------------
+st.markdown("<br>", unsafe_allow_html=True)
+st.markdown("---")
+# Read the selected row index from the dataframe's selection state
+if selection and selection.get("selection", {}).get("rows"):
+    selected_row_idx = selection["selection"]["rows"][0]
+    eng_row = leaderboard_slice.iloc[selected_row_idx]
+else:
+    # Safely fall back to the absolute top engineer on landing
+    eng_row = df.iloc[0]
+selected_eng = eng_row['engineer']
+# Show how the four pillar contributions add up to the total score
+st.info(
+    f"📊 **Formula Proof for {selected_eng}:** "
+    f"📦 Execution (`{eng_row['Exec_Contribution']:.1f}`) + "
+    f"💬 Collaboration (`{eng_row['Collab_Contribution']:.1f}`) + "
+    f"🛑 Quality (`{eng_row['Quality_Contribution']:.1f}`) + "
+    f"🤝 Human Touch (`{eng_row['Human_Contribution']:.1f}`) = "
+    f"**🏅 Total Impact Score of {eng_row['Impact_Score']:.1f} / 100**"
+)
+st.subheader(f"🔍 Deep-Dive Calculation Audit Engine: {selected_eng}")
+col1, col2 = st.columns([1, 2], gap="large")
+with col1:
+    st.metric("Overall Performance Rating", f"{eng_row['Impact_Score']:.1f} / 100")
+    st.markdown(f"""
+    **Active Weight Allocation Matrix:**
+    * 📦 **Execution Contribution:** `{eng_row['Exec_Contribution']:.1f}` pts
+    * 💬 **Collaboration Contribution:** `{eng_row['Collab_Contribution']:.1f}` pts
+    * 🛑 **System Quality Contribution:** `{eng_row['Quality_Contribution']:.1f}` pts
+    * 🤝 **Human Touch Contribution:** `{eng_row['Human_Contribution']:.1f}` pts
+    """)
+with col2:
+    st.markdown("#### **Line-Item Pillar Math Breakdowns**")
+    # -------------------------------------------------------------
+    # PILLAR 1: EXECUTION DEEP DIVE
+    # -------------------------------------------------------------
+    with st.expander(f"📦 Execution Pillar Breakdown: {eng_row['Exec_Contribution']:.1f} pts", expanded=False):
+        st.markdown("**1. Cohort Normalization (Raw vs Peak Team Ceiling):**")
+        st.markdown(f"- Merged PRs: `{int(eng_row['prs_merged'])}` / `{int(max_prs)}` Max = **{eng_row['norm_prs']:.3f}** ratio")
+        st.markdown(f"- Bug Fixes: `{int(eng_row['bug_fixes'])}` / `{int(max_bugs)}` Max = **{eng_row['norm_bugs']:.3f}** ratio")
+        st.markdown(f"- **Impact Multipliers:** `{int(eng_row['multiplier_impact'])}` / `{int(max_mult)}` Max = **{eng_row['norm_mult']:.3f}** ratio")
+        st.markdown("""
+            > 💡 **What is an Impact Multiplier?** \n
+            > This tracks high-leverage architectural code contributions. It scans text logs, labels, and files across your pull requests for engineering foundations that multiply the velocity of other teams:
+            > * 🛠️ **Infrastructure & Shared Libraries** (`lib`, `infra`, `framework`)
+            > * ⚡ **Core System Optimization** (`core`, `performance`, `latency`)
+            > * 🔒 **Security & High-Criticality Guards** (`critical`, `P0`, `P1`, `security`, `auth`)
+        """)
+        st.markdown("**2. Composite Subsystem Weight Assembly Formula:**")
+        st.code(f"""
+Execution Baseline Score = (Norm_PRs * 0.4) + (Norm_Bugs * 0.3) + (Norm_Multipliers * 0.3)
+                         = ({eng_row['norm_prs']:.3f} * 0.4) + ({eng_row['norm_bugs']:.3f} * 0.3) + ({eng_row['norm_mult']:.3f} * 0.3)
+                         = {eng_row['Execution_Pillar']:.3f}
+        """, language="text")
+        st.markdown("**3. Priority Control Scaling:**")
+        st.code(f"""
+Final Points = Baseline Score * Strategy Weight * 100
+             = {eng_row['Execution_Pillar']:.3f} * {exec_w_norm:.2f} * 100
+             = {eng_row['Exec_Contribution']:.1f} pts
+        """, language="text")
+    # -------------------------------------------------------------
+    # PILLAR 2: COLLABORATION DEEP DIVE
+    # -------------------------------------------------------------
+    with st.expander(f"💬 Collaboration Pillar Breakdown: {eng_row['Collab_Contribution']:.1f} pts", expanded=False):
+        st.markdown("**1. Cohort Normalization (Raw vs Peak Team Ceiling):**")
+        st.markdown(f"- Review Actions Count: `{int(eng_row['review_actions'])}` / `{int(max_actions)}` Max = **{eng_row['norm_actions']:.3f}** ratio")
+        st.markdown(f"- Substantive Mentorship Words (>15w): `{int(eng_row['review_words_written'])}` / `{int(max_words)}` Max = **{eng_row['norm_words']:.3f}** ratio")
+        st.markdown("**2. Composite Subsystem Weight Assembly Formula:**")
+        st.code(f"""
+Collaboration Baseline Score = (Norm_Actions * 0.5) + (Norm_Words * 0.5)
+                             = ({eng_row['norm_actions']:.3f} * 0.5) + ({eng_row['norm_words']:.3f} * 0.5)
+                             = {eng_row['Collaboration_Pillar']:.3f}
+        """, language="text")
+        st.markdown("**3. Priority Control Scaling:**")
+        st.code(f"""
+Final Points = Baseline Score * Strategy Weight * 100
+             = {eng_row['Collaboration_Pillar']:.3f} * {collab_w_norm:.2f} * 100
+             = {eng_row['Collab_Contribution']:.1f} pts
+        """, language="text")
+    # -------------------------------------------------------------
+    # PILLAR 3: SYSTEM QUALITY DEEP DIVE
+    # -------------------------------------------------------------
+    with st.expander(f"🛑 System Quality Pillar Breakdown: {eng_row['Quality_Contribution']:.1f} pts", expanded=False):
+        st.markdown("**1. Cohort Normalization (Raw vs Peak Team Ceiling):**")
+        st.markdown(f"- Git Reverts Triggered: `{int(eng_row['reverts_triggered'])}` / `{int(max_reverts)}` Max = **{eng_row['norm_reverts']:.3f}** ratio")
+        st.markdown("**2. Composite Subsystem Weight Assembly Formula:**")
+        st.code(f"""
+Quality Baseline Score = 1.0 - Norm_Reverts
+                       = 1.0 - {eng_row['norm_reverts']:.3f}
+                       = {eng_row['Quality_Pillar']:.3f}
+        """, language="text")
+        st.markdown("**3. Priority Control Scaling:**")
+        st.code(f"""
+Final Points = Baseline Score * Strategy Weight * 100
+             = {eng_row['Quality_Pillar']:.3f} * {quality_w_norm:.2f} * 100
+             = {eng_row['Quality_Contribution']:.1f} pts
+        """, language="text")
+    # -------------------------------------------------------------
+    # PILLAR 4: HUMAN TOUCH DEEP DIVE
+    # -------------------------------------------------------------
+    with st.expander(f"🤝 Human Touch Pillar Breakdown: {eng_row['Human_Contribution']:.1f} pts", expanded=False):
+        st.markdown("**1. Qualitative Evaluation Criteria Score (Manager Inputs Matrix):**")
+        st.markdown(f"- Current Assigned Sync/Escalation Presence Rating = **{eng_row['human_touch_baseline']:.2f}** / 1.0")
+        st.markdown("""
+            > 💡 **What factors calculate the Human Touch Rating?** \n
+            > This value tracks critical behaviors that telemetry cannot isolate from GitHub APIs alone:
+            > * 🧠 **Planning & Brainstorming** (Active, clarifying architectural contributions during syncs)
+            > * 🚨 **Incident Escalation Response** (Availability and speed in jumping on critical production issues)
+        """)
+        st.markdown("**2. Composite Assembly Score Formula:**")
+        st.code(f"""
+Human Touch Baseline Score = Manager Evaluation Score
+                           = {eng_row['human_touch_baseline']:.2f}
+        """, language="text")
+        st.markdown("**3. Priority Control Scaling:**")
+        st.code(f"""
+Final Points = Baseline Score * Strategy Weight * 100
+             = {eng_row['Human_Pillar']:.2f} * {human_w_norm:.2f} * 100
+             = {eng_row['Human_Contribution']:.1f} pts
+        """, language="text")
+# -------------------------------------------------------------
+# UNDER THE HOOD RAW TELEMETRY (COLLAPSED BY DEFAULT)
+# -------------------------------------------------------------
+st.markdown("<br>", unsafe_allow_html=True)
+with st.expander("📊 View Underlying Raw GitHub Telemetry Metrics"):
+    st.markdown("This section details the raw activity counts gathered before weights or normalization filters were applied.")
+    st.dataframe(
+        df[['engineer', 'prs_merged', 'bug_fixes', 'multiplier_impact', 'review_actions', 'review_words_written', 'reverts_triggered']],
+        use_container_width=True,
+        hide_index=True
+    )
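
Reviewer sketch: the scoring logic in `app.py` runs at module level inside the Streamlit script, which makes it awkward to unit test. One possible refactor, shown here with hypothetical function names (`normalize_weights`, `execution_pillar`), pulls the weight guard and the Execution formula into pure functions. The 0.4 / 0.3 / 0.3 blend, the default slider weights, and the equal-split fallback are taken from the diff above; everything else is an assumption.

```python
import math
from typing import Tuple


def normalize_weights(exec_w: float, collab_w: float,
                      quality_w: float, human_w: float) -> Tuple[float, float, float, float]:
    """Rescale the four slider weights so they sum to 1."""
    total = exec_w + collab_w + quality_w + human_w
    if math.isclose(total, 0.0, abs_tol=1e-9):
        # Same fallback as the dashboard: equal split when every slider is 0.
        return (0.25, 0.25, 0.25, 0.25)
    return (exec_w / total, collab_w / total, quality_w / total, human_w / total)


def execution_pillar(prs: int, bugs: int, mult: int,
                     max_prs: int, max_bugs: int, max_mult: int) -> float:
    """Execution strength in [0, 1] using the 0.4 / 0.3 / 0.3 blend."""
    # A zero peak is replaced by 1 to avoid dividing by zero.
    norm_prs = prs / (max_prs or 1)
    norm_bugs = bugs / (max_bugs or 1)
    norm_mult = mult / (max_mult or 1)
    return norm_prs * 0.4 + norm_bugs * 0.3 + norm_mult * 0.3


# Values for pauldambra from posthog_impact_data.csv in this commit:
# 51 merged PRs (cohort peak 51), 0 bug fixes (peak 0), 1 multiplier (peak 2).
weights = normalize_weights(0.35, 0.35, 0.20, 0.10)
pillar = execution_pillar(51, 0, 1, max_prs=51, max_bugs=0, max_mult=2)
exec_points = pillar * weights[0] * 100
print(round(pillar, 3), exec_points)
```

Keeping these functions free of Streamlit calls means they can be checked against a known row of the CSV before the dashboard is launched.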

fetch_data.py ADDED

	@@ -0,0 +1,178 @@

+import os
+import requests
+import pandas as pd
+from datetime import datetime, timedelta
+from dotenv import load_dotenv
+# Load variables from .env file
+load_dotenv()
+# Configuration
+GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
+REPO = "PostHog/posthog"
+HEADERS = {
+    "Accept": "application/vnd.github+json",
+    "X-GitHub-Api-Version": "2022-11-28"
+}
+if GITHUB_TOKEN:
+    # Clean up any accidental leading/trailing quotes or whitespace from terminal exports
+    token_clean = GITHUB_TOKEN.strip().strip('\"').strip("'")
+    HEADERS["Authorization"] = f"Bearer {token_clean}"
+else:
+    print("⚠️ WARNING: GITHUB_TOKEN environment variable not found.")
+    print("Using unauthenticated requests, which GitHub limits to 60 per hour per IP address.")
+engineers = {}
+def get_or_init(user):
+    if not user or user.endswith("[bot]"):
+        return None
+    if user not in engineers:
+        engineers[user] = {
+            "prs_merged": 0,
+            "bug_fixes": 0,
+            "reverts_triggered": 0,
+            "review_actions": 0,
+            "review_words_written": 0,
+            "multiplier_impact": 0
+        }
+    return engineers[user]
+print("🏁 Extracting Advanced Impact Metrics matched to PostHog Topology...")
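+# GitHub returns UTC timestamps, so build the cutoff in UTC (kept naive to match strptime output)
+from datetime import timezone
+cutoff_date = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(days=90)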
+# -------------------------------------------------------------
+# Phase 1: Scan PR Stream (Execution, Complexity, Reverts)
+# -------------------------------------------------------------
+print("\n📦 Phase 1: Fetching recent Pull Requests...")
+pr_url = f"https://api.github.com/repos/{REPO}/pulls"
+phase_1_success = False
+for page in range(1, 11):
+    params = {
+        "state": "closed",
+        "sort": "updated",
+        "direction": "desc",
+        "per_page": 100,
+        "page": page
+    }
+    res = requests.get(pr_url, headers=HEADERS, params=params, timeout=30)
+    if res.status_code != 200:
+        print(f"❌ Phase 1 Error on page {page}: API returned {res.status_code} - {res.json().get('message')}")
+        break
+    prs = res.json()
+    if not prs:
+        break
+    phase_1_success = True
+    for pr in prs:
+        if not pr.get("merged_at"):
+            continue
+        merged_at = datetime.strptime(pr["merged_at"], "%Y-%m-%dT%H:%M:%SZ")
+        if merged_at < cutoff_date:
+            continue
+        # The user field can be null, so read the login defensively
+        author = (pr.get("user") or {}).get("login")
+        eng = get_or_init(author)
+        if not eng:
+            continue
+        # Track raw baseline engineering velocity
+        eng["prs_merged"] += 1
+        # Extract textual fields for heuristics matching
+        title = pr.get("title", "").lower()
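+        # Metric: System Quality (revert tracking). Note: this counts the revert against the
+        # author of the revert PR, who may not be the author of the change being reverted.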
+        if "revert" in title:
+            eng["reverts_triggered"] += 1
+        # Extract native labels payload once for all downstream metric evaluations
+        labels = [label["name"].lower() for label in pr.get("labels", [])]
+        # Condition A: Structural Complexity Multiplier (Title Analysis)
+        if any(x in title for x in ["lib", "core", "infra", "architecture", "critical"]):
+            eng["multiplier_impact"] += 1
+        # Condition B: High Severity Multiplier (Native Priority Label Analysis)
+        # Adds an extra point if the PR is explicitly flagged as a P0 or P1 incident/initiative
+        if any(p in labels for p in ["p0", "p1"]):
+            eng["multiplier_impact"] += 1
+        # Metric: Native Bug Tracking (any label whose name contains "bug")
+        if any("bug" in label_name for label_name in labels):
+            eng["bug_fixes"] += 1
+# -------------------------------------------------------------
+# Phase 2: Scan Review Comments Stream (Citizenship & Depth)
+# -------------------------------------------------------------
+print("\n💬 Phase 2: Fetching repository-wide review comments...")
+comments_url = f"https://api.github.com/repos/{REPO}/pulls/comments"
+phase_2_success = False
+for page in range(1, 11):
+    params = {
+        "sort": "created",
+        "direction": "desc",
+        "per_page": 100,
+        "page": page
+    }
+    res = requests.get(comments_url, headers=HEADERS, params=params, timeout=30)
+    if res.status_code != 200:
+        print(f"❌ Phase 2 Error on page {page}: API returned {res.status_code} - {res.json().get('message')}")
+        break
+    comments = res.json()
+    if not comments:
+        break
+    phase_2_success = True
+    for comment in comments:
+        created_at = datetime.strptime(comment["created_at"], "%Y-%m-%dT%H:%M:%SZ")
+        if created_at < cutoff_date:
+            continue
+        # The user field can be null, so read the login defensively
+        reviewer = (comment.get("user") or {}).get("login")
+        eng = get_or_init(reviewer)
+        if not eng:
+            continue
+        # Track raw volume of code review interaction
+        eng["review_actions"] += 1
+        # Metric: Meaningful Review Depth (Filters out superficial "LGTM" comments)
+        body = comment.get("body", "")
+        word_count = len(body.split())
+        if word_count > 15:
+            eng["review_words_written"] += word_count
+# -------------------------------------------------------------
+# Phase 3: Defensive Data Processing and Export
+# -------------------------------------------------------------
+print("\n📊 Phase 3: Processing and Exporting Data...")
+if engineers and (phase_1_success or phase_2_success):
+    df = pd.DataFrame.from_dict(engineers, orient='index').reset_index().rename(columns={'index': 'engineer'})
+    # Defensive Schema Guard: Force-initialize expected columns to protect against downstream KeyErrors
+    expected_cols = ["prs_merged", "review_actions", "bug_fixes", "reverts_triggered", "multiplier_impact", "review_words_written"]
+    for expected_col in expected_cols:
+        if expected_col not in df.columns:
+            df[expected_col] = 0
+        df[expected_col] = df[expected_col].fillna(0)
+    # Prune inactive records to keep dataset compact
+    df = df[(df['prs_merged'] > 0) | (df['review_actions'] > 0)]
+    if not df.empty:
+        df.to_csv("posthog_impact_data.csv", index=False)
+        print("🚀 Advanced metrics pipeline successfully saved to posthog_impact_data.csv")
+    else:
+        print("⚠️ DataFrame filtered down to 0 rows. No matching active engineers found in this window.")
+else:
+    print("❌ Critical Error: No data payload compiled. Please check the API error codes printed above.")
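
Reviewer sketch: two pieces of `fetch_data.py` are easy to isolate and test without touching the network, namely the substantive-word filter (more than 15 words) and the 90-day window check. The helper names below (`substantive_word_count`, `parse_github_timestamp`, `within_window`) are hypothetical, and the fixed clock is an arbitrary date chosen for the example. Unlike the script, this version keeps timestamps timezone-aware, which is one way to avoid mixing local time with GitHub's UTC values.

```python
from datetime import datetime, timedelta, timezone


def substantive_word_count(body: str, min_words: int = 15) -> int:
    """Return the word count if the comment exceeds min_words, else 0."""
    word_count = len((body or "").split())
    return word_count if word_count > min_words else 0


def parse_github_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string as timezone-aware UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def within_window(value: str, now: datetime, days: int = 90) -> bool:
    """True when the timestamp falls inside the trailing window ending at now."""
    return parse_github_timestamp(value) >= now - timedelta(days=days)


# Arbitrary fixed clock so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(substantive_word_count("LGTM"))                    # short approval
print(substantive_word_count(" ".join(["word"] * 16)))   # just over the threshold
print(within_window("2024-05-01T12:00:00Z", now))
print(within_window("2024-01-01T00:00:00Z", now))
```

Passing `now` in as a parameter, rather than calling the clock inside the helper, is what makes the window check deterministic in a test.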

posthog_impact_data.csv ADDED

	@@ -0,0 +1,107 @@

+engineer,prs_merged,bug_fixes,reverts_triggered,review_actions,review_words_written,multiplier_impact
+sampennington,49,0,0,161,7318,2
+cat-ph,12,0,0,4,75,0
+VojtechBartos,11,0,1,13,70,0
+georgemunyoro,3,0,0,0,0,0
+Piccirello,9,0,0,8,974,0
+rnegron,21,0,0,1,33,0
+richardsolomou,5,0,0,13,442,0
+developers-universe-1,1,0,0,0,0,0
+skoob13,6,0,0,3,34,0
+danielcarletti,15,0,0,9,307,0
+andrewm4894,3,0,0,12,416,0
+arnohillen,6,0,0,3,57,2
+pl,6,0,0,8,274,0
+Radu-Raicea,3,0,0,6,237,0
+webjunkie,20,0,0,1,0,0
+rafaeelaudibert,17,0,0,3,41,0
+meikelmosby,6,0,0,3,162,0
+pauldambra,51,0,0,19,1738,1
+turnipdabeets,2,0,0,2,109,0
+dmarchuk,4,0,1,3,44,0
+sakce,7,0,0,6,152,0
+fasyy612,5,0,0,0,0,0
+vdekrijger,2,0,0,136,3485,0
+gesh,15,0,0,4,54,0
+jonmcwest,6,0,0,0,0,0
+jurajmajerik,7,0,0,1,0,0
+robbie-c,12,0,0,3,0,1
+Gilbert09,13,0,0,8,396,0
+leonposthog,1,0,0,0,0,0
+Twixes,5,0,0,1,0,0
+eleftheriatrivyzaki,2,0,0,0,0,0
+joethreepwood,1,0,0,0,0,0
+TueHaulund,9,0,0,0,0,1
+darkopia,1,0,0,0,0,0
+orian,7,0,0,0,0,0
+charlescook-ph,1,0,0,0,0,0
+jabahamondes,2,0,0,2,0,0
+ksvat,4,0,0,0,0,0
+DanielVisca,13,0,0,5,423,0
+gantoine,6,0,1,0,0,0
+nickbest-ph,14,0,0,4,138,0
+haacked,6,0,0,13,1394,0
+fercgomes,8,0,0,0,0,0
+z0br0wn,8,0,0,7,298,0
+matheus-vb,2,0,0,2,134,0
+gustavohstrassburger,4,0,1,0,0,0
+adboio,1,0,0,0,0,0
+feliperalmeida,1,0,0,0,0,0
+arthurdedeus,9,0,0,5,70,0
+a-lider,12,0,1,11,314,0
+eli-r-ph,11,0,0,5,172,0
+kyleswank,1,0,0,0,0,0
+jordanm-posthog,7,0,0,0,0,0
+carlos-marchal-ph,3,0,0,1,0,0
+rorylshanks,5,0,1,0,0,0
+yasen-posthog,2,0,0,7,359,0
+tomasfarias,6,0,0,6,26,0
+estefaniarabadan,5,0,0,3,39,0
+christiaan-ph,3,0,0,0,0,0
+patricio-posthog,2,0,0,0,0,0
+ablaszkiewicz,6,0,0,2,193,0
+andyzzhao,10,0,0,0,0,0
+nicowaltz,4,0,0,4,56,0
+andehen,2,0,0,0,0,0
+thmsobrmlr,11,0,0,0,0,0
+abhischekt,4,0,1,5,22,0
+shauryapednekar,1,0,0,0,0,0
+oliverb123,1,0,0,0,0,0
+andrewjmcgehee,2,0,0,0,0,0
+lricoy,23,0,0,2,43,0
+rodrigoi,7,0,0,0,0,0
+MattBro,6,0,0,9,425,0
+ryans-posthog,1,0,0,0,0,0
+afsuyadi,1,0,0,0,0,0
+clr182,4,0,0,0,0,0
+slshults,2,0,0,0,0,0
+nakshatra-nahar,1,0,0,0,0,0
+mayteio,1,0,0,5,123,0
+marandaneto,1,0,0,0,0,0
+k11kirky,1,0,0,0,0,0
+jose-sequeira,3,0,0,0,0,0
+willwearing,1,0,0,0,0,0
+sortafreel,4,0,0,0,0,0
+MattPua,8,0,0,2,18,0
+joshsny,18,0,0,3,121,0
+ioannisj,1,0,0,0,0,0
+pawel-cebula,1,0,0,5,243,0
+mp-hog,5,0,0,2,245,0
+MarconLP,1,0,0,0,0,0
+ReeceJones,8,0,0,11,166,0
+lucasheriques,5,0,0,2,116,0
+okxint,2,0,0,0,0,0
+adamleithp,5,0,0,0,0,0
+dmarticus,6,0,0,0,0,0
+erezrokah,1,0,0,0,0,0
+benjackwhite,4,0,0,0,0,0
+hpouillot,4,0,0,1,0,0
+bigjohnn1,1,0,0,0,0,0
+xljones,3,0,0,0,0,0
+tatoalo,5,0,0,0,0,0
+luke-belton,1,0,0,0,0,0
+frankh,4,0,0,0,0,0
+langesven,1,0,0,0,0,0
+Copilot,0,0,0,41,2364,0
+brandonleung,0,0,0,3,212,0
+cvolzer3,0,0,0,5,95,0
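
Reviewer sketch: the cohort peaks that `app.py` uses as normalization ceilings can be checked directly from the CSV with the standard library. The example below embeds three rows copied from the file above rather than reading it from disk; the `cohort_peaks` helper and the `METRICS` list are hypothetical names. A peak of 0, as for `bug_fixes` in these rows, means `app.py` substitutes 1 as the divisor and that term contributes nothing to the Execution pillar.

```python
import csv
import io

# Header plus three rows copied from posthog_impact_data.csv in this commit.
SAMPLE = """engineer,prs_merged,bug_fixes,reverts_triggered,review_actions,review_words_written,multiplier_impact
sampennington,49,0,0,161,7318,2
pauldambra,51,0,0,19,1738,1
Copilot,0,0,0,41,2364,0
"""

METRICS = [
    "prs_merged", "bug_fixes", "reverts_triggered",
    "review_actions", "review_words_written", "multiplier_impact",
]


def cohort_peaks(csv_text: str) -> dict:
    """Return the maximum of each metric column (the normalization ceilings)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no data rows")
    return {metric: max(int(row[metric]) for row in rows) for metric in METRICS}


peaks = cohort_peaks(SAMPLE)
for metric in METRICS:
    print(metric, peaks[metric])
```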

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+streamlit
+pandas
+plotly
+python-dotenv