<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Analytics Modeling Sandbox - User Guide</title>
    <style>

        :root {

            --primary-color: #276749;

            --primary-light: #38a169;

            --secondary-color: #2c5282;

            --accent-color: #ed8936;

            --warning-color: #c53030;

            --background-color: #f7fafc;

            --text-color: #2d3748;

            --text-light: #718096;

            --border-color: #e2e8f0;

            --card-bg: #ffffff;

        }



        * {

            box-sizing: border-box;

        }



        body {

            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;

            line-height: 1.7;

            color: var(--text-color);

            max-width: 900px;

            margin: 0 auto;

            padding: 20px 40px;

            background-color: var(--background-color);

        }



        h1 {

            color: var(--primary-color);

            border-bottom: 3px solid var(--primary-light);

            padding-bottom: 15px;

            margin-top: 40px;

        }



        h2 {

            color: var(--primary-color);

            border-bottom: 2px solid var(--border-color);

            padding-bottom: 10px;

            margin-top: 50px;

        }



        h3 {

            color: var(--primary-light);

            margin-top: 30px;

        }



        .header-section {

            text-align: center;

            border-bottom: 2px solid var(--border-color);

            background: linear-gradient(135deg, var(--primary-color) 0%, var(--primary-light) 100%);

            margin: -20px -40px 40px -40px;

            padding: 60px 40px;

            color: white;

        }



        .header-section h1 {

            border: none;

            margin: 0;

            font-size: 2.5em;

            color: white;

        }



        .subtitle {

            color: rgba(255,255,255,0.9);

            font-size: 1.2em;

            margin-top: 10px;

        }



        table {

            width: 100%;

            border-collapse: collapse;

            margin: 20px 0;

            background: white;

            box-shadow: 0 1px 3px rgba(0,0,0,0.1);

        }



        th, td {

            padding: 12px 15px;

            text-align: left;

            border: 1px solid var(--border-color);

        }



        th {

            background-color: var(--primary-color);

            color: white;

            font-weight: 600;

        }



        tr:nth-child(even) {

            background-color: #f8f9fa;

        }



        blockquote {

            border-left: 4px solid var(--primary-light);

            margin: 25px 0;

            padding: 15px 25px;

            background-color: #f0fff4;

            font-style: italic;

        }



        .warning-box {

            background-color: #fff5f5;

            border: 1px solid #fc8181;

            border-left: 4px solid var(--warning-color);

            border-radius: 5px;

            padding: 20px;

            margin: 25px 0;

        }



        .warning-box h4 {

            color: var(--warning-color);

            margin-top: 0;

        }



        .info-box {

            background-color: #ebf8ff;

            border: 1px solid #90cdf4;

            border-left: 4px solid var(--secondary-color);

            border-radius: 5px;

            padding: 20px;

            margin: 25px 0;

        }



        .step-box {

            background: white;

            border: 1px solid var(--border-color);

            border-radius: 10px;

            padding: 25px;

            margin: 20px 0;

            box-shadow: 0 2px 4px rgba(0,0,0,0.05);

            border-left: 5px solid var(--primary-light);

        }



        .step-box h3 {

            margin-top: 0;

            display: flex;

            align-items: center;

        }



        .step-number {

            display: inline-flex;

            align-items: center;

            justify-content: center;

            width: 35px;

            height: 35px;

            background-color: var(--primary-light);

            color: white;

            border-radius: 50%;

            font-weight: bold;

            margin-right: 12px;

            flex-shrink: 0;

        }



        .output-section {

            background: #f8f9fa;

            border-radius: 8px;

            padding: 20px;

            margin: 20px 0;

        }



        .output-section h4 {

            color: var(--primary-color);

            margin-top: 0;

        }



        .trap-warning {

            background: #fffaf0;

            border-left: 4px solid var(--accent-color);

            padding: 15px 20px;

            margin: 15px 0;

            border-radius: 0 8px 8px 0;

            font-size: 0.95em;

        }



        .trap-warning strong {

            color: var(--accent-color);

        }



        .two-column {

            display: grid;

            grid-template-columns: 1fr 1fr;

            gap: 20px;

            margin: 25px 0;

        }



        .column {

            background: white;

            padding: 20px;

            border-radius: 8px;

            border: 1px solid var(--border-color);

        }



        .do-column {

            border-left: 4px solid var(--primary-light);

        }



        .dont-column {

            border-left: 4px solid var(--warning-color);

        }



        .column h4 {

            margin-top: 0;

        }



        .do-column h4 {

            color: var(--primary-light);

        }



        .dont-column h4 {

            color: var(--warning-color);

        }



        .faq-item {

            background: white;

            border: 1px solid var(--border-color);

            border-radius: 8px;

            padding: 20px;

            margin: 15px 0;

        }



        .faq-item h4 {

            color: var(--secondary-color);

            margin-top: 0;

            margin-bottom: 10px;

        }



        .checklist-table td:first-child {

            width: 30%;

            font-weight: 600;

            color: var(--primary-color);

        }



        .comparison-table th:nth-child(1) {

            background-color: var(--primary-color);

        }



        .comparison-table th:nth-child(2) {

            background-color: var(--secondary-color);

        }



        .final-reminder {

            background: linear-gradient(135deg, var(--primary-color), var(--primary-light));

            color: white;

            padding: 30px;

            border-radius: 10px;

            margin: 40px 0;

            text-align: center;

        }



        .final-reminder blockquote {

            background: rgba(255,255,255,0.15);

            border-left-color: white;

            color: white;

        }



        hr {

            border: none;

            border-top: 1px solid var(--border-color);

            margin: 40px 0;

        }



        footer {

            text-align: center;

            padding: 30px;

            color: #666;

            border-top: 1px solid var(--border-color);

            margin-top: 50px;

        }



        code {

            background-color: #edf2f7;

            padding: 2px 6px;

            border-radius: 4px;

            font-family: 'Consolas', 'Monaco', monospace;

            font-size: 0.9em;

        }



        ul, ol {

            margin: 15px 0;

            padding-left: 25px;

        }



        li {

            margin: 8px 0;

        }



        @media (max-width: 768px) {

            body {

                padding: 15px 20px;

            }



            .header-section {

                margin: -15px -20px 30px -20px;

                padding: 40px 20px;

            }



            .two-column {

                grid-template-columns: 1fr;

            }



            table {

                font-size: 0.9em;

            }



            th, td {

                padding: 8px 10px;

            }

        }

    </style>
</head>
<body>

<div class="header-section">
    <h1>Analytics Modeling Sandbox</h1>
    <p class="subtitle">User Guide</p>
</div>

<h2>What Is the Analytics Modeling Sandbox?</h2>

<p>The Analytics Modeling Sandbox is a practical analytics tool designed for users who have learned analytical concepts from the <em>Analytics for Managers</em> book and want to apply those techniques to their own data.</p>

<p>Unlike the Analytics Reasoning Companion (which focuses on developing reasoning skills using curated datasets), the Sandbox is built for <strong>doing real analysis</strong> — running regression, classification, and clustering on data you provide.</p>

<h3>What It Does</h3>
<ul>
    <li><strong>Executes analyses</strong> on your uploaded data (CSV, Excel)</li>
    <li><strong>Shows code</strong> so you can see exactly what's being done</li>
    <li><strong>Produces outputs</strong> including coefficients, metrics, and visualizations</li>
    <li><strong>Provides interpretation guidance</strong> to prevent common analytical mistakes</li>
    <li><strong>Warns about traps</strong> like accuracy illusions, threshold fallacies, and omitted variable bias</li>
</ul>

<h3>What It Does NOT Do</h3>
<ul>
    <li><strong>Make decisions for you</strong> — it provides evidence, you decide</li>
    <li><strong>Certify models as "good"</strong> — it shows you results, not approval stamps</li>
    <li><strong>Establish causation</strong> — all findings are associations unless you have experimental data</li>
    <li><strong>Store your data</strong> — nothing is retained between sessions</li>
    <li><strong>Replace professional judgment</strong> — this is an educational tool, not professional services</li>
</ul>

<hr>

<h2>Important Notices</h2>

<div class="warning-box">
    <h4>Data Privacy</h4>
    <p><strong>You are responsible for ensuring you have proper authorization to analyze the data you upload.</strong></p>
    <p>Do not upload:</p>
    <ul>
        <li>Personally identifiable information (PII) without consent</li>
        <li>Protected health information (PHI)</li>
        <li>Confidential business data you're not authorized to share</li>
        <li>Data subject to regulatory restrictions (GDPR, HIPAA, etc.)</li>
    </ul>
    <p>The Sandbox does not store your data between sessions, but you remain responsible for compliance with applicable privacy laws and organizational policies.</p>
</div>

<div class="info-box">
    <h4>Disclaimer</h4>
    <p>The Analytics Modeling Sandbox provides analytical assistance for educational purposes. Outputs are statistical estimates based on the data you provide. They do not constitute predictions, guarantees, or professional advice.</p>
    <p>All findings describe patterns and associations. They do not establish causal relationships unless derived from controlled experiments.</p>
    <p>Consult qualified professionals before making significant business, financial, legal, or operational decisions based on these results.</p>
</div>

<hr>

<h2>Getting Started</h2>

<h3>Step 1: Access the Sandbox</h3>
<p>Visit the Sandbox at: <strong>[Link to be provided]</strong></p>

<h3>Step 2: Prepare Your Data</h3>
<p>Before uploading, ensure your data:</p>
<ul>
    <li>Is in CSV or Excel format</li>
    <li>Is under 5MB (recommended)</li>
    <li>Has clear column headers</li>
    <li>Has a defined outcome variable (for regression/classification)</li>
</ul>
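
<p>The checks above can be run locally before uploading. The sketch below is one possible pre-flight check for a CSV file and is not part of the Sandbox itself; the function name <code>check_csv</code>, the <code>outcome</code> argument, the example file, and the wording of the problem messages are all assumptions made for illustration.</p>

<pre>
```python
import csv
import os
import tempfile

MAX_BYTES = 5 * 1024 * 1024  # the guide's recommended 5MB ceiling


def check_csv(path, outcome=None):
    """Return a list of problems; an empty list means the file looks ready."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 5MB")
    # Read only the first row, which should hold the column headers.
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    names = [name.strip() for name in header]
    if not names or "" in names:
        problems.append("one or more column headers are blank")
    if len(set(names)) != len(names):
        problems.append("column headers are not unique")
    if outcome is not None and outcome not in names:
        problems.append(f"outcome column {outcome!r} not found")
    return problems


# Hypothetical example file, written to a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "customers.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("tenure_months,plan,churned\n3,basic,1\n14,pro,0\n")
    print(check_csv(path, outcome="churned"))  # no problems expected
    print(check_csv(path, outcome="revenue"))  # outcome column is missing
```
</pre>

<p>An empty list means none of these checks failed. It says nothing about whether the data is appropriate for the decision at hand.</p>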

<h3>Step 3: Upload and Describe</h3>
<p>When you upload your file, tell the Sandbox:</p>
<ul>
    <li>What decision this analysis will inform</li>
    <li>Which column is your outcome variable</li>
    <li>What type of analysis you want (regression, classification, or clustering)</li>
</ul>

<hr>

<h2>The 7-Step Workflow</h2>

<p>The Sandbox suggests a structured workflow but allows you to skip steps if needed. Skipping steps increases interpretation risk — the Sandbox will warn you but won't block you.</p>

<div class="step-box">
    <h3><span class="step-number">1</span> Business Context</h3>
    <p><strong>Purpose:</strong> Establish what decision this analysis informs.</p>
    <p><strong>What happens:</strong> The Sandbox asks about your goals before diving into data.</p>
    <p><strong>Why it matters:</strong> Analysis without context produces technically correct but practically useless results.</p>
    <p><strong>If you skip:</strong> <em>"Proceeding without clear goals increases interpretation risk."</em></p>
</div>

<div class="step-box">
    <h3><span class="step-number">2</span> Data Overview</h3>
    <p><strong>Purpose:</strong> Understand what you're working with before modeling.</p>
    <p><strong>What happens:</strong> The Sandbox shows dataset shape, column types, missing value summary, and basic distributions.</p>
    <p><strong>Key question:</strong> <em>"Who might be excluded from this dataset? Could they differ systematically?"</em></p>
</div>
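
<p>A summary of the kind described in this step can be reproduced with pandas. This is a minimal sketch on a small made-up table; the helper name <code>overview</code> and the column names are hypothetical, and the Sandbox's own summary may be laid out differently.</p>

<pre>
```python
import pandas as pd


def overview(df):
    """Return the dataset shape plus a per-column type and missing-value summary."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # distinct non-missing values
    })
    return df.shape, summary


# Hypothetical data with one missing value in each of two columns.
df = pd.DataFrame({
    "tenure_months": [3, 14, None, 27],
    "plan": ["basic", "pro", "pro", None],
    "churned": [1, 0, 0, 1],
})

shape, summary = overview(df)
print(shape)
print(summary)
```
</pre>

<p>The missing-value columns show how much is absent, not why. Whether the gaps are systematic is the judgment the key question above asks for.</p>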

<div class="step-box">
    <h3><span class="step-number">3</span> Data Preparation</h3>
    <p><strong>Purpose:</strong> Handle missing values, encode categories, scale features.</p>
    <p><strong>What happens:</strong> The Sandbox shows what preparation steps are applied, why, and the trade-offs involved.</p>
    <p><strong>Transparency:</strong> You'll see the code so you know exactly what's being done.</p>
</div>
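
<p>The three preparation tasks named in this step can be sketched with a scikit-learn <code>ColumnTransformer</code>. The specific choices here (median imputation, one-hot encoding, standard scaling) and the column names are assumptions for illustration; the guide does not state which methods the Sandbox applies.</p>

<pre>
```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric column with a gap, one categorical column.
df = pd.DataFrame({
    "tenure_months": [3.0, 14.0, np.nan, 27.0],
    "plan": ["basic", "pro", "pro", "basic"],
})

# Numeric columns: fill gaps with the median, then scale to mean 0 and unit variance.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

prep = ColumnTransformer(
    [
        ("num", numeric, ["tenure_months"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = prep.fit_transform(df)
filled_with = prep.named_transformers_["num"].named_steps["impute"].statistics_[0]
print(X.shape)       # one scaled numeric column plus one column per plan
print(filled_with)   # the median used to fill the missing tenure
```
</pre>

<p>Each choice carries a trade-off. Median imputation, for example, keeps the row but hides the fact that the value was missing. In a real workflow the transformer should be fitted on the training rows only, so that the test rows do not influence the imputed value or the scaling.</p>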

<div class="step-box">
    <h3><span class="step-number">4</span> Analysis</h3>
    <p><strong>Purpose:</strong> Run the model.</p>
    <p><strong>What happens:</strong> The Sandbox executes regression, classification, or clustering using the standard scikit-learn (<code>sklearn</code>) library.</p>
    <p><strong>Defaults shown explicitly:</strong></p>
    <ul>
        <li>Train/test split: 70/30</li>
        <li>Random state: 42</li>
        <li>Classification threshold: 0.5 (with alternatives shown)</li>
        <li>Clustering: K values 3-6 tested</li>
    </ul>
</div>
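
<p>The split defaults listed in this step correspond to the following scikit-learn call. This is a sketch on random placeholder data; only <code>test_size=0.3</code> and <code>random_state=42</code> come from the guide, and the array names are assumptions.</p>

<pre>
```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 3 features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# 70/30 split with a fixed seed so the same rows land in each part every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Repeating the call with the same seed reproduces the split exactly.
again = train_test_split(X, y, test_size=0.3, random_state=42)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, again[1]))
```
</pre>

<p>Fixing the random state makes results reproducible, but it also means every metric is tied to one particular split. Trying a few other seeds is a quick way to see how much the numbers move.</p>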

<div class="step-box">
    <h3><span class="step-number">5</span> Results</h3>
    <p><strong>Purpose:</strong> Present outputs with context.</p>
    <p><strong>For Regression:</strong> Coefficients, R-squared, MAE, RMSE, residual plots</p>
    <p><strong>For Classification:</strong> Confusion matrix, Precision/Recall/F1/AUC, threshold table</p>
    <p><strong>For Clustering:</strong> Cluster sizes, feature means, silhouette scores, elbow plot</p>
    <p>Interpretation notes are embedded with each output.</p>
</div>

<div class="step-box">
    <h3><span class="step-number">6</span> Interpretation Check</h3>
    <p><strong>Purpose:</strong> Ensure you're not over-interpreting.</p>
    <p><strong>What happens:</strong> The Sandbox prompts:</p>
    <ul>
        <li>"What assumptions must hold for these results to be actionable?"</li>
        <li>"What could mislead us here?"</li>
        <li>"Who might be missing from this data?"</li>
    </ul>
</div>

<div class="step-box">
    <h3><span class="step-number">7</span> Limitations &amp; Next Steps</h3>
    <p><strong>Purpose:</strong> Acknowledge what the analysis cannot tell you.</p>
    <p><strong>What happens:</strong> The Sandbox helps you articulate what remains uncertain, what additional data would help, and what tests would increase confidence.</p>
</div>

<hr>

<h2>Understanding Your Outputs</h2>

<div class="output-section">
    <h4>Regression Outputs</h4>

    <p><strong>Coefficients Table:</strong></p>
    <table>
        <tr><th>Feature</th><th>Coefficient</th></tr>
        <tr><td>Feature_A</td><td>2.34</td></tr>
        <tr><td>Feature_B</td><td>-1.56</td></tr>
        <tr><td>Feature_C</td><td>0.89</td></tr>
    </table>

    <p><strong>How to read:</strong> A coefficient of 2.34 means: among otherwise similar cases in your data, a one-unit increase in Feature_A is associated with a 2.34-unit increase in the outcome, on average.</p>

    <div class="trap-warning">
        <strong>Caution:</strong> This is an association, not a causal effect. Unobserved factors might influence both the feature and the outcome.
    </div>

    <p><strong>Metrics:</strong></p>
    <ul>
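        <li><strong>R-squared:</strong> Proportion of variance in the outcome explained by the model. It is at most 1, and on held-out data it can fall below 0 when the model predicts worse than simply using the mean. Higher isn't always better: a high value on training data can reflect overfitting.</li>
        <li><strong>MAE:</strong> Mean absolute error, the average size of the prediction error in outcome units.</li>
        <li><strong>RMSE:</strong> Root mean squared error, also in outcome units. Squaring gives large errors more weight, so RMSE is never smaller than MAE.</li>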
    </ul>
</div>
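
<p>The regression outputs above can be reproduced on synthetic data. In this sketch the outcome is simulated from the three example coefficients in the table (2.34, -1.56, 0.89) plus noise, so the fitted values should land close to them; the data, the noise level, and the variable names are assumptions, not Sandbox output.</p>

<pre>
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Simulate 200 cases whose outcome follows the example coefficients plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_coefs = np.array([2.34, -1.56, 0.89])
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Metrics are computed on the held-out 30 percent.
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))

for name, coef in zip(["Feature_A", "Feature_B", "Feature_C"], model.coef_):
    print(f"{name}: {coef:.2f}")
print(f"R-squared={r2:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```
</pre>

<p>Here the coefficients are recoverable because the data was generated from them. With real data no such ground truth exists, which is why the caution about association versus causation applies.</p>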

<div class="output-section">
    <h4>Classification Outputs</h4>

    <p><strong>Confusion Matrix:</strong></p>
    <table>
        <tr><th></th><th>Predicted: No</th><th>Predicted: Yes</th></tr>
        <tr><td><strong>Actual: No</strong></td><td>True Negative</td><td>False Positive</td></tr>
        <tr><td><strong>Actual: Yes</strong></td><td>False Negative</td><td>True Positive</td></tr>
    </table>

    <p><strong>Metrics:</strong></p>
    <ul>
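        <li><strong>Accuracy:</strong> Share of all cases classified correctly. Can be misleading with imbalanced classes.</li>
        <li><strong>Precision:</strong> Of those predicted positive, how many are correct?</li>
        <li><strong>Recall:</strong> Of actual positives, how many did we catch?</li>
        <li><strong>F1:</strong> Harmonic mean of precision and recall. It is high only when both are high.</li>
        <li><strong>ROC AUC:</strong> Model's ability to rank positives above negatives. A value of 0.5 is chance-level ranking; 1.0 is perfect ranking.</li>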
    </ul>

    <div class="trap-warning">
        <strong>Threshold Table:</strong> Shows how precision and recall change at different thresholds. Use this to choose a threshold that matches your cost trade-offs — don't just accept 0.5.
    </div>
</div>
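
<p>A threshold table of the kind described above can be built by hand from true labels and predicted probabilities. The sketch below uses ten made-up cases; the function name <code>threshold_table</code>, the three candidate thresholds, and the data are assumptions for illustration.</p>

<pre>
```python
def threshold_table(y_true, scores, thresholds=(0.3, 0.5, 0.7)):
    """Count errors and compute precision/recall at each candidate threshold."""
    rows = []
    for t in thresholds:
        # A case is predicted positive when its score reaches the threshold.
        predicted = [1 if s >= t else 0 for s in scores]
        pairs = list(zip(y_true, predicted))
        tp = sum(1 for y, p in pairs if y == 1 and p == 1)
        fp = sum(1 for y, p in pairs if y == 0 and p == 1)
        fn = sum(1 for y, p in pairs if y == 1 and p == 0)
        rows.append({
            "threshold": t,
            "false_positives": fp,
            "false_negatives": fn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        })
    return rows


# Hypothetical labels and predicted probabilities for ten cases.
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]

table = threshold_table(y_true, scores)
for row in table:
    print(row)
```
</pre>

<p>On this data, moving the threshold from 0.3 to 0.7 raises precision from 0.625 to 1.0 while recall drops from 1.0 to 0.6. Which end is preferable depends on whether a false positive or a false negative costs more in your setting.</p>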

<div class="output-section">
    <h4>Clustering Outputs</h4>

    <p><strong>Cluster Profiles:</strong></p>
    <table>
        <tr><th>Cluster</th><th>Size</th><th>Feature_A (mean)</th><th>Feature_B (mean)</th></tr>
        <tr><td>0</td><td>150</td><td>2.3</td><td>-0.5</td></tr>
        <tr><td>1</td><td>200</td><td>-1.1</td><td>0.8</td></tr>
        <tr><td>2</td><td>100</td><td>0.5</td><td>1.2</td></tr>
    </table>

    <p><strong>How to read:</strong> Each row shows average feature values for cases in that cluster. Use these to develop descriptive labels.</p>

    <div class="trap-warning">
        <strong>Caution:</strong> Clusters are analytical groupings, not inherent types. Different features or scaling would produce different segments.
    </div>
</div>
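
<p>The clustering outputs can be sketched with k-means on synthetic data. The guide states that K values 3 through 6 are tested and that silhouette scores are reported; using k-means, picking the K with the highest silhouette score, and the three-blob data set are assumptions made here for illustration.</p>

<pre>
```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated groups of 100 cases each.
raw, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0, random_state=42,
)
df = pd.DataFrame(raw, columns=["Feature_A", "Feature_B"])
X = StandardScaler().fit_transform(df)  # scale so no feature dominates distance

# Fit k-means for K = 3..6 and record inertia (elbow plot) and silhouette.
results = {}
for k in range(3, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    results[k] = {
        "inertia": km.inertia_,
        "silhouette": silhouette_score(X, km.labels_),
        "labels": km.labels_,
    }

best_k = max(results, key=lambda k: results[k]["silhouette"])
df["cluster"] = results[best_k]["labels"]

# Cluster profile: size and mean of each feature, in original units.
profile = df.groupby("cluster").agg(
    size=("Feature_A", "size"),
    Feature_A_mean=("Feature_A", "mean"),
    Feature_B_mean=("Feature_B", "mean"),
)
print(best_k)
print(profile)
```
</pre>

<p>The silhouette score is one input to choosing K, not a verdict. A segmentation with a slightly lower score may still be the more useful one if its groups are easier to act on.</p>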

<hr>

<h2>Embedded Trap Warnings</h2>

<p>The Sandbox automatically includes warnings after outputs to prevent common mistakes.</p>

<div class="trap-warning">
    <strong>After Regression:</strong> "Coefficients describe associations, not causal effects. Consider what unobserved factors might influence both predictor and outcome. Large effects may be driven by outliers—check residual plots."
</div>

<div class="trap-warning">
    <strong>After Classification:</strong> "Accuracy can mislead with imbalanced classes. Check: what would accuracy be predicting the majority class always? The 0.5 threshold is arbitrary—consider the relative costs of false positives vs. false negatives."
</div>
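
<p>The majority-class check in the classification warning takes only a few lines. This is a sketch on made-up labels with a 95/5 class split; the function name <code>majority_baseline_accuracy</code> is hypothetical.</p>

<pre>
```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy of a rule that always predicts the most common class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * len(y_true)

baseline = majority_baseline_accuracy(y_true)
caught = sum(1 for y, p in zip(y_true, always_negative) if y == 1 and p == 1)
recall = caught / sum(y_true)

print(f"baseline accuracy: {baseline:.2f}")
print(f"recall of the always-negative rule: {recall:.2f}")
```
</pre>

<p>A rule that never predicts the positive class reaches 95% accuracy on this data while catching none of the positives. A model reporting 95% accuracy here has therefore not yet shown that it learned anything.</p>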

<div class="trap-warning">
    <strong>After Clustering:</strong> "Clusters depend on feature selection and scaling. Different choices produce different segments. These are analytical groupings, not fixed types—validate stability before building strategy."
</div>
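
<p>One way to act on the stability advice in the clustering warning is to refit with different random seeds and compare the resulting groupings. The sketch below uses the adjusted Rand index (ARI) from scikit-learn for the comparison; this method, the five seeds, and the synthetic data are assumptions, since the guide does not say how the Sandbox assesses stability.</p>

<pre>
```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with three clearly separated groups.
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0, random_state=42,
)

# Refit k-means with five different seeds.
labelings = [
    KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    for seed in range(5)
]

# ARI compares two groupings while ignoring how the clusters are numbered.
scores = [adjusted_rand_score(a, b) for a, b in combinations(labelings, 2)]
print(f"pairs compared: {len(scores)}")
print(f"lowest pairwise ARI: {min(scores):.2f}")
```
</pre>

<p>An ARI of 1 means two runs produced the same grouping; values well below 1 mean cases are switching clusters between runs. Seeds are only one source of variation, so the same comparison can be repeated after changing the feature set or the scaling.</p>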

<div class="trap-warning">
    <strong>For All Analyses:</strong> "Selection Bias Check: Who might be missing from this data? Could excluded cases differ systematically from those included?"
</div>

<hr>

<h2>Tips for Effective Use</h2>

<div class="two-column">
    <div class="column do-column">
        <h4>Do:</h4>
        <ol>
            <li><strong>Start with clear goals.</strong> Know what decision the analysis will inform.</li>
            <li><strong>Review the data summary.</strong> Check for issues before modeling.</li>
            <li><strong>Examine the code.</strong> Understanding what's done helps interpretation.</li>
            <li><strong>Use the threshold table</strong> (classification). Choose based on your costs.</li>
            <li><strong>Check cluster stability</strong> (clustering). Be cautious if results vary across random seeds, feature sets, or scaling choices.</li>
            <li><strong>Read the interpretation notes.</strong> They prevent common mistakes.</li>
            <li><strong>Acknowledge limitations.</strong> Stating them is a sign of rigor.</li>
        </ol>
    </div>
    <div class="column dont-column">
        <h4>Don't:</h4>
        <ol>
            <li><strong>Don't upload sensitive data</strong> without authorization.</li>
            <li><strong>Don't skip business context.</strong> Analysis without purpose is just math.</li>
            <li><strong>Don't treat coefficients as causal.</strong> Association ≠ causation.</li>
            <li><strong>Don't celebrate accuracy alone.</strong> Compare it with the naive baseline of always predicting the majority class.</li>
            <li><strong>Don't reify clusters.</strong> They're groupings, not fixed types.</li>
            <li><strong>Don't ignore who's missing.</strong> Selection bias can invalidate analysis.</li>
        </ol>
    </div>
</div>

<hr>

<h2>When to Use the Reasoning Companion Instead</h2>

<p>The Sandbox is for <strong>doing analysis</strong>. The Reasoning Companion is for <strong>developing judgment</strong>.</p>

<table class="comparison-table">
    <thead>
        <tr>
            <th>Use the Sandbox when...</th>
            <th>Use the Reasoning Companion when...</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>You have your own data to analyze</td>
            <td>You're learning concepts from the book</td>
        </tr>
        <tr>
            <td>You need actual outputs and code</td>
            <td>You want structured reasoning practice</td>
        </tr>
        <tr>
            <td>You're a practitioner applying techniques</td>
            <td>You're a student building fundamentals</td>
        </tr>
        <tr>
            <td>You want efficiency with guidance</td>
            <td>You want Socratic questioning</td>
        </tr>
    </tbody>
</table>

<p><strong>Handoff:</strong> After running analysis in the Sandbox, consider working through similar analyses in the Reasoning Companion using the book's curated datasets. The structured critique will strengthen your interpretation skills.</p>

<hr>

<h2>Frequently Asked Questions</h2>

<div class="faq-item">
    <h4>Q: What file formats can I upload?</h4>
    <p><strong>A:</strong> CSV and Excel files (.csv, .xlsx, .xls). Keep files under 5MB for best performance.</p>
</div>

<div class="faq-item">
    <h4>Q: Does the Sandbox store my data?</h4>
    <p><strong>A:</strong> No. Data is processed during your session only and is not retained afterward.</p>
</div>

<div class="faq-item">
    <h4>Q: Can I run advanced models like XGBoost or neural networks?</h4>
    <p><strong>A:</strong> The Sandbox defaults to interpretable models. You can request advanced models, but the Sandbox will note that complexity often reduces interpretability.</p>
</div>

<div class="faq-item">
    <h4>Q: Why does the Sandbox show me code?</h4>
    <p><strong>A:</strong> Transparency. Seeing the code helps you understand exactly what's being done, catch issues, and reproduce the analysis elsewhere.</p>
</div>

<div class="faq-item">
    <h4>Q: The Sandbox warned me about something. Did I do something wrong?</h4>
    <p><strong>A:</strong> Not necessarily. Warnings are educational — they flag potential interpretation risks. Consider them, but you decide whether to proceed.</p>
</div>

<div class="faq-item">
    <h4>Q: Why doesn't the Sandbox tell me which model is "best"?</h4>
    <p><strong>A:</strong> Because "best" depends on your goals, costs, and context — things the Sandbox can't know. It provides evidence; you make the judgment.</p>
</div>

<hr>

<h2>Quick Reference: Output Checklist</h2>

<p>Before acting on any Sandbox output, verify:</p>

<table class="checklist-table">
    <tr><td>Business Context</td><td>Does this analysis answer the right question?</td></tr>
    <tr><td>Data Quality</td><td>Were there missing values, outliers, or anomalies?</td></tr>
    <tr><td>Selection Bias</td><td>Who might be excluded from this data?</td></tr>
    <tr><td>Causation</td><td>Am I treating associations as causal levers?</td></tr>
    <tr><td>Baseline Comparison</td><td>How does this model compare to a naive baseline?</td></tr>
    <tr><td>Threshold Choice</td><td>(Classification) Is 0.5 the right threshold for my costs?</td></tr>
    <tr><td>Feature Dominance</td><td>(Clustering) Which features are driving similarity?</td></tr>
    <tr><td>Stability</td><td>Would results hold with different data or settings?</td></tr>
    <tr><td>Limitations</td><td>What can this analysis NOT tell me?</td></tr>
</table>

<hr>

<div class="final-reminder">
    <blockquote>
        <p>"These results describe patterns in your data. Before acting, consider: (1) what assumptions must hold, (2) who might be excluded from this data, and (3) what additional evidence would increase confidence."</p>
    </blockquote>
    <p>The Sandbox gives you analytical power. <strong>Use it with discipline.</strong></p>
</div>

<footer>
    <p><em>Analytics Modeling Sandbox — A companion to "Analytics for Managers"</em></p>
</footer>

</body>
</html>