User Guide
The Analytics Modeling Sandbox is a practical analytics tool designed for users who have learned analytical concepts from the Analytics for Managers book and want to apply those techniques to their own data.
Unlike the Analytics Reasoning Companion (which focuses on developing reasoning skills using curated datasets), the Sandbox is built for doing real analysis — running regression, classification, and clustering on data you provide.
You are responsible for ensuring you have proper authorization to analyze the data you upload.
Do not upload:
The Sandbox does not store your data between sessions, but you remain responsible for compliance with applicable privacy laws and organizational policies.
The Analytics Modeling Sandbox provides analytical assistance for educational purposes. Outputs are statistical estimates based on the data you provide. They do not constitute predictions, guarantees, or professional advice.
All findings describe patterns and associations. They do not establish causal relationships unless derived from controlled experiments.
Consult qualified professionals before making significant business, financial, legal, or operational decisions based on these results.
Visit the Sandbox at: [Link to be provided]
Before uploading, ensure your data:
When you upload your file, tell the Sandbox:
The Sandbox suggests a structured workflow but allows you to skip steps if needed. Skipping steps increases interpretation risk — the Sandbox will warn you but won't block you.
Purpose: Establish what decision this analysis informs.
What happens: The Sandbox asks about your goals before diving into data.
Why it matters: Analysis without context produces technically correct but practically useless results.
If you skip: "Proceeding without clear goals increases interpretation risk."
Purpose: Understand what you're working with before modeling.
What happens: The Sandbox shows dataset shape, column types, missing value summary, and basic distributions.
Key question: "Who might be excluded from this dataset? Could they differ systematically?"
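This exploration step can be sketched with pandas (the DataFrame below is synthetic, standing in for an uploaded file):

```python
import pandas as pd
import numpy as np

# Synthetic data standing in for an uploaded file.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38],
    "region": ["north", "south", "south", None, "north"],
    "spend": [120.0, 85.5, 210.0, 99.0, 150.0],
})

print(df.shape)          # dataset shape: (rows, columns)
print(df.dtypes)         # column types
print(df.isna().sum())   # missing values per column
print(df.describe())     # basic distributions of numeric columns
```

Missing values are not just a cleaning chore: rows with gaps may describe a systematically different group, which is exactly what the key question above is probing.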
Purpose: Handle missing values, encode categories, scale features.
What happens: The Sandbox shows what preparation steps are applied, why, and the trade-offs involved.
Transparency: You'll see the code so you know exactly what's being done.
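A minimal sketch of what such a preparation step can look like, assuming scikit-learn is the underlying library (the data and column names are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 38.0],
    "region": ["north", "south", "south", "north", "north"],
})

# Numeric: fill missing values with the median, then scale to mean 0 / sd 1.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical: one-hot encode (each category becomes a 0/1 column).
prep = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(), ["region"]),
])

X = prep.fit_transform(df)
print(X.shape)  # 5 rows; 1 scaled numeric column + 2 one-hot columns
```

The trade-offs are visible in the choices: median imputation is robust to outliers but hides the fact that data was missing, and scaling changes coefficient units.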
Purpose: Run the model.
What happens: The Sandbox executes regression, classification, or clustering using standard scikit-learn implementations.
Defaults shown explicitly:
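As one illustration of surfacing defaults, scikit-learn exposes every model parameter via `get_params()` before anything is fit (the data below is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for prepared features.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
print(model.get_params())            # surfaces every default before fitting
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # R-squared on held-out data
```

Evaluating on held-out data, as above, is what keeps the reported fit honest.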
Purpose: Present outputs with context.
For Regression: Coefficients, R-squared, MAE, RMSE, residual plots
For Classification: Confusion matrix, Precision/Recall/F1/AUC, threshold table
For Clustering: Cluster sizes, feature means, silhouette scores, elbow plot
Interpretation notes are embedded with each output.
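The regression metrics listed above can be computed directly (the toy predictions here are invented for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values standing in for a fitted model's test-set output.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

mae = mean_absolute_error(y_true, y_pred)          # average absolute miss
rmse = np.sqrt(mean_squared_error(y_true, y_pred)) # penalizes large misses
r2 = r2_score(y_true, y_pred)                      # variance explained
residuals = y_true - y_pred  # plot these against y_pred to spot patterns

print(mae, rmse, r2)
```

MAE and RMSE are in the units of the outcome, which makes them easier to weigh against real costs than R-squared.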
Purpose: Ensure you're not over-interpreting.
What happens: The Sandbox prompts:
Purpose: Acknowledge what the analysis cannot tell you.
What happens: The Sandbox helps you articulate what remains uncertain, what additional data would help, and what tests would increase confidence.
Coefficients Table:
| Feature | Coefficient |
|---|---|
| Feature_A | 2.34 |
| Feature_B | -1.56 |
| Feature_C | 0.89 |
How to read: A coefficient of 2.34 means: among otherwise similar cases in your data, a one-unit increase in Feature_A is associated with a 2.34-unit increase in the outcome, on average.
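A quick sanity check of this reading: fit a regression on synthetic data built with known coefficients and confirm the estimates recover them (all values here are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Outcome constructed with known coefficients 2.34 and -1.56, plus noise.
y = 2.34 * X[:, 0] - 1.56 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)
print(model.coef_)  # estimates close to [2.34, -1.56]
```

On real data the true coefficients are unknown, so this "association, on average, among similar cases" reading is the strongest claim the table supports.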
Metrics:
Confusion Matrix:
| | Predicted: No | Predicted: Yes |
|---|---|---|
| Actual: No | True Negative | False Positive |
| Actual: Yes | False Negative | True Positive |
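The matrix and the metrics that follow from it can be computed with scikit-learn (the labels below are toy values):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Toy labels standing in for a classifier's test-set predictions.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)

print(precision_score(y_true, y_pred))  # of predicted Yes, fraction correct
print(recall_score(y_true, y_pred))     # of actual Yes, fraction caught
print(f1_score(y_true, y_pred))         # harmonic mean of the two
```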
Metrics:
Cluster Profiles:
| Cluster | Size | Feature_A (mean) | Feature_B (mean) |
|---|---|---|---|
| 0 | 150 | 2.3 | -0.5 |
| 1 | 200 | -1.1 | 0.8 |
| 2 | 100 | 0.5 | 1.2 |
How to read: Each row shows average feature values for cases in that cluster. Use these to develop descriptive labels.
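A sketch of how such profiles might be produced, assuming k-means under the hood (the three blobs below are synthetic):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic blobs standing in for real cases.
X = np.vstack([
    rng.normal(loc=[2.3, -0.5], scale=0.3, size=(150, 2)),
    rng.normal(loc=[-1.1, 0.8], scale=0.3, size=(200, 2)),
    rng.normal(loc=[0.5, 1.2], scale=0.3, size=(100, 2)),
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
profile = pd.DataFrame(X, columns=["Feature_A", "Feature_B"])
profile["cluster"] = km.labels_
grouped = profile.groupby("cluster")

print(grouped.size())                   # cluster sizes
print(grouped.mean())                   # feature means per cluster
print(silhouette_score(X, km.labels_))  # separation quality, from -1 to 1
```

Cluster numbers are arbitrary labels; it is the feature means that tell you what each group actually is.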
The Sandbox automatically includes warnings after outputs to prevent common mistakes.
The Sandbox is for doing analysis. The Reasoning Companion is for developing judgment.
| Use the Sandbox when... | Use the Reasoning Companion when... |
|---|---|
| You have your own data to analyze | You're learning concepts from the book |
| You need actual outputs and code | You want structured reasoning practice |
| You're a practitioner applying techniques | You're a student building fundamentals |
| You want efficiency with guidance | You want Socratic questioning |
Handoff: After running analysis in the Sandbox, consider working through similar analyses in the Reasoning Companion using the book's curated datasets. The structured critique will strengthen your interpretation skills.
A: CSV and Excel files (.csv, .xlsx, .xls). Keep files under 5MB for best performance.
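Loading such a file typically looks like this with pandas (the in-memory buffer stands in for an uploaded file, and `your_file.xlsx` is a placeholder name):

```python
import io
import pandas as pd

# An in-memory buffer stands in for an uploaded CSV file.
csv_text = "age,region,spend\n25,north,120.0\n32,south,85.5\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)

# Excel files load the same way (reading .xlsx requires openpyxl):
# df = pd.read_excel("your_file.xlsx")
```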
A: No. Data is processed during your session only and is not retained afterward.
A: The Sandbox defaults to interpretable models. You can request advanced models, but the Sandbox will note that complexity often reduces interpretability.
A: Transparency. Seeing the code helps you understand exactly what's being done, catch issues, and reproduce the analysis elsewhere.
A: Not necessarily. Warnings are educational — they flag potential interpretation risks. Consider them, but you decide whether to proceed.
A: Because "best" depends on your goals, costs, and context — things the Sandbox can't know. It provides evidence; you make the judgment.
Before acting on any Sandbox output, verify:
| Check | Question |
|---|---|
| Business Context | Does this analysis answer the right question? |
| Data Quality | Were there missing values, outliers, or anomalies? |
| Selection Bias | Who might be excluded from this data? |
| Causation | Am I treating associations as causal levers? |
| Baseline Comparison | How does this model compare to a naive baseline? |
| Threshold Choice | (Classification) Is 0.5 the right threshold for my costs? |
| Feature Dominance | (Clustering) Which features are driving similarity? |
| Stability | Would results hold with different data or settings? |
| Limitations | What can this analysis NOT tell me? |
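The threshold-choice check can be made concrete: classifiers score each case with a probability, and the default 0.5 cutoff is just one option (the probabilities below are invented):

```python
import numpy as np

# Toy scores standing in for a classifier's predict_proba output.
proba = np.array([0.10, 0.35, 0.55, 0.80, 0.95])
y_true = np.array([0, 1, 1, 1, 1])

# Lowering the threshold catches more actual positives (higher recall)
# at the cost of more false positives; raising it does the reverse.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    recall = y_pred[y_true == 1].sum() / (y_true == 1).sum()
    print(threshold, y_pred, recall)
```

Which threshold is right depends on the relative cost of a false positive versus a false negative in your setting.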
"These results describe patterns in your data. Before acting, consider: (1) what assumptions must hold, (2) who might be excluded from this data, and (3) what additional evidence would increase confidence."
The Sandbox gives you analytical power. Use it with discipline.