Standardized Timing Benchmarking Framework
A comprehensive benchmarking framework for fair and consistent comparison of classification models (A4, A5, A5b, A6).
Features
This framework provides standardized metrics for model comparison:
- Inference Time: Mean, standard deviation, min, max, and percentiles (P50, P95, P99)
- Memory Usage: Mean, standard deviation, and peak memory consumption
- Prediction Accuracy: Correct predictions and accuracy percentage
- Model Characteristics: Model size, number of features, model type
- Consistent Data Pipeline: Uses the same data processing for all models
Installation
No additional dependencies required. Uses existing project dependencies:
- numpy
- pandas
- scikit-learn
- pickle (standard library)
Usage
Basic Usage
python benchmark_timing.py
Advanced Usage
# Specify number of samples and repeats
python benchmark_timing.py --samples 200 --repeats 20
# Save results to specific file
python benchmark_timing.py --output results/my_benchmark.json
# Print comparison table
python benchmark_timing.py --compare
# Print model recommendations
python benchmark_timing.py --recommend
# All options combined
python benchmark_timing.py -n 150 -r 15 -o results/benchmark.json -c -R
Command Line Arguments
| Argument | Short | Description | Default |
|---|---|---|---|
| --samples | -n | Number of test samples | 100 |
| --repeats | -r | Number of repetitions per sample | 10 |
| --output | -o | Output file path for JSON results | Auto-generated |
| --compare | -c | Print comparison table | False |
| --recommend | -R | Print model recommendations | False |
| --single-sample | ? | Test single sample inference | False |
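The flags in the table above can be wired up with a standard `argparse` parser. The sketch below mirrors the documented names and defaults; the actual parser in `benchmark_timing.py` may differ in details such as help strings:

```python
import argparse

def build_parser():
    # Flag names and defaults taken from the table above (sketch only).
    p = argparse.ArgumentParser(description="Standardized timing benchmark")
    p.add_argument("-n", "--samples", type=int, default=100,
                   help="Number of test samples")
    p.add_argument("-r", "--repeats", type=int, default=10,
                   help="Number of repetitions per sample")
    p.add_argument("-o", "--output", type=str, default=None,
                   help="Output file path for JSON results (auto-generated if omitted)")
    p.add_argument("-c", "--compare", action="store_true",
                   help="Print comparison table")
    p.add_argument("-R", "--recommend", action="store_true",
                   help="Print model recommendations")
    p.add_argument("--single-sample", action="store_true",
                   help="Test single sample inference")
    return p

args = build_parser().parse_args(["-n", "150", "-r", "15", "-c", "-R"])
```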
Output
Console Output
The framework prints real-time progress and results:
======================================================================
STANDARDIZED TIMING BENCHMARKING FRAMEWORK
======================================================================
Configuration:
Number of samples: 100
Number of repeats per sample: 10
Total predictions per model: 1000
Loading data...
Movement features shape: (1000, 150)
Weak link scores shape: (1000, 20)
Merged dataset shape: (1000, 165)
Feature matrix shape: (1000, 160)
Number of features: 160
Number of classes: 14
======================================================================
Running Benchmarks
======================================================================
Benchmarking A4 Random Forest...
A4 Random Forest Results:
Status: SUCCESS
Inference Time:
Mean: 1.234 ms
Std: 0.123 ms
P50: 1.200 ms
P95: 1.500 ms
P99: 1.800 ms
Memory Usage:
Mean: 256.5 KB
Peak: 512.0 KB
Accuracy: 78.5% (78/100)
Model Size: 1250.0 KB
Features: 160
JSON Results
Results are saved to JSON format with all metrics:
{
"timestamp": "2024-01-15T10:30:45.123456",
"num_samples": 100,
"num_repeats": 10,
"models": {
"A4 Random Forest": {
"model_name": "A4 Random Forest",
"model_path": "../A4/models/weaklink_classifier_rf.pkl",
"inference_time_mean": 0.001234,
"inference_time_std": 0.000123,
"inference_time_min": 0.001000,
"inference_time_max": 0.001800,
"inference_time_p50": 0.001200,
"inference_time_p95": 0.001500,
"inference_time_p99": 0.001800,
"memory_usage_mean": 262656.0,
"memory_usage_std": 10240.0,
"memory_usage_peak": 524288.0,
"accuracy": 0.785,
"predictions_correct": 78,
"predictions_total": 100,
"model_size_bytes": 1280000,
"num_features": 160,
"num_parameters": 10,
"model_type": "RandomForestClassifier",
"timing_samples": [0.0012, 0.0013, ...],
"memory_samples": [262144, 266240, ...],
"status": "SUCCESS",
"error_message": ""
}
}
}
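Because the schema above is plain JSON, downstream analysis is straightforward. A minimal sketch for loading a results file and picking the fastest successful model (field names taken from the schema above):

```python
import json

def load_results(path):
    """Load a benchmark results file; return (timestamp, per-model metrics)."""
    with open(path) as f:
        data = json.load(f)
    return data["timestamp"], data["models"]

def fastest_model(models):
    """Name of the successful model with the lowest mean inference time."""
    ok = {name: m for name, m in models.items() if m["status"] == "SUCCESS"}
    return min(ok, key=lambda name: ok[name]["inference_time_mean"])
```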
Model Comparison Table
With the --compare flag, prints a formatted comparison:
==========================================================================
MODEL COMPARISON SUMMARY
==========================================================================
Model Time (ms) Std P95 Acc (%) Mem (KB) Size (KB)
--------------------------------------------------------------------------
A5b Adaboost 0.850 0.050 1.100 75.2 128.5 512.0
A5 Ensemble 1.100 0.080 1.350 79.8 256.3 768.0
A4 Random Forest 1.234 0.123 1.500 78.5 256.5 1250.0
A5b Bagging Trees 1.450 0.150 1.800 77.1 384.2 1024.0
A6 SVM 2.100 0.200 2.500 81.2 512.0 2048.0
==========================================================================
Model Recommendations
With the --recommend flag, provides optimal model suggestions:
======================================================================
MODEL RECOMMENDATIONS
======================================================================
Fastest Inference:
Model: A5b Adaboost
Inference Time: 0.850 ms
Highest Accuracy:
Model: A6 SVM
Accuracy: 81.2%
Lowest Memory Usage:
Model: A5b Adaboost
Memory Usage: 128.5 KB
Best Balanced Performance:
Model: A5 Ensemble
Inference Time: 1.100 ms
Accuracy: 79.8%
Memory Usage: 256.3 KB
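The exact weighting behind "Best Balanced Performance" is internal to the framework. One plausible scheme (an assumption, not the actual formula, with hypothetical field names `time_ms`, `acc`, `mem_kb`) min-max normalizes each metric and averages, inverting time and memory so that lower is better:

```python
def balanced_score(models):
    """Rank models by a normalized mix of speed, accuracy, and memory.
    NOTE: illustrative only -- the real framework may weight metrics differently."""
    times = [m["time_ms"] for m in models.values()]
    accs = [m["acc"] for m in models.values()]
    mems = [m["mem_kb"] for m in models.values()]

    def norm(value, lo, hi, invert=False):
        if hi == lo:
            return 1.0
        s = (value - lo) / (hi - lo)
        return 1.0 - s if invert else s

    scores = {}
    for name, m in models.items():
        scores[name] = (
            norm(m["time_ms"], min(times), max(times), invert=True)  # faster is better
            + norm(m["acc"], min(accs), max(accs))                   # higher is better
            + norm(m["mem_kb"], min(mems), max(mems), invert=True)   # lower is better
        ) / 3.0
    return max(scores, key=scores.get)
```

With the numbers from the comparison table above, this scheme also selects A5 Ensemble, which neither leads nor trails on any single metric.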
Benchmarking Metrics Explained
Inference Time Metrics
| Metric | Description |
|---|---|
| Mean | Average inference time across all repetitions |
| Std | Standard deviation (variability) |
| Min/Max | Fastest and slowest inference times |
| P50 | Median (50th percentile) |
| P95 | 95th percentile (95% of predictions are faster) |
| P99 | 99th percentile (99% of predictions are faster) |
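The metrics above map directly onto NumPy aggregations. A minimal sketch, assuming raw timings are collected in seconds:

```python
import numpy as np

def timing_summary(samples_s):
    """Summarize per-prediction timings (seconds) into the metrics above."""
    t = np.asarray(samples_s)
    return {
        "mean": float(t.mean()),
        "std": float(t.std()),
        "min": float(t.min()),
        "max": float(t.max()),
        "p50": float(np.percentile(t, 50)),
        "p95": float(np.percentile(t, 95)),
        "p99": float(np.percentile(t, 99)),
    }
```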
Memory Metrics
| Metric | Description |
|---|---|
| Mean | Average memory usage |
| Std | Standard deviation of memory usage |
| Peak | Maximum memory consumed |
Accuracy Metrics
| Metric | Description |
|---|---|
| Accuracy | Percentage of correct predictions |
| Predictions Correct/Total | Raw counts |
Implementation Details
Data Pipeline
All models use the same data loading and preprocessing pipeline:
- Load movement features and weaklink scores
- Create WeakestLink target column
- Merge datasets
- Extract features (excluding ID, WeakestLink, EstimatedScore)
- Train/test split (80/20, stratified, random_state=42)
- StandardScaler fitted on training data
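The steps above can be sketched as follows. Column names (ID, WeakestLink, EstimatedScore) and split parameters are taken from the list above; the merge key and exact code are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def build_dataset(movement_df, scores_df):
    """Reproduce the shared preprocessing steps (sketch, not the exact code)."""
    # Steps 2-3: merge the datasets on the shared ID column
    merged = movement_df.merge(scores_df, on="ID")
    # Step 4: everything except identifiers and targets is a feature
    feature_cols = [c for c in merged.columns
                    if c not in ("ID", "WeakestLink", "EstimatedScore")]
    X, y = merged[feature_cols], merged["WeakestLink"]
    # Step 5: stratified 80/20 split with a fixed seed for reproducibility
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    # Step 6: scaler fitted on training data only, applied to both splits
    scaler = StandardScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te
```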
Feature Handling
- A4 Random Forest model was trained WITH duplicate NASM columns
- Other models (A5, A5b, A6) were trained WITHOUT duplicate NASM columns
- The framework automatically filters features based on each model's expectations
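One way such automatic filtering can work is to compare each model's expected feature count (fitted scikit-learn estimators expose `n_features_in_`) against the columns available, and drop the duplicate columns only when the model was trained without them. The helper below is a sketch under that assumption:

```python
import numpy as np

def align_features(model, X, feature_names, duplicate_cols):
    """Drop duplicate NASM columns when the model was trained without them.

    `duplicate_cols` lists the duplicated column names; columns of X are
    assumed to follow `feature_names`. Sketch only -- the framework's
    actual logic may differ.
    """
    expected = getattr(model, "n_features_in_", X.shape[1])
    if expected == X.shape[1]:
        return X  # model trained WITH the duplicates (e.g. A4 Random Forest)
    keep = [i for i, name in enumerate(feature_names)
            if name not in duplicate_cols]
    if len(keep) != expected:
        raise ValueError(f"cannot align: model expects {expected} features")
    return X[:, keep]
```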
Memory Tracking
Uses Python's tracemalloc module for accurate memory measurement:
- Tracks memory before and after each prediction
- Records both current and peak memory usage
Timing Precision
Uses time.perf_counter() for high-resolution timing measurements.
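The two mechanisms combine into a simple measurement loop. A minimal sketch of how each prediction might be timed and memory-tracked (the framework's actual loop may differ):

```python
import time
import tracemalloc

def measure_prediction(predict, x, repeats=10):
    """Time a prediction with perf_counter and track memory with tracemalloc.

    Returns (timings in seconds, (current_bytes, peak_bytes) per repeat).
    """
    timings, memory = [], []
    for _ in range(repeats):
        tracemalloc.start()
        t0 = time.perf_counter()
        predict(x)
        timings.append(time.perf_counter() - t0)
        memory.append(tracemalloc.get_traced_memory())  # (current, peak)
        tracemalloc.stop()
    return timings, memory
```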
Extending the Framework
Adding New Models
- Add the model path to all_classification.py:
a7_new_model = "../A7/models/new_model.pkl"
- Import it in benchmark_timing.py:
from all_classification import (
a4_rf,
a5_ensemnble,
a5b_adaboost,
a5b_bagging_tree,
a6_svm,
a7_new_model, # Add here
)
- Add it to the models_to_benchmark list in run_benchmark():
models_to_benchmark = [
('A4 Random Forest', a4_rf),
('A5 Ensemble', a5_ensemnble),
('A5b Adaboost', a5b_adaboost),
('A5b Bagging Trees', a5b_bagging_tree),
('A6 SVM', a6_svm),
('A7 New Model', a7_new_model), # Add here
]
Adding New Metrics
Extend the ModelMetrics dataclass:
@dataclass
class ModelMetrics:
# ... existing fields ...
new_metric: float = 0.0
Then calculate it in benchmark_single_model().
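For illustration, here is a trimmed, runnable version of that pattern. Field names are taken from the JSON schema above except `throughput_per_s`, which is a hypothetical new metric; the real dataclass has many more fields:

```python
from dataclasses import dataclass, asdict

@dataclass
class ModelMetrics:
    # Trimmed selection of the existing fields:
    model_name: str = ""
    inference_time_mean: float = 0.0
    accuracy: float = 0.0
    status: str = "SUCCESS"
    # Hypothetical new metric added alongside them:
    throughput_per_s: float = 0.0

m = ModelMetrics(model_name="A4 Random Forest",
                 inference_time_mean=0.001234)
# The new metric would then be computed in benchmark_single_model():
m.throughput_per_s = 1.0 / m.inference_time_mean
```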
Troubleshooting
Model Loading Errors
If a model fails to load:
- Verify the model file exists at the specified path
- Check that all required dependencies are installed
- Ensure custom classes (like AdaBoostEnsemble) are importable
Inconsistent Results
If results vary significantly between runs:
- Increase --repeats for better averaging
- Ensure no other processes are running during benchmarking
- Check system resource utilization
Memory Issues
If running out of memory:
- Reduce the --samples count
- Close other applications
- Check available system memory
License
Part of the Data Intensive Systems project.