AIO2025M06_DEMO_SOFTMAX_REGRESSION



duongtruongbinh committed on Nov 2

Commit 2776a06 (parent: c8321d4)

Init commit


Files changed (10)

.gitignore +4 -0
README.md +66 -2
app.py +645 -0
packages.txt +2 -0
requirements.txt +5 -0
src/__init__.py +0 -0
src/logistic_regression.py +494 -0
static/aivn_logo.png +0 -0
static/vlai_logo.png +0 -0
vlai_template.py +250 -0

.gitignore ADDED

	@@ -0,0 +1,4 @@

+__pycache__/
+__MACOSX/
+.DS_Store

README.md CHANGED

@@ -4,9 +4,73 @@ emoji: 📊
 colorFrom: red
 colorTo: blue
 sdk: gradio
-sdk_version: 5.49.1
+sdk_version: 5.38.0
 app_file: app.py
+short_description: Run Logistic Regression on datasets to predict outcomes
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Logistic Regression Demo
+Interactive demonstration of Logistic Regression implemented from scratch using NumPy and gradient descent. Learn binary classification with sigmoid activation, binary cross-entropy loss, and adjustable prediction threshold.
+## Features
+- **Binary Classification**: Implements binary classification (2 classes: 0 and 1)
+- **NumPy Implementation**: Built from scratch with vectorized NumPy matrix operations
+- **Sigmoid Activation**: Maps predictions to probabilities (0-1 range)
+- **Binary Cross-Entropy Loss**: The loss minimized by gradient descent; the standard choice for binary classification
+- **Adjustable Threshold**: Experiment with different probability thresholds to balance precision/recall
+- **Mini-batch Gradient Descent**: Supports configurable batch sizes (powers of 2) or full batch
+- **Feature Normalization**: Automatic standardization (zero mean, unit variance) for stable training
+- **Training Visualization**: Track loss and accuracy over epochs for training and validation sets
+## Algorithm Details
+**Activation Function**: Sigmoid σ(z) = 1/(1 + e^(-z))
+**Loss Function**: Binary Cross-Entropy L = -[y·log(ŷ) + (1-y)·log(1-ŷ)]
+**Classification**: Predict class 1 if probability ≥ threshold, else class 0
+**Normalization**: Features standardized (zero mean, unit variance) for numerical stability
+## Sample Datasets
+1. **Breast Cancer**: Wisconsin Breast Cancer dataset (binary classification)
+2. **Wine (Binary)**: Wine dataset converted to binary (class 0 vs others)
+3. **Synthetic**: Artificially generated binary classification dataset
+## How to Use
+1. **Select Data**: Choose a sample dataset or upload your own CSV/Excel file
+2. **Configure Target**: Select target column (must have exactly 2 unique values)
+3. **Set Training Parameters**:
+   - **Epochs**: Number of full passes over the training data (recommended: 50-500)
+   - **Learning Rate**: Step size for gradient descent (recommended: 0.001-0.01)
+   - **Batch Size**: Samples per batch (powers of 2, or Full Batch)
+   - **Train/Validation Split**: Proportion for training (default: 80%)
+4. **Adjust Threshold**: Set probability threshold for classification (default: 0.5)
+5. **Enter Features**: Input feature values for prediction
+6. **Run Training**: Click "Run Training & Prediction" to train and visualize
+## Key Parameters
+**Training Parameters**:
+- **Epochs**: Complete passes through the training data. More epochs let the loss keep decreasing but increase the risk of overfitting
+- **Learning Rate**: Step size (0.001-0.01 recommended). Too high causes instability; too low converges slowly
+- **Batch Size**: Samples processed before each parameter update. Smaller = more frequent but noisier updates; larger = smoother, more stable updates
+- **Train/Validation Split**: Data split ratio (default 80/20)
+**Threshold Parameter** (Key Feature):
+- **Default**: 0.5 (balanced classification)
+- **Lower threshold** (e.g., 0.3): More class 1 predictions → higher recall, lower precision
+- **Higher threshold** (e.g., 0.7): Fewer class 1 predictions → higher precision, lower recall
+- **Experiment**: Adjust the threshold and re-run training to see how predictions and accuracy change
+- **Use Case**: Balance precision vs recall based on your classification goals
+## Requirements
+- gradio >= 5.38.0
+- pandas >= 1.5.0
+- scikit-learn >= 1.3.0
+- numpy >= 1.24.0
+- plotly >= 5.15.0
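The Algorithm Details in the README (sigmoid, binary cross-entropy, thresholding, standardization) together with the mini-batch option can be sketched in a few lines of NumPy. This is a minimal illustration, not the contents of `src/logistic_regression.py`, which this page does not show: the function names below are hypothetical, labels are assumed to already be 0/1 (the app maps other two-valued targets first), and shuffling uses a fixed seed.

```python
import numpy as np


def sigmoid(z):
    """σ(z) = 1 / (1 + e^(-z)); z is clipped so np.exp cannot overflow."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def bce_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy -[y·log(p) + (1-y)·log(1-p)], with p kept away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def standardize(X_train, X_val):
    """Zero mean, unit variance, using statistics from the training split only."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    return (X_train - mean) / std, (X_val - mean) / std


def train_logistic_regression(X, y, epochs=100, lr=0.01, batch_size=None, seed=0):
    """Mini-batch gradient descent on BCE; batch_size=None means full batch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    batch_size = n if batch_size is None else batch_size
    losses = []  # training loss recorded after each epoch
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = sigmoid(X[idx] @ w + b) - y[idx]  # dL/dz for sigmoid + BCE
            w -= lr * (X[idx].T @ err) / len(idx)
            b -= lr * err.mean()
        losses.append(bce_loss(y, sigmoid(X @ w + b)))
    return w, b, losses


def predict(X, w, b, threshold=0.5):
    """Class 1 if the predicted probability >= threshold, else class 0."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In this sketch the gradient of the BCE loss with respect to the logit simplifies to p − y, so each mini-batch update needs only one matrix product, and the standardization statistics come from the training split alone so the validation split does not leak into the scaling. Calling `predict` with a lower `threshold` can only add class-1 predictions, which is why recall never drops as the threshold decreases.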

app.py ADDED

	@@ -0,0 +1,645 @@

+import gradio as gr
+import pandas as pd
+import vlai_template
+# Import Logistic Regression core
+try:
+    from src import logistic_regression
+    LR_AVAILABLE = True
+except ImportError as e:
+    print(f"❌ Logistic Regression module failed to load: {str(e)}")
+    LR_AVAILABLE = False
+    logistic_regression = None
+vlai_template.configure(
+    project_name="Logistic Regression Demo",
+    year="2025",
+    module="06",
+    description="Interactive demonstration of Logistic Regression using NumPy and gradient descent. Learn binary classification with sigmoid activation, binary cross-entropy loss, and adjustable prediction threshold. Visualize training metrics and experiment with threshold values.",
+    colors={
+        "primary": "#1976D2",
+        "accent": "#7B1FA2",
+        "bg1": "#E3F2FD",
+        "bg2": "#BBDEFB",
+        "bg3": "#90CAF9",
+    },
+    font_family="'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif"
+)
+current_dataframe = None
+def load_sample_data_fallback(dataset_choice="Breast Cancer"):
+    """Fallback data loading function when core module is not available"""
+    from sklearn.datasets import load_breast_cancer, load_wine, make_classification
+    import pandas as pd
+    import numpy as np
+    def sklearn_to_df(data):
+        df = pd.DataFrame(data.data, columns=getattr(data, "feature_names", None))
+        if df.columns.isnull().any():
+            df.columns = [f"feature_{i}" for i in range(df.shape[1])]
+        df["target"] = data.target
+        return df
+    def wine_to_binary_df(wine_data):
+        df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)
+        df["target"] = (wine_data.target == 0).astype(int)
+        return df
+    def synthetic_classification():
+        X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
+                                   n_redundant=5, n_classes=2, random_state=42)
+        df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
+        df["target"] = y
+        return df
+    datasets = {
+        "Breast Cancer": lambda: sklearn_to_df(load_breast_cancer()),
+        "Wine (Binary)": lambda: wine_to_binary_df(load_wine()),
+        "Synthetic": lambda: synthetic_classification(),
+    }
+    if dataset_choice not in datasets:
+        raise ValueError(f"Unknown dataset: {dataset_choice}")
+    return datasets[dataset_choice]()
+def create_input_components_fallback(df, target_col):
+    """Fallback input components creation when the Logistic Regression core module is not available"""
+    feature_cols = [c for c in df.columns if c != target_col]
+    components = []
+    for col in feature_cols:
+        data = df[col]
+        if data.dtype == "object":
+            uniq = sorted(map(str, data.dropna().unique()))
+            if not uniq:
+                uniq = ["N/A"]
+            components.append(
+                {"name": col, "type": "dropdown", "choices": uniq, "value": uniq[0]}
+            )
+        else:
+            val = pd.to_numeric(data, errors="coerce").dropna().mean()
+            val = 0.0 if pd.isna(val) else float(val)
+            components.append(
+                {
+                    "name": col,
+                    "type": "number",
+                    "value": round(val, 3),
+                    "minimum": None,
+                    "maximum": None,
+                }
+            )
+    return components
+SAMPLE_DATA_CONFIG = {
+    "Breast Cancer": {"target_column": "target", "problem_type": "classification"},
+    "Wine (Binary)": {"target_column": "target", "problem_type": "classification"},
+    "Synthetic": {"target_column": "target", "problem_type": "classification"},
+}
+force_light_theme_js = """
+() => {
+  const params = new URLSearchParams(window.location.search);
+  if (!params.has('__theme')) {
+    params.set('__theme', 'light');
+    window.location.search = params.toString();
+  }
+}
+"""
+def validate_config(df, target_col):
+    if not target_col or target_col not in df.columns:
+        return False, "❌ Please select a valid target column from the dropdown.", None
+    target_series = df[target_col]
+    unique_vals = target_series.nunique()
+    # For logistic regression, we only support binary classification (2 classes)
+    problem_type = "classification"
+    if target_series.isnull().any():
+        return False, "⚠️ Target column has missing values. Please clean your data.", None
+    if target_series.dtype == "object":
+        return False, "⚠️ Target must be numeric for classification. Please select a numeric column.", None
+    if unique_vals != 2:
+        return False, f"⚠️ Target must have exactly 2 unique values for binary classification. Found {unique_vals} unique values.", None
+    # Check if values are 0 and 1
+    unique_values = sorted(target_series.unique())
+    if set(unique_values) != {0, 1}:
+        return True, f"\n✅ Configuration is valid! Target will be mapped to binary (0/1). Original values: {unique_values}", problem_type
+    return True, f"\n✅ Configuration is valid! Ready for binary classification with values {unique_values}.", problem_type
+def get_status_message(is_sample, dataset_choice, target_col, problem_type, is_valid, validation_msg):
+    if is_sample:
+        return f"✅ **Selected Dataset**: {dataset_choice} | **Target**: {target_col} | **Type**: {problem_type.title()}"
+    elif target_col and problem_type:
+        status_icon = "✅" if is_valid else "⚠️"
+        return f"{status_icon} **Custom Data** | **Target**: {target_col} | **Type**: {problem_type.title()} | {validation_msg}"
+    else:
+        return "📁 **Custom data uploaded!** 👆 Please select target column above to continue."
+def load_and_configure_data_simple(dataset_choice="Breast Cancer"):
+    global current_dataframe
+    try:
+        if not LR_AVAILABLE:
+            # Fallback data loading without core module
+            df = load_sample_data_fallback(dataset_choice)
+        else:
+            df = logistic_regression.load_data(None, dataset_choice)
+        current_dataframe = df
+        target_options = df.columns.tolist()
+        cfg = SAMPLE_DATA_CONFIG.get(dataset_choice, {})
+        target_col = cfg.get("target_column")
+        problem_type = cfg.get("problem_type")
+        if target_col and target_col in target_options:
+            is_valid, validation_msg, detected = validate_config(df, target_col)
+            if detected:
+                problem_type = detected
+            status_msg = get_status_message(True, dataset_choice, target_col, problem_type, is_valid, validation_msg)
+        else:
+            # If target_col not in options, use first column as fallback
+            target_col = target_options[0] if target_options else None
+            status_msg = get_status_message(True, dataset_choice, target_col, problem_type, False, "")
+        return [df.head(5).round(2), gr.Dropdown(choices=target_options, value=target_col), status_msg]
+    except Exception as e:
+        current_dataframe = None
+        return [pd.DataFrame(), gr.Dropdown(choices=[], value=None), f"❌ **Error loading data**: {str(e)} | Please try a different dataset."]
+def load_and_configure_data(file_obj=None, dataset_choice="Breast Cancer"):
+    global current_dataframe
+    try:
+        if not LR_AVAILABLE:
+            # Fallback data loading without core module
+            if file_obj is not None:
+                # Handle file upload fallback
+                if file_obj.name.endswith(".csv"):
+                    df = pd.read_csv(file_obj.name)
+                elif file_obj.name.endswith((".xlsx", ".xls")):
+                    df = pd.read_excel(file_obj.name)
+                else:
+                    raise ValueError("Unsupported format. Upload CSV or Excel files.")
+            else:
+                df = load_sample_data_fallback(dataset_choice)
+        else:
+            df = logistic_regression.load_data(file_obj, dataset_choice)
+        current_dataframe = df
+        target_options = df.columns.tolist()
+        is_sample = file_obj is None
+        if is_sample:
+            cfg = SAMPLE_DATA_CONFIG.get(dataset_choice, {})
+            target_col = cfg.get("target_column")
+            problem_type = cfg.get("problem_type")
+        else:
+            target_col, problem_type = None, None
+        if target_col:
+            is_valid, validation_msg, detected = validate_config(df, target_col)
+            if detected:
+                problem_type = detected
+            status_msg = get_status_message(is_sample, dataset_choice, target_col, problem_type, is_valid, validation_msg)
+        else:
+            status_msg = get_status_message(is_sample, dataset_choice, target_col, problem_type, False, "")
+        input_updates = [gr.update(visible=False)] * 40
+        inputs_visible = gr.update(visible=False)
+        input_status = "⚙️ Configure target column above to enable feature inputs."
+        if target_col and problem_type and (not is_sample or is_valid):
+            try:
+                if LR_AVAILABLE:
+                    components_info = logistic_regression.create_input_components(df, target_col)
+                else:
+                    components_info = create_input_components_fallback(df, target_col)
+                for i in range(min(20, len(components_info))):
+                    comp = components_info[i]
+                    number_idx, dropdown_idx = i * 2, i * 2 + 1
+                    if comp["type"] == "number":
+                        upd = {"visible": True, "label": comp["name"], "value": comp["value"]}
+                        if comp["minimum"] is not None:
+                            upd["minimum"] = comp["minimum"]
+                        if comp["maximum"] is not None:
+                            upd["maximum"] = comp["maximum"]
+                        input_updates[number_idx] = gr.update(**upd)
+                        input_updates[dropdown_idx] = gr.update(visible=False)
+                    else:
+                        input_updates[number_idx] = gr.update(visible=False)
+                        input_updates[dropdown_idx] = gr.update(
+                            visible=True, label=comp["name"], choices=comp["choices"], value=comp["value"]
+                        )
+                inputs_visible = gr.update(visible=True)
+                input_status = f"📝 **Ready!** Enter values for {len(components_info)} features below, then click **Run Training & Prediction**. | {validation_msg}"
+            except Exception as e:
+                input_status = f"❌ Error generating inputs: {str(e)}"
+        return [df.head(5).round(2), gr.Dropdown(choices=target_options, value=target_col), status_msg] + input_updates + [inputs_visible, input_status]
+    except Exception as e:
+        current_dataframe = None
+        empty = [pd.DataFrame(), gr.Dropdown(choices=[], value=None), f"❌ **Error loading data**: {str(e)} | Please try a different file or dataset."]
+        return empty + [gr.update(visible=False)] * 40 + [gr.update(visible=False), "No data loaded."]
+def update_learning_rate_display(lr_power):
+    """Update the display to show what the current learning rate slider value represents"""
+    # Map slider value to actual learning rate
+    lr_values = [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0]
+    lr_labels = ["1e-6", "1e-5", "1e-4", "1e-3", "1e-2", "1e-1", "1"]
+    idx = int(lr_power)
+    if 0 <= idx < len(lr_values):
+        return f"**Current Learning Rate:** {lr_values[idx]} ({lr_labels[idx]})"
+    else:
+        return "**Current Learning Rate:** N/A"
+def update_batch_size_display(batch_size_power, train_split):
+    """Update the display to show what the current batch size slider value represents"""
+    global current_dataframe
+    df = current_dataframe
+    if df is None or df.empty:
+        return "**Current Batch Size:** N/A"
+    # Calculate training set size
+    train_size = int(len(df) * train_split)
+    # Determine max power of 2 that fits in training size
+    import math
+    max_power = int(math.log2(train_size)) if train_size > 0 else 0
+    # Convert slider value to batch size
+    if batch_size_power >= max_power + 1:
+        return f"**Current Batch Size:** Full Batch ({train_size} samples)"
+    else:
+        actual_batch_size = 2 ** int(batch_size_power)
+        return f"**Current Batch Size:** {actual_batch_size} samples (2^{int(batch_size_power)})"
+def update_batch_size_slider(df_preview, target_col, train_split):
+    """Update batch size slider max based on training data size"""
+    global current_dataframe
+    df = current_dataframe
+    if df is None or df.empty:
+        return gr.update(maximum=10, value=10)
+    # Calculate training set size
+    train_size = int(len(df) * train_split)
+    # Determine max power of 2 that fits in training size
+    import math
+    max_power = int(math.log2(train_size)) if train_size > 0 else 0
+    # Slider goes from 0 to max_power+1 (where max_power+1 = Full Batch)
+    new_max = max_power + 1
+    # Set value to Full Batch by default
+    return gr.update(maximum=new_max, value=new_max)
+def update_configuration(df_preview, target_col):
+    global current_dataframe
+    df = current_dataframe
+    if df is None or df.empty:
+        return [gr.update(visible=False)] * 40 + [gr.update(visible=False), "No data available.", "No data available."]
+    if not target_col:
+        return [gr.update(visible=False)] * 40 + [gr.update(visible=False), "Select target column.", "Select target column."]
+    try:
+        is_valid, validation_msg, problem_type = validate_config(df, target_col)
+        if not is_valid:
+            return [gr.update(visible=False)] * 40 + [gr.update(visible=False), f"⚠️ {validation_msg}", f"⚠️ {validation_msg}"]
+        if LR_AVAILABLE:
+            components_info = logistic_regression.create_input_components(df, target_col)
+        else:
+            components_info = create_input_components_fallback(df, target_col)
+        input_updates = [gr.update(visible=False)] * 40
+        for i in range(min(20, len(components_info))):
+            comp = components_info[i]
+            number_idx, dropdown_idx = i * 2, i * 2 + 1
+            if comp["type"] == "number":
+                upd = {"visible": True, "label": comp["name"], "value": comp["value"]}
+                if comp["minimum"] is not None:
+                    upd["minimum"] = comp["minimum"]
+                if comp["maximum"] is not None:
+                    upd["maximum"] = comp["maximum"]
+                input_updates[number_idx] = gr.update(**upd)
+                input_updates[dropdown_idx] = gr.update(visible=False)
+            else:
+                input_updates[number_idx] = gr.update(visible=False)
+                input_updates[dropdown_idx] = gr.update(
+                    visible=True, label=comp["name"], choices=comp["choices"], value=comp["value"]
+                )
+        input_status = f"📝 Enter values for {len(components_info)} features | {validation_msg}"
+        status_msg = f"✅ **Selected Dataset**: Custom Data | **Target**: {target_col} | **Type**: {problem_type.title()}"
+        return input_updates + [gr.update(visible=True), input_status, status_msg]
+    except Exception as e:
+        return [gr.update(visible=False)] * 40 + [gr.update(visible=False), f"❌ Error: {str(e)}", f"❌ Error: {str(e)}"]
+# Logistic Regression prediction function
+def execute_prediction(df_preview, target_col, epochs, learning_rate_power, batch_size_power, train_test_split_ratio, threshold, *input_values):
+    global current_dataframe
+    df = current_dataframe
+    EMPTY_PLOT = None
+    EMPTY_HTML = ""
+    error_style = "<div style='background:#FFEBEE;border-left:6px solid #C62828;padding:14px 16px;border-radius:10px;'><strong>📊 Logistic Regression</strong><br><br>{}</div>"
+    # Check if Logistic Regression core is available
+    if not LR_AVAILABLE:
+        return (EMPTY_PLOT, EMPTY_PLOT, error_style.format("❌ Logistic Regression module is not available!<br><br>Please check the installation."))
+    if df is None or df.empty:
+        return (EMPTY_PLOT, EMPTY_PLOT, error_style.format("No data available."))
+    if not target_col:
+        return (EMPTY_PLOT, EMPTY_PLOT, error_style.format("Configuration incomplete."))
+    is_valid, validation_msg, problem_type = validate_config(df, target_col)
+    if not is_valid:
+        return (EMPTY_PLOT, EMPTY_PLOT, error_style.format(f"Configuration issue: {validation_msg}"))
+    try:
+        if LR_AVAILABLE:
+            components_info = logistic_regression.create_input_components(df, target_col)
+        else:
+            components_info = create_input_components_fallback(df, target_col)
+        new_point_dict = {}
+        for i, comp in enumerate(components_info):
+            input_idx = i * 2 if comp["type"] == "number" else i * 2 + 1  # number slots are even, dropdown slots odd
+            v = input_values[input_idx] if input_idx < len(input_values) and input_values[input_idx] is not None else comp["value"]
+            new_point_dict[comp["name"]] = v
+        # Convert learning rate slider value to actual learning rate
+        lr_values = [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0]
+        idx = int(learning_rate_power)
+        if 0 <= idx < len(lr_values):
+            lr_float = lr_values[idx]
+        else:
+            lr_float = 0.01  # Default fallback
+        # Convert batch_size_power to actual batch size string
+        train_size = int(len(df) * train_test_split_ratio)
+        import math
+        max_power = int(math.log2(train_size)) if train_size > 0 else 0
+        if batch_size_power >= max_power + 1:
+            batch_size_str = "Full Batch"
+        else:
+            actual_batch_size = 2 ** int(batch_size_power)
+            batch_size_str = str(actual_batch_size)
+        train_loss_fig, val_loss_fig, results_display, prediction = logistic_regression.run_logistic_regression_and_visualize(
+            df, target_col, new_point_dict, epochs, lr_float, batch_size_str, train_test_split_ratio, threshold
+        )
+        return (train_loss_fig, val_loss_fig, results_display)
+    except Exception as e:
+        print(f"Execution error: {str(e)}")  # For debugging
+        import traceback
+        traceback.print_exc()
+        return (EMPTY_PLOT, EMPTY_PLOT, error_style.format(f"Execution error: {str(e)}"))
+# No tree visualization needed for logistic regression
+with gr.Blocks(theme="gstaff/sketch", css=vlai_template.custom_css, fill_width=True, js=force_light_theme_js) as demo:
+    vlai_template.create_header()
+    gr.HTML(vlai_template.render_info_card(
+        icon="📊",
+        title="About this Logistic Regression Demo",
+        description="Interactive demonstration of Logistic Regression using NumPy and gradient descent. Learn binary classification with sigmoid activation, binary cross-entropy loss, and adjustable prediction threshold. Visualize training metrics and experiment with different threshold values."
+    ))
+    gr.Markdown("### 📊 **How to Use**: Select binary classification data → Configure target (must have 2 classes) → Set training parameters → Adjust threshold → Enter feature values → Run training!")
+    with gr.Row(equal_height=False, variant="panel"):
+        with gr.Column(scale=45):
+            with gr.Accordion("📊 Data & Configuration", open=True):
+                with gr.Row():
+                    with gr.Column(scale=1):
+                        gr.Markdown("Start with sample datasets or upload your own CSV/Excel files.")
+                        file_upload = gr.File(label="📁 Upload Your Data", file_types=[".csv", ".xlsx", ".xls"])
+                    with gr.Column(scale=3):
+                        sample_dataset = gr.Dropdown(choices=list(SAMPLE_DATA_CONFIG.keys()), value="Breast Cancer", label="🗂️ Sample Datasets")
+                with gr.Row():
+                    target_column = gr.Dropdown(choices=[], label="🎯 Target Column", interactive=True)
+                status_message = gr.Markdown("🔄 Loading sample data...")
+                data_preview = gr.DataFrame(label="📋 Data Preview (First 5 Rows)", row_count=5, interactive=False, max_height=250)
+            with gr.Accordion("📊 Training Parameters & Input", open=True):
+                gr.Markdown("**📊 Logistic Regression Parameters**")
+                with gr.Row():
+                    epochs = gr.Number(
+                        label="Number of Epochs",
+                        value=100, minimum=1, maximum=1000, precision=0,
+                        info="Number of training iterations"
+                    )
+                    learning_rate_slider = gr.Slider(
+                        label="Learning Rate (Power of 10)",
+                        value=4, minimum=0, maximum=6, step=1,
+                        info="0=1e-6, 1=1e-5, 2=1e-4, 3=1e-3, 4=1e-2, 5=1e-1, 6=1"
+                    )
+                    learning_rate_display = gr.Markdown("**Current Learning Rate:** 0.01")
+                    batch_size_slider = gr.Slider(
+                        label="Batch Size (Power of 2)",
+                        value=10, minimum=0, maximum=10, step=1,
+                        info="Slide to select: 0=1, 1=2, 2=4, 3=8, ... Max=Full Batch"
+                    )
+                    batch_size_display = gr.Markdown("**Current Batch Size:** Full Batch")
+                gr.Markdown("**📊 Data Split Configuration**")
+                with gr.Row():
+                    train_test_split_ratio = gr.Slider(
+                        label="Train/Validation Split Ratio",
+                        value=0.8, minimum=0.6, maximum=0.9, step=0.05,
+                        info="Proportion of data used for training (e.g., 0.8 = 80% train, 20% validation)"
+                    )
+                gr.Markdown("**🎯 Prediction Threshold Configuration**")
+                with gr.Row():
+                    threshold = gr.Slider(
+                        label="Classification Threshold",
+                        value=0.5, minimum=0.0, maximum=1.0, step=0.01,
+                        info="Probability threshold for binary classification. Predict class 1 if probability ≥ threshold, else class 0. Adjust to balance precision/recall."
+                    )
+                    threshold_display = gr.Markdown("**Current Threshold:** 0.50")
+                inputs_group = gr.Group(visible=False)
+                with inputs_group:
+                    input_status = gr.Markdown("Configure inputs above.")
+                    gr.Markdown("**📝 New Data Point** - Enter feature values for prediction:")
+                    input_components = []
+                    for row in range(5):
+                        with gr.Row():
+                            for col in range(4):
+                                idx = row * 4 + col
+                                if idx < 20:
+                                    number_comp = gr.Number(label=f"Feature {idx+1}", visible=False)
+                                    dropdown_comp = gr.Dropdown(label=f"Feature {idx+1}", visible=False)
+                                    input_components.extend([number_comp, dropdown_comp])
+                run_prediction_btn = gr.Button("📊 Run Training & Prediction", variant="primary", size="lg")
+        with gr.Column(scale=55):
+            gr.Markdown("### 📊 **Logistic Regression Results & Visualization**")
+            train_loss_chart = gr.Plot(label="Training Loss & Accuracy Over Epochs", visible=True)
+            val_loss_chart = gr.Plot(label="Validation Loss & Accuracy Over Epochs", visible=True)
+            results_display = gr.HTML("<strong>📊 Logistic Regression Results</strong><br><br>Training details will appear here showing model performance, learned parameters, and predictions with the current threshold.", label="📊 Results & Predictions")
+    gr.Markdown("""📊 **Logistic Regression Guide**:
+**📈 Training Metrics**:
+- **Loss (BCE)**: Binary Cross-Entropy loss decreases as model learns. Lower loss indicates better fit.
+- **Accuracy**: Classification accuracy improves during training. Monitor both training and validation accuracy.
+**🔧 Training Parameters**:
+- **Epochs**: Number of complete passes through training data. More epochs let the loss keep decreasing, but watch for overfitting.
+- **Learning Rate**: Step size for gradient descent. Recommended: 0.001 to 0.01. Too high may cause instability.
+- **Batch Size**: Samples processed before updating parameters. Powers of 2: 1, 2, 4, 8... or Full Batch. Smaller = faster updates but noisier. Larger = more stable.
+- **Train/Validation Split**: Proportion of data for training vs validation. Default 80/20 split.
+**🎯 Threshold Parameter**:
+- **Threshold**: Probability cutoff for binary classification. If predicted probability ≥ threshold → class 1, else → class 0.
+- **Default**: 0.5 (balanced)
+- **Lower threshold** (e.g., 0.3): More predictions of class 1 → higher recall, lower precision
+- **Higher threshold** (e.g., 0.7): Fewer predictions of class 1 → higher precision, lower recall
+- **Experiment**: Adjust the threshold and re-run training to see how predictions and accuracy change!
+**🧮 Algorithm Details**:
+- **Sigmoid Activation**: Maps linear output to probability (0-1 range)
+- **Binary Cross-Entropy Loss**: The loss minimized by gradient descent; the standard choice for binary classification
+- **Feature Normalization**: Automatic standardization (zero mean, unit variance) for stable training
+**💡 Tips**:
+- Start with default parameters (100 epochs, learning rate 0.01, threshold 0.5)
+- Monitor validation metrics to detect overfitting
+- Adjust threshold based on your classification goals (precision vs recall)
+- Use batch size = Full Batch for most stable training
+""")
+    vlai_template.create_footer()
+    load_evt = demo.load(
+        fn=lambda: load_and_configure_data(None, "Breast Cancer"),
+        outputs=[data_preview, target_column, status_message] + input_components + [inputs_group, input_status],
+    ).then(
+        fn=update_batch_size_slider,
+        inputs=[data_preview, target_column, train_test_split_ratio],
+        outputs=[batch_size_slider],
+    ).then(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    ).then(
+        fn=update_learning_rate_display,
+        inputs=[learning_rate_slider],
+        outputs=[learning_rate_display],
+    )
+    upload_evt = file_upload.upload(
+        fn=lambda file: load_and_configure_data(file, "Breast Cancer"),
+        inputs=[file_upload],
+        outputs=[data_preview, target_column, status_message] + input_components + [inputs_group, input_status],
+    ).then(
+        fn=update_batch_size_slider,
+        inputs=[data_preview, target_column, train_test_split_ratio],
+        outputs=[batch_size_slider],
+    ).then(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    )
+    sample_dataset.change(
+        fn=lambda choice: load_and_configure_data_simple(choice),
+        inputs=[sample_dataset],
+        outputs=[data_preview, target_column, status_message],
+    ).then(
+        fn=update_configuration, inputs=[data_preview, target_column],
+        outputs=input_components + [inputs_group, input_status, status_message],
+    ).then(
+        fn=update_batch_size_slider,
+        inputs=[data_preview, target_column, train_test_split_ratio],
+        outputs=[batch_size_slider],
+    ).then(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    )
+    target_column.change(
+        fn=update_configuration, inputs=[data_preview, target_column],
+        outputs=input_components + [inputs_group, input_status, status_message],
+    ).then(
+        fn=update_batch_size_slider,
+        inputs=[data_preview, target_column, train_test_split_ratio],
+        outputs=[batch_size_slider],
+    ).then(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    )
+    # Update batch size display when slider or train/test split changes
+    batch_size_slider.change(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    )
+    train_test_split_ratio.change(
+        fn=update_batch_size_slider,
+        inputs=[data_preview, target_column, train_test_split_ratio],
+        outputs=[batch_size_slider],
+    ).then(
+        fn=update_batch_size_display,
+        inputs=[batch_size_slider, train_test_split_ratio],
+        outputs=[batch_size_display],
+    )
+    # Update learning rate display when slider changes
+    learning_rate_slider.change(
+        fn=update_learning_rate_display,
+        inputs=[learning_rate_slider],
+        outputs=[learning_rate_display],
+    )
+    threshold.change(
+        fn=lambda t: f"**Current Threshold:** {t:.2f}",
+        inputs=[threshold],
+        outputs=[threshold_display],
+    )
+    run_prediction_btn.click(
+        fn=execute_prediction,
+        inputs=[data_preview, target_column, epochs, learning_rate_slider, batch_size_slider, train_test_split_ratio, threshold] + input_components,
+        outputs=[train_loss_chart, val_loss_chart, results_display],
+    )
+if __name__ == "__main__":
+    demo.launch(allowed_paths=["static/aivn_logo.png", "static/vlai_logo.png", "static"])
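`update_batch_size_slider` receives the data, the target column, and the split ratio, but its body is not part of this hunk. The README describes the batch-size choices as powers of 2 or full batch, and `run_logistic_regression_and_visualize` treats `"Full Batch"` as a sentinel and otherwise calls `int()` on the value. The sketch below shows one way to derive those choices. It assumes a hypothetical helper `batch_size_options`, and it assumes the training-set size follows scikit-learn's rounding for a float split (`n_test = ceil(test_size * n)`).

```python
import math


def batch_size_options(n_samples: int, train_ratio: float) -> list[str]:
    """Hypothetical helper: power-of-2 batch sizes smaller than the training set.

    Assumes sklearn-style sizing for a float test split:
    n_test = ceil(test_size * n_samples), n_train = n_samples - n_test.
    """
    test_size = 1.0 - train_ratio
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    # "Full Batch" is the sentinel checked by run_logistic_regression_and_visualize
    options = ["Full Batch"]
    size = 1
    while size < n_train:
        options.append(str(size))
        size *= 2
    return options


# Breast Cancer has 569 rows; an 80/20 split leaves 455 training rows.
print(batch_size_options(569, 0.8))
```

In this sketch the list depends on the split ratio, which matches the wiring above: `train_test_split_ratio.change` is chained to `update_batch_size_slider`, so a smaller training set drops the largest options.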

packages.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+ graphviz
+ fonts-liberation

requirements.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+gradio>=5.38.0
+pandas>=1.5.0
+scikit-learn>=1.3.0
+numpy>=1.24.0
+plotly>=5.15.0

src/__init__.py ADDED Viewed

File without changes

src/logistic_regression.py ADDED Viewed

	@@ -0,0 +1,494 @@

+import pandas as pd
+import numpy as np
+from sklearn.datasets import load_breast_cancer, load_wine, make_classification
+from sklearn.model_selection import train_test_split
+from plotly.subplots import make_subplots
+import plotly.graph_objects as go
+import time
+_current_model_params = None
+def _get_current_model():
+    return _current_model_params
+def _set_current_model(params):
+    global _current_model_params
+    _current_model_params = params
+def load_data(file_obj=None, dataset_choice="Breast Cancer"):
+    """Load binary classification datasets"""
+    if file_obj is not None:
+        if file_obj.name.endswith(".csv"):
+            encodings = ["utf-8", "latin-1", "iso-8859-1", "cp1252"]
+            for encoding in encodings:
+                try:
+                    return pd.read_csv(file_obj.name, encoding=encoding)
+                except UnicodeDecodeError:
+                    continue
+            return pd.read_csv(file_obj.name, encoding="utf-8", encoding_errors="replace")
+        elif file_obj.name.endswith((".xlsx", ".xls")):
+            return pd.read_excel(file_obj.name)
+        else:
+            raise ValueError("Unsupported format. Upload CSV or Excel files.")
+    datasets = {
+        "Breast Cancer": lambda: _sklearn_to_df(load_breast_cancer()),
+        "Wine (Binary)": lambda: _wine_to_binary_df(load_wine()),
+        "Synthetic": lambda: _synthetic_classification(),
+    }
+    if dataset_choice not in datasets:
+        raise ValueError(f"Unknown dataset: {dataset_choice}")
+    return datasets[dataset_choice]()
+def _sklearn_to_df(data):
+    """Convert sklearn dataset to DataFrame"""
+    df = pd.DataFrame(data.data, columns=getattr(data, "feature_names", None))
+    if df.columns.isnull().any():
+        df.columns = [f"feature_{i}" for i in range(df.shape[1])]
+    df["target"] = data.target
+    return df
+def _wine_to_binary_df(wine_data):
+    """Convert wine dataset to binary classification (class 0 vs others)"""
+    df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)
+    df["target"] = (wine_data.target == 0).astype(int)
+    return df
+def _synthetic_classification():
+    """Generate synthetic binary classification dataset"""
+    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
+                               n_redundant=5, n_classes=2, random_state=42)
+    df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
+    df["target"] = y
+    return df
+def create_input_components(df, target_col):
+    """Create input components for feature values"""
+    feature_cols = [c for c in df.columns if c != target_col]
+    components = []
+    for col in feature_cols:
+        data = df[col]
+        val = pd.to_numeric(data, errors="coerce").dropna().mean()
+        val = 0.0 if pd.isna(val) else float(val)
+        components.append(
+            {
+                "name": col,
+                "type": "number",
+                "value": round(val, 3),
+                "minimum": None,
+                "maximum": None,
+            }
+        )
+    return components
+def preprocess_data(df, target_col, new_point_dict):
+    """Preprocess data for logistic regression"""
+    feature_cols = [c for c in df.columns if c != target_col]
+    X = df[feature_cols].copy()
+    y = df[target_col].copy()
+    # Convert to numeric
+    for col in feature_cols:
+        X[col] = pd.to_numeric(X[col], errors="coerce").fillna(0.0)
+    # Ensure binary target (0 or 1)
+    unique_vals = sorted(y.unique())
+    if len(unique_vals) != 2:
+        raise ValueError(f"Target must be binary (0/1). Found {len(unique_vals)} unique values: {unique_vals}")
+    # Map to 0/1 if needed
+    y_mapped = y.copy()
+    if set(unique_vals) != {0, 1}:
+        mapping = {unique_vals[0]: 0, unique_vals[1]: 1}
+        y_mapped = y.map(mapping)
+    # Prepare new point
+    new_point = []
+    for col in feature_cols:
+        if col in new_point_dict:
+            try:
+                new_point.append(float(new_point_dict[col]))
+            except Exception:
+                new_point.append(0.0)
+        else:
+            new_point.append(0.0)
+    new_point = np.array(new_point, dtype=float).reshape(1, -1)
+    return X.values, np.array(y_mapped, dtype=int), new_point, feature_cols
+def add_bias(X):
+    """Add bias column to feature matrix"""
+    return np.c_[np.ones(X.shape[0]), X]
+def sigmoid(z):
+    """Sigmoid activation function: σ(z) = 1 / (1 + exp(-z))"""
+    z = np.clip(z, -500, 500)
+    return 1 / (1 + np.exp(-z))
+def predict_proba(X, theta):
+    """Make probability predictions: y_hat = sigmoid(X @ theta)"""
+    z = X.dot(theta)
+    return sigmoid(z)
+def predict_class(X, theta, threshold=0.5):
+    """Make binary class predictions using threshold"""
+    proba = predict_proba(X, theta)
+    return (proba >= threshold).astype(int)
+def compute_loss(y_hat, y):
+    """Compute Binary Cross-Entropy loss: -[y*log(ŷ) + (1-y)*log(1-ŷ)]"""
+    eps = 1e-15
+    y_hat = np.clip(y_hat, eps, 1 - eps)
+    loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
+    return np.mean(loss)
+def compute_gradient(y_hat, y, X):
+    """Compute gradient: X.T @ (y_hat - y) / N"""
+    N = len(y)
+    return X.T.dot(y_hat - y) / N
+def update_theta(theta, gradient, lr):
+    """Update parameters using gradient descent"""
+    return theta - lr * gradient
+def compute_accuracy(y_true, y_pred):
+    """Compute classification accuracy"""
+    return np.mean(y_true == y_pred)
+def normalize_features(X_train, X_val=None, X_test=None):
+    """Normalize features using standardization (zero mean, unit variance)"""
+    mean = np.mean(X_train, axis=0)
+    std = np.std(X_train, axis=0)
+    std[std == 0] = 1
+    X_train_norm = (X_train - mean) / std
+    X_val_norm = (X_val - mean) / std if X_val is not None else None
+    X_test_norm = (X_test - mean) / std if X_test is not None else None
+    return X_train_norm, X_val_norm, X_test_norm, mean, std
+def train_logistic_regression_with_validation(X_train, y_train, X_val, y_val, epochs, learning_rate, batch_size=None):
+    """
+    Train logistic regression with mini-batch gradient descent
+    Returns:
+        theta, train_losses, val_losses, train_accuracies, val_accuracies, X_mean, X_std
+    """
+    X_train_norm, X_val_norm, _, X_mean, X_std = normalize_features(X_train, X_val)
+    X_train_bias = add_bias(X_train_norm)
+    X_val_bias = add_bias(X_val_norm)
+    np.random.seed(42)
+    theta = np.random.randn(X_train_bias.shape[1]) * 0.01
+    train_losses = []
+    val_losses = []
+    train_accuracies = []
+    val_accuracies = []
+    n_samples = X_train_bias.shape[0]
+    if batch_size is None or batch_size >= n_samples:
+        actual_batch_size = n_samples
+    else:
+        actual_batch_size = batch_size
+    for epoch in range(epochs):
+        if actual_batch_size < n_samples:
+            indices = np.random.permutation(n_samples)
+            X_train_shuffled = X_train_bias[indices]
+            y_train_shuffled = y_train[indices]
+        else:
+            X_train_shuffled = X_train_bias
+            y_train_shuffled = y_train
+        for i in range(0, n_samples, actual_batch_size):
+            X_batch = X_train_shuffled[i:i+actual_batch_size]
+            y_batch = y_train_shuffled[i:i+actual_batch_size]
+            y_batch_hat = predict_proba(X_batch, theta)
+            gradient = compute_gradient(y_batch_hat, y_batch, X_batch)
+            theta = update_theta(theta, gradient, learning_rate)
+        y_train_hat = predict_proba(X_train_bias, theta)
+        train_loss = compute_loss(y_train_hat, y_train)
+        train_losses.append(train_loss)
+        y_train_pred = predict_class(X_train_bias, theta)
+        train_acc = compute_accuracy(y_train, y_train_pred)
+        train_accuracies.append(train_acc)
+        y_val_hat = predict_proba(X_val_bias, theta)
+        val_loss = compute_loss(y_val_hat, y_val)
+        val_losses.append(val_loss)
+        y_val_pred = predict_class(X_val_bias, theta)
+        val_acc = compute_accuracy(y_val, y_val_pred)
+        val_accuracies.append(val_acc)
+    return theta, train_losses, val_losses, train_accuracies, val_accuracies, X_mean, X_std
+def run_logistic_regression_and_visualize(df, target_col, new_point_dict,
+                                        epochs, learning_rate, batch_size_str="Full Batch",
+                                        train_test_split_ratio=0.8, threshold=0.5):
+    """Run logistic regression training and generate visualizations"""
+    X, y, new_point, feature_cols = preprocess_data(df, target_col, new_point_dict)
+    if epochs < 1:
+        return None, None, "Number of epochs must be ≥ 1.", None
+    if learning_rate <= 0:
+        return None, None, "Learning rate must be > 0.", None
+    test_size = 1.0 - train_test_split_ratio
+    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=test_size, random_state=42, stratify=y)
+    if batch_size_str == "Full Batch":
+        batch_size = None
+    else:
+        batch_size = int(batch_size_str)
+    start_time = time.time()
+    theta, train_losses, val_losses, train_accuracies, val_accuracies, X_mean, X_std = train_logistic_regression_with_validation(
+        X_train, y_train, X_val, y_val, epochs, learning_rate, batch_size
+    )
+    training_time = time.time() - start_time
+    _set_current_model({
+        "theta": theta,
+        "feature_cols": feature_cols,
+        "X_mean": X_mean,
+        "X_std": X_std
+    })
+    # Reuse the training-set statistics to normalize data for thresholded predictions
+    X_train_bias = add_bias((X_train - X_mean) / X_std)
+    X_val_bias = add_bias((X_val - X_mean) / X_std)
+    # Make prediction with threshold
+    new_point_norm = (new_point - X_mean) / X_std
+    new_point_bias = add_bias(new_point_norm)
+    prediction_proba = predict_proba(new_point_bias, theta)[0]
+    prediction_class = predict_class(new_point_bias, theta, threshold)[0]
+    # Compute metrics with threshold
+    y_train_pred_thresh = predict_class(X_train_bias, theta, threshold)
+    y_val_pred_thresh = predict_class(X_val_bias, theta, threshold)
+    train_acc_thresh = compute_accuracy(y_train, y_train_pred_thresh)
+    val_acc_thresh = compute_accuracy(y_val, y_val_pred_thresh)
+    final_train_loss = train_losses[-1]
+    final_val_loss = val_losses[-1]
+    final_train_acc = train_accuracies[-1]
+    final_val_acc = val_accuracies[-1]
+    train_loss_fig = create_training_loss_chart(train_losses, train_accuracies)
+    val_loss_fig = create_validation_loss_chart(val_losses, val_accuracies)
+    results_display = create_results_display(
+        theta, prediction_proba, prediction_class, feature_cols, epochs, learning_rate, threshold,
+        split_info={
+            "train_size": len(X_train),
+            "val_size": len(X_val),
+            "train_ratio": train_test_split_ratio,
+            "val_ratio": 1.0 - train_test_split_ratio,
+            "train_loss": final_train_loss,
+            "val_loss": final_val_loss,
+            "train_acc": final_train_acc,
+            "val_acc": final_val_acc,
+            "train_acc_thresh": train_acc_thresh,
+            "val_acc_thresh": val_acc_thresh,
+            "batch_size": batch_size_str,
+            "training_time": training_time
+        }
+    )
+    return train_loss_fig, val_loss_fig, results_display, prediction_proba
+def create_training_loss_chart(train_losses, train_accuracies):
+    """Create training loss and accuracy visualization"""
+    if not train_losses or len(train_losses) == 0:
+        return None
+    epochs = list(range(1, len(train_losses) + 1))
+    valid_losses = [loss if not (np.isinf(loss) or np.isnan(loss)) else None for loss in train_losses]
+    fig = make_subplots(
+        rows=2, cols=1,
+        subplot_titles=("Training Loss (Binary Cross-Entropy)", "Training Accuracy"),
+        vertical_spacing=0.15,
+        row_heights=[0.5, 0.5]
+    )
+    fig.add_trace(
+        go.Scatter(
+            x=epochs,
+            y=valid_losses,
+            mode='lines+markers',
+            name='Training Loss',
+            line=dict(color='#1976D2', width=3),
+            marker=dict(size=6),
+            showlegend=True
+        ),
+        row=1, col=1
+    )
+    if train_accuracies and len(train_accuracies) == len(train_losses):
+        valid_accuracies = [acc * 100 if not (np.isinf(acc) or np.isnan(acc)) else None for acc in train_accuracies]
+        fig.add_trace(
+            go.Scatter(
+                x=epochs,
+                y=valid_accuracies,
+                mode='lines+markers',
+                name='Training Accuracy',
+                line=dict(color='#42A5F5', width=3),
+                marker=dict(size=6),
+                showlegend=True
+            ),
+            row=2, col=1
+        )
+    fig.update_xaxes(title_text="Epoch", row=1, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_yaxes(title_text="Loss", row=1, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_xaxes(title_text="Epoch", row=2, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_yaxes(title_text="Accuracy (%)", row=2, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray', range=[0, 100])
+    fig.update_layout(
+        title="Training Metrics Over Epochs",
+        plot_bgcolor="white",
+        height=600,
+        margin=dict(l=40, r=40, t=80, b=40)
+    )
+    return fig
+def create_validation_loss_chart(val_losses, val_accuracies):
+    """Create validation loss and accuracy visualization"""
+    if not val_losses or len(val_losses) == 0:
+        return None
+    epochs = list(range(1, len(val_losses) + 1))
+    valid_losses = [loss if not (np.isinf(loss) or np.isnan(loss)) else None for loss in val_losses]
+    fig = make_subplots(
+        rows=2, cols=1,
+        subplot_titles=("Validation Loss (Binary Cross-Entropy)", "Validation Accuracy"),
+        vertical_spacing=0.15,
+        row_heights=[0.5, 0.5]
+    )
+    fig.add_trace(
+        go.Scatter(
+            x=epochs,
+            y=valid_losses,
+            mode='lines+markers',
+            name='Validation Loss',
+            line=dict(color='#7B1FA2', width=3),
+            marker=dict(size=6),
+            showlegend=True
+        ),
+        row=1, col=1
+    )
+    if val_accuracies and len(val_accuracies) == len(val_losses):
+        valid_accuracies = [acc * 100 if not (np.isinf(acc) or np.isnan(acc)) else None for acc in val_accuracies]
+        fig.add_trace(
+            go.Scatter(
+                x=epochs,
+                y=valid_accuracies,
+                mode='lines+markers',
+                name='Validation Accuracy',
+                line=dict(color='#BA68C8', width=3),
+                marker=dict(size=6),
+                showlegend=True
+            ),
+            row=2, col=1
+        )
+    fig.update_xaxes(title_text="Epoch", row=1, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_yaxes(title_text="Loss", row=1, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_xaxes(title_text="Epoch", row=2, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray')
+    fig.update_yaxes(title_text="Accuracy (%)", row=2, col=1, showgrid=True, gridwidth=1, gridcolor='lightgray', range=[0, 100])
+    fig.update_layout(
+        title="Validation Metrics Over Epochs",
+        plot_bgcolor="white",
+        height=600,
+        margin=dict(l=40, r=40, t=80, b=40)
+    )
+    return fig
+def create_results_display(theta, prediction_proba, prediction_class, feature_cols, epochs, learning_rate, threshold, split_info):
+    """Create HTML display showing model results"""
+    theta_str = "[" + ", ".join(f"{w:.4f}" for w in theta) + "]"
+    html_content = f"""
+    <div style='background:#E3F2FD;border-left:6px solid #1976D2;padding:14px 16px;border-radius:10px;'>
+        <strong style='color:#0D47A1;'>📊 Logistic Regression Results</strong><br><br>
+        <div style='margin:8px 0;'>
+            <strong style='color:#1976D2;'>🔧 Model Configuration:</strong><br>
+            • Epochs: {epochs} | Learning Rate: {learning_rate}<br>
+            • Batch Size: {split_info.get('batch_size', 'Full Batch')} | Features: {len(feature_cols)}<br>
+            • Normalization: Standardized | Activation: Sigmoid | Loss: Binary Cross-Entropy<br>
+        </div>
+        <div style='margin:8px 0;'>
+            <strong style='color:#1976D2;'>📊 Data Split:</strong><br>
+            • Training: {split_info['train_size']} samples ({split_info['train_ratio']:.1%})<br>
+            • Validation: {split_info['val_size']} samples ({split_info['val_ratio']:.1%})<br>
+        </div>
+        <div style='margin:8px 0;'>
+            <strong style='color:#1976D2;'>📈 Performance Metrics:</strong><br>
+            • Training Loss (BCE): <span style='background:#BBDEFB;padding:2px 6px;border-radius:4px;'><strong>{split_info['train_loss']:.4f}</strong></span><br>
+            • Validation Loss (BCE): <span style='background:#C5CAE9;padding:2px 6px;border-radius:4px;'><strong>{split_info['val_loss']:.4f}</strong></span><br>
+            • Training Accuracy (threshold={threshold:.2f}): <span style='background:#BBDEFB;padding:2px 6px;border-radius:4px;'><strong>{split_info['train_acc_thresh']*100:.2f}%</strong></span><br>
+            • Validation Accuracy (threshold={threshold:.2f}): <span style='background:#C5CAE9;padding:2px 6px;border-radius:4px;'><strong>{split_info['val_acc_thresh']*100:.2f}%</strong></span><br>
+            • Training Time: <span style='background:#E1BEE7;padding:2px 6px;border-radius:4px;'><strong>{split_info['training_time']:.4f}s</strong></span><br>
+        </div>
+        <div style='margin:8px 0;'>
+            <strong style='color:#1976D2;'>🎯 Learned Parameters (θ):</strong><br>
+            • Theta = <code style='background:#F3E5F5;padding:2px 6px;border-radius:4px;'>{theta_str}</code><br>
+            • Bias (θ₀) = {theta[0]:.4f}<br>
+        </div>
+        <div style='margin:8px 0;'>
+            <strong style='color:#1976D2;'>🔮 Prediction (Threshold = {threshold:.2f}):</strong><br>
+            • Probability: <span style='background:#DCEDC8;padding:2px 6px;border-radius:4px;'><strong>{prediction_proba:.4f}</strong></span> ({(prediction_proba*100):.2f}%)<br>
+            • Predicted Class: <span style='background:#DCEDC8;padding:2px 6px;border-radius:4px;'><strong>{prediction_class}</strong></span> (non-0/1 target labels are mapped to 0/1 in sorted order)<br>
+            <em style='font-size:0.9em;color:#424242;'>* Adjust threshold to see how predictions change. Lower threshold → more predictions of class 1</em><br>
+        </div>
+    </div>
+    """
+    return html_content
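`compute_gradient` uses the closed form `X.T @ (y_hat - y) / N`, which holds because the derivative of binary cross-entropy through a sigmoid collapses to (ŷ - y). A central finite-difference check is a cheap way to confirm that identity, or to catch a mismatch if the loss is later edited. The sketch below restates `sigmoid`, `compute_loss`, and `compute_gradient` with the same formulas instead of importing `src.logistic_regression`, so it needs only NumPy. The random data, the step size `h`, and the tolerance are arbitrary choices.

```python
import numpy as np


def sigmoid(z):
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))


def bce(theta, X, y):
    """Mean binary cross-entropy, same formula as compute_loss."""
    y_hat = np.clip(sigmoid(X @ theta), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


def analytic_grad(theta, X, y):
    """Same formula as compute_gradient: X.T @ (y_hat - y) / N."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


def numeric_grad(theta, X, y, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (bce(theta + step, X, y) - bce(theta - step, X, y)) / (2 * h)
    return grad


rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 3))]  # bias column + 3 features, like add_bias
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(scale=0.1, size=4)

max_err = np.max(np.abs(analytic_grad(theta, X, y) - numeric_grad(theta, X, y)))
print(f"max |analytic - numeric| = {max_err:.2e}")
```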

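The results card notes that lowering the threshold yields more class-1 predictions. Since `predict_class` only compares fixed probabilities against the cut-off, changing the threshold shifts the precision/recall balance without retraining. A small illustration with made-up labels and probabilities (not output from the app), using a hypothetical `precision_recall` helper:

```python
import numpy as np


def precision_recall(y_true, proba, threshold):
    """Binarize like predict_class (proba >= threshold), then score class 1."""
    y_pred = (proba >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return int(y_pred.sum()), precision, recall


# Illustrative labels and predicted probabilities (not produced by the app)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.15, 0.35, 0.45, 0.40, 0.55, 0.60, 0.75, 0.85, 0.95])

for t in (0.3, 0.5, 0.7):
    n_pos, p, r = precision_recall(y_true, proba, t)
    print(f"threshold={t:.1f}  predicted 1s={n_pos}  precision={p:.2f}  recall={r:.2f}")
```

With these arrays, raising the threshold from 0.3 to 0.7 shrinks the number of predicted 1s from 8 to 3: precision rises to 1.0 while recall falls to 0.6.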
static/aivn_logo.png ADDED Viewed

static/vlai_logo.png ADDED Viewed

vlai_template.py ADDED Viewed

	@@ -0,0 +1,250 @@

+import os, base64
+import gradio as gr
+# Theming (can be overridden by the host app)
+PRIMARY_COLOR = "#0F6CBD"   # medical calm blue
+ACCENT_COLOR = "#C4314B"    # medical alert red
+SUCCESS_COLOR = "#2E7D32"   # positive/ok
+BG1 = "#F0F7FF"
+BG2 = "#E8F0FA"
+BG3 = "#DDE7F8"
+FONT_FAMILY = "'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, 'Noto Sans', 'Liberation Sans', sans-serif"
+# App metadata (overridable)
+PROJECT_NAME = "Demo Project"
+AIO_YEAR = "2025"
+AIO_MODULE = "00"
+PROJECT_DESCRIPTION = ""
+META_INFO = []  # list of (label, value)
+def set_colors(primary: str = None, accent: str = None, bg1: str = None, bg2: str = None, bg3: str = None):
+    """Allow host app to set theme colors dynamically."""
+    global PRIMARY_COLOR, ACCENT_COLOR, BG1, BG2, BG3, custom_css
+    if primary:
+        PRIMARY_COLOR = primary
+    if accent:
+        ACCENT_COLOR = accent
+    if bg1:
+        BG1 = bg1
+    if bg2:
+        BG2 = bg2
+    if bg3:
+        BG3 = bg3
+    # Rebuild CSS with new colors
+    custom_css = _build_custom_css()
+def set_font(font_family: str):
+    """Allow host app to set a custom font stack (e.g., 'Inter', system fallbacks)."""
+    global FONT_FAMILY, custom_css
+    if font_family and isinstance(font_family, str):
+        FONT_FAMILY = font_family
+        custom_css = _build_custom_css()
+def set_meta(project_name: str = None, year: str = None, module: str = None, description: str = None, meta_items: list = None):
+    """Set project metadata used across the header and info sections."""
+    global PROJECT_NAME, AIO_YEAR, AIO_MODULE, PROJECT_DESCRIPTION, META_INFO
+    if project_name is not None:
+        PROJECT_NAME = project_name
+    if year is not None:
+        AIO_YEAR = year
+    if module is not None:
+        AIO_MODULE = module
+    if description is not None:
+        PROJECT_DESCRIPTION = description
+    if meta_items is not None:
+        META_INFO = meta_items
+def configure(project_name: str = None, year: str = None, module: str = None, description: str = None,
+              colors: dict = None, font_family: str = None, meta_items: list = None):
+    """One-call configuration for meta, theme, and font."""
+    if colors:
+        set_colors(
+            primary=colors.get("primary"),
+            accent=colors.get("accent"),
+            bg1=colors.get("bg1"),
+            bg2=colors.get("bg2"),
+            bg3=colors.get("bg3"),
+        )
+    if font_family:
+        set_font(font_family)
+    set_meta(project_name, year, module, description, meta_items)
+def image_to_base64(image_path: str):
+    # Construct the absolute path to the image
+    current_dir = os.path.dirname(os.path.abspath(__file__))
+    full_image_path = os.path.join(current_dir, image_path)
+    with open(full_image_path, "rb") as f:
+        return base64.b64encode(f.read()).decode("utf-8")
+def create_header():
+    with gr.Row():
+        with gr.Column(scale=2):
+            logo_base64 = image_to_base64("static/aivn_logo.png")
+            gr.HTML(
+                f"""<img src="data:image/png;base64,{logo_base64}"
+                        alt="Logo"
+                        style="height:120px;width:auto;margin:0 auto;margin-bottom:16px; display:block;">"""
+            )
+        with gr.Column(scale=2):
+            gr.HTML(f"""
+<div style="display:flex;justify-content:flex-start;align-items:center;gap:30px;">
+    <div>
+        <h1 style="margin-bottom:0; color: {PRIMARY_COLOR}; font-size: 2.5em; font-weight: bold;"> {PROJECT_NAME} </h1>
+        <h3 style="color: #888; font-style: italic"> AIO{AIO_YEAR}: Module {AIO_MODULE}. </h3>
+    </div>
+</div>
+""")
+def create_footer():
+    logo_base64_vlai = image_to_base64("static/vlai_logo.png")
+    footer_html = """
+<style>
+  .sticky-footer{position:fixed;bottom:0px;left:0;width:100%;background:#E8F5E8;
+                 padding:10px;box-shadow:0 -2px 10px rgba(0,0,0,0.1);z-index:1000;}
+  .content-wrap{padding-bottom:60px;}
+</style>""" + f"""
+<div class="sticky-footer">
+  <div style="text-align:center;font-size:18px; color: #888">
+    Created by
+    <a href="https://vlai.work" target="_blank" style="color:#465C88;text-decoration:none;font-weight:bold; display:inline-flex; align-items:center;"> VLAI
+    <img src="data:image/png;base64,{logo_base64_vlai}" alt="Logo" style="height:20px; width:auto;">
+    </a> from <a href="https://aivietnam.edu.vn/" target="_blank" style="color:#355724;text-decoration:none;font-weight:bold">AI VIET NAM</a>
+  </div>
+</div>
+"""
+    return gr.HTML(footer_html)
+def _build_custom_css() -> str:
+    return f"""
+@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap');
+.gradio-container {{
+    min-height: 100vh !important;
+    width: 100vw !important;
+    margin: 0 !important;
+    padding: 0px !important;
+    background: linear-gradient(135deg, {BG1} 0%, {BG2} 50%, {BG3} 100%);
+    background-size: 600% 600%;
+    animation: gradientBG 7s ease infinite;
+}}
+/* Global font setup */
+body, .gradio-container, .gr-block, .gr-markdown, .gr-button, .gr-input,
+.gr-dropdown, .gr-number, .gr-plot, .gr-dataframe, .gr-accordion, .gr-form,
+.gr-textbox, .gr-html, table, th, td, label, h1, h2, h3, h4, h5, h6, p, span, div {{
+    font-family: {FONT_FAMILY} !important;
+}}
+@keyframes gradientBG {{
+    0% {{background-position: 0% 50%;}}
+    50% {{background-position: 100% 50%;}}
+    100% {{background-position: 0% 50%;}}
+}}
+/* Minimize spacing and padding */
+.content-wrap {{
+    padding: 2px !important;
+    margin: 0 !important;
+}}
+/* Reduce component spacing */
+.gr-row {{
+    gap: 5px !important;
+    margin: 2px 0 !important;
+}}
+.gr-column {{
+    gap: 4px !important;
+    padding: 4px !important;
+}}
+/* Accordion optimization */
+.gr-accordion {{
+    margin: 4px 0 !important;
+}}
+.gr-accordion .gr-accordion-content {{
+    padding: 2px !important;
+}}
+/* Form elements spacing */
+.gr-form {{
+    gap: 2px !important;
+}}
+/* Button styling */
+.gr-button {{
+    margin: 2px 0 !important;
+}}
+/* DataFrame optimization */
+.gr-dataframe {{
+    margin: 4px 0 !important;
+}}
+/* Let wide data previews scroll horizontally within their container */
+.gr-dataframe .wrap {{
+    overflow-x: auto !important;
+    max-width: 100% !important;
+}}
+/* Plot optimization */
+.gr-plot {{
+    margin: 4px 0 !important;
+}}
+/* Reduce markdown margins */
+.gr-markdown {{
+    margin: 2px 0 !important;
+}}
+/* Footer positioning */
+.sticky-footer {{
+    position: fixed;
+    bottom: 0px;
+    left: 0;
+    width: 100%;
+    background: {BG1};
+    padding: 6px !important;
+    box-shadow: 0 -2px 10px rgba(0,0,0,0.1);
+    z-index: 1000;
+}}
+"""
+# Initialize CSS using defaults
+custom_css = _build_custom_css()
+def render_info_card(description: str = None, meta_items: list = None, icon: str = "🧠", title: str = "About this demo") -> str:
+    desc = description if description is not None else PROJECT_DESCRIPTION
+    items = meta_items if meta_items is not None else META_INFO
+    meta_html = " · ".join([f"<span><strong>{k}</strong>: {v}</span>" for k, v in items]) if items else ""
+    return f"""
+    <div style="margin: 8px 0 8px 0;">
+      <div style="background:#F5F9FF;border-left:6px solid {PRIMARY_COLOR};padding:14px 16px;border-radius:10px;box-shadow:0 1px 3px rgba(0,0,0,0.06);">
+        <div style="display:flex;gap:14px;align-items:flex-start;">
+          <div style="font-size:22px;">{icon}</div>
+          <div>
+            <div style="font-weight:700;color:{PRIMARY_COLOR};margin-bottom:4px;">{title}</div>
+            <div style="color:#000;font-size:14px;line-height:1.5;">{desc}</div>
+            <div style="margin-top:8px;color:#000;font-size:13px;">{meta_html}</div>
+          </div>
+        </div>
+      </div>
+    </div>
+    """
+def render_disclaimer(text: str, icon: str = "⚠️", title: str = "Educational Use Only") -> str:
+    return f"""
+    <div style="margin: 8px 0 6px 0;">
+      <div style="background:#FFF4F4;border-left:6px solid {ACCENT_COLOR};padding:12px 16px;border-radius:8px;box-shadow:0 1px 3px rgba(0,0,0,0.06);">
+        <div style="display:flex;gap:10px;align-items:flex-start;color:#000;">
+          <span style="font-size:20px">{icon}</span>
+          <div>
+            <div style="font-weight:700; margin-bottom:4px;">{title}</div>
+            <div style="font-size:14px; line-height:1.4;">{text}</div>
+          </div>
+        </div>
+      </div>
+    </div>
+    """