{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Experimental Results Leaderboard\n", "\n", "This page provides an interactive view of results for 16 models across 4 datasets, aggregated from over 1900 individual runs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "import plotly.io as pio\n", "from pathlib import Path\n", "\n", "# Set renderer for Sphinx/ReadTheDocs compatibility\n", "pio.renderers.default = 'notebook'\n", "\n", "# Load summarized and raw data (paths are relative to the working directory)\n", "base_path = Path('_static')\n", "sum_path = base_path / 'leaderboard_data.csv'\n", "raw_path = base_path / 'leaderboard_raw.csv'\n", "\n", "# Fall back to empty frames so the cells below can skip plotting when a file is missing\n", "df = pd.read_csv(sum_path) if sum_path.exists() else pd.DataFrame()\n", "df_raw = pd.read_csv(raw_path) if raw_path.exists() else pd.DataFrame()\n", "\n", "if not df.empty:\n", "    # Clean up significance columns: replace NaNs (from blank CSV cells) with a single space\n", "    for col in ['Sig Tr', 'Sig Te']:\n", "        if col in df.columns:\n", "            df[col] = df[col].fillna(' ')\n", "\n", "    # Coerce metric columns to numbers; unparseable values become 0.0\n", "    for col in ['Train', 'Test', 'Test Std', 'Train Std']:\n", "        if col in df.columns:\n", "            df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0.0)\n", "\n", "def get_color_map(methods):\n", "    # Assign each method a stable color, cycling through the palette if there are more methods than colors\n", "    colors = px.colors.qualitative.Plotly + px.colors.qualitative.Bold\n", "    unique_methods = sorted(set(methods))\n", "    return {m: colors[i % len(colors)] for i, m in enumerate(unique_methods)}\n", "\n", "color_map = get_color_map(df['Method'].unique()) if not df.empty else {}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 📊 Statistical Summary Table\n", "\n", "The table below summarizes the mean performance across 30 runs.
\n", "\n", "**Note on Significance:** The symbols `+`, `-`, and `≈` report the outcome of a paired t-test (p < 0.05) against the **OPLS-DA baseline**.\n", "\n", "| Symbol | Meaning |\n", "|---|---|\n", "| `+` | Significantly better than OPLS-DA |\n", "| `-` | Significantly worse than OPLS-DA |\n", "| `≈` | No significant difference |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df.empty:\n", "    # Select the columns to display, keeping only those present in the data\n", "    cols = ['Dataset', 'Method', 'Train', 'Train Std', 'Sig Tr', 'Test', 'Test Std', 'Sig Te', 'Runtime']\n", "    actual_cols = [c for c in cols if c in df.columns]\n", "    pdf = df[actual_cols].sort_values(['Dataset', 'Test'], ascending=[True, False])\n", "\n", "    # Look up each column's number format by name so formats stay aligned if a column is missing\n", "    num_formats = {'Train': '.4f', 'Train Std': '.4f', 'Test': '.4f', 'Test Std': '.4f', 'Runtime': '.2f'}\n", "\n", "    fig = go.Figure(data=[go.Table(\n", "        header=dict(values=list(pdf.columns), fill_color='paleturquoise', align='left'),\n", "        cells=dict(values=[pdf[c] for c in pdf.columns],\n", "                   format=[num_formats.get(c) for c in pdf.columns],\n", "                   fill_color='lavender', align='left'))\n", "    ])\n", "    fig.update_layout(margin=dict(l=0, r=0, t=0, b=0), height=800)\n", "    fig.show()\n", "else:\n", "    print(\"No data available for summary table.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 📊 Performance by Dataset\n", "\n", "The bar charts below show the mean balanced accuracy for each method, with error bars representing one standard deviation across runs."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df.empty:\n", "    for ds in df['Dataset'].unique():\n", "        # Rank methods by mean test score within this dataset\n", "        ds_df = df[df['Dataset'] == ds].sort_values('Test', ascending=False)\n", "        fig = px.bar(\n", "            ds_df, x='Method', y='Test', error_y='Test Std' if 'Test Std' in ds_df.columns else None,\n", "            title=f\"Leaderboard: {ds.upper()}\",\n", "            color='Method',\n", "            color_discrete_map=color_map,\n", "            template='plotly_white',\n", "            labels={'Test': 'Mean Balanced Accuracy'}\n", "        )\n", "        fig.update_layout(yaxis_range=[0, 1.05])\n", "        fig.show()\n", "else:\n", "    print(\"No results available for bar charts.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🛡️ Stability Analysis (Box Plots)\n", "\n", "The box plots below show the spread of test accuracy across all 30 runs for each model. Within each chart, methods are ordered by descending mean test performance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df_raw.empty and not df.empty:\n", "    for ds in df_raw['Dataset'].unique():\n", "        # Get the sorted order from the summary dataframe for this dataset\n", "        sorted_methods = df[df['Dataset'] == ds].sort_values('Test', ascending=False)['Method'].tolist()\n", "\n", "        ds_raw = df_raw[df_raw['Dataset'] == ds].copy()\n", "\n", "        fig = px.box(ds_raw, x='Method', y='Test Accuracy', color='Method',\n", "                     color_discrete_map=color_map, points='all', template='plotly_white',\n", "                     category_orders={'Method': sorted_methods},\n", "                     title=f\"Stability Distribution: {ds.upper()}\")\n", "        fig.show()\n", "else:\n", "    print(\"No raw data available for stability plots.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 📈 Global Performance Heatmap\n", "A bird's-eye view of how all models perform across all datasets."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df.empty:\n", "    # Rows are methods, columns are datasets, cells hold the mean test score\n", "    pivot_df = df.pivot(index='Method', columns='Dataset', values='Test')\n", "    fig = px.imshow(pivot_df, text_auto=\".3f\", aspect=\"auto\", color_continuous_scale='Viridis',\n", "                    title=\"Model Generalization across Datasets\", template='plotly_white')\n", "    fig.show()\n", "else:\n", "    print(\"No data available for heatmap.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🎯 Efficiency Frontier\n", "Each panel plots training accuracy (y-axis) against test accuracy (x-axis) for one dataset, with marker size scaled by the test standard deviation. Ideally, a model sits in the top-right corner and close to the diagonal, which indicates high accuracy with a small gap between training and test performance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df.empty:\n", "    fig = px.scatter(df, x=\"Test\", y=\"Train\", size=\"Test Std\", color=\"Method\",\n", "                     facet_col=\"Dataset\", hover_name=\"Method\", color_discrete_map=color_map,\n", "                     template=\"plotly_white\", title=\"Training vs Testing Performance\")\n", "    fig.update_layout(height=400)\n", "    fig.show()\n", "else:\n", "    print(\"No data available for efficiency frontier.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 🏆 Top 3 Methods Comparison (Radar Chart)\n", "These charts compare the three best-performing models on each dataset, ranked by mean test accuracy, across four metrics: training accuracy, test accuracy, and the test accuracy one standard deviation below and above the mean."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not df.empty:\n", "    for ds in df['Dataset'].unique():\n", "        ds_df = df[df['Dataset'] == ds].sort_values('Test', ascending=False).head(3)\n", "        fig = go.Figure()\n", "        for _, row in ds_df.iterrows():\n", "            fig.add_trace(go.Scatterpolar(\n", "                r=[row['Train'], row['Test'], row['Test'] - row['Test Std'], row['Test'] + row['Test Std']],\n", "                theta=['Train Acc', 'Test Acc', 'Lower Bound', 'Upper Bound'],\n", "                fill='toself', name=row['Method'],\n", "                line=dict(color=color_map.get(row['Method']))\n", "            ))\n", "        fig.update_layout(\n", "            polar=dict(radialaxis=dict(visible=True, range=[0, 1])),\n", "            title=f\"Top 3 Profile: {ds.upper()}\", template='plotly_white'\n", "        )\n", "        fig.show()\n", "else:\n", "    print(\"No data available for radar charts.\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 4 }