{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Experimental Results Leaderboard\n",
    "\n",
    "This page provides an interactive view of results for 16 models across 4 datasets, aggregated from over 1900 individual runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go\n",
    "import plotly.io as pio\n",
    "from pathlib import Path\n",
    "\n",
    "# Set renderer for Sphinx/ReadTheDocs compatibility\n",
    "pio.renderers.default = 'notebook'\n",
    "\n",
    "# Load the summarized and raw results from the _static directory\n",
    "base_path = Path('_static')\n",
    "sum_path = base_path / 'leaderboard_data.csv'\n",
    "raw_path = base_path / 'leaderboard_raw.csv'\n",
    "\n",
    "df = pd.read_csv(sum_path) if sum_path.exists() else pd.DataFrame()\n",
    "df_raw = pd.read_csv(raw_path) if raw_path.exists() else pd.DataFrame()\n",
    "\n",
    "if not df.empty:\n",
    "    # Significance columns: NaNs (blank cells in the CSV) are displayed as a blank\n",
    "    for col in ['Sig Tr', 'Sig Te']:\n",
    "        if col in df.columns:\n",
    "            df[col] = df[col].fillna(' ')\n",
    "\n",
    "    # Coerce score columns to numbers; unparsable or missing values become 0.0\n",
    "    for col in ['Train', 'Test', 'Test Std', 'Train Std']:\n",
    "        if col in df.columns:\n",
    "            df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0.0)\n",
    "\n",
    "def get_color_map(methods):\n",
    "    # Give each method a fixed color so it looks the same in every chart\n",
    "    colors = px.colors.qualitative.Plotly + px.colors.qualitative.Bold\n",
    "    unique_methods = sorted(set(methods))\n",
    "    return {m: colors[i % len(colors)] for i, m in enumerate(unique_methods)}\n",
    "\n",
    "color_map = get_color_map(df['Method'].unique()) if not df.empty else {}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📊 Statistical Summary Table\n",
    "\n",
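    "For each dataset and method, the table below reports the mean and standard deviation of the training (`Train`) and test (`Test`) scores over 30 runs, along with the runtime (`Runtime`).\n",
    "\n",
    "**Note on Significance:** In the `Sig Tr` (training) and `Sig Te` (test) columns, the symbols `+`, `-`, and `≈` give the outcome of a paired t-test (p < 0.05) comparing each method with the **OPLS-DA baseline**.\n",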
    "\n",
    "| Symbol | Meaning |\n",
    "|---|---|\n",
    "| `+` | Significantly better than OPLS-DA |\n",
    "| `-` | Significantly worse than OPLS-DA |\n",
    "| `≈` | No significant difference |"
   ]
  },
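  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of how such a symbol can be derived, the sketch below applies a two-sided paired t-test (`scipy.stats.ttest_rel`) to two arrays of per-run scores. It assumes the runs are paired by index (the same seed or data split for the method and the baseline) and a threshold of 0.05; the helper `significance_symbol` is hypothetical and is not the code that produced the `Sig Tr` / `Sig Te` columns.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import ttest_rel\n",
    "\n",
    "\n",
    "def significance_symbol(method_scores, baseline_scores, alpha=0.05):\n",
    "    # Hypothetical helper: per-run scores, paired by run index (assumption)\n",
    "    method_scores = np.asarray(method_scores, dtype=float)\n",
    "    baseline_scores = np.asarray(baseline_scores, dtype=float)\n",
    "    result = ttest_rel(method_scores, baseline_scores)  # two-sided by default\n",
    "    if result.pvalue < alpha:\n",
    "        # The sign of the mean paired difference decides better vs. worse\n",
    "        return '+' if np.mean(method_scores - baseline_scores) > 0 else '-'\n",
    "    # Not significant (a NaN p-value, e.g. from identical inputs, also lands here)\n",
    "    return '≈'\n",
    "\n",
    "\n",
    "baseline = [0.70, 0.72, 0.71, 0.69, 0.70]  # made-up scores\n",
    "method = [0.80, 0.83, 0.81, 0.80, 0.82]\n",
    "print(significance_symbol(method, baseline))  # +\n",
    "print(significance_symbol(baseline, method))  # -\n",
    "```\n",
    "\n",
    "Pairing matters here: an unpaired test would ignore that both methods were evaluated on the same splits and would generally be less sensitive."
   ]
  },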
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df.empty:\n",
    "    # Select the columns to display, skipping any that are missing from the CSV\n",
    "    cols = ['Dataset', 'Method', 'Train', 'Train Std', 'Sig Tr', 'Test', 'Test Std', 'Sig Te', 'Runtime']\n",
    "    actual_cols = [c for c in cols if c in df.columns]\n",
    "    pdf = df[actual_cols].sort_values(['Dataset', 'Test'], ascending=[True, False])\n",
    "    \n",
    "    # Number formats keyed by column name, so they stay aligned if a column is missing\n",
    "    fmt = {'Train': '.4f', 'Train Std': '.4f', 'Test': '.4f', 'Test Std': '.4f', 'Runtime': '.2f'}\n",
    "    fig = go.Figure(data=[go.Table(\n",
    "        header=dict(values=[f\"<b>{c}</b>\" for c in pdf.columns], fill_color='paleturquoise', align='left'),\n",
    "        cells=dict(values=[pdf[c] for c in pdf.columns], \n",
    "                   format=[fmt.get(c) for c in pdf.columns],\n",
    "                   fill_color='lavender', align='left'))\n",
    "    ])\n",
    "    fig.update_layout(margin=dict(l=0, r=0, t=0, b=0), height=800)\n",
    "    fig.show()\n",
    "else:\n",
    "    print(\"No data available for summary table.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📊 Performance by Dataset\n",
    "\n",
    "The bar charts below show the mean test-set balanced accuracy of each method, with error bars representing one standard deviation across runs."
   ]
  },
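  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Balanced accuracy is the unweighted mean of per-class recall, so a classifier cannot score well simply by favoring the majority class. A minimal NumPy sketch of that standard definition (the one computed by scikit-learn's `balanced_accuracy_score` with its default `adjusted=False`), assuming the leaderboard uses it unchanged:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def balanced_accuracy(y_true, y_pred):\n",
    "    # Mean over classes c of recall_c = (correct predictions of c) / (samples of c)\n",
    "    y_true = np.asarray(y_true)\n",
    "    y_pred = np.asarray(y_pred)\n",
    "    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]\n",
    "    return float(np.mean(recalls))\n",
    "\n",
    "\n",
    "# Class 0 recall = 2/3, class 1 recall = 1/1, so the score is (2/3 + 1) / 2\n",
    "print(round(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]), 4))  # 0.8333\n",
    "```\n",
    "\n",
    "Plain accuracy on the same predictions is 3/4 = 0.75; the two metrics diverge whenever the classes are imbalanced."
   ]
  },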
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df.empty:\n",
    "    for ds in df['Dataset'].unique():\n",
    "        ds_df = df[df['Dataset'] == ds].sort_values('Test', ascending=False)\n",
    "        fig = px.bar(\n",
    "            ds_df, x='Method', y='Test', error_y='Test Std' if 'Test Std' in ds_df.columns else None,\n",
    "            title=f\"Leaderboard: {ds.upper()}\", \n",
    "            color='Method', \n",
    "            color_discrete_map=color_map,\n",
    "            template='plotly_white',\n",
    "            labels={'Test': 'Mean Balanced Accuracy'}\n",
    "        )\n",
    "        fig.update_layout(yaxis_range=[0, 1.05])\n",
    "        fig.show()\n",
    "else:\n",
    "    print(\"No results available for bar charts.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🛡️ Stability Analysis (Box Plots)\n",
    "\n",
    "These box plots show the distribution of test accuracy over all 30 runs for each model, with every individual run overlaid as a point. Within each chart, methods are ordered by descending mean test accuracy."
   ]
  },
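  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The summary statistics and the method ordering used in these plots can be derived from the per-run results. A pandas sketch with made-up numbers, assuming one row per run with `Dataset`, `Method` and `Test Accuracy` columns (as in `leaderboard_raw.csv`) and assuming `Test Std` is the sample standard deviation (pandas' default `ddof=1`):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up raw results: one row per run\n",
    "runs = pd.DataFrame({\n",
    "    'Dataset': ['ds1'] * 6,\n",
    "    'Method': ['A', 'A', 'A', 'B', 'B', 'B'],\n",
    "    'Test Accuracy': [0.80, 0.82, 0.84, 0.90, 0.70, 0.80],\n",
    "})\n",
    "\n",
    "# Mean and sample standard deviation per (dataset, method)\n",
    "summary = (runs.groupby(['Dataset', 'Method'])['Test Accuracy']\n",
    "               .agg(['mean', 'std'])\n",
    "               .rename(columns={'mean': 'Test', 'std': 'Test Std'})\n",
    "               .reset_index())\n",
    "\n",
    "# Descending mean order, as used for the x-axis of the box plots\n",
    "order = (summary[summary['Dataset'] == 'ds1']\n",
    "         .sort_values('Test', ascending=False)['Method'].tolist())\n",
    "print(summary)\n",
    "print(order)  # ['A', 'B']\n",
    "```\n",
    "\n",
    "Here `B` has the lower mean (0.80 vs. 0.82) but five times the spread of `A` (0.10 vs. 0.02), which is the kind of difference a box plot makes visible and a bar chart of means hides."
   ]
  },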
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df_raw.empty and not df.empty:\n",
    "    for ds in df_raw['Dataset'].unique():\n",
    "        # Get the sorted order from the summary dataframe for this dataset\n",
    "        sorted_methods = df[df['Dataset'] == ds].sort_values('Test', ascending=False)['Method'].tolist()\n",
    "        \n",
    "        ds_raw = df_raw[df_raw['Dataset'] == ds].copy()\n",
    "        \n",
    "        fig = px.box(ds_raw, x='Method', y='Test Accuracy', color='Method', \n",
    "                     color_discrete_map=color_map, points='all', template='plotly_white',\n",
    "                     category_orders={'Method': sorted_methods},\n",
    "                     title=f\"Stability Distribution: {ds.upper()}\")\n",
    "        fig.show()\n",
    "else:\n",
    "    print(\"No raw data available for stability plots.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📈 Global Performance Heatmap\n",
    "A bird's-eye view of mean test balanced accuracy for every method (rows) on every dataset (columns)."
   ]
  },
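  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The heatmap is a long-to-wide reshape of the summary table. A small sketch with made-up values; note that `DataFrame.pivot` raises a `ValueError` when a (`Method`, `Dataset`) pair appears more than once, in which case `pivot_table(..., aggfunc='mean')` is the usual alternative:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up rows in the long format of leaderboard_data.csv\n",
    "summary = pd.DataFrame({\n",
    "    'Method': ['A', 'A', 'B', 'B'],\n",
    "    'Dataset': ['ds1', 'ds2', 'ds1', 'ds2'],\n",
    "    'Test': [0.82, 0.75, 0.80, 0.91],\n",
    "})\n",
    "\n",
    "# Rows = methods, columns = datasets, cells = mean test score\n",
    "grid = summary.pivot(index='Method', columns='Dataset', values='Test')\n",
    "print(grid)\n",
    "print(grid.loc['B', 'ds2'])  # 0.91\n",
    "```"
   ]
  },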
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df.empty:\n",
    "    pivot_df = df.pivot(index='Method', columns='Dataset', values='Test')\n",
    "    fig = px.imshow(pivot_df, text_auto=\".3f\", aspect=\"auto\", color_continuous_scale='Viridis',\n",
    "                    title=\"Model Generalization across Datasets\", template='plotly_white')\n",
    "    fig.show()\n",
    "else:\n",
    "    print(\"No data available for heatmap.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🎯 Efficiency Frontier\n",
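    "Comparison of training vs. test accuracy, with one panel per dataset; marker size encodes the test standard deviation. Ideally, models sit in the top-right corner close to the diagonal (Train = Test), i.e. with high accuracy and a small train-test gap."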
   ]
  },
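  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The train-test gap can also be read off numerically. A sketch with made-up values; the 0.1 threshold for flagging a large gap is an arbitrary, hypothetical choice, not one used by this leaderboard:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up summary rows for one dataset\n",
    "summary = pd.DataFrame({\n",
    "    'Method': ['A', 'B', 'C'],\n",
    "    'Train': [0.99, 0.86, 0.70],\n",
    "    'Test': [0.78, 0.84, 0.69],\n",
    "})\n",
    "\n",
    "# Gap = Train - Test; positive values mean training accuracy exceeds test accuracy\n",
    "summary['Gap'] = summary['Train'] - summary['Test']\n",
    "\n",
    "GAP_THRESHOLD = 0.1  # hypothetical cut-off for a large gap\n",
    "flagged = summary.loc[summary['Gap'] > GAP_THRESHOLD, 'Method'].tolist()\n",
    "print(summary)\n",
    "print(flagged)  # ['A']\n",
    "```\n",
    "\n",
    "In the scatter plot, `A` would sit well above the diagonal, while `B` sits close to it with the highest test score."
   ]
  },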
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df.empty:\n",
    "    fig = px.scatter(df, x=\"Test\", y=\"Train\", size=\"Test Std\", color=\"Method\", \n",
    "                     facet_col=\"Dataset\", hover_name=\"Method\", color_discrete_map=color_map, \n",
    "                     template=\"plotly_white\", title=\"Training vs Testing Performance\")\n",
    "    fig.update_layout(height=400)\n",
    "    fig.show()\n",
    "else:\n",
    "    print(\"No data available for efficiency frontier.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🏆 Top 3 Methods Comparison (Radar Chart)\n",
    "For each dataset, this chart compares the top 3 methods (by mean test accuracy) on four axes: training accuracy, test accuracy, and test accuracy minus and plus one standard deviation."
   ]
  },
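  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data behind these charts can be prepared in one pass: sort by mean test accuracy, keep the first three rows per dataset, and compute the mean ± one standard deviation bounds. In this sketch (made-up values) the bounds are clipped to [0, 1] so they stay within the radial axis range; clipping is a presentation choice, justified because accuracy itself is bounded by [0, 1]:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up summary rows for two datasets\n",
    "summary = pd.DataFrame({\n",
    "    'Dataset': ['ds1'] * 4 + ['ds2'] * 2,\n",
    "    'Method': ['A', 'B', 'C', 'D', 'A', 'B'],\n",
    "    'Test': [0.95, 0.80, 0.90, 0.60, 0.70, 0.75],\n",
    "    'Test Std': [0.08, 0.05, 0.02, 0.10, 0.03, 0.04],\n",
    "})\n",
    "\n",
    "# Top 3 methods by mean test accuracy within each dataset\n",
    "top3 = summary.sort_values('Test', ascending=False).groupby('Dataset').head(3)\n",
    "\n",
    "# Mean minus/plus one std, clipped to the [0, 1] radial range\n",
    "top3 = top3.assign(\n",
    "    Lower=(top3['Test'] - top3['Test Std']).clip(lower=0.0),\n",
    "    Upper=(top3['Test'] + top3['Test Std']).clip(upper=1.0),\n",
    ")\n",
    "print(top3.sort_values(['Dataset', 'Test'], ascending=[True, False]))\n",
    "```\n",
    "\n",
    "Without clipping, method `A` on `ds1` would have an upper bound of 1.03, which a radial axis fixed to [0, 1] cannot display."
   ]
  },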
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not df.empty:\n",
    "    for ds in df['Dataset'].unique():\n",
    "        ds_df = df[df['Dataset'] == ds].sort_values('Test', ascending=False).head(3)\n",
    "        fig = go.Figure()\n",
    "        for _, row in ds_df.iterrows():\n",
    "            fig.add_trace(go.Scatterpolar(\n",
    "                r=[row['Train'], row['Test'],\n",
    "                   max(0.0, row['Test'] - row['Test Std']),  # clip bounds to the [0, 1] radial range\n",
    "                   min(1.0, row['Test'] + row['Test Std'])],\n",
    "                theta=['Train Acc', 'Test Acc', 'Lower Bound', 'Upper Bound'],\n",
    "                fill='toself', name=row['Method'],\n",
    "                line=dict(color=color_map.get(row['Method']))\n",
    "            ))\n",
    "        fig.update_layout(\n",
    "            polar=dict(radialaxis=dict(visible=True, range=[0, 1])), \n",
    "            title=f\"Top 3 Profile: {ds.upper()}\", template='plotly_white'\n",
    "        )\n",
    "        fig.show()\n",
    "else:\n",
    "    print(\"No data available for radar charts.\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}