fishy.models.deep.gmoe

Sparsely-Gated Mixture of Experts (GMOE) for spectral data.

This model uses a top-k gating network to route each input spectrum to a small subset of its expert networks (Transformers). Activating only k experts per sample is intended to improve efficiency and let individual experts specialize.
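The top-k gating idea can be illustrated with a small, library-free sketch. This is not the fishy implementation: `top_k_gating` is a hypothetical helper, and it assumes the common formulation where the gate's logits for the k best experts are renormalized with a softmax while every other expert receives weight 0.

```python
import math
from typing import List, Tuple


def top_k_gating(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical helper: pick the k largest gate logits and weight them.

    Returns (expert_index, weight) pairs, highest logit first. The weights are
    a softmax over the selected logits only, so they sum to 1; unselected
    experts implicitly get weight 0 and would not be evaluated.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits (sorted() is stable, so ties keep index order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected logits.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Four experts, k=2: experts 1 and 3 are selected.
print(top_k_gating([1.0, 3.0, 0.5, 2.0], k=2))  # ≈ [(1, 0.731), (3, 0.269)]
```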

Classes

class fishy.models.deep.gmoe.SparselyGatedMoE(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 4, num_heads: int = 4, dropout: float = 0.1, num_experts: int = 8, k: int = 2, **kwargs)

Bases: Module

Sparsely-Gated Mixture of Experts (GMOE) using Transformer experts.

experts

List of expert Transformer models.

Type:

nn.ModuleList

gate

Gating network to determine weights for each expert.

Type:

nn.Linear

k

Number of top experts to activate per sample.

Type:

int

output_dim

Number of output classes/dimensions.

Type:

int

__init__(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 4, num_heads: int = 4, dropout: float = 0.1, num_experts: int = 8, k: int = 2, **kwargs) → None

Initializes the SparselyGatedMoE model.

Parameters:
  • input_dim (int) – Number of input features.

  • output_dim (int) – Number of output classes/dimensions.

  • hidden_dim (int, optional) – Hidden dimension of the Transformer experts. Defaults to 128.

  • num_layers (int, optional) – Number of Transformer layers in each expert. Defaults to 4.

  • num_heads (int, optional) – Number of attention heads in each expert. Defaults to 4.

  • dropout (float, optional) – Dropout rate. Defaults to 0.1.

  • num_experts (int, optional) – Total number of expert networks. Defaults to 8.

  • k (int, optional) – Number of experts activated per sample (top-k); should not exceed num_experts. Defaults to 2.
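Because several of these arguments constrain one another, a quick pre-flight check can catch bad combinations before the model is built. The sketch below is hypothetical (`check_gmoe_config` is not part of fishy). The divisibility rule is an assumption: it holds if each expert uses PyTorch multi-head attention with an embedding size of hidden_dim, since `nn.MultiheadAttention` requires the embedding dimension to be divisible by the number of heads.

```python
def check_gmoe_config(hidden_dim: int = 128, num_heads: int = 4,
                      num_experts: int = 8, k: int = 2) -> None:
    """Hypothetical sanity check for SparselyGatedMoE-style arguments."""
    # Top-k selection needs at least one and at most num_experts experts.
    if not 1 <= k <= num_experts:
        raise ValueError(f"k={k} must be between 1 and num_experts={num_experts}")
    # Assumption: experts use multi-head attention with embed_dim=hidden_dim,
    # which splits hidden_dim evenly across num_heads.
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )


check_gmoe_config()  # the documented defaults pass silently
```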

forward(x: Tensor, *args, **kwargs) → Tensor

Forward pass with sparse routing.
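A minimal PyTorch sketch of what a forward pass with sparse top-k routing can look like is shown below. It is not the fishy source: `TinySparseMoE` is hypothetical, its experts are small MLPs standing in for the Transformer experts, the input is assumed to have shape (batch, input_dim), and the gate is assumed to read the raw input. Only the attribute names (experts, gate, k, output_dim) mirror the documentation above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySparseMoE(nn.Module):
    """Hypothetical top-k MoE; experts are MLPs, not Transformers."""

    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int = 128,
                 num_experts: int = 8, k: int = 2) -> None:
        super().__init__()
        if not 1 <= k <= num_experts:
            raise ValueError("k must be in [1, num_experts]")
        self.k = k
        self.output_dim = output_dim
        # Gate: one logit per expert for every sample.
        self.gate = nn.Linear(input_dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, output_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x)                                   # (batch, num_experts)
        top_vals, top_idx = torch.topk(logits, self.k, dim=-1)  # (batch, k)
        weights = F.softmax(top_vals, dim=-1)                   # renormalize over the k chosen experts
        out = x.new_zeros((x.size(0), self.output_dim))
        for e, expert in enumerate(self.experts):
            # (sample, slot) pairs whose top-k choice includes expert e.
            rows, slots = (top_idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue  # expert e receives no samples and is never run
            contrib = weights[rows, slots].unsqueeze(-1) * expert(x[rows])
            out.index_add_(0, rows, contrib)
        return out
```

Dispatching only the rows routed to each expert is what makes the computation sparse: an expert that no sample selects does no work in that batch, at the cost of a Python loop over experts.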
