fishy.models.deep.transformer

Transformer model for time series classification.

This model uses multi-head attention and feed-forward layers to process sequential data. It is designed to handle variable-length sequences and can be used for tasks such as classification or regression. The architecture includes layer normalization, dropout for regularization, and a final fully connected layer for output.

Classes

class fishy.models.deep.transformer.MultiHeadAttention(input_dim: int, num_heads: int)

Bases: Module

Multi-head attention mechanism.

input_dim

Number of input features.

Type:

int

num_heads

Number of attention heads.

Type:

int

head_dim

Dimension of each attention head.

Type:

int

qkv

Combined projection for Q, K, and V.

Type:

nn.Linear

fc_out

Final output projection.

Type:

nn.Linear

scale

Scaling factor for dot-product attention.

Type:

float

__init__(input_dim: int, num_heads: int) → None

Initializes the MultiHeadAttention layer.

Parameters:
  • input_dim (int) – Number of input features.

  • num_heads (int) – Number of attention heads.

forward(x: Tensor, return_attention: bool = False) → Tensor | Tuple[Tensor, Tensor]

Forward pass.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (batch_size, seq_length, input_dim).

  • return_attention (bool, optional) – Whether to also return the attention weights. Defaults to False.

Returns:

Output tensor of shape (batch_size, seq_length, input_dim). If return_attention is True, a tuple of that output tensor and the attention weights of shape (batch_size, num_heads, seq_length, seq_length).

Return type:

torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
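
The attributes and shapes documented above can be illustrated with a minimal sketch. This is not the fishy source: the class name MultiHeadAttentionSketch is hypothetical, and the sketch assumes that head_dim equals input_dim // num_heads, that input_dim must be divisible by num_heads, and that scale is the usual 1/sqrt(head_dim). None of these details are confirmed by this page.

```python
import math

import torch
import torch.nn as nn


class MultiHeadAttentionSketch(nn.Module):
    """Illustrative stand-in, not the fishy implementation."""

    def __init__(self, input_dim: int, num_heads: int) -> None:
        super().__init__()
        # Assumption: the feature dimension splits evenly across heads.
        if input_dim % num_heads != 0:
            raise ValueError("input_dim must be divisible by num_heads")
        self.input_dim = input_dim
        self.num_heads = num_heads
        self.head_dim = input_dim // num_heads
        # A single linear layer produces Q, K and V in one matrix multiply.
        self.qkv = nn.Linear(input_dim, 3 * input_dim)
        self.fc_out = nn.Linear(input_dim, input_dim)
        # Assumption: the standard 1/sqrt(head_dim) scaling.
        self.scale = 1.0 / math.sqrt(self.head_dim)

    def forward(self, x: torch.Tensor, return_attention: bool = False):
        batch_size, seq_length, _ = x.shape
        # (B, S, 3*D) -> (B, S, 3, H, d) -> (3, B, H, S, d)
        qkv = self.qkv(x).reshape(
            batch_size, seq_length, 3, self.num_heads, self.head_dim
        )
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        # Scaled dot-product scores, softmax over the key axis: (B, H, S, S)
        attn = torch.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
        # Weighted sum of values, then merge the heads back into (B, S, D).
        out = (attn @ v).transpose(1, 2).reshape(
            batch_size, seq_length, self.input_dim
        )
        out = self.fc_out(out)
        return (out, attn) if return_attention else out


# Usage: 2 sequences of length 10 with 16 features, split over 4 heads.
x = torch.randn(2, 10, 16)
out, attn = MultiHeadAttentionSketch(16, 4)(x, return_attention=True)
print(out.shape, attn.shape)
```

Each row of the returned attention weights sums to 1 along the last axis, because the softmax is taken over the key positions.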

class fishy.models.deep.transformer.Transformer(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs)

Bases: Module, TTTMixin

Transformer architecture for 1D spectral data.

attention_layers

List of multi-head attention layers.

Type:

nn.ModuleList

feed_forward

Position-wise feed-forward network.

Type:

nn.Sequential

layer_norm1

Layer normalization applied before attention.

Type:

nn.LayerNorm

layer_norm2

Layer normalization applied before the feed-forward network.

Type:

nn.LayerNorm

dropout

Dropout layer.

Type:

nn.Dropout

fc_out

Final classification/regression head.

Type:

nn.Linear

__init__(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs) → None

Initializes the Transformer model.

Parameters:
  • input_dim (int) – Number of input features.

  • output_dim (int) – Number of output classes/dimensions.

  • hidden_dim (int, optional) – Intermediate dimension of the feed-forward layer. Defaults to 128.

  • num_layers (int, optional) – Number of transformer blocks. Defaults to 1.

  • dropout (float, optional) – Dropout rate for regularization. Defaults to 0.1.

  • num_heads (int, optional) – Number of attention heads. Defaults to 4.

forward(x: Tensor, return_attention: bool = False, *args, **kwargs) → Tensor | Tuple[Tensor, List[Tensor]]

Forward pass.

Parameters:
  • x (torch.Tensor) – Input spectrum of shape (batch_size, input_dim) or (batch_size, seq_len, input_dim).

  • return_attention (bool, optional) – Whether to also return the attention weights from all layers. Defaults to False.

Returns:

Logits/predictions of shape (batch_size, output_dim). If return_attention is True, a tuple of those logits and a list with the attention weights from each layer.

Return type:

torch.Tensor | Tuple[torch.Tensor, List[torch.Tensor]]
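
One way the documented attributes could fit together is sketched below. It is a guess at the data flow, not the fishy source. The class name TransformerSketch is hypothetical, and torch's built-in nn.MultiheadAttention stands in for the module's own MultiHeadAttention. The sketch also assumes residual connections, a ReLU feed-forward network, a 2D input treated as a sequence of length 1, and mean pooling over the sequence before fc_out. TTTMixin and the extra **kwargs are left out.

```python
import torch
import torch.nn as nn


class TransformerSketch(nn.Module):
    """Illustrative stand-in, not the fishy implementation."""

    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int = 128,
                 num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4) -> None:
        super().__init__()
        # Stand-in: torch's built-in attention instead of the module's own class.
        self.attention_layers = nn.ModuleList([
            nn.MultiheadAttention(input_dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        ])
        # Assumption: a two-layer ReLU network with hidden_dim in the middle.
        self.feed_forward = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )
        self.layer_norm1 = nn.LayerNorm(input_dim)
        self.layer_norm2 = nn.LayerNorm(input_dim)
        self.dropout = nn.Dropout(dropout)
        self.fc_out = nn.Linear(input_dim, output_dim)

    def forward(self, x: torch.Tensor, return_attention: bool = False):
        # Assumption: a 2D spectrum becomes a sequence of length 1.
        if x.dim() == 2:
            x = x.unsqueeze(1)
        weights = []
        for attention in self.attention_layers:
            # Normalize, attend, then add the residual (residual is assumed).
            normed = self.layer_norm1(x)
            attn_out, attn = attention(
                normed, normed, normed,
                need_weights=True, average_attn_weights=False,
            )
            weights.append(attn)  # (batch, num_heads, seq_len, seq_len)
            x = x + self.dropout(attn_out)
            # Normalize, apply the feed-forward network, add the residual.
            x = x + self.dropout(self.feed_forward(self.layer_norm2(x)))
        # Assumption: mean-pool over the sequence before the output head.
        logits = self.fc_out(x.mean(dim=1))
        return (logits, weights) if return_attention else logits


# Usage: a batch of 4 sequences, 6 steps each, 8 features per step.
model = TransformerSketch(input_dim=8, output_dim=3, num_layers=2, num_heads=2)
logits, weights = model(torch.randn(4, 6, 8), return_attention=True)
print(logits.shape, len(weights), weights[0].shape)
```

With return_attention left at False, the same call returns only the logits tensor, so code that does not need the weights can ignore them.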
