fishy.models.deep.transformer

Transformer model for time series classification.

The model processes sequential data with multi-head attention and feed-forward layers. It handles variable-length sequences and can be used for classification or regression. The architecture combines layer normalization, dropout for regularization, and a final fully connected output layer.

Classes

class fishy.models.deep.transformer.MultiHeadAttention(input_dim: int, num_heads: int)

Bases: Module

Multi-head attention mechanism.

input_dim

Number of input features.

Type: int

num_heads

Number of attention heads.

Type: int

head_dim

Dimension of each attention head.

Type: int

qkv

Combined projection for Q, K, and V.

Type: nn.Linear

fc_out

Final output projection.

Type: nn.Linear

scale

Scaling factor for dot-product attention.

Type: float

__init__(input_dim: int, num_heads: int) → None

Initializes the MultiHeadAttention layer.

Parameters:

input_dim (int) – Number of input features.
num_heads (int) – Number of attention heads.
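
The reference does not state how head_dim and scale are derived from the constructor arguments. A minimal sketch, assuming the common convention of splitting input_dim evenly across heads and scaling by the inverse square root of head_dim (the helper name attention_dims is hypothetical and not part of fishy):

```python
import math
from typing import Tuple


def attention_dims(input_dim: int, num_heads: int) -> Tuple[int, float]:
    """Return (head_dim, scale) under an even-split convention (an assumption)."""
    if input_dim % num_heads != 0:
        # An even split only works when the heads divide the feature dimension.
        raise ValueError("input_dim must be divisible by num_heads")
    head_dim = input_dim // num_heads
    scale = 1.0 / math.sqrt(head_dim)  # standard scaled dot-product factor
    return head_dim, scale


head_dim, scale = attention_dims(input_dim=128, num_heads=4)
print(head_dim, scale)
```

Under this convention, an input_dim that is not a multiple of num_heads raises an error; whether the actual class enforces this is not documented here.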

forward(x: Tensor, return_attention: bool = False) → Tensor | Tuple[Tensor, Tensor]

Applies multi-head self-attention to the input sequence.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, seq_length, input_dim).
return_attention (bool) – Whether to return attention weights.

Returns:

Output tensor of shape (batch_size, seq_length, input_dim). If return_attention is True, a tuple of the output tensor and the attention weights of shape (batch_size, num_heads, seq_length, seq_length).

Return type:

torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
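
The shapes above can be illustrated with a short PyTorch sketch. It is a stand-in written from the documented attributes (qkv, fc_out, scale), not the library's source. The class name MultiHeadAttentionSketch, the 3 * input_dim width of the combined projection, and the even split across heads are all assumptions.

```python
from typing import Tuple, Union

import torch
from torch import nn


class MultiHeadAttentionSketch(nn.Module):
    """Illustrative self-attention layer; NOT the fishy implementation."""

    def __init__(self, input_dim: int, num_heads: int) -> None:
        super().__init__()
        if input_dim % num_heads != 0:
            raise ValueError("input_dim must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = input_dim // num_heads      # assumed even split
        self.scale = self.head_dim ** -0.5          # 1 / sqrt(head_dim)
        self.qkv = nn.Linear(input_dim, 3 * input_dim)  # Q, K, V in one projection
        self.fc_out = nn.Linear(input_dim, input_dim)

    def forward(
        self, x: torch.Tensor, return_attention: bool = False
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        b, t, d = x.shape
        # (b, t, 3*d) -> (3, b, heads, t, head_dim), then split into Q, K, V.
        qkv = self.qkv(x).reshape(b, t, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        # Scaled dot-product scores, normalized over the key axis.
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        # Merge heads back into the feature dimension: (b, t, d).
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        out = self.fc_out(out)
        return (out, attn) if return_attention else out


x = torch.randn(2, 10, 16)  # (batch_size, seq_length, input_dim)
layer = MultiHeadAttentionSketch(input_dim=16, num_heads=4)
out, weights = layer(x, return_attention=True)
print(out.shape, weights.shape)
```

With return_attention=False the call returns only the output tensor, which is why callers that always unpack two values need to pass the flag explicitly.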

class fishy.models.deep.transformer.Transformer(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs)

Bases: Module, TTTMixin

Transformer architecture for 1D spectral data.

attention_layers

List of multi-head attention layers.

Type: nn.ModuleList

feed_forward

Position-wise feed-forward network.

Type: nn.Sequential

layer_norm1

Layer normalization applied before attention.

Type: nn.LayerNorm

layer_norm2

Layer normalization applied before the feed-forward network.

Type: nn.LayerNorm

dropout

Dropout layer.

Type: nn.Dropout

fc_out

Final classification/regression head.

Type: nn.Linear

__init__(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs) → None

Initializes the Transformer model.

Parameters:

input_dim (int) – Number of input features.
output_dim (int) – Number of output classes/dimensions.
hidden_dim (int, optional) – Intermediate dimension of the feed-forward layer. Defaults to 128.
num_layers (int, optional) – Number of transformer blocks. Defaults to 1.
dropout (float, optional) – Dropout rate for regularization. Defaults to 0.1.
num_heads (int, optional) – Number of attention heads. Defaults to 4.

forward(x: Tensor, return_attention: bool = False, *args, **kwargs) → Tensor | Tuple[Tensor, List[Tensor]]

Runs the input through the attention and feed-forward layers and the output head.

Parameters:

x (torch.Tensor) – Input spectrum of shape (batch_size, input_dim) or (batch_size, seq_len, input_dim).
return_attention (bool) – Whether to return attention weights from all layers.

Returns:

Logits/predictions of shape (batch_size, output_dim). If return_attention is True, a tuple of the logits and a list of attention weights, one tensor per layer.

Return type:

torch.Tensor | Tuple[torch.Tensor, List[torch.Tensor]]
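
The following sketch shows one way the documented attributes and shapes could fit together. It is not the fishy source, and several details are assumptions: the class name TransformerSketch, the use of torch's built-in nn.MultiheadAttention in place of the module's own attention class, the GELU activation, sharing one feed-forward network and one pair of norms across layers, treating a 2D input as a length-1 sequence, and mean pooling over the sequence before the output head.

```python
from typing import List, Tuple, Union

import torch
from torch import nn


class TransformerSketch(nn.Module):
    """Illustrative pre-norm encoder; NOT the fishy implementation."""

    def __init__(self, input_dim: int, output_dim: int, hidden_dim: int = 128,
                 num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4) -> None:
        super().__init__()
        # Stand-in attention layers (assumption: built-in module, batch-first).
        self.attention_layers = nn.ModuleList(
            [nn.MultiheadAttention(input_dim, num_heads, batch_first=True)
             for _ in range(num_layers)]
        )
        self.feed_forward = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, input_dim)
        )
        self.layer_norm1 = nn.LayerNorm(input_dim)
        self.layer_norm2 = nn.LayerNorm(input_dim)
        self.dropout = nn.Dropout(dropout)
        self.fc_out = nn.Linear(input_dim, output_dim)

    def forward(
        self, x: torch.Tensor, return_attention: bool = False
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, List[torch.Tensor]]]:
        if x.dim() == 2:
            x = x.unsqueeze(1)  # (batch, input_dim) -> (batch, 1, input_dim)
        weights: List[torch.Tensor] = []
        for attn in self.attention_layers:
            h = self.layer_norm1(x)  # normalize before attention
            a, w = attn(h, h, h, need_weights=True, average_attn_weights=False)
            weights.append(w)        # (batch, num_heads, seq_len, seq_len)
            x = x + self.dropout(a)  # residual connection
            x = x + self.dropout(self.feed_forward(self.layer_norm2(x)))
        logits = self.fc_out(x.mean(dim=1))  # assumed mean pooling over seq_len
        return (logits, weights) if return_attention else logits


model = TransformerSketch(input_dim=16, output_dim=3).eval()
logits, attn_maps = model(torch.randn(5, 16), return_attention=True)
print(logits.shape, len(attn_maps), attn_maps[0].shape)
```

With a 2D input the sequence length is 1, so each attention map in this sketch has shape (batch_size, num_heads, 1, 1); a 3D input of shape (batch_size, seq_len, input_dim) yields seq_len by seq_len maps instead.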