fishy.models.deep.transformer
Transformer model for time series classification.
This model uses multi-head attention and feed-forward layers to process sequential data. It is designed to handle variable-length sequences and can be used for tasks such as classification or regression. The architecture includes layer normalization, dropout for regularization, and a final fully connected layer for output.
Classes
- class fishy.models.deep.transformer.MultiHeadAttention(input_dim: int, num_heads: int)
Bases: Module
Multi-head attention mechanism.
- input_dim
Number of input features.
- Type:
int
- num_heads
Number of attention heads.
- Type:
int
- head_dim
Dimension of each attention head.
- Type:
int
- qkv
Combined projection for Q, K, and V.
- Type:
nn.Linear
- fc_out
Final output projection.
- Type:
nn.Linear
- scale
Scaling factor for dot-product attention.
- Type:
float
- __init__(input_dim: int, num_heads: int) -> None
Initializes the MultiHeadAttention layer.
- Parameters:
input_dim (int) – Number of input features.
num_heads (int) – Number of attention heads.
- forward(x: Tensor, return_attention: bool = False) -> Tensor | Tuple[Tensor, Tensor]
Forward pass.
- Parameters:
x (torch.Tensor) – Input tensor of shape (batch_size, seq_length, input_dim).
return_attention (bool) – Whether to return attention weights.
- Returns:
Output tensor of shape (batch_size, seq_length, input_dim). If return_attention is True, a tuple of that tensor and the attention weights of shape (batch_size, num_heads, seq_length, seq_length).
- Return type:
torch.Tensor | Tuple[torch.Tensor, torch.Tensor]
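To make the roles of `head_dim`, `scale`, and the attention-weight shape concrete, here is a framework-free sketch of scaled dot-product attention for a single head. It assumes the conventional definitions `head_dim = input_dim // num_heads` and `scale = head_dim ** -0.5`; this page does not state how the library computes them. The helper names `softmax` and `single_head_attention` are illustrative and not part of `fishy`.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(q, k, v, scale):
    """Scaled dot-product attention for one head.

    q, k, v are lists of seq_length vectors, each of length head_dim.
    Returns (output, weights); weights has shape (seq_length, seq_length).
    """
    weights = []
    for q_i in q:
        # Scaled similarity between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        weights.append(softmax(scores))
    head_dim = len(v[0])
    # Each output vector is a weighted average of the value vectors.
    output = [
        [sum(w * v_j[d] for w, v_j in zip(w_row, v)) for d in range(head_dim)]
        for w_row in weights
    ]
    return output, weights


# Assumed conventions (not confirmed by this page).
input_dim, num_heads = 8, 4
head_dim = input_dim // num_heads
scale = head_dim ** -0.5

# Toy sequence of 3 positions; q, k, and v reuse the same vectors for brevity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = single_head_attention(x, x, x, scale)
```

Each row of `attn` sums to 1. Stacking one such matrix per head and per batch element gives the documented (batch_size, num_heads, seq_length, seq_length) shape.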
- class fishy.models.deep.transformer.Transformer(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs)
Bases: Module, TTTMixin
Transformer architecture for 1D spectral data.
- attention_layers
List of multi-head attention layers.
- Type:
nn.ModuleList
- feed_forward
Position-wise feed-forward network.
- Type:
nn.Sequential
- layer_norm1
Norm layer before attention.
- Type:
nn.LayerNorm
- layer_norm2
Norm layer before feed-forward.
- Type:
nn.LayerNorm
- dropout
Dropout layer.
- Type:
nn.Dropout
- fc_out
Final classification/regression head.
- Type:
nn.Linear
- __init__(input_dim: int, output_dim: int, hidden_dim: int = 128, num_layers: int = 1, dropout: float = 0.1, num_heads: int = 4, **kwargs) -> None
Initializes the Transformer model.
- Parameters:
input_dim (int) – Number of input features.
output_dim (int) – Number of output classes/dimensions.
hidden_dim (int, optional) – Intermediate dimension of the feed-forward layer. Defaults to 128.
num_layers (int, optional) – Number of transformer blocks. Defaults to 1.
dropout (float, optional) – Dropout rate for regularization. Defaults to 0.1.
num_heads (int, optional) – Number of attention heads. Defaults to 4.
- forward(x: Tensor, return_attention: bool = False, *args, **kwargs) -> Tensor | Tuple[Tensor, List[Tensor]]
Forward pass.
- Parameters:
x (torch.Tensor) – Input spectrum of shape (batch_size, input_dim) or (batch_size, seq_len, input_dim).
return_attention (bool) – Whether to return attention weights from all layers.
- Returns:
Logits/predictions of shape (batch_size, output_dim). If return_attention is True, a tuple of that tensor and a list with the attention weights from each layer.
- Return type:
torch.Tensor | Tuple[torch.Tensor, List[torch.Tensor]]
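The descriptions of `layer_norm1` ("before attention") and `layer_norm2` ("before feed-forward") suggest a pre-norm block with residual connections, although this page does not confirm the exact ordering. The following framework-free sketch shows that layout for a single position's feature vector under that assumption. It omits dropout and the learned scale and shift of `nn.LayerNorm`, and the names `layer_norm`, `pre_norm_block`, `identity`, and `relu` are hypothetical stand-ins rather than library code.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def pre_norm_block(x, attention, feed_forward):
    """Assumed layout: x + attention(norm(x)), then x + feed_forward(norm(x))."""
    x = [a + b for a, b in zip(x, attention(layer_norm(x)))]
    x = [a + b for a, b in zip(x, feed_forward(layer_norm(x)))]
    return x


def identity(vec):
    """Stand-in for the attention sublayer (hypothetical)."""
    return list(vec)


def relu(vec):
    """Stand-in for the feed-forward sublayer (hypothetical)."""
    return [max(0.0, t) for t in vec]


features = [1.0, 2.0, 3.0, 4.0]
normed = layer_norm(features)
block_out = pre_norm_block(features, identity, relu)
```

In the pre-norm arrangement the residual path carries the unnormalized input forward, so each sublayer only sees normalized activations while the block output keeps the original scale.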