BSBR Extras API
bsbr_extras.standard_transformer.StandardTransformerModel
Bases: Module
Full Standard Transformer model stacking multiple Standard Transformer layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of transformer layers | required |
num_heads | int | Number of attention heads | required |
ff_dim | int | Feed-forward intermediate dimension | required |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/standard_transformer.py
forward(input_ids, attention_mask=None)
Forward pass for the full Standard Transformer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
Source code in src/bsbr_extras/standard_transformer.py
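
A minimal usage sketch for `StandardTransformerModel`, assembled from the constructor and `forward` signatures above. The import path follows the module name shown; the concrete dimensions and the 1/0 mask convention are illustrative assumptions, not part of the documented API.

```python
import torch
from bsbr_extras.standard_transformer import StandardTransformerModel

# Small illustrative configuration (all values are placeholders).
model = StandardTransformerModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
    ff_dim=256,
    dropout=0.1,
)

input_ids = torch.randint(0, 1000, (2, 16))  # [batch_size, seq_len]
attention_mask = torch.ones(2, 16)           # 1 = attend, 0 = masked (assumed convention)

output = model(input_ids, attention_mask=attention_mask)
print(output.shape)  # expected: torch.Size([2, 16, 64])
```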
bsbr_extras.linear_transformer.LinearTransformerModel
Bases: Module
Full Linear Transformer model stacking multiple Linear Transformer layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of Linear Transformer layers | required |
num_heads | int | Number of attention heads | required |
ff_dim | int | Feed-forward intermediate dimension | required |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/linear_transformer.py
forward(input_ids, attention_mask=None, states=None)
Forward pass for the full Linear Transformer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
states | Optional[list] | Optional previous state list for each layer | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
new_states | list | Updated state list for each layer |
Source code in src/bsbr_extras/linear_transformer.py
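
A minimal usage sketch for `LinearTransformerModel`, showing how the per-layer `states` returned by `forward` can be fed back in to process a long sequence in chunks. The import path follows the module name shown; the chunking pattern and dimensions are illustrative assumptions.

```python
import torch
from bsbr_extras.linear_transformer import LinearTransformerModel

model = LinearTransformerModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
    ff_dim=256,
)

# Process a long sequence in two chunks, carrying the recurrent state forward.
chunk_a = torch.randint(0, 1000, (1, 32))
chunk_b = torch.randint(0, 1000, (1, 32))

out_a, states = model(chunk_a)                 # states: one entry per layer
out_b, states = model(chunk_b, states=states)  # continue from the previous state
print(out_b.shape)  # expected: torch.Size([1, 32, 64])
```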
bsbr_extras.delta_net.DeltaNetModel
Bases: Module
Full DeltaNet model stacking multiple DeltaNet layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of DeltaNet layers | required |
num_heads | int | Number of attention heads | required |
ff_dim | int | Feed-forward intermediate dimension | required |
beta | float | Forgetting/update rate parameter (β in the paper) | 0.9 |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/delta_net.py
forward(input_ids, attention_mask=None, states=None)
Forward pass for the full DeltaNet model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
states | Optional[list] | Optional previous state list for each layer | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
new_states | list | Updated state list for each layer |
Source code in src/bsbr_extras/delta_net.py
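
A minimal usage sketch for `DeltaNetModel`, based on the constructor and `forward` signatures above. The import path follows the module name shown; the dimensions are illustrative assumptions.

```python
import torch
from bsbr_extras.delta_net import DeltaNetModel

model = DeltaNetModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
    ff_dim=256,
    beta=0.9,  # forgetting/update rate (β)
)

input_ids = torch.randint(0, 1000, (2, 16))
output, states = model(input_ids)      # states holds one entry per layer
print(output.shape, len(states))       # expected: torch.Size([2, 16, 64]) 2
```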
bsbr_extras.gau.GAUModel
Bases: Module
Full Gated Attention Unit model stacking multiple GAU layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of GAU layers | required |
chunk_size | int | Size of chunks for parallel processing | required |
ff_dim | int | Feed-forward intermediate dimension | required |
expansion_factor | int | Expansion factor for GAU | 2 |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/gau.py
forward(input_ids, attention_mask=None)
Forward pass for the full GAU model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
Source code in src/bsbr_extras/gau.py
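
A minimal usage sketch for `GAUModel`. The import path follows the module name shown. Note that the sequence length is chosen here as a multiple of `chunk_size`; whether the model requires this is an assumption, as are the concrete dimensions.

```python
import torch
from bsbr_extras.gau import GAUModel

model = GAUModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    chunk_size=16,
    ff_dim=256,
    expansion_factor=2,
)

input_ids = torch.randint(0, 1000, (2, 64))  # seq_len = 64, a multiple of chunk_size (assumption)
output = model(input_ids)
print(output.shape)  # expected: torch.Size([2, 64, 64])
```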
bsbr_extras.hopfield_network.HopfieldNetworkModel
Bases: Module
Full Hopfield Network model stacking multiple Hopfield Network layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of Hopfield Network layers | required |
num_heads | int | Number of attention heads | required |
ff_dim | int | Feed-forward intermediate dimension | required |
temperature | float | Temperature parameter for the Hopfield energy function | 1.0 |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/hopfield_network.py
forward(input_ids, attention_mask=None, states=None)
Forward pass for the full Hopfield Network model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
states | Optional[list] | Optional previous state list for each layer | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
new_states | list | Updated states list for each layer |
Source code in src/bsbr_extras/hopfield_network.py
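
A minimal usage sketch for `HopfieldNetworkModel`, based on the signatures above. The import path follows the module name shown; the dimensions and the 1/0 mask convention are illustrative assumptions.

```python
import torch
from bsbr_extras.hopfield_network import HopfieldNetworkModel

model = HopfieldNetworkModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
    ff_dim=256,
    temperature=1.0,  # temperature of the Hopfield energy function
)

input_ids = torch.randint(0, 1000, (1, 32))
attention_mask = torch.ones(1, 32)           # 1 = attend, 0 = masked (assumed convention)

output, states = model(input_ids, attention_mask=attention_mask)
print(output.shape)  # expected: torch.Size([1, 32, 64])
```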
bsbr_extras.sliding_window_transformer.SlidingWindowTransformerModel
Bases: Module
Full Sliding Window Transformer model stacking multiple transformer layers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vocab_size | int | Vocabulary size for embedding layer | required |
hidden_dim | int | Hidden dimension size | required |
num_layers | int | Number of transformer layers | required |
num_heads | int | Number of attention heads | required |
window_size | int | Size of the attention window | required |
ff_dim | int | Feed-forward intermediate dimension | required |
dropout | float | Dropout probability | 0.1 |
Source code in src/bsbr_extras/sliding_window_transformer.py
forward(input_ids, attention_mask=None)
Forward pass for the full Sliding Window Transformer model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_ids | LongTensor | Token IDs of shape [batch_size, seq_len] | required |
attention_mask | Optional[Tensor] | Optional attention mask of shape [batch_size, seq_len] | None |
Returns:
Name | Type | Description |
---|---|---|
output | Tensor | Processed tensor of shape [batch_size, seq_len, hidden_dim] |
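
A minimal usage sketch for `SlidingWindowTransformerModel`, based on the constructor and `forward` signatures above. The import path follows the module name shown; the window size and other dimensions are illustrative assumptions.

```python
import torch
from bsbr_extras.sliding_window_transformer import SlidingWindowTransformerModel

model = SlidingWindowTransformerModel(
    vocab_size=1000,
    hidden_dim=64,
    num_layers=2,
    num_heads=4,
    window_size=8,   # each position attends within a local window of this size
    ff_dim=256,
)

input_ids = torch.randint(0, 1000, (2, 32))
output = model(input_ids)
print(output.shape)  # expected: torch.Size([2, 32, 64])
```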