# BSBR: Block Sparse Attention with Block Retrieval
BSBR (Block Sparse Attention with Block Retrieval) is an attention mechanism for efficiently processing long sequences in transformer architectures. It combines standard attention within chunks with block-level retrieval between chunks, achieving near-linear complexity in sequence length while maintaining high model expressivity.
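To make the mechanism concrete, here is a minimal sketch of the pattern described above: dense attention restricted to each chunk, plus a coarse retrieval step over per-chunk summaries. The mean-pooled summaries, the additive combination, and the omission of multi-head projections and causal masking are simplifying assumptions for illustration; this is not how the package implements BSBR.

```python
import torch
import torch.nn.functional as F


def bsbr_attention_sketch(q, k, v, chunk_size):
    """Toy sketch: standard attention inside each chunk plus retrieval over
    per-chunk summaries. Illustration only, not the package's implementation."""
    B, T, D = q.shape
    assert T % chunk_size == 0, "sequence length must be a multiple of chunk_size"
    n_chunks = T // chunk_size

    # Reshape to (batch, n_chunks, chunk_size, dim)
    qc = q.view(B, n_chunks, chunk_size, D)
    kc = k.view(B, n_chunks, chunk_size, D)
    vc = v.view(B, n_chunks, chunk_size, D)

    # 1) Standard attention restricted to each chunk (the block-sparse part)
    local = F.softmax(qc @ kc.transpose(-1, -2) / D**0.5, dim=-1) @ vc

    # 2) Block retrieval: mean-pool each chunk into a single key/value and let
    #    every query attend over these chunk-level summaries
    k_block = kc.mean(dim=2)                          # (B, n_chunks, D)
    v_block = vc.mean(dim=2)                          # (B, n_chunks, D)
    scores = q @ k_block.transpose(-1, -2) / D**0.5   # (B, T, n_chunks)
    retrieved = F.softmax(scores, dim=-1) @ v_block   # (B, T, D)

    # Combine local and retrieved context (simple sum for illustration)
    return local.reshape(B, T, D) + retrieved


# Example: 2 sequences of 256 tokens, 64-dim features, chunks of 128 tokens
q = k = v = torch.randn(2, 256, 64)
out = bsbr_attention_sketch(q, k, v, chunk_size=128)
print(out.shape)  # torch.Size([2, 256, 64])
```

For real workloads, use `BSBRModel` from the package (see Quick Start below), which exposes the configurable chunk size and optional compression mentioned in the features.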
## Features
- 🔄 Efficient Processing: Near-linear complexity in sequence length
- 🧩 Chunk-Based Attention: Standard attention within chunks
- 🔍 Block Retrieval: Efficient information retrieval between chunks
- 🎯 Configurable: Adjustable chunk size and compression
- 💾 Memory Efficient: Optimized memory usage for long sequences
## Quick Start
```python
import torch
from bsbr import BSBRModel

# Model configuration
model = BSBRModel(
    vocab_size=10000,
    hidden_dim=512,
    num_layers=4,
    num_heads=8,
    chunk_size=128,
    ff_dim=2048,
    dropout=0.1,
    compression_factor=4  # Optional compression
)

# Input data
input_ids = torch.randint(0, 10000, (2, 256))
attention_mask = torch.ones(2, 256)

# Forward pass
outputs = model(input_ids, attention_mask)
```
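Assuming `BSBRModel` follows the usual convention of returning the final hidden states (not documented here), `outputs` would have shape `(batch_size, seq_len, hidden_dim)`:

```python
# Assuming outputs are the final hidden states (convention, not confirmed here)
print(outputs.shape)  # expected: torch.Size([2, 256, 512])
```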
## Installation
```bash
# Install the core package
pip install bsbr

# Install with extras for evaluations and research
pip install "bsbr[extras]"
```
## Documentation
## Research
BSBR is based on research presented in our paper BSBR: Block Sparse Attention with Block Retrieval for Efficient Long-Context Reasoning. The implementation is inspired by Shengding Hu's blog post Streaming models for efficient long-context reasoning.
## Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.