# BSBR: Block Sparse Attention with Block Retrieval
BSBR (Block Sparse Attention with Block Retrieval) is a novel attention mechanism for efficient processing of long sequences in transformer architectures. It combines standard attention within chunks and block retrieval between chunks to achieve near-linear complexity while maintaining high model expressivity.
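The within-chunk half of this mechanism can be pictured with a minimal sketch (illustrative only; the function name and shapes below are assumptions, not the library's implementation): each fixed-size chunk attends only to itself, so cost grows with the number of chunks times `chunk_size²` rather than with the full sequence length squared. The between-chunk block retrieval step, which exchanges compressed summaries across chunks, is omitted here.

```python
import torch
import torch.nn.functional as F

def within_chunk_attention(q, k, v, chunk_size):
    # q, k, v: [batch, seq_len, dim]; seq_len is assumed divisible by chunk_size
    batch, seq_len, dim = q.shape
    n_chunks = seq_len // chunk_size
    # Reshape so attention is computed independently inside each chunk
    q = q.view(batch, n_chunks, chunk_size, dim)
    k = k.view(batch, n_chunks, chunk_size, dim)
    v = v.view(batch, n_chunks, chunk_size, dim)
    # Standard scaled dot-product attention, restricted to each chunk
    scores = q @ k.transpose(-2, -1) / dim ** 0.5  # [batch, n_chunks, chunk, chunk]
    out = F.softmax(scores, dim=-1) @ v
    return out.view(batch, seq_len, dim)

q = k = v = torch.randn(2, 256, 64)
print(within_chunk_attention(q, k, v, chunk_size=128).shape)  # torch.Size([2, 256, 64])
```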
## Features
- 🔄 Efficient Processing: Near-linear complexity in sequence length
- 🧩 Chunk-Based Attention: Standard attention within chunks
- 🔍 Block Retrieval: Efficient information retrieval between chunks
- 🎯 Configurable: Adjustable chunk size and compression
- 💾 Memory Efficient: Optimized memory usage for long sequences
## Quick Start
```python
import torch
from bsbr import BSBRModel

# Model configuration
model = BSBRModel(
    vocab_size=10000,
    hidden_dim=512,
    num_layers=4,
    num_heads=8,
    chunk_size=128,
    ff_dim=2048,
    dropout=0.1,
    compression_factor=4  # Optional compression
)

# Input data
input_ids = torch.randint(0, 10000, (2, 256))
attention_mask = torch.ones(2, 256)

# Forward pass
outputs = model(input_ids, attention_mask)
```
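What `outputs` contains depends on the model's return type; assuming it is the final hidden-state tensor of shape `[batch, seq_len, hidden_dim]` (an assumption to verify against the `BSBRModel` API), a quick shape check looks like this:

```python
# Assuming `outputs` is a tensor of final hidden states rather than a tuple/dataclass
print(outputs.shape)  # expected: torch.Size([2, 256, 512]) for the configuration above
```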
## Installation
```bash
# Install the core package
pip install bsbr

# Install with extras for evaluations and research
pip install "bsbr[extras]"
```
## Documentation

## Research
BSBR is based on research presented in our paper BSBR: Block Sparse Attention with Block Retrieval for Efficient Long-Context Reasoning. The implementation is inspired by Shengding Hu's blog post Streaming models for efficient long-context reasoning.
## Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.