# BSBR: Block Sparse Attention with Block Retrieval
BSBR (Block Sparse Attention with Block Retrieval) is an attention mechanism for efficiently processing long sequences in transformer architectures. It combines standard attention within chunks with block-level retrieval between chunks, achieving near-linear complexity in sequence length while maintaining high model expressivity.
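To make the mechanism concrete, here is a minimal sketch of the pattern described above: dense attention restricted to each chunk, plus a coarse retrieval step over per-chunk summaries. The mean-pooled summaries, the additive combination, and the omission of multi-head projections and causal masking are simplifying assumptions for illustration; this is not how the package implements BSBR.

```python
import torch
import torch.nn.functional as F


def bsbr_attention_sketch(q, k, v, chunk_size):
    """Toy sketch: standard attention inside each chunk plus retrieval over
    per-chunk summaries. Illustration only, not the package's implementation."""
    B, T, D = q.shape
    assert T % chunk_size == 0, "sequence length must be a multiple of chunk_size"
    n_chunks = T // chunk_size

    # Reshape to (batch, n_chunks, chunk_size, dim)
    qc = q.view(B, n_chunks, chunk_size, D)
    kc = k.view(B, n_chunks, chunk_size, D)
    vc = v.view(B, n_chunks, chunk_size, D)

    # 1) Standard attention restricted to each chunk (the block-sparse part)
    local = F.softmax(qc @ kc.transpose(-1, -2) / D**0.5, dim=-1) @ vc

    # 2) Block retrieval: mean-pool each chunk into a single key/value and let
    #    every query attend over these chunk-level summaries
    k_block = kc.mean(dim=2)                          # (B, n_chunks, D)
    v_block = vc.mean(dim=2)                          # (B, n_chunks, D)
    scores = q @ k_block.transpose(-1, -2) / D**0.5   # (B, T, n_chunks)
    retrieved = F.softmax(scores, dim=-1) @ v_block   # (B, T, D)

    # Combine local and retrieved context (simple sum for illustration)
    return local.reshape(B, T, D) + retrieved


# Example: 2 sequences of 256 tokens, 64-dim features, chunks of 128 tokens
q = k = v = torch.randn(2, 256, 64)
out = bsbr_attention_sketch(q, k, v, chunk_size=128)
print(out.shape)  # torch.Size([2, 256, 64])
```

For real workloads, use `BSBRModel` from the package (see Quick Start below), which exposes the configurable chunk size and optional compression mentioned in the features.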
## Features
- 🔄 Efficient Processing: Near-linear complexity in sequence length
- 🧩 Chunk-Based Attention: Standard attention within chunks
- 🔍 Block Retrieval: Efficient information retrieval between chunks
- 🎯 Configurable: Adjustable chunk size and compression
- 💾 Memory Efficient: Optimized memory usage for long sequences
## Quick Start
```python
import torch
from bsbr import BSBRModel

# Model configuration
model = BSBRModel(
    vocab_size=10000,
    hidden_dim=512,
    num_layers=4,
    num_heads=8,
    chunk_size=128,
    ff_dim=2048,
    dropout=0.1,
    compression_factor=4  # Optional compression
)

# Input data
input_ids = torch.randint(0, 10000, (2, 256))
attention_mask = torch.ones(2, 256)

# Forward pass
outputs = model(input_ids, attention_mask)
```
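Assuming `BSBRModel` follows the usual convention of returning the final hidden states (not documented here), `outputs` would have shape `(batch_size, seq_len, hidden_dim)`:

```python
# Assuming outputs are the final hidden states (convention, not confirmed here)
print(outputs.shape)  # expected: torch.Size([2, 256, 512])
```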
## Installation
```bash
# Install the core package
pip install bsbr

# Install with extras for evaluations and research
pip install "bsbr[extras]"
```
## Documentation
## Research
BSBR is based on research presented in our paper BSBR: Block Sparse Attention with Block Retrieval for Efficient Long-Context Reasoning. The implementation is inspired by Shengding Hu's blog post Streaming models for efficient long-context reasoning.
## Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
## License
This project is licensed under the MIT License - see the LICENSE file for details.