
canvas_engineering.connectivity

Declarative attention topology with temporal fill modes.

canvas_engineering.connectivity.TemporalFill

Bases: str, Enum

How to handle temporal mismatches when dst has no value at the requested timestep.

When regions update at different frequencies, a fast region (period=1) attending to a slow region (period=576) will often find no dst positions at the exact requested timestep. This enum controls what happens.

DROP: No connection. The fast region gets nothing from the slow region at misaligned frames; information is lost.

HOLD: Zero-order hold. Use the most recent available value from the slow region. This matches how the real world works: you trade on last quarter's GDP until this quarter's is released.

INTERPOLATE: Weighted interpolation between surrounding available values. For order=1 (default): linear lerp between the nearest past and future anchors (falls back to HOLD if only the past is available). For order=N: inverse-distance weighting (IDW) over up to N+1 nearest anchor points, with weights proportional to 1/dist^N, non-negative and normalized. Best for smooth signals during training, where both endpoints are known.
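The IDW weighting can be sketched in a few lines. This is an illustrative stand-alone function, not the library's implementation; the name `idw_weights` and the exact fallback behavior are assumptions based on the description above.

```python
def idw_weights(t, anchors, order=1):
    """Sketch of INTERPOLATE weights: inverse-distance weighting over up to
    order+1 nearest anchor timesteps, proportional to 1/dist^order, normalized.
    (Hypothetical helper; not the library's API.)"""
    if t in anchors:
        return {t: 1.0}  # exact hit: full weight on the matching anchor
    past = [a for a in anchors if a < t]
    future = [a for a in anchors if a > t]
    if order == 1 and not future:
        # Falls back to HOLD: the most recent past anchor gets full weight.
        return {max(past): 1.0}
    nearest = sorted(anchors, key=lambda a: abs(a - t))[:order + 1]
    raw = {a: 1.0 / abs(a - t) ** order for a in nearest}
    total = sum(raw.values())
    return {a: w / total for a, w in raw.items()}
```

For order=1 the two normalized inverse distances reduce exactly to linear interpolation: with anchors at 0 and 576, the weight on the past anchor at timestep t is (576 - t)/576.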

canvas_engineering.connectivity.Connection dataclass

A single block-to-block attention operation.

src queries attend to dst keys/values. This is one cross-attention op in the per-step compute DAG.

Parameters:

  • src (str, required): Region whose tokens are the queries.
  • dst (str, required): Region whose tokens are the keys/values.
  • weight (float, default 1.0): Attention weight scaling (1.0 = full attention).
  • t_src (Optional[int], default None): Temporal offset for src (query) positions. None = all timesteps; when int, constrains src to positions at reference_frame + t_src.
  • t_dst (Optional[int], default None): Temporal offset for dst (key/value) positions. None = all timesteps; when int, constrains dst to positions at reference_frame + t_dst.
  • fn (Optional[str], default None): Attention backend for this connection. None = use the src region's default_attn. See ATTENTION_TYPES in canvas.py for the full registry.
  • operator (str, default 'attend'): Semantic intent of this edge. "attend" = generic. When compile_program() has family info, this is auto-set: observe, predict, correct, bind, retrieve, write, act, etc.
  • write_mode (str, default 'add'): How output accumulates at the destination. "add" = additive, "replace" = overwrite, "gate" = learned gate.
  • temporal_fill (TemporalFill, default HOLD): How to handle missing dst positions at the requested timestep. Applies only when t_dst is not None.
  • interpolation_order (int, default 1): For TemporalFill.INTERPOLATE, the IDW order. order=1 gives linear lerp between the nearest past and future anchors; order=N uses up to N+1 anchor points weighted by 1/dist^N, normalized. Ignored for DROP and HOLD.
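A minimal self-contained sketch of the dataclass and enum, matching the field names and defaults above. The enum member string values are assumptions (the reference only names the members), and this is illustrative, not the library source:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TemporalFill(str, Enum):
    DROP = "drop"                # member values assumed; not given in the reference
    HOLD = "hold"
    INTERPOLATE = "interpolate"

@dataclass
class Connection:
    src: str                     # region whose tokens are the queries
    dst: str                     # region whose tokens are the keys/values
    weight: float = 1.0
    t_src: Optional[int] = None
    t_dst: Optional[int] = None
    fn: Optional[str] = None
    operator: str = "attend"
    write_mode: str = "add"
    temporal_fill: TemporalFill = TemporalFill.HOLD
    interpolation_order: int = 1

# A fast region reading the slow region's previous frame, holding stale values:
conn = Connection(src="fast", dst="slow", t_src=0, t_dst=-1,
                  temporal_fill=TemporalFill.HOLD)
```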

canvas_engineering.connectivity.CanvasTopology dataclass

Declarative specification of block-to-block attention connectivity.

Each Connection defines a cross-attention operation performed per step. Self-connections (src == dst) = self-attention within a block. The full set of connections is the compute DAG.

Temporal semantics

When t_src and t_dst are both None (default), all positions in src attend to all positions in dst regardless of timestep (dense in time).

When t_src and/or t_dst are ints, they define relative temporal offsets. The mask iterates over reference frames and pairs: src positions at (ref + t_src) with dst positions at (ref + t_dst).

When dst has no positions at the requested timestep, the connection's temporal_fill mode determines behavior (see TemporalFill enum).

Mixed (one None, one int): the None side includes ALL its timesteps, the int side is constrained to (ref + offset) per reference frame.

Special cases
  • Every region self-connected = block-diagonal self-attention
  • Every pair connected = dense attention (standard transformer)
  • DAG structure = structured information flow
  • t_src=0, t_dst=0 = same-frame only (no temporal leakage)
  • t_src=0, t_dst=-1 = previous-frame cross-attention
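The reference-frame pairing rule can be made concrete with a small sketch. `temporal_pairs` is a hypothetical helper (not the library's code) that enumerates which (src, dst) timestep pairs a connection's offsets admit:

```python
def temporal_pairs(src_steps, dst_steps, t_src, t_dst, ref_frames):
    """Pair src positions at (ref + t_src) with dst positions at (ref + t_dst),
    iterating over reference frames. None on a side means all its timesteps.
    (Illustrative sketch of the masking rule described above.)"""
    pairs = set()
    for ref in ref_frames:
        srcs = src_steps if t_src is None else [s for s in src_steps if s == ref + t_src]
        dsts = dst_steps if t_dst is None else [d for d in dst_steps if d == ref + t_dst]
        pairs.update((s, d) for s in srcs for d in dsts)
    return sorted(pairs)
```

With t_src=0, t_dst=-1 over four timesteps this yields exactly the previous-frame pairs, and t_src=0, t_dst=0 yields only same-frame pairs, matching the special cases listed above.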

has_temporal_constraints property

Whether any connection has temporal offsets.

regions property

All region names referenced in connections.

attended_by(region)

Which regions query against region (as keys/values)?

attention_ops(layout=None)

List of (src, dst, weight, fn) attention operations per step.

If layout is provided, connection fn is resolved via region defaults.

causal_chain(regions) staticmethod

Causal chain: A → B → C (each attends to self + previous).

causal_temporal(regions, layout=None, temporal_fill=TemporalFill.HOLD) staticmethod

Temporal causal: same-frame self-attn + prev-frame cross-attn.

Each region attends to itself at the same frame, and to all other regions at the previous frame. No future leakage.

All temporal connections use the specified fill mode (default HOLD), ensuring fast regions can always read from slow regions' most recent values even when update frequencies don't align.

Parameters:

  • regions (List[str], required): Region names.
  • layout (Optional[CanvasLayout], default None): Optional layout (reserved for future period-aware logic).
  • temporal_fill (TemporalFill, default HOLD): Fill mode for all temporal connections.
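The edge set this factory builds can be sketched as follows. Edges are shown as plain tuples (src, dst, t_src, t_dst, fill) for clarity; the real method returns a CanvasTopology of Connection objects:

```python
def causal_temporal_edges(regions, temporal_fill="hold"):
    """Sketch of the causal_temporal pattern: same-frame self-attention plus
    previous-frame cross-attention to every other region. (Illustrative only.)"""
    edges = []
    for r in regions:
        edges.append((r, r, 0, 0, temporal_fill))               # same frame, no leakage
        for other in regions:
            if other != r:
                edges.append((r, other, 0, -1, temporal_fill))  # previous frame only
    return edges
```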

dense(regions) staticmethod

Fully connected: every region attends to every other.

hub_spoke(hub, spokes, bidirectional=True) staticmethod

Hub-and-spoke: hub reads from all spokes, spokes read from hub.
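At the region level the hub-and-spoke edge set looks like this. A hypothetical sketch of the adjacency only; the real factory returns a CanvasTopology:

```python
def hub_spoke_edges(hub, spokes, bidirectional=True):
    """Sketch of hub_spoke adjacency: hub queries every spoke; if bidirectional,
    each spoke queries the hub back. (Illustrative; not the library's code.)"""
    edges = [(hub, s) for s in spokes]          # hub reads from all spokes
    if bidirectional:
        edges += [(s, hub) for s in spokes]     # spokes read from the hub
    return edges
```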

isolated(regions) staticmethod

Block-diagonal: each region only attends to itself.

neighbors_of(region)

Which regions does region attend to (as queries)?

resolve_fn(connection, layout=None)

Resolve the attention function type for a connection.

Resolution order
  1. connection.fn if explicitly set
  2. layout.region_spec(connection.src).default_attn if layout provided
  3. "cross_attention" (global default)

The resolved fn string is dispatched by AttentionDispatcher (see canvas_engineering.dispatch) using ATTENTION_REGISTRY (see canvas_engineering.attention). Each fn maps to an nn.Module with forward(queries, keys, values) -> output.
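The three-step resolution order reads naturally as a short fall-through. A sketch under the assumption that `layout.region_spec(...)` returns an object with a `default_attn` attribute, as the description implies; `SimpleNamespace` stands in for the real connection and layout types:

```python
from types import SimpleNamespace

def resolve_fn(connection, layout=None):
    """Sketch of the resolution order described above (not the library's code)."""
    # 1. Explicit per-connection override wins.
    if connection.fn is not None:
        return connection.fn
    # 2. Otherwise the src region's default_attn, if a layout is provided.
    if layout is not None:
        default = layout.region_spec(connection.src).default_attn
        if default is not None:
            return default
    # 3. Global default.
    return "cross_attention"

# With no fn and no layout, the global default applies:
conn = SimpleNamespace(fn=None, src="vision")
```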

summary()

Human-readable summary.

to_additive_mask(layout, device='cpu')

Generate (N, N) additive attention mask for use with nn.Transformer.

Returns a float mask where 0.0 = attend and -inf = block. Unused positions (not in any region) get self-attention to avoid NaN in softmax. Use with: transformer(x, mask=topology.to_additive_mask(layout))
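The weight-to-additive conversion is simple enough to sketch without torch. `to_additive` is a hypothetical pure-Python stand-in operating on nested lists; the real method returns a tensor on the requested device:

```python
NEG_INF = float("-inf")

def to_additive(weight_mask):
    """Sketch: 0.0 where weight > 0 (attend), -inf where weight == 0 (block).
    Rows with no attended position get self-attention to avoid NaN in softmax.
    (Illustrative; not the library's implementation.)"""
    n = len(weight_mask)
    add = [[0.0 if weight_mask[i][j] > 0 else NEG_INF for j in range(n)]
           for i in range(n)]
    for i in range(n):
        if all(v == NEG_INF for v in add[i]):
            add[i][i] = 0.0  # unused position: self-attend so softmax stays finite
    return add
```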

to_attention_mask(layout, device='cpu')

Generate (N, N) attention mask from topology.

mask[i, j] > 0 means token i (query) attends to token j (key). Value is the connection weight (may be fractional for INTERPOLATE).

Temporal fill modes are applied when dst has no positions at the requested timestep:
  • DROP: no connection (mask stays 0)
  • HOLD: connect to the most recent available dst positions
  • INTERPOLATE: connect to surrounding dst positions with IDW weights

to_block_adjacency()

Region-level adjacency dict: (src, dst) → weight.