The Canvas
A canvas is a 3D grid (T, H, W) of d_model-dimensional vectors, stored flattened as T*H*W positions. Each modality occupies a named region. The diffusion process operates on output regions (is_output=True); input regions (is_output=False) serve as conditioning context.
CanvasLayout
```python
layout = CanvasLayout(
    T=16, H=32, W=32, d_model=768,
    regions={
        "screen": (0, 16, 0, 24, 0, 24),  # raw tuple: every other field uses its default
        "mouse": RegionSpec(bounds=(0, 16, 24, 26, 0, 4), loss_weight=2.0),
        "thought": RegionSpec(bounds=(0, 4, 28, 32, 0, 8), period=4),
        "prompt": RegionSpec(bounds=(0, 1, 26, 28, 0, 4), is_output=False),
    },
)
```
Raw 6-tuples are automatically wrapped as RegionSpec(bounds=tuple), so layouts written with plain tuples keep working unchanged.
RegionSpec fields
| Field | Default | Meaning |
|---|---|---|
| bounds | (required) | (t0, t1, h0, h1, w0, w1) spatiotemporal extent; end bounds are exclusive |
| period | 1 | Real-world frames per canvas frame (the region updates once every period frames) |
| is_output | True | Whether the region participates in the diffusion loss |
| loss_weight | 1.0 | Relative loss weight |
| semantic_type | None | Human-readable modality description |
| semantic_embedding | None | Frozen vector used for transfer distance |
| embedding_model | "openai/text-embedding-3-small" | Model that produced semantic_embedding |
| default_attn | "cross_attention" | Default attention function for outgoing connections |
| carrier | "deterministic" | Dynamics carrier: deterministic, diffusive, filter, memory, or residual |
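The auto-wrapping of raw tuples and the defaults in this table can be sketched as a plain dataclass. RegionSpecSketch and as_region_spec are hypothetical stand-ins, not the library's RegionSpec; the real class may store, validate, or wrap fields differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union

Bounds = Tuple[int, int, int, int, int, int]  # (t0, t1, h0, h1, w0, w1)


@dataclass(frozen=True)
class RegionSpecSketch:
    """Hypothetical mirror of the RegionSpec fields and defaults listed above."""
    bounds: Bounds
    period: int = 1
    is_output: bool = True
    loss_weight: float = 1.0
    semantic_type: Optional[str] = None
    semantic_embedding: Optional[Sequence[float]] = None
    embedding_model: str = "openai/text-embedding-3-small"
    default_attn: str = "cross_attention"
    carrier: str = "deterministic"


def as_region_spec(value: Union[Bounds, RegionSpecSketch]) -> RegionSpecSketch:
    """Wrap a raw 6-tuple as RegionSpecSketch(bounds=...); pass specs through unchanged."""
    if isinstance(value, RegionSpecSketch):
        return value
    if len(value) != 6:
        raise ValueError(f"expected 6 bounds (t0, t1, h0, h1, w0, w1), got {len(value)}")
    return RegionSpecSketch(bounds=tuple(value))


regions = {
    "screen": (0, 16, 0, 24, 0, 24),
    "mouse": RegionSpecSketch(bounds=(0, 16, 24, 26, 0, 4), loss_weight=2.0),
}
specs = {name: as_region_spec(v) for name, v in regions.items()}
print(specs["screen"].period, specs["screen"].is_output)  # 1 True
```

Normalizing every entry up front means downstream code only ever handles one region type, which is what makes mixing tuples and specs in one layout harmless.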
Temporal frequency
A region with period=4 spanning canvas frames t=0..3 has 4 canvas slots, which map to real-world frames 0, 4, 8, and 12 (real_t = canvas_t * period).
```python
layout.real_frame("thought", canvas_t=2)    # → 8
layout.canvas_frame("thought", real_t=8)    # → 2
layout.canvas_frame("thought", real_t=7)    # → None (not aligned)
```
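The two lookups can be sketched as standalone functions. This assumes the mapping real_t = canvas_t * period on absolute canvas frames and that out-of-range frames are rejected; real_frame and canvas_frame here are hypothetical free functions taking the region's period and t-bounds explicitly, not the CanvasLayout methods.

```python
from typing import Optional


def real_frame(canvas_t: int, period: int, t0: int, t1: int) -> int:
    """Real-world frame for a canvas frame, assuming real_t = canvas_t * period."""
    if not t0 <= canvas_t < t1:
        raise IndexError(f"canvas_t={canvas_t} outside region frames [{t0}, {t1})")
    return canvas_t * period


def canvas_frame(real_t: int, period: int, t0: int, t1: int) -> Optional[int]:
    """Canvas frame holding real_t, or None if real_t is off the region's update grid."""
    if real_t % period != 0:
        return None  # between two updates: no canvas slot holds this frame
    canvas_t = real_t // period
    return canvas_t if t0 <= canvas_t < t1 else None


# "thought" region: period=4, canvas frames [0, 4)
print(real_frame(2, 4, 0, 4))      # 8
print(canvas_frame(8, 4, 0, 4))    # 2
print(canvas_frame(7, 4, 0, 4))    # None
```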
Loss weight mask
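```python
weights = layout.loss_weight_mask("cuda")  # (N,) tensor on the given device, N = T*H*W
# per_position_loss: (N,) loss value for each canvas position
loss = (per_position_loss * weights).sum() / weights.sum()
```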
Positions inside an is_output=True region receive that region's loss_weight; positions in is_output=False regions, or not covered by any region, receive 0. Where output regions overlap, their weights add.
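A minimal sketch of how such a mask could be built, using plain Python lists in place of a tensor. The Region tuple, the helper names, and the row-major flattening order (t*H + h)*W + w are assumptions for illustration, not the library's implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical region description: (bounds, is_output, loss_weight)
Region = Tuple[Tuple[int, int, int, int, int, int], bool, float]


def loss_weight_mask(T: int, H: int, W: int, regions: Dict[str, Region]) -> List[float]:
    """Flat (T*H*W,) weights, assuming row-major index (t*H + h)*W + w."""
    weights = [0.0] * (T * H * W)  # uncovered positions stay at 0
    for (t0, t1, h0, h1, w0, w1), is_output, loss_weight in regions.values():
        if not is_output:
            continue  # conditioning-only regions contribute nothing
        for t in range(t0, t1):
            for h in range(h0, h1):
                for w in range(w0, w1):
                    weights[(t * H + h) * W + w] += loss_weight  # overlaps accumulate
    return weights


def weighted_mean(per_position_loss: Sequence[float], weights: Sequence[float]) -> float:
    """Same reduction as the snippet above: sum(loss * w) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no position carries loss weight")
    return sum(loss * w for loss, w in zip(per_position_loss, weights)) / total


mask = loss_weight_mask(1, 2, 2, {
    "a": ((0, 1, 0, 1, 0, 2), True, 1.0),   # covers flat positions 0, 1
    "b": ((0, 1, 0, 2, 1, 2), True, 2.0),   # covers 1, 3 (overlaps "a" at 1)
    "c": ((0, 1, 1, 2, 0, 1), False, 5.0),  # input region: ignored
})
print(mask)  # [1.0, 3.0, 0.0, 2.0]
```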
Temporal frequency and fill modes
A region with period=4 updates every 4 real-world frames. When a fast region (period=1) attends to a slow region (period=4) at a timestep where the slow region has not updated, the connection's temporal_fill mode determines what the fast region reads. See the Temporal fill modes section of Topology for details.
Fill resolution operates in real-world time: a slow region's canvas frames map to real times canvas_t * period, leaving gaps between updates that the INTERPOLATE mode can fill. For period=1 regions, real time equals canvas time, so they resolve exactly as under the original canvas-frame-based scheme.
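To make the real-time mapping concrete, here is a scalar sketch of two fill behaviours. The mode names "hold" and "interpolate" are hypothetical stand-ins for the temporal_fill modes documented in Topology, and scalars stand in for the d_model vectors the canvas actually stores.

```python
import math
from typing import Sequence


def resolve_fill(real_t: float, period: int, values: Sequence[float], mode: str = "hold") -> float:
    """Value a fast reader sees from a slow region at real-world time real_t.

    values[k] is the slow region's canvas frame k, located at real time k * period.
    "hold" returns the most recent update; "interpolate" blends linearly between
    the updates on either side of real_t.
    """
    if real_t < 0:
        raise ValueError("real_t must be non-negative")
    pos = real_t / period  # position on the slow region's canvas-frame axis
    last = len(values) - 1
    if pos >= last:
        return values[last]  # at or past the final update: use the last value
    lo = math.floor(pos)
    if mode == "hold":
        return values[lo]
    if mode == "interpolate":
        frac = pos - lo
        return (1.0 - frac) * values[lo] + frac * values[lo + 1]
    raise ValueError(f"unknown fill mode: {mode!r}")


slow = [0.0, 4.0, 8.0]  # period=4 updates at real frames 0, 4, 8
print(resolve_fill(6, 4, slow, "hold"))         # 4.0
print(resolve_fill(6, 4, slow, "interpolate"))  # 6.0
```

With period=1, pos is always a whole canvas frame, so frac is 0 and both modes return values[real_t], which is the plain canvas-frame lookup.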
SpatiotemporalCanvas
The SpatiotemporalCanvas module manages the canvas tensor and adds positional, modality, and period embeddings to it:
- Positional encoding: 3D sinusoidal; d_model is split into thirds for (t, h, w)
- Empty token: learned parameter for unoccupied positions
- Modality embeddings: learned per-region embedding added during place()
- Period embedding: learned embedding indexed by log-bucketed temporal period, summed into each position so the model knows its native update rate
- Carrier field: each RegionSpec declares a carrier (default "deterministic") that describes the region's dynamics. See Carriers for the full breakdown.
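The positional part of the list above can be sketched with the standard Transformer sinusoid applied once per axis. The base of 10000, interleaved sin/cos channels, and concatenation in (t, h, w) order are assumptions; the module may lay out channels differently.

```python
import math
from typing import List


def sinusoid_1d(pos: int, dim: int, base: float = 10000.0) -> List[float]:
    """Standard sinusoid: interleaved sin/cos pairs at geometrically spaced frequencies."""
    out: List[float] = []
    for i in range(0, dim, 2):
        angle = pos / (base ** (i / dim))
        out.append(math.sin(angle))
        out.append(math.cos(angle))
    return out


def sinusoid_3d(t: int, h: int, w: int, d_model: int) -> List[float]:
    """Concatenate 1D encodings of t, h and w, each using d_model // 3 channels."""
    if d_model % 6 != 0:
        raise ValueError("sketch assumes d_model splits into three even-sized thirds")
    third = d_model // 3
    return sinusoid_1d(t, third) + sinusoid_1d(h, third) + sinusoid_1d(w, third)


pe = sinusoid_3d(3, 5, 7, 768)  # d_model=768 gives 256 channels per axis
print(len(pe))  # 768
```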
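```python
canvas_mod = SpatiotemporalCanvas(layout)
batch = canvas_mod.create_empty(4)               # (4, T*H*W, d_model)
batch = canvas_mod.place(batch, embs, "visual")  # write embs into the "visual" region's positions
out = canvas_mod.extract(batch, "action")        # read back the "action" region's positions
```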
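The place/extract round trip can be sketched on a single flattened canvas using lists. This assumes row-major (t, h, w) flattening and that place() adds the modality embedding to each written vector; zeros stand in for the learned empty token, and region_indices, place, and extract here are hypothetical free functions, not the module's methods.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[int, int, int, int, int, int]
Vec = List[float]


def region_indices(bounds: Bounds, H: int, W: int) -> List[int]:
    """Flat canvas positions covered by a region, assuming row-major (t, h, w) order."""
    t0, t1, h0, h1, w0, w1 = bounds
    return [(t * H + h) * W + w
            for t in range(t0, t1) for h in range(h0, h1) for w in range(w0, w1)]


def place(canvas: List[Vec], embs: Sequence[Vec], idx: List[int], modality: Vec) -> List[Vec]:
    """Return a copy of one canvas with embs (plus the modality embedding) written at idx."""
    if len(embs) != len(idx):
        raise ValueError(f"region has {len(idx)} positions, got {len(embs)} embeddings")
    out = [row[:] for row in canvas]
    for i, e in zip(idx, embs):
        out[i] = [x + m for x, m in zip(e, modality)]
    return out


def extract(canvas: List[Vec], idx: List[int]) -> List[Vec]:
    """Read back copies of the vectors stored at a region's positions."""
    return [canvas[i][:] for i in idx]


T, H, W, d = 2, 2, 2, 2
empty = [[0.0] * d for _ in range(T * H * W)]  # zeros stand in for the empty token
idx = region_indices((1, 2, 0, 1, 0, 2), H, W)  # t=1, h=0, w=0..1
placed = place(empty, [[1.0, 2.0], [3.0, 4.0]], idx, modality=[10.0, 10.0])
print(idx, extract(placed, idx))  # [4, 5] [[11.0, 12.0], [13.0, 14.0]]
```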
PeriodEmbedding
Each position's representation includes a PeriodEmbedding, a learned vector indexed by the region's temporal period. Period values are mapped to buckets via log scaling (period=1 → bucket 0, period=576 → bucket 10). This lets the model infer staleness when it reads held values from slower regions through temporal fill connections.
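The exact log scaling is not specified here, so the sketch below uses one hypothetical choice, ceil(log2(period)) clamped to 11 buckets, which reproduces the two documented anchor points. The randomly initialized table stands in for learned parameters; PeriodEmbeddingSketch is illustrative only.

```python
import random
from typing import List


def period_bucket(period: int, num_buckets: int = 11) -> int:
    """Hypothetical log2 bucketing: ceil(log2(period)), clamped to the last bucket.

    (period - 1).bit_length() equals ceil(log2(period)) for integers >= 1,
    giving period=1 -> 0 and period=576 -> 10 as documented.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    return min((period - 1).bit_length(), num_buckets - 1)


class PeriodEmbeddingSketch:
    """Per-bucket lookup table (random init here; learned in a real model)."""

    def __init__(self, d_model: int, num_buckets: int = 11, seed: int = 0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                      for _ in range(num_buckets)]

    def __call__(self, period: int) -> List[float]:
        return self.table[period_bucket(period, self.num_buckets)]


print([period_bucket(p) for p in (1, 2, 4, 576)])  # [0, 1, 2, 10]
```

Under this scheme, each position's vector would be its content plus the positional, modality, and period terms, so every position carries its native update rate alongside what it holds.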