Skip to content

Recipe: Custom Architectures

Canvas engineering is backbone-agnostic. Here are schemas for non-standard architectures.

Vision Transformer with structured attention

Replace ViT's dense self-attention with a topology-aware pattern:

layout = CanvasLayout(
    T=1, H=14, W=14, d_model=768,
    regions={
        "patches": RegionSpec(bounds=(0,1, 0,14, 0,14),
                              default_attn="local_attention"),
        "cls": RegionSpec(bounds=(0,1, 0,1, 0,1),   # reuse corner
                          default_attn="cross_attention"),
    },
)

topology = CanvasTopology(connections=[
    Connection(src="patches", dst="patches"),          # local self-attn
    Connection(src="cls", dst="patches"),               # global aggregation
    Connection(src="patches", dst="cls", fn="pooling"), # compress to cls
])

Mamba-Transformer hybrid

Temporal connections via Mamba, spatial via attention:

layout = CanvasLayout(
    T=64, H=8, W=8, d_model=512,
    regions={
        "spatial": RegionSpec(bounds=(0,64, 0,8, 0,8),
                              default_attn="cross_attention"),
        "temporal": RegionSpec(bounds=(0,64, 0,8, 0,8),
                               default_attn="mamba"),
    },
)

# Spatial: same-frame self-attention
# Temporal: sequential state-space across frames
topology = CanvasTopology(connections=[
    Connection(src="spatial", dst="spatial", t_src=0, t_dst=0),
    Connection(src="temporal", dst="temporal"),  # mamba processes full sequence
    Connection(src="spatial", dst="temporal", fn="copy"),  # share features
])

Perceiver-style bottleneck

Large input compressed through a small latent:

layout = CanvasLayout(
    T=1, H=32, W=32, d_model=768,
    regions={
        "input": RegionSpec(bounds=(0,1, 0,32, 0,32), is_output=False),
        "latent": RegionSpec(bounds=(0,1, 0,4, 0,4),    # 16 positions
                             default_attn="cross_attention"),
        "output": RegionSpec(bounds=(0,1, 0,8, 0,8)),
    },
)

topology = CanvasTopology(connections=[
    Connection(src="latent", dst="input", fn="perceiver"),  # compress 1024→16
    Connection(src="latent", dst="latent"),                  # process in latent
    Connection(src="output", dst="latent"),                  # decode from latent
])

RWKV-style linear recurrence

For very long sequences where O(N²) attention is infeasible:

layout = CanvasLayout(
    T=1024, H=1, W=1, d_model=512,
    regions={
        "sequence": RegionSpec(bounds=(0,1024, 0,1, 0,1),
                               default_attn="rwkv"),
    },
)

topology = CanvasTopology(connections=[
    Connection(src="sequence", dst="sequence"),  # RWKV processes full sequence
])
# O(N) instead of O(N²) — feasible for 1024+ timesteps

v2: carriers for mixed dynamics

Regions can declare different dynamics carriers. A video prediction canvas might mix diffusive future frames with deterministic observed frames and filtered belief state:

from dataclasses import dataclass
from canvas_engineering import Field, compile_program

@dataclass
class WorldModel:
    observed: Field = Field(12, 12, family="observation", carrier="deterministic")
    predicted: Field = Field(12, 12, family="observation", carrier="diffusive")
    belief: Field = Field(4, 4, family="state", carrier="filter", tags=("belief",))
    error: Field = Field(2, 2, family="residual", carrier="residual")
    action: Field = Field(1, 4, family="action")

bound, program = compile_program(WorldModel(), T=16, H=16, W=16, d_model=512)
# observed: deterministic forward pass
# predicted: diffusion/denoising dynamics
# belief: predict/correct updates
# error: tracks prediction error for scheduling