canvas-engineering

Prompt engineering, but for latent space.

canvas-engineering gives video diffusion models structured latent space. You declare which regions carry video, actions, proprioception, reward, or thought — their geometry, temporal frequency, connectivity, attention function types, and loss participation — and the canvas compiles that declaration into attention masks, loss weights, and frame mappings.

The layout is the schema. The topology is the compute graph. Together they form a type system for multimodal latent computation.
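To make "the layout is the schema, the topology is the compute graph" concrete, here is a minimal standalone sketch, in plain Python rather than the library's actual API, of what compiling a region layout plus a directed topology down to a token-level attention mask looks like. All names and numbers here are illustrative.

```python
# Illustrative sketch -- NOT canvas-engineering's real API.
# Layout: each region gets a name and a token budget on the shared canvas.
layout = [("video", 6), ("action", 2), ("reward", 1)]

# Topology: directed edges (query_region, key_region). A region always
# attends to itself; cross-region attention must be declared explicitly.
topology = {("action", "video"), ("reward", "video"), ("reward", "action")}

def compile_mask(layout, topology):
    # Assign each region a contiguous token span on the canvas.
    spans, offset = {}, 0
    for name, size in layout:
        spans[name] = range(offset, offset + size)
        offset += size
    # mask[q][k] is True iff query token q may attend to key token k.
    mask = [[False] * offset for _ in range(offset)]
    for q_name, _ in layout:
        for k_name, _ in layout:
            if q_name == k_name or (q_name, k_name) in topology:
                for q in spans[q_name]:
                    for k in spans[k_name]:
                        mask[q][k] = True
    return mask, spans

mask, spans = compile_mask(layout, topology)
# Action tokens can read video tokens, but not the other way around.
assert mask[spans["action"][0]][spans["video"][0]]
assert not mask[spans["video"][0]][spans["action"][0]]
```

The declaration is data; the mask is compiled from it, which is what lets the same schema also drive loss weights and frame mappings.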

[Figure: canvas allocations]

Two orthogonal ideas

1. The canvas — structured multimodal latent space

Large video diffusion models generate video. The spatiotemporal canvas extends them to predict actions, estimate rewards, and process proprioception by placing heterogeneous modalities on a shared 3D grid. You design the schema; the model attends over everything.
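A shared grid only works if modalities with different natural rates can coexist on the same frame axis. A minimal sketch of that idea, with made-up region names and strides (not the library's API): each region declares a spatial extent and a temporal stride, and slow regions hold their latest sample across canvas frames.

```python
# Illustrative sketch of heterogeneous modalities on a shared grid.
# Each region declares a spatial extent and a temporal stride: how many
# canvas frames pass between fresh samples of that modality.
regions = {
    "video":  {"rows": 4, "cols": 4, "stride": 1},  # fresh every frame
    "action": {"rows": 1, "cols": 4, "stride": 2},  # every 2nd frame
    "reward": {"rows": 1, "cols": 1, "stride": 4},  # every 4th frame
}

def source_frame(canvas_t, stride):
    # Which sample of this modality is visible at canvas frame t
    # (HOLD semantics: repeat the latest sample until the next arrives).
    return canvas_t // stride

# Over 8 canvas frames, video advances 8 times; reward advances twice.
assert [source_frame(t, 1) for t in range(8)] == [0, 1, 2, 3, 4, 5, 6, 7]
assert [source_frame(t, 4) for t in range(8)] == [0, 0, 0, 0, 1, 1, 1, 1]
```

This frame mapping is the kind of thing the canvas compiles for you from per-region temporal frequency declarations.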

2. Looped attention — weight-sharing regularization

Looped attention iterates transformer blocks with learned iteration embeddings. The result: 1.73x parameter efficiency (p < 0.001). A frozen CogVideoX-2B backbone plus 350K trainable loop parameters outperforms 11.5M unfrozen parameters, and three loops is optimal.
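The parameter accounting behind weight sharing can be sketched in a few lines. This is back-of-the-envelope arithmetic under a standard rough count (about 12·d² weights per transformer block for attention plus MLP), not the library's measured numbers.

```python
# Rough parameter accounting for looping vs. unrolling, illustrative only.
d, n_loops = 1024, 3

block_params = 12 * d * d              # one block: attention + MLP, approx.
unrolled = n_loops * block_params      # N independent (unshared) blocks
looped = block_params + n_loops * d    # one shared block + N iteration
                                       # embeddings of width d

savings = unrolled / looped
assert looped < unrolled               # sharing is strictly cheaper
# savings is just under n_loops, since the embeddings are tiny vs. a block
```

When the backbone is frozen, only the small additions train, which is how a 350K-parameter graft can compete with millions of unfrozen parameters.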

Key features

  • Field + compile_schema() — Declare Python types whose fields are latent regions, compile to canvas schemas with auto-wired connectivity
  • CanvasProgram — Typed process layer with families, carriers, clocks, and compile modes
  • RegionSpec — Declare geometry, temporal frequency, loss weight, semantic type, and default attention function per region
  • CanvasTopology — Directed graph of attention operations with temporal constraints and per-edge function types
  • CanvasSchema — Portable JSON-serializable bundle (layout + topology + metadata)
  • transfer_distance() — Cosine distance between semantic type embeddings estimates adapter cost
  • 18 attention function types — From standard cross-attention to Mamba, Perceiver, RWKV, CogVideoX-native, and more
  • Temporal fill modes — DROP, HOLD, INTERPOLATE with higher-order IDW for cross-frequency attention
  • graft_looped_blocks() — One-line grafting onto CogVideoX with 350K trainable params
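The idea behind transfer_distance() is simple enough to sketch standalone: cosine distance between semantic type embeddings, where nearby types suggest a cheap adapter. The embeddings below are invented for illustration; the real ones are learned.

```python
import math

# Sketch of the transfer_distance() idea: cosine distance between
# semantic type embeddings. These vectors are made up for illustration.
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

joint_angles = [0.9, 0.1, 0.0]  # hypothetical type embeddings
ee_pose      = [0.8, 0.3, 0.1]
rgb_video    = [0.0, 0.2, 0.9]

# Semantically close types -> small distance -> cheap adapter.
assert cosine_distance(joint_angles, ee_pose) < cosine_distance(joint_angles, rgb_video)
```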

Install

pip install canvas-engineering

Runnable examples

All examples train real models and generate visualizations. No GPU required.

Example                   What it shows
Hello Canvas Types        Declare, compile, train, visualize in 30 lines
Multi-Frequency Fusion    Bandwidth-proportional allocation vs. flat baseline
CartPole Control          Real gym env, self-consistency loss, plan field PCA

Next steps