Notes on Improving Physics in Visual World Generation
Self-Refining Video Sampling (Jan 26’) Self-Refining Video Sampling An inference-time method that lets a pre-trained flow-matching video generator...
Notes on Video Models Events
WorldCanvas (Dec 25’) The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text A...
Notes on Video Model Efficiency
FastVideo FastVideo is a unified post-training and inference framework for accelerated video generation. GitHub Context Forcing (Feb 26’) Context...
Notes on Playable World Models
Playable World Models Work / System Main datasets / environments Core architecture (high level) Key interaction signals Notable features /...
Exponential Moving Average (EMA) in PyTorch
A quick note on what EMA is.
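The core idea: keep a "shadow" copy of the model whose weights are updated as shadow = decay · shadow + (1 − decay) · param after each optimizer step, and evaluate with the shadow weights. A minimal PyTorch sketch (the class and method names below are illustrative, not taken from the post):

```python
import copy
import torch

class EMA:
    """Maintains an exponential moving average of a model's parameters."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        # A frozen copy of the model holds the averaged weights.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            # In-place: ema = decay * ema + (1 - decay) * param
            ema_p.lerp_(p, 1.0 - self.decay)

# usage sketch (hypothetical model/optimizer):
#   ema = EMA(model, decay=0.999)
#   ... after each optimizer.step() call:
#   ema.update(model)
#   evaluate or sample with ema.shadow instead of model
```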
Seminar in Cognitive Science Summary and Thoughts
My write-up notes for the Winter 2026 COGSCI600 Cognitive Science Seminar. Summary by Max Ku (me).
Notes on World Models Agents
SIMA 2 (Dec 25’) SIMA 2: A Generalist Embodied Agent for Virtual Worlds SIMA 2 is a Gemini-based vision-language-action agent that operates from...
Notes on World Models Memories
World Models Memories FlowWM (Jan 26’) Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments Flow Equivariant World...
JEPA (Joint-Embedding Predictive Architecture)
JEPA (Joint-Embedding Predictive Architecture) is a self-supervised learning method that predicts abstract representations of missing data (targets) from visible data (context) in a shared latent space. By avoiding both direct pixel prediction and contrastive forces, it focuses on core concepts rather than pixel details, yielding more semantic, stable, and efficient representations for tasks like image and video understanding.
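A schematic sketch of the objective, assuming generic context/target encoders and a predictor (all names here are illustrative; in I-JEPA the target encoder is an EMA copy of the context encoder):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor, x_context, x_target):
    """One JEPA-style objective (schematic): predict the target's latent
    representation from the context's, never reconstructing pixels."""
    z_context = context_encoder(x_context)   # embed the visible data
    with torch.no_grad():                    # no gradient into the target branch
        z_target = target_encoder(x_target)  # typically an EMA copy of the context encoder
    z_pred = predictor(z_context)            # predict target embedding from context
    return F.mse_loss(z_pred, z_target)      # latent-space regression (L2 here; variants differ)

# tiny smoke test with toy encoders (illustrative shapes only)
ctx_enc, tgt_enc = nn.Linear(16, 8), nn.Linear(16, 8)
pred = nn.Linear(8, 8)
loss = jepa_loss(ctx_enc, tgt_enc, pred, torch.randn(4, 16), torch.randn(4, 16))
loss.backward()  # gradients flow to the context encoder and predictor only
```

Because both the prediction and the target live in latent space, the loss never touches pixels, which is the key distinction from pixel-reconstruction methods like masked autoencoders.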
From Cold Emails to Gravity Wells
Some thoughts after a year into my CS PhD.