AI-Generated 1-Minute Cartoons with New Video Architecture
Nvidia’s research team has developed a new method for creating AI-generated videos with coherent stories up to one minute long. Previous models like Sora or Veo 2 topped out at around 20 seconds, largely because the cost of self-attention grows quadratically with the length of the video. The breakthrough lies in Test-Time Training layers (TTT layers) integrated into a Transformer architecture. Their hidden state is itself a small model that keeps learning while the video is generated, so compute grows roughly linearly with length while the layer still captures scene-to-scene continuity. Using Tom and Jerry cartoons as a demo, the team extended a pre-trained CogVideoX model from 3-second clips to a full minute. Some issues with transitions and lighting remain, but the approach paves the way for longer AI-generated videos at manageable computing costs.
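To make the idea of a layer that "learns during generation" more concrete, here is a minimal sketch of a toy TTT-style layer in Python with NumPy. It is an illustration under stated assumptions, not the team's implementation: the hidden state is a single weight matrix `W` (a linear inner model chosen here for simplicity), the inner loss is a plain squared reconstruction error, and the names `ttt_linear_layer`, `theta_k`, `theta_v`, `theta_q` and `lr` are hypothetical. For each token, the layer takes one gradient step on the inner loss and then uses the updated weights to produce the output, so the work per token is constant and the total cost grows linearly with sequence length.

```python
import numpy as np


def ttt_linear_layer(tokens, theta_k, theta_v, theta_q, lr=0.1):
    """Toy TTT-style layer (illustrative assumption, not the published code).

    tokens:  array of shape (seq_len, dim)
    theta_*: (dim, dim) projections producing the training, label and test views
    lr:      step size of the inner gradient update
    Returns the per-token outputs and the final hidden-state weights.
    """
    dim = tokens.shape[1]
    W = np.zeros((dim, dim))  # hidden state: weights of the inner linear model
    outputs = []
    for x in tokens:
        k = theta_k @ x  # training view of the token
        v = theta_v @ x  # label view the inner model should reconstruct
        q = theta_q @ x  # test view used to produce the output
        err = W @ k - v  # reconstruction error of the current hidden state
        grad = 2.0 * np.outer(err, k)  # gradient of ||W k - v||^2 w.r.t. W
        W = W - lr * grad  # one learning step at test time
        outputs.append(W @ q)  # output uses the freshly updated weights
    return np.stack(outputs), W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, seq_len = 8, 32
    tokens = rng.normal(size=(seq_len, dim)) / np.sqrt(dim)
    eye = np.eye(dim)  # identity projections keep the demo simple
    out, W_final = ttt_linear_layer(tokens, eye, eye, eye)
    print(out.shape)
```

The loop touches each token once and does a fixed amount of work per token, which is where the linear scaling comes from; self-attention, by contrast, compares every token with every other token. Treat the linear inner model, the single update per token and the identity projections in the demo as simplifications made for readability.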