Overview

We are witnessing groundbreaking results from text-to-image and text-to-video models. However, the generation process in these models is iterative and computationally expensive, requiring multiple sampling steps through large networks. There is a growing need to make these algorithms fast enough to serve millions of users without requiring excessive numbers of GPUs/TPUs. In this course, we will focus on techniques such as progressive parallel decoding, distillation methods, and Markov Random Fields to speed up generative models.

Speakers

Richard Hartley
Australian National University
Sadeep Jayasumana
OCTAVE | Ex-Google AI Research
Ameesh Makadia
Google Research
Srikumar Ramalingam
Google Research


Schedule

  • Date: June 12, 2024
  • Time: 9:00 AM - 12:30 PM
  • Location: 202 A
Time      Instructor            Title
9:00 AM   Richard Hartley       Mathematics of Diffusion Models
9:45 AM   Srikumar Ramalingam   Cornerstones of Text-to-Image Generation
10:30 AM                        Break
11:00 AM  Sadeep Jayasumana     Efficient Text-to-Image Generation via Structured Discrete Prediction
11:45 AM  Ameesh Makadia        Latent Representations for Efficient Text-to-Image and Text-to-Video Generation