Schedule
- Date: June 12, 2024
- Time: 9:00 AM - 12:30 PM
- Location: 202 A
| Time | Instructor | Title |
|---|---|---|
| 9:00 AM | Richard Hartley | Mathematics of Diffusion Models |
| 9:45 AM | Srikumar Ramalingam | Cornerstones of Text-to-Image Generation |
| 10:30 AM | Break | |
| 11:00 AM | Sadeep Jayasumana | Efficient Text-to-Image Generation via Structured Discrete Prediction |
| 11:45 AM | Ameesh Makadia | Latent Representations for Efficient Text-to-Image and Text-to-Video Generation |
Tutorial Contents
We will cover the mathematics and fundamentals of diffusion models [6], which are the building blocks of many generative methods. Emphasis will be placed on the underlying theory, which has received comparatively little attention in the community.
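As a concrete entry point to those fundamentals, the sketch below shows the diffusion forward (noising) process in the style of [6], using the closed-form expression x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The linear beta schedule and array shapes are illustrative choices, not specifics from the tutorial.

```python
import numpy as np

# Noise schedule: a common linear choice of beta_t (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # stand-in for an image
x_mid = q_sample(x0, T // 2, rng)  # partially noised
x_end = q_sample(x0, T - 1, rng)   # nearly pure Gaussian noise
```

Because ᾱ_t shrinks toward zero, late-step samples are close to isotropic Gaussian noise; the generative (reverse) process learns to invert this chain.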
Efficient methods and cornerstones of text-to-image generation
We will provide background on text-to-image generation, and then cover temporal-distillation and MRF-based algorithms [3] for improving the efficiency of token-based methods such as Muse [1].
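To make the token-based setting concrete, here is a toy sketch of Muse-style [1] iterative parallel decoding: all image tokens start masked, and at each step the most confident predictions are committed. The stand-in "model" returns random predictions and confidences; a real system would use a masked generative transformer and a cosine unmasking schedule.

```python
import numpy as np

MASK = -1  # sentinel for a not-yet-decoded token

def decode(num_tokens=64, steps=8, vocab=1024, rng=None):
    """Iteratively unmask a grid of discrete image tokens."""
    rng = np.random.default_rng(0) if rng is None else rng
    tokens = np.full(num_tokens, MASK)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        # Stand-in predictions and confidences for every masked position.
        preds = rng.integers(0, vocab, size=masked.size)
        conf = rng.random(masked.size)
        # Commit only the most confident fraction this step; the last step
        # unmasks everything that remains.
        keep = max(1, masked.size // (steps - step))
        order = np.argsort(conf)[-keep:]
        tokens[masked[order]] = preds[order]
    return tokens

tokens = decode()  # fully decoded after `steps` passes
```

The point of this structure is efficiency: a handful of parallel passes replaces hundreds of sequential sampling steps, which is also the regime the MRF-based refinement in [3] targets.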
Continuous MRF and FoE models for text-to-image generation
We will cover current metrics for image generation (such as FID) and improved metrics such as CMMD [4]. We will also discuss newer methods that speed up diffusion models using MRF and Field-of-Experts models [11].
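On the evaluation side, CMMD [4] is built on the squared Maximum Mean Discrepancy (MMD) between embeddings of real and generated images. The sketch below estimates squared MMD with a Gaussian (RBF) kernel; the random vectors stand in for CLIP embeddings, and the bandwidth is an arbitrary illustrative choice.

```python
import numpy as np

def mmd2(x, y, sigma=10.0):
    """Biased estimator of squared MMD with an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.standard_normal((100, 16))        # stand-in "real" embeddings
fake_close = rng.standard_normal((100, 16))  # same distribution
fake_far = rng.standard_normal((100, 16)) + 3.0  # shifted distribution
# A shifted distribution should score a larger discrepancy.
assert mmd2(real, fake_far) > mmd2(real, fake_close)
```

Unlike FID, this estimator makes no Gaussian assumption about the embedding distributions, which is one of the motivations discussed in [4].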
Efficient text-to-3D and text-to-video generation
We will give an overview of generative algorithms in the 3D and video domains, particularly covering efficient algorithms driven by geometric priors for video generation [7].
Latent representations for efficient text-to-image and text-to-video generation
We will provide an overview of different strategies for image [2] and video [7] tokenization that can improve generation efficiency. Time permitting, we will cover "data-efficient" diffusion models that can be trained on only a single 3D mesh [5].
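To illustrate why tokenization helps efficiency, the toy sketch below splits an image into non-overlapping patches and flattens each into a token vector. Learned tokenizers [2, 7] additionally compress patches into a latent codebook; this sketch only shows the sequence-length reduction that downstream generators operate on.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) array into non-overlapping p x p patches,
    returning a (num_tokens, p * p) token matrix."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0
    blocks = img.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return blocks.reshape(-1, p * p)

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
tokens = patchify(img, 8)
# 64x64 = 4096 pixels become 64 tokens of dimension 64: a 64x shorter
# sequence for a transformer to model.
```

Video tokenizers extend the same idea to spatio-temporal blocks, e.g. the factorized four-plane layout of [7].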
References
1. Chang, H., Zhang, H., Barber, J., Maschinot, A., Lezama, J., Jiang, L., Yang, M.H., Murphy, K., Freeman, W.T., Rubinstein, M., Li, Y., Krishnan, D.: Muse: Text-to-image generation via masked generative transformers. ICML (2023)
2. Esteves, C., Suhail, M., Makadia, A.: Spectral image tokenizers (2024)
3. Jayasumana, S., Glasner, D., Ramalingam, S., Veit, A., Chakrabarti, A., Kumar, S.: MarkovGen: Structured prediction for efficient text-to-image generation (2023)
4. Jayasumana, S., Ramalingam, S., Veit, A., Glasner, D., Chakrabarti, A., Kumar, S.: Rethinking FID: Towards a better evaluation metric for image generation (2024)
5. Mitchel, T., Esteves, C., Makadia, A.: Single mesh diffusion models with field latents for texture generation. In: CVPR (2024)
6. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: Proceedings of the 32nd International Conference on Machine Learning (2015)
7. Suhail, M., Esteves, C., Sigal, L., Makadia, A.: Four-plane factorized video autoencoders (2024)
8. Vice, J., Akhtar, N., Hartley, R., Mian, A.: On the fairness, diversity and reliability of text-to-image generative models (2024)
9. Vice, J., Akhtar, N., Hartley, R., Mian, A.: Safety without semantic disruptions: Editing-free safe image generation via context-preserving dual latent reconstruction (2024)
10. Yang, Z., Yu, Z., Xu, Z., Singh, J., Zhang, J., Campbell, D., Tu, P., Hartley, R.: IMPUS: Image morphing with perceptually-uniform sampling using diffusion models (2024)
11. Ranasinghe, K., Jayasumana, S., Veit, A., Chakrabarti, A., Glasner, D., Ryoo, M., Ramalingam, S., Kumar, S.: LatentCRF: Continuous CRF for efficient latent diffusion (2025)