As in previous years, ISIT 2024 will host a number of tutorials on new and emerging topics
within the scope of the conference. The following tutorials will be held in the AM slot on July 7, 2024:
- Theory and Methods for Deep Generative Models
- Information-Theoretic, Statistical and Algorithmic Foundations of Reinforcement Learning
- Language Model Inference: Theory and Algorithms
The following tutorials will be held in the PM slot on July 7, 2024:
- Graph Matching: Fundamental Limits and Efficient Algorithms
- Coding Theory for Modern Exascale Storage Systems
- Scaling and Reliability Foundations in Machine Learning
Theory and Methods for Deep Generative Models
Presenters: Yao Xie, Taiji Suzuki, and Xiuyuan Cheng
In recent years, there has been a surge of interest in deep learning-based generative models from machine learning and signal processing researchers, statisticians, and practitioners alike. Apart from high-quality sample generation in industry applications, these models offer several distinct advantages, including neural network architectural flexibility, log-likelihood computation, and the ability to solve inverse problems without retraining the deep learning models.
In this tutorial, we will delve into the fundamentals of popular generative models, such as score-based diffusion models and flow-based models, covering the underlying intuition, basic concepts, mathematical foundations, implementation details, potential applications, and new directions. We will also introduce theoretical tools and recent advances in analyzing the performance guarantees of such deep models. The tutorial aims to help audiences interested in these topics gain a basic understanding of the research frontier.
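For readers new to these models, here is a minimal sketch (not part of the tutorial abstract) of the score-based diffusion formulation in standard notation: a forward stochastic differential equation

    dX_t = f(X_t, t)\,dt + g(t)\,dW_t

gradually perturbs data into noise, and samples are generated by running the reverse-time SDE

    dX_t = \big[ f(X_t, t) - g(t)^2 \nabla_x \log p_t(X_t) \big]\,dt + g(t)\,d\bar{W}_t,

where the score \nabla_x \log p_t(x) is unknown and is approximated by a neural network s_\theta(x, t) trained via (denoising) score matching.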
Information-Theoretic, Statistical and Algorithmic Foundations of Reinforcement Learning
Presenters: Yuejie Chi, Yuxin Chen, and Yuting Wei
As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). Understanding and enhancing the sample and computational efficiency of RL algorithms is thus of great interest and urgent need.

In this tutorial, we aim to present a coherent framework that covers important algorithmic and information-theoretic developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov decision processes as the central mathematical model, we start by introducing classical dynamic programming algorithms for the setting where precise descriptions of the environment are available. Equipped with this background, we introduce four distinctive RL scenarios (i.e., RL with a generative model, online RL, offline RL, and multi-agent RL) and present three mainstream RL paradigms (i.e., the model-based approach, the model-free approach, and policy optimization). Our discussions gravitate around two central questions: what are the information-theoretic limits in each RL scenario, and can we achieve these fundamental limits using computationally efficient algorithms? We will systematically introduce several effective algorithmic ideas (e.g., the optimism principle for online RL, the pessimism principle for offline RL, and variance reduction) that permeate the design of efficient RL algorithms.
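As a concrete reference point for the classical dynamic programming algorithms mentioned above, here is a minimal tabular value-iteration sketch (an illustrative example, not code from the tutorial); the transition tensor P, reward matrix r, and discount factor gamma are assumed given.

    import numpy as np

    def value_iteration(P, r, gamma=0.9, tol=1e-8):
        # P: transition probabilities, shape (S, A, S); r: rewards, shape (S, A).
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            # Bellman optimality update: Q(s, a) = r(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')
            Q = r + gamma * (P @ V)              # shape (S, A)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)   # optimal values and a greedy policy
            V = V_new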
Language Model Inference: Theory and Algorithms
Presenters: Ahmad Beirami and Ananda Theertha Suresh
In recent years, large language models have been used to solve a multitude of natural language tasks. In this tutorial, we give a brief overview of the history of language modeling and the fundamental techniques that led to the development of the modern language models behind Claude, Gemini, GPT, and Llama. We then present an overview of inference techniques for language models, when they are treated as black-box probabilistic models.
The tutorial is structured in four segments. In the first segment, we provide a historical perspective on language models and present a black-box probabilistic model that serves as a foundation for understanding inference from these models. In the second segment, we examine black-box techniques aimed at aligning models towards various goals (e.g., safety), such as controlled decoding and the best-of-N algorithm. In the third segment, our focus shifts to efficiency, where we examine information-theoretic techniques designed to improve inference latency, such as model compression and speculative decoding. If time permits, in the final segment we discuss in-context learning, which has become one of the standard ways to solve new tasks using black-box language models.
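To make the best-of-N algorithm mentioned above concrete, here is a minimal sketch; the generate and reward callables are hypothetical stand-ins for a base language model and a reward model, not a specific API.

    def best_of_n(generate, reward, prompt, n=8):
        # Draw n candidate responses from the base model and return the one
        # the reward model scores highest (e.g., for safety or helpfulness).
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda response: reward(prompt, response))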
Throughout the tutorial, we highlight open problems that might be of interest to the audience. The tutorial does not assume any prior knowledge on language models.
Graph Matching: Fundamental Limits and Efficient Algorithms
Presenters: Hye Won Chung and Lele Wang
This tutorial provides a comprehensive overview of the graph matching problem, also known as the graph alignment problem, network alignment problem, or noisy graph isomorphism problem. The main goal of this problem is to identify a correspondence between vertices or users across two or more correlated graphs. This challenge is relevant in a range of applications, such as domain adaptation for generalizing beyond the training distribution in machine learning, linking shared entities across complementary graphs for knowledge graph completion in data science, analyzing brain connectomes from correlated brain imaging in biomedical applications, and de-anonymizing keywords in searchable encryption.
From a mathematical perspective, graph matching is a quadratic assignment problem, which is NP-hard in the worst case. Yet, in many practical networks that are effectively modeled by random graphs, a phase transition phenomenon is observed in the asymptotic limit. This motivates theorists to examine, in an average-case setting, the threshold at which this phase transition occurs and to assess whether polynomial-time algorithms can reach this threshold.
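For concreteness, the quadratic assignment formulation referred to above can be written as follows, where A and B are the adjacency matrices of the two correlated graphs and the maximization is over permutations \pi of the vertex set (equivalently, permutation matrices \Pi):

    \hat{\pi} = \arg\max_{\pi} \sum_{i,j} A_{ij} B_{\pi(i)\pi(j)} \quad\Longleftrightarrow\quad \hat{\Pi} = \arg\max_{\Pi} \langle A, \Pi B \Pi^\top \rangle.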
This tutorial introduces key technical tools used in studying graph matching and reviews the recent findings concerning the information-theoretic limits and the development of efficient algorithms. It also aims to spark discussions on open problems and future research avenues in this field.
Coding Theory for Modern Exascale Storage Systems
Presenters: Rashmi Vinayak and Saurabh Kadekodi
By 2025, the total data stored in the cloud is estimated to grow to over 100 zettabytes (1 ZB = 1 billion TB). Given that Moore’s Law and Kryder’s Law (the storage equivalent of Moore’s Law) have both ended, and with them the “free” scaling benefits of processing power and storage capacity, the only way to survive the data juggernaut is to build ever-larger distributed systems. Modern exascale storage clusters are made up of hundreds of thousands of storage devices. At such scales, failures are common, and erasure coding is the de facto redundancy mechanism for protection against permanent data loss. Despite the widespread adoption of erasure codes in storage clusters and the advances in coding theory, the scale and resource-efficiency challenges posed by modern exascale clusters necessitate radically new ideas and innovation.
There is a complex balancing act among the various resources that drive the cost of running large-scale storage clusters. The usual culprits are storage capacity, spindle (disk-time) efficiency, network traffic, memory utilization, and computation. This tutorial will introduce these complex systems tradeoffs to the information theory community in an accessible way, viewing the coding challenges in modern storage clusters through a practical lens. First, we describe the challenges, goals, and techniques of erasure coding in current exascale storage systems. Second, we provide data-driven motivation for why the one-scheme-fits-all erasure coding approaches employed today are both inefficient and insufficient. Third, we describe the systems architecture for designing adaptable storage systems, and cover the state-of-the-art theoretical models and code constructions that enable adaptability. Fourth, we showcase the process of adding a new erasure code to a representative, widely used open-source distributed storage system. Finally, we end the tutorial by discussing open research directions. Overall, this tutorial will provide a holistic understanding of the setup, challenges, existing solutions, and open research directions in the coding practices of modern exascale storage systems.
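As a back-of-the-envelope illustration of the storage-efficiency side of this balancing act (an illustrative sketch, not material from the tutorial), one can compare 3-way replication with a hypothetical (n, k) = (14, 10) Reed-Solomon configuration:

    def erasure_overhead(n, k):
        # For an (n, k) MDS code, k data chunks are encoded into n total chunks,
        # and any k of the n chunks suffice to recover the data.
        return {"storage_overhead": n / k, "failures_tolerated": n - k}

    print(erasure_overhead(3, 1))    # 3-way replication: 3.0x storage, tolerates 2 failures
    print(erasure_overhead(14, 10))  # (14, 10) Reed-Solomon: 1.4x storage, tolerates any 4 failures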
Scaling and Reliability Foundations in Machine Learning
Presenters: Volkan Cevher, Grigorios Chrysos, Fanghui Liu, and Leena Chennuru Vankadara
Historically, we have been taught to use concise signal models in information theory, signal processing, and machine learning in order to generalize better in the wild. Counterintuitively, this dogma has been proven (partly) wrong by deep neural networks, which have risen to stardom (e.g., large video/language models) and are widely used for natural language, image, speech, and other multi-modal tasks with unprecedented performance. In this tutorial, we will cover foundational deep learning theory that explains their scaling behavior, identify fundamental flaws in setting up the learning goals that govern their robustness, and discuss the impact of architectural choices on robustness.
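As one representative example of the scaling behavior referred to above (a commonly cited empirical form, not necessarily the parameterization used in the tutorial), the test loss of large models is often modeled as a power law in the number of parameters N and the number of training tokens D:

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},

where E, A, B, \alpha, and \beta are constants fit empirically to training runs.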
Tutorials Chairs
Alex Dimakis (University of Texas, Austin)
Lalitha Sankar (Arizona State University)