Date: Monday, Oct 28, 10am
Speaker: Daniele Calandriello
Title: Aligning LLMs with RLHF: from best response to learning to beat the game
Slides: https://drive.google.com/file/d/1jgj8zco9v6Ba0gPZ5ELTgiPEMUCT1zCk/view?usp=drive_link
Date: Monday, Oct 28, 2pm
Speaker: Benjamin Eysenbach
Title: Self-Supervised Reinforcement Learning: Algorithms and Emergent Properties
Abstract: In this talk, I will discuss recent work on self-supervised reinforcement learning, focusing on how we can learn complex behaviors without the need for hand-crafted rewards or demonstrations.
I will introduce contrastive RL, a recent line of work that can extract goal-reaching skills from unlabeled interactions. This method will serve to highlight how there is much more to self-supervised RL than simply adding an LLM or VLM to an RL algorithm; rather, self-supervised RL can be seen as a form of generative AI itself. I will also share some recent work on blazing-fast simulators and new benchmarks, which have accelerated research in my group. Finally, I'll discuss emergent properties in self-supervised RL: preliminary evidence that we have found, and hints for where to go searching for more.
Slides: https://docs.google.com/presentation/d/1J5obo1Oq_b05QOqNdF2eq1gptunbokA-q3X-ty9qDJ8/edit?usp=sharing
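Below is a minimal, illustrative sketch of a contrastive goal-reaching critic in the spirit of the contrastive RL line of work mentioned in the abstract above; the network sizes and names are assumptions made for this example, not the speaker's actual code.

```python
# Sketch (assumed architecture): (state, action) and future-state embeddings are
# trained with an InfoNCE loss so their inner product scores goal reachability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, embed_dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))   # embeds (s, a)
        self.psi = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, embed_dim))   # embeds a goal state

    def forward(self, obs, act, goal):
        sa = self.phi(torch.cat([obs, act], dim=-1))          # (B, d)
        g = self.psi(goal)                                    # (B, d)
        return sa @ g.T                                       # (B, B) pairwise logits

def infonce_loss(critic, obs, act, future_obs):
    # Positives: a state actually reached later in the same trajectory.
    # Negatives: the future states of the other trajectories in the batch.
    logits = critic(obs, act, future_obs)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```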
Date: Monday, Oct 28, 4:30pm
Speaker: Claire Vernade
Title: Foundations for Continual Reinforcement Learning
Abstract: Despite its deep connections to dynamical systems, Reinforcement Learning (RL) is typically studied under the assumption of stationarity: the goal is to learn a static policy that maps states to optimal actions. But how well does this assumption hold in real-world applications?
Recent advances have led to various attempts to apply RL algorithms to real-world systems, such as data center cooling or plasma configuration for nuclear fusion reactors. However, these complex systems are often only partially observable and present themselves as non-stationary targets to the controller—a challenge that current RL methods are ill-equipped to handle.
In this talk, I will review these challenges and introduce alternative problem definitions designed to make RL more effective in dynamic environments. This work is part of my Emmy Noether project, "Foundations for Lifelong RL." More details are available at www.cvernade.com.
Slides:
Date: Tuesday, Oct 29, 9am
Speaker: Nicolas Mansard
Title: Using hard constraints (and a good robot model) to ground motion understanding extracted from wild animal videos into real robot locomotion
Abstract: Reinforcement learning (RL) is driving significant progress in robotics, particularly for quadruped locomotion. However, current methods often require extensive tuning, especially in designing reward functions, partly due to the limitations of simulators when handling robots at their physical limits. We address this by explicitly incorporating hard constraints, introducing Constraints as Terminations (CaT), which improves RL performance by terminating episodes when constraints are violated. Using this approach, we achieve parkour capabilities—walking, jumping, and crawling—on the real Solo robot, trained directly from pixel inputs. We further extend this work by defining reward functions through large-scale observation of wild animal videos, allowing robots to ground natural movements into their behaviors.
Slides:
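A minimal sketch of the constraints-as-terminations idea from the abstract above, written here as a generic Gymnasium wrapper; the constraint function, tolerance, and the torque example in the comment are assumptions for illustration, not the actual CaT implementation.

```python
# Sketch (assumed interface): a constraint violation ends the episode, so all future
# return is lost and the policy learns to respect the constraint without a
# hand-tuned reward penalty.
import gymnasium as gym
import numpy as np

class ConstraintTermination(gym.Wrapper):
    def __init__(self, env, constraint_fn, tolerance=0.0):
        super().__init__(env)
        self.constraint_fn = constraint_fn   # maps an observation to a violation measure
        self.tolerance = tolerance

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        violation = self.constraint_fn(obs)
        if violation > self.tolerance:
            terminated = True                # constraint violated: stop the episode
            info["constraint_violation"] = float(violation)
        return obs, reward, terminated, truncated, info

# Hypothetical usage: terminate when joint torques (assumed to sit in obs[-12:]) exceed 1.0.
# env = ConstraintTermination(env, lambda o: float(np.max(np.abs(o[-12:]))) - 1.0)
```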
Date: Tuesday, Oct 29, 11:30am
Speaker: Felix Berkenkamp
Title: What can Uncertainty do for you in RL?
Abstract: Uncertainty plays an important role in reinforcement learning: in MDPs it allows us to discover sparse rewards, while in POMDPs we need it to figure out which task we are interacting with. In this talk, we will take an empirical look at explicitly and implicitly representing this uncertainty in the RL pipeline and discuss what kinds of problems these methods can address.
Slides: https://drive.google.com/file/d/1zdwM4VOcQ761V8vXCtQsOol82un7UNHc/view?usp=drive_link
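As one concrete illustration of the explicit route mentioned in the abstract above, here is a small ensemble-based sketch; the architecture and the exploration-bonus form are assumptions for this example, not the speaker's method.

```python
# Sketch (assumed architecture): an ensemble of Q-networks whose disagreement gives
# an explicit estimate of epistemic uncertainty, usable as an exploration bonus.
import torch
import torch.nn as nn

class QEnsemble(nn.Module):
    def __init__(self, obs_dim, n_actions, n_members=5):
        super().__init__()
        self.members = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                          nn.Linear(128, n_actions))
            for _ in range(n_members)
        )

    def forward(self, obs):
        qs = torch.stack([m(obs) for m in self.members])  # (M, B, A)
        return qs.mean(dim=0), qs.std(dim=0)               # value estimate, disagreement

# Optimistic action selection: uncertain actions receive a bonus, which helps the
# agent keep exploring until it discovers sparse rewards.
# mean_q, unc = ensemble(obs)
# action = (mean_q + beta * unc).argmax(dim=-1)
```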
Date: Tuesday, Oct 29, 2pm
Speaker: Aske Plaat
Title: RL, but not as we know it
Abstract: In this talk, I review reasoning with large language models. LLMs have long been known to struggle with reasoning problems. However, when prompted to think step by step, in a sequential-decision fashion, their performance on standard Grade School Math word problems improves considerably. Using in-context learning, LLMs can solve sequential decision problems in a zero-shot manner. I will discuss different methods for solving sequential decision problems with large language models, highlighting the use of reinforcement learning methods to guide the solution process.
Slides: https://drive.google.com/file/d/1iE9rpZUsssBdyJYr7k4qnIteoKy33l6m/view?usp=drive_link
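A minimal sketch of the zero-shot "think step by step" prompting referred to in the abstract above; call_llm is a placeholder for whatever model API is used, not a specific library call.

```python
# Sketch of zero-shot chain-of-thought prompting: appending "Let's think step by
# step" elicits a reasoning chain, and a second pass extracts the final answer.
def build_cot_prompt(question: str) -> str:
    return "Q: " + question + "\nA: Let's think step by step."

def solve(question: str, call_llm) -> str:
    # call_llm(prompt) -> completion string; the two-stage recipe is the common
    # zero-shot CoT setup for grade-school math word problems.
    reasoning = call_llm(build_cot_prompt(question))
    return call_llm(build_cot_prompt(question) + "\n" + reasoning +
                    "\nTherefore, the answer (a number) is")
```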
Date: Tuesday, Oct 29, 4:30pm
Speaker: Antoine Cully
Title: Creative Machines: Unleashing the Creativity of Evolutionary Reinforcement Learning to Make Resilient Robots and New Discoveries
Abstract: Robots have transformed many industries, most notably manufacturing, and have the power to deliver tremendous benefits to society, for example, in search and rescue, disaster response, health care, and transportation. A major obstacle to their widespread adoption in more complex environments and outside of factories is their fragility. While animals can quickly adapt to injuries, current robots cannot "think outside the box" to find compensatory behaviour when they are damaged: they are limited to their pre-specified self-sensing abilities, which can diagnose only anticipated failure modes and strongly increase the overall complexity of the robot. In this talk, I will present how learning algorithms can allow robots to adopt the appropriate behaviours in response to damage. To achieve this, we need algorithms that do not merely find a single high-performing solution but instead capture the whole range of the robots' abilities. We call these Quality-Diversity algorithms, and in this talk I will show how they can be used to make robots more resilient and versatile. I will also show how these algorithms unlock the path to new materials, proteins, game designs, and many other discoveries.
Slides: https://drive.google.com/file/d/1zlJuz4wRH-brF5BNyb-uXBvKjcU-JSiq/view?usp=drive_link
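To make the Quality-Diversity idea in the abstract above concrete, here is a small MAP-Elites-style loop (the canonical QD algorithm family); the evaluate signature, grid resolution, and mutation scale are assumptions chosen for this example.

```python
# Sketch of a MAP-Elites-style Quality-Diversity loop: instead of one global optimum,
# keep the best solution found in every cell of a behaviour-descriptor grid.
# evaluate(x) is assumed to return (fitness, 2-D behaviour descriptor in [0, 1]^2).
import numpy as np

def map_elites(evaluate, dim, cells=20, iters=10_000, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}                                           # grid cell -> (fitness, solution)
    for _ in range(iters):
        if archive and rng.random() > 0.1:
            keys = list(archive)
            _, parent = archive[keys[rng.integers(len(keys))]]
            x = parent + 0.05 * rng.standard_normal(dim)   # mutate a random elite
        else:
            x = rng.uniform(-1.0, 1.0, dim)                # random exploration
        fitness, descriptor = evaluate(x)
        cell = tuple((np.clip(descriptor, 0.0, 1.0) * (cells - 1)).astype(int))
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, x)                   # keep the best per behaviour niche
    return archive
```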
Date: Wednesday, Oct 30, 9am
Speaker: Ahmed Touati
Title: Towards foundation models for behaviours
Abstract: Foundation models pre-trained on vast amounts of unlabeled data have flourished in many fields of machine learning. A natural step forward is to extend this approach beyond language and visual domains, towards behavioral foundation models for agents interacting with dynamic environments through actions.
In this talk, I will explain how we can build zero-shot controllable agents: agents that can be pre-trained without rewards or tasks, yet express a diverse range of behaviors in response to various prompts, including behaviors to imitate, goals to achieve, or rewards to optimize, all without the need for planning or fine-tuning.
Slides:
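One published recipe for this kind of zero-shot controllability is the forward-backward (FB) representation; whether the talk follows exactly this recipe is an assumption. Below is a sketch of the reward-prompting step: a few reward-labelled states are turned into a task vector, and the pre-trained forward map scores actions against it, with no planning or fine-tuning. Shapes and networks are illustrative.

```python
# Sketch (assumed shapes and networks): the task embedding z is a reward-weighted
# average of backward embeddings, and the forward map F(s, a) scores candidate
# actions against z at prompt time.
import torch
import torch.nn.functional as F

def infer_task_vector(backward_net, states, rewards):
    B = backward_net(states)                        # (N, d) backward embeddings
    z = (rewards.unsqueeze(-1) * B).mean(dim=0)     # (d,) reward-weighted average
    return F.normalize(z, dim=0)

def act(forward_net, obs, candidate_actions, z):
    obs_batch = obs.unsqueeze(0).expand(candidate_actions.shape[0], -1)
    Fsa = forward_net(obs_batch, candidate_actions)  # (A, d) forward embeddings
    scores = Fsa @ z                                 # (A,) task-conditioned values
    return candidate_actions[scores.argmax()]
```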
Date: Wednesday, Oct 30, 11:30am
Speaker: Elise van der Pol
Title: Symmetry & Structure in RL – State of the art, challenges, and future directions
Abstract: In this talk, I will give an overview of the state of the field of symmetries and structure in reinforcement learning. I'll discuss the why and how of symmetric RL, the standing challenges in the field, the work that has been done to address those challenges, and key directions for future work.
Slides:
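As a toy illustration of the kind of symmetry the abstract above refers to (an assumed example, not taken from the talk): CartPole is invariant under left-right mirroring, so every transition can be duplicated in reflected form and the agent effectively sees twice as much data.

```python
# Toy sketch: exploit CartPole's mirror symmetry by augmenting each transition with
# its reflection (negate the state, swap left/right actions). Assumed example.
import numpy as np

def reflect_transition(obs, action, reward, next_obs):
    # Negating positions, velocities, and angles and swapping the actions (0 <-> 1)
    # yields an equally valid CartPole transition.
    return -np.asarray(obs), 1 - action, reward, -np.asarray(next_obs)

def augment(batch):
    # Return the original transitions plus their mirrored counterparts.
    return batch + [reflect_transition(*t) for t in batch]
```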
Date: Wednesday, Oct 30, 2pm
Speaker: Emma Brunskill
Title: Efficiently Learning Personalized Policies
Abstract: From education to healthcare, personalized interventions have enormous potential to enhance human lives. While data-driven recommendations and decision policies dominate online consumer applications, experimentation and data collection are often vastly more limited and costly in other applications. I will discuss our algorithms for efficiently learning high-performance policies in such settings.
Slides: