Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents

Reinforcement Learning Conference (RLC) 2025

August 5, 2025

@RLBRew_RLC · #RLBRew_RLC

Accepted Papers

Offline RLAIF: Piloting VLM Feedback for RL via SFO

Jacob Beck
https://openreview.net/pdf?id=XuW9VGTz1w

Provably Learning from Language Feedback

Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng
https://openreview.net/pdf?id=2wlpsUAOrT

Reward Learning through Ranking Mean Squared Error

Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor
https://openreview.net/pdf?id=vFP4e54pH5

Which Rewards Matter? Reward Selection for Reinforcement Learning from Limited Feedback

Shreyas Chaudhari, Renhao Zhang, Philip S. Thomas, Bruno Castro da Silva
https://openreview.net/pdf?id=h8UQlBDV0P

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Till Freihaut, Luca Viano, Volkan Cevher, Matthieu Geist, Giorgia Ramponi
https://openreview.net/pdf?id=QjJwxTecLc

Improving LLM-Generated Code Quality with GRPO

Maxime Robeyns, Laurence Aitchison
https://openreview.net/pdf?id=wdJT3KNCv6

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Ali Saheb Pasand, Pablo Samuel Castro, Pouya Bashivan
https://openreview.net/pdf?id=9xPs0jKRZA

Implicit vs. Explicit Offline Inverse Reinforcement Learning: A Credit Assignment Perspective

Ran Wei, Harshit Sikchi, Amy Zhang
https://openreview.net/pdf?id=X3YuA7z2iX

Using Discrete Overlapping Partitions for Count-Based Exploration

Mahshid Rahmani Hanzaki, Michael Bowling
https://openreview.net/pdf?id=1FqDUpTd9i

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal
https://openreview.net/pdf?id=G7X1hsBLNl

Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning

Hampus Åström, Elin A. Topp, Jacek Malec
https://openreview.net/pdf?id=RfJbVuHCgx

Zero-Shot Constraint Satisfaction with Forward-Backward Representations

Adriana Hugessen, Harley Wiltzer, Cyrus Neary, Amy Zhang, Glen Berseth
https://openreview.net/pdf?id=6OxKvduBUP

VideoAgent: Self-Improving Video Generation for Embodied Planning

Achint Soni, Sreyas Venkataraman, Abhranil Chandra, Sebastian Fischmeister, Percy Liang, Bo Dai, Sherry Yang
https://openreview.net/pdf?id=Z3AzZPCiRF

Towards Continual No-Regret Learning

David Sychrovský, Martin Balko, Martin Schmid, Michael Bowling
https://openreview.net/pdf?id=3QTZ9h2EsE

Exploration for the Efficient Deployment of Reinforcement Learning Agents

Max Rudolph, Siddhant Agarwal, Omer Gottesman, Amy Zhang, Akhil Bagaria, Sohrab Andaz, Udaya Ghai, Carson Eisenach
https://openreview.net/pdf?id=E3zbXrF2Xq

Flattening Hierarchies with Policy Bootstrapping

John Luoyu Zhou, Jonathan C. Kao
https://openreview.net/pdf?id=iDxTYJB0FP

Motion-Planning via Contrastive Reinforcement Learning and Monte-Carlo Tree Search

Kellen Kanarios, Lei Ying
https://openreview.net/pdf?id=G1xxzh9Q22

Towards An Option Basis To Optimize All Rewards

Siddarth Chandrasekar, Marlos C. Machado
https://openreview.net/pdf?id=93kcaZhYgG

The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis

Alex Lewandowski, Aditya A. Ramesh, Edan Meyer, Dale Schuurmans, Marlos C. Machado
https://openreview.net/pdf?id=pDR4GXgpcY

InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

Bryan Lincoln Marques de Oliveira, Bruno Brandão, Luana Guedes Barros Martins, Luckeciano Carvalho Melo
https://openreview.net/pdf?id=YalsDaobBg

A Unified Framework for Unsupervised Reinforcement Learning Algorithms

Siddhant Agarwal, Caleb Chuck, Harshit Sikchi, Jiaheng Hu, Max Rudolph, Scott Niekum, Peter Stone, Amy Zhang
https://openreview.net/pdf?id=tQtO75p5HA

Unsupervised Skill Discovery in Non-Markov Settings with Empowerment

Andrew Levy, George Konidaris
https://openreview.net/pdf?id=yZmp2nu8an

Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White
https://openreview.net/pdf?id=JIL0IoDzn7

MR-CRL: Leveraging Predictive Representations for Contrastive Goal-Conditioned Reinforcement Learning

Muhammad Qasim Ali, Winnie Trandinh, Howard Nguyen-Huu, Alexander Wong
https://openreview.net/pdf?id=oqGLOVJwlv

Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?

Yumi Omori, Zixuan Dong, Keith W. Ross
https://openreview.net/pdf?id=VX0a61Q1Su

Curiosity-Driven Exploration via Temporal Contrastive Learning

Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth
https://openreview.net/pdf?id=gqjT7g5ZRa

Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning

Elias Malomgré, Pieter Simoens
https://openreview.net/pdf?id=thCvCNpSkd

Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning

Mael Macuglia, Paul Friedrich, Giorgia Ramponi
https://openreview.net/pdf?id=vWqkLA0uzm