Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents
Reinforcement Learning Conference (RLC) 2025
August 5, 2025
@RLBRew_RLC · #RLBRew_RLC
Accepted Papers
Offline RLAIF: Piloting VLM Feedback for RL via SFO
- Jacob Beck
- https://openreview.net/pdf?id=XuW9VGTz1w
Provably Learning from Language Feedback
- Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng
- https://openreview.net/pdf?id=2wlpsUAOrT
Reward Learning through Ranking Mean Squared Error
- Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor
- https://openreview.net/pdf?id=vFP4e54pH5
Which Rewards Matter? Reward Selection for Reinforcement Learning from Limited Feedback
- Shreyas Chaudhari, Renhao Zhang, Philip S. Thomas, Bruno Castro da Silva
- https://openreview.net/pdf?id=h8UQlBDV0P
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
- Till Freihaut, Luca Viano, Volkan Cevher, Matthieu Geist, Giorgia Ramponi
- https://openreview.net/pdf?id=QjJwxTecLc
Improving LLM-Generated Code Quality with GRPO
- Maxime Robeyns, Laurence Aitchison
- https://openreview.net/pdf?id=wdJT3KNCv6
A Geometric Lens on RL Environment Complexity Based on Ricci Curvature
- Ali Saheb Pasand, Pablo Samuel Castro, Pouya Bashivan
- https://openreview.net/pdf?id=9xPs0jKRZA
Implicit vs. Explicit Offline Inverse Reinforcement Learning: A Credit Assignment Perspective
- Ran Wei, Harshit Sikchi, Amy Zhang
- https://openreview.net/pdf?id=X3YuA7z2iX
Using Discrete Overlapping Partitions for Count-Based Exploration
- Mahshid Rahmani Hanzaki, Michael Bowling
- https://openreview.net/pdf?id=1FqDUpTd9i
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
- Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal
- https://openreview.net/pdf?id=G7X1hsBLNl
Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning
- Hampus Åström, Elin A. Topp, Jacek Malec
- https://openreview.net/pdf?id=RfJbVuHCgx
Zero-Shot Constraint Satisfaction with Forward-Backward Representations
- Adriana Hugessen, Harley Wiltzer, Cyrus Neary, Amy Zhang, Glen Berseth
- https://openreview.net/pdf?id=6OxKvduBUP
VideoAgent: Self-Improving Video Generation for Embodied Planning
- Achint Soni, Sreyas Venkataraman, Abhranil Chandra, Sebastian Fischmeister, Percy Liang, Bo Dai, Sherry Yang
- https://openreview.net/pdf?id=Z3AzZPCiRF
Towards Continual No-Regret Learning
- David Sychrovský, Martin Balko, Martin Schmid, Michael Bowling
- https://openreview.net/pdf?id=3QTZ9h2EsE
Exploration for the Efficient Deployment of Reinforcement Learning Agents
- Max Rudolph, Siddhant Agarwal, Omer Gottesman, Amy Zhang, Akhil Bagaria, Sohrab Andaz, Udaya Ghai, Carson Eisenach
- https://openreview.net/pdf?id=E3zbXrF2Xq
Flattening Hierarchies with Policy Bootstrapping
- John Luoyu Zhou, Jonathan C. Kao
- https://openreview.net/pdf?id=iDxTYJB0FP
Motion-Planning via Contrastive Reinforcement Learning and Monte-Carlo Tree Search
- Kellen Kanarios, Lei Ying
- https://openreview.net/pdf?id=G1xxzh9Q22
Towards An Option Basis To Optimize All Rewards
- Siddarth Chandrasekar, Marlos C. Machado
- https://openreview.net/pdf?id=93kcaZhYgG
The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis
- Alex Lewandowski, Aditya A. Ramesh, Edan Meyer, Dale Schuurmans, Marlos C. Machado
- https://openreview.net/pdf?id=pDR4GXgpcY
InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
- Bryan Lincoln Marques de Oliveira, Bruno Brandão, Luana Guedes Barros Martins, Luckeciano Carvalho Melo
- https://openreview.net/pdf?id=YalsDaobBg
A Unified Framework for Unsupervised Reinforcement Learning Algorithms
- Siddhant Agarwal, Caleb Chuck, Harshit Sikchi, Jiaheng Hu, Max Rudolph, Scott Niekum, Peter Stone, Amy Zhang
- https://openreview.net/pdf?id=tQtO75p5HA
Unsupervised Skill Discovery in Non-Markov Settings with Empowerment
- Andrew Levy, George Konidaris
- https://openreview.net/pdf?id=yZmp2nu8an
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
- Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White
- https://openreview.net/pdf?id=JIL0IoDzn7
MR-CRL: Leveraging Predictive Representations for Contrastive Goal-Conditioned Reinforcement Learning
- Muhammad Qasim Ali, Winnie Trandinh, Howard Nguyen-Huu, Alexander Wong
- https://openreview.net/pdf?id=oqGLOVJwlv
Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?
- Yumi Omori, Zixuan Dong, Keith W. Ross
- https://openreview.net/pdf?id=VX0a61Q1Su
Curiosity-Driven Exploration via Temporal Contrastive Learning
- Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth
- https://openreview.net/pdf?id=gqjT7g5ZRa
Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
- Elias Malomgré, Pieter Simoens
- https://openreview.net/pdf?id=thCvCNpSkd
Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
- Mael Macuglia, Paul Friedrich, Giorgia Ramponi
- https://openreview.net/pdf?id=vWqkLA0uzm