Workshop on Reinforcement Learning Beyond Rewards: Ingredients for Developing Generalist Agents

Reinforcement Learning Conference (RLC) 2025

August 5, 2025

@RLBRew_RLC · #RLBRew_RLC


Accepted Papers



Offline RLAIF: Piloting VLM Feedback for RL via SFO

Provably Learning from Language Feedback

Reward Learning through Ranking Mean Squared Error

Which Rewards Matter? Reward Selection for Reinforcement Learning from Limited Feedback

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Improving LLM-Generated Code Quality with GRPO

A Geometric Lens on RL Environment Complexity Based on Ricci Curvature

Implicit vs. Explicit Offline Inverse Reinforcement Learning: A Credit Assignment Perspective

Using Discrete Overlapping Partitions for Count-Based Exploration

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning

Zero-Shot Constraint Satisfaction with Forward-Backward Representations

VideoAgent: Self-Improving Video Generation for Embodied Planning

Towards Continual No-Regret Learning

Exploration for the Efficient Deployment of Reinforcement Learning Agents

Flattening Hierarchies with Policy Bootstrapping

Motion-Planning via Contrastive Reinforcement Learning and Monte-Carlo Tree Search

Towards An Option Basis To Optimize All Rewards

The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis

InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

A Unified Framework for Unsupervised Reinforcement Learning Algorithms

Unsupervised Skill Discovery in Non-Markov Settings with Empowerment

Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

MR-CRL: Leveraging Predictive Representations for Contrastive Goal-Conditioned Reinforcement Learning

Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning?

Curiosity-Driven Exploration via Temporal Contrastive Learning

Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning

Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning