What does it take to build a human-like user simulator?
Defining the right training objective is often the key to eliciting new language model capabilities. Preference models led to more helpful assistants, and verifiable rewards led to better reasoning. But if we want to build models that can...
User simulators bridge RL with real-world interaction
RL finally works, unlocking a whole host of new capabilities for LMs. The idea is simple: if you can verify what success looks like, just try a bunch of different generations and reinforce the ones that do well. As some have said, it seems the time...