RL finally works, unlocking a whole host of new capabilities for LMs. The idea is simple: if you can verify what success looks like, just try a bunch of different generations and reinforce the ones that do well. As some have said, it seems the time...

User simulators bridge RL with real-world interaction