PhD for Dummies
Famous research papers, explained in layers. Start at the five-year-old version and climb as far as you want, up to the one a researcher would pick apart. Each comes with clean diagrams and a live demo you can play with.
WebArena: A Realistic Web Environment for Building Autonomous Agents
A self-hostable web world plus 812 everyday tasks that score an agent on whether the website actually reached the goal, not whether its clicks matched a script. The best GPT-4 agent completes only about 14% of the tasks; humans complete about 78%.
Voyager: An Open-Ended Embodied Agent with Large Language Models
A Minecraft agent that learns forever by having GPT-4 write code for each new task, debug it against the game, and save the working programs as reusable skills that compose into harder ones.
Trust Region Policy Optimization
Improve a policy by taking the biggest step that still keeps the new policy close to the old one, so the local estimate of the improvement stays trustworthy and the real performance goes up almost every time.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Instead of writing one answer left to right, a language model lays out partial answers as a branching tree, scores each branch itself, and searches the tree, keeping the promising paths and backtracking out of dead ends.
Training Language Models to Follow Instructions with Human Feedback
Teaching a language model to do what people actually want by learning a reward from human rankings and then nudging the model toward it, so that people prefer the answers of a 1.3B-parameter model to those of a model more than 100 times its size.
Thinking Fast and Slow with Deep Learning and Tree Search
Split learning into a slow planner that searches and a fast net that copies it, and let each one make the other stronger every round until the pair beats MoHex, a champion Hex-playing program.