Article URL: https://www.wespiser.com/posts/2026-06-19-best-dog-treat.html Comments URL: https://news.ycombinator.com/item?id=48633410 Points: 16 # Comments: 3

Bebop, my 83 lb, 33-inch-tall Greyhound, loves three things: running fast, following me around the house, and treats. Whether it’s a chew treat, pizza snatched from the hand of a child who strayed too far from a party, or a small tray of cat food, he has a nose for what he likes and the athleticism to give him a fair shot at getting it. I’ve watched him eat for years, so it was upsetting to realize that I don’t know what his favorite snack is and can’t easily ask him.

Fortunately for Bebop’s palate, the Bradley-Terry model gives us a way to estimate the “strength” of each treat from pairwise comparisons. The model assigns each competitor (or treat) $i$ a positive strength score $p_i$. Given two competitors $i$ and $j$, the probability that $i$ beats $j$ is:

$$P(i \text{ beats } j) = \frac{p_i}{p_i + p_j}$$

Taking the log-odds of this probability gives:

$$\log \frac{P(i \text{ beats } j)}{P(j \text{ beats } i)} = \log p_i - \log p_j$$

So the model is saying: the difference between two competitors’ latent strengths, measured on a log scale, determines the log-odds that one beats the other.

The Elo rating system used in chess is closely related. If $R_i$ and $R_j$ are Elo ratings, then:

$$P(i \text{ beats } j) = \frac{1}{1 + 10^{(R_j - R_i)/400}}$$

This is the Bradley-Terry form with $p_i = 10^{R_i/400}$. However, modern Elo ratings are calculated incrementally, which avoids expensive recompute cycles and allows scores to be updated after each match. After a game, player $A$’s rating is updated by comparing the actual result to the expected result:

$$R_A' = R_A + K (S_A - E_A)$$

where $S_A$ is the actual score ($1$ for a win, $0.5$ for a draw, and $0$ for a loss) and $E_A$ is the expected score given by the probability formula above. The constant $K$ controls how much ratings move after each game. So if a player wins a game they were expected to win, their rating only moves slightly. If they win a game they were expected to lose, their rating moves a lot.

In this sense, Elo can be thought of as an online version of the Bradley-Terry idea: after each result, move the ratings in the direction of the prediction error. Elo makes sense for systems like chess because games arrive continuously and ratings need to update immediately. In this experiment, the dataset is small enough that we can simply fit the Bradley-Terry model directly after collecting the trials.

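For contrast with the batch fit, the incremental Elo update described above can be sketched in a few lines of Python. This is a minimal illustration rather than any federation’s exact rules: the function names `expected_score` and `elo_update` are hypothetical, the 400-point scale is the usual chess convention, and `k=32.0` is an assumed default.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the usual Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return (new_r_a, new_r_b) after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k=32 is an assumed value; real rating systems choose their own K.
    """
    # Prediction error: actual result minus expected result.
    delta = k * (score_a - expected_score(r_a, r_b))
    # Whatever A gains, B loses, so the rating total is conserved.
    return r_a + delta, r_b - delta


# A favourite beating an underdog moves little; an upset moves a lot.
favourite_after, _ = elo_update(1800, 1400, 1.0)
underdog_after, _ = elo_update(1400, 1800, 1.0)
print(favourite_after - 1800, underdog_after - 1400)
```

The single `delta` term is the prediction error scaled by $K$, which is why an expected win barely changes either rating while an upset moves both substantially.
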
You might also recognize a related model from the movie The Social Network, where a global ranking built from pairwise comparisons powered Facemash, an early social media experiment by Mark Zuckerberg.[1] A third application is Chatbot Arena, which uses Bradley-Terry-style rankings to compare model performance.[2] Bradley-Terry is the solution you reach for when you want a global ranking but only have head-to-head comparisons.
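
To make “a global ranking from head-to-head comparisons” concrete, here is a minimal sketch of fitting Bradley-Terry strengths with the classic iterative (minorization-maximization) update. It is not necessarily the fitting method used in this experiment. The function name `fit_bradley_terry`, the treat names, and the win counts are all made up for illustration and are not Bebop’s data. The sketch also assumes every treat wins at least once and that the comparisons connect all the treats. Otherwise the maximum-likelihood estimate is not well behaved.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Estimate Bradley-Terry strengths from a matrix of win counts.

    wins[i][j] is the number of times item i was chosen over item j.
    Returns strengths normalised to sum to 1.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (games played) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Items with no comparisons keep their current strength.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = sum(new_p)
        p = [x / scale for x in new_p]
    return p


# Hypothetical data: rows are winners, columns are losers.
treats = ["cheese", "chicken", "kibble"]
wins = [
    [0, 7, 9],
    [3, 0, 8],
    [1, 2, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(zip(treats, strengths), key=lambda t: t[1], reverse=True)
print(ranking)
```

Because only ratios of strengths matter in the model, the normalisation to sum to 1 is a convenience. Any positive rescaling of `strengths` predicts the same win probabilities.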