Happy New Year! Time for me to scrape the internet for as many “Best of 2021” lists as I can find, pull them all together, and make some graphs. I’ve been doing this the last couple years (2020, 2019), and it’s consistently been a pretty great source of books to read.
Here are 2021’s most-listed works of fiction:
I put these on a Goodreads Shelf here, if you want to quickly add them to your own to-read lists.
In the meantime, some casual analysis on which lists are the most accurate. Three measures here:
- Precision: which lists have entries most likely to be in the Top 19? (# of Top 19 entries divided by list length)
Short lists have an advantage in precision: if you just named your favorite book and it happens to be on the Top 19, you have 100% precision! Obama’s Favorites are a good example of this: he only listed 7 fiction books, so when 6 of them are in the Top 19, he has very high precision. Taking a book rec from Obama gives you a high chance of a “good book”, but you’ll run out of books quickly.
- Recall: which lists mentioned the most Top 19 books? (# of Top 19 entries divided by 19.)
Long lists have a similar advantage in recall: if you listed all books published in 2021, you would have 100% recall but very low precision. The top 5 highest-recall lists are all over 40 books long, but are all well under 50% precision. If you read all the books on Chicago Public Library’s list, you’ll eventually get to almost all the very top ones, but it’ll take you a while.
- Accuracy: so, how to combine precision and recall? Something called the F-score. I’ll let Wikipedia explain the math.
Similar to Precision, this still favors shorter lists in my dataset, but you’ll notice that the shortest ones (like Obama’s, and EW’s) disappear due to their lower recall. “LitHub’s Best Reviewed” is at the top here; it’s unsurprising that it aligns with my meta-analysis as it’s adjacent to a meta-list as well (books with the highest reviews across the web).
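If you'd like to see the three measures as arithmetic, here is a minimal Python sketch. It is not how the graphs were built (those came from Google Sheets and Google Data Studio). The function name `score_list` and the book titles are placeholders I made up, and it assumes the balanced F1 variant of the F-score (the harmonic mean of precision and recall), since the F-score comes in several flavors. The example list mimics the Obama case above: 7 books, 6 of them in the Top 19.

```python
def score_list(list_entries, top_books):
    """Return (precision, recall, f1) for one "Best of" list against the top books."""
    entries = set(list_entries)
    top = set(top_books)
    hits = len(entries & top)  # books that appear on both

    # Precision: share of this list's entries that made the Top 19.
    precision = hits / len(entries) if entries else 0.0
    # Recall: share of the Top 19 that this list mentioned.
    recall = hits / len(top) if top else 0.0

    # F1 (an assumption: the balanced variant) is the harmonic mean of the two.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder titles standing in for the real Top 19.
top_19 = [f"top-book-{i}" for i in range(1, 20)]

# A short list shaped like the Obama example: 7 books, 6 of them in the Top 19.
short_list = top_19[:6] + ["some-other-book"]

precision, recall, f1 = score_list(short_list, top_19)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.86 recall=0.32 f1=0.46
```

The short list scores very high on precision, but its recall drags the combined score down to about 0.46, which is why the shortest lists drop out of the F-score ranking.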
Have a vendetta against any of these lists? Just take them out, easy (well — not so easy on mobile, but Google Data Studio, Medium, and I are all doing our best here).
That’s all for now. I’m in the middle of compiling “Best YA of 2021” as well (to source books for the podcast I produce, Literary Connections), so hopefully you’ll see that here soon.
Tools used: webscraper.io to find the books, Google Sheets to list them, and Google Data Studio for the pretty graphs above.