I’ve recently been scraping the web for top book lists in an attempt to identify which books pop up on multiple lists of “Best books in 2019”. From 48 sources (listed below), here are the top 20:
To dive a little deeper, below is my first pass at an interactive tool that lets you see the top books and filter by source and source type.
- The interactivity is a terrible experience on mobile. I know.
- For simplicity, I’ve excluded books that don’t appear on more than one list. If you want to see books that are on one list, go look at the lists (linked below).
- If you see something off, let me know. It’s entirely possible that a typo in a publication, or a manual step on my end, broke something here.
- I don’t feel strongly about my list categorizations; I just figured a raw list of 48 was too much to deal with. If you have other suggestions for how I might categorize these sources, I’m all ears.
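For anyone curious how a "books on the most lists" ranking can be computed, here is a minimal sketch. It assumes each source has already been scraped into a set of (title, author) pairs; the list names and books below are made-up placeholders, and the `normalize` cleanup is my own assumption about how you might keep trivial formatting differences from splitting one book into two. It counts each book once per list, drops books that appear on only one list, and returns the top N.

```python
from collections import Counter

# Hypothetical input: each source maps to the (title, author) pairs it lists.
# These three lists are placeholders, not real scraped data.
lists = {
    "List A": {("Book One", "Author X"), ("Book Two", "Author Y"), ("Book Three", "Author Z")},
    "List B": {("Book One", "Author X"), ("Book Two", "Author Y")},
    "List C": {("Book One", "Author X"), ("Book Four", "Author W")},
}

def normalize(title: str, author: str) -> tuple:
    """Lowercase and collapse whitespace so minor formatting differences match."""
    return (" ".join(title.lower().split()), " ".join(author.lower().split()))

def count_appearances(lists):
    counts = Counter()
    for books in lists.values():
        # Build a set per source so a book counts at most once per list.
        counts.update({normalize(t, a) for t, a in books})
    return counts

def top_books(lists, n=20, min_lists=2):
    counts = count_appearances(lists)
    # Exclude books on fewer than min_lists sources (by default, single-list books).
    multi = [(book, c) for book, c in counts.items() if c >= min_lists]
    # Most lists first; ties broken alphabetically for stable output.
    multi.sort(key=lambda item: (-item[1], item[0]))
    return multi[:n]

for book, c in top_books(lists):
    print(c, book)
```

With the placeholder data, "Book One" (3 lists) and "Book Two" (2 lists) make the cut, while "Book Three" and "Book Four" are dropped for appearing on a single list. Real titles would likely need more aggressive matching (subtitles, punctuation) than this simple normalization.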
What about overlap between lists? Which lists are most similar to other lists, and which have unique content?
This visualization may take a bit of explaining. Your first step is to choose a list (I’ve chosen the Booker Longlist as an example). This filters the y-axis to show only books that are on the Booker Longlist. The length of each bar (x-axis) shows how many other lists that book is on.
From this, you can get a sense of how “mainstream” a list’s books are: is this list filled with books that are on other lists, or is there quite a long “long tail” of books that are only on this list?
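The bar lengths described above can be sketched in a few lines, assuming each list has already been reduced to a set of normalized (title, author) pairs (the data below is a placeholder, not the real Booker Longlist). `other_list_counts` gives each book's count of *other* lists, and `long_tail_share` is one possible way to put a number on the "long tail": the fraction of a list's books found nowhere else. The `jaccard` helper is my own addition, not something the visualization uses; it is one common way to score how similar two lists are (shared books divided by all distinct books across both).

```python
# Hypothetical data: source name -> set of normalized (title, author) pairs.
lists = {
    "Booker Longlist": {("book one", "author x"), ("book two", "author y"), ("book five", "author v")},
    "List B": {("book one", "author x"), ("book two", "author y")},
    "List C": {("book one", "author x"), ("book four", "author w")},
}

def other_list_counts(lists, chosen):
    """For each book on `chosen`, count how many *other* lists it also appears on."""
    return {
        book: sum(1 for name, books in lists.items() if name != chosen and book in books)
        for book in lists[chosen]
    }

def long_tail_share(lists, chosen):
    """Fraction of the chosen list's books that appear on no other list."""
    counts = other_list_counts(lists, chosen)
    return sum(1 for c in counts.values() if c == 0) / len(counts)

def jaccard(a, b):
    """Similarity of two lists: shared books / distinct books across both."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def most_similar(lists, chosen):
    """Return (score, name) of the list with the highest Jaccard overlap with `chosen`."""
    return max(
        (jaccard(lists[chosen], books), name)
        for name, books in lists.items()
        if name != chosen
    )

print(other_list_counts(lists, "Booker Longlist"))
print(long_tail_share(lists, "Booker Longlist"))
print(most_similar(lists, "Booker Longlist"))
```

Here "book one" would get a bar of length 2, "book two" a bar of length 1, and "book five" a bar of length 0, so one third of this placeholder list is long tail. A list with a high long-tail share has more unique content; a list with a high Jaccard score against others is more "mainstream."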
If I were using Tableau (embeds are not supported on Medium), I would do some cool things like:
- Hover to show the other lists
- Color the bars by list category
- Probably a lot of other cool things I haven’t thought of
But I’m not using Tableau for this first proof-of-concept; we’ll see if I have the energy/interest to go deeper there and figure out where to publish.
Some things would be interesting to look at, but I don’t have a good way to automate importing the data (suggestions welcome):
- Breakdowns by author demographic (gender, race, age, nationality)
- Themes common across top books
- Anything numeric, like month published, length (words/pages), number of weeks on the NYT bestsellers list, etc.
More to come? Maybe? In the meantime, here are the sources if you want to take a look yourself.
Booksellers / Libraries:
Prizes / voting:
Missing your favorite list? Link to it in the comments below and I’ll add it. Note that I’ve intentionally excluded:
- Lists that aren’t specific to books published in 2019 (or late 2018), like Amazon’s bestsellers (and other bestseller lists) or Obama’s top books
- Lists that were really annoying to scrape accurately, like the Paris Review’s Staff Picks (can easily get title, but not author)
- Lists that don’t come out until spring, like the Pulitzer Prize or the National Book Critics Circle Award
Other feedback welcome; leave a comment below or send me a note here.