What I Learned From Analyzing My Goodreads Data

In his bestselling book The Black Swan, Nassim Taleb writes that he “completely gave up reading newspapers and watching television.” This freed up an hour or more a day, which he says was “enough time to read more than a hundred additional books per year, which, after a couple of decades, starts mounting.”

This really got me thinking. Taleb made one small change that let him read hundreds of extra books over a decade. Over the last couple of years, I too have made small changes in my life to maximize the number of books I read. The more books I read, the more I feel the need for an organized list of everything I’ve read. I already have a basic spreadsheet of my books, but I want more detail. The inner accountant in me keeps saying: “add more columns!”

Since January 2018 I have read 226 books, or slightly over 2 books a week.

I wanted to do an exercise where I comb through the book data I have generated to see what I might learn about my book habits.

Before I started, I knew I wanted to find a few things out: 

  1. I am fascinated by book ratings. I’ve tried to create my own rating system but haven’t come up with one that I like. I wanted to look at the Goodreads data to see how my own ratings compare to the average Goodreads rating for each book.
  2. I wanted a better understanding of the genres I read. When people ask me what genres of books I like to read, I usually tell them “biographies.” I’m curious if the data says that’s true.

The Data

I keep a detailed book log of all the books I read. Recently, I started putting more effort into keeping my Goodreads account accurate and up to date. Goodreads has an export option that allows you to export the books you have read.

I merged my own book log with the Goodreads data and began summarizing the result. I also found this online tool, which turns your exported Goodreads spreadsheet into colorful graphs and charts. I also used a Python script to gather information from Goodreads that wasn’t included in the original export.
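For anyone curious what a merge like this might look like, here is a minimal sketch using only Python’s standard library. It is not the script I mentioned above. The column names are assumed to follow the Goodreads export, the book-log columns (“Date Finished”, “Format”) and their values are made up for illustration, and matching on a normalized title is just one simple way to join the two files.

```python
import csv
import io

# Stand-ins for the two files. Ratings come from the tables in this post;
# the book-log columns and their values are hypothetical placeholders.
GOODREADS_CSV = """Title,Author,My Rating,Average Rating
Business Adventures,John Brooks,5,3.81
Eat and Run,Scott Jurek,5,3.99
"""
BOOK_LOG_CSV = """Title,Date Finished,Format
business  adventures,2019-01-02,paperback
Eat and Run,2019-01-09,audiobook
"""


def normalize(title):
    """Lowercase and collapse whitespace so small typing differences still match."""
    return " ".join(title.lower().split())


def merge_logs(goodreads_rows, log_rows):
    """Attach the book-log columns to each Goodreads row, matched by title."""
    log_by_title = {normalize(row["Title"]): row for row in log_rows}
    merged = []
    for row in goodreads_rows:
        combined = dict(row)
        extra = log_by_title.get(normalize(row["Title"]), {})
        for key, value in extra.items():
            if key != "Title":  # keep the Goodreads spelling of the title
                combined[key] = value
        merged.append(combined)
    return merged


goodreads_rows = list(csv.DictReader(io.StringIO(GOODREADS_CSV)))
log_rows = list(csv.DictReader(io.StringIO(BOOK_LOG_CSV)))
merged = merge_logs(goodreads_rows, log_rows)

for book in merged:
    print(book["Title"], book["My Rating"], book.get("Format", "unknown"))
```

Books that appear in only one file keep whatever columns they have, so nothing is dropped by the merge.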

Ranking

Up to this point, whenever I gave a rating on Goodreads I didn’t think much about it. My average rating is 4.06, which seems a little high to me. Fifty-six of the books (25%) were rated a perfect 5, or as Goodreads describes it: “it was amazing.”

I am okay with having a high average rating. I expend considerable effort selecting the books I read. When I read a book I expect it to be good because I have prescreened it. But how could I come up with which books were the real crème de la crème?

The top 10%

One rating system I tried last year was to use a fixed percentage to determine how many “5-star” ratings I could give. Basically, I would limit “5-star” ratings to 10 percent of the total number of books read. I decided to revisit this concept and determine the top 20 books of the last 25 months, knowing full well that one book’s inclusion was another book’s exclusion. To arrive at 20 books, I excluded duplicate readings, which left me with 204 unique books read since 2018.
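The arithmetic behind the cap is simple enough to sketch in a few lines of Python. The function names are my own, and rounding the budget down is an assumption; the only numbers taken from above are the 10 percent share and the 204 unique books.

```python
import math


def unique_titles(titles):
    """Drop repeat readings, keeping the first occurrence of each title."""
    seen = set()
    unique = []
    for title in titles:
        key = " ".join(title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


def five_star_budget(unique_book_count, share=0.10):
    """Number of 5-star slots allowed, rounded down (an assumed convention)."""
    return math.floor(unique_book_count * share)


print(five_star_budget(204))  # 10 percent of 204 unique books
```

With 204 unique books the budget works out to 20, which is where the top-20 list comes from.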

Before I reveal the list, I want to clarify that I didn’t rate books 5 stars because of any single identifiable criterion (e.g., literary superiority, excellent prose, I-can’t-put-this-book-down etc.). Rather, it’s some combination of all of them. To be succinct: these books inspired me.

The List (in alphabetical order):

Honorable Mentions

Me vs. Goodreads

I thought it would be interesting to look at the books where my rating differed the most from the average Goodreads rating.

The 3 books I rated higher than the average Goodreads rating:

Title                          Author          My Rating   Goodreads Rating
Business Adventures            John Brooks     5           3.81
The Adventures of Tom Sawyer   Mark Twain      5           3.91
Eat and Run                    Scott Jurek     5           3.99

The 3 books I rated lower than the average Goodreads rating:

Title                     Author                           My Rating   Goodreads Rating
The Long Walk             Richard Bachman (Stephen King)   2           4.11
For Whom The Bell Tolls   Ernest Hemingway                 2           3.97
The Tipping Point         Malcolm Gladwell                 2           3.96
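The comparison above boils down to subtracting the Goodreads average from my rating and sorting by the gap. Here is a minimal sketch using only the six books listed in the two tables; with the full export, the same sort would run over every book. The variable and function names are my own.

```python
# (title, my rating, average Goodreads rating), taken from the two tables above.
books = [
    ("Business Adventures", 5, 3.81),
    ("The Adventures of Tom Sawyer", 5, 3.91),
    ("Eat and Run", 5, 3.99),
    ("The Long Walk", 2, 4.11),
    ("For Whom The Bell Tolls", 2, 3.97),
    ("The Tipping Point", 2, 3.96),
]


def rating_gap(book):
    """Positive when I liked the book more than the average Goodreads reader."""
    _title, mine, average = book
    return mine - average


by_gap = sorted(books, key=rating_gap, reverse=True)
rated_higher = by_gap[:3]        # largest positive gaps first
rated_lower = by_gap[-3:][::-1]  # largest negative gaps first

for title, mine, average in rated_higher + rated_lower:
    print(f"{title}: {mine - average:+.2f}")
```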

Genres

Through this process, I learned that classifying a book’s genre is not an easy task.

I used the tool mentioned above to download my Goodreads genre data. Goodreads uses crowd-sourced information from its users to populate the genres for a book. Users can place books onto their own digital bookshelves on Goodreads, and they can call their shelves whatever they’d like. For example, a user might have a “mystery” shelf, which she populates with books she thinks fit her own self-imposed criteria. Goodreads then aggregates people’s shelves and shows the results under the “Genre” section of a book’s page.

As part of my analysis, I was able to bring in the top-5 genres for each book. 

I wanted to be able to sort my data by literary type (i.e., fiction or nonfiction) and by genre. This meant I had to label each book with a single genre, an almost impossible task. Right away I struggled with what a “business” book is. I wanted to classify a lot of the books I’ve read as “business” books, but other genres seemed more fitting. For example, take Malcolm Gladwell’s Outliers. Its top genre (other than the generic “Nonfiction” label) is Psychology, but almost as many people shelved it as a “Business” book. I didn’t read Outliers because I thought it was a Psychology book; I read it because it seemed like a business book.

Outliers’ Goodreads data:

The Winners

My top 3 genres (when a book was just classified as one genre) were:

  1. Biography – 36 books
  2. Business (so much for not labeling a book as “Business”) – 34 books
  3. Science Fiction – 31 books

Altogether, my top-3 genres comprise 46% of all the books I’ve read since 2018.

One More Approach

I wasn’t that happy with my one-genre-per-book approach. Luckily, I was able to gain more insight from the Bookstats tool mentioned above. I uploaded my Goodreads data onto the website and it spit out the following graph.

The approach this time is to count each of a book’s top-5 genres. These results give a better picture of my reading. When each book counts toward multiple genres, my top 3 becomes:

  1. Business
  2. Biography
  3. History
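The difference between the two counting approaches can be sketched with Python’s Counter. Only the Outliers genres come from this post; the other two entries are made-up placeholders. Skipping the generic “Nonfiction” and “Fiction” labels is my assumption, not necessarily what the Bookstats tool does, and taking the first specific genre stands in for the hand-labeling I actually did.

```python
from collections import Counter

# Genres listed in order of popularity. Only the Outliers entry reflects
# the post; the other two books are hypothetical placeholders.
books = {
    "Outliers": ["Nonfiction", "Psychology", "Business"],
    "Hypothetical Biography": ["Biography", "History", "Nonfiction"],
    "Hypothetical Business Book": ["Business", "Nonfiction", "History"],
}

GENERIC = {"Nonfiction", "Fiction"}  # assumed to be too broad to count


def single_genre_counts(books):
    """One genre per book: the most popular non-generic genre."""
    counts = Counter()
    for genres in books.values():
        specific = [g for g in genres if g not in GENERIC]
        if specific:
            counts[specific[0]] += 1
    return counts


def multi_genre_counts(books, top_n=5):
    """Every book counts once toward each of its top_n genres."""
    counts = Counter()
    for genres in books.values():
        counts.update(g for g in genres[:top_n] if g not in GENERIC)
    return counts


single = single_genre_counts(books)
multi = multi_genre_counts(books)
print(single.most_common(3))
print(multi.most_common(3))
```

In this toy data History never wins as a single label, but it shows up twice once secondary genres count, which is the same kind of shift that moved History into my top 3.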

Other Insights

Weekdays

One of the more curious insights is that I finish more books on Wednesday than on any other day: 36% more than on Saturday, my next most likely day to finish a book.
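A tally like this can be reproduced from the finish dates alone. Below is a minimal sketch; the three dates are made up, and the ISO date format is an assumption, so a real export may need a different parser.

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Made-up finish dates for illustration only.
finish_dates = ["2020-01-01", "2020-01-08", "2020-01-04"]


def finish_day_counts(dates):
    """Count how many books were finished on each day of the week."""
    return Counter(DAYS[date.fromisoformat(d).weekday()] for d in dates)


def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


counts = finish_day_counts(finish_dates)
print(counts.most_common())
print(percent_more(counts["Wednesday"], counts["Saturday"]))
```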

Genre Ratings

Per the graph below, I tend to rate Biography and History books the highest, and Psychology and Personal Development books the lowest.

Longest Books I’ve Read

I found this chart very interesting. It plots the books I’ve read by their number of pages. As you can see, most of the books I’ve read are around 300 pages. I’ve read two books that were over 1,000 pages: The Power Broker (you can read what I wrote about The Power Broker here) and Master of the Senate. Both were written by Robert A. Caro. The longest non-Caro book I have read is The Snowball: Warren Buffett and the Business of Life.

Pages and Words

Per the online tool, I have read 81,513 pages, which equates to around 22 million words.
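Those two figures imply roughly 270 words per page. I don’t know what conversion the tool actually uses, so the 270 below is an assumption backed out of the numbers above, not a documented setting.

```python
pages_read = 81_513
words_per_page = 270  # assumption: implied by 22 million words / 81,513 pages

estimated_words = pages_read * words_per_page
print(f"{estimated_words:,} words, about {estimated_words / 1e6:.0f} million")
```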

Final Thoughts

I hope you’ve found all this book data interesting. Hopefully, it makes you want to read more, and to organize what you’ve read!

Going forward, I’m going to implement a simple rating system based on a 10-point scale. I found this Reddit post about a man who read and rated over 10,000 books in his life. There were years in which he was reading 2-3 books a day. He logged each book’s title, the date he read it, and a rating on a 10-point scale. If you are interested, here is a Google Sheet with his book list.

Write to me: sam@samuelpedro.com
