Saturday, April 9, 2022

Analyzing Our Chocolate Habits – The Data Is In

Left: A composite image of 170 chocolate wrappers collected over 17 years.
Right: A map showing the countries our chocolate comes from. They all lie within the chocolate belt along the equator.

In this post we analyze our chocolate consumption habits using data we've collected in our Scrapbook project. We've covered Scrapbook in detail in previous posts (2017 introduction, 2019 our memex, 2021 ten user scenarios, 2022 natural language query parser). In our 2021 ten user scenarios post, we explain Scrapbook like this:

Scrapbook is a software platform for managing personal information. Information is anything that's important to you. For us that includes books, dinners with friends, correspondence, travel summaries, people, and music to name just a few categories. The goal of the Scrapbook platform is to capture, store, and easily recall this information securely, whenever, and wherever.

With Scrapbook nearing 12,000 entries, it becomes increasingly interesting to ask what insights might lie in the data that we hadn't anticipated.

We have a lot of entries about chocolate, for example. Whenever we bring home a new bar, we make a new entry. We capture the label, our impressions, brand, type, percentage, producer, rating, and origin. Can we understand a little more about something so many of us take for granted? Where does it come from? Who is making it? What are the qualities of one versus another? What drives our preferences?

We also have a little fun at our own expense in exposing our vices and consumption habits. We did something similar in the post Orecchiette Pasta Dishes and Recipes for our Favorites: Radish Pesto and Tomato-Broccoli.

How did we approach this analysis?
  • We searched in Scrapbook under category "chocolate" and exported the results to JSON (or CSV).
  • We found 170 chocolate entries covering just under 17 years, from February 2005 to October 2021.
  • We imported the JSON data into Excel for analysis and charting. Briefly, in Excel do the following:
    • Load data: Data Menu / Get Data / From File / From JSON
    • Use the Power Query Editor.
    • Create a table and expand any columns required.
    • For more information, see the Power Query JSON docs.
  • Of the 170 entries, we threw out three which had incomplete data.
  • It's important to note that we don't make an entry for every bar of chocolate we buy or consume.
  • Each entry is a different type or brand of chocolate that we've tried and chosen to document. So that's 170 distinct kinds of chocolate from a lot of different makers. 
  • Some chocolates we've enjoyed only once and others, dozens of times.
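
The export and clean-up steps above can also be sketched outside Excel. The following is a minimal Python sketch, not what we actually ran: the field names (brand, percentage, rating, origin) and the sample records are assumptions made up for illustration, and the real Scrapbook export may be shaped differently.

```python
import json

# Hypothetical export: these field names and records are assumptions,
# not Scrapbook's actual schema or data.
SAMPLE_EXPORT = """
[
  {"brand": "Example A", "percentage": 70, "rating": 4, "origin": "Ecuador"},
  {"brand": "Example B", "percentage": 85, "rating": 3, "origin": null},
  {"brand": "Example C", "percentage": null, "rating": null, "origin": null}
]
"""

# Fields an entry needs before it can be used for charting.
# Origin is left out because many bars don't disclose it.
REQUIRED_FIELDS = ("brand", "percentage", "rating")


def load_entries(text):
    """Parse the exported JSON text into a list of dicts."""
    return json.loads(text)


def is_complete(entry):
    """An entry is usable when every required field is present and not null."""
    return all(entry.get(field) is not None for field in REQUIRED_FIELDS)


entries = load_entries(SAMPLE_EXPORT)
complete = [entry for entry in entries if is_complete(entry)]
print(f"{len(complete)} of {len(entries)} entries are complete")
```

Which fields count as required is a judgment call. Here a missing origin doesn't disqualify an entry, but a missing percentage or rating does.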

What can we say about chocolate?
  • We tend to select chocolate containing between 60% and 80% cocoa content by mass.
  • We rate 4% of chocolate at 5/5, really exceeds expectation. (Example: Venchi: Cuor di Cacao)
  • We rate 31% of chocolate at 4/5, exceeds expectation.
  • We rate 50% of chocolate at 3/5, meets expectation.
  • We rate 3% of chocolate at 1/5, really sucked. (Example: Orion Studentska Milk Chocolate with raisins and nuts.)
  • For us, chocolate below 60% cacao typically doesn't score as high. We also don't buy it very often, which means our data is skewed toward chocolate with a higher cocoa percentage.
  • A scatter plot with percentage on the x-axis and rating on the y-axis doesn't reveal as much as we thought it might. The correlation coefficient between rating and percentage is 0.38, which is generally considered a weak correlation. This might mean that percentage isn't the driving factor in our rating, or it might indicate that our sample selection is biased.

Left: A histogram of chocolate rating (1 bad, 5 good).
Center: A histogram of chocolate percentage.
Right: A scatterplot of percentage versus rating.
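
For readers who want to see what goes into a correlation coefficient like the one quoted above, here is a minimal Python sketch of the Pearson formula. The percentages and ratings in it are made-up illustrative values, not our actual entries, so the result will not match our 0.38. Excel's CORREL function computes the same quantity.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Made-up sample values for illustration only.
percentages = [55, 64, 70, 72, 85]
ratings = [2, 4, 3, 5, 3]

r = pearson(percentages, ratings)
print(f"r = {r:.2f}")
```

The coefficient always falls between -1 and 1. Values near zero mean that a straight line explains little of the relationship, which says nothing about whether a non-linear pattern exists.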

We have origin data for less than 50% of our chocolate entries. Most chocolate bars don't disclose where the beans come from, typically because they're made from a blend of beans with multiple origins.

Plotting the cacao origin data that we do have on a map shows that, not surprisingly, it all falls within the so-called "chocolate belt", the band between 20 degrees north and 20 degrees south of the equator. Within the belt, Ecuador, the Dominican Republic, and Peru are the countries from which we tend to collect the most chocolate.
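
The belt test itself is simple enough to sketch in a few lines of Python. The latitudes below are rough country midpoints that we assume for illustration, not values taken from our Scrapbook data, and the function name is ours.

```python
# The chocolate belt as described above: within 20 degrees of the equator.
BELT_LIMIT_DEGREES = 20.0


def in_chocolate_belt(latitude):
    """Return True when a latitude in degrees lies within the belt."""
    return abs(latitude) <= BELT_LIMIT_DEGREES


# Approximate midpoint latitudes, assumed for illustration.
# Negative values are south of the equator.
APPROX_LATITUDES = {
    "Ecuador": -1.8,
    "Dominican Republic": 18.7,
    "Peru": -9.2,
    "Iceland": 65.0,
}

for country, latitude in APPROX_LATITUDES.items():
    label = "inside" if in_chocolate_belt(latitude) else "outside"
    print(f"{country}: {label} the belt")
```

A single midpoint is a crude stand-in for a whole country. A more careful version would test the latitude of the growing region itself.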

The Dominican Republic shares the island of Hispaniola with Haiti to the west. Hispaniola lies just southeast of Cuba. In preparing this post, we wondered why we've never found chocolate from Haiti, given that it shares the same island with the Dominican Republic. Well, it's complicated and we won't answer it here, but that's the kind of question that looking at our data can prompt. Hawaii, for reference, sits on the northern edge of the belt and grows cacao only on a small scale.

According to many sources, Ecuador provides the best cacao beans. A weird fact that we found on the Swiss Platform for Sustainable Cocoa is that Iceland consumes the most cocoa per capita, at 6.6 kg each year. This might be another reason we love Iceland. Switzerland is an important producer of chocolate, although the cacao itself is imported.

One last point, if you're sometimes as confused as we've been about the distinction between cacao and cocoa: "cacao" refers to the cacao beans (or more properly, seeds) themselves, while "cocoa" more often refers to the products that result from roasting and processing the beans. The two terms are often used interchangeably, however, which makes hard distinctions between them more academic than practical.
