Data Science: Exploring CoffeeReview.com Top Coffees

Over the past few years, I’ve transitioned my career from government-oriented management consulting to the field of advanced analytics and data science.

 

In general terms, this has required me to climb a significant learning curve in the related areas of computer programming languages and advanced statistical methods. While it has been challenging, the rewards of being able to more effectively and efficiently extract insights from various types of information/data are encouraging.

With the objective of exploring my love of specialty coffee, I chose to practice a few basic data science methods on a relatively well-known specialty coffee review website: coffeereview.com.

The goal was to apply web scraping, text analytics, segmentation, and some visualization techniques to coffee review data in order to explore correlations between price, producer country, roaster, and quality over time.

My colleague and I discussed the objective over Memorial Day weekend and set out on parallel paths to scrape review data from the website. He used a Python script to scrape the website, and I used an R script to do the same. In the end, his Python script achieved a more efficient scrape, producing a comma-separated values (.csv) file that could be imported into a statistical computing software package like SPSS or R.

The website we targeted in this scrape was the 21 pages of: http://www.coffeereview.com/highest-rated-coffees/

 

From there, I cleaned up the file (using R packages such as “dplyr”, “stringr” and “sqldf”) to get things to a point where we could calculate price per pound amounts and country of origin for most of the coffees reviewed. I was also able to pull down city/state location data for each of the roasters and their websites.
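For readers who want to try the price-per-pound step themselves, here is a minimal Python sketch. It is not the R code used for this post. The price format ("$18.00/12 ounces"), the function name price_per_pound, and the regular expression are all assumptions made for illustration; the real review data may be formatted differently.

```python
import re

OUNCES_PER_POUND = 16.0
GRAMS_PER_POUND = 453.592

# Assumed (hypothetical) price format: "$18.00/12 ounces", "$22.00/250 grams".
PRICE_PATTERN = re.compile(
    r"\$\s*(?P<price>\d+(?:\.\d+)?)\s*/\s*(?P<qty>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>ounces?|oz|grams?|g|pounds?|lbs?)\b",
    re.IGNORECASE,
)


def price_per_pound(text):
    """Return US dollars per pound, or None if the text cannot be parsed."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    price = float(match.group("price"))
    quantity = float(match.group("qty"))
    unit = match.group("unit").lower()
    # Convert the stated package size to pounds.
    if unit.startswith("o"):
        pounds = quantity / OUNCES_PER_POUND
    elif unit.startswith("g"):
        pounds = quantity / GRAMS_PER_POUND
    else:
        pounds = quantity
    if pounds == 0:
        return None
    return round(price / pounds, 2)


print(price_per_pound("$18.00/12 ounces"))
print(price_per_pound("price not listed"))
```

Returning None for anything that does not parse keeps unparseable rows visible, so they can be counted or fixed by hand rather than silently dropped.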

One of my first business questions involved the type of descriptive language used to review the website’s top-rated coffees. Were there any particular words that we could associate with the best rated coffee out there, according to coffeereview.com?

A relatively straightforward way to investigate that question is to use a Word Cloud to illustrate the words with the highest frequency of mention in individual review comments.

Most frequent words describing top rated coffees.


Clearly, if you want to appear to know the jargon for communicating your delight about a quality cup of java, you should say something like, “This coffee’s intense aroma of flowers, baker’s chocolate and fruit is only bested by its complex, rich flavor with tart tinges of acidity and a balanced, silky, syrupy, honey finish…”. Okay…so that sounds ridiculous…but you get the point.
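The word counts that sit behind a word cloud like the one above can be sketched in a few lines of Python. This is an illustration only, not the code used for this post: the sample reviews are made up, the stop-word list is deliberately tiny, and the function name word_frequencies is hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "with", "this", "its"}


def word_frequencies(reviews, top_n=10):
    """Count how often each non-stop word appears across all reviews."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


# Made-up review snippets, for illustration only.
sample_reviews = [
    "Rich chocolate and floral aroma.",
    "Sweet, floral, and rich in the finish.",
    "Chocolate and honey with a silky finish.",
]

for word, count in word_frequencies(sample_reviews):
    print(word, count)
```

A word cloud simply scales each word by its count, so the list returned here is the same information in table form.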

Exploring the data

What is the range of ratings found on the top rated page?

The maximum rating any single coffee receives on this page (of highest rated coffees) is 97, while the minimum is 94. There isn’t a lot of variance. Most of the top rated coffees are rated 94, a third are 95, and the remaining 15 percent are either 96 or 97. We will revisit this data later.
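As a rough sketch of how such a breakdown can be computed, the Python below turns a list of ratings into percentage shares. The ratings in the example are invented to keep the arithmetic easy to follow; they are not the scraped data, and rating_shares is a hypothetical helper name.

```python
from collections import Counter


def rating_shares(ratings):
    """Return the percentage of coffees at each rating, keyed by rating."""
    counts = Counter(ratings)
    total = len(ratings)
    return {
        rating: round(100 * n / total, 1)
        for rating, n in sorted(counts.items())
    }


# Invented ratings, for illustration only.
example_ratings = [94] * 10 + [95] * 6 + [96] * 3 + [97] * 1
print(rating_shares(example_ratings))
```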

Distribution of Top Rated Coffees from CoffeeReview.com


What years of ratings do we have the most robust data for in order to do more specific analysis on our variables?

We decided to drop all years prior to 2010 (which had 29 coffees reviewed that year).

year count
2014 70
2013 58
2012 40
2011 39
2015 24
2010 20

Which coffee roasters were the most frequently reviewed and top rated by coffeereview.com between 2010 and roughly six months into 2015?

JBC Coffee Roasters from Madison, Wisconsin was the favorite by far, with 26 reviews on the website in the time span specified, followed by Temple Coffee and Tea in Sacramento, CA (20) and PT’s Coffee Roasting Company in Topeka, Kansas (13). This was a surprise to me, as I have never sampled ANY coffee from these roasters and feel like I have been missing out. To show the table of roasters, I used the R packages “RGraphics” and “gridExtra” to save the table as a series of graphics, 15 roasters at a time.

Roaster tables: roasters 1–15, 16–30, 31–45, 46–60, and 61–73.

A quick visualization of the top rated coffees by year, price per pound and origin country shows some semi-distinct segments within the data based on price alone. This led me to ponder whether we could use a clustering algorithm (such as k-means using dummy variables for each country, price per pound, and rating) to group particular coffees into clearer segments. Instead of using R for this exploration, I exported the data into a .csv and imported it into SPSS to run the analysis there.
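To make the idea concrete, here is a small, self-contained Python sketch of k-means on dummy-coded countries plus price and rating. It is not the SPSS procedure used for this post: the six coffees are invented, the helper names (one_hot, standardize, k_means) are hypothetical, and the algorithm is a plain textbook version (Lloyd's algorithm) with a fixed random seed. The standardize step is included because, on raw values, a variable measured in tens or hundreds of dollars tends to swamp 0/1 dummy variables in the distance calculation.

```python
import random
from statistics import mean, pstdev


def one_hot(values):
    """Return (categories, rows) with one 0/1 dummy column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def standardize(rows):
    """Z-score each column so no single variable dominates the distances."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


# Invented coffees: (origin country, US$ per pound, rating).
coffees = [
    ("Ethiopia", 18, 94), ("Colombia", 20, 94), ("Kenya", 22, 95),
    ("Panama", 120, 96), ("Panama", 130, 97), ("Colombia", 19, 95),
]
_, dummies = one_hot([c[0] for c in coffees])
features = [d + [float(c[1]), float(c[2])] for d, c in zip(dummies, coffees)]

raw_labels = k_means(features, k=2)
scaled_labels = k_means(standardize(features), k=2)
print("raw:   ", raw_labels)
print("scaled:", scaled_labels)
```

On the raw features, the two expensive coffees end up in a cluster of their own, which is the kind of price-driven split described in this post. Whether standardizing gives a more useful segmentation depends on the question being asked.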


Price per pound by origin country and year ($US). United States = Hawai’i.

A five-way cluster solution seemed the most suitable for segmenting the data in a way that illustrated differences across price and producer country.

Price disproportionately drove the segmentation, as seen in this graphic.


The segments broke out into groupings containing the following number of coffee reviews each:

Segment  Count  $US/lb
1        174    $21
2        8      $121
3        35     $44
4        1      $243
5        20     $84

Segment 1: No Geisha or Hawaiian Coffees, Espresso Blends
Segment 2: Panama and Colombian Geishas
Segment 3: Mix of Geishas, Ethiopian, and Hawaiian
Segment 4: Semeon Abay Ethiopia
Segment 5: Mid-priced Geisha, Hawaiian and Ethiopian

Interestingly, a few roasters exhibited a bit of dispersion across the segments due to the variety of awesome tasting coffees they had reviewed. Those roasters included:

PT’s Coffee Roasting Co.

5 were (Seg 1)
3 were (Seg 2)
3 were (Seg 3)
2 were (Seg 5)

Barrington Coffee Roasting Co.

3 were (Seg 1)
4 were (Seg 3)
1 was (Seg 4)
3 were (Seg 5)

Bird Rock Coffee Roasters

6 were (Seg 1)
1 was (Seg 2)
3 were (Seg 3)
1 was (Seg 5)

Paradise Roasters

6 were (Seg 1)
1 was (Seg 2)
1 was (Seg 3)
2 were (Seg 5)

After exploring the data in this way, I wondered 1) whether my approach to segmentation was appropriate and 2) how the comments from these segments compared. To answer the first question: no, but that will be the topic of my next blog post. To answer the second, let’s explore some word clouds below.

Word Cloud: Segment 1


Word Cloud: Segment 2


Word Cloud: Segment 3


Word Cloud: Segment 4


Word Cloud: Segment 5


 

Perhaps clustering by cupping notes is a better way to segment groups…stay tuned.

New Year, New Content

Happy New Year Café Hound! In 2015 we are going to try to add some new content on the site for the first time in a while.

One element of the new content is going to explore various data sources and statistical techniques regarding aspects of the coffee industry. A quick and dirty preview of this exploration follows below.

Cafe Hound developed a way to download daily price data for Arabica and Robusta coffee taken from daily settling prices (in New York and London respectively) for futures contracts. The unit of measurement is U.S. dollars per pound. We use a website called Quandl.com for the data. We use R as our data analysis software.

Prices for Arabica and Robusta Coffee


The overall trend line shows coffee becoming more expensive over the last fifteen years on average, although with a significant drop in price occurring after prices peaked in late August 2011.

Price Range: Arabica vs. Robusta


Arabica coffee (starting in 2000) is three times more volatile than Robusta coffee in terms of price variance.
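For clarity on what a statement like this measures, the Python sketch below compares the variance of two price series. The prices are invented and chosen only to make the arithmetic easy to check; they are not the Quandl data, and variance_ratio is a hypothetical helper name. This is not the R code used for the charts.

```python
from statistics import pvariance


def variance_ratio(series_a, series_b):
    """Ratio of the population variance of series_a to that of series_b.

    Raises ZeroDivisionError if series_b is constant.
    """
    return pvariance(series_a) / pvariance(series_b)


# Invented prices in US$ per pound, for illustration only.
example_arabica = [1.0, 2.0, 3.0]
example_robusta = [0.5, 1.0, 1.5]

print(variance_ratio(example_arabica, example_robusta))
```

A ratio of 3 from this kind of calculation would correspond to "three times the price variance." Note that variance is in squared dollars, so the ratio of standard deviations would be smaller.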

Now we will look at combined annual production of Arabica and Robusta coffee. This data was compiled from International Coffee Organization data that was manually entered into a .csv file. Source: http://www.ico.org/new_historical.asp

Coffee Total Production by Year


Coffee Production by Type


International Coffee Organization data isn’t as precise as it could be, but it shows that the increase in global coffee production is coming from countries that are capable of growing millions of bags of both Arabica AND Robusta coffee in any given year.

Coffee Production by Type and Country


Increases in total production over the last few years appear to be driven by an increase in production from countries that are classified as producers of Arabica/Robusta and Robusta/Arabica coffee. In short, this means that countries that are producing both Arabica and Robusta coffee are responsible for driving the growth in global coffee supply. Specifically, Brazil, Indonesia, and Vietnam appear to be increasing their share of global coffee production, likely focusing said production on the lower quality Robusta coffee.

That’s enough of a preview for now, but the basic takeaway is that we should pay particular attention to the variables affecting production from Brazil, Vietnam and Indonesia if we want to forecast annual coffee production. However, at this point we haven’t explored the factors driving price enough to fully understand what we should be observing. More questions than answers at this point, meaning MORE analysis and exploration to come! Enjoy the New Year and keep checking in for more here at cafehound.com

 

 

Buzz: Starbucks Unveils High-End Roastery-Tasting Room Concept 

 

Starbucks Reserve.

Using a barrage of adjectives like super-premium, unique, reserve and small-lot, Starbucks has just announced details regarding its new “premium coffee experience” store concept, as well as its flagship “small-batch” Roastery and Tasting Room, coming to Seattle’s Capitol Hill this winter.

The company says the new roastery will be a kind of interactive coffee museum and tasting room designed to showcase the company’s “small-lot” Reserve line of coffees. It will also be the flagship for Starbucks’ new store model, which will occupy some 100 locations in strategic markets throughout the globe over the next five years.

(related: Starbucks Piloting Mobile Trucks at Three U.S. College Campuses)

Adjectives abound, but if one phrase is an elephant in this particular room, it is “Third Wave,” one that many around the high-end retail industry, including this blog, have avoided using for years. But it seems particularly apt here, as the company that embodies “Second Wave”-ness rolls out its new high-end, coffee-quality-focused brand.

Starbucks itself describes the new store concept as a kind of higher rung in “customer experience segmentation,” part of the company’s retail “evolution.” Starbucks CEO Howard Schultz went so far as to describe the new roastery and tasting room as something that will revolutionize all of specialty coffee.

(related: Drama Unfolds with the Opening of Williamsburg’s First Starbucks)

“Everything we have created and learned about coffee has led us to this moment,” he said. “The Starbucks Reserve Roastery and Tasting room is a multi-sensory experience that will transform the future of specialty coffee. We plan to take this super premium experience to cities around the world, elevating the Starbucks experience not only through these stores but across our entire business.”

Here’s more from Starbucks on the new Seattle roastery:

A first-of-its-kind union for Starbucks of coffee theatre and manufacturing, this iconic Seattle destination will allow Starbucks to double its small-batch roasting capacity and grow its Starbucks Reserve® coffee presence from 800 to 1,500 stores worldwide, by the end of FY15. More than two years in development, this unprecedented experience will allow customers to engage with Starbucks passion for coffee in a 15,000 square-foot interactive retail environment devoted to beverage innovation and excellence.

In addition to the approximately 100 new premium stores, Starbucks is also unveiling new smaller-footprint and drive-through “Express” store models, where there will be a focus on quick service and developing Starbucks’ mobile ordering platform. These stores, the company says, will “address the increase in urbanization and decentralization of retail.”

(related: Cupping at Starbucks: The Sound of Silence (and Slurps))

Including its traditional retail stores, its premium stores and its express stores, Starbucks is on track to open some 1,550 outlets globally in 2014, and plans to open 1,600 in 2015, including 300 net new locations in the U.S.

Source: Daily Coffee News, http://dailycoffeenews.com/2014/09/05/starbucks-unveils-new-dont-call-it-third-wave-concept-plans-seattle-roastery-opening/

Coffee Logistics: Specialty Coffee On-Demand?

Source: The Atlantic – Robinson Meyer

A barista at Ritual Roasters in San Francisco pours hot coffee into Thermoses about to be shipped around the country. (Courtesy Thermos)

Last week, Thermos overnighted me a cup of hot coffee from Minneapolis to Washington, D.C., to see if it could. It was a bald-faced PR stunt. It succeeded in both senses: The coffee was still hot by the time it reached me, and I am writing about it now.

Now you’ve been warned: This is an article about a PR stunt. It was, however, an extraordinary PR stunt—well-executed, conceptually simple, and bubbling with zeitgeist. And I accepted the hot coffee for reasons beyond my love of roasted arabica.

The stunt was ostensibly to promote Thermos’ vacuum-insulated 40-ounce Stainless King beverage bottle. The company claims the Stainless King can keep hot things hot and cold things cold for 24 hours, and indeed my own experience with this monarch of thermoses bore that out.

The stunt’s part of a larger contest (and context). In May, Thermos shipped 25 of its Facebook fans in the contiguous U.S. free coffee overnight from Ritual Coffee in San Francisco. This month, the second time it ran the contest, it chose a more midwestern provider: Spyhouse Coffee in Minneapolis.

Courtney Fehrenbacher, a marketing manager at Thermos, told me that the company hopes to re-run the contest every other month, at least until the end of the year. Altogether, Spyhouse will hand 35 of its steaming envoys over to FedEx to be distributed across the country.

But, dare I say, the stunt was about even more than Thermos, Spyhouse, the Stainless King, or the Iron Throne. It was about logistics.

***


The box, as it arrived in D.C. (Robinson Meyer/The Atlantic)

As best as I can assemble it, here is the trajectory of the Stainless King and its erstwhile contents.

The coffee inside the Stainless King was Spyhouse’s Las Nubes roast: a coffee variety indigenous to Kenya and grown in El Salvador. The varietal was brought to El Salvador in the early 20th century when that country’s economy rested on its coffee production. This bean was grown on a similarly old farm, high-altitude land owned by the same family since the 1920s. (Or, at least, that’s the story Spyhouse tells.)

This bean, though. It was harvested sometime last winter before it entered its customary months of rest. Afterward, it was shipped to Spyhouse, which roasted the beans on July 21, 2014. It became the shop’s Las Nubes lot.

I presume it roasted those beans in the morning, because by the afternoon it was brewing the coffee. Around 4 p.m., the team got out their 10 Stainless Kings (designated for me and fellow members of the media) and filled them with Las Nubes, which they dripped. Then they put them in Thermos’s special packages—augmented with a bag of freshly roasted Las Nubes—and drove the boxes “about a quarter mile away” to the local FedEx facility.

According to a FedEx spokeswoman, the package was placed in a modified McDonnell Douglas DC-10, called an MD-10*. That plane’s a couple decades old, at least (McDonnell stopped making them in 1989), and FedEx owns more than anyone else. FedEx indisputably owns the largest private cargo fleet in the world, and, according to the trade journal Supply Chain, the fourth-largest aircraft fleet, period.


Someone at Spyhouse knew how to pack a box. (Robinson Meyer / The Atlantic)

Perhaps the package was stopped and exchanged in one of FedEx’s global or national hubs, in Memphis, or Indianapolis. Eventually, though, it arrived in D.C. in the wee hours of July 22. Unloaded from the plane, sorted, loaded onto a truck, and carried to The Atlantic’s office/cement island-fortress, the Watergate, it reached its destination at 7:21 a.m. The coffee had been roasted less than 24 hours before.

Of course, the coffee wouldn’t reach its final destination—my belly—for another hour or so. I got to work during the eight o’clock hour, hoping to intercept the Stainless King, and discovered Santa had already arrived.

With my colleague Adrienne, I unboxed the long-traveling liquid. Like Max’s dinner in Where the Wild Things Are, it was still hot.

***


This sticker sealed the box that arrived from Spyhouse. (Robinson Meyer / The Atlantic)

Talking to Spyhouse’s founder and owner, Christian Johnson, I’ve been able to piece together the coffee’s temperature-history. Spyhouse uses water at exactly 203 degrees Fahrenheit to brew Las Nubes. Johnson estimates that by the time that liquid—now coffee—departs the brew shuttle, it’s between 175 and 180 degrees. Then it was capped, vacuum-sheathed, and sent on its way.

But still the conditions outside changed. “Depending on the exact placement of the package inside the aircraft, temperatures range from 40 to 70 degrees Fahrenheit during an average flight, with the average temperature being about 60 degrees,” a FedEx spokeswoman said of the Thermos’ cargo transit. And the pressure changed outside as well, rising to the equivalent of 8,000 feet above sea level.

It was about 72 degrees in the district as the package trundled through, and a few degrees cooler in my almost-refrigerated office. When we uncapped the Thermos, we measured its temperature to be 151 degrees.

Can you see the steam coming off the just-opened Stainless King? Maybe not? Okay, well, it totally was. (Robinson Meyer/The Atlantic)

“Wow. That’s amazing,” said Johnson, after I shared this heat conservation with him. “So really you only lost 25 degrees between when we capped the thermos to when you opened it.”

He added that the other factors involved in long-form transit—the altitude, the pressurization—shouldn’t have significantly affected the coffee’s taste. I think that sounds right. I found Las Nubes as described, similar to other El Salvadorean coffee I’ve had that didn’t migrate: acidic in a citrusy way, a little sweet.

***

According to Fehrenbacher, the idea for the contest came from an anecdote that Thermos’s president would tell. Once upon a time, the story went, a client had paid the company to regularly overnight coffee from across the country. (No one seems to remember just which client this was.) Why not see if they could recreate the story for marketing purposes?

The gimmickry of the stunt seemed to attract Johnson to the idea. But when he spoke to me, he obligingly remarked too on the pop-cultural power of Thermos. He and the other baristas carried Thermos-made lunch boxes as kids; they respected Thermos as a stalwart American product. Now, they were proud to partner with the company for the contest.


The hot coffee, a few minutes after arriving—it held its temperature in the mug. (Robinson Meyer / The Atlantic)

And Thermos is an enviable tool for that reason. It embodies “do one thing well” in the world of beverage receptacles. People buy it because they want something that does what a Thermos does, and every time, without fail, without system reboots or lag, it dispatches this task admirably. (Though if I have one quarrel with the Stainless King, its top cap was sometimes very, very hard to screw off.)

Talking to Thermos and Spyhouse, I was struck by the image at the top of this post: A Ritual roaster, pierced and bearded, pouring single-origin coffee into that most mainstream of food receptacles: the Thermos. It’s more than urban-meets-rural: It’s the new dream of artisanal, ethical food preparation meeting the old dream of mass-produced American plenty.


Packing the boxes at Ritual Roasters (courtesy Thermos)

It reminds me of the most recent product of K-Hole, a kind of art collective that mocks corporate trends-casting reports by issuing its own. K-Hole calls the aesthetic that gives rise to artisanal coffee “Mass Indie”:

Mass Indie ditched the Alternative preoccupation with evading sameness and focused on celebrating difference instead. […] Whether you’re soft grunge, pastel goth, or pale, you can shop at Forever 21.

But as Mass Indie becomes mass-er, it starts to hit snags. “Individuality was once the path to personal freedom—a way to lead life on your own terms,” says K-Hole’s report. “But the terms keep getting more and more specific, making us more and more isolated.” Each product, slightly different and catering to a slightly different audience, winds up isolating people in islands of taste and difference:

Feast.ly, Fast.ly, Vid.ly, Vend.ly, Ming.ly, Mob.ly: each provides a specific service, finetuned to a specific user need, brought to life by a specific entrepreneurial urge. They’re all targeting different audiences, but the general public can’t remember who’s who.

As Mass Indie approaches cultural domination, its elites flee. They’re alone on their perfectly curated and indecipherable islands of taste. They instead embrace—and please, please, do not stop reading when you encounter this word—normcore.

Normcore moves away from a coolness that relies on difference to a post-authenticity coolness that opts in to sameness. But instead of appropriating an aestheticized version of the mainstream, it just cops to the situation at hand. To be truly Normcore, you need to understand that there’s no such thing as normal. […]

Normcore seeks the freedom that comes with non-exclusivity. It finds liberation in being nothing special, and realizes that adaptability leads to belonging.

“If you live in the middle of nowhere,” Fehrenbacher told me, lauding her own company’s stunt, “you get to try some of the country’s best coffee.” Thermos has already shipped hot coffee to central Florida, northern Michigan, and (of course) New York City.

Looking at that picture of the bearded barista and the line of identical Thermoses, I thought, what could be more normcore than this?

But there’s something that enables all of this, from my supping of the coffee to your reading this now: the global supply chain. The ability to fling ingredients and products from coast-to-coast and continent-to-continent makes not only Thermos’s contest but Spyhouse’s very business possible. It’s the supply chain that moves coffee beans from El Salvador to Minneapolis, where they can be roasted and sipped in days. It’s the supply chain—in the form of FedEx, which, remember, has the world’s fourth largest collection of aircraft—that performs the final stunt of getting coffee around the lower 48 in half a day.

Behind every ingredients list stand the movers and shippers of our world: each, like FedEx, possessing a private army of execution. I accepted Thermos’s coffee contest because it seemed a spectacle of logistics. But every single day of our lives is already that.


* This post originally described the plane which shipped the Thermos as a DC-10. It is properly an MD-10: a DC-10 modified by FedEx to have a larger cockpit and different hull. We regret the error.

Buzz: Starbucks Launches Posh Store in Colombia

Yesterday, Starbucks officially arrived in Colombia.

After years of keeping the multinational specialty coffee powerhouse at bay, Juan Valdez will no longer be able to avoid battling Starbucks in its native market of Colombia. Starbucks, after being challenged by Juan Valdez in the US market in the early 2000s and making its own foray into a now-defunct specialty coffee concept (15th Avenue Coffee), has launched its first retail coffee bar in Bogota, Colombia.

Instead of running from what many perceive as its strongest asset, the Starbucks brand, Starbucks is fully featuring its logo outside of this centrally located destination. Perhaps after completing decent market research, Starbucks realized that many in the growing global middle class aspire to an affluent lifestyle characterized by iPhone ownership, Starbucks specialty drinks, Coach products, and other premium brands. One of the most attainable products accessible to any income bracket is a simple cup of coffee and a snack. Juan Valdez, with its elegantly designed retail stores, has long taken advantage of growing wealth and a cultural disposition towards public life, in which coffee shop culture plays a role. Since 2002, Juan Valdez retail locations have represented a place where folks can meet for business, spend time with old friends and family, or go on a date. It is seen as hip by younger generations and as respectable and safe by older ones. It is also a source of national pride.

In reality, Colombia has been ripe for the arrival of Starbucks for a decade, but the terms of that arrival have constantly evolved. This is emphasized in the version of Starbucks launched in Bogota, which is clearly meant to improve upon the Juan Valdez concept. They are aiming higher in terms of premium status, and likely want to differentiate Starbucks from Juan Valdez along those lines. In the short-term, it is likely that Starbucks will focus its Colombia expansion on only the most desirable, high-end, urban locations in order to solidly establish the luxury brand concept. It remains to be seen whether they creep down the price continuum to “Starbucks Express” and kiosk locations in a market that is already saturated with Oma and Juan Valdez competitors. There is nothing economical or middle class feeling about the Parque de la 93 location of the new Starbucks. Although a fairly small, quaint and stylish area, it is essentially a mashup of Georgetown DC and Central Park West NY (Embassy Row clientele mixed with the wealthiest from the nearby financial district).

The Bogota cafe makes use of locally sourced wood, antique- and hammered-brass light fixtures and sells Colombian-inspired food such as cheese sticks and croissants with a sauce similar to dulce de leche. Source: Starbucks Corp. via Bloomberg

No doubt, there may be some Starbucks clients who struggle internally with whether to support their homegrown hero, Juan Valdez. In the end, though, it appears that the Federation (Federación Nacional de Cafeteros) ensured local growers would benefit regardless of the new competitor: Starbucks claims to source 100% of the coffee it uses in Colombia from Colombian growers. Starbucks’ unambiguous strategy to rapidly expand in Colombia, Brazil and throughout the region could negatively impact the bottom line for Procafecol S.A., the parent company of Juan Valdez stores. In fact, it could be devastating for an enterprise that claims to be preparing for an IPO and has struggled in its attempts to expand in the US and abroad.

For more pictures and the official press release from Starbucks, click here.