Author Archives: Cafe Hound

Data Science: NCAA Men’s Basketball January Recap

A quick break from coffee: January through March is NCAA Men’s Basketball season, one of my favorite times of the year. I get to stay inside, warm and glued to basketball data, which is especially welcome now that I live in Boston, MA. As a Virginia graduate, I can say the basketball gods have been decent to me over the last decade. Below I share some graphs created in RStudio using the ggplot2 package on data I’ve scraped from teamrankings.com. They’re useful for getting a sense of trends for the month of January as we head into February 2018. FYI: I generated an RPI-weighted total efficiency metric by multiplying RPI by the scaled sum of the Offensive and Defensive Efficiency metrics.
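For readers who want to see the arithmetic, here is a minimal sketch of an RPI-weighted efficiency score. It is not the original R code. It assumes each efficiency metric is scaled to a z-score before summing, and that defensive efficiency is points allowed per possession, so its sign is flipped to make higher always mean better. The team names and numbers are placeholders, not scraped data.

```python
from statistics import mean, stdev


def zscores(values):
    """Scale values to mean 0 and sample standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def rpi_weighted_efficiency(rpi, off_eff, def_eff):
    """Multiply RPI by the sum of scaled offensive and defensive efficiency.

    Assumption: def_eff is points allowed per possession (lower is better),
    so its z-score is negated before it is added to the offensive z-score.
    """
    off_z = zscores(off_eff)
    def_z = [-z for z in zscores(def_eff)]
    return [r * (o + d) for r, o, d in zip(rpi, off_z, def_z)]


# Hypothetical placeholder values, not real team data.
teams = ["Team A", "Team B", "Team C"]
rpi = [0.68, 0.64, 0.60]
off_eff = [1.20, 1.10, 1.00]  # points scored per possession
def_eff = [0.90, 0.95, 1.00]  # points allowed per possession

scores = rpi_weighted_efficiency(rpi, off_eff, def_eff)
ranking = sorted(zip(teams, scores), key=lambda pair: pair[1], reverse=True)
for team, score in ranking:
    print(f"{team}: {score:.3f}")
```

Because the z-scores are centered on zero, a team that is below average on both ends gets a negative score, and a higher RPI then stretches that score further from zero in either direction.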

rp_main.jpeg

Overview of Select Top Rated Teams: January 2018

Duke was probably over-penalized for its loss at Virginia at the end of January, whereas Villanova’s consistency has kept it atop most ratings boards thus far. Defending National Champion UNC is probably the saddest story on this graph.

rptt_main

rptt_oe.jpeg

rptt_de.jpeg

  • Purdue exhibits a rare combination of high-level Offensive and Defensive execution that bests its peers
  • Virginia dominates defensively but lags its peers in offensive efficiency, giving it little room for error come tournament time
  • Duke must improve its defensive efficiency if it is to overcome its peers with its best-in-class offense

rp2_main

rp2_oe

rp2_de

  • Cincinnati carries momentum into February as one of the most dangerous second-tier teams
  • Kansas is still Kansas, a powerhouse full of talent that could make a case for a 1-seed during February
  • Xavier, although less interesting than some of its peers, continues to hang at the top of second-tier teams
  • Auburn is on a tear in the SEC going into February and should be taken seriously
  • Clemson appears on the decline with the loss of a star player, but is one to watch as the ACC race tightens up
  • Michigan State had a rough January on and off the court, but exhibits an Offensive Efficiency far above its peers and considerable defensive prowess
  • West Virginia and Texas Tech had troubling Januarys that they must turn around in February in order to be respectable challengers come March

rpii

rpii_oe

rpii_de

  • Cincinnati appears to be the real deal with strong relative offensive and defensive efficiency trending in a positive direction
  • It is too soon to tell whether Loyola-Chicago can maintain its momentum going into the final stretch of the season. Keep an eye on them.

rpd

rpdttgoe

rpdttgde

  • MSU and WVU are trending down, but are still formidable opponents with top-tier caliber teams
  • Arizona State appears close to serious downward movement, whereas Seton Hall and Notre Dame are on the verge of low seeds/missing the tournament
  • Arkansas is quickly falling out of contention relative to their earlier standing

Thanks for reading! Leave your thoughts and comments below. Coming soon, some predictive modeling fun as we approach March Madness bracket time.

Find Your Coffee

At cafehound.com, we endeavor to locate the best coffee in the world. Over the last eight years we’ve happily watched as the options available to the public have multiplied around the globe and the public’s general awareness of specialty coffee has deepened. Although we still believe that tracking down the best coffee in the world is central to our mission, we recently decided to dip our toes into recommending specific coffees to coffee lovers based on a mixture of qualitative and empirical analysis.

espresso_2017

In two posts (1 and 2) from 2015, we took verbal reviews of specialty coffees from the site coffeereview.com and employed various clustering algorithms to discover groupings of coffee (based on the words used to describe them and other factors). This served as our initial foray into using Data Science on expert coffee reviews to improve our understanding of specialty coffee.

Over the past month, we’ve set out to improve upon that original work in order to empower java lovers to discover the perfect brew. Our years of cupping coffee and talking with experts have shown that, after a certain point, what constitutes a “good cup of coffee” is subjective and specific to the palate of the beholder.

With that in mind, cafehound.com chose to use a large, multiyear list of coffee reviews from Kenneth Davids’ coffeereview.com site to explore the relationship between the descriptions used to rate coffee aroma, flavor, aftertaste, body, acidity and finish. We hypothesized that there are distinct groupings of coffee based on their roast profile, body, and flavors that are relevant to informing consumer preferences in the overall marketplace. To clarify, a market segmentation based on a representative sample of surveyed consumer preferences may be more useful to marketing professionals, but that is outside the scope of this post. Instead, we’re using the structure inferred from math and reviews of specific coffees to estimate categories of the potential “coffee experience.” These categories may provide coffee consumers with guideposts for exploring new specialty coffees.

Our results led to six broad categories of coffee that we’ve ordered from lightest to darkest roast (based on average Agtron ratings). An Agtron rating is a numerical measure of roast color: lower numbers (below 45) indicate a darker roast, and higher numbers (50 and above) indicate a lighter roast. Roast is not the only thing that determines the flavor profile and overall body of a coffee, which is why some of these segments may appear similar.
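As a rough illustration of that ordering step, the sketch below averages Agtron readings per segment and sorts the segments from lightest to darkest. The segment names and readings are hypothetical placeholders, not values from the review data.

```python
from statistics import mean

# Hypothetical Agtron readings per segment; placeholder values only.
agtron_by_segment = {
    "Segment A": [58, 62, 60],
    "Segment B": [44, 42, 40],
    "Segment C": [52, 50, 51],
}

# Average Agtron rating for each segment.
mean_agtron = {name: mean(vals) for name, vals in agtron_by_segment.items()}

# Higher Agtron means a lighter roast, so sort in descending order
# to list the segments from lightest to darkest.
lightest_to_darkest = sorted(mean_agtron, key=mean_agtron.get, reverse=True)
print(lightest_to_darkest)
```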

Initially, we bring this content to you via occasionally updated web pages. Depending on demand, we may scale our service to provide daily or weekly recommendation updates.

For now, follow the link below to Find Your Coffee.

cafehoundlogos01

For code share:

Shiny Segmentation and Prediction

Data Science: Exploring CoffeeReview.com Top Coffees (Cntd.)

In the last post we began exploring the relationship between the language describing coffee (“cupping notes”) and price/brand/roaster. Our objective is to provide coffee consumers with a general understanding of particular groupings of coffee they can choose from based on flavor profiles and mouthfeel characteristics. An example of the type of properties coffee professionals use to describe their craft is illustrated in the flavor wheel below from Counter Culture:

CC_FlavorWheel

After evaluating the segments that our initial k-means clustering (with a k of 5) produced, I was unsatisfied with the results. My decision to haphazardly throw the price variable (unscaled) into the model was wrong-headed and drove the algorithm to essentially classify segment membership solely based upon that. In some cases such an exercise may be useful, but for our objective of discerning whether specific language could be used to segment particular specialty coffees, this segmentation wasn’t going to do it for us.
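To make the scaling problem concrete, here is a small sketch of the Euclidean distances that k-means relies on, computed before and after standardizing each column. The three coffees and their values are hypothetical, and this is not the original model, only an illustration of why an unscaled price column can swamp 0/1 cupping-note indicators.

```python
from math import dist
from statistics import mean, stdev


def standardize_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


# Hypothetical rows: [price in USD per pound, floral (0/1), tart (0/1)]
coffees = [
    [75.0, 1, 0],  # reference coffee
    [25.0, 1, 0],  # same cupping notes, very different price
    [74.0, 0, 1],  # opposite cupping notes, nearly the same price
]

# Unscaled: the price gap dwarfs any difference in cupping notes.
raw_same_notes = dist(coffees[0], coffees[1])
raw_same_price = dist(coffees[0], coffees[2])

# Standardized: every column contributes on a comparable scale.
scaled = standardize_columns(coffees)
scaled_same_notes = dist(scaled[0], scaled[1])
scaled_same_price = dist(scaled[0], scaled[2])

print(f"unscaled: same notes {raw_same_notes:.2f}, same price {raw_same_price:.2f}")
print(f"scaled:   same notes {scaled_same_notes:.2f}, same price {scaled_same_price:.2f}")
```

On the raw values the reference coffee sits closest to the coffee with opposite notes, simply because their prices match. After standardizing, the coffee with identical notes becomes the nearer neighbor.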

Also, this initial segmentation helped me narrow my “business objective”. Now I wanted to segment by flavor profile, something that might actually help inform a potential consumer’s purchasing decisions.

In order to develop the cupping note variables that would inform our segmentation, I explored the text data from Kenneth Davids’ site and selected the most common and/or most distinguishing words to test. The list of words is below.

wordlist

A quick look at these led me to believe that certain words might not yield significant information gain in the algorithm due to lack of variance. Mouthfeel, sweet and acidity were present in 96%, 80% and 90% of reviews respectively. Their power as differentiating variables would be constrained by their existence in nearly all observations (with the possible exception of acidity).
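The lack of variance can be put in numbers. For a 0/1 indicator that is present in a share p of reviews, the variance is p(1 - p), which peaks at 0.25 when a word appears in half the reviews. The sketch below applies that formula to the three shares quoted above. It is an illustration, not part of the original SPSS work.

```python
# Share of reviews containing each word, taken from the text above.
presence = {"mouthfeel": 0.96, "sweet": 0.80, "acidity": 0.90}

# Variance of a 0/1 indicator with mean p is p * (1 - p); the maximum is 0.25.
indicator_variance = {word: p * (1 - p) for word, p in presence.items()}

for word, var in sorted(indicator_variance.items(), key=lambda kv: kv[1]):
    print(f"{word}: present in {presence[word]:.0%} of reviews, variance {var:.4f}")
```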

However, in my initial quick cluster using SPSS, I included the three variables mentioned above and I still liked the results enough to move forward.

Segment 1: 16.9% of reviews


This segment was the most expensive (average $42.31 USD per pound) and highest rated (94.6). The segment was the highest indexed on floral, honey, complex, silk, delicate, intense, and peach cupping notes. It also indexed highly on nib, lemon and acidity. The most common coffees in this mix were Geishas from Panama and Colombia, along with Ethiopian, Kenyan and El Salvadoran coffees.

List of Segment One Coffees 

Seg1_L1 Seg1_L2

Segment 2: 27.8% of reviews


This segment was the least expensive (average $26.72 USD per pound) and moderately rated (94.45), while coming from the most diverse sampling of producer countries. It indexed highest on rich, deep, resonant and pungent cupping notes. It was also the only segment to include coffees from Bolivia, Brazil, Mexico or Papua New Guinea.

List of Segment Two Coffees 

Seg2_L1Seg2_L2Seg2_L3

Segment 3: 13.9% of reviews


This segment was middle of the road in terms of cost and ratings (average $37.09 USD per pound and rated 94.52 on average). It indexed highest on juicy, tart, acidity, nib, bright and sweet, and was also well above average in complexity and floral notes. The range of producing countries varied quite a bit in this segment: several Bourbon varietals from Guatemala, Costa Rica and Hawaii; other Geishas from Panama, Colombia and Guatemala; several Ethiopian Yirgacheffe coffees; and a few honey-processed coffees from El Salvador (Pacamara) and Hawaii (Maragogype, $75/lb).

List of Segment Three Coffees 

Seg3_L1 Seg3_L2

Segment 4: 20.3% of reviews


This segment was the second least expensive ($28.46 USD per pound) and the lowest rated (94.33), though that is relative to a very highly rated group of coffees. It indexed highest for fruit, sweet, lemon and light, while also coming in pretty strong in the tart department. This segment is composed of a mixture of coffees from Ethiopia, Kenya, Burundi, Indonesia and Honduras. A few peaberry coffees are included, along with the red caturra from Rusty’s Hawaiian, a few stray Geisha coffees, and a decently heavy sampling of Sumatra, Yirgacheffe, Sidamo, and various Kenyan single-origins. For the value, this is a very attractive and diverse segment of coffees. See our site visit to Rusty’s in Hawai’i in 2011.

List of Segment Four Coffees 

Seg4_L1 Seg4_L2

Cupping With Miguel At Lorie's Home


Segment 5: 21.1% of reviews


Segment five is highly rated (94.58) and quite expensive ($37.73 USD per pound on average). This segment indexes the highest for tart, rich, acidity, syrup, pungent, and mouthfeel, while also scoring highly for honey and bright notes. Panama, Colombia, Hawaii and Ethiopia are the most heavily represented producer countries in this grouping. This segment is probably the most populated by Geishas followed by exotic Ethiopian and Kenyan coffees.

List of Segment Five Coffees 

Seg5_L1Seg5_L2

 

 

For more information on the roasters evaluated in this data from the coffeereview.com website, see the links and data below:

ML_1ML_2ML_3ML_4

And I’ll leave you with a bit of a refresher on the Cup of Excellence Scoring Categories for thinking about and communicating coffee quality/taste.

Cup of Excellence® Scoring Categories

DEFECTS

Phenolic, rio, riado automatic disqualification Ferment
Oniony, sweaty

CLEAN CUP
+ purity | free from measurable faults | clarity
– dirty | earthy | moldy | off-fruity

SWEETNESS (prevalence of…)
+ ripeness | sweet
– green | undeveloped | closed | tart

ACIDITY
+ lively | refined | firm | soft | having spine | crisp | structure | racy
– sharp | hard | thin | dull | acetic | sour | flabby | biting

MOUTHFEEL (texture, viscosity, sediment, weight, astringency)
+ buttery | creamy | round | smooth | cradling | rich | velvety | tightly knit
– astringent | rough | watery | thin | light | gritty

FLAVOR (nose + taste)
+ character | intensity | distinctiveness | pleasure | simple-complex | depth

(possible notations: nutty, chocolate, berry, fruit, caramel, floral, beefy, spicy, honey, smokey…)

– insipid | potato | peas | grassy | woody | bitter-salty-sour | gamey | baggy

AFTERTASTE
+ sweet | cleanly disappearing | pleasantly lingering
– bitter | harsh | astringent | cloying | dirty | unpleasant | metallic

BALANCE
+ harmony | equilibrium | stable-consistent (from hot to cold) | structure | tuning | acidity-body
– hollow | excessive | aggressive | inconsistent change in character

OVERALL (not a correction!)
+ complexity | dimension | uniformity | richness | (transformation from hot to cold…)
– simplistic | boring | do not like!

New Year, New Content

Happy New Year Café Hound! In 2015 we are going to try to add some new content on the site for the first time in a while.

One element of the new content will explore various data sources and statistical techniques covering aspects of the coffee industry. A quick and dirty preview of this exploration follows below.

Cafe Hound developed a way to download daily price data for Arabica and Robusta coffee, taken from daily settlement prices (in New York and London respectively) for futures contracts. The unit of measurement is U.S. dollars per pound. We pull the data from Quandl.com and use R as our data analysis software.

Prices for Arabica and Robusta Coffee


The overall trend line shows coffee becoming more expensive over the last fifteen years on average, although with a significant drop in price occurring after prices peaked in late August 2011.

Price Range: Arabica vs. Robusta


Arabica coffee (starting in 2000) is three times more volatile than Robusta coffee in terms of price variance.
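A variance comparison like this one can be sketched in a few lines. The prices below are hypothetical placeholders, not the Quandl series, so the ratio they produce says nothing about the real market. The comparison only makes sense when both series are quoted in the same unit, here U.S. dollars per pound.

```python
from statistics import variance

# Hypothetical daily settlement prices in USD per pound; placeholder values only.
arabica = [1.00, 1.30, 0.90, 1.40, 1.10]
robusta = [0.80, 0.90, 0.75, 0.95, 0.85]

# Sample variance of each price series.
arabica_var = variance(arabica)
robusta_var = variance(robusta)

# A ratio above 1 means Arabica prices vary more than Robusta prices.
ratio = arabica_var / robusta_var
print(f"Arabica variance {arabica_var:.4f}, Robusta variance {robusta_var:.4f}")
print(f"Variance ratio (Arabica / Robusta): {ratio:.2f}")
```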

Now we will look at combined annual production of Arabica and Robusta coffee. This data was compiled from International Coffee Organization figures (http://www.ico.org/new_historical.asp) that were manually entered into a .csv file.

Coffee Total Production by Year


Coffee Production by Type


International Coffee Organization data isn’t as precise as it could be, but it allows us to see that the increase in global coffee production is coming from countries that are capable of growing millions of bags of Arabica AND Robusta coffee in any given year.

Coffee Production by Type and Country


Increases in total production over the last few years appear to be driven by an increase in production from countries that are classified as producers of Arabica/Robusta and Robusta/Arabica coffee. In short, countries that produce both Arabica and Robusta coffee are responsible for driving the growth in global coffee supply. Specifically, Brazil, Indonesia, and Vietnam appear to be increasing their share of global coffee production, likely focusing that production on the lower-quality Robusta coffee.

That’s enough of a preview for now. The basic takeaway is that we should pay particular attention to the variables affecting production in Brazil, Vietnam and Indonesia if we want to forecast annual coffee production. However, at this point we haven’t explored the factors driving price enough to fully understand what we should be observing. More questions than answers at this point, meaning MORE analysis and exploration to come! Enjoy the New Year and keep checking in for more here at cafehound.com.

 

 

Buzz: Starbucks Unveils High-End Roastery-Tasting Room Concept 

 

Starbucks Reserve.

Using a barrage of adjectives like super-premium, unique, reserve and small-lot, Starbucks has just announced details regarding its new “premium coffee experience” store concept, as well as its flagship “small-batch” Roastery and Tasting Room, coming to Seattle’s Capitol Hill this winter.

The company says the new roastery will be a kind of interactive coffee museum and tasting room designed to showcase the company’s “small-lot” Reserve line of coffees. It will also be the flagship for Starbucks’ new store model, which will occupy some 100 locations in strategic markets throughout the globe over the next five years.

(related: Starbucks Piloting Mobile Trucks at Three U.S. College Campuses)

Adjectives abound, but if one phrase is an elephant in this particular room, it is “Third Wave,” one that many around the high-end retail industry, including this blog, have avoided using for years. But it seems particularly apt here, as the company that embodies “Second Wave”-ness rolls out its new high-end, coffee-quality-focused brand.

Starbucks itself describes the new store concept as a kind of higher rung in “customer experience segmentation,” part of the company’s retail “evolution.” Starbucks CEO Howard Schultz went so far as to describe the new roastery and tasting room as something that will revolutionize all of specialty coffee.

(related: Drama Unfolds with the Opening of Williamsburg’s First Starbucks)

“Everything we have created and learned about coffee has led us to this moment,” he said. “The Starbucks Reserve Roastery and Tasting room is a multi-sensory experience that will transform the future of specialty coffee. We plan to take this super premium experience to cities around the world, elevating the Starbucks experience not only through these stores but across our entire business.”

Here’s more from Starbucks on the new Seattle roastery:

A first-of-its-kind union for Starbucks of coffee theatre and manufacturing, this iconic Seattle destination will allow Starbucks to double its small-batch roasting capacity and grow its Starbucks Reserve® coffee presence from 800 to 1,500 stores worldwide, by the end of FY15. More than two years in development, this unprecedented experience will allow customers to engage with Starbucks passion for coffee in a 15,000 square-foot interactive retail environment devoted to beverage innovation and excellence.

In addition to the approximately 100 new premium stores, Starbucks is also unveiling new smaller-footprint and drive-through “Express” store models, where there will be a focus on quick service and developing Starbucks’ mobile ordering platform. These stores, the company says, will “address the increase in urbanization and decentralization of retail.”

(related: Cupping at Starbucks: The Sound of Silence (and Slurps))

Including its traditional retail stores, its premium stores and its express stores, Starbucks is on track to open some 1,550 outlets globally in 2014, and plans to open 1,600 in 2015, including 300 net new locations in the U.S.

Source: Daily Coffee News, http://dailycoffeenews.com/2014/09/05/starbucks-unveils-new-dont-call-it-third-wave-concept-plans-seattle-roastery-opening/