Wines have always had a rich vocabulary. On any given label, you might find terms like acidity, buttery, or coconut, which can make each new bottle of red, white, or green wine an exciting experience. But how and why do wines from different countries vary in their wine descriptions?
For this analysis, we looked at 8,464 wine labels from K&L Wine Merchants, an alcohol-fueled website selling wines and spirits along with other accessories. K&L says it fields many wine experts, including buyers and regional wine masters.
K&L’s wine labels are American-heavy, with 52.8% of bottles originating from the States. Other wine producing countries in our analysis include Greece, France, Italy, Spain, Portugal, Germany, Argentina, Chile, Australia, New Zealand, and South Africa.
Figure 1: The most frequently used words listed on American wine labels (excluding common English terms like “the”, “a”, etc.).
Wine words used across the world
Digging (or sniffing?) into the language of wine is like diving into the murky waters of the Jersey Shore. Wine has its own jargon and requires countless hours of researching, note taking, and tasting to unlock its mystery. If you are a wine novice, you will see words like cabernet, red, black, and palate used a lot, along with terms like dense, firm, mineral, complex, balanced, and concentrated.
A sommelier is considered a wine savant, someone who understands the difference between grapes grown in Napa and Porto and knows which drought years diminished champagne production, all the while providing instant recommendations for the perfect wine to pair with grilled branzino.
The casual wine drinker is likely familiar with wine buzzwords such as cabernet and sauvignon and their myriad descriptors: tannins, full bodied, oak, and the list goes on. If you read enough wine bottle labels, you will find trends in the words associated with the flavor profiles and textures of various wine varieties.
For example, the word cherry is linked to words like fruity, red, and bold. Words associated with “cherry” based wine listings include black (549 words), red (305), plum (226), raspberry (203), dark (195), aromas (132), and ripe (99).
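As a minimal sketch of how association counts like these could be produced: the code behind this analysis is not shown, so the tokenizer, the stop-word list, and the sample descriptions below are all illustrative assumptions. The idea is simply to count every word that appears in a description mentioning the target term.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "and", "of", "with", "to", "in", "is"}

def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def associated_words(descriptions, target):
    """Count words that appear in descriptions mentioning `target`."""
    counts = Counter()
    for text in descriptions:
        tokens = tokenize(text)
        if target in tokens:
            counts.update(
                t for t in tokens if t != target and t not in STOP_WORDS
            )
    return counts

# Made-up sample descriptions, for illustration only.
sample = [
    "Aromas of black cherry and ripe plum",
    "Bright red cherry with raspberry notes",
    "Crisp citrus and mineral finish",
]
cherry_counts = associated_words(sample, "cherry")
print(cherry_counts.most_common(5))
```

Run against the full set of listings, the same counter would yield a ranked list of words that share a description with “cherry”.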
When describing the language of wine, we find that novices and even experts cling to familiar descriptors like sweet, tart, bitter, rich, and of course the aforementioned fruity.
The term tannin (or tannic) is another workhorse for wine labels. As a refresher, tannins are naturally occurring polyphenols, a textural element that makes wine taste drier. Tannins are frequently associated with words like firm, ripe, silky, acidity, sweet, body, grained, polished, velvety, chewy, full, rich, juicy, grainy, and fresh. Words linked to alcohol include natural, oak, acidity, tannins, hit, low, percent, wood, high, dark, notes, dry, saturated, and sugar.
Chocolate is another term, most often linked to Merlot based wines, but it is not nearly as popular as fruit based descriptors. Comparable “chocolatey” terms include dark (157 words), full (61), cherry (47), and licorice (43), sprinkled in with bitter, spice, berry, espresso, earth, coffee, and tobacco, to name a few. Wine labels also tend to highlight a wine’s oaky flavor profile, commonly associated with sweet spices like nutmeg, clove, and ginger as well as bitter notes like coffee and chocolate.
France’s love affair with wine
France has a dense and diverse mix of wine growing regions, from Provence to Champagne. Wine growers and producers here are known for producing some of the world’s finest white and red bottles.
But which words make a wine distinctly French? 
Many French wines use the terms domaine or chateau in their descriptions. They are typically used interchangeably and refer to the place where the wine is produced.
More specifically, domaine refers to a territory or empire and is linked to wineries in Burgundy while chateau is the French term for a country house or castle. “Cru” (or growth), another term associated with French wines, is a vineyard or group of vineyards from a high-quality wine producer.
In France’s Rhone Valley, Chateauneuf-du-Pape (the word “Pape” appeared 545 times) is a wine appellation located around the village of the same name. This appellation produces some of the Rhone’s most prized and expensive wine. In our analysis, however, we find that French wines that didn’t include the term “Pape” were priced higher ($151.1) than “Pape” based wines ($124.8), even though the latter were on average about a year older than the former.
Champagne, another French wine region, is located in the northeast and is famous for its sparkling white wine (Champagne!). Not surprisingly, terms like brut and blanc(s) tend to pop up on French champagne labels along with the term “vintage”, since the average bottle age exceeds 16 years. For reference, 83% of the champagnes in our analysis were produced in France.
Red wines with a “Made in Burgundy” stamp (eastern France) are made with 100% pinot noir grapes, while the region’s white wines are made with 100% Chardonnay grapes. On average, French champagnes ($140) were $67.9 more expensive than Burgundy wines ($72.1). Of course, K&L tends to sell pricier bottles of wine than the typical retailer. Significantly cheaper bottles of champagne from Champagne and of red Burgundy are definitely on the market!
We will close our French wine tour with Bordeaux. The majority of wine here is red, but dry whites can also be purchased. The terms blanc (white) and rieur (as in Bordeaux Superieur) are popular Bordeaux wine terms.
Purple stained Instagram posts are populating the internet with organic wines, which have taken off particularly in France.
Organic or natural wines have zero or close to zero sulfites. Some claim that there is no return to sulfite based wines after sipping on the organic version: the taste is better and the hangover is minimized. Words most associated with organic wines include organically, farmed/farming, grown, grapes, certified, biodynamic, winery, vines, cultivated, and viticulture.
Figure 2: Average wine prices for popular grape growing regions in France, Spain, and US.
Word play around French and American cherry wines
How do French and American “cherry-like” wines (or at least wines that have “notes of cherry”) compare? Cherry wine listings in France were more likely to throw out terms like bodied, incredibly, full, and oak. American cherry wines were more likely to use terms like ruby, blend, and bright.
Figure 3: Median rating (1-100 point scale) for cabernet sauvignon wines in six countries.
Which country’s wine labels are most similar to Spain?
We attempt to identify the countries whose wine labeling is most similar to Spain’s using cosine similarity, which “is a metric used to determine how similar documents are irrespective of their size.”
Figure 4: Countries with wine descriptions most and least similar to Spanish wines (a higher cosine similarity score indicates a closer match).
Italian and French wines match closely with Spanish vino (0.88 and 0.84 cosine similarity scores). This makes sense since these three countries are geographically close to each other. Somewhat ironically, at the bottom, wines from Greece and Germany, two countries on opposite ends of the European financial crisis, tend to use words that differ from Spanish labels.
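To make the metric concrete, here is a small sketch of cosine similarity over raw word counts. It is not the code used for Figure 4: the whitespace tokenization and the tiny made-up “corpora” are assumptions for illustration, and the actual analysis may have weighted or filtered words differently.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text is treated as similar to nothing.
    return dot / (norm_a * norm_b)

# Made-up mini corpora, for illustration only.
spain = "red cherry oak spice tempranillo"
italy = "red cherry oak tannins nebbiolo"
germany = "riesling sweet mineral apple"

print(cosine_similarity(spain, italy))
print(cosine_similarity(spain, germany))
# Repeating a text changes its length but not its direction,
# which is why the score ignores document size.
print(cosine_similarity(spain + " " + spain, italy))
```

A score of 1 means two countries use words in identical proportions, and 0 means they share no words at all.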
What words are unique to America’s wine?
But what makes a country’s wine vocabulary stand out from the rest?
We will attempt to answer this question using term frequency-inverse document frequency (tf-idf) analysis. In short, each country is assigned a tf-idf score for every word in its own wine corpus (its wine descriptions). For a given word, we count the number of times it occurs in the country’s wine descriptions (its term frequency) and divide by the number of times it occurs across the entire wine description corpus.
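For readers who want to see the mechanics, the sketch below computes a textbook tf-idf score, treating each country’s combined descriptions as one document. This is an assumption about the setup: the paragraph above describes a simpler count ratio, and the exact weighting used in this analysis is not shown, so the numbers here will not reproduce the figures. The sample texts are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {name: {word: score}}, with one document per key of `docs`."""
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: the number of documents containing each word.
    df = Counter()
    for word_counts in counts.values():
        df.update(word_counts.keys())
    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            # Term frequency times the log-scaled inverse document frequency.
            word: (n / total) * math.log(n_docs / df[word])
            for word, n in word_counts.items()
        }
    return scores

# Made-up country "documents", for illustration only.
docs = {
    "US": "zinfandel napa cherry oak zinfandel",
    "France": "chateau cherry oak brut",
    "Spain": "tempranillo cherry oak",
}
scores = tf_idf(docs)
top_us = sorted(scores["US"], key=scores["US"].get, reverse=True)
print(top_us[:2])
```

Words that every country uses, such as cherry and oak in this toy data, score zero, while words concentrated in a single country rise to the top.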
Figure 5: American wine terms with the highest tf-idf ratio compared to the rest of the world’s wines. A higher tf-idf ratio suggests that the term is more unique to American wines.
Wine words unique to America are zinfandel (823.3 tf-idf ratio), Sonoma (723.5), and Napa (503.4). Of course, Napa and Sonoma are two of California’s most famous wine regions, making it extremely unlikely that these two terms would pop up on a bottle from Argentina or Spain.
Zinfandel wines, which are made from dark-skinned red grapes, have become America’s signature wine. Zinfandel (or “Zin”) is widely cultivated in California, particularly in Napa and Sonoma. Since these grapes arrived in the early 1800s from Italy, zinfandels have been used to make wide-ranging wine styles, from sweet to dry.
Which words best describe non-American wines?
The acronym OWC, a popular term among non-American wines, stands for “original wooden case” and suggests higher value wines, typically sold at auction in a case holding 6 to 12 bottles. Wine bottles with the term “OWC” are quite expensive, at a $300 median price compared to $49 for all other wines in our analysis.
Figure 6: Words least likely to be found on American wine labels based on tf-idf ratio. A lower tf-idf ratio indicates that the term is rarely found on American wine labels.
In Italy, Barolo (on the list above) is considered the king of wines and is produced from native Nebbiolo grapes (a thin skinned, acidic, tannic red). The word Saint refers to French wines grown in the regions of Saint-Joseph and Nuits-Saint-Georges. The former is located in France’s northern Rhone Valley (and is the region’s largest appellation in terms of geographical coverage).
Wine produced in Nuits-Saint-Georges is made in the communes of its namesake Nuits-Saint-Georges and Premeaux-Prissey in the Cote de Nuits sub-region of Burgundy. Around 97 percent of wine created here is red. Another popular non-American wine label word is Pessac-Leognan, a wine growing area in Bordeaux offering red and white wines.
Popular grapes for wine blends
Wine makers tend to ferment grape varieties separately in separate barrels and then combine them in a vat called a cuvee. These cuvees are then sold and labeled by the regions where they were grown, which explains why wines made in older countries like France and Spain get bottles named after local towns or villages.
In our analysis, some of the most popular wine blends are the Bordeaux blend (a mix of cabernet sauvignon and merlot) with 579 bottles and the GSM blend with 225 bottles, which combines grenache, syrah, and mourvedre (hence the name GSM) as base ingredients along with other regional grape varieties.
Another blend worth mentioning is the Champagne blend made from pinot noir, Chardonnay, and pinot meunier (72 bottles). In Portugal, Port, a dessert wine, typically consists of a blend of tempranillo and Touriga Nacional grapes, among others.
Tempranillo is Spain’s most popular grape, with the structure of cabernet sauvignon and the meaty nature of carignan. This grape has a neutral profile (not too sweet or dry) and is blended with other grapes such as grenache and mazuelo.
Some pinot noir blends combine the namesake grape with syrah, which makes the wine a bit bolder and smoother going down the throat. In terms of blending, wine regulation in the US is much laxer than in other countries. For instance, zinfandels, merlots, chardonnays, and pinots need to be made from only 75% of the named grape. By contrast, red Burgundies must be made from 100 percent pinot noir.
There is really no way that we could capture the entire language of wine from fewer than 9,000 wines on a single website. We need to explore more wine growing regions and analyze the ever growing list of grape and wine varieties to try.
 Well, not for all Merlot wines. There are three main styles of Merlot: soft, fruity, and smooth. Some of the most popular “fruit” terms in descending order are black (647 words), red, ripe, blue, tannins (179 words), rich, berry, cherry, citrus, stone, blackberry, sweet, core, white (103 words), plum, currant, orchard, acidity, juicy (87 words), bright, tropical, cassis, dried, and raspberry (60 words).
 Or at least words that have a chance of landing on a French wine label.
 According to Wikipedia, an appellation is a legally defined and protected geographical indication used to identify where the grapes for a wine were grown; other types of food often have appellations as well.
 In France, non-“Pape” wines were on average produced in 2012, while “Pape” wines were slightly older, produced on average in 2011.
 In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling.
 To measure the most American wine terms, we created a tf-idf ratio dividing America’s tf-idf score for each word in its corpus by the world’s tf-idf score for each word in its corpus.
 OK, our sample size for French wines is much larger than for other countries, so our word corpus is a bit skewed towards French wine terms and regions!