Cluster Analysis using Tableau and R – Part-2

Related article: Cluster Analysis using Tableau and R – Part-1

In my previous post, we performed cluster analysis in both Tableau and R. Tableau uses the k-means algorithm for cluster analysis, which partitions the data into k clusters, each with a center equal to the mean of all the points in that cluster. Cluster membership is based on the distance between each data point and the cluster centers: each point belongs to the cluster whose center is nearest.

Let’s look into that in detail.

First, generate a cluster scatter plot in Tableau using the Iris data set, as we did in Part 1.


Right-click the Clusters field that we added and choose the Describe Clusters option.


This shows the clustering details, including the cluster centers, as given below.


Now let's perform k-means clustering in R and print the result.

#copying iris to myiris variable
myiris <- iris
#Remove Species column
myiris$Species <- NULL
kmeans.result <- kmeans(myiris,3)

#print the cluster data
kmeans.result

Compare the cluster means with the Tableau cluster centers. Aren't they comparable? However, clustering in Tableau is limited: k-means is the only algorithm it offers, whereas R has many packages and functions for performing different types of clustering. I will dedicate a future article to the cluster and fpc packages in R.
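If you want to check the numbers side by side, a short sketch like the following prints the centers that kmeans() reports and then recomputes them as the mean of each measure within each cluster using aggregate(). Because k-means starts from random centers, the cluster numbering (and occasionally the solution) can change between runs; the set.seed() value and nstart = 25 are choices made for this example, not settings taken from Tableau.

```r
# copy iris and drop the Species column, as in the steps above
myiris <- iris
myiris$Species <- NULL

set.seed(123)  # reproducible random starts (assumed value)
kmeans.result <- kmeans(myiris, centers = 3, nstart = 25)

# cluster centers as reported by kmeans()
print(kmeans.result$centers)

# the same centers recomputed as the mean of each measure within each cluster
cluster_means <- aggregate(myiris, by = list(cluster = kmeans.result$cluster), FUN = mean)
print(cluster_means)
```

Both tables should show the same values; these are the figures to hold up against the Centers listed by Describe Clusters in Tableau.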

So what options does a Tableau user have for extended or advanced clustering? The answer is R integration: calling R packages from Tableau using steps similar to those I explained in this article.
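As a sketch of what that could look like (assuming Rserve is running and Tableau is connected to it), a SCRIPT_INT calculated field can hand the four measures to R as .arg1 to .arg4 and receive one cluster number per mark. The R function below, hypothetically named tableau_kmeans, mimics that script body so it can be tried in RStudio first. The Tableau field names in the comment are assumptions, and SCRIPT_ functions are table calculations, so the view would need one mark per flower (for example, a row-identifier dimension) for this to return per-flower clusters.

```r
# Hypothetical local test of the R code a Tableau SCRIPT_INT field would send to Rserve.
# In Tableau, the calculated field might look like (field names are assumptions):
#   SCRIPT_INT("set.seed(42); kmeans(data.frame(.arg1, .arg2, .arg3, .arg4), 3)$cluster",
#              SUM([Sepal Length]), SUM([Sepal Width]), SUM([Petal Length]), SUM([Petal Width]))
tableau_kmeans <- function(.arg1, .arg2, .arg3, .arg4, k = 3) {
  set.seed(42)  # fixed seed so cluster labels do not change on every refresh
  kmeans(data.frame(.arg1, .arg2, .arg3, .arg4), centers = k)$cluster
}

# simulate Tableau passing one vector per measure
clusters <- tableau_kmeans(iris$Sepal.Length, iris$Sepal.Width,
                           iris$Petal.Length, iris$Petal.Width)
table(clusters)
```

The function returns one integer per input row, which matches what Tableau expects back from a SCRIPT_ call: as many values as it sent.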

Cluster Analysis using Tableau and R – Part-1

This article introduces cluster analysis of your data using both Tableau and R. The data files and source code used in this post can be downloaded using the link below.

Download source files used in this article

Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. We will perform the analysis in both Tableau and R using the same data.

Clustering Analysis using Tableau

To start, we connect Tableau to the Iris data set, which you can download from the UCI Machine Learning Repository.
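If you prefer, a sketch like the one below writes R's built-in copy of the Iris data (from the datasets package) to a CSV file that Tableau can connect to. The file name iris.csv is an assumption. Unlike the UCI iris.data file, which has no header row, this file includes column names (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species), and those are the names Tableau will show.

```r
# Write R's built-in iris data to a CSV file for Tableau.
# "iris.csv" is an assumed file name, written to the current working directory.
csv_path <- file.path(getwd(), "iris.csv")
write.csv(iris, csv_path, row.names = FALSE)

# quick check: read it back and confirm the shape
check <- read.csv(csv_path)
dim(check)  # 150 rows, 5 columns
```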

First, connect the Tableau workbook to this CSV data source and open a new sheet. Drag the measures petal length and petal width to Columns, and sepal length and sepal width to Rows.


Next, disable aggregation of measures by unchecking Analysis > Aggregate Measures.


Alternatively, to keep it simple, you can analyze only two measures, as shown below. In this article, however, we use all four measures, as above.


Notice that these scatter plots do not identify or differentiate any groups. In our case, however, the data set already has a column specifying the flower species for each set of measurements. So let us view it by dragging 'Species' to Color, which shows the distinct species groups below:


Now imagine that we didn't have the 'Species' data handy and wanted to identify the clusters based on the measures alone. Let's see how this can be done using Tableau's cluster analysis.

Start with our initial plot:


Go to the Analytics pane and drag 'Cluster' into the view, as shown in the screen capture below. Tableau automatically determines the number of clusters.


Leave the defaults.
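Tableau's documentation describes choosing the number of clusters with the Calinski-Harabasz criterion, the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below approximates that idea in R by running kmeans() for k = 2 to 10 and keeping the k with the highest index. It is not Tableau's code, so the k it picks may differ from Tableau's; the range of k, the seed and nstart are assumptions.

```r
# Sketch: pick k by the Calinski-Harabasz index, CH = (B / (k - 1)) / (W / (n - k)),
# where B is the between-cluster and W the total within-cluster sum of squares.
myiris <- iris[, 1:4]
n <- nrow(myiris)
ks <- 2:10

set.seed(123)
ch <- sapply(ks, function(k) {
  fit <- kmeans(myiris, centers = k, nstart = 25)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
})

best_k <- ks[which.max(ch)]
print(data.frame(k = ks, CH = round(ch, 1)))
best_k
```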



Note that we get essentially the same cluster grouping as we got using the 'Species' dimension data.

Tableau uses the k-means algorithm for cluster analysis, which partitions the data into k clusters, each with a center equal to the mean of all the points in that cluster. Cluster membership is based on the distance between each data point and the cluster centers: each point belongs to the cluster whose center is nearest.
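To make the idea concrete, here is a minimal sketch of the classic k-means iteration (Lloyd's algorithm) in base R: assign each point to its nearest center, move each center to the mean of its points, and repeat until the assignments stop changing. It is for illustration only; it is not Tableau's implementation, and R's own kmeans() uses the Hartigan-Wong algorithm by default. The function name simple_kmeans and the fixed seed are hypothetical choices for this example.

```r
# Minimal Lloyd-style k-means sketch (illustrative; not Tableau's or R's implementation)
simple_kmeans <- function(x, k, max_iter = 100, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)
  # start from k randomly chosen observations as the initial centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # squared Euclidean distance from every point (row) to every center (column)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # nearest center
    if (all(new_cluster == cluster)) break  # assignments stable: converged
    cluster <- new_cluster
    # move each center to the mean of the points assigned to it
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

res <- simple_kmeans(iris[, 1:4], k = 3)
table(res$cluster)
```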

Cluster Analysis using R

To start, let us revisit the Tableau plot of the iris data, petal length against petal width, with cluster analysis.


Let's switch to RStudio and plot this using ggplot2 (note that iris is available as part of the datasets package installed with R).

library(ggplot2)
ggplot(iris, aes(Petal.Width, Petal.Length, color = Species)) + geom_point()


Note that we get essentially the same grouping in the R plot, but here we used the Species column to group (color) the data.

Now let us look at how to perform cluster analysis in R to identify the clusters.

First, copy the iris dataset to another variable:

#cluster analysis - Biju Paulose
#copying iris to myiris variable
myiris <- iris
#printing data
myiris

For our analysis, we do not want to use the Species column, so let's remove it from the new dataset.

#Remove Species column
myiris$Species <- NULL
#printing data to verify
myiris

Let's use the kmeans() function to generate 3 clusters and plot the data:

kmeans.result <- kmeans(myiris,3)
# plot the clusters
plot(myiris[c("Petal.Width", "Petal.Length")],col=kmeans.result$cluster)

The result is given below. Comparing it with the analysis performed in Tableau above, you can see that we generated the same clustering of the data in R. We will examine these results more closely in my next article.
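To quantify how closely the k-means clusters match the actual species, you can cross-tabulate the cluster assignments against the Species column that we removed. This is a sketch; set.seed() and nstart = 25 are additions for reproducibility. Cluster numbers are arbitrary labels, so look for one dominant count per species row rather than a diagonal.

```r
# copy iris and drop Species before clustering, as above
myiris <- iris
myiris$Species <- NULL

set.seed(123)
kmeans.result <- kmeans(myiris, centers = 3, nstart = 25)

# rows: true species; columns: cluster labels assigned by k-means
confusion <- table(Species = iris$Species, Cluster = kmeans.result$cluster)
print(confusion)

# share of flowers that fall in their species' majority cluster
sum(apply(confusion, 1, max)) / nrow(iris)
```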



Using R forecast from Tableau


  • RStudio (with the forecast package installed)
  • Tableau Desktop (with connectivity established to the Rserve service; for details of R integration with Tableau, please refer to my previous post)
  • R programming knowledge
Download source files used for this article

Forecasting allows businesses to arrive at more realistic estimates and targets for the future. Tableau's analytics features include built-in forecasting, which many of you will already be familiar with. In this article, we will look at how to use the R forecast package from Tableau. It will serve as a reference for the capabilities you can bring in from R.

For a simple demonstration of the R forecast package, we can use R's AirPassengers time series data set to produce a 2-year (24-month) forecast:

library(forecast)
myts1 <- ets(AirPassengers)  # fit an exponential smoothing (ETS) model
plot(forecast(myts1, h=24))  # forecast 24 months ahead and plot


Tableau's native forecasting has a similar capability; an example using the Superstore dataset is shown below.


For the R integration, start a new sheet connected to the globalstore dataset and generate a time-series graph of sales (Order Date by month against SUM([Sales])).


To generate a forecast using the R forecast package, create a calculated field (SalesForecast) with the script shown below:


SCRIPT_REAL('
library(forecast);
monthsts <- length(.arg1);
myts <- ts(.arg1, start=c(2011,1), frequency=12);
myforecast <- forecast(myts, h=.arg2[1]);
append(.arg1[(.arg2[1]+1):monthsts], myforecast$mean, after=monthsts)',
SUM([Sales]), [Forecast Months])

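The script creates a monthly time series of sales starting in January 2011 and generates a forecast for the number of months x set by the 'Forecast Months' parameter. Because Tableau expects the script to return as many values as it receives, the result drops the first x actual months and appends the x forecast values at the end of the series.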
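To see what the script returns before wiring it into Tableau, you can reproduce its logic in plain R. The sketch below uses made-up monthly sales and, so that it runs without the forecast package, swaps in base R's HoltWinters() for forecast(); the function name tableau_forecast and the sample data are hypothetical. What it shows is the shape of the output: the same length as the input, with the first h actual values dropped and h forecast values appended.

```r
# Hypothetical re-creation of the SalesForecast script logic, for testing outside Tableau.
# NOTE: base R's HoltWinters() stands in for forecast::forecast() in this sketch.
tableau_forecast <- function(.arg1, .arg2) {
  h <- .arg2[1]                 # Tableau repeats the parameter value for every row
  monthsts <- length(.arg1)
  myts <- ts(.arg1, start = c(2011, 1), frequency = 12)
  fc <- predict(HoltWinters(myts), n.ahead = h)
  # drop the first h actuals and append h forecasts: output length equals input length
  append(.arg1[(h + 1):monthsts], as.numeric(fc), after = monthsts)
}

# made-up data: 48 months of seasonal sales with an upward trend
set.seed(1)
months <- 1:48
sales <- 1000 + 20 * months + 200 * sin(2 * pi * months / 12) + rnorm(48, sd = 30)
horizon <- rep(6, 48)           # simulates the [Forecast Months] parameter

result <- tableau_forecast(sales, horizon)
length(result)                  # 48, same as the input
# the last 6 positions hold forecasts; an isForecast-style flag could mark them
is_forecast <- seq_along(result) > length(result) - horizon[1]
```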

You can view the forecast series by adding the calculated field (SalesForecast) to Rows. To make it more intuitive, create the calculated field isForecast as below and drag it to Color.



Forecast vs Actual

To view the forecast and actual values side by side, add Sales to Rows.


However, separate panes make it hard to compare the two series. The solution is to combine them on a dual axis and then synchronize both axes. The result is shown below.

