Related article: Cluster Analysis using Tableau and R – Part-1
In my previous post, we performed cluster analysis in both Tableau and R. Tableau uses the k-means algorithm for cluster analysis, which partitions the data into k clusters, each with a center equal to the mean of all the points in that cluster. Each point is assigned to a cluster based on its distance from that cluster’s center.
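To make the distance idea concrete, here is a minimal sketch of a single k-means step in R on the iris measurements. It is a textbook assign-then-update step under my own assumptions (random starting centers, plain Euclidean distance, an arbitrary seed), not Tableau's internal implementation.

```r
# One k-means step: assign points to the nearest center, then update the centers.
x <- as.matrix(iris[, 1:4])                 # the four numeric measures

set.seed(42)                                # arbitrary seed, only for repeatability
ux <- unique(x)                             # drop duplicate rows so the centers differ
centers <- ux[sample(nrow(ux), 3), ]        # three random points as starting centers

# Squared Euclidean distance from every point to every center (150 x 3 matrix)
d2 <- sapply(1:3, function(k) rowSums(sweep(x, 2, centers[k, ])^2))

# Each point joins the cluster whose center is nearest
assignment <- max.col(-d2, ties.method = "first")

# Each center moves to the mean of the points assigned to it
new_centers <- apply(x, 2, function(col) tapply(col, assignment, mean))
new_centers
```

Repeating these two steps until the assignments stop changing is the basic idea behind k-means; the built-in kmeans() function in R handles the iteration for you.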
Let’s look into that in detail.
First, generate a cluster scatter plot in Tableau using the Iris data set, as we did in Part-1.
Right-click the cluster that we added and choose the Describe Clusters option.
This gives you the clustering details shown below.
Now let’s perform k-means clustering in R and print the clusters.
# Copy iris to the myiris variable
myiris <- iris
# Remove the Species column
myiris$Species <- NULL
# Run k-means with 3 clusters
kmeans.result <- kmeans(myiris, 3)
# Print the cluster data
kmeans.result
Check the cluster means against the Tableau cluster centers. Aren’t they comparable? However, clustering in Tableau is limited to its default k-means method, whereas R offers many packages and functions for various types of clustering. I will dedicate a future article to the cluster and fpc packages in R.
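If you want to pull out just the numbers to compare, a sketch like the following may help. The seed and the nstart value are my own additions to make the run repeatable; cluster numbers are arbitrary labels, so cluster 1 in R need not be cluster 1 in Tableau.

```r
# Fit k-means on the iris measurements and inspect the centers.
myiris <- iris
myiris$Species <- NULL

set.seed(1)                                   # arbitrary seed (assumption)
kmeans.result <- kmeans(myiris, 3, nstart = 25)

# Cluster means: compare these rows with the centers Tableau reports
kmeans.result$centers

# How many points fall in each cluster
kmeans.result$size

# Cross-tabulate the known species against the assigned cluster
species_vs_cluster <- table(iris$Species, kmeans.result$cluster)
species_vs_cluster
```

Match clusters between the two tools by their center values rather than by their labels.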
So what options does a Tableau user have for extended or advanced clustering? The answer is R integration: calling R packages from Tableau using steps similar to those I explained in this article.
We have seen how R can be integrated into your data science project using Power BI or Visual Studio (RTVS). Now it’s time to look at R integration with Tableau.
Before we get started with the steps, let us discuss how this is beneficial.
Tableau is a great visualization tool that helps you understand your data, provides interactivity and assists in making business decisions. R integration brings the capabilities of R, such as statistical functions and predictive analysis, to your Tableau visualizations. Interactive visualizations in Tableau, powered by complex statistical analysis in R behind the scenes, present a strong case for data scientists to adopt this integration.
I have added a high-level representation of this implementation below. Tableau calls R functions, and R passes the results back to Tableau, where they can be used to generate visualizations. You can use all the packages available on the R server (difficult to accomplish using Tableau scripting alone) and generate visualizations from the resulting data (complex to accomplish using R alone).
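As a rough illustration of that round trip: Tableau's SCRIPT_* calculated-field functions hand each field to R as a vector named .arg1, .arg2 and so on, and expect a result vector back. The sketch below emulates those vectors with iris columns so the R side can be tried outside Tableau; the Tableau field names in the comment are hypothetical.

```r
# Emulate the vectors Tableau would pass to R (stand-ins taken from iris).
.arg1 <- iris$Sepal.Length
.arg2 <- iris$Sepal.Width
.arg3 <- iris$Petal.Length
.arg4 <- iris$Petal.Width

set.seed(1)  # arbitrary seed, only for repeatability

# The R expression a calculated field could send: one cluster id per input row.
cluster_id <- kmeans(data.frame(.arg1, .arg2, .arg3, .arg4), 3)$cluster

# In Tableau this expression would sit inside a calculated field, roughly:
#   SCRIPT_INT("kmeans(data.frame(.arg1,.arg2,.arg3,.arg4), 3)$cluster",
#              SUM([Sepal Length]), SUM([Sepal Width]),
#              SUM([Petal Length]), SUM([Petal Width]))
# (field names above are hypothetical)
head(cluster_id)
```

The result has one value per input row, which is what lets Tableau map it back onto the marks in the view.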
As a first step, make sure that Rserve is running as a service that you can connect to. The screenshots below show how to install Rserve from RStudio.
Start the service
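If you prefer the R console to the RStudio dialogs, the install-and-start steps can be sketched as below. The helper name start_rserve is my own; the "--no-save" argument is an optional choice, not something the setup requires.

```r
# Hypothetical helper: install Rserve if it is missing, then start it.
start_rserve <- function() {
  if (!requireNamespace("Rserve", quietly = TRUE)) {
    install.packages("Rserve")     # one-time install from CRAN
  }
  # Launch the Rserve process; by default it listens on port 6311
  Rserve::Rserve(args = "--no-save")
}

# Run once per session before connecting from Tableau:
# start_rserve()
```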
Now Rserve is ready for connections. In Tableau, choose Help -> Settings and Performance -> Manage External Service Connection.
In this case, Rserve is running on the same PC, so I selected localhost as the server. The default Rserve port is 6311. Leave that as is and test your connection as shown below.
The message above confirms that you have established connectivity with the R service.
Next, we will look at an example calling R scripts from Tableau.
In my previous blog post, I shared a bar chart showing a declining trend in a company’s sales over the years. If you noticed, it had a trend line to aid visual analysis.
Another example, given below, shows how profit trends as sales increase for each category.
Trend lines are great visual tools for quick analysis. In the above diagram, it’s very easy to see that the increase in sales for Supplies and Tables does not help increase profit. The same conclusion could not have been drawn as easily from the clustered circles if the trend lines were not present. The steps to add trend lines are explained below.
I have created a scatter plot for Sales vs Profit using the Global Superstore database. Notice the “analytics” tab highlighted below.
Drag ‘Trend Line’ from the options to the visualization area and choose the line type. In this example, I have selected the Linear model.
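For readers who want to see what a linear trend line amounts to, the sketch below fits an ordinary least-squares line in R. The Global Superstore data is not available here, so the iris petal measurements stand in for Sales (x) and Profit (y); that substitution is purely illustrative.

```r
# Stand-in data: Petal.Length plays the role of Sales, Petal.Width of Profit.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Intercept and slope of the fitted straight line
coef(fit)

# Share of the variation in y explained by the line
r2 <- summary(fit)$r.squared
r2

# Value on the trend line for a new x value
predict(fit, newdata = data.frame(Petal.Length = 4))
```

A positive slope means y tends to rise with x; a slope near zero or below it is the pattern described above for Supplies and Tables.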
Additional options are available at (Right click -> Trend Lines -> Edit Trend Lines). For example, you can choose to show only one line (uncheck Show trend line per color) and view each category interactively for a focused analysis, as shown below.
Though we discussed SAP Business Objects extensively w.r.t. business intelligence tools, I have not had a chance to write about Tableau visualizations until now. That is not a great thing, considering that my experiments with Tableau started way back in 2012 with Tableau version 7.0 (it wasn’t as popular as it is today, but it was clearly emerging as a leader). Tableau is one of the best visualization tools that I have used, and you can’t help admiring its performance (thanks to in-memory processing of data), interactivity and analytics options (trends and forecasts).
Let’s start with a few visualizations from my PC. We will explore specific features and steps in later blogs.
Let’s analyze the data of a company (data source: local SQL Server database). As you can see, sales and profits are declining. I have added a trend line for sales. It shows a consistent fall, and it’s time for the company to do something serious to revive the business, isn’t it?
Sometimes a different visualization is the need of the hour to convey the same data. How about a packed bubble visualization with sizes corresponding to yearly sales? Note that the data is categorized further by division (company location).
And my favorite: Geo Maps.
We will dig into its analytic features and Tableau Online in the upcoming posts.