
Random Adventures in Tableau

Before we get into the meat of the blog I wanted to give you a short test: see if you can guess where the data used in the visualisation below came from. I removed the axis labels to make it harder. I’ve also highlighted one series, but at random; highlight another and see if you can work out the dataset…

[tableau server="public.tableausoftware.com" workbook="RandomAdventures-Part1" view="Dashboard" tabs="no" toolbar="yes" revert="all" refresh="no" linktarget="" width="600px" height="620px"][/tableau]

Stocks and shares, right? You know which share is the one that’s dropping? Which one’s the high flyer? Now hit F5 to refresh the page and watch the data…

I’m sorry to say that this was all randomly generated data, not stocks and shares – I made it all up. Each line started off at the same value and I gave it 100 random movements (one point up, one point down or no movement – all equally likely) before I showed it on the chart. Want to check? Here are the axes and the full chart (the chart above starts at x = 100):

[tableau server="public.tableausoftware.com" workbook="RandomAdventures-Part1b" view="Dashboard" tabs="no" toolbar="yes" revert="all" refresh="no" linktarget="" width="600px" height="620px"][/tableau]

So next time you’re telling yourself you’re onto a sure-fire stock market winner, or a “can’t lose” streak at the roulette or blackjack table (Vegas TCC 15 anyone?), just double-check that you’re not looking at a random increase. I find it amazing how different each of these lines of data is after just 100 generations of randomness.

It’s randomness in Tableau, and specifically generating random numbers, that I want to explore in this post.

Why Generate Random Numbers in Tableau?

Before we get into the HOW, let’s explore the WHY. The main reason for introducing randomness into a dataset is usually to “jitter” data points in the view. Steve Wexler of Data Revelations has already written on this subject, and I recommend his excellent article for the details of one approach. However, where using INDEX() isn’t appropriate, another approach is to use a random number. We’ll visit one particular use case later in this article.

Secondly, you may wish to model processes that include a random probability or chance; if you do, then random numbers obviously offer an approach.

Thirdly, you may just want to have fun and, say…build a blackjack game in Tableau.

How do you generate Random Numbers in Tableau?

Aside from methods using RAWSQL functions or SCRIPT functions to call out to SQL/JET and R respectively, there is no built-in function that will bring a random number into your Tableau workbook. Instead you’re going to have to use a random number generator such as a linear congruential generator (LCG) – a pseudo-random number generator that is incredibly simple because its algorithm is a linear recurrence.
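For reference, an LCG produces each value from the previous one using the linear recurrence next = (a × previous + c) mod m, where a, c and m are fixed constants. One widely used set of constants (from glibc’s rand()) is a = 1103515245, c = 12345, m = 2^31; many other published parameter sets exist.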

You can read about LCGs here, and I will also show you an implementation stolen from the Tableau genius that is Joshua Milligan (author of the blackjack game referenced above – I advise you to check out his Tableau Public visualisations).

The actual random number calculation is recursive – the calculation takes its previous value as one of its inputs – using PREVIOUS_VALUE(), a table calculation function:

Random Number (one method – many variants involving different values exist)
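The calculation itself appeared as an image in the original post, so here is a minimal sketch of the fields, assuming the glibc constants mentioned above (Joshua’s exact values may differ):

    // [Random State] – the raw LCG value, carried forward row by row.
    // PREVIOUS_VALUE() returns this field's previous value, or its
    // argument ([Seed]) on the first row of the partition.
    (PREVIOUS_VALUE([Seed]) * 1103515245 + 12345) % 2147483648

    // [Random Number] – rescaled into the range 0 to 1
    [Random State] / 2147483648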

To create a Random Integer
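Again the original field isn’t shown, so this is a sketch built on the 0–1 [Random Number] above:

    // [Random Integer] – scale the 0-1 value up to the limit and
    // truncate, giving a whole number from 1 to [Random Upper Limit]
    INT([Random Number] * [Random Upper Limit]) + 1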

[Random Upper Limit] is a parametrized upper limit for the calculation.

Seed

In the calculations above, [Seed] could be anything to start off the series, but I have chosen an effectively random seed based on the date and time, which ensures a different random number series each time. I could instead have used a fixed number, or a parameter to control the series and give the user control; if we follow this route the same [Seed] will generate the same series of random numbers, allowing repeatability.
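The original seed field isn’t shown here either; a sketch of a time-based seed in the same spirit (an assumption, not the post’s exact formula):

    // [Seed] – seconds elapsed so far today, so every refresh starts
    // the series from a different value
    DATEPART('hour', NOW()) * 3600
    + DATEPART('minute', NOW()) * 60
    + DATEPART('second', NOW())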

Implementing a Random Number for “Jittering”

To show you how to implement jittering I want to return to an old post of mine, Health Check your Data using Alteryx and Tableau. In that post I showed a “DNA” profile of data; I now want to show you an alternative method using jittering. In this case using INDEX() wasn’t appropriate because the row number was quite possibly related to the data type, so I used a random number.

Though here I used another version of the LCG formula (just for fun):
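The formula itself was shown as an image; one well-known alternative parameterisation – used here purely for illustration, the post’s actual constants may differ – is the Lehmer/MINSTD generator, which drops the additive constant:

    // [Random State] – Lehmer/MINSTD variant (a = 16807, c = 0,
    // m = 2^31 - 1); note the seed must be non-zero for this one
    (PREVIOUS_VALUE([Seed]) * 16807) % 2147483647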

I also created a seed and an integer version as I detailed above, then I added the integer version as a continuous row after my existing column names. Hiding the axis of this “jitter” row then left me with what I needed (after some formatting) – click below to see the jittered result. Reverse engineer the viz to see the exact details (this is a highly recommended way of learning).

More Random Adventures

Doing random stuff in Tableau is how I get my kicks, so to add a bit more randomness to this post here’s a video of a visualisation I built that I call “Tableau Life” – I’d like to think this is how new features get propagated in Tableau 🙂

Quiz Question

As a bit of fun while you’re watching, try to guess how many rows of data were used to make this visualisation – the answer is at the bottom of this blog.

Building this visualisation was a fun challenge, but it’s perhaps a little too complicated to explain as part of this post; it’s probably enough to say I took inspiration from the amazing and inspirational Noah Salvaterra and his fractal images in Tableau. Take a look at his blog post to explore his methods; mine are fairly similar (if less advanced).

My approach was to create two sets of random numbers to check for positive, negative or no movement on each axis (with an equal chance of each), then to iterate across an X value – for the path – and a Y value – for each “node”. The result was a set of random walks, which you can play with and recreate in the workbook below.
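First, a minimal sketch of one movement field (the names are assumed; the actual calculations live in the workbook):

    // [Y Step] – map a 0-1 random number to -1, 0 or +1, equal odds
    IF [Random Number] < 1/3 THEN -1
    ELSEIF [Random Number] < 2/3 THEN 0
    ELSE 1
    END

A running sum of a step field like this along the path – for example RUNNING_SUM([Y Step]) – then traces out the walk itself.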

[tableau server="public.tableausoftware.com" workbook="RandomAdventures-Part2" view="Dashboard" tabs="no" toolbar="yes" revert="all" refresh="no" linktarget="" width="600px" height="820px"][/tableau]

Quiz Answer

How many rows of data? The answer is only 2! Here’s the dataset I used to create both random datasets in this post – the random “stocks and shares” and the “random walk”.

If I’ve just blown your mind then I suggest you read Noah’s post, download my workbooks and reverse engineer them. Welcome to the world of Tableau – the rabbit hole just got deeper 🙂 I’ve left this largely unexplained, as it’s a little off the beaten track for most Tableau users, so if you have any questions please tweet me @ChrisLuv or comment below.

Before answering the question of how Tableau handles data sets larger than the available memory, let us look at the relevant terminology and understand the architecture of the Tableau data extracts (TDE) that we work with.

TDE Data Storage – Columnar Data Model

A Tableau data extract (TDE) is a subset of data that you can use to improve the performance of your workbook, upgrade your data to allow for more advanced capabilities, and to make it possible to analyze your data offline (Source: Working with Tableau Data Extracts)

TDEs are based on a columnar database model, meaning the data is stored in sections of columns, which allows for maximum compression. The size of a TDE depends upon the cardinality (the number of unique values) of the columns: the fewer unique values a column has, the smaller it is when stored, and vice versa. For example, if a table holds millions of records but a column has only four unique values in the entire table, then the TDE will store only four values for that column, with compression applied on top.

This columnar model makes the TDE very efficient for read operations, i.e., data can be read and processed much faster than in row-based databases.

TDE is architecture-aware

A TDE can use all parts of the computer’s memory hierarchy, including RAM, CPU cache and hard disk.

When a query is sent to the TDE engine, only the data required at that particular point in time is loaded into RAM, in the form of memory blocks. This uses the memory mapping technique, where data is mapped from the disk into RAM as memory blocks. Data is swapped between RAM and the hard disk depending on what is needed, with the OS taking care of creating space in RAM and shuffling the memory blocks. If there is enough RAM available, the loaded data will remain there for subsequent use; in the unlikely case that no RAM is available, hard disk space is used to load the data from the TDE instead. Therefore any data required for query processing can be loaded into RAM or, failing that, onto the hard disk. It is also important to note that data from a TDE is always loaded in compressed form.

In principle, you can work with a data set that is larger than the amount of RAM available, because only small sections of RAM are used to load the relevant data at any given time. The OS will manage the space in your RAM to load additional blocks if required, and disk space will be utilized in extreme situations.

How it works: Some examples

The following tests were conducted using Tableau version 8.3.

Test 1: The first test used a table in a SQL Server 2012 database of approximately 100 million records, with 16 dimensions and 4 measures. The size of the data file on the SQL Server was over 40 GB. Two columns held approximately 400K unique values each and one column held almost 100K unique values. When Tableau connected to this table, a data extract was generated which was approximately 3.5 GB in size.

It took about 30 minutes to generate the extract file. The SQL Server and the machine Tableau was running on was a 64-bit Windows Server 2012 R2 with 64 GB of RAM and 4 processors with 2 cores each.

Using Tableau Desktop, a report was created with three dimensions and one measure:

  • Dimension A in the row shelf with 400K unique members
  • Dimension B in the column shelf with 3 unique members
  • Dimension C in the colour shelf with 4 unique members
  • Measure with SUM as aggregation

A total of 2.5 million “marks” (data points) were expected to be generated in this report.

Please note that the visualization was created for testing purposes only. In reality, charts showing 400K data points do not comply with best practices.

The following tasks were performed by Tableau when the report was accessed.

a) Processing Query

The query required to fetch the data from the TDE file was processed by the TDE engine. Using the memory mapping technique explained above, the required data from the columns in consideration was loaded into RAM and the “result set” was produced.

Total memory utilized: 40 MB

Total time taken: 290 seconds (about 4.8 minutes)

b) Computing View Layout

The VizQL component took the result set produced above and generated the desired view/chart (2.5 million marks). On a 64-bit machine, VizQL has only 4 GB of RAM allocated to carry out this task.

Total memory utilized: 2 GB (more memory was required than for query processing)

Total time taken: 90 seconds (less time than was required for query processing)

Test 2: The same report was tested on a different machine, also running a 64-bit OS, but with only 10 GB of RAM and a 2-core processor. The results were as follows:

a) Processing Query

Total memory utilized: 40 MB (same as for Test 1)

Total time taken: 4,490 seconds (almost an hour and 15 minutes – much longer than in Test 1)

b) Computing Layout

Total memory utilized: 2 GB (same as for Test 1)

Total time taken: 90 seconds (same as for Test 1)

Conclusion

There are two areas to look at: the query processing and the computing layout.

With respect to query processing, even though there were 2.5 million data points in consideration, a maximum of 40 MB of RAM was utilized by Tableau for data loading and processing, irrespective of the hardware used. Even on the smaller server with only 10 GB of RAM, all of the required data could be loaded, since only a very small section of RAM was needed. Therefore we can conclude that, irrespective of the size of the data set, Tableau will rarely, if ever, be short of RAM for loading the required data and processing the queries.

However, the time it takes to process the queries (i.e., the performance) is definitely affected when we shift to a smaller server. TDE processing is core-based, so the greater the number of cores, and the more powerful those cores are, the better the performance will be. If we increase the frequency of the processors and add more cores, the data will be processed much faster, even with less RAM.

For the second stage, computing the layout, the amount of memory required and the time taken were similar in both cases. Thus we can conclude that, irrespective of the hardware, this process remains the same. However, we need to remember VizQL’s limitation of 4 GB on a 64-bit OS and 2 GB on a 32-bit OS, so we must be careful about the type of visualization being created: even if the data processing takes only seconds, you may not be able to display the visualization due to VizQL limits. There are many articles on best practices for creating visualizations, and these should be followed.

So if you have a high-volume database, do not hesitate to create a data extract, but invest in a powerful server so that performance can be boosted. There are good articles by Tableau that explain best practices for data extracts, such as aggregating data and removing unwanted columns. However, it is important to remember that a Tableau Data Extract is not a replacement for a data warehouse: from a long-term perspective there should always be a proper data strategy in place, rather than relying on huge TDE files.

This Tableau Tip was written by:

Sourabh Dasgupta

Sourabh Dasgupta is the Director of Technology & BI at Corporate Renaissance Group India and the Product Manager of CRG’s in-house software products such as FlexABM and Cost Allocator. Over the past 14 years he has been involved as a technical expert in various projects and assignments related to the implementation of Activity Based Costing, Business Intelligence and reporting systems in India as well as South Africa.

He is one of the professional trainers for Tableau software and has conducted many trainings for clients, covering basic and advanced features of Tableau Desktop and Tableau Server. He has been involved in the design and development of Tableau reports and dashboards, and holds both a Tableau Desktop 8 Qualified Associate certificate and a Tableau Server 8 Qualified Associate certificate.

Sourabh has a Bachelor of Science degree (Computer Science, Physics, and Mathematics) from Nagpur University, India and a Diploma from the National Institute of Information Technology (NIIT), Pune, India. He also holds a Microsoft Certified Professional certificate for Designing and Implementing Desktop Applications with Microsoft Visual Basic.

If you wish to accelerate your skills and empower your learning with hands-on education, or are seeking an Expert-on-Demand, reach out to Sourabh at sdasgupta@crgroup.com
