Data processing is a topic that many new Tableau consumers struggle with, largely because Tableau, like most other self-service analytics solutions today, requires a significantly different data structure than traditional business intelligence (BI) solutions. Modern solutions are designed to work directly with relational data stores, while traditional BI relied on aggregation to produce output files that humans could parse.
The following is an example of how this change in data need can be deployed in a complex business environment.
Tableau has just been brought into Ice Cream Co., a company that is largely run as silos by each of its regions across the globe. The corporate inventory manager has decided to embrace Tableau, and the regional managers are also on board. They must now bring the topic to IT so that IT can build the underlying data structure that will initially support corporate and the 5 US departments, with foreign departments to follow.
The current environment has complex processes, largely unknown to the business, that generate reports every month, which each of the teams then filters up to decision makers so that Ice Cream Co.’s inventory levels stay in line with demand. Historically, each of these processes needed to be built independently for each team so that the team could do its reporting. This caused IT to develop and maintain near-identical reports, queries, and data tables for each of the departments. So when IT was approached, they gave development estimates based on these historical procedures: 2 weeks per data source, per department.
What is proposed instead is a single data source per type of data. You tell IT that monthly reports no longer need to be generated, because you are going to work with the data in a more dynamic way. By creating one data table for current inventory, a second for orders, and a third for the production schedule, you can create a single data source for corporate that each of the departments will be able to use directly. Following the initial build, the only new development will be the additional feeds coming into the system when the international departments come on board. This shortens IT's development time by removing the need to duplicate and maintain parallel systems, so that they can focus solely on new data source integration and the overall health of the data environment.
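The consolidation idea can be sketched in a few lines of pandas. This is purely illustrative: the table names, flavors, and column names are hypothetical, not taken from Ice Cream Co.'s actual systems.

```python
import pandas as pd

# Two hypothetical regional inventory feeds that IT previously
# maintained as separate, near-identical tables.
northeast = pd.DataFrame({"Flavor": ["Vanilla", "Chocolate"],
                          "GallonsOnHand": [120, 95]})
southwest = pd.DataFrame({"Flavor": ["Vanilla", "Strawberry"],
                          "GallonsOnHand": [80, 60]})

# Tag each feed with its department, then append everything into one
# shared inventory table that corporate and every department can use.
feeds = {"Northeast": northeast, "Southwest": southwest}
inventory = pd.concat(
    [feed.assign(Department=name) for name, feed in feeds.items()],
    ignore_index=True,
)
# Onboarding an international department later is just one more feed
# appended to the same table, not a new parallel system.
```

The design point is that each new region adds rows to an existing structure rather than requiring a new set of tables, queries, and reports.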
This proposal is possible because Tableau removes the need for report-style output. Each of the departments works with similar data, and when that data is in a tidy structure (see below), every department can run its analytics from a shared source. Additionally, because Tableau allows you to design dynamic dashboards, reporting efforts can be reduced and standardized. This removes a significant amount of repetitive manual work from the analysts so that they can focus on identifying ways to improve the business.
“Tidy”, “denormalized”, “computer friendly”, or “tall” data refers to a structure in which each column holds a single variable and each row a single observation. Tidy data is often not human friendly, and thus not common in most traditional BI applications. It works well with solutions like Tableau because it is structured in a very computer-friendly way; it is also often how data is stored in databases, because new data adds rows rather than columns, which keeps the structure stable over time.
The opposite of tidy data is “normalized”, “human friendly”, or “wide” data. This format is often described as highly readable and user friendly because each row forms a conceptual line of data that can be read across. Data tables in traditional reporting are commonly presented as wide data.
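The move between the two structures is a simple reshape. A minimal sketch using pandas (the table contents and column names are hypothetical, and the reshape itself is something Tableau's pivot feature can also do):

```python
import pandas as pd

# Wide, human-friendly layout: one row per flavor, one column per month,
# like a traditional monthly inventory report.
wide = pd.DataFrame({
    "Flavor": ["Vanilla", "Chocolate"],
    "Jan": [120, 95],
    "Feb": [110, 102],
})

# melt() unpivots the month columns into rows, producing the tall/tidy
# form: one column per variable, one row per observation.
tidy = wide.melt(id_vars="Flavor", var_name="Month",
                 value_name="GallonsOnHand")

# pivot() reverses the reshape, back to the wide report-style layout.
wide_again = tidy.pivot(index="Flavor", columns="Month",
                        values="GallonsOnHand")
```

In the tidy form, every gallon count lives in a single `GallonsOnHand` column, which is exactly the shape a shared Tableau data source expects.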
Examples of both data structures are below.
Gallons on Hand