Are you interested in migrating to the new Tableau data model? Here is a guideline
The new Tableau Desktop version 2020.2 innovated data modeling significantly with its newest feature: Relationships. How does the way Tableau sends queries to the database change and when is it worth rebuilding existing models? We will answer these questions for you! Many customers already work extensively with Tableau and have created data models in use. Converting to the new data model takes time and resources – however in certain circumstances it may be highly beneficial to rebuild in order to optimize the efficiency of the existing workbook.
Three modeling scenarios
We will give a recommendation on the usefulness of a conversion, depending on the data modeling type. In order to further empathize this, we prepared three different modeling scenarios.
- The connection of data on the same aggregation level
- The connection of data on a different aggregation level with a unique connection field for both tables
- The connection of data on a different aggregation level with several connection fields for both tables
A connection on the same aggregation level is a 1:1 relationship between data. This connection is made by a join: Two tables are connected to each other by the same values of a column, the join criterion. In a 1:1 relationship, the join criterion is unique in both tables.
Previously, 1:1 joins in Tableau were implemented using a physical join, in which a large, flat table is formed. In the new data model, a relationship is formed at the logical level. The aggregation level of the tables is not specified in the default setting: However, performance options can be used to set whether the characteristic is unique or ambiguous and whether referential integrity exists. The statement about the referential integrity determines whether Tableau excludes a full outer join from the outset or whether an inner join is formed directly.
Connections on the same aggregation level
For the comparison of connections on the same aggregation level, 1st a join on the physical level, 2nd a relationship at the logical level in the default setting and 3rd a relationship at the logical level including performance options are compared, from which the following findings can be derived:
- The SQL main queries of the three models are the same
- The number of queries differs only in terms of performance optimization in relationships (one query less)
A data model with a 1:1 relationship can be built as a join in the physical layer as well as a relationship in the logical layer. If possible, performance options should be set (these are also possible with a physical join). An adjustment of the data model with 1:1 joins is therefore not necessary. However, this statement is only true if all tables are always used in the report! A join on the physical level is always executed, even if only some fields of a table are retrieved. Not however on the logical level: There, only the table that is actually used in the visualization is addressed. Accordingly, the join is only executed when required.
Connections on different aggregation level
Connections at different levels of aggregation can be based on both a 1:n relationship, one record in Table A corresponds to multiple records in Table B, and an n:n relationship, multiple records from Table A correspond to multiple records from Table B. In a 1:n relationship, the table with higher granularity must be aggregated to the level of the other table. Previously, this was solved in Tableau using either data blending or Level of Detail (LoD) calculations. Tableau's new data model automatically handles this aggregation for the user, eliminating the need for the user to ensure that metrics are calculated at the correct level of aggregation. This is important to avoid unintentional multiplications of data and thus incorrect figures in a report.
In the following, the queries of three possible data connections are compared and the results analysed:
1. Data blending
2. LoD calculations
3. Tableau relations
- The queries of the three data models are very different
- Blending seems to be very performant for small data sets due to the independent table queries but can become very slow for large data sets due to the filter condition used. It is important to note that data blending is always accompanied by technical limitations
- LoDs trigger nested and complex queries that are expected to perform poorly on large data sets
- When using a relationship, the analysis shows short and efficient queries on both tables, by aggregating the corresponding key figures. Subsequently, the results are joined.
For 1:n relationships, it is recommended to switch to Tableau's new data model because the queries are efficient even with large amounts of data and there are no technical limitations in dashboard creation. Also, unlike data blending, where tables are published separately, it is possible to provide a data source that contains all logical connections between tables.
In the third scenario, n:n relationships are examined. Here the join criterion is ambiguous in both tables and must be aggregated on both sides. The examination of the SQL queries of a data blending and a relationship allowed the following conclusions:
- Blending causes a large number of queries, in addition a comparison between the minimum and maximum of the remaining used and thus possibly finer-granular fields takes place, which could possibly lead to longer query times.
- The relationship results in a two-nested query due to the necessary aggregations of both tables on the corresponding metrics, in total two (very similar) table queries can be documented.
For n:n relationships, it is also recommended to switch to Tableau's new data model, as the table queries are combined in one query and consequently efficient. Also, there are no technical limitations in dashboard creation here either.
Summary and recommendation
Our recommendation in summary: In the case of a 1:1 relationship, it is not worth switching to the new Tableau data model, as the queries are largely identical. If, for the most part, only a visualization with a connection to one of the two tables (or selected tables in a data model) is required in a dashboard, the new data model is more efficient here as well.
In the case of data modeling with tables of different aggregation levels, it is recommended to switch to the new data model, as fewer and more efficient queries are formed here. It is important that there are no technical restrictions on the creation of the dashboard and that an encapsulated data source can be published in which the connections between the tables are already defined. It also makes sense to use the performance options if possible, as these reduce the number of queries and make the data connection consequently more efficient: Unnecessary aggregations are avoided, and desired join types are preselected.