All about Data Blending in Tableau
Prerequisites for data blending
Differences between joins and data blending
Blend your data
Data blending limitations
Data blending is a method for combining data that supplements a table of data from one data source with columns of data from another data source.
Usually you use joins to perform this kind of data combining, but there are times, depending on factors like the type of data and its granularity, when it's better to use data blending.
For example, suppose you have transactional data stored in Salesforce and quota data stored in an Excel workbook. The data you want to combine is stored in different databases, and the granularity of the data captured in each table is different in the two data sources, so data blending is the best way to combine this data.
Data blending is useful under the following conditions:
- You want to combine data from different databases that are not supported by cross-database joins.Cross-database joins do not support connections to cubes (for example, Oracle Essbase) or to some extract-only connections (for example, Salesforce). In this case, set up individual data sources for the data you want to analyze, and then use data blending to combine the data sources on a single sheet.
- Data is at different levels of detail.Sometimes one data set captures data using greater or lesser granularity than the other data set.For example, suppose you are analyzing transactional data and quota data. Transactional data might capture all transactions. However, quota data might aggregate transactions at the quarter level. Because the transactional values are captured at different levels of detail in each data set, you should use data blending to combine the data.
Use data blending instead of joins under the following conditions:
- Data needs cleaning.If your tables do not match up with each other correctly after a join, set up data sources for each table, make any necessary customizations (that is, rename columns, change column data types, create groups, use calculations, etc.), and then use data blending to combine the data.
- Joins cause duplicate data.Duplicate data after a join is a symptom of data at different levels of detail. If you notice duplicate data, instead of creating a join, use data blending to blend on a common dimension instead.
- You have lots of data.Typically joins are recommended for combining data from the same database. Joins are handled by the database, which allows joins to leverage some of the database’s native capabilities. However, if you're working with large sets of data, joins can put a strain on the database and significantly affect performance. In this case, data blending might help. Because Tableau handles combining the data after the data is aggregated, there is less data to combine. When there is less data to combine, generally, performance improves.Note: When you blend on a field with a high level of granularity, for example, date instead of year, queries can be slow.
Prerequisites for data blending
Your data must meet the following requirements in order for you to use data blending.
Primary and secondary data sources
Data blending requires a primary data source and at least one secondary data source. When you designate a primary data source, it functions as the main table or main data source. Any subsequent data sources that you use on the sheet are treated as a secondary data source. Only columns from the secondary data source that have corresponding matches in the primary data source appear in the view.
Using the same example from above, you designate the transactional data as the primary data source and the quota data as the secondary data source.
Note: Cube (multidimensional) data sources must be used as the primary data source. Cube data sources cannot be used as a secondary data source.
Defined relationship between the primary and secondary data sources
After designating primary and secondary data sources, you must define the common dimension or dimensions between the two data sources. This common dimension is called the linking field.
Continuing the example from above, when you blend transactional and quota data, the date field might be the linking field between the primary and secondary data sources.
- If the date field in the primary and secondary data sources have the same name, Tableau creates the relationship between the two fields and shows a link icon () next to the date field in the secondary data source when the field is in the view.
- If the two dimensions don’t have the same name, you can define a relationship that creates the correct mapping between the date fields in the primary and secondary data sources.
Differences between joins and data blending
Data blending simulates a traditional left join. The main difference between the two is when the join is performed with respect to aggregation.
Left join
When you use a left join to combine data, a query is sent to the database where the join is performed. Using a left join returns all rows from the left table and any rows from the right table that has a corresponding row match in the left table. The results of the join are then sent back to and aggregated by Tableau.
For example, suppose you have the following tables. If the common columns are User ID and Patron ID, a left join takes all the data from the left table, as well as all the data from the right table because each row has a corresponding row match in the left table.
Data blending
When you use data blending to combine data, a query is sent to the database for each data source that is used on the sheet. The results of the queries, including the aggregated data, are sent back to and combined by Tableau. The view uses all rows from the primary data source, the left table, and the aggregated rows from the secondary data source, the right table, based on the dimension of the linking fields. Dimension values are aggregated using the ATTR aggregate function, which means the aggregation returns a single value for all rows in the secondary data source. If there are multiple values for the rows, an asterisk (*) is shown. Measure values are aggregated based on how the field is aggregated in the view.
You can change the linking field or add more linking fields to include different or additional rows of data from the secondary data source in the blend, changing the aggregated values.
For example, suppose you have the following tables. If the linking fields are User ID and Patron ID, blending your data takes all of the data from the left table, and supplements the left table with the data from the right table. In this case, not all values can be a part of the resulting table because of the following:
- A row in the left table does not have a corresponding row match in the right table, as indicated by the null value.
- There are multiple corresponding values in the rows in the right table, as indicated by the asterisk (*).
Suppose you have the same tables as above, but the secondary data source contains a new field called Fines. Again, if the linking fields are User ID and Patron ID, blending your data takes all of the data from the left table, and supplements it with data from the right table. In this case, you see the same null value and asterisks in the previous example in addition to the following:
- Because the Fines field is a measure, you see the row values for the Fines field aggregated before the data in the right table is combined with the data in the left table.
- As with the previous example, a row in the left table does not have corresponding row for the Fines field, as indicated by the second null value.
Blend your data
You can use data blending when you have data in separate data sources that you want to analyze together on a single sheet. The following example demonstrates how to blend data from two data sources: an Excel data source and an SQL Server data source.
To blend your data
Data blending limitations
There are some data blending limitations around non-additive aggregates, such as COUNTD, MEDIAN, and RAWSQLAGG. For more information, see Troubleshoot Data Blending.
Comments
Post a Comment