You have probably already used a live connection to connect to your data source. But have you ever wondered, what is a data extract connection or how does a Tableau Data Extract connection work?
When you connect your data in Tableau, you have the option to choose between a Live connection and an Extract.
A live connection queries the data from the database and the data is updated every time you open your workbook.
An extract is a copy of the data that is brought in the Tableau data engine. Since it does not send the query to the source, it increases the workbook’s performance.
You can further reduce the query run time by limiting the total data in the data source.
Another advantage of using an extract is in case you go offline, you can still run your queries. Data extracts can give you more functionality, portability and increased performance for your viz.
When setting up an extract, you have many choices about what to include in the extract. You can use the whole data source; or you can use filters to take a fraction of the data. You can aggregate the data, which can make the performance of the workbook fast, as no new calculations are needed.
You can update the connection to your workbook any time; and switch from extract to live and make changes to your view, like include new data points. You should then update the extract again to make sure both the view and the extract reflects the new changes. Otherwise the extract may not have the data required for your view or it may still reflect stale data.
How to Create, Edit and Update a data extract
There are two places where you can create an extract.
You can right-click on the data source you are connected to while the worksheet is open and select ‘Extract Data’.
Or you can click on the data source tab at the bottom to go to the Data Source page. Click on the Extract radio button on the top-right to extract data.
To reduce or filter the data in the extract, click on the Edit button. After filtering your data extract, Tableau will ask you where you would like to save the TDE file.
Once back on the worksheet, the data source icon gets updated to indicate that you are using a data extract.
You can easily switch between Live and Extract connections by right clicking on the data source and selecting Use Extract.
You can also update or refresh your extract by right-clicking the data source and select Extract, then select Refresh.
How to schedule a Tableau Data Extract Refresh
Note : Make a note of the file format of the extract before you perform an extract refresh. The extract will upgrade to .hyper extract automatically if you perform a refresh on an .tde extract using version 2020.4. You will be unable to open the extract with previous versions of Tableau Desktop if it is hyper extract.
To refresh the extract incrementally, follow the below steps.
- Select a data source on the Data menu and then select Extract Data.
- In the Extract Data dialog box, select All rows as the number of Rows to extract. Incremental refresh can only be defined when you are extracting all rows in the database. You cannot increment a sample extract.
- Select Incremental refresh.
- Specify a column in the database to identify new rows. Alternatively, you can use an ID column that increases as rows are added to the database.
- When finished, click Extract.
For more details, refer to the Tableau Help page about Refresh Extracts.
When should you use Data Extract
There are many situations where a live connection isn’t practical. For instance :
- Portability : To share your viz with someone outside your organization, you can publish the extract to Tableau Server or Online. Or you can save it as a TWBX, TDSX, or sent as a TDE file.
- Performance : Since data extracts are highly optimized for queries they can be much faster than a live connection.
- Large record volumes : MS Excel is limited to 1M records. MS Access can have 100s of thousands to couple M records depending on the complexity of the table. Whereas a TDE can potentially handle billions of records.
Also, data is stored in multiple tables could be too big for a live connection but fine for a TDE.
- Limiting data exposure : If you want to share your viz without exposing the underlying data, TDE can be quite helpful. There are many ways you can handle it. You can create extract filters on TDEs to only include the necessary records. You can set up the extract to only include fields used in the workbook. Also, you can configure extracts to aggregate the data and therefore hide record-level detail.
- For file-based sources, such as Excel, when we include the files in a TWBX, it’s the whole file. If we extract the data then only the necessary data for the workbook is in the TDE.
- Size of data volumes : TDE is highly compressed, so it can be a lot smaller than the original uncompressed source.
- Supported calculations and function : Data extracts generally support more functions than other data sources (with the exception of RAWSQL functions).
- Computation complexity : There are various computations that TDEs can handle in combined ways that some data sources can’t.
More advantages …
- Materialized expressions. Tableau materializes record-level calculations as fully indexed & compressed fields in an extract. These are the ones that use only fields from a single data source and are not dependent on run-time values . This can improve performance in many cases. For example when splitting name or address fields and/or creating datetime fields out of strings.
- Access to cloud-based data sources. In order to make cloud-based sources such as Salesforce, Google Analytics, etc, we have to use Tableau data extracts.
- Option to publish to Tableau Public. For performance reasons we can only use TDEs when publishing to Tableau Public.
Limitations to using Data Extracts
There are some limitations to using data extracts too.
To begin with, that the data doesn’t update automatically. If your data source changes, those changes won’t reflect in the extract. You will have to refresh it manually or schedule periodic data refreshes.
Another is that if you are using a filtered data extract, not all the data points would be available. So if you search something which exists on the data source but not on the extract, you won’t find it in the workbook.
Changing the data structure of the underlying data can require rebuilding the entire TDE. This part may not be very easy or take too much time.
Tableau Data Extracts do not support RAWSQL functions, nor can we use Custom SQL on an already-created extract. One use case for RAWSQL is when the underlying data source supports a given function and Tableau does not yet support that feature for that source.