Tuesday, September 17, 2019

Two methods for date scaffolding in Alteryx

What is scaffolding?

There are a variety of blogs online which talk about scaffolding and why it's important, such as this post by The Information Lab. But essentially scaffolding is used to infill dates when the date might be missing from your date, for example you might be looking at staff turnover and you know their start date, and if they are a leaver their end date, but you will also want to know the dates in-between.

So how can you do this in Alteryx?

There are two methods in Alteryx (and potentially other ways as well as there are many routes to solving a problem in Alteryx) which I will walk through. The first is the way I've tended to always do it and the second is a new method I recently discovered, which also has additional benefits.

For the examples the data is structured in the following format:




First method -  Generate Rows



In this first example I use a cross table tool to get one record for my data, with two fields, one for the Start date and a second for the End date. Once shaped my data I use the Generate Rows tool. This enables me to, using a DateTimeAdd formula, create new records for each date between my start and end date. The set up for this is shown in the configuration pane:

Within the config pane, we set up:
1) That we want to create a new field called 'Dates' and that the field type is date
2) The initialisation expression, which is the Start Date, i.e. we want to start from this date
3) The condition is that the new 'Dates' we create are less than or equal to (<=) the End date
4) We loop through creating a new record until the condition in (3) is met by adding 1 day to the 'Dates'. In here we can specify 'seconds' (n.b. the field type would need to be DateTime) through to 'years'.

Then once we run the workflow we will have a record for each day between our start and end dates.

Second Method - TS Filler

The second method uses a macro built in to Alteryx called TS Fill*.

* Despite the tool showing that it is built using R code, shown by the R icon in the bottom left, there is actually no R code in the tool and it mostly uses the Generate Rows method as outlined above, and uses interface tools to increase the functionality of the tool!

With this tool you don't need to cross tab your data to get a field for the start and end date, instead it 'fills in' the missing dates. Therefore using the example data it will treat the first record as the start date and the second record as the end date.

In the config pane you can see that you now have interface tools to select how you want the TS Fill tool to behave.


To set the tool up:
1) Select the Date / DateTime field that you want to create the scaffolding for
2) What you interval is. (N.B. As mentioned above in the * note, this actually uses an interface tool to configure the behaviour of the Generate Rows tool, so if you select 'Years' it will update the formula in step 4 of the Generate Rows tool to be DateTimeAdd([Start],1,'years').
3) Set the increment, so you can deal with creating say fortnightly data by setting the interval to weeks and the increment to 2.

And there you have it, running the workflow will create each day between your start and end date.

Additional functionality of TS Fill

So apart from the TS Fill tool providing a user interface to configure, what lies behind it being the Generate Rows tool, which some may find it being a little easier to use if you find the DateTimeAdd function a bit fiddly to use. The TS Fill tool is useful in 'filling in' any missing dates. So if you find that your data extract is say from a sales system, you may only have records in your data for days where sales occured. So if you want to have a complete set of dates you will need to use the TS Fill tool to add in any missing days, you can then add in those missing date either with null values or depending on your needs to additional analysis to impute (say using average) what those values are.

So if like me you previously did scaffolding using generate rows, hopefully you can now add another tool to your workflows which is a bit more flexible.




No comments:

Post a Comment