Sunday, June 14, 2020

Three ways to enable / disable a tool container in an Analytic App

On Friday I posted on Twitter about how I found the configuration of tool container behaviour a bit counterintuitive.

When connecting a radio button or check box to a tool container with an action tool, the fixed behaviour is to disable the tool container!

Often when I create analytic apps I want there to be an optional section of the workflow that a user can choose to run or not. Examples would be saving the output file, creating a Tableau workbook, selecting a different API endpoint, etc.

So this blog goes through my initial approach and then two alternatives suggested by Nicole Johnson (@The_Real_NJ) and Chris Lewis (@4loveofAlteryx).

Dummy radio button method - my approach

To account for this behaviour, the way I usually set it up is to have a second radio button on the canvas. I'll call this a dummy radio button as it doesn't connect to anything; however, it does appear in the user interface of the analytic app.

The advantage of doing it this way is that selecting the 'Enable the container' radio button forces the other radio button to be deselected, so it doesn't disable the container. And voila, it gives the user an option to enable / disable (or should I say disable / enable!) a part of the workflow.

However, the reason for writing this blog post is that I wanted to share two additional ways to control the behaviour of a tool container via radio buttons / check boxes, both of which follow the logic I would have expected.

Conditional method - Nicole's approach

So the method Nicole suggested is to use a Condition tool.

The way this is configured on the canvas means you can have a single check box in the analytic app interface, where ticking the box enables the tool container and unticking it disables the tool container.

Then you need to set up the condition tool as follows:

Here you just set the input connection from the check box [#1]; when the check box is ticked the condition is true, and when it is unticked the condition is false. Then, after the condition tool, you add an action tool on each of the T(rue) and F(alse) outputs.

It is here where I think the set up makes sense to the user, as you have the option to set the behaviour of the action to either Enable or Disable the connected container.

Set up of the action tool after the true condition

Set up of the action tool after the false condition

Update formula method - Chris Lewis' approach

The method Chris suggested was to use the "Update Value with Formula" option.

To be honest, when exploring how to do this I didn't really know how to set this one up. But looking into the way Alteryx set up the example Action tool workflow (right click on a tool and select 'Open Example'), it turns out this is how they do it!

How to open an example workflow

Within the Action Tool example is a tool container showing how Alteryx set up the configuration to enable/disable tool containers.

So looking at the configuration in the example, I was able to discover that you can update the value of the Disabled property from False (i.e. the container is enabled) to True (to disable the container) by using a simple IIF statement: IIF([#1], "False", "True").
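
To see what that formula is effectively doing, here's a small Python sketch of the same idea: flipping the container's Disabled setting in the workflow XML based on the interface value. The element and attribute names below are illustrative of how a tool container's configuration appears inside a .yxmd file, so treat them as an assumption rather than the exact schema.

```python
# Sketch of what the "Update Value with Formula" action effectively does:
# flip the tool container's Disabled setting based on the interface value.
import xml.etree.ElementTree as ET

def set_container_disabled(workflow_xml: str, checked: bool) -> str:
    """Mimic IIF([#1], "False", "True"): checked -> enabled (Disabled=False)."""
    root = ET.fromstring(workflow_xml)
    for disabled in root.iter("Disabled"):
        disabled.set("value", "False" if checked else "True")
    return ET.tostring(root, encoding="unicode")

# Illustrative configuration fragment, not the full .yxmd schema.
sample = '<Configuration><Disabled value="True" /></Configuration>'
print(set_container_disabled(sample, checked=True))
```

The ternary mirrors the IIF: a ticked box produces Disabled="False" (container runs), an unticked box produces Disabled="True".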

Like the method suggested by Nicole, this approach also follows my expected logic for controlling the container.

The benefit of this approach is that it requires fewer tools on the canvas, which suits me as I typically move all my interface tools to the top of the workflow and use wireless connections. However, some may find the syntax of the IIF statement a bit confusing, as [#1] evaluates to true when the radio button / check box is selected (turned on) and false when deselected (turned off). Using the condition tool with two distinct actions, by contrast, makes it clear what happens when it's selected / deselected.

Hopefully one of these three methods works for you, and remember to consider how easily other users will be able to review the workflow and understand what it does.

Friday, May 1, 2020

Plotting UK population data using Alteryx and Tableau

While I'm not a geographer and don't have any real GIS background, I was previously a transport planner and needed to make use of tools like MapInfo. So the ease and intuitiveness of spatial analysis in Alteryx has always pleased me.

For a recent proof of concept I needed to get UK population data to enrich a financial dataset I had, so after a quick Google I stumbled across this dataset -

Upon downloading the file I discovered it was in the ESRI .asc file format, which isn't a spatial file type that can natively be read into Alteryx. But by reading it in as a csv file with no delimiters, I could see that it looked like a grid of numbers with some header information.

This helpful Wikipedia article helped explain what the values in the header represented, and I knew I could get on my way with Alteryx'ing my way through this.
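
As a rough illustration (not part of the original workflow), the six standard header lines of an ESRI ASCII grid can be read with a few lines of Python. The column and row counts below match the ones used later; the corner coordinates are placeholders.

```python
# Hedged sketch: read the six standard ESRI ASCII grid header lines
# (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value).
def parse_asc_header(lines):
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header

sample = [
    "ncols 652",
    "nrows 1211",
    "xllcorner 0",       # placeholder corner value
    "yllcorner 0",       # placeholder corner value
    "cellsize 1000",     # 1 km grid
    "NODATA_value -9999",
]
print(parse_asc_header(sample))
```

Everything after these six lines is the grid of population values itself.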

Step 1: Create spatial points on a grid

My first step was to make sure I'd interpreted the values correctly, so I sketched out the boundary points on the grid:

Converting these to points in Alteryx (using the British National Grid system) and then creating a polygon showed me that I had a polygon covering the entire UK.

Initially I thought the next step would be to make a grid from that polygon. However, when I did this I ended up with more grid cells than I expected. I wasn't 100% sure why, but given that the polygon in the image above visually isn't a true rectangle, I assumed the reason was the projection. Therefore I needed to find another way of creating my grid...

Using two Generate Rows tools it was possible to create the mid points for my 652 columns (x axis) and 1,211 rows (y axis), remembering to step in by 500m. Appending these together resulted in 789,572 records which I could then convert to points.
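
The two Generate Rows tools plus the Append essentially do the following, sketched here in Python (the lower-left corner values are placeholders for the header values):

```python
# Sketch of the two Generate Rows tools: build cell mid-points by stepping
# in 500 m from the lower-left corner and advancing 1 km per cell.
def grid_midpoints(xll, yll, ncols, nrows, cellsize=1000):
    half = cellsize / 2
    return [
        (xll + half + col * cellsize, yll + half + row * cellsize)
        for row in range(nrows)
        for col in range(ncols)
    ]

points = grid_midpoints(0, 0, ncols=652, nrows=1211)
print(len(points))  # 652 * 1211 = 789572 mid-points
```

The nested comprehension is the equivalent of the Append Fields tool crossing every x value with every y value.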

Step 2: Parse out the .asc file

After removing the header rows, it was clear to see the rest of the .asc file was a big grid of numbers which needed parsing out. Using the multirow formula tool and splitting text to rows on the space delimiter it was possible to create a table like below:

This table could then be joined onto the spatial data, based on matching the X and Y fields.
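
A Python sketch of the same parsing logic, with made-up values and assuming -9999 as the no-data value from the header:

```python
# Sketch of the Text To Rows / Multi-Row Formula step: turn the grid of
# numbers into one record per cell with its column and row index.
# Note .asc rows run top-down, so row 0 is the northernmost row.
def parse_grid(body_lines, nodata=-9999.0):
    records = []
    for row, line in enumerate(body_lines):
        for col, token in enumerate(line.split()):
            value = float(token)
            if value != nodata:
                records.append((col, row, value))
    return records

body = ["10 20 -9999", "5 -9999 7"]
print(parse_grid(body))  # [(0, 0, 10.0), (1, 0, 20.0), (0, 1, 5.0), (2, 1, 7.0)]
```

The (col, row) pair is what gets joined back onto the X and Y indices of the spatial points.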

Step 3: Create 1km polygons

The grid created in the two steps above got me most of the way there, but the limitation was that each population data point was just a dot on the map. To enable spatial analysis later on, this needed to be converted to a polygon representing the 1 sq km grid cell the population is distributed over.

The technique I used to do this was to create a 0.7km non-overlapping trade area which results in a map looking like this:
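
The 0.7km radius works because it's just over half of a 1km cell's diagonal (√2/2 ≈ 0.707km), so removing the overlaps trims the circles back to roughly square cells. An alternative, sketched below in Python rather than Alteryx, is to build each 1km square directly from its mid-point:

```python
# Alternative sketch to the 0.7 km trade-area trick: since each point is a
# cell mid-point, the 1 km square can be built directly from its corners.
def square_from_midpoint(x, y, cellsize=1000):
    half = cellsize / 2
    return [
        (x - half, y - half),  # lower-left
        (x + half, y - half),  # lower-right
        (x + half, y + half),  # upper-right
        (x - half, y + half),  # upper-left
    ]

print(square_from_midpoint(500, 500))
```

Either way you end up with one polygon per populated grid cell, ready for a spatial match.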

Then all that was left to do was to export the file.

This could then be used either again in Alteryx along with spatial match to calculate the population living within a catchment area of a store, or presented in Tableau to understand changes in population density.

Which results in a nice looking map which is much easier to interpret than the gridded numbers in the original text input.

All in, this workflow took about 1 hour to build, including the research on ESRI .asc file formats and the initial experimentation to understand which spatial tools to use.

Note: The data used in the graphics contains data supplied by Natural Environment Research Council.
©NERC (Centre for Ecology & Hydrology). Contains National Statistics data © Crown copyright and database right 2011.

Wednesday, February 26, 2020

How to find out how people are using Alteryx (without an Alteryx server)

At the London User Group I previously spoke about the 'phone home' feature, which can be turned on to help people in a tech-enablement role understand how people are using Alteryx Designer by sending usage data back to an Alteryx Server.

Feedback from that session was that this is a very helpful feature, but not everyone has access to Server, so how else can you see how Alteryx is being used?

For this blog, I am going to cover two approaches:

1) Tool usage

This is a hidden example built into Designer. It is a macro which searches through the .yxmd/.yxmc files (these are just XML text files) on a user's machine to identify which tools are being used within a workflow.

To access the macro, in Designer go to [ ] and open up the example workflow. Once you are in the example workflow, it’s just a case of selecting the folder where the Alteryx workflow files are saved and running the workflow. The output of this macro will give you details on the tool category, tool name, and the workflow name.

From here you can do some further analysis, such as filtering out the standard macros which come with the Alteryx install, getting a total count of workflows a user has built, the tool categories a user focuses on, etc. Running this now on my machine, I have learnt that I have built over 2,000 workflows, that my favourite tool is the Formula tool with over 15,000 uses, and that I have built over 100 macros!

In addition to getting tool usage stats, I have also found this macro really useful for locating the workflow where a particular tool has been used. For example, I had a colleague who wanted examples of when a particular tool could be used (in this case the map input tool). It would be pretty hard to put your hands on the right workflow straight away, but by running the macro you can filter (it's great that filtering in the browse pane is now possible!) to that tool, find the workflow name and load it up!

If your organisation has users save their workflows on a network drive, point the macro at that network drive and away you go. One limitation of the example workflow straight out of the box is that while you will get details about the tools being used, you won't know who built them. This can be solved, though (if your users are following good governance), as the workflow has meta info about the author which, like the details of the tools, is embedded in the XML, so this just needs parsing out as well.
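
For anyone curious what that parsing looks like, here's a hedged Python sketch of the same idea. The element names (GuiSettings/Plugin for tools, MetaInfo/Author for the author) are based on inspecting a .yxmd file and may vary between Designer versions.

```python
# Sketch of what the hidden example macro does: scan a .yxmd/.yxmc file's
# XML for each tool's plugin name, plus the workflow author from the
# meta info. Element names are assumptions based on one .yxmd file.
import xml.etree.ElementTree as ET

def tools_and_author(xml_text):
    root = ET.fromstring(xml_text)
    tools = [
        gui.get("Plugin")
        for gui in root.iter("GuiSettings")
        if gui.get("Plugin")
    ]
    author = root.findtext(".//MetaInfo/Author", default="")
    return tools, author

sample = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1"><GuiSettings Plugin="AlteryxBasePluginsGui.Formula.Formula" /></Node>
  </Nodes>
  <Properties><MetaInfo><Author>Jane Analyst</Author></MetaInfo></Properties>
</AlteryxDocument>"""
print(tools_and_author(sample))
```

Run this over every file found in the chosen folder and you have the raw data for the usage stats described above.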

Collecting this information can provide useful insights into the workflows being built by users; from a tech-enablement perspective it can help identify which tool categories each user relies on, so you can provide advanced training on those tools or introduce other parts of the Alteryx palette.

2) A hidden macro within a workflow

The first example is great for getting a lot of information about the tools being used; however, to attribute tool usage to a specific user it relies on user details being populated within the workflow metadata configuration pane.

In addition, you get details of the individual workflows being produced, but no information about the frequency of use. For example, if there is a macro which makes use of an API, the first approach will tell you that a user has a workflow called, say, 'External Provider API.yxmc', but there is no way to tell how frequently they use the external provider's API. This is where the second approach has its benefits.

The second approach is to create a macro (I do it this way as it is then easy to embed into a workflow or another macro) which leverages the 'whoami' command in Windows Command Prompt to identify the user, Workflow Constants to get the workflow name, and the DateTimeNow tool to get a timestamp.

The Run Command tool is used to capture the user details from their Windows login, so it's helpful if this is a unique identifier such as a staff ID or email address (set up below).


This workflow then writes out the information collected at the time of the workflow run to an Excel file (as you can append new data to an existing file), but this could quite easily be to a database or another file type.
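
Outside Alteryx, the same record could be captured with a few lines of Python. Here getpass stands in for the 'whoami' Run Command, the workflow name and file name are made up for illustration, and a CSV replaces the Excel output.

```python
# Sketch of what the logging macro records on each run: who ran it,
# which workflow, and when, appended to a shared log file.
import csv
import datetime
import getpass

def log_run(workflow_name, log_path="alteryx_usage_log.csv"):
    row = [
        getpass.getuser(),                                      # 'whoami' equivalent
        workflow_name,                                          # from Workflow Constants
        datetime.datetime.now().isoformat(timespec="seconds"),  # DateTimeNow equivalent
    ]
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return row

print(log_run("External Provider API.yxmc", log_path="usage_demo.csv"))
```

Appending one row per run is what turns the first approach's static inventory into actual frequency-of-use data.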


So this blog sets out two ways in which you as a user, or an organisation, can use tools available directly within Alteryx Designer to monitor usage of Alteryx and understand how your team are making use of it.

The techniques here do not require Alteryx Server to capture this information. However, they have limitations, such as needing users to save their workflows to a location you can access, and ensuring they embed the macro from the second approach into their workflows. The Server-based approach captures much more meta information without relying on these factors, which makes it a richer dataset for understanding usage.

Saturday, November 16, 2019

TC19 Wrap up

So that's a wrap, TC19 has finished! So what did I get up to and what were my thoughts on the conference?

For me my conference was broken into the following session types:

Hands on training;
Certification; and
Other demonstration sessions.

So here are my thoughts on the above.


Opening Keynote

Wow, just wow! The sheer number of Tableau users who come to Conference, and the size of the venues, always surprises me. This year Conference was at the Mandalay Bay hotel in Las Vegas, in the arena that normally hosts concerts (I think the capacity was 7,000). And the opening of the keynote lived up to all the theatre of a concert!

The first part of the keynote was delivered by Adam Selipsky and focused on building a data culture. It drew an analogy between building a data culture and the war effort of the code breakers in WWII, and highlighted how important diversity is in a data culture by focusing on some of the key women in the code-breaking effort, as told in the book Code Girls by Liza Mundy (who was also present in the opening keynote and received a standing ovation).
This focus on a data culture brought out the elements of proficiency, including everyone, agility and community, which led into the introduction of Tableau Blueprint, Tableau's framework and resources for building out a data culture in your organisation.

I’m keen to look at the resources available, as part of my role over the last 2+ years has been in tech-enabling my team and building an internal community. So hopefully it will validate the approach we’ve taken and introduce some resources to help us further on this journey.

There was then a fireside chat with Marc Benioff, the Salesforce CEO. When the conference was held in Berlin earlier this year, Salesforce had just announced their intention to acquire Tableau; now, with the deal finalised, there was a lot more available around the strategy of the acquisition, although it was a very high-level set of initial thoughts. It will be interesting to find out if more is said at Salesforce's Dreamforce event next week.

The final part of the opening keynote focused on some of the new features which have either just been released or are likely to be part of the next couple of releases.

Key updates from this that were exciting for me were:

 - Prep Builder moving to the browser: this will mean that data preparation, visualisation and publication can now be done end to end in the cloud. This will be important along with the AWS partnership to help scale deployments and potentially (and this is me speculating) the ability to add different tiers to the subscription model, where an online Prep and Desktop with a reduced feature set could come at a lower price point.

 - Ask Data and Explain Data: I've put these two together as I see them working together to further democratise data analysis, both being go-to tools to assist with initial data discovery and to help consumers of dashboards interact with the content and delve deeper into the insights.

- Data Model: Like the above, this will help democratise data analysis. Often users get confused about how data should be related, and about joining data at different levels of aggregation. These are all things that experienced 'data people' will be familiar with and know how to deal with, such as the different types of joins and the use of LOD expressions to roll granular records up to a more aggregate level. I first saw this last year at the New Orleans conference and was initially disappointed: while I totally get the benefits of making it easier for people to do the analysis, and not everyone is currently a 'data person', I thought it hid what was going on under the hood and could make people feel they didn't need to understand important concepts in data engineering. However, seeing it for the second time I'm much more open to the Data Model features, as they can act as a stepping stone by taking away some of the initial complexities; as users become more familiar with data engineering, they can then look under the hood and make changes if required.

All of the features stuff was just a teaser for the second day's Devs on Stage content.

Devs on Stage

As a precursor to the Devs on Stage session, I bumped into Andrew Beers (CTO) on the morning 5km run. When running with Andrew I didn't know he was CTO; I just thought he was a product developer. That ignorance was probably a good thing, as we chatted about a broad range of topics instead of focusing on the roadmap, which was covered a few hours later in the keynote.

Here's the whole list of what was covered in the keynote, with me expanding on the areas I'm excited about or where I thought things could be implemented in a different way.

Visualisations & Web
 - Mark animations
The lack of animations in Tableau has been a missing feature for a while now, and is one of the reasons people love D3 so much. As Hans Rosling showed in his presentations (TED Talk), animation is a very important pre-attentive attribute when showing movement over time or changes in position. Having had a play with mark animations in the alpha release, it looks like Tableau's implementation will make presentation of dashboards very engaging.

 - Pages play button on the web
 - PDF subscriptions
 - Extract refresh based on schedules
 - Public - web editing
 - Public - Web Explore
I can see this being an easily misused feature. It's great that you can very easily open up another person's workbook to get under the hood and understand how it was built, but given you can now also edit on the web and then save it to your profile (yes, it automatically attributes it to the original author), how many duplicate versions are we going to see of people's IronViz submissions or weekly challenges like Makeover Monday? The plagiarism of visualisations on Public has already been much discussed on social media, including this post by Hesham (IronViz winner).

 - Tooltip editing in the browser

Prep Builder 
 - List view
 - Fixed LOD
 - Rank 
 - Reusable steps
This will allow multiple steps which may be common in your flows to be saved to Server and reused. Coming from a background with a lot of Alteryx experience, where this concept is a standard macro and common steps with a number of tools can be collapsed down into a single tool in the workflow, I was a bit disappointed to see that Prep instead inserts all of the steps that make up the saved flow, so it doesn't save screen real estate.

 - Incremental refresh

Data Modelling
I've already covered my thoughts on data modelling above, and I would encourage you to re-watch the keynote for this section, as it is much easier to see a visual representation of the Data Model than to read the functionality written out here.

 - Relationships
 - Automatic LOD
 - Automatic Join
 - Multi-Fact Schema Support

Server administration
 - Custom welcome message
 - Custom login message
 - Login based licences
The change here means that when you change the persona of a user on Server, it will allocate them the corresponding licence; if you change someone to Creator they will be given a Creator licence, allowing them to use Desktop to build their own dashboards. However, what wasn't clear from the keynote was how this works in practice with licence agreements. For example, when converting someone to a Creator, does it allocate a licence from an asset pool managed on Server, and if there are no available Creator licences will it go off and buy one? Equally, if a user moves from Creator to Viewer, does their key get given back to be reallocated to another user? If anyone has the answer to this then let me know and I'll update the blog.

 - Resource monitoring tool
 - Content migration tool

 - Set control 
With the introduction of Set Actions last year, users could use interactivity on the dashboard to update the members of a set. This brought in very useful functionality which Bethany Lyons covered in great detail at last year's conference. But what if you don't want another sheet in your dashboard to interactively change the members of the set, and instead want an interface that looks like a multi-select filter? This is what Set Control enables; it can make for a simpler interface, but I still love the interactive option.

 - Buffer calculations
I think this has to be my favourite new feature, and I ended up chatting a lot to the developers in the Data Village about additional functionality it would be good to see with the buffer. Following the introduction of the spatial calculations in 2019.1(?) with MakePoint() and Distance(), Buffer() now enables you to create a trade area around a point and use an intersect join to spatially match two datasets (a common example being customers and stores) based on the radius of the buffer. This is something I've used lots in Alteryx with the Trade Area and Spatial Match tools, but doing it in Tableau makes it easy to change the radius of the buffer and see what that does to the number of customers in overlapping trade areas.
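
For a rough feel of what a buffer plus an intersect join is doing, here's a Python sketch (not Tableau syntax): flag whether a customer falls within a store's buffer radius using a great-circle distance. The coordinates are invented for illustration.

```python
# Rough equivalent of the MakePoint/Buffer/intersect pattern: a customer
# intersects the store's buffer when their great-circle distance from the
# store is within the buffer radius.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_buffer(store, customer, radius_km):
    return haversine_km(*store, *customer) <= radius_km

store = (51.5074, -0.1278)     # central London (illustrative)
customer = (51.5155, -0.1410)  # roughly 1.3 km away
print(in_buffer(store, customer, radius_km=5))   # True
```

Changing radius_km here is the equivalent of tweaking the Buffer() radius in Tableau to see how the customer count changes.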


 - Dynamic parameters
I think this was the feature that got the most cheers from the crowd, and it's been one of the most requested features on the community. So now it's finally arrived, albeit they didn't say which version it will be in. For those that don't know, at the moment when you create a list parameter you can complete the list by taking the values from a field in the data source. However, that list then becomes static and doesn't get updated when there is a new extract of data (imagine a list of dates). With Dynamic Parameters, the list of values in the parameter control is no longer static, so it will save having to recreate the parameter list and republish the workbook when there are new values in the list.

Hands on Training

One of the key reasons I managed to get my employer to support my attendance at conference was the ability to hand-pick training suited to my interests and skill set, which wouldn't be possible with an off-the-shelf training course. So the majority of the sessions I chose were hands on training sessions, where you have a laptop in the room and follow along with an instructor on some advanced/Jedi-level content. I picked sessions on using Python inside Tableau, the Metadata API, Jedi calcs, and advanced table calculations.

The way these sessions are set up, you have access to a virtual environment with all of the required content available (such as Anaconda already installed for the Python session) and a set of instructions to follow along with. You also get to take the data and instructions away with you, so post-conference you can reflect on what you did and practice, instead of leaving behind a saved workbook with all your advanced calculations and only memories of how to do it.

Another benefit of these sessions is that you can do things outside your knowledge and then gain an understanding of how they can be applied. Before the conference I knew nothing about the APIs available or what you would use them for. But having attended a hands on session for the Metadata API, where I used the GraphiQL interface to make GraphQL queries, I now have an appreciation of what is possible and will be looking at how this can be applied at work.


Another selling point for my business case for going to Conference was that I could take my Desktop Certified Professional exam there (at a $100 discount!). As the DCP exam is 3 hours, I preferred to do it at conference rather than at home, as you don't have the set-up time with the online proctor and for me it feels like a more formal exam experience than sitting on my sofa (which is what I had to do for my associate exam due to WiFi issues during set-up).

I'm not going to provide an overview of the DCP exam itself as there are already a lot of blog posts covering it, such as this one by Mark Edwards.

A consideration to take into account if you decide to take your exams at conference is that you sit the exam in a room with around 200 people doing a variety of exams, so there will be a flow of people in and out as they finish before you. I didn't find this too distracting as I took the exam at the earliest time available, and it seems like other people just wanted to get the exam out of the way, so the room filled up at 9am, which minimised the disruption.

I am now on the 3 week wait for results...

Other sessions

Hackathon Demos

After my DCP exam I attended the demos, where just over 20 people presented what they had built during their 6-hour hack earlier that day.

I found this to be a really fun session to see the variety of hacks the teams produced. 

If I get to go to conference again next year I would love to take part in it, as one of my post-conference promises to myself is to dig into the APIs that are available and start learning about them, so the Hack would be a good opportunity to put this to the test.

Explain Data

One of the tips I received last year when going to Conference was "Make sure you go to one of Bethany Lyons' sessions", and I want to pass that tip on to everyone reading my blog, as last year her session on set actions was the best session I went to. So this year, as soon as the app came out with the conference sessions, I made sure I tracked down what Bethany was covering and attended one of those.

I haven't used Explain Data yet, but I saw what it could do by following attendees' tweets from TC Europe, so I was keen to get a demonstration. In this session Bethany took us through how it can be used and what Explain Data is not. Like the Data Modelling tools, it is there to help guide you along your data exploration journey and understand trends and outliers in your data. However, she made very clear that just because a data point is an outlier it shouldn't just be excluded; it might be that the data needs normalising, after which the point isn't an outlier. Also, just because Explain Data does some clever stats in the background doesn't mean you should take the results at face value, as correlation doesn't mean causation. So I see a feature that will be helpful, but one to be used with caution, making sure you can at least recall your high school stats courses when using it.

Beyond Design - Secret for Creating Engaging and Effective Information Experiences 

This session was led by two Zen Masters, Michael Cisneros and Lilach Manheim, so they can be seen as masters of their field.
This was a very engaging presentation which looked to answer the community question of 'how do I make effective visualisations?' by going through a few standard visualisation types, many of which were familiar to me as they are shared on Tableau Public, though it also included examples not built in Tableau.

The session drew out the design elements you need to consider when building effective dashboards. I really enjoy these sessions, as they are ones where you can reinforce the elements of visualisation design you may have self-taught through reading books and blog posts, but have them delivered by experts in the field.

Two minute tips and tricks 

This was always going to be a hard slot to fill, being the afternoon at the end of conference and right after Data Night Out! The session was led by Jeffrey Shaffer (co-author of The Big Book of Dashboards) and Luke Stanke (co-contributor to Workout Wednesday), both Zen Masters, so it was sure to be full of tips and tricks.

Unfortunately I didn't get any photos from this session, as it required focus to keep up with how quickly they went through the tips, so it will be a workbook I'll be sure to download from Tableau Public next week (and hopefully there will be a recording to watch it again).

Tips included: building out a Top 10 and Bottom 10 view on one sheet, which makes comparison of the results much easier without needing to change the axis when putting it on a dashboard; a KPI dashboard; building out a calendar using the index() function to fill in missing dates; and date calculations to get the last whole month, plus relative calculations such as month-over-month and year-over-year change.

Labs feedback session

The way the schedule works out at conference means you can end up with some good gaps to explore the Data Village. There's so much going on in the Data Village: it's the place to get food, meet vendors, attend breakout sessions, and have a bit of a chill out. But for me the highlight was the dev areas, where you can play with the new features, speak to the teams behind them, and take part in interactive feedback sessions like the one I joined exploring the potential of a chatbot for exploring data. I was keen to take part in this session to see where the devs thought the product could go. In my role it's important to know your audience, and when looking at financial information different stakeholders want to consume it in very different ways, so a chatbot could have a use case here. Plus, one of the learnings from the Effective Dashboards session: people understand text, so not everything needs to be presented as a chart.

The session was based on the Titanic passenger dataset, so you could ask questions via a Slack interface such as "How many passengers were on board?", "What was the distribution of passenger ages?", and "Did age increase your chance of survival?". The results I got from my questions were mixed; for example, receiving a chart when I would expect just a number, or a bar chart when expecting a scatterplot.

The session was a 'Wizard of Oz' experiment, a common method in the Lean methodology. This means Tableau hadn't built a platform which could do the NLP, query the data and present the results back; what they actually had was someone behind a screen using Tableau Desktop to surface the results and share them back on screen. This technique is very common, one example being betting companies testing whether providing a card account service would be beneficial for their premier accounts: with minimal set-up and outlay they just provide a pre-paid card (think Revolut) whose balance a team can modify manually (with a slight delay) to test the market for the product. So it will be interesting to see if (and how) Tableau develop this further, as there are lots of considerations to factor in, such as not knowing the field names and being able to encode what the natural-language query actually represents. Something you can see developing anyway with the Ask Data product.


So that's a wrap of my conference, which was a very valuable experience. The next US conference is going to be back in Las Vegas next year (I was hoping for somewhere closer to the East Coast, just so the time difference wasn't as big and the jet lag more manageable). But I'm excited to get back to my team and share my learnings with them.

Tuesday, October 29, 2019

Finding unique values in Alteryx

This is an extension of an Alteryx Weekly Challenge (Challenge #8). The weekly challenge asked the user to find combinations of purchased items that form a meal deal (i.e. people who purchased pizza, a side, and a drink), but here we are going to use a similar technique to find unique values across both string values (as in the weekly challenge) and spatial data.

I am going to approach this in a few different ways:
1) Using a string field from transactional data
2) Spatially matching customers to stores
    a) using a similar approach to 1 with spatial match
    b) using the spatial process tool

A potential business application of this is market basket analysis. For example, if you know that most of your customers would buy a table and a chair from you, then you can identify those customers who only bought the chair and maybe understand why they didn't also buy the table from you.

Approach 1 - string values in transactional data

If we use a dataset that we are familiar with such as the Tableau Superstore dataset we have here a record for every item purchased by a customer, broken down at Category, Sub Category, Manufacturer and Product level.

If we take this data into Alteryx, then we can use the Summarize Tool to group by Customer ID and Sub Category.

This results in a list where one record represents a Customer and Sub Category. So if a customer has purchased many individual products in, say, the Tables sub category, it summarises those individual transactions down into one record for that customer and Tables.

Once we've got this summarised list we then turn this into a concatenated string of the sub categories, for example if a customer purchased some 'Binders' and 'Tables' the resulting value will be 'Binders,Tables'*. At this stage I also add in a count of the Sub Categories as finding the unique values at the end can either be undertaken by filtering the values where the count = 1 or by the approach I outline below.

* The options allow you to specify the delimiter.

Once we've got this list it's just a case of using the Filter tool to find a customer who only purchased from one sub category. For example, if I want to find customers who only bought a 'Binder' then I can filter my concatenated field on:
  • Contains([Concat_Sub_Category],'Binder') - which will return all of my customers who bought a binder AND
  • !Contains([Concat_Sub_Category],',Binder') OR !Contains([Concat_Sub_Category],'Binder,') - which will remove those records where they also bought another product (N.B. as the concat option is set to use a comma as the delimiter, using does not contain with a comma before or after 'Binder' removes records that also include items from other sub categories)
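The Summarize-then-Filter technique above can be sketched in a few lines of Python. This is an illustrative sketch with made-up transactions, not the Alteryx implementation, and it uses an exact-equality check rather than the comma-delimiter Contains trick, which is simpler when you have the grouped data in memory:

```python
# Sketch of the Summarize (group + concat) and Filter steps.
# The transaction data below is hypothetical.
from collections import defaultdict

transactions = [
    ("C1", "Binders"), ("C1", "Tables"),
    ("C2", "Binders"), ("C2", "Binders"),
    ("C3", "Chairs"),
]

# Group by customer, keeping distinct sub categories per customer
grouped = defaultdict(set)
for customer, sub_category in transactions:
    grouped[customer].add(sub_category)

# Concatenate each customer's sub categories into one comma-delimited string
concat = {c: ",".join(sorted(subs)) for c, subs in grouped.items()}
print(concat["C1"])  # Binders,Tables

# Customers who bought from the 'Binders' sub category and nothing else
binders_only = [c for c, s in concat.items() if s == "Binders"]
print(binders_only)  # ['C2']
```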

Approach 2a - Unique spatial values (using spatial match)

Taking this example to spatial data, imagine you've got a set of stores and you want to know whether you could close one of them to save costs. You would want to make sure there was an alternative store for customers to purchase your product from.

This is where finding unique spatial values comes into play. Within Alteryx there is some pet store data embedded in the sample workflows, so I am going to use that.

Using the customer data (spatial points) and the store trade area (spatial polygon) the spatial match tool can be used to identify when the customer (target) sits within the trade area (universe). If a customer location is within multiple trade areas you get a record for each match.

Using the same technique as set out for the string based data, it's possible to group by customer and store ID to find all the matches, then use a Filter tool to extract the records of interest. So, to expand on the example, you could find all the customers who only have one nearby store and close the store which affects the minimum number of those customers. Or you could find the customers who are within the catchment area for both Store A and Store B.
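The group-and-filter step on the spatial match output can be sketched in Python. The customer/store matches below are made up for illustration; in Alteryx this record set comes out of the Spatial Match tool, one record per customer/trade-area overlap:

```python
# Sketch of finding 'unique' spatial matches: count how many trade areas
# each customer falls in, then keep customers with only one nearby store.
# The match records below are hypothetical.
from collections import defaultdict

# One record per (customer, store) overlap, as produced by a spatial match
matches = [
    ("A", "Store1"), ("A", "Store2"),
    ("B", "Store1"),
    ("C", "Store2"),
]

stores_per_customer = defaultdict(set)
for customer, store in matches:
    stores_per_customer[customer].add(store)

# Customers with exactly one nearby store, and which store that is
single_store = {c: next(iter(s))
                for c, s in stores_per_customer.items() if len(s) == 1}
print(single_store)  # {'B': 'Store1', 'C': 'Store2'}

# Closing a store only strands its single-store customers
affected = defaultdict(int)
for store in single_store.values():
    affected[store] += 1
print(dict(affected))  # {'Store1': 1, 'Store2': 1}
```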

Approach 2b - Unique spatial values (using spatial process)

An alternative approach to the above is to use the Spatial Process tool to cut out the intersect (to find customers who have more than one nearby store) or to cut out the non-intersecting part. This, in combination with the spatial match process outlined above, will enable the user to understand which stores have unique customers.

If you want to learn more about spatial processing in Alteryx, then I highly recommend using the Interactive lessons such as this one.

Monday, October 14, 2019

We "R" going on a summer holiday!

As the autumn starts to set in and the summer seems like a distant memory, this is a quick blog on how I used Alteryx to plan my family trip around Europe.

For our family summer holiday we managed a tour of France, Switzerland, Italy, Austria and Germany staying at Eurocamp sites. However, there were some challenges in planning the trip:

  • We needed to travel in September - a lot of Eurocamp sites close at the end of the season, so we needed to make sure we didn't get stranded
  • With a young family, we couldn't travel more than 3 hours on any one day (we needed to extend this constraint to 5 hours on some days to make the trip work)

Getting the location and opening dates of the campsites

The Eurocamp website is very comprehensive, but we needed a quick way of planning potential locations without browsing lots of pages. So the initial idea was to scrape the website, but this required knowing the site codes which form part of the URL. 

This is where R (and the pun in the blog's title) comes in.

Alongside the website, there is a PDF brochure which gives detailed information for each park.

As you can see from this example from the brochure, it gives the park code which is used in the URL, and the opening dates. By default it's not possible to read a PDF file in Alteryx, however with the R (and Python) tools it is possible to extend the functionality of Alteryx, so I was able to use the R package pdftools to read in each line of the PDF document as a record.

Once the text was read into Alteryx, it was possible to use the Regex tool to parse out the site name, code (see image) and opening dates.
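The Regex parsing step can be sketched in Python. The line format and the site code pattern below are assumptions for illustration only, not the actual layout of the Eurocamp brochure; the real pattern would need adjusting to the extracted text:

```python
# Hedged sketch of parsing a brochure line with a regular expression.
# The line format and code pattern here are assumed, not the real layout.
import re

line = "Lake Garda Camping Village (IT021) Open 25 Apr - 28 Sep"
pattern = (r"^(?P<name>.+?) \((?P<code>[A-Z]{2}\d{3})\) "
           r"Open (?P<open>.+?) - (?P<close>.+)$")

m = re.match(pattern, line)
if m:
    print(m.group("name"))   # Lake Garda Camping Village
    print(m.group("code"))   # IT021
    print(m.group("open"))   # 25 Apr
    print(m.group("close"))  # 28 Sep
```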

Once I had the park code, it was possible to scrape each site's page to get its longitude and latitude, as the website contained additional information which wasn't available in the brochure.

Once I had the longitude and latitude of each site, Alteryx makes it easy to see where you can get to from that site. Using the Create Points tool the long/lat is converted into a centroid, and from that the Trade Area tool calculates a drive time catchment area (it was important to use drive time rather than straight line distance, as the trip was going through the Alps on alpine passes).

Not being so analytical about the trip

At this stage I explored the spatial tools in Alteryx a bit more to work out combinations of routes via an iterative macro. However, I'd already got a lot of information quickly from the analysis, so being a bit more 'manual' about the route selection was more useful: we could be more flexible on our destinations, and we found we actually needed to relax our travel time constraint to make sure we didn't get stuck.

So the spatial data from Alteryx was output as a Tableau Hyper file, which meant we could see on a map where all the parks were and (by filtering to the end of September) quickly see where we needed to get to before the majority of the campsites closed!

So that is how a data analyst plans their holiday!

Tuesday, September 17, 2019

Two methods for date scaffolding in Alteryx

What is scaffolding?

There are a variety of blogs online which talk about scaffolding and why it's important, such as this post by The Information Lab. But essentially scaffolding is used to infill dates when they might be missing from your data. For example, you might be looking at staff turnover: you know each employee's start date, and if they are a leaver their end date, but you will also want to know the dates in-between.

So how can you do this in Alteryx?

There are two methods in Alteryx (and potentially other ways as well as there are many routes to solving a problem in Alteryx) which I will walk through. The first is the way I've tended to always do it and the second is a new method I recently discovered, which also has additional benefits.

For the examples the data is structured in the following format:

First method -  Generate Rows

In this first example I use a Cross Tab tool to get one record for my data, with two fields: one for the Start date and a second for the End date. Once I've shaped my data I use the Generate Rows tool. This enables me, using a DateTimeAdd formula, to create new records for each date between my start and end date. The set up for this is shown in the configuration pane:

Within the config pane, we set up:
1) That we want to create a new field called 'Dates' and that the field type is date
2) The initialisation expression, which is the Start Date, i.e. we want to start from this date
3) The condition is that the new 'Dates' we create are less than or equal to (<=) the End date
4) We loop through creating a new record, adding 1 day to 'Dates' each time, until the condition in (3) is no longer met. Here we can specify anything from 'seconds' (n.b. the field type would need to be DateTime) through to 'years'.

Then once we run the workflow we will have a record for each day between our start and end dates.
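The four configuration steps above map directly onto a loop, which can be sketched in Python (dates below are illustrative):

```python
# Sketch of the Generate Rows logic: initialise at the start date,
# then keep adding one day while the condition (<= end date) holds.
from datetime import date, timedelta

start, end = date(2019, 9, 1), date(2019, 9, 5)  # example data

dates = []
current = start                    # 2) initialisation expression
while current <= end:              # 3) condition
    dates.append(current)
    current += timedelta(days=1)   # 4) loop expression: add 1 day

print(len(dates))  # 5 records, one per day from 1 to 5 Sep
```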

Second Method - TS Filler

The second method uses a macro built in to Alteryx called TS Fill*.

* Despite the tool showing that it is built using R code (the R icon in the bottom left), there is actually no R code in the tool; it mostly uses the Generate Rows method outlined above, with interface tools to increase its functionality!

With this tool you don't need to cross tab your data to get a field for the start and end date, instead it 'fills in' the missing dates. Therefore using the example data it will treat the first record as the start date and the second record as the end date.

In the config pane you can see that you now have interface tools to select how you want the TS Fill tool to behave.

To set the tool up:
1) Select the Date / DateTime field that you want to create the scaffolding for
2) What your interval is (N.B. as mentioned in the * note above, this actually uses an interface tool to configure the behaviour of the Generate Rows tool, so if you select 'Years' it will update the formula in step 4 of Generate Rows to be DateTimeAdd([Start],1,'years'))
3) Set the increment, so you can deal with creating say fortnightly data by setting the interval to weeks and the increment to 2.

And there you have it, running the workflow will create each day between your start and end date.

Additional functionality of TS Fill

So apart from the TS Fill tool providing a user interface over the Generate Rows tool (which some may find easier if the DateTimeAdd function is a bit fiddly to use), the TS Fill tool is also useful for 'filling in' any missing dates. If your data extract is from, say, a sales system, you may only have records for days where sales occurred. If you want a complete set of dates you can use the TS Fill tool to add in the missing days, and you can then populate those dates either with null values or, depending on your needs, with additional analysis to impute the values (say using an average).
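This 'filling in' behaviour can be sketched in Python: given sales records with gaps, create a row for every date in the range, with a null where no sale occurred (the sales figures below are made up):

```python
# Sketch of the TS Fill 'filling in' behaviour: a row for every date in
# the range, None where no sale occurred. Data is illustrative.
from datetime import date, timedelta

sales = {date(2019, 9, 1): 120, date(2019, 9, 4): 80}  # gaps on 2nd and 3rd

start, end = min(sales), max(sales)
filled = []
current = start
while current <= end:
    filled.append((current, sales.get(current)))  # None for missing days
    current += timedelta(days=1)

print(filled)  # 4 rows: 1 Sep = 120, 2-3 Sep = None, 4 Sep = 80
```

From here the None rows could be left as nulls or imputed, matching the choice described above.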

So if like me you previously did scaffolding using generate rows, hopefully you can now add another tool to your workflows which is a bit more flexible.