Tuesday, October 29, 2019

Finding unique values in Alteryx

This is an extension of an Alteryx Weekly Challenge (Challenge #8). In the weekly challenge it wanted the user to find combinations of purchased items to form a meal deal (i.e. people who purchased pizza, a side, and a drink), but here we are going to use a similar technique to find unique values across both string based values (as in the weekly challenge) and spatial data.

I am going to approach this in a few different ways:
1) Using a string field from transactional data
2) Spatially matching customers to stores
    a) using a similar approach to 1 with spatial match
    b) using the spatial process tool

A potential business application of this is for market basket analysis. For example if you know that most of your customers would buy a table and a chair from you, then you can identify those customers who only brought the chair and maybe understand why they didn't also buy the table from you.

Approach 1 - string values in transactional data

If we use a dataset that we are familiar with such as the Tableau Superstore dataset we have here a record for every item purchased by a customer, broken down at Category, Sub Category, Manufacturer and Product level.

If we take this data into Alteryx, then we can use the Summarize Tool to group by Customer ID and Sub Category.

This results in a list where one record represents a Customer and Sub Category. So if a customer has, for example, purchased many individual products across many products in say the Tables sub category it would summarise the individual transactions down into one record for that customer and Tables.

Once we've got this summarised list we then turn this into a concatenated string of the sub categories, for example if a customer purchased some 'Binders' and 'Tables' the resulting value will be 'Binders,Tables'*. At this stage I also add in a count of the Sub Categories as finding the unique values at the end can either be undertaken by filtering the values where the count = 1 or by the approach I outline below.

* The options allow you to specify the delimiter.

Once we've got this list it's just the case of using the filter tool to find a customer who only purchased from one category. For example if I want to find customers who only bought a 'binder' then I can filter my concatenated field on:
  • Contains([Concat_Sub_Category],'Binder') - which will return all of my customers who bought a binder AND
  • !Contains([Concat_Sub_Category],',Binder') OR !Contains([Concat_Sub_Category],'Binder,') - which will remove those records where they also bought another product (N.B. As the concat option is set to use a comma as the delimiter, using the does not contain with the comma at the start and end of binder, will remove records where there are other items from other sub categories also included)

Approach 2a - Unique spatial values (using spatial match)

Taking this example to spatial data, imagine you've got a set of stores and you want to know could you close one of them to save costs. You would want to make sure that there was an alternative store for them to purchase your product from.

This is where finding unique spatial values comes in to play. Within in Alteryx there is some pet store data which is embedded in the sample workflow so I am going to use that.

Using the customer data (spatial points) and the store trade area (spatial polygon) the spatial match tool can be used to identify when the customer (target) sits within the trade area (universe). If a customer location is within multiple trade areas you get a record for each match.

Using the same technique as set out for the string based data, it's possible to group by customer and store id to find all the matches, and then you can use the filter to extract the records of interest. So to expand on the example you could find all the customers who only have 1 nearby store and close the store which has the minimum number of customers affected. Or you could find the customers who are within the catchment area for Store A and Store B.

Approach 2b - Unique spatial values (using spatial process)

An alternative approach to the above is to use the spatial process tool to cut out the intersect (to find customers where they have more than one nearby store) or to cut out the non-intersecting part. This in combination with the spatial match process outlined above will enable the user to understand the stores where they are unique customers.

If you want to learn more about spatial processing in Alteryx, then I highly recommend using the Interactive lessons such as this one.

Monday, October 14, 2019

We "R" going on a summer holiday!

As the autumn starts to set in and the summer seems like a distant memory, this is a quick blog on how I used Alteryx to plan my family trip around Europe.

For our family summer holiday we managed a tour of France, Switzerland, Italy, Austria and Germany staying in Eurocamp. However there were some challenges in planning the trip:

  • We needed to travel in September - a lot of Eurocamp sites close at the end of the season, so we needed to make sure we didn't get stranded
  • With a young family, we couldn't travel more than 3 hours on any one day (we needed to extend this constraint to 5 hours on some days to make the trip work)

Getting the location and opening dates of the campsites

The Eurocamp website is very comprehensive, but we needed a quick way of planning potential locations without browsing lots of pages. So the initial idea was to scrape the website, but this required knowing the site codes which form part of the URL. 

This is where R (and the pun in the blog's title) comes in.

Along as the website, there is a PDF brochure which gives detailed information for each park.

As you can see from this example from the brochure, it gives the park code which is used in the URL and the opening dates. As default it's not possible to read in a PDF file in Alteryx, however with the R (and python) tool it is possible to extend the functionality of Alteryx, so I was able to use the R package PDFTools, to read in each line of the PDF document as a record.

Once the text is read into Alteryx, it was possible to use the Regex tool to parse out the site name, code (see image), and opening dates.

Once I had the park code, it was possible to scrape each site, as the website contained additional information which wasn't available in the brochure, to get the longitude and latitude of each site.

Once I got the longitude and latitude of each site, Alteryx then makes it easy to see where you can get to from that site. Using the create points tool the long/lat is converted into a centriod and then from that a trade area tool enables a drive time catchment area to be calculated (it was important to use drive time as the trip was going to the Alps, so drive time was more important than straight line distance on alpine passes).

Not being so analytical about the trip

At this stage, I explored a bit more with the spatial tools in Alteryx to work out combinations of routes which could be achieved as part of an iterative macro, however I'd already got a lot of information quickly from the analysis, therefore being a bit more 'manual' about the route selection was important as we could be a bit more flexible on our destinations and we found that we actually needed to be a bit more flexible on our travel time constraint to make sure we didn't get stuck.

So the spatial data from Alteryx was output as a Tableau Hyperfile which meant we could see on a map where all the parks were and quickly understand (by filtering to the end of September) to see where we needed to get to before the majority of the campsites closed!

So that is how a data analyst plans their holiday!