Tuesday, October 29, 2019

Finding unique values in Alteryx

This is an extension of an Alteryx Weekly Challenge (Challenge #8). In the weekly challenge it wanted the user to find combinations of purchased items to form a meal deal (i.e. people who purchased pizza, a side, and a drink), but here we are going to use a similar technique to find unique values across both string based values (as in the weekly challenge) and spatial data.

I am going to approach this in a few different ways:
1) Using a string field from transactional data
2) Spatially matching customers to stores
    a) using a similar approach to 1 with spatial match
    b) using the spatial process tool

A potential business application of this is for market basket analysis. For example if you know that most of your customers would buy a table and a chair from you, then you can identify those customers who only brought the chair and maybe understand why they didn't also buy the table from you.

Approach 1 - string values in transactional data

If we use a dataset that we are familiar with such as the Tableau Superstore dataset we have here a record for every item purchased by a customer, broken down at Category, Sub Category, Manufacturer and Product level.

If we take this data into Alteryx, then we can use the Summarize Tool to group by Customer ID and Sub Category.

This results in a list where one record represents a Customer and Sub Category. So if a customer has, for example, purchased many individual products across many products in say the Tables sub category it would summarise the individual transactions down into one record for that customer and Tables.

Once we've got this summarised list we then turn this into a concatenated string of the sub categories, for example if a customer purchased some 'Binders' and 'Tables' the resulting value will be 'Binders,Tables'*. At this stage I also add in a count of the Sub Categories as finding the unique values at the end can either be undertaken by filtering the values where the count = 1 or by the approach I outline below.

* The options allow you to specify the delimiter.

Once we've got this list it's just the case of using the filter tool to find a customer who only purchased from one category. For example if I want to find customers who only bought a 'binder' then I can filter my concatenated field on:
  • Contains([Concat_Sub_Category],'Binder') - which will return all of my customers who bought a binder AND
  • !Contains([Concat_Sub_Category],',Binder') OR !Contains([Concat_Sub_Category],'Binder,') - which will remove those records where they also bought another product (N.B. As the concat option is set to use a comma as the delimiter, using the does not contain with the comma at the start and end of binder, will remove records where there are other items from other sub categories also included)

Approach 2a - Unique spatial values (using spatial match)

Taking this example to spatial data, imagine you've got a set of stores and you want to know could you close one of them to save costs. You would want to make sure that there was an alternative store for them to purchase your product from.

This is where finding unique spatial values comes in to play. Within in Alteryx there is some pet store data which is embedded in the sample workflow so I am going to use that.

Using the customer data (spatial points) and the store trade area (spatial polygon) the spatial match tool can be used to identify when the customer (target) sits within the trade area (universe). If a customer location is within multiple trade areas you get a record for each match.

Using the same technique as set out for the string based data, it's possible to group by customer and store id to find all the matches, and then you can use the filter to extract the records of interest. So to expand on the example you could find all the customers who only have 1 nearby store and close the store which has the minimum number of customers affected. Or you could find the customers who are within the catchment area for Store A and Store B.

Approach 2b - Unique spatial values (using spatial process)

An alternative approach to the above is to use the spatial process tool to cut out the intersect (to find customers where they have more than one nearby store) or to cut out the non-intersecting part. This in combination with the spatial match process outlined above will enable the user to understand the stores where they are unique customers.

If you want to learn more about spatial processing in Alteryx, then I highly recommend using the Interactive lessons such as this one.

No comments:

Post a Comment