For our family summer holiday we managed a tour of France, Switzerland, Italy, Austria and Germany, staying at Eurocamp sites. However, there were some challenges in planning the trip:
- We needed to travel in September - a lot of Eurocamp sites close at the end of the season, so we needed to make sure we didn't get stranded
- With a young family, we couldn't travel more than 3 hours on any one day (we needed to extend this constraint to 5 hours on some days to make the trip work)
Getting the location and opening dates of the campsites
The Eurocamp website is very comprehensive, but we needed a quick way of planning potential locations without browsing lots of pages. So the initial idea was to scrape the website, but this required knowing the site codes which form part of the URL.
This is where R (and the pun in the blog's title) comes in.
As well as the website, there is a PDF brochure which gives detailed information for each park.
As you can see from this example from the brochure, it gives the park code, which is used in the URL, and the opening dates. By default it's not possible to read a PDF file into Alteryx. However, the R (and Python) tools extend the functionality of Alteryx, so I was able to use the R package pdftools to read in each line of the PDF document as a record.
Once the text was read into Alteryx, it was possible to use the RegEx tool to parse out the site name, code (see image), and opening dates.
Once I had the park code, it was possible to scrape each site's page to get the longitude and latitude of each site, as the website contained additional information which wasn't available in the brochure.
Once I had the longitude and latitude of each site, Alteryx made it easy to see where you can get to from that site. Using the Create Points tool, the long/lat is converted into a centroid, and from that the Trade Area tool calculates a drive time catchment area. It was important to use drive time rather than straight line distance because the trip was going to the Alps, where straight line distance is a poor guide to travel time over alpine passes.
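For anyone who wants to try the parsing step outside Alteryx, here is a minimal Python sketch of the same idea. It is not the expression used in the workflow: the line layout ("name (code) Open start - end"), the two-letters-plus-three-digits code format, the function name parse_line and the example park are all assumptions made up for illustration, and the brochure does not state a year, so the year has to be supplied.

```python
import re
from datetime import date

# Month abbreviations mapped by hand so the parsing does not depend on locale.
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# ASSUMED layout of one brochure line: "<site name> (<code>) Open <d Mon> - <d Mon>"
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+\((?P<code>[A-Z]{2}\d{3})\)\s+"
    r"Open\s+(?P<open>\d{1,2} [A-Za-z]{3})\s*-\s*(?P<close>\d{1,2} [A-Za-z]{3})$"
)


def to_date(text, year):
    """Turn a string such as '28 Sep' into a date in the given year."""
    day, month = text.split()
    return date(year, MONTHS[month], int(day))


def parse_line(line, year):
    """Return a dict for a line that matches the assumed layout, else None."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # headings, descriptions and other lines are skipped
    return {
        "name": match.group("name"),
        "code": match.group("code"),
        "open": to_date(match.group("open"), year),
        "close": to_date(match.group("close"), year),
    }


# Made-up example line, not taken from the real brochure.
record = parse_line("Example Park (XX123) Open 5 Apr - 28 Sep", year=2024)
print(record)
```

Returning None for lines that don't match keeps the one-record-per-line input simple: every line can be passed through the same function and the non-matches filtered out afterwards.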
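To make the contrast concrete, the sketch below calculates the straight line (great-circle) distance between two long/lat points using the haversine formula. This is not what the Trade Area tool does; drive time needs a road network, which this ignores completely. It only shows the simpler measure that drive time replaced. The function name and the coordinates are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def straight_line_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    # Haversine formula
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Two illustrative points, not real campsite locations.
print(round(straight_line_km(46.0, 7.0, 46.0, 8.0), 1), "km as the crow flies")
```

Two sites that look close by this measure can still be a long drive apart if a mountain pass sits between them, which is why the catchment areas were built on drive time.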
Not being so analytical about the trip
At this stage, I explored the spatial tools in Alteryx a bit more, looking at whether an iterative macro could work out the combinations of routes that were achievable. However, I'd already got a lot of information quickly from the analysis, so we chose the route more 'manually'. That suited us, as we could be flexible on our destinations, and we found that we actually needed to be more flexible on our travel time constraint too, to make sure we didn't get stuck.
So the spatial data from Alteryx was output as a Tableau Hyper file, which meant we could see on a map where all the parks were and, by filtering to the end of September, quickly understand where we needed to get to before the majority of the campsites closed!
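For the curious, the route combination idea can be sketched in a few lines of Python. This is not the iterative macro, just a simple depth-first search under stated assumptions: the drive times between sites are already known, every leg must fit within the daily limit, and no site is visited twice. The site names, drive times and the function name feasible_routes are all made up for illustration.

```python
def feasible_routes(drive_hours, start, end, max_hours, max_sites):
    """Return every route from start to end where each leg is within max_hours.

    drive_hours maps a site to {next_site: hours}; max_sites caps the number
    of sites in a route, counting both start and end.
    """
    routes = []

    def extend(route):
        current = route[-1]
        if current == end:
            routes.append(route)
            return
        if len(route) >= max_sites:
            return  # route is already as long as allowed and has not reached end
        for next_site, hours in drive_hours.get(current, {}).items():
            if next_site not in route and hours <= max_hours:
                extend(route + [next_site])

    extend([start])
    return routes


# Hypothetical drive times in hours between four made-up sites.
drive_hours = {
    "A": {"B": 2.5, "C": 4.5},
    "B": {"C": 2.0, "D": 6.0},
    "C": {"D": 3.0},
}

print(feasible_routes(drive_hours, "A", "D", max_hours=3, max_sites=4))
print(feasible_routes(drive_hours, "A", "D", max_hours=5, max_sites=4))
```

With the 3 hour limit only the route through every site works in this toy data; relaxing the limit to 5 hours opens up a shorter route as well, which mirrors the trade-off we ended up making by hand.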
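The filter itself was done in Tableau, but the logic behind it is simple enough to sketch in Python: keep only the sites whose opening period covers a given day. The site codes, the dates and the year below are invented for illustration, as is the function name open_on.

```python
from datetime import date


def open_on(sites, day):
    """Return the sites whose opening period includes the given day."""
    return [site for site in sites if site["open"] <= day <= site["close"]]


# Made-up sites and dates, not taken from the brochure.
sites = [
    {"code": "XX001", "open": date(2024, 4, 5), "close": date(2024, 9, 15)},
    {"code": "XX002", "open": date(2024, 5, 1), "close": date(2024, 10, 20)},
    {"code": "XX003", "open": date(2024, 4, 20), "close": date(2024, 9, 30)},
]

still_open = open_on(sites, date(2024, 9, 30))
print([site["code"] for site in still_open])
```

Running the same filter for each night of the trip shows how the set of available sites shrinks as the season ends.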
So that is how a data analyst plans their holiday!