By Dr John Gwinyai Nyakuengama
(20 October 2018)
KEY WORDS
Chicago City; Divvy Bicycles (Divvy Rides) big dataset from 2014-2017; big data analytics, visualization and mapping; Stata; R; RapidMiner Turbo Prep; TIBCO Spotfire; Power BI; Google Maps
ABSTRACT
This study analysed the big transactional dataset of Chicago Divvy rides users, collected between 2014 and 2017.
It found that over 13.5 million trips were taken during that period. In 2017 alone, more than 590 Divvy ride stations and more than 6,240 individual bikes were in operation.
The Chicago Divvy rides users (customers and subscribers) showed two different usage patterns in terms of the:
- Number of Divvy rides and median trip duration, as well as year-on-year growth patterns;
- Time of access to Divvy rides (by day of week and by time of day); and
- Divvy stations that users travelled from and to.
The current study identified some big data merits and challenges in the Chicago Divvy rides dataset and showcased a number of big data analysis, visualization and mapping tools.
Between 2014 and 2017, about 3.5 million customers and 9.7 million subscribers accessed Divvy rides. This means that customers and subscribers comprised about a quarter and three quarters of all Divvy ride users, respectively.
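The split quoted above follows directly from the two rounded counts. A minimal Python sketch of the arithmetic, using only the figures given in the text (the variable names are illustrative):

```python
# Rounded counts quoted in the text, in millions (2014-2017).
counts_millions = {"Customer": 3.5, "Subscriber": 9.7}

total = sum(counts_millions.values())

# Share of each user type in the combined total.
shares = {
    user_type: count / total
    for user_type, count in counts_millions.items()
}

for user_type, share in shares.items():
    print(f"{user_type}: {share:.1%}")
```

With these rounded inputs the shares come out at roughly 26.5% and 73.5%, consistent with the "about a quarter and three quarters" description.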
From 2014 to 2017, the number of Divvy ride customers steadily decreased. In contrast, the number of subscribers grew, albeit at a decreasing rate. In terms of Year-on-Year (YoY) changes in Divvy rides, the 2017 growth rate in subscribers was half that observed in 2016, and about a third of that in 2015.
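For readers who want to reproduce this kind of comparison, year-on-year growth is the change in annual rides divided by the previous year's rides. The sketch below is illustrative only: the function name `yoy_growth` and the annual counts are hypothetical and are not figures from the Divvy dataset.

```python
def yoy_growth(counts_by_year):
    """Return {year: fractional change versus the previous year in the data}."""
    years = sorted(counts_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        growth[curr] = (counts_by_year[curr] - counts_by_year[prev]) / counts_by_year[prev]
    return growth


# Hypothetical annual ride counts, chosen only to illustrate a slowing growth rate.
example_counts = {2014: 1000, 2015: 1300, 2016: 1560, 2017: 1716}

growth = yoy_growth(example_counts)
for year, rate in growth.items():
    print(year, f"{rate:+.1%}")
```

In this made-up series the count keeps rising while the growth rate falls each year, which is the "growing at a decreasing rate" pattern described for subscribers.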
Between 2014 and 2017, usage of Divvy rides by both customers and subscribers was seasonal, typically increasing markedly in the warmer summer months and steadily decreasing as winter approached. Nonetheless, the number of subscribers vastly outstripped that of customers in every month of the year. It is also noticeable that only the number of subscribers grew over the four-year period.
Day-of-week usage of Divvy rides by customers and subscribers followed opposite patterns during the four years. Among customers, usage was highest at the weekend and dropped to its lowest by mid-week. The converse was true among subscribers.
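A day-of-week profile of this kind can be built by parsing each trip's start time and counting rides per user type and weekday. The sketch below uses only the Python standard library. The field names `usertype` and `start_time`, the timestamp format and the sample records are assumptions made for illustration, not the layout of the actual files.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def rides_by_weekday(trips):
    """Count rides per (user type, weekday name)."""
    counts = Counter()
    for trip in trips:
        # Assumed timestamp format; adjust to match the real files.
        start = datetime.strptime(trip["start_time"], "%Y-%m-%d %H:%M:%S")
        counts[(trip["usertype"], WEEKDAYS[start.weekday()])] += 1
    return counts


# Hypothetical records, for illustration only.
sample_trips = [
    {"usertype": "Customer", "start_time": "2017-07-01 14:05:00"},    # a Saturday
    {"usertype": "Customer", "start_time": "2017-07-02 15:30:00"},    # a Sunday
    {"usertype": "Subscriber", "start_time": "2017-07-05 08:10:00"},  # a Wednesday
    {"usertype": "Subscriber", "start_time": "2017-07-05 17:20:00"},  # a Wednesday
]

weekday_counts = rides_by_weekday(sample_trips)
for key, n in sorted(weekday_counts.items()):
    print(key, n)
```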
The hour-of-the-day Divvy ride usage profiles of customers and subscribers were very different during 2014-2017:
- Customer usage distribution was unimodal, peaking in the afternoon (around 14H00 to 15H00, or 2 to 3 PM).
- In contrast, subscriber usage distribution was bimodal, with peaks during the morning rush hour (6H00 to 8H00, or 6 to 8 AM) and the evening rush hour (16H00 to 18H00, or 4 to 6 PM).
Also noticeable in the hourly Divvy ride usage profiles are:
- The steady, upward growth in subscriber numbers between 2014 and 2017; and
- The jump in customer usage between 10H00 and 17H00 (or 10 AM and 5 PM) from 2014 to 2015. However, customer usage dropped off from 2016, particularly after 14H00 (or 2 PM).
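One simple way to tell a unimodal hourly profile from a bimodal one is to look for hours whose ride count exceeds that of both neighbouring hours. The sketch below is a minimal illustration, not the method used in this study. The helper `local_peaks` and both 24-hour profiles are hypothetical, and real profiles are noisier, so they would probably need smoothing first.

```python
def local_peaks(hourly_counts):
    """Return hours whose count is strictly higher than both neighbouring hours.

    Hours 0 and 23 are never reported, because they lack one neighbour.
    """
    return [
        hour
        for hour in range(1, len(hourly_counts) - 1)
        if hourly_counts[hour - 1] < hourly_counts[hour] > hourly_counts[hour + 1]
    ]


# Hypothetical 24-hour profiles (index = hour of day), for illustration only.
customer_profile = [1, 1, 1, 1, 1, 2, 3, 5, 8, 12, 16, 19,
                    22, 24, 25, 24, 21, 17, 12, 8, 5, 3, 2, 1]
subscriber_profile = [1, 1, 1, 1, 2, 6, 15, 28, 30, 18, 12, 11,
                      11, 12, 13, 18, 29, 33, 22, 14, 9, 6, 4, 2]

print("Customer peaks:", local_peaks(customer_profile))
print("Subscriber peaks:", local_peaks(subscriber_profile))
```

A single reported hour suggests a unimodal profile, and two well-separated hours suggest a bimodal one.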
Over the study period, the median trip duration of customers was more than twice that of subscribers.
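A comparison like this amounts to grouping trip durations by user type and taking the median of each group. The sketch below uses the Python standard library. The field names `usertype` and `tripduration` and the durations (in seconds) are assumptions made for illustration and are not values from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_duration_by_usertype(trips):
    """Return {user type: median trip duration} for the given trips."""
    durations = defaultdict(list)
    for trip in trips:
        durations[trip["usertype"]].append(trip["tripduration"])
    return {user_type: median(values) for user_type, values in durations.items()}


# Hypothetical trip durations in seconds, for illustration only.
sample_trips = [
    {"usertype": "Customer", "tripduration": 1500},
    {"usertype": "Customer", "tripduration": 1200},
    {"usertype": "Customer", "tripduration": 2400},
    {"usertype": "Subscriber", "tripduration": 480},
    {"usertype": "Subscriber", "tripduration": 600},
    {"usertype": "Subscriber", "tripduration": 720},
]

medians = median_duration_by_usertype(sample_trips)
ratio = medians["Customer"] / medians["Subscriber"]
print(medians, f"ratio = {ratio:.1f}")
```

The median is less sensitive than the mean to a handful of very long rides, which is one reason it suits trip-duration data.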
Generally, Divvy ride subscribers’ median ride duration increased during the warmer spring and summer months, then fell sharply from autumn as winter approached. By contrast, customers’ median ride duration was not as sharply seasonal, particularly in 2017.
The day-of-the-week profiles of median ride duration among customers mirrored those described previously for the number of rides by day of the week. Of note, their median ride duration tended to increase between 2014 and 2015, but not beyond.
Median ride duration also increased significantly during weekends among subscribers.
Among Divvy ride customers, the median trip duration was highest between 8H00 and 15H00. Among subscribers, it was highest during the morning rush hour (from 7H00 to 9H00) and the afternoon rush hour (from 15H00 to 17H00).
Over the years, median trip duration by hour of day varied far less among subscribers than among customers. Among the latter, there was a substantial yearly increase in the duration of Divvy rides taken before 8H00. In this user type, the increase in median trip duration after 8H00 that had occurred since 2014 had petered out by 2016.
The five busiest dates in 2017 among Divvy ride customers coincided with American public holidays, as shown above.
In 2017, the five busiest days among Divvy ride customers fell on Mondays, as shown above.
The five busiest Divvy ride trip start times in 2017 among customers were in the afternoons around the Independence Day holiday, as shown above.
The five busiest morning rush hours among Divvy ride subscribers in 2017 were on the work dates shown above.
Tuesday was the busiest day of the week in 2017 among Divvy ride subscribers, as shown above.
The five dates in 2017 with the busiest workday afternoons among Divvy ride subscribers are shown above.
The five busiest afternoon rush hours among Divvy ride subscribers in 2017 were on the work dates shown above.
This map shows that 592 Divvy ride stations in Chicago were active in mid-2018.
In 2017, most customers in Chicago took rides from and to the Divvy stations shown above.
In 2017, most subscribers took rides from and to the Chicago Divvy stations shown above during the morning rush hour.
In 2017, most subscribers took rides during the afternoon rush hour from and to the Chicago Divvy stations shown above.
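Rankings such as the busiest dates or the most-used station pairs described above come down to counting occurrences and keeping the top few. The sketch below uses the Python standard library. The field names `from_station_name` and `to_station_name` and the station labels are assumptions made for illustration.

```python
from collections import Counter


def top_station_pairs(trips, n=5):
    """Return the n most frequent (from station, to station) pairs with their counts."""
    pairs = Counter(
        (trip["from_station_name"], trip["to_station_name"]) for trip in trips
    )
    return pairs.most_common(n)


# Hypothetical trips between placeholder stations, for illustration only.
sample_trips = (
    [{"from_station_name": "Station A", "to_station_name": "Station B"}] * 3
    + [{"from_station_name": "Station B", "to_station_name": "Station A"}] * 2
    + [{"from_station_name": "Station A", "to_station_name": "Station C"}]
)

top_pairs = top_station_pairs(sample_trips, n=2)
for (origin, destination), count in top_pairs:
    print(f"{origin} -> {destination}: {count}")
```

The same pattern applies to the busiest dates or start hours if the station pair is replaced by the date or hour of the trip start time.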
This study used a number of high-end, state-of-the-art big data tools at various stages to undertake data extraction, preparation, loading, analysis, exploration, visualization and mapping.
Below are screen shots from these tools:
Take home messages – a user-centric view
Divvy Rides rules, such as the requirement for regular bike check-ins depending on the purchased plan (e.g. annual membership, single ride, explorer pass, etc.), shape the bike usage trends observed in the Divvy Rides transactional data.
Divvy rides dataset:
- Is a great source of information and insights:
  - There are two distinct user types, therefore two unique niches / market segments:
    - Customer: leisure / families / visitors
    - Subscriber: workers / business personnel
  - The two user types have distinct characteristics:
    - When they ride – temporal separation (different peak times and shapes)
    - How much they ride – rhythmic separation (number of rides and median ride duration)
    - From and to which Divvy stations – geo-spatial separation (recreational vs business)
- Is an invaluable information source for eco-friendly transportation in Chicago.
Take home messages – a data-centric view
Demerits – Big data problems:
- Volume: lots of unit-level data
- Velocity: rapid growth, particularly in the subscriber segment
- Veracity:
  - Dummy codes used in demographic characteristics (e.g. 1900 as year of birth), for user privacy
  - Some inconsistent variable names and geo-coding between the years
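Veracity issues like these are usually handled in a cleaning step before analysis. The sketch below is an assumption-laden illustration rather than the procedure used in this study. The column aliases in `COLUMN_ALIASES` are hypothetical examples of names that differ between yearly files, and treating a 1900 year of birth as missing is one possible design choice.

```python
# Hypothetical aliases: map variant column names to one standard name.
COLUMN_ALIASES = {
    "starttime": "start_time",
    "stoptime": "end_time",
}

# Birth years treated as placeholder (dummy) codes rather than real values.
DUMMY_BIRTH_YEARS = {1900}


def clean_row(row):
    """Return a copy of the row with standard column names and dummy codes set to None."""
    cleaned = {COLUMN_ALIASES.get(key, key): value for key, value in row.items()}
    if cleaned.get("birthyear") in DUMMY_BIRTH_YEARS:
        cleaned["birthyear"] = None
    return cleaned


# Hypothetical record, for illustration only.
raw_row = {"starttime": "2014-07-01 08:00:00", "usertype": "Subscriber", "birthyear": 1900}
cleaned_row = clean_row(raw_row)
print(cleaned_row)
```

Marking dummy codes as missing, rather than dropping the whole trip, keeps the ride in usage counts while excluding it from any age-based summaries.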
Merits – Big data attractions and opportunity for expansion:
- Big data – large volumes of unit-level data; a rich data source for data analytics pedagogy
- Variety – Good infrastructure to capture real-time transactional data with both geographic and temporal attributes
- Quantitative insights into ride usage by user type and, to a limited extent, user demographics
- Invaluable information source for planning eco-friendly transportation