BIG DATA ANALYTICS AND VISUALIZATION OF CHICAGO DIVVY RIDES (From 2014 to 2017)

By Dr John Gwinyai Nyakuengama

(20 October 2018)

Slide1c

KEY WORDS

Chicago City; Divvy Bicycles (Divvy Rides) big dataset from 2014-2017; big data analytics, visualization and mapping; Stata; R; RapidMiner Turbo Prep; TIBCO Spotfire; Power BI; Google Maps

 

ABSTRACT

This study analysed the big, transactional dataset of Chicago Divvy rides collected between 2014 and 2017.

It found that over 13.5 million trips were taken during that period. In 2017 alone, over 590 Divvy ride stations operated more than 6,240 individual bikes.

The Chicago Divvy rides users (customers and subscribers) showed two different usage patterns in terms of the:

  • Number of Divvy rides and median trip duration, as well as year-on-year growth patterns;
  • Time of access to Divvy rides (by day of week and by time of day); and
  • Divvy stations from which users had travelled from and to.

The current study identified some big data merits and challenges in the Chicago Divvy rides dataset and showcased a number of big data analysis, visualization and mapping tools.

 

Slide2

Slide3c.gif

Slide4b.gif

 

Slide5

 

 

Slide6

Slide7a

Slide8

Slide9

Slide10

 

Slide12

Slide13

Slide14a

Slide15

 

 

Slide16c

 

Slide17b

 

 

 

Slide18

Slide19

Slide20

Slide21

Slide22

 

 

 

Slide23

Slide24

Slide25

Between 2014 and 2017, about 3.5 million customers and 9.7 million subscribers accessed Divvy rides. This means that customers and subscribers comprised about a quarter and three quarters of all Divvy ride users, respectively.
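The "quarter and three quarters" split can be checked with simple arithmetic. The sketch below uses only the rounded figures quoted above (3.5 million and 9.7 million), so the resulting shares are approximate; the variable names are illustrative.

```python
# Rounded counts quoted in the text above, in millions.
customer_count = 3.5
subscriber_count = 9.7

total = customer_count + subscriber_count

# Share of each user type in the combined total.
customer_share = customer_count / total
subscriber_share = subscriber_count / total

print(f"Customers:   {customer_share:.1%}")
print(f"Subscribers: {subscriber_share:.1%}")
```

With these rounded inputs the customer share comes out slightly above one quarter and the subscriber share slightly below three quarters.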

Slide26.gif

From 2014 to 2017, the number of Divvy ride customers steadily decreased. In contrast, the number of subscribers grew, albeit at a decreasing rate. In terms of Year-on-Year (YoY) changes in Divvy rides, the 2017 growth rate in subscribers was half that observed in 2016, and about a third of that in 2015.

Slide27.gif

Between 2014 and 2017, usage of Divvy rides by both customers and subscribers was seasonal, typically increasing markedly in the warmer summer months and steadily decreasing as winter approached. Nonetheless, the number of subscribers vastly outstripped that of customers in every month of the year. It is also noticeable that only the number of subscribers grew over the four-year period.
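Year-on-Year (YoY) growth rates of the kind quoted for subscribers can be computed as the change in annual rides divided by the previous year's rides. The following is a minimal sketch; the `yoy_growth` helper is hypothetical, and the annual counts are made-up numbers chosen only to illustrate a slowing growth pattern, not figures from the Divvy dataset.

```python
def yoy_growth(counts_by_year):
    """Return {year: fractional growth over the previous year}."""
    years = sorted(counts_by_year)
    return {
        year: (counts_by_year[year] - counts_by_year[prev]) / counts_by_year[prev]
        for prev, year in zip(years, years[1:])
    }

# Made-up annual ride counts (NOT from the Divvy data), for illustration only.
example_counts = {2014: 1000, 2015: 1300, 2016: 1560, 2017: 1716}

growth = yoy_growth(example_counts)
for year, rate in growth.items():
    print(f"{year}: {rate:.0%}")
```

In this made-up series the count rises every year while the growth rate falls, which is what "grew, albeit at a decreasing rate" means.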

Slide28.gif

Day-of-week usage of Divvy rides by customers and subscribers followed roughly opposite patterns during the four years. That is, among customers, usage was highest during the weekend and dropped to its lowest by mid-week. The converse was true among subscribers.
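A day-of-week profile like this one can be built by counting trips per user type and weekday. The sketch below assumes each trip is available as a (start time, user type) pair with timestamps formatted as `YYYY-MM-DD HH:MM:SS`; the `rides_by_weekday` helper and the sample trips are hypothetical and are not the study's own code or data.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def rides_by_weekday(trips):
    """Count trips per (user type, weekday name)."""
    counts = Counter()
    for start_time, usertype in trips:
        started = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        counts[(usertype, WEEKDAYS[started.weekday()])] += 1
    return counts

# Made-up sample trips, for illustration only.
sample_trips = [
    ("2017-07-01 14:05:00", "Customer"),    # a Saturday
    ("2017-07-01 15:30:00", "Customer"),
    ("2017-07-05 13:00:00", "Customer"),    # a Wednesday
    ("2017-07-05 08:10:00", "Subscriber"),
    ("2017-07-05 17:20:00", "Subscriber"),
]

weekday_counts = rides_by_weekday(sample_trips)
for (usertype, day), n in sorted(weekday_counts.items()):
    print(usertype, day, n)
```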

Slide29.gif

The hour-of-the-day Divvy ride usage profiles of customers and subscribers were very different during 2014-2017:

  • Customer usage distribution was uni-modal, peaking in the afternoon (around 14H00 to 15H00, or 2 to 3 PM).
  • In contrast, subscriber usage distribution was bi-modal, with one peak during the morning rush hour (6H00 to 8H00, or 6 to 8 AM) and another during the evening rush hour (16H00 to 18H00, or 4 to 6 PM).

Also noticeable in the hourly Divvy ride usage profiles are:

  • The steady, upward growth in the subscriber numbers between 2014 and 2017; and
  • The customer usage jump between 10H00 and 17H00 (or 10 AM and 5 PM) from 2014 to 2015. However, customer usage dropped off from 2016, particularly after 14H00 (or 2 PM).
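One way to check whether an hourly profile has one peak or two is to count trips per start hour and look for hours that are busier than both neighbouring hours. The sketch below is a minimal illustration under stated assumptions: the `hourly_profile` and `local_peaks` helpers are hypothetical, timestamps are assumed to be formatted as `YYYY-MM-DD HH:MM:SS`, and the two hourly count lists are made-up numbers shaped like the profiles described above, not values from the Divvy dataset.

```python
from datetime import datetime

def hourly_profile(trips):
    """Count trips per start hour (0-23) for each user type."""
    profile = {}
    for start_time, usertype in trips:
        hour = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S").hour
        profile.setdefault(usertype, [0] * 24)[hour] += 1
    return profile

def local_peaks(counts):
    """Return hours whose count is strictly higher than both neighbouring hours."""
    return [h for h in range(1, len(counts) - 1)
            if counts[h] > counts[h - 1] and counts[h] > counts[h + 1]]

# Made-up hourly counts (index = hour of day), for illustration only.
customer_counts = [1, 1, 0, 0, 0, 1, 2, 4, 8, 14, 20, 26,
                   30, 33, 35, 34, 30, 26, 20, 14, 9, 5, 3, 2]
subscriber_counts = [1, 1, 1, 1, 2, 5, 20, 45, 60, 30, 15, 14,
                     14, 14, 15, 20, 50, 70, 40, 20, 10, 6, 3, 2]

print("Customer peaks:", local_peaks(customer_counts))
print("Subscriber peaks:", local_peaks(subscriber_counts))
```

On real data the hourly counts are noisier, so some smoothing or a minimum peak height would likely be needed before counting peaks.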

 

Slide30

Slide31.gif

The median trip duration of customers was more than twice that of subscribers, during the study.
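A comparison of this kind can be sketched by grouping trip durations by user type and taking the median of each group. The example assumes durations are recorded in seconds; the `median_duration_by_usertype` helper and the sample durations are hypothetical and are not taken from the Divvy dataset.

```python
from statistics import median

def median_duration_by_usertype(trips):
    """Return {user type: median trip duration in minutes}."""
    durations = {}
    for usertype, seconds in trips:
        durations.setdefault(usertype, []).append(seconds)
    return {usertype: median(values) / 60
            for usertype, values in durations.items()}

# Made-up (user type, duration in seconds) pairs, for illustration only.
sample_trips = [
    ("Customer", 1500), ("Customer", 1320), ("Customer", 2400),
    ("Subscriber", 540), ("Subscriber", 600), ("Subscriber", 780),
]

medians = median_duration_by_usertype(sample_trips)
for usertype, minutes in medians.items():
    print(f"{usertype}: {minutes:.1f} min")
```

The median is less sensitive than the mean to a handful of very long trips, which makes it a reasonable summary for skewed duration data.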

Slide32.gif

Generally, Divvy ride subscribers’ median ride duration increased during the warmer spring and summer months, then fell off sharply from the autumn months as winter approached. By contrast, customers’ median ride duration was not as sharply seasonal, particularly in 2017.

Slide33.gif

The day-of-the-week profiles of median ride duration among customers mirror those described previously for the number of rides by day of the week. Of note, their median ride duration tended to increase between 2014 and 2015, but not beyond.

Among subscribers, median ride duration also increased significantly during weekends.

Slide34.gif

In this study, the median trip duration among Divvy ride customers was highest between 8H00 and 15H00. Among subscribers, it was highest during the morning rush hour (from 7H00 to 9H00) and the afternoon rush hour (from 15H00 to 17H00).

Over the years, there was far less variability in median trip duration by hour of the day among subscribers than among customers. Among the latter, there was a substantial yearly increase in the duration of Divvy rides taken before 8H00. In this user type, the increase in median trip duration after 8H00, which had occurred since 2014, had petered out by 2016.

Slide35

Slide36

The five busiest dates in 2017 among Divvy ride customers coincided with American public holidays, as shown above.

Slide37

The five busiest days of the week in 2017 among Divvy ride customers were all Mondays, as shown above.

Slide38

The five busiest Divvy ride trip start times in 2017 among customers were in the afternoons around the Independence Day holiday, as shown above.

Slide39

Slide40

The five busiest morning rush hours among Divvy ride subscribers in 2017 were on the work dates shown above.

Slide41

Tuesday was the busiest day of the week in 2017 among Divvy ride subscribers, as shown above.

Slide42

Slide43

The five dates in 2017 with the busiest workday afternoons among Divvy ride subscribers are shown above.

Slide44

The five busiest afternoon rush hours among Divvy ride subscribers in 2017 were on the work dates shown above.

Slide45

Slide46

This map shows that 592 Divvy ride stations in Chicago were active in mid-2018.

Slide47

In 2017, most customers in Chicago took rides from and to the Divvy stations shown above.

Slide48

 

Slide49

In 2017, most subscribers took rides from and to the Chicago Divvy stations shown above during the morning rush hour.

Slide50

Slide51

In 2017, most subscribers took rides during the afternoon rush hour from and to the Chicago Divvy stations shown above.

 

Slide52

This study used several big data tools (Stata, R, RapidMiner Turbo Prep, TIBCO Spotfire, Power BI and Google Maps) at various stages for data extraction, preparation, loading, analysis, exploration, visualization and mapping.

Below are screenshots from these tools:

Slide53

 

Slide Stata_final

 

Slide55

Slide56

Slide57

Slide58

Take home messages – a user-centric view

Divvy Rides rules, such as the requirement for regular bike check-ins depending on the purchased plan (e.g. annual membership, single ride, explorer pass, etc.), shape the bike usage trends reflected in the Divvy Rides transactional data.

 

Divvy rides dataset:

  • Is a great source of information and insights:
  • There are two distinct user types, therefore two unique niches / market segments:
    • Customer: leisure / families / visitors
    • Subscriber: workers / business personnel
  • The two user types have distinct characteristics:
    • When they ride – temporal separation (different peak times and shapes)
    • How much they ride – rhythmic separation (number of rides and median ride duration)
    • From- and to- which Divvy stations – geo-spatial separation (recreational vs business)
  • Is an invaluable information source for eco-friendly transportation in Chicago.

 

Take home messages – a data-centric view

Demerits – Big data problems:

  • Volume: lots of unit-level data
  • Velocity: rapid growth, particularly in the subscriber segment
  • Veracity:
    • Dummy codes used in demographic characteristics (e.g. 1900 as year of birth), for user privacy
    • Some inconsistent data variable names and geo-coding between the years
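The veracity issues listed above are typically handled in a cleaning step before analysis. The sketch below is one possible approach under stated assumptions: the column aliases (`starttime`, `stoptime`), the `birthyear` field name, the treatment of 1900 as the only placeholder year, and the `harmonise` helper are all illustrative assumptions rather than a documented description of the Divvy files or of the study's own workflow.

```python
# Assumed mapping from older column names to a single, consistent set.
COLUMN_ALIASES = {"starttime": "start_time", "stoptime": "end_time"}

# Assumed placeholder value(s) standing in for an unknown year of birth.
PLACEHOLDER_BIRTH_YEARS = {1900}

def harmonise(record):
    """Rename aliased columns and blank out placeholder birth years."""
    cleaned = {COLUMN_ALIASES.get(key, key): value
               for key, value in record.items()}
    year = cleaned.get("birthyear")
    if year in ("", None) or int(year) in PLACEHOLDER_BIRTH_YEARS:
        cleaned["birthyear"] = None   # treat as missing, not as a real age
    else:
        cleaned["birthyear"] = int(year)
    return cleaned

# Made-up records in two assumed layouts, for illustration only.
old_style = {"starttime": "2014-07-01 08:00:00", "usertype": "Subscriber",
             "birthyear": "1900"}
new_style = {"start_time": "2017-07-01 08:00:00", "usertype": "Subscriber",
             "birthyear": "1985"}

print(harmonise(old_style))
print(harmonise(new_style))
```

Setting placeholder years to missing, rather than dropping the trips, keeps the ride counts intact while preventing implausible ages from distorting demographic summaries.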

 

Merits – Big data attractions and opportunities for expansion:

  • Big data – large volumes of unit-level data; a rich data source for data analytics pedagogy
  • Variety – Good infrastructure to capture real-time transactional data with both geographic and temporal attributes
  • Quantitative insights from ride usage by user type and, to a limited extent, user demographics
  • Invaluable information source for planning – eco-friendly transportation

 

 

Slide61

Slide62b4

Slide63

 

Slide64

