Google Data Analytics Capstone

Mark Emenyonu
5 min read · Sep 15, 2022

Case Study: How does a bike-share navigate speedy success?

An image of docked Citibike bikes. Photo taken by Daniel Adams on Unsplash

I recently completed the Google Data Analytics Professional Certificate and it was indeed a wonderful experience overall. For my capstone project, I performed an analysis of a bike-share company in Chicago, USA.

Header containing Cyclistic Bike-share logo

The tools I used for this project are Microsoft Excel, RStudio and Tableau.

The R scripts used for this analysis are contained in this markdown.

Background

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles across Chicago. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

The marketing analytics team believes that maximizing the number of annual members will be key to future growth and has set a clear goal: design marketing strategies aimed at converting casual riders into annual members. To do that, the team needs to better understand how annual members and casual riders differ.

To answer the key business questions, I followed the six steps of the data analysis process: ask (the business task), prepare, process, analyze, share and act.

Business Task

Analyze Cyclistic’s historical bike trip data to determine how casual riders differ from annual members. This involves collecting the previous 12 months of Cyclistic trip data.

Prepare

This stage involves collecting the data and verifying its accuracy. The data was provided by Motivate International Inc. under this license and it can be accessed here. It consists of 12 individual CSV files of trip data, one per month from August 2021 through July 2022. Each CSV file contains 13 fields of trip information such as ride_id, rideable_type, trip start time, trip end time, etc. Riders’ personally identifiable information was withheld for data-privacy reasons.

Process

Each monthly file held between roughly 103k and 800k records. Given the size of the data, I decided to use RStudio for data cleaning and validation.
The CSV files were loaded into RStudio with the read_csv() function, three months at a time, and each group of three was stored in its own data frame: batch1, batch2, batch3 and batch4.
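
As a rough sketch of this loading step, the snippet below writes three tiny stand-in CSV files to a temporary folder, reads them back with read_csv() and stacks them into one batch. The file names, the two sample columns and the name `batch3_demo` are assumptions made for illustration; they are not the actual Cyclistic files.

```r
library(readr)
library(dplyr)

# Create three tiny stand-in files (hypothetical names; the real monthly files are far larger)
tmp_dir <- tempfile("tripdata_")
dir.create(tmp_dir)
months <- c("feb22", "mar22", "apr22")
for (m in months) {
  sample_rows <- data.frame(
    ride_id = paste0(m, "_", 1:2),
    member_casual = c("member", "casual")
  )
  write_csv(sample_rows, file.path(tmp_dir, paste0(m, ".csv")))
}

# Read each monthly file into its own data frame, kept together in a named list
monthly <- lapply(months, function(m) {
  read_csv(file.path(tmp_dir, paste0(m, ".csv")), show_col_types = FALSE)
})
names(monthly) <- months

# Stack the three months into a single batch
batch3_demo <- bind_rows(monthly)
nrow(batch3_demo)
```

Keeping the monthly data frames in a list is only one option; reading each file into its own named object works just as well for a handful of files.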

Data Cleaning and Transformation

I loaded the `tidyverse`, `lubridate` and other required packages which will be used for this analysis:

library(tidyverse)
library(lubridate)
library(readr)
library(dplyr)

Note: The following data cleaning and validation operations were carried out on each individual data frame (batch1,…,batch4). The steps below focus on the batch3 data frame, which comprises trip data for February 2022, March 2022 and April 2022.

I merged the monthly files into a single data frame (batch3), dropped the fields least relevant to the analysis and created additional fields where needed:

batch3 <- bind_rows(feb22, mar22, apr22)
# Drop the coordinate fields, which are not used in this analysis
batch3 <- batch3 %>% select(-any_of(c("start_lat", "start_lng", "end_lat", "end_lng")))
# Trip length in seconds; the unit is set explicitly instead of relying on `-`
batch3 <- batch3 %>% mutate(trip_length_secs = as.numeric(difftime(ended_at, started_at, units = "secs")))
batch3$date <- as.Date(batch3$started_at)
batch3$month <- format(batch3$date, "%m")
batch3$year <- format(batch3$date, "%y")
batch3$day_of_week <- format(batch3$date, "%A")
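
One detail worth illustrating: subtracting two date-times in R returns a difftime whose unit (seconds, minutes, hours or days) is chosen automatically, so asking for seconds explicitly keeps the duration field predictable. The snippet below is a minimal base R sketch on two made-up trips; the data frame `trips` and its timestamps are invented for illustration.

```r
# Two made-up trips; timestamps are set to UTC so the result is reproducible
trips <- data.frame(
  started_at = as.POSIXct(c("2022-02-01 08:00:00", "2022-03-05 17:30:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-02-01 08:12:30", "2022-03-05 17:25:00"), tz = "UTC")
)

# Request seconds explicitly so the unit does not depend on the size of the gaps
trips$trip_length_secs <- as.numeric(
  difftime(trips$ended_at, trips$started_at, units = "secs")
)

# Derive calendar fields from the start time
trips$date <- as.Date(trips$started_at)
trips$month <- format(trips$date, "%m")
trips$day_of_week <- format(trips$date, "%A")  # day name depends on the locale

trips
```

The second trip ends before it starts, so its duration comes out negative, which is the kind of record that has to be dealt with before summarising.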

Analyze

With each batch cleaned, I merged the four data frames into a single large data frame named `cyclistic` for the calculations and analysis. Some cells contained missing values (NA), which I replaced with the string “null”:

cyclistic <- bind_rows(batch1, batch2, batch3, batch4)
cyclistic[is.na(cyclistic)] <- "null"
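
Because “null” is a text value, this kind of replacement is safest when it only touches text columns. A hedged alternative, assuming dplyr is loaded, is to restrict the fill to character columns so that numeric fields keep their NA. The tibble `demo`, its station names and its values are made up for illustration.

```r
library(dplyr)

# Made-up rows: one missing station name and one missing numeric value
demo <- tibble(
  ride_id = c("a1", "a2", "a3"),
  start_station_name = c("Clark St", NA, "State St"),
  trip_length_secs = c(300, 450, NA)
)

# Fill missing values only in character columns; numeric columns are left untouched
demo_filled <- demo %>%
  mutate(across(where(is.character), ~ coalesce(.x, "null")))

demo_filled
```

This keeps `trip_length_secs` numeric, so functions such as mean() still work on it (with `na.rm = TRUE` where needed).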

Additionally, I observed negative values in the trip length field. Since a trip duration cannot be negative, I created a new data frame that omits those entries:

cyclistic_v2 <- cyclistic[!(cyclistic$trip_length_secs < 0),]
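
The same filtering step can also be written with dplyr's filter(), which additionally drops rows where the condition evaluates to NA. This is a small sketch on invented values, not the author's data; `rides` and `rides_v2` are hypothetical names.

```r
library(dplyr)

# Made-up durations: one negative value and one missing value
rides <- tibble(
  ride_id = c("r1", "r2", "r3", "r4"),
  trip_length_secs = c(600, -45, 0, NA)
)

# Keep only trips with a non-negative duration; NA durations are dropped as well
rides_v2 <- rides %>% filter(trip_length_secs >= 0)

# Number of rows removed by the filter
removed <- nrow(rides) - nrow(rides_v2)
removed
```

Counting the removed rows is a quick sanity check that the filter discarded only a small share of the data.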

Descriptive and Summary Analyses

I calculated the mean, median, maximum and minimum trip durations:

cyclistic_v2 %>% summarise(mean_trip_length = mean(trip_length_secs), median_tl = median(trip_length_secs), max_tl = max(trip_length_secs), min_tl = min(trip_length_secs))
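
Since the business task is about comparing rider types, the same summary can be split by the member_casual field. The sketch below uses five invented trips; the tibble `rides` and its numbers are assumptions for illustration only.

```r
library(dplyr)

# Five made-up trips: two casual, three member
rides <- tibble(
  member_casual = c("casual", "casual", "member", "member", "member"),
  trip_length_secs = c(1200, 1800, 300, 600, 900)
)

# One summary row per rider type
by_type <- rides %>%
  group_by(member_casual) %>%
  summarise(
    mean_tl = mean(trip_length_secs),
    median_tl = median(trip_length_secs),
    max_tl = max(trip_length_secs),
    min_tl = min(trip_length_secs)
  )

by_type
```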

I analyzed the ride data by user type, date and bike type, then exported the result as a CSV file for visualization in Tableau:

ride_data <- cyclistic_v2 %>%
  group_by(member_casual, date, rideable_type) %>%
  summarise(ridecount = n(), average_duration = mean(trip_length_secs)) %>%
  arrange(member_casual, date)
write.csv(ride_data, "ride_data.csv")
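
By default write.csv() also writes R's row numbers as an unnamed first column, which may appear as an extra field when the file is imported elsewhere. The sketch below shows one optional way to leave it out and to check the exported columns; the table `ride_data_demo`, its values and the temporary file path are made up for illustration.

```r
# Small aggregated table standing in for ride_data (values are made up)
ride_data_demo <- data.frame(
  member_casual = c("casual", "member"),
  date = as.Date(c("2022-03-12", "2022-03-12")),
  rideable_type = c("docked_bike", "classic_bike"),
  ridecount = c(2L, 3L),
  average_duration = c(1500, 600)
)

out_path <- file.path(tempdir(), "ride_data_demo.csv")

# row.names = FALSE leaves out the automatic row-number column
write.csv(ride_data_demo, out_path, row.names = FALSE)

# Read the file back to confirm which columns were exported
check <- read.csv(out_path)
names(check)
```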

Share

The CSV file was imported into Tableau Desktop for visualization. Below is a screenshot of the interactive dashboard:

Cyclistic Dashboard created on Tableau

Follow this link to access the interactive dashboard on Tableau.

Insights

This analysis produced many insights, but I will focus on those that apply to the business task. The central question from the business task is: “How do casual riders differ from registered members?”

How do Cyclistic riders differ?

  1. Registered members use Cyclistic bikes more on weekdays and less on weekends.
  2. Casual riders show the reverse pattern: their ride counts peak on Saturdays and Sundays and are lowest on weekdays.
  3. Docked bikes are used by casual riders only, which is interesting to note.
  4. Casual riders have longer average trip lengths than registered members.
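
A weekday versus weekend comparison like the one in the first two points can be sketched as follows. The six rides below are invented purely to show the mechanics and say nothing about the real counts; `rides` and `by_day_type` are hypothetical names.

```r
library(dplyr)

# Six made-up rides (2022-03-12 is a Saturday, 2022-03-13 a Sunday)
rides <- tibble(
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  date = as.Date(c("2022-03-07", "2022-03-08", "2022-03-12",
                   "2022-03-12", "2022-03-13", "2022-03-09"))
)

# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, independent of locale
by_day_type <- rides %>%
  mutate(day_type = if_else(as.POSIXlt(date)$wday %in% c(0, 6), "weekend", "weekday")) %>%
  count(member_casual, day_type)

by_day_type
```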

Act

Based on my findings from this analysis, here are my recommendations for the marketing team:

  1. Marketing campaigns aimed at casual riders should be concentrated on weekends, since more casual riders ride on weekends than on weekdays.
  2. Docked bikes are used only by casual riders. Offering discounts on docked bikes as a subscription bonus could help convert casual riders to members.
  3. Casual riders averaged a longer trip duration than annual members. This suggests that casual riders may be more likely to ride long distances, although duration alone does not confirm distance. A marketing campaign could therefore target long-distance riders and offer them subscription bonuses for a limited period.
  4. Seasonal preferences of Cyclistic users should also be taken into account. The marketing team should tailor its campaigns to the summer months, as summer is a favoured season for riders.

Conclusion

Some additional information, such as riders’ demographics, could provide more insight for the marketing team. Customer demographics play a vital role in tailoring advertising campaigns.

Thank you for reading through this article. I hope you enjoyed it. Comments and suggestions for improvement are always welcome.

Feel free to connect with me on LinkedIn. Thank you for your time.

Written by Mark Emenyonu

Data Analyst. Electrical Engineer.
