In this case study, we are working with a data set from Cyclistic which is a bike-sharing company. This company launched in 2016 and has been very successful. However, they have noticed that there is a difference in behavior when it comes to their casual riders versus their member.
Casual riders are those people who rent just for a few hours or days. Cyclistic offers single-ride and full-day passes for these customers. Members are the customers who have annual Cyclistic memberships.
Cyclistic has set a goal for this year and wants to convert more of its casual customers into members. However, in order to do that, they need to first understand the difference in behavior between these two clients. With this information, they will be able to figure out how they can better target their customers and reach their goals.
This is where a Data Analyst will step in.
Ask: Finding the Right Questions
First thing we need to do is understand the business task and what information we are trying to find. Without this, the analysis will be purely exploratory without any real end. There are over a million rows of data in this data set so it is important to have clear guidelines.
Here are the key things that we need to focus on:
BUSINESS TASK: Show how annual members and casual riders use Cyclistic bikes
DATA SOURCES USED: Index of bucket “divvy-tripdata”
AMOUNT OF DATA: 12 months of data on rides from July 2022 to June 2023
Since the data is coming directly from the company, I can already assume that the data is unbiased and had integrity. We are also looking at the last 12 months of data to draw our insights and conclusions.
Prepare: Prepping the Data
All of the documents were on an Excel csv file. I had to make a decision on whether or not to use SQL for this project. Considering the size of the project files (over 1 million rows total), the best decision would have been to use MySQL, Access, or BigQuery.
However, due to problems importing the files to the platforms mentioned above, I opted to do the entire project using only Excel. This way, I could fully portray my data cleaning, wrangling, and visualization skills using Excel.
Process: Cleaning the Data
The files came in with “dirty” data that had not been separated into columns or rows. The first thing I did was make sure each attribute had its own cell and that the columns were properly named.
In order to do this, I used the “Convert text to table” function in Excel where I then set the delimiter as a comma.
I then saved the file and fixed the naming:
Example: from 202212-divvy-tripdata to dec2022_tripdata
The first thing I noticed was that there was a lot of missing data for certain dates. The format of the dates was also wrong. There are some important things that I needed to get from this data set:
- Number of rides per bike type
- Number of casual riders vs member riders
- Average ride length per customer type
- Average ride length by weekday (Monday-Sunday)
- Rides per weekday
- Total rides per month
In order to get this information, there were some columns that needed to be added. First and foremost, the date needed to be separated from the start and end time. The original format was “2022-12-05 10:47:18.” Leaving the format this way would cause errors in all of my calculations. So I made a column for the date and the weekday.
I then needed to draw the ride_time and create a new column. In order to do that, you take the end_time and subtract the start_time.
Analyse: Organizing and Preparing Data for Analysis
Now that the data has been cleaned and I have all of the missing elements that I need, I started to perform my calculations and analysis.
I quickly noticed that certain elements had the wrong format. So categories like end_time and weekday needed to be converted to “time” and “general” formats respectively.
Once that was done, I started my calculations. In order to make it easier to start with my visualizations, I made pivot tables for all of the data that I wanted to visualize.
Once the calculations were done, I started to notice trends when it comes to ride_time per customer, ride_lengths, and total_rides. We could also see a clear difference in the ride_types where certain bikes were clearly more popular than others. It is also clear that there is a trend when it comes to seasons. Thanks to the calculations in the pivot table, this will help us draw the proper insights from the data. The visualizations in the next step will make these trends even clearer.
Share: Visualizing Your Data
I decided to use pie charts and graphs in Excel to visualize the data. I may put the data into Tableau in the future and if so, the Dashboard will be inserted into this post.
Here are the visualizations from Excel:
Here is the Dashboard in Tableau:
Here are the insights that I have drawn from the data and the visualizations:
- Electric bikes are more popular among consumers with it taking 55% of the total rides
- Dock bikes are the least popular with only 2%
- Members account for 61% of total rides whereas casual riders account for 39%
- Member’s ride counts are also over 13 hours longer than casual riders
- The most popular days for our customers are Fridays and Saturdays
- The weekdays with the longest ride times are Mondays and Fridays
- The most popular months for bike rides are June-July with an over 332% decrease in rides during December-February.
- Seasons play a key role in the number of bike rides
Act: Drawing Conclusions and Making Decisions
From these insights, it is clear that having members is much more beneficial for Cyclistic than having casual riders.
If the company wants to increase its Members, it would be beneficial to focus on special offers on the weekends during the Spring to Summer periods where they will offer people to become members at a discount.
I would also recommend an even deeper analysis where you would look into the states and cities of these customers. With that data, you will be able to see in which areas the Marketing team needs to focus their targeted advertising.
It would also be very beneficial to have information on the age range of the customers and the routes that are taken the most. That way, if the company wants to have billboards, posters, or additional stands, then you can measure which spots are more popular for the customers. If the company sets up stands, you can gather data to see which areas are getting the most member sign-ups which will be very beneficial for the development of the company.