{"id":2842,"date":"2023-08-10T19:09:19","date_gmt":"2023-08-10T19:09:19","guid":{"rendered":"https:\/\/www.themindedmarketing.com\/?p=2842"},"modified":"2023-08-15T15:13:12","modified_gmt":"2023-08-15T15:13:12","slug":"cyclisitic-data-analytics-case-study","status":"publish","type":"post","link":"https:\/\/www.themindedmarketing.com\/cyclisitic-data-analytics-case-study\/","title":{"rendered":"Cyclisitic: Data Analytics Case Study (Excel Project)"},"content":{"rendered":"\n

In this case study, we are working with a data set from Cyclistic <\/strong>which is a bike-sharing company. This company launched in 2016 and has been very successful. However, they have noticed that there is a difference in behavior when it comes to their casual riders versus their member.<\/p>\n\n\n\n

Casual riders<\/span><\/strong> are those people who rent just for a few hours or days. Cyclistic offers single-ride<\/strong> and full-day passes<\/strong> for these customers. Members <\/span><\/strong>are the customers who have annual Cyclistic memberships.<\/p>\n\n\n\n

Cyclistic has set a goal for this year and wants to convert more of its casual customers into members. However, in order to do that, they need to first understand the difference in behavior between these two clients. With this information, they will be able to figure out how they can better target their customers and reach their goals.<\/p>\n\n\n\n

This is where a Data Analyst will step in.<\/p>\n\n\n\n

Ask: Finding the Right Questions<\/h2><\/div>\n\n\n\n

First thing we need to do is understand the business task and what information we are trying to find. <\/strong>Without this, the analysis will be purely exploratory without any real end. There are over a million rows of data in this data set so it is important to have clear guidelines. <\/strong><\/p>\n\n\n\n

Here are the key things that we need to focus on:<\/p>\n\n\n\n

BUSINESS TASK:<\/span><\/strong> Show how annual members and casual riders use Cyclistic bikes
differently.<\/p>\n\n\n\n

DATA SOURCES USED<\/span><\/strong>: Index of bucket “divvy-tripdata”<\/a><\/p>\n\n\n\n

AMOUNT OF DATA:<\/span><\/strong> 12 months of data on rides from July 2022 to June 2023<\/p>\n\n\n\n

Since the data is coming directly from the company, I can already assume that the data is unbiased and had integrity. We are also looking at the last 12 months of data <\/strong>to draw our insights and conclusions.<\/p>\n\n\n\n

Prepare: Prepping the Data<\/h2>\n\n\n\n

All of the documents were on an Excel csv file. I had to make a decision on whether or not to use SQL for this project. Considering the size of the project files (over 1 million rows total), the best decision would have been to use MySQL, Access, or BigQuery. <\/p>\n\n\n\n

However, due to problems importing the files to the platforms mentioned above, I opted to do the entire project using only Excel. <\/strong>This way, I could fully portray my data cleaning, wrangling, and visualization skills using Excel.<\/p>\n\n\n\n

Process: Cleaning the Data<\/h2>\n\n\n\n

The files came in with “dirty” data <\/strong>that had not been separated into columns or rows. The first thing I did was make sure each attribute had its own cell and that the columns were properly named. <\/p>\n\n\n\n

\"example<\/figure><\/div>\n\n\n\n

In order to do this, I used the “Convert text to table”<\/span><\/strong> function in Excel where I then set the delimiter as a comma.<\/p>\n\n\n\n

I then saved the file and fixed the naming:<\/p>\n\n\n\n

Example: from 202212-divvy-tripdata <\/strong>to dec2022_tripdata<\/strong><\/p>\n\n\n\n

The first thing I noticed was that there was a lot of missing data for certain dates.<\/strong> The format of the dates was also wrong. There are some important things that I needed to get from this data set:<\/p>\n\n\n\n