
Pre-processing

First, I checked how many rows and columns this dataset contains.
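
A minimal sketch of this step, assuming the data lives in a pandas DataFrame called `df`. The library, the variable name, and the toy values are assumptions for illustration; only the two hospital column names come from this page.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Tehama", "Alameda"],  # hypothetical column
    "HOSPITALIZED_COVID_PATIENTS": [np.nan, np.nan, np.nan],
    "ALL_HOSPITAL_BEDS": [np.nan, np.nan, np.nan],
})

# shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```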


Then I looked at the names of the columns in this dataset. The column names give me an initial understanding of what the dataset contains.
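
Listing the column names could look like the sketch below, assuming pandas and a toy DataFrame. `COUNTY` and `DATE` are hypothetical names; the other two are the columns discussed on this page.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama"],        # hypothetical column
    "DATE": ["03/29/2020"],      # hypothetical column
    "HOSPITALIZED_COVID_PATIENTS": [np.nan],
    "ALL_HOSPITAL_BEDS": [np.nan],
})

# df.columns is an Index; tolist() turns it into a plain Python list.
column_names = df.columns.tolist()
print(column_names)
```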


Then I looked at the first 5 rows of the dataset to make a quick observation about the values.
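
As a rough sketch, previewing the first rows in pandas is usually done with `head()`, which returns 5 rows when no argument is given. The DataFrame below is a toy example with hypothetical column names and made-up values.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama"] * 8,                                   # hypothetical column
    "ICU_PATIENTS": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0],   # hypothetical column
})

# head() with no argument returns the first 5 rows.
first_rows = df.head()
print(first_rows)
```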


Then I noticed that, in the first 5 rows, the columns "HOSPITALIZED_COVID_PATIENTS" and "ALL_HOSPITAL_BEDS" contain only "NaN" values. Therefore, I decided to look at more rows to see whether these two columns contain nothing but "NaN" values.
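
Looking at more rows might be sketched as below, assuming pandas. Instead of reading 50 rows by eye, the same slice can also be tested for NaN directly. The data is a toy example in which both columns are empty throughout.

```python
import numpy as np
import pandas as pd

# Toy stand-in with 60 rows; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama"] * 60,  # hypothetical column
    "HOSPITALIZED_COVID_PATIENTS": [np.nan] * 60,
    "ALL_HOSPITAL_BEDS": [np.nan] * 60,
})

suspect_cols = ["HOSPITALIZED_COVID_PATIENTS", "ALL_HOSPITAL_BEDS"]

# head(50) returns the first 50 rows for inspection.
first_50 = df.head(50)

# One boolean per column: True if every value in these 50 rows is NaN.
all_nan_in_first_50 = first_50[suspect_cols].isna().all()
print(all_nan_in_first_50)
```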


After observing the first 50 rows, I am confident in concluding that the columns "HOSPITALIZED_COVID_PATIENTS" and "ALL_HOSPITAL_BEDS" contain only "NaN" values, so I decided to drop these two columns.
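
Since 50 rows are only a sample, one cautious way to do this step is to confirm that a column is empty over its whole length before dropping it. The sketch below assumes pandas and uses a toy DataFrame; `ICU_PATIENTS` and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Alameda", "Tehama"],  # hypothetical column
    "ICU_PATIENTS": [0.0, 3.0, 1.0],            # hypothetical column
    "HOSPITALIZED_COVID_PATIENTS": [np.nan] * 3,
    "ALL_HOSPITAL_BEDS": [np.nan] * 3,
})

candidates = ["HOSPITALIZED_COVID_PATIENTS", "ALL_HOSPITAL_BEDS"]

# Keep only the candidates that are NaN in every row, not just the first 50.
empty_cols = [col for col in candidates if df[col].isna().all()]

# drop(columns=...) returns a new DataFrame without those columns.
df = df.drop(columns=empty_cols)
print(df.columns.tolist())
```

If the goal is simply to remove every fully empty column, `df.dropna(axis=1, how="all")` does the same thing without naming the columns.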


After deleting the columns "ALL_HOSPITAL_BEDS" and "HOSPITALIZED_COVID_PATIENTS", which contain only NaN values, I continued to check whether the dataset still has null or missing values.
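
Counting the missing values per column could look like this sketch, assuming pandas. The column names and values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset after the two empty columns were dropped.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Alameda", None],  # hypothetical column
    "ICU_PATIENTS": [0.0, np.nan, 2.0],     # hypothetical column
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```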


It looks like this dataset still contains some missing values, so I will drop all of the rows that contain null values.
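
A possible version of this step, assuming pandas, is shown below. It also reports how many rows were removed, which helps judge how much data the cleanup costs. The columns and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Alameda", "Butte"],  # hypothetical column
    "ICU_PATIENTS": [0.0, np.nan, 2.0],        # hypothetical column
})

rows_before = len(df)

# dropna() removes every row with at least one missing value;
# reset_index(drop=True) renumbers the remaining rows from 0.
df = df.dropna().reset_index(drop=True)

print(f"dropped {rows_before - len(df)} of {rows_before} rows")
```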


Now I will check again to see whether my dataset still contains any missing values.
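
The re-check can be reduced to a single number, as in this sketch (pandas assumed, toy data with hypothetical columns): summing the per-column counts gives the total number of missing cells.

```python
import numpy as np
import pandas as pd

# Toy stand-in: one row has a missing value and is dropped first.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Alameda", "Butte"],  # hypothetical column
    "ICU_PATIENTS": [0.0, np.nan, 2.0],        # hypothetical column
}).dropna()

# First sum() counts per column, second sum() adds the columns together.
remaining_missing = int(df.isna().sum().sum())
print(remaining_missing)
```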


It looks like my dataset no longer contains any missing values. That's great! As the next preprocessing step, I will look at the data types in this dataset.


It looks like this dataset contains only 2 data types: object and float. This suits the purpose of this project and my intended use of the dataset, so I won't change any of the data types.
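
Inspecting the data types might be sketched as follows, assuming pandas. Text columns typically show up as `object` (newer pandas versions may report a string type instead) and decimal numbers as `float64`. The columns and values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Alameda"],       # hypothetical text column
    "DATE": ["03/29/2020", "03/30/2020"],  # hypothetical text column
    "ICU_PATIENTS": [0.0, 3.0],            # hypothetical numeric column
})

# dtypes lists the data type of each column.
print(df.dtypes)

# select_dtypes picks out the numeric columns for later calculations.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```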

I also noticed that this dataset contains 0 values in some rows, but this makes sense. The dataset was recorded day by day from 03/29/2020 to 11/28/2021, so 03/29/2020 is the first day of data, and it is reasonable for some values to be 0 on day 1. For instance, the number of hospitalized COVID patients in Tehama County on 03/29/2020 was 0. That is easy to understand: 03/29/2020 was early in the COVID-19 outbreak, and the outbreak may not have reached Tehama County yet. In short, I won't remove any row or column that contains 0 values, since all of the 0 values make sense and are necessary for the purpose of this project.
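
To see where the zeros sit before deciding to keep them, something like the sketch below could be used (pandas assumed). The column names and values are hypothetical, chosen only to mirror the Tehama example; no rows are removed.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are made up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Tehama", "Alameda"],            # hypothetical column
    "DATE": ["03/29/2020", "03/30/2020", "03/29/2020"],   # hypothetical column
    "HOSPITALIZED_PATIENTS": [0.0, 1.0, 12.0],            # hypothetical column
})

numeric = df.select_dtypes(include="number")

# Count the zeros in each numeric column.
zeros_per_column = (numeric == 0).sum()
print(zeros_per_column)

# Show the rows that contain at least one zero, to check they are plausible.
rows_with_zero = df[(numeric == 0).any(axis=1)]
print(rows_with_zero)
```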

©2022 by ℅ D. Proudly created with Wix.com
