Pre-processing
First, I used this function to see how many rows and columns this dataset contains.
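A minimal sketch of this step, assuming the data is loaded into a pandas DataFrame named `df`. The variable name, the column names, and the sample values are hypothetical stand-ins, and `shape` is one common way to get the row and column counts, not necessarily the call used here:

```python
import pandas as pd

# Tiny stand-in for the real dataset (column names and values are invented).
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Tehama", "Kings"],
    "ICU_AVAILABLE_BEDS": [2.0, 3.0, 5.0],
})

# shape is an attribute, not a method: (number_of_rows, number_of_columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```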

Then I used this function to see the column names in this dataset. The column names give me an initial understanding of what the dataset contains.
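Listing the column names could look like the sketch below. It assumes a pandas DataFrame named `df`; the `COUNTY` column and all values are hypothetical, while the other two column names are the ones mentioned in this write-up:

```python
import pandas as pd

# Stand-in data; COUNTY is a hypothetical column name.
df = pd.DataFrame({
    "COUNTY": ["Tehama"],
    "HOSPITALIZED_COVID_PATIENTS": [float("nan")],
    "ALL_HOSPITAL_BEDS": [float("nan")],
})

# df.columns is an Index object; tolist() turns it into a plain Python list
column_names = df.columns.tolist()
print(column_names)
```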

Then I used this function to view the first 5 rows of the dataset and make a quick observation about the values.
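In pandas, `head()` returns the first 5 rows by default. A small sketch, assuming a DataFrame named `df` with invented column names and values:

```python
import pandas as pd

# Stand-in data with 8 rows (column names and values are invented).
df = pd.DataFrame({
    "COUNTY": ["Tehama"] * 8,
    "ICU_AVAILABLE_BEDS": [float(i) for i in range(8)],
})

# head() with no argument returns the first 5 rows
first_rows = df.head()
print(first_rows)
```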

I noticed that in the first 5 rows, the columns "HOSPITALIZED_COVID_PATIENTS" and "ALL_HOSPITAL_BEDS" contain only "NaN" values. Therefore, I decided to look at more rows to see whether these two columns contain only "NaN" values throughout.
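One way to look at more rows is `head(50)`. Since the first 50 rows only describe those rows, the sketch below also shows `isna().all()`, which checks every row of a column. This is an illustrative alternative, not necessarily the check used here; the DataFrame name `df`, the `COUNTY` column, and the values are hypothetical:

```python
import numpy as np
import pandas as pd

# Stand-in data: 60 rows in which the two suspect columns are entirely NaN.
df = pd.DataFrame({
    "COUNTY": ["Tehama"] * 60,
    "HOSPITALIZED_COVID_PATIENTS": [np.nan] * 60,
    "ALL_HOSPITAL_BEDS": [np.nan] * 60,
})
suspect_cols = ["HOSPITALIZED_COVID_PATIENTS", "ALL_HOSPITAL_BEDS"]

# Visual inspection of the first 50 rows
first_50 = df[suspect_cols].head(50)
print(first_50)

# Full check: True for a column only if every value in it is missing
all_missing = df[suspect_cols].isna().all()
print(all_missing)
```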

After observing the first 50 rows, I am confident that the columns "HOSPITALIZED_COVID_PATIENTS" and "ALL_HOSPITAL_BEDS" contain only "NaN" values, so I decided to drop these two columns.
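Dropping the two columns can be sketched with `DataFrame.drop`, assuming a pandas DataFrame named `df` (the `COUNTY` column and the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-in data; only the two dropped column names come from the write-up.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Kings"],
    "HOSPITALIZED_COVID_PATIENTS": [np.nan, np.nan],
    "ALL_HOSPITAL_BEDS": [np.nan, np.nan],
})

# drop returns a new DataFrame, so the result is assigned back to df
df = df.drop(columns=["HOSPITALIZED_COVID_PATIENTS", "ALL_HOSPITAL_BEDS"])
print(df.columns.tolist())
```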

After deleting the columns "ALL_HOSPITAL_BEDS" and "HOSPITALIZED_COVID_PATIENTS", which contain only NaN values, I continued to check whether the dataset still has null or missing values.
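A common way to count the remaining missing values per column is `isnull().sum()`. The sketch assumes a pandas DataFrame named `df`; the column names and values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Stand-in data with one missing value in one column.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Kings", "Yolo"],
    "ICU_AVAILABLE_BEDS": [2.0, np.nan, 5.0],
})

# isnull() marks each missing cell as True; sum() counts them per column
missing_per_column = df.isnull().sum()
print(missing_per_column)
```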

It looks like this dataset still contains some missing values, so I will drop every row that contains a null value.
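Removing every row that has at least one missing value can be sketched with `dropna()`, followed by a re-check. The DataFrame name `df`, the column names, and the values are hypothetical:

```python
import numpy as np
import pandas as pd

# Stand-in data: the second row has a missing value.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Kings", "Yolo"],
    "ICU_AVAILABLE_BEDS": [2.0, np.nan, 5.0],
})

# dropna() with default arguments removes rows containing any missing value
df = df.dropna()

# Re-check: total number of missing cells left in the whole DataFrame
remaining_missing = int(df.isnull().sum().sum())
print(len(df), "rows left,", remaining_missing, "missing values")
```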

Now I will check again to see whether my dataset still contains any missing values.

It looks like my dataset no longer contains any missing values. That's great! As the next preprocessing step, I will look at the data types in this dataset.
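Inspecting the type of each column can be sketched with the `dtypes` attribute. The example assumes a pandas DataFrame named `df` with invented columns; the text column is created with an explicit `object` dtype so the output matches across pandas versions:

```python
import pandas as pd

# Stand-in data: one text column (object) and one numeric column (float64).
df = pd.DataFrame({
    "COUNTY": pd.Series(["Tehama", "Kings"], dtype="object"),
    "ICU_AVAILABLE_BEDS": pd.Series([2.0, 5.0], dtype="float64"),
})

# dtypes lists the data type of every column
column_types = df.dtypes
print(column_types)
```

Text values, including dates stored as strings, typically appear as `object` in this listing.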

It looks like this dataset contains only 2 data types: object and float. This suits the purpose of this project and my intended use of this dataset, so I won't change any of the data types.
I also noticed that some rows in this dataset contain 0 values, which makes sense. The dataset was recorded day by day, continuously from 03/29/2020 to 11/28/2021, so 03/29/2020 was the first day of recording, and it is reasonable that some values were 0 on day 1. For instance, the number of hospitalized COVID patients in Tehama County on 03/29/2020 was 0, since that date was in the early phase of the COVID-19 outbreak and the outbreak may not have reached Tehama County yet. In short, I won't remove any row or column that contains 0 values, since all of the 0 values make sense and are necessary for the purpose of this project.
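To see where the 0 values sit before deciding to keep them, a sketch like the following could be used. It assumes a pandas DataFrame named `df`; the column names are hypothetical, and apart from the 0 for Tehama on 03/29/2020 the values are invented:

```python
import pandas as pd

# Stand-in data; only the first row's 0 reflects a value described in the text.
df = pd.DataFrame({
    "COUNTY": ["Tehama", "Tehama", "Tehama"],
    "DATE": ["03/29/2020", "03/30/2020", "03/31/2020"],
    "HOSPITALIZED_PATIENTS": [0.0, 0.0, 1.0],
})

# Restrict the comparison to numeric columns
numeric = df.select_dtypes(include="number")

# Number of zeros in each numeric column
zeros_per_column = (numeric == 0).sum()
print(zeros_per_column)

# Rows with at least one zero, for manual review; nothing is removed from df
rows_with_zero = df[(numeric == 0).any(axis=1)]
print(rows_with_zero)
```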