
I used this function to see how many rows and columns this dataset contains.
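The original call was shown in a screenshot, so here is a minimal sketch of one common way to get the row and column counts, assuming the data is in a pandas DataFrame. The tiny `df` below is a hypothetical stand-in, not the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the dataset (illustration only).
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "capital-gain": [2174, 0, 0],
})

# shape is an attribute that holds (number of rows, number of columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```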

[Image: 5B795DEB-532D-4750-9152-FAC750B61254_4_5005_c.jpeg]

I used this function to see the column names in this dataset. The column names give me an initial understanding of what the dataset contains.
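As a rough sketch of this step, assuming pandas, the `columns` attribute lists the column names. The sample frame is hypothetical and only reuses a few column names mentioned in this post.

```python
import pandas as pd

# Hypothetical sample with a handful of columns (illustration only).
df = pd.DataFrame({
    "workclass": ["Private"],
    "fnlwgt": [215646],
    "occupation": ["Handlers-cleaners"],
    "capital-gain": [0],
    "capital-loss": [0],
})

# Convert the column index to a plain Python list for easy reading.
column_names = df.columns.tolist()
print(column_names)
```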

[Image: A7A11A0F-6DD9-48D1-9F2A-1D1958C321D4_4_5005_c.jpeg]

I used this function to see the first 5 rows of the dataset to make a quick observation about the values.
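A minimal sketch of previewing the first rows, assuming pandas: `head()` returns the first 5 rows by default. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample with more than 5 rows (illustration only).
df = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 37],
    "capital-gain": [2174, 0, 0, 0, 0, 0],
    "capital-loss": [0, 0, 0, 0, 0, 0],
})

# head() with no argument returns the first 5 rows.
first_rows = df.head()
print(first_rows)
```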

[Image: 8605F80D-7418-497A-A871-2A420568225B_4_5005_c.jpeg]

After observing the first 5 rows of this dataset, I noticed that the 'capital-loss' column contains all 0 values and the 'capital-gain' column also contains all 0 values except in the first row. I decided to look at the first 50 rows to see whether these two columns contain only 0 values. If they do, I will drop them.
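One way this check could look, assuming pandas: `head(50)` shows the first 50 rows, and counting non-zero entries avoids scanning them by eye. The values below are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical 60-row sample: mostly zeros with a few non-zero entries.
df = pd.DataFrame({
    "capital-gain": [2174] + [0] * 4 + [14084] + [0] * 54,
    "capital-loss": [0] * 10 + [1902] + [0] * 49,
})

first_50 = df.head(50)

# Count how many of the first 50 rows hold a non-zero value in each column.
nonzero_counts = (first_50[["capital-gain", "capital-loss"]] != 0).sum()
print(nonzero_counts)
```

A count above 0 for a column means it is not entirely zero in those rows.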

[Image: BC354BDB-5A80-4BFC-92AC-567CD1C279CA.jpeg]

After observing the first 50 rows, I noticed that these two columns contain values other than 0, so I can conclude that the 0 values are meaningful, and I will not drop these two columns.

Next, I will check whether the dataset has any null or missing values.
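A minimal sketch of a missing-value check, assuming pandas: `isnull().sum()` counts the null entries in each column. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample with no null entries (illustration only).
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "?", "Private"],
})

# isnull() marks NaN/None cells; sum() then counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Note that a placeholder string such as '?' is an ordinary value to pandas, so it does not show up in this count.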

[Image: 5119A508-4F4E-4BA8-9460-19502809F3D4_4_5005_c.jpeg]

Fortunately, this dataset does not contain any missing values. Even so, let's check whether it contains any unusual values. I will pick the 'workclass' column at random to check.
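One way to inspect the values in a single column, assuming pandas, is `value_counts()`, which tallies each distinct value. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample of the 'workclass' column (illustration only).
df = pd.DataFrame({
    "workclass": ["Private", "?", "Private", "State-gov", "?"],
})

# Tally each distinct value; unexpected entries like '?' stand out here.
workclass_counts = df["workclass"].value_counts()
print(workclass_counts)
```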

[Image: F75A1CF1-66B5-450B-A9A3-0F5E1F281E3F_4_5005_c.jpeg]

Surprisingly, the 'workclass' column contains a good number of '?' values. Let's pick another column to check whether only 'workclass' contains '?' values or other columns do too. I will check the 'occupation' column.
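The same tally can be run on 'occupation'. As an alternative that is not necessarily what the original screenshot showed, the sketch below also counts '?' in every column at once with `isin`, assuming pandas and a hypothetical sample frame.

```python
import pandas as pd

# Hypothetical sample (illustration only).
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "?", "Private"],
    "occupation": ["?", "?", "Sales"],
})

# Check one column by hand.
occupation_counts = df["occupation"].value_counts()

# Or count the '?' entries in every column in one pass.
question_marks = df.isin(["?"]).sum()
print(question_marks)
```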

[Image: 5190462F-C9C8-4BF3-B07D-95D8D8B2ECA2_4_5005_c.jpeg]

Here I noticed that the 'occupation' column also contains these odd '?' values. I will drop the '?' values from all columns.
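A sketch of one way to do this, under the assumption that "dropping the '?' values" means removing every row that has a '?' in any column. The frame and the name `df_clean` are hypothetical.

```python
import pandas as pd

# Hypothetical sample (illustration only).
df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "?", "Private", "Private"],
    "occupation": ["Adm-clerical", "?", "Handlers-cleaners", "?"],
})

# True for each row that has a '?' in at least one column.
has_question_mark = df.isin(["?"]).any(axis=1)

# Keep only the rows without any '?', and renumber the index.
df_clean = df[~has_question_mark].reset_index(drop=True)
print(df_clean)
```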

[Image: ECA4D3D7-D4CD-4B65-AB98-218A34BC2EDE_4_5005_c.jpeg]

Next, I check again whether any '?' values remain in the dataset.
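To make the re-check repeatable, a small helper can report the total number of placeholder entries left. This is only a sketch: `count_placeholder` is a hypothetical helper name, not a pandas function, and the sample data is made up.

```python
import pandas as pd


def count_placeholder(frame: pd.DataFrame, placeholder: str = "?") -> int:
    """Return the total number of cells equal to the placeholder."""
    return int(frame.isin([placeholder]).sum().sum())


# Hypothetical sample before and after cleaning (illustration only).
raw = pd.DataFrame({
    "workclass": ["Private", "?", "State-gov"],
    "occupation": ["Sales", "?", "?"],
})
cleaned = raw[~raw.isin(["?"]).any(axis=1)]

print(count_placeholder(raw), count_placeholder(cleaned))
```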

[Image: 901183AA-1B79-491C-A100-5200B3CD8E47.jpeg]

Great! All the weird '?' values are removed!

Next, the 'fnlwgt' variable is a sampling weight (the "final weight" assigned to each record) rather than a characteristic of the person. It is not necessary or useful for the purpose of this project and my intended use of this dataset, so I will drop this column.
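A minimal sketch of dropping a column, assuming pandas: `drop(columns=...)` returns a new DataFrame without the named column. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample (illustration only).
df = pd.DataFrame({
    "age": [39, 50],
    "fnlwgt": [77516, 83311],
    "workclass": ["State-gov", "Self-emp-not-inc"],
})

# drop() returns a copy without 'fnlwgt'; reassign to keep the result.
df = df.drop(columns=["fnlwgt"])
print(df.columns.tolist())
```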

[Image: FB835B25-2DFA-4F2F-872A-58F41DCA2E94.jpeg]


©2022 by ℅ D. Proudly created with Wix.com
