top of page

This site was designed with the

website builder. Create your website today.Start Now

Black and White Star in Circle

Thien Nguyen Portfolio

The first thing I observed that this dataset contains lots of NaN values and some of the columns contain all the NaN values. I will print out all the columns containing missing values and the number of missing values for each of those column.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.02.31 PM.png

ex1 preprocessing: Image

Looks like the majority of data in 'Alley' column, 'FireplaceQu' column, 'PoolQC' column, 'Fence' column and 'MiscFeature' column contain NaN values so I will go ahead and drop all these columns.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.04.41 PM.png

ex1 preprocessing: Image

For the numerical data containing the less number of missing values, I will replace all the missing values by the median of corresponding column. For the categorical variables containing missing data, I will fill in None to all those missing categorical data.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.08.22 PM.png

ex1 preprocessing: Image

Now lets check if my dataset still contains any missing value or not.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.10.26 PM.png

ex1 preprocessing: Image

Great! Now my dataset is clean of missing values.

Next, I will make sure that all the variables will be transformed to numerical values in order to build and train regression model on this dataset.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.14.55 PM.png

ex1 preprocessing: Image

Lastly, I will also drop the ID column since this column is not necessary for building and training any model.

ex1 preprocessing: Text

Screen Shot 2022-03-28 at 3.17.18 PM.png

ex1 preprocessing: Image

Great! Now this dataset is ready for building and training regression models on it.

Download my Jupyter Notebook to view my code.

ex1 preprocessing: Text

bottom of page