Before building any classifier, we need to scale the data, since some numerical features in this dataset are on different scales.
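
As a rough illustration of the scaling step, here is a minimal sketch that standardizes numerical columns with scikit-learn's `StandardScaler`. The column names and values are hypothetical stand-ins, not the real income dataset, and the scaler used in the notebook may differ.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows standing in for the income dataset.
df = pd.DataFrame({
    "age": [25, 38, 52, 46],
    "hours_per_week": [40, 50, 60, 35],
    "capital_gain": [0, 0, 15024, 7688],
})

numeric_cols = ["age", "hours_per_week", "capital_gain"]

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df[numeric_cols]),
    columns=numeric_cols,
    index=df.index,
)

print(scaled.round(2))
```

Scaling matters most for distance-based models such as SVMs; tree-based models are largely insensitive to it.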

We also need to encode the dataset before modeling, since this income dataset contains both numerical and categorical data.
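
One common way to do the encoding is one-hot encoding with `pandas.get_dummies`. The sketch below assumes hypothetical column names (`workclass`, `income`) and labels (`<=50K`, `>50K`); the notebook may use a different encoder.

```python
import pandas as pd

# Hypothetical toy rows; column names and labels are assumptions.
df = pd.DataFrame({
    "age": [25, 38, 52],
    "workclass": ["Private", "Self-emp", "Private"],
    "income": ["<=50K", ">50K", ">50K"],
})

# One-hot encode the categorical feature column.
encoded = pd.get_dummies(df, columns=["workclass"])

# Map the two-class target to 0/1.
encoded["income"] = (encoded["income"] == ">50K").astype(int)

print(encoded)
```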

We need to create the target dataframe and the feature dataframe, which are used to train our model.
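
A minimal sketch of separating features from the target, assuming the target column is called `income` (a hypothetical name used only for illustration):

```python
import pandas as pd

# Hypothetical encoded data; "income" is the assumed target column.
df = pd.DataFrame({
    "age": [25, 38, 52, 46],
    "hours_per_week": [40, 50, 60, 35],
    "income": [0, 1, 1, 0],
})

X = df.drop(columns=["income"])  # feature dataframe
y = df["income"]                 # target

print(X.shape, y.shape)
```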

Next, we need to split the data into training and test sets.
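
The split could look something like the sketch below, which uses scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state`, and the stratification are assumptions for illustration, and the data is a hypothetical toy sample.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy features and a balanced binary target.
X = pd.DataFrame({
    "age": [25, 38, 52, 46, 31, 29, 60, 44, 35, 50],
    "hours_per_week": [40, 50, 60, 35, 45, 40, 20, 55, 38, 42],
})
y = pd.Series([0, 1, 1, 1, 0, 0, 0, 1, 0, 1], name="income")

# Hold out 20% of the rows for testing; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible, so later accuracy numbers can be compared across runs.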

Decision Tree

I chose a decision tree for this project for a few reasons. First, this classification algorithm is easy to understand and interpret. Moreover, decision trees can work with both numerical and categorical data. Since the income dataset I chose contains both, a decision tree is a natural choice for prediction on this dataset.
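
A minimal sketch of fitting a decision tree and checking its test accuracy is shown here. It runs on synthetic data from `make_classification` rather than the income dataset, and `max_depth=4` is an illustrative choice, not necessarily the notebook's setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded income data (not the real dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# max_depth=4 is an assumed setting that keeps the tree small and readable.
tree_clf = DecisionTreeClassifier(max_depth=4, random_state=0)
tree_clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
test_accuracy = tree_clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```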

The test accuracy of the decision tree classifier on this income dataset is fairly high. In other words, the classifier did a good job of classifying this income dataset.

Next, I will plot the tree to show how the decision tree classifier works on this income dataset.
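
For reference, a tree plot can be drawn with scikit-learn's `plot_tree`. The sketch below uses synthetic data, and the feature names and class labels are hypothetical placeholders rather than the real columns of the income dataset.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in data (not the real income dataset).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

# A shallow tree keeps the plot legible.
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(12, 6))
annotations = plot_tree(
    tree_clf,
    feature_names=feature_names,
    class_names=["<=50K", ">50K"],  # assumed class labels
    filled=True,
    ax=ax,
)
fig.tight_layout()
```

In a notebook the figure renders inline; in a script, `plt.show()` or `fig.savefig(...)` would display or save it. Each box shows the split rule, the number of samples reaching that node, and the majority class.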

Now, let's compute the train and test predictions of the decision tree classifier on this income dataset.
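
A sketch of computing predictions on both splits and comparing their accuracy follows. It again uses synthetic data and an assumed `max_depth`, so the numbers it prints say nothing about the real income dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the real income dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

tree_clf = DecisionTreeClassifier(max_depth=4, random_state=1)
tree_clf.fit(X_train, y_train)

# Predict on both splits.
train_pred = tree_clf.predict(X_train)
test_pred = tree_clf.predict(X_test)

train_accuracy = accuracy_score(y_train, train_pred)
test_accuracy = accuracy_score(y_test, test_pred)
print(f"Train accuracy: {train_accuracy:.3f}")
print(f"Test accuracy:  {test_accuracy:.3f}")
```

A train accuracy far above the test accuracy usually suggests that the tree is overfitting the training data.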

Pipelined SVC

Support Vector Machines (SVMs) are widely used for classification, which suits the purpose of this project. SVMs are popular because they can often reach good accuracy at a modest computational cost on small and medium-sized datasets. I chose an SVM in a pipeline to compare it with the decision tree and see which classification algorithm works better on this income dataset.
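
A pipelined SVC might be set up roughly as follows. This sketch assumes the pipeline consists of a `StandardScaler` followed by an `SVC` with an RBF kernel; the actual steps and hyperparameters in the notebook may differ, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the real income dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

# Assumed pipeline: scale the features, then fit the SVC.
svc_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf", C=1.0)),
])
svc_pipeline.fit(X_train, y_train)

svc_test_accuracy = svc_pipeline.score(X_test, y_test)
print(f"SVC test accuracy: {svc_test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.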

Now, let's compute the train and test predictions of the pipelined SVM classifier on this income dataset.
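
To put the two classifiers side by side, one option is to fit both on the same split and collect their train and test accuracy. The sketch below does this on synthetic data with assumed settings, so its output is only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the real income dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=3, stratify=y
)

# Assumed model settings, chosen for illustration only.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=3),
    "pipelined SVC": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }
    print(f"{name}: train={results[name]['train']:.3f}, test={results[name]['test']:.3f}")
```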

Click the button below to download my Jupyter notebook to read more about my modeling step for this project.

©2022 by ℅ D. Proudly created with Wix.com
