For experiment 3, I will try selecting a different set of features for the prediction by using a Sequential Feature Selector. Sequential feature selection belongs to the wrapper methods of feature selection. The algorithm evaluates the model on different subsets of the features, adding or removing one feature at a time and keeping the change that improves performance the most, until the model reaches the best performance for the chosen number of features.
Mathematically, these algorithms reduce an initial set of N features to M features, where M < N and the M features are chosen to optimize the performance of the model.
Data Preprocessing
We will use the Sequential Feature Selector to select the 3 features that give the best model performance.
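As a minimal sketch of this step, assuming scikit-learn's `SequentialFeatureSelector` with forward selection (the notebook may use a different implementation or different settings), the selection could look like the code below. The synthetic `feature_3` and `target` are stand-ins for the housing data, and the three extra column names are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the housing data (assumption: the real feature_3
# dataframe holds numeric columns). The last three names are illustrative.
rng = np.random.default_rng(0)
feature_3 = pd.DataFrame(
    rng.normal(size=(200, 6)),
    columns=["OverallQual", "BsmtFinSF1", "GrLivArea",
             "LotArea", "YearBuilt", "GarageArea"],
)
# The synthetic target depends only on the first three columns, plus noise.
target = (
    3.0 * feature_3["OverallQual"]
    + 2.0 * feature_3["BsmtFinSF1"]
    + 4.0 * feature_3["GrLivArea"]
    + rng.normal(scale=0.1, size=200)
)

# Forward selection: start with no features and greedily add the feature
# that most improves the cross-validated score, until 3 are chosen.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
sfs.fit(feature_3, target)

# get_support() returns a boolean mask over the original columns.
selected = list(feature_3.columns[sfs.get_support()])
print(selected)
```

Setting `direction="backward"` would instead start from all features and remove one at a time.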
It looks like the features selected by the Sequential Feature Selector are 'OverallQual', 'BsmtFinSF1', and 'GrLivArea'. Let's see how the linear regression model performs with these 3 selected features. We will now transform the feature_3 dataframe into a new dataframe containing only these 3 selected features ('OverallQual', 'BsmtFinSF1', 'GrLivArea') and use this new dataframe for training. For the target, we will use the same target as before, 'SalePrice'.
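Keeping only the selected columns can be sketched with plain pandas indexing. The toy values and the extra `LotArea` column below are made up for illustration; only the three selected column names and the `feature_3` / `new_feature_3` names come from this experiment.

```python
import pandas as pd

# Toy stand-in for feature_3 (all values are made up for illustration).
feature_3 = pd.DataFrame({
    "OverallQual": [5, 6, 7],
    "BsmtFinSF1": [100, 200, 300],
    "GrLivArea": [1000, 1500, 2000],
    "LotArea": [8000, 9000, 10000],  # hypothetical extra column
})

selected = ["OverallQual", "BsmtFinSF1", "GrLivArea"]

# Keep only the selected columns; .copy() gives an independent dataframe,
# so later changes to new_feature_3 do not touch feature_3.
new_feature_3 = feature_3[selected].copy()
print(new_feature_3.shape)
```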
Modeling
We still use the same linear regression model, now with new_feature_3, which contains the 3 features selected by the Sequential Feature Selector above. Using the same linear regression model in experiment 3 lets us compare three different ways of selecting features with the same regression model and see which way helps the model perform best on this dataset.
Split the data again using new_feature_3 dataframe
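A minimal sketch of the split, assuming scikit-learn's `train_test_split`. The 80/20 ratio and the `random_state` value are assumptions, not necessarily what the notebook uses, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for new_feature_3 and the SalePrice target.
rng = np.random.default_rng(0)
new_feature_3 = pd.DataFrame(
    rng.normal(size=(100, 3)),
    columns=["OverallQual", "BsmtFinSF1", "GrLivArea"],
)
target = pd.Series(rng.normal(size=100), name="SalePrice")

# Hold out 20% of the rows for testing (assumed ratio); fixing
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    new_feature_3, target, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```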
Build and train model
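Training could look like the sketch below, assuming scikit-learn's `LinearRegression`. The training data is synthetic and noise-free, so the fitted coefficients can be checked against the values used to generate it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data standing in for the 3 selected features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 3))
# Noise-free target: known coefficients (3, 2, 4) and intercept 1.
y_train = 3.0 * X_train[:, 0] + 2.0 * X_train[:, 1] + 4.0 * X_train[:, 2] + 1.0

# Ordinary least squares fit.
model = LinearRegression()
model.fit(X_train, y_train)

# The learned weights should recover the generating coefficients.
print(model.coef_, model.intercept_)
```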
Evaluating
I will again use the MSE, the RMSE, and the coefficient of determination (R²) to evaluate the performance of my model.
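To make the three metrics concrete, here is a small worked example, assuming scikit-learn's `mean_squared_error` and `r2_score`. The four true and predicted values are made up so the arithmetic is easy to follow by hand; they are not results from this experiment.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, chosen for easy arithmetic.
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# MSE: mean of squared errors = (1 + 0 + 1 + 0) / 4 = 0.5
mse = mean_squared_error(y_test, y_pred)

# RMSE: square root of the MSE, in the same units as the target.
rmse = np.sqrt(mse)

# R^2 = 1 - SS_res / SS_tot = 1 - 2 / 20 = 0.9,
# the share of the target's variance that the predictions explain.
r2 = r2_score(y_test, y_pred)

print(mse, rmse, r2)
```

Note that MSE and RMSE depend on the scale of the target, so "low" only has meaning relative to the range of the target values, while R² is scale-free.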
We can see that the MSE and RMSE values for experiment 3 are also very low, which means our model performed well at predicting our target (SalePrice) with the 3 features selected by the Sequential Feature Selector: 'OverallQual', 'BsmtFinSF1', and 'GrLivArea'. Moreover, the coefficient of determination is also relatively high at 0.79. This means the model explains about 79% of the variance in SalePrice using those 3 features. Overall, the linear regression model built and trained on the features selected by the Sequential Feature Selector did a pretty good job of prediction on this housing dataset.
Download my Jupyter Notebook to view my code.