Housing Sale Price Prediction Using Machine Learning Algorithms

Yichen Zhou
MAS, 2020
Wu, Yingnian
In this thesis, I explore how predictive modeling can be applied to housing sale price prediction by analyzing a housing dataset with machine learning models. I try four models: linear regression, lasso regression, random forest, and XGBoost. Because the data have 79 explanatory variables with many missing values, I spend considerable time preparing the data, carrying out exploratory data analysis and feature engineering before model fitting. I then use RMSE and R-squared to measure model performance. The first model, linear regression, violates the assumption of equal error variances, so I do not consider it a candidate for the final model. Lasso regression gives relatively poor RMSE and R-squared values. The random forest model has a very good R-squared on the training set but a relatively low R-squared on the test set, which suggests that it overfits somewhat. The fourth model, XGBoost, gives good results throughout, so I use it as my final model for predicting the housing sale price. The XGBoost model also shows which variables have important effects on sale price.
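For readers unfamiliar with the two performance measures named above, the following is a minimal sketch of how RMSE and R-squared are commonly computed. It assumes the standard textbook definitions; the function names and the sale prices are hypothetical illustrations, not code or data from the thesis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical sale prices (in thousands) and model predictions, for illustration only.
y_true = [200.0, 150.0, 300.0, 250.0]
y_pred = [210.0, 140.0, 290.0, 260.0]

print(rmse(y_true, y_pred))       # each prediction is off by 10
print(r_squared(y_true, y_pred))
```

Computing both measures separately on the training set and on a held-out test set, and then comparing them, is one common way to spot the kind of train/test gap described for the random forest model.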
2020