Mixture Model for the Length of Sessions of World Wide Web Users
Yuxin Liu
M.S., 2011
Advisor: Juana Sanchez
This thesis applies Web usage mining to reveal information about browsing behavior hidden in the log files of a Web server, in particular the length of user visits to the msnbc web site, whose logs are publicly available to researchers. We take a Bayesian approach to mixture modeling of the length of a visit, where length is defined as the number of pages a user visits per session. Models were selected on their goodness of fit and their predictive performance, as measured by Akaike’s Information Criterion (AIC), the Bayesian Information Criterion (BIC), p-values, and the Root Mean Square Error (RMSE) of the predictions.
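For readers unfamiliar with the selection criteria named above, the following is a minimal sketch of their standard textbook definitions, not the thesis's own code. The function names and the arguments `log_lik` (maximized log-likelihood), `k` (number of free parameters), and `n` (number of sessions) are illustrative assumptions. Lower values indicate a preferred model for all three measures.

```python
import math


def aic(log_lik, k):
    """Akaike's Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * log_lik


def rmse(observed, predicted):
    """Root Mean Square Error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))
```

Because BIC penalizes each parameter by ln(n) rather than 2, it favors mixtures with fewer components than AIC does once n exceeds about 7 sessions.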