Community Detection in Networks with Node Covariates

Zahra Razaee
Ph.D., 2017
Advisor: Jingyi Jessica Li
Community detection, or clustering, is a fundamental task in the analysis of network data. Most networks come with annotations, which can take the form of node covariates (such as a person’s age, gender, and location) and/or edge covariates (such as time stamps and ratings).
However, most existing community detection approaches infer community memberships from the network structure alone. Moreover, many real networks have a bipartite structure, which makes community detection challenging. In this dissertation, we first propose a model-based approach that allows for matched communities in the bipartite setting and incorporates node covariates that carry information about the matching. We derive a simple, fast algorithm for fitting the model, based on ideas from variational inference. We also consider a degree-corrected variant of the model, together with a novel approach to fitting such degree-corrected models.
We also propose a unified affinity matrix (USim) that leverages node covariate information. USim applies to unipartite networks (directed and undirected) as well as bipartite networks, and it combines the information from the network with that from the node covariates into a single similarity matrix, which can then be input to a spectral clustering algorithm. We show the effectiveness of both approaches on simulated and real data, namely page-user networks collected from Wikipedia.
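
As a rough illustration of the general idea of folding network and covariate information into one similarity matrix before spectral clustering, the following sketch blends an adjacency matrix with a Gaussian kernel on the covariates. It is not the USim construction itself, which this abstract does not specify: the convex combination, the kernel, the parameters alpha and gamma, and the function names are all assumptions made for illustration, and the sketch covers only an undirected unipartite network with two communities.

```python
import numpy as np

def combined_similarity(A, X, alpha=0.5, gamma=1.0):
    """Blend a symmetric adjacency matrix A (n x n) with a Gaussian kernel
    on node covariates X (n x p). ASSUMPTION: this convex combination is a
    generic stand-in, not the dissertation's USim definition."""
    # Pairwise squared Euclidean distances between covariate vectors.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dist)
    np.fill_diagonal(K, 0.0)  # no self-similarity, matching a loop-free graph
    return alpha * A + (1.0 - alpha) * K

def spectral_bipartition(S):
    """Split nodes into two groups using the sign of the eigenvector for the
    second-largest eigenvalue of the degree-normalized similarity matrix."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S_norm = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S_norm)  # eigenvalues returned in ascending order
    return (vecs[:, -2] > 0).astype(int)

# Toy data (hypothetical): two triangles joined by a single edge, with one
# covariate that is 0 in the first triangle and 1 in the second.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])

labels = spectral_bipartition(combined_similarity(A, X))
print(labels)  # nodes 0-2 share one label and nodes 3-5 the other
```

In this sketch, alpha controls how much weight the network receives relative to the covariates; alpha = 1 reduces to spectral clustering on the adjacency matrix alone. Which label (0 or 1) each group receives depends on the arbitrary sign of the eigenvector.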