Compact Bayesian Models of Massive Social Graphs

Compact Bayesian Models of Massive Social Graphs. 2016 - 2018. Grant funded by NSF's Statistics and Methods, Measurement, and Statistics Programs (Award #1559778). Tyler McCormick, PI.

This research project will develop and apply new statistical models that capture dynamic, local structure in large-scale social networks. As digital technologies proliferate around the world, social science researchers increasingly have access to data that reflect granular and nuanced patterns of human interaction and social behavior. This is a change from traditional research on social networks, which relies on expensive and time-consuming survey modules. Given the cost of collecting network data through surveys, most knowledge about the role of individuals' social networks in behavior comes from data on relatively small networks observed at a single time period. In contrast, the investigators will apply their methods to data that are unsolicited and arise as electronic logs generated by social media and communication services. In such settings, interactions occur in real time and involve millions of unique individuals. Examining such data provides insight into the patterns of day-to-day interaction that shape individuals' social context. The techniques to be developed will model rich network structure with intuitive, simplified representations that make them easier to integrate into social science research. Open-source software will be made available through CRAN and the Python Package Index (PyPI). Aggregated data and related materials required to replicate the results of published articles will be available through ICPSR or GitHub.

This project will develop a Bayesian framework that parsimoniously represents the sub-structures of very large graphs using a novel approach to sparse coding. The project will also provide an intuitive set of methods designed to help social scientists better understand the latent structure in new forms of massive social network data. Despite the promise of unsolicited data sources, the size and granularity of the data present substantial challenges. From a statistical perspective, the massive size and heterogeneity of the data confound existing computational tools. From a social science perspective, the data are manifestations of deeper social relationships. This project bridges the gap between mathematical and statistical descriptions of these large social network graphs and the underlying social structure. The Bayesian representation of local network structure will build on related research on network motifs and random graph models, enabling meaningful inference about real-world network processes. The sparse coding approach will allow for compact and scalable representation of extremely large groups. The project will contribute to related statistical literatures on Bayesian models for social networks, Bayesian model uncertainty, and network inference.
