Manish Purohit† manishp@cs.umd.edu B. Aditya Prakash⇤ badityap@cs.vt.edu Chanhyun Kang† chanhyun@cs.umd.edu V. S. Subrahmanian† vs@cs.umd.edu Yao Zhang⇤ yaozhang@cs.vt.edu Computer Science Department, Virginia Tech., USA

Department of Computer Science, University of Maryland - College Park, USA

⇤

†

ABSTRACT

sheer size of today’s large social networks makes it challenging to perform sophisticated network analysis.

Given a propagation graph, possibly learnt from cascade analysis, is it possible to get a smaller nearly di↵usionequivalent representation for it? Getting a smaller equivalent graph will help multiple algorithmic and data mining tasks like inﬂuence maximization, immunization, understanding cascade data and data compression. In this paper, we study a novel graph coarsening problem with the aim of approximating a large social network by a much smaller graph that approximately preserves the network structure.

Our primary goal is to ﬁnd a compact representation of a large graph such that di↵usion and propagation processes on the large graph can be studied by analyzing the smaller representation. Intuitively, most of the edges in a real network are relatively unimportant; hence we propose characterizing and “contracting” precisely such edges in a graph to obtain a coarse representation.

The main contributions of this paper are:

Given a social network, can we quickly ‘zoom-out’ of the graph? Is there a smaller equivalent representation of the graph that preserves its propagation characteristics? Can we group nodes together based on their inﬂuence properties?

These are important problems with applications to inﬂuence analysis, epidemiology and viral marketing applications.

In this paper, we ﬁrst formulate a novel Graph Coarsening

Problem to ﬁnd a succinct representation of any graph while preserving key characteristics for di↵usion processes on that graph. We then provide a fast and e↵ective near-linear-time

(in nodes and edges) algorithm coarseNet for the same.

Using extensive experiments on multiple real datasets, we demonstrate the quality and scalability of coarseNet, enabling us to reduce the graph by 90% in some cases without much loss of information. Finally we also show how our method can help in diverse applications like inﬂuence maximization and detecting patterns of propagation at the level of automatically created groups on real cascade data.

(a) Problem Formulation: We carefully formulate a novel

Graph Coarsening Problem (GCP) to ﬁnd a succinct representation of a given social network so that the di↵usion characteristics of the network are mostly preserved.

(b) E cient Algorithms: We develop coarseNet, an efﬁcient (near-linear time) and e↵ective algorithm for

GCP, using careful approximations. We show that due to our novel scoring technique, the coarsened graph retains most of the di↵usive properties of the original network. (c) Extensive Experiments: We show that coarseNet is able to coarsen graphs up to 90% without much loss of key information. We also demonstrate the usefulness of our approach via a number of interesting applications. A major application we consider in this work is that of inﬂuence maximization in the Independent Cascade model. We propose a framework cspin that involves coarsening the graph and then solving inﬂuence maximization on the smaller graph to obtain high quality solutions. As the coarsened graph is much smaller than the original graph, the inﬂuence maximization algorithm runs orders of magnitude faster on the coarsened graph. Further using real cascade data from Flixster, we show how GCP can potentially help in understanding propagation data and constructing non-network surrogates for ﬁnding nodes with similar inﬂuence. Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications—

Data Mining

Keywords

Graph Mining;