To Better Stand on the Shoulder of Giants: Learning to Identify Potentially Influential Literature
Rui Yan

(Nominated for Best Student Paper)

Scientists usually develop research ideas inspired by previous publications, but they are unlikely to follow every paper in an unbounded literature collection: scientific research evolves at an unprecedented speed. The volume of publications keeps expanding extremely fast, yet not all papers are equally influential in the academic community. Being aware of potentially influential literature would put one in an advantageous position when choosing important research references. Hence, estimating potential influence is both significant and challenging. We study the problem of identifying potentially influential literature. We examine a set of hypotheses about the fundamental characteristics of highly cited papers and find some interesting patterns. Based on these observations, we learn to identify potentially influential literature via Future Influence Prediction (FIP), which aims to estimate the future influence of each publication. The system takes a series of features of a particular publication as input and outputs the estimated citation count of that article after a given time period. We consider several regression models to formulate the learning process and evaluate their performance based on the coefficient of determination (R2). Experimental results on a large real-world data set show a mean average predictive performance of 83.6% measured in R2, which significantly outperforms alternative algorithms. We apply the learned model to bibliography recommendation and obtain a prominent performance improvement in terms of Mean Average Precision (MAP).
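
For readers unfamiliar with the evaluation measure named in this abstract, the following is a minimal sketch of how the coefficient of determination (R2) is conventionally computed for predicted versus observed citation counts. It is not the authors' code; the function name r_squared and the sample citation counts are assumptions invented for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variance of the observed values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical citation counts, invented purely for illustration.
actual_citations = [10, 20, 30, 40]
predicted_citations = [12, 18, 33, 37]
score = r_squared(actual_citations, predicted_citations)
```

A value of 1.0 means the predictions match the observed counts exactly; a value near 0 means the model does no better than always predicting the mean.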


BibRank: a Language-Based Model for Co-Ranking Entities in Bibliographic Networks
Laure Soulier; Lamjed Ben Jabeur; Lynda Tamine; Wahiba Bahsoun

Bibliographic documents are associated with many entities, including authors, venues, affiliations, etc. While bibliographic search engines have mainly addressed the ranking of relevant documents for a query topic, ranking other related bibliographic entities is still challenging. Indeed, document relevance is the primary level that allows inferring the relevance of the other entities regardless of the query topic. In this paper, we propose a novel integrated ranking model, called BibRank, that aims at ranking both document and author entities in bibliographic networks. The underlying algorithm propagates entity scores through the network by means of citation and authorship links. Moreover, we propose to weight these relationships using content-based indicators that estimate the topical relatedness between entities. In particular, we estimate the common similarity between homogeneous entities by analyzing marginal citations. We also compare document and author language models in order to evaluate the level of the author’s knowledge of the document topic and how representative the document is of the author’s knowledge. Experimental results on the representative CiteSeerX dataset show that the BibRank model outperforms state-of-the-art ranking models with a significant improvement.


Modeling and Exploiting Heterogeneous Bibliographic Networks for Expertise Ranking
Hongbo Deng; Jiawei Han; Michael R. Lyu; Irwin King

Nominated for Vannevar Bush Best Paper

Recently, expertise retrieval has received increasing interest in both academia and industry. Finding experts with demonstrated expertise for a given query is a nontrivial task, especially in large-scale Web 2.0 systems, such as question answering and bibliography data, where users are actively publishing useful content online, interacting with each other, and forming social networks in various ways, leading to heterogeneous networks in addition to large amounts of textual content. Many approaches have been proposed and shown to be useful for expertise ranking. However, most of these methods only consider the textual documents while ignoring heterogeneous network structures, or can merely integrate one additional kind of information. None of them can fully exploit the characteristics of heterogeneous networks. In this paper, we propose a joint regularization framework to enhance expertise retrieval by modeling heterogeneous networks as regularization constraints on top of a document-centric model. We argue that multi-typed linking edges reveal valuable information which should be treated differently. Motivated by this intuition, we formulate three hypotheses to capture the unique characteristics of different graphs, and mathematically model those hypotheses jointly with the document and other information. To illustrate our methodology, we apply the framework to expert finding applications using a bibliography dataset with 1.1 million papers and 0.7 million authors. The experimental results show that our proposed approach achieves significantly better results than the baseline and other enhanced models.