Active Associative Sampling for Author Name Disambiguation
     
Marcos Goncalves; Rodrigo Silva; Anderson Ferreira; Adriano Veloso; Alberto Laender

ABSTRACT
One of the hardest problems faced by current scholarly digital libraries is author name ambiguity. This problem occurs when a set of citation records contains records of the same author under distinct names, or records belonging to distinct authors with similar names. Among the several proposed methods, the most effective ones seem to be those that assign records directly to their respective authors using supervised machine learning techniques. The effectiveness of such methods is usually directly correlated with the amount of supervised training data available. However, acquiring training examples requires skilled human annotators to label references manually. Aiming to reduce the set of examples needed to produce the training data, in this paper we propose a new active sampling strategy based on association rules for the author name disambiguation task. We compare our strategy with state-of-the-art supervised baselines that use the complete labeled training dataset, as well as with other active methods, and show that very competitive disambiguation effectiveness can be obtained with reductions in the training set of up to 71%.

 

AckSeer: A Repository and Search Engine for Automatically Extracted Acknowledgments from Digital Libraries
Madian Khabsa; Pucktada Treeratpituk; C. Lee Giles

ABSTRACT
Acknowledgments are widely used in scientific articles to express gratitude and credit collaborators. Despite suggestions that indexing acknowledgments automatically will give interesting insights [9], there is currently, to the best of our knowledge, no system that tracks and indexes acknowledgments. In this paper we introduce AckSeer, a search engine and repository for automatically extracted acknowledgments in digital libraries. AckSeer is a fully automated system that scans items in digital libraries, including conference papers, journals, and books, extracting acknowledgment sections and identifying the acknowledged entities mentioned within.

We describe the architecture of AckSeer and discuss the extraction algorithms, which achieve an F1 measure above 83%. We use multiple Named Entity Recognition (NER) tools and propose a method for merging the output of the different recognizers. The resulting entities are stored in a database and then made searchable by adding them to the AckSeer index along with the metadata of the containing paper or book. We build AckSeer on top of the documents in the CiteSeerX digital library, yielding more than 500,000 acknowledgments and more than 4 million mentioned entities.
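
As an illustration of what merging the output of several recognizers and scoring it with F1 can look like, here is a minimal sketch in Python. It is not the AckSeer method, which the abstract does not detail: the simple voting rule, the function names `merge_entities` and `f1_score`, the `min_votes` parameter, and the sample entities are all assumptions made for this example.

```python
from collections import Counter


def merge_entities(outputs, min_votes=2):
    """Keep (text, type) pairs reported by at least `min_votes` recognizers.

    Assumption: a plain voting rule, used here only for illustration.
    """
    votes = Counter()
    for entities in outputs:
        # A recognizer counts at most once per distinct entity.
        for entity in set(entities):
            votes[entity] += 1
    return {entity for entity, count in votes.items() if count >= min_votes}


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of entities."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three recognizers for one acknowledgment section.
tool_a = [("Example Science Fund", "ORG"), ("Alice Example", "PERSON")]
tool_b = [("Example Science Fund", "ORG"), ("Example University", "ORG")]
tool_c = [("Alice Example", "PERSON"), ("Example Science Fund", "ORG")]

merged = merge_entities([tool_a, tool_b, tool_c])

# Hypothetical hand-labeled entities for the same section.
gold = {
    ("Example Science Fund", "ORG"),
    ("Alice Example", "PERSON"),
    ("Example University", "ORG"),
}
score = f1_score(merged, gold)
```

In this toy case the vote keeps the two entities that at least two tools agree on and drops the one reported by a single tool, so precision is 1.0, recall is 2/3, and F1 is 0.8. Raising `min_votes` trades recall for precision.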