Improving Multi-Faceted Book Search by Incorporating Sparse Latent Semantic Analysis of Click-Through Logs
Deng Yi; Yin Zhang; Haihan Yu; Yanfei Yin; Jing Pan; Baogang Wei

A multi-faceted book search engine presents diverse category-style options that allow users to refine search results without re-entering a query. In this paper, we propose a novel multi-faceted book search engine that uses users’ query-related latent intents, mined from click-through logs, as multiple facets for books. The latent query intents can be discovered effectively and efficiently by applying the Sparse Latent Semantic Analysis (Sparse LSA) model to users’ query and click behaviors in the click-through logs. This paper details how to improve multi-faceted book search by incorporating the compact representation of query-intent-book relationships generated by Sparse LSA into the offline and online processing procedures. The specificity of the latent query intents can be flexibly changed by adjusting the sparsity level of the projection matrix in the Sparse LSA model. We evaluated our approach on CADAL click-through logs containing 45,892 queries and 164,822 books. The experimental results show that a Sparse LSA model with a sparser projection matrix tends to discover more specific latent query intents. The latent query intents suggested by our approach usually achieve a high user satisfaction ratio.
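
The link between the sparsity level of the projection matrix and the specificity of the intents can be illustrated with a toy sketch. It assumes sparsity is induced by an L1-style penalty applied through soft-thresholding, which the abstract does not state. The matrix values, the penalty settings, and the helper names (`soft_threshold`, `sparsify`, `nonzeros_per_row`) are hypothetical and are not taken from the paper.

```python
# Toy illustration only: rows stand for query terms, columns for latent intents.
# All values below are made up for the example.

def soft_threshold(value, penalty):
    """Shrink a weight toward zero; weights within +/- penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def sparsify(matrix, penalty):
    """Apply soft-thresholding to every entry of a projection matrix."""
    return [[soft_threshold(v, penalty) for v in row] for row in matrix]

def nonzeros_per_row(matrix):
    """Count how many latent intents each term is still linked to."""
    return [sum(1 for v in row if v != 0.0) for row in matrix]

projection = [
    [0.9, 0.2, 0.05],
    [0.1, 0.8, 0.3],
    [0.3, 0.05, 0.7],
]

mild = sparsify(projection, 0.1)     # weaker penalty: denser matrix
strong = sparsify(projection, 0.35)  # stronger penalty: sparser matrix

print(nonzeros_per_row(mild))
print(nonzeros_per_row(strong))
```

With the stronger penalty, each term in this toy matrix stays linked to fewer latent intents, which is one way to picture why a sparser projection matrix would yield more specific intents.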

Exploiting Real-Time Information Retrieval in Microblogosphere
Feng Liang; Runwei Qiang; Jianwu Yang

Information-seeking behavior in microblogging environments such as Twitter differs from traditional web search. The best-performing microblog retrieval techniques attempt to utilize both the semantic and temporal aspects of documents. In this paper, we present an effective approach, comprising query modeling, document modeling, and a temporal re-ranking component, to discover the most recent yet relevant information for a query. For query modeling, we introduce a two-stage pseudo-relevance feedback query expansion to overcome the severe vocabulary-mismatch problem of short-message retrieval in microblogs. For document modeling, we propose two ways to expand documents with the help of shortened URLs. For the temporal re-ranking component, we suggest several methods to evaluate the temporal aspects of documents. Experimental results demonstrate that our approach obtains significant improvements over baseline systems. Specifically, the proposed system gives 26.37% and 9.94% further increases in P@30 and MAP, respectively, over the best-performing result on highrel in the TREC'11 Real-Time Search Task.
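
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of the standard definitions of precision at k (P@k) and mean average precision (MAP). It is not the official TREC evaluation tooling, and the function names and document identifiers are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)

def mean_average_precision(runs):
    """Average of the per-query average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Made-up example: one query, four retrieved tweets, two of them relevant.
ranking = ["t1", "t2", "t3", "t4"]
relevant = {"t1", "t3"}
p_at_2 = precision_at_k(ranking, relevant, 2)
ap = average_precision(ranking, relevant)
```

P@30 in the abstract corresponds to `k = 30`; MAP averages the per-query average precision over all topics in the task.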