Using an Inverted Index Synopsis for Query Latency and Performance Prediction


Predicting the latency of a query in a search engine has important benefits: for instance, it allows the search engine to adjust its configuration to handle long-running queries without unnecessarily sacrificing effectiveness. However, for the dynamic pruning techniques that underlie many search engines, accurate prediction of query latencies is difficult. In this talk I will discuss how index synopses, which are stochastic samples of the full index, can be used to obtain accurate timing predictions. Experiments using the TREC ClueWeb09 collection and a large set of user queries show that a small random sample is enough to estimate properties of the larger index very accurately, including the sizes of posting list unions and intersections. I will also show that index synopses facilitate two use cases: (i) query latencies on the full index can be predicted accurately, and long-running queries classified, using index synopses; (ii) the effectiveness of queries can be estimated more accurately by a post-retrieval predictor applied to a synopsis index than by a pre-retrieval predictor. This work is partially supported by the Italian Ministry of Education and Research (MIUR) in the framework of the CrossLab project (Departments of Excellence).
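
To make the idea of estimating union and intersection sizes from a sample concrete, here is a minimal sketch. It is not the method presented in the talk: it assumes that a synopsis is built by keeping each document independently with a fixed probability, and that sizes measured on the synopsis are scaled up by the inverse of that probability. The function names, the toy index, and the sampling rate are all hypothetical.

```python
import random


def build_synopsis(index, num_docs, rate, seed=0):
    """Keep each document with probability `rate` (an assumed sampling scheme)
    and drop all other documents from every posting list."""
    rng = random.Random(seed)
    kept = {doc for doc in range(num_docs) if rng.random() < rate}
    return {term: postings & kept for term, postings in index.items()}


def estimate_sizes(synopsis, terms, rate):
    """Measure union and intersection sizes on the synopsis and scale them
    by 1 / rate to estimate the corresponding sizes on the full index."""
    lists = [synopsis.get(term, set()) for term in terms]
    if not lists:
        return 0.0, 0.0
    union = set().union(*lists)
    intersection = set.intersection(*lists)
    return len(union) / rate, len(intersection) / rate


# Toy index: term -> set of document ids (hypothetical data).
NUM_DOCS = 10_000
full_index = {
    "a": set(range(0, NUM_DOCS, 2)),  # every 2nd document
    "b": set(range(0, NUM_DOCS, 3)),  # every 3rd document
}

RATE = 0.1
synopsis = build_synopsis(full_index, NUM_DOCS, RATE, seed=42)
est_union, est_intersection = estimate_sizes(synopsis, ["a", "b"], RATE)
print(f"estimated union: {est_union:.0f}, estimated intersection: {est_intersection:.0f}")
```

On this toy index the true union and intersection sizes are 6667 and 1667, so the printed estimates can be compared against them directly. Sampling whole documents, rather than individual postings, keeps co-occurrence between terms intact, which is why intersections remain estimable from the sample.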

28 Sep 2020 17:30
IR Glasgow Seminars
School of Computing Science, University of Glasgow
18 Lilybank Gardens
G12 8RZ Glasgow