Using an Inverted Index Synopsis for Query Latency and Performance Prediction

Abstract

Predicting how long a search engine will take to answer a query has important benefits: for instance, the engine can adjust its configuration to handle long-running queries without unnecessarily sacrificing effectiveness. However, for the dynamic pruning techniques that underlie many search engines, accurate prediction of query latency is difficult. In this talk I will discuss how index synopses, which are stochastic samples of the full index, can be used to obtain accurate timing predictions. Experiments using the TREC ClueWeb09 collection and a large set of user queries show that a small random sample is enough to estimate properties of the full index very accurately, including the sizes of posting list unions and intersections. I will also show that index synopses facilitate two use cases: (i) query latencies on the full index can be predicted accurately, and long-running queries classified, using index synopses; (ii) query effectiveness can be estimated more accurately by a post-retrieval predictor run on a synopsis index than by a pre-retrieval predictor. This work is partially supported by the Italian Ministry of Education and Research (MIUR) in the framework of the CrossLab project (Departments of Excellence).
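
To make the idea of a synopsis concrete, here is a minimal sketch in Python. The abstract does not say how the sample is drawn, so this assumes the simplest scheme: each document is kept independently with probability `rate`, and counts measured on the sample are scaled by `1 / rate`. The function names (`build_synopsis`, `estimate_intersection`, `estimate_union`) and the toy index are hypothetical and are not taken from the talk.

```python
import random


def build_synopsis(index, num_docs, rate, seed=0):
    """Keep each document id with probability `rate` (assumed sampling scheme).

    `index` maps a term to a sorted list of document ids. The same sampled
    document set is applied to every posting list, so co-occurrences of
    terms within a document are preserved in the synopsis.
    """
    rng = random.Random(seed)
    sampled = {doc for doc in range(num_docs) if rng.random() < rate}
    return {
        term: [doc for doc in postings if doc in sampled]
        for term, postings in index.items()
    }


def estimate_intersection(synopsis, terms, rate):
    """Estimate how many documents in the full index contain all `terms`."""
    lists = [set(synopsis.get(term, ())) for term in terms]
    if not lists:
        return 0.0
    return len(set.intersection(*lists)) / rate


def estimate_union(synopsis, terms, rate):
    """Estimate how many documents in the full index contain any of `terms`."""
    lists = [set(synopsis.get(term, ())) for term in terms]
    return len(set().union(*lists)) / rate


# Toy index: term "a" occurs in every 2nd document, term "b" in every 3rd.
NUM_DOCS = 100_000
toy_index = {
    "a": list(range(0, NUM_DOCS, 2)),
    "b": list(range(0, NUM_DOCS, 3)),
}
toy_synopsis = build_synopsis(toy_index, NUM_DOCS, rate=0.05, seed=42)
approx_and = estimate_intersection(toy_synopsis, ["a", "b"], rate=0.05)
approx_or = estimate_union(toy_synopsis, ["a", "b"], rate=0.05)
```

In this toy index the exact intersection and union sizes are 16,667 and 66,667 documents, so `approx_and` and `approx_or` can be compared against them directly. Sampling whole documents, rather than individual postings, is the design choice that keeps intersections meaningful: two terms co-occur in the sample exactly when they co-occur in a sampled document.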

Date
24 Apr 2020 11:00
Event
CS Colloquium
Location
Georgetown University
3700 O St NW
Washington, DC 20057
