Search engines provide critical infrastructure for an efficient and effective information-driven economy in the 21st century, connecting people to the information they seek within an ever-larger information haystack. To achieve search quality in practice, we must continually revise, extend, and fine-tune search engine algorithms to keep pace with ever-more massive information repositories and the increasingly varied types of content people search. This in turn requires accurate, efficient, affordable, and scalable methodology for evaluating the quality of search results. Without such evaluation scaffolding, we cannot even measure the effectiveness of existing search engines, let alone assess future innovations and potential improvements to current state-of-the-art search algorithms.
Unfortunately, the massive scale of the information repositories being searched today poses a fundamental challenge to current state-of-the-art methodology for evaluating the quality of search results. Because search engines must effectively support a great multitude of users, information needs, and ways of articulating those needs as search queries, evaluation must be conducted over many different queries seeking different types of information. Moreover, experiments must be conducted at the same scale as the information archives being searched in practice, which requires tremendous human labor to judge the relevance of an enormous number of search results for each query. The manual effort of judging so many results has become increasingly infeasible. While recent advances in evaluation methodology have greatly reduced the number of human relevance judgments required for accurate evaluation, such judging remains a major scalability bottleneck.
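To illustrate why per-query relevance judgments are central to this methodology, the sketch below computes one standard effectiveness measure, average precision, for a single query given a ranked result list and the set of documents a human assessor judged relevant. The document identifiers and judgments are purely hypothetical; the point is only that such judgments must exist for every evaluated query.

```python
def average_precision(ranked_docs, relevant_docs):
    """Average precision for one query: the mean of precision@k over
    every rank k at which a judged-relevant document is retrieved."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant_docs:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0

# Hypothetical data: a system's ranked results and the assessor's judgments.
ranking = ["d3", "d7", "d1", "d9", "d4"]
judged_relevant = {"d3", "d9", "d8"}
print(average_precision(ranking, judged_relevant))  # 0.5
```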
To ensure continuing advances in search engine technology, we will investigate a range of techniques for improving the cost, efficiency, and scalability of search engine evaluation. We will focus particularly on Arabic-language search (queries and/or documents), with English-language search included for comparison. In terms of the information being searched, we will focus on providing large datasets drawn from the Arabic Web and social media. With regard to methodology, we will focus on evaluation techniques requiring minimal or no human judgments. Specifically, we will refine and integrate previously independent lines of prior research on “rank fusion”, “pseudo-test collections”, and “crowdsourcing”. The results are expected to significantly increase the quality and generality of blind evaluation techniques, reducing the cost and time of current search engine evaluation.
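As one concrete instance of the rank-fusion line of work, the sketch below implements reciprocal rank fusion (RRF), a widely used method that combines ranked lists from several retrieval systems without any relevance judgments. The example systems, document identifiers, and the smoothing constant k=60 are illustrative assumptions only, not the specific fusion method the project will necessarily adopt.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists by summing 1 / (k + rank) for each
    document across all input rankings; documents ranked highly by
    many systems receive the largest fused scores."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from three retrieval systems for the same query.
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d2", "d1", "d5"]
system_c = ["d3", "d2", "d6"]
print(reciprocal_rank_fusion([system_a, system_b, system_c]))
```

Because the fused ranking is driven purely by agreement among systems, it can serve either as an improved result list or, in blind-evaluation settings, as a judgment-free signal about which documents are likely relevant.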