Unsupervised Query Reformulation through Latent Concept Induction in Large-Scale Heterogeneous Information Retrieval Environments

Authors

  • Mikhail Petrov, Russia

Keywords

Query Reformulation, Latent Concept Induction, Unsupervised Learning, Information Retrieval, Semantic Matching, Large-scale Retrieval

Abstract

In large-scale heterogeneous information retrieval (IR) environments, user queries are often semantically ambiguous or structurally sparse, which limits retrieval effectiveness. This paper proposes an unsupervised query reformulation framework based on latent concept induction (LCI), which learns implicit semantic structures from retrieved document sets. Unlike supervised approaches, the model uncovers latent concepts autonomously, using document co-occurrence and context propagation techniques. Experiments on TREC and ClueWeb datasets show significant improvements in mean average precision (MAP) and normalized discounted cumulative gain (nDCG) over baseline and supervised models. Because it requires no annotated query reformulation data, the LCI framework scales across domains and languages.
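
The abstract does not spell out the LCI algorithm, so the following is only a minimal sketch of the general idea of unsupervised expansion from a retrieved document set. It assumes whitespace tokenization and a simple co-occurrence count as the scoring rule, and it omits context propagation entirely. The names `tokenize`, `expand_query`, and `num_terms` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def expand_query(query, feedback_docs, num_terms=3):
    """Append the terms that co-occur most with the query terms.

    Illustrative stand-in for concept induction, not the paper's method:
    each retrieved document votes for its non-query terms, weighted by
    how many distinct query terms that document contains.
    """
    query_terms = set(tokenize(query))
    scores = Counter()
    for doc in feedback_docs:
        doc_terms = set(tokenize(doc))
        overlap = len(query_terms & doc_terms)
        if overlap == 0:
            continue  # documents sharing no query term contribute nothing
        for term in doc_terms - query_terms:
            scores[term] += overlap
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    expansion = [term for term, _ in ranked[:num_terms]]
    return tokenize(query) + expansion


if __name__ == "__main__":
    docs = ["jaguar cat habitat", "jaguar cat speed", "jaguar car engine"]
    print(expand_query("jaguar", docs, num_terms=1))
```

In this toy run the ambiguous query "jaguar" gains the term "cat" because it co-occurs with the query in two of the three retrieved documents. No labeled reformulation data is involved, which is the property the abstract emphasizes.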

Published

2023-08-25