Topic Model based Privacy Protection in Personalized Web Search


Modern search engines use users' search histories for personalization, which yields more effective, useful, and relevant search results. However, this practice risks exposing users' private information, because their underlying intent can be identified from their logged search behavior. To address this privacy issue, we propose a client-side Topic-based Privacy Protection solution. In our solution, each user query is submitted together with k additional cover queries, which act as a proxy to disguise the user's intent from the search engine. The set of cover queries is generated in a controlled way so that each query carries similar uncertainty, randomizing the user's search history while still providing the utility the search engine needs to perform personalization. We use statistical topic models to infer topics from the original user query and generate cover queries of similar entropy but from unrelated topics. Extensive experiments on the AOL search log yield promising results that demonstrate the effectiveness of our solution.
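
The cover-query idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy topic-word table `TOPICS`, the smoothing constant `SMOOTH`, the candidate pool, and the helper names `topic_posterior`, `entropy`, and `pick_cover_queries` are all hypothetical, and "entropy" is assumed here to mean the Shannon entropy of a query's inferred topic distribution. A real system would learn the topics with a statistical topic model rather than hard-code them.

```python
import math

# Hypothetical toy topic-word probabilities (assumption for illustration);
# a real system would learn these with a statistical topic model.
TOPICS = {
    "health": {"flu": 0.4, "symptoms": 0.4, "clinic": 0.2},
    "travel": {"flight": 0.5, "hotel": 0.3, "paris": 0.2},
    "sports": {"score": 0.4, "league": 0.4, "tickets": 0.2},
}
SMOOTH = 0.01  # assumed probability for words a topic does not list


def topic_posterior(query):
    """Return P(topic | query) under a uniform prior and unigram likelihood."""
    scores = {}
    for topic, word_probs in TOPICS.items():
        p = 1.0
        for word in query.lower().split():
            p *= word_probs.get(word, SMOOTH)
        scores[topic] = p
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def pick_cover_queries(query, candidates, k):
    """Pick up to k candidates whose dominant topic differs from the user's
    and whose topic entropy is closest to that of the user query."""
    user_post = topic_posterior(query)
    user_topic = max(user_post, key=user_post.get)
    target = entropy(user_post)

    unrelated = []
    for cand in candidates:
        cand_post = topic_posterior(cand)
        if max(cand_post, key=cand_post.get) != user_topic:
            unrelated.append((abs(entropy(cand_post) - target), cand))

    unrelated.sort()  # smallest entropy gap first
    return [cand for _, cand in unrelated[:k]]


pool = ["flu clinic", "flight hotel", "league score", "paris tickets"]
covers = pick_cover_queries("flu symptoms", pool, k=2)
print(covers)
```

In this sketch the client would submit the user query together with the selected covers. Candidates on the same dominant topic as the user query (here "flu clinic") are filtered out, and the remaining ones are ranked by how closely their topic entropy matches that of the user query.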

The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’2016), pp. 1025-1028, 2016.