Abstract: We address the problem how to select the correct answers to a query from among the partially incorrect answer sets that result from querying the Web of Data. Our hypothesis is that cognitively inspired similarity measures can be exploited to filter the correct answers from the full set of answers. These measure are extremely simple and efficient when compared to those proposed in the literature, while still producing good results.
We validate this hypothesis by comparing the performance of our heuristic to human-level performance on a benchmark of queries to Linked Open Data resources. In our experiment, the cognitively inspired similarity heuristic scored within 10% of human performance. This is surprising given the fact that our heuristic is extremely simple and efficient when compared to those proposed in the literature.
A secondary contribution of this work is a freely available benchmark of 47 queries (in both natural language and SPARQL) plus gold standard human answers for each of these and 1896 SPARQL answers that are human-ranked for their quality.
Keywords: Semantic web, information retrieval, Linked Open Data (LOD), cluster heuristic, semantic similarity, RDF, LarKC.
Related: Cluster heuristic | LarKC deliverables
Resources: Download PDF | Google Scholar