Title | : | Unsupervised Enrichment of Knowledge Graph Schemas |
Speaker | : | Subhashree S (IITM) |
Details | : | Wed, 28 Jun, 2023 11:45 AM @ MR - I (SSB 233) |
Abstract | : | Knowledge graphs, which represent information as a semantic graph, have had a widespread impact in both industry and academia. Because they store semantically structured information, they are considered promising tools for tasks such as question answering, recommendation, and information retrieval. The ontology associated with a knowledge graph can be considered to have two parts: the terminological box (T-Box) and the assertion box (A-Box). The A-Box contains assertions about individual entities, whereas the T-Box contains statements about the entity classes, the class hierarchy, and so on. In most cases, the A-Box of a knowledge graph is populated automatically, so it is much larger than the T-Box, which is mostly manually curated. Hence, fully automated techniques for enriching the T-Box of knowledge graphs, so that they better serve the needs of applications, are urgently needed. To this end, in this thesis, we propose two unsupervised systems, DARO and DOPLEX, for the property and property-axiom enrichment of a knowledge graph, respectively. Given a pair of classes, DARO discovers new object properties between them along with their instances. DARO works by identifying text patterns in the web corpus that can potentially represent relations between individuals. These text patterns are then clustered based on their semantic similarity, and a representative pattern is picked from each cluster to be suggested to the ontology engineer as a new object property. DARO has been built as a recall-oriented system and is seen to perform better than newOntExt, an offshoot of the popular NELL project. Given a knowledge graph, DOPLEX finds the disjoint object property pairs in its schema. 
It does so by using Probabilistic Soft Logic (PSL) to determine whether the property names imply disjointness, in addition to the traditional method of checking for common triples. Our evaluation demonstrates that the proposed approach discovers disjoint property pairs with better precision than the state-of-the-art system when tested on knowledge graphs that are auto-extracted from large text corpora. We have also proposed Temp-DOPLEX, which attempts to find potential temporally non-disjoint object property pairs in a schema. We also made several attempts to address the question of finding the right (sequence of) input class-pairs to be fed to DARO. We incorporated our findings as multiple criteria and fed them into the standard multi-criteria decision-making method TOPSIS to find the most suitable class-pairs that could potentially be connected by object properties. Our experimental evaluation on three popular knowledge graphs shows that the proposed approach yields promising recommendations for the class-pairs that can be fed as input to systems such as DARO. Web Conference Link: https://meet.google.com/yah-equu-eki |
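
For readers unfamiliar with TOPSIS, the following is a minimal sketch of the standard method applied to ranking candidate class-pairs. It is not the thesis implementation: the function name `topsis`, the class-pair names, the two criteria, and all numeric values are hypothetical and chosen only for illustration, since the abstract does not state which criteria were used.

```python
import math

def topsis(matrix, weights, benefit):
    """Return one TOPSIS closeness score in [0, 1] per row of `matrix`.

    matrix  : rows are alternatives (here, class-pairs), columns are criteria
    weights : one weight per criterion
    benefit : True if a higher value is better for that criterion, else False
    """
    n = len(weights)
    # Vector-normalise each criterion column (guarding against all-zero columns).
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal best and ideal worst value per criterion.
    columns = list(zip(*weighted))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # Euclidean distance to the ideal best
        d_worst = math.dist(row, worst)  # Euclidean distance to the ideal worst
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores

# Hypothetical class-pairs and criteria (illustrative values only):
# column 0 = number of co-occurring instance pairs (higher is better),
# column 1 = number of properties already linking the pair (lower is better).
pairs = [("Person", "Place"), ("Person", "Film"), ("Film", "Place")]
criteria = [[120, 5], [300, 2], [40, 9]]
scores = topsis(criteria, weights=[0.6, 0.4], benefit=[True, False])
ranking = sorted(zip(pairs, scores), key=lambda item: item[1], reverse=True)
```

The class-pair with the highest closeness score is the one nearest the ideal best and farthest from the ideal worst, so under these assumed criteria it would be the first candidate to feed into a system such as DARO.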