Contextual graph model

The QA system uses an ontology developed by the DBpedia project[1]. DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make it available on the Web in many languages, including Vietnamese.

The DBpedia ontology can also be viewed as a directed acyclic graph model made up of nodes, relationships, and properties[2]. Each node holds properties as key-value pairs, where the keys are strings and the values can be of any data type. The database is 1.5 GB in size and contains one million nodes, 2.5 million links, and 7.5 million properties.
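A minimal in-memory sketch of such a graph may help make the model concrete. It assumes nodes are identified by strings and that relationships are typed, directed edges; the names `Node` and `PropertyGraph` are hypothetical and are not part of DBpedia or of any graph-database API. Acyclicity is checked with Kahn's topological sort: the graph is a DAG exactly when every node can be removed in topological order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # Hypothetical node type: an identifier plus string-keyed properties.
    node_id: str
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical directed graph whose nodes carry key-value properties."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        # Adjacency list: source id -> list of (relationship type, target id).
        self.edges: dict[str, list[tuple[str, str]]] = {}

    def add_node(self, node_id: str, **properties: Any) -> Node:
        node = Node(node_id, dict(properties))
        self.nodes[node_id] = node
        self.edges.setdefault(node_id, [])
        return node

    def add_relationship(self, source: str, rel_type: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((rel_type, target))

    def is_acyclic(self) -> bool:
        # Kahn's algorithm: count incoming edges, then repeatedly remove
        # nodes that have none left. A cycle leaves some nodes unvisited.
        in_degree = {node_id: 0 for node_id in self.nodes}
        for targets in self.edges.values():
            for _, target in targets:
                in_degree[target] += 1
        queue = deque(n for n, d in in_degree.items() if d == 0)
        visited = 0
        while queue:
            current = queue.popleft()
            visited += 1
            for _, target in self.edges[current]:
                in_degree[target] -= 1
                if in_degree[target] == 0:
                    queue.append(target)
        return visited == len(self.nodes)


# Usage: two hypothetical class nodes linked by a subclass relationship.
example = PropertyGraph()
example.add_node("Place", label="place")
example.add_node("Mountain", label="mountain")
example.add_relationship("Mountain", "subClassOf", "Place")
print(example.is_acyclic())  # True
```

Storing edges as an adjacency list keeps the cycle check linear in the number of nodes plus relationships, which matters at the scale of millions of links.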

EHLAI’s QA system has three key features. First, it answers only accounting questions covered by our knowledge base. Second, rather than using a search engine to retrieve and rank documents, it relies on an internal knowledge base. Third, it can both retrieve data from and add data to the knowledge base, which improves its accuracy (a form of machine learning).
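The three features can be illustrated with a deliberately simple sketch, assuming the knowledge base is a lookup table from normalized questions to answers. The class `KnowledgeBase`, the function `normalize`, and the sample entry are hypothetical, and exact-match lookup stands in for whatever matching EHLAI actually performs. The sketch only shows the described behaviour (no external search, answers only for stored entries, new entries can be added), not how accuracy improves.

```python
from typing import Optional


def normalize(question: str) -> str:
    # Lower-case and collapse whitespace so trivially different
    # phrasings map to the same key.
    return " ".join(question.lower().split())


class KnowledgeBase:
    """Hypothetical internal store: answers come only from stored entries."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add(self, question: str, answer: str) -> None:
        # Third feature: new data can be added to the knowledge base.
        self._entries[normalize(question)] = answer

    def answer(self, question: str) -> Optional[str]:
        # First and second features: there is no web-search fallback, so
        # questions outside the stored entries get no answer (None).
        return self._entries.get(normalize(question))


kb = KnowledgeBase()
kb.add("What is a debit?", "An entry recorded on the left side of an account.")
```

Here `kb.answer("what is   a DEBIT?")` returns the stored answer, while an off-topic question returns `None` instead of triggering a document search.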

In 2014, Dat Nguyen et al. presented a rule-based QA system for Vietnamese that uses semantic web information to answer users’ queries. As reported, the system contained 92 manually written rules.


[1] The DBpedia Ontology is provided for download in four parts:

  • DBpedia Ontology T-Box (Schema)
  • DBpedia Ontology RDF type statements (Instance Data)
  • DBpedia Ontology other A-Box properties (Instance Data, mapping-based properties)
  • DBpedia Ontology other A-Box specific properties (Instance Data, mapping-based properties (specific))

The DBpedia Ontology can also be queried via the DBpedia SPARQL endpoint and explored via the DBpedia Linked Data interface, for example by looking up the class Place or the property elevation.
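As a sketch of such a query, the code below asks for a few instances of the class Place together with their elevation, using only the Python standard library. It assumes the public endpoint is at `https://dbpedia.org/sparql` and that it honours a `format=application/sparql-results+json` GET parameter; both are assumptions about the current deployment, and the helper names `build_query_url` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public DBpedia SPARQL endpoint; availability may vary.
ENDPOINT = "https://dbpedia.org/sparql"

# Select a few instances of the class Place together with their elevation.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?elevation WHERE {
  ?place a dbo:Place ;
         dbo:elevation ?elevation .
}
LIMIT 5
"""


def build_query_url(query: str, endpoint: str = ENDPOINT) -> str:
    # Encode the query as a GET parameter and request JSON results
    # (the "format" parameter is an assumption about the endpoint).
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{endpoint}?{params}"


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict]:
    # Performs a network request and returns the "bindings" list from the
    # standard SPARQL 1.1 JSON results format.
    url = build_query_url(query, endpoint)
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling `run_query(QUERY)` and reading `row["place"]["value"]` and `row["elevation"]["value"]` from each binding would list the matching resources, provided the endpoint is reachable.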

[2] Since the DBpedia 3.7 release, the ontology has been a directed acyclic graph rather than a tree: classes may have multiple superclasses, which was important for the mappings to schema.org. A taxonomy can still be constructed by ignoring all superclasses except the first one listed, which is considered the most important.
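The reduction from DAG to taxonomy described above can be sketched as follows, assuming each class's superclasses are available as an ordered list with the most important one first. The function names and the class data are hypothetical and are not taken from the actual DBpedia hierarchy.

```python
from typing import Optional


def build_taxonomy(superclasses: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Reduce a multiple-inheritance class DAG to a tree.

    Each class keeps only its first-listed superclass; classes without
    superclasses become roots (parent None).
    """
    return {cls: (parents[0] if parents else None) for cls, parents in superclasses.items()}


def path_to_root(taxonomy: dict[str, Optional[str]], cls: str) -> list[str]:
    # Follow single-parent links upward. This terminates because the
    # source graph is acyclic, so the derived tree is acyclic too.
    path = [cls]
    while (parent := taxonomy.get(path[-1])) is not None:
        path.append(parent)
    return path


# Hypothetical class data; "University" has two superclasses.
classes = {
    "Thing": [],
    "Place": ["Thing"],
    "Agent": ["Thing"],
    "Organisation": ["Agent"],
    "University": ["Organisation", "Place"],
}
taxonomy = build_taxonomy(classes)
print(path_to_root(taxonomy, "University"))
# ['University', 'Organisation', 'Agent', 'Thing']
```

Because only the first superclass survives, "University" loses its link to "Place" in the tree, which is exactly the information the DAG form preserves and a taxonomy gives up.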