FAQ in Vietnamese

Question answering (QA) is an area of natural language processing research that also gives insight into the human-machine interface. The ultimate goal of any QA system is to provide a concise and exact answer to a question asked in natural language. Most research focuses on answering open-domain questions by searching a large collection of documents such as Wikipedia. Unlike Internet search engines, which return a ranked list of documents, open-domain QA systems return short, relevant answers to questions.

Approaches to open-domain QA fall into two categories: semantic parsing and information retrieval.

  • Semantic parsing interprets the meaning of a question by semantic analysis. The correct interpretation converts the question into an exact database query that returns a correct answer.
  • Information retrieval converts the question into a query, then retrieves a set of candidate answers by querying a corpus and a knowledge base.

Both categories require human expertise to tune the lexicons, grammars and knowledge bases.
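To make the information retrieval approach concrete, here is a minimal sketch of the "convert the question into a query, then retrieve an answer from a corpus" loop. It is an illustration under stated assumptions, not the system described in this article: the `FAQ` entries, the `answer` function and the `min_score` threshold are all hypothetical, the examples are in English for readability, and splitting on word characters is a simplification (Vietnamese words often span several syllables, so a real system would likely need a proper word segmenter).

```python
import math
import re
from collections import Counter

# Hypothetical FAQ corpus: (stored question, stored answer) pairs.
FAQ = [
    ("What is a balance sheet?",
     "A statement of assets, liabilities and equity at a point in time."),
    ("How is value added tax calculated?",
     "Output tax minus input tax for the period."),
    ("What is depreciation?",
     "The allocation of an asset's cost over its useful life."),
]


def tokenize(text):
    """Lowercase and split on word characters (a simplification)."""
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unknown to the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(question, faq=FAQ, min_score=0.2):
    """Return the answer of the most similar stored question, or None
    when no stored question scores at least min_score."""
    idf = build_idf([q for q, _ in faq])
    query_vec = vectorize(question, idf)
    best_score, best_answer = 0.0, None
    for stored_question, stored_answer in faq:
        score = cosine(query_vec, vectorize(stored_question, idf))
        if score > best_score:
            best_score, best_answer = score, stored_answer
    return best_answer if best_score >= min_score else None


print(answer("How do I calculate value added tax?"))
print(answer("Who won the football match?"))
```

The first call matches the stored tax question; the second shares no terms with the corpus, so the function returns `None` instead of guessing. The threshold value is arbitrary here and would need tuning on real questions.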

I chose the information retrieval approach because advanced syntactic and semantic parsers for Vietnamese are not readily available. Furthermore, building a QA system for Vietnamese is hard to scale because, again, most research targets English.

I will attempt to build a QA system using Dialogflow, drawing on several scientific papers written by Vietnamese scientists. EHLAI’s QA system for the Vietnamese language will combine statistical models with knowledge-based methods.

Our question answering system answers only accounting-related questions, not a wide range of general knowledge questions.