Part 2: Data
I will start with the raw data (documents) from which we need to extract features. Feature extraction is necessary to convert raw data into a normalized, numerical form (a number or a vector) that the ML algorithm can understand.
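As a rough illustration of what "converting a document to a vector" can mean, here is a minimal bag-of-words sketch. It assumes plain-text input and uses simple term counts. The function names and sample documents are hypothetical, and this is not necessarily the feature extraction ELHAi uses.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters/digits (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents: list[str]) -> list[str]:
    # Sorted list of every distinct token seen across the corpus.
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    # Count how often each vocabulary term appears; terms outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Hypothetical sample documents.
docs = [
    "Notice of tax due: account 12345, penalty and interest",
    "Payment received for account 12345",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Each document becomes a fixed-length list of integers, one position per vocabulary term, which is the kind of numerical input a classifier can consume. Real systems usually add weighting (e.g. TF-IDF) and handle layout from scanned forms, which this sketch ignores.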
- Classify documents – What is the document type? Who is the collector for this document (to)? Who generated the document (from)? A team of three classifiers performs this classification.
- Create a taxonomy of the document. Is there a document number or description? Is there an account number? Is there a payment? Is there a tax amount? Is there interest? Is there a penalty? The taxonomy defines which fields appear on this type of document.
- Extract the required data from the document. ELHAi now understands what the document is and what's on it, so the chatbot can ask my staff for that specific information.
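The three steps can be sketched end to end as follows. The source does not say how the three classifiers combine their answers, so majority voting is an assumption here. The taxonomy entries, the keyword "classifiers", and every function name are hypothetical stand-ins for real models and real document types.

```python
from collections import Counter
from typing import Callable

# A classifier maps raw document text to a document-type label.
Classifier = Callable[[str], str]

# Hypothetical taxonomy: which fields are expected on each document type.
TAXONOMY: dict[str, list[str]] = {
    "tax_notice": ["document_number", "account_number", "tax_amount", "interest", "penalty"],
    "payment_receipt": ["document_number", "account_number", "payment"],
}


def classify(text: str, classifiers: list[Classifier]) -> str:
    # Assumed combination rule: strict majority vote; no majority yields "unknown".
    votes = Counter(clf(text) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label if count > len(classifiers) // 2 else "unknown"


def missing_fields(doc_type: str, extracted: dict[str, str]) -> list[str]:
    # Fields the taxonomy expects for this type that extraction did not find.
    return [f for f in TAXONOMY.get(doc_type, []) if f not in extracted]


def questions_for_staff(doc_type: str, extracted: dict[str, str]) -> list[str]:
    # One chatbot prompt per missing field.
    kind = doc_type.replace("_", " ")
    return [f"Please provide the {f.replace('_', ' ')} for this {kind}."
            for f in missing_fields(doc_type, extracted)]


# Toy keyword-based classifiers standing in for trained models (hypothetical).
def by_tax(text: str) -> str:
    return "tax_notice" if "tax" in text.lower() else "payment_receipt"


def by_penalty(text: str) -> str:
    return "tax_notice" if "penalty" in text.lower() else "payment_receipt"


def by_payment(text: str) -> str:
    return "payment_receipt" if "payment received" in text.lower() else "tax_notice"


doc = "Notice of tax due. Account 12345. Penalty applies."
doc_type = classify(doc, [by_tax, by_penalty, by_payment])
extracted = {"account_number": "12345"}  # pretend extraction found only this field
questions = questions_for_staff(doc_type, extracted)
```

Here all three classifiers agree on `tax_notice`, the taxonomy lists five expected fields, only the account number was extracted, and so the chatbot would ask the staff for the remaining four. Keeping the taxonomy as data rather than code means a new document type only needs a new entry.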
My focus is on helping my staff enhance the client's experience while collecting all relevant data with minimal human input.