Natural language – Why would you need to process it?
We’ve been encoding knowledge in natural language for thousands of years, especially in written form such as books, white papers, technical documentation, and legal texts. Yet remarkably, in this era of progressively advancing digitalization, computer-assisted automation, and the use of artificial intelligence to support decision-making, machines (software) still aren’t capable of understanding this knowledge. Machines cannot understand intrinsic knowledge, subtexts, or the overall context. The concept of processing natural language with software in order to analyze knowledge-intensive texts and at least create the impression that the software understands their content is known as natural language processing, or NLP for short. As such, at its core, NLP deals with nothing less than the totality of human verbal and written communication.
In this article, I cover both the fundamentals and potential future NLP applications in cloud computing, particularly in the areas of document management and business process modeling. In upcoming blog posts, I’ll go into more detail about what this can mean for document management systems and workflow engines like Fabasoft Contracts or Fabasoft Approve.
State of the art
Machines still aren’t able to understand natural language the way we do. Nonetheless, today’s machines are already well on their way to achieving that goal! Modern systems need to be able to recognize the meaning of a question posed in natural language and to answer it by analyzing texts that are likewise written in natural language, within a pre-trained framework. Popular examples of this functionality have been the focus of a great deal of media attention. But NLP is nothing to trifle with, and current systems soon come up against their limits.
It sounds tempting to believe that you can simply set up an algorithm and then just “ask the machine.” That’s not (yet) the case in reality, however. A software program has to start with tagged texts and be trained by means of supervised machine learning. This can be accomplished, for instance, by following these nine steps (a brief code sketch follows the list):
- Sentence segmentation
- Word tokenization
- Identifying parts of speech for the tokens
- Word and term lemmatization
- Identifying stop words
- Dependency parsing
- Identifying noun phrases
- Named-entity recognition (NER)
- Coreference resolution
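To make these steps more concrete, here is a minimal sketch of what such a pipeline can look like in practice. It assumes the open-source library spaCy and its small English model “en_core_web_sm” are installed; other NLP toolkits would work just as well.

```python
# Minimal NLP pipeline sketch using spaCy (assumption: spaCy and the
# "en_core_web_sm" model are installed, e.g. via
# `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("London is the capital of the United Kingdom. It lies on the River Thames.")

# 1. Sentence segmentation
for sent in doc.sents:
    print("SENTENCE:", sent.text)

# 2.-6. Word tokenization, part-of-speech tags, lemmas, stop words, dependency parsing
for token in doc:
    print(token.text, token.pos_, token.lemma_, token.is_stop, token.dep_, token.head.text)

# 7. Noun phrases
for chunk in doc.noun_chunks:
    print("NOUN PHRASE:", chunk.text)

# 8. Named-entity recognition (NER)
for ent in doc.ents:
    print("ENTITY:", ent.text, ent.label_)

# 9. Coreference resolution is not part of the default spaCy pipeline and
#    typically requires an additional component or library.
```

Note that a pre-trained pipeline like this bundles most of the nine steps into a single call; the training effort described above goes into building or adapting exactly these pre-trained components.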
It’s clear that going through this process even for a single language like German or English, not to mention its dialects and other variations, is a Herculean task. And the results would merely form the basis for further work with this language. After all, working through all nine steps doesn’t tell us anything about the specific (technical) domain we want to explore or the context that needs to be defined with NLP. Words have different meanings when used, say, in the nautical sector as opposed to the financial world. Fortunately, these nine steps reflect the current state of the art, and we can work with various algorithms and development environments. Unfortunately, however, to exploit the power of NLP, we have to set up the domain-specific knowledge correctly and train the software system as outlined above. This is where research in the field of “knowledge engines” comes in.
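As a small illustration of what “setting up the domain-specific knowledge” can mean in practice, the following sketch extends a general-purpose spaCy pipeline with hand-crafted entity patterns for contract language. The label CONTRACT_TERM and the patterns are purely hypothetical examples for this post, not part of any standard model.

```python
# Hypothetical sketch: adding domain-specific terminology (here: contract
# language) to a general-purpose spaCy pipeline via an EntityRuler.
import spacy

nlp = spacy.load("en_core_web_sm")
ruler = nlp.add_pipe("entity_ruler", before="ner")

# Domain-specific patterns; the label and terms are illustrative assumptions.
ruler.add_patterns([
    {"label": "CONTRACT_TERM", "pattern": "confidentiality agreement"},
    {"label": "CONTRACT_TERM", "pattern": [{"LOWER": "supplier"}, {"LOWER": "contract"}]},
])

doc = nlp("The supplier contract refers to the confidentiality agreement signed in Linz.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

In a real system, such hand-written rules would typically be complemented by statistical models trained on annotated texts from the target domain.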
Knowledge engines
Is it possible to map expert knowledge and human competencies one-to-one in software? The answer is yes: today we are witnessing the automation of legal transactions through digital transformation, products are being created by algorithms, and processes are capable of optimizing themselves. What’s more, this is increasingly happening across national boundaries as well as across different businesses and industries. When companies are able to extract, collect, structure, and understand data and knowledge from natural language, this opens up entirely new ways of automatically assisting people in their everyday work and decision-making.
Such approaches have already gained recognition through big data initiatives and ideas such as predictive maintenance for production machinery. With NLP, however, it’s not about analyzing massive amounts of machine-generated data in real time. It’s about automatically analyzing knowledge from a very specific sector and comparing that knowledge with, for example, the contents of a legally binding contractual document such as a confidentiality agreement or a supplier contract. That would lead to a completely new field in knowledge management and assistance systems – knowledge engines.
Outlook – What lies ahead
Now that we’ve covered the basics of NLP, I want to take a brief look into the future. What will be possible when NLP and knowledge engines join forces? First and foremost, it’s about support – assistance systems for people. In a world that’s growing more complicated by the day – jam-packed with digital markets and decisions to be made, flooded with innumerable repetitive tasks that need to be managed – one requirement placed on digitalization rings clear: to optimize and automate processes, with the objective of eliminating chaos and boosting efficiency.
Our research focus is to equip people with decision-support tools for their daily business that, depending on the situation, provide a kind of “integrated decision support system” (IDSS) working in the background through methods, models, and mechanisms.