Blog By: Priyanka Rana
Natural Language Generation
Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as “the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information”.
While it is widely agreed that the output of any NLG process is text, there is some disagreement about whether the inputs of an NLG system need to be non-linguistic. Common applications of NLG methods include the production of various reports, for example weather and patient reports; image captions; and chatbots.
Stages
The process of generating text can be as simple as keeping a list of canned text that is copied and pasted, possibly linked with some glue text. The results may be satisfactory in simple domains such as horoscope machines or generators of personalized business letters. However, a sophisticated NLG system needs to include stages of planning and merging of information to enable the generation of text that looks natural and does not become repetitive. The typical stages of natural-language generation, as proposed by Dale and Reiter, are:
Content determination: Deciding what information to mention in the text. For instance, in a pollen forecast, deciding whether to explicitly mention that the pollen level is 7 in the south east.
Document structuring: Overall organisation of the information to convey. For example, deciding to describe the areas with high pollen levels first, instead of the areas with low pollen levels.
Aggregation: Merging of similar sentences to improve readability and naturalness. For instance, merging the two following sentences:
- Grass pollen levels for Friday have increased from the moderate to high levels of yesterday and
- Grass pollen levels will be around 6 to 7 across most parts of the country
into the following single sentence:
- Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country.
Lexical choice: Putting words to the concepts. For example, deciding whether “medium” or “moderate” should be used when describing a pollen level of 4.
Referring expression generation: Creating referring expressions that identify objects and regions. For example, deciding to use “in the Northern Isles and far northeast of mainland Scotland” to refer to a certain region in Scotland. This task also includes making decisions about pronouns and other types of anaphora.
Realization: Creating the actual text, which should be correct according to the rules of syntax, morphology, and orthography. For example, using “will be” for the future tense of “to be”.
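To make the stages above more concrete, here is a minimal Python sketch of a staged pipeline for a toy pollen forecast. It is an illustration only, not the method described by Dale and Reiter: the Reading class, the function names, the 1 to 10 scale, the mention threshold and the cut-offs for “moderate” and “high” are all assumptions made up for this example, and the referring expression step is reduced to joining region names.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    region: str  # hypothetical region name
    level: int   # hypothetical pollen level on an assumed 1-10 scale


def determine_content(readings, threshold=4):
    """Content determination: keep only readings worth mentioning."""
    return [r for r in readings if r.level >= threshold]


def structure_document(readings):
    """Document structuring: describe the highest levels first."""
    return sorted(readings, key=lambda r: r.level, reverse=True)


def choose_word(level):
    """Lexical choice: map a number to a word (cut-offs are assumptions)."""
    if level >= 7:
        return "high"
    if level >= 4:
        return "moderate"
    return "low"


def aggregate(readings):
    """Aggregation: group regions that share the same descriptive word."""
    groups = {}
    for r in readings:
        groups.setdefault(choose_word(r.level), []).append(r.region)
    return groups


def join_regions(regions):
    """Referring expressions (heavily simplified): name regions in one phrase."""
    if len(regions) == 1:
        return "the " + regions[0]
    return "the " + ", ".join(regions[:-1]) + " and " + regions[-1]


def realize(groups):
    """Realization: build one future-tense sentence per group."""
    sentences = [
        f"Pollen levels will be {word} in {join_regions(regions)}."
        for word, regions in groups.items()
    ]
    return " ".join(sentences)


def generate(readings):
    """Run the stages in order and return the final text."""
    selected = determine_content(readings)
    ordered = structure_document(selected)
    return realize(aggregate(ordered))


readings = [
    Reading("north", 2),
    Reading("south east", 7),
    Reading("south west", 5),
    Reading("midlands", 8),
]
print(generate(readings))
```

With the sample readings, the low reading for the north is dropped at the content determination stage, and the two high regions are merged into a single sentence by the aggregation stage instead of being reported one by one. A real system would need far richer rules at every stage.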
An alternative approach to NLG is to use “end-to-end” machine learning to build a system, without having separate stages as above. In other words, we build an NLG system by training a machine learning algorithm (often an LSTM) on a large data set of input data and corresponding (human-written) output texts. The end-to-end approach has perhaps been most successful in image captioning, that is, automatically generating a textual caption for an image.