Luhn’s Algorithm To Generate Abstracts

Here is a simplified explanation of Luhn’s algorithm for generating abstracts from a text. It is based on his 1958 paper, “The Automatic Creation of Literature Abstracts.”

Count how often each word occurs in the document, and rank the words by frequency. Pick a top cutoff and get rid of all words that appear more frequently than the top cutoff. This will get rid of common words like “a” and “the.” Pick a bottom cutoff and get rid of all words that appear less frequently than the bottom cutoff. This will get rid of words too rare to say much about the document’s subject. Write down the words that are left. These words are important.
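The filtering step can be sketched in a few lines of Python. This is only an illustration: the function names `tokenize` and `important_words`, the simple regular-expression tokenizer, and the choice to treat the cutoffs as inclusive raw counts are all assumptions made here, not details taken from Luhn’s paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def important_words(text, top_cutoff, bottom_cutoff):
    """Return the words whose count lies between the two cutoffs, inclusive.

    Words counted more often than top_cutoff or less often than
    bottom_cutoff are dropped.
    """
    counts = Counter(tokenize(text))
    return {
        word
        for word, count in counts.items()
        if bottom_cutoff <= count <= top_cutoff
    }
```

In practice the two cutoffs have to be tuned to the length and kind of document; the sketch leaves that choice to the caller.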

For each sentence in the text, look for groups of important words. A group starts and ends with an important word, and neighboring important words in it have at most four unimportant words between them. For each group, square the number of important words and divide by the total number of words in the group. The highest value among a sentence’s groups is the “score” for the sentence. This score tells you how important the sentence is. Pick the sentences with the highest scores. These sentences will give you the abstract.
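Here is a rough Python sketch of the scoring step. It assumes the rule from Luhn’s paper: important words belong to the same group when no more than four unimportant words separate them, and a group’s score is the square of its important-word count divided by the group’s total length. The names `sentence_score`, `abstract`, `max_gap`, and `n`, the naive sentence splitter, and the tokenizer are all assumptions made for illustration.

```python
import re


def sentence_score(sentence, important, max_gap=4):
    """Return the best group score in a sentence (0.0 if no important words)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positions = [i for i, w in enumerate(words) if w in important]
    if not positions:
        return 0.0

    best = 0.0
    start = prev = positions[0]  # first and latest important word of the group
    count = 1                    # important words in the current group
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            # Few enough unimportant words in between: extend the group.
            count += 1
        else:
            # Gap too wide: score the finished group and start a new one.
            best = max(best, count * count / (prev - start + 1))
            start = pos
            count = 1
        prev = pos
    # Score the last group.
    return max(best, count * count / (prev - start + 1))


def abstract(text, important, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(
        sentences, key=lambda s: sentence_score(s, important), reverse=True
    )
    top = set(ranked[:n])
    return [s for s in sentences if s in top]
```

Note that under this formula a lone important word forms a group of length one and scores 1.0, so a real implementation may want to require a minimum group size.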