
The idea of extracting key paragraphs to form a summary was presented by Mitra et al. in the paper “Automatic Text Summarization by Paragraph Extraction” (1997). Each paragraph is represented as a vector in which each element corresponds to a word in that paragraph. For every pair of paragraphs, a similarity score is computed as the dot product of their vectors. A threshold is then set on the similarity score, and every pair of paragraphs whose score exceeds it is marked as ‘connected’. Finally, the N paragraphs with the most connections are selected and arranged in the order in which they occur in the original text to produce the summarized extract.
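The steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the paper: it assumes paragraphs are separated by blank lines, uses raw word counts as vector elements (the original work may weight terms differently), and the names `paragraph_vector`, `dot`, `summarize`, `threshold`, and `top_n` are hypothetical.

```python
import re
from collections import Counter


def paragraph_vector(paragraph):
    """Represent a paragraph as a sparse vector of word counts (an assumed weighting)."""
    return Counter(re.findall(r"[a-z0-9']+", paragraph.lower()))


def dot(u, v):
    """Dot product of two sparse word-count vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(count * v[word] for word, count in u.items())


def summarize(text, threshold=2, top_n=3):
    """Return the top_n most connected paragraphs, in original document order."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    vectors = [paragraph_vector(p) for p in paragraphs]

    # Count, for each paragraph, how many others it is 'connected' to,
    # i.e. how many pairwise similarity scores exceed the threshold.
    links = [0] * len(paragraphs)
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            if dot(vectors[i], vectors[j]) > threshold:
                links[i] += 1
                links[j] += 1

    # Rank by connection count (ties broken by position), keep the top N,
    # then restore the order in which the paragraphs occur in the text.
    ranked = sorted(range(len(paragraphs)), key=lambda i: (-links[i], i))
    chosen = sorted(ranked[:top_n])
    return [paragraphs[i] for i in chosen]
```

Both `threshold` and `top_n` would need tuning per document. Because raw dot products grow with paragraph length, a practical version would probably normalize the vectors (for example, with cosine similarity) before applying the threshold.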