
A method for identifying topic boundaries from the lexical cohesion between sentences was presented by Jeffrey C. Reynar in the paper “An Automatic Method of Finding Topic Boundaries”. The core idea is that sentences or passages about the same topic are more likely to share vocabulary. Topic shifts, or boundaries, can therefore be identified at points where lexical cohesion drops.
To implement this, the text is divided into blocks of sentences, and adjacent blocks are compared to calculate a cohesion score that measures how much vocabulary they share. If two consecutive blocks share a lot of vocabulary, their cohesion score is high; if they share little, the score is low, which may indicate a topic boundary.
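The block comparison can be sketched in Python as follows. This is a minimal illustration under several assumptions, not Reynar's exact procedure. It uses cosine similarity over term-frequency vectors as the cohesion score (a TextTiling-style choice) and a tiny hypothetical stopword list. It also applies a hypothetical boundary rule that flags a gap when its score is a local minimum below a `threshold`. The names `cohesion_scores`, `find_boundaries`, `block_size`, and `threshold` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "on", "and", "of", "to", "in", "is"}

def tokenize(sentence):
    """Lowercase word tokens with stopwords removed (simplistic by design)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cohesion_scores(sentences, block_size=2):
    """Score each gap (between sentence gap-1 and sentence gap) by comparing
    the block of up to block_size sentences before it with the block after it."""
    tokens = [tokenize(s) for s in sentences]
    scores = []
    for gap in range(1, len(sentences)):
        left = Counter(w for sent in tokens[max(0, gap - block_size):gap] for w in sent)
        right = Counter(w for sent in tokens[gap:gap + block_size] for w in sent)
        scores.append((gap, cosine(left, right)))
    return scores

def find_boundaries(scores, threshold=0.1):
    """Hypothetical rule: a gap is a boundary if its score is a local
    minimum among neighbouring gaps and falls below threshold."""
    boundaries = []
    for i, (gap, score) in enumerate(scores):
        prev_s = scores[i - 1][1] if i > 0 else float("inf")
        next_s = scores[i + 1][1] if i + 1 < len(scores) else float("inf")
        if score <= prev_s and score <= next_s and score < threshold:
            boundaries.append(gap)
    return boundaries

sentences = [
    "The cat sat on the mat.",
    "A cat likes a warm mat.",
    "The cat chased the mouse.",
    "Stocks fell sharply today.",
    "Investors sold stocks quickly.",
    "Stocks and bonds fell.",
]
scores = cohesion_scores(sentences)
print(find_boundaries(scores))  # → [3], i.e. between "mouse." and "Stocks fell..."
```

Removing stopwords matters in this sketch: without it, function words such as "the" would make almost every pair of blocks look cohesive. A larger `block_size` smooths out noise from individual sentences, at the cost of blurring boundaries that lie close together.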