Using Jensen-Shannon Divergence to detect narrative regime shifts in daily news corpora [P]

Our take

Detecting narrative regime shifts in daily news corpora is a hard problem. The approach described here applies Jensen-Shannon Divergence (JSD) in two ways: to unigram and bigram frequency distributions, comparing a rolling 7-day window of articles against the preceding 7 days, and to the distribution of narrative frames, where each article is assigned one of eight labels. The aim is to catch vocabulary and framing changes that may precede shifts in aggregate sentiment.

I've been working on a system that scores AI sector news daily for sentiment, and the sentiment part turned out to be the least interesting problem. The harder question is whether you can detect a narrative shift in a news corpus before it shows up in aggregate scores.

The approach uses JSD in two places. The first is over unigram/bigram frequency distributions of article body text, comparing a rolling 7-day window against the prior 7-day window, with a stop-word list tuned to strip AI and finance boilerplate that would otherwise dominate. The second is over the distribution of narrative frames, where each article gets assigned one of eight labels (Growth Momentum, Financial Results, Regulatory Risk, Geopolitical Risk, Competitive Threat, Market Correction, Technical Breakthrough, Macro Environment) and JSD measures the distributional shift between windows.
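A minimal sketch of the lexical comparison, assuming a simple lowercase regex tokenizer, bigrams formed from adjacent tokens after stop-word removal, and log base 2 so that JSD falls between 0 and 1. The names `STOP_WORDS`, `ngram_counts` and `jsd` are hypothetical, and the stop-word set is a placeholder, not the tuned list the post describes.

```python
import math
import re
from collections import Counter

# Placeholder stop words (hypothetical); the post's real list is tuned to
# strip AI and finance boilerplate and is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for",
              "ai", "said", "company", "shares", "stock"}


def ngram_counts(texts, stop_words=STOP_WORDS):
    """Count unigrams and bigrams over a window of article bodies."""
    counts = Counter()
    for text in texts:
        # Lowercase regex tokenizer; stop words are dropped before bigrams
        # are formed, so a bigram may span a removed word (an assumption).
        tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower())
                  if t not in stop_words]
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


def jsd(p_counts, q_counts):
    """Jensen-Shannon divergence in bits (0 <= JSD <= 1) between two count tables."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both windows need at least one token")
    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = (p + q) / 2  # mixture distribution M = (P + Q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Usage with toy headlines (made up for illustration).
prior_week = ["Chipmaker reports record data center revenue growth",
              "Cloud revenue growth beats estimates"]
current_week = ["Regulators propose new chip export rules",
                "Export rules weigh on data center growth outlook"]
print(jsd(ngram_counts(prior_week), ngram_counts(current_week)))
```

Working from raw counts keeps the two windows' vocabularies aligned automatically: any n-gram seen in only one window gets probability 0 in the other, and the mixture term keeps every logarithm finite.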

The idea in both cases is that new vocabulary and reframing precede sentiment movement. A corpus can maintain positive aggregate sentiment while quietly accumulating regulatory and geopolitical framing, and that shift is visible in the frame distribution before it registers in scores.
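To illustrate how the frame distribution can move while growth and results framing still make up most of the coverage, here is a sketch with made-up weekly counts over the eight labels from the post. The counts and the helper names `frame_distribution` and `jsd` are illustrative assumptions; sentiment scoring is not modeled.

```python
import math

# The eight frame labels listed in the post.
FRAMES = ["Growth Momentum", "Financial Results", "Regulatory Risk",
          "Geopolitical Risk", "Competitive Threat", "Market Correction",
          "Technical Breakthrough", "Macro Environment"]


def frame_distribution(labels):
    """Turn one frame label per article into a probability vector over FRAMES."""
    counts = {frame: 0 for frame in FRAMES}
    for label in labels:
        counts[label] += 1  # raises KeyError for a label outside the taxonomy
    total = sum(counts.values())
    return [counts[frame] / total for frame in FRAMES]


def jsd(p, q):
    """Jensen-Shannon divergence in bits between two aligned probability vectors."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i = m_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up weekly counts (40 articles each): growth and results framing
# still dominate the current week, but regulatory and geopolitical
# framing has grown.
prior = frame_distribution(
    ["Growth Momentum"] * 18 + ["Financial Results"] * 10
    + ["Technical Breakthrough"] * 8 + ["Regulatory Risk"] * 2
    + ["Geopolitical Risk"] * 2)
current = frame_distribution(
    ["Growth Momentum"] * 14 + ["Financial Results"] * 8
    + ["Technical Breakthrough"] * 6 + ["Regulatory Risk"] * 7
    + ["Geopolitical Risk"] * 5)
shift = jsd(prior, current)
print(f"frame JSD: {shift:.3f}")
```

Because the frame vector has a fixed length of eight, this divergence is far less sensitive to sparse counts than the n-gram version, at the cost of only seeing shifts the taxonomy can express.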

What I'm less sure about is the window size question. Tahmasebi et al. treat JSD as the most validated measure of lexical semantic change, but that work is typically over much longer horizons than a 7-day rolling window over a daily news feed. At this granularity the baseline is noisy and I've had to calibrate trigger thresholds empirically from about 60 days of production data, which feels fragile. I'm also not sure whether 8 narrative frames is granular enough to be useful or whether the taxonomy is collapsing distinctions that matter.
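One way to make the rolling comparison and an empirical trigger concrete is sketched below. It assumes per-day n-gram or frame counts are already available as `Counter` objects and uses a 95th-percentile cut of the historical JSD series as the threshold. The percentile rule and the names `rolling_jsd`, `empirical_threshold` and `flag_shifts` are illustrative assumptions, not the post's calibration method. It also shows why 60 days is thin: two 7-day windows leave only 47 scores, and consecutive scores come from 14-day spans that share 13 days, so the tail estimate rests on few effectively independent points.

```python
import math
import statistics
from collections import Counter


def jsd(p_counts, q_counts):
    """Jensen-Shannon divergence in bits between two non-empty count tables."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p, q = p_counts.get(key, 0) / p_total, q_counts.get(key, 0) / q_total
        m = (p + q) / 2
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def rolling_jsd(daily_counts, window=7):
    """Score each trailing `window`-day block against the `window` days before it.

    daily_counts: one Counter per day, oldest first (hypothetical input format).
    Returns (day_index, jsd) pairs, starting at the first day with two full windows.
    """
    series = []
    for end in range(2 * window - 1, len(daily_counts)):
        current = sum(daily_counts[end - window + 1:end + 1], Counter())
        prior = sum(daily_counts[end - 2 * window + 1:end - window + 1], Counter())
        series.append((end, jsd(prior, current)))
    return series


def empirical_threshold(values, percentile=95):
    """Trigger level at the given percentile of historical JSD scores (assumed rule)."""
    return statistics.quantiles(values, n=100)[percentile - 1]


def flag_shifts(series, threshold):
    """Day indices whose window-over-window JSD exceeds the threshold."""
    return [day for day, value in series if value > threshold]


# Toy history: 60 days, with the vocabulary changing from day 40 onward.
history = ([Counter({"growth": 5, "earnings": 5}) for _ in range(40)]
           + [Counter({"growth": 5, "export": 5}) for _ in range(20)])
series = rolling_jsd(history)
threshold = empirical_threshold([value for _, value in series])
print(len(series), flag_shifts(series, threshold))
```

Because the threshold here is fitted on the same history it flags, the false-alarm rate it implies is optimistic; holding out the most recent weeks is one way to check it.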

Has anyone applied JSD to short-window news corpora specifically? Curious whether there's literature I've missed on appropriate window sizing for this kind of application, or whether there are better distributional distance measures for detecting regime shifts at daily granularity.

The methodology writeup is at https://knowentry.com/semantic-volatility-index/ if useful context.

submitted by /u/RSVPN
