
by admin

Poblish and the Semantic Web: progress so far

12:30 pm in Natural Language Processing, OpenAmplify, Political Blogging, Semantic Web, Technical by admin

I mentioned last month that Poblish has been using OpenAmplify’s semantic/sentiment analysis service to give technology a shot at making sense of the vast sea of content that is the political blogosphere, in a way that helps policymakers make better-informed decisions. As I’ve said before:

Billions of individual thoughts and personal experiences have been written about, from all conceivable perspectives. No policy process will come up with ideas that have never been thought of before; so existing content represents a knowledge base that should not be ignored

In my piece at Left Foot Forward earlier this week, I imagined a future in which such tools could take a source article and use this content to identify counter-arguments automatically and dynamically, ideally before bad policy is made. Well, we have the content; we know that counter-arguments are out there, some of which may well not yet have crossed the mainstream media’s horizon; and we hope, and believe, that technology can help us find them.

Only a very small percentage of Poblish’s articles have so far been semantically analysed (OpenAmplify are very kindly letting us evaluate their software for free, so the number of articles we can process is limited), but all new articles are being analysed, and for those articles that have results, Poblish now displays them in the page’s sidebar. Here are the results for the following article.

The way we display the results is simplistic at best, but essentially what we’re showing are the main topics from the article, grouped by category and coloured as follows:

  • Blue: favourable references (or “polarity”). Dark blue for wholly positive (never negative), light blue for generally positive (but occasionally negative).
  • Red: unfavourable references. Red for wholly positive (never negative), pink for generally positive (but occasionally negative).
  • Grey: neutral references, or a mixture of positive and negative ones.
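To make the colour key concrete, here is a minimal Python sketch of how a topic’s references might be turned into one of those colours. It assumes we have a count of favourable and unfavourable references for each topic; the `DOMINANCE` threshold that separates “generally” positive or negative from a mixture is a hypothetical value chosen for illustration, not the rule Poblish actually uses.

```python
# Hypothetical threshold: one side "generally" wins when it has at least
# this many times as many references as the other side.
DOMINANCE = 2


def polarity_colour(positive: int, negative: int) -> str:
    """Map counts of favourable/unfavourable references to a sidebar colour."""
    if positive > 0 and negative == 0:
        return "dark blue"   # wholly positive (never negative)
    if negative > 0 and positive == 0:
        return "red"         # wholly negative (never positive)
    if negative > 0 and positive >= DOMINANCE * negative:
        return "light blue"  # generally positive, occasionally negative
    if positive > 0 and negative >= DOMINANCE * positive:
        return "pink"        # generally negative, occasionally positive
    return "grey"            # neutral (no polar references) or a mixture


for counts in [(3, 0), (4, 1), (2, 2), (1, 5)]:
    print(counts, polarity_colour(*counts))
```

Using a ratio rather than a fixed difference means a topic with 40 favourable and 30 unfavourable references still counts as a mixture, which seems closer to the spirit of the grey category.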

Clearly there are both successes and failures in the above list. Sunny Hundal’s name appears as a mere noun rather than a person’s name (though I wonder whether the misspelling of his surname in the original article is relevant here), and some of the polarities seem a little random.

Bear in mind, though, that each set of results you see comes from the analysis of one single article, without any context. Give the tool 200,000 articles, however, and we can reasonably expect insights to start massively outweighing mistakes. Context is critical: just as we don’t judge people or texts solely on what is in front of us, semantic applications should not be regarded in isolation, but as part of a vast network of humans and machines that use different techniques to identify and weave links between pieces of information, gradually improving our understanding of them.
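One simple way to see why volume helps is to pool per-article scores for each topic. In this minimal Python sketch the input format (one dict per analysed article, mapping a topic to a polarity score between -1 and 1) is hypothetical and is not OpenAmplify’s actual output; the point is only that a single misjudged article shifts the average rather than deciding it.

```python
from collections import defaultdict
from statistics import mean


def topic_polarity(analyses):
    """Pool per-article polarity scores by topic.

    `analyses` is a list of dicts (hypothetical format), one per article,
    mapping topic -> polarity score in [-1, 1].
    Returns topic -> (mean score, number of articles mentioning the topic).
    """
    scores = defaultdict(list)
    for article in analyses:
        for topic, score in article.items():
            scores[topic].append(score)
    return {topic: (mean(vals), len(vals)) for topic, vals in scores.items()}


# Three articles agree; the fourth is misjudged, but the pooled view survives it.
articles = [{"NHS": 0.7}, {"NHS": 0.6}, {"NHS": 0.8, "tax": -0.5}, {"NHS": -0.9}]
print(topic_polarity(articles))
```

Reporting the article count alongside the mean matters: a strongly coloured topic backed by one article deserves far less trust than a mildly coloured one backed by thousands.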

All in all, the questions I’m interested in are:

  1. Do we believe semantic analysis can work?
  2. Do we believe that it can reveal insights that it would be impractical for human beings to find?
  3. Do we believe that those insights might be just the ones we need?
  4. Is it worth us investing more in such solutions?

I’d offer a yes to each of those questions, and have had a lot of fun evaluating OpenAmplify, but: what do you think?


Taking ‘Possibly Related Posts’ to the next level

9:30 am in Natural Language Processing, Policy-making, Political Blogging, Semantic Web, Stemming, Stopwords, WordPress by admin

Many WordPress bloggers use plugins like these to help people who read their posts find other, related posts.

That’s all well and good if you only want to help readers find your own articles, but perhaps other political bloggers have made your point better than you have? Let’s now turn that round: perhaps you’ve made another blogger’s point better than he has? Wouldn’t it be great if there were a Related Posts service that let readers follow links to similar content from one blog to another, irrespective of who wrote it and where they started reading?

Poblish offers just this. Simply open a blog post from one of the 1500 feeds we monitor, and you’ll see a list of similar posts, ranked by similarity, from across all of those feeds.
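The basic idea of ranking posts from many feeds by similarity can be sketched in a few lines of Python. This is a minimal sketch, assuming similarity is measured as the cosine of plain word-count vectors; the post identifiers are hypothetical, and there is deliberately no stemming or stopword removal, so common words such as “and” or “the” still contribute to the scores.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count its words (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_posts(source, posts, limit=5):
    """Rank posts from any feed by similarity to the source text.

    `posts` maps a (hypothetical) post identifier to its text.
    Returns up to `limit` (post_id, score) pairs, most similar first.
    """
    src = term_counts(source)
    scored = [(cosine(src, term_counts(text)), post_id) for post_id, text in posts.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(post_id, score) for score, post_id in scored[:limit] if score > 0]


posts = {
    "post-a": "Tax cuts for the rich",
    "post-b": "NHS funding and hospital waiting lists",
    "post-c": "Tax rises and tax cuts",
}
print(related_posts("Tax cuts and the budget", posts))
```

Because the ranking depends only on the text, re-running it on a draft whenever the draft changes is one simple way to get a list that updates as you type, although a real service would need to cache the vectors for the other posts rather than recompute them on every keystroke.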

As if that weren’t cool enough, the list of related posts updates automatically. So, if you create a new, collaborative article with us, you can watch the list update as your work progresses, literally as you type. That’s very useful: perhaps you make a particular point, then some articles appear that strongly refute it. You might then reconsider, delete your last paragraph, and move on in a different vein.

I believe that tools like this are an essential part of turning the political blogosphere into a knowledge base that can improve not only political blogging but also policy-making.

(I should add, as an aside, that all these services essentially use Inverse Document Frequency (IDF) algorithms. Here’s a worked example. They can work very well, Poblish’s especially, I hope, since our algorithm uses stemming and stopword removal. But the computer makes no attempt to understand the text or its context, so there will inevitably be howlers. These are not semantic solutions of the type I mentioned yesterday, but don’t worry: we have big plans in that area.)
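For readers who want to see the mechanics, here is a minimal Python sketch of IDF weighting combined with stopword removal and stemming. The stopword list is tiny, and `crude_stem` is a deliberately naive suffix-stripper standing in for a real stemmer such as Porter’s; neither is what Poblish actually uses. Each term is weighted by idf(t) = log(N / df(t)), so words that appear in many documents count for little (a word in every document counts for nothing), and shared rare words dominate the similarity score.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "are", "we", "it"}


def crude_stem(word):
    """Strip one common English suffix (naive stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lower-case, split into words, drop stopwords, then stem."""
    words = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def idf_weights(documents):
    """idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokens(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(text, idf):
    """Weight each term's count by its IDF; terms unseen in the corpus get 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens(text)).items()}


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "Tax cuts for the rich",
    "The government cut taxes again",
    "Hospitals and the NHS",
]
idf = idf_weights(docs)
vectors = [tfidf(d, idf) for d in docs]
print(cosine(vectors[0], vectors[1]))  # positive: "cuts"/"cut" and "tax"/"taxes" now match
print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms once "the" is removed
```

The first two documents share no identical words apart from the stopword “the”, yet after stemming they match on `tax` and `cut`; without stopword removal, “the” would instead have given the first and third documents a spurious link.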