<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Poblish Blog &#187; Natural Language Processing</title>
	<atom:link href="http://www.poblish.org/blog/category/technical/natural-language-processing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.poblish.org/blog</link>
	<description>A 21st Century Tool for Political Bloggers</description>
	<lastBuildDate>Thu, 04 Aug 2011 08:11:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>New Article features</title>
		<link>http://www.poblish.org/blog/2011/03/new-article-features/</link>
		<comments>http://www.poblish.org/blog/2011/03/new-article-features/#comments</comments>
		<pubDate>Thu, 24 Mar 2011 09:30:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Last.fm]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAmplify]]></category>
		<category><![CDATA[Political Blogging]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.poblish.org/blog/?p=422</guid>
		<description><![CDATA[Poblish has always provided a &#8220;more articles like this&#8221; facility for every article on the system &#8211; not just related articles from that blog or Twitter feed, but related articles from all blogs and Twitter feeds. This list used to &#8230; <a href="http://www.poblish.org/blog/2011/03/new-article-features/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Poblish has always provided a &#8220;more articles like this&#8221; facility for every article on the system &#8211; not just related articles from that blog or Twitter feed, but related articles from <strong>all blogs and Twitter feeds</strong>. This list used to appear next to each article, crammed into a column that was always just a little too narrow to make the list truly usable, so I&#8217;ve moved it to a new screen which you can pop up using the big &#8220;Explore&#8230;&#8221; button.<a href="http://www.poblish.org/blog/wp-content/uploads/2011/03/Screen-shot-2011-03-24-at-1.39.05-am.png"><img class="alignright size-full wp-image-425" src="http://www.poblish.org/blog/wp-content/uploads/2011/03/Screen-shot-2011-03-24-at-1.39.05-am.png" alt="Explore button" /></a></p>
<p>We&#8217;ve also restored the &#8220;Similar Bloggers&#8221; facility and put it alongside the list of articles, to help you explore <strong>other bloggers</strong> who deal with similar themes. Finally, if you&#8217;re logged-in, you&#8217;ll find your own individual list of <strong>recommended articles</strong>. This uses the latest <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> techniques to suggest a list of articles based on your own ratings, flags, favourites, as well as those of people with tastes similar to your own.</p>
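<p>As a rough illustration of the idea behind those recommendations, here is a minimal user-based collaborative filtering sketch. It is not Poblish&#8217;s actual code: the function names, the ratings layout (user, then article, then score) and the choice of cosine similarity are all assumptions made for the example.</p>

```python
import math


def cosine(a, b):
    """Cosine similarity between two {article: rating} dicts."""
    shared = set(a).intersection(b)
    dot = sum(a[x] * b[x] for x in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, ratings):
    """Rank articles the user has not rated, weighting each other user's
    ratings by how similar that user's tastes are (hypothetical scheme)."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        weight = cosine(mine, theirs)
        for article, rating in theirs.items():
            if article not in mine:
                scores[article] = scores.get(article, 0.0) + weight * rating
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda article: (-scores[article], article))
```

<p>Articles rated highly by people whose ratings overlap with yours rise to the top; a production system would also need to fold in flags and favourites, which this sketch ignores.</p>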
<p>Above the Explore button, you&#8217;ll see what looks like a <strong>&#8220;tag cloud&#8221;</strong> for the article. However, what you&#8217;re seeing is much cleverer than what 99% of other applications offer. We use <a href="http://en.wikipedia.org/wiki/Semantic_analysis_%28linguistics%29">semantic analysis</a> to determine the article&#8217;s key themes, or &#8220;<a href="http://www.poblish.org/zones/US+Politics">Zones</a>&#8221;, rather than simply relying on the categories the blogger chose; we rank them according to how often they have been mentioned during the past 24 hours; and we provide a link to the Zone&#8217;s home page, where you can see &#8211; and follow &#8211; a feed of matching articles.</p>
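<p>The ranking step can be sketched roughly as follows. This is only an illustration under assumptions: the <code>rank_zones</code> name and the list of (zone, timestamp) mention pairs are hypothetical, not how Poblish stores its data.</p>

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_zones(article_zones, mentions, now):
    """Order an article's Zones by how often each was mentioned in the
    24 hours before `now`. `mentions` is a list of (zone, datetime) pairs."""
    cutoff = now - timedelta(hours=24)
    recent = Counter(zone for zone, seen_at in mentions if seen_at >= cutoff)
    # Most-mentioned first; ties broken alphabetically for a stable order.
    return sorted(article_zones, key=lambda zone: (-recent[zone], zone))
```

<p>Zones with no recent mentions simply sink to the end of the list rather than disappearing.</p>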
<p>The point of all this is to seamlessly weave articles &#8211; whether blog posts or Twitter posts &#8211; into the <strong>greater and wider world</strong> of political content, using state-of-the-art techniques, and to make it easier than ever for people to explore and to learn.</p>

]]></content:encoded>
			<wfw:commentRss>http://www.poblish.org/blog/2011/03/new-article-features/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Poblish and the Semantic Web: progress so far</title>
		<link>http://www.poblish.org/blog/2010/02/poblish-and-the-semantic-web-progress-so-far/</link>
		<comments>http://www.poblish.org/blog/2010/02/poblish-and-the-semantic-web-progress-so-far/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 12:30:40 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[OpenAmplify]]></category>
		<category><![CDATA[Political Blogging]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://www.poblish.org/blog/?p=229</guid>
		<description><![CDATA[I mentioned last month that Poblish has been using OpenAmplify&#8217;s semantic/sentiment analysis service to give technology a shot at making sense of the vast sea of content that is the political blogosphere, in such a way as to help policymakers &#8230; <a href="http://www.poblish.org/blog/2010/02/poblish-and-the-semantic-web-progress-so-far/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>I mentioned last month that <a href="http://www.poblish.org/">Poblish</a> has been using <strong><a href="http://community.openamplify.com/content/aboutus.aspx">OpenAmplify</a></strong>&#8217;s <a href="http://en.wikipedia.org/wiki/Semantic_web#Challenges">semantic</a>/<a href="http://en.wikipedia.org/wiki/Sentiment_analysis">sentiment analysis</a> service to give technology a shot at making sense of the vast sea of content that is the political blogosphere, in such a way as to <strong>help policymakers</strong> make better informed decisions. As I&#8217;ve <a href="http://blog.localdemocracy.org.uk/2010/01/13/poblish-when-crowdsourcing-new-policies-dont-waste-existing-content/">said before</a>:</p>
<blockquote><p>Billions of individual thoughts and personal experiences have been written about, from all conceivable perspectives. No policy process will come up with ideas that have never been thought of before; so existing content represents a knowledge base that should not be ignored</p></blockquote>
<p>In my piece at <a href="http://www.leftfootforward.org/2010/02/poblish-a-new-blog-aggregator/">Left Foot Forward</a>, earlier this week, I imagined a future in which such tools could take a source article and use this content to <em>automatically</em>, <em>dynamically</em> identify <strong>counter-arguments</strong>, hopefully before bad policy is made. Well, we have the content, we know that counter-arguments are out there, some of which may very well not yet have crossed the mainstream media&#8217;s horizon, and we hope &#8211; and believe &#8211; that technology can help us find them.</p>
<p>Only a very small percentage of Poblish&#8217;s articles have so far been semantically analysed (OpenAmplify are very kindly letting us evaluate their software for free, so the number of articles we process is limited), but all <strong>new</strong> articles are &#8211; and for those articles that have them, Poblish is now displaying the results in the page&#8217;s sidebar. Here are the results for the <a href="http://www.poblish.org/poblish2/article.jsp?id=207657">following article</a>.</p>
<p><a href="http://www.poblish.org/blog/wp-content/uploads/2010/02/sentiment_sidebar.png"><img class="aligncenter size-medium wp-image-231" title="sentiment_sidebar" src="http://www.poblish.org/blog/wp-content/uploads/2010/02/sentiment_sidebar-300x118.png" alt="" /></a></p>
<p>The way we display the results is simplistic at best, but essentially what we&#8217;re showing are the main topics from the article, divided into their relevant category, and coloured as follows:</p>
<ul>
<li><span style="color: #0000ff;"><strong>Blue:</strong></span> favourable references (or &#8220;polarity&#8221;). Dark blue for <strong>wholly</strong> positive (never negative), <strong><span style="color: #99ccff;">light blue</span></strong> for generally positive (but occasionally negative).</li>
<li><span style="color: #ff0000;"><strong>Red:</strong></span> unfavourable references. Red for <strong>wholly</strong> negative (never positive), <strong><span style="color: #ff99cc;">pink</span></strong> for generally negative (but occasionally positive).</li>
<li><span style="color: #808080;"><strong>Grey:</strong></span> neutral references, or a mixture of positive and negative ones.</li>
</ul>
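<p>To make the colour scheme concrete, here is a small sketch of how a topic&#8217;s references might be mapped to a colour. The hex values are the ones used in the list above; the function name, the idea of counting positive and negative references, and the rule that an even split counts as a mixture are assumptions for illustration, not a description of OpenAmplify&#8217;s output.</p>

```python
def polarity_colour(positive, negative):
    """Map counts of favourable and unfavourable references to a colour."""
    if positive and not negative:
        return "#0000ff"  # dark blue: wholly positive
    if negative and not positive:
        return "#ff0000"  # red: wholly negative
    if positive > negative:
        return "#99ccff"  # light blue: generally positive
    if negative > positive:
        return "#ff99cc"  # pink: generally negative
    return "#808080"      # grey: neutral, or an even mixture (assumed rule)
```

<p>For instance, a topic mentioned favourably three times and unfavourably once would come out light blue under these assumptions.</p>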
<p>Clearly there are successes and failures in the above list. <a href="http://liberalconspiracy.org/">Sunny Hundal</a>&#8217;s name appears as a mere noun, rather than a human name (though I wonder if the fact that his surname was misspelled in the <a href="http://davecole.org/blog/2010/02/09/what-difference-does-political-blogging-really-make-wsitp/">original article</a> is relevant here) and some of the polarities seem a little random.</p>
<p>Bear in mind, though, that each set of results you see was the result of an analysis of <strong>one, single article,</strong> without any context. Give the tool 200,000, however, and we can be certain that insights will start to massively outweigh mistakes. Context is critical, and &#8211; just as we don&#8217;t judge people or texts on the basis of what we objectively see &#8211; semantic applications should not be regarded in isolation, but as part of a vast <strong>network</strong> of humans and machines, using different techniques to identify and weave links between pieces of information, gradually improving our understanding of them.</p>
<p>All in all, the questions I&#8217;m interested in are:</p>
<ol>
<li>Do we believe semantic analysis can work?</li>
<li>Do we believe that it can reveal insights that it would be impractical for human beings to find?</li>
<li>Do we believe that <em>those</em> insights might be <em>just the ones we need</em>?</li>
<li>Is it worth us investing more in such solutions?</li>
</ol>
<p>I&#8217;d offer a <strong>yes</strong> to each of those questions, and have had a lot of fun evaluating <a href="http://community.openamplify.com/content/aboutus.aspx">OpenAmplify</a>, but: what do you think?</p>

]]></content:encoded>
			<wfw:commentRss>http://www.poblish.org/blog/2010/02/poblish-and-the-semantic-web-progress-so-far/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Taking &#8216;Possibly Related Posts&#8217; to the next level</title>
		<link>http://www.poblish.org/blog/2010/01/taking-possibly-related-posts-to-the-next-level/</link>
		<comments>http://www.poblish.org/blog/2010/01/taking-possibly-related-posts-to-the-next-level/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 09:30:16 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[Policy-making]]></category>
		<category><![CDATA[Political Blogging]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Stemming]]></category>
		<category><![CDATA[Stopwords]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://www.poblish.org/blog/?p=142</guid>
		<description><![CDATA[Many WordPress bloggers use plugins like these to help people who read their posts find other, related posts. That&#8217;s all well and good if you only want to help readers find your own articles, but perhaps other political bloggers have &#8230; <a href="http://www.poblish.org/blog/2010/01/taking-possibly-related-posts-to-the-next-level/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[
<p>Many WordPress bloggers use <a href="http://www.google.co.uk/search?q=wordpress+related+posts+plugin">plugins like these</a> to help people who read their posts find other, <em>related</em> posts.</p>
<p>That&#8217;s all well and good if you only want to help readers find <strong>your own</strong> articles, but perhaps other political bloggers have made your own point better than you have? Let&#8217;s now turn that round: perhaps you&#8217;ve made another blogger&#8217;s point better than he has? Wouldn&#8217;t it be great if there were a <strong>Related Posts service</strong> that let people follow links to similar content from one blog to another, irrespective of who wrote it and where you started reading?</p>
<p><a href="http://www.poblish.org/blog/wp-content/uploads/2010/01/Screen-shot-2010-01-18-at-10.59.28-pm.png"><img class="alignright size-medium wp-image-154" title="Screen shot 2010-01-18 at 10.59.28 pm" src="http://www.poblish.org/blog/wp-content/uploads/2010/01/Screen-shot-2010-01-18-at-10.59.28-pm-300x224.png" alt="" width="250" /></a><a href="http://www.poblish.org/">Poblish</a> offers just this. Simply open <a href="http://www.poblish.org/poblish2/article.jsp?id=171818">a blog post</a> from one of the 1500 feeds we monitor, and you&#8217;ll see a list of similar posts, ranked by similarity, from across <strong>all</strong> of those feeds.</p>
<p>If that wasn&#8217;t cool enough, the list of related posts <strong>updates automatically</strong>. So, if you create a <a href="http://www.poblish.org/new_article.jsp">new, collaborative article</a> with us, you can watch the list update as your work progresses &#8211; literally as you type. That&#8217;s very useful &#8211; perhaps you make a particular point, then some articles appear that strongly refute that point. You might then reconsider, delete your last paragraph, and move on in a different vein.</p>
<p>I believe that tools like this are an essential part of making the political blogosphere <a href="http://blog.localdemocracy.org.uk/2010/01/13/poblish-when-crowdsourcing-new-policies-dont-waste-existing-content/"><strong>a knowledge base</strong></a>, that can not only improve political blogging, but also improve policy-making.</p>
<p>&#8230;</p>
<p>(I should add, as an aside, that all these services essentially use <a href="http://en.wikipedia.org/wiki/Inverse_document_frequency">Inverse Document Frequency</a> algorithms. Here&#8217;s a <a href="http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/">worked example</a>. They can work very well &#8211; Poblish&#8217;s especially, I hope, as our algorithm uses <a href="http://snowball.tartarus.org/">stemming</a> and <a href="http://en.wikipedia.org/wiki/Stopwords">stopwords</a> &#8211; but there&#8217;s no attempt by the computer to <strong>understand</strong> the text, or the context, so there will inevitably be howlers. These are <strong>not</strong> <a href="http://www.poblish.org/blog/?p=129">semantic solutions</a> of the type I mentioned yesterday, but don&#8217;t worry: we have big plans in that area.)</p>
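<p>For the curious, the general technique can be sketched in a few lines. This is a toy illustration, not Poblish&#8217;s algorithm: the stopword list is a tiny hypothetical one, <code>crude_stem</code> is a stand-in for a real Snowball stemmer, and ranking by cosine similarity over TF-IDF weights is an assumption about how such services typically compare documents.</p>

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def crude_stem(word):
    """Strip a few common English suffixes (stand-in for a Snowball stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, keep runs of letters, drop stopwords, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


def tfidf_vectors(docs):
    """One {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related(index, docs):
    """Indexes of the other documents, most similar to docs[index] first."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], vec), i) for i, vec in enumerate(vectors) if i != index]
    return [i for score, i in sorted(scores, reverse=True)]
```

<p>Words that appear in nearly every document get a weight close to zero, so matches on rarer terms dominate the ranking. Nothing here understands the text, which is exactly why the howlers happen.</p>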

]]></content:encoded>
			<wfw:commentRss>http://www.poblish.org/blog/2010/01/taking-possibly-related-posts-to-the-next-level/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

