by admin

Visualising political content with Wordle

5:54 pm in Aggregation, Blogging, Data visualisation, Political Blogging, Wordle by admin

I’ve been inspired by Leigh Caldwell‘s Economics Zeitgeist word clouds to hook Poblish up to the wonderful Wordle.

Now you can visualise any Poblish feed with just a single click.

So, here’s Wordle’s visualisation of our most recent incoming articles from the past two days (click for full-size version).

Here are the results for a group, e.g. our US Political bloggers (past 4 days of activity):

Some other feeds you can try:


All images created by the Wordle.net web application are licensed under a Creative Commons Attribution 3.0 United States License.

by admin

WordPress plugins: “More like this” from across the blogosphere

2:00 pm in Aggregation, Last.fm, Lucene, Political Blogging, Technical, WordPress by admin

Here’s a first look at Poblish‘s first WordPress plugin.

It looks at the content of the current blog post, and automatically identifies related articles from everything hosted at Poblish – currently 216,296 articles from 1,698 working feeds – returning a list of the most closely matching articles in under a second.

You can click the name of any blogger or blog to see their live feed (pictured) in a Facebook-style popup frame.

In fact, forget about the screenshot, because you can see the plugin installed on this very blog – just look at the foot of this post, and scroll forward and back through our other posts.

The plugin is stable, but needs to be packaged-up a little so it fits seamlessly into the WordPress world. If you’re impatient to try it out, though, drop me a line and I’ll let you know the two or three steps you need to follow.

Let me know if you have any ideas of your own for developing the plugin. Some of mine are:

  • Ignoring matches from your own blog.
  • Restricting matches by date.
  • Restricting to matches with the same set of tags as the current post – somewhat influenced by Last.fm Radio.
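
As a rough illustration of how those three ideas might work as filters over a list of matches, here is a minimal Python sketch. It is not the plugin's code: the `Match` record, its fields, and the `filter_matches` function are all hypothetical names invented for this example, and "same set of tags" is read here as "carries at least all of the current post's tags".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one related article."""
    title: str
    blog: str
    published: date
    tags: frozenset
    score: float  # similarity score, higher means closer


def filter_matches(matches, own_blog, today, max_age_days=None, required_tags=None):
    """Apply the three restrictions, then rank by score (best first)."""
    kept = []
    for m in matches:
        # 1. Ignore matches from your own blog.
        if m.blog == own_blog:
            continue
        # 2. Restrict matches by date (optional).
        if max_age_days is not None and today - m.published > timedelta(days=max_age_days):
            continue
        # 3. Keep only matches carrying every tag of the current post (optional).
        if required_tags is not None and not set(required_tags) <= m.tags:
            continue
        kept.append(m)
    return sorted(kept, key=lambda m: m.score, reverse=True)
```

Each restriction is optional, so a blogger could switch on only the ones they want.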

by admin

Google Reader integration: share your feeds

9:30 am in Google Buzz, Google Reader, Political Blogging, RSS by admin

All Feed boxes within Poblish now feature a “Subscribe with Google Reader” button.

So, straight away, you can subscribe to:

  • A feed of all activity on Poblish.
  • A feed of all activity for those Actors, Blogs/Feeds, and Groups you follow.
  • A feed for activity for any Actor, Blog/Feed, or Group you choose.
  • A feed of all recent activity (Flags, Favourites, Ratings, Group creations, etc.) on Poblish.
  • Content-tracking feeds, like the one illustrated.

Poblish is all about open data, and interoperability: making it as easy as possible for you to use the content we host, to share it, work with it, build upon it, and to recombine it in new and interesting ways.

I’m currently looking into how we can best use Google Buzz to help us in that mission, as well as finishing the work on our Custom Feeds facility, which will let you build your own combined feeds: some Actors, some Blogs, some content, all your flags and favourites, and so on.

by admin

Poblish and the Semantic Web: progress so far

12:30 pm in Natural Language Processing, OpenAmplify, Political Blogging, Semantic Web, Technical by admin

I mentioned last month that Poblish has been using OpenAmplify‘s semantic/sentiment analysis service to give technology a shot at making sense of the vast sea of content that is the political blogosphere, in such a way as to help policymakers make better informed decisions. As I’ve said before:

Billions of individual thoughts and personal experiences have been written about, from all conceivable perspectives. No policy process will come up with ideas that have never been thought of before; so existing content represents a knowledge base that should not be ignored

In my piece at Left Foot Forward, earlier this week, I imagined a future in which such tools could take a source article and use this content to automatically, dynamically identify counter-arguments, hopefully before bad policy is made. Well, we have the content, we know that counter-arguments are out there, some of which may very well not yet have crossed the mainstream media’s horizon, and we hope – and believe – that technology can help us find them.

Only a very small percentage of Poblish’s articles have so far been semantically analysed (OpenAmplify are very kindly letting us evaluate their software for free, so the number of articles we process is limited), but all new articles are – and for those articles that have them, Poblish is now displaying the results in the page’s sidebar. Here are the results for the following article.

The way we display the results is simplistic at best, but essentially what we’re showing are the main topics from the article, divided into their relevant category, and coloured as follows:

  • Blue: favourable references (or “polarity”). Dark blue for wholly positive (never negative), light blue for generally positive (but occasionally negative).
  • Red: unfavourable references. Red for wholly negative (never positive), pink for generally negative (but occasionally positive).
  • Grey: neutral references, or a mixture of positive and negative ones.
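
To make the colour scheme concrete, here is a small Python sketch of one way such a mapping could be written. It is an illustration only, not Poblish's code: the function name and the idea of counting positive and negative references per topic are assumptions, red is read as the unfavourable counterpart of blue, and "a mixture" is taken to mean an even split.

```python
def polarity_colour(positive: int, negative: int) -> str:
    """Map counts of positive/negative references to a display colour.

    Hypothetical helper: the thresholds are assumptions for illustration.
    """
    if positive > 0 and negative == 0:
        return "dark blue"   # wholly positive
    if negative > 0 and positive == 0:
        return "red"         # wholly negative
    if positive > negative:
        return "light blue"  # generally positive, occasionally negative
    if negative > positive:
        return "pink"        # generally negative, occasionally positive
    return "grey"            # no polarity at all, or an even mixture


for counts in [(3, 0), (3, 1), (0, 2), (1, 4), (2, 2), (0, 0)]:
    print(counts, polarity_colour(*counts))
```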

Clearly there are successes and failures in the above list. Sunny Hundal‘s name appears as a mere noun, rather than a human name (though I wonder if the fact that his surname was misspelled in the original article is relevant here) and some of the polarities seem a little random.

Bear in mind, though, that each set of results you see was the result of an analysis of one, single article, without any context. Give the tool 200,000, however, and we can be certain that insights will start to massively outweigh mistakes. Context is critical, and – just as we don’t judge people or texts on the basis of what we objectively see – semantic applications should not be regarded in isolation, but as part of a vast network of humans and machines, using different techniques to identify and weave links between pieces of information, gradually improving our understanding of them.

All in all, the questions I’m interested in are:

  1. Do we believe semantic analysis can work?
  2. Do we believe that it can reveal insights that it would be impractical for human beings to find?
  3. Do we believe that those insights might be just the ones we need?
  4. Is it worth us investing more in such solutions?

I’d offer a yes to each of those questions, and have had a lot of fun evaluating OpenAmplify, but: what do you think?

by admin

A new vision for blogging, and content-based policy crowdsourcing

9:30 am in Aggregation, Blogging, Policy-making, Political Blogging, Semantic Web, Technical by admin

This is the third in a series of posts on the subject of ‘How the semantic web can crowdsource high-quality judgment and improve policymaking’. In part 2, last week, I described how existing content – the blogosphere, in particular – is currently used, or perhaps abused, by policymakers.

This time, I’m going to cover a range of improvements: how we can make better use of existing content, and why we’d want to do so. I’ll roughly split these into: (a) technical solutions, and (b) human solutions.

(i) Technology: Aggregation vs. isolation

Political blog aggregators are still very rare, especially in the UK. Creating and maintaining an application that is able to monitor hundreds or thousands of feeds, and produce new, aggregated feeds in a timely fashion, is neither trivial nor cheap. Nonetheless, when I created Bloggers4Labour in early 2005, I showed that usable aggregators were both possible, and – certainly at the time – desirable. By providing the media with a single window onto a wide range of blogging opinion, the blogging oligarchy I mentioned last week could perhaps have been broken.

Only when all blogs are aggregated – on an equal footing, and irrespective of their political affiliation and their nationality – can the blogosphere become the comprehensive, fair, and effective knowledge base it needs to be. We don’t want to throw contextual information away, but rather than let it entrench artificial barriers, we should let technology draw its own, more useful inferences.

Thus aggregation should become the norm, rather than the exception – or rather, the least we should expect. Furthermore, bloggers should be encouraged to leave the safety of their partisan networks, and become global political actors.

(ii) Technology: Breaking-down barriers

Rather than being bound by technological limitations and by non-interoperable software tools, and rather than advocating one particular package or way of working, any new crowdsourcing platform should use technology to enable everyone concerned with policy development to participate in a more informed and productive way.

Imagine a knowledge base that not only lets you see related content for any article you read, but that automatically updates you with content as you start to develop a new article. You might discover articles that refuted the argument you just made, that provided you with valuable supportive evidence, or that caused your article to take a different path. Imagine how easy it would be for a policy to have been decided-upon without those crucial points ever having been made, and how expensive and time-consuming a failed policy like that could be.

The old ‘linear’ aggregator model – with its single time-line of unrelated blog posts – is not much help here. Only by bringing all types of expressed opinion together on an equal basis, collapsing the distinctions between the various types, and replacing single time-lines with a web of matched, linked, and related information, can we achieve a really usable knowledge base, that’s easy to visualise and to navigate.

Debategraph-style maps, collaboratively edited documents and Wikis, and aggregated blog content will all be represented in this web. There may well also be a place for Twitter messages and open-source Government data. The overall goal should be to let structured data and mappings bring precision to blog posts, and to let blog posts bring context and detail to structured debates.

(iii) Technology: The Semantic Web

Technical solutions that understand the content they are given will always produce more relevant results than the 99% that don’t. Furthermore, solutions that use sentiment analysis can identify whether a particular individual, or concept, is being talked about in a positive, neutral, or negative light. This opens up the possibility of being able to automatically identify supporting or contradictory evidence for policies mentioned in existing articles, and in new policy documents as they are created. Once again, technology plus existing content can be used to support good policy, strike out bad policy, and save time and effort, not to mention embarrassment.

(iv) Human crowdsourcing: Collaborative editing

Collaborative editing – currently a niche interest – should become the norm, in contrast to the disjointed, sequential model of blog-commenting that is popular today. It is literally vital because it adds value, and adds life to already expressed opinion. The blog post of last year – that was overtaken by events and discredited – can be transformed into the post that acknowledges its original mistakes, assimilates new information, and becomes a valuable addition to the policy debate.

Collaborative editing also accustoms bloggers to a new way of working: by exposing them to scrutiny it encourages more thought and greater responsibility, but at the same time it rewards the extra effort, by giving bloggers – especially new ones, those who are less well-connected, and therefore those who might have the most original ideas – the encouragement that their output is being read and considered by a wider audience than before. While firing off posts into the ether can be cathartic, my experience tells me that bloggers do prefer to be engaged in a greater debate.

In future, contributors will adapt an existing blog post – working within the existing context – and create new branches, or sub-versions, that other contributors can approve and rate, and use as the basis for their own versions. Over time, the most active, the most popular, and the most highly regarded versions will rise to the surface. It may be that these versions will be quite different from one another – after all, while agreement and resolution are fine things, political disagreement can also be valuable, and these versions will be much more useful themselves than the undistilled thoughts of just one blogger.

There is no reason why those used to the current model of blog commenting should not contribute by adding their suggestions at the foot of the original article, rather than working within the framework of the original. Potentially useful insights should not be lost, even if they cannot immediately be related to the existing content. The important thing is that contributors are not limited – or forced to work in a particular way – by technology that dates back to the early days of the Web.

(v) Human crowdsourcing: Juries, assertion-flagging, and data cleanup

There’s a lot more humans can do with a crowdsourcing platform besides creating new content (individually or collectively), flagging, and rating.

The platform can invite – or randomly select – disinterested participants (i.e. who don’t have a personal connection with the issue at hand) to work together on a particular debate, marking up relevant arguments, marking down irrelevant arguments, linking similar ones, and perhaps trying to find resolutions in other areas: essentially doing things that are just too tricky for a computer to do. The Guardian’s recent, and very successful, crowdsourced MPs’ expenses exercise is a good example of this. Provide users with an incentive to donate their time and brainpower to the community, and great benefits can be reaped.

Another task humans can perform is to manually tag assertions within articles they read, and ask the platform to contact the original author / blogger so that they can respond with supporting evidence. Those who respond satisfactorily will be given credit for having done so, and their response will be attached to the original article, taking its place in the knowledge base for others to consult.

(vi) Conclusion

I hope I’ve succeeded in setting out a brighter vision of how crowdsourcing can improve policymaking, making it better informed and more efficient; how technology can be used more, and more effectively; how political blogging has a potentially enormous part to play; and how bloggers have a lot to gain by getting involved with a new crowdsourcing platform.

(Originally posted here, on January 26th, 2010.)

by admin

Crowdsourcing new policies, and why blogging has to change

9:30 am in Aggregation, Blogging, Policy-making, Political Blogging, Semantic Web, Technical by admin

This is the second in a series of posts on the subject of ‘How the semantic web can crowdsource high-quality judgment and improve policymaking’. Last week I made the case for using existing content – blog posts; Wikis, like Debatepedia; and visual debate-mapping tools, like Debategraph – as a knowledge base to drive new policy exercises, and introduced you to my new project, Poblish, which demonstrates this.

This time, I’m going to cover how existing content – the blogosphere, in particular – is currently used, and just how bad the situation is.

Blogging and personality

Individualistic political blogging dominates the collaborative alternatives because of its quantity rather than its quality, and because of personality rather than because of the arguments made. ‘Reputation’ within the blogging world is too often self-fulfilling, and technological limitations – combined with the laziness of politicians and the media – have created an oligarchy of ‘go to’ bloggers.

While the minds of journalists are not entirely closed to newcomers, it’s undeniable that the opinions of a couple of dozen ‘power bloggers’ carry more weight than all others put together. Where the strong preferences of a small minority dominate the weak preferences of the majority, democracy suffers.

Not only does this conceal the richness and diversity of the blogosphere in favour of accepted wisdom and conventional categories (‘Labour bloggers say…’), it corrupts both readers and writers. The priority of these bloggers gradually turns towards reportage – being ‘newsworthy’, breaking stories, filtering gossip, tracking trends, and developing their own ‘brand’ and influence. As their fame spreads, they draw traffic away from less well-connected blogs, encouraging readers to leave comments among a sea of others, rather than take the time to develop their thoughts more fully elsewhere.

While aggregators held out the possibility of providing readers with a single window onto a wide range of blogging opinion, the result has generally been to tie bloggers to their own political party.

Lack of interactivity

The level of interactivity on blogs has barely advanced during the past five years. Although all blogging platforms now offer a commenting facility, and some allow comments to be nested below others, comments continue to sit apart from the original article. They cannot refer to particular sections in the original, even though useful contributions are far more likely to relate to specific sections of the original rather than the generality. (Services like this do exist, but they are very far from mainstream blog tools.)

By being outside the context of the original, the mental pressure – to understand the original, and to constructively contribute – is taken off the contributor, but shifted onto the original blogger, who must attempt to understand and ‘re-contextualise’ the commenter’s addition before he can move his own argument on. What should be an interactive process becomes a sequential one, and all the slower and more time-consuming as a result.

Finally, the noise-to-signal ratio of comments can become enormous as a blog increases in popularity, unless strict controls or voluntary ‘codes of conduct’ are in place.

Lack of collaboration

Collaborative alternatives potentially provide more valuable content than blogs: more focus; less duplication; less pressure to be ‘journalistic’; a fairer balance between contributors; as well as a less ‘noisy’ experience. However, the very fact that they ask more of contributors makes them more expensive to create, and therefore thinner on the ground. This, in turn, can make collaborative editing seem a lonely experience. This situation will likely continue until there are efforts to break down barriers between the two types of content. Aggregators are of little help here, as they perpetuate the idea of a single time line of unrelated articles, in stark contrast to the ‘world wide web’.

Isolation and insulation

New blogs begin life in complete isolation and need to build connections with others if they are to keep their enthusiasm going. They need blogging friends, and they need encouragement. However, until a true blogging political hub appears, new bloggers often find themselves locked into political party silos, isolating themselves from the much wider external audience. A parallel incentive is for people to insulate themselves in order to avert the discomfort they feel when confronted with deeply contrary opinions and threats to their world-view. More often than not, it is unregulated comment-boxes that fuel this, rather than the behaviour of other bloggers.

Conclusion

When reputation becomes detached from quality; when friendship, like-mindedness, and convention determine the success of a blog and the popularity of its content; and when atomisation rather than interaction is the norm, the result must be a homogenisation of ideas, and a greater chance that rare but brilliant insights will be missed. This is the opposite of what we’re looking for.

In the next post I’ll be explaining how Poblish tries to address each of these problems, and how policy-making can be made more informed, more efficient, more constructive, and also more satisfying.

(Originally posted here, on January 18th, 2010.)

by admin

Taking ‘Possibly Related Posts’ to the next level

9:30 am in Natural Language Processing, Policy-making, Political Blogging, Semantic Web, Stemming, Stopwords, WordPress by admin

Many WordPress bloggers use plugins like these to help people who read their posts find other, related posts.

That’s all well and good if you only want to help readers find your own articles, but perhaps other political bloggers have made your own point better than you have? Let’s now turn that round: perhaps you’ve made another blogger’s point better than he has? Wouldn’t it be great if there was a Related Posts service that let people follow links to similar content from one blog to another, irrespective of who wrote it and where you started reading?

Poblish offers just this. Simply open a blog post from one of the 1500 feeds we monitor, and you’ll see a list of similar posts, ranked by similarity, from across all of those feeds.

If that wasn’t cool enough, the list of related posts updates automatically. So, if you create a new, collaborative article with us, you can watch the list update as your work progresses – literally as you type. That’s very useful – perhaps you make a particular point, then some articles appear that strongly refute that point. You might then reconsider, delete your last paragraph, and move on in a different vein.

I believe that tools like this are an essential part of making the political blogosphere a knowledge base, that can not only improve political blogging, but also improve policy-making.

(I should add, as an aside, that all these services essentially use Inverse Document Frequency algorithms. Here’s a worked example. They can work very well – Poblish’s especially, I hope, as our algorithm uses stemming and stopwords – but there’s no attempt by the computer to understand the text, or the context, so there will inevitably be howlers. These are not semantic solutions of the type I mentioned yesterday, but don’t worry: we have big plans in that area.)
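
For readers who'd like to see the general idea in code, here is a minimal, self-contained Python sketch of term-frequency / inverse-document-frequency matching with stopwords and stemming. It is emphatically not Poblish's algorithm: the stopword list is tiny, the `stem` function is a crude suffix-stripper standing in for a real stemmer, and every function name here is made up for illustration.

```python
import math
import re
from collections import Counter

# A deliberately tiny stopword list (assumption, for illustration only).
STOPWORDS = {"the", "a", "an", "and", "are", "of", "to", "in", "is", "on", "for"}


def stem(word):
    """Crude suffix-stripping stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lower-case, drop stopwords, then stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


def tfidf_vectors(docs):
    """One sparse {term: weight} vector per document."""
    counts = [Counter(tokens(d)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n = len(docs)
    # Rare terms get a higher weight than terms found in every document.
    idf = {t: math.log(n / df) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(docs, index):
    """Indices of the other documents, most similar to docs[index] first."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index]
    return [i for _, i in sorted(scores, reverse=True)]
```

Given three short texts, two about angry voters and rising taxes and one about a football match, `related(docs, 0)` ranks the other tax article first, purely because the two share weighted terms. Nothing in the code knows what a tax is, which is exactly why howlers are inevitable.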

by admin

Tory blog aggregation

9:30 am in Aggregation, JSON, Liberal Democrats, Political Blogging, RSS, Technical, Tories, UK Politics by admin

It’s not well-enough known that Poblish‘s support for custom groups means that the issue of the missing Conservative Blog Aggregator, that Matt Wardman wrote about last year, has finally been solved, once and for all. Labour bloggers have had one for nearly 5 years.

Clearly this is extremely useful for anyone who’s interested in what UK Conservatives are talking about. So, here’s the Conservative Party group page, where you can watch the live feed. Here it is in JSON format, and in RSS 2.0 format.
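
If you want to consume a group feed programmatically, something along these lines should work for any RSS 2.0 document. This is a sketch using Python's standard library only; the sample feed below is invented for the example (it is not real Poblish output), and in practice you would fetch the feed text from the group's RSS link instead.

```python
import xml.etree.ElementTree as ET

# Invented sample in RSS 2.0 shape; titles and links are placeholders.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example group feed</title>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in read_items(SAMPLE_RSS):
    print(title, link)
```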

The group currently contains 527 members, comprising all Conservative MPs (via They Work For You), plus all the bloggers from the Total Politics directory, minus the broken links and the bloggers who weren’t really Tories on closer inspection.

Liberal Democrats shouldn’t feel left out, even though we only have 67 members at present. Here’s their group page, their JSON feed, plus the RSS representation. They do, of course, already have a well-known aggregator of their own.