Citizens’ Agenda, Fusion Tables and JSON (Show your work)

There is a lot of positive buzz around “The Citizens’ Agenda” project that I’m involved with. You can see all the great work that Studio20 and The Guardian are doing here. And don’t forget to tweet any #unasked questions you might be thinking about.

The initial data set being published is a set of 839 categorized questions asked at 20 Republican presidential debates. The results were stored in a Google Fusion Table. While not completely normalized, the data lends itself to a lot of interesting queries and visualizations.

My initial thought was to get it into JSON format. Then we could build on top of it and query it in more fruitful ways.

I had never even seen Google Fusion Tables. I take that back. I had watched a demo at the Knight-Mozilla OpenNews “hacktoberfest” in Berlin in September by Chris Keller, but had never used them myself.

So I said, “Hey Chris, what do I do?” He provided some examples using jQuery, which were really helpful for seeing how the API works in general.

And he pointed to some existing ways to get JSON back from the API including Fusion Tables to JSON and a very interesting undocumented feature of the Google API itself.

As I mentioned, the table data was not quite normalized. For example, in a column called “Question directed to,” the value could be “Romney” or “Romney, Paul.”

So a query for “Romney” might miss that second example, since matches need to be exact. Also, the date column contained additional text describing the particular debate.

In other words, getting meaningful queries was going to take some text processing from within the application. So I thought I’d try out the PHP API client that Google provides.

Here is the result. This example returns the entire data set as JSON. I will follow up with some more specific queries. You can see a live example at

// include fusion tables API lib

// PHP CSV function breaks because of new lines so I use parsecsv library

// authorize (use your creds)
$token = ClientLogin::getAuthToken('my-google-username', 'my-google-password');
$ftclient = new FTClientLogin($token);

//select the data (must know fusion table id)
$fusiontableid = 'fusion-table-id';
$csvreturn = $ftclient->query(SQLBuilder::select($fusiontableid));

// instantiate parsecsv object and parse the google data
// this populates $csv->data with an array
$csv = new parseCSV();
$csv->parse($csvreturn);

//encode array into json, set content-type header and print it out
$json = json_encode($csv->data);
header('Content-type: application/json');
echo $json;

Innovation key to sustainability in journalism

It’s all still settling in, but my experience last week in Berlin has caused my outlook on the future of journalism to go from one of great concern to one of great promise.

I was honored to take part in what we have come to call Hacktoberfest, a five day hackathon sponsored by the Knight Mozilla News Technology Partnership or MoJo (Mozilla Journalism).

Twenty developers and journalists joined a bunch of Mozilla folks and members from five participating news organizations (Boston Globe, The Guardian, Zeit Online, Al Jazeera English and BBC).

What transpired was an incredible “mosh pit of brains.” I may be misquoting someone on that, but you get the idea. (Also not to be confused with a Bad Brains mosh pit.)

I learned a lot.  I learned about Wiener Sausages and German beer, for sure, but also about the sausage making necessary for news innovation. The collaboration which ensued turned good ideas into great ones and even spawned completely new projects out of common desires and needs. It was the fellowship of the thing.

But it wasn’t just academic discussion about futuristic dreams. We addressed questions like “how can we fix the broken comments model on news sites and the web in general?” And, “How can we pay for investigative journalism when sites rely on a page-view model?”

In fact, sustainability and business model ideas permeated the discussions that surrounded the hacking. All of those ideas seemed to accept that innovation was a key part of building a successful news business, both now and in the future. It’s hard to believe we still need to beat that drum in 2011, but we do. More of the same will not cut it, even if it is more profitable in the short term.

The most recognizable feature of the Berlin skyline is the Fernsehturm, or television tower. It was built in the late sixties to symbolize the power of the German Democratic Republic. As I wandered the streets beneath it, I couldn’t help but think that this metaphor for Communist Berlin held a lesson for the news industry.

It can either use its resources to convince the world from high up that it is strong, necessary and omnipresent, or it can join the democratic revolution of news happening on the ground. Because, for better or worse, it’s happening. The metro has left the station.

I’m reminded of a scene in Citizen Kane that serves as a lesson for what stagnation can do to a company, and that there are those who would rather just let things wind down naturally.

“You’re right,” says Charles Foster Kane, “I did lose a million dollars last year. I expect to lose a million dollars this year. I expect to lose a million dollars next year. You know, Mr. Thatcher, at the rate of a million dollars a year, I’ll have to close this place in . . . 60 years.”

Unfortunately most newspapers don’t have that luxury.

But that shouldn’t make them short-sighted. It’s a cold but undeniable truth that news companies have two paths.

One is to slowly decline, retaining as many of the employees who rely on the print product for as long as possible.

The other is to make a bet on the future, which may hasten the termination of some employees but will enable those companies to garner a stake in the future of both journalism and marketing on the web.

I choose the latter.

It’s not margins that will save these companies but innovation.

Introducing FollowThis

Quick Pitch

Challenge: A user would like to follow a specific story for updates and related items.

Typical Use Case

1. A user arrives at a story that they are interested in. They’d like to keep up with it, or at least some aspects of it.

2. The user clicks the FollowThis button (or bookmarklet). An overlay appears which will offer some site-specific options, like whether they’d like to follow on this site only or across the web, and the frequency of the notifications.

3. The user provides their email (if not already logged in) and submits.

4. The user will receive a verification email to ensure we aren’t spamming random email addresses.

5. From then on, the user will receive email notifications if that specific story is changed, plus updates containing links to related articles and videos. They will be able to modify or unsubscribe from the updates at any time.


See the rNews Domain Model on the IPTC site.

The system will store metadata about articles in a specific ontology in an RDF database. Most likely this will be rNews, since it shows great promise in getting adopted and will lend itself well to inter-operation between this and other systems. The system will make SPARQL queries available through a REST API that will return new and related items based on the user’s subscription metadata.
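As a rough sketch, a related-items lookup behind that REST API might run a query like the one below. The rnews:about property, the vocabulary URI, and the story URIs are assumptions based on the rNews draft; check the spec for exact names.

```sparql
# Hypothetical: find items sharing a concept with a followed story
PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>
SELECT ?item ?headline WHERE {
  <http://example.com/story/123> rnews:about ?concept .
  ?item rnews:about ?concept ;
        rnews:headline ?headline .
  FILTER (?item != <http://example.com/story/123>)
}
```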

When a user chooses to follow a story, the subscription is stored along with the entities that comprise the topics the story covered. These are more specific than what we would get with tags. For example, the subscription might follow stories that deal with Java the island, but not the programming language or the coffee.
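A minimal sketch of what such a subscription record and matching logic could look like. The field names and the DBpedia-style entity URIs are hypothetical; the point is that disambiguated URIs can tell Java the island from Java the language, which plain text tags cannot.

```python
# Sketch of an entity-level subscription (field names are hypothetical)
subscription = {
    "email": "reader@example.com",
    "story": "http://example.com/story/123",
    "entities": [
        "http://dbpedia.org/resource/Java",  # the island, not the language
    ],
}

def matches(article_entities, sub):
    """An article matches if it shares any followed entity URI."""
    return bool(set(article_entities) & set(sub["entities"]))

print(matches(["http://dbpedia.org/resource/Java"], subscription))
print(matches(["http://dbpedia.org/resource/Java_(programming_language)"],
              subscription))
```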

We’ll need feedback from users as to whether they want to dive down into these sub-topics at the point of following, whether that refinement would be better left for later, or some combination of both. Perhaps an “advanced follow” link. Feedback from users will be key to polishing the interface.

We’d also like to work collaboratively with the editors and producers of the sites. A good amount of metadata can be set up at initial launch, but natural language processing isn’t always as good as humans at some of the more complex entity relationships. We’d like editors, as well as users, to aid in the production of this metadata without the process becoming a burden.

Any news organization using the software will be highly encouraged to start adding Semantics like hNews or rNews to their presentation layer. While we will make the software work without it, it will be much more effective if we start with a solid base of metadata.

Since hNews currently has a much higher adoption, an hNews to rNews converter will be one of the first components needed. We will release this to the community as a separate standalone library since it could be helpful for other applications.
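A first cut at that converter could be a simple mapping from hNews class names to rNews property names. The sketch below uses Python's standard html.parser, and the mapping table is illustrative rather than a complete treatment of either spec.

```python
from html.parser import HTMLParser

# Illustrative hNews-class-to-rNews-property mapping (not the full spec)
HNEWS_TO_RNEWS = {
    "entry-title": "headline",
    "author": "author",
    "published": "dateCreated",
    "source-org": "sourceOrganization",
}

class HNewsExtractor(HTMLParser):
    """Collect rNews-style fields from hNews-marked-up HTML."""

    def __init__(self):
        super().__init__()
        self._field = None   # rNews property the next text node fills
        self.rnews = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for c in classes:
            if c in HNEWS_TO_RNEWS:
                self._field = HNEWS_TO_RNEWS[c]

    def handle_data(self, data):
        if self._field and data.strip():
            self.rnews[self._field] = data.strip()
            self._field = None

parser = HNewsExtractor()
parser.feed('<div class="hentry"><h1 class="entry-title">Big Story</h1>'
            '<span class="author">Jane Doe</span></div>')
print(parser.rnews)  # {'headline': 'Big Story', 'author': 'Jane Doe'}
```

A real converter would also need to handle nested vcards and emit proper RDFa, but the core is just this kind of vocabulary translation.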

Many organizations already have the required metadata within their existing editorial backends; they just aren’t presenting it to the browser. Implementing one of these specs is no more than a few days’ work.

Second, and also optional, we’d like to encourage collaboration across news organizations. In other words, a user choosing to follow a story would also be submitting that story to a commons of semantically categorized news articles that other sites could present as notifications to their users. Sites could provide RDF dumps to each other to create a distributed wire system, much the way Usenet newsgroups work.

An added benefit to this collaboration would be driving inbound traffic from other collaborating organizations and also offering the end users the ability to choose to follow the topics from one specific site or from the web at large.


Most traditional news organizations don’t think in terms of collaborating with their competition the way the tech industry does. This is why we make some of the features optional, with the hope of later showing how value can be gained from unorthodox strategies like sending users and content to competitors.

Another issue would be the inability of the organization to provide the metadata needed for these relationships. Natural language processing can be used to extract entities, like Open Calais does. This would be a challenge to build ourselves and it isn’t clear that there is already an open source alternative. More research into some of these related open source projects will be necessary. NLTK to RDF seems to have potential.

Why should a business adopt FollowThis?

The most precious resource a news organization has is an interested reader, and keeping that reader engaged should be a primary goal. FollowThis allows your users to stay on top of the stories they are most interested in by notifying them of updates or related items. The user benefits, and the organization gets its content to the right audience and drives more traffic.

Aggregators like Google News have begun to personalize their offerings. News organizations must do the same, and do so while they have their users at the “point of sale.” The metadata that powers the service is already available in the CMS of most organizations, but it is being under-utilized. A by-product of this project for any news organization would be a database that could easily be used for other areas, like ad targeting. Implementing FollowThis will make for happier users and a healthier business.

About Me

My name is Matt Terenzio and I’ve been building websites for news organizations for almost ten years. I’m interested in pursuing how we can use some of the existing and emerging metadata stored in these organizations to help the organizations themselves and help their users get a better news experience. Contact me on Twitter @mterenzio or mterenzio at gmail. Keep an eye on FollowThis.

An open, semantics-based news database and API

Here is the 256-word description of the Knight-Mozilla Learning Lab project.

Mojosaurus is an open Semantic database of news on the web. It provides a REST API that enables news organizations using the software to easily query the DB for items like related articles or photos. An RDF dump of sources and news items will be available.
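To make that concrete, a response from such a REST API might look something like this. The endpoint shape and field names here are invented for illustration, not part of any spec.

```json
{
  "query": "related",
  "item": "http://example.com/story/123",
  "results": [
    {
      "url": "http://news.example.org/2011/10/other-story",
      "headline": "Another take on the story",
      "type": "article",
      "score": 0.87
    }
  ]
}
```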

That may change, of course, but it’s a good indication of where my mind is. If such a thing were already available, I’d use it in my newsroom. In fact, I’m currently using private services like Daylife and Zemanta for similar problems.

This project would enable everything those companies do, but it would be open and community oriented. The Yahoo Directory couldn’t compete with DMOZ. In a similar way, we need an open version of a news database. It’s just too important to leave to private companies.

And too dangerous. Closed algorithms are no better than closed news organizations that decided what news we got in the pre-internet era of last century. Maybe worse.

On top of this platform is where the interesting things would happen. We’ll need to provide a plugin architecture so that developers can easily build apps that leverage this data.

Imagine that the API provides hooks that allow a plugin to rank the news. One plugin creator might use a user’s social network to rank the news. Another might combine human editors or external data. Some plugins would be open source; others might be proprietary services from vendors.
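A minimal sketch of what one such ranking hook could look like. The hook signature and the item/user shapes are hypothetical; a social-network plugin might simply score items by overlap with the user's contacts.

```python
# Hypothetical ranking hook: score items by how many of the
# user's contacts shared them, highest overlap first.
def social_rank(items, user):
    contacts = set(user["contacts"])
    return sorted(items,
                  key=lambda i: -len(set(i["shared_by"]) & contacts))

items = [
    {"headline": "A", "shared_by": ["bob", "carol"]},
    {"headline": "B", "shared_by": ["dave"]},
]
user = {"contacts": ["bob", "carol"]}

print([i["headline"] for i in social_rank(items, user)])  # ['A', 'B']
```

A plugin architecture would register many such functions and let the user pick which one orders their feed.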

Users would be able to mix and match plugins to shape their view of the data. A user could activate the New York Times “MyNews” plugin and compare it to The Washington Post or ProPublica plugin.

Another type of use would be to integrate it into Content Management Systems to provide something like a “related articles from around the web” feature.

Another usage might be an alert service that allows users to follow complex topics, rather than just keywords.

Those are just a few ideas. We’ll follow the domain model being expressed by the emerging rNews spec, making the API as open and flexible as possible.

And evangelizing rNews adoption will be a goal as well.

Well, that’s it for now. More graphics and perhaps a video to come by week’s end.

rNews == an open source news API

It has been said that it’s better to have a closed standard than no standard at all. At times this is true, but it’s great when forces work to provide open standards from the outset.

One great thing about open standards is that the rising tide floats all boats. Take news discovery on the web.

A key part of publishing content on the web is getting your content discovered. Syndication and being well indexed by search engines are a necessary start. Doing that well requires some thoughtful design, publishing sitemaps and news feeds, etc.

Much better would be a full-blown API allowing other services on the web full access to your data. Journalism discussions about community often weigh heavily toward getting communities to contribute to the news process, but that type of one-sided thinking may be indicative of traditional news organizational culture.

Equally important is contributing out in ways that foster the deep participation we are seeking. We need to give our communities the tools they need to help the news processes along.

Alas, most news organizations do not have the resources to create the Times Developer Network. Nor is it realistic or beneficial to the developer community to have to learn a different API for each organization.

Enter the Semantic Web.

Many have considered it largely academic (or not considered it at all). There just didn’t seem to be a big enough ROI. Not enough services were harvesting the metadata, which was true because not enough publishers were providing the metadata. In the news industry’s case, there was also the lack of a domain-specific data model.

When news organizations finally begin to publish semantic metadata, a critical mass will spill out onto the web, and the chicken-and-egg problem that has plagued the movement will be over. That moment may be upon us.

In recent months a proposed standard called rNews has emerged for using RDFa to embed news-specific metadata in HTML documents. If you don’t know what that means, the International Press Telecommunications Council (IPTC), which is creating the spec, has an excellent website that explains it all.

The IPTC carries a lot of weight in the news world, which means the standard has a good chance of being adopted. A number of key players are involved in the creation and evangelism of this spec, including the New York Times’s Evan Sandhaus and Hearst’s Mike Dunn.

What it means to anyone who adopts it is the ability to level the playing field in the area of API creation. Publishing the metadata would allow developers (both internal and external) to query your HTML documents, enabling all sorts of aggregations and mashups. And that’s just scratching the surface.
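As a rough illustration, rNews-style RDFa markup embedded in an article page might look like the following. The vocabulary URI and property names are assumptions drawn from the draft spec; verify them against the IPTC documentation before publishing.

```html
<!-- Illustrative rNews RDFa markup; check the IPTC spec for the
     exact vocabulary URI and property names. -->
<div vocab="http://iptc.org/std/rNews/2011-10-07#" typeof="Article">
  <h1 property="headline">Big Story</h1>
  <span property="author">Jane Doe</span>
  <time property="dateCreated" datetime="2011-10-07">Oct 7, 2011</time>
  <div property="articleBody">Story text goes here.</div>
</div>
```

Because the metadata rides along in the same HTML you already serve, any RDFa-aware crawler or tool can extract it without a separate feed or API.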

At the least, it will help search engines enable users to find you. At best, it will transform the way the industry publishes and consumes news.

This is a great thing. A rising tide will float all boats. Make sure you are on one.