Introducing FollowThis

Quick Pitch

Challenge: A user would like to follow a specific story for updates and related items.

Typical Use Case

1. A user arrives at a story that they are interested in. They’d like to keep up with it, or at least some aspects of it.

2. The user clicks the FollowThis button (or bookmarklet). An overlay appears offering some site-specific options, such as whether to follow the story on this site only or across the web, and how often to receive notifications.

3. The user provides their email (if not already logged in) and submits.

4. The user will receive a verification email to ensure we aren’t spamming random email addresses.

5. From then on, the user will receive an email notification whenever that specific story changes, along with updates containing links to related articles and videos. They can modify or unsubscribe from the updates at any time.

Implementation

[Figure: the rNews Domain Model, from the IPTC website]

The system will store article metadata in an RDF database, using a specific ontology. Most likely this will be rNews, since it shows great promise of being adopted and lends itself well to interoperation between this system and others. The system will expose SPARQL queries through a REST API that returns new and related items based on the user’s subscription metadata.
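To make the query side more concrete, here is a minimal Python sketch of how the API might turn a subscription’s entity URIs into a SPARQL query and a GET request URL. It assumes the rNews namespace URI shown, and it assumes articles link to entities through an `rnews:about` property and carry `rnews:headline` and `rnews:datePublished`; those names should be checked against the IPTC spec. The endpoint URL is hypothetical. Only the `query` request parameter comes from the SPARQL 1.1 Protocol.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: rNews namespace URI; verify against the published IPTC spec.
RNEWS_NS = "http://iptc.org/std/rNews/2011-10-07#"


def build_related_query(entity_uris, since_iso):
    """Return a SPARQL SELECT for items about any followed entity,
    published after `since_iso` (an xsd:dateTime string).

    The rnews:about, rnews:headline and rnews:datePublished property
    names are assumptions made for this sketch.
    """
    values = " ".join(f"<{uri}>" for uri in entity_uris)
    return (
        f"PREFIX rnews: <{RNEWS_NS}>\n"
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT DISTINCT ?item ?headline ?published WHERE {\n"
        f"  VALUES ?entity {{ {values} }}\n"
        "  ?item rnews:about ?entity ;\n"
        "        rnews:headline ?headline ;\n"
        "        rnews:datePublished ?published .\n"
        f'  FILTER (?published > "{since_iso}"^^xsd:dateTime)\n'
        "}\n"
        "ORDER BY DESC(?published)"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request, per the SPARQL 1.1 Protocol."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Hypothetical endpoint and entity URI, for illustration only.
    q = build_related_query(["http://example.org/entity/42"],
                            "2011-04-01T00:00:00Z")
    url = build_request_url("http://example.org/followthis/sparql", q)
    # Round-trip check: the server would decode exactly this query string.
    assert parse_qs(urlparse(url).query)["query"][0] == q
    print(url)
```

Keeping the query builder separate from the HTTP layer means the same SPARQL could be run against a local store in tests and against the public endpoint in production.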

When a user chooses to follow the story, the subscription is stored along with the entities that make up the topics the story covers. These are more specific than anything we would get from tags. For example, a subscription might follow stories about Java the island, but not Java the programming language or Java the coffee.
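The difference between tags and entities is easiest to see in a data structure. In this minimal sketch, a subscription stores entity URIs rather than tag strings, so a story tagged “Java” about the island and one tagged “Java” about the programming language are told apart. The class and function names are hypothetical, and DBpedia URIs are used only as one possible source of entity identifiers.

```python
from dataclasses import dataclass, field

# Entity identifiers; DBpedia is used here only as an example namespace.
JAVA_ISLAND = "http://dbpedia.org/resource/Java"
JAVA_LANGUAGE = "http://dbpedia.org/resource/Java_(programming_language)"


@dataclass
class Article:
    url: str
    tags: set = field(default_factory=set)      # free-text labels
    entities: set = field(default_factory=set)  # disambiguated entity URIs


@dataclass
class Subscription:
    email: str
    entities: set = field(default_factory=set)  # what the user follows


def matches_tag(tag, article):
    """Tag matching compares labels, so it cannot tell the two Javas apart."""
    return tag.lower() in {t.lower() for t in article.tags}


def matches_subscription(sub, article):
    """Entity matching: true only if the article shares a followed entity."""
    return not sub.entities.isdisjoint(article.entities)


# Hypothetical articles and subscriber, for illustration only.
island_story = Article("http://example.org/news/volcano", {"Java"}, {JAVA_ISLAND})
code_story = Article("http://example.org/tech/jdk", {"Java"}, {JAVA_LANGUAGE})
sub = Subscription("reader@example.org", {JAVA_ISLAND})
```

Both stories match the tag “java”, but only the island story matches the subscription.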

We’ll need feedback from users on whether they want to drill down into these sub-topics at the moment they follow a story, refine them later, or some mix of both. Perhaps an “advanced follow” link would work. User feedback will be key to polishing the interface.

We’d also like to work collaboratively with the sites’ editors and producers. A good amount of metadata can be set up at launch, but natural language processing isn’t always as good as humans at capturing more complex entity relationships. We’d like editors, as well as users, to help produce this metadata without the process becoming a burden.

Any news organization using the software will be strongly encouraged to start adding semantic markup such as hNews or rNews to its presentation layer. While the software will work without it, it will be much more effective if we start from a solid base of metadata.

Since hNews currently has much higher adoption, an hNews-to-rNews converter will be one of the first components needed. We will release it to the community as a separate standalone library, since it could be useful for other applications.
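A first cut at the converter could be quite small. The sketch below uses Python’s standard `html.parser` to pull a few hNews (hAtom) class names out of a page and re-emit them as rNews RDFa. The class-to-property mapping is an assumption covering only a handful of fields, the namespace URI should be checked against the IPTC spec, and real pages would need more care (nested entries, `vcard` authors, the value-class pattern).

```python
from html import escape
from html.parser import HTMLParser

# Assumption: rNews namespace URI; verify against the IPTC spec.
RNEWS_NS = "http://iptc.org/std/rNews/2011-10-07#"

# Assumed, partial mapping from hNews/hAtom class names to rNews properties.
HNEWS_TO_RNEWS = {
    "entry-title": "rnews:headline",
    "entry-summary": "rnews:description",
    "published": "rnews:datePublished",
    "updated": "rnews:dateModified",
    "dateline": "rnews:dateline",
}
DATE_PROPS = {"rnews:datePublished", "rnews:dateModified"}
# Elements with no end tag; they must not go on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HNewsReader(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: rNews property or None
        self.values = {}  # rNews property -> extracted value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        prop = next((HNEWS_TO_RNEWS[c] for c in classes if c in HNEWS_TO_RNEWS), None)
        # hNews dates usually carry a machine-readable value in an attribute.
        machine = attrs.get("datetime") or attrs.get("title")
        if prop in DATE_PROPS and machine:
            self.values[prop] = machine
            prop = None  # skip the human-readable text inside the element
        if tag not in VOID_TAGS:
            self.stack.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost open element that maps to a property.
        for prop in reversed(self.stack):
            if prop:
                self.values[prop] = self.values.get(prop, "") + data
                break


def hnews_to_rnews(html):
    """Return {rNews property: value} for the fields the mapping covers."""
    reader = HNewsReader()
    reader.feed(html)
    reader.close()
    return {p: " ".join(v.split()) for p, v in reader.values.items()}


def to_rdfa(values):
    """Render extracted values as RDFa 1.1 markup using the rnews: prefix."""
    lines = [f'<div prefix="rnews: {RNEWS_NS}" typeof="rnews:NewsItem">']
    for prop, value in sorted(values.items()):
        lines.append(f'  <span property="{prop}" content="{escape(value)}"></span>')
    lines.append("</div>")
    return "\n".join(lines)
```

Feeding it a page marked up with `hentry`, `entry-title` and `published` classes yields a dict keyed by rNews property names, which `to_rdfa` turns back into markup a browser or crawler can read.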

Many organizations already have the required metadata in their existing editorial back ends; they just aren’t exposing it to the browser. Implementing one of these specs is no more than a few days’ work.

Second, and also optional, we’d like to encourage collaboration across news organizations. In other words, a user choosing to follow a story would also be submitting that story to a commons of semantically categorized news articles that other sites could present as notifications to their users. Sites could provide RDF dumps to each other to create a distributed wire system, similar to the way Usenet newsgroups work.
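A distributed wire along these lines could start very simply: each site periodically pulls its peers’ dumps and merges them into one pool, recording which sites published each statement. The sketch below assumes the dumps are in N-Triples and handles only a simple subset of that syntax with a regular expression; a real implementation would use a proper RDF library. Site names and URIs are hypothetical. One detail worth keeping even in a sketch: blank-node labels are local to a single document, so they are prefixed with the source site before merging.

```python
import re
from collections import defaultdict

# One N-Triples term: an IRI, a blank node, or a literal with an optional
# language tag or datatype. A simplified subset, good enough for a sketch.
TERM = r'(<[^>]*>|_:[A-Za-z0-9_]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
TRIPLE = re.compile(rf"^{TERM}\s+{TERM}\s+{TERM}\s*\.$")


def parse_ntriples(text, site):
    """Yield (subject, predicate, object) tuples from one site's dump."""
    def scope(term):
        # Blank nodes are document-local: make them unique per site.
        return f"_:{site}_{term[2:]}" if term.startswith("_:") else term

    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"{site}, line {lineno}: cannot parse {line!r}")
        yield tuple(scope(term) for term in match.groups())


def merge_dumps(dumps):
    """Merge {site: N-Triples text} into {triple: set of publishing sites}."""
    pool = defaultdict(set)
    for site, text in dumps.items():
        for triple in parse_ntriples(text, site):
            pool[triple].add(site)
    return dict(pool)
```

Keeping the set of publishing sites per triple is what would let a site credit, and link back to, the organization a related story came from.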

An added benefit of this collaboration would be inbound traffic from other participating organizations, along with giving end users the choice of following topics from one specific site or from the web at large.

Challenges

Traditional news organizations don’t usually think in terms of collaborating with their competition the way the tech industry does. This is why some of the features are optional, with the hope of later showing how value can be gained from unorthodox strategies like sending users and content to competitors.

Another issue is that an organization may be unable to provide the metadata needed for these relationships. Natural language processing can be used to extract entities, as Open Calais does. Building this ourselves would be a challenge, and it isn’t clear that an open source alternative already exists. More research into related open source projects will be necessary; NLTK to RDF seems to have potential.

Why should a business adopt FollowThis?

The most precious resource a news organization has is an interested reader. Keeping that reader engaged should be the primary goal. FollowThis lets your users stay on top of the stories they care most about by notifying them of updates and related items. Keeping these users engaged benefits them, gets the organization’s content to the right audience, and drives more traffic. Aggregators like Google News have begun to personalize their offerings. News organizations must do the same, and do it while they have their users at the “point of sale.” The metadata that powers the service is already available in most organizations’ CMSs, but it is under-utilized. A by-product of this project for any news organization would be a database that could easily be used in other areas, such as ad targeting. Implementing FollowThis will make for happier users and a healthier business.

About Me

My name is Matt Terenzio and I’ve been building websites for news organizations for almost ten years. I’m interested in exploring how we can use some of the existing and emerging metadata stored by these organizations to help the organizations themselves and to give their users a better news experience. Contact me on Twitter @mterenzio or by email at mterenzio at gmail. Keep an eye on FollowThis.

An open, semantics-based news database and API

Here is the 256-word description of the Knight-Mozilla Learning Lab project.

Mojosaurus is an open semantic database of news on the web. It provides a REST API that lets news organizations using the software easily query the database for items like related articles or photos. An RDF dump of sources and news items will be available.

That may change, of course, but it’s a good indication of where my mind is. If such a thing were already available, I’d use it in my newsroom. In fact, I’m currently using private services like Daylife and Zemanta for similar problems.

This project would enable everything those companies do, but it would be open and community oriented. The Yahoo Directory couldn’t compete with DMOZ; in a similar way, we need an open version of a news database. It’s just too important to leave to private companies.

And too dangerous. Closed algorithms are no better than closed news organizations that decided what news we got in the pre-internet era of last century. Maybe worse.

On top of this platform is where the interesting things would happen. We’ll need to provide a plugin architecture so that developers can easily build apps that leverage this data.

Imagine that the API provides hooks that allow a plugin to rank the news. One plugin creator might use a user’s social network to rank the news. Another might combine human editors or external data. Some plugins would be open source; others might be proprietary services from vendors.

Users would be able to mix and match plugins to shape their view of the data. A user could activate a New York Times “MyNews” plugin and compare it with a Washington Post or ProPublica plugin.
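The ranking hooks and the mix-and-match idea fit together naturally if every plugin exposes the same small interface. In this minimal sketch, each plugin is a function that scores an item for a given reader, plugins register themselves by name, and a reader’s “view” is just a set of weights over the plugins they have activated. The plugin names, data fields and weighting scheme are all hypothetical; a real API would also have to handle plugins that run as remote services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NewsItem:
    url: str
    shared_by: set = field(default_factory=set)  # ids of users who shared it
    editor_pick: bool = False


@dataclass
class Reader:
    friends: set = field(default_factory=set)


# A ranking hook scores one item for one reader; higher ranks first.
RankingHook = Callable[[NewsItem, Reader], float]
PLUGINS: Dict[str, RankingHook] = {}


def plugin(name):
    """Decorator that registers a ranking hook under a name."""
    def register(hook):
        PLUGINS[name] = hook
        return hook
    return register


@plugin("social")
def social_rank(item, reader):
    # How many of the reader's friends shared this item.
    return float(len(item.shared_by & reader.friends))


@plugin("editors")
def editors_rank(item, reader):
    # Human editorial judgment as a simple bonus.
    return 1.0 if item.editor_pick else 0.0


def rank(items, reader, weights):
    """Order items by the weighted sum of the reader's active plugins.

    `weights` maps plugin name -> weight, e.g. {"social": 1.0, "editors": 0.5}.
    """
    def score(item):
        return sum(w * PLUGINS[name](item, reader) for name, w in weights.items())
    return sorted(items, key=score, reverse=True)
```

Comparing two views of the same data is then just a matter of calling `rank` with different weight maps.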

Another type of use would be to integrate it into Content Management Systems to provide something like a “related articles from around the web” feature.

Another usage might be an alert service that allows users to follow complex topics, rather than just keywords.

Those are just a few ideas. We’ll follow the domain model being expressed by the emerging rNews spec, making the API as open and flexible as possible.

And evangelizing rNews adoption will be a goal as well.

Well, that’s it for now. More graphics, and perhaps a video, to come by week’s end.

rNews == an open source news API

It has been said that it’s better to have a closed standard than no standard at all. At times this is true, but it’s great when forces work to provide open standards from the onset.

One great thing about open standards is that the rising tide floats all boats. Take news discovery on the web.

A key part of publishing content on the web is getting it discovered. Syndication and good search engine indexing are a necessary start. Doing that well requires some thoughtful design, plus site maps, news feeds and the like.

Much better would be a full-blown API that gives other services on the web full access to your data. Journalism discussions about community often lean heavily toward getting communities to contribute to the news process, but that kind of one-sided thinking may reflect traditional news organizational culture.

Equally important is contributing back out, in ways that foster the deep participation we are seeking. We need to give our communities the tools they need to help the news process along.

Alas, most news organizations do not have the resources to build their own equivalent of the Times Developer Network. Nor is it realistic, or beneficial to the developer community, to have to learn a different API for each organization.

Enter the Semantic Web.

Many have considered it largely academic (or not considered it at all). There just didn’t seem to be a big enough ROI. Not enough services were harvesting the metadata, because not enough publishers were providing it. And in the news industry, there was no domain-specific data model.

When news organizations finally begin to publish semantic metadata, a critical mass will spill out onto the web, and the chicken-and-egg problem that has plagued the movement will be over. That moment may be upon us.

In recent months, a proposed standard called rNews has emerged for using RDFa to embed news-specific metadata in HTML documents. If you don’t know what that means, the International Press Telecommunications Council (IPTC), which is creating the spec, has an excellent website that explains it all.

The IPTC carries a lot of weight in the news world, which means the standard has a good chance of being adopted. A number of key players are involved in creating and evangelizing the spec, including Evan Sandhaus of the New York Times and Mike Dunn of Hearst.

For anyone who adopts it, rNews levels the playing field in API creation. Publishing the metadata would allow developers (both internal and external) to query your HTML documents, enabling all sorts of aggregations and mashups. And that’s just scratching the surface.
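To give a feel for what querying your HTML documents could look like, here is a deliberately small Python sketch that reads rNews properties straight out of a page’s RDFa. It handles only a subset of RDFa (the `property` and `content` attributes and element text) and ignores subjects, prefix resolution and datatypes, so a real consumer should use a full RDFa parser. The sample markup and the namespace URI in it are assumptions.

```python
from html.parser import HTMLParser

# Elements with no end tag; they must not go on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RNewsExtractor(HTMLParser):
    """Collect [property, value] pairs for rnews: properties in RDFa markup."""

    def __init__(self, prefix="rnews:"):
        super().__init__()
        self.prefix = prefix
        self.pairs = []  # [property, value] in document order
        self.stack = []  # per open element: index into self.pairs, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = next((p for p in (attrs.get("property") or "").split()
                     if p.startswith(self.prefix)), None)
        slot = None
        if prop is not None:
            if attrs.get("content") is not None:
                # An explicit content attribute overrides the element text.
                self.pairs.append([prop, attrs["content"]])
            else:
                self.pairs.append([prop, ""])
                slot = len(self.pairs) - 1
        if tag not in VOID_TAGS:
            self.stack.append(slot)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # A property's literal value is all the text inside its element,
        # including text in nested elements.
        for slot in self.stack:
            if slot is not None:
                self.pairs[slot][1] += data


def extract_rnews(html):
    """Return (property, value) tuples with whitespace normalized."""
    parser = RNewsExtractor()
    parser.feed(html)
    parser.close()
    return [(prop, " ".join(value.split())) for prop, value in parser.pairs]


if __name__ == "__main__":
    # Hypothetical page markup, for illustration only.
    page = ('<div prefix="rnews: http://iptc.org/std/rNews/2011-10-07#" '
            'typeof="rnews:NewsItem">'
            '<h1 property="rnews:headline">Council approves <em>new</em> budget</h1>'
            '<span property="rnews:datePublished" content="2011-04-12">April 12</span>'
            '</div>')
    for prop, value in extract_rnews(page):
        print(prop, value)
```

Run over many pages, the same extractor could feed the kinds of aggregations and mashups described above.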

At the very least, it will help search engines enable users to find you. At best, it will transform the way the industry publishes and consumes news.

This is a great thing. A rising tide will float all boats. Make sure you are on one.