Search Engines and advertising don’t mix?

Way back in 2007, Doc Searls turned me on to this quote:

“Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users.”

From “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” by Sergey Brin and Lawrence Page.

I revisit it every few years and think about Google. It was ironic then and only gets better.

Posted in Uncategorized

ShortURL Redux with PHP, AWS Elastic Beanstalk, and DynamoDB

We certainly had our share of shortURL discussion and implementations a few years back, and the topic was well covered, including some nice work by Dave Winer and Joe Moreno on an Amazon S3 based solution to provide portability.

I’ve been wanting to play with Amazon’s Elastic Beanstalk since it started supporting PHP. And I’ve always longed for ShortURL software that was a “set it and forget it” solution: infinitely scalable with little to no maintenance. That pretty much rules out the use of a relational database, and in the unlikely event of extremely high usage I don’t even want to worry about hitting the limits of a single server.

Happily, with all the cloud based offerings these days we have some options. Even better, a PHP app on Elastic Beanstalk should scale without (in theory) any of the more complex setup that would be necessary if we just used EC2. There are plenty of other ways to skin this cat but, like I said, I wanted to try this method of app development.

Regarding persistent storage, back when the topic was hot, I had built a shortener based on Amazon’s SimpleDB, but that service has its limitations and the goal here was theoretically infinite scalability and as little maintenance as possible.

I chose Amazon’s DynamoDB for those reasons.

For the record, I don’t normally choose vendor specific offerings because I hate lock-in as much as the next guy, but this was the best fit for the goals of this exercise and it’s not so complicated of a piece of software that I couldn’t make an easy port to something like MongoDB (which I’ve also been playing with and beginning to love. Go NYC!).

Maybe the coolest thing about Elastic Beanstalk is the ease of use thanks to the Git integration. Here is an excellent post about getting set up to use Git with Elastic Beanstalk. You install a Git extension that lets you configure it with the credentials that AWS needs. This is for one-way pushes only, so you’ll probably want to do development and testing locally or on another server to speed up your round-trips. James Gosling felt similarly. Though PHP doesn’t have all that overhead, it is still a bit painful to develop straight to Elastic Beanstalk. No real need to anyway.

Once you have an app set up and are able to push your commits up to it, you are ready to roll. For me that meant setting up CodeIgniter and the AWS SDK for PHP.

I unzipped CodeIgniter 2.1.0 and added it to my Git repository. Committed and pushed it using the “git aws.push” command. Looked at the app and it was serving a CodeIgniter page right out of the box.

I downloaded the latest AWS SDK for PHP and unzipped it. Making it work with CodeIgniter is pretty easy. I dropped the entire folder into application/libraries and called it ‘aws-sdk’.

Also in the libraries folder we add a file called aws.php, with the contents simply being:

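<?php

// CodeIgniter library wrapper that loads the AWS SDK for PHP.
class Aws {

 public function __construct()
 {
 // Resolve the SDK path relative to this file (application/libraries).
 require_once(dirname(__FILE__) . '/aws-sdk/sdk.class.php');
 }
}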

Before you can use that class, you’ll need to edit the config-sample.inc.php file in the sdk folder as well. We could hard-code the AWS credentials into that file but Elastic Beanstalk provides a better way.

Rename that file to config.inc.php and where you’d put your AWS access key and secret key, instead put:

'key' => get_cfg_var('aws.access_key'),
'secret' => get_cfg_var('aws.secret_key'),

get_cfg_var() is a PHP function that returns values from the php.ini file. Elastic Beanstalk allows you to add some of these config values under the “container” tab when you edit your environment right through the AWS console. There are already two fields specifically for these two values and you can add some others if you need something similar. We will use one for our database.
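One thing to watch for is that get_cfg_var() returns false when a setting is not defined, which is easy to hit when running locally instead of on Beanstalk. Here is a minimal sketch of a wrapper that falls back to a default. The config_value() helper and the 'example.table_name' setting are names I made up for illustration; they are not part of the SDK or of Elastic Beanstalk.

```php
<?php

// Hypothetical helper: read a php.ini-level setting, or return a default.
function config_value($name, $default = null)
{
    // get_cfg_var() returns false when the setting is not defined.
    $value = get_cfg_var($name);

    return ($value === false || $value === '') ? $default : $value;
}

// The two settings Elastic Beanstalk fills in from the console fields.
$accessKey = config_value('aws.access_key');
$secretKey = config_value('aws.secret_key');

// 'example.table_name' is an assumed setting name, used only to show the fallback.
$tableName = config_value('example.table_name', 'shorturls');
```

On a local machine with none of these settings defined, $accessKey and $secretKey come back as null and $tableName falls back to 'shorturls', so the app can detect the missing credentials instead of sending false to AWS.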

While in the config file you’ll also need to set the ‘default_cache_config’ param. It’s required for DynamoDB. Set it to ‘apc’.

If you do find yourself working on a live Beanstalk app, one suggestion I would make is to use a stable file as the health check URL. If for some reason your load balancer can’t get an HTTP 200 from that URL, your app health will go to red status, you won’t be able to contact it, and you’ll waste considerable time connecting directly to the EC2 instance or rolling back to older versions until you are no longer dead in the water. It seems a little flaky in that regard, or is it just a compromise one needs to make to get the benefits of an environment like this?


Fun with GeoPoints

If you are in the news business you are likely thinking about location based services and geo-content in general.

In my role as a developer at Digital First Media, it is an especially important topic as we try to build centralized sites and services that can work across hundreds of local news sites.

It makes sense for the business. Location based marketing is an area we need to be leaders in. We always have been, and the fast changing and still emerging landscape in both mobile and personalization means there is still tons of room to carve out our share.

This is particularly important to organizations like ours with legacy print products. Zoned inserts are still a big part of the newspaper business, but a scary one too, because the advances in location based targeting online mean a significant disruption is ahead. When the money moves online, we need solutions in place to fulfill the needs of our local customers.

Now, it’s not that I suddenly drank some kool-aid being served in the sales department open-house. I don’t get invited to those anymore. I’m mentioning this stuff as a developer because there is a truly awesome part to this all.

This location based personalization is often exactly what our users want as well. It is a rare opportunity in this industry that the stars align so that editorial, technology, sales AND (most importantly) the site users all want the same thing.

I’m so bullish on this, I’d say that if the industry can execute well on location based services online it could revive a lot of companies. But I think we need to be smart about it. No one department can be “product owner” here. We need to build things that users love. The rest will fall in line.

Thanks to Superstorm Sandy, I had some downtime where I could test a few technologies out. During the election, using some tools for reverse geocoding available from OpenStreetMap, I was able to aggregate tweets mentioning either “Romney” or “Obama” and put them into buckets based on their locations.

Earlier this year, I had done some prototypes using the Alchemy API, in conjunction with a number of experiments being done for the Citizens’ Agenda project at Jay Rosen’s NYU Studio 20. I grabbed some old code left on the cutting room floor and was able to do some sentiment analysis on those aggregated tweets, to try and track the aggregate moods for each term.

The results were surprising to me. The analysis returned more positive results for the term “Romney” leading up to election night. Once it became clear Obama was the winner, the term “Obama” rose a bit, as might be expected, though the overall rating for that term stayed negative the whole time. I can think of lots of cool visualizations and uses for this type of data. Maybe an opportunity for an upcoming hackathon.

While it is clear we aren’t predicting elections yet, I see a lot of potential in this space for all kinds of usage in journalism and marketing.

At Digital First Media, I’m working on some server side geo-location tools and I also see many of our journalists thinking about maps and location from their end. The tools out there range from expensive and closed to free and open. I personally want two things: automation and APIs. And using Fusion Tables and Maps you can cobble together some cool things. I also want persistent data. This needs to grow over the years and become more valuable as it does so.

I’m a big fan of PostgreSQL but it was time to learn a little MongoDB and it also has a dead simple way to do 2D location searches with its $near query.

Anyhow, I was able to pretty quickly get a small app running that does one thing very simply. It stores GeoJSON content and then returns that content based upon whether it is within a certain radius of the coordinates you query for.
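To make the “within a certain radius” idea concrete, here is a rough sketch of the same check done in plain PHP with the haversine formula. To be clear, this is not how the app does it, since MongoDB’s $near handles the search there. The function names and the sample data layout are my own assumptions for illustration.

```php
<?php

// Mean Earth radius in kilometers (the usual approximation).
const EARTH_RADIUS_KM = 6371.0;

// Great-circle distance between two lat/lng points, via the haversine formula.
function haversine_km($lat1, $lng1, $lat2, $lng2)
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);

    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLng / 2), 2);

    // min() guards against rounding pushing sqrt($a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt($a)));
}

// Keep only the GeoJSON Point features within $radiusKm of the query point.
function features_within(array $features, $lat, $lng, $radiusKm)
{
    $matches = array();
    foreach ($features as $feature) {
        // GeoJSON stores coordinates as [longitude, latitude].
        list($fLng, $fLat) = $feature['geometry']['coordinates'];
        if (haversine_km($lat, $lng, $fLat, $fLng) <= $radiusKm) {
            $matches[] = $feature;
        }
    }
    return $matches;
}
```

Scanning every stored feature like this is fine for a handful of points but gets slow quickly, which is the reason to let a geospatial index in the database do the work instead.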

Check it out at geopoints.org or on GitHub at https://github.com/mterenzio/geopoints
