Django Zen

If you are following this blog for Django things then it might get a little bit quieter. This weekend I started up djangozen.com, a website devoted to lots of stuff about Django. There is already a Django website of course, and lots of other sites with useful Django stuff on them, but on the whole there are some things I think are missing. If you want to know more about that sort of stuff, go read the about page.

Technically DjangoZen uses:

  • Plone as the content writing and managing interface
  • Django as the delivery interface, using PostgreSQL
  • Contentmirror to copy content (mostly) into PostgreSQL
  • jQuery for lots of fun bits
  • My own OpenID handling, based off django-openidauth, because I got annoyed with all the OpenID libraries
  • The Blueprint CSS framework, which I was very happy with

Now to adding in more features and writing some content.

Simpletemplate 0.5 released

This just removes the contrib from mentions in the code and is a bit more explicit on a failed path expression.

Download: simpletemplate.0.5.zip

Classy move

I spent quite a bit of time over the last month interviewing for a job. I was disappointed to find out that I didn't get the job last week. But this morning I got a phone call from the CEO, following up on the decision.

It didn't take much time, but left a good impression.

Good weekend

Had a nice weekend here in Vancouver. Went skiing up Grouse, walked the dog twice and spent Sunday afternoon on the beach. I'm absolutely delighted to think that I've got 3 beaches in walking distance of my new house, the furthest is about 10 mins walk away through a park. Ah that's Vancouver for you.

I also posted to the plone-users list asking if any Vancouverites were interested in re-forming a Plone Vancouver User Group. The response has so far been zero, so here's hoping it got lost in the traffic and there is still someone out there doing Plone in Vancouver.

How to get traffic to your site

Rant, swear and slag off a framework. A lot. Then the people will keep coming back. Dammit I knew I was being too nice.

Well, in two days my blog has had 33k unique visitors. Django post got 20k so far. Very interesting.
Zed Shaw on Twitter, author of the big "Rails is a ghetto" rant.

Full text searching in Django

I've been looking for a nice full text search index for Postgres. For a while I've used the built-in full text search in Postgres and have been pretty impressed: I've had nothing to fix or worry about in over 5 months, and I like that.

On the next upgrade of the site I wanted to do full text indexing across multiple models. This made it a little bit more interesting, but I quickly discarded a few plans:

  • tsvector is the one I was using and is based on providing search for a single model.
  • Whoosh came up in my Twitter feed the other day and I tried it and got it working, but I don't feel too comfortable with Whoosh. It's light on unit tests and features. It's new, and it made me think the zcatalog and its indexes from Zope might be a better bet. Would like to see some more maturity out of that one.
  • Django solr seems like a more mature choice, but even the install of Solr took over an hour as Ubuntu apt-getted the world. After that I got a bit lost reading the Solr docs. In the end I just didn't feel too comfortable with another piece of machinery to blow up. It's clearly a good mature product, but at this stage I feel it's overkill. Definitely something I would use for a big project though.
  • This snippet is very cool, but again only for one model....

So all we need to do is hack apart that snippet and use it with the content types framework instead. So here's yet another solution that provides cross model searching. I use the Vector field from the last snippet and use it in a generic content model. Then, using a bit from the Whoosh example, I connect a signal that listens to every model. When an instance is saved, the handler checks whether the "get_search_fields" method exists and only indexes the instance if it does. Yes, that's hacky, but it works.

The last hacky part is actually saving the tsvector, that is done through raw sql (as is the snippet). I've tried overriding get_db_prep_save and such (as documented) but can only succeed in strings being written in, not tsvectors. So if anyone has any thoughts on that, much appreciated.

So a query returns the search model hits, and accessing content_object on each hit gives you the object for the results page. With pagination that will mean 20 queries, one for each row of my result. That isn't ideal, but we can optimize by adding more values to the search model if needed to prevent those lookups.
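As an aside, here is a minimal sketch of another way to trim those per-row lookups: group the hits by content type and fetch each group in one go. It is plain Python rather than part of the site's code, and `group_hits`, `resolve_hits` and `fetch_many` are hypothetical names. In a Django project `fetch_many` could plausibly wrap `ContentType.objects.get_for_id(ct_id).model_class().objects.in_bulk(ids)`, but that wiring is an assumption and is not shown here.

```python
from collections import defaultdict


def group_hits(hits):
    """Group (content_type_id, object_id) pairs by content type."""
    grouped = defaultdict(list)
    for content_type_id, object_id in hits:
        grouped[content_type_id].append(object_id)
    return dict(grouped)


def resolve_hits(hits, fetch_many):
    """Resolve hits to objects using one fetch per content type.

    fetch_many(content_type_id, ids) is a hypothetical callable that
    returns a dict mapping object_id to the loaded object.
    """
    found = {}
    for content_type_id, ids in group_hits(hits).items():
        for object_id, obj in fetch_many(content_type_id, ids).items():
            found[(content_type_id, object_id)] = obj
    # Keep the original hit order and skip rows whose object no longer exists.
    return [found[hit] for hit in hits if hit in found]
```

With two searchable models on a page of 20 results, this would mean two fetches instead of 20, at the cost of a little bookkeeping.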

Here's the code:

from django.db import models
from django.contrib import admin

from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic
from django.db import connection
from django.db.models import signals

# from http://www.djangosnippets.org/snippets/1328/
class VectorField (models.Field):
    def __init__(self, *args, **kwargs):
        kwargs['null'] = True
        kwargs['editable'] = False
        kwargs['serialize'] = False
        super(VectorField, self).__init__(*args, **kwargs)

    def db_type( self ):
        return 'tsvector'

class Search(models.Model):
    object_id = models.PositiveIntegerField()
    content_type = models.ForeignKey(ContentType)
    content_object = generic.GenericForeignKey()
    
    index = VectorField()

    class Meta:
        app_label = "general"

    @staticmethod
    def query(query):
        # The search text is passed as a query parameter, so the database
        # driver escapes it; quote_name is meant for identifiers, not values.
        result = Search.objects.extra(where=["index @@ plainto_tsquery(%s)"], params=[query])
        return result

# bits from http://www.arnebrodowski.de/blog/add-full-text-search-to-your-django-project-with-whoosh.html
def update_index(sender, instance, created, **kwargs):
    if not hasattr(instance, "get_search_fields"):
        return
        
    catalog = 'pg_catalog.english'
    data = [ str(f) for f in instance.get_search_fields() if f ]
    data = " ".join(data)
    
    content_type = ContentType.objects.get_for_model(instance)
    try:
        search = Search.objects.get(content_type__pk=content_type.id, object_id=instance.id)
    except Search.DoesNotExist:
        # create() already saves the new row, so no extra save() is needed
        search = Search.objects.create(content_object=instance)

    cursor = connection.cursor()
    sql = "update general_search set index = to_tsvector(%s, %s) where id = %s"
    cursor.execute(sql, (catalog, data, search.id))
    cursor.execute("COMMIT;")
    cursor.close()

signals.post_save.connect(update_index)

Update: corrected possible sql injection.
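The signal handler above only indexes instances that define get_search_fields. As a rough illustration of that opt-in hook, here is a plain Python sketch that mirrors the same checks without touching the database. `build_index_text`, `Article` and `Tag` are hypothetical names used only for this example, and the classes are stand-ins rather than real Django models.

```python
def build_index_text(instance):
    """Return the text to index for an instance, or None if it opts out.

    Mirrors the checks in update_index: skip objects without a
    get_search_fields method, drop empty values, join the rest with spaces.
    """
    if not hasattr(instance, "get_search_fields"):
        return None
    return " ".join(str(f) for f in instance.get_search_fields() if f)


class Article(object):
    """Hypothetical stand-in for a model that wants to be searchable."""

    def __init__(self, title, summary, body):
        self.title = title
        self.summary = summary
        self.body = body

    def get_search_fields(self):
        # Only these values would end up in the tsvector.
        return [self.title, self.summary, self.body]


class Tag(object):
    """Hypothetical stand-in with no get_search_fields, so it is skipped."""
```

So making another model searchable is just a matter of adding a get_search_fields method that returns the values worth indexing.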

Exit strategies for open source companies

A couple of interesting posts on exit strategies on Techvibes: "Business are supposed to make money" and "Why market ... matters more than exit strategy".

The second of these is the more interesting one for me, because I can add a fifth local company to that list. ActiveState, whom I worked with a long time ago, was bought by Sophos, primarily because of its anti-virus tools (Google). That was not one of the main things ActiveState was known for; it was one of several tools it had. But it went into a market it thought there was space for, and sure enough someone was interested in it.

When building Enfold Systems, we made sure there was some sort of exit strategy. Maybe, in hindsight, a little too much; however, I would argue that for open source companies it is especially important to think about what the exit strategy will be. Because an exit will happen at some point, either planned or unplanned. When it happens, you need value.

Getting that value is a challenge for an open source company. Most open source companies start out as consulting companies and that doesn't build much value. You'll likely have no intellectual property (let's leave the questionable IP is evil debate for a moment), you'll likely have no patents (ditto) and you have likely given large amounts of your code away. You might have value for recurring support contracts or from services around the open source product. The value in the company is in the process of completing projects, the brains of the people working for you and the customers that you have.

But really the last ones are a tough sell: unless you have a sales chain with the power of a Salesforce or an Oracle, you are going to have a hard time getting much value for them. At Enfold we put time and effort into making sure that we had products that provided the company with value for an investor. We made sure the IP was clear on those products and that if something did happen there wouldn't be hassle.

There are lots of companies building real value on top of the consulting they do. For example OpenRoad has ThoughtFarmer, Enfold has its products, Six Feet Up has their hosting. All are things that create value for the company, solve clients' problems and provide revenue, which then feeds back into allowing the open source activities to continue.

And I think that's important to think about if you are starting a company; it's worth doing at the beginning and getting right. I did have a chat with one business owner who felt it had to be all or nothing: all open source or not. I don't buy that that will work. Besides, back on the topic at hand: one day an exit of some kind will happen, so make sure you have value and that it is clear of issues.

Putting weather onto a site

Just following on from the last post about Cleartrain, someone asked in irc which product to use to put the weather onto their Plone site. There's a bunch of old ones out there. The simple answer is, don't. Just do it in Javascript, for example:

    <div id="weather-feed"></div>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>
    <script type="text/javascript">
    
    google.load("feeds", "1");

    var rss = new Object();

    rss.url = "http://weather.yahooapis.com/forecastrss?p=";
    rss.node = "CAXX0518";  // vancouver

    function initialize() {
        var feed = new google.feeds.Feed(rss.url + rss.node);
        feed.load(function(result) {
          if (!result.error) {
            var msg = document.getElementById("weather-feed");
            msg.innerHTML = msg.innerHTML + '<div>' + result.feed.title + '</div>';
            for (var i = 0; i < result.feed.entries.length; i++) {
              var entry = result.feed.entries[i];
              var link = '<a href="' + entry.link + '">' + entry.title + '</a>';
              var para = '<p>' + entry.content + '</p>';
              msg.innerHTML = msg.innerHTML + '<div>' + link + para + '</div>';
            };
          }
        });
    };

    google.setOnLoadCallback(initialize);
    
    </script>

This pulls in the weather from Yahoo, via Google. See http://developer.yahoo.com/weather/#request for more.


Django Test Client and OpenID

...or how I used django-openid all wrong in the first place and I've just gotten it cleaned up.

A while back I dedicated myself to trying to do everything with OpenID, since having all these logins all over the place is just awful. With sites dedicated to the technically minded it seems the right way to go. So I grabbed django-openid and slapped it in. This handles the login and logout for you. When an authenticated request is made, its middleware attaches a variable to the request, request.openid, which in turn provides account information.

So then it was a simple matter of checking if that request.openid is there and finding the user. In the old version of my site that was very simple, just a decorator to check someone is logged in and I'm done.

In the next revision I need permissions, so I thought I'd use the Django permissions from contrib.auth. No problem... just make the current user model a user profile and I can use permissions. Except this is where it gets sticky. Because I was working around the Django contrib.auth with using openid, things weren't working. This came to the fore when I tried to use the test client to check that I can only view pages when logged in.

To do that you use the login method of the test Client, so here's what I wanted to check:

from django.test import TestCase                
from django.test.client import Client

class register(TestCase):
    def testLogin(self):
        client = Client()
        client.login(openid="http://some.user.id/")
        res = client.get("/listener/view/2/")
        assert res.status_code == 200

    def testFailLogin(self):
        client = Client()
        res = client.get("/listener/view/2/")
        assert res.status_code == 302

However there's a problem. The login code works with the authentication backend; django-openid doesn't define one and neither did I. After trying to bend the Client to my will I realised I was going the wrong way and should use the Django authentication system that's already there. So let's snip out all the messing around in the middle and cut to the chase of what worked.

Firstly, define an authentication backend, so that's a matter of grabbing the openid from the request and then searching in my UserProfile:

from general.models.user import UserProfile
from django.contrib.auth.models import User

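class OpenIDAuthentication:
    def authenticate(self, openid):
        # .get() raises DoesNotExist rather than returning None, so catch it
        # and return None to signal a failed login.
        try:
            profile = UserProfile.objects.get(openid__exact=openid)
        except UserProfile.DoesNotExist:
            return None
        return profile.user
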
    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None

Then you add in:

AUTHENTICATION_BACKENDS = (
    'general.authentication.OpenIDAuthentication',
)

That then handles my authentication. Now in my code I can call:

authenticate(openid="some id")
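To show why defining a backend was the missing piece, here is a simplified stand-in for the dispatch that Django's authenticate does in releases of this era. It is a sketch, not Django's source: the real function reads the backends from AUTHENTICATION_BACKENDS rather than taking them as an argument, and `PasswordBackend` and `FakeOpenIDBackend` are hypothetical classes invented for the example.

```python
def authenticate(backends, **credentials):
    """Simplified stand-in for django.contrib.auth.authenticate.

    Try each backend in order. A backend whose authenticate() does not
    accept these keyword arguments raises TypeError and is skipped.
    """
    for backend in backends:
        try:
            user = backend.authenticate(**credentials)
        except TypeError:
            continue
        if user is not None:
            return user
    return None


class PasswordBackend(object):
    """Hypothetical backend that only understands username and password."""

    def authenticate(self, username=None, password=None):
        return None


class FakeOpenIDBackend(object):
    """Hypothetical backend keyed on an OpenID URL, backed by a dict."""

    def __init__(self, users):
        self.users = users

    def authenticate(self, openid=None):
        return self.users.get(openid)
```

The test client's login method passes its keyword arguments through the same kind of dispatch, so client.login(openid=...) can only succeed once some backend accepts an openid argument.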

All I had to do then was realise that most of my old login handling and checking methods were wrong and remove them or make them use the contrib.auth ones. Once that was all done, my test cases ran much more smoothly. Going back and looking at django-openid now, this makes much more sense and will hopefully be easier to switch to a more up to date openid version.

Easy RSS Widgets

One thing that I looked at a while back, but didn't have a chance to play with, was Google Ajax Feeds. These give you an interface to easily read RSS feeds in Javascript.

A few years ago I tried this in Javascript using ClearRSS for Plone. The only problem with that was that I used a generic server side proxy (now redundant thanks to JSONP) and found myself hitting all the inconsistencies with feeds and programming that in Javascript. This API removes the first problem and solves the second by normalising all the data so that it is accessible consistently.

The result is that it's easy to read feeds in Javascript. As an example I just created a widget for Cleartrain, so that all trainings can be easily embedded on a page. For example all the Django training can be pulled using this:

<div id="cleartrain-feed"></div>
<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript" src="http://media.cleartrain.ca/widget/cleartrain.js"></script>
<script type="text/javascript">
    cleartrain.url = "http://cleartrain.ca/atom/topics/4/";
    cleartrain.content = false;
</script>

Cleartrain is itself a Django site and required no modification above and beyond having RSS feeds for everything. The Javascript that makes this embeddable is really easy: all we do is read the Ajax API, loop through the entries and then add them into the innerHTML. The source for that is here: http://media.cleartrain.ca/widget/cleartrain.js.

Definitely a neat API from Google and a very quick and easy way to create an RSS widget.

Plone Bootcamps

Been a couple of months since we've heard from Joel, but he's back. The best trainer I've seen has got a new bootcamp in San Francisco in April. If you want to learn about Plone it's probably the best training you can get: Plone Bootcamp.