Flapping Head

Perch blog post how to get the excerpt

2013-09-02T11:57:57Z

I’m not a huge fan of Perch, but I’ve used it for one client. Getting the excerpt for a blog post isn’t straight-forward. Here’s a way to do it using the API.

$api = new PerchAPI(1.0, 'perch_blog');
$posts = new PerchBlog_Posts($api);
$post_object = $posts->find_by_slug($blog_slug);
$excerpt = $post_object->excerpt();
// $excerpt is an array with 'raw' and formatted versions.
// e.g. do something with $excerpt['raw']

Hope it saves you some digging.

Run python script in virtualenv from cron

2013-04-09T16:46:31Z

I’m using virtualenv and have some python scripts as part of a django project. The scripts need to be run by cron, but have to use the virtualenv.

There’s a few solutions around, but I went with this shell script which changes to the directory it’s in, activates the virtualenv and runs python for the given script with args.

The script looks like:

#!/bin/bash

# Runs python script within virtualenv
# used by cron

cd `dirname $0`
source env/bin/activate
python "${@:1}"

chmod it so it’s executable.

I run it from cron like:

/srv/www/myproject/runscript scripts/test.py arg1 arg2

The test.py script is at /srv/www/myproject/scripts/test.py – i.e. relative to the runscript script.

This is largely a note-to-self, but I hope it helps you, too.

Multi-file find and replace for {% url “quoted” %} tag in Django 1.5

2013-04-03T12:18:48Z

Django 1.5 deprecates the old {% url %} tag style and uses the new one. The new one needs the url name to be quoted. So instead of {% url public_index %} it must be {% url "public_index" %}.

You’ll get an error:
‘url’ requires a non-empty first argument. The syntax changed in Django 1.5, see the docs.

Instead of adding quotes to each one in each template manually, I wanted a multi-file find and replace regex. Examples abound of using find with xargs and sed. But here’s a simple Python script and example usage to do just that.

Note that it updates the files in-place. Make sure you have a backup and/or they’re committed to version control just in case something goes wrong and messes up your templates. You’ve been warned.

Here’s a simple find and replace Python script:

import sys
import os
import re

def main(argv=None):
    if argv is None:
        argv = sys.argv

    if len(argv) < 4:
        print """
Usage:

%s find replace filename.txt [filename2.txt...]
""" % argv[0]
        exit

    find = argv[1]
    replace = argv[2]
    filepaths = argv[3:]

    for filepath in filepaths:
        text = open(filepath).read()
        #print re.sub(find, replace, text)
        new_text = re.sub(find, replace, text)
        if new_text != text:
            print filepath
            open(filepath, 'w').write(new_text)

if __name__ == '__main__':
    main()

I saved the file as replacer.py.

Here's how I called it:

find path/to/templates/ -name "*.html" | xargs python path/to/replacer.py '{% url ([\w_-]+) ' '{% url "\1" '

The find command finds all .html files in our templates directory. xargs passes the file names to our replacer script along with the find and replace patterns. We're putting quotes around the first parameter which is the url pattern name and leaving other parameters untouched.

This pattern assumes you've got a space between {% and url, which is the standard practice in django. If not, you can change the pattern.

Hope this helps.

What is mysql doing? Why too many connections?

2011-04-29T22:43:27Z

I recently had to troubleshoot a busy mysql database server for a client. Here’s some basic tips.

SHOW PROCESSLIST is your friend

Inside mysql you can run show processlist to find out what the server is doing. From the list you can see how many connections there are, which database, whether they are causing a lock, waiting for a lock, and even the query being run. Very helpful.

You can get the same info from the mysqladmin program. A simple way to monitor what the server is doing is to have cron email you every hour (or whatever interval) with the process list:

crontab -e

MAILTO="you@example.com"
31 * * * * mysqladmin -u root -psecret --verbose processlist

Sleeping connections need closing

If you have a lot of connections in a sleep state, it means they’re not doing anything but are still held open. The server can run out of connections if they’re all busy sleeping.

Assuming the database is being accessed by a website (e.g. php code), you could look for ways to close the database connection sooner.

If you’re using Smarty, instead of:

// open mysql connection
// do stuff
$smarty->display('template.tpl');
// close mysql connection

you might try:

// open mysql connection
// do stuff
$output = $smarty->fetch('template.tpl');
// close mysql connection
print $output;

The difference is, you’re closing the mysql connection before returning data to the client. If you’ve got slow clients (and assuming no reverse-proxy or load balancer to spoon-feed) it could release the database connection much sooner.

EXPLAIN those queries

If you have slow queries, you can find out why using explain. See how many rows they need to read (perhaps there’s no index the query can use) and if they need to be optimised.

Posting unicode with urllib2

2011-01-06T13:22:00Z

While integrating an xml api in to a django project, I needed to post some xml to a site. I was using urllib2.urlopen but because my xml contained a non-ascii character (a £ pound sign) I was getting dreaded encoding errors like:

'ascii' codec can't encode character u'\xa3' in position 359: ordinal not in range(128)

After some messing, I came up with this:

input_xml = render_to_string('integration/checkout.xml', {'booking': booking})
req = urllib2.Request(url, input_xml.encode('utf-8'), {'Content-type': 'text/xml; charset=utf-8'})
response = urllib2.urlopen(req)

The key part is: input_xml.encode('utf-8'). The unicode string needs to be encoded to utf-8.

Android PhoneGap SetupTask cannot be found

2010-12-30T12:20:28Z

I was building an Android app using PhoneGap, but couldn’t get it to build. It was failing to build the Java stuff because the path to the SDK was wrong.

I kept getting error:

framework/build.xml:49: taskdef class com.android.ant.SetupTask cannot be found

It’s because my PATH environment variable was wrong. Specifically, it had trailing slashes.

This was on Ubuntu/Linux.

Got the same problem? Try this to see if it is a problem with slashes:

which android

Did the path have a double slash like: android-sdk-linux_x86/tools//android

echo $PATH

Do you have a path in there to the Android SDK tools directory which ends with a slash? Get rid of that slash and try again. You’re welcome.

Make a slug in PostgreSQL translating diacritics

2010-12-21T10:28:11Z

I needed to make slugs in Postgresql from names which included diacritics. Though not complete, here’s a reasonable stab at it:

select regexp_replace(translate(replace(lower(your_field_name_here), ' ', '-'), 'áàâãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ','aaaaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu'), '[^\w -]', '', 'g') as slug

The translate is not quite right because it translates some uppercase to lowercase, but it’s all lowercase for the slug, so it doesn’t matter here.

[Update] Added ‘g’ flag to regexp_replace to replace all occurrences, not just first.

Django static media always returning 404 not found

2010-10-06T21:19:55Z

I spent too long tonight figuring out a weird problem.

On a dev server, I was using django.views.static.serve to serve media files. But it was returning 404 (not found) for any file.

The requests for media files weren’t even showing up in Django’s built-in server’s output. That had me baffled until I dug deep enough in Django’s code to figure it out.

The ADMIN_MEDIA_PREFIX was the same as MEDIA_URL. That was it.

Django’s built-in server doesn’t log requests for admin media, so that’s why there was no log output.

The built-in server also handles admin media separately, so when I tried to request a media file, it intercepted and looked for it in the admin media.

The solution is for the ADMIN_MEDIA_PREFIX to be different from MEDIA_URL, e.g. /media/ and /media/admin/.

Repair all tables in MySQL

2010-09-08T12:08:04Z

In mysql you can repair a table using:

repair table mytable;

How do you repair all the tables in a database? There’s no command for that, so you’ll need to repair each one individually.

That’s quite tedious, so here’s a bit of help in MySQL 5.

select concat('repair table ', table_name, ';') from information_schema.tables where table_schema='mydatabase';

Change mydatabase to the name of your database.

This gives you a load of repair table statements. They’re formatted in a table with vertical bars, so copy the whole lot, paste in to a text editor or word processor, do a find and replace: find “|” replace with nothing. Don’t worry about the spaces.

Copy and paste that lot in to your mysql command prompt. Job done.

UK Postcodes to Latitude/Longitude

2010-08-27T17:23:12Z

For location-based stuff, it’s useful to take a UK postcode and get the lat/long to plot it on a map (such as Google Maps).

The Google Maps API only has postcodes down to sector-level (e.g. NW1 4), so its results are approximate.

Ordnance Survey released the Code Point dataset free as part of OpenData. Among other things, it includes full postcodes with grid references.

The grid references are OSGB36, rather than lat/long. Converting them is more difficult than you’d think. Here’s the solution I used, based on what I cobbled together from forum posts and the like.

Get the data

Download the Code Point data. It’s a series of CSV files, one for each postcode area.

You might want to concatenate them in to one file. Or if you only want London, postcodes, try the e, ec, n, nw, se, sw, w, wc files.

Convert OSGB36 to WGS84 Lat/Lng

The grid references need to be transformed to latitude/longitude on the WGS84 system. There are a few ways to do this, but I used PROJ4‘s cs2cs program.

The command is:

cs2cs -f '%.7f' +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894 +units=m +no_defs +to +proj=latlong +ellps=WGS84 +towgs84=0,0,0 +nodefs

There is a PROJ wrapper for Python (pyproj), but I wasn’t smart enough to figure out how to do the options, so instead I spawned the cs2cs program from a Python script.

Here’s the script. It’s not great, but it does the job.

#!/usr/bin/python

import sys
import csv
import subprocess
import re

def main(argv=None):
    if argv is None:
        argv = sys.argv

    if len(argv) != 3:
        print """\nUsage: %s input_file.csv output_file.csv\n""" % argv[0]
        return 1
    
    input = open(argv[1], 'r')
    reader = csv.reader(input)

    output = open(argv[2], 'w')
    output.write('flatpostcode,postcode,easting,northing,latitude,longitude\n')


    input_fields = []
    for index, row in enumerate(reader):
        postcode, easting, northing = row[0], row[10], row[11]
        input_fields.append((postcode, easting, northing))
        if index % 1000 == 0:
            process(input_fields, output)
            input_fields = []
            print 'processed', index
    process(input_fields, output)
    print 'done'

def process(input_fields, output):
    args = ['cs2cs', '-f', '%.7f', '+proj=tmerc', '+lat_0=49', '+lon_0=-2', '+k=0.9996012717', '+x_0=400000', '+y_0=-100000', '+ellps=airy', '+towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894', '+units=m', '+no_defs', '+to', '+proj=latlong', '+ellps=WGS84', '+towgs84=0,0,0', '+nodefs']
    cs2cs = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    data = cs2cs.communicate('\n'.join(['%s %s' % (input[1], input[2]) for input in input_fields]))[0]
    for index, line in enumerate(data.split('\n')):
        if line:
            postcode, easting, northing = input_fields[index]
            data_parts = re.split('\s+', line)
            output.write('%s,%s,%s,%s,%s,%s\n' % (postcode.replace(' ', ''), format_postcode(postcode), easting, northing, data_parts[1], data_parts[0]))

def format_postcode(postcode):
    postcode = postcode.replace(' ', '')
    return '%s %s' % (postcode[0:-3], postcode[-3:])

if __name__ == '__main__':
    main()

Run it with your Code Point csv file as input. The output contains the postcode, OS grid refs, latitude and longitude.

Postgresql

Bonus: if you’re using Postgresql, here’s how to get the csv data in to your database.

Create a table matching the csv file layout:

create temporary table postcode (
flatpostcode varchar(8), 
postcode varchar(8), 
easting varchar(10), 
northing varchar(10), 
latitude numeric(10,8), 
longitude numeric(10,8)
);

Load it in:

copy postcode from '/path/to/your/output.csv' with delimiter ',' csv header;

Then copy the fields you want to your real table.

I hope this saves you some time and bafflement. It would be nice to just give you the postcode + lat/long data, but redistributing it is against Ordnance Survey’s terms of use.