2009-11-27

Getting Started with distutils

If you've seen some of the discussions surrounding Distribute (a fork of PJE's Setuptools), you may be wondering about distutils, the backbone of both Distribute and Setuptools. If you are, then you may have noticed that it's easier to find details on the various tools built on top of it than it is to find info on the basics, even though distutils is in the standard library.

Helpful Resources

Here are some of the resources I found useful:

Miscellaneous

One thing to remember: the package_data and data_files keyword arguments to setup in setup.py tell the installer what to install, MANIFEST.in tells the packager what to include in the distribution file. Even if it seems duplicative, you need to make sure your resources show up in both places if you want to be able to use python setup.py install using the distribution you build using python setup.py sdist.

Testing It All Out

After you package up your program, you can see if your setup.py is correct by doing a test install in a virtualenv. This let's you make sure you can python setup.py install etc. without contaminating your global site packages. If you haven't heard about virtualenv, I recommend reading up on it; it's really handy. The first page of Google results is a good starting point.

Back to flipping out...

2009-11-03

Setting up gitosis on OS X

What is gitosis?

gitosis is a wrapper around git that makes it easier share repositories, especially as it relates to managing access rights.

Why Should I Care?

The benefit that made me check it out is that I don't have to create an account on my dev machine for every single developer to whom I'd like to give access to some of my git repos.

Assumptions

  • You installed git and python from MacPorts.
  • You have enabled "Remote Login" in the "Sharing" Preferences pane.
  • You have a public key (typically generated via ssh-keygen); see the man pages for more details.

Installation

Preliminary Steps
  1. Create a git user
  2. Add git to the list of users who can ssh in
  3. Configure git's .bashrc (NOT .bash_profile) so port-installed apps are on the PATH. Mine looks like
    export PATH=/opt/local/bin:/opt/local/sbin:/opt/local/Library/Frameworks/Python.framework/Versions/Current/bin/:$PATH
    export MANPATH=/opt/local/share/man:$MANPATH

Installing gitosis
  1. cd /tmp
  2. mkdir src
  3. cd src
  4. git clone git://eagain.net/gitosis.git
  5. cd gitosis
  6. python setup.py install
  7. sudo chmod 755 /Users/git/repositories/gitosis-admin.git/hooks/post-update

Configuration

Setting up Admin Access

Add yourself as an admin (member of the gitosis-admin group) by executing sudo -H -u git gitosis-init < ~/.ssh/id_dsa.pub. This will use your public key as input into gitosis-init and set you up as an admin.

Cloning the Admin Repo

Try git clone git@`$HOST`:gitosis-admin.git. It should work. If it doesn't, and complains about the other end unexpectedly hanging up, the $PATH for git is probably misconfigured. Make sure you configured .bashrc, because .bash_profile won't stick around.

Configuring a New Repo

You make configuration changes to gitosis by editing your local clone of gitosis-admin and FIXME kbd pushing them back. Let's add a new repo.

  1. Open up gitosis.conf in your editor of choice
  2. Add an entry like the one below
  3. Save
  4. git push
    [group fabfour]
    members = john paul george ringo
    writable = the_monkeys_are_hacks

Breaking It Down
    [group fabfour]

We're defining a new group, named fabfour.

    members = john paul george ringo

Adding john, paul, george, ringo as members of the fabfour group.

    writable = the_monkeys_are_hacks

Listing the_monkeys_are_hacks as the only writable repo for the fabfour group. In case you're saying "Wait, I don't have a the_monkeys_are_hacks repo!" don't worry, that's coming next. You have to do all of this before you try to push anything to gitosis.

Populating Your New Repo
  1. If the repo doesn't exist, create it
    1. mkdir the_monkeys_are_hacks
    2. cd the_monkeys_are_hacks
    3. git init
    4. Do Work Here and Commit It
  2. Add your new gitosis-managed repo as a remote; inside your repo, execute git remote add origin git@`$HOST`:the_monkeys_are_hacks.git

  3. Push to it; inside your repo, execute git push origin master:refs/heads/master

If the last step fails with an inscrutable error, there's a good chance you forgot to chmod the post-update hook.

Actually Letting John and the Gang Access the Repo

Before john, paul, george, or ringo can actually get into the server, you need to add their public keys under gitosis-admin/keydir. I'm going to wave my hands concerning how you get the public keys, but they need to be added to gitosis-admin/keydir and they need to be named using the convention username.pub, so we'd have:

  1. cd gitosis-admin
  2. cp /tmp/john.pub keydir/
  3. cp /tmp/paul.pub keydir/
  4. cp /tmp/george.pub keydir/
  5. cp /tmp/ringo.pub keydir/
  6. git add keydir/john.pub keydir/paul.pub keydir/george.pub keydir/ringo.pub
  7. git ci -m "Adding the keys for the Fab Four"
  8. git push

Want More?

  • Check out this extremely detailed (non-Mac-specific) guide.

Back to flipping out...

2009-10-14

Note to Self: apropos finds man entries

Ever get that feeling that there just has to be something built right into the shell that will solve the problem you're facing, but have no idea what that something might be? Enter apropos. It searches the man page blurbs for entries that match your keywords, which can be regular expressions. How is it that more people don't talk about features like this?

Back to flipping out...

2009-09-28

Sneak Attack: Functional Java

How is this the first time I've heard of Functional Java?

Back to flipping out...

2009-09-25

Note to Self: nohup a Running Process

nohup -p pid

see also

Back to flipping out...

2009-09-24

Sneak Attack: 10 Useful Usability Findings and Guidelines

I wish more people would pay attention to things like this. It's not like most of that information is new, but it still gets ignored far too often.

Back to flipping out...

2009-09-22

Sneak Attack: Python is not Java

One of the (in Internet time, at least) golden oldies. Re-reading this makes me cringe when I think back.

Back to flipping out...

Sneak Attack: Why I Like pip

The most concise description I've found of really using pip.

Back to flipping out...

2009-09-15

Sneak Attack: Sacrificing Quality—Part II—Kaizen

Interesting post about negative hiring impacts caused by sacrificing quality. I also recommend the first post in the series, which focuses on the psychological impacts of sacrificing quality.

Back to flipping out...

2009-08-29

Sneak Attack: Co-routines as an alternative to state machines

Co-routines as an alternative to state machines. And the money quote: Co-routines are to state machines what recursion is to stacks.

Back to flipping out...

2009-08-25

PyAtl Notes—2009-08-20

What follows is essentially a stream-of-consciousness dump from the PyATL meeting on August 20.

Introduction to Python and Sqlite3

Presented by Alfredo Deza
  • For simple things, it's blazing fast
  • It's local, so no network necessary
  • Serializes to a single file
  • Uses DB-API 2.0 (see PEP 249). Once you have your cursor, you can start running SQL.
  • NOTE: Use sqlite3.connect(':memory:', isolation_level='exclusive') to use an in-memory DB
  • The iterdump() method of the sqlite3 connection gives you an iterable that lets you export the in-memory database. This is handy if you want to write them all to a file.
  • Although the sqlite3 docs claim that the concurrency situation has improved, the Internet at large seems to indicate that problems still abound.

GeoDjango

Presented by Skylar Saveland
  • Recommend you use PostGIS w/ PostgresQL
  • OpenLayers is a nice library for embedding map widgets on any webpage.
  • Core team is available on IRC (#geodjango on Freenode)
  • They make it fairly easy to integrate with Google Maps
  • Instead of standard Django models, you use the one from GeoDjango and then you get the Point, Polygon etc. so that you can store GIS data right with your Django models
  • You can do distance, intersection, touches, etc. queries right in the ORM

Nuts & Bolts

Presented by Brandon Rhodes
  • PyCon is 6 months away

  • String formatting
    • New in 2.6
    • str and unicode objects now have a format() method, and that's the encouraged way to do string formatting; % is right out.
    • At least one of the benefits of the new method is that it's easier to read the name-based format
    • It lets you use dot-notation to access attributes of parameters
    • Also supports indexing into lists, etc.
    • I need to look this up, pronto. It's covered briefly in the sequence types in the standard docs and links to a dedicated page for the new syntax.
  • lxml != ElementTree
    • The goal of ElementTree is to be a Pythonic XML library
    • It's probably a dead project, since the current version is 2 years old
    • lxml uses the "ElementTree" object model, but it's built on top of libxml2 and libxslt so it's blazing fast.
    • Remember these two imports:
      • from lxml import etree
      • from lxml.cssselect import CSSSelector as css
    • lxml also supports CSS selectors
    • Don't forget the default for lxml is an XML parser; you want the HTML parser
    • Ian Bicking says lxml is better than BeautifulSoup

Making Code More Testable

Presented by Brandon Rhodes
  • Based on Writing Testable Code; showing Python examples of how some of these actually look when you put them in action
Don't Mix Object Graph Construction with Application Logic
  • This one makes it really hard to test classes in isolation, because instantiating one starts instantiating a bunch of objects we don't want to test.
  • On a side note, nose handily shows you captured stdout when a test fails, and doesn't show you when the test is successful. That way you can you leave your debugging print statements. It does not capture stderr.
  • The simple fix for this in Python is to accept dependencies in the __init__ method. You might recognize this as constructor-based dependency injection. In Java, mutator-based dependency injection is more common.
Avoid "Global State"
  • Even though every programmer (should) know global state is evil, you still see this quite often in web apps where the global state is something like the request and/or session.
  • The breakage is quite often far more extensive than you would expect, e.g., a test in one file that doesn't handle global state well can break tests in other files, even if they are doing a good job of handling the global state.
  • The fix is to pass the state explicitly, even if the state has to be passed to a lot of functions. This is inconvenient at times, but is usually worth it.

2009-08-24

Project Euler: Problem 20

I finished up Problem 20 a while back. This problem was really straightforward, especially given my work on Problem 16.


import math


if __name__ == '__main__':
    print sum((int(digit) for digit in str(math.factorial(100))))

Back to flipping out...

2009-08-13

Project Euler: Problem 19

I recently solved Problem 19 from Project Euler. Since I don't really enjoy date math, I decided to brute force it and use the datetime module from the standard library.


from datetime import date


def next_first_of_month_in_20th():
    """Generator to list every first of the month during the 20th century."""
    first = date(1901, 1, 1)
    yield first
    while first.year < 2001:
        if first.month == 12:
            first = first.replace(year=first.year + 1)
            first = first.replace(month=1)
        else:
            first = first.replace(month=first.month + 1)
        yield first


def main():
    """
    Solve `Problem 19`_ from Project Euler.
    
    .. _`Problem 19`: http://projecteuler.net/index.php?section=problems&id=19
    """
    return len([first for first in next_first_of_month_in_20th() if first.weekday() == 6])


if __name__ == '__main__':
    print main()

Back to flipping out...

2009-08-07

Project Euler, Problems 18 and 67

I recently finished up Problem 18 (and shortly thereafter, Problem 67). This was a fairly interesting problem, because I like graph theory even if I'm bad at it, and I immediately thought graph theory when I read this problem.

I started creating Vertex and Edge classes, but quickly decided that was too heavyweight for such a small problem. Instead, I chose to use a list of lists where the left-hand child of a node is stored in the next row on the same column and the right-hand child is stored in the next row and the next column.

This made representing and using the data fairly straightforward, but I kept running into the problem of doing an exhaustive search. I knew this was suboptimal (and completely unsuited for the related Problem 67), but I was so locked into thinking of the triangle from the top-down I made things over-complicated.

Eventually, I decided I needed to reset my thinking and Googled for hints. Once I started thinking of the problem from the bottom up (since the graph, after all, is not directed), the solution became clear. Since I didn't need the actual path, I modified the existing data structure to minimize memory requirements.


def find_max_sum(triangle):
    """
    Find the maximum sum for a path from the top to the bottom of ``triangle``.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> find_max_sum(test)
    23
    """
    while len(triangle) > 1:
        _reduce_triangle(triangle)
    return triangle[0][0]


def _reduce_triangle(to_reduce):
    """
    Reduce ``to_reduce`` in place by rolling up the maximum path info one row.

    Don't return anything to emphasize the in-place nature of this function.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [7, 5], [10, 13, 15]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [20, 20]]
    >>> _reduce_triangle(test)
    >>> test
    [[23]]
    """
    last_row = to_reduce[-1]
    for index in xrange(len(to_reduce) - 1):
        to_reduce[-2][index] += max(last_row[index:index + 2])
    del to_reduce[-1]


def main():
    """
    Solve `Problem 18`_ of Project Euler.

    .. _Problem 18: http://projecteuler.net/index.php?section=problems&id=18
    """
    actual = [[75,],
             [95, 64],
             [17, 47, 82],
             [18, 35, 87, 10],
             [20, 4, 82, 47, 65],
             [19, 1, 23, 75, 3, 34],
             [88, 2, 77, 73, 7, 63, 67],
             [99, 65, 4, 28, 6, 16, 70, 92],
             [41, 41, 26, 56, 83, 40, 80, 70, 33],
             [41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
             [53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
             [70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
             [91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
             [63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
             [4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23]]
    print find_max_sum(actual)


if __name__ == '__main__':
    main()

Starting with this, the solution for Problem 67 is trivial: just write some code to parse the triangle out of a text file instead of coding it directly into the source code:


def find_max_sum(triangle):
    """
    Find the maximum sum for a path from the top to the bottom of ``triangle``.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> find_max_sum(test)
    23
    """
    while len(triangle) > 1:
        _reduce_triangle(triangle)
    return triangle[0][0]


def _reduce_triangle(to_reduce):
    """
    Reduce ``to_reduce`` in place by rolling up the maximum path info one row.

    Don't return anything to emphasize the in-place nature of this function.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [7, 5], [10, 13, 15]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [20, 20]]
    >>> _reduce_triangle(test)
    >>> test
    [[23]]
    """
    last_row = to_reduce[-1]
    for index in xrange(len(to_reduce) - 1):
        to_reduce[-2][index] += max(last_row[index:index + 2])
    del to_reduce[-1]


def _parse_triangle_from_file(data_file):
    """
    Parse out the triangle data from ``data_file``.

    >>> _parse_triangle_from_file('test_triangle.txt')
    [[3], [7, 5], [10, 13, 15]]
    """
    triangle = []
    with open(data_file, 'r') as triangle_file:
        for line in triangle_file:
            triangle.append([int(x) for x in line.split()])
    return triangle


def main():
    """
    Solve `Problem 67`_ of Project Euler.

    .. _Problem 67: http://projecteuler.net/index.php?section=problems&id=67
    """
    print find_max_sum(_parse_triangle_from_file('triangle.txt'))


if __name__ == '__main__':
    main()

2009-07-14

Sneak Attack: Cloud Computing Hierarchy of Needs

A new-and-improved hierarchy of needs for a our new-and-improved era.

Back to flipping out...

2009-07-08

Sneak Attack: Quick Retorts to Unreasonable Requests

A handy-dandy list of retorts for when the PM in your life is being unreasonable.

Back to flipping out...

2009-06-24

Sneak Attack: Maven Archetypes for Apache Tuscany

The Tuscany community has put together a wiki that, among other things, describes some Maven archetypes for getting started with Tuscany. Trust me, this is a real win.

Back to flipping out...

Sneak Attack: Django Circular Model References

Django Circular Model References highlights a neat trick to make Django models play nice with circular references.

Back to flipping out...

2009-06-15

Project Euler: Problem 11, Redux

I've already solved Problem 11, but I didn't really do a great job. As BlueRaja pointed out in a comment, I was doing twice as much work as I needed to. Since I was revisiting this code anyway, I decided to play around a little bit with zip and itertools. Here's the new and improved solution:


"""Solve Problem 11 from Project Euler."""
import itertools
import operator


OFFSET = 3
GRID = [
    [8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8],
    [49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0],
    [81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65],
    [52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91],
    [22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80],
    [24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50],
    [32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70],
    [67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21],
    [24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72],
    [21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95],
    [78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92],
    [16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57],
    [86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58],
    [19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40],
    [4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66],
    [88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69],
    [4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36],
    [20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16],
    [20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54],
    [1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48]
]
INVALID_COORDS_PRODUCT = 0


def _calc_down(row_index, col_index):
    """Calculate the product of the sequence going straight down from ``row_index``."""
    cols = itertools.repeat(col_index, OFFSET + 1)
    last_row = min(row_index + OFFSET + 1, len(GRID))
    coords = zip(xrange(row_index, last_row), cols)
    values = (GRID[row][col] for row, col in coords)
    return reduce(operator.mul, values)


def _calc_right(row_index, col_index):
    """Calculate the product of the sequence going straight right from ``col_index``."""
    rows = itertools.repeat(row_index, OFFSET + 1)
    last_col = min(col_index + OFFSET + 1, len(GRID[row_index]))
    coords = zip(rows, xrange(col_index, last_col))
    values = (GRID[row][col] for row, col in coords)
    return reduce(operator.mul, values)


def _calc_up_right(row_index, col_index):
    """Calculate the product of the sequence going up and to the right from (``row_index``, ``col_index``)."""
    last_row = max(row_index - OFFSET - 1, -1)
    last_col = min(col_index + OFFSET + 1, len(GRID[row_index]))
    coords = zip(xrange(row_index, last_row, -1), xrange(col_index, last_col))
    values = (GRID[row][col] for row, col in coords)
    return reduce(operator.mul, values)


def _calc_down_right(row_index, col_index):
    """Calculate the product of the sequence going down and to the right from (``row_index``, ``col_index``)."""
    last_row = min(row_index + OFFSET + 1, len(GRID))
    last_col = min(col_index + OFFSET + 1, len(GRID[row_index]))
    coords = zip(xrange(row_index, last_row), xrange(col_index, last_col))
    values = (GRID[row][col] for row, col in coords)
    return reduce(operator.mul, values)


def _get_max_product_for_coordinate(row, col):
    """Find the maximum product achievable from (``row``, ``col``)."""
    return max(_calc_up_right(row, col), _calc_right(row, col), _calc_down_right(row, col), _calc_down(row, col))


def problem_11():
    """Find the largest product of four adjacent elements."""
    products = set()
    for row in range(len(GRID)):
        for col in range(len(GRID[row])):
            products.add(_get_max_product_for_coordinate(row, col))
    return max(products)


if __name__ == '__main__':
    print problem_11()

Back to flipping out...

2009-06-13

Sneak Attack: Rosetta Code

Rosetta Code is a nifty site that has a set of common tasks implemented in many different languages.

Back to flipping out...

2009-06-06

Review of Expert Python Programming, Part Four

Chapter 5: Writing a Package

A Common Pattern for All Packages

This section provides all the details necessary to create a namespaced package that consists of multiple eggs glued together by a master egg using distutils and setuptools. There are multiple subsections that cover everything you should need to build your project, register it with the Cheeseshop (or any other package index), and upload it for the world to enjoy.

How to Uninstall a Package

This section provides a short rationalization for why there isn't a built-in uninstall command and the current workaround. It's not mentioned in the book, but the author is currently working on improving distutils, and an uninstall command is on the agenda. The full details are in the Adding an Uninstall Function section of PEP 376.

The Template-Based Approach

After covering the basics of building and distributing a package in the previous section, this section focuses on using templates to reduce the tedium in creating consistent application skeletons. The tool used in this section is Python Paste. There are a couple of short examples using pre-existing templates.

Creating the Package Template

This section covers the process of creating a custom template that describes the structure introduce in the first section.

The rest of the chapter just covers generic aspects of the development cycle, such as version numbering, so I will skip it for this review.

Back to flipping out...

Review of Expert Python Programming, Part Three

Chapter 4: Choosing Good Names

This chapter is aimed at helping improve API design, mostly by helping improve the names you choose when building the API. Ironically enough, not all of the advice is about naming, so the chapter title could be better. Most of the advice is good, although some of it overlaps with PEP 8 and some of it is far more general than just Python. That doesn't dilute the value of the advice, but I will skip it in this review.

Best Practices for Arguments
Build Arguments by Iterative Design

This advice is applicable to pretty much any language, but it is surprising how many people get it wrong. Also, if you're new to languages that allow you to specify default values for parameters, this section highlights one of the best uses for that feature.

Trust the Arguments and Your Tests

More solid advice: don't try to recreate a static typing system in a dynamically typed language. This isn't exactly profound insight, but a lot of people (myself included) make this mistake when they transition to dynamic typing from static typing. Instead of just saying "the tests will catch that", which some interpret as "write unit tests and/or assertions to enforce static-style typing", the author is careful to explain that the tests are supposed to test actual use cases. The author also mentions Design-by-Contract. While he doesn't seem to favor it, he does provide a link to Contracts for Python. It should be noted that the related PEP 316 has been deferred, so some people may find the resulting code "unpythonic".

Use *args and **kw Magic Arguments Carefully

The author acknowledges that use of *args and **kw (also seen as **kwargs) is sometimes necessary, e.g., metaprogramming, but in general considers them a design smell. He also offers advice for improving the design, e.g., accepting a single iterable parameter instead of *args.

Module and Package Names

Mostly mundane and duplicative of PEP 8, but there are a couple of gems: the convention of using a lib suffix in a module or package name if it is implementing a protocol, e.g., smtplib; using __init__ to import some APIs into the top level of the package, including a caveat about the increased potential of circular dependencies.

Working on APIs
Tracking Verbosity

Even though this is the sort of generic advice I said I was going to skip, I liked it so much I decided to include it anyway. If there is a common use case for some sequence of calls into your API, you should expose a function that encapsulates it. Doing anything else just invites errors.

Building the Namespace Tree

I really liked this section. In just a couple of short pages, the author shows an example of evolving the namespace structure for an application. I don't have enough experience in the Python world to know if this is a realistic example, but I thought it was presented well and made good sense. I also like that the book addresses this topic at all. The Zen of Python says "Namespaces are one honking great idea -- let's do more of those!" but I don't see many mentions of resources covering how to design them well.

Using Eggs

This sections provides a concise explanation of what eggs do and a sneak preview at how to define them. I would have liked more info, but the inset promises more details in Chapter 6.

Using a Deprecation Process

The advice in this section is also generic (don't break an already-published API), but I mentioned it because it shows the Pythonic method for deprecating old APIs: DeprecationWarning.

Useful Tools

This section only lists two tools: Pylint and CloneDigger. They are useful, but some tips for using them to greatest effect would have been nice.

Back to flipping out...

2009-06-05

Bravos, you're nothing to me now.

Quick! Who is this?

A snake in the grass

That's right, it's the Atlanta Braves. Not satisfied with their classless treatment of John Smoltz, now the Braves have treated another future Hall-of-Famer shabbily by misleading Tom Glavine about his chances to make the big league club then releasing him to avoid paying him his roster bonus.

We now return you to your regularly (or not) scheduled programming blog updates.

2009-06-01

Sneak Attack: Introduction to Python Profiling (PyCon talk)

This PyCon talk on profiling is really good stuff. Also, it's nice that he points to KCachegrind and RunSnakeRun for better visualizations than the default from cProfile.

Back to flipping out...

Five Things I Hate (or at least dislike) About Python

1. Implicit Variable Creation

This is probably my biggest complaint. Why would you want to implicitly create a variable the first time something is assigned to it? This might make sense in a language where variables are immutable by default (after all, you only ever assign a variable once), but Python isn't. Also, I realize there is a technical difference between rebinding names and changing the value of a variable; I don't find that particular distinction useful here.

2. Dearth of Collections (in the standard library)

Don't get me wrong: defaultdict and namedtuple are nice, but on occasion I really find myself wishing for some more advanced data structures, e.g., Red-Black tree. I'm not even talking about probabilistic structures like Bloom filters or skip lists.

3. Lack of Tail-call Elimination

I know it likely won't happen, but I still wish I had it. To me (and I'm sure many others), recursive algorithms are the most natural way to express certain algorithms, e.g., traversing a tree. I can do it using a loop, but it really drops me out of the zone.

4. Concurrency in the Standard Library

In an ideal world, Python would support concurrency on a level with first-class functions, similar to Erlang. It's almost not even fair to ding Python on this, since pretty much every other language its age has the same problem, but a man can dream, right? At least the multiprocessing module made it into the standard library.

5. Interfaces

It would be really swell if Python had support for something like interfaces. I know that PEP 3119 introduced Abstract Base Classes, so this one is probably on the way to being remedied, but the feature is so new I haven't yet encountered it in the wild.

Back to flipping out...

Five Things I Hate About Java

I'm sure you've all seen the Five Things meme. Here's the five things I hate about Java.

1. Lack of Type Inferencing

There's just enough generics-related type inferencing to leave me wanting more.

2. No First-class Functions

I don't care if there are patterns for doing this, it's not very idiomatic, and the boilerplate obscures the intent of the code, which largely defeats the purpose.

3. No Read-Eval-Print Loop (REPL)

I can't praise the REPL enough. Java should have had one of these in the JDK in 1.0.

4. Checked Exceptions

The standard library is riddled with checked exceptions that really should have been unchecked, and third-party libraries have followed suit. In fact, Exception should have been unchecked and we should have CheckedException as a subclass instead of RuntimeException.

5. Array Covariance

It's broken. Especially now that Java has covariant returns (thanks Java 5!), this bothers me just often enough that I've forgotten how broken it is right before I need/want it.

Honorable Mention: No Tail-call Elimination

This very nearly bumped Array Covariance. I really wish tail-call elimination would catch on in mainstream languages.

Back to flipping out...

2009-05-30

Follow-up to Grails, The Acegi Plugin, and HTTP Basic

In Grails, The Acegi Plugin, and HTTP Basic I blogged about some unexpected behavior in the Grails Acegi plugin (now Spring Security plugin). Good news: the behavior has been changed on trunk and is scheduled for version 0.6. Thanks for letting me know, Burt.

Back to flipping out...

2009-05-29

Sneak Attack: Installing cx_Oracle on an Intel Mac

Install instructions that actually work for cx_Oracle on an Intel Mac. It requires MacPorts and doesn't work with the latest version (5.0.2) of cx_Oracle, so make sure you actually use 4.4.1.

Back to flipping out...

2009-05-28

Note to Self: Rebasing from One Branch to Another in git


git rebase --onto $NEW_BASE $OLD_BASE $BRANCH_BEING_REBASED

Back to flipping out...

2009-05-23

Review of Expert Python Programming, Part Two

Chapter 3
Subclassing Built-in Types
  • I started learning Python after this was added, so it never occurred to me not to do this.
Accessing Methods from Superclasses
  • Tries to explain super, but it's quite confusing (mostly due to the multiple-inheritance problems).
  • The standard docs do a better job of making it clear the main benefit is making maintenance easier in single-inheritance examples.
Understanding Python's Method Resolution Order (MRO)
The section isn't as clear as it could be, but it has solid information. It explains what the MRO is used for and how it's different between 2.2 and 2.3, and the __mro__ attribute
super Pitfalls
Points out some of the most common problems with super: mixing super and classic calls (and how to use __mro__ to choose what to do) and subclass constructors that take arguments that differ from their parent classes.
Best Practices
  • Solid, short (so you can remember it) section.
Descriptors and Properties
Descriptors
This section isn't overly clear. It describes what descriptors are from a technical perspective, but doesn't do a great job of explaining why you'd want to use them. Also, a fair number of the examples have errors, e.g., code for setting values when the text says it is for reading values. Luckily, it contains a link to the (more helpful) How-To Guide for Descriptors.
Properties
This section explains property and the property attributes it returns (though not necessarily why they're so useful) and does a good job of pointing out some gotchas, e.g., the way they don't pick up overridden methods. The solution offered is perfectly sensible: override the property instead of the (typically private) method bound to fget.
Slots
  • Short but informative section on slots, which I haven't seen mentioned before.
Meta-Programming
The __new__ Method
This section covers __new__ and its usefulness for making sure that a class's invariants aren't violated because a subclass didn't explicitly invoke __init__.
The __metaclass__ method
This section covers customizing class creation using __metaclass__ and points out that, in most cases, there are easier-to-understand alternatives. One example of when there isn't is adjusting read-only attributes, e.g., the __doc__ attribute of the built-in metaclass type. Other suggested usages for __metaclass__ include frameworks enforcing behavior across large groups of classes and orthogonal functionality such as logging. The section closes with a link to A Primer on Python Metaclass Programming.
Summary

The summary section is a short, bulleted list highlighting the most important points made in the chapter. Again, it's short enough to be easily memorable.

Back to flipping out...

2009-05-21

Review of Expert Python Programming, Part One

As I mentioned, I got started with the PyAtl Book Club at the May meeting. The first book I received was Expert Python Programming, by Tarek Ziadé. Part of the deal with the book club is posting a review of the book, so I will be posting about the book as I work my way through it.

Chapter 1

This is basic setup info (though the vim section has a nice selection of config settings for those new to it), so I won't really cover it; I already have multiple versions of Python running on my dev box.

Chapter 2

This chapter is entitled "Syntax Best Practices—Below the Class Level", and that's a pretty accurate description. There's a short section on List Comprehensions followed by a longer section on Iterators and Generators. This is a fast but detailed look at the subject, including the use of send and coroutines (via the mechanisms defined in PEP 342). The treatment is rather brief, so I recommend David Beazley's A Curious Course on Coroutines and Concurrency for more detail. Up next are genexps (defined in PEP 289); these were added in Python 2.4. This section is short and to the point: use genexps wherever you would use a list comprehension unless you need some feature of lists that a generator lacks. The next section is a brief glimpse at itertools. It is hardly complete, but it does highlight one of the (IMO) most interesting pieces of the module: tee. This function makes it practical to use genexps even when you need to make multiple passes across the data. Up next is the section on decorators (defined in PEP 318), which were introduced with Python 2.4. The author addresses decorators in much more detail than the previous topics, with extended examples of using them for various things such as caching, proxies, and context providers. That last provides a nice segue into the next session, where he shows us how to replace context provider decorators by using the with statement (defined in PEP 343) introduced in Python 2.5. The author does a pretty thorough job of explaining what with is doing and how you can use it to good effect, including how to use the contextlib module (introduced at the same time as the with statement) to use context managers with code that doesn't provide them out of the box.

Thoughts

Although it was a bit whirlwind at times, the author does a good job of covering the modern language constructs that Python has picked up in the last few versions. Although you may want to read a more detailed tutorial on a given feature, this chapter does a good job of getting you up to speed with modern Python.

Back to flipping out...

Sneak Attack: Workmen, tools, etc.

Nice analysis of It's a poor workman that blames his tools.

Back to flipping out...

2009-05-20

Note to Self: Oracle's TIMESTAMP literal syntax

TIMESTAMP'2009-01-15 00:00:00.000'

Back to flipping out...

Sneak Attack: AMQP and Python

A nice overview of Advanced Message Queuing Protocol (AMQP) concepts, with Python examples using py-amqplib .

Back to flipping out...

2009-05-19

Sneak Attack: The decorator module

The decorator module is spiffy stuff.

Back to flipping out...

2009-05-18

Note to Self: Clearing Oracle's Buffer Cache

The next time you're doing some rough benchmarking and you want to prevent Oracle's cache from skewing your subsequent runs:

alter system flush buffer_cache;

Be forewarned: sometimes it takes a while to run this one, and you need system-level privileges.

Back to flipping out...

2009-05-14

PyAtl Notes - 2009-05-14

What follows is essentially a stream-of-consciousness dump from the PyAtl meeting tonight.


Python Atlanta Meetup 2009-05-14

Miscellaneous

They're recruiting people to do A/V work during PyCon here in Atlanta next year. Does this mean I get in to the con for free?

Hot Topics
  1. Community - you can get a lot of benefit from small-scale con-style stuff; look for the upcoming article about PyOhio in Python Magazine
  2. Distutils - Interesting things on the distutils mailing list
  3. IronPython - lots of buzz, but it's not relevant to me right now
Testing with Nose

Alfredo Deza presenting.

Talking about testing Supay (a daemon module; named after an Incan demon god). Why another daemon module?

  • PEP 3143
  • It's really a service, not just a daemon
  • At least it's not a new web framework
  • Start, Stop, Status
  • Spawn children

Originally testing was done in Pythoscope. Easy to get started. Moved on to nose. Had problems because things under test were backgrounding, etc. People on the testing in Python mailing list advised him to test smaller chunks. coverage does testing coverage reporting for nose. coverage gives you really specific (package-specific, etc.) statistics.

The cool thing about nose is that it autodetects your tests.

One early gotcha: You have to invoke nose using nosetests. ed. I might as well alias `nosetests` to `nosetests -v --cover-package=$package_name`. Simply add --with-coverage to get the coverage report.

Cool, Pythoscope generates test stubs for you. I need me some of this.

Intermission

There was some A/V stuff going on here, so they took this opportunity to hand out books for review. I scored Expert Python Programming.

Choosing a Testing Framework

Brandon Rhodes presenting.

Intro

At first, he thought testing would help most by helping to get it right at first. Once he started testing, he discovered that it helped most with finding regressions. Much like undo, now he can't imagine getting by without it. One thing to note: installed Python packages rarely ship with tests, so if you want to muck around in them, you typically need to download the tarball yourself.

What does Python testing look like without a framework?

simplejson

Uses setuptools, so you can just use the built-in test command. You can use an additional_tests function to build a test suite for setuptools. setuptools will not autodetect doctest tests. All in all, this is a pretty well-behaved module.

Sidenote: virtualenv - creates local, tiny Python install that you can use to avoid system-wide changes. I've seen it mentioned before, but this is the first time I've seen it in action; this could be seriously cool.

SQLAlchemy

Has a README.unittests (referenced from README) that tells you exactly how to do things. There is a lot of work to manually keep track of the tests in modules; obviously, this is begging for someone to forget to add a test to the suite.

Grok

The tests here are pretty idiosyncratic (each test defines a complete web app and puts it through its paces). It uses part of zc.testrunner.
Three testing frameworks

Python testing frameworks:

Questions you should ask:

  • How are tests discovered?
  • How can tests be filtered and selected?
  • How fancy are tests when reported?

One benefit of testing frameworks is that it makes the testing in your project look like the testing in other projects that use the same testing framework.

zc.testrunner

Not really covered beyond what we saw with Grok.

py.test

Finds tests based on a naming convention; this is not configurable. Tests are just functions; you don't need a class w/ test methods. You can tag tests any way you want. By default only outputs print statements if it fails. Has distributed tests and multiplatform, but the documentation is a bit sparse. Uses a module to turn on autodetection for doctest tests.

nose

Finds tests based on a naming convention; this is configurable. Tests are just functions; you don't need a class w/ test methods. You can tag tests any way you want. Has more extensive documentation and seems to have momentum.

It is probably feasible to write your tests so both py.test and nose can find and run them.

Selenium Demo (lightning talk)

Brandon Rhodes presenting.

Not a lot new to me; already used Selenium at work. The Python client seems to be doing a pretty good job of being Pythonic; the tests use unittest, etc. The basic commands seem pretty similar to the Java client I've used before.

Google Code and Hg (lightning talk)

Alfredo Deza presenting.

Brief compare/contrast between hg and svn. I'm already familiar with both so I won't recap this part of the presentation.

Traditionally, svn has been the standard for Google Code. They're giving invites to anyone attending the Google IO Conference (next week?). They're also accepting applications here; this is for old projects and they will move it over to keep the history. Not all features are available on the Mercurial version yet, e.g., code browsing.

Coolest feature: hg serve. It sets up a web interface to your entire repo on localhost:8000. It's pretty sophisticated, and allows you to push and pull from it.

And that concluded the evening.


Back to flipping out…

2009-04-27

Note to Self: Parsing the Accepts header

Here's an easy-to-follow snippet for the next time you need to parse an Accepts header.

Back to flipping out...

2009-04-13

Project Euler: Problem 17

I recently finished up Problem 17, and since the problem was fairly straightforward, I used it as an opportunity to explore some of Python's language features that I've been meaning to look into.


# problem17.py
"""
Find the solution to `Problem 17`_ at `Project Euler`_.

.. _Problem 17: http://projecteuler.net/index.php?section=problems&id=17
.. _Project Euler: http://projecteuler.net/
"""
__docformat__ = "restructuredtext en"

import re

_NUMBER_NAMES = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
                11: "eleven", 12: "twelve", 13: "thirteen", 14: "fourteen",
                15: "fifteen", 16: "sixteen", 17: "seventeen",
                18: "eighteen", 19: "nineteen", 20: "twenty",
                21: "twenty-one", 22: "twenty-two", 23: "twenty-three",
                24: "twenty-four", 25: "twenty-five", 26: "twenty-six",
                27: "twenty-seven", 28: "twenty-eight", 29: "twenty-nine",
                30: "thirty", 31: "thirty-one", 32: "thirty-two",
                33: "thirty-three", 34: "thirty-four", 35: "thirty-five",
                36: "thirty-six", 37: "thirty-seven", 38: "thirty-eight",
                39: "thirty-nine", 40: "forty", 41: "forty-one",
                42: "forty-two", 43: "forty-three", 44: "forty-four",
                45: "forty-five", 46: "forty-six", 47: "forty-seven",
                48: "forty-eight", 49: "forty-nine", 50: "fifty",
                51: "fifty-one", 52: "fifty-two", 53: "fifty-three",
                54: "fifty-four", 55: "fifty-five", 56: "fifty-six",
                57: "fifty-seven", 58: "fifty-eight", 59: "fifty-nine",
                60: "sixty", 61: "sixty-one", 62: "sixty-two",
                63: "sixty-three", 64: "sixty-four", 65: "sixty-five",
                66: "sixty-six", 67: "sixty-seven", 68: "sixty-eight",
                69: "sixty-nine", 70: "seventy", 71: "seventy-one",
                72: "seventy-two", 73: "seventy-three", 74: "seventy-four",
                75: "seventy-five", 76: "seventy-six", 77: "seventy-seven",
                78: "seventy-eight", 79: "seventy-nine", 80: "eighty",
                81: "eighty-one", 82: "eighty-two", 83: "eighty-three",
                84: "eighty-four", 85: "eighty-five", 86: "eighty-six",
                87: "eighty-seven", 88: "eighty-eight", 89: "eighty-nine",
                90: "ninety", 91: "ninety-one", 92: "ninety-two",
                93: "ninety-three", 94: "ninety-four", 95: "ninety-five",
                96: "ninety-six", 97: "ninety-seven", 98: "ninety-eight",
                99: "ninety-nine"}

_CHARACTERS_WE_CARE_ABOUT = re.compile("\w")

def _words_from_num(num):
    """
    Convert ``num`` to its (British) English phrase equivalent.

    If ``num`` is greater than 9,999 then raise an ``Exception``.
    >>> _words_from_num(115)
    'one hundred and fifteen'
    """
    if num >= 10000:
        raise Exception, 'This function only supports numbers less than 10000.'

    parts_list = []
    if num >= 1000:
        thousands = num // 1000
        parts_list.append(_NUMBER_NAMES[thousands])
        parts_list.append(" thousand")
        num -= thousands * 1000
    if num >= 100:
        hundreds = num // 100
        parts_list.append(_NUMBER_NAMES[hundreds])
        parts_list.append(" hundred")
        num -= hundreds * 100
    if num:
        if parts_list:
            parts_list.append(" and")
        parts_list.extend([" ", _NUMBER_NAMES[num]])

    return "".join(parts_list)

def _count_characters_we_care_about(string_to_count):
    """
    Count the characters in ``string_to_count``, excluding things like hyphens and spaces.

    >>> _count_characters_we_care_about("one hundred and twenty-three")
    24
    """
    return len(_CHARACTERS_WE_CARE_ABOUT.findall(string_to_count))

def problem_17(upper_bound = 1000):
    """
    Find the solution to `Problem 17`_ at `Project Euler`_.

    .. _Problem 17: http://projecteuler.net/index.php?section=problems&id=17
    .. _Project Euler: http://projecteuler.net/

    >>> problem_17(2)
    6
    """
    converted_nums = (_words_from_num(num) for num in xrange(1, upper_bound + 1))
    lengths = (_count_characters_we_care_about(phrase) for phrase in converted_nums)
    return sum(lengths)

if __name__ == '__main__':
    print problem_17()

Generator Expressions

The quick version: They're like list comprehensions, only lazy. The longer version resides in PEP 289. The verdict: They rock. It didn't make a big difference in this problem, but in general, it's nice that you don't have to allocate enough memory to contain an entire list. One problem with them is that once you've consumed an element of the generator expression, you can't get it back, so they aren't well-suited for problems where you need to iterate across the data more than once.

Regexes

Regexes are hardly unique to Python, but this is the first time I've ever used them in Python. One feature I was excited to try—and that Python pioneered—is named capturing groups, but the regex I used in this example didn't need them.

Docstrings

I've written docstrings before, but this is the first time I've tried to generate stand-alone documentation from them. To do the generation, I used epydoc. The original PEP 257 defines docstrings in terms of plaintext, but PEP 287 establishes reStructuredText as a richer alternative. Since I was generating documentation from the docstrings, I decided to go with reStructuredText. To avoid typing --docformat restructuredtext every time I invoked epydoc again:

__docformat__ = "restructuredtext en"

While it took a while to get used to it (I've gotten pretty set in my Markdown ways), I really like it so far. In fact, I liked it enough to write this entire blog post in it and simply post the generated HTML into Blogger.

Back to flipping out…

2009-04-03

Note to Self: Statistical Analysis Inside Oracle

If you need to do statistical analysis on data in your Oracle database, DBMS_STAT_FUNCS is your friend, especially the SUMMARY function.

Back to flipping out...

Sneak Attack: namedtuple

Python 2.6 introduced namedtuple, and I have to say: It rocks.

Back to flipping out...

2009-03-30

Project Euler: Problem 16

I've just finished solving Problem 16. It was a fairly straightforward problem, and at first I couldn't decide how I wanted to solve it, so I decided to do it both ways and compare them. Here's the script w/ both versions of the solution:


import math

def pure_numeric_sum_of_digits(num):
    """
    Finds the sum of the digits in num (presumed to be base-10).

    This version of the method operates purely on numbers.

    >>> pure_numeric_sum_of_digits(2 ** 15)
    26
    """
    running_total = 0
    for n in xrange(int(math.log10(num)) + 1):
        running_total += num % 10
        num //= 10
    return running_total

def string_based_sum_of_digits(num):
    """
    Finds the sum of the digits in num (presumed to be base-10).

    This version of the method works by using a string representation
    of num and converting individual characters to numbers so we can
    operate on it as a sequence.

    >>> string_based_sum_of_digits(2 ** 15)
    26
    """
    sum([int(digit) for digit in str(num)])

def problem_16():
    """Solves Problem 16 at Project Euler."""
    return string_based_sum_of_digits(2 ** 1000)

if __name__ == '__main__':
    print problem_16()

Clearly, the string-based solution is more compact, but it has to be slower, right? Strings are slower than ints and all that. Thanks to the optimization efforts of the Python implementers, the string-based solution is actually ~167% faster on my machine (I used cProfile on a 10000-iteration loop to make these measurements).

Back to flipping out...

2009-03-26

Notes on acts-as-taggable-on

If you're using acts-as-taggable-on, you may have noticed the documentation is a bit… sparse. The following are some of the more helpful resources I've found.

  • The README from Acts as Taggable on Steroids (the inspriation for acts-as-taggable-on)
  • crunchytoast.com has a nice documentation effort

Finally, I ran into errors with trying to build a tag cloud when there were no tags to count. My solution was to embed


    @tags = Post.tag_counts || []

(or similar) into whichever method on the controller was responsible for the page to contain the tag cloud so that the default is an empty list instead of nil.

Back to flipping out...

2009-03-20

Mental Note: In Rails 2.3, application.rb -> application_controller.rb

Rails 2.3.2 apparently forgot to rename app/controllers/application.rb to app/controllers/application_controller.rb, so Getting Started with Rails fails with the following error:


uninitialized constant ApplicationController

A simple mv should fix things right up.

Back to flipping out...

2009-03-18

Sneak Attack: Reversion

Reversion has to be one of the more clever applications of object serialization I've come across lately.

Back to flipping out...

2009-03-04

Project Euler: Problem 15

This one took me a while, largely because I didn't understand what they meant by backtracking. I was thinking of backtracking to a particular point, so I kept trying to use acyclic graphs, sometimes called forests. That's not really what they meant, though. In the context of Problem 15, no backtracking just means you can't go up and you can't go left. After working a 3x3 version of the problem, I thought the values looked a little familiar; I was right. It's a counting problem we covered in combinatorics oh so many years ago: Permutations with Repeated Elements. There are 40 steps to any given path (we have to go 20 steps down and 20 steps right). Since the definition of backtracking makes any given move down indistinguishable from any other move down, and any given move right indistinguishable from any other move right, the answer is 40!/(20!20!).

Back to flipping out...

2009-02-27

Improving Browser Support for HTTP Authentication

One of the classic problems with HTTP Authentication has been the way browsers implement it: there's no easy way to "log out" of a site. Once you've provided authentication tokens for a site, the browser keeps sending it. This decision has put a sharp dent in the adoption of standards-based authentication schemes. It doesn't have to be this way. Bug 300002 is an enhancement request for Firefox that would add a UI element for telling the browser to stop. So far, this request has languished in undeserved obscurity; let's put an end to that. Please vote up Bug 300002. If Firefox were to remedy this usability problem with HTTP authentication, maybe other browser makers would follow suit, and standards-based authentication would once again be seen as a viable option.

Back to flipping out...

2009-02-22

Project Euler: Problem 14

Problem 14 was a fairly straightforward problem to solve, but I liked it a lot. One reason I liked it is that I enjoy playing with mathematical sequences, but also because it marked my first foray into TDD in Python. I'd heard about doctest, and I really like the concept, so I decided to use it when solving this problem. Here's the code I used to find the solution:


"""Solves Problem 14 from Project Euler."""

def _gen_sequence(seed, cur_sequence = None):
    """Generates a sequence following the rules from Problem 14.
    
    >>> _gen_sequence(13)
    [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]"""
    if not cur_sequence:
        cur_sequence = []
    cur_sequence.append(seed)
    if seed == 1:
        return cur_sequence
    elif seed % 2:
        return _gen_sequence(3 * seed + 1, cur_sequence)
    else:
        return _gen_sequence(seed / 2, cur_sequence)

def problem_14(upper_bound):
    """Finds the seed (< upper_bound and > 500000) for the longest sequence.

    >>> problem_14(502873)
    502137
    """
    cur_answer = 0, 0  # Track the length too, just out of curiosity.
    for seed in reversed(xrange(500001, upper_bound, 2)):
        cur_length = len(_gen_sequence(seed))
        if cur_length > cur_answer[1]:
            cur_answer = seed, cur_length
    return cur_answer[0]

if __name__ == '__main__':
    print problem_14(1000000)

You can execute these tests (assuming Python 2.6) from the command-line with python -m doctest -v problem14.py. The output will resemble:


Trying:
    _gen_sequence(13)
Expecting:
    [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
ok
Trying:
    problem_14(502873)
Expecting:
    502137
ok
1 items had no tests:
    problem14
2 items passed all tests:
   1 tests in problem14._gen_sequence
   1 tests in problem14.problem_14
2 tests in 3 items.
2 passed and 0 failed.
Test passed.

Not only does doctest make it easy to do TDD, it reminds me somewhat of literate programming. It's obviously not the same, but I think it encourages the sort of documentation that Knuth would approve. I know it definitely made me think more about documenting these simple functions.

Back to flipping out...

2009-02-21

JavaScript Idioms Revisited

In an earlier post about JavaScript idioms, I talked about a common idiom for copying an array:


var myArray = [];
for (var i = 0; i < source.length; i++) {
    myArray[i] = source[i];
}

In a comment, Jesse offers the following alternative:


var args = Array.prototype.slice.apply(arguments);

To my eye, this is clearly preferable, and I hope more people use it in the future. Thanks for pointing it out, Jesse.

Also: here's a good list of JavaScript idioms.

Back to flipping out...

2009-02-18

Project Euler: Problem 13

Problem 13 was very straightforward to brute-force.


"""Solves Problem 13 of Project Euler."""

NUMBERS = [
                37107287533902102798797998220837590246510135740250,
                46376937677490009712648124896970078050417018260538,
                74324986199524741059474233309513058123726617309629,
                91942213363574161572522430563301811072406154908250,
                23067588207539346171171980310421047513778063246676,
                89261670696623633820136378418383684178734361726757,
                28112879812849979408065481931592621691275889832738,
                44274228917432520321923589422876796487670272189318,
                47451445736001306439091167216856844588711603153276,
                70386486105843025439939619828917593665686757934951,
                62176457141856560629502157223196586755079324193331,
                64906352462741904929101432445813822663347944758178,
                92575867718337217661963751590579239728245598838407,
                58203565325359399008402633568948830189458628227828,
                80181199384826282014278194139940567587151170094390,
                35398664372827112653829987240784473053190104293586,
                86515506006295864861532075273371959191420517255829,
                71693888707715466499115593487603532921714970056938,
                54370070576826684624621495650076471787294438377604,
                53282654108756828443191190634694037855217779295145,
                36123272525000296071075082563815656710885258350721,
                45876576172410976447339110607218265236877223636045,
                17423706905851860660448207621209813287860733969412,
                81142660418086830619328460811191061556940512689692,
                51934325451728388641918047049293215058642563049483,
                62467221648435076201727918039944693004732956340691,
                15732444386908125794514089057706229429197107928209,
                55037687525678773091862540744969844508330393682126,
                18336384825330154686196124348767681297534375946515,
                80386287592878490201521685554828717201219257766954,
                78182833757993103614740356856449095527097864797581,
                16726320100436897842553539920931837441497806860984,
                48403098129077791799088218795327364475675590848030,
                87086987551392711854517078544161852424320693150332,
                59959406895756536782107074926966537676326235447210,
                69793950679652694742597709739166693763042633987085,
                41052684708299085211399427365734116182760315001271,
                65378607361501080857009149939512557028198746004375,
                35829035317434717326932123578154982629742552737307,
                94953759765105305946966067683156574377167401875275,
                88902802571733229619176668713819931811048770190271,
                25267680276078003013678680992525463401061632866526,
                36270218540497705585629946580636237993140746255962,
                24074486908231174977792365466257246923322810917141,
                91430288197103288597806669760892938638285025333403,
                34413065578016127815921815005561868836468420090470,
                23053081172816430487623791969842487255036638784583,
                11487696932154902810424020138335124462181441773470,
                63783299490636259666498587618221225225512486764533,
                67720186971698544312419572409913959008952310058822,
                95548255300263520781532296796249481641953868218774,
                76085327132285723110424803456124867697064507995236,
                37774242535411291684276865538926205024910326572967,
                23701913275725675285653248258265463092207058596522,
                29798860272258331913126375147341994889534765745501,
                18495701454879288984856827726077713721403798879715,
                38298203783031473527721580348144513491373226651381,
                34829543829199918180278916522431027392251122869539,
                40957953066405232632538044100059654939159879593635,
                29746152185502371307642255121183693803580388584903,
                41698116222072977186158236678424689157993532961922,
                62467957194401269043877107275048102390895523597457,
                23189706772547915061505504953922979530901129967519,
                86188088225875314529584099251203829009407770775672,
                11306739708304724483816533873502340845647058077308,
                82959174767140363198008187129011875491310547126581,
                97623331044818386269515456334926366572897563400500,
                42846280183517070527831839425882145521227251250327,
                55121603546981200581762165212827652751691296897789,
                32238195734329339946437501907836945765883352399886,
                75506164965184775180738168837861091527357929701337,
                62177842752192623401942399639168044983993173312731,
                32924185707147349566916674687634660915035914677504,
                99518671430235219628894890102423325116913619626622,
                73267460800591547471830798392868535206946944540724,
                76841822524674417161514036427982273348055556214818,
                97142617910342598647204516893989422179826088076852,
                87783646182799346313767754307809363333018982642090,
                10848802521674670883215120185883543223812876952786,
                71329612474782464538636993009049310363619763878039,
                62184073572399794223406235393808339651327408011116,
                66627891981488087797941876876144230030984490851411,
                60661826293682836764744779239180335110989069790714,
                85786944089552990653640447425576083659976645795096,
                66024396409905389607120198219976047599490197230297,
                64913982680032973156037120041377903785566085089252,
                16730939319872750275468906903707539413042652315011,
                94809377245048795150954100921645863754710598436791,
                78639167021187492431995700641917969777599028300699,
                15368713711936614952811305876380278410754449733078,
                40789923115535562561142322423255033685442488917353,
                44889911501440648020369068063960672322193204149535,
                41503128880339536053299340368006977710650566631954,
                81234880673210146739058568557934581403627822703280,
                82616570773948327592232845941706525094512325230608,
                22918802058777319719839450180888072429661980811197,
                77158542502016545090413245809786882778948721859617,
                72107838435069186155435662884062257473692284509516,
                20849603980134001723930671666823555245252804609722,
                53503534226472524250874054075591789781264330331690
]

def problem_13(nums_to_sum):
    """Find the first 10 digits of the sum of nums_to_sum."""
    return int(str(sum(nums_to_sum))[:10])

if __name__ == '__main__':
    print problem_13(NUMBERS)

Back to flipping out...

2009-02-17

Project Euler: Problem 12

Problem 12 brings back an interesting mathematical concept: triangular numbers. Triangular numbers (and the formula for finding one) are the slightly more sophisticated approach I mentioned in my writeup for Problem 6. Writing a generator for triangular numbers is easy enough, as is writing some logic to factorize (not prime factorize) a number. Given those two, the solution is easy enough.


"""Solves Problem 12 of Project Euler."""

import math

def factors(to_factor):
    """Find the factors of to_factor."""
    factors = []
    divisor = 1
    while (divisor <= int(math.sqrt(to_factor))):
        if not to_factor % divisor:
            quotient = to_factor / divisor
            factors.append(divisor)
            factors.append(quotient)
        divisor += 1

    return factors

def triangular_numbers():
    """Generate the triangular numbers."""
    current = 0
    position = 1
    while True:
        current += position
        position += 1
        yield current

def problem_12(min_divisors):
    """Finds the first triangular number to have more than 500 divisors."""
    for triangular in triangular_numbers():
        cur_factors = factors(triangular)
        if len(cur_factors) > min_divisors:
            return triangular

if __name__ == '__main__':
    print problem_12(500)

Back to flipping out...

2009-02-16

Project Euler: Problem 11

Problem 11 is fairly straightforward to brute-force, although my solution uses more code than is strictly necessary. Since one of the things I enjoy most about Python is its support for functional programming concepts, I went with a reduce-based solution.


"""Solves Problem 11 from Project Euler."""

import operator

OFFSET = 3
GRID = [
    [8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8],
    [49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0],
    [81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65],
    [52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91],
    [22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80],
    [24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50],
    [32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70],
    [67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21],
    [24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72],
    [21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95],
    [78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92],
    [16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57],
    [86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58],
    [19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40],
    [4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66],
    [88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69],
    [4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36],
    [20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16],
    [20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54],
    [1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48]
]
INVALID_COORDS_TUPLE = 0,

def _build_up(row_index, col_index):
    """Build the sequence going straight up from row_index."""
    if row_index < OFFSET:
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index - 1][col_index],
            GRID[row_index - 2][col_index],
            GRID[row_index - 3][col_index])

def _build_down(row_index, col_index):
    """Build the sequence going straight down from row_index."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index + 1][col_index],
            GRID[row_index + 2][col_index],
            GRID[row_index + 3][col_index])

def _build_left(row_index, col_index):
    """Build the sequence going straight left from col_index."""
    if col_index < OFFSET:
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index][col_index - 1],
            GRID[row_index][col_index - 2],
            GRID[row_index][col_index - 3])

def _build_right(row_index, col_index):
    """Build the sequence going straight right from row_index."""
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index][col_index + 1],
            GRID[row_index][col_index + 2],
            GRID[row_index][col_index + 3])

def _build_up_right(row_index, col_index):
    """Build the sequence going up and to the right from \
    (row_index, col_index)."""
    if row_index < OFFSET:
        return INVALID_COORDS_TUPLE
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index - 1][col_index + 1],
            GRID[row_index - 2][col_index + 2],
            GRID[row_index - 3][col_index + 3])

def _build_down_right(row_index, col_index):
    """Build the sequence going down and to the right from \
    (row_index, col_index)."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_TUPLE
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index + 1][col_index + 1],
            GRID[row_index + 2][col_index + 2],
            GRID[row_index + 3][col_index + 3])

def _build_down_left(row_index, col_index):
    """Build the sequence going down and to the left from \
    (row_index, col_index)."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_TUPLE
    if col_index < OFFSET:
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index + 1][col_index - 1],
            GRID[row_index + 2][col_index - 2],
            GRID[row_index + 3][col_index - 3])

def _build_up_left(row_index, col_index):
    """Build the sequence going down and to the left from \
    (row_index, col_index)."""
    if row_index < OFFSET:
        return INVALID_COORDS_TUPLE
    if col_index < OFFSET:
        return INVALID_COORDS_TUPLE

    return (GRID[row_index][col_index],
            GRID[row_index - 1][col_index - 1],
            GRID[row_index - 2][col_index - 2],
            GRID[row_index - 3][col_index - 3])

def _build_all_possible_sequences():
    """Build a set of all possible 4-element sequences."""
    sequences = set()
    for row in range(len(GRID)):
        for col in range(len(GRID[row])):
            sequences.add(_build_up(row, col))
            sequences.add(_build_down(row, col))
            sequences.add(_build_left(row, col))
            sequences.add(_build_right(row, col))
            sequences.add(_build_up_right(row, col))
            sequences.add(_build_down_right(row, col))
            sequences.add(_build_down_left(row, col))
            sequences.add(_build_up_left(row, col))
    return sequences

def problem_11():
    """Find the largest product of four adjacent elements."""
    products = [reduce(operator.mul, sequence)
                for sequence in _build_all_possible_sequences()]
    return max(products)

if __name__ == '__main__':
    print problem_11()

Of course, this isn't the best fit for a functional solution. For one thing, it wastes an awful lot of memory. The more imperative approach below runs in less than half the time on my box, and it's (a little) less code, to boot. The lesson, as always: KISS.


"""Solves Problem 11 from Project Euler."""

OFFSET = 3
GRID = [
    [8, 2, 22, 97, 38, 15, 0, 40, 0, 75, 4, 5, 7, 78, 52, 12, 50, 77, 91, 8],
    [49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 4, 56, 62, 0],
    [81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 3, 49, 13, 36, 65],
    [52, 70, 95, 23, 4, 60, 11, 42, 69, 24, 68, 56, 1, 32, 56, 71, 37, 2, 36, 91],
    [22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80],
    [24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50],
    [32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70],
    [67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21],
    [24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72],
    [21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95],
    [78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92],
    [16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57],
    [86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58],
    [19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40],
    [4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66],
    [88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69],
    [4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36],
    [20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16],
    [20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54],
    [1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48]
]
INVALID_COORDS_PRODUCT = 0

def _calc_up(row_index, col_index):
    """Build the sequence going straight up from row_index."""
    if row_index < OFFSET:
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index - 1][col_index] * \
            GRID[row_index - 2][col_index] * \
            GRID[row_index - 3][col_index]

def _calc_down(row_index, col_index):
    """Build the sequence going straight down from row_index."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index + 1][col_index] * \
            GRID[row_index + 2][col_index] * \
            GRID[row_index + 3][col_index]

def _calc_left(row_index, col_index):
    """Build the sequence going straight left from col_index."""
    if col_index < OFFSET:
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index][col_index - 1] * \
            GRID[row_index][col_index - 2] * \
            GRID[row_index][col_index - 3]

def _calc_right(row_index, col_index):
    """Build the sequence going straight right from row_index."""
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index][col_index + 1] * \
            GRID[row_index][col_index + 2] * \
            GRID[row_index][col_index + 3]

def _calc_up_right(row_index, col_index):
    """Build the sequence going up and to the right from \
    (row_index, col_index)."""
    if row_index < OFFSET:
        return INVALID_COORDS_PRODUCT
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index - 1][col_index + 1] * \
            GRID[row_index - 2][col_index + 2] * \
            GRID[row_index - 3][col_index + 3]

def _calc_down_right(row_index, col_index):
    """Build the sequence going down and to the right from \
    (row_index, col_index)."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_PRODUCT
    if col_index + OFFSET >= len(GRID[row_index]):
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index + 1][col_index + 1] * \
            GRID[row_index + 2][col_index + 2] * \
            GRID[row_index + 3][col_index + 3]

def _calc_down_left(row_index, col_index):
    """Build the sequence going down and to the left from \
    (row_index, col_index)."""
    if row_index + OFFSET >= len(GRID):
        return INVALID_COORDS_PRODUCT
    if col_index < OFFSET:
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index + 1][col_index - 1] * \
            GRID[row_index + 2][col_index - 2] * \
            GRID[row_index + 3][col_index - 3]

def _calc_up_left(row_index, col_index):
    """Build the sequence going down and to the left from \
    (row_index, col_index)."""
    if row_index < OFFSET:
        return INVALID_COORDS_PRODUCT
    if col_index < OFFSET:
        return INVALID_COORDS_PRODUCT

    return GRID[row_index][col_index] * \
            GRID[row_index - 1][col_index - 1] * \
            GRID[row_index - 2][col_index - 2] * \
            GRID[row_index - 3][col_index - 3]

def _calc_all_possible_sequences():
    """Build a set of all possible 4-element sequences."""
    products = set()
    for row in range(len(GRID)):
        for col in range(len(GRID[row])):
            products.add(_calc_up(row, col))
            products.add(_calc_down(row, col))
            products.add(_calc_left(row, col))
            products.add(_calc_right(row, col))
            products.add(_calc_up_right(row, col))
            products.add(_calc_down_right(row, col))
            products.add(_calc_down_left(row, col))
            products.add(_calc_up_left(row, col))
    return products

def problem_11():
    """Find the largest product of four adjacent elements."""
    return max(_calc_all_possible_sequences())

if __name__ == '__main__':
    print problem_11()

Back to flipping out...