2009-08-29

Sneak Attack: Co-routines as an alternative to state machines

Co-routines as an alternative to state machines. And the money quote: Co-routines are to state machines what recursion is to stacks.

Back to flipping out...

2009-08-25

PyAtl Notes—2009-08-20

What follows is essentially a stream-of-consciousness dump from the PyATL meeting on August 20.

Introduction to Python and Sqlite3

Presented by Alfredo Deza
  • For simple things, it's blazing fast
  • It's local, so no network necessary
  • Serializes to a single file
  • Uses DB-API 2.0 (see PEP 249). Once you have your cursor, you can start running SQL.
  • NOTE: Use sqlite3.connect(':memory:', isolation_level='exclusive') to use an in-memory DB
  • The iterdump() method of the sqlite3 connection gives you an iterable that lets you export the in-memory database. This is handy if you want to write them all to a file.
  • Although the sqlite3 docs claim that the concurrency situation has improved, the Internet at large seems to indicate that problems still abound.

GeoDjango

Presented by Skylar Saveland
  • Recommend you use PostGIS w/ PostgresQL
  • OpenLayers is a nice library for embedding map widgets on any webpage.
  • Core team is available on IRC (#geodjango on Freenode)
  • They make it fairly easy to integrate with Google Maps
  • Instead of standard Django models, you use the one from GeoDjango and then you get the Point, Polygon etc. so that you can store GIS data right with your Django models
  • You can do distance, intersection, touches, etc. queries right in the ORM

Nuts & Bolts

Presented by Brandon Rhodes
  • PyCon is 6 months away

  • String formatting
    • New in 2.6
    • str and unicode objects now have a format() method, and that's the encouraged way to do string formatting; % is right out.
    • At least one of the benefits of the new method is that it's easier to read the name-based format
    • It lets you use dot-notation to access attributes of parameters
    • Also supports indexing into lists, etc.
    • I need to look this up, pronto. It's covered briefly in the sequence types in the standard docs and links to a dedicated page for the new syntax.
  • lxml != ElementTree
    • The goal of ElementTree is to be a Pythonic XML library
    • It's probably a dead project, since the current version is 2 years old
    • lxml uses the "ElementTree" object model, but it's built on top of libxml2 and libxslt so it's blazing fast.
    • Remember these two imports:
      • from lxml import etree
      • from lxml.cssselect import CSSSelector as css
    • lxml also supports CSS selectors
    • Don't forget the default for lxml is an XML parser; you want the HTML parser
    • Ian Bicking says lxml is better than BeautifulSoup

Making Code More Testable

Presented by Brandon Rhodes
  • Based on Writing Testable Code; showing Python examples of how some of these actually look when you put them in action
Don't Mix Object Graph Construction with Application Logic
  • This one makes it really hard to test classes in isolation, because instantiating one starts instantiating a bunch of objects we don't want to test.
  • On a side note, nose handily shows you captured stdout when a test fails, and doesn't show you when the test is successful. That way you can you leave your debugging print statements. It does not capture stderr.
  • The simple fix for this in Python is to accept dependencies in the __init__ method. You might recognize this as constructor-based dependency injection. In Java, mutator-based dependency injection is more common.
Avoid "Global State"
  • Even though every programmer (should) know global state is evil, you still see this quite often in web apps where the global state is something like the request and/or session.
  • The breakage is quite often far more extensive than you would expect, e.g., a test in one file that doesn't handle global state well can break tests in other files, even if they are doing a good job of handling the global state.
  • The fix is to pass the state explicitly, even if the state has to be passed to a lot of functions. This is inconvenient at times, but is usually worth it.

2009-08-24

Project Euler: Problem 20

I finished up Problem 20 a while back. This problem was really straightforward, especially given my work on Problem 16.


import math


if __name__ == '__main__':
    print sum((int(digit) for digit in str(math.factorial(100))))

Back to flipping out...

2009-08-13

Project Euler: Problem 19

I recently solved Problem 19 from Project Euler. Since I don't really enjoy date math, I decided to brute force it and use the datetime module from the standard library.


from datetime import date


def next_first_of_month_in_20th():
    """Generator to list every first of the month during the 20th century."""
    first = date(1901, 1, 1)
    yield first
    while first.year < 2001:
        if first.month == 12:
            first = first.replace(year=first.year + 1)
            first = first.replace(month=1)
        else:
            first = first.replace(month=first.month + 1)
        yield first


def main():
    """
    Solve `Problem 19`_ from Project Euler.
    
    .. _`Problem 19`: http://projecteuler.net/index.php?section=problems&id=19
    """
    return len([first for first in next_first_of_month_in_20th() if first.weekday() == 6])


if __name__ == '__main__':
    print main()

Back to flipping out...

2009-08-07

Project Euler, Problems 18 and 67

I recently finished up Problem 18 (and shortly thereafter, Problem 67). This was a fairly interesting problem, because I like graph theory even if I'm bad at it, and I immediately thought graph theory when I read this problem.

I started creating Vertex and Edge classes, but quickly decided that was too heavyweight for such a small problem. Instead, I chose to use a list of lists where the left-hand child of a node is stored in the next row on the same column and the right-hand child is stored in the next row and the next column.

This made representing and using the data fairly straightforward, but I kept running into the problem of doing an exhaustive search. I knew this was suboptimal (and completely unsuited for the related Problem 67), but I was so locked into thinking of the triangle from the top-down I made things over-complicated.

Eventually, I decided I needed to reset my thinking and Googled for hints. Once I started thinking of the problem from the bottom up (since the graph, after all, is not directed), the solution became clear. Since I didn't need the actual path, I modified the existing data structure to minimize memory requirements.


def find_max_sum(triangle):
    """
    Find the maximum sum for a path from the top to the bottom of ``triangle``.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> find_max_sum(test)
    23
    """
    while len(triangle) > 1:
        _reduce_triangle(triangle)
    return triangle[0][0]


def _reduce_triangle(to_reduce):
    """
    Reduce ``to_reduce`` in place by rolling up the maximum path info one row.

    Don't return anything to emphasize the in-place nature of this function.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [7, 5], [10, 13, 15]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [20, 20]]
    >>> _reduce_triangle(test)
    >>> test
    [[23]]
    """
    last_row = to_reduce[-1]
    for index in xrange(len(to_reduce) - 1):
        to_reduce[-2][index] += max(last_row[index:index + 2])
    del to_reduce[-1]


def main():
    """
    Solve `Problem 18`_ of Project Euler.

    .. _Problem 18: http://projecteuler.net/index.php?section=problems&id=18
    """
    actual = [[75,],
             [95, 64],
             [17, 47, 82],
             [18, 35, 87, 10],
             [20, 4, 82, 47, 65],
             [19, 1, 23, 75, 3, 34],
             [88, 2, 77, 73, 7, 63, 67],
             [99, 65, 4, 28, 6, 16, 70, 92],
             [41, 41, 26, 56, 83, 40, 80, 70, 33],
             [41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
             [53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
             [70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
             [91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
             [63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
             [4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23]]
    print find_max_sum(actual)


if __name__ == '__main__':
    main()

Starting with this, the solution for Problem 67 is trivial: just write some code to parse the triangle out of a text file instead of coding it directly into the source code:


def find_max_sum(triangle):
    """
    Find the maximum sum for a path from the top to the bottom of ``triangle``.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> find_max_sum(test)
    23
    """
    while len(triangle) > 1:
        _reduce_triangle(triangle)
    return triangle[0][0]


def _reduce_triangle(to_reduce):
    """
    Reduce ``to_reduce`` in place by rolling up the maximum path info one row.

    Don't return anything to emphasize the in-place nature of this function.

    >>> test = [[3,], \
            [7, 5], \
            [2, 4, 6], \
            [8, 5, 9, 3]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [7, 5], [10, 13, 15]]
    >>> _reduce_triangle(test)
    >>> test
    [[3], [20, 20]]
    >>> _reduce_triangle(test)
    >>> test
    [[23]]
    """
    last_row = to_reduce[-1]
    for index in xrange(len(to_reduce) - 1):
        to_reduce[-2][index] += max(last_row[index:index + 2])
    del to_reduce[-1]


def _parse_triangle_from_file(data_file):
    """
    Parse out the triangle data from ``data_file``.

    >>> _parse_triangle_from_file('test_triangle.txt')
    [[3], [7, 5], [10, 13, 15]]
    """
    triangle = []
    with open(data_file, 'r') as triangle_file:
        for line in triangle_file:
            triangle.append([int(x) for x in line.split()])
    return triangle


def main():
    """
    Solve `Problem 67`_ of Project Euler.

    .. _Problem 67: http://projecteuler.net/index.php?section=problems&id=67
    """
    print find_max_sum(_parse_triangle_from_file('triangle.txt'))


if __name__ == '__main__':
    main()