A pythonic n-wise iterator for any iterable

by Mahmoud on June 1, 2010

Over the weekend, I was working on upgrading python-ngrams because I had discovered a bug where the tokenization was incorrect. I was reading a research paper that described q-grams, and while following its examples, I realized I was getting incorrect results for a basic n-gram case.

The tokenization that’s required here is quite simple: given some iterable consisting of values [x0, x1, x2, ..., xi], produce an exhaustive iterator of n-tuples (x0, ..., xn-1), (x1, ..., xn), ..., (xi-n+1, ..., xi).

To make this easier to understand, if given a list of [0, 1, 2, 3, 4, 5, 6], I want to be able to return an iterator such that:

for first, second, third in n_wise([0, 1, 2, 3, 4, 5, 6], 3):
    print first, second, third

Will have a result of:

0 1 2
1 2 3
2 3 4
3 4 5
4 5 6

Fortunately, this wasn’t too difficult, since the two-item case already exists as the pairwise recipe in the itertools documentation. Below I’ve implemented an n-wise iterator that takes any iterable, makes n copies of its iterator with each copy advanced one step further than the previous one, and zips them together into n-tuples.

from itertools import tee, izip

def n_wise(iterable, n):
    """Returns an iterator of n-tuples, where each tuple is a window
    of n consecutive items from the iterable

    """
    n_iterators = tee(iterable, n)
    zippables = [n_iterators[0]]

    # enumerate starts at 1: the k-th copy must run k items ahead
    for advance, iteratee in enumerate(n_iterators[1:], 1):
        while advance > 0:
            # we advance the iterator `advance` steps in total
            next(iteratee, None)
            advance -= 1
        # append everything to the zippables
        zippables.append(iteratee)
    # return the izip expansion of each iterator
    return izip(*zippables)
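
The listing above targets Python 2, where izip lives in itertools. On Python 3 there is no izip and the built-in zip is already lazy, so a port might look like the sketch below. This is a rewrite under assumptions, not the original code: the name n_wise3 is hypothetical, and islice stands in for the manual next() loop.

```python
from itertools import islice, tee


def n_wise3(iterable, n):
    """Yield overlapping n-tuples from any iterable (Python 3 sketch)."""
    iterators = tee(iterable, n)
    # Shift the k-th copy k items ahead; islice(it, k, None) skips k items lazily.
    shifted = [islice(it, k, None) for k, it in enumerate(iterators)]
    # zip stops as soon as the most advanced copy is exhausted.
    return zip(*shifted)


for first, second, third in n_wise3([0, 1, 2, 3, 4, 5, 6], 3):
    print(first, second, third)
```

If the input has fewer than n items, the result is simply empty, which matches the behaviour of the tee/izip version.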

I find that I sometimes need to open a file and read its lines n at a time, sliding forward one line per step; this iterator does just that. For the sake of completeness, look at how concise, powerful, and easy to use this combination is:

# assuming that n_wise is imported into this namespace
from functools import partial

triple_wise_line_reader = partial(n_wise, n=3)
for line1, line2, line3 in triple_wise_line_reader(open("some_file.log", "r")):
    # do some computation with line1, line2 and line3
    do_something(line1, line2, line3)
    # next step:
    # line1 <- line2
    # line2 <- line3
    # line3 <- line4
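
The loop above never closes the file it opens. Here is a minimal sketch of the same triple-wise read that does close its input, written for Python 3. Two things in it are assumptions: io.StringIO stands in for some_file.log so the example is self-contained, and window is a hypothetical deque-based variant, not the n_wise implementation from this post.

```python
import io
from collections import deque


def window(iterable, n):
    """Yield overlapping n-tuples using a single pass and a bounded deque."""
    buf = deque(maxlen=n)
    for item in iterable:
        buf.append(item)  # the oldest item falls off once the deque is full
        if len(buf) == n:
            yield tuple(buf)


# StringIO is a stand-in for open("some_file.log"); both iterate line by line.
with io.StringIO("line 1\nline 2\nline 3\nline 4\n") as log:
    triples = [tuple(s.strip() for s in lines) for lines in window(log, 3)]

print(triples)
```

The list is built inside the with block because the window is lazy: once the file is closed, there is nothing left to read.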

Imagine doing this in C ;) It would be just ugly!

Please let me know if you find this useful or you have a better implementation to the problem!


I recently purchased a MacBook Pro, which comes with Snow Leopard and python 2.6.1 installed. I wanted to upgrade to the latest python release, 2.6.4, so I tried installing the official python Mac OS distribution from python.org. After installation, I wanted to install Twisted, and I kept getting the error below:

creating build/temp.macosx-10.3-fat-2.6
creating build/temp.macosx-10.3-fat-2.6/twisted
creating build/temp.macosx-10.3-fat-2.6/twisted/runner
gcc-4.0 -arch ppc -arch i386 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -I/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c twisted/runner/portmap.c -o build/temp.macosx-10.3-fat-2.6/twisted/runner/portmap.o
In file included from /usr/include/architecture/i386/math.h:626,
 from /usr/include/math.h:28,
 from /Library/Frameworks/Python.framework/Versions/2.6/include/python2.6/pyport.h:235,
 from /Library/Frameworks/Python.framework/Versions/2.6/include/python2.6/Python.h:58,
 from twisted/runner/portmap.c:10:
/usr/include/AvailabilityMacros.h:108:14: warning: #warning Building for Intel with Mac OS X Deployment Target < 10.4 is invalid.
Compiling with an SDK that doesn't seem to exist: /Developer/SDKs/MacOSX10.4u.sdk
Please check your Xcode installation
gcc-4.0 -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-fat-2.6/twisted/runner/portmap.o -o build/lib.macosx-10.3-fat-2.6/twisted/runner/portmap.so
ld: library not found for -lbundle1.o
ld: library not found for -lbundle1.o
collect2: ld returned 1 exit status
collect2: ld returned 1 exit status
lipo: can't open input file: /var/folders/T6/T6diKRiFGJSwsabKP4864E+++TI/-Tmp-//ccIK1c3K.out (No such file or directory)
error: command 'gcc-4.0' failed with exit status 1

Something’s not right: setuptools is detecting that I’m using macosx-10.3, but I’m using Mac OS X 10.6. Why does setuptools also want to use /Developer/SDKs/MacOSX10.4u.sdk to build python extensions? I’m currently using the SDK for Mac OS X 10.6, and I don’t want to install another SDK.

Well, I did a little bit of research and learned that the PSF’s python 2.6.4 package for the Mac is built with an option called --enable-universalsdk which, according to the readme, defaults to /Developer/SDKs/MacOSX10.4u.sdk. This is why building third-party extensions tries to reference the 10.4 SDK.
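
To see which platform tag and build flags an interpreter hands to extension builds, you can ask the interpreter itself. The sketch below assumes Python 2.7 or 3.2+, where the standard sysconfig module exists (on 2.6 the equivalent lives in distutils.sysconfig). build_settings is a hypothetical helper name, and the values are None wherever the build did not record them.

```python
import sysconfig


def build_settings():
    """Collect build-time settings that extension builds inherit."""
    return {
        # a tag of the form "macosx-<target>-<arch>" on Mac OS X
        "platform": sysconfig.get_platform(),
        # deployment target the interpreter was configured with (Mac only)
        "deployment_target": sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET"),
        # compiler flags recorded at build time (may include -arch flags)
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }


for name, value in build_settings().items():
    print(name, "=", value)
```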

I was able to build python successfully using the following:

./configure --enable-framework --enable-universalsdk=/Developer/SDKs/MacOSX10.6.sdk/ --with-universal-archs=intel
make && make test
sudo make install

You’ll notice that the following 3 tests fail when running the unit tests:

  • test_asyncore
  • test_platform
  • test_macostools

The test that should really put you on alert is the asyncore one. After doing some research, it turns out the asyncore module uses some variant of select.poll(), which isn’t supported by the BSD-derived kernel in Mac OS X. That kernel offers kqueue instead, which the test doesn’t take into account. To fix this, I pulled the asyncore.py module from the trunk and overwrote /Lib/asyncore.py. The tests passed after that.
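
A quick way to see which event-notification calls a given interpreter exposes is to probe the select module. This sketch only checks that the attributes exist; it does not prove that poll() behaves correctly on the platform. available_pollers is a hypothetical helper name.

```python
import select


def available_pollers():
    """Report which polling interfaces this interpreter's select module has."""
    names = ("select", "poll", "epoll", "kqueue")
    return {name: hasattr(select, name) for name in names}


for name, present in sorted(available_pollers().items()):
    print(name, present)
```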

You don’t need to fix the other two, as fixing them only involves the tests themselves rather than changing an actual module. If you’re interested, though, fixing test_platform follows the same pattern as asyncore. Brett Cannon filed a bug for this test and submitted a patch, but you will still need to replace the entire test_platform.py module to get it working. Apparently, this patch is for python v2.7, v3.1, and v3.2. To fix it, download the patched test_platform.py module and use it to replace /Lib/test/test_platform.py. Make sure you also delete /Lib/test/test_platform.pyc.
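
The replace-and-delete step can be scripted. The sketch below assumes the patched file has already been downloaded. replace_module and any paths passed to it are hypothetical, and a .pyc sitting next to its source is the Python 2 layout (Python 3 caches bytecode in __pycache__ instead).

```python
import os
import shutil


def replace_module(patched_path, target_path):
    """Overwrite target_path with patched_path and drop its stale .pyc."""
    shutil.copyfile(patched_path, target_path)
    stale = target_path + "c"  # Python 2 writes foo.pyc beside foo.py
    if os.path.exists(stale):
        os.remove(stale)
    return target_path
```

Removing the stale .pyc matters because it makes sure the next run compiles the patched source instead of picking up old bytecode.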

The last test, test_macostools, is actually quite interesting. Apparently, Apple does not supply 64-bit versions of the Carbon frameworks used by these modules, which is why this test is failing. It looks like there might not be a way to fix this test until Apple ships 64-bit versions of the Carbon frameworks.

After fixing these bugs, make sure you run the following command:

sudo make install

Needless to say, after I installed python with the options above, and after I fixed all these modules, the Twisted 9.0 installation was successful.

I hope this helps some of you in case you run into this problem!


Verifying Python64 builds

July 6, 2009

At work, I’m migrating python over to our 64-bit machines, and one thing I’ve noticed is that there is no standard way to verify that a python build is really 64-bit. I’ve read somewhere previously, especially for the Mac OS X crowd, that the LDFLAGS="-arch x86_64" flag had to be [...]


python -c 'print "hello world!"'

July 4, 2009

And so, we meet again, world. I’ve finally gotten around to registering a home online and installing WordPress, and I’m ready to share my ideas with the world. I’ve given a lot of topics some thought, and I think I might be able to influence and/or help others with my various migrations.
First, I’d like to thank Canonical [...]
