Archive for category python

Building a graph-based model of metadata

I have had some success building an in-memory graph of my iTunes database, in Python. I discovered some rather interesting things about my collection in the process and I’ve started thinking about a way to use this information to cleanly chunk the data.

In my graph, nodes are represented by Python tuples that refer to the metadata culled from the song list. For example, there is a node for ('Artist', 'U2') and another for ('Genre', 'Rock'). I keep track of the relationship between these nodes with a weight that comes from the number of songs that have both of these pieces of metadata.

So, for example, there is a line between ('Artist', 'U2') and ('Genre', 'Rock') which has a weight of 15, because their new album is categorized as 'Rock', though songs from the album October are categorized as 'Rock/Pop'.
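Here is a minimal sketch of how such a weighted graph might be built. The song dictionaries, the album name 'New Album', and the function name `build_graph` are made up for illustration; the real data would come from the iTunes database, and the real structure may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data standing in for the real song list.
songs = [
    {"Artist": "U2", "Genre": "Rock", "Album": "New Album"},
    {"Artist": "U2", "Genre": "Rock", "Album": "New Album"},
    {"Artist": "U2", "Genre": "Rock/Pop", "Album": "October"},
]


def build_graph(songs):
    """Return an adjacency mapping: graph[node_a][node_b] -> weight.

    Each node is a (field, value) tuple. The weight of a line between two
    nodes is the number of songs that carry both pieces of metadata.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for song in songs:
        nodes = list(song.items())  # e.g. [('Artist', 'U2'), ('Genre', 'Rock'), ...]
        for a, b in combinations(nodes, 2):
            # Store the line in both directions so lookups work either way.
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


graph = build_graph(songs)
print(graph[("Artist", "U2")][("Genre", "Rock")])
```

With this sample data, two songs share both ('Artist', 'U2') and ('Genre', 'Rock'), so that line gets a weight of 2.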

When I combine all the different pieces of metadata in my collection I get a whopping 1589 different facets, represented by nodes in my graph. But what's more interesting is that about 1500 of these nodes are connected, and the other 90 or so are divided into about 30 individual chunks of 3-4 facets each. I tried to visualize this with GraphViz but the data was just too big.
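Counting the chunks amounts to finding the connected components of the graph. One possible way to do that is a breadth-first search, sketched below. The tiny `sample_graph` and the function name `connected_components` are assumptions for illustration, not the code that produced the numbers above.

```python
from collections import deque

# Hypothetical adjacency mapping: node -> {neighbor: weight}.
sample_graph = {
    ("Artist", "U2"): {("Genre", "Rock"): 15},
    ("Genre", "Rock"): {("Artist", "U2"): 15},
    ("Artist", "Bach"): {("Genre", "Classical"): 4},
    ("Genre", "Classical"): {("Artist", "Bach"): 4},
}


def connected_components(graph):
    """Return a list of sets, one set of nodes per connected chunk."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search outward from an unvisited node.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph.get(node, {}):
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


components = connected_components(sample_graph)
print(len(components), sorted(len(c) for c in components))
```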

But this got me thinking more about how to chunk the graph. It was really surprising that so many of the nodes were connected, but really what matters to me is knowing which nodes are the most connected. This means that I could start dropping lines (connections) between nodes where the weight is just 1… or 2, or whatever number yields an appropriately chunked graph. Hopefully that will break up the large cluster of facets into smaller, more usable clusters.
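Dropping the weak lines could look something like the sketch below. The name `prune`, the sample data, and the threshold are assumptions; the right cutoff would have to be found by experiment.

```python
# Hypothetical adjacency mapping: node -> {neighbor: weight}.
weighted_graph = {
    ("Artist", "U2"): {("Genre", "Rock"): 15, ("Genre", "Rock/Pop"): 1},
    ("Genre", "Rock"): {("Artist", "U2"): 15},
    ("Genre", "Rock/Pop"): {("Artist", "U2"): 1},
}


def prune(graph, min_weight):
    """Return a copy of the graph keeping only lines with weight >= min_weight.

    Nodes that lose all of their lines stay in the result with no
    neighbors, so they show up as chunks of their own.
    """
    return {
        node: {n: w for n, w in neighbors.items() if w >= min_weight}
        for node, neighbors in graph.items()
    }


pruned = prune(weighted_graph, min_weight=2)
print(pruned[("Artist", "U2")])
```

After pruning with a threshold of 2, the weight-1 line to ('Genre', 'Rock/Pop') is gone and that node is left on its own.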

Using generators to hide loop initialization

How often have you wanted to do a number of things in a loop, but had to move items out of the loop for performance reasons? Here’s a cool use of generators that I just figured out to hide the initialization.
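The full entry is not shown here, so the following is only a guess at the general idea: a generator can run its setup once, the first time it is iterated, and then yield items from the loop, so the caller never sees the initialization. The function name `matching_lines` and the regex example are assumptions.

```python
import re


def matching_lines(lines, pattern):
    """Yield the lines that match pattern.

    The compile step is the "initialization": it runs once, when iteration
    starts, and stays hidden inside the generator instead of sitting above
    the caller's loop.
    """
    regex = re.compile(pattern)  # one-time setup
    for line in lines:
        if regex.search(line):
            yield line


# The caller just writes a plain loop.
for line in matching_lines(["foo 1", "bar", "baz 22"], r"\d+"):
    print(line)
```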

cool python tricks

Man, I love Python. I came up with a neat trick yesterday that also couldn’t be done in any static language. Needless to say, I’m pretty pleased with myself. This trick isn’t slow or hard to understand. It actually makes a lot of my code very simple, and avoids a lot of boilerplate that I would have had to write in another language.

I needed a way to give a basic color to a class, and then have easy access to various tints of that color for painting different aspects of an object. The tints are based on HSV, not RGB, but all the callers need to deal with RGB.

The solution: wrap the property() descriptor with my own descriptor.
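The full entry is not shown here, so this is only a rough sketch of what wrapping a property in another descriptor might look like, not the actual implementation. The names `Tint`, `Widget`, `highlight` and `shadow` are made up, and colors are assumed to be RGB tuples of floats between 0 and 1. The HSV conversion uses the standard library's `colorsys` module.

```python
import colorsys


class Tint:
    """Descriptor that wraps a property holding a base RGB color.

    Reading the attribute fetches the base color through the wrapped
    property, scales saturation and value in HSV space, and returns RGB.
    """

    def __init__(self, base_property, saturation=1.0, value=1.0):
        self.base = base_property
        self.saturation = saturation
        self.value = value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        r, g, b = self.base.__get__(obj, objtype)
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        s = min(1.0, s * self.saturation)
        v = min(1.0, v * self.value)
        return colorsys.hsv_to_rgb(h, s, v)


class Widget:
    def __init__(self, color):
        self._color = color

    color = property(lambda self: self._color)
    # Each tint is declared in one line; callers only ever see RGB.
    highlight = Tint(color, saturation=0.5)
    shadow = Tint(color, value=0.5)


w = Widget((1.0, 0.0, 0.0))
print(w.color, w.highlight, w.shadow)
```

Because the tints are computed on access, changing the base color of an object changes all of its tints with no extra bookkeeping.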
