Welcome back..

March 20th, 2007

Ok, so it’s been well over a year since I last updated this blog. I’ve had numerous things to say, but the ideas always come to me on the bus, or in the shower, or somewhere else where I don’t have access to a keyboard. I’m going to once again try to revitalize this blog with some actual comments and insights. First up, I’ve got an entry about development in Berkeley.

Why NOT to eat organic

October 5th, 2005

A while back I was exercising my writing, trying to find a voice for this blog, and wrote Why to shop organic. A friend of mine recently gave me a hard time about it, and through a funny confluence of events, I found two reasons not to eat organic.

Reason number one: Probably the reason Heather used to call organic strawberries “armpit fruit”:

[Image: artichoke]

Yes, that is a dead worm in my artichoke. Yes, I had to eat this far to discover it. :)

Reason number two: Sure, they don’t use pesticides, but I don’t want babies working the fields any more than I want 12-year-olds making my shirts.


[Image: organic rice cereal box]

Andy Rooney on Iraq

October 3rd, 2005

I never thought I’d be sending around something that took Andy Rooney seriously, but this morning I ran into a post on BoingBoing that blew me away. Last night Andy Rooney’s segment on 60 Minutes (BitTorrent link) blasted the Iraq effort in a way that I think much of Middle America can understand: basic facts. (Also see the transcript.)

I have a theory that many more people would be against the Iraq war and more critical of the White House if they simply understood the implications for this country. For example, I wonder how many people know that our budget this year for defense is $336 billion, yet our educational budget is $61 billion? I wonder how many people would support the simplest proposal of, say, cutting $30 billion from the defense budget in order to increase the education budget by a whopping 50%?

And so I can’t begin to express how pleased I am that someone like Andy Rooney, who is typically viewed as fairly harmless, has suddenly become so vocally critical of the war. I think the mainstream media finally got some backbone with their outrage over the handling of Katrina, but I’m going to predict that Andy Rooney’s segment yesterday is a turning point for public criticism of the war and this administration. I think this changes the face of opposition. I think for many people it all sounds like just the rantings of that crazy mom Cindy Sheehan, or some crazy Californians who are too disconnected from the real world to have a legitimate voice, or some vocal celebrities jumping on the bandwagon of rebelliousness.

Building a graph-based model of metadata

August 3rd, 2005

I have had some success building an in-memory graph of my iTunes database, in Python. I discovered some rather interesting things about my collection in the process and I’ve started thinking about a way to use this information to cleanly chunk the data.

In my graph, nodes are represented by Python tuples that refer to the metadata culled from the song list. For example, there is a node for ('Artist', 'U2') and another for ('Genre', 'Rock'). I keep track of the relationship between these nodes with a weight that comes from the number of songs that have both of these pieces of metadata.

So, for example, there is a line between ('Artist', 'U2') and ('Genre', 'Rock') which has a weight of 15, because their new album is categorized as 'Rock', though songs from the album October are categorized as 'Rock/Pop'.
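A minimal sketch of this structure, assuming each song is available as a dict of metadata fields. The sample songs, the album name, and the `build_graph` function are hypothetical stand-ins, not the code actually used against the iTunes database.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data; a real run would read the iTunes song list.
SONGS = [
    {"Artist": "U2", "Album": "October", "Genre": "Rock/Pop"},
    {"Artist": "U2", "Album": "Example Album", "Genre": "Rock"},
    {"Artist": "U2", "Album": "Example Album", "Genre": "Rock"},
]


def build_graph(songs):
    """Map each (field, value) node to its neighbors and co-occurrence weights."""
    graph = defaultdict(lambda: defaultdict(int))
    for song in songs:
        # Every metadata pair on the same song shares one more unit of weight.
        nodes = sorted(song.items())
        for a, b in combinations(nodes, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


graph = build_graph(SONGS)
print(graph[("Artist", "U2")][("Genre", "Rock")])      # 2
print(graph[("Artist", "U2")][("Genre", "Rock/Pop")])  # 1
```

Storing each edge in both directions costs a little memory but makes neighbor lookups trivial from either node.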

When I combine all the different pieces of metadata in my collection I get a whopping 1589 different facets, represented by nodes in my graph. But what’s more interesting is that about 1500 of these nodes are connected, and the other 90 or so are divided into about 30 different individual chunks of 3-4 facets each. I tried to visualize this with GraphViz but the data was just too big.
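One way to find those separate chunks is a breadth-first search over the graph. The sketch below assumes the graph is a plain dict mapping each node to a dict of neighbors and weights; `connected_components` and the toy data are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical toy graph: node -> {neighbor: weight}, stored symmetrically.
TOY = {
    ("Artist", "U2"): {("Genre", "Rock"): 2},
    ("Genre", "Rock"): {("Artist", "U2"): 2, ("Artist", "Hypothetical Band"): 1},
    ("Artist", "Hypothetical Band"): {("Genre", "Rock"): 1},
    ("Artist", "Lone Singer"): {("Genre", "Folk"): 1},
    ("Genre", "Folk"): {("Artist", "Lone Singer"): 1},
}


def connected_components(graph):
    """Return the graph's connected chunks as sets of nodes, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


components = connected_components(TOY)
print([len(c) for c in components])  # [3, 2]
```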

But this got me thinking more about how to chunk the graph. It was really surprising that so many of the nodes were connected, but really what matters to me is knowing which nodes are the most connected. This means that I could start dropping lines (connections) between nodes where the weight is just 1… or 2, or whatever number yields an appropriately chunked graph. Hopefully that will break up the large cluster of facets into smaller, more usable clusters.
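The edge-dropping idea could be sketched as follows, again assuming a dict-of-dicts graph. `prune`, `count_clusters`, and the toy weights are hypothetical; the right threshold would have to be found by experiment.

```python
# Hypothetical toy graph: node -> {neighbor: weight}, stored symmetrically.
TOY = {
    ("Artist", "U2"): {("Genre", "Rock"): 15},
    ("Genre", "Rock"): {("Artist", "U2"): 15, ("Year", "2004"): 1},
    ("Year", "2004"): {("Genre", "Rock"): 1, ("Artist", "Hypothetical Band"): 4},
    ("Artist", "Hypothetical Band"): {("Year", "2004"): 4},
}


def prune(graph, min_weight):
    """Return a copy of the graph keeping only edges with weight >= min_weight."""
    return {
        node: {nbr: w for nbr, w in edges.items() if w >= min_weight}
        for node, edges in graph.items()
    }


def count_clusters(graph):
    """Count connected clusters; a node left with no edges counts as its own cluster."""
    seen = set()
    clusters = 0
    for start in graph:
        if start in seen:
            continue
        clusters += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return clusters


print(count_clusters(TOY))            # 1
print(count_clusters(prune(TOY, 2)))  # 2
```

Raising `min_weight` step by step and watching the cluster count is a cheap way to look for a threshold that splits the big cluster without shattering it into single nodes.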

A graph based model for chunking

August 1st, 2005

Factor Analysis seems very promising, but I was thinking a lot about a presentation given by Mimi Yin at OSAF. In particular the Venn diagrams which showed items as existing in a number of collections based on the attributes of the item. These collections may or may not really exist in real life, but their virtual existence is important.

An exploration: Chunking using Factor Analysis

July 22nd, 2005

I’ve been developing my ideas about chunking as I’ve been writing. My faith that there is structure expressed by facets keeps me believing that there is a way to extract this structure.

Last year I read (most of) The Mismeasure of Man by Stephen Jay Gould. Aside from being a fantastic book, its last chapter on Factor Analysis has been floating around in my head for quite some time. I think this could be one way to extract the kind of chunks I am looking for.

What to chunk

July 18th, 2005

So in my previous post, I talked about the need for chunking large datasets. The problem I discussed is that it is very difficult to browse large datasets in small enough pieces, and find what you want.

I should mention that in this context, browsing is different from searching. Searching is looking for something very specific (e.g. ‘Desire by U2’) and browsing is when you don’t know exactly what you want, but can narrow it down through a series of small decisions. Browsing is also a more appropriate mechanism for devices, where you don’t want to try typing in a search term on a small keypad with your thumb.

So how do you, at a software level, provide the minimal set of choices to the user to allow them to find what they’re looking for most of the time? This is the core concept behind “chunking.”
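To make "a series of small decisions" concrete, here is a rough sketch of browsing as repeated narrowing. It assumes songs are dicts of metadata; `choices`, `narrow`, and the sample data are hypothetical, and the sketch says nothing about how to pick a good set of choices, which is the actual chunking problem.

```python
from collections import Counter

# Hypothetical sample data standing in for a real music collection.
SONGS = [
    {"Artist": "U2", "Genre": "Rock", "Title": "Desire"},
    {"Artist": "U2", "Genre": "Rock/Pop", "Title": "Gloria"},
    {"Artist": "Hypothetical Band", "Genre": "Rock", "Title": "Example Song"},
]


def choices(songs, facet):
    """List the values of one facet with the number of songs behind each."""
    return Counter(song[facet] for song in songs if facet in song)


def narrow(songs, facet, value):
    """Keep only the songs matching the choice the user just made."""
    return [song for song in songs if song.get(facet) == value]


# One browsing session: pick a genre, then an artist.
print(choices(SONGS, "Genre"))
rock = narrow(SONGS, "Genre", "Rock")
print(choices(rock, "Artist"))
remaining = narrow(rock, "Artist", "U2")
print([song["Title"] for song in remaining])  # ['Desire']
```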

Chunking large datasets

July 13th, 2005

My wife and I have a collection of about 45 GB of MP3s. This was a long effort to rip all of our CDs over the course of a few months. All the files are stored on a Linux box, but managed with iTunes. This is some 10,000 songs, by many different artists, in many genres.

Recently we purchased a Linksys Wireless Music System so that we could play music in our bedroom. The concept is pretty cool: it’s a Wi-Fi radio. It uses UPnP to find music collections on your network, and then you can browse and stream them to the radio. It has a remote control and a little LCD display so you don’t even have to think about the fact that these are MP3s off on some Linux box. Good idea, huh? Not quite…

Using generators to hide loop initialization

June 29th, 2005

How often have you wanted to do a number of things in a loop, but had to move items out of the loop for performance reasons? Here’s a cool use of generators that I just figured out to hide the initialization.
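The excerpt stops before showing the technique, so the following is only a guess at the general idea: a generator runs its setup once, on first iteration, and the calling loop never sees it. The `matching_lines` function and the sample data are hypothetical.

```python
import re


def matching_lines(lines, pattern):
    """Yield the lines that match pattern, compiling the regex only once."""
    # One-time initialization lives here instead of above the caller's loop.
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


tracks = ["U2 - Desire", "Hypothetical Band - Example Song", "U2 - Gloria"]

# The loop body stays clean; the setup is hidden inside the generator.
for line in matching_lines(tracks, r"^U2"):
    print(line)
```

Because the generator body does not run until the first value is requested, the setup cost is also skipped entirely if the loop is never entered.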

demangling ‘property’ values

May 10th, 2005

I’m learning more about how properties work in Python. One thing I’m learning is that property objects are only evaluated in the context of the parent object they’re attached to.
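A small illustration of that behavior, using a hypothetical `Song` class. A property is only evaluated when it is found on the class and accessed through an instance; anywhere else it is just a plain `property` object.

```python
class Song:
    """Hypothetical class used only to illustrate property lookup."""

    def __init__(self, title):
        self._title = title

    @property
    def title(self):
        return self._title.upper()


song = Song("Desire")

# Accessed through an instance, the property is evaluated.
print(song.title)  # DESIRE

# Accessed through the class, you get the property object itself.
print(isinstance(Song.title, property))  # True

# The property can still be evaluated by hand against an instance.
print(Song.title.__get__(song, Song))  # DESIRE
print(Song.title.fget(song))           # DESIRE

# A property stored on the instance (not the class) is never evaluated.
song.loose = property(lambda self: "never called")
print(isinstance(song.loose, property))  # True
```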