<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Alec's thoughts &#187; python</title>
	<atom:link href="http://www.flett.org/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.flett.org</link>
	<description></description>
	<lastBuildDate>Thu, 15 Oct 2009 17:08:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Building a graph-based model of metadata</title>
		<link>http://www.flett.org/2005/08/03/building-a-graph-based-model-of-metadata/</link>
		<comments>http://www.flett.org/2005/08/03/building-a-graph-based-model-of-metadata/#comments</comments>
		<pubDate>Wed, 03 Aug 2005 15:47:46 +0000</pubDate>
		<dc:creator>alecf</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.flett.org/?p=50</guid>
		<description><![CDATA[I have had some success building an in-memory graph of my iTunes database, in Python. I discovered some rather interesting things about my collection in the process and I&#8217;ve started thinking about a way to use this information to cleanly chunk the data.
In my graph, nodes are represented by Python tuples that refer to [...]]]></description>
			<content:encoded><![CDATA[<p>I have had some success building an in-memory graph of my iTunes database, in Python. I discovered some rather interesting things about my collection in the process and I&#8217;ve started thinking about a way to use this information to cleanly chunk the data.</p>
<p>In my graph, nodes are represented by Python tuples that refer to the metadata culled from the song list. For example, there is a node for ('Artist', 'U2') and another for ('Genre', 'Rock'). I keep track of the relationship between these nodes with a weight that comes from the number of songs that have both of these pieces of metadata.</p>
<p>So, for example, there is a line between ('Artist', 'U2') and ('Genre', 'Rock') which has a weight of 15, because their new album is categorized as &#8216;Rock&#8217; &#8211; though songs from the album October are categorized as &#8216;Rock/Pop&#8217;.</p>
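<p>Here is a minimal sketch of how a graph like this could be built. The sample songs, the one-dict-per-song layout, and the build_graph name are assumptions for illustration, not the actual iTunes import code:</p>
<pre class="code">
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: one dict of metadata per song.
songs = [
    {"Artist": "U2", "Genre": "Rock"},
    {"Artist": "U2", "Genre": "Rock"},
    {"Artist": "U2", "Genre": "Rock/Pop", "Album": "October"},
]

def build_graph(songs):
    """Map each pair of facet nodes to the number of songs sharing both."""
    weights = defaultdict(int)
    for song in songs:
        # Each (field, value) tuple is a node; sorting gives every
        # pair a stable order so (a, b) and (b, a) share one key.
        facets = sorted(song.items())
        for a, b in combinations(facets, 2):
            weights[(a, b)] += 1
    return weights

weights = build_graph(songs)
u2_rock = weights[(("Artist", "U2"), ("Genre", "Rock"))]
</pre>
<p>With this toy data the line between ('Artist', 'U2') and ('Genre', 'Rock') gets a weight of 2, one for each song that carries both facets.</p>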
<p>When I combine all the different pieces of metadata in my collection I get a whopping 1589 different facets, represented by nodes in my graph. But what&#8217;s more interesting is that about 1500 of these nodes are connected to each other in one big cluster, and the other 90 or so are divided into about 30 different individual chunks of 3-4 facets each. I tried to visualize this with <a href="http://www.graphviz.org/">GraphViz</a> but the data was just too big.</p>
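<p>Counting the chunks comes down to finding connected components. A sketch of one way to do it, assuming the graph is available as a list of node pairs (the connected_components name and the sample edges are made up for this example):</p>
<pre class="code">
from collections import defaultdict

def connected_components(edges):
    """Group nodes into sets that are reachable from one another."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk from an unvisited node collects one chunk.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components

# Hypothetical edges: one chunk of three facets, one chunk of two.
edges = [
    (("Artist", "U2"), ("Genre", "Rock")),
    (("Genre", "Rock"), ("Album", "October")),
    (("Artist", "Enya"), ("Genre", "New Age")),
]
components = connected_components(edges)
sizes = sorted(len(c) for c in components)
</pre>
<p>Note that a facet with no lines at all never shows up in the edge list, so it would have to be counted separately.</p>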
<p>But this got me thinking more about how to chunk the graph. It was really surprising that so many of the nodes were connected, but really what matters to me is knowing which nodes are the <em>most</em> connected. This means that I could start dropping lines (connections) between nodes where the weight is just 1&#8230; or 2, or whatever number yields an appropriately chunked graph. Hopefully that will break up the large cluster of facets into smaller, more usable clusters.</p>
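<p>Dropping the weak lines is a simple filter over the weights. This is only a sketch: the drop_weak_edges name, the sample weights, and the threshold are assumptions, and the right cutoff would have to be found by experiment:</p>
<pre class="code">
# Hypothetical weights: pair of facet nodes mapped to shared song count.
weights = {
    (("Artist", "U2"), ("Genre", "Rock")): 15,
    (("Artist", "U2"), ("Genre", "Rock/Pop")): 2,
    (("Genre", "Rock"), ("Year", "2004")): 1,
}

def drop_weak_edges(weights, min_weight):
    """Keep only the lines whose weight is at least min_weight."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}

# Raising the threshold removes more lines, which can split big clusters.
strong = drop_weak_edges(weights, 2)
strongest = drop_weak_edges(weights, 3)
</pre>
<p>The surviving lines can then be regrouped into clusters to see whether the big one has broken apart.</p>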
]]></content:encoded>
			<wfw:commentRss>http://www.flett.org/2005/08/03/building-a-graph-based-model-of-metadata/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using generators to hide loop initialization</title>
		<link>http://www.flett.org/2005/06/29/using-generators-to-hide-loop-initialization/</link>
		<comments>http://www.flett.org/2005/06/29/using-generators-to-hide-loop-initialization/#comments</comments>
		<pubDate>Thu, 30 Jun 2005 00:16:25 +0000</pubDate>
		<dc:creator>alecf</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.flett.org/?p=45</guid>
		<description><![CDATA[How often have you wanted to do a number of things in a loop, but had to move items out of the loop for performance reasons? Here&#8217;s a cool use of generators that I just figured out to hide the initialization.

I was trying to use PyICU to get the locale-sensitive hour for the Chandler calendar. [...]]]></description>
			<content:encoded><![CDATA[<p>How often have you wanted to do a number of things in a loop, but had to move items out of the loop for performance reasons? Here&#8217;s a cool use of generators that I just figured out to hide the initialization.<br />
<span id="more-45"></span><br />
I was trying to use PyICU to get the locale-sensitive hour for the Chandler calendar. For instance, in some locales, the hour for 4:00pm would be &#8220;16&#8221;.</p>
<p>Unfortunately, the interface for PyICU for this kind of thing is a little ugly:</p>
<pre class="code">
# do some setup, initializing stuff from PyICU
timeFormatter = PyICU.DateFormat.createTimeInstance()
hourFP = PyICU.FieldPosition(PyICU.DateFormat.HOUR1_FIELD)

# Now deal with the current hour
hourdate = datetime.combine(date.today(), time(<b>hour</b>))
timeString = timeFormatter.format(hourdate, hourFP)
(start, end) = (hourFP.getBeginIndex(), hourFP.getEndIndex())
<b>hourString</b> = str(timeString[start:end])
</pre>
<p>Yuck! The point here is not that PyICU is ugly, but that there is some initialization that must happen before any actual use of the variable &#8216;hour&#8217;.</p>
<p>The problem is that I have to do other things with &#8216;hour&#8217; beyond just getting its time string. So my code would look like:</p>
<pre class="code">
# initialization...
timeFormatter = PyICU.DateFormat.createTimeInstance()
hourFP = PyICU.FieldPosition(PyICU.DateFormat.HOUR1_FIELD)

for hour in range(1,24):
    hourdate = datetime.combine(date.today(), time(<b>hour</b>))
    timeString = timeFormatter.format(hourdate, hourFP)
    (start, end) = (hourFP.getBeginIndex(), hourFP.getEndIndex())
    <b>hourString</b> = str(timeString[start:end])
</pre>
<p>Again.. UGLY!</p>
<p>So my first thought was to combine the last 4 lines into a single function, so that I could just say</p>
<pre class="code">
for <b>hour</b> in range(1,24):
    <b>hourString</b> = GetHourString(<b>hour</b>, ...)
</pre>
<p>But the problem here is that GetHourString() needs context from the initialization. So it would look something like:</p>
<pre class="code">
# initialization...
timeFormatter = PyICU.DateFormat.createTimeInstance()
hourFP = PyICU.FieldPosition(PyICU.DateFormat.HOUR1_FIELD)

for <b>hour</b> in range(1,24):
    <b>hourString</b> = GetHourString(<b>hour</b>, timeFormatter, hourFP)

    # do other things with hour and hourString...
</pre>
<p>What if there were a way to keep the loop simple without the initialization, keep GetHourString() simple without the extra parameters, and still get the benefit of doing the initialization only once, outside the loop?</p>
<p>Enter: Generators</p>
<p>Instead of doing the initialization before the loop, let&#8217;s hide this all in another function:</p>
<pre class="code">
from datetime import date, datetime, time
from PyICU import DateFormat, FieldPosition

def GetLocaleHourStrings(start, end):
    # one-time setup, hidden from the caller
    timeFormatter = DateFormat.createTimeInstance()
    hourFP = FieldPosition(DateFormat.HOUR1_FIELD)
    dummyDate = date.today()

    for <b>hour</b> in range(start, end):
        hourdate = datetime.combine(dummyDate, time(hour))
        timeString = timeFormatter.format(hourdate, hourFP)
        # use new names so the start/end parameters are not shadowed
        (first, last) = (hourFP.getBeginIndex(), hourFP.getEndIndex())
        <b>hourString</b> = str(timeString)[first:last]
        yield <b>hour, hourString</b>
</pre>
<p>Note that we do the initialization once, and then <i>yield</i> the hour and its string each time through the loop. Nice, but how do we use it?</p>
<pre class="code">
for <b>hour, hourString</b> in GetLocaleHourStrings(1, 24):
    # do other things with hour and hourString...
</pre>
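<p>To try the pattern without PyICU installed, here is a self-contained sketch that swaps in strftime from the standard library. The hour_strings name and the "%H" format are assumptions for this example, and the result is not locale-aware the way the PyICU version is:</p>
<pre class="code">
from datetime import date, datetime, time

def hour_strings(start, end, fmt="%H"):
    """Yield (hour, formatted hour) pairs, doing the setup only once."""
    # One-time setup: runs when the first pair is requested,
    # not once per hour.
    dummy_date = date.today()

    for hour in range(start, end):
        stamp = datetime.combine(dummy_date, time(hour))
        yield hour, stamp.strftime(fmt)

# The caller sees a plain loop with no initialization in sight.
pairs = []
for hour, hour_string in hour_strings(1, 24):
    pairs.append((hour, hour_string))
</pre>
<p>The setup code does not run until the loop asks for the first value, and the local variables stay alive between yields, which is what lets the generator stand in for the code that used to sit above the loop.</p>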
<p>Neat, huh?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flett.org/2005/06/29/using-generators-to-hide-loop-initialization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>cool python tricks</title>
		<link>http://www.flett.org/2005/04/26/cool-python-tricks/</link>
		<comments>http://www.flett.org/2005/04/26/cool-python-tricks/#comments</comments>
		<pubDate>Tue, 26 Apr 2005 19:38:09 +0000</pubDate>
		<dc:creator>alecf</dc:creator>
				<category><![CDATA[projects]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.flett.org/?p=42</guid>
		<description><![CDATA[Man I love Python. I came up with a neat trick yesterday that also couldn&#8217;t be done in any static language. Needless to say, I&#8217;m pretty pleased with myself. This trick isn&#8217;t slow or hard to understand, and actually makes a lot of my code very simple, and avoids a lot of boilerplate that I [...]]]></description>
			<content:encoded><![CDATA[<p>Man I love Python. I came up with a neat trick yesterday that also couldn&#8217;t be done in any static language. Needless to say, I&#8217;m pretty pleased with myself. This trick isn&#8217;t slow or hard to understand, and actually makes a lot of my code very simple, and avoids a lot of boilerplate that I would have had to write in another language.</p>
<p>I needed a way to give a base color to a class, and then have easy access to various tints of that color for painting different aspects of an object. The tints are based on HSV, not RGB, but all the callers need to deal with RGB.</p>
<p>The solution: wrap the property() descriptor with my own descriptor.<br />
<span id="more-42"></span><br />
I start by defining my class</p>
<pre class="code">
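# assuming the standard colorsys module (channels in the range 0.0-1.0)
from colorsys import rgb_to_hsv, hsv_to_rgb

class TintedColors(object):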
    def __init__(self, color):
        """
        color is an RGB triple
        """
        self.color = color
        self.hue = rgb_to_hsv(*color)[0]
</pre>
<p>Now I just need to define the property wrapper:</p>
<pre class="code">
    def tintedColor(saturation, value=1.0):
        def getColor(self):
            hsv = (self.hue, saturation, value)
            return hsv_to_rgb(*hsv)
        return property(getColor)
</pre>
<p>Finally, I can define all my tints as class attributes:</p>
<pre class="code">
    gradientLeft = tintedColor(0.4)
    gradientRight = tintedColor(0.2)
    outlineColor = tintedColor(0.5)
    textColor = tintedColor(0.67, 0.6)
</pre>
<p>What is so amazing about this is that because property() is called at runtime, and because class bodies are executed, not just declared, each class attribute such as TintedColors.gradientLeft is a descriptor generated by property(), and reading &#8220;f.gradientLeft&#8221; on an instance f invokes it. Each descriptor calls its own specialized version of getColor(), one per tint. The dynamic nature of this might make it look like it should be slower than a simple property() call, but it should be essentially as fast, because the specialized versions of getColor() are very simple closures that behave as though they were declared as:</p>
<pre class="code">
def getColor(self):
    hsv = (self.hue, 0.4, 1.0)
    return hsv_to_rgb(*hsv)
</pre>
<p>which is pretty darn fast.</p>
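<p>Putting the pieces together, here is a runnable sketch. It assumes the conversions come from the standard colorsys module (so channels run from 0.0 to 1.0), and it moves the wrapper to module level under the name tinted_color purely for readability:</p>
<pre class="code">
from colorsys import hsv_to_rgb, rgb_to_hsv

def tinted_color(saturation, value=1.0):
    """Build a read-only property for one tint of the base hue."""
    def get_color(self):
        # saturation and value are captured from the enclosing call.
        return hsv_to_rgb(self.hue, saturation, value)
    return property(get_color)

class TintedColors(object):
    def __init__(self, color):
        """color is an RGB triple of floats between 0.0 and 1.0"""
        self.color = color
        self.hue = rgb_to_hsv(*color)[0]

    # Each line runs tinted_color() once, while the class is built.
    gradientLeft = tinted_color(0.4)
    textColor = tinted_color(0.67, 0.6)

red = TintedColors((1.0, 0.0, 0.0))
left = red.gradientLeft   # an RGB triple, computed on access
descriptor = TintedColors.__dict__["gradientLeft"]   # the property itself
</pre>
<p>Reading the attribute on an instance gives back the RGB triple, while the class dictionary holds the property object that does the work.</p>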
<p>Let&#8217;s look at a similar option in C++:</p>
<pre class="code">
class TintedColors {
private:
    color mColor;
    float mHue;
&nbsp;
    // Build an RGB color from the stored hue and the given tint.
    color GetSaturatedColor(float saturation, float value = 1.0f) const {
        return hsv_to_rgb(make_hsv(mHue, saturation, value));
    }
&nbsp;
public:
    explicit TintedColors(const color&#038; c)
        : mColor(c), mHue(rgb_to_hsv(c).hue) {}
&nbsp;
    // The getters return by value because each color is a temporary.
    color GetGradientLeft() const {
        return GetSaturatedColor(0.4f);
    }
&nbsp;
    color GetGradientRight() const {
        return GetSaturatedColor(0.2f);
    }
&nbsp;
    color GetOutlineColor() const {
        return GetSaturatedColor(0.5f);
    }
&nbsp;
    color GetTextColor() const {
        return GetSaturatedColor(0.67f, 0.6f);
    }
};
</pre>
<p>There we go. I&#8217;ve tried to follow &#8220;good C++&#8221; guidelines with respect to consts, references, and so forth. And there we have the same pattern in C++ but with way more work: the 4 boilerplate getters meant I had to type GetSaturatedColor() 4 times, and the verbosity makes it easy to lose what&#8217;s really different between these functions &#8211; the color saturation and value.</p>
<p>And that&#8217;s all assuming that I&#8217;ve got a &#8220;color&#8221; type (which I&#8217;d have to define separately &#8211; Python just uses a built-in tuple for a small value like that) and that hsv_to_rgb is part of that color system. That may seem like a small assumption, and hsv_to_rgb may not be part of the Python language itself, but it ships in the standard colorsys module, and there are so many handy routines built in to Python that you can&#8217;t just write that off.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.flett.org/2005/04/26/cool-python-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
