<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pea blog &#187; Uncategorized</title>
	<atom:link href="http://pea.somemilk.org/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://pea.somemilk.org</link>
	<description>Just another Somemilk.org Blogs weblog</description>
	<lastBuildDate>Sun, 12 Apr 2009 10:42:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Pinyin Tones Transformation</title>
		<link>http://pea.somemilk.org/2009/04/10/pinyin-tones-transformation/</link>
		<comments>http://pea.somemilk.org/2009/04/10/pinyin-tones-transformation/#comments</comments>
		<pubDate>Fri, 10 Apr 2009 07:25:31 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mandarin]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[pinyin]]></category>
		<category><![CDATA[plugins]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://pea.somemilk.org/?p=76</guid>
		<description><![CDATA[Well, this one is for those of you learning Mandarin. Sometimes you need to type a phrase in pinyin with tone marks, because numeric notation (Wo3 shi4 Mei3guo2 ren2) looks ugly (Wǒ shì Měiguó rén looks much better). I&#8217;ve just made a WordPress plugin for this. Here is the plugin homepage, but [...]]]></description>
			<content:encoded><![CDATA[<p>Well, this one is for those of you learning Mandarin. Sometimes you need to type a phrase in pinyin with tone marks, because numeric notation (Wo3 shi4 Mei3guo2 ren2) looks ugly (Wǒ shì Měiguó rén looks much better). I&#8217;ve just made a WordPress plugin for this. Here is the <a href="http://somemilk.org/pinyin-tones-plugin/">plugin homepage</a>. If you want to know how it works, or you just need code that doesn&#8217;t depend on WordPress, read the rest of this entry.<br />
<span id="more-76"></span><br />
The code is very simple because the rules are very simple:</p>
<ol>
<li>We split the text into syllables. Pinyin spelling is designed to avoid ambiguity: an apostrophe must separate syllables wherever the boundary could be misread, so &#8220;Tian&#8217;anmen&#8221; can only be read as &#8220;Tian an men&#8221;, not &#8220;Ti an an men&#8221;. With tone digits it&#8217;s even simpler. In &#8220;Tian1an1men2&#8221; every digit closes a syllable, so no ambiguity is possible unless a neutral-tone (unmarked) syllable sits in the middle of a word, and I can&#8217;t think of an example. In that case you can use an apostrophe as well.</li>
<li>The tone mark goes on one of the syllable&#8217;s vowels. Which one?
<ol>
<li>If there is &#8220;a&#8221; or &#8220;e&#8221;, it takes the mark</li>
<li>If there is &#8220;ou&#8221;, then &#8220;o&#8221; takes the mark</li>
<li>In all other cases, the last vowel takes it.</li>
</ol>
</li>
<li>The letter &#252; is typed as &#8220;v&#8221;, because v is the only Latin letter that pinyin doesn&#8217;t use.</li>
</ol>
<p>OK, now the code. It transforms everything inside a [pinyin][/pinyin] block into nice-looking pinyin with tone marks.</p>
<pre class="brush: php;">
function transform_pinyin_tones($content)
{
    // Find every [pinyin]...[/pinyin] block (U = ungreedy, s = "." also matches newlines)
    if(!preg_match_all('`\[pinyin\](.*)\[/pinyin\]`Uis', $content, $r)) return $content;

    // Decimal code points of the tone-marked vowels, output as HTML entities.
    // Only lowercase vowels are listed, so a capital vowel comes out lowercase.
    $tones = array(
    'a1' =&gt; '257',
    'a2' =&gt; '225',
    'a3' =&gt; '462',
    'a4' =&gt; '224',
    'e1' =&gt; '275',
    'e2' =&gt; '233',
    'e3' =&gt; '283',
    'e4' =&gt; '232',
    'i1' =&gt; '299',
    'i2' =&gt; '237',
    'i3' =&gt; '464',
    'i4' =&gt; '236',
    'o1' =&gt; '333',
    'o2' =&gt; '243',
    'o3' =&gt; '466',
    'o4' =&gt; '242',
    'u1' =&gt; '363',
    'u2' =&gt; '250',
    'u3' =&gt; '468',
    'u4' =&gt; '249',
    'v1' =&gt; '470',
    'v2' =&gt; '472',
    'v3' =&gt; '474',
    'v4' =&gt; '476'
    );

    $vowels = array('a', 'e', 'i', 'o', 'u', 'v');

    foreach($r[0] as $i =&gt; $match)
    {
        $digital = $r[1][$i];
        $diacritic = $digital;
        // A syllable is 1-6 letters followed by a tone digit 1-4
        preg_match_all('`([a-z]{1,6})([1-4])`i', $digital, $syllables);
        foreach($syllables[0] as $k =&gt; $syllable)
        {
            $s = $syllables[1][$k];
            $t = $syllables[2][$k];
            if(preg_match('`(a|e)`i', $s, $r2))
            {
                // Rule 1: "a" or "e" takes the mark
                $s = preg_replace('`'.$r2[1].'`', '&amp;#'.$tones[strtolower($r2[1]).$t].';', $s, 1);
            }
            elseif(preg_match('`ou`i', $s))
            {
                // Rule 2: in "ou", the "o" takes the mark
                $s = preg_replace('`ou`i', '&amp;#'.$tones['o'.$t].';u', $s, 1);
            }
            else
            {
                // Rule 3: otherwise the last vowel takes it (position 0 included)
                for($j = strlen($s) - 1; $j &gt;= 0; $j--)
                {
                    $v = strtolower($s[$j]);
                    if(in_array($v, $vowels))
                    {
                        $s = substr_replace($s, '&amp;#'.$tones[$v.$t].';', $j, 1);
                        break;
                    }
                }
            }
            // A "v" that did not get the tone mark (as in lve4) is still a plain &#252;
            $s = str_ireplace('v', '&amp;#252;', $s);

            $diacritic = str_replace($syllable, $s, $diacritic);
        }

        // Replace the whole block, [pinyin] tags included
        $content = str_replace($match, $diacritic, $content);
    }
    return $content;
}
</pre>
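<p>For comparison, here is a minimal Python 3 sketch of the same three rules. It is not the plugin&#8217;s code. Instead of an entity table, it appends a Unicode combining mark after the chosen vowel and lets NFC normalization compose the single precomposed letter, which also keeps capital vowels capital. The names pinyin_tones and mark_syllable are made up for this example. Like the PHP version, it assumes a syllable is at most six letters followed by a tone digit.</p>
<pre class="brush: python;">
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave
TONE_MARKS = {'1': '\u0304', '2': '\u0301', '3': '\u030c', '4': '\u0300'}
VOWELS = 'aeiouv'
SYLLABLE = re.compile(r'([a-zA-Z]{1,6})([1-4])')


def mark_syllable(letters, tone):
    # Choose the vowel that takes the mark, following the three rules above
    lower = letters.lower()
    if 'a' in lower:
        pos = lower.index('a')
    elif 'e' in lower:
        pos = lower.index('e')
    elif 'ou' in lower:
        pos = lower.index('ou')                      # the o of ou
    else:
        pos = max(lower.rfind(v) for v in VOWELS)   # last vowel, -1 if none
    if pos == -1:
        return letters + tone                        # no vowel: leave it alone
    marked = letters[:pos + 1] + TONE_MARKS[tone] + letters[pos + 1:]
    # v stands for u-umlaut: u followed by a combining diaeresis (U+0308)
    marked = marked.replace('v', 'u\u0308').replace('V', 'U\u0308')
    # NFC composes base letter + combining marks into single characters
    return unicodedata.normalize('NFC', marked)


def pinyin_tones(text):
    # Replace every letters+digit syllable, leaving everything else untouched
    return SYLLABLE.sub(lambda m: mark_syllable(m.group(1), m.group(2)), text)
</pre>
<p>For example, pinyin_tones('Wo3 shi4 Mei3guo2 ren2') returns &#8220;Wǒ shì Měiguó rén&#8221;, and pinyin_tones('lve4') gives &#8220;lüè&#8221;.</p>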
]]></content:encoded>
			<wfw:commentRss>http://pea.somemilk.org/2009/04/10/pinyin-tones-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yahoo Search Marketing: Python SOAP binding</title>
		<link>http://pea.somemilk.org/2009/04/05/yahoo-search-marketing-python-soap-binding/</link>
		<comments>http://pea.somemilk.org/2009/04/05/yahoo-search-marketing-python-soap-binding/#comments</comments>
		<pubDate>Sun, 05 Apr 2009 18:42:13 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[search marketing]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://pea.somemilk.org/?p=33</guid>
		<description><![CDATA[Recently, I&#8217;ve been trying to make the Yahoo Search Marketing API work with Python. I started with SOAPpy, because it is used with the AdWords API and so seemed fine to me. But the very first API call using SOAPpy (getCampaignsByAccountID) failed with the message &#8220;Account ID specified in the header does not match the one specified in [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I&#8217;ve been trying to make the Yahoo Search Marketing API work with Python. I started with <a href="http://pywebsvcs.sourceforge.net/">SOAPpy</a>, because it is used with the AdWords API and so seemed fine to me. But the very first API call using SOAPpy (getCampaignsByAccountID) failed with the message &#8220;Account ID specified in the header does not match the one specified in the parameter.&#8221;, although both IDs were correct. The Yahoo team refused to give any support on this. Long story short, I found out that the YWS API really does care about the order of the parameters.<br />
<span id="more-33"></span><br />
So if you pass the parameters like this:</p>
<pre class="brush: xml;">
&lt;ns1:accountID&gt;1234567890&lt;/ns1:accountID&gt;
&lt;ns1:includeDeleted&gt;false&lt;/ns1:includeDeleted&gt;
</pre>
<p>the call succeeds, but if you pass them like this:</p>
<pre class="brush: xml;">
&lt;ns1:includeDeleted&gt;false&lt;/ns1:includeDeleted&gt;
&lt;ns1:accountID&gt;1234567890&lt;/ns1:accountID&gt;
</pre>
<p>it fails. That doesn&#8217;t look reasonable (why name the parameters at all, for God&#8217;s sake!), but it makes SOAPpy unusable in its present form, and here&#8217;s why. In SOAPpy you pass parameters to a remote method the natural Python way, as keyword arguments, which end up in **kwargs. But **kwargs is a dictionary, and a Python 2 dictionary doesn&#8217;t preserve insertion order, so you can&#8217;t be sure in which order the parameters will appear in the final request. To be sure, you have to pass the parameters as an ordered sequence, for example a list of (name, value) tuples.</p>
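<p>Here is a minimal Python 3 sketch of that idea. It is not the code used in my project. It builds the method element from a list of (name, value) pairs, so the XML follows the list order rather than the order of a dictionary. It uses the standard xml.etree.ElementTree module. The helper name build_method and the namespace URI are placeholders for illustration, not Yahoo&#8217;s real values. Since Python 3.6 **kwargs, and since 3.7 plain dicts, keep insertion order, but an explicit list still documents that the order matters.</p>
<pre class="brush: python;">
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only, not the real service namespace
NS = 'http://example.com/ews'


def build_method(method, params):
    # params: list of (name, value) pairs; child elements keep exactly this order
    root = ET.Element('{%s}%s' % (NS, method))
    for name, value in params:
        if isinstance(value, bool):
            value = 'true' if value else 'false'   # xsd:boolean literals
        ET.SubElement(root, '{%s}%s' % (NS, name)).text = str(value)
    return root


body = build_method('getCampaignsByAccountID',
                    [('accountID', '1234567890'), ('includeDeleted', False)])
print(ET.tostring(body, encoding='unicode'))
</pre>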
<p>By the time I realized why the calls were failing, my test code had grown into something semi-complete, so I left it that way, and now I&#8217;m using it in my project. It&#8217;s not perfect, but I&#8217;m happy with it for now because it works and doesn&#8217;t require my attention. Here&#8217;s the code (including a helper that parses the XML SOAP response):</p>
<pre class="brush: python;">
import re
from datetime import datetime
from urllib2 import urlopen, Request, HTTPError, URLError
import xml.sax.handler

&quot;&quot;&quot;
    Yahoo Enterprise Web Services implementation, V5

    No SOAP bindings: SOAPpy is old and poorly documented, and other bindings
    don't work out of the box.

&quot;&quot;&quot;

def xml2obj(src):
    &quot;&quot;&quot;
    A simple function that converts XML data into a native Python object.
    &quot;&quot;&quot;

    non_id_char = re.compile('[^_0-9a-zA-Z]')
    def _name_mangle(name):
        return non_id_char.sub('_', name)

    class DataNode(object):
        def __init__(self):
            self._attrs = {}    # XML attributes and child elements
            self.data = None    # child text data
        def __len__(self):
            # treat single element as a list of 1
            return 1
        def __getitem__(self, key):
            if isinstance(key, basestring):
                return self._attrs.get(key,None)
            else:
                return [self][key]
        def __contains__(self, name):
            return self._attrs.has_key(name)
        def __nonzero__(self):
            return bool(self._attrs or self.data)
        def __getattr__(self, name):
            if name.startswith('__'):
                # names starting with '__' (special methods) must raise AttributeError, not return None
                raise AttributeError(name)
            return self._attrs.get(name,None)
        def _add_xml_attr(self, name, value):
            if name == 'xsi_nil' and value == 'true':
                self.data = None
                return
            if name in self._attrs:
                # multiple attributes of the same name are represented by a list
                children = self._attrs[name]
                if not isinstance(children, list):
                    children = [children]
                    self._attrs[name] = children
                children.append(value)
            else:
                self._attrs[name] = value
        def __str__(self):
            return self.data or ''
        def __repr__(self):
            items = sorted(self._attrs.items())
            if self.data:
                items.append(('data', self.data))
            return u'{%s}' % ', '.join([u'%s:%s' % (k,repr(v)) for k,v in items])

    class TreeBuilder(xml.sax.handler.ContentHandler):
        def __init__(self):
            self.stack = []
            self.root = DataNode()
            self.current = self.root
            self.text_parts = []
        def startElement(self, name, attrs):
            self.stack.append((self.current, self.text_parts))
            self.current = DataNode()
            self.text_parts = []
            # xml attributes --&gt; python attributes
            for k, v in attrs.items():
                self.current._add_xml_attr(_name_mangle(k), v)
        def endElement(self, name):
            text = ''.join(self.text_parts).strip()
            if text:
                self.current.data = text
            if self.current._attrs:
                obj = self.current
            else:
                # a text only node is simply represented by the string
                obj = text or ''
            self.current, self.text_parts = self.stack.pop()
            self.current._add_xml_attr(_name_mangle(name), obj)
        def characters(self, content):
            self.text_parts.append(content)

    builder = TreeBuilder()
    if isinstance(src,basestring):
        xml.sax.parseString(src, builder)
    else:
        xml.sax.parse(src, builder)
    return builder.root._attrs.values()[0]

class YahooEWS(object):

    VERSION = 'V5'
    LOCATION_ENDPOINT = 'https://marketing.ews.yahooapis.com/services'
    SERVICE_URL = None

    SERVICES = {
        'LocationService': ['getMasterAccountLocation'],
        'CampaignService': ['addCampaign', 'addCampaigns', 'deleteCampaign', 'deleteCampaigns', 'deleteGeographicLocationFromCampaign', 'getCampaign', 'getCampaignAdGroupCount', 'getCampaignKeywordCount', 'getCampaigns', 'getCampaignsByAccountID', 'getCampaignsByAccountIDByCampaignStatus', 'getGeographicLocationForCampaign', 'getMinBidForCampaignOptimizationGuidelines', 'getOptimizationGuidelinesForCampaign', 'getStatusForCampaign', 'getTargetingForCampaign', 'setCampaignOptimizationON', 'setGeographicLocationForCampaign', 'setOptimizationGuidelinesForCampaign', 'setTargetingForCampaign', 'updateCampaign', 'updateCampaigns', 'updateStatusForCampaign', 'updateStatusForCampaigns'],
        'AccountService': ['addAccount', 'addMoney', 'deleteBlockedDomainListForAccount', 'deleteContinentBlockListFromAccount', 'getAccount', 'getAccountBalance', 'getAccounts', 'getAccountStatus', 'getActiveCreditCard', 'getBlockedDomainListForAccount', 'getChargeAmount', 'getContinentBlockListForAccount', 'setActiveCreditCard', 'setBlockedDomainListForAccount', 'setChargeAmount', 'setContinentBlockListForAccount', 'updateAccount', 'updateStatusForAccount'],
        'AdGroupService': ['addAdGroup', 'addAdGroups', 'deleteAdGroup', 'deleteAdGroups', 'getAdGroup', 'getAdGroupAdCount', 'getAdGroupContentMatchMaxBid', 'getAdGroupExcludedWordsCount', 'getAdGroupKeywordCount', 'getAdGroups', 'getAdGroupsByCampaignID', 'getAdGroupsByCampaignIDByStatus', 'getAdGroupSponsoredSearchMaxBid', 'getContentMatchMinBidForAdGroupOptimizationGuidelines', 'getOptimizationGuidelinesForAdGroup', 'getSponsoredSearchMinBidForAdGroup', 'getSponsoredSearchMinBidForAdGroupOptimizationGuidelines', 'getSponsoredSearchMinBidForAdGroups', 'getStatusForAdGroup', 'moveAdGroup', 'setAdGroupContentMatchMaxBid', 'setAdGroupSponsoredSearchMaxBid', 'setOptimizationGuidelinesForAdGroup', 'updateAdGroup', 'updateAdGroups', 'updateStatusForAdGroup', 'updateStatusForAdGroups'],
        'AdService': ['addAd', 'addAds', 'deleteAd', 'deleteAds', 'getAd', 'getAds', 'getAdsByAdGroupByParticipatesInMarketplace', 'getAdsByAdGroupID', 'getAdsByAdGroupIDByEditorialStatus', 'getAdsByAdGroupIDByStatus', 'getEditorialReasonsForAd', 'getEditorialReasonText', 'getReasonsForAdNotParticipatingInMarketplace', 'getStatusForAd', 'getUpdateForAd', 'setAdUrl', 'updateAd', 'updateAds', 'updateStatusForAd', 'updateStatusForAds'],
        'KeywordService': ['addKeyword', 'addKeywords', 'copyKeyword', 'deleteKeyword', 'deleteKeywords', 'getEditorialReasonsForKeyword', 'getEditorialReasonText', 'getKeyword', 'getKeywords', 'getKeywordsByAccountID', 'getKeywordsByAdGroupByParticipatesInMarketplace', 'getKeywordsByAdGroupBySponsoredSearchBidStatus', 'getKeywordsByAdGroupID', 'getKeywordsByAdGroupIDByEditorialStatus', 'getKeywordsByAdGroupIDByStatus', 'getKeywordSponsoredSearchMaxBid', 'getOptimizationGuidelinesForKeyword', 'getReasonsForKeywordNotParticipatingInMarketplace', 'getSponsoredSearchMinBidForKeywordOptimizationGuidelines', 'getSponsoredSearchMinBidForKeywordString', 'getSponsoredSearchMinBidForKeywordStrings', 'getSponsoredSearchMinBidUpdatesByAdGroupId', 'getStatusForKeyword', 'getUpdateForKeyword', 'moveKeyword', 'setKeywordUrl', 'setOptimizationGuidelinesForKeyword', 'updateKeyword', 'updateKeywords', 'updateSponsoredSearchMaxBidForKeyword', 'updateSponsoredSearchMaxBidForKeywords', 'updateStatusForKeyword', 'updateStatusForKeywords'],
    }

    def __init__(self, options):
        self.options = options

    def _service_url(self):
        if self.SERVICE_URL is not None:
            return self.SERVICE_URL

        self.SERVICE_URL = self.call('getMasterAccountLocation')
        return self.SERVICE_URL

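    # NOTE: the XML markup in the strings below was lost when this post was
    # published and has been reconstructed; the ns1 namespace URI in particular
    # is an assumption, check it against the service WSDL.
    ENVELOPE_NS = 'http://schemas.xmlsoap.org/soap/envelope/'
    SERVICE_NS = 'http://marketing.ews.yahooapis.com/%s'

    def _soap_header(self):
        # every option (username, password, license, ...) becomes a header element
        res = '&lt;soap:Header&gt;'
        for k in self.options.keys():
            res += '&lt;ns1:%s&gt;%s&lt;/ns1:%s&gt;' % (k, self.options[k], k)
        res += '&lt;/soap:Header&gt;'
        return res

    def call(self, function, args=(), service=None):
        &quot;&quot;&quot;
        Call a remote method. args is a sequence of (name, value) tuples,
        sent in exactly this order because the service depends on it.
        &quot;&quot;&quot;
        if service is None:
            service = self._service_lookup(function)

        post_data = '&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;\r\n'
        post_data += '&lt;soap:Envelope xmlns:soap=&quot;%s&quot; xmlns:ns1=&quot;%s&quot;&gt;' % (
            self.ENVELOPE_NS, self.SERVICE_NS % self.VERSION)
        post_data += self._soap_header()
        post_data += '&lt;soap:Body&gt;&lt;ns1:%s' % function
        if not args:
            post_data += '/&gt;'
        else:
            post_data += '&gt;'
            for name, value in args:
                # Python booleans -&gt; xsd:boolean literals
                if value is False:
                    value = 'false'
                elif value is True:
                    value = 'true'
                post_data += '&lt;ns1:%s&gt;%s&lt;/ns1:%s&gt;' % (name, value, name)
            post_data += '&lt;/ns1:%s&gt;' % function

        post_data += '&lt;/soap:Body&gt;&lt;/soap:Envelope&gt;'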
        if service == 'LocationService':
            request_url = '%s/%s/LocationService' % (self.LOCATION_ENDPOINT, self.VERSION)
        else:
            request_url = '%s/%s/%s' % (self._service_url(), self.VERSION, service)

        req = Request(request_url, post_data)
        req.add_header('Content-type', 'text/xml; charset=utf-8')
        req.add_header('User-Agent', 'YWS SOAP')
        req.add_header('SOAPAction', '&quot;&quot;')
        try:
            r = urlopen(req).read()
        except HTTPError, e:
            raise Exception, xml2obj(e.read())['soap_Body']['soap_Fault']['faultstring']
        except URLError, e:
            raise Exception, 'Error: %s' % e.reason

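        res = xml2obj(r)['soap_Body']['%sResponse' % function]['out']
        # a text-only element comes back from xml2obj as a plain string
        if isinstance(res, basestring):
            return res
        if res.data is not None:
            return res.data

        return res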

    def _service_lookup(self, function):
        for service in self.SERVICES.keys():
            if function in self.SERVICES[service]:
                return service
        raise Exception, 'Unknown function, please use service argument to call()'
</pre>
<p>Usage example:</p>
<pre class="brush: python;">
import sys
from misc.yahoo_ews import YahooEWS
options = {
    'username': 'user',
    'password': 'password',
    'masterAccountID': 'mymasteraccountid',
    'license': 'mylicensehere',
    'accountID': 'myaccountidhere'
}
yahoo = YahooEWS(options)

try:
    campaigns = yahoo.call(
        'getCampaignsByAccountID',
        [
         ('accountID', options['accountID']),
         ('includeDeleted', False)
        ])
    for c in campaigns.Campaign:
        print 'Campaign id: %s, name: %s' % (c.ID, c.name)
except:
    print sys.exc_info()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://pea.somemilk.org/2009/04/05/yahoo-search-marketing-python-soap-binding/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Text generation with Markov chains</title>
		<link>http://pea.somemilk.org/2008/12/01/text-generation-with-markov-chains/</link>
		<comments>http://pea.somemilk.org/2008/12/01/text-generation-with-markov-chains/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 14:57:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[markov chains]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text generation]]></category>

		<guid isPermaLink="false">http://pea.somemilk.org/?p=27</guid>
		<description><![CDATA[Spammers can use Markov chains (a somewhat outdated approach) to generate random text from real text. It isn&#8217;t that useful for spammers, though; I actually did it for the lulz. Here are some examples, followed by the code.

Chain length 3 (each word depends on the 2 previous words):
Mich, see how I dash about! For a while [...]]]></description>
			<content:encoded><![CDATA[<p>Markov chains can be used by spammers (a bit outdated approach) to generate random texts from real ones. Not so useful for spammers though, so I did it actually for lulz. Here are some examples and some code follows.<br />
<span id="more-27"></span></p>
<p>Chain length 3 (each word depends on the 2 previous words):</p>
<blockquote><p>Mich, see how I dash about! For a while he had fixed them. He supervised pin the tail on the street below, outside this warm two-storey house, like so many serpentines thrown before her. Now, the parents. With laughing reluctance they slid down a slide which Mich had said eight years ago. He almost bent to take hold of Marion now, in her skull mask. He felt an inexplicable rush of pity for her, because she had watched her own party, like so many grey mice the children, like so many serpentines thrown before her. Now, the house was empty and silent in the autumn leaves before his granmother&#8217;s house many years ago and heard the bathroom door unlatch and she had produced a child in her own party, like so many grey mice the children, like so many leaves. By the sound of the child as an idea, completely asexual, an immaculate conception of contemptuous mind and cell. As a firm rebuke to him she had hurt him. It was quiet. Louise cried out, &#8216;Marion, Marion!&#8217; &#8216;Turn on the lights,&#8217; said one of</p></blockquote>
<p>Chain length 2 (each word depends only on the previous one):</p>
<blockquote><p>&#8216;Hush, you&#8217;ll spoil everything,&#8217; some girl said. The room I&#8217;m in the house, four times around the rest, amid fits of the items. &#8216;Marion?&#8217; asked Louise. &#8216;Are you afraid?&#8217; Marion appeared in her disguise. &#8216;How do in winter he suspected that somehow he came after, smiling, &#8216;I can&#8217;t find her.&#8217; Then &#8230;&#8230; some idiot turned on was her skull sockets small blue eyes and says, &#8220;This is dead,&#8217; intoned the suddenly frozen task of table-section, into the last day he was a dark forcing murderer. It would be over! There was a child cried, &#8216;I&#8217;ll check the wild explosive in the furnace. The room he as he thought. With laughing down and picked her not a boy. &#8216;Come back, Helen!&#8217; Shot from her, all the husband. &#8216;She&#8217;s not wear this game with white bone masks and says, &#8220;These are you invited people in, but she was talking. &#8216;Marion?&#8217; asked Louise as much as he suspected that and the top it, on the bottom of October, with the last coming slowly had forced child. But when he came down. By the open. But when</p></blockquote>
<p>As you can see, the longer the chain, the longer the pieces of the original text that show up verbatim in the result.</p>
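<p>To make the &#8220;chain length&#8221; idea concrete, here is a small Python 3 sketch of the technique. It is not a translation of the PHP class below. For chain length k, it maps every run of k-1 consecutive tokens to the list of tokens that followed that run in the source. It then walks the table, picking a random follower each time. The function names and the restart-on-dead-end policy are illustrative choices.</p>
<pre class="brush: python;">
import random
from collections import defaultdict


def build_table(tokens, k):
    # map each run of k-1 tokens to the list of tokens that followed it
    table = defaultdict(list)
    for i in range(len(tokens) - k + 1):
        table[tuple(tokens[i:i + k - 1])].append(tokens[i + k - 1])
    return table


def generate(table, n, rng=random):
    # emit n tokens, starting from a random context
    context = rng.choice(list(table))
    out = list(context)
    for _ in range(n - len(out)):
        if context not in table:
            # dead end: this context only occurs at the very end of the source
            context = rng.choice(list(table))
        token = rng.choice(table[context])
        out.append(token)
        context = (context + (token,))[1:]   # slide the window by one token
    return out[:n]
</pre>
<p>For example, ' '.join(generate(build_table(text.split(), 3), 100)) produces 100 words with chain length 3. ''.join(generate(build_table(list(text), 4), 500)) works on characters instead, as in the character example.</p>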
<p>The same approach can be applied to characters as well as to words:</p>
<blockquote><p>ceineatrs, inghs -s nowad Bushin acllaplkin long. id, worshe plkeng. cereil won hie thenorildiloorie ganing. ang I&#8217; qut cthid Mar Moutorkerskise to theyotidinerkesk heseas smoullin blelo be t h heat eayis edin f tof ve uid h h. ist thes iso tuthe ofut ll, bume be cid this cesus congherid owintr apourowit t t Itid Ane cter alt Itowd glab thersthowaris ouedit, cs. acankngosefen the rors arsupis hatadeshead ce w, tlly pr hrit n&#8217;I siouldanots ithedeailoura d tce!&#8217;Nobe&#8217;Evesa hinoow-the Agalthescke&#8221;. pesed tabowippsched te!&#8221; add. ad t, agare. se Lower, ppeacr or m t s toucathe war. me rontar tongsen thelind &#8216;Qurofamp beliofe hoys he t h t. By. Loysey nehthton ake ad, othe he If s Anok &#8216; bontath vedn dll ande m onoond on wioused ghelen sctut adaiowlllwe chanind hrnsil Thown. s obirknd ine teisiloug hind spthalarrchiron praye s. chas che asanaumm juruse wa opig. ar. ght che alinene faldie, Ee nyoind he. dle &#8216;Mall w, s. apirout t wis I&#8217; chikie m t mig ha</p></blockquote>
<p>And here&#8217;s the code. PHP first.</p>
<pre class="brush: php;">
&lt;?php
    /*
     * Markov chain text generator
     * (c) Andrey, somemilk.org@gmail.com, 2006
     *
     * Typical usage:
     * $m = new Markov(2); // 2 - markov chain length, depends on source
     *                     // file size, 2 for small files, 3-4 for large ones
     * $m-&gt;initFromFile(&quot;somefile.txt&quot;) or die(&quot;shit happens.&quot;);
     * $bullshit = $m-&gt;generate(2000); // generates bullshit 2000 chars long.
     * $m-&gt;initFromString(file_get_contents(&quot;somefile.txt&quot;)) or die(&quot;shit happens.&quot;);
     * $bullshit = $m-&gt;generate(100, MARKOV_OPT_WORDS); // generates bullshit 100 words long.
     */

    define(&quot;MARKOV_OPT_CHARACTERS&quot;, 1); // option: character limit for generate()
    define(&quot;MARKOV_OPT_WORDS&quot;,      2); // option: word limit for generate()
    define(&quot;MARKOV_DEFAULT_K&quot;,      3); // default Markov chain length
    define(&quot;MARKOV_TIME_LIMIT&quot;,     120); // time limit for initFromFile()

    define(&quot;MARKOV_MAX_RECURSION_LEVEL&quot;,     20);

    class Markov
    {
        var $k = MARKOV_DEFAULT_K;
        var $k_sets = array();
        var $split_method = MARKOV_OPT_WORDS;

        function __construct($k, $split_method = MARKOV_OPT_WORDS)
        {
            $this-&gt;k = $k;
            $this-&gt;split_method = $split_method;
        }

        /*
         * inits class' Markov k-sets with a text from a text file.
         *
         * returns true on success, false on failure
         */
        function initFromFile($filename)
        {
            set_time_limit(MARKOV_TIME_LIMIT);

            if(!is_readable($filename)) return false;

            $fc = file_get_contents($filename);
            if($fc === false) return false;

            return $this-&gt;initFromString($fc);
        }

        /*
         * inits class' Markov k-sets with a text from a string.
         *
         * returns true on success, false on failure
         */
        function initFromString($str)
        {
            if(strlen($str) &lt; $this-&gt;k) return false;
            $this-&gt;k_sets = array();
            $set = array();

            if($this-&gt;split_method == MARKOV_OPT_WORDS)
            {
                $words = preg_split('/\s+/', trim($str));
                if(count($words) &lt; $this-&gt;k) return false;
                foreach($words as $w)
                {
                    $this-&gt;_addToSets($set, $w);
                }
            }
            else
            {
                for($i=0; $i &lt; strlen($str); $i++)
                {
                    $this-&gt;_addToSets($set, substr($str, $i, 1));
                }
            }

            return true;
        }

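        /*
         * appends $w to the sliding window $set; once the window holds k items,
         * records k_sets[w1][w2]...[w(k-1)][] = wk and slides the window by one
         */
        function _addToSets(&amp;$set, $w)
        {
            $set[] = $w;
            if(count($set) == $this-&gt;k)
            {
                // walk (and create) the nested arrays keyed by the first k-1 items
                $node =&amp; $this-&gt;k_sets;
                for($i=0; $i &lt; $this-&gt;k - 1; $i++)
                {
                    $node =&amp; $node[$set[$i]];
                }
                // the last item is a possible successor of that (k-1)-item prefix
                $node[] = $set[$this-&gt;k - 1];
                unset($node);
                array_shift($set);
            }
        }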

        /*
         * you must re-init the class after calling setK()
         */
        function setK($k) { $this-&gt;k = $k; $this-&gt;k_sets = array(); }

        /*
         * generates a random word $length characters long
         * (available only if split_method is MARKOV_OPT_CHARACTERS)
         */
        function getWord($length)
        {
            if($this-&gt;split_method != MARKOV_OPT_CHARACTERS) return false;
            $res = &quot;&quot;;
            $set = array();
            while(strlen($res) &lt; $length)
            {
                if(count($set) == $this-&gt;k)
                {
                    $word = array_shift($set);
                    if(preg_match('/[\s\.,:\?!;&quot;]/', $word))
                    {
                        $res = &quot;&quot;;
                        continue;
                    }

                    $res .= $word;
                }

                if(strlen($res) == $length) break;

                if(count($set)) $element =&amp; $this-&gt;k_sets[$set[0]];
                else $element = null;
                foreach($set as $i =&gt; $word)
                {
                    if(isset($element[$word])) $element =&amp; $element[$word];
                }
                if(is_array($element))
                {
                    $word = $this-&gt;random_key($element);
                    if(is_array($element[$word])) $set[] = $word;
                    else $set[] = $this-&gt;random_value($element);
                }
                else $set[] = $this-&gt;random_key($this-&gt;k_sets);

            }
            return $res;
        }

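        /*
         * generates a Markov string roughly $how_much long,
         * counted in words (if $how == MARKOV_OPT_WORDS)
         * or in characters (if $how == MARKOV_OPT_CHARACTERS, the default);
         * with $sentences = true, the output starts after the first full stop
         * and is trimmed to end at the last one
         */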
        function generate($how_much, $how=MARKOV_OPT_CHARACTERS, $sentences=false, $recursion_level=0)
        {
            $res = &quot;&quot;;
            $n_words = 0;
            $set = array();
            $sentence_started = false;
            while(($how == MARKOV_OPT_CHARACTERS &amp;&amp; (strlen($res) &lt; $how_much)) ||
                  ($how == MARKOV_OPT_WORDS &amp;&amp; ($n_words &lt; $how_much)))
            {
                if(count($set) == $this-&gt;k)
                {
                    $res .= ($word = array_shift($set));
                    if($this-&gt;split_method == MARKOV_OPT_WORDS)
                    {
                        $res .= &quot; &quot;;
                        $n_words++;
                    }
                    elseif(preg_match('/[\s\.,:\?!;&quot;]/', $word))
                    {
                        $n_words++;
                    }
                    if($sentences &amp;&amp; !$sentence_started &amp;&amp; preg_match('/\.$/', $word))
                    {
                        $sentence_started = true;
                        $res = &quot;&quot;;
                        $n_words = 0;
                    }
                }

                if(count($set)) $element =&amp; $this-&gt;k_sets[$set[0]];
                else $element = null;
                foreach($set as $i =&gt; $word)
                {
                    if(isset($element[$word])) $element =&amp; $element[$word];
                }
                if(is_array($element))
                {
                    $word = $this-&gt;random_key($element);
                    if(is_array($element[$word])) $set[] = $word;
                    else $set[] = $this-&gt;random_value($element);
                }
                else $set[] = $this-&gt;random_key($this-&gt;k_sets);

            }
            $res = rtrim($res);
            if($sentences)
            {
                $res = preg_replace('/\.[^\.]+$/', '.', $res);
                if(!preg_match('/\.$/', $res) &amp;&amp; ($recursion_level &lt; MARKOV_MAX_RECURSION_LEVEL)) return $this-&gt;generate($how_much, $how, $sentences, $recursion_level+1);
            }
            return $res;
        }

        /*
         * helper function, returns random array key
         */
        function random_key(&amp;$array)
        {
            $rand = mt_rand(0, count($array) - 1);
            $i = 0;
            foreach($array as $key =&gt; $value) if($i++ == $rand) return $key;
        }
        /*
         * helper function, returns random array value
         */
        function random_value(&amp;$array)
        {
            $rand = mt_rand(0, count($array) - 1);
            $i = 0;
            foreach($array as $key =&gt; $value) if($i++ == $rand) return $value;
        }
    }
?&gt;
</pre>
<p>Python code (not mine) for generating text. I use it in one of my projects to generate random nicknames.</p>
<pre class="brush: python;">
#!/usr/bin/env python

from collections import defaultdict
from random import choice

class TextGenerator(object):
    # First-order Markov chain: every word maps to the list of words that
    # followed it in the training text (duplicates are kept, so frequent
    # successors are picked more often).

    def __init__(self):
        self._data = defaultdict(list)

    def train(self, path):
        prev = None
        with open(path) as f:
            for line in f:
                for word in line.split():
                    if prev is not None:
                        self._data[prev].append(word)
                    prev = word

    def gentext(self, num_words):
        words = list(self._data)  # random.choice() needs a sequence, not a dict view
        text = [choice(words).title()]
        while len(text) &lt; num_words:
            followers = self._data.get(text[-1])
            if followers:
                text.append(choice(followers))
            else:
                # dead end (no known successor): jump to a random word
                text.append(choice(words))
        return ' '.join(text) + '.'

if __name__ == '__main__':
    textgen = TextGenerator()
    textgen.train('pandp.txt')
    print(textgen.gentext(100))
</pre>
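<p>For nickname-like words rather than sentences, the same chain idea also works one level down, on characters instead of words. The sketch below is a hypothetical variant, not the code above: it assumes a plain Python list of sample names, a state of two characters, and that the marker characters ^ and $ never occur inside the names. The names train_chars() and make_nickname() are illustrative only.</p>
<pre class="brush: python;">
import random
from collections import defaultdict

def train_chars(names, order=2):
    # Map every run of `order` characters to the characters that followed it.
    # '^' pads the start of each name and '$' marks its end (assumed markers,
    # they must not appear inside the names themselves).
    table = defaultdict(list)
    for name in names:
        padded = '^' * order + name.lower() + '$'
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table

def make_nickname(table, order=2, max_len=12, rng=random):
    # Walk the chain from the start state until the end marker is drawn
    # or max_len characters have been produced.
    state = '^' * order
    out = []
    for _ in range(max_len):
        ch = rng.choice(table[state])
        if ch == '$':
            break
        out.append(ch)
        state = state[1:] + ch
    return ''.join(out).title()

if __name__ == '__main__':
    samples = ['alice', 'alfred', 'bob', 'bobby', 'carol', 'caroline', 'dave']
    table = train_chars(samples)
    rng = random.Random(42)
    print([make_nickname(table, rng=rng) for _ in range(5)])
</pre>
<p>A longer state (order=3) copies the training names more faithfully; a shorter one (order=1) gives wilder, less pronounceable results.</p>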
<p>You can google for more snippets; here is one I found for Perl:</p>
<pre class="brush: perl;">
#!/usr/bin/perl
# by jackal
use strict;
my $filename = shift || 'data.txt';
my $MAXGEN = 10000;
my $NONWORD = &quot;\n&quot;;
my $w1 = $NONWORD;
my %statetab;
my $fh;
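open($fh, '&lt;', $filename) or die &quot;Cannot open $filename: $!&quot;;
while (&lt;$fh&gt;) {
    foreach (split) {
        push(@{$statetab{$w1}}, $_);
        $w1 = $_;
    }
}
close($fh);
push(@{$statetab{$w1}}, $NONWORD);
$w1 = $NONWORD;
open($fh, '&gt;', 'markoff.txt') or die &quot;Cannot open markoff.txt: $!&quot;;
for (my $i = 0; $i &lt; $MAXGEN; $i++) {
    my $suf = $statetab{$w1};            # all words seen after $w1
    my $t = $suf-&gt;[int(rand @$suf)];    # pick one of them at random
    last if $t eq $NONWORD;              # reached the end of the input
    print $fh &quot;$t &quot;;
    $w1 = $t;
}
close($fh);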
</pre>
<p>Further reading:</p>
<p>Wikipedia:<br />
<a href="http://en.wikipedia.org/wiki/Markov_chain">http://en.wikipedia.org/wiki/Markov_chain</a></p>
<p>Python code author&#8217;s site:<br />
<a href="http://bitecode.co.uk/">http://bitecode.co.uk/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pea.somemilk.org/2008/12/01/text-generation-with-markov-chains/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to draw a smooth curve chart</title>
		<link>http://pea.somemilk.org/2008/10/29/how-draw-smooth-curve-chart/</link>
		<comments>http://pea.somemilk.org/2008/10/29/how-draw-smooth-curve-chart/#comments</comments>
		<pubDate>Wed, 29 Oct 2008 14:13:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cairo]]></category>
		<category><![CDATA[django]]></category>

		<guid isPermaLink="false">http://pea.somemilk.org/?p=3</guid>
		<description><![CDATA[Some of us need to draw charts from time to time. Usually you have little choice: you use either a bar chart or a polygon chart.

In this post I will describe how to draw a pretty smooth curve chart using Python and cairo. You can adapt this routine to Django, pycha or any other image creation library; code snippets in the comments are appreciated.]]></description>
			<content:encoded><![CDATA[<p>Some of us need to draw charts from time to time. Usually you have little choice: you use either a bar chart:</p>
<p><img class="alignnone size-full wp-image-5" src="http://pea.somemilk.org/files/2008/10/vbarchart.png" alt="vbarchart" width="400" height="200" /></p>
<p>or a polygon chart:</p>
<p><img class="alignnone size-full wp-image-6" src="http://pea.somemilk.org/files/2008/10/linechart.png" alt="linechart" width="400" height="200" /></p>
<p>In this post I will describe how to draw a pretty smooth curve chart using Python and cairo. You can adapt this routine to Django, pycha or any other image creation library; code snippets in the comments are appreciated.<br />
<span id="more-3"></span></p>
<p>The main problem is that even if your image creation library has curve drawing functions (Bézier splines), those functions need control points which DO NOT lie on the curve. So how do those points affect the curve? Let&#8217;s look at an example (it may look complex, but it isn&#8217;t; I just tried to make it readable):</p>
<pre class="brush: python;">
import cairo
from math import pi

def draw_point(x, y):
    cr.move_to(x + 2, y)
    cr.arc(x, y, 2, 0, 2 * pi)
    cr.set_source_rgba(0, 0, 0, 1)
    cr.stroke()

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 100, 100)
cr = cairo.Context(surface)
cr.set_line_width(2)

x0, y0 = 5, 5 # starting point
x3, y3 = 95, 95 # end point

x1, y1 = 95, 5 # control point 1
x2, y2 = 5, 95 # control point 2

# let's draw all the points to see everything clearly
draw_point(x0, y0)
draw_point(x1, y1)
draw_point(x2, y2)
draw_point(x3, y3)

# the line from the starting point to control point 1
cr.move_to(x0, y0)
cr.line_to(x1, y1)
cr.set_source_rgba(0, 0, 0, 0.1)
cr.stroke()

# the line from the end point to control point 2
cr.move_to(x3, y3)
cr.line_to(x2, y2)
cr.set_source_rgba(0, 0, 0, 0.1)
cr.stroke()

# the curve itself
cr.move_to(x0, y0)
cr.curve_to(x1, y1, x2, y2, x3, y3)
cr.set_line_width(5)
cr.set_source_rgba(0, 0, 1, 1)
cr.stroke()

surface.write_to_png('curve.png')
</pre>
<p>So we are going to draw a curve from the upper left corner down to the lower right one, with the control points in the upper right and lower left corners respectively. It will look like this:</p>
<p><img class="alignnone size-full wp-image-16" src="http://pea.somemilk.org/files/2008/10/curve1.png" alt="curve1" width="100" height="100" /></p>
<p>You can clearly see that the line from the starting point to control point 1 and the line from the end point to control point 2 are tangent to the curve, and the farther a control point is from its starting or end point, the more &#8220;curvy&#8221; the curve becomes:</p>
<p><img class="alignnone size-full wp-image-17" src="http://pea.somemilk.org/files/2008/10/curve2.png" alt="curve2" width="100" height="100" /><br />
<img class="alignnone size-full wp-image-18" src="http://pea.somemilk.org/files/2008/10/curve3.png" alt="curve3" width="100" height="100" /></p>
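<p>To see why those lines are tangents, it helps to evaluate the curve yourself. The sketch below uses de Casteljau&#8217;s algorithm (repeated linear interpolation), one standard way to compute a point of a cubic Bézier curve; the helper names are illustrative and not part of pycairo. At t = 0 the direction it returns is exactly P1 - P0, and at t = 1 it is P3 - P2 (the true derivative is three times those vectors), so a more distant control point drags the curve farther along its tangent before it turns.</p>
<pre class="brush: python;">
def lerp(p, q, t):
    # Point that lies a fraction t of the way from p to q
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def bezier_point_and_direction(p0, p1, p2, p3, t):
    # de Casteljau: interpolate the 4 points down to 3, then 2, then 1.
    q0, q1, q2 = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    a, b = lerp(q0, q1, t), lerp(q1, q2, t)
    # The curve point lies on segment a-b and the curve is tangent to it
    # there; the derivative B'(t) equals 3 * (b - a).
    return lerp(a, b, t), (b[0] - a[0], b[1] - a[1])

# The same four points as in the cairo example above
p0, p1, p2, p3 = (5, 5), (95, 5), (5, 95), (95, 95)
for t in (0.0, 0.5, 1.0):
    point, direction = bezier_point_and_direction(p0, p1, p2, p3, t)
    print(t, point, direction)
</pre>
<p>For these points the curve starts at (5.0, 5.0) heading along (90.0, 0.0), straight at control point 1; it passes through the centre (50.0, 50.0) heading straight down, (0.0, 45.0), since y grows downwards in cairo; and it arrives at (95.0, 95.0) heading along (90.0, 0.0), i.e. coming from the direction of control point 2.</p>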
<p>And that leads us to an obvious solution. We need to connect each pair of neighbouring chart points with a curve, and at every point the tangent must be the same on both sides; otherwise the curve won&#8217;t be smooth. If we don&#8217;t look ahead or back along the curve, the natural choice is to make those tangents horizontal. So for each segment the first control point sits to the right of the starting point and the second one to the left of the end point, each at the same height as its own point; in the code below both are placed half of the horizontal distance between the two points away. Here is the code, with some debug functions that let us see the control points and tangents; we&#8217;ll drop them later:</p>
<pre class="brush: python;">
import cairo
from math import pi, sqrt

width = 500
height = 100
graph_data = [
(0, 10),(20, 50),(40, 80),(60, 5),(80, 10),(100, 20),
(120, 30),(140, 60),(160, 95),(180, 30),(200, 50),
(220, 70),(240, 80),(260, 10),(280, 60),(300, 30),
(320, 90),(340, 95),(360, 30),(380, 10),(400, 5),
(420, 20),(440, 80),(460, 70),(480, 20),(500, 40)
]

def prepare_curve_data(graph_data):
    prepared_data = []
    for i in range(0, len(graph_data)):
        x, y = graph_data[i][0], graph_data[i][1]
        if i == 0:
            cx1, cy1 = x, y
        else:
            step_x = x - graph_data[i - 1][0]
            cx1, cy1 = x - step_x / 2.0, y

        if i == len(graph_data) - 1:
            cx2, cy2 = x, y
        else:
            step_x = graph_data[i + 1][0] - x
            cx2, cy2 = x + step_x / 2.0, y

        prepared_data.append((x, y, cx1, cy1, cx2, cy2))

    return prepared_data

def draw_point(x, y, opacity):
    cr.move_to(x + 2, y)
    cr.arc(x, y, 2, 0, 2 * pi)
    cr.set_source_rgba(0, 0, 0, opacity)
    cr.stroke()

def debug_points(cr, prepared_data):
    for i in range(0, len(prepared_data)):
        x, y = prepared_data[i][0], prepared_data[i][1]
        cx1, cy1 = prepared_data[i][2], prepared_data[i][3]
        cx2, cy2 = prepared_data[i][4], prepared_data[i][5]

        draw_point(x, y, 1)
        if cx1 != x or cy1 != y:
            draw_point(cx1, cy1, 0.3)
            cr.move_to(x, y)
            cr.line_to(cx1, cy1)
            cr.set_source_rgba(0, 0, 0, 0.1)
            cr.stroke()

        if cx2 != x or cy2 != y:
            draw_point(cx2, cy2, 0.3)
            cr.move_to(x, y)
            cr.line_to(cx2, cy2)
            cr.set_source_rgba(0, 0, 0, 0.1)
            cr.stroke()

def poly_curve(cr, prepared_data):
    for i in range(0, len(prepared_data) - 1):
        x, y = prepared_data[i][0], prepared_data[i][1]
        cx1, cy1 = prepared_data[i][4], prepared_data[i][5]
        cx2, cy2 = prepared_data[i + 1][2], prepared_data[i + 1][3]
        x2, y2 = prepared_data[i + 1][0], prepared_data[i + 1][1]
        cr.move_to(x, y)
        cr.curve_to(cx1, cy1, cx2, cy2, x2, y2)

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, width, height)
cr = cairo.Context(surface)
cr.set_line_width(2)

prepared_data = prepare_curve_data(graph_data)
debug_points(cr, prepared_data)
poly_curve(cr, prepared_data)
cr.set_source_rgba(0, 0, 0, 1)
cr.stroke()

surface.write_to_png('curve.png')
</pre>
<p>And this is the result:</p>
<p><img class="alignnone size-full wp-image-19" src="http://pea.somemilk.org/files/2008/10/curve4.png" alt="curve4" width="500" height="100" /></p>
<p>It is quite satisfactory, but it has a flaw: where the curve should rise or fall steadily, we see small bumps, because every tangent is horizontal. It is better to tilt the tangents in those sections, so that a point&#8217;s two control points lie on a line through that point, parallel to the line connecting the previous and next points. To make this clearer, I&#8217;ll rewrite prepare_curve_data() and show you the result:</p>
<pre class="brush: python;">
def prepare_curve_data(graph_data):
    prepared_data = []
    for i in range(0, len(graph_data)):
        x, y = graph_data[i][0], graph_data[i][1]

        if (i != 0) and (i != len(graph_data) - 1):
            x_left, y_left = graph_data[i - 1][0], graph_data[i - 1][1]
            x_right, y_right = graph_data[i + 1][0], graph_data[i + 1][1]
            step_x_left = (x - x_left) / 2.0
            step_x_right = (x_right - x) / 2.0
            dx, dy = x_right - x_left, y_right - y_left
            h = sqrt(dx*dx + dy*dy)
            if h == 0:
                cx1, cy1, cx2, cy2 = x, y, x, y
            else:
                dx1, dy1 = (dx * step_x_left) / h, (dy * step_x_left) / h
                dx2, dy2 = (dx * step_x_right) / h, (dy * step_x_right) / h
                cx1, cx2 = x - dx1, x + dx2
                cy1, cy2 = y - dy1, y + dy2
        else:
            cx1, cy1, cx2, cy2 = x, y, x, y

        prepared_data.append((x, y, cx1, cy1, cx2, cy2))

    return prepared_data
</pre>
<p><img class="alignnone size-full wp-image-22" src="http://pea.somemilk.org/files/2008/10/curve5.png" alt="curve5" width="500" height="100" /></p>
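<p>The two versions of prepare_curve_data() differ only in the direction of the line that carries a point&#8217;s two control points. The sketch below isolates that step for a single point; the helper names are hypothetical, and x values are assumed to be strictly increasing. For the point (20, 50) between (0, 10) and (40, 80), the first version gives a tangent slope of 0, so the curve flattens out there even though the data keeps rising, which is where the bumps come from. The second version gives the slope of the line from (0, 10) to (40, 80), 70 / 40 = 1.75, while keeping each control point at the same distance (10) from the point.</p>
<pre class="brush: python;">
from math import sqrt

def horizontal_controls(prev, pt, nxt):
    # First version: control points at the point's own height, half of the
    # horizontal gap to each neighbour away from it.
    x, y = pt
    return (x - (x - prev[0]) / 2.0, y), (x + (nxt[0] - x) / 2.0, y)

def tilted_controls(prev, pt, nxt):
    # Second version: the same distances, measured along the direction
    # from the previous point to the next one.
    x, y = pt
    dx, dy = nxt[0] - prev[0], nxt[1] - prev[1]
    h = sqrt(dx * dx + dy * dy)
    if h == 0:
        return (x, y), (x, y)
    left, right = (x - prev[0]) / 2.0, (nxt[0] - x) / 2.0
    return ((x - dx * left / h, y - dy * left / h),
            (x + dx * right / h, y + dy * right / h))

def tangent_slope(c1, c2):
    # Slope of the line through a point's incoming and outgoing control points
    return (c2[1] - c1[1]) / float(c2[0] - c1[0])

prev, pt, nxt = (0, 10), (20, 50), (40, 80)
print(tangent_slope(*horizontal_controls(prev, pt, nxt)))  # flat tangent
print(tangent_slope(*tilted_controls(prev, pt, nxt)))      # follows the neighbours
</pre>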
<p>And finally, here is the snippet, ready to be put into your code:</p>
<pre class="brush: python;">
import cairo
from math import pi, sqrt

width = 500
height = 100
graph_data = [
(0, 10),(20, 50),(40, 80),(60, 5),(80, 10),(100, 20),
(120, 30),(140, 60),(160, 95),(180, 30),(200, 50),
(220, 70),(240, 80),(260, 10),(280, 60),(300, 30),
(320, 90),(340, 95),(360, 30),(380, 10),(400, 5),
(420, 20),(440, 80),(460, 70),(480, 20),(500, 40)
]

def poly_curve(cr, graph_data):
    prepared_data = []
    for i in range(0, len(graph_data)):
        x, y = graph_data[i][0], graph_data[i][1]

        if (i != 0) and (i != len(graph_data) - 1):
            x_left, y_left = graph_data[i - 1][0], graph_data[i - 1][1]
            x_right, y_right = graph_data[i + 1][0], graph_data[i + 1][1]
            step_x_left = (x - x_left) / 2.0
            step_x_right = (x_right - x) / 2.0
            dx, dy = x_right - x_left, y_right - y_left
            h = sqrt(dx*dx + dy*dy)
            if h == 0:
                cx1, cy1, cx2, cy2 = x, y, x, y
            else:
                dx1, dy1 = (dx * step_x_left) / h, (dy * step_x_left) / h
                dx2, dy2 = (dx * step_x_right) / h, (dy * step_x_right) / h
                cx1, cx2 = x - dx1, x + dx2
                cy1, cy2 = y - dy1, y + dy2
        else:
            cx1, cy1, cx2, cy2 = x, y, x, y

        prepared_data.append((x, y, cx1, cy1, cx2, cy2))

    for i in range(0, len(prepared_data) - 1):
        x, y = prepared_data[i][0], prepared_data[i][1]
        cx1, cy1 = prepared_data[i][4], prepared_data[i][5]
        cx2, cy2 = prepared_data[i + 1][2], prepared_data[i + 1][3]
        x2, y2 = prepared_data[i + 1][0], prepared_data[i + 1][1]
        cr.move_to(x, y)
        cr.curve_to(cx1, cy1, cx2, cy2, x2, y2)

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, width, height)
cr = cairo.Context(surface)
cr.set_line_width(2)
poly_curve(cr, graph_data)
cr.set_source_rgba(0, 0, 0, 1)
cr.stroke()

surface.write_to_png('curve.png')
</pre>
<p>And the result:</p>
<p><img class="alignnone size-full wp-image-23" src="http://pea.somemilk.org/files/2008/10/curve6.png" alt="curve6" width="500" height="100" /></p>
<p>And now compare this to the polygon chart:</p>
<p><img class="alignnone size-full wp-image-24" src="http://pea.somemilk.org/files/2008/10/curve7.png" alt="curve7" width="500" height="100" /></p>
<p>Neat, eh?</p>
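<p>If the library you use cannot draw Bézier curves at all, you can still use the control points: evaluate each segment at a number of parameter values and draw the resulting polyline with straight lines. A minimal sketch, assuming rows in the (x, y, cx1, cy1, cx2, cy2) format built by prepare_curve_data() and the same pairing of control points as in poly_curve(); flatten_curve() and its steps parameter are hypothetical names.</p>
<pre class="brush: python;">
def bezier_point(p0, p1, p2, p3, t):
    # Cubic Bezier in Bernstein form:
    # B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
    u = 1.0 - t
    a, b, c, d = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])

def flatten_curve(prepared_data, steps=16):
    # Turn (x, y, cx1, cy1, cx2, cy2) rows into a list of (x, y) points.
    if not prepared_data:
        return []
    points = [tuple(prepared_data[0][0:2])]
    for cur, nxt in zip(prepared_data, prepared_data[1:]):
        p0, p3 = cur[0:2], nxt[0:2]
        c1, c2 = cur[4:6], nxt[2:4]  # same pairing as in poly_curve()
        # t = 0 is skipped: it is the previous segment's end point
        for k in range(1, steps + 1):
            points.append(bezier_point(p0, c1, c2, p3, k / float(steps)))
    return points

# The rows the first (horizontal) prepare_curve_data() builds for the
# first three data points: (0, 10), (20, 50), (40, 80)
rows = [(0, 10, 0, 10, 10, 10), (20, 50, 10, 50, 30, 50), (40, 80, 30, 80, 40, 80)]
polyline = flatten_curve(rows, steps=8)
print(len(polyline), polyline[-1])
</pre>
<p>Here the result has 17 points (the start point plus 8 per segment) and ends at (40.0, 80.0). The list of (x, y) tuples can be passed, for example, to the line() method of PIL&#8217;s ImageDraw, which accepts a sequence of 2-tuples. More steps give a smoother curve at the cost of more line segments.</p>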
<p>References and things to read:</p>
<p>Cairo — vector graphics library: <a href="http://cairographics.org/">http://cairographics.org/</a><br />
Cairo for Python (pycairo): <a href="http://www.cairographics.org/pycairo/">http://www.cairographics.org/pycairo/</a><br />
Nice pycairo tutorial: <a href="http://www.tortall.net/mu/wiki/CairoTutorial">http://www.tortall.net/mu/wiki/CairoTutorial</a></p>
]]></content:encoded>
			<wfw:commentRss>http://pea.somemilk.org/2008/10/29/how-draw-smooth-curve-chart/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
