<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>tail -f /dev/dim</title>
    <link>http://tapoueh.org/index.html</link>
    <description>Dimitri Fontaine's blog</description>
    <language>en-us</language>
    <generator>Emacs Muse</generator>
<item>
  <title>from Parsing to Compiling</title>
  <link>http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</link>
  <description><![CDATA[<p>Last week came with two bank holidays in a row, and I took the opportunity
to design a <em>command language</em> for <a href="../../../pgsql/pgloader.html">pgloader</a>. While doing that, I unexpectedly
stumbled across a very nice <em>AHAH!</em> moment, and I now want to share it with
you, dear reader.</p>

<center>
<p><img src="../../../images/lightbulb.gif" alt=""></p>
</center>

<center>
<p><em>AHAH, you'll see!</em></p>
</center>

<p>The general approach I'm following, code wise, with that <em>command language</em> is
to first build a code API that exposes the capabilities of the system, then
plug the <em>command language</em> into that API thanks to a <em>parser</em>. It turns
out that doing so in <em>Common Lisp</em> is really easy, and that you get a
<em>compiler</em> for free while at it. Let's see about that.</p>

<h3>A very simple toy example</h3>

<p class="first">In this newsgroup article <a href="https://groups.google.com/forum/?fromgroups=#&#33;topic/comp.lang.lisp/JJxTBqf7scU">What is symbolic computation?</a>, <a href="http://informatimago.com/">Pascal Bourguignon</a>
proposed a very simple piece of code:</p>

<pre class="src">
(<span style="color: #fcaf3e;">defparameter</span> <span style="color: #fce94f;">*additive-color-graph*</span>
  '((red   (red white)   (green yellow) (blue magenta))
    (green (red yellow)  (green white)  (blue cyan))
    (blue  (red magenta) (green cyan)   (blue white))))

(<span style="color: #fcaf3e;">defun</span> <span style="color: #729fcf;">symbolic-color-add</span> (a b)
  (cadr (assoc a (cdr (assoc b *additive-color-graph*)))))
</pre>

<p>This is an example of <em>symbolic computation</em>, and we're going to build a
little <em>language</em> to express the data and the code. Not that we need to
build one, mind you: the point is to have a really simple example leading
us to the <em>ahah</em> moment you're now waiting for.</p>

<p>Before we dive into the main topic, you have to realize that the previous
code example actually works: it defines some data, using an implicit data
structure made of nested lists, and a function that
knows how to walk that anonymous data structure so as to
combine two colors.</p>

<pre class="src">
TOY-PARSER&gt; (symbolic-color-add 'red 'green)
YELLOW
</pre>
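
<p>For readers who don't read lisp fluently, here is a minimal sketch of the
same lookup in Python. The dictionary layout and the <code>symbolic_color_add</code>
name are assumptions made for this illustration: each nested <code>assoc</code> call
becomes one dictionary lookup.</p>

<pre class="src">
```python
# Hypothetical Python rendering of the Lisp association lists: one dict per
# base color, mapping the color being added to the color obtained.
ADDITIVE_COLOR_GRAPH = {
    "red":   {"red": "white",   "green": "yellow", "blue": "magenta"},
    "green": {"red": "yellow",  "green": "white",  "blue": "cyan"},
    "blue":  {"red": "magenta", "green": "cyan",   "blue": "white"},
}


def symbolic_color_add(a, b, graph=ADDITIVE_COLOR_GRAPH):
    """Look up row b, then entry a, like the two nested assoc calls.

    Returns None when either color is missing, where assoc would give NIL.
    """
    return graph.get(b, {}).get(a)


print(symbolic_color_add("red", "green"))  # yellow
```
</pre>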


<h3>A command language and parser</h3>

<p class="first">I decided to go with the following <em>language</em>:</p>

<pre class="src">
color red   +red white    +green yellow  +blue magenta
color green +red yellow   +green white   +blue cyan
color blue  +red magenta  +green cyan    +blue white

mix red and green
</pre>

<p>And here's what some of the parser looks like, using the <a href="http://nikodemus.github.io/esrap/">esrap</a> <em>packrat</em> library:</p>

<pre class="src">
(defrule color-name (and whitespaces (+ (alpha-char-p character)))
  (<span style="color: #729fcf;">:destructure</span> (ws name)
    (<span style="color: #fcaf3e;">declare</span> (ignore ws))               <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces
</span>    <span style="color: #888a85;">;; </span><span style="color: #888a85;">CL symbols default to upper case.
</span>    (intern (string-upcase (coerce name 'string)) <span style="color: #729fcf;">:toy-parser</span>)))

<span style="color: #888a85;">;;; </span><span style="color: #888a85;">parse string "+ red white"
</span>(defrule color-mix (and whitespaces <span style="color: #73d216;">"+"</span> color-name color-name)
  (<span style="color: #729fcf;">:destructure</span> (ws plus color-added color-obtained)
    (<span style="color: #fcaf3e;">declare</span> (ignore ws plus))          <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces and keywords
</span>    (list color-added color-obtained)))

<span style="color: #888a85;">;;; </span><span style="color: #888a85;">mix red and green
</span>(defrule mix-two-colors (and kw-mix color-name kw-and color-name)
  (<span style="color: #729fcf;">:destructure</span> (mix c1 and c2)
    (<span style="color: #fcaf3e;">declare</span> (ignore mix and))          <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore keywords
</span>    (list c1 c2)))
</pre>

<p>Those <em>rules</em> are not the whole parser. Have a look at the project on
GitHub if you want to see the whole code: it's called <a href="https://github.com/dimitri/toy-parser">toy-parser</a> over there.
The main idea here is to show that when we parse a line from our little
language, we produce the simplest possible structured data: in lisp that's
<em>symbols</em> and <em>lists</em>.</p>

<p>The reason it makes sense to do that is the next rule:</p>

<center>
<p><img src="../../../images/the-one-ring.jpg" alt=""></p>
</center>

<center>
<p><em>The one grammar rule to bind them all</em></p>
</center>

<pre class="src">
(defrule program (and colors mix-two-colors)
  (<span style="color: #729fcf;">:destructure</span> (graph (c1 c2))
    `(<span style="color: #fcaf3e;">lambda</span> ()
       (<span style="color: #fcaf3e;">let</span> ((*additive-color-graph* ',graph))
         (symbolic-color-add ',c1 ',c2)))))
</pre>

<p>This rule is the complex one to bind them all. It's using a <em>quasiquote</em>, a
basic lisp syntax element allowing the programmer to very easily produce
data that looks exactly like code. Let's see how it goes with a very simple
example:</p>

<pre class="src">
TOY-PARSER&gt; (pprint (parse 'program
                           <span style="color: #73d216;">"color red +green yellow mix green and red"</span>))

(LAMBDA NIL
  (LET ((*ADDITIVE-COLOR-GRAPH* '((RED (GREEN YELLOW)))))
    (SYMBOLIC-COLOR-ADD 'GREEN 'RED)))

</pre>

<p>The parser is producing structured (nested) data that really looks like lisp
code, right? So maybe we can just run that code...</p>


<h3>What about a compiler now?</h3>

<center>
<p><img src="../../../images/aha.jpg" alt=""></p>
</center>

<center>
<p><em>Here is my AHAH moment!</em></p>
</center>

<p>Let's see about actually running the code:</p>

<pre class="src">
TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>)
                   (program (parse 'program code)))
              (compile nil program))
#&lt;Anonymous Function #x3020027CF0EF&gt;
NIL
NIL
TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>)
                   (program (parse 'program code)))
              (funcall (compile nil program)))
YELLOW
</pre>

<p>So we have a string representing code in our very little language, and a
parser that knows how to produce a nested list of atoms that looks like lisp
code. And as we have lisp, we can actually compile that code at run-time
with the same compiler that we used to produce our parser, and we can then
<code>funcall</code> the function we just built.</p>
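
<p>As a point of comparison, here is a loose Python analog of the same
parse, generate and compile pipeline. It is only a sketch under stated
assumptions: <code>parse_program</code> is a hypothetical helper that splits on
keywords instead of using a real grammar, it builds source text where lisp
builds a list, and Python's <code>compile</code> produces bytecode rather than
native code.</p>

<pre class="src">
```python
# A loose Python analog, not the toy-parser code: parse, generate, compile.
def parse_program(text):
    """Turn 'color NAME +ADDED OBTAINED ... mix C1 and C2' into Python source."""
    colors_part, _, mix_part = text.partition("mix")
    graph = {}
    for chunk in colors_part.split("color")[1:]:
        words = chunk.replace("+", " ").split()
        name, pairs = words[0], words[1:]
        # pairs alternates between the added color and the obtained color.
        graph[name] = dict(zip(pairs[0::2], pairs[1::2]))
    c1, _and, c2 = mix_part.split()
    # repr() plays the role of the quote: it embeds the parsed data as literals.
    return "lambda: {!r}.get({!r}, {{}}).get({!r})".format(graph, c2, c1)


source = parse_program("color red +green yellow mix green and red")
code = compile(source, "toy-program", "eval")  # a Python code object (bytecode)
func = eval(code)                              # evaluating it yields the function
print(func())  # yellow
```
</pre>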

<p>Oh and the function is actually compiled down to native code, of course:</p>

<pre class="src">
TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix red and green"</span>)
                   (program (parse 'program code))
                   (func    (compile nil program)))
              (time (<span style="color: #fcaf3e;">loop</span> repeat 1000 do (funcall func))))

(<span style="color: #fcaf3e;">LOOP</span> REPEAT 1000 DO (FUNCALL FUNC))
took 108 microseconds (0.000108 seconds) to run.
During that period, and with 4 available CPU cores,
     105 microseconds (0.000105 seconds) were spent in user mode
      13 microseconds (0.000013 seconds) were spent in system mode
NIL
</pre>

<p>Yeah, it took the whole of <code>108 microseconds</code> to actually run the code
generated by our own <em>parser</em> <strong>a thousand times</strong>, on my laptop. I can believe
it's been compiled to native code, that seems like the right ballpark.</p>


<h3>Conclusion</h3>

<p class="first">The <a href="https://github.com/dimitri/toy-parser">toy-parser</a> code is there on <em>GitHub</em> and you can actually load it using
<a href="http://www.quicklisp.org/">Quicklisp</a>: clone the repository in <code>~/quicklisp/local-projects/</code> then
<code>(ql:quickload &quot;toy-parser&quot;)</code>, and play with it in <code>(in-package :toy-parser)</code>.</p>

<p>The only thing I still want to say here is this: can your programming
language of choice make it that easy?</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 13 May 2013 11:08:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</guid>
</item>
<item>
  <title>Nearest Big City</title>
  <link>http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</link>
  <description><![CDATA[<p>In this article, we want to find the town with the greatest number of
inhabitants near a given location.</p>

<center>
<p><img src="../../../images/global_accessibility-640.png" alt=""></p>
</center>

<h3>A very localized example</h3>

<p class="first">We first need to find and import some data. I found a
<a href="http://www.lion1906.com/Pages/francais/utile/telechargements.html">CSV listing of French cities with coordinates and population</a>, along with
some other numbers of interest for the exercise here.</p>

<p>To import the data set, we first need a table, then a <code>COPY</code> command:</p>

<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">lion1906</span> (
  insee       <span style="color: #c17d11;">text</span>,
  nom         <span style="color: #c17d11;">text</span>,
  altitude    <span style="color: #c17d11;">integer</span>,
  code_postal <span style="color: #c17d11;">text</span>,
  longitude   <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>,
  latitude    <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>,
  pop99       <span style="color: #c17d11;">bigint</span>,
  surface     <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>
);

\<span style="color: #729fcf;">copy</span> lion1906 <span style="color: #fcaf3e;">from</span> <span style="color: #73d216;">'villes.csv'</span> <span style="color: #fcaf3e;">with</span> <span style="color: #729fcf;">csv</span> <span style="color: #729fcf;">header</span> <span style="color: #729fcf;">delimiter</span> <span style="color: #73d216;">';'</span> <span style="color: #729fcf;">encoding</span> <span style="color: #73d216;">'latin1'</span>
</pre>

<p>With that data in place, we can find the 10 towns nearest to one of our
choosing. Let's pick <em>Villeurbanne</em>, which is in the region of <em>Lyon</em>.</p>

<pre class="src">
   <span style="color: #fcaf3e;">select</span> code_postal, nom, pop99
     <span style="color: #fcaf3e;">from</span> lion1906
 <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) &lt;-&gt;
          (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude)
             <span style="color: #fcaf3e;">from</span> lion1906
            <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>)
    <span style="color: #fcaf3e;">limit</span> 10;

 code_postal |          nom           | pop99
<span style="color: #888a85;">-------------+------------------------+--------
</span> 69100       | Villeurbanne           | 124215
 69300       | Caluire-et-Cuire       |  41233
 69120       | Vaulx-en-Velin         |  39154
 69580       | Sathonay-Camp          |   4336
 69140       | Rillieux-la-Pape       |  28367
 69000       | Lyon                   | 445452
 69500       | Bron                   |  37369
 69580       | Sathonay-Village       |   1693
 01700       | Neyron                 |   2157
 69660       | Collonges-au-Mont-d'Or |   3420
(10 rows)
</pre>

<p>Lyon is in that list, and we now want the query to return only
that one, as it has the greatest number of inhabitants in the list:</p>

<pre class="src">
<span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
   <span style="color: #fcaf3e;">select</span> code_postal, nom, pop99
     <span style="color: #fcaf3e;">from</span> lion1906
 <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) &lt;-&gt;
          (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude)
             <span style="color: #fcaf3e;">from</span> lion1906 <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>)
    <span style="color: #fcaf3e;">limit</span> 10
)
  <span style="color: #fcaf3e;">select</span> *
    <span style="color: #fcaf3e;">from</span> neighbours
<span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> pop99 <span style="color: #fcaf3e;">desc</span>
   <span style="color: #fcaf3e;">limit</span> 1;

 code_postal | nom  | pop99
<span style="color: #888a85;">-------------+------+--------
</span> 69000       | Lyon | 445452
(1 <span style="color: #729fcf;">row</span>)
</pre>

<p>Well, thank you PostgreSQL, that was easy!</p>

<p>Note that you can actually index such queries: that's called a <em>KNN index</em>.
PostgreSQL knows how to use some kinds of indexes to fetch data matching an
expression such as <code>ORDER BY a &lt;-&gt; b</code>, which allows you to consider a <em>KNN</em>
search in your application.</p>
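
<p>To make the logic of the query explicit, here is a minimal sketch in
Python of the same two steps: keep the <em>k</em> nearest towns, then pick the most
populated one. The sample rows, the <code>biggest_neighbour</code> name and its
parameters are made up for this illustration. The sketch scans every row,
which is the work an index spares the database.</p>

<pre class="src">
```python
import heapq
import math

# Hypothetical sample rows: (name, x, y, population). The coordinates and
# populations are made up for illustration, not taken from the CSV file.
TOWNS = [
    ("origin",  0.0, 0.0,   1200),
    ("hamlet",  0.1, 0.0,    300),
    ("suburb",  0.0, 0.3,  40000),
    ("bigcity", 0.4, 0.3, 450000),
    ("faraway", 9.0, 9.0, 900000),
]


def biggest_neighbour(towns, origin_name, k=10):
    """Return the most populated town among the k nearest to origin_name."""
    origin = next(t for t in towns if t[0] == origin_name)
    # Planar distance between (x, y) pairs, like the point distance operator.
    nearest = heapq.nsmallest(
        k, towns, key=lambda t: math.hypot(t[1] - origin[1], t[2] - origin[2])
    )
    return max(nearest, key=lambda t: t[3])


print(biggest_neighbour(TOWNS, "origin", k=4)[0])  # bigcity
```
</pre>

<p>Note that the choice of <em>k</em> matters: with a larger <em>k</em> the far away but
more populated town would win.</p>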


<h3>Let's get worldwide</h3>

<p class="first">The real scope of our exercise is to associate every known town in the world
with some big city nearby, so let's first fetch and import some worldwide
data this time, from
<a href="http://download.maxmind.com/download/worldcities/worldcitiespop.txt.gz">maxmind</a>.</p>

<center>
<p><img src="../../../images/map_nearest_city_01.gif" alt=""></p>
</center>

<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">maxmind_worldcities</span> (
        country_code <span style="color: #c17d11;">text</span>,
        city_lower <span style="color: #c17d11;">text</span>,
        city_normal <span style="color: #c17d11;">text</span>,
        region_code <span style="color: #c17d11;">text</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">''</span>,
        population <span style="color: #c17d11;">INT</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>,
        latitude <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>,
        longitude <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>
);

\<span style="color: #729fcf;">copy</span> maxmind_worldcities <span style="color: #fcaf3e;">FROM</span> <span style="color: #73d216;">'/tmp/worldcitiespop.txt'</span> <span style="color: #fcaf3e;">WITH</span>  <span style="color: #729fcf;">DELIMITER</span> <span style="color: #73d216;">','</span> <span style="color: #729fcf;">QUOTE</span> E<span style="color: #73d216;">'\f'</span> <span style="color: #729fcf;">CSV</span> <span style="color: #729fcf;">HEADER</span> <span style="color: #729fcf;">ENCODING</span> <span style="color: #73d216;">'LATIN1'</span>;

<span style="color: #729fcf;">alter</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_worldcities</span> <span style="color: #729fcf;">add</span> <span style="color: #fcaf3e;">column</span> loc <span style="color: #c17d11;">point</span>;
<span style="color: #729fcf;">update</span> maxmind_worldcities <span style="color: #729fcf;">set</span> loc = <span style="color: #c17d11;">point</span>(longitude, latitude);
</pre>

<p>This time you can see that I created an extra column with the <em>location</em> in
there, so that I don't have to compute it each time I need it, like I did
before.</p>

<p>Now is the time to test that data set and hopefully fetch the same result as
before, when we only had French cities loaded:</p>

<pre class="src">
<span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
   <span style="color: #fcaf3e;">select</span> country_code, city_lower, population
     <span style="color: #fcaf3e;">from</span> maxmind_worldcities
    <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
 <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc &lt;-&gt;
          (<span style="color: #fcaf3e;">select</span> loc
             <span style="color: #fcaf3e;">from</span> maxmind_worldcities
            <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>)
   <span style="color: #fcaf3e;">limit</span> 10
)
   <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1;

  country_code | city_lower | population
<span style="color: #888a85;">--------------+------------+------------
</span> fr           | lyon       |     463700
(1 <span style="color: #729fcf;">row</span>)
</pre>

<p>Ok, looks like we're all set for the real problem. Now we want to pick, for
each of those cities, the biggest of its nearest neighbours in the same region, so here's how to do that:</p>

<pre class="src">
<span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities(country_code, region_code, city_lower);
<span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities <span style="color: #fcaf3e;">using</span> gist(loc);

<span style="color: #fcaf3e;">create</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_neighbours</span> <span style="color: #fcaf3e;">as</span>
  <span style="color: #fcaf3e;">select</span> country_code, region_code, city_lower,
         (<span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
             <span style="color: #fcaf3e;">select</span> country_code, city_lower, population
               <span style="color: #fcaf3e;">from</span> maxmind_worldcities
              <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
                    <span style="color: #fcaf3e;">and</span> country_code = wc.country_code
                    <span style="color: #fcaf3e;">and</span> region_code = wc.region_code
           <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc &lt;-&gt; wc.loc
             <span style="color: #fcaf3e;">limit</span> 10)
             <span style="color: #fcaf3e;">select</span> city_lower
               <span style="color: #fcaf3e;">from</span> neighbours
           <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span>
              <span style="color: #fcaf3e;">limit</span> 1
         ) <span style="color: #fcaf3e;">as</span> neighbour
    <span style="color: #fcaf3e;">from</span> maxmind_worldcities wc ;
</pre>

<p>To be fair, I have to tell you that this query took almost 2 hours to
complete on my laptop here, but as I'm doing that for a friend and a blog
article, I've been lazy and didn't try to optimise it. It could be using
<code>LATERAL</code> for sure, but I don't know if that would help very much with
performance: I didn't try.</p>

<p>With that in hand we can now check some cities and their <em>biggest</em>
neighbours, as in the following query:</p>

<pre class="src">
<span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>;
 country_code | region_code |  city_lower  | neighbour
<span style="color: #888a85;">--------------+-------------+--------------+-----------
</span> fr           | B9          | villeurbanne | lyon
(1 <span style="color: #729fcf;">row</span>)
</pre>

<p>And looking for New York City suburbs I did find a <em>chinatown</em>, which is a
pretty common smaller town name apparently:</p>

<pre class="src">
<span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'chinatown'</span>;
 country_code | region_code | city_lower |   neighbour
<span style="color: #888a85;">--------------+-------------+------------+---------------
</span> sb           | 08          | chinatown  | honiara
 us           | CA          | chinatown  | san francisco
 us           | DC          | chinatown  | washington
 us           | HI          | chinatown  | honolulu
 us           | IL          | chinatown  | chicago
 us           | MT          | chinatown  | missoula
 us           | NV          | chinatown  | reno
 us           | NY          | chinatown  | <span style="color: #729fcf;">new</span> york
(8 <span style="color: #729fcf;">rows</span>)
</pre>


<h3>Big Cities in the big world</h3>

<center>
<p><img src="../../../images/Old-Photos-of-Big-Cities-21.jpg" alt=""></p>
</center>

<center>
<p><em>We might need to change some of our views</em></p>
</center>

<p>So, let's see how many smaller towns each of those random big cities has:</p>

<pre class="src">
   <span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*)
    <span style="color: #fcaf3e;">from</span> maxmind_neighbours
   <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">in</span> (<span style="color: #73d216;">'london'</span>, <span style="color: #73d216;">'new york'</span>, <span style="color: #73d216;">'moscow'</span>,
                       <span style="color: #73d216;">'paris'</span>, <span style="color: #73d216;">'tokyo'</span>, <span style="color: #73d216;">'sao polo'</span>, <span style="color: #73d216;">'chicago'</span>)
<span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour;
 country_code | region_code | neighbour | <span style="color: #729fcf;">count</span>
<span style="color: #888a85;">--------------+-------------+-----------+-------
</span> gb           | H9          | london    |     2
 jp           | 40          | tokyo     |   414
 us           | NY          | <span style="color: #729fcf;">new</span> york  |   131
 ca           | 08          | london    |    16
 ru           | 48          | moscow    |   245
 fr           | A8          | paris     |    16
 us           | IL          | chicago   |    13
(7 <span style="color: #729fcf;">rows</span>)
</pre>

<p>And now let's be fair and see which cities have the greatest number
of towns nearby, with the following query:</p>

<pre class="src">
  <span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*)
    <span style="color: #fcaf3e;">from</span> maxmind_neighbours
   <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
<span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour
<span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> 4 <span style="color: #fcaf3e;">desc</span>
   <span style="color: #fcaf3e;">limit</span> 25;

 country_code | region_code | neighbour  | <span style="color: #729fcf;">count</span>
<span style="color: #888a85;">--------------+-------------+------------+-------
</span> cn           | 03          | nanchang   | 16759
 cn           | 26          | xian       | 12864
 <span style="color: #729fcf;">id</span>           | 18          | kupang     | 10715
 cn           | 24          | taiyuan    | 10550
 mm           | 11          | taunggyi   | 10253
 <span style="color: #729fcf;">id</span>           | 38          | makasar    |  9471
 ir           | 15          | ahvaz      |  9461
 <span style="color: #729fcf;">id</span>           | 01          | banda aceh |  9161
 cn           | 14          | lasa       |  8841
 cn           | 15          | lanzhou    |  8618
 ir           | 29          | kerman     |  8579
 <span style="color: #729fcf;">id</span>           | 26          | medan      |  7787
 ir           | 04          | iranshahr  |  7249
 ir           | 07          | shiraz     |  7219
 ma           | 55          | agadir     |  7121
 ir           | 42          | mashhad    |  7107
 af           | 08          | gazni      |  7011
 ir           | 33          | tabriz     |  6586
 cn           | 01          | hefei      |  6521
 bd           | 81          | dhaka      |  6480
 ir           | 08          | rasht      |  6471
 <span style="color: #729fcf;">id</span>           | 17          | mataram    |  6467
 <span style="color: #729fcf;">id</span>           | 33          | cilegon    |  6287
 af           | 23          | qandahar   |  6213
 cn           | 07          | fuzhou     |  6089
(25 <span style="color: #729fcf;">rows</span>)
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 02 May 2013 11:34:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</guid>
</item>
<item>
  <title>Emacs Conference</title>
  <link>http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</link>
  <description><![CDATA[<p>Yes it did happen, for real, in London: the <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a>. It was Easter
weekend. Yet the conference managed to have more than 60 people meet
together and spend a full day talking about <a href="http://www.gnu.org/software/emacs/">Emacs</a>. If you weren't there, a
live stream was available and soon enough (wait for about two weeks) the
video material will be published, as <a href="http://sachachua.com/blog/">sacha</a> is working on it.</p>

<center>
<p><img src="../../../images/toplap-small.png" alt=""></p>
</center>

<p>The conference was packed with awesome, really. Among the things that
I'm going home with are new thoughts, tricks and tips, and new modes to use
in Emacs.</p>

<p>The main new thought is all about learning to program. That's a problem space
in which I have a growing interest, and the conference talk about <em>arxana</em>
showed that it should be possible to build an environment where you can
learn programming with the excuse of having fun with maths. And after
talking about music and its notation and <a href="http://www.lilypond.org/">lilypond</a>, it should even be
possible to offer some interactive programming environment where not only
you can play music live as <a href="http://meta-ex.com/">Meta-Ex</a> is doing, but where the other output of
your program would be the updated music scores.</p>

<p>The main practical bit I'm going home with is <a href="http://www.foldr.org/~michaelw/projects/redshank">redshank</a>, <em>A collection of
code-wrangling Emacs macros mostly geared towards Common Lisp, but some are
useful for other Lisp dialects, too</em>. That complements <a href="http://mumble.net/~campbell/emacs/paredit.el">paredit</a> and allows you
to do some reformatting very easily.</p>

<p>Lastly, I'm back to giving the dark background environment a try now. I
think I prefer the contrast and richer color sets of the default Emacs color
theme, but the black window has some classy visual effect too. And with
<a href="https://github.com/jasonm23/emacs-mainline">main-line</a> the effect is quite awesome!</p>

<center>
<p><img src="../../../images/Emacs-Tango-2-Main-Line.png" alt=""></p>
</center>

<center>
<p><em>Look at that!</em></p>
</center>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 02 Apr 2013 09:56:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</guid>
</item>
<item>
  <title>The Need For Speed</title>
  <link>http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</link>
  <description><![CDATA[<p>Yesterday saw the <a href="http://www.postgresql-sessions.org/en/5/start">fifth edition</a> of the conference organized by <em>dalibo</em>,
where outside speakers are regularly invited. Yesterday's theme
was both clear and very broad: performance.</p>

<p>I had the pleasure of giving a talk titled "The Need for
Speed", which puts the optimization effort back into its business
context, so as to study costs and benefits and to know not
only what to expect but also when to stop.</p>

<center>
<p><a class="image-link" href="../../../images/confs/the_need_for_speed.pdf">
<img src="../../../images/confs/the_need_for_speed-3.png"></a></p>
</center>

<p>Thanks to <em>dalibo</em> for this conference!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 29 Mar 2013 09:49:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</guid>
</item>
<item>
  <title>Bulk Replication</title>
  <link>http://tapoueh.org/blog/2013/03/18-bulk-replication.html</link>
  <description><![CDATA[<p>In the previous article here we talked about how to <em>properly</em> update more
than one row at a time, under the title <a href="http://tapoueh.org/blog/2013/03/15-batch-update.html">Batch Update</a>. We considered
performance, including network round trips, and looked at the behavior of
our solution when used concurrently.</p>

<center>
<p><img src="../../../images/clock-key.jpg" alt=""></p>
</center>

<p>A case where we want to apply the previous article's approach is when
replicating data with a <em>trigger based solution</em>, such as <a href="http://wiki.postgresql.org/wiki/SkyTools">SkyTools</a> and
<a href="https://github.com/markokr/skytools">londiste</a>. Well, maybe not in all cases: we need an amount of <code>UPDATE</code>
traffic worthy of setting up the solution. As soon as we know we're going
to <em>replay</em> large enough batches of events, though, using the
<em>batch update</em> tricks certainly makes sense.</p>

<p>It so happens that <code>londiste 3</code> includes the capability to use <em>handlers</em>. Those
are plugins written in <em>Python</em> (like all the client side code from <em>SkyTools</em>)
whose job is to handle the <em>processing</em> of the event batches. Several of them
are included in the <a href="https://github.com/markokr/skytools/tree/master/python/londiste">londiste sources</a>, and one of them is named <code>bulk.py</code>.</p>

<h3>Bulk loading data with londiste</h3>

<p class="first">To use it, set the following in <code>londiste.ini</code>:</p>

<pre class="src">
<span style="color: #fce94f;">handler_modules</span> = londiste.handlers.bulk
</pre>

<p>Then add the table with one of these commands:</p>

<pre class="src">
londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk"</span>
londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk(method=X)"</span>
</pre>

<p>The default method is <code>0</code>, and the available methods are the following:</p>

<p><em>correct</em> (<code>0</code>)</p>

<ul>
<li>inserts as <code>COPY</code> into table</li>
<li>update as <code>COPY</code> into temp table and single <code>UPDATE</code> from there</li>
<li>delete as <code>COPY</code> into temp table and single <code>DELETE</code> from there</li>
</ul>

<p><em>delete</em> (<code>1</code>)</p>

<ul>
<li>as <em>correct</em>, but <em>updates</em> are done as <code>DELETE</code> then <code>COPY</code></li>
</ul>

<p><em>merged</em> (<code>2</code>)</p>

<ul>
<li>as <em>delete</em>, but merge <em>insert</em> rows with <em>update</em> rows</li>
</ul>
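
<p>To make the <em>correct</em> method more concrete, here is a rough sketch of the
SQL pattern it describes for updates and deletes, written as a <code>psql</code>
script. This is not the SQL that <code>bulk.py</code> actually generates: the columns
of <code>xx</code>, the temporary table names and the sample rows are all assumptions
made for the illustration.</p>

<pre class="src">
BEGIN;

-- Hypothetical replicated table: only the name "xx" comes from the
-- commands above, the columns are made up.
CREATE TEMP TABLE xx(id integer PRIMARY KEY, payload text);
INSERT INTO xx(id, payload) VALUES (1, 'old'), (2, 'old'), (3, 'old');

-- Updates: COPY the new row versions into a temp table...
CREATE TEMP TABLE xx_upd(LIKE xx) ON COMMIT DROP;
COPY xx_upd FROM STDIN WITH (FORMAT csv);
1,new
2,new
\.

-- ...then run a single UPDATE from there.
UPDATE xx
   SET payload = u.payload
  FROM xx_upd u
 WHERE xx.id = u.id;

-- Deletes: COPY the keys into a temp table, then run a single DELETE.
CREATE TEMP TABLE xx_del(id integer) ON COMMIT DROP;
COPY xx_del FROM STDIN WITH (FORMAT csv);
3
\.

DELETE FROM xx
      USING xx_del d
      WHERE xx.id = d.id;

SELECT id, payload FROM xx ORDER BY id;
-- rows 1 and 2 now carry 'new', row 3 is gone

ROLLBACK;
</pre>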


<h3>Conclusion</h3>

<center>
<p><img src="../../../images/londiste.jpg" alt=""></p>
</center>

<p>Yes, by using that <em>handler</em>, which is provided by default in <em>londiste</em>, you
will apply the previous article's tricks in your replication solution. And you
can even choose to use it for only some of the tables you are replicating.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 18 Mar 2013 14:54:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/18-bulk-replication.html</guid>
</item>
<item>
  <title>Batch Update</title>
  <link>http://tapoueh.org/blog/2013/03/15-batch-update.html</link>
  <description><![CDATA[<p>Performance consulting involves some tricks that you have to teach over and
over again. One of them is that SQL tends to be so much better at dealing
with plenty of rows in a single statement when compared to running as many
statements, each one against a single row.</p>

<center>
<p><img src="../../../images/Home-Brewing.jpg" alt=""></p>
</center>

<center>
<p><em>Another kind of Batch to update</em></p>
</center>

<p>So when you need to <code>UPDATE</code> a bunch of rows from a given source, remember
that you can actually use a <code>JOIN</code> in the <em>update</em> statement. Either the source
of data is already in the database, in which case it's as simple as using
the <code>FROM</code> clause in the <em>update</em> statement, or it's not, and we're getting back
to that in a minute.</p>

<h3><code>UPDATE FROM</code></h3>

<p class="first">It's all about using that <code>FROM</code> clause in an <em>update</em> statement, right?</p>

<pre class="src">
    <span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
       <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
      <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
     <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
</pre>

<p>Using that, you can actually update thousands of rows in our <em>target</em> table in
a single statement, and you can't really get faster than that.</p>
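
<p>Here is a small worked example you can paste into <code>psql</code>. The table and
column names are the ones used above, while the two-column schema and the
sample values are assumptions made for the illustration. Everything runs in
a transaction that is rolled back, so nothing is left behind.</p>

<pre class="src">
BEGIN;

-- Hypothetical schema: the article only names the tables and columns.
CREATE TEMP TABLE target(id integer PRIMARY KEY, counter integer NOT NULL);
CREATE TEMP TABLE source(id integer PRIMARY KEY, counter integer NOT NULL);

INSERT INTO target(id, counter) VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO source(id, counter) VALUES (1, 1), (3, 3);

-- One statement, one round trip, every matching row updated.
UPDATE target t
   SET counter = t.counter + s.counter
  FROM source s
 WHERE t.id = s.id;

SELECT id, counter FROM target ORDER BY id;
-- id 1 is now 11, id 2 is untouched at 20, id 3 is now 33

ROLLBACK;
</pre>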


<h3>Preparing the Batch</h3>

<p class="first">Now, if you happen to have the source data in your application process's
memory, you might think the previous section does you no good. Well, the
trick is that pushing your in-memory data into the database and then joining
against the now local source of data is generally faster than looping in the
application and having to do a whole network <em>round trip</em> per row.</p>

<center>
<p><img src="../../../images/round-trip.png" alt=""></p>
</center>

<center>
<p><em>What about</em> <strong><em>that</em></strong> <em>round trip?</em></p>
</center>

<p>Let's see how it goes:</p>

<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #729fcf;">TEMP</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">source</span>(<span style="color: #fcaf3e;">LIKE</span> target <span style="color: #729fcf;">INCLUDING</span> <span style="color: #fcaf3e;">ALL</span>) <span style="color: #fcaf3e;">ON</span> <span style="color: #729fcf;">COMMIT</span> <span style="color: #729fcf;">DROP</span>;

<span style="color: #729fcf;">COPY</span> <span style="color: #729fcf;">source</span> <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">STDIN</span>;

<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
   <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
  <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
 <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
</pre>

<p>As we're talking about performance, the trick here is to use the <a href="http://www.postgresql.org/docs/9.2/static/sql-copy.html">COPY</a>
protocol to fill in the <em>temporary table</em> we just created to hold our data. So
we're now sending the whole data set to a temporary location in the
database, then using that as the <code>UPDATE</code> source. And that's way faster than
doing a separate <code>UPDATE</code> statement per row in your batch, even for small
batches.</p>
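
<p>To try that sequence from <code>psql</code>, a script along these lines should do. The
schema of <code>target</code>, the sample rows and the choice of the CSV format are
assumptions made to keep the example readable. Note that <code>ON COMMIT DROP</code>
only makes sense inside an explicit transaction: in autocommit mode the
temporary table is dropped as soon as the <code>CREATE</code> statement commits.</p>

<pre class="src">
BEGIN;

-- Hypothetical target table with two existing rows.
CREATE TEMP TABLE target(id integer PRIMARY KEY, counter integer NOT NULL);
INSERT INTO target(id, counter) VALUES (1, 10), (2, 20);

-- Same columns, constraints and indexes as target, gone at transaction end.
CREATE TEMP TABLE source(LIKE target INCLUDING ALL) ON COMMIT DROP;

-- In psql, the data follows the COPY command and ends with a "\." line.
COPY source FROM STDIN WITH (FORMAT csv);
1,5
2,7
\.

UPDATE target t
   SET counter = t.counter + s.counter
  FROM source s
 WHERE t.id = s.id;

SELECT id, counter FROM target ORDER BY id;
-- id 1 is now 15, id 2 is now 27

ROLLBACK;
</pre>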

<p>Also, rather than using the SQL <code>COPY</code> command, you might want to look up the
docs of the PostgreSQL driver you are currently using in your application:
it most likely includes some higher level facilities to deal with pushing the
data into the streaming protocol.</p>


<h3>Insert or Update</h3>

<p class="first">Now, sometimes some of the rows in the batch have to be <em>updated</em> while
others are new and must be inserted. How do you do that? Well, PostgreSQL
9.1 brings <code>WITH</code> support for all <a href="http://www.postgresql.org/docs/9.2/static/dml.html">DML</a> queries to the table, which means that
you can do the following just fine:</p>

<pre class="src">
<span style="color: #fcaf3e;">WITH</span> upd <span style="color: #fcaf3e;">AS</span> (
    <span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
       <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
      <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
     <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
 <span style="color: #fcaf3e;">RETURNING</span> s.<span style="color: #729fcf;">id</span>
)
<span style="color: #729fcf;">INSERT</span> <span style="color: #fcaf3e;">INTO</span> target(<span style="color: #729fcf;">id</span>, counter)
     <span style="color: #fcaf3e;">SELECT</span> s.<span style="color: #729fcf;">id</span>, <span style="color: #729fcf;">sum</span>(s.counter)
       <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s <span style="color: #fcaf3e;">LEFT</span> <span style="color: #fcaf3e;">JOIN</span> upd <span style="color: #729fcf;">t</span> <span style="color: #fcaf3e;">USING</span>(<span style="color: #729fcf;">id</span>)
      <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> <span style="color: #fcaf3e;">IS</span> <span style="color: #fcaf3e;">NULL</span>
   <span style="color: #fcaf3e;">GROUP</span> <span style="color: #729fcf;">BY</span> s.<span style="color: #729fcf;">id</span>
  <span style="color: #fcaf3e;">RETURNING</span> <span style="color: #729fcf;">id</span>
</pre>

<p>This query <em>updates</em> all the rows that are known in both the <em>target</em>
and the <em>source</em>, and returns what we took from the <em>source</em> in the operation, so
that we can do an <em>anti-join</em> in the next step of the query, where we're
<em>inserting</em> any row that was not taken care of in the <em>update</em> part of the
statement.</p>
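
<p>To see both halves of the statement at work, here is a minimal run with
made-up data: one <em>source</em> row matches an existing <em>target</em> row and one does
not. The schema and the values are assumptions made for the illustration, and
the final <code>RETURNING</code> clause refers to the inserted row's own <code>id</code> column.</p>

<pre class="src">
BEGIN;

-- Hypothetical schema and sample data.
CREATE TEMP TABLE target(id integer PRIMARY KEY, counter integer NOT NULL);
CREATE TEMP TABLE source(id integer PRIMARY KEY, counter integer NOT NULL);

INSERT INTO target(id, counter) VALUES (1, 10), (2, 20);
INSERT INTO source(id, counter) VALUES (2, 5), (3, 7);

WITH upd AS (
    UPDATE target t
       SET counter = t.counter + s.counter
      FROM source s
     WHERE t.id = s.id
 RETURNING s.id
)
INSERT INTO target(id, counter)
     SELECT s.id, sum(s.counter)
       FROM source s LEFT JOIN upd t USING(id)
      WHERE t.id IS NULL          -- anti-join: keep rows the UPDATE did not touch
   GROUP BY s.id
  RETURNING id;

SELECT id, counter FROM target ORDER BY id;
-- id 1 stays at 10, id 2 was updated to 25, id 3 was inserted with 7

ROLLBACK;
</pre>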

<p>Note that as the batch grows larger it's usually better to do the anti-join
against the <em>target</em> table in the <code>INSERT</code> statement, because that table will have an
<em>index</em> on the join key.</p>
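
<p>One possible reading of that note is sketched below, with the same made-up
schema and data as assumptions. The anti-join is written as a <code>NOT EXISTS</code>
against <em>target</em>. The <code>UPDATE</code> in the <code>WITH</code> clause needs no <code>RETURNING</code> here: it is
still executed, and as all parts of the statement share a single snapshot,
the <code>INSERT</code> sees the <em>target</em> rows as they were before the update, which is
enough to know which keys already exist.</p>

<pre class="src">
BEGIN;

-- Hypothetical schema and sample data.
CREATE TEMP TABLE target(id integer PRIMARY KEY, counter integer NOT NULL);
CREATE TEMP TABLE source(id integer PRIMARY KEY, counter integer NOT NULL);

INSERT INTO target(id, counter) VALUES (1, 10), (2, 20);
INSERT INTO source(id, counter) VALUES (2, 5), (3, 7);

WITH upd AS (
    UPDATE target t
       SET counter = t.counter + s.counter
      FROM source s
     WHERE t.id = s.id
)
INSERT INTO target(id, counter)
     SELECT s.id, sum(s.counter)
       FROM source s
      WHERE NOT EXISTS (SELECT 1          -- can use target's primary key index
                          FROM target t
                         WHERE t.id = s.id)
   GROUP BY s.id;

SELECT id, counter FROM target ORDER BY id;
-- id 1 stays at 10, id 2 was updated to 25, id 3 was inserted with 7

ROLLBACK;
</pre>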


<h3>Concurrency patterns</h3>

<p class="first">Now, you will tell me that we just solved the <code>UPSERT</code> problem. Well what
happens if more than one transaction is trying to do the <code>WITH (UPDATE)
INSERT</code> dance at the same time? It's a single <em>statement</em>, so it's a single
<em>snapshot</em>. What can go wrong?</p>

<center>
<p><img src="../../../images/gophermegaphones.480.jpg" alt=""></p>
</center>

<center>
<p><em>Concurrent processing</em></p>
</center>

<p>What happens is that as soon as the concurrent sources contain some data for
the same <em>primary key</em>, you get a <em>duplicate key</em> error on the insert. As both
the transactions are concurrent, they are seeing the same <em>target</em> table where
the new data does not exist yet, and both will conclude that they need to
<code>INSERT</code> the new data into the <em>target</em> table.</p>

<p>There are two things that you can do to avoid the problem. The first thing
is to make it so that you're doing only one <em>batch update</em> at any time, by
architecting your application around that constraint. That's the most
effective way around the problem, but not the most practical.</p>

<p>The other thing you can do is force the concurrent transactions to
serialize one after the other, using an <a href="http://www.postgresql.org/docs/9.2/static/explicit-locking.html">explicit locking</a> statement:</p>


<pre class="src">
<span style="color: #729fcf;">LOCK</span> <span style="color: #fcaf3e;">TABLE</span> target <span style="color: #fcaf3e;">IN</span> <span style="color: #729fcf;">SHARE</span> <span style="color: #729fcf;">ROW</span> <span style="color: #729fcf;">EXCLUSIVE</span> <span style="color: #729fcf;">MODE</span>;
</pre>

<p>That <em>lock level</em> is not automatically acquired by any PostgreSQL command, so
it only helps if you take it explicitly in every transaction
you want to serialize. When you know you're not at risk (that is, when not
playing the <em>insert or update</em> dance), you can omit taking that <em>lock</em>.</p>
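
<p>Put together, each batch transaction could look like the following sketch.
It assumes that <code>target</code> and <code>source</code> already exist with the <code>id</code> and <code>counter</code>
columns used above. <code>LOCK TABLE</code> only works inside a transaction block, and
the lock is held until the transaction ends. As this lock mode conflicts with
itself, a second batch waits at its own <code>LOCK</code> statement until the first one
commits. Plain readers are not blocked, but any other <code>INSERT</code>, <code>UPDATE</code> or
<code>DELETE</code> on <code>target</code> has to wait too.</p>

<pre class="src">
BEGIN;

-- Taken first, so that concurrent batches queue up here.
LOCK TABLE target IN SHARE ROW EXCLUSIVE MODE;

WITH upd AS (
    UPDATE target t
       SET counter = t.counter + s.counter
      FROM source s
     WHERE t.id = s.id
 RETURNING s.id
)
INSERT INTO target(id, counter)
     SELECT s.id, sum(s.counter)
       FROM source s LEFT JOIN upd t USING(id)
      WHERE t.id IS NULL
   GROUP BY s.id;

-- The lock is released here, and the next batch can proceed.
COMMIT;
</pre>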


<h3>Conclusion</h3>

<center>
<p><a class="image-link" href="http://www.flickr.com/photos/asquarephotography/6841106459/in/photostream/">
<img src="../../../images/stack-of-old-books.jpg"></a></p>
</center>

<p>The SQL language has its quirks, that's true. It's been made for efficient
data processing, and with recent enough <a href="http://www.postgresql.org/about/featurematrix/">PostgreSQL releases</a> you even have
some advanced pipelining facilities included in the language. Properly
learning how to make the most out of that old component of your programming
stack still makes a lot of sense today!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 15 Mar 2013 10:47:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/15-batch-update.html</guid>
</item>
<item>
  <title>Emacs Conference</title>
  <link>http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</link>
  <description><![CDATA[<p>The <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a> is happening, it's real, and it will take place at the
end of this month in London. Check it out, and register at
<a href="http://emacsconf.eventbrite.co.uk/">Emacs Conference Event Brite</a>. It's free and there's still some availability.</p>

<center>
<p><img src="../../../images/emacs-rocks-logo.png" alt=""></p>
</center>

<center>
<p><em>It's all about Emacs, and it rocks!</em></p>
</center>

<p>We have a great line-up for this conference, which makes me proud to be able
to be there. If you've ever been paying attention when using <a href="http://www.gnu.org/software/emacs/">Emacs</a> then
you've already heard those names: <a href="http://sachachua.com/blog/">Sacha Chua</a> is frequently blogging about
how she manages to improve her workflow thanks to <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>, <a href="https://github.com/jwiegley">John Wiegley</a>
is a prolific Emacs contributor maybe best known for his <a href="https://github.com/ledger/ledger">ledger</a> <em>Emacs
Mode</em>, then we have <a href="http://www.lukego.com/">Luke Gorrie</a> who hacked up <a href="http://wingolog.org/archives/2006/01/02/slime">SLIME</a> among other things, we
also have <a href="http://nic.ferrier.me.uk/">Nic Ferrier</a> who is starting a revolution in how to use <em>Emacs Lisp</em>
with <a href="http://elnode.org/">elnode</a>. And more! Including <a href="http://en.wikipedia.org/wiki/Steve_Yegge">Steve Yegge</a>!</p>

<center>
<p>See you there in London.</p>
</center>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 04 Mar 2013 13:58:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</guid>
</item>
<item>
  <title>HyperLogLog Unions</title>
  <link>http://tapoueh.org/blog/2013/02/26-hll-union.html</link>
  <description><![CDATA[<p>In the article from yesterday we talked about <a href="http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html">PostgreSQL HyperLogLog</a> in
some detail. The real magic of that extension was only skimmed over though,
and deserves another very small article all by itself, in case you missed it.</p>

<center>
<p><img src="../../../images/SetOperations.480.png" alt=""></p>
</center>

<center>
<p><em>Which Set Operation do you want for counting unique values?</em></p>
</center>

<p>The first query here has the default level of magic in it, really. What
happens is that each time we update the <em>HyperLogLog</em> <em>hash</em> value, we
update the data that allows us to compute its cardinality.</p>

<pre class="src">
=&gt; <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
          #users <span style="color: #7f007f;">as</span> daily,
          pg_column_size(users) <span style="color: #7f007f;">as</span> bytes
     <span style="color: #7f007f;">from</span> daily_uniques
 <span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;
    <span style="color: #228b22;">date</span>    |      daily       | bytes
<span style="color: #b22222;">------------+------------------+-------
</span> 2013-02-22 | 401676.779509985 |  1287
 2013-02-23 | 660187.271908359 |  1287
 2013-02-24 | 869980.029947449 |  1287
 2013-02-25 | 580865.296677817 |  1287
 2013-02-26 | 240569.492722719 |  1287
(5 <span style="color: #da70d6;">rows</span>)
</pre>

<p>And as advertised, the data is kept in a fixed-size data structure. The
magic here all happens at <code>hll_add()</code> time, the function you have to call to
update the data.</p>

<p>Now on to something way more magic!</p>

<center>
<p><img src="../../../images/aggregates2.jpg" alt=""></p>
</center>

<center>
<p><em>Are those the aggregates you're looking for?</em></p>
</center>

<pre class="src">
=&gt; <span style="color: #7f007f;">select</span> to_char(<span style="color: #228b22;">date</span>, <span style="color: #bc8f8f;">'YYYY/MM'</span>) <span style="color: #7f007f;">as</span> <span style="color: #da70d6;">month</span>,
          round(#hll_union_agg(users)) <span style="color: #7f007f;">as</span> monthly
     <span style="color: #7f007f;">from</span> daily_uniques <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> 1;
  <span style="color: #da70d6;">month</span>  | monthly
<span style="color: #b22222;">---------+---------
</span> 2013/02 | 1960380
(1 <span style="color: #da70d6;">row</span>)
</pre>

<p>The <em>HyperLogLog</em> data structure allows the implementation of a <strong><em>union</em></strong>
algorithm that computes how many unique values you have registered over
one day and the next taken together. Extended to its general form, and in SQL,
what you get is an <em>aggregate</em> that you can use in <code>GROUP BY</code>
constructs and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. Did you read about them yet?</p>
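
<p>As a sketch of the window function case, assuming the same <code>daily_uniques</code>
table as above, the aggregate can give a running count of the unique values
seen so far, day after day (the <code>running_total</code> name is only illustrative):</p>

<pre class="src">
select date,
       round(#users) as daily,
       round(#hll_union_agg(users) over (order by date)) as running_total
  from daily_uniques
 order by date;
</pre>

<p>Because it is a union and not a sum, a user seen on several days is only
counted once in the running total.</p>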
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 26 Feb 2013 12:44:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/26-hll-union.html</guid>
</item>
<item>
  <title>PostgreSQL HyperLogLog</title>
  <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link>
  <description><![CDATA[<p>If you've been following the newer developments in statistics from home,
you might have heard about this new
<a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This
technique is now available for PostgreSQL in the extension <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a>
available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon
to be in <code>debian</code>.</p>

<center>
<p><img src="../../../images/cardinality1.jpg" alt=""></p>
</center>

<center>
<p><em>How to Compute Cardinality?</em></p>
</center>

<h3>Installing postgresql-hll</h3>

<p class="first">It's as simple as <code>CREATE EXTENSION hll;</code> really, although to get there you
must first have installed the <em>package</em> on your system. We did some packaging work
for <code>debian</code> and the result should appear soon in a distro near you.</p>

<p>Then you also need to keep your data in some table. Straight from the
documentation, we can use this schema:</p>

<pre class="src">
<span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
<span style="color: #228b22;">DATE</span>            <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
users           hll
);
</pre>

<p>Then, to add some data whose <em>cardinality</em> you want to know, it's as
simple as the following <code>UPDATE</code> statement:</p>

<pre class="src">
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>))
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>So in our example what you see is that we want to count how many unique
IP addresses we saw, and we do that by first creating a <em>hash</em> of that source
data, then calling <code>hll_add()</code> with the current value and the hash result.</p>

<p>The current value must be initialized using <code>hll_empty()</code>.</p>
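
<p>A minimal sketch of that initialization, assuming one row per day as in the
schema above and the extension's default parameters:</p>

<pre class="src">
-- create today's row with an empty hll, ready for hll_add()
INSERT INTO daily_uniques(date, users)
     VALUES (current_date, hll_empty());
</pre>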


<h3>Concurrency</h3>

<p class="first">The most awake readers among you have already spotted that: using an <code>UPDATE</code>
on the same row over and over again is a good recipe to kill any form of
concurrency, so you don't want to do that on your production setup unless
you don't care about those <code>UPDATE waiting</code> piling up in your system.</p>

<p>The idea is then to fill in a queue of updates and asynchronously update the
<code>daily_uniques</code> table from that queue, possibly using the <code>hll_add_agg</code>
aggregate that the extension provides, so that you do only one <code>UPDATE</code> per
batch of values to process.</p>
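
<p>Here is a rough sketch of what draining such a queue could look like. The
<code>uniques_queue</code> table and its <code>value</code> column are hypothetical names,
not something the extension provides, and the sketch assumes that today's row
already exists in <code>daily_uniques</code>; if it does not, the queued values are
deleted without being counted.</p>

<pre class="src">
WITH batch AS (
  -- empty the (hypothetical) queue and keep what we removed
  DELETE FROM uniques_queue RETURNING value
),
hll(agg) AS (
  SELECT hll_add_agg(hll_hash_text(value)) FROM batch
)
UPDATE daily_uniques
   SET users = hll_union(users, hll.agg)
  FROM hll
 WHERE date = current_date
   AND hll.agg IS NOT NULL;   -- nothing queued: leave the row untouched
</pre>

<p>That way the hot row gets a single <code>UPDATE</code> per batch, whatever the number of
values queued in the meantime.</p>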


<h3>∅: Empty Set and NULL</h3>

<center>
<p><img src="../../../images/EmptySet_L.gif" alt=""></p>
</center>

<center>
<p><em>Yes there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">unicode</a> entry for that, ∅</em></p>
</center>

<p>Now, what happens when the batch of new unique values you want to update
from is itself empty? Well I would have expected <code>hll_add_agg</code> over an empty
set to return an empty <code>hll</code> value, the same as returned by <code>hll_empty()</code>, but
it turns out it's returning <code>NULL</code> instead.</p>

<p>And then <code>hll_add(users, NULL)</code> will happily return <code>NULL</code>. So the next <code>UPDATE</code>
cancels all the previous work, which is not nice. We had to cater for
that case explicitly in the <code>UPDATE</code> query that's working from the batch of
new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist
showing off one of the most awesome PostgreSQL features here: <em>writable CTE</em>.</p>

<pre class="src">
<span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
  <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>)) <span style="color: #7f007f;">FROM</span> new_batch
)
  <span style="color: #da70d6;">UPDATE</span> daily_uniques
     <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                      <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
                  <span style="color: #7f007f;">END</span>
    <span style="color: #7f007f;">FROM</span> hll
   <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>That's how you protect against an empty set being turned into a <code>NULL</code>. I
think the real fix would need to be included in <code>postgresql-hll</code> itself, in
making it so that the <code>hll_add_agg</code> aggregate returns <code>hll_empty()</code> on an empty
set, and I will report that bug (with that very article as the detailed
explanation of it).</p>
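
<p>Until then, a possible variant of the same guard, sketched under the
assumption that every <code>hll</code> value involved uses the extension's default
parameters, is to substitute the empty set at the call site with <code>coalesce</code>:</p>

<pre class="src">
WITH hll(agg) AS (
  SELECT hll_add_agg(hll_hash_text(value)) FROM new_batch
)
  UPDATE daily_uniques
     -- an empty batch yields NULL: union with the empty set instead
     SET users = hll_union(users, coalesce(hll.agg, hll_empty()))
    FROM hll
   WHERE date = current_date;
</pre>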


<h3>Using postgresql-hll</h3>

<p class="first">When using <code>postgresql-hll</code> on the production system, we were able to get some
good looking numbers from our <code>daily_uniques</code> table:</p>

<pre class="src">
<span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>, #users <span style="color: #7f007f;">as</span> daily, #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
    <span style="color: #7f007f;">from</span> daily_uniques
)
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         round(daily) <span style="color: #7f007f;">as</span> daily,
         round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
    <span style="color: #7f007f;">from</span> stats
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;
    <span style="color: #228b22;">date</span>    | daily  | percent
<span style="color: #b22222;">------------+--------+---------
</span> 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>)
</pre>

<p>I couldn't resist showing off two of my favorite SQL constructs in that
example query here, which are the <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and
<a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. If that <code>over()</code> clause looks strange to you, take a minute
now and go read about it. Yes, do that now, we're waiting.</p>

<p>The data here shows that we set up the facility in the middle of the
first day, and that the morning's activity is quite low.</p>


<h3>Conclusion</h3>

<center>
<p><img src="../../../images/hll-dv-estimator.png" alt=""></p>
</center>

<center>
<p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p>
</center>

<p>When using <code>postgresql-hll</code> you need to be careful not to kill your
application's concurrency, and you need to protect yourself against
the ∅ killer too. The other thing to keep in mind is that the numbers you
get out of the <code>hll</code> technique are estimates within a given <em>precision</em>, and you
might want to read some more about what that means for your intended usage of
the feature.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid>
</item>
<item>
  <title>PostgreSQL HyperLogLog</title>
  <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link>
  <description><![CDATA[<p>If you've been following along at home the newer statistics developments,
you might have heard about this new
<a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This
technique is now available for PostgreSQL in the extension <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a>
available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon
to be in <code>debian</code>.</p>

<center>
<p><img src="../../../images/cardinality1.jpg" alt=""></p>
</center>

<center>
<p><em>How to Compute Cardinality?</em></p>
</center>

<h3>Installing postgresql-hll</h3>

<p class="first">It's as simple as <code>CREATE EXTENSION hll;</code> really, even if to get there you
must have installed the <em>package</em> on your system. We did some packaging work
for <code>debian</code> and the result should appear soon in a distro near you.</p>

<p>Then you also need to keep your data in some table. Straight from the
documentation, we can use this schema:</p>

<pre class="src">
<span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
<span style="color: #228b22;">DATE</span>            <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
users           hll
);
</pre>

<p>Then, to add some data whose <em>cardinality</em> you want to know, it's as
simple as the following <code>UPDATE</code> statement:</p>

<pre class="src">
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>))
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>So in our example what you see is that we want to figure out how many unique
IP addresses we saw, and we do that by first creating a <em>hash</em> of that source
data, then calling <code>hll_add()</code> with the current value and the hash result.</p>

<p>The current value must be initialized using <code>hll_empty()</code>.</p>
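
<p>As a minimal sketch of that initialization, assuming the <code>daily_uniques</code>
table above and a row created for the day before the first <code>UPDATE</code> runs,
it could look like this:</p>

<pre class="src">
-- Sketch: create today's row with an empty hll, so that hll_add() has a
-- non-NULL value to start from.
INSERT INTO daily_uniques(date, users)
     VALUES (current_date, hll_empty());
</pre>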


<h3>Concurrency</h3>

<p class="first">The most awake readers among you have already spotted it: using an <code>UPDATE</code>
on the same row over and over again is a good recipe for killing any form of
concurrency, so you don't want to do that on your production setup unless
you don't care about those <code>UPDATE waiting</code> piling up in your system.</p>

<p>The idea is then to fill in a queue of updates and asynchronously update the
<code>daily_uniques</code> table from that queue, possibly using the <code>hll_add_agg</code>
aggregate that the extension provides, so that you do only one <code>UPDATE</code> per
batch of values to process.</p>
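
<p>Here is a rough sketch of what such a queue could look like. The <code>new_batch</code>
table name and its single <code>value</code> column are illustrative assumptions, and your
own queue will probably carry more information, such as a timestamp:</p>

<pre class="src">
-- Hypothetical queue table: each hit only appends a row here, so
-- concurrent sessions never wait on the same daily_uniques row.
CREATE TABLE new_batch (
  value text NOT NULL
);

-- Application side, once per hit:
INSERT INTO new_batch(value) VALUES ('123.123.123.123');
</pre>

<p>A periodic job can then aggregate whatever has been queued and apply it to
<code>daily_uniques</code> in a single statement.</p>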


<h3>∅: Empty Set and NULL</h3>

<center>
<p><img src="../../../images/EmptySet_L.gif" alt=""></p>
</center>

<center>
<p><em>Yes, there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">Unicode</a> entry for that, ∅</em></p>
</center>

<p>Now, what happens when the batch of new unique values you want to update
from is itself empty? Well, I would have expected <code>hll_add_agg</code> over an empty
set to return an empty <code>hll</code> value, the same as returned by <code>hll_empty()</code>, but
it turns out it returns <code>NULL</code> instead.</p>

<p>And then <code>hll_add(users, NULL)</code> will happily return <code>NULL</code>. So the next <code>UPDATE</code>
wipes out all the previous work, which is not nice. We had to cater for
that case explicitly in the <code>UPDATE</code> query that's working from the batch of
new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist
showing off one of the most awesome PostgreSQL features here: <em>writable CTE</em>.</p>

<pre class="src">
<span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
  <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>)) <span style="color: #7f007f;">FROM</span> new_batch
)
  <span style="color: #da70d6;">UPDATE</span> daily_uniques
     <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                      <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
                  <span style="color: #7f007f;">END</span>
    <span style="color: #7f007f;">FROM</span> hll
   <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>That's how you protect against an empty set being turned into a <code>NULL</code>. I
think the real fix belongs in <code>postgresql-hll</code> itself, by
making the <code>hll_add_agg</code> aggregate return <code>hll_empty()</code> on an empty
set, and I will report that bug (with this very article as the detailed
explanation of it).</p>
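
<p>Until then, a shorter way to spell the same protection might be to replace the
<code>NULL</code> with an empty <code>hll</code> before the union. This is only a sketch of an
alternative, not what we run in production, and it assumes the same
<code>new_batch</code> source as the query above:</p>

<pre class="src">
WITH hll(agg) AS (
  SELECT hll_add_agg(hll_hash_text(value)) FROM new_batch
)
  UPDATE daily_uniques
     -- an empty batch yields NULL: union with an empty hll instead,
     -- which leaves users unchanged
     SET users = hll_union(users, coalesce(hll.agg, hll_empty()))
    FROM hll
   WHERE date = current_date;
</pre>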


<h3>Using postgresql-hll</h3>

<p class="first">When using <code>postgresql-hll</code> on the production system, we were able to get some
good-looking numbers from our <code>daily_uniques</code> table:</p>

<pre class="src">
<span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>, #users <span style="color: #7f007f;">as</span> daily, #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
    <span style="color: #7f007f;">from</span> daily_uniques
)
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         round(daily) <span style="color: #7f007f;">as</span> daily,
         round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
    <span style="color: #7f007f;">from</span> stats
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;
    <span style="color: #228b22;">date</span>    | daily  | percent
<span style="color: #b22222;">------------+--------+---------
</span> 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>)
</pre>

<p>I couldn't resist showing off two of my favorite SQL constructs in that
example query, namely <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and
<a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. If that <code>over()</code> clause looks strange to you, take a minute
now and go read about it. Yes, do that now, we're waiting.</p>
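
<p>If you want something to play with while reading, here is a toy query that
needs no extension at all. The inline <code>values</code> list is made up for the
illustration; the point is that an empty <code>over()</code> computes the aggregate
over the whole result set while keeping every row:</p>

<pre class="src">
-- Each row keeps its own x, and total is the sum over all rows (6 here).
select x, sum(x) over () as total
  from (values (1), (2), (3)) as t(x);
</pre>

<p>That is how the query above can compare each day's estimate with the
estimate for the union of all days.</p>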

<p>The data here shows that we set up the facility in the middle of the
first day, and that the morning's activity is quite low.</p>


<h3>Conclusion</h3>

<center>
<p><img src="../../../images/hll-dv-estimator.png" alt=""></p>
</center>

<center>
<p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p>
</center>

<p>When using <code>postgresql-hll</code> you need to be careful not to kill your
application's concurrency, and you need to protect yourself against
the ∅ killer too. The other thing to keep in mind is that the numbers you
get out of the <code>hll</code> technique are estimates within a given <em>precision</em>, and you
might want to read some more about what that means for your intended usage of
the feature.</p>
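
<p>As a back-of-the-envelope sketch, the relative error usually quoted for
<em>HyperLogLog</em> is about 1.04/√m, where m is the number of registers. Assuming
2^11 registers, which I believe is the default of the extension (do check the
documentation of the version you run), you can compute it right from <code>psql</code>:</p>

<pre class="src">
-- Expected relative error in percent, assuming m = 2^11 registers
select round((1.04 / sqrt(2 ^ 11) * 100)::numeric, 2) as error_percent;
</pre>

<p>Under that assumption the result is roughly 2.3%, so the daily figures shown
above are estimates to within a couple of percent.</p>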
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid>
</item>
<item>
  <title>PostgreSQL HyperLogLog</title>
  <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link>
  <description><![CDATA[<p>If you've been following along at home the newer statistics developments,
you might have heard about this new
<a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This
technique is now available for PostgreSQL in the extension <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a>
available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon
to be in <code>debian</code>.</p>

<center>
<p><img src="../../../images/cardinality1.jpg" alt=""></p>
</center>

<center>
<p><em>How to Compute Cardinality?</em></p>
</center>

<h3>Installing postgresql-hll</h3>

<p class="first">It's as simple as <code>CREATE EXTENSION hll;</code> really, even if to get there you
must have installed the <em>package</em> on your system. We did some packaging work
for <code>debian</code> and the result should appear soon in a distro near you.</p>

<p>Then you also need to keep your data in some table. Straight from the
documentation, we can use this schema:</p>

<pre class="src">
<span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
<span style="color: #228b22;">DATE</span>            <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
users           hll
);
</pre>

<p>Then, to add some data whose <em>cardinality</em> you want to know, it's as
simple as the following <code>UPDATE</code> statement:</p>

<pre class="src">
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>))
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>So in our example what you see is that we want to decipher how many unique
IP addresses we saw, and we do that by first creating a <em>hash</em> of that source
data then calling <code>hll_add()</code> with the current value and the hash result.</p>

<p>The current value must be initialized using <code>hll_empty()</code>.</p>


<h3>Concurrency</h3>

<p class="first">The most awake readers among you have already spotted that: using an <code>UPDATE</code>
on the same row over and over again is a good recipe to kill any form of
concurrency, so you don't want to do that on your production setup unless
you don't care about those <code>UPDATE waiting</code> piling up in your system.</p>

<p>The idea is then to fill in a queue of updates and asynchronously update the
<code>daily_uniques</code> table from that queue, possibly using the <code>hll_add_agg</code>
aggregate that the extension provides, so that you do only one <code>UPDATE</code> per
batch of values to process.</p>


<h3>∅: Empty Set and NULL</h3>

<center>
<p><img src="../../../images/EmptySet_L.gif" alt=""></p>
</center>

<center>
<p><em>Yes there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">unicode</a> entry for that, ∅</em></p>
</center>

<p>Now, what happens when the batch of new unique values you want to update
from is itself empty? Well I would have expected <code>hll_add_agg</code> over an empty
set to return an empty <code>hll</code> value, the same as returned by <code>hll_empty()</code>, but
it turns out it's returning <code>NULL</code> instead.</p>

<p>And then <code>hll_add(users, NULL)</code> will happily return <code>NULL</code>. So the next <code>UPDATE</code>
wipes out all the previous work, which is not nice. We had to cater for
that case explicitly in the <code>UPDATE</code> query that's working from the batch of
new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist
showing off one of the most awesome PostgreSQL features here: <em>writable CTE</em>.</p>

<pre class="src">
<span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
  <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>)) <span style="color: #7f007f;">FROM</span> new_batch
)
  <span style="color: #da70d6;">UPDATE</span> daily_uniques
     <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                      <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
                  <span style="color: #7f007f;">END</span>
    <span style="color: #7f007f;">FROM</span> hll
   <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>

<p>That's how you protect against an empty set being turned into a <code>NULL</code>. I
think the real fix belongs in <code>postgresql-hll</code> itself, by making the
<code>hll_add_agg</code> aggregate return <code>hll_empty()</code> on an empty
set, and I will report that bug (with this very article as the detailed
explanation of it).</p>
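
<p>A more compact way to spell the same guard could be <code>coalesce</code>. This is
only a sketch that reuses the table and column names from the query above,
and it assumes the <code>users</code> column uses the default <code>hll</code> settings, so that
<code>hll_empty()</code> is compatible with it in <code>hll_union</code>:</p>

<pre class="src">
-- sketch: replace a NULL aggregate (empty batch) with an empty hll
WITH hll(agg) AS (
  SELECT hll_add_agg(hll_hash_text(value)) FROM new_batch
)
  UPDATE daily_uniques
     SET users = hll_union(users, coalesce(hll.agg, hll_empty()))
    FROM hll
   WHERE date = current_date;
</pre>

<p>The <code>CASE</code> form skips the union entirely when the batch is empty, while this
one always calls <code>hll_union</code>; pick whichever reads better to you.</p>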


<h3>Using postgresql-hll</h3>

<p class="first">When using <code>postgresql-hll</code> on the production system, we were able to get some
good looking numbers from our <code>daily_uniques</code> table:</p>

<pre class="src">
<span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>, #users <span style="color: #7f007f;">as</span> daily, #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
    <span style="color: #7f007f;">from</span> daily_uniques
)
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         round(daily) <span style="color: #7f007f;">as</span> daily,
         round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
    <span style="color: #7f007f;">from</span> stats
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;
    <span style="color: #228b22;">date</span>    | daily  | percent
<span style="color: #b22222;">------------+--------+---------
</span> 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>)
</pre>

<p>I couldn't resist showing off two of my favorite SQL constructs in that
example query, which are the <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and
<a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. If that <code>over()</code> clause looks strange to you, take a minute
now and go read about it. Yes, do that now, we're waiting.</p>
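
<p>If you want a smaller example to play with first, here's a minimal sketch
that runs on plain PostgreSQL, without <code>postgresql-hll</code>. An aggregate followed
by an empty <code>over()</code> is computed over all the rows of the result and repeated
on each of them, which is what makes the per-row percentage possible. The
<code>t(x)</code> alias and the values are made up for the illustration:</p>

<pre class="src">
select x,
       sum(x) over () as total,
       round(100.0 * x / sum(x) over (), 2) as percent
  from generate_series(1, 3) as t(x);
 x | total | percent
---+-------+---------
 1 |     6 |   16.67
 2 |     6 |   33.33
 3 |     6 |   50.00
(3 rows)
</pre>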

<p>The data here is showing that we did set up the facility in the middle of the
first day, and that the morning's activity is quite low.</p>


<h3>Conclusion</h3>

<center>
<p><img src="../../../images/hll-dv-estimator.png" alt=""></p>
</center>

<center>
<p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p>
</center>

<p>When using <code>postgresql-hll</code> you need to be careful not to kill your
application concurrency abilities, and you need to protect yourself against
the ∅ killer too. The other thing to keep in mind is that the numbers you
get out of the <code>hll</code> technique are estimates within a given <em>precision</em>, and you
might want to read some more about what it means for your intended usage of
the feature.</p>
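
<p>As a rough idea of what that <em>precision</em> means, here's a back-of-the-envelope
sketch in plain SQL, not a measurement from the production system. It relies
on the usual <em>HyperLogLog</em> result that the relative standard error is about
<code>1.04/sqrt(m)</code> for <code>m</code> registers, and it assumes that <code>postgresql-hll</code> uses
<code>m = 2^log2m</code> registers; check the extension's documentation for the settings
your own <code>hll</code> column uses. The <code>t(log2m)</code> alias is only there for the example.</p>

<pre class="src">
-- expected relative error, in percent, for a few register counts
select log2m,
       2 ^ log2m as registers,
       round((1.04 / sqrt(2 ^ log2m) * 100)::numeric, 2) as error_pct
  from generate_series(10, 14) as t(log2m);
</pre>

<p>Under that formula, halving the error takes four times as many registers.</p>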
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid>
</item>
<item>
  <title>Playing with pgloader</title>
  <link>http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</link>
  <description><![CDATA[<p>While making progress with both <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a> and <a href="http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html">Extension Templates</a>, I
needed to take a little break. My current mental exercise for staying sane seems
to mainly involve using <em>Common Lisp</em>, a programming language that ships with
about all the building blocks you need.</p>

<center>
<p><img src="../../../images/made-with-lisp.png" alt=""></p>
</center>

<center>
<p><em>Yes, that old language brings so much to the table</em></p>
</center>

<p>When using <em>Common Lisp</em>, you have an awesome interactive development
environment where you can redefine functions and objects <em>while testing them</em>.
That means you don't have to quit the interpreter, reload the new version of
the code and put the interactive test case together all over again after a
change. Just evaluate the change in the interactive environment: functions
are compiled incrementally over their previous definition, objects whose
classes have changed are migrated live.</p>

<p>See, I just said <em>objects</em> and <em>classes</em>. <em>Common Lisp</em> comes with some advanced
<em>Object Oriented Programming</em> facilities named <a href="http://www.aiai.ed.ac.uk/~jeff/clos-guide.html">CLOS</a> and <a href="http://www.alu.org/mop/index.html">MOP</a> where the <em>Java</em> and
<em>Python</em> and <em>C++</em> object models are just a subset of what you're being offered.
Hint, those don't have <a href="http://en.wikipedia.org/wiki/Multiple_dispatch">Multiple Dispatch</a>.</p>

<p>And you have a very sophisticated <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html">Condition System</a> where <em>Exceptions</em> are just
a subset of what you can do (hint: have a look at <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html#restarts">restarts</a> and tell me you
didn't wish your programming language of choice had them). And it continues
that way for about any basic building block you might want to be using.</p>

<h3>Loading data</h3>

<p class="first">Back to <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a>, you will tell me. Right. I've been spending a couple of
evenings hacking on the new version of pgloader in <em>Common Lisp</em>, and wanted
to share some preliminary results.</p>

<center>
<p><img src="../../../images/toy-loader.320.jpg" alt=""></p>
</center>

<center>
<p><em>Playing with the loader</em></p>
</center>

<p>The current status of the new <em>pgloader</em> is still pretty rough: if you're not
used to developing in Common Lisp you might not find it ready for use yet. I'm
still working on the internal APIs and trying to make something clean and
easy to use for a developer, and then I will provide some external,
user-oriented ways to play with it. I missed that step once with the <em>Python</em> based
version of the tool, and I don't want to make the same mistake again this time.</p>

<p>So here's a test run with the current <em>pgloader</em>, on a small enough data set
of <code>226 MB</code> of <code>CSV</code> files.</p>

<pre class="src">
time python pgloader.py -R.. --summary -Tc ../pgloader.dbname.conf

Table name        |    duration |    size |  copy rows |     errors
====================================================================
aaaaaaaaaa_aaaa   |      2.148s |       - |      24595 |          0
bbbbbbbbbb_bbbb...|      0.609s |       - |        326 |          0
cccccccccc_cccc...|      2.868s |       - |      25126 |          0
dddddddddd_dddd...|      0.638s |       - |          8 |          0
eeeeeeeeee_eeee...|      2.874s |       - |      36825 |          0
ffffffffff_ffffff |      0.667s |       - |        624 |          0
gggggggggg_gggg...|      0.847s |       - |       5638 |          0
hhh_hhhhhhh       |      9.907s |       - |     120159 |          0
iii_iiiiiiiiiiiii |      0.574s |       - |        661 |          0
jjjjjjj           |      6.647s |       - |      30027 |          0
kkk_kkkkkkkkk     |      0.439s |       - |         12 |          0
lll_llllll        |      0.308s |       - |          4 |          0
mmmm_mmm          |      2.139s |       - |      29669 |          0
nnnn_nnnnnn       |      8.555s |       - |     100197 |          0
oooo_ooooo        |     13.781s |       - |      93555 |          0
pppp_ppppppp      |      8.275s |       - |      76457 |          0
qqqq_qqqqqqqqqqqq |      8.568s |       - |     126159 |          0
====================================================================
Total             |  01m09.902s |       - |     670042 |          0
</pre>


<h3>Streaming data</h3>

<p class="first">With the new code in <em>Common Lisp</em>, I could benefit from real multi threading
and higher level abstraction to make it easy to use: <a href="http://lparallel.org/">lparallel</a> is a lib
providing exactly what I need here, with <em>workers</em> and <em>queues</em> to communicate
data in between them.</p>

<p>What I'm doing is running two separate threads: one is reading the data
from either a <code>CSV</code> file or a <em>MySQL</em> database directly, and pushing that data
into the queue, while the other thread is pulling data from the queue and
writing it into our <a href="http://www.postgresql.org/">PostgreSQL</a> database.</p>

<pre class="src">
CL-USER&gt; (pgloader.csv:import-database <span style="color: #bc8f8f;">"dbname"</span>
            :csv-path-root <span style="color: #bc8f8f;">"/path/to/csv/"</span>
            :separator #\Tab
            :quote #\"
            :escape <span style="color: #bc8f8f;">"\"\""</span>
            :null-as <span style="color: #bc8f8f;">":null:"</span>)
                    table name       read   imported     errors       time
------------------------------  ---------  ---------  ---------  ---------
               aaaaaaaaaa_aaaa      24595      24595          0     0.995s
          bbbbbbbbbb_bbbbbbbbb        326        326          0     0.570s
       cccccccccc_cccccccccccc      25126      25126          0     1.461s
      dddddddddd_dddddddddd_dd          8          8          0     0.650s
eeeeeeeeee_eeeeeeeeee_eeeeeeee      36825      36825          0     1.664s
             ffffffffff_ffffff        624        624          0     0.707s
     gggggggggg_ggggg_gggggggg       5638       5638          0     0.655s
                   hhh_hhhhhhh     120159     120159          0     3.415s
             iii_iiiiiiiiiiiii        661        661          0     0.420s
                       jjjjjjj      30027      30027          0     2.743s
                 kkk_kkkkkkkkk         12         12          0     0.327s
                    lll_llllll          4          4          0     0.315s
                      mmmm_mmm      29669      29669          0     1.182s
                   nnnn_nnnnnn     100197     100197          0     2.206s
                    oooo_ooooo      93555      93555          0     9.683s
                  pppp_ppppppp      76457      76457          0     5.349s
             qqqq_qqqqqqqqqqqq     126159     126159          0     2.495s
------------------------------  ---------  ---------  ---------  ---------
             Total import time     670042     670042          0    34.836s
NIL
</pre>

<p>As you can see the control is still made for interactive developer usage,
which is fine for now but will have to change down the road, when the APIs
stabilize.</p>

<p>Now, let's compare to reading directly from <em>MySQL</em>:</p>

<pre class="src">
CL-USER&gt; (pgloader.mysql:stream-database <span style="color: #bc8f8f;">"dbname"</span>)
                    table name       read   imported     errors       time
------------------------------  ---------  ---------  ---------  ---------
               aaaaaaaaaa_aaaa      24595      24595          0     0.887s
          bbbbbbbbbb_bbbbbbbbb        326        326          0     0.617s
       cccccccccc_cccccccccccc      25126      25126          0     1.497s
      dddddddddd_dddddddddd_dd          8          8          0     0.582s
eeeeeeeeee_eeeeeeeeee_eeeeeeee      36825      36825          0     1.697s
             ffffffffff_ffffff        624        624          0     0.748s
     gggggggggg_ggggg_gggggggg       5638       5638          0     0.923s
                   hhh_hhhhhhh     120159     120159          0     3.525s
             iii_iiiiiiiiiiiii        661        661          0     0.449s
                       jjjjjjj      30027      30027          0     2.546s
                 kkk_kkkkkkkkk         12         12          0     0.330s
                    lll_llllll          4          4          0     0.323s
                      mmmm_mmm      29669      29669          0     1.227s
                   nnnn_nnnnnn     100197     100197          0     2.489s
                    oooo_ooooo      93555      93555          0     9.148s
                  pppp_ppppppp      76457      76457          0     6.713s
             qqqq_qqqqqqqqqqqq     126159     126159          0     4.571s
------------------------------  ---------  ---------  ---------  ---------
          Total streaming time     670042     670042          0    38.272s
NIL
</pre>

<p>The <em>streaming</em> here is a tad slower than the <em>importing</em> from files. Now if you
want to be fair when comparing those, you would have to take into account
the time it takes to <em>export</em> the data out from its source. When doing that
<em>export/import</em> dance, a quick test shows a timing of <code>1m4.745s</code>. Now, if we do
an <em>export only</em> test, it runs in <code>31.822s</code>. So yes streaming is a good thing to
have here.</p>


<h3>Conclusion</h3>

<p class="first">We just got twice as fast as the python version.</p>

<p>Some will say that I'm not comparing fairly to the <em>Python</em> version of
pgloader here, because I could have implemented the streaming facility in
<em>Python</em> too. Well actually I did: the options are called <a href="http://tapoueh.org/pgsql/pgloader.html#sec13">section_threads</a> and
<a href="http://tapoueh.org/pgsql/pgloader.html#sec15">split_file_reading</a>, and you can set them so that a reader is pushing data into a
set of queues and several workers are each feeding from their own queue. It
didn't help with performance at all. Once again, read about the infamous
<a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a> to understand why not.</p>

<center>
<p><img src="../../../images/lisplogo_flag_128.png" alt=""></p>
</center>

<p>So actually it's a fair comparison here where the new code is twice as fast
as the previous one, with only some hours of hacking and before spending any
time on optimisation. Well, apart from using a <em>producer</em>, a <em>consumer</em> and a
<em>queue</em>, which I almost had to have for streaming in between two database
connections anyway.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 12 Feb 2013 11:17:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</guid>
</item>
<item>
  <title>Marking whole word</title>
  <link>http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</link>
  <description><![CDATA[<p>I've discovered recently another Emacs facility that I since then use
several times a day, and I wonder how I did without it before: <code>C-M-SPC runs
the command mark-sexp</code>.</p>

<center>
<p><img src="../../../images/sexp.gif" alt=""></p>
</center>

<center>
<p><em>Well, <code>mark-sexp</code> apparently is related to the Sex Pistols</em></p>
</center>

<p>It's pretty simple actually, when you have the <em>point</em> at the beginning of a
word or an identifier (containing numbers, dashes, underscores and other
punctuation signs), you can select the <em>whole</em> of it in a single key chord!</p>

<p>The best thing is that if you press the same key chord again, it will expand
to include the next expression. And that works in plain text and most
programming languages where I've tried it, which is not so much recently. It
does not depend that much on the programming language anyway.</p>

<p>The full general solution here is to use something like <a href="https://github.com/magnars/expand-region.el">expand region</a>, don't
miss the <a href="http://emacsrocks.com/e09.html">Emacs Rocks Expand Region Episode</a>, it's less than 3 minutes and you
will want to install <em>expand-region</em> after that. For easy installing, of
course you are already using <a href="http://tapoueh.org/emacs/el-get.html">el-get</a> right?</p>

<p>Now, a friend just asked this morning how to select the <em>current word</em> even
when the point is currently in the middle of it. Going manually back to
the beginning of it is no fun. I knew about <code>thing-at-point</code> and a little
about how it works, but didn't find anything ready-made for that use case
(hint: it needs to be an <em>interactive</em> command).</p>

<p>Here's what I came up with, then:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">mha:select-current-word</span> ()
  <span style="color: #bc8f8f;">"Select the current word."</span>
  (interactive)
  (beginning-of-thing 'symbol)
  (push-mark (point) nil t)
  (end-of-thing 'symbol)
  (exchange-point-and-mark))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-M-SPC"</span>) 'mha:select-current-word)
</pre>

<p>I picked <code>C-M-S-SPC</code> not because it's the easiest way to invoke the new
command, but because to me it's a quite natural extension to the <code>C-M-SPC</code>
that I use so often. Again, each time you want to <em>select</em> an identifier in
some code of yours, you'd most certainly be better off using <code>C-M-SPC</code>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 08 Feb 2013 17:15:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</guid>
</item>
<item>
  <title>Live Upgrading PGQ</title>
  <link>http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</link>
  <description><![CDATA[<p>Some <a href="http://skytools.projects.pgfoundry.org/skytools-3.0/doc/">skytools</a> related news today, it's been a while. For those who were at
my <a href="http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html">FOSDEM talk</a> about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a>, you might have heard
that I really like working with <a href="http://wiki.postgresql.org/wiki/Skytools#PgQ">PGQ</a>. A new version has been released a while
ago, and the most recent version is now <code>3.1.3</code>, as announced in the
<a href="http://www.postgresql.org/message-id/CACMqXCLD2je5VFqUCzjwC2s5QQVYLe6-4awJaRvqLSBEVw8_MQ@mail.gmail.com">Skytools 3.1.3</a> email.</p>

<center>
<p><img src="../../../images/software-upgrade.320.png" alt=""></p>
</center>

<center>
<p><em>Upgrade time!</em></p>
</center>

<h3>Skytools 3.1.3 enters debian</h3>

<p class="first">First news is that <em>Skytools 3.1.3</em> has entered <a href="http://packages.debian.org/search?keywords=skytools3">debian</a> today (I hope
that by the time you reach that URL, it's been updated to show information
according to the news here, but I might be early). As there's currently a
<em>debian freeze</em> to release <em>wheezy</em> (and you can help <a href="http://www.debian.org/News/2012/20121110">squash some bugs</a>), this
version is only getting uploaded to <em>experimental</em> for now. Thanks to the
tireless work of <a href="http://www.df7cb.de/blog/2012/apt.postgresql.org.html">Christoph Berg</a> though, this version is already available
from <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a>.</p>


<h3>Upgrading to PGQ 3</h3>

<p class="first">The other news is that I've been testing a <em>live upgrade</em> scenario where we want
to upgrade from <code>PGQ</code> to <code>PGQ3</code>. It works pretty well, and it's quite simple
to achieve too. Here's how.</p>

<p>So the first thing is to shut down the current <em>ticker</em> process. Then we
install the new packages, assuming that you did follow the steps in the wiki
pointed to above; please go read <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a> again now if need be.</p>

<pre class="src">
pgqadm.py ticker.ini -s
sudo apt-get install postgresql-9.1-pgq3 skytools3-ticker skytools3
</pre>

<p>The ticker is not running anymore, we have the right version of the software
installed. Next step is to upgrade the database parts of PGQ:</p>

<pre class="src">
psql -f /usr/share/skytools3/pgq.upgrade_2.1_to_3.0.sql ...
psql -1 -f /usr/share/postgresql/9.1/contrib/pgq.upgrade.sql ...
</pre>

<p>Of course replace those <code>...</code> with options such as your actual connection
string. I tend to always add <code>-vON_ERROR_STOP=1</code> to all these
commands, so that I don't depend on having the right <code>.psqlrc</code> on the
particular server I'm connected to. Also remember that if you want to do
that for more than one database, you need to actually run that pair of
commands for each of them.</p>

<p>Now it's time to start the new ticker. The main change from the previous
one is that it is now a <code>C</code> program called <code>pgqd</code> that knows how to tick for any
number of <em>databases</em>, so that you only have to have <em>one instance</em> around <em>per
cluster</em> now.</p>

<pre class="src">
sudo /etc/init.d/skytools3 start
tail -f /var/log/skytools/pgqd.log
</pre>

<p>Those two commands are taking for granted that you did prepare the <code>pgqd</code>
setup the <em>debian</em> and <em>skytools</em> way, by adding your config in
<code>/etc/skytools3/pgqd.ini</code> and editing <code>/etc/skytools.ini</code> accordingly, so that
it's automatically taken into account at machine boot.</p>

<p>Note that I did actually exercise the procedure above while running a
<a href="http://www.postgresql.org/docs/9.2/static/pgbench.html">pgbench</a> test replicated with <code>londiste</code>. Of course the replication has been
lagging a little while no <em>ticker</em> was running, and then it caught up as fast
as it could, in that case:</p>

<pre class="src">
INFO {count: 245673, ignored: 0, duration: 422.104366064}
</pre>


<h3>Happy Hacking!</h3>

<p class="first">So if you have any <em>batch processing</em> needs, remember to consider what PGQ has
to offer. And yes if you're running some cron job to compute things out of
the database for you, you are doing some <em>batch processing</em>.</p>

<center>
<p><img src="../../../images/hayseed.jpg" alt=""></p>
</center>

<center>
<p><em>Yes, I did search for Transactional Batch Processing</em></p>
</center>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 08 Feb 2013 15:52:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</guid>
</item>
<item>
  <title>Another Great FOSDEM</title>
  <link>http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</link>
  <description><![CDATA[<p>This year's FOSDEM has been a great edition, in particular the
<a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDAY 2013</a> was a great way to begin a 3-day marathon of talking
about PostgreSQL with people not only from our community but also from
plenty of other Open Source communities too: users!</p>

<center>
<p><a class="image-link" href="https://fosdem.org/2013/">
<img src="../../../images/fosdem-logo.png"></a><a class="image-link" href="http://fosdem2013.pgconf.eu/">
<img src="../../../images/postgresql-elephant.small.png"></a></p>
</center>

<center>
<p><em>PostgreSQL at FOSDEM made for a great event</em></p>
</center>

<p>Having had the opportunity to meet more people from those other development
communities, I really think we should go and reach out to them at their own
conferences. About every PostgreSQL community member I've talked
with about that idea seemed to agree and generally was already thinking the
same thing. And most are already doing it, in fact...</p>

<p>I had the pleasure of giving two talks there, both in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>.</p>

<h3>Event Triggers</h3>

<p class="first">I'm currently in the middle of implementing <em>Event Triggers</em> for PostgreSQL
and I have been for about the last 2 years. It's a quite complex feature to
get right and so the patch itself is complex and large, which means the
reviewing process is complex and takes time.</p>

<p>That also means that some parts of the design have already been redone
completely at least 3 times, and that what got <em>committed</em> to the PostgreSQL
code looks nothing like the design we had decided should go in.
That's just a fact of life, maybe, but that makes for a very long
development process.</p>

<p>We're now getting to the end of it though, and this talk is showing both
where we want to go with <em>Event Triggers</em>, where we are now and what remains
to be done for 9.3 if we want the feature to be of any use.</p>

<p>If you're interested in that development, have a look at the slide deck
and possibly ask me some questions about what's not clear, preferably on the
<a href="http://www.postgresql.org/list/pgsql-hackers/">pgsql-hackers</a> mailing list.</p>

<center>
<p><a class="image-link" href="../../../images/confs/Fosdem2013_Event_Triggers.pdf">
<img src="../../../images/confs/Fosdem2013_Event_Triggers.png"></a></p>
</center>

<center>
<p><em>Event Triggers, The Real Mess™</em></p>
</center>

<p>The other way to get summarized and clear information about Event Triggers
is the wiki page by the same name: <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a>.</p>

<p>You will see that while a lot has been done (internal refactoring, adding
new infrastructure and SQL level commands, and the minimum <code>PLpgSQL</code> support),
a lot remains to be done where the code has already been submitted several
times, following several design directions given by careful review on
hackers, and we still have some choices to make.</p>


<h3>Implementing High-Availability</h3>

<p class="first">This talk is showing several ways to implement <em>High Availability</em> with
PostgreSQL. That term is already overloaded, and usually
covers two very different things: <em>Service Availability</em> and <em>Data
Availability</em>.</p>

<p>In the talk, we show several techniques that you can use to address
different sets of compromises between <em>scaling</em>, <em>load balancing</em>, <em>data
availability</em> and <em>durability</em>, and <em>service availability</em>. The first two points
could seem unrelated to the main topic, but <em>scaling</em> often is a simple enough
way to achieve <em>service availability</em>... until you need to think about
<em>sharding</em>, that is.</p>

<center>
<p><a class="image-link" href="../../../images/confs/Fosdem2013_High_Availability.pdf">
<img src="../../../images/confs/Fosdem2013_High_Availability.640.png"></a></p>
</center>

<center>
<p><em>Implementing High Availability of Services and Data with PostgreSQL</em></p>
</center>

<p>So the talk is all about making compromises between them and getting to
an architecture able to implement the chosen compromises. While the talk
has been pretty well received, it was delivered in a 50 mins slot where we
usually take a whole day or three when addressing those problems at a
customer's site.</p>

<p>In that time slot it's not possible to fully cover how to get to the right
architecture for the compromises that matter to you while still
actually presenting the techniques that we're using.</p>

<p>I think it might be useful to extract one or two use cases from that
talk and build a full 50 mins version focused on one or a couple of
very clear compromises and how to achieve them in detail, rather than
trying to present a full range of techniques and how to use them in
different scenarios.</p>


<h3>FOSDEM</h3>

<p class="first">After talking with many people, it appears that for next year's
edition I should be proposing a more general talk that aims at helping
developers in other communities (python, ruby, etc) discover what's in it for
them in PostgreSQL. This database is full of advanced features that are
really easy to use, and the only problem when preparing such a talk is
choosing the right subset...</p>

<p>If you're running a local developer user group and are interested in
learning some more about how PostgreSQL can help you on a daily basis,
please do get in touch with me and let's schedule a presentation together!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 04 Feb 2013 09:55:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</guid>
</item>
<item>
  <title>A Sunday at FOSDEM</title>
  <link>http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</link>
  <description><![CDATA[<p>The previous article <a href="29-FOSDEM-2013.html">FOSDEM 2013</a> said to be careful with the
<a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom schedule</a> because one of my talks there might get swapped
with a slot on the <a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDay 2013</a> which happens <strong><em>this Friday</em></strong> and has been
sold out anyway.</p>

<p>Turns out it's not true, because we still depend on last-century
technologies somehow. Not everybody will be looking at the schedule on the
web using a connected mobile device (you know, you've heard of them, those
<em>tracking and surveillance devices</em>, if you want to believe <a href="http://stallman.org/rms-lifestyle.html">Stallman</a>), and as
the schedule gets printed on little paper sheets, it's unfortunately too
late to change it now.</p>

<center>
<p><a class="image-link" href="https://fosdem.org/2013/">
<img src="../../../images/fosdem.png"></a></p>
</center>

<center>
<p><em>Those flyers are already printed on paper sheets, the schedule too</em></p>
</center>

<p>So it happens that I'll be speaking twice on Sunday and not at all on Friday.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 30 Jan 2013 10:50:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</guid>
</item>
<item>
  <title>FOSDEM 2013</title>
  <link>http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</link>
  <description><![CDATA[<p>This year again I'm going to <a href="https://fosdem.org/2013/">FOSDEM</a>, and to the extra special
<a href="http://fosdem2013.pgconf.eu/">PostgreSQL FOSDEM day</a>. It will be the first time that I'm going to be at the
event for the full week-end rather than just commuting in for the day.</p>

<center>
<p><a class="image-link" href="https://fosdem.org/2013/">
<img src="../../../images/fosdem.png"></a></p>
</center>

<center>
<p><em>I'm Going to the FOSDEM, hope to see you there!</em></p>
</center>

<p>And I'm presenting two talks over there that are both currently scheduled on
the Sunday in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>. We're talking about changing that
though, so that one of those will in fact happen <strong><em>this Friday</em></strong> at the
<a href="http://www.postgresql.eu/events/schedule/fosdem2013/">FOSDEM PGDay 2013</a>, which has a different schedule, so consider watching for
that.</p>

<p>One of those two talks is about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a> (yes, with
PostgreSQL). It's been quite well received in the places where I had the chance to
give it before (namely <em>PGDay France</em> and <em>PG Conf Europe</em>), and it's going to
be a stripped down version of it so that it fits well in the 45 mins slot we
have here.</p>

<p>The other talk is going to be about <a href="https://fosdem.org/2013/schedule/event/postgresql_event_triggers/">Event Triggers</a>, a feature new in
PostgreSQL 9.3 (due in September 2013, crossing fingers). While the goal
of that talk is to introduce what the feature is all about and a bunch of
use cases that you can address by using it, it will certainly offer a peek
into the PostgreSQL development cycle and community processes.</p>



<center>
<p><img src="../../../images/belgium-beers.jpg" alt=""></p>
</center>

<center>
<p><em>See you in Brussels!</em></p>
</center>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 29 Jan 2013 10:11:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</guid>
</item>
<item>
  <title>pgloader: what's next?</title>
  <link>http://tapoueh.org/blog/2013/01/28-pgloader-future.html</link>
  <description><![CDATA[<p><a href="../../../pgsql/pgloader.html">pgloader</a> is a tool to help loading data into <a href="http://www.postgresql.org/">PostgreSQL</a>, adding some error
management to the <a href="http://www.postgresql.org/docs/9.2/interactive/sql-copy.html">COPY</a> command. <code>COPY</code> is the fast way of loading data into
PostgreSQL and is transaction safe. That means that if a single error
appears within your bulk of data, you will have loaded none of it. <code>pgloader</code>
will submit the data again in smaller chunks until it's able to isolate the
bad from the good, and then the good is loaded in.</p>
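
<p>The retry idea can be sketched in a few lines of python. This is only an
illustration and not <code>pgloader</code>'s actual code: the names <code>load_isolating_errors</code>,
<code>copy_batch</code> and <code>reject</code> are made up for the example, the all-or-nothing
loader is assumed to raise <code>ValueError</code> on a bad batch, and splitting each
failed batch in two halves is an assumption about how the chunks get smaller.</p>

<pre class="src">
def load_isolating_errors(rows, copy_batch, reject):
    """Load rows with copy_batch, splitting failed batches to isolate bad rows.

    copy_batch(rows) is a hypothetical all-or-nothing loader that raises
    ValueError when any row of the batch is bad; reject(row) records a bad
    row. Returns the number of rows that were loaded.
    """
    if not rows:
        return 0
    try:
        copy_batch(rows)
        return len(rows)
    except ValueError:
        if len(rows) == 1:
            # A failing batch of a single row: this row is the culprit.
            reject(rows[0])
            return 0
        # Otherwise split the batch in two and retry each half.
        middle = len(rows) // 2
        return (load_isolating_errors(rows[:middle], copy_batch, reject)
                + load_isolating_errors(rows[middle:], copy_batch, reject))


loaded, rejected = [], []

def fake_copy(batch):
    # Stand-in for COPY: refuse the whole batch if any row is None.
    if any(row is None for row in batch):
        raise ValueError("bad row in batch")
    loaded.extend(batch)

count = load_isolating_errors([1, 2, None, 4, 5], fake_copy, rejected.append)
</pre>

<p>With that input the four good rows end up in <code>loaded</code> and the single bad
one in <code>rejected</code>, at the cost of a few extra round trips.</p>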

<center>
<p><img src="../../../images/PDL_Adapter-250.png" alt=""></p>
</center>

<center>
<p><em>Not quite this kind of data loader</em></p>
</center>

<p>In a recent migration project where we freed data from MySQL into
PostgreSQL, we used <code>pgloader</code> again. But the loading time was not fast enough
for the service downtime window that we had here. Indeed <a href="http://www.python.org/">Python</a> is not known
for being the fastest solution around. It's easy to use and to ship to
production, but sometimes being efficient when writing code is not enough:
you also need the code to actually run fast.</p>

<h3>Faster data loading</h3>

<p class="first">So I began writing a little dedicated tool for that migration in <a href="http://cliki.net/">Common Lisp</a>,
which is growing on me as my personal answer to the burning question: <em>python
2 or python 3</em>? I find <em>Common Lisp</em> to offer an even more dynamic programming
environment, an easier language to use, and the result often has
performance characteristics way beyond what I can get with python: between
<a href="http://tapoueh.org/blog/2012/07/10-solving-sudoku.html">5 times faster</a> and <a href="http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html">121 times faster</a> in some quite stupid benchmarks.</p>

<p>Here, with real data, my one shot attempt has been running more than <em>twice
as fast</em> as the python version, after about a day of programming.</p>

<center>
<p><img src="../../../images/lisp-python.png" alt=""></p>
</center>

<center>
<p><em>See what's happening now?</em></p>
</center>

<p>The other thing here is that I've attempted to get <code>pgloader</code> to work in parallel,
but at the time I didn't know about the <a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a>, which they
still haven't found how to remove in Python 3, by the way. So my threading
attempts at making <code>pgloader</code> work in parallel are pretty useless.</p>

<p>Whereas in <em>Common Lisp</em> I can just use the <a href="http://lparallel.org/">lparallel</a> lib, which exposes
threading facilities and some <em>queueing</em> facilities as a means to communicate
data between workers, and have my code easily work in parallel for real.</p>


<h3>Compatibility</h3>

<p class="first">The only drawback that I can see here is that if you've been writing your
own <em>reformatting modules</em> in python for <code>pgloader</code> (yes you can
<a href="http://tapoueh.org/pgsql/pgloader.html#sec21">implement your own reformatting module for pgloader</a>), then you would have to
port them to <em>Common Lisp</em>. Drop me an email if that's your case.</p>


<h3>Next version</h3>

<p class="first">So, I think we're going to have a <em>pgloader 3</em> someday, that will be way
faster than the current one, and bundle some more features: real parallel
behavior, ability to fetch non local data (connecting to MySQL directly, or
HTTP, S3, etc); and I'm thinking about offering a <code>COPY</code> like syntax to drive
the loading too, while at it. Also, the ability to discover the set of data
to load all by itself when you want to load a whole database: think of it as
a special <em>Migration</em> mode of operations.</p>

<p>Some feature requests can't be solved easily when keeping the old <code>.INI</code>
syntax cruft, so it's high time to implement some kind of a real command
language. I have several ideas about those, in between the <code>COPY</code> syntax and
the <code>SQL*Loader</code> configuration format, which is both clunky and quite
powerful, too.</p>

<p>After a beginning in <code>TCL</code> and a complete rewrite in python in <code>2005</code>, it looks
like <code>2013</code> is going to be the year of <em>pgloader 3</em>, in <em>Common Lisp</em>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 28 Jan 2013 10:48:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/28-pgloader-future.html</guid>
</item>
<item>
  <title>Automated Setup for pgloader</title>
  <link>http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</link>
  <description><![CDATA[<p>Another day, another migration from <em>MySQL</em> to <a href="http://www.postgresql.org/">PostgreSQL</a>... or at least
that's how it feels sometimes. This time again I've been using some quite
old scripts to help me do the migration.</p>

<center>
<p><img src="../../../images/dauphin-logo.jpg" alt=""></p>
</center>

<center>
<p><em>That's how I feel for MySQL users</em></p>
</center>

<h3>Migrating the schema</h3>

<p class="first">For the <em>schema</em> parts, I've been using <a href="http://pgfoundry.org/projects/mysql2pgsql/">mysql2pgsql</a> with success for many
years. This tool is not complete and will do only about <em>80%</em> of the work. As
I think that the schema should always be validated manually when doing a
migration anyway, I happen to think that it's good news.</p>


<h3>Getting the data out</h3>

<p class="first">Then for the data parts I keep on using <a href="../../../pgsql/pgloader.html">pgloader</a>. The data is never quite
right, and the ability to filter out what you can't readily import into a
<em>reject</em> file proves to be a must-have here. The problems you have in the
exported MySQL data are quite serious:</p>

<center>
<p><img src="../../../images/data-unlocked.320.png" alt=""></p>
</center>

<center>
<p><em>Can I have my data please?</em></p>
</center>

<p>First, date formatting is not compatible with what PostgreSQL expects,
sometimes using <code>20130117143218</code> instead of what we expect: <code>2013-01-17
14:32:18</code>, and of course even when the format is right (that seems to depend
on the MySQL server's version), you still have to transform the <code>0000-00-00
00:00:00</code> into <code>NULL</code>.</p>

<blockquote>
<p class="quoted">
Before thinking about the usage of that particular date rather than
using <code>NULL</code> when you don't have the information, you might want to
remember that there's no <a href="http://en.wikipedia.org/wiki/0_(year)">year zero</a> in the calendar, it's year 1 BC and
then year 1.</p>

</blockquote>
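
<p>As an illustration of the compact date format problem, here's a minimal
python sketch. The function name <code>mysql_compact_timestamp</code> is made up for the
example, and it assumes the input is always the 14 digits form; an all-zero
value is turned into <code>None</code> so that it can be loaded as <code>NULL</code>.</p>

<pre class="src">
from datetime import datetime

def mysql_compact_timestamp(value):
    """Turn '20130117143218' into '2013-01-17 14:32:18'.

    Returns None when the value only contains zeros (or is empty), which
    is how the infamous all-zero date shows up in this format.
    """
    if value.strip("0") == "":
        return None
    parsed = datetime.strptime(value, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
</pre>

<p>Going through <code>strptime</code> rather than slicing the string means that an
impossible date raises an error, which is what you want when the goal is
to send such a row to the <em>reject</em> file.</p>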

<p>Then, text encoding is often mixed up: even when the MySQL databases are
said to be in <em>latin1</em> or <em>unicode</em>, you somehow always end up finding text in
<em>win1252</em> or some other <em>code page</em> in there.</p>
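
<p>One way to cope with such mixed content is to try the expected encoding
first and fall back to another one. The python sketch below is only an
assumption about a reasonable order (UTF-8 first, then Windows-1252), and
<code>decode_mixed</code> is a name made up for the example.</p>

<pre class="src">
def decode_mixed(raw):
    """Decode bytes as UTF-8, falling back to Windows-1252 on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined: replace those rather than fail.
        return raw.decode("cp1252", errors="replace")
</pre>

<p>Trying UTF-8 first is the safer order, because random <em>win1252</em> bytes are
very unlikely to form valid UTF-8 sequences, whereas nearly any byte string
decodes without error as a single byte encoding.</p>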

<p>And of course, MySQL provides no tool to export the data to <code>CSV</code>, so you have
to come up with your own. The <code>SELECT INTO OUTFILE</code> command on the server
produces non conforming CSV (<code>\n</code> can appear in non-escaped field contents),
and while the <code>mysql</code> client manual page details that it outputs <code>CSV</code> when
stdout is not a terminal, it won't even try to quote fields or escape <code>\t</code>
when they appear in the data.</p>

<p>So, we use the <a href="https://github.com/slardiere/mysqltocsv">mysqltocsv</a> little script to export the data, and then use
that data to feed <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>
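
<p>To show what conforming output looks like, here's a minimal sketch using
python's standard <code>csv</code> module. It is not the <code>mysqltocsv</code> script itself, and
the <code>rows_to_csv</code> name and the sample rows are made up for the example: the
point is only that fields containing a tab or a newline get quoted, so that
they can be read back unchanged.</p>

<pre class="src">
import csv
import io

def rows_to_csv(rows):
    """Serialize rows as tab separated CSV, quoting the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

text = rows_to_csv([[1, "plain"], [2, "tab\there"], [3, "line\nbreak"]])

# Reading the text back gives the same fields, as strings.
parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
</pre>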


<h3>Loading the data in</h3>

<p class="first">Now, we have to write down a configuration file for pgloader to know what to
load and where to find the data. Rather than writing it by hand, we can generate it from the
database schema, using the query in <a href="generate-pgloader-config.sql">generate-pgloader-config.sql</a>:</p>

<pre class="src">
<span style="color: #7f007f;">with</span> reformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname, attnum, attname, typname,
          <span style="color: #7f007f;">case</span> typname
               <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'timestamptz'</span>
               <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:timestamp'</span>
               <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'date'</span>
               <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:date'</span>
           <span style="color: #7f007f;">end</span> <span style="color: #7f007f;">as</span> reformat
      <span style="color: #7f007f;">from</span> pg_class <span style="color: #da70d6;">c</span>
           <span style="color: #7f007f;">join</span> pg_namespace n <span style="color: #7f007f;">on</span> n.oid = <span style="color: #da70d6;">c</span>.relnamespace
           <span style="color: #7f007f;">left</span> <span style="color: #7f007f;">join</span> pg_attribute <span style="color: #da70d6;">a</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">c</span>.oid = <span style="color: #da70d6;">a</span>.attrelid
           <span style="color: #7f007f;">join</span> pg_type <span style="color: #da70d6;">t</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">t</span>.oid = <span style="color: #da70d6;">a</span>.atttypid
     <span style="color: #7f007f;">where</span> <span style="color: #da70d6;">c</span>.relkind = <span style="color: #bc8f8f;">'r'</span>
           <span style="color: #7f007f;">and</span> attnum &gt; 0
           <span style="color: #7f007f;">and</span> n.nspname = <span style="color: #bc8f8f;">'public'</span>
),
 config_reformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname,
         <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
         <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'columns = *'</span> || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'reformat = '</span> || array_to_string(<span style="color: #da70d6;">array_agg</span>(reformat), <span style="color: #bc8f8f;">', '</span>)
         || E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
    <span style="color: #7f007f;">from</span> reformat
   <span style="color: #7f007f;">where</span> reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">null</span>
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
 noreformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname, bool_and(reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">null</span>) <span style="color: #7f007f;">as</span> noreformating
     <span style="color: #7f007f;">from</span> reformat
 <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
 config_noreformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname,
         <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
         <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'columns = *'</span> || E<span style="color: #bc8f8f;">' \n'</span>
         || E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
    <span style="color: #7f007f;">from</span> reformat <span style="color: #7f007f;">join</span> noreformat <span style="color: #7f007f;">using</span> (relname)
   <span style="color: #7f007f;">where</span> noreformating
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
allconfs <span style="color: #7f007f;">as</span> (
    <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_reformat
 <span style="color: #7f007f;">union</span> <span style="color: #7f007f;">all</span>
    <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_noreformat
)
<span style="color: #7f007f;">select</span> config
  <span style="color: #7f007f;">from</span> allconfs
 <span style="color: #7f007f;">where</span> relname <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">in</span> (<span style="color: #bc8f8f;">'tables'</span>, <span style="color: #bc8f8f;">'wedont'</span>, <span style="color: #bc8f8f;">'wantto'</span>, <span style="color: #bc8f8f;">'load'</span>)
 <span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> relname;
</pre>

<p>To work with the generated setup, you will have to prepend a global section
for pgloader and include a reformatting module in python, which I named
<a href="mynull.py">mynull.py</a>:</p>

<pre class="src">
<span style="color: #b22222;"># Author: Dimitri Fontaine &lt;<a href="mailto:dimitri&#64;2ndQuadrant.fr">dimitri&#64;2ndQuadrant.fr</a>&gt;
#
# pgloader mysql reformating module
</span>
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">timestamp</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" Reformat str as a PostgreSQL timestamp

    MySQL timestamps are ok this time:  2012-12-18 23:38:12
    But may contain the infamous all-zero date, where we want NULL.
    """</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00 00:00:00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span>

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">date</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" date columns can also have '0000-00-00'"""</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span>
</pre>

<p>Now you can launch <code>pgloader</code> and profit!</p>


<h3>Conclusion</h3>

<p class="first">There are plenty of tools to assist you in migrating away from MySQL and other
databases. When you make that decision, you're not alone, and it's easy
enough to find people to come and help you.</p>

<p>While MySQL is Open Source and is not a <em>lock in</em> from a licensing
perspective, I still find it hard to swallow that no tools are provided
for getting data out in a sane format, and that so many little
inconsistencies exist in the product with respect to data handling (try to
have a <code>NOT NULL</code> column, then enjoy the default empty strings that have been
put in there). So at this point, yes, I consider that moving to <a href="http://www.postgresql.org/">PostgreSQL</a>
is a way to <em>free your data</em>:</p>

<center>
<p><img src="../../../images/free-our-open-data.jpg" alt=""></p>
</center>

<center>
<p><em>Free your data!</em></p>
</center>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 17 Jan 2013 14:32:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</guid>
</item>
<item>
  <title>Lost in scope</title>
  <link>http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</link>
  <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/288795520179240962">Mickael</a> on <em>twitter</em> I got to read an article about losing scope
with some common programming languages. As the blog article <a href="https://my.smeuh.org/al/blog/lost-in-scope">Lost in scope</a>
references <em>functional programming languages</em> and plays with both <em>Javascript</em>
and <em>Erlang</em>, I thought I had to try it out with <em>Common Lisp</em> too.</p>

<center>
<p><img src="../../../images/lambda.png" alt=""></p>
</center>

<center>
<p><em>Let's have fun with lambda!</em></p>
</center>

<p>So, here we go with a simple Common Lisp attempt. The <em>Lost in scope</em> article
begins with defining a very simple function returning a boolean value, only
true when it's not <code>monday</code>.</p>

<h3>Monday is special</h3>

<p class="first">Keep in mind that the following example has been chosen to be simple yet
offer a case of <em>lexical binding shadowing</em>. It looks convoluted. Focus on the
<code>day</code> binding.</p>

<pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*days*</span>
  '(monday tuesday wednesday thursday friday saturday sunday)
  <span style="color: #bc8f8f;">"List of days in the week"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">any-day-but-monday?</span> (day)
  <span style="color: #bc8f8f;">"Returns a generalized boolean, true unless DAY is 'monday"</span>
  (member day (remove-if (<span style="color: #7f007f;">lambda</span> (day) (eq day 'monday)) *days*)))
</pre>

<p>So as you can see, in <em>Common Lisp</em> we just get away with a list of symbols
rather than a string that we split to have a list of strings, or an array of
strings, as in the examples with <em>python</em> and <em>ruby</em>.</p>

<p>Now, the <em>generalized boolean</em> is either <code>nil</code> to mean false, or anything else
to mean <code>true</code>, and in that example the return value of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/a_member.htm">member</a> is a sub-list
that begins where the <em>member</em> was found:</p>

<pre class="src">
CL-USER&gt; (any-day-but-monday? 'monday)
NIL

CL-USER&gt; (any-day-but-monday? 'tuesday)
(TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY)
</pre>

<p>Oh, and as we work with <em>Common Lisp</em>, we have a real <a href="http://www.gigamonkeys.com/book/lather-rinse-repeat-a-tour-of-the-repl.html">REPL</a> in which to play
directly with our code, no need to add <em>interactive</em> stanzas in the main
program text file just to be able to play with it. In <a href="http://common-lisp.net/project/slime/">Emacs Slime</a> we just
use <code>C-M-x</code> on a <em>form</em> to have it available in the <em>REPL</em>, or <code>C-c C-l</code> to load the
whole file we're working on.</p>

<p>So, we see that <em>Common Lisp</em> scoping rules are silently doing the right thing
here. Within the <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_rm_rm.htm">remove-if</a> call we define a <em>lambda</em> function taking a single
parameter called <em>day</em>. It so happens that this parameter is shadowing the
<em>any-day-but-monday?</em> function parameter, and that shadowing only happens in
the <em>lexical scope</em> of the <em>lambda</em> we are creating. For a detailed discussion
about that concept, I would refer you to the <a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node43.html">Scope and Extent</a> chapter of
<em>Common Lisp the Language, 2nd Edition</em>.</p>

<p>In <em>Common Lisp</em> we have both <em>lexical scope</em> and <em>dynamic extents</em>, and a
variable defined with <em>defparameter</em> or <em>defvar</em> or that you otherwise <a href="http://www.lispworks.com/documentation/HyperSpec/Body/s_declar.htm">declare</a>
<em>special</em> will have a <em>dynamic extent</em>. Hence this section title.</p>
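
<p>To make the difference concrete, here's a minimal sketch of a <em>special</em>
variable being rebound. The names <code>*today*</code>, <code>today</code> and <code>today-within-friday</code>
are made up for this illustration, they are not part of the original
article.</p>

<pre class="src">
;; *today* is a hypothetical variable used only for this sketch
(defvar *today* 'monday
  "A special variable: defvar proclaims it special.")

(defun today ()
  "Read *TODAY* through whatever dynamic binding is current."
  *today*)

(defun today-within-friday ()
  "Rebind *TODAY* for the duration of the LET, then call TODAY."
  (let ((*today* 'friday))
    ;; today is defined outside of this let, yet it sees the new value
    (today)))
</pre>

<p>With those definitions, <code>(today)</code> returns <code>MONDAY</code>, <code>(today-within-friday)</code>
returns <code>FRIDAY</code>, and calling <code>(today)</code> again afterwards returns <code>MONDAY</code>: the
rebinding is visible to any function called while the <code>let</code> form is
executing, and is undone when it exits.</p>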


<h3>Closures</h3>

<p class="first">Now, the <a href="https://my.smeuh.org/al/blog/lost-in-scope">lost in scope</a> article goes on to look for a way around
the scoping rules of the <em>python</em> and <em>ruby</em> languages, where, as far as I can
see, developers cannot easily tell the language which scoping rules they
want on a case by case basis.</p>

<p>First, let's reproduce the problem by using a single variable that we bind
in all the closures. Those are called <em>callbacks</em> in the original article, so
I've kept using that name here.</p>

<center>
<p><img src="../../../images/callback.jpg" alt=""></p>
</center>

<pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks-all-sunday*</span>
    (<span style="color: #7f007f;">loop</span>
       for day in *days*
       collect (<span style="color: #7f007f;">lambda</span> () day))
  <span style="color: #bc8f8f;">"loop binds DAY only once"</span>)
</pre>

<p>In that example, there's only a single variable <code>day</code> that we reuse throughout
the <em>loop</em> construct, so that when the loop ends, we have a list of closures
all referring to the same variable, and this variable, by the end of the
loop, has <code>sunday</code> as its value.</p>

<pre class="src">
CL-USER&gt; (mapcar #'funcall *callbacks-all-sunday*)
(SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY)
</pre>


<h3>Closures, take 2</h3>

<p class="first">Now for the way to get what we want here, that is, a list of closures each
having its own variable:</p>

<pre class="src">

(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks*</span>
  (mapcar (<span style="color: #7f007f;">lambda</span> (day)
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">for each day, produce a separate closure
</span>            <span style="color: #b22222;">;; </span><span style="color: #b22222;">around its own lexical variable day
</span>            (<span style="color: #7f007f;">lambda</span> () day))
          *days*)
  <span style="color: #bc8f8f;">"A list of callbacks to return the current day..."</span>)
</pre>

<p>And there we go:</p>

<pre class="src">
CL-USER&gt; (mapcar #'funcall *callbacks*)
(MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY)
</pre>


<h3>Conclusion</h3>

<p class="first">Scoping rules are very important in any programming language, functional or
not, and must be well understood by programmers. I find that once again,
that topic has received very deep thought in <em>Common Lisp</em>, and the
language gives its developers all the options.</p>

<center>
<p><img src="../../../images/scope.png" alt=""></p>
</center>

<center>
<p><em>What are the scoping rules of your language of choice?</em></p>
</center>

<p>I want to stress that in <em>Common Lisp</em> the scope rules are very clearly
defined in the standard documentation of the language. For instance, <em>defun</em>
and <em>let</em> both introduce a lexical binding, <em>defvar</em> and <em>defparameter</em> introduce
a <em>dynamic variable</em>.</p>

<p>Also, as a user of the language you have the ability to <em>declare</em> any variable
as being <em>special</em> in order to introduce a <em>dynamic variable</em> yourself. In <code>C</code> you
can declare some variables as being <em>static</em>, which is something else and
fraught with a very different set of problems.</p>
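
<p>As a small sketch of what such a declaration can look like, without any
<em>defvar</em> or <em>defparameter</em> involved: the function names <code>shown-day</code> and
<code>call-with-sunday</code> below are invented for this example.</p>

<pre class="src">
;; both function names are hypothetical, made up for this sketch
(defun shown-day ()
  "Read DAY as a dynamic variable, thanks to the special declaration."
  (declare (special day))
  day)

(defun call-with-sunday ()
  "Bind DAY dynamically so that SHOWN-DAY can see it."
  (let ((day 'sunday))
    ;; without this declaration, day would be a lexical binding
    ;; that shown-day has no way to reach
    (declare (special day))
    (shown-day)))
</pre>

<p>Calling <code>(call-with-sunday)</code> returns <code>SUNDAY</code>, even though <code>day</code> is never passed
as an argument. Calling <code>(shown-day)</code> on its own, outside of any such binding,
is an error because the variable is then unbound.</p>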
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 09 Jan 2013 11:07:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</guid>
</item>
<item>
  <title>Extensions Templates</title>
  <link>http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</link>
  <description><![CDATA[<p>In a recent article titled <a href="../../2012/12/13-Inline-Extensions.html">Inline Extensions</a> we detailed the problem of how
to distribute an extension's <em>package</em> to a remote server without having
access to its file system at all. The solution to that problem is non
trivial, let's say. But thanks to the awesome <a href="http://www.postgresql.org/community/">PostgreSQL Community</a> we finally
have some practical ideas on how to address the problem as discussed on
<a href="http://archives.postgresql.org/pgsql-hackers/">pgsql-hackers</a>, our development mailing list.</p>

<center>
<p><img src="../../../images/community.jpg" alt="">
<em>PostgreSQL is first an Awesome Community</em></p>
</center>

<p>The solution we talked about is to use <em>templates</em>, and so I've been working
on a patch to bring <em>templates for extensions</em> to PostgreSQL. As we're talking
about 3 new system catalogs, that's a big patch in terms of lines of code. In
terms of features though, it's quite an easy one.</p>

<p>Here's how it goes. Let's say you want to prepare the system to be able to
<code>CREATE EXTENSION pair;</code> without having to install it as an <em>OS package</em> for
which you would need to get <code>root</code> access on the server where your PostgreSQL
instance is running, which is not always easy, and sometimes not a good
idea.</p>

<h3>Installing an extension template</h3>

<p class="first">With the <a href="http://www.postgresql.org/message-id/m2wqvoha0p.fsf%402ndQuadrant.fr">template patch</a> I just sent on the lists, what you can do is prepare
a template with your extension's script and properties, then use it to
install the extension.</p>

<pre class="src">
<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span>
   <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">default</span> <span style="color: #da70d6;">version</span> <span style="color: #bc8f8f;">'1.0'</span>
  <span style="color: #7f007f;">with</span> (<span style="color: #da70d6;">nosuperuser</span>, norelocatable, <span style="color: #da70d6;">schema</span> <span style="color: #da70d6;">public</span>)
<span style="color: #7f007f;">as</span> $$
  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">pair</span> <span style="color: #7f007f;">AS</span> ( <span style="color: #da70d6;">k</span> <span style="color: #228b22;">text</span>, v <span style="color: #228b22;">text</span> );

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, <span style="color: #228b22;">text</span>)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, anyelement)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, anyelement)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, <span style="color: #228b22;">text</span>)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair;'</span>;
$$;
</pre>


<h3>Installing an extension from a template</h3>

<p class="first">With the template installed in the catalogs, now you can go and install your
extension:</p>

<pre class="src">
foo&gt; <span style="color: #7f007f;">create</span> extension pair;
<span style="color: #7f007f;">CREATE</span> EXTENSION

foo&gt; \dx pair
     List <span style="color: #da70d6;">of</span> installed extensions
 <span style="color: #da70d6;">Name</span> | <span style="color: #da70d6;">Version</span> | <span style="color: #da70d6;">Schema</span> | Description
<span style="color: #b22222;">------+---------+--------+-------------
</span> pair | 1.0     | <span style="color: #da70d6;">public</span> |
(1 <span style="color: #da70d6;">row</span>)

foo&gt; \dx+ pair
     Objects <span style="color: #7f007f;">in</span> extension "pair"
          <span style="color: #da70d6;">Object</span> Description
<span style="color: #b22222;">--------------------------------------
</span> <span style="color: #da70d6;">function</span> pair(anyelement,anyelement)
 <span style="color: #da70d6;">function</span> pair(anyelement,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,anyelement)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">type</span> pair
(5 <span style="color: #da70d6;">rows</span>)
</pre>

<p>The extension installation is now happening from the catalog templates
rather than the file system, which means you didn't need to be <code>root</code> on the
system where the server is running. Also note that this example above did
happen when connected as the <em>database owner</em>, a user who is not the
<em>superuser</em>. Requiring fewer privileges is always good news, right?</p>


<h3>Managing upgrade scripts and extension update</h3>

<p class="first">Now that the extension is installed, you might want to update it with some
new awesome features. Let's have a look at that.</p>

<center>
<p><img src="../../../images/extension-update.png" alt=""></p>
</center>

<center>
<p><em>Upload your Extension Update Scripts</em></p>
</center>

<p>Rather than making a new version of the extension package with the new files
in there, then asking the operations team to make the new package available
on the internal repositories and install it on the servers, you can now
prepare and <em>QA</em> the new setup this way:</p>

<pre class="src">
<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span> <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.0'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.1'</span>
<span style="color: #7f007f;">as</span> $$
  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = <span style="color: #228b22;">text</span>,
                      RIGHTARG = anyelement,
                      <span style="color: #da70d6;">PROCEDURE</span> = pair);

  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = anyelement,
                      RIGHTARG = <span style="color: #228b22;">text</span>,
                      <span style="color: #da70d6;">PROCEDURE</span> = pair);

  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = anyelement,
                      RIGHTARG = anyelement,
                      <span style="color: #da70d6;">PROCEDURE</span> = pair);

  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = <span style="color: #228b22;">text</span>,
                      RIGHTARG = <span style="color: #228b22;">text</span>,
                      <span style="color: #da70d6;">PROCEDURE</span> = pair);
$$;

<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span>
   <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.1'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span>
<span style="color: #7f007f;">as</span> $$
            <span style="color: #da70d6;">comment</span> <span style="color: #7f007f;">on</span> extension pair <span style="color: #7f007f;">is</span> <span style="color: #bc8f8f;">'Simple Key Value Text Type'</span>;
$$;
</pre>

<p>Of course it's not the most realistic example when you look at the content,
in particular the <code>1.2</code> version, which only adds a comment to the extension. I
needed another version to test the automatic upgrade path with more than one
step though, so here we go.</p>

<pre class="src">
foo&gt; <span style="color: #da70d6;">alter</span> extension pair <span style="color: #da70d6;">update</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span>;
<span style="color: #da70d6;">ALTER</span> EXTENSION

foo&gt; \dx pair
             List <span style="color: #da70d6;">of</span> installed extensions
 <span style="color: #da70d6;">Name</span> | <span style="color: #da70d6;">Version</span> | <span style="color: #da70d6;">Schema</span> |        Description
<span style="color: #b22222;">------+---------+--------+----------------------------
</span> pair | 1.2     | <span style="color: #da70d6;">public</span> | <span style="color: #da70d6;">Simple</span> <span style="color: #da70d6;">Key</span> <span style="color: #da70d6;">Value</span> <span style="color: #228b22;">Text</span> <span style="color: #da70d6;">Type</span>
(1 <span style="color: #da70d6;">row</span>)

foo&gt; \dx+ pair
     Objects <span style="color: #7f007f;">in</span> extension "pair"
          <span style="color: #da70d6;">Object</span> Description
<span style="color: #b22222;">--------------------------------------
</span> <span style="color: #da70d6;">function</span> pair(anyelement,anyelement)
 <span style="color: #da70d6;">function</span> pair(anyelement,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,anyelement)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">operator</span> ~&gt;(anyelement,anyelement)
 <span style="color: #da70d6;">operator</span> ~&gt;(anyelement,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">operator</span> ~&gt;(<span style="color: #228b22;">text</span>,anyelement)
 <span style="color: #da70d6;">operator</span> ~&gt;(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">type</span> pair
(9 <span style="color: #da70d6;">rows</span>)
</pre>

<p>We did it!</p>


<h3>Internals</h3>

<p class="first">Let's have a look at those new catalogs:</p>

<center>
<p><img src="../../../images/octopus-anatomy.jpg" alt=""></p>
</center>

<center>
<p><em>Oh, that's not quite the internals I expected...</em></p>
</center>

<p>Here we go now:</p>

<pre class="src">
foo&gt; select * from pg_extension_control;
select * from pg_extension_control;
-[ RECORD 1 ]--+-------
ctlname        | pair
ctlowner       | 32926
ctldefault     | t
ctlrelocatable | f
ctlsuperuser   | f
ctlnamespace   | public
ctlversion     | 1.0
ctlrequires    |

foo&gt; select * from pg_extension_template;
select * from pg_extension_template;
-[ RECORD 1 ]-----------------------------------------------------------------
tplname    | pair
tplowner   | 32926
tplversion | 1.0
tplscript  |
           |   CREATE TYPE pair AS ( k text, v text );
           |
           |   CREATE OR REPLACE FUNCTION pair(anyelement, text)
           |   RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
           |
           |   CREATE OR REPLACE FUNCTION pair(text, anyelement)
           |   RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
           |
           |   CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
           |   RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
           |
           |   CREATE OR REPLACE FUNCTION pair(text, text)
           |   RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';
           |

foo&gt; select * from pg_extension_uptmpl;
select * from pg_extension_uptmpl;
-[ RECORD 1 ]-------------------------------------------------------------
uptname   | pair
uptowner  | 32926
uptfrom   | 1.0
uptto     | 1.1
uptscript |
          |   CREATE OPERATOR ~&gt; (LEFTARG = text,
          |                       RIGHTARG = anyelement,
          |                       PROCEDURE = pair);
          |
          |   CREATE OPERATOR ~&gt; (LEFTARG = anyelement,
          |                       RIGHTARG = text,
          |                       PROCEDURE = pair);
          |
          |   CREATE OPERATOR ~&gt; (LEFTARG = anyelement,
          |                       RIGHTARG = anyelement,
          |                       PROCEDURE = pair);
          |
          |   CREATE OPERATOR ~&gt; (LEFTARG = text,
          |                       RIGHTARG = text,
          |                       PROCEDURE = pair);
          |
-[ RECORD 2 ]-------------------------------------------------------------
uptname   | pair
uptowner  | 32926
uptfrom   | 1.1
uptto     | 1.2
uptscript |
          |     comment on extension pair is 'Simple Key Value Text Type';
          |
</pre>

<p>As you can see there's nothing too complex here, it's quite straightforward.
We need to keep the <em>creating</em> templates apart from the <em>updating</em> templates
because we need <strong><em>unique</em></strong> keys and we can't have that on <code>NULL</code> columns.</p>

<pre class="src">
foo&gt; \d pg_extension_template
\d pg_extension_template
Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_template"</span>
   Column   | Type | Modifiers
------------+------+-----------
 tplname    | name | not null
 tplowner   | oid  | not null
 tplversion | text |
 tplscript  | text |
Indexes:
    <span style="color: #bc8f8f;">"pg_extension_template_name_version_index"</span> UNIQUE, btree (tplname, tplversion)
    <span style="color: #bc8f8f;">"pg_extension_template_oid_index"</span> UNIQUE, btree (oid)

foo&gt; \d pg_extension_uptmpl
\d pg_extension_uptmpl
Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_uptmpl"</span>
  Column   | Type | Modifiers
-----------+------+-----------
 uptname   | name | not null
 uptowner  | oid  | not null
 uptfrom   | text |
 uptto     | text |
 uptscript | text |
Indexes:
    <span style="color: #bc8f8f;">"pg_extension_uptmpl_name_from_to_index"</span> UNIQUE, btree (uptname, uptfrom, uptto)
    <span style="color: #bc8f8f;">"pg_extension_uptmpl_oid_index"</span> UNIQUE, btree (oid)
</pre>


<h3>Next steps</h3>

<p class="first">Now that we have the basics in place, the patch is far from finished still.
It needs <code>pg_dump</code> and <code>psql</code> support, support for the function
<code>pg_available_extension_versions()</code>, implementing some <code>ALTER TEMPLATE FOR
EXTENSION</code> commands for which I only sketched the syntax in the grammar, and
some more infrastructure to be able to have <code>ALTER OWNER</code> and <code>ALTER RENAME</code>
commands.</p>

<center>
<p><img src="../../../images/patch-brewing.jpg" alt=""></p>
</center>

<center>
<p><em>Warning: patch brewing here! Syntax and other key elements will change.</em></p>
</center>

<p>All that is pretty technical though, the real thing that patch needs is some
quality review and maybe some adjustments. I would be surprised if it didn't
need adjustments, really. Because of the way the community works, we always
need some. That's why the PostgreSQL product is so good!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 08 Jan 2013 17:53:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</guid>
</item>
<item>
  <title>Inline Extensions</title>
  <link>http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</link>
  <description><![CDATA[<p>We've had the <code>CREATE EXTENSION</code> feature in <a href="http://www.postgresql.org/">PostgreSQL</a> for a couple of
releases now, so let's talk about where to go from here. The first goal of the
extension facility has been to allow for a clean <em>dump</em> and <em>restore</em> process of
<a href="http://www.postgresql.org/docs/9.2/static/contrib.html">contrib</a> modules. As such it's been tailored to the needs of deploying files
on the <em>file system</em> because there's no escaping from that when you have to
ship <em>binary</em> and <em>executable</em> files, those infamous <code>.so</code>, <code>.dll</code> or <code>.dylib</code> things.</p>

<center>
<p><img src="../../../images/dylibbundler.png" alt=""></p>
</center>

<p>Now that we have the <em>Extension</em> facility though, what we see is a growing
number of users taking advantage of it for the purpose of managing in house
procedural code and related objects. This code can be a bunch of <a href="http://www.postgresql.org/docs/9.2/static/plpgsql.html">PLpgSQL</a> or
<a href="http://www.postgresql.org/docs/9.2/static/plpython.html">plpython</a> functions and as such you normally create them directly from any
application connection to PostgreSQL.</p>

<p>So the idea would be to allow creating <em>Extensions</em> fully from a SQL command,
including the whole set of objects they contain. More than one approach is
possible to reach that goal, each with downsides and advantages. We will see
them later in this document.</p>

<p>Before that though, let's first review what the extension mechanism has to
offer to its users when there's no <em>contrib like</em> module to manage.</p>

<h3>A use case for next generation extensions</h3>

<p class="first">The only design goal of the <code>9.1</code> PostgreSQL Extension feature has been to
support a proper <em>dump &amp; restore</em> user experience when using <em>contrib modules</em>
such as <code>hstore</code> or <code>ltree</code>. Building up on that, what do <em>Extensions</em> have to
offer to non <code>C</code> developers out there? In other words, what does <code>CREATE EXTENSION</code>
bring to the table that a bunch of <em>loose</em> objects does not? What problems
can we now solve?</p>

<center>
<p><img src="../../../images/multi_function_equipment.jpg" alt=""></p>
</center>

<center>
<p><em>A Multi Functions Equipment, All Bundled Together</em></p>
</center>

<p>A way to phrase it is to say that <em>Extensions</em> are user defined <code>CASCADE</code>
support. <em>Extensions</em> bring extensibility to the <code>pg_depend</code> PostgreSQL
internal dependency tracking system that <code>CASCADE</code> is built on. From that
angle, <em>Extensions</em> are a way to manage dependencies of <em>SQL objects</em> in a way
that allows you to manage them as a single entity.</p>

<p>One of the existing problems this helps solve is the infamous lack of
dependency tracking between function calls. Using <em>Extensions</em> when you deal
with a set of functions acting as an API, you can at least protect that as a
unit:</p>

<pre class="src">
STATEMENT: drop function public.populate_record(anyelement,hstore);
    ERROR: cannot drop function populate_record(anyelement,hstore) because
           extension hstore requires it
     HINT: You can drop extension hstore instead.
</pre>

<p>And you also have a version number and tools integration to manage
extensions, with psql <code>\dx</code> command and the equivalent feature in <a href="http://www.pgadmin.org/">pgAdmin</a>.
Coming up with your own version number management is not impossible, some do
that already. Here it's integrated and the upgrade sequences are offered too
(applying <code>1.1--1.2</code> then <code>1.2--1.3</code> automatically).</p>

<p>Let's just say that it's very easy to understand the <em>traction</em> our users feel
towards leveraging <em>Extensions</em> features in order to properly manage their set
of stored procedures and SQL objects.</p>


<h3>The <em>dump &amp; restore</em> experience</h3>

<p class="first">The common problem of all those proposals is very central to the whole idea
of <em>Extensions</em> as we know them. The goal of building them has been to fix the
<em>restoring</em> experience when using extensions in a database, and we managed to
do that properly for contrib-like extensions.</p>

<center>
<p><img src="../../../images/fly.tn.png" alt=""></p>
</center>

<center>
<p><em>A fly in the ointment</em></p>
</center>

<p>When talking about <em>Inline Extensions</em>, the fly in the ointment is how to
properly manage their <code>pg_dump</code> behavior. The principle we built for
<em>Extensions</em> and that is almost unique to them is to <strong><em>omit</em></strong> them in the dump
files. The only other objects that we filter out of the dump are the ones
installed at server initialisation time, when using <a href="http://www.postgresql.org/docs/9.2/static/app-initdb.html">initdb</a>, to be found in
the <code>pg_catalog</code> and <code>information_schema</code> system <em>schemas</em>.</p>

<p>At restore time, the dump file contains the <code>CREATE EXTENSION</code> command so the
PostgreSQL server will go fetch the <em>control</em> and <em>script</em> files on disk and
process them, loading the database with the right set of SQL objects.</p>

<p>Now we're talking about <em>Extensions</em> which we would maybe want to dump the
objects of, so that at <em>restore</em> time we don't need to find them from unknown
external resources: the fact that the extension is <em>Inline</em> means that the
PostgreSQL server has no way to know where its content is coming from.</p>

<p>The next proposals are trying to address that problem, with more or less
success. So far none of them is entirely satisfying to me, even if a clear
temporary winner has emerged on the <em>hackers</em> mailing list, summarized in the
<a href="http://archives.postgresql.org/message-id/m2fw3judug.fsf@2ndQuadrant.fr">in-catalog Extension Scripts and Control parameters (templates?)</a> thread.</p>


<h3>Inline Extension Proposals</h3>

<p class="first">Now, on to some proposals to make the best out of our all time favorite
PostgreSQL feature, the only one that makes no sense at all by itself...</p>

<h4>Starting from an empty extension</h4>

<p class="first">We already have the facility to add existing <em>loose</em> objects to an extension,
and that's exactly what we use when we create an extension for the first
time when it used not to be an extension before, with the <code>CREATE EXTENSION
... FROM 'unpackaged';</code> command.</p>

<p>The <code>hstore--unpackaged--1.0.sql</code> file contains statements such as:</p>

<pre class="src">
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">type</span> <span style="color: #0000ff;">hstore</span>;
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_in</span>(cstring);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_out</span>(hstore);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_recv</span>(internal);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_send</span>(hstore);
</pre>

<p>Opening <code>CREATE EXTENSION</code> so that it allows you to create a really <em>empty</em>
extension would then allow you to fill-in as you need, with as many commands
as you want to add objects to it. The <em>control</em> file properties would need to
find their way in that design, that sure can be taken care of.</p>

<center>
<p><img src="../../../images/empty-extension.jpg" alt=""></p>
</center>

<center>
<p><em>Look at me, an Empty Extension!</em></p>
</center>

<p>The main drawback here is that there's no separation anymore in between the
extension author, the distribution means, the DBA and the database user.
When you want to install a third party <em>Extension</em> using only SQL commands,
you could do it with that scheme by using a big script full of one-liner
commands.</p>

<p>So that if you screw up your <em>copy/pasting</em> session (well you should maybe
reconsider your choice of tooling at this point, but that's another topic),
you will end up with a perfectly valid <em>Extension</em> that does not contain what
you wanted. As the end user, you have no clue about that until the first
time using the extension fails.</p>


<h4>CREATE EXTENSION AS</h4>

<p class="first">The next idea is to embed the <em>Extension</em> script itself in the command, so as
to get a cleaner command API (in my opinion at least) and a better error
message when the paste is wrong. Of course if your <em>paste</em> problem happens to
just be losing a line in the middle of the script there is not so much I
can do for you...</p>

<pre class="src">
<span style="color: #7f007f;">CREATE</span> EXTENSION hstore
  <span style="color: #7f007f;">WITH</span> <span style="color: #da70d6;">parameter</span> = <span style="color: #da70d6;">value</span>, ...
<span style="color: #7f007f;">AS</span> $$
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_in</span>(cstring) <span style="color: #da70d6;">RETURNS</span> hstore
 <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_out</span>(hstore) <span style="color: #da70d6;">RETURNS</span> cstring
<span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_recv</span>(internal) <span style="color: #da70d6;">RETURNS</span> hstore
<span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_send</span>(hstore) <span style="color: #da70d6;">RETURNS</span> <span style="color: #228b22;">bytea</span>
<span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span> (
        INTERNALLENGTH = -1, <span style="color: #da70d6;">STORAGE</span> = extended,
        <span style="color: #da70d6;">INPUT</span> = hstore_in, <span style="color: #da70d6;">OUTPUT</span> = hstore_out,
        RECEIVE = hstore_recv, SEND = hstore_send);
$$;
</pre>
<center>
<p><em>An edited version of <code>hstore--1.1.sql</code> for vertical space concerns</em></p>
</center>

<p>I've actually proposed a patch to implement that, as you can see in the
<a href="https://commitfest.postgresql.org/action/patch_view?id=981">pg_dump --extension-script</a> commit fest entry. As spoiled by the commit fest
entry title, the main problem we have with <em>Inline Extensions</em> is their
management in the seamless experience of <em>dump &amp; restore</em> that we are so happy
to have now. More about that later, though.</p>


<h4>Extension Templates</h4>

<p class="first">Another idea is to continue working from control parameters and scripts to
install and update extensions, but to have two different places to
find those: either on the server's <em>File System</em> (when dealing with <em>contribs</em>
and <em>shared libraries</em>, there's no other choice), or in the system catalogs.</p>

<center>
<p><img src="../../../images/templates.png" alt=""></p>
</center>

<center>
<p><em>We Already Have <code>TEXT SEARCH TEMPLATE</code> After All</em></p>
</center>

<p>The idea would then be to have some new specific <code>TEMPLATE</code> SQL Object that
would be used to <em>import</em> or <em>upload</em> your control file and your create and update
scripts into the database, using nothing but a SQL connection. Then at
<code>CREATE EXTENSION</code> time the system would be able to work either from the file
system or the <em>template</em> catalogs.</p>

<p>One obvious problem is how to deal with a unique namespace when we split the
sources into the file system and the database, and when the file system is
typically maintained by using <code>apt-get</code> or <code>yum</code> commands.</p>

<p>Then again I would actually prefer that mechanism over the other
proposals if the idea was to load the file system control and script files
as <code>TEMPLATEs</code> themselves and then only operate <em>Extensions</em> from <em>Templates</em>. But
doing that would mean getting back to the situation where we still are not
able to devise a good, simple and robust <code>pg_dump</code> policy for extensions and
templates.</p>



<h3>Conclusion</h3>

<p class="first">I was hoping to find the right solution to my long term plan in this release
development cycle, but it looks like the right challenge to address now is
to find the right compromise instead. Using the <em>Templates</em> idea already
brings a lot to the table, if not the whole set of features I would like to
see.</p>

<center>
<p><img src="../../../images/building-blocks.jpg" alt=""></p>
</center>

<center>
<p><em>PostgreSQL: Building on Solid Foundations</em></p>
</center>

<p>What would be missing mainly would be the ability for an <em>Extension</em> to switch
from being file based to being a template, either because the author decided
to change the way he's shipping it, or because the user is switching from
using the <a href="http://pgxnclient.projects.pgfoundry.org/">pgxn client</a> to using <em>proper</em> system packages. I guess that's
something we can see about later, though.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 13 Dec 2012 11:34:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</guid>
</item>
<item>
  <title>Trigger Parameters</title>
  <link>http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</link>
  <description><![CDATA[<p>We have a not too active <a href="http://archives.postgresql.org/pgsql-fr-generale/2012-12/index.php">postgresql-fr-generale</a> mailing list where some
interesting questions are asked by our subscribers. This article comes from
such a question about how to deal with trigger parameters, which are nice to
have, but static.</p>

<center>
<p><img src="../../../images/trigger-wheels.big.jpg" alt=""></p>
</center>

<p>Another way to ask that question is saying that</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 06 Dec 2012 11:10:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</guid>
</item>
<item>
  <title>M-x ack</title>
  <link>http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</link>
  <description><![CDATA[<p>I've been asked about how to integrate the <a href="http://betterthangrep.com/">ack</a> tool (you know, the one that
is <em>better than grep</em>) into Emacs today. Again. And I just realized that I
didn't blog about my solution. That might explain why I keep getting asked
about it after all...</p>

<p>So here it is, <code>M-x ack</code>:</p>

<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-ack.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">http://stackoverflow.com/questions/2322389/ack-does-not-work-when-run-from-grep-find-in-emacs-on-windows</span><span style="color: #b22222;">
</span>
(<span style="color: #7f007f;">defcustom</span> <span style="color: #b8860b;">ack-command</span> (or (executable-find <span style="color: #bc8f8f;">"ack"</span>)
                           (executable-find <span style="color: #bc8f8f;">"ack-grep"</span>))
  <span style="color: #bc8f8f;">"Command to use to call ack, e.g. ack-grep under debian"</span>
  <span style="color: #da70d6;">:type</span> 'file)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-command-line</span> (concat ack-command <span style="color: #bc8f8f;">" --nogroup --nocolor "</span>))
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-history</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-host-defaults-alist</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">ack</span> ()
  <span style="color: #bc8f8f;">"Like grep, but using ack-command as the default"</span>
  (interactive)
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Make sure grep has been initialized
</span>  (<span style="color: #7f007f;">if</span> (&gt;= emacs-major-version 22)
      (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">grep</span>)
    (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">compile</span>))
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Close STDIN to keep ack from going into filter mode
</span>  (<span style="color: #7f007f;">let</span> ((null-device (format <span style="color: #bc8f8f;">"&lt; %s"</span> null-device))
        (grep-command ack-command-line)
        (grep-history ack-history)
        (grep-host-defaults-alist ack-host-defaults-alist))
    (call-interactively 'grep)
    (setq ack-history             grep-history
          ack-host-defaults-alist grep-host-defaults-alist)))

(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-ack</span>)
</pre>

<p>Enjoy!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 22 Nov 2012 17:36:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</guid>
</item>
<item>
  <title>CL Happy Numbers</title>
  <link>http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</link>
  <description><![CDATA[<p>A while ago I stumbled upon <a href="http://tapoueh.org/blog/2010/08/30-happy-numbers.html">Happy Numbers</a> as explained in
<a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">programming praxis</a>, and offered an implementation of them in <code>SQL</code> and in
<code>Emacs Lisp</code>. Yeah, I know. Why not, though?</p>

<center>
<p><img src="../../../images/happy-numbers.png" alt=""></p>
</center>

<p>Today I'm back on that topic and as I'm toying with <em>Common Lisp</em> I thought it
would be a good excuse to learn me some new tricks. As you can see from the
earlier blog entry, last time I did attack the <em>digits</em> problem quite lightly.
Let's try a better approach now.</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">digits</span> (n)
  <span style="color: #bc8f8f;">"return the list of the digits of N"</span>
  (nreverse
   (<span style="color: #7f007f;">loop</span> for x = n then r
      for (r d) = (multiple-value-list (truncate x 10))
      collect d
      until (zerop r))))
</pre>

<p>As you can see I wanted to use that facility I like very much, the <code>for
x = n then r</code> way to handle the first loop iteration differently from the
next ones. But I've been hinted on <code>#lisp</code> that there's a much better way to
write the same code:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">integer-digits</span> (integer)
  <span style="color: #bc8f8f;">"stassats version"</span>
  (nreverse
   (<span style="color: #7f007f;">loop</span> with remainder
      do (setf (values integer remainder) (truncate integer 10))
      collect remainder
      until (zerop integer))))
</pre>

<p>That code runs about twice as fast as the previous one and is easier to
reason about. It's using <code>setf</code> and the form <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_values.htm">setf values</a>, something nice to
discover as it seems to be quite powerful. Let's see how to use it, even if
it's really simple:</p>

<pre class="src">
CL-USER&gt; (integer-digits 12304501)
(1 2 3 0 4 5 0 1)
</pre>

<p>Let's move on to solving the <em>Happy Numbers</em> problem though:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sum-of-squares-of-digits</span> (integer)
  (<span style="color: #7f007f;">loop</span> with remainder
     do (setf (values integer remainder) (truncate integer 10))
     sum (* remainder remainder)
     until (zerop integer)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">happy?</span> (n <span style="color: #228b22;">&amp;optional</span> seen)
  <span style="color: #bc8f8f;">"return true when n is a happy number"</span>
  (<span style="color: #7f007f;">let*</span> ((happiness (sum-of-squares-of-digits n)))
    (<span style="color: #7f007f;">cond</span> ((= 1 happiness)         t)
          ((member happiness seen) nil)
          (t
           (happy? happiness (push happiness seen))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-happy-numbers</span> (limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (happy? n) collect n))
</pre>

<p>And here's how it goes:</p>

<pre class="src">
CL-USER&gt; (find-happy-numbers 100)
(1 7 10 13 19 23 28 31 32 44 49 68 70 79 82 86 91 94 97 100)

CL-USER&gt; (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000))
took 1,621,413 microseconds (1.621413 seconds) to run.
       116,474 microseconds (0.116474 seconds, 7.18%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,431,332 microseconds (1.431332 seconds) were spent in user mode
       145,941 microseconds (0.145941 seconds) were spent in system mode
 185,438,208 bytes of memory allocated.
 1 minor page faults, 0 major page faults, 0 swaps.
143071
</pre>

<p>Of course that code is much faster than the one I wrote before both in <code>SQL</code>
and <em>Emacs Lisp</em>, the reason being that instead of writing the number into a
<em>string</em> with <code>(format nil &quot;~d&quot; number)</code> then using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_subseq.htm">subseq</a> to get the digits one after the
other, we're now using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_floorc.htm">truncate</a>.</p>
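
<p>To make the difference concrete, here is a minimal sketch of both ways to
get at the digits, written in <em>Emacs Lisp</em> since that's one of the languages of
the earlier attempt. Both function names are made up for this illustration,
and the sketch only handles non-negative integers.</p>

<pre class="src">
;; Hypothetical helpers, not part of the original code.

(defun my/digits-from-string (n)
  "Return the digits of N by printing it to a string first."
  ;; mapcar walks the characters of the string; ?0 is the character zero
  (mapcar (lambda (c) (- c ?0)) (number-to-string n)))

(defun my/digits-from-arithmetic (n)
  "Return the digits of N using only integer division and remainder."
  (let (digits)
    ;; always run the body once so that 0 yields (0)
    (while (progn
             (push (% n 10) digits)
             (not (zerop (setq n (/ n 10))))))
    digits))

(my/digits-from-string 12304501)
;; returns (1 2 3 0 4 5 0 1)
(my/digits-from-arithmetic 12304501)
;; returns (1 2 3 0 4 5 0 1)
</pre>

<p>The arithmetic version never allocates an intermediate string, which is the
same idea as the <code>truncate</code> based code above.</p>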

<p>Happy hacking!</p>

<h3>Update</h3>

<p class="first">It turns out that to solve a math related problem, some mathematical insight
helps. Who would have believed that? So if you want to easily get some
more performance out of the previous code, just try that solution:</p>

<pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">*depressed-squares*</span> '(0 4 16 20 37 42 58 89 145)
  <span style="color: #bc8f8f;">"see http://oeis.org/A039943"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">undepressed?</span> (n)
  <span style="color: #bc8f8f;">"same as happy?, using a static list of unhappy sums"</span>
  (<span style="color: #7f007f;">cond</span> ((= 1 n) t)
        ((member n *depressed-squares*) nil)
        (t
         (<span style="color: #7f007f;">let</span> ((h (sum-of-squares-of-digits n)))
           (undepressed? h)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-undepressed-numbers</span> (limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (undepressed? n) collect n))
</pre>

<p>Time to compare:</p>

<pre class="src">
CL-USER&gt; (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000))
took 1,938,048 microseconds (1.938048 seconds) to run.
       290,902 microseconds (0.290902 seconds, 15.01%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,778,021 microseconds (1.778021 seconds) were spent in user mode
       140,862 microseconds (0.140862 seconds) were spent in system mode
 185,438,208 bytes of memory allocated.
 3,320 minor page faults, 0 major page faults, 0 swaps.
143071

CL-USER&gt; (time (length (find-undepressed-numbers 1000000)))
(LENGTH (FIND-UNDEPRESSED-NUMBERS 1000000))
took 1,036,847 microseconds (1.036847 seconds) to run.
         5,372 microseconds (0.005372 seconds, 0.52%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,018,708 microseconds (1.018708 seconds) were spent in user mode
        16,982 microseconds (0.016982 seconds) were spent in system mode
 2,289,152 bytes of memory allocated.
143071
CL-USER&gt;
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 20 Nov 2012 18:20:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</guid>
</item>
<item>
  <title>About Vimgolf</title>
  <link>http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</link>
  <description><![CDATA[<p>Following some <em>tweet</em> I found myself desultory watching an episode of the
awesome <a href="http://vimeo.com/channels/222837">VimGolf in Emacs</a> video series by <a href="http://vimeo.com/timvisher">Tim Visher</a>. Those series are about
picking some challenge from <a href="http://vimgolf.com/">vimgolf</a> and implementing it with our favorite
editor instead. Because <a href="http://emacsrocks.com/">Emacs Rocks</a> guys.</p>

<center>
<p><a class="image-link" href="http://emacsrocks.com/">
<img src="../../../images/emacs-rocks-logo.png"></a></p>
</center>

<p>Let me tell you upfront that I really dislike the whole idea of the <em>vim golf</em>
challenge. I've been a user of both <em>Emacs</em> and <em>Vim</em> for many years, and
finally decided to switch to <em>living in Emacs</em>; or if you prefer, climbing my
way up from level 2 as in <a href="http://blog.vivekhaldar.com/post/3996068979/the-levels-of-emacs-proficiency">The Levels Of Emacs Proficiency</a>. The reason why is
that I found that in my case, using <em>Vim</em> would mean spending more time
thinking about <em>how</em> to do some editing operation rather than the <em>problem</em> I
wanted to solve by editing some text, most often code.</p>

<p>Of course, the main effect of <em>Vim Golf</em> is to make you focus even more on the
wrong problem. There's still a good side of it though, which is that such
challenges are good excuses to discover new features of your editor. So
let's use that excuse to talk about some nice <em>Emacs</em> features.</p>

<center>
<p><a class="image-link" href="http://vimgolf.com/challenges/4dd3e19aec9eb6000100000d">
<img src="../../../images/vim_golf_logo.png"></a></p>
</center>

<center>
<p><em>Vim Golf Challenge: Complete the hex array data</em></p>
</center>

<h3>The challenge</h3>

<p class="first">The previous image will lead you to a particular challenge where it's all
about filling in an array with consecutive <em>hexadecimal</em> numbers written as
<code>0xab</code>, where you begin with a template containing only the <code>0x00</code> entry. The
idea is of course to use the <em>Vim</em> feature that will increment the <em>number at
point</em>, and is available through the <code>C-a</code> keystroke.</p>

<pre class="src">
<span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {
        0x00,
};
</pre>


<h3>A first solution</h3>

<p><em>Emacs</em> does not ship with an <em>increment-number-at-point</em>, much less so with one
that would support <em>decimal</em>, <em>octal</em> and <em>hexadecimal</em> and even automatically
recognize <code>0x</code> as a prefix meaning that the next number is <em>hexadecimal</em>. But
<em>Emacs</em> ships with <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macros.html">Emacs Keyboard Macros</a> and those have a counter, so it's easy
enough to fill in numbers from 1 to 255 that way: <code>M-1 F3 F3 , F4</code> will
register a macro where the counter starts at 1, and each time you hit <code>F4</code> it
will insert the current counter value, increment it and insert a comma. You
want to do that 254 times, so you do <code>C-u 2 5 4 F4</code> and <em>Emacs</em> just does that.</p>

<p>Now, to transform those decimal numbers into their <em>hexadecimal</em>
representation, you can use advanced <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html">Emacs Regexp Replace</a> features. Replace
<code>[0-9]+</code> with the result from the following <em>Emacs Lisp</em> code: <code>\,(format
&quot;0x%02x&quot; (string-to-number \&amp;))</code>. The <code>\&amp;</code> in there will be replaced by the
matching text, so that will do what we need here, turning <code>10</code> into <code>0x0a</code>.</p>
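
<p>As a minimal sketch of what that replacement computes, here is the same
transformation written as a plain <em>Emacs Lisp</em> function working on a string.
The name <code>my/decimal-to-hex-in-string</code> is made up for this illustration; it
only relies on <code>re-search-forward</code>, <code>replace-match</code>, <code>format</code> and
<code>string-to-number</code>, which all ship with Emacs.</p>

<pre class="src">
;; Hypothetical helper, not part of the original article.
(defun my/decimal-to-hex-in-string (text)
  "Return TEXT with every run of decimal digits rewritten as 0xNN."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; same regexp and same format string as in the interactive replace
    (while (re-search-forward "[0-9]+" nil t)
      ;; fixedcase and literal are both t: insert the text exactly as built
      (replace-match (format "0x%02x" (string-to-number (match-string 0)))
                     t t))
    (buffer-string)))

(my/decimal-to-hex-in-string "1, 10, 255,")
;; returns "0x01, 0x0a, 0xff,"
</pre>

<p>After each replacement the search resumes past the freshly inserted text, so
the digits of <code>0x0a</code> are never matched a second time.</p>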


<h3>Let's get some better tools</h3>

<p class="first">We could do better, though. I happen to already use a <em>key chord</em> to duplicate
the current line, and we would need a function to <a href="http://www.emacswiki.org/emacs/IncrementNumber">Increment Number At Point</a>.
Those I found over at <a href="http://www.emacswiki.org/">EmacsWiki</a> were not to my taste as they were not able
to figure out easily which <em>base</em> to use. So here's a little <em>Emacs Lisp</em>
example showing how to extend your favorite editor to have some common <em>Vim</em>
features, which is why <em>Emacs</em> ships with <em>Emacs Lisp</em> in the first place.</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">duplicate-current-line</span> (<span style="color: #228b22;">&amp;optional</span> n)
  <span style="color: #bc8f8f;">"duplicate current line, make more than 1 copy given a numeric argument"</span>
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let</span> ((nb (or n 1))
        (current-line (thing-at-point 'line)))
    (<span style="color: #7f007f;">save-excursion</span>
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">when on last line, insert a newline first
</span>      (<span style="color: #7f007f;">when</span> (or (= 1 (forward-line 1)) (eq (point) (point-max)))
        (insert <span style="color: #bc8f8f;">"\n"</span>))

      <span style="color: #b22222;">;; </span><span style="color: #b22222;">now insert as many times as requested (NB defaults to 1)
</span>      (<span style="color: #7f007f;">dotimes</span> (i nb)
        (insert current-line)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">now move down as many lines as we inserted
</span>    (next-line nb)))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-d"</span>) 'duplicate-current-line)
</pre>

<center>
<p><a class="image-link" href="http://lisperati.com/">
<img src="../../../images/emacs-on-toaster.jpg"></a></p>
</center>

<pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">cl</span>)  <span style="color: #b22222;">; </span><span style="color: #b22222;">destructuring-bind is found there
</span>
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:increment-number-at-point</span> (<span style="color: #228b22;">&amp;optional</span> prefix)
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let*</span> ((beg    (skip-chars-backward <span style="color: #bc8f8f;">"0-9a-fA-F"</span>))
         (hexa   (<span style="color: #7f007f;">save-excursion</span> (forward-char -2) (looking-at-p <span style="color: #bc8f8f;">"0x"</span>)))
         <span style="color: #b22222;">;; </span><span style="color: #b22222;">force the prefix to hexa (4) when we see "0x" before the number
</span>         (prefix (<span style="color: #7f007f;">if</span> hexa 4 prefix))
         (end    (re-search-forward <span style="color: #bc8f8f;">"[0-9a-fA-F]+"</span> nil t))
         (nstr   (match-string 0))
         (l      (- (match-end 0) (match-beginning 0)))
         (fmt    (format <span style="color: #bc8f8f;">"%%0%d"</span> l)))
    (<span style="color: #7f007f;">destructuring-bind</span> (base format)
        (<span style="color: #7f007f;">case</span> prefix
          ((1)  '(10 <span style="color: #bc8f8f;">"d"</span>))              <span style="color: #b22222;">; </span><span style="color: #b22222;">no command prefix, decimal
</span>          ((4)  '(16 <span style="color: #bc8f8f;">"x"</span>))              <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u, hexadecimal
</span>          ((16) '(8 <span style="color: #bc8f8f;">"o"</span>)))              <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u C-u, octal
</span>      (<span style="color: #7f007f;">let*</span> ((n   (string-to-number nstr base))
             (n+1 (+ n 1))
             (fmt (format <span style="color: #bc8f8f;">"%s%s"</span> fmt format)))
        (replace-match (format fmt n+1))))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-c +"</span>) 'dim:increment-number-at-point)
</pre>

<blockquote>
<p class="quoted">
So if you're using <em>Emacs</em> a lot but always found an excuse not to grasp <em>Emacs
Lisp</em>, I hope that article could be an excuse for you to do so…</p>

</blockquote>


<h3>Another solution</h3>

<p class="first">Anyway, now that we are much better equipped, we can picture a better way to
solve the problem. Instead of using a macro that inserts the next counter
value, we can use one that duplicates the current line, increments the number at point
(and figures out on its own that the number prefixed with <code>0x</code> is
<em>hexadecimal</em>), and does that 254 times more. Then it's all about reformatting
the text so that it fits nicely on screen, and for that <code>M-q</code>, which runs
the command <code>fill-paragraph</code>, is exactly what we need. Likewise <code>C-x f</code>, which runs
the command <code>set-fill-column</code>, can be used to set the maximum column we allow
<em>Emacs</em> to reach before going to the next line.</p>

<p>Our <em>Golf</em> then becomes a <code>19</code> step solution if you start with the cursor at
the <code>','</code> in the previous example:</p>

<pre class="src">
C-x f 5 6 RET
F3 C-S-d C-c + F4
C-u 2 5 4 F4
C-SPC M-&lt; C-n M-q
</pre>

<p>First, set the <em>fill column</em>, then register a macro (in between <code>F3</code> and <code>F4</code>)
that will duplicate the current line (using <code>C-S-d</code>) then increment the number at
point (using <code>C-c +</code>). Third line, replay that macro 254 times (<code>C-u 2 5 4 F4</code>).
Fourth line, select all those <em>hexadecimal</em> numbers and fill the paragraph
they form correctly, so as to get:</p>


<h3>All those tips for...</h3>

<pre class="src">
<span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
        0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
        0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
        0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
        0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
        0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
        0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
        0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
        0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
        0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
        0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
        0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
        0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f,
        0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
        0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
        0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
        0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
        0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
        0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
        0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
        0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
        0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
        0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
        0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
        0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
        0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
        0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
        0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
        0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
        0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
        0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
};
</pre>


<h3>Conclusion</h3>

<p class="first">Thanks to the excuse of that challenge we now have a generic facility to
increment a number at point in different common bases, which is a nice
building block for all kinds of <em>Emacs Keyboard Macros</em>. We also now have a
function to duplicate the line at point, which is something I use very often
myself.</p>

<p>More importantly, we've been refreshing our memory on how to use some
advanced replacement facilities wherein you can actually use inline <em>Emacs
Lisp</em> code as a replacement pattern, and for the most interested readers here
we have a good excuse to learn some more about <em>Emacs Lisp</em> programming.</p>

<p>The main thing I want to say is that using <em>Emacs Keyboard Macros</em> is an
interactive process: you don't have to pause your current activity to write
some code in another language (here, that would be either <em>Vim Script</em> or
<em>Emacs Lisp</em>) just to save a few minutes on a boring task.</p>

<p>How effective you are at solving that challenge, for me, is not at all
about measuring how many keystrokes you ended up using, it's all about being
able to get some precious help from your working tools <strong><em>without</em></strong> having to
stop focusing on the main problem you are solving.</p>

<p>I wouldn't ever get to write such <em>Emacs Lisp</em> code when doing that kind of
editing once. I would only do that when I'm thinking I've just been doing a
boring task by hand one time too many already. Like for example copying and
pasting the <code>pg_backend_pid()</code> of the <a href="http://www.postgresql.org/">PostgreSQL</a> backend I'm working with at
the <code>psql</code> prompt so that I can attach <code>gdb</code> to it. I'll get back to talking
about <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a> later!</p>

<p>Hope you did enjoy that article, whose goal is to help you while you're
journeying in <a href="http://blog.vivekhaldar.com/post/3996068979/the-levels-of-emacs-proficiency">The Levels Of Emacs Proficiency</a>.</p>


<h3>Update</h3>

<p class="first">While looking at the docs for the <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macro-Counter.html#Keyboard-Macro-Counter">Keyboard Macro Counter</a> to check how to
reset it without having to record the macro again, I just stumbled on this
part of the docs: <code>C-x C-k C-f runs the command kmacro-set-format</code>. So another
way to solve our problem with only facilities that come with a bare Emacs is
the following:</p>

<pre class="src">
C-x f 5 6 RET
C-x C-k C-f 0x%02x RET
C-1 F3 SPC F3 , F4
C-u 2 5 4 F4
DEL C-SPC C-a M-q
</pre>

<p>We're now at 30 keystrokes, so much more than previously, but it's stock
Emacs features and that <code>kmacro-set-format</code> is a wonderful little tool you
might as well have a need for in the future.</p>
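
<p>For the curious, the text that macro produces can also be sketched in a few
lines of <em>Emacs Lisp</em>, not that you would write it for a one-off edit. The
function name <code>my/hex-array-entries</code> is an assumption of this example, not
something Emacs ships; it only uses <code>format</code>, <code>number-sequence</code> and
<code>mapconcat</code>, and applies the same <code>0x%02x</code> format string that was given to
<code>kmacro-set-format</code>.</p>

<pre class="src">
;; Hypothetical helper, not part of the original article.
(defun my/hex-array-entries (count)
  "Return COUNT comma separated hexadecimal entries, starting at 0x00."
  (mapconcat (lambda (n) (format "0x%02x" n))
             (number-sequence 0 (1- count))
             ", "))

(my/hex-array-entries 4)
;; returns "0x00, 0x01, 0x02, 0x03"
</pre>

<p>Calling it with 256 gives every entry of the array on a single line, which
is then ready for <code>fill-paragraph</code>.</p>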
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 11 Nov 2012 20:52:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</guid>
</item>
<item>
  <title>Editing SQL</title>
  <link>http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</link>
  <description><![CDATA[<p>It's hard to read my blog yet not know I'm using <a href="http://www.gnu.org/software/emacs/#Platforms">Emacs</a>. It really is a great
tool and has a lot to compare to <a href="http://www.postgresql.org/">PostgreSQL</a> in terms of extensibility,
documentation quality and community. And there's even a native
implementation of the <a href="http://www.postgresql.org/docs/current/static/protocol.html">PostgreSQL Protocol</a> written in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>.</p>

<center>
<p><a class="image-link" href="http://www.online-marketwatch.com/pgel/pg.html">
<img src="../../../images/pg-el.png"></a></p>
</center>

<p>One of the things where <em>Emacs</em> really shines is that interactive development
environment you get when working on some <em>Emacs Lisp</em> code. Evaluating a
function is as easy as a single <em>key chord</em>, and that will both compile the
function and load it in the running process. I can't tell you how many times
I've been missing that ability when editing C code.</p>

<p>With <em>PostgreSQL</em> too we get a pretty interactive environment with the <a href="http://www.postgresql.org/docs/current/static/app-psql.html">psql</a>
console application, or with <a href="http://www.pgadmin.org/">pgAdmin</a>. One feature from <em>pgAdmin</em> that I've
often wished I had in <em>psql</em> is the ability to edit my query online and easily
run it in the console, rather than either using the <em>readline</em> limited history
editing features or launching a new editor process each time with <code>\e</code>. At the
same time I would much prefer using my usual <em>Emacs</em> editor to actually <em>edit</em>
the query.</p>

<p>If you've been reading this blog before, you know what to expect. My solution
to the stated problem is available in <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a>, an <em>Emacs</em> package aimed at
helping <em>PostgreSQL</em> developers. Most of the features in there are geared
toward the <em>core backend</em> developers, except for this one I want to talk about
today (I'll blog about the other ones too I guess).</p>

<center>
<p><img src="../../../images/pgdevenv-el-eval-sql.png" alt=""></p>
</center>

<p>What you can see from that screenshot is that the selected query text has
been sent to the <em>psql</em> buffer and executed over there. And that the <em>psql</em>
buffer is echoing all queries sent to it. What you can not see straight from
that picture is the interaction to get there. Well, I've been implementing
some <em>elisp</em> features that I was missing.</p>

<p>First, movement: you can do <code>C-M-a</code> and <code>C-M-e</code> to navigate to the beginning and
the end of the SQL query at point, like you do in <code>C</code> or in <code>lisp</code> in <em>Emacs</em>.</p>

<p>Then, selection: you can do <code>C-M-h</code> to select the SQL query at point, you
don't have to navigate yourself, <a href="https://github.com/dimitri/pgdevenv-el">pgdev-sql-mode</a> knows how to do that. Side
note, <code>pgdev-sql-mode</code> is the name of the <em>minor mode</em> you need to activate in
your SQL buffers to have the magic available.</p>

<p>Last but not least, evaluation: as when editing lisp code, you can now use
<code>C-M-x</code> to send the current query text to an associated <em>psql</em> buffer.</p>

<p>The way to associate the <em>psql</em> buffer to an <em>SQL</em> buffer is currently done
thanks to the other <em>pgdevenv-el</em> features that this blog post is not talking
about, and the setup is addressed in the documentation: you have to let
<em>pgdevenv-el</em> know where your PostgreSQL branches are installed locally so that it
can prepare you a <em>Shell</em> buffer with <code>PGDATA</code> and <code>PGPORT</code> already set for you.
And currently, for <code>C-M-x</code> to work you need to open the buffer yourself
beforehand, using <code>C-c - n</code> (to run the command <code>pgdev-open-shell</code>), and type <code>psql</code> in
the <em>Shell</em> prompt.</p>

<p>What that means for me is that I can at least edit SQL (in <em>PostgreSQL</em>
regression files and other places) in my usual <em>Emacs</em> buffer and actually
refine it as I go until it does exactly what I need, without having to use
the <em>readline</em> history editing or the <code>\e</code> command, which is not great when your
<em>Shell</em> is already running inside <em>Emacs</em>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Nov 2012 09:55:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</guid>
</item>
<item>
  <title>Concurrent Hello</title>
  <link>http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</link>
  <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/265191809100181504">Mickael</a> on <em>twitter</em> I ran into that article about implementing a
very basic <em>Hello World!</em> program as a way to get into a new concurrent
language or facility. The original article, titled
<a href="http://himmele.blogspot.de/2012/11/concurrent-hello-world-in-go-erlang.html">Concurrent Hello World in Go, Erlang and C++</a> is all about getting to know
<a href="http://golang.org/">The Go Programming Language</a> better.</p>

<p>To quote the article:</p>

<blockquote>
<p class="quoted">
The first thing I always do when playing around with a new
software platform is to write a concurrent &quot;Hello World&quot; program. The
program works as follows: One active entity (e.g. thread, Erlang process,
Goroutine) has to print &quot;Hello &quot; and another one &quot;World!\n&quot; with the two
active entities synchronizing with each other so that the output always is
&quot;Hello World!\n&quot;.</p>

</blockquote>

<p>Here's my try in <a href="http://cliki.net/">Common Lisp</a> using <a href="http://lparallel.org/">lparallel</a> and some <em>local nicknames</em>, the
whole <code>23</code> lines of it:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-hello</span> (helloq worldq n)
  (<span style="color: #7f007f;">dotimes</span> (i n)
    (format t <span style="color: #bc8f8f;">"Hello "</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-world</span> worldq)
    (lq:pop-queue helloq))
  (lq:push-queue <span style="color: #da70d6;">:quit</span> worldq))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-world</span> (helloq worldq)
  (<span style="color: #7f007f;">when</span> (eq (lq:pop-queue worldq) <span style="color: #da70d6;">:say-world</span>)
    (format t <span style="color: #bc8f8f;">"World!~%"</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-hello</span> helloq)
    (say-world helloq worldq)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">hello-world</span> (n)
  (<span style="color: #7f007f;">let*</span> ((lp:*kernel*  (lp:make-kernel 2)) <span style="color: #b22222;">; </span><span style="color: #b22222;">a new one each time, as we end it
</span>         (channel      (lp:make-channel))
         (helloq       (lq:make-queue))
         (worldq       (lq:make-queue)))
    (lp:submit-task channel #'say-world helloq worldq)
    (lp:submit-task channel #'say-hello helloq worldq n)
    (lp:receive-result channel)
    (lp:receive-result channel)
    (lp:end-kernel)))
</pre>

<p>If you want to play locally with that code, I've been updating it to a
<em>github</em> project named <a href="https://github.com/dimitri/go-hello-world">go-hello-world</a>, even if it's coded in <em>CL</em>. See the
<code>package.lisp</code> in there for how I did enable the <em>local nicknames</em> <code>lp</code> and <code>lq</code> for
the <em>lparallel</em> packages.</p>

<h3>Beware of the REPL</h3>

<p class="first">In a previous version of this very article, I said that sometimes I get an
extra line feed in the output and I didn't understand why. Some great Common
Lisp folks gave me a hint about that: it's the <em>REPL</em> output that gets
intermingled with the program output, and that's because the <code>hello-world</code>
main function was returning before the thing is over.</p>

<p>I've added a <code>receive-result</code> call in it per worker so that it waits until the
end of the program before returning to the <em>REPL</em>, and that indeed fixes it. A
way to assert that is using the <code>time</code> macro, which was always intermingled
with the output before. It's fixed now:</p>

<pre class="src">
CL-USER&gt; (time (go-hello-world:hello-world 1000))
Hello World!
...
Hello World!
(GO-HELLO-WORLD:HELLO-WORLD 1000)
took 27,886 microseconds (0.027886 seconds) to run.
      1,593 microseconds (0.001593 seconds, 5.71%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     23,246 microseconds (0.023246 seconds) were spent in user mode
     14,427 microseconds (0.014427 seconds) were spent in system mode
 4,272 bytes of memory allocated.
 10 minor page faults, 0 major page faults, 0 swaps.
(#&lt;PROCESS lparallel kernel shutdown manager(62) [Reset] #x30200109F65D&gt; ...)
CL-USER&gt;
</pre>


<h3>Conclusion</h3>

<p class="first">While the <em>Go</em> language seems to bring very interesting things to the table, such
as better compilation units and tools, I still think that the concurrency
primitives at the core of it are easy to find in other places. Which is a
good thing, as it means we know they work.</p>

<p>That also means that we don't need to accept <em>Go</em> syntax as the only way to
properly solve that <em>concurrency</em> problem, I much prefer doing so with <em>Common
Lisp</em> (lack of?) syntax myself.</p>


<h3>Update</h3>

<p class="first">A previous version of this article was finished and published too quickly,
and the conclusion was made from a buggy version of the program. It's all
fixed now. Thanks a lot to people who contributed comments so that I could
fix it, and thanks again to <em>James M. Lawrence</em> for <a href="http://lparallel.org/">lparallel</a>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 04 Nov 2012 23:04:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</guid>
</item>
<item>
  <title>PostgreSQL for developers</title>
  <link>http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</link>
  <description><![CDATA[<p>As <a href="http://blog.guillaume.lelarge.info/index.php/post/2012/11/01/Conf%C3%A9rence-%C3%A0-l-AFUP-Lyon">Guillaume</a> says, we've been enjoying a great evening conference in Lyon 2
days ago, presenting PostgreSQL to developers. He did the first hour
presenting the project and the main things you want to know to start using
<a href="http://www.postgresql.org/">PostgreSQL</a> in production, then I took the opportunity to be talking to
developers to show off some SQL.</p>

<center>
<p><a class="image-link" href="../../../images/confs/developper-avec-pgsql.pdf">
<img src="../../../images/confs/developper-avec-pgsql-0.png"></a></p>
</center>

<p>That slide deck contains mainly SQL language, but some French too, rather
than English. Sorry for the inconvenience if that's not something you can
read. Get me to talk at an English developer friendly conference and I'll
translate it for you! :)</p>

<p>The aim of that talk is to have people think about SQL as a real asset in
their development tool set. SQL really should get compared to your
application development language rather than your UI formatting language,
it's more like PHP or Python than it is like HTML.</p>

<p>So the whole talk is about showing off some advanced SQL features, all
provided by default in released PostgreSQL versions. The main parts of the
talk all come from an article in this blog: <a href="../10/05-reset-counter.html">Reset Counter</a>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 02 Nov 2012 16:22:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</guid>
</item>
<item>
  <title>Another awesome conf</title>
  <link>http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</link>
  <description><![CDATA[<p>Last week was <a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> in Prague, and it's been
awesome. Many thanks to the organisers who did manage to host a very smooth
conference with <code>290</code> attendees, including speakers. That means you kept
walking into interesting people to talk to, and in particular the <em>Hallway
Track</em> has been a giant success.</p>

<center>
<p><a class="image-link" href="http://www.flickr.com/photos/obartunov/8128604476/lightbox/">
<img src="../../../images/prague.jpg"></a></p>
</center>

<center>
<p><em>Photo by <a href="http://www.sai.msu.su/~megera/">Oleg Bartunov</a></em></p>
</center>

<p>I did have the chance to speak several times at that event, and you can get
the slides at my <a href="../../../conferences.html">Conferences</a> page that I try to keep up to date. I did one
talk about <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/318-implementing-high-availability/">Implementing High Availability</a> that was about 2 hours long (a
double slot), <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/373-lightning-talks/">PGQ Cooperative Consumers</a> that Marko Kreen copresented with me
and the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/317-large-scale-mysql-migration-to-postgresql/">Large Scale MySQL Migration to PostgreSQL</a> that I already presented
before this year.</p>

<center>
<p><a class="image-link" href="../../../images/high-availability.pdf">
<img src="../../../images/high-availability.png"></a></p>
</center>


<p>Next conference is in Lyon and will be in French, the talk is called
<a href="http://lyon.afup.org/2012/10/17/presentation-de-postgresql-31102012-a-19h30/">Présentation de PostgreSQL</a>. The audience is going to be composed of PHP
developers interested to know more about PostgreSQL, I'll tell you how it
goes!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 30 Oct 2012 12:50:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</guid>
</item>
<item>
  <title>Prefixes and Ranges</title>
  <link>http://tapoueh.org/blog/2012/10/16-prefix-update.html</link>
  <description><![CDATA[<p>It's been a long time since I last had some time to spend on the <a href="http://tapoueh.org/pgsql/prefix.html">prefix</a>
PostgreSQL extension and its <code>prefix_range</code> data type. With PostgreSQL 9.2
out, some users wanted me to update the extension for that release, and
hinted that it was high time I fixed that old bug for which I already
had a patch.</p>

<center>
<p><img src="../../../images/Prefix-Pro-Blend.jpg" alt=""></p>
</center>

<h3><code>prefix_range</code> release 1.2.0</h3>

<p class="first">I'm sorry it took that long. It's now done, you can have <code>prefix 1.2.0</code> from
<a href="https://github.com/dimitri/prefix">https://github.com/dimitri/prefix</a> or if you want a <em>tagged</em> tarball then you
can use this link: <a href="https://github.com/dimitri/prefix/tarball/v1.2.0">https://github.com/dimitri/prefix/tarball/v1.2.0</a>.</p>

<p>The <em>changelog</em> is all about fixing an index search bug and updating the
package to primarily be an extension for PostgreSQL 9.1 and 9.2. Of course
older Major Versions are still supported (all of them since <code>8.1</code>, but please
first consider upgrading PostgreSQL) if you want to install it <em>manually</em>,
using the <code>prefix--1.2.0.sql</code> file.</p>


<h3>debian package</h3>

<p class="first">And thanks to <a href="http://www.df7cb.de/">Christoph Berg</a> the debian package is already validated and has
reached <em>debian experimental</em>. We don't target <em>sid</em> these days because debian
is preparing a new stable release, so there's a freeze. I think. Anyway,
take your prefix package from here if you need it:
<a href="http://packages.debian.org/experimental/postgresql-9.1-prefix">http://packages.debian.org/experimental/postgresql-9.1-prefix</a>.</p>


<h3>Range Types</h3>

<p class="first">If you step back a little there's an interesting question to answer here.
Why isn't <code>prefix_range</code> a <a href="http://www.postgresql.org/docs/9.2/static/rangetypes.html">PostgreSQL Range Type</a>? Given the names it seems
like a pretty good candidate.</p>

<p>Well the thing is that to make a generic range type you need to have a total
ordering on the range elements, and a distance function that tells you how
far apart any two elements of a range are from each other.</p>

<p>When talking about prefixes, I don't see how to do that. The prefix range
<code>['abcd', 'abce')</code> contains an infinity of elements, all the <em>strings</em> that
begin with the letters <code>abcd</code>. I guess that coming up with an ordering on text is
possible, but what if any text element represents a prefix?</p>

<p>I mean that in our case, the elements would be of type <code>prefix</code>, and <code>'abcd'</code> is
a prefix of <code>'abcdefg'</code>. The question I want to answer is: given a table
with prefixes <code>'abcd'</code>, <code>'abce'</code> and <code>'abcde'</code>, which row in there has the longest
prefix matching the literal <code>'abcdef'</code>?</p>

<p>I'm not seeing how to abuse the <em>Range Types</em> mechanism to implement that, so
if you have some ideas please share them!</p>
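
<p>As a point of comparison, the longest-prefix question above can be sketched
in plain SQL with a <code>text</code> column and <code>LIKE</code>. This is only an illustration under
assumptions: the <code>prefixes</code> table and its <code>prefix</code> column are made up for the
example, it does not use <code>prefix_range</code> at all, a stored prefix containing <code>%</code> or
<code>_</code> would be read as a wildcard, and nothing here gets help from an index
the way the extension's operators do.</p>

<pre class="src">
-- hypothetical table, only for this illustration
create table prefixes(prefix text);

insert into prefixes values ('abcd'), ('abce'), ('abcde');

-- keep the rows whose prefix begins the literal, longest one first
  select prefix
    from prefixes
   where 'abcdef' like prefix || '%'
order by length(prefix) desc
   limit 1;

 prefix
--------
 abcde
(1 row)
</pre>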
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 16 Oct 2012 10:47:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/16-prefix-update.html</guid>
</item>
<item>
  <title>Reset Counter</title>
  <link>http://tapoueh.org/blog/2012/10/05-reset-counter.html</link>
  <description><![CDATA[<p>I've been given a nice puzzle that I think is a good blog article
opportunity, as it involves some thinking and <em>window functions</em>.</p>

<h3>What's to solve</h3>

<p class="first">Say we store in a table entries from a <em>counter</em> that only increases and the
time stamp when we did the measurement. So that when you read <code>30</code> then later
<code>40</code> in fact that means we counted <code>10</code> more the second reading when compared to
the first, in other words the first <code>30</code> are counted again in the second
counter value, <code>40</code>.</p>

<center>
<p><a class="image-link" href="http://xkcd.com/363/">
<img src="../../../images/reset.png"></a></p>
</center>

<p>Now of course it's a real world counter. Think network traffic counter on a
network interface, if you want something real to play with in your mind. So
the counter will sometimes reset and you will read measure sequences such as
<code>40, 0, 20</code> if you happen to read just when the counter is reset, or most of
the time that will look like <code>45, 25, 50</code>.</p>

<p>The question we want to answer is, given a series of measures from that counter,
including some resets, what is the current logical value of the counter?</p>

<p>Given the sequence of measures <code>0, 10, 20, 30, 40, 0, 20, 30, 60</code> the result
we want is <code>40 + 60</code>, that is <code>100</code>. Right?</p>


<h3>Playing with some data</h3>

<p class="first">Let's model a hypothetical dataset easy enough to play with. What about
just the previous example? We also need to <em>time stamp</em> the measurements,
let's just use a <em>tick</em> for now, as it's easier to think about:</p>

<pre class="src">
create table measures(tick int, nb int);

insert into measures
     values (1, 0), (2, 10), (3, 20), (4, 30), (5, 40),
            (6, 0), (7, 20), (8, 30), (9, 60);
</pre>

<p>Now that we have some data in a table to play with, let's try to find out
the numbers we are interested in: we only want to keep the latest measure we
read on the counter just before it wraps. That means values where the <em>next
one</em> (in tick or time stamping order) is less than the current counter
value.</p>

<p>As we are lucky enough to be playing with the awesome <a href="http://www.postgresql.org/">PostgreSQL</a> which
brings <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a> to the table, we can easily implement just what we
said in a readable way:</p>

<pre class="src">
  select tick, nb,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick);
</pre>

<p>The first <em>case</em> is the exact translation of the problem as spelled out in English
in just the previous paragraph where we stated we want to keep the current
counter value in case of a <em>wraparound</em>, so I guess it's easy enough to get
at.</p>

<center>
<p><img src="../../../images/reset-circuit-thumbnail.jpg" alt=""></p>
</center>

<p>Then we have a couple of tricks in that query in order to massage the data
as we want it. First, the last row of the output won't have a <em>lead</em>, that
<em>window function</em> call is going to return <code>NULL</code>. In that case, we keep the
current counter value as if we just did a <em>wraparound</em>. And finally, when
there's no <em>wraparound</em>, we don't care about the data. Well, for the purpose
of knowing the current <em>logical</em> value of the counter, that is.</p>

<p>And we get that encouraging result:</p>

<pre class="src">
 tick | nb  | max
------+-----+-----
    1 |   0 |
    2 |  10 |
    3 |  20 |
    4 |  30 |
    5 |  40 |  40
    6 |   0 |
    7 |  20 |
    8 |  30 |
    9 |  60 |  60
</pre>

<p>As you see, we have been able to create a new column out of the dataset, and
that new column only contains the data we are interested in.</p>


<h3>Finding the current counter value</h3>

<p class="first">All we have to do now is sum this computed column's entries. Remember that
the <code>sum()</code> aggregate function will simply discard nulls, so that we don't
have to turn them into a bunch of <code>0</code>.</p>

<pre class="src">
with t(tops) as (
  select case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select sum(tops) from t;
</pre>

<center>
<p><img src="../../../images/reset-elect.jpg" alt=""></p>
</center>

<p>And here's the expected result:</p>

<pre class="src">
 sum
<span style="color: #b22222;">-----
</span> 100
</pre>
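
<p>If you want to convince yourself that <code>sum()</code> skips nulls, here's a tiny
standalone check. The inline <code>values</code> list and the column name <code>x</code> are made up
for the illustration and do not touch the <code>measures</code> table:</p>

<pre class="src">
-- sum() and count(x) ignore the null, count(*) counts every row
select sum(x) as sum, count(x) as non_null, count(*) as total
  from (values (40), (null), (60)) as t(x);

 sum | non_null | total
-----+----------+-------
 100 |        2 |     3
(1 row)
</pre>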

<p>Now what about testing with another set of data or two, just to be sure that
the counter is allowed to wrap more than once within our solution?</p>

<pre class="src">
insert into measures
     values (10, 0), (11, 10), (12, 30), (13, 35), (14, 45),
            (15, 25), (16, 50), (17, 100), (18, 110);
</pre>

<p>Then we have:</p>

<pre class="src">
with t(tops) as (
   select case when lead(nb) over w &lt; nb then nb
               when lead(nb) over w is null then nb
               else null
           end as max
     from measures
   window w as (order by tick)
)
select sum(tops) from t;
 sum
<span style="color: #b22222;">-----
</span> 255
(1 row)
</pre>

<p>All good!</p>


<h3>Counter logical value over a given period</h3>

<p class="first">Now of course what we want is to find the logical value of the counter for a
given day's or month's worth of measures. We then need to pay attention to
the value of the counter at the start of our period so that we know to
subtract it from the logical sum over the period.</p>

<center>
<p><img src="../../../images/reset-coin-counter.jpg" alt=""></p>
</center>

<p>Here's an SQL version of the same sentence, applied to the period in between
ticks <code>4</code> and <code>14</code>, in a completely arbitrary choosing of mine:</p>

<pre class="src">
with t as (
  select tick,
         first_value(nb) over w as first,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
   where tick &gt;= 4 and tick &lt; 14
  window w as (order by tick)
)
select sum(max) - min(first) as sum
 from t;
</pre>

<p>Here we are using the <em>first_value()</em> window function to retain it in the
whole resultset of the <em>Common Table Expression</em> (the inner query introduced
by the keyword <code>WITH</code> is called that way). And when doing the sum we're
interested in at the outer level, we didn't forget to subtract the first
value: we need to use an aggregate here because we're doing a <code>sum()</code>
aggregate at the same query level, and we have the same value in each row of
the resultset, so we used <code>min()</code>, <code>max()</code> would have been as good.</p>

<p>Another important trick we're using in that query is how to express the date
range. Never use <code>between</code> for that, as you would end up counting boundaries
twice, and customers won't like your accounting process if you do that.
Always use a combo of inclusive and exclusive boundaries comparison, as in
that <code>WHERE</code> clause in the previous query.</p>
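
<p>To see the double counting at work, here's a small sketch that splits the
ticks from <code>4</code> to <code>14</code> into two adjacent periods and counts the rows in each,
once with <code>between</code> and once with half-open boundaries. It assumes the
<code>measures</code> table holds the 18 rows inserted so far, and the choice of tick <code>9</code>
as the split point is arbitrary:</p>

<pre class="src">
select (select count(*) from measures where tick between 4 and 9)
     + (select count(*) from measures where tick between 9 and 14)
       as with_between,
       (select count(*) from measures where tick &gt;= 4 and tick &lt; 9)
     + (select count(*) from measures where tick &gt;= 9 and tick &lt; 14)
       as half_open;

 with_between | half_open
--------------+-----------
           12 |        10
(1 row)
</pre>

<p>With <code>between</code>, tick <code>9</code> lands in both periods and tick <code>14</code> is pulled in
too, while the half-open version counts each of the 10 rows exactly once.</p>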

<p>Let's have a quick look at the raw data in that range, using another nice
<em>aggregate</em> that PostgreSQL comes with:</p>

<pre class="src">
select array_agg(nb) from measures where tick &gt;= 4 and tick &lt; 14;
           array_agg
<span style="color: #b22222;">-------------------------------
</span> {30,40,0,20,30,60,0,10,30,35}
(1 row)
</pre>

<p>And now, the <em>logical counter value</em> for that period is computed as the
following value by the previous query:</p>

<pre class="src">
 sum
-----
 105
(1 row)
</pre>

<p>We can verify it manually, we want <code>40 + 60 + 35 - 30</code>, I think we're all good
again. Don't forget we have to subtract the first measure from the period!</p>


<h3>Extending the problem</h3>

<p class="first">Another interesting problem, that we didn't have here but that I find
interesting enough to extend this article, is finding the ranges of time
(here, ticks) within which the counter didn't reset.</p>

<center>
<p><img src="../../../images/reset-A2a.jpg" alt=""></p>
</center>

<p>The query is more complex because we need to split the data into partitions,
each partition containing data from the same counter series of measures
without wrapping. The usual trick is to self-join our data set so that for
each given row we have a set of rows from the same partition; here we are going
to use a <em>correlated subquery</em> instead, to go fetch the next <em>wraparound</em> value:</p>

<pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
             else null
         end as max
    from measures
  window w as (order by tick)
)
  select tick, nb, max,
         (select tick
            from tops t2
           where t2.tick &gt;= t1.tick and max is not null
        order by t2.tick
           limit 1) as p
    from tops t1;

 tick | nb  | max | p
<span style="color: #b22222;">------+-----+-----+----
</span>    1 |   0 |     |  5
    2 |  10 |     |  5
    3 |  20 |     |  5
    4 |  30 |     |  5
    5 |  40 |  40 |  5
    6 |   0 |     |  9
    7 |  20 |     |  9
    8 |  30 |     |  9
    9 |  60 |  60 |  9
   10 |   0 |     | 14
   11 |  10 |     | 14
   12 |  30 |     | 14
   13 |  35 |     | 14
   14 |  45 |  45 | 14
   15 |  25 |     | 18
   16 |  50 |     | 18
   17 | 100 |     | 18
   18 | 110 | 110 | 18
(18 rows)
</pre>

<p>With that as an input it's then possible to build ranges of ticks, each
covering a non-wrapping set of measures from our counter, and get for each
range the logical value that the counter had at the end of it:</p>

<pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
             else null
         end as max
    from measures
  window w as (order by tick)
),
     parts as (
  select tick, nb, max,
         (select tick
            from tops t2
           where t2.tick &gt;= t1.tick and max is not null
        order by t2.tick
           limit 1) as p
    from tops t1
),
     ranges as (
  select first_value(tick) over w as start,
         last_value(tick) over w as end,
         max(max) over w
    from parts
  window w as (partition by p order by tick)
)
select * from ranges where max is not null;

 start | end | max
<span style="color: #b22222;">-------+-----+-----
</span>     1 |   5 |  40
     6 |   9 |  60
    10 |  14 |  45
    15 |  18 | 110
(4 rows)
</pre>
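
<p>For what it's worth, the same ranges can probably be had without the
correlated subquery. The following is a sketch of another technique, not the
query used above: flag each row where the counter went down compared to the
previous one with <code>lag()</code>, then use a running <code>sum()</code> of those flags as a
series number to group by. The names <code>flagged</code>, <code>series</code>, <code>reset</code> and <code>s</code> are made
up for the example:</p>

<pre class="src">
with flagged as (
  select tick, nb,
         -- lag() is null on the very first row, so that row gets 0
         case when nb &lt; lag(nb) over w then 1 else 0 end as reset
    from measures
  window w as (order by tick)
),
     series as (
  select tick, nb,
         -- running count of resets seen so far: one number per series
         sum(reset) over (order by tick) as s
    from flagged
)
  select min(tick) as start, max(tick) as "end", max(nb) as max
    from series
group by s
order by s;
</pre>

<p>Within a series the counter never goes down, so <code>max(nb)</code> is its last value.
On the 18 rows of <code>measures</code> this should return the same four ranges as the
previous query.</p>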


<h3>Conclusion</h3>

<p class="first">What I hope to have shown here, apart from some <em>window function</em> tips and
some nice use cases for <em>common table expressions</em>, is that as a developer
adding <code>SQL</code> to your tool set is a very good idea.</p>

<center>
<p><img src="../../../images/skill-set.jpg" alt=""></p>
</center>

<p>You don't want to have several parts of your code dealing with a logical
counter like this, because you want the reporting, accounting, quota,
billing and other software to all agree on the values. And you most probably
want to avoid fetching a huge result set of data and processing it in the
application memory (it'd better fit) rather than just getting back a single
integer column single row resultset, right?</p>

<p>If you find this SQL example to be out of reach, it's a good sign that you
need to improve your skills so that SQL becomes a real asset among your
multi-language, multi-paradigm developer talents.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 05 Oct 2012 09:44:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/05-reset-counter.html</guid>
</item>
<item>
  <title>PostgreSQL 9.3</title>
  <link>http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</link>
  <description><![CDATA[<p><a href="http://www.postgresql.org/">PostgreSQL 9.2</a> is released! It's an awesome new release that I urge you to
consider trying and adopting, an upgrade from even <code>9.1</code> should be very well
worth it, as your hardware could suddenly be able to process a much higher
load. Indeed, better performance means more work done on the same budget,
that's the name of the game!</p>

<p>As a <em>PostgreSQL contributor</em> though, the release of <code>9.2</code> mainly means to me
that it's time to fully concentrate on preparing <code>9.3</code>. The development
<em>season</em> of which has already begun, by the way, so some amount of work has
already been done here.</p>

<center>
<p><img src="../../../images/event-trigger.jpg" alt=""></p>
</center>

<p>The list of things I want to be working on for that next release is quite
long, and looks more like a christmas list than anything else. Let's only
talk about those things I might as well make happen rather than all the
things I wish I was able to be delivering in a single release...</p>

<h3>Event Triggers</h3>

<p class="first">We missed <code>9.2</code> for wanting to include too big a feature in one go, leading to
too many choices to review and make decisions about, for one, and also to
some non optimal choices that had to be reconsidered. Thanks to <a href="../06/24-back-from-pgcon.html">PGCON</a> in
Ottawa earlier this year, I could meet in person with <strong>Robert Haas</strong> and we've
been able to decide how to attack that big patch I had. The first step has
been to <em>commit</em> in the PostgreSQL tree only infrastructure parts, on which we
will be able to build the feature itself.</p>

<h4>Infrastructure</h4>

<p class="first">What we already have today is the ability to run a <em>user defined function</em> when
some event occurs, and an event can only be a <code>ddl_command_start</code> as of now.
Also the <em>trigger</em> itself must be written in <code>PL/pgSQL</code> or <code>C</code>, as the support
for the other languages was not included in the patch.</p>

<p>That leaves some work to be done in the next months, right?</p>


<h4>PL support</h4>

<p class="first">The <em>user defined function</em> will get some information from <em>magic variables</em>
such as <code>TG_EVENT</code>. That allows easier integration of future
information we want to add, without disrupting those existing <em>triggers</em> that
you wrote (no <code>API</code> change), at the cost of having to write a specific
integration per <em>procedural language</em>.</p>

<p>So one of the first things to do now is to take the support for the other
<code>PL</code>s that I had in my proposal and make a new patch with only that in there.</p>


<h4>Fill-in more information</h4>

<p class="first">Then again, this first infrastructure part was all about being actually able
to run a user function and left behind most of the information I would like
the function to have. The information already there is the <code>command tag</code>, the
<code>event name</code> and the <code>parsetree</code> that's only usable if you're writing your
trigger in <code>C</code>, which we expect some users to be doing.</p>

<p>To supplement that, we're talking about the <code>Object ID</code> that has been the
target of the <em>event</em>, the <code>schema</code> it lives in when applicable, the <code>Object
Name</code>, the <code>Operation</code> that's running (<code>CREATE</code>, <code>ALTER</code>, <code>DROP</code>), the <code>Object Kind</code>
being the target of said operation (e.g. <code>TABLE</code> or <code>FUNCTION</code>), and the <code>command
string</code>.</p>


<h4>Publishing the Command String</h4>

<p class="first">Publishing the <em>Command String</em> here is not an easy task, because we have to
rebuild a normalized version of it. Or maybe we can go with passing explicit
context in which the command is running, such as the <code>search_path</code>.</p>

<p>Even with an explicit context that would be easy enough to <code>SET</code> back again
(in a remote server where you would be replicating the <code>DDL</code>, say), it would
be better to normalize the <em>command string</em> so as to remove extra spaces and
make it easier to parse and process from a <em>user defined function</em>.</p>

<p>That part looks like where most of the work is going to happen in the next
<em>commit fests</em>.</p>


<h4>Events</h4>

<p class="first">The other big thing I want to be working on with respect to this feature is
the <em>event</em> support, which is basically <em>hard coded</em> to be <code>ddl_command_start</code> in
the current state of the <code>9.3</code> code.</p>

<p>We certainly will want to be able to run <em>user defined functions</em> not only at
the very beginning of a <em>DDL command</em>, but also just before it finishes so
that the newly created object already exists, for example.</p>

<p>We might also be interested in supporting triggers on more than <code>DDL</code>, but
I doubt we will see that happening in <code>9.3</code>, as some people in the community
would go crazy about complex use cases. Time is limited, and I think this is
better kept open for the next release, as the way our beloved PostgreSQL
works is by delivering reliable features: quality first.</p>


<h4>Use cases</h4>

<p class="first">I'm always happy to hear about use cases for the features I'm working on,
and this one has the potential to be covering a non trivial amount of them.
I can already think of <em>trigger based replication systems</em> and some integrated
<em>extension network facilities</em>. With your help we can give those the place
they should have: early days use cases in a great collection.</p>



<h3>Extensions</h3>

<center>
<p><img src="../../../images/extensions-cords.jpg" alt=""></p>
</center>

<p>So yes, the first use case of <em>event triggers</em> for me relates to <em>extensions</em>.
Surprise! There's still some more I want to do with <em>extensions</em>, so much that
I could consider their implementation in <code>9.1</code> just an enabler. In <code>9.1</code> the
game has been to offer the best support we could design for existing <code>contrib</code>
modules, with a very strong angle toward clean support for <em>dump</em> and <em>restore</em>.</p>

<p>The typical contrib module exports in SQL a list of C coded functions,
sometimes supporting a new datatype, sometimes a set of administration
functions. It's quite rare that contrib modules are handling <em>user data</em>
embedded in their SQL definition, and when it happens it's mostly
<em>configuration</em> kind of data, such as with <a href="TODO:%20add%20the%20link">PostGIS</a>.</p>

<p>Now we want to fully support <em>extensions</em> that are maintaining their own <em>user
data</em>, or even those that are all about them. The main difficulty here is
that our current design of <em>dump</em> and <em>restore</em> support is following a model
where installing the same extension in a new database is all covered by
<code>create extension foo;</code>. This is a limited model of the reality, that we need
to expand.</p>

<p>The first manifestation of those problems is in the <code>SEQUENCE</code> support in
extensions, and that impacts one of my favorite extensions: <a href="http://wiki.postgresql.org/wiki/Skytools">PGQ</a>.</p>


<h3>PostgreSQL releases</h3>

<p><a href="http://www.postgresql.org/">PostgreSQL</a> just shipped an awesome release with <code>9.2</code>, where we get
tremendous performance optimisations and truly innovative features, such as
<code>RANGE TYPE</code>. How could you not consider PostgreSQL as a part of your application
stack, a place where to develop and host your features?</p>

<p>While users are enjoying the newer release, contributors are already
preparing the next one, hard at work again!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sat, 15 Sep 2012 18:43:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</guid>
</item>
<item>
  <title>El-Get 4.1 is out</title>
  <link>http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</link>
  <description><![CDATA[<p>Please welcome the new stable version of <a href="https://github.com/dimitri/el-get#readme">El-Get</a>, the much awaited <code>version
4.1</code> has now been branched for your pleasure. It's packed with lots of
features to make your life easy, comes with an <em>Info</em> documentation book and
even has a <em>logo</em>. That's no joke, I found one, at least:</p>

<center>
<p><img src="../../../images/el-get.big.png" alt=""></p>
</center>

<h3>Why El-Get is relevant</h3>

<p class="first">Emacs 24.1 is the first release that includes <code>package.el</code>, and it even allows
the user to set up several sources from which to fetch packages. Those sources,
such as <a href="http://marmalade-repo.org/">Marmalade</a>, are hosting lots of third party code for Emacs.
<code>package.el</code> makes it easy to (partly) <em>install</em> that software.</p>

<p>This is a very fine way of getting extra features in your Emacs
installation, and one that is supported out of the box. For a <em>package</em> to be
listed, its sources need to be prepared, and you need to rely on the central
website you now depend on to be up and running and accessible.</p>

<p>El-Get is all about allowing you to easily cope with the still vast majority
of Emacs Lisp extensions you can find out there, that is non packaged code
that is only available on some more or less mainstream <em>distribution method</em>,
ranging from <a href="http://emacswiki.org/">EmacsWiki</a> to <a href="http://github.com/">github</a> including <em>bare HTTP</em> personal hosting.</p>

<p>With El-Get, you fetch the package where it's located. There's no need for a
central server to host packaged and released software, and it's easy to
share your findings with friends, or even to <em>publish</em> any <code>elisp</code> code you
write.</p>

<p>El-Get will also take care of final steps that <code>package.el</code> did choose not to
support, such as including <em>Info</em> material in your info browser (remember <code>C-h
i runs the command info</code>?), running <code>./configure &amp;&amp; make</code> for you, <em>byte
compiling</em> the sources you just retrieved, adding the necessary <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/Autoload.html">autoload</a>
support, etc.</p>

<p>And of course one of the <em>methods</em> supported by El-Get is <code>ELPA</code>, known as <em>Emacs
Lisp Package Archive</em> and implemented by <code>package.el</code>.</p>

<p>So definitely, you typically want both <code>ELPA</code> and <a href="http://github.com/dimitri/el-get">El-Get</a>.</p>


<h3>El-Get 4.1 Changelog Summary</h3>

<p class="first">The new El-Get release is packed with features. It really is. I will only
list some of them now:</p>

<ul>
<li>Plenty of new recipes: we now have <code>590</code> of them managed in the El-Get source
repository itself, and El-Get will download the current <a href="http://emacswiki.org/">EmacsWiki</a> list of
<code>emacs lisp</code> files at install time too.</li>

<li>The default installation and usage has been simplified a lot.</li>

<li>More options are provided to set up El-Get packages, see
<code>el-get-user-package-directory</code> for example.</li>

<li>As part of the simplification, <code>el-get-sources</code> has been revisited and now
serves only one goal.</li>

<li>We dropped <code>(el-get 'wait)</code> which was a misconception and had been broken
for a long time in the development version of El-Get.</li>

<li>We made improvements in the error handling and in dealing with some
corner cases that still happen often enough for users to report them.
Please continue reporting them!</li>

<li>More caching is done, with a better dependency tracking and status
management.</li>

<li>Enhanced notification support, from <code>DBUS</code> to <code>growl</code>.</li>

<li>Support for <em>checksums</em> with a lot of <em>source types.</em></li>

<li>Completing our <code>git</code> support, <em>shallow</em> clones and <em>submodules</em> are there.</li>

<li>Better support for <code>github</code> including the <code>zip</code> and <code>tar</code> releases.</li>

<li>Ability to reload a package when it's been <em>updated</em>.</li>

<li><em>Moar</em> features</li>
</ul>

<p>And most importantly, El-Get documentation is now almost complete and comes
in the nice <em>Info</em> format I know you've been expecting for so long!</p>


<h3>Using El-Get</h3>

<p class="first">Here's a quick summary of what using El-Get is like, for a new user in 4.1.
If you're already using El-Get see the section about upgrading. To install
El-Get you need to paste those lines to your <code>*scratch*</code> buffer then hit <code>C-j</code>
after the last closing parenthesis:</p>

<pre class="src">
(url-retrieve
 <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>
 (lambda (s)
   (goto-char (point-max))
   (eval-print-last-sexp)))
</pre>

<p>Then you can try <code>M-x el-get-list-packages</code> and browse through more than <code>2000</code>
available packages. Mark the ones you want to install with <code>i</code> then type <code>x</code> to
see El-Get fetch and install all those packages you just selected. Here's a
summary of what's available to you in the <code>M-x el-get-list-packages</code> buffer:</p>

<pre class="src">
Major Mode Bindings:
SPC             el-get-package-menu-mark-unmark
?               el-get-package-menu-describe
d               el-get-package-menu-mark-delete
g               el-get-package-menu-revert
h               el-get-package-menu-quick-help
i               el-get-package-menu-mark-install
u               el-get-package-menu-mark-update
x               el-get-package-menu-execute
</pre>

<p>Once a package is <em>installed</em>, El-Get will <em>initialize</em> it for you, and it will
also do that step at every Emacs startup from there on, provided that you
added some lines to your <code>~/.emacs</code> initialization file, that look a lot like
the previous <code>*scratch*</code> code you did paste:</p>

<pre class="src">
;;
;; Here's a typical El-Get integration for your .emacs file:
;;
(add-to-list 'load-path <span style="color: #bc8f8f;">"~/.emacs.d/el-get/el-get"</span>)
(setq el-get-user-package-directory <span style="color: #bc8f8f;">"~/.emacs.d/packages.d/"</span>)

(unless (require 'el-get nil t)
  (with-current-buffer
      (url-retrieve-synchronously
       <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>)
    (goto-char (point-max))
    (eval-print-last-sexp)))

(el-get 'sync)
</pre>

<p>Then you can add files named like <code>init-&lt;package&gt;.el</code> in the
<code>el-get-user-package-directory</code> directory; those files will get loaded when
El-Get <em>initializes</em> <code>&lt;package&gt;</code>.</p>

<p>You can also use <code>M-x el-get-install</code> if you want to bypass the full screen
package listing, you will get completion on the package name.</p>


<h3>Community and development</h3>

<p class="first">The El-Get community grew to be a really cool place to be participating in these
days, with core and <em>recipe</em> contributions from more than 130 different people
already, and with 526 stars on <code>github</code> and <code>184</code> forks. I almost can't believe
it!</p>

<pre class="src">
git --no-pager shortlog -n -s | wc -l
     137
git --no-pager shortlog -n -s | head -10
   734  Dimitri Fontaine
   336  Ryan C. Thompson
   114  Julien Danjou
   110  Dave Abrahams
    73  Ryan Thompson
    72  S&#233;bastien Gross
    42  Takafumi Arakaki
    27  Alex Ott
    25  Yakkala Yagnesh Raghava
    21  R&#252;diger Sonderfeld
</pre>

<p>Now that we have something that looks like a <em>core team</em> forming up, I'm
thinking about scheduling much more aggressive stable releases. 4.1 has been
very long in the making, I hope to now have a rapid release cycle leading us
to <code>4.2</code> in quite a short time. As that's not an individual effort by any
means, though, only history will tell.</p>


<h3>The roadmap</h3>

<p class="first">We have lots of ideas and some rough edges to address, so 4.1 is only a stop
in the release history of El-Get. Next ideas include better error management
in face of rare corner cases and in face of external events, like when you
did <code>rm -rf</code> a directory holding an El-Get managed extension: we should mark
it <em>removed</em> and clean up the <code>autoloads</code> that came from it.</p>


<h3>Upgrading to 4.1</h3>

<p class="first">This item has received some treatment in the documentation. The basic idea
is that <code>el-get-sources</code> is no longer what it used to be, it's now only an
alternative source location for <em>recipes</em>, like it should always have been.
Note that you can still <em>override</em> in there some properties that you want
<em>merged</em> with an official <em>recipe</em>.</p>

<p>The new thing about <code>el-get-sources</code> is that it will no longer be the
authoritative list of packages that El-Get manages. That list is now either
given explicitly when calling the <code>el-get</code> function in your <code>.emacs</code> setup, or
derived from the packages that are known <em>installed</em> on your system (like e.g.
<code>debian</code> is doing).</p>

<p>Also, given that it took us so much time to brew <code>4.1</code> a lot of packages have
changed either their hosting location or even switched their <code>SCM</code>. In such
cases an automatic update of the recipe will no longer be possible, you
might need to <code>el-get-remove</code> then <code>el-get-install</code> packages to get them back.</p>


<h3>Conclusion</h3>

<p class="first">El-Get <code>4.1</code> is now ready for public consumption, so don't be shy: a
lot of us have been running the development branch for a long time now. I'm running
<code>4.0.7.6901194</code> while writing this post, <code>4.0</code> being the development version of
what is now released as <code>4.1</code>.</p>

<p>Many thanks to all who contributed to El-Get and to all our users, I'm very
proud that together we worked out a very nice and complete tool!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 28 Aug 2012 11:43:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</guid>
</item>
<item>
  <title>Fast and stupid?</title>
  <link>http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</link>
  <description><![CDATA[<p>I stumbled onto an interesting article about performance when using python,
called <a href="http://jiaaro.com/python-performance-the-easyish-way">Python performance the easy(ish) way</a>, where the author tries to get
the best available performance out of the dumbest possible python code,
trying to solve a very simple and stupid problem.</p>

<p>With so many <em>smart</em> qualifiers you can only guess that I did love the
challenge. The idea is to write the simplest code possible and see how
much smarter you need to be when you need performance. Let's have a try!</p>

<h3>local python results</h3>

<p class="first">Here's the code I did use to benchmark the python solution:</p>

<pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange2</span>(arg):
    <span style="color: #b8860b;">x</span> = <span style="color: #b8860b;">i</span> = 0
    <span style="color: #7f007f;">while</span> i &lt; arg:
        <span style="color: #b8860b;">x</span> += i
        <span style="color: #b8860b;">i</span> += 1
    <span style="color: #7f007f;">return</span> x


<span style="color: #7f007f;">import</span> ctypes
<span style="color: #b8860b;">ct_sumrange</span> = ctypes.CDLL(<span style="color: #bc8f8f;">'/Users/dim/dev/CL/jiaroo/sumrange.so'</span>)

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange_ctypes</span>(arg):
    <span style="color: #7f007f;">return</span> ct_sumrange.sumrange(arg)

<span style="color: #7f007f;">if</span> <span style="color: #da70d6;">__name__</span> == <span style="color: #bc8f8f;">"__main__"</span>:
    <span style="color: #7f007f;">import</span> timeit
    <span style="color: #b8860b;">t1</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange(10**10)'</span>)
    <span style="color: #b8860b;">t2</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange2(10**10)'</span>)
    <span style="color: #b8860b;">ct</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange_ctypes(10**10)'</span>)

    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'timing python sumrange(10**10)'</span>
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'xrange: %5fs'</span> % t1.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'while:  %5fs'</span> % t2.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'ctypes: %5fs'</span> % ct.timeit(1)
</pre>

<p>Oh. And the C code too, sorry about that.</p>

<pre class="src">
<span style="color: #da70d6;">#include</span> <span style="color: #bc8f8f;">&lt;stdio.h&gt;</span>

<span style="color: #228b22;">int</span> <span style="color: #0000ff;">sumrange</span>(<span style="color: #228b22;">int</span> <span style="color: #b8860b;">arg</span>)
{
    <span style="color: #228b22;">int</span> <span style="color: #b8860b;">i</span>, <span style="color: #b8860b;">x</span>;
    x = 0;

    <span style="color: #7f007f;">for</span> (i = 0; i &lt; arg; i++) {
        x = x + i;
    }
    <span style="color: #7f007f;">return</span> x;
}
</pre>

<p>And here's how I did compile it. The author of the inspiring article
insisted on stupid optimisation targets, I did follow him:</p>

<pre class="src">
gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c -O0
</pre>

<p>And here's the result I did get out of it:</p>

<pre class="src">
python jiaroo.py
timing python sumrange(10**10)
<span style="color: #da70d6;">xrange</span>: 927.039917s
<span style="color: #7f007f;">while</span>:  2377.291237s
ctypes: 5.297124s
</pre>

<p>Let's be fair, with <code>-O2</code> we get much better results:</p>

<pre class="src">
timing python sumrange(10**10)
ctypes: 1.065684s
</pre>
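
<p>One thing worth checking when reproducing the <code>ctypes</code> timings: the C function takes
a plain <code>int</code>, and <code>10**10</code> needs more than 32 bits. The following is a minimal
Python 3 sketch (the benchmark above is Python 2), assuming a platform where a C <code>int</code>
is 32 bits wide. The helper name <code>as_c_int</code> is made up for the illustration. It shows
which value such an <code>int</code> ends up holding, so that the iteration count on the C side can
be compared with the one on the python side.</p>

<pre class="src">
import ctypes


def as_c_int(value):
    """Return what a C int holds once ctypes has stored `value` in it."""
    # ctypes integer types do no overflow checking: the high bits that do
    # not fit are silently dropped.
    return ctypes.c_int(value).value


if __name__ == "__main__":
    print("C int width here: %d bits" % (8 * ctypes.sizeof(ctypes.c_int)))
    print("10**9  as a C int: %d" % as_c_int(10**9))
    print("10**10 as a C int: %d" % as_c_int(10**10))
</pre>

<p>With a 32 bit <code>int</code> the last line prints <code>1410065408</code>, about seven times fewer
iterations than <code>10**10</code>. If the full range is wanted, one option would be a wider C
type for the parameter and the loop counter, together with matching <code>argtypes</code> on the
<code>ctypes</code> side.</p>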


<h3>Common Lisp to the rescue</h3>

<p class="first">So how about a try in Common Lisp, you will ask me, right?</p>

<p>Here's the code I did use, you can see three different tries:</p>

<pre class="src">
<span style="color: #b22222;">;;;; </span><span style="color: #b22222;">jiaroo.lisp
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">See http://jiaaro.com/python-performance-the-easyish-way
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">The goal here is to find out if CL needs to resort to C for very simple
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">optimisation tricks like python apparently needs to, unless using pypy
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">(to some extent).
</span>
(<span style="color: #7f007f;">in-package</span> #<span style="color: #da70d6;">:jiaroo</span>)

<span style="color: #b22222;">;;; </span><span style="color: #b22222;">"jiaroo" goes here. Hacks and glory await!
</span>
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-loop</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 1 to MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">loop</span> for i fixnum from 1 to max do (incf sum i)
          finally (return sum))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">pk-sumrange</span> (max)
  (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max)
           (optimize speed))
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and fixnum unsigned-byte) sum))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (setf sum (logand (+ sum i) most-positive-fixnum)))))

(<span style="color: #7f007f;">defmacro</span> <span style="color: #0000ff;">timing</span> (<span style="color: #228b22;">&amp;body</span> forms)
  <span style="color: #bc8f8f;">"return both how much real time was spent in body and its result"</span>
  (<span style="color: #7f007f;">let</span> ((start (gensym))
        (end (gensym))
        (result (gensym)))
    `(<span style="color: #7f007f;">let*</span> ((,start (get-internal-real-time))
            (,result (<span style="color: #7f007f;">progn</span> ,@forms))
            (,end (get-internal-real-time)))
       (values ,result (/ (- ,end ,start) internal-time-units-per-second)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">bench-sumrange</span> (power)
  <span style="color: #bc8f8f;">"print execution time of the three previous functions"</span>
  (<span style="color: #7f007f;">let*</span> ((max (expt 10 power))
         (lp-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-loop max)) s))
         (dt-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-dotimes max)) s))
         (pk-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (pk-sumrange max)) s)))
    (format t <span style="color: #bc8f8f;">"timing common lisp sumrange 10**~d~%"</span> power)
    (format t <span style="color: #bc8f8f;">"loop:       ~2,3fs ~%"</span> lp-time)
    (format t <span style="color: #bc8f8f;">"dotimes:    ~2,3fs ~%"</span> dt-time)
    (format t <span style="color: #bc8f8f;">"pk dotimes: ~2,3fs ~%"</span> pk-time)))
</pre>

<p>And here are the results:</p>

<pre class="src">
CL-USER&gt; (bench-sumrange 10)
timing common lisp sumrange 10**10
loop:       11.213s
dotimes:    7.642s
pk dotimes: 22.185s
NIL
</pre>


<h3>Discussion</h3>

<p class="first">So python is very slow. C is pretty fast. And Common Lisp is just in the
middle. Honestly I expected better performance from my beloved Common Lisp
here, but I didn't try very hard, by using <a href="http://ccl.clozure.com/">Clozure Common Lisp</a> which is not
the quickest Common Lisp implementation around. For this very benchmark, if
you're seeking speed use either <a href="http://sbcl.org/">Steel Bank Common Lisp</a> or <a href="http://www.clisp.org/">CLISP</a> which is
known to have a pretty fast bignums implementation (which you don't need in
64 bits in that game, as long as the sum still fits in a fixnum).</p>

<p>On the other hand, I think that having to go write a C plugin and deal with
how to compile and deploy it in the middle of a python script is something
to avoid. When using Common Lisp you don't need to resort to that to get the
<em>runtime</em> down from the python <em>xrange</em> implementation at <code>927.039917s</code> to
the <em>dotimes</em> implementation taking <code>7.642s</code>. That's about <code>121</code> times faster.</p>
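
<p>The <code>121</code> figure is simply the ratio of the two timings. As a quick check, here is a
small Python 3 sketch using only the numbers printed above. The <code>speedup</code> helper and
the dictionary layout are assumptions made for the example.</p>

<pre class="src">
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is compared to the baseline."""
    return baseline_seconds / candidate_seconds


# Timings copied from the benchmark outputs above, in seconds.
PYTHON_XRANGE = 927.039917
COMMON_LISP = {
    "loop": 11.213,
    "dotimes": 7.642,
    "pk dotimes": 22.185,
}

if __name__ == "__main__":
    for name, seconds in COMMON_LISP.items():
        print("%-10s %6.1f times faster than xrange"
              % (name, speedup(PYTHON_XRANGE, seconds)))
</pre>

<p>The <em>dotimes</em> line comes out at about <code>121.3</code>.</p>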

<p>So while <code>C</code> is even better, and while I would like a Common Lisp guru to show
me how to get a better speed here, I still very much appreciate the solution
here.</p>

<p>Let's see the winning source code in <em>python</em> and <em>common lisp</em> to compare the
programmer side of things: how hard was it really to get <code>121</code> times faster?</p>

<pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))
</pre>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))
</pre>

<p>That's about it. Yes we can see some <em>manual</em> optimisation directives here,
which bring some <em>extra complexity</em>. Not to the same level as bringing a
compiled artifact that you need to build and deploy, though. Remember that
you will need to know the full path where to find the <code>sumrange.so</code> file on
the production system, in the optimised <em>python</em> case, so that's what we are
comparing against.</p>

<p>Here's what happens without the optimisation, and with a smaller target:</p>

<pre class="src">
CL-USER&gt; (time (jiaroo:sumrange-dotimes (expt 10 9)))
(JIAROO:SUMRANGE-DOTIMES (EXPT 10 9))
took 722,592 microseconds (0.722592 seconds) to run.
During that period, and with 2 available CPU cores,
     714,709 microseconds (0.714709 seconds) were spent in user mode
       1,183 microseconds (0.001183 seconds) were spent in system mode
499999999500000000

CL-USER&gt; (time (<span style="color: #7f007f;">let</span> ((sum 0)) (<span style="color: #7f007f;">dotimes</span> (i (expt 10 9) sum) (incf sum i))))
(<span style="color: #7f007f;">LET</span> ((SUM 0)) (<span style="color: #7f007f;">DOTIMES</span> (I (EXPT 10 9) SUM) (INCF SUM I)))
took 2,174,767 microseconds (2.174767 seconds) to run.
During that period, and with 2 available CPU cores,
     2,156,549 microseconds (2.156549 seconds) were spent in user mode
        10,225 microseconds (0.010225 seconds) were spent in system mode
499999999500000000
</pre>
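
<p>The result printed by both runs can be cross-checked without looping at all, since the
sum of the integers from <code>0</code> to <code>n - 1</code> is <code>n * (n - 1) / 2</code>. A Python 3 sketch of
that check follows. The function name <code>sumrange_closed</code> is an assumption for the
example.</p>

<pre class="src">
def sumrange_closed(n):
    """Sum of the integers 0, 1, ..., n - 1, computed without looping."""
    return n * (n - 1) // 2


# Cross-check the formula against the naive sum on a small input.
assert sumrange_closed(1000) == sum(range(1000))

if __name__ == "__main__":
    for power in (9, 10):
        total = sumrange_closed(10 ** power)
        # bit_length() tells how many bits the total needs.
        print("10**%d: total %d, %d bits" % (power, total, total.bit_length()))
</pre>

<p>For <code>10**9</code> this gives <code>499999999500000000</code>, the value shown above, which needs 59
bits. For <code>10**10</code> the total needs 66 bits, so it no longer fits in a 64 bit machine
integer.</p>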

<p>We get a <code>3</code> times speed-up from those 2 lines of lisp optimisation
directives, which is pretty good. And it gets worse with a bigger target, as I
didn't have the patience to actually wait until the non optimised <code>10^10</code> run
finished, I killed it.</p>


<h3>Conclusion</h3>

<p class="first">That's a case here where I don't know how to reach <code>C</code> level of performance
with Common Lisp, which could just be because I don't know yet how to do it.</p>

<p>Still, getting a <code>121</code> times speedup when compared to the pure <em>python</em> version
of the code is pretty good and encourages me to continue diving into Common
Lisp.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 22 Aug 2012 16:05:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</guid>
</item>
<item>
  <title>Autumn 2012 Conferences</title>
  <link>http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</link>
  <description><![CDATA[<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> community hosts a number of <a href="../../../conferences.html">conferences</a> all year round, and
the next ones I'm lucky enough to get to are approaching fast now. First,
next month in September, we have <a href="http://postgresopen.org/2012/home/">Postgres Open</a> in Chicago, where my talk
about <a href="http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html">Large Scale Migration from MySQL to PostgreSQL</a> has been selected!</p>

<center>
<p><img src="../../../images/autumn-leave-480.jpg" alt=""></p>
</center>

<p>This talk shares insights about the why and the how of that migration,
what problems couldn't be solved without moving away and how the solution
now looks. The tools used for migrating the data away, the methods and the new
architecture are detailed. And the new home, in the cloud!</p>

<p>Not much later, the European PostgreSQL community is giving
us a very nice occasion to get to Prague with
<a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> (October 23-26). If you've been meaning to
meet with the community, if you've been meaning to visit Prague someday, or
any mix of those two very good reasons, think about booking that conference
already.</p>

<p>The <a href="http://2012.pgconf.eu/callforpapers/">call for papers for pgconf.eu</a> has been extended to August 7th, 2012.
Consider sharing your insights too!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 02 Aug 2012 01:08:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</guid>
</item>
<item>
  <title>Solving Every Sudoku Puzzle</title>
  <link>http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</link>
  <description><![CDATA[<p><a href="http://norvig.com/">Peter Norvig</a> published a while ago a very nice article titled
<a href="http://norvig.com/sudoku.html">Solving Every Sudoku Puzzle</a> wherein he presents a programmatic approach to
solving that puzzle game.</p>

<center>
<p><a class="image-link" href="http://en.wikipedia.org/wiki/Sudoku">
<img src="../../../images/sudoku.png"></a></p>
</center>

<p>The article is very well written and makes it easy to think that coming up
with the code for such a solver is a very easy task: you apply some basic
search principles and there you are. Which is partly true, in fact.
Also, he uses <code>python</code>, and that means that a lot of trivial programming
activities are not a concern anymore, such as memory management.</p>

<p>As I've been teaching myself <a href="http://www.cliki.net/Common%20Lisp">Common Lisp</a> for some weeks now, I thought I would
like to read a lisp version of his code, and the article even has a section
titled <em>Translations</em>. Unfortunately, no lisp version is available there. One
might argue that <a href="http://clojure.org/">Clojure</a> is a decent enough lisp, but my current quest is
all about <em>Common Lisp</em> really. So I had to write one myself.</p>

<pre class="src">
CL-USER&gt; (sudoku:print-puzzle
          (sudoku:solve-grid
<span style="color: #bc8f8f;">"530070000600195000098000060800060003400803001700020006060000280000419005000080079"</span>))
5 3 4 | 6 7 8 | 9 1 2
6 7 2 | 1 9 5 | 3 4 8
1 9 8 | 3 4 2 | 5 6 7
------+-------+------
8 5 9 | 7 6 1 | 4 2 3
4 2 6 | 8 5 3 | 7 9 1
7 1 3 | 9 2 4 | 8 5 6
------+-------+------
9 6 1 | 5 3 7 | 2 8 4
2 8 7 | 4 1 9 | 6 3 5
3 4 5 | 2 8 6 | 1 7 9
took 1,974 microseconds (0.001974 seconds) to run.
During that period, and with 2 available CPU cores,
     1,894 microseconds (0.001894 seconds) were spent in user mode
        88 microseconds (0.000088 seconds) were spent in system mode
 174,320 bytes of memory allocated.
#&lt;SUDOKU::PUZZLE #x3020023BB9FD&gt;
</pre>

<h3>Comments on the python version</h3>

<p class="first">Norvig's article is very well written, I think. By that I mean that by
reading it you're confident that you've understood the problem and how the
solution is articulated, so you almost think you don't need to really try to
understand the code, it's just an illustration of the text.</p>

<p>Well, not so much. When you want to port the exact same algorithm you have
to understand exactly what the code is doing so that you're not implementing
something else. All the more when, as I did, you want to use some other data
structure.</p>

<p>My goal was not to rewrite the code as-is, but to try and come up with
<em>idiomatic</em> lisp code implementing Norvig's solution. So rather than using
<em>strings</em> and <em>dictionaries</em> (in lisp, they still call them a <a href="http://www.lispworks.com/documentation/lw50/CLHS/Body/f_mk_has.htm">hash table</a>) I've
been using more natural data structures.</p>

<p>The <em>python</em> code is really not easy to follow, full of functional programming
veteran tricks. I mean avoiding <em>exceptions</em> and simply returning <code>False</code>
whenever there's a problem, and using functions such as <code>all</code> and <code>some</code> to
manage that. It certainly works, but it doesn't make the code any easier to
read.</p>

<p>To summarize, that code looks like it's been written by someone smart who
didn't want to spend more than a couple of hours on it, and took every
known trustworthy shortcut he could to achieve that goal. Quality and
readability certainly weren't the key motive. I was quite disappointed after
reading such a good article.</p>


<h3>Comments on the common lisp version</h3>

<p class="first">Keep in mind that I'm just a <em>Common Lisp</em> newbie. I've been given some good
pieces of advice by knowledgeable people though, so with some luck my
implementation is <em>lispy</em> enough.</p>

<p>So we start by defining some data structures and low-level functions to
build up the more complex ones, so that the code is easier to read and debug. The
<em>sudoku</em> puzzle is then a grid of digits and a grid of possible values in
places where the digits are still unknown.</p>

<p>The way to represent that 9x9 grid is by using <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_mk_ar.htm">make-array</a>:</p>

<pre class="src">
(make-array '(9 9)
            <span style="color: #da70d6;">:element-type</span> '(integer 0 9)
            <span style="color: #da70d6;">:initial-element</span> 0)
</pre>

<p>Then the possible values. I thought about using a <code>bit-vector</code> (and actually I
did implement it that way), then I was told that the <em>Common Lisp</em> way to
approach that is to use the <a href="http://psg.com/~dlamkins/sl/chapter18.html">two's-complement integer representation</a>, as we have
plenty of functions to operate on numbers that way. I didn't believe that
would make the code simpler, but in fact it really did, see:</p>

<pre class="src">
CL-USER&gt; #b111111111
511
CL-USER&gt; (logcount #b111111111)
9
CL-USER&gt; (logcount 511)
9
CL-USER&gt; (logbitp 3 #b100100100)
NIL
CL-USER&gt; (logbitp 2 #b100100100)
T
CL-USER&gt; (format nil <span style="color: #bc8f8f;">"~2r"</span> (logxor #b111111111 (ash 1 4)))
<span style="color: #bc8f8f;">"111101111"</span>
CL-USER&gt; (logbitp 4 (logxor #b111111111 (ash 1 4)))
NIL
</pre>

<p>With that in mind, we can write the following code:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">count-remaining-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"How many possible values are left in there?"</span>
  <span style="color: #b22222;">;; </span><span style="color: #b22222;">we could raise an empty-values condition if we get 0...
</span>  (logcount possible-values))

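(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">first-set-value</span> (possible-values)
  <span style="color: #bc8f8f;">"Return the value of the highest set bit in POSSIBLE-VALUES, which is the
   only remaining value when a single bit is set."</span>
  <span style="color: #b22222;">;; </span><span style="color: #b22222;">integer-length is exact, unlike (floor (log n 2)) which goes through floats
</span>  (integer-length possible-values))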

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">only-possible-value-is?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when the only value found in
   POSSIBLE-VALUES is VALUE"</span>
  (and (logbitp (- value 1) possible-values)
       (= 1 (logcount possible-values))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">list-all-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"Return a list of all possible values to explore"</span>
  (<span style="color: #7f007f;">loop</span> for i from 1 to 9
     when (logbitp (- i 1) possible-values)
     collect i))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">value-is-set?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when given VALUE is possible
   in POSSIBLE-VALUES"</span>
  (logbitp (- value 1) possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">unset-possible-value</span> (possible-values value)
  <span style="color: #bc8f8f;">"return an integer representing POSSIBLE-VALUES with VALUE unset"</span>
  (logxor possible-values (ash 1 (- value 1))))
</pre>
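
<p>For readers coming from Norvig's <em>python</em> version, here is a minimal sketch of the same bitfield idea using plain python integers. The function names are made up for illustration and belong to neither code base. Note one deliberate difference: the sketch clears a bit with AND-NOT rather than XOR, so unsetting a value that is already unset leaves the set unchanged.</p>

```python
# Candidate digits 1..9 stored as bits 0..8 of a plain int.
# All names here are illustrative assumptions, not part of the original code.
ALL_VALUES = 0b111111111  # 511: every digit is still possible


def count_remaining(possible):
    """Number of candidate digits left, the equivalent of LOGCOUNT."""
    return bin(possible).count("1")


def value_is_set(possible, value):
    """True when VALUE is still a candidate, the equivalent of LOGBITP."""
    return bool((possible >> (value - 1)) & 1)


def unset_value(possible, value):
    """Clear VALUE's bit; AND-NOT is a no-op when the bit is already clear."""
    return possible & ~(1 << (value - 1))


def highest_set_value(possible):
    """Largest candidate digit, computed with exact integer arithmetic."""
    return possible.bit_length()


def list_values(possible):
    """All candidate digits in increasing order."""
    return [v for v in range(1, 10) if value_is_set(possible, v)]


mask = unset_value(ALL_VALUES, 5)
print(format(mask, "b"))         # 111101111
print(count_remaining(mask))     # 8
print(list_values(0b100100100))  # [3, 6, 9]
```

<p>The three printed results mirror the REPL session shown earlier: removing 5 from the full set leaves <code>111101111</code>, and <code>#b100100100</code> stands for the values 3, 6 and 9.</p>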

<p>You can see here that I was also under the influence of a recent reading
about <a href="http://gar1t.com/blog/2012/06/10/solving-embarrassingly-obvious-problems-in-erlang/">making it obvious</a>, or so called <a href="http://dieswaytoofast.blogspot.fr/2012/07/erlang-why-so-many-seemingly-identical.html">intentional programming</a>, following
what <a href="http://armstrongonsoftware.blogspot.fr/">Joe Armstrong</a> has to say about it:</p>

<blockquote>
<p class="quoted"><em>Intentional programming is a name I give to a style of programming where
the reader of a program can easily see what the programmer intended by
their code. The intention of the code should be obvious from the names
of the functions involved and not be inferred by analysing the structure
of the code. (Reading the code should) precisely expresses the intention
of the programmer—here no guesswork or program analysis is involved, we
clearly read what was intended.</em></p>
</blockquote>

<p>So there we go with function names such as <code>count-remaining-possible-values</code>,
that will help when reading some more complex code, as in the following, the
meat of the solution:</p>

<pre class="src">
(<span style="color: #7f007f;">defmethod</span> <span style="color: #0000ff;">eliminate</span> ((puzzle puzzle) row col value)
  <span style="color: #bc8f8f;">"Eliminate given VALUE from possible values in cell ROWxCOL of PUZZLE, and
   propagate when needed"</span>
  (<span style="color: #7f007f;">with-slots</span> (grid values) puzzle
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">if already unset, work is already done
</span>    (<span style="color: #7f007f;">when</span> (value-is-set? (aref values row col) value)
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate the value from the set of possible values
</span>      (<span style="color: #7f007f;">let*</span> ((possible-values
              (unset-possible-value (aref values row col) value)))
        (setf (aref values row col) possible-values)

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now if we're left with a single possible value
</span>        (<span style="color: #7f007f;">when</span> (= 1 (count-remaining-possible-values possible-values))
          (<span style="color: #7f007f;">let</span> ((found-value (first-set-value possible-values)))
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">update the main grid
</span>            (setf (aref grid row col) found-value)

            <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate that value we just found in all peers
</span>            (eliminate-value-in-peers puzzle row col found-value)))

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now check if any unit has a single possible place for that value
</span>        (<span style="color: #7f007f;">loop</span>
           for (r . c)
           in (list-places-with-single-unit-solution puzzle row col value)
           do (assign puzzle r c value))))))
</pre>
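
<p>To make the propagation step easier to picture, here is a rough python sketch of the first half of that method only: removing a value from a cell and, when a single candidate remains, removing it from all peers. It is not a port of the lisp code. The names <code>peers_of</code>, <code>eliminate</code> and <code>assign</code> and the list-of-lists layout are assumptions made for the example, and the "single possible place in a unit" rule is left out.</p>

```python
# Sketch of peer propagation on bitmask candidate sets (illustrative only).
def peers_of(row, col):
    """Cells sharing a row, a column or a 3x3 box with (row, col)."""
    box_r, box_c = 3 * (row // 3), 3 * (col // 3)
    cells = {(row, c) for c in range(9)} | {(r, col) for r in range(9)}
    cells |= {(box_r + r, box_c + c) for r in range(3) for c in range(3)}
    cells.discard((row, col))
    return cells


def eliminate(values, row, col, value):
    """Remove VALUE from a cell's candidates; return False on contradiction."""
    bit = 1 << (value - 1)
    if not (values[row][col] & bit):
        return True  # already eliminated, work is already done
    values[row][col] &= ~bit
    remaining = values[row][col]
    if remaining == 0:
        return False  # no candidate left for this cell
    if (remaining & (remaining - 1)) == 0:  # exactly one bit is set
        found = remaining.bit_length()
        # the cell is decided: its value cannot appear in any peer
        return all(eliminate(values, r, c, found) for r, c in peers_of(row, col))
    return True


def assign(values, row, col, value):
    """Keep only VALUE in the cell by eliminating every other candidate."""
    current = values[row][col]
    others = [v for v in range(1, 10) if v != value and (current >> (v - 1)) & 1]
    return all(eliminate(values, row, col, v) for v in others)


values = [[0b111111111] * 9 for _ in range(9)]
assign(values, 0, 0, 5)
print(values[0][0] == (1 << 4))    # True: only 5 is left in the cell
print((values[0][8] >> 4) & 1)     # 0: 5 is no longer possible in the same row
print((values[4][4] >> 4) & 1)     # 1: an unrelated cell is untouched
```

<p>Returning <code>False</code> on contradiction follows the convention of the python original; raising a condition instead would be closer to the comment in <code>count-remaining-possible-values</code>.</p>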

<p>So that lisp code is quite verbose, and at 389 lines it almost doubles the 201
lines Norvig had. When clarity is part of the goal, that's hard to avoid. I
hope I made a good case that this is not due to lisp being overly verbose by
itself.</p>


<h3>Comments on the development environment</h3>

<p class="first">Or why I even considered <em>Common Lisp</em> as an interesting language for that
kind of exercise, and some more. <em>I'll have to tell about re-sharding data
live with 16 threads and 256 databases, all in CL, someday</em>.</p>

<p>So I've been doing some <em>Emacs Lisp</em> development for a while now, and the part
that makes that so much fun is the instant reward. You write some code in
your editor, type a key chord (usually, that's <code>C-M-x runs the command
eval-defun</code>) and your code is loaded up, ready to be tested. In <em>Emacs Lisp</em>
the test can be simply using your editor and watching the new behavior
taking place, or playing in the <code>M-x ielm</code> console. When the code is not
ready, it crashes, and you're left in the interactive debugger, where you
can use <code>C-x C-e runs the command eval-last-sexp</code> to evaluate any expression
in your source and see its value in the current <em>debug frame</em>.</p>

<p>That way of working is a huge productivity boost, one that I've been missing
a lot when getting back to writing C code for PostgreSQL. I can't <code>C-M-x</code> the
current function and go write some <code>SQL</code> to test it right away, I have to
<em>compile</em> the whole source tree, then <em>install</em> the new binaries, then <em>restart</em>
the test server and then open up a <em>psql</em> console to interact with the new
code. Of course I could just <code>make check</code> and watch the results, but then if I
attach a <em>debugger</em> it complains that the code on-disk is more recent than the
code in the <em>core dump</em>.</p>

<p>What if you want <em>Emacs Lisp</em> integrated facilities and something made for
general programming rather than suited to building a text editor? Don't get
me wrong, you can probably find more production ready code in <em>elisp</em> than in
many other languages, just because Emacs has been there for about 35 years.
Editor targeted production code, though.</p>

<p>This integrated development cycle is all the same when you're using <em>Common
Lisp</em>. The awesome <a href="http://common-lisp.net/project/slime/">Superior Lisp Interaction Mode for Emacs</a> is providing
exactly that experience. Just run <code>M-x slime</code> and then as you define your code
you can <code>C-M-x</code> the function at point, see the compilation errors and warnings
if any in the associated <em>REPL</em>, and just try your code. I tend to mostly play
in the command line, but it's possible to just use <code>C-x C-e</code> while typing too.</p>


<h3>Performance</h3>

<p class="first">Of course we do care! After all the original article came with a quite
detailed performance analysis with graphs and all. I won't be reproducing
that, sorry. I'll just show you what penalty you get for using an older
language specification, much more dynamic and with more features than
python, and with a great, scratch that, awesome development environment.</p>

<p>Oh wait, that's the other way round, no penalty, it's actually so much
faster!</p>

<h4>Python version perfs</h4>

<p class="first">The results I got on my desktop machine are about twice as fast as in the
original article, I guess newer machines and a newer python have something to
do with that:</p>

<pre class="src">
  dim ~/dev/CL/sudoku python sudoku.dim.py
  All tests pass.
  Solved 50 of 50 easy puzzles (avg 0.01 secs (151 Hz), max 0.01 secs).
  Solved 95 of 95 hard puzzles (avg 0.02 secs (42 Hz), max 0.12 secs).
  Solved 11 of 11 hardest puzzles (avg 0.01 secs (115 Hz), max 0.01 secs).
</pre>

<p>That makes an average of <code>(50*151 + 95*42 + 11*115) / (50+95+11) =
82Hz</code>.</p>

<p>That seems pretty good, let's continue.</p>

<p>As you can see I've cut away the <em>random puzzle</em> part, that's because I was
too lazy to implement that part, which didn't seem all that interesting to
me. If you think that's a problem and need solving, I accept patches.</p>


<h4>Common lisp version perfs</h4>

<p class="first">When using <a href="http://sbcl.org/">SBCL</a> on the same machine, what I got was:</p>

<pre class="src">
  (sudoku:solve-example-grids)
  Solved 50 of 50 easy puzzles (avg .0021 sec (471.7 Hz), max 0.015 secs).
  Solved 95 of 95 hard puzzles (avg .0022 sec (446.0 Hz), max 0.008 secs).
  Solved 11 of 11 hardest puzzles (avg .0018 sec (550.0 Hz), max 0.003 secs).
</pre>

<p>With the same way to compute the average, we now have <code>461.6Hz</code>.</p>

<p>Now, that's between 3 times and more than <strong>10 times faster</strong> than the python
version (taken collection per collection), for a comparable effort, a much
better development environment, and the same all-dynamic, no explicit
compilation approach.</p>
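
<p>The arithmetic behind those figures can be checked with a few lines of python. This is only a sketch that replays the numbers printed by the two runs above, using the same puzzle-weighted average of the per-collection rates; the variable names are made up for the example.</p>

```python
# Puzzle counts and solve rates (Hz) copied from the two runs quoted above.
counts = {"easy": 50, "hard": 95, "hardest": 11}
python_hz = {"easy": 151, "hard": 42, "hardest": 115}
lisp_hz = {"easy": 471.7, "hard": 446.0, "hardest": 550.0}


def weighted_average(rates):
    """Average rate weighted by the number of puzzles in each collection."""
    total = sum(counts.values())
    return sum(counts[name] * rates[name] for name in counts) / total


python_avg = weighted_average(python_hz)
lisp_avg = weighted_average(lisp_hz)
print(f"python: {python_avg:.1f} Hz, lisp: {lisp_avg:.1f} Hz")
print(f"overall speedup: {lisp_avg / python_avg:.1f}x")
for name in counts:
    # per-collection ratio, the "between 3 and more than 10 times" range
    print(f"{name}: {lisp_hz[name] / python_hz[name]:.1f}x")
```

<p>The per-collection ratios come out at roughly 3.1, 10.6 and 4.8, and the overall ratio at about 5.6.</p>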



<h3>Conclusion</h3>

<p class="first">I guess I'm fond of <em>Common Lisp</em>, which I already saw coming (so did you,
right?), and now I have some public article and code to share about why :)</p>

<p>The code is hosted at <a href="https://github.com/dimitri/sudoku">https://github.com/dimitri/sudoku</a> if you're
interested, with the necessary files to reproduce, some docs, etc.</p>

<p>Also, apart from using <em>integers</em> as <em>bitfields</em>, which I did more for being
lispy than for performance, I made very little effort to optimize the
code. It's quite naive in this respect, yet it allows me an average of <code>461.6Hz</code>
rather than <code>82Hz</code>, that's <strong><em>5.6 times faster</em></strong> on average.</p>

<p>So yes, I will continue to invest some precious time in <em>Common Lisp</em> as a
very good interactive scripting language, and maybe more than that.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 10 Jul 2012 20:37:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</guid>
</item>
<item>
  <title>PGDay France 2012</title>
  <link>http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</link>
  <description><![CDATA[<p>The French PostgreSQL Conference, <a href="http://www.pgday.fr/programme">pgday.fr</a>, was yesterday in Lyon. We had a
very good time and a great schedule with a single track packed with 7 talks,
addressing a diverse set of PostgreSQL related topics, from GIS to fuzzy
logic, including replication.</p>

<p>You might have guessed it already, I did talk about replication. Here's the
slide deck I used, it's in French, sorry if you don't grok that language.</p>

<center>
<p><a class="image-link" href="../../../images/confs/PGDay_2012_Replications.pdf">
<img src="../../../images/confs/PGDay_2012_Replications.png"></a></p>
</center>

<p>The conference was very nice and went smoothly. Even if there were “only” 60
of us, I had the pleasure of meeting different users with very different
sets of needs. Very happy to have been there!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 08 Jun 2012 16:17:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</guid>
</item>
<item>
  <title>M-x recompile</title>
  <link>http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</link>
  <description><![CDATA[<p>A friend of mine just asked me for advice to tweak some Emacs features, and
I think that's really typical of using Emacs: rather than getting used to
the way things are shipped to you, when using Emacs, you start wanting to
adapt the tools to the way you want things to be working instead. And you
can call that the awesome!</p>

<p>In this case we're talking about the <code>M-x compile</code> and <code>M-x recompile</code>
functions. My friend bound the former to <code>&lt;f11&gt;</code> and wanted <code>C-u f11</code> to do a
recompile with the exact same command line as the previous <code>compile</code> command.</p>

<p>Well, to be honest, I didn't know about <code>M-x recompile</code> until after I wrote
the following function, made to trigger another <code>compile</code> with the last
command used if using <code>C-u</code>.</p>

<pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-last-command</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-command-history</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">cyb-compile</span> (arg)
  <span style="color: #bc8f8f;">"Compile with given command, optionally recompile with last command"</span>
  (interactive <span style="color: #bc8f8f;">"P"</span>)
  (<span style="color: #7f007f;">if</span> arg
      (<span style="color: #7f007f;">progn</span>
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">arg given: compile with last command
</span>        (<span style="color: #7f007f;">unless</span> cyb-compile-last-command
          (<span style="color: #ff0000; font-weight: bold;">error</span> <span style="color: #bc8f8f;">"Can't recompile yet, no known last command"</span>))
        (compile cyb-compile-last-command))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">else branch, no arg given, ask for a command
</span>    (<span style="color: #7f007f;">let</span> ((command
           (read-string
            <span style="color: #bc8f8f;">"Compile with command: "</span>
            <span style="color: #bc8f8f;">"make -k"</span> 'cyb-compile-command-history <span style="color: #bc8f8f;">"make -k"</span>)))
      (setq cyb-compile-last-command command)
      (compile command))))

(global-set-key (kbd <span style="color: #bc8f8f;">"&lt;f11&gt;"</span>) 'cyb-compile)
</pre>

<p>With that little <em>Emacs Lisp</em> code we're driving Emacs the way we want to be
working, and that's great! You can see it was a <em>quick hack</em> in that if you
wanted to use the function non-interactively it would still prompt for the
command to use to compile, when the <em>Emacs Lisp</em> <code>interactive</code> special form would
allow us to implement something way smarter here. Also if we wanted to spend
some more time on that feature, we should probably tweak the <em>error</em> condition
to be asking for the command rather than just complaining, that would
certainly be more useful.</p>

<p>Exercise left to the reader, rewrite using <code>recompile</code> rather than reinventing
it in a hurry! Beware of <code>call-interactively</code> though. Oh and fix the
aforementioned infelicities, too.</p>

<p>To conclude, we see that writing <em>Emacs Lisp</em> code to fix a usability problem
in a hurry is a great strength of Emacs, and that we're provided with the
necessary tool set so as to be able to reach completeness if we wanted to do
so.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 01 Jun 2012 18:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</guid>
</item>
<item>
  <title>Back From PgCon</title>
  <link>http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</link>
  <description><![CDATA[<p>Last week was the annual <em>PostgreSQL Hackers</em> gathering in Canada, thanks to
the awesome <a href="http://www.pgcon.org/">pgcon</a> conference. This year's edition has been packed with good
things, beginning with the <a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit">Cluster Summit</a>, followed the next day by the
<a href="http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting">Developer Meeting</a>, itself followed (yes, on the same day) by the
<a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting">In Core Replication Meeting</a>. That was a packed schedule!</p>

<center>
<p><img src="../../../images/in-core-replication.jpg" alt=""></p>
</center>

<p>The <em>in core replication</em> project has been presented with slides titled
<a href="http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf">Future In-Core Replication for PostgreSQL</a> and got a very good reception. For
instance, people implementing <a href="http://slony.info/">Slony</a> (<em>Jan Wieck</em>, <em>Christopher Browne</em> and <em>Steve
Singer</em> were here) appreciated the concepts and were rather supportive
of both the requirements and the design, and appreciated the very early demo
and results that we had to show already, as a kind of proof of concept.</p>

<p>After those first two days, we could start the actual show. I had the honor
to present a migration use case entitled <a href="http://www.pgcon.org/2012/schedule/events/431.en.html">Large Scale MySQL Migration</a> where
we're speaking about going from MySQL to PostgreSQL, from 37 to 256 shards,
moving more than 6TB of data including binary <em>blobs</em> that we had to process
with <code>pl/java</code>. A quite involved migration project whose slides you now can
read here:</p>

<center>
<p><a class="image-link" href="../../../images/fotolog.pdf">
<img src="../../../images/fotolog.jpg"></a></p>
</center>


<p>I've heard that we should soon be able to enjoy audio and video recordings
of the sessions, so if you couldn't make it this year for any reason, don't
miss that, you will have loads of very interesting talks to virtually
attend. I definitely will do that to catch up with some talks I couldn't
attend. Having to pick one out of three is not an easy task, all the more
when you add the providential <em>hallway track</em>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 24 May 2012 09:40:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</guid>
</item>
<item>
  <title>Clean PGQ Subconsumers</title>
  <link>http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</link>
  <description><![CDATA[<p>Now that you're all using the wonders of <a href="../03/12-PGQ-Cooperative-Consumers.html">Cooperative Consumers</a> to help you
efficiently and reliably implement your business constraints and offload
them from the main user transactions, you're reaching a point where you have
to clean up your development environment (because that's what happens to
development environments, right?), and you want a way to start again from a
clean empty place.</p>

<center>
<p><img src="../../../images/drop-queue.png" alt=""></p>
</center>

<p>Here we go. It used to be much simpler than that, so if you're still
using <strong>PGQ</strong> from <strong>Skytools2</strong>, just jump to the next step.</p>

<h3>Unregister Subconsumers</h3>

<p class="first">This query finds the subconsumers in the output of the system function
<code>pgq.get_consumer_info()</code> and asks PGQ to please <em>unregister</em> them, losing events
along the way, even events from batches that are currently active.</p>

<pre class="src">
 with subconsumers as (
   select q1.queue_name,
          q2.consumer_name,
          substring(q1.consumer_name from <span style="color: #bc8f8f;">'%.#"%#"'</span> for <span style="color: #bc8f8f;">'#'</span>) as subconsumer_name
    from (select *
            from pgq.get_consumer_info()
           where lag is null)
         as q1
    join (select *
           from pgq.get_consumer_info()
           where lag is not null)
         as q2
         on q1.queue_name = q2.queue_name
)
select *,
       pgq_coop.unregister_subconsumer(queue_name, consumer_name,
                                       subconsumer_name, 1)
 from subconsumers;
</pre>
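
<p>The <code>substring(... from ... for '#')</code> call uses the SQL regular expression syntax, which can be hard to read: with <code>#</code> as the escape character, <code>%</code> matches any text and the part between the two <code>#"</code> markers is what gets returned, here whatever follows the dot. As a rough illustration only, and assuming names shaped like <code>consumer.subconsumer</code> with a single dot, the same extraction looks like this in python. The sample names are made up.</p>

```python
def subconsumer_name(registered_name):
    """Return the part after the first dot, or '' when there is no dot.

    Assumes names shaped like 'consumer.subconsumer'; the SQL version
    returns NULL rather than '' when the pattern does not match.
    """
    _consumer, _dot, subconsumer = registered_name.partition(".")
    return subconsumer


print(subconsumer_name("mailer.worker1"))  # worker1
print(repr(subconsumer_name("mailer")))    # '' (not a subconsumer)
```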


<h3>Unregister Consumers</h3>

<p class="first">Now that the first step is done, we have to <em>unregister</em> the main consumers,
which is easy and what you already did before:</p>

<pre class="src">
select queue_name, consumer_name,
       pgq.unregister_consumer(queue_name, consumer_name)
  from pgq.get_consumer_info();
</pre>


<h3>Drop queues</h3>

<p class="first">And as we want to really clean up the mess, let's also drop the queues.</p>

<pre class="src">
select queue_name, pgq.drop_queue(queue_name)
  from pgq.queue;
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 Apr 2012 15:05:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</guid>
</item>
<item>
  <title>PGQ Coop Consumers</title>
  <link>http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</link>
  <description><![CDATA[<p>While working on a new <a href="http://www.postgresql.org/">PostgreSQL</a> architecture for a high-scale project that
used to be among the top 10 most popular web sites on the internet (in terms of
visitors), I needed to be able to offload some processing from the main
path: that's called a <em>batch job</em>. This needs to be <em>transactional</em>: don't run
the job if we did <code>rollback;</code> the transaction, process all <em>events</em> that were
part of the same transaction in the same transaction, etc.</p>

<center>
<p><img src="../../../images/workers.jpg" alt=""></p>
</center>

<p>That calls for using <a href="http://wiki.postgresql.org/wiki/PGQ_Tutorial">PGQ</a>, the <em>jobs queue</em> solution from <a href="http://wiki.postgresql.org/wiki/Skytools">Skytools</a>, the
workhorse behind <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>. If <code>PGQ</code> is good enough to build a full trigger-based
replication solution on top of it, certainly it's good enough for our custom
processing, right? Well, you still need to check that your expectations are
met, and that was happily the case in my implementation. It's a very common
problem, and <code>PGQ</code> very often is a great solution to it.</p>

<p>As this implementation is <code>PHP</code> centric, we've been using <a href="https://github.com/dimitri/libphp-pgq">libphp-pgq</a> to drive
our background workers. Using <code>PGQ</code> in <code>PHP</code> has been very easy to set up, the
only trap being not to forget about running the <em>ticker</em> process.</p>

<p>It got interesting because of two elements. First, we're not running a
single database instance here but a bunch of them... make it <em>256 databases</em>.
With each of them having <code>5</code> queues to consume, that would be about <code>1280</code> consumer
processes. Distributed on <code>16</code> servers that's still <code>80</code> per server, so way too
many. What we did instead is reuse the <a href="https://github.com/markokr/skytools/blob/master/scripts/queue_mover.py">queue mover</a> script found in the
Skytools distribution and adapt it to <em>forward</em> the events of the 1280 source
queues to only 5 destination queues. We then process the events from this
single location.</p>
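
<p>The sizing argument is simple arithmetic, sketched here in python with the figures from the paragraph above. The variable names are made up for the example.</p>

```python
# Figures taken from the text; names are illustrative.
databases = 256
queues_per_database = 5
servers = 16
destination_queues = 5

source_queues = databases * queues_per_database
consumers_per_server = source_queues // servers
fan_in = source_queues // destination_queues

print(source_queues)         # 1280 consumer processes with one per source queue
print(consumers_per_server)  # 80 per server, way too many
print(fan_in)                # 256 source queues forwarded into each destination
```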

<p>Now it's easier to deal with, but we're still not exactly there. Of course,
with so many sources, concentrating them all into the same place means that
a single consumer is not able to process the events as fast as they are
produced. That's where <em>cooperative consuming</em> shines: it's very easy to
turn your <em>consumer</em> into a <em>cooperative</em> one even on an existing and running
queue, and that's what we did. So now we can choose how many <em>workers</em> we want
per queue: one of them has 4 workers, while another one doesn't see as much activity
and 1 worker still fits.</p>

<center>
<p><img src="../../../images/coop-workers.jpeg" alt=""></p>
</center>

<p>The queue mover script that knows how to subscribe to many queues from the
same process is going to be contributed to Skytools proper, of course.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 12 Mar 2012 14:43:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</guid>
</item>
<item>
  <title>Extension White Listing</title>
  <link>http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</link>
  <description><![CDATA[<p>PostgreSQL 9.1 includes proper extension support, as you might well know if
you ever read this very blog here. Some hosting facilities are playing with
PostgreSQL at big scale (hello <a href="https://postgres.heroku.com/blog">Heroku</a>!) and still meet with small caveats
making their life difficult.</p>

<p>To be specific, only <em>superusers</em> are allowed to install C coded stored
procedures, and that impacts a lot of very useful PostgreSQL extensions: all
those shipped in the <em>contrib</em> package are coded in C. Now, <a href="https://postgres.heroku.com/blog">Heroku</a> is not
giving away <em>superuser</em> access to their hosted customers in order to limit the
number of ways they can shoot themselves in the foot. And given PostgreSQL's
security model, being granted <em>database owner</em> is mostly good enough for day
to day operation.</p>

<blockquote>
<p class="quoted">
See Andrew's article <a href="http://people.planetpostgresql.org/andrew/index.php?/archives/259-Heroku&#44;-a-really-easy-way-to-get-a-database-in-a-hurry..html">Heroku, a really easy way to get a database in a hurry</a>
for more context about Heroku's offering here.</p>

</blockquote>

<p>Mostly, but as we see, not completely good enough. How to arrange for a non
<em>superuser</em> to be able to still install a C-coded extension in his own
database? That's quite dangerous as any bug causing a crash would mean a
whole PostgreSQL restart. So you not only want to open <code>CREATE EXTENSION</code>
to database owners, you also want to be able to review and explicitly <em>white
list</em> the allowed extensions.</p>

<p>Here we go: <a href="https://github.com/dimitri/pgextwlist">pgextwlist</a> is a PostgreSQL extension implementing just that
idea. You have to tweak <code>local_preload_libraries</code> so that it gets loaded
automatically and early enough, and you have to provide the list of
authorized extensions in the <code>extwlist.extensions</code> setting.</p>

<p>Let's see a usage example, straight from the documentation:</p>

<pre class="src">
dim=&gt; select rolsuper from pg_roles where rolname = current_user;
 rolsuper
<span style="color: #b22222;">----------
</span> f
(1 row)

dim=&gt; create extension hstore;
WARNING:  =&gt; is deprecated as an operator name
DETAIL:  This name may be disallowed altogether in future versions of PostgreSQL.
CREATE EXTENSION

dim=&gt; create extension earthdistance;
ERROR:  extension "earthdistance" is not whitelisted
DETAIL: Installing the extension "earthdistance" failed, because it is not
        on the whitelist of user-installable extensions.
HINT: Your system administrator has allowed users to install certain
      extensions. SHOW extwlist.extensions;

dim=&gt; \dx
                           List of installed extensions
  Name   | Version |   Schema   |                   Description
<span style="color: #b22222;">---------+---------+------------+--------------------------------------------------
</span> hstore  | 1.0     | public     | data type for storing sets of (key, value) pairs
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

dim=&gt; drop extension hstore;
DROP EXTENSION
</pre>

<p>As you can see, it allows non <em>superusers</em> to install an extension written in C.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 08 Mar 2012 14:25:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</guid>
</item>
<item>
  <title>Battle Language à la Marmite</title>
  <link>http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</link>
  <description><![CDATA[<p>Last night I had the chance to take part in the <a href="http://jduchess.org/duchess-france/blog/battle-language-a-la-marmite/">Battle Language à la Marmite</a>,
where I had offered to talk about <a href="http://www.emacswiki.org/emacs/EmacsLisp">Emacs Lisp</a>, a proposal that turned
into carrying the flag for the whole <a href="http://www.lisp.org/index.html">Lisp</a> family. I happily used
some content from <a href="http://www.lisperati.com/">Lisperati</a> in my presentation, and I
recommend a visit to that site!</p>

<center>
<p><a class="image-link" href="../../../images/confs/elisp.pdf">
<img src="../../../images/confs/elisp-1.png"></a></p>
</center>

<p>In this very short presentation (only 5 minutes) I mentioned the
<em>axiomatic</em> approach of <strong><em>John McCarthy</em></strong> when he <em>discovered</em> the language.
You can read more about it on <strong><em>Paul Graham</em></strong>'s site, in his article
<a href="http://www.paulgraham.com/rootsoflisp.html">The Roots of Lisp</a> and the associated code, an
<a href="http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp">implementation of McCarthy's LISP in Common Lisp</a>.</p>

<p>Thanks to <a href="http://jduchess.org/">Duchess</a> for a good evening where we could share our points
of view and debate functional and object-oriented languages, the differences
between Erlang, Haskell and Ruby, and a few other related topics!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 01 Mar 2012 14:49:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</guid>
</item>
<item>
  <title>pgbouncer munin plugin</title>
  <link>http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</link>
  <description><![CDATA[<p>It seems that if you search for a <a href="http://munin-monitoring.org/">munin</a> plugin for <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a> it's easy
enough to reach an old page of mine with an old version of my plugin, and a
broken link. Let's remedy that by publishing here the newer version of the
plugin. To be honest, I thought it had already made its way into the official
munin <code>1.4</code> set of plugins, but I've not been following closely enough.</p>

<center>
<p><img src="../../../images/bouncing_elephant.gif" alt=""></p>
</center>

<p>As the plugin is 300 lines of python code, it's not a good idea to just
inline it here, so please grab it at <a href="../../../resources/pgbouncer_">pgbouncer_</a>.</p>

<p>You might need to know that the script name once installed should follow the
form <code>pgbouncer_dbname_stats_requests</code> or <code>pgbouncer_dbname_pools</code>, where of
course <code>dbname</code> can contain any number of <code>_</code> characters. This script supports
quite old versions of <em>pgbouncer</em> that didn't accept the normal <code>pq</code> protocol:
back then you had to use <code>psql</code> to have any chance of getting the data from a
script, and you couldn't just use a PostgreSQL driver such as <a href="http://initd.org/psycopg/">psycopg2</a>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 16 Nov 2011 14:00:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</guid>
</item>
<item>
  <title>Extensions in plain SQL</title>
  <link>http://tapoueh.org/blog/2011/10/31-extensions-sql.html</link>
  <description><![CDATA[<p>The <a href="http://2011.pgconf.eu/">European conference in Amsterdam</a> was a very good community
event, impeccably organized in a welcoming hotel. I had the pleasure of
talking there about extensions and their use for “in-house” application
development, under the title
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>.</p>


<center>
<p><a class="image-link" href="http://wiki.postgresql.org/images/f/f1/Using-extensions.pdf">
<img src="../../../images/using-extensions-10.png"></a></p>
</center>

<p>The idea of my presentation, which I suppose most of you missed (in any
case there was only a small handful of French people in the room, and I hope
to have readers who were not in Amsterdam), is to use the mechanisms offered
by extensions to maintain the <code>PL</code> code you run in production.</p>

<p>Most of the time these are procedures that implement part of the business
logic of your applications, but so close to the data that they end up
directly in the database. That's a good thing, in particular since
<em>PostgreSQL 9.1</em>, as this version offers fairly complete management of
extensions.</p>

<p>The work consists of <em>packaging</em> your procedures by following the online
documentation and its chapter
<a href="http://docs.postgresqlfr.org/9.1/extend-extensions.html">35.15. Empaqueter des objets dans une extension</a> (Packaging Related Objects
into an Extension). Once that's done, you can deploy your whole set of stored
procedures with the command <code>CREATE EXTENSION mesprocs;</code>, and then the <code>psql</code>
command <code>\dx</code> lists the installed extensions and their version numbers.</p>

<p>Upgrades are also handled with a dedicated SQL command, namely
<code>ALTER EXTENSION mesprocs UPDATE [TO version];</code>. You only have to
provide intermediate scripts named, for example, <code>mesprocs--1.0--1.1.sql</code>
and <code>mesprocs--1.1--1.2.sql</code>, and PostgreSQL will know how to go from <code>1.0</code> to <code>1.2</code>.</p>
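
<p>As a minimal sketch of that life cycle, assuming an extension named
<code>mesprocs</code> has been packaged with a control file and the two upgrade scripts
named above (the version numbers are only those of the example):</p>

<pre class="src">
-- Install the packaged procedures at the default version
-- declared in the extension's control file.
CREATE EXTENSION mesprocs;

-- List installed extensions and their versions,
-- the same information psql shows with \dx.
SELECT extname, extversion FROM pg_extension;

-- Upgrade to 1.2; PostgreSQL chains mesprocs--1.0--1.1.sql
-- and mesprocs--1.1--1.2.sql when starting from 1.0.
ALTER EXTENSION mesprocs UPDATE TO '1.2';
</pre>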

<p>There, you now know almost everything about my presentation in Amsterdam,
and you can find the rest in the slides linked at the top of this article. Of
course I have not reproduced here the interesting questions I was asked; a
good part of them went to enrich my Christmas list for extensions. If you
want to be sure to find that under your tree, however, the best way is still
to talk to me about it: sponsoring Open Source development is a fine thing
to do :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 31 Oct 2011 14:22:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/31-extensions-sql.html</guid>
</item>
<item>
  <title>Back From Amsterdam</title>
  <link>http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</link>
  <description><![CDATA[<p>Another great conference took place last week,
<a href="http://2011.pgconf.eu/">PostgreSQL Conference Europe 2011</a> was in Amsterdam and plenty of us
PostgreSQL geeks were too. I attended a lot of talks and did learn some
more about our project, its community and its features, but more than that
it was a perfect occasion to meet with the community.</p>

<center>
<p><img src="../../../images/ams-conf-room.jpg" alt=""></p>
</center>

<p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/2-dave-page/">Dave Page</a> talked about <code>SQL/MED</code> under the title
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/146-postgresql-at-the-center-of-your-dataverse/">PostgreSQL at the center of your dataverse</a> and detailed what to expect from
a <em>Foreign Data Wrapper</em> in PostgreSQL 9.1, then how to write your own.
Wherever you are currently managing your data, you can easily enough make
PostgreSQL integrate it by fetching it to answer your queries. That means
real time data federation: you don't copy data around, you access it
remotely when executing the query.</p>

<p>I might need to come up with new <em>Foreign Data Wrappers</em> in the not too distant
future, now that I better grasp how much work it really is to write one. It
appears to be a good migration strategy too:</p>

<pre class="src">
  -- real, table and foreign are SQL key words, hence the double quotes
  INSERT INTO "real"."table"
       SELECT * FROM "foreign"."table";
</pre>
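
<p>Here is a rough sketch of that migration idea, assuming the legacy data is
reachable as a CSV file and using the <code>file_fdw</code> wrapper shipped in contrib with
PostgreSQL 9.1. The server name, table names, columns and file path are all
made up for the example:</p>

<pre class="src">
-- file_fdw is a contrib extension in PostgreSQL 9.1.
CREATE EXTENSION file_fdw;

-- Hypothetical server object; file_fdw needs no connection options.
CREATE SERVER legacy_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical foreign table mapped onto a CSV export of the old system.
CREATE FOREIGN TABLE legacy_customers (id integer, name text)
  SERVER legacy_files
  OPTIONS (filename '/tmp/customers.csv', format 'csv');

-- The local table that will own the data after the migration.
CREATE TABLE customers (id integer PRIMARY KEY, name text);

-- The migration itself: read remotely, write locally.
INSERT INTO customers
     SELECT id, name FROM legacy_customers;
</pre>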

<p>Another discovery is that apparently <a href="http://code.google.com/p/plv8js/wiki/PLV8">PLv8</a> is ready for public consumption.
Using it can lead to <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/174-heralding-the-death-of-nosql/">Heralding the Death of NoSQL</a>, so use it with care.</p>

<p>In the presentation of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a> we
mainly saw that mixing <em>synchronous</em> and <em>asynchronous</em> transactions in your
application is the key to real performance across the ocean, as the speed
of light is not infinite. From Baltimore to Amsterdam the latency cannot
be better than <code>100ms</code> and that's not the same as <em>instant</em> nowadays.</p>
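
<p>To give an idea of what mixing the two looks like, here is a small sketch,
assuming synchronous replication is already configured on the master. The
<code>payments</code> and <code>page_views</code> tables are hypothetical:</p>

<pre class="src">
-- Critical write: COMMIT waits for the synchronous standby,
-- so it pays the round trip across the ocean.
BEGIN;
INSERT INTO payments (account_id, amount) VALUES (42, 100.00);
COMMIT;

-- Less critical write: for this transaction only, wait for the
-- local WAL flush and let the standby catch up asynchronously.
BEGIN;
SET LOCAL synchronous_commit TO 'local';
INSERT INTO page_views (url) VALUES ('/index.html');
COMMIT;
</pre>

<p>Because the setting is changed with <code>SET LOCAL</code>, it reverts at the end of
the transaction, so each transaction picks its own durability level.</p>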

<p>Then again, depending on the number of concurrent queries to sync over the
ocean link, the experimental setup was able to achieve several thousands of
queries per second, which validates the model we picked for <em>sync rep</em> and
its implementation.</p>

<p>If you want to read the slides again at home, or if you could not be there
for some reason, then most of the talks are now available online at the
<a href="http://wiki.postgresql.org/wiki/PostgreSQL_Conference_Europe_Talks_2011">PostgreSQL Conference Europe Talks 2011</a> wiki page.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 26 Oct 2011 10:08:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</guid>
</item>
<item>
  <title>Implementing backups</title>
  <link>http://tapoueh.org/blog/2011/10/12-backup-strategy.html</link>
  <description><![CDATA[<p>I've been asked about my opinion on backup strategy and best practices, and
it so happens that I have some kind of an opinion on the matter.</p>

<p>I tend to think best practice here begins with defining properly the <em>backup
plan</em> you want to implement. It's quite a complex matter, so be sure to ask
yourself about your needs: what do you want to be protected from?</p>

<center>
<p><img src="../../../images/online-backup.jpg" alt=""></p>
</center>

<p>The two main things you want protection from are hardware loss (crash
disaster, plane in the data center, fire, water flood, etc.) and human error
(<code>UPDATE</code> without a where clause). Replication is an answer to the former,
archiving and dumps to the latter. You generally need both.</p>

<p>Often enough “backups” include <code>WAL</code> <em>archiving</em> and <em>shipping</em> and nightly or
weekly <em>base backups</em>, with some retention and some scripts or procedures
ready to set up <a href="http://www.postgresql.org/docs/9.1/static/continuous-archiving.html">Point In Time Recovery</a> and recover some data without
interfering with the WAL archiving and shipping. Of course with PostgreSQL
9.0 and 9.1, the <em>WAL Shipping</em> can be implemented with <em>streaming replication</em>
and you can even have a <em>Hot Standby</em>. But for backups you still want
archiving.</p>

<p>Mostly I still implement <code>pg_dump -Fc</code> nightly backups with a custom retention
(for example, 1 backup a month kept 2 years, 1 backup a week kept 6 or 12
months, 1 backup a night kept 1 to 2 weeks), when the database size allows
the <code>pg_dump</code> run to fit within the <em>maintenance window</em>, if any.</p>

<p>Don't forget that while <code>pg_dump</code> runs, you can't roll out <em>DDL changes</em> to the
production system any more, so you want to be careful about this
<em>maintenance window</em> thing. When you have one.</p>
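
<p>The reason is locking: <code>pg_dump</code> holds an <code>ACCESS SHARE</code> lock on every table
it dumps, and most DDL commands need a conflicting <code>ACCESS EXCLUSIVE</code> lock. A
query along these lines, given as a sketch only, shows who holds and who
waits for such locks in the current database:</p>

<pre class="src">
-- Relation-level locks in the current database; a rollout blocked
-- behind pg_dump shows up as an AccessExclusiveLock with granted = f.
SELECT pid, relation::regclass AS relation, mode, granted
  FROM pg_locks
 WHERE locktype = 'relation'
   AND database = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
   AND mode IN ('AccessShareLock', 'AccessExclusiveLock')
 ORDER BY relation, granted DESC;
</pre>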

<p><em>Physical backups</em> don't lock <em>rollouts</em> out, but they often eat a good
deal of the <em>IO bandwidth</em>, so you need to pick the right time to do them.
That's how you can get to a once a week base backup plus WAL <em>archiving</em>.</p>
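
<p>At the SQL level, a base backup taken by hand is bracketed by two function
calls. This is a bare sketch: the label is arbitrary, and the file copy in
the middle happens outside of SQL, with the tool of your choice:</p>

<pre class="src">
-- Tell PostgreSQL a base backup starts; the label is only
-- a name that ends up in the backup history.
SELECT pg_start_backup('weekly_base_backup');

-- ... copy the data directory with rsync, tar or similar,
-- while the server keeps running ...

-- End the backup; the WAL produced in between must be archived
-- for the copy to be restorable.
SELECT pg_stop_backup();
</pre>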

<p>If you can't <code>pg_dump</code> production, maybe you can have <em>automated restore jobs</em>
from the <em>physical backups</em> that you then <code>pg_dump -Fc</code>, so that you still have
that. That can come in handy, really: you can't test your <em>major upgrade</em> out
of a <em>physical backup</em>.</p>

<p>Also, <strong><em>obviously</em></strong>, never consider your backup strategy implemented until you
have either <em>automated restores</em> in place or a regular schedule to exercise
them (<em>staging instances</em>, devel instances).</p>

<p>Then as far as the practical tools go, I tend to think that <a href="http://tapoueh.org/pgsql/pgstaging.html">pg_staging</a> is
worth its installation complexity, and for WAL archiving and base backups I
recommend <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> from <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a>, a very handy tool. When using
PostgreSQL <code>9.0</code> or <code>9.1</code>, consider using <a href="http://packages.debian.org/experimental/skytools3-walmgr">walmgr3</a> so that it behaves nicely
alongside <em>streaming replication</em>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 12 Oct 2011 22:22:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/12-backup-strategy.html</guid>
</item>
<item>
  <title>Extensions, applications</title>
  <link>http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</link>
  <description><![CDATA[<p>The <a href="http://2011.pgconf.eu/">annual PostgreSQL conference in Europe</a> takes place next week in
Amsterdam, and I hope you already have your tickets, because this edition
promises to be a very good vintage!</p>

<p>So I will be presenting how to use extensions. The English title is
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, and the idea is to see how to
leverage extensions to better manage your database upgrades.</p>

<p>The life cycle of production databases often includes a development
database where the schema evolves at the pace of the developers' needs. From
time to time, part of those changes gets consolidated (in <em>rollouts</em>, scripts
containing mostly <code>DDL</code>) in order to deploy them to production, if possible
with an intermediate preproduction step all the same.</p>

<p>Knowing what is deployed in development, and how to derive from it the
script to run in production, can sometimes be tedious.  When it's not, it's
because the work has been done upstream, which is the sign of a good
organization, with the extra costs you can imagine.</p>

<p>The <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extensions</a> as found in PostgreSQL 9.1 let you better handle this kind
of case by optimizing the extra cost: it does not disappear, but it becomes
an operational matter rather than remaining an organizational burden.</p>

<p>Well, I'll leave you now, I have to get ready for the conference :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 10 Oct 2011 10:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</guid>
</item>
<item>
  <title>Scaling Stored Procedures</title>
  <link>http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</link>
  <description><![CDATA[<p>In the news recently <em>stored procedures</em> were used as an excuse for moving
logic away from the database layer to the application layer, and to migrate away
from a powerful technology to a simpler one, now that there's no logic
anymore in the database.</p>

<p>It's not the way I would typically approach scaling problems, and apparently
I'm not alone in the <em>Stored Procedures</em> camp.  Did you read this nice blog
post <a href="http://ora-00001.blogspot.com/2011/07/mythbusters-stored-procedures-edition.html">Mythbusters: Stored Procedures Edition</a> already?  Well it happens in
another land than where my comfort zone is, but still has some interesting
things to say.</p>

<p>I won't try and address all of the myths they attack in a single article.
Let's pick the scalability problems: the two I have in mind are code
management and performance.  We are quite well equipped for that in
PostgreSQL, really.</p>

<p>For code maintenance we now have <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">PostgreSQL Extensions</a>, which allow you to
pack all your procedures into separate <em>extensions</em>, and to maintain a version
number and upgrade procedures for each of them.  You can handle separate
rollouts in development for going from <code>1.12</code> to <code>1.13</code> then <code>1.14</code>.  After the
developers have tested it more completely and changed their mind again on the
best API they want to work with, you get <code>1.15</code>, which is stamped ok for production.
At this point, <code>ALTER EXTENSION ... UPDATE</code> will happily apply all the rollouts
in sequence to upgrade from <code>1.12</code> straight to <code>1.15</code> in one go.  And if you
prefer to bake a special careful script to handle that big jump, you can
also provide a specific <code>extension--1.12--1.15.sql</code> script.</p>
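
<p>As a small sketch of that production step, assuming the extension is called
<code>myprocs</code> (a made-up name) and the upgrade scripts for each step are
installed on the server:</p>

<pre class="src">
-- Ask PostgreSQL which chain of scripts it would apply
-- to go from 1.12 to 1.15.
SELECT source, target, path
  FROM pg_extension_update_paths('myprocs')
 WHERE source = '1.12'
   AND target = '1.15';

-- Apply that chain in one go.
ALTER EXTENSION myprocs UPDATE TO '1.15';
</pre>

<p>Checking the path first is a cheap way to catch a missing script before the
rollout rather than during it.</p>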

<p>Of course you're managing all those files with your favorite <em>SCM</em>, to answer
to some other myth from the blog reference we are loosely following.</p>

<center>
<p><a class="image-link" href="http://postgresqlrussia.org/articles/view/131">
<img src="../../../images/Moskva_DB_Tools.v3.png"></a></p>
</center>

<p>I wanted to talk about the other side of the scalability problem, which is
the operations side of it.  What happens when you need to scale the database
in terms of its size and level of concurrent activity?  PostgreSQL earned a
very good reputation for being able to scale up, what about scaling out?
Certainly, now that you're all in on <em>Stored Procedures</em>, it's going to be
a very bad situation?</p>

<p>Well, in fact, you're then in a very good position here, thanks to <a href="http://wiki.postgresql.org/wiki/PL/Proxy">PLproxy</a>.
This <em>extension</em> is a custom procedural language whose job is to handle a
cluster of database shards that all expose the same PL API, and it's very
good at doing that.</p>
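
<p>To give a taste of it, here is a minimal PL/Proxy function, assuming a
cluster named <code>usercluster</code> has been configured and that every shard exposes
a function with the same name and signature. The function name and its
argument are hypothetical:</p>

<pre class="src">
-- Runs on the proxy database only: pick one shard from a hash of
-- the username and call get_user_email(i_username) over there.
CREATE FUNCTION get_user_email(i_username text)
RETURNS SETOF text AS $$
    CLUSTER 'usercluster';
    RUN ON hashtext(i_username);
$$ LANGUAGE plproxy;
</pre>

<p>The application keeps calling a plain stored procedure, and the sharding
logic stays in one place.</p>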

<p><em>Stored Procedures</em> are a very good tool to have, be sure to get comfortable
enough with them so that you can choose exactly when to use them.  If you're
not sure about that, we at <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> will be happy to help you there!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 06 Oct 2011 18:23:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</guid>
</item>
<item>
  <title>See you in Amsterdam</title>
  <link>http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</link>
  <description><![CDATA[<p>The next <a href="http://2011.pgconf.eu/">PostgreSQL conference</a> is approaching very fast now, I hope you have
your ticket already: it's a very promising event!  If you want some help in
deciding whether to register or not, just have another look at <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">the schedule</a>.
Pick the talks you want to see.  It's hard, given how packed with good ones
the schedule is.  When your mind is all set, review the list.  Registered?</p>

<p>I'll be presenting another talk about extensions, but this time I've geared
up to use cases, with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>.  The idea is
not to talk about how to make PostgreSQL play fair with extensions including
at <em>dump</em> and <em>restore</em> times, that's already done and I've been talking only
too much about it.  The idea this time is to figure out how much you get
from this feature.</p>

<p>If you ever felt like something is missing in your processes between pushing
rollouts in devel environments and refining them as developers are testing
and preparing something for the live databases, then we have something for
you here.  Including how to easily compare state between production and
development, but without having to guess or reverse engineer anything.</p>

<p>Yeah, extensions are all about getting even more professional!  A great tool
you'll be happy to master!</p>

<p>And now I need to prepare a damn good slide deck, right?  See you there! :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 04 Oct 2011 14:25:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</guid>
</item>
<item>
  <title>PostgreSQL in Amsterdam</title>
  <link>http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</link>
  <description><![CDATA[<p>In less than a month the European PostgreSQL conference,
<a href="http://2011.pgconf.eu/">pgconf.eu</a>, takes place.  It's four days dedicated to your favorite RDBMS,
where you can meet the European community, made up of users, companies of
all sizes, developers, and contributors of all kinds.</p>

<p>It's the place to go to learn how the project works, understand the impact
of new releases on your architecture, have a sharp technical discussion
about that feature you would like to see land in the next release, or
simply realize how much energy is poured into this project!</p>

<p>Of course <a href="http://2ndquadrant.fr/">2ndQuadrant</a> will be there, and we will present several of our
<a href="http://www.2ndquadrant.com/fr/les-fonctionnalites-de-postgresql-91/">PostgreSQL 9.1 contributions</a>.  It starts with
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg</a>'s full-day training, <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/162-performance-from-start-to-crash/">Performance From Start to Crash</a>: if you want
to learn how to approach the performance of a PostgreSQL server from the
international <em>leader</em> in the field, author of the book
<a href="http://www.amazon.fr/Bases-donn%C3%A9es-PostgreSQL-Gregory-Smith/dp/274402483X/ref=sr_1_1?ie=UTF8&amp;qid=1316183931&amp;sr=8-1">Bases de données PostgreSQL 9.0</a>, book your seat quickly!</p>

<p>The talks in the classic format start the next day, and over three days the
list of talks from our <a href="http://www.2ndquadrant.com/fr/profil-de-lequipe/">2ndQuadrant team</a> is quite hefty.  Let's have a
look.</p>

<p>We start with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/144-migration-to-postgresql-a-holistic-view/">Migration to PostgreSQL - a holistic view</a> by
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/78-harald-armin-massa/">Harald Armin Massa</a>, who offers an interesting point of view on the reasons
that hold some migrations back.  Then <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/34-gianni-ciolli/">Gianni Ciolli</a> will present
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/159-look-out-the-window-functions-and-free-your-sql/">Look Out The Window Functions (and free your SQL)</a>, or how to simply solve
complex problems when you have advanced tools at hand.</p>

<p>Another talk not to miss,
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a>, details how to make the most
of PostgreSQL 9.1 to get the data durability guarantees you want in your
application.  This talk is given by
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg Smith</a> and <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>.  The latter developed <em>synchronous
replication</em>, and <em>Hot Standby</em> before that.  You won't find anyone in the
world better placed to give this talk!</p>

<p>The next two talks from our <a href="http://expert-postgresql.fr/">PostgreSQL experts</a>, continuing our reading of
the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">pgconf.eu schedule</a> in order, take place at the same time.  The choice
won't be easy between <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/158-improving-vacuum-suction/">Improving VACUUM Suction</a> by Greg again, and a
comparison of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/183-londiste-3-et-slony-21/">londiste 3 et slony 2.1</a> by
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/57-cedric-villemain/">Cédric Villemain</a>, in French.</p>

<p>Next comes <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, which I will present
myself; you can see my talk on the page that bears my name:
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/14-dimitri-fontaine/">Dimitri Fontaine</a>.  It's a talk in English that details how to use
extensions to maintain the <em>stored procedures</em> part of an application.</p>

<p>And to wrap up the second day of 2ndQuadrant talks, you can learn with
Gianni about
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/160-debugging-complex-sql-queries-with-writable-ctes/">Debugging complex SQL queries with writable CTEs</a>, a feature contributed to
the project by another <a href="http://www.2ndquadrant.com/fr/contact/">2ndQuadrant</a> consultant, Marko Tiikkaja.</p>

<p>And there is still one more day!  We are not lying when we say the schedule
is packed!  The last day of the conference is not the least interesting, and
I hope you will have kept some energy to follow…</p>

<p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>, who will present his vision of the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/199-postgresql-roadmap/">PostgreSQL Roadmap</a> for the
coming years.  It is of course only his personal vision, but when you look
back at these last 7 years of
<a href="http://www.2ndquadrant.com/fr/histoire-postgresql/">contributions to PostgreSQL</a>, you can see how much weight his personal
opinion can carry in the development of the project.</p>

<p>Next, Greg's talk on his favorite subject:
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/157-bottom-up-database-benchmarking/">Bottom-up Database Benchmarking</a>.  Everything you always wanted to know
about measuring the performance of your databases but were afraid to ask.
Something along those lines anyway :)</p>

<p>Of course other talks are available and will catch your attention; this
post only presents those given by the
<a href="http://expert-postgresql.fr/">PostgreSQL experts</a> at <a href="http://www.2ndquadrant.com/fr/expertise-postgresql/">2ndQuadrant</a>.  Wishing you all a good conference,
I hope to have the pleasure of meeting you in Amsterdam next month!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 27 Sep 2011 11:10:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</guid>
</item>
<item>
  <title>Skytools3: walmgr</title>
  <link>http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</link>
  <description><![CDATA[<p>Let's begin the <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a> documentation effort, which is long overdue.  The
code is waiting for you over at <a href="https://github.com/markokr/skytools">github</a>, and is stable and working.  Why is
it still in <em>release candidate</em> status, I hear you asking?  Well because it's
missing updated documentation.</p>

<p><a href="http://packages.debian.org/experimental/skytools3-walmgr">WalMgr</a> is the Skytools component that manages <em>WAL shipping</em> for you, and
archiving too.  It knows how to prepare your master and standby setup, how
to take a base backup and push it to the standby's system, how to archive
(at the standby) master's WAL files as they are produced and have the
standby restore from this archive.</p>

<p>What's new in <code>walmgr</code> from Skytools 3 is its support for <em>Streaming
Replication</em> that made its way into PostgreSQL 9.0 and is even more useful in
PostgreSQL 9.1 (better monitoring, synchronous replication option).</p>
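
<p>As an example of that better monitoring, PostgreSQL 9.1 exposes the
<code>pg_stat_replication</code> view on the master. The query below is only a sketch of
what you might look at once a standby is streaming:</p>

<pre class="src">
-- One row per connected WAL sender, run on the master (9.1 column names).
SELECT application_name,
       client_addr,
       state,            -- e.g. streaming
       sent_location,    -- last WAL position sent to the standby
       replay_location,  -- last WAL position replayed by the standby
       sync_state        -- async, potential or sync
  FROM pg_stat_replication;
</pre>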

<h2>Getting ready</h2>

<p class="first">Now, I'm using Debian here, and a build virtual machine where I'm doing the
<em>backporting</em> work.  As <a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> is now out, let's use that.</p>

<pre class="src">
:~$ pg_lsclusters
Version Cluster   Port Status Owner    Data directory
8.4     main      5432 online postgres /var/lib/postgresql/8.4/main ...
9.0     main      5433 online postgres /var/lib/postgresql/9.0/main ...
9.1     main      5434 online postgres /var/lib/postgresql/9.1/main ...
</pre>

<p>After some editing of the configuration files (enabling <em>hot standby</em> and
switching <code>pg_hba.conf</code> to <code>trust</code> for the sake of this example), we can see
that the cluster is ready to be abused:</p>

<pre class="src">
:~$ sudo pg_ctlcluster 9.1 main restart
:~$ psql --cluster 9.1/main  -U postgres \
-c <span style="color: #ad7fa8; font-style: italic;">"select name, setting from pg_settings where name in ('max_wal_senders', 'wal_level')"</span>
      name       |   setting
-----------------+-------------
 max_wal_senders | 1
 wal_level       | hot_standby
(2 rows)

:~$ sudo mkdir -p /etc/walshipping/9.1/main /var/lib/postgresql/walshipping
:~$ sudo chown -R postgres:postgres /etc/walshipping /var/lib/postgresql/walshipping

:~$ ssh-keygen -t dsa
:~/.ssh$ cp id_dsa.pub authorized_keys
:~$ ssh localhost
</pre>

<p>So the order of operations is to prepare a standby, then have it restore
from the archives, then activate the WAL streaming and check that the setup
allows the standby to switch back and forth between the streaming and the
archives.</p>


<h2>Setting up walmgr</h2>

<p class="first">To prepare the standby, we will do a <em>base backup</em> of the master.  That step
is handled by <code>walmgr</code>, so we first need to set it up.  Here's the sample
<code>master.ini</code> file:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span>             = wal-master
<span style="color: #eeeeec;">logfile</span>              = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">pidfile</span>              = /var/run/postgresql/%(job_name)s.pid
<span style="color: #eeeeec;">use_skylog</span>           = 0

<span style="color: #eeeeec;">master_db</span>            = port=5434 dbname=template1
<span style="color: #eeeeec;">master_data</span>          = /var/lib/postgresql/9.1/main/
<span style="color: #eeeeec;">master_config</span>        = /etc/postgresql/9.1/main/postgresql.conf
<span style="color: #eeeeec;">master_bin</span>           = /usr/lib/postgresql/9.1/bin

<span style="color: #888a85;"># </span><span style="color: #888a85;">set this only if you can afford database restarts during setup and stop.
</span><span style="color: #eeeeec;">master_restart_cmd</span>   = pg_ctlcluster 9.1 main restart

<span style="color: #eeeeec;">slave</span> = 127.0.0.1
<span style="color: #eeeeec;">slave_config</span> = /etc/walshipping/9.1/main/standby.ini

<span style="color: #eeeeec;">walmgr_data</span>          = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span>       = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span>         = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span>          = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span>        = %(walmgr_data)s/config.backup

<span style="color: #888a85;"># </span><span style="color: #888a85;">syncdaemon update frequency
</span><span style="color: #eeeeec;">loop_delay</span>           = 10.0
<span style="color: #888a85;"># </span><span style="color: #888a85;">use record based shipping available since 8.2
</span><span style="color: #eeeeec;">use_xlog_functions</span>   = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">pass -z to rsync, useful on low bandwidth links
</span><span style="color: #eeeeec;">compression</span>          = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">keep symlinks for pg_xlog and pg_log
</span><span style="color: #eeeeec;">keep_symlinks</span>        = 1

<span style="color: #888a85;"># </span><span style="color: #888a85;">tell walmgr to set wal_level to hot_standby during setup
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">hot_standby          = 1
</span>
<span style="color: #888a85;"># </span><span style="color: #888a85;">periodic sync
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">command_interval     = 600
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">periodic_command     = /var/lib/postgresql/walshipping/periodic.sh
</span></pre>

<p>And the <code>/etc/walshipping/9.1/main/standby.ini</code> companion:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span>             = wal-standby
<span style="color: #eeeeec;">logfile</span>              = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">use_skylog</span>           = 0

<span style="color: #eeeeec;">slave_data</span>           = /var/lib/postgresql/9.1/standby
<span style="color: #eeeeec;">slave_bin</span>            = /usr/lib/postgresql/9.1/bin
<span style="color: #eeeeec;">slave_stop_cmd</span>       = pg_ctlcluster 9.1 standby stop
<span style="color: #eeeeec;">slave_start_cmd</span>      = pg_ctlcluster 9.1 standby start
<span style="color: #eeeeec;">slave_config_dir</span>     = /etc/postgresql/9.1/standby/

<span style="color: #eeeeec;">walmgr_data</span>          = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span>       = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span>         = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span>          = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span>        = %(walmgr_data)s/config.backup

<span style="color: #eeeeec;">backup_datadir</span>       = no
<span style="color: #eeeeec;">keep_backups</span>         = 0
<span style="color: #888a85;"># </span><span style="color: #888a85;">archive_command =
</span>
<span style="color: #888a85;"># </span><span style="color: #888a85;">primary database connect string for hot standby -- enabling
</span><span style="color: #888a85;"># </span><span style="color: #888a85;">this will cause the slave to be started in hot standby mode.
</span><span style="color: #eeeeec;">primary_conninfo</span>     = host=127.0.0.1 port=5434 user=postgres
</pre>
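
<p>Both files lean on <code>%(name)s</code> references such as <code>%(job_name)s</code> and
<code>%(walmgr_data)s</code>.  That syntax is the one Python's standard <code>configparser</code>
module expands.  Assuming <code>walmgr</code> resolves it the same way (an assumption,
this is not its actual loading code), here is a minimal sketch of what the
paths end up being, using a trimmed copy of <code>master.ini</code>.  The names <code>SAMPLE</code>
and <code>expand</code> are made up for the illustration:</p>

```python
import configparser

# A trimmed copy of the walmgr settings shown above (values from master.ini).
SAMPLE = """
[walmgr]
job_name       = wal-master
logfile        = /var/log/postgresql/%(job_name)s.log
walmgr_data    = /var/lib/postgresql/walshipping/9.1/main
completed_wals = %(walmgr_data)s/logs.complete
full_backup    = %(walmgr_data)s/data.master
"""


def expand(text):
    """Return the [walmgr] section with every %(name)s reference expanded."""
    parser = configparser.ConfigParser()  # BasicInterpolation is the default
    parser.read_string(text)
    return dict(parser.items("walmgr"))


settings = expand(SAMPLE)
for key in ("logfile", "completed_wals", "full_backup"):
    print(key, "=", settings[key])
```

<p>Keeping <code>walmgr_data</code> as the single base path means that moving the archive
only requires changing one line in each file.</p>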

<p>And let's get started:</p>

<pre class="src">
:~$ cp standby.ini /etc/walshipping/9.1/main/

:~$ walmgr3 -v master.ini setup
2011-09-21 16:57:05,685 30450 INFO Configuring WAL archiving
2011-09-21 16:57:05,687 30450 DEBUG found 'archive_mode' in config -- enabling it
2011-09-21 16:57:05,687 30450 DEBUG found 'wal_level' in config -- setting to 'archive'
2011-09-21 16:57:05,688 30450 DEBUG modifying configuration: {'archive_mode': 'on', 'wal_level': 'archive', 'archive_command': '/usr/bin/walmgr3 /var/lib/postgresql/master.ini xarchive %p %f'}
2011-09-21 16:57:05,688 30450 DEBUG found parameter archive_mode with value ''off''
2011-09-21 16:57:05,690 30450 DEBUG found parameter wal_level with value ''minimal''
2011-09-21 16:57:05,690 30450 DEBUG found parameter archive_command with value ''''
2011-09-21 16:57:05,691 30450 INFO Restarting postmaster
2011-09-21 16:57:05,691 30450 DEBUG Execute cmd: 'pg_ctlcluster 9.1 main restart'
2011-09-21 16:57:09,404 30450 DEBUG Execute cmd: 'ssh' '-Tn' '-o' 'Batchmode=yes' '-o' 'StrictHostKeyChecking=no' '127.0.0.1' '/usr/bin/walmgr3' '/etc/walshipping/9.1/main/standby.ini' 'setup'
2011-09-21 16:57:09,712 30450 INFO Done

postgres@squeeze64:~$ walmgr3 master.ini backup
2011-09-21 17:00:17,259 30702 INFO Backup lock obtained.
2011-09-21 17:00:17,277 30692 INFO Execute SQL: select pg_start_backup('FullBackup'); [port=5434 dbname=template1]
2011-09-21 17:00:17,791 30712 INFO Removing expired backup directory: /var/lib/postgresql/walshipping/9.1/main/data.master
2011-09-21 17:00:18,200 30692 INFO Checking tablespaces
2011-09-21 17:00:18,202 30692 INFO pg_log does not exist, skipping
2011-09-21 17:00:18,259 30692 INFO Backup conf files from /etc/postgresql/9.1/main
2011-09-21 17:00:18,590 30731 INFO First useful WAL file is: 000000010000000200000092
2011-09-21 17:00:19,901 30759 INFO Backup lock released.
2011-09-21 17:00:19,919 30692 INFO Full backup successful

:~$ walmgr3 /etc/walshipping/9.1/main/standby.ini listbackups

List of backups:

Backup set      Timestamp                Label       First WAL
--------------- ------------------------ ----------- ------------------------
data.master     2011-09-21 17:00:17 CEST FullBackup  000000010000000200000092
</pre>

<p>The following articles will show how to manage that archive and how to go
from that to a <em>Hot Standby</em> fed by either <em>Streaming Replication</em> or <em>Archives</em>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 21 Sep 2011 17:21:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</guid>
</item>
<item>
  <title>el-get-3.1</title>
  <link>http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</link>
  <description><![CDATA[<p>The <a href="https://github.com/dimitri/el-get">el-get</a> project releases its new stable version, <code>3.1</code>. This new release
fixes bugs, adds a host of new recipes (we have 420 of them and counting) and
some nice new features too.  You really want to upgrade.</p>

<h2>New features</h2>

<p class="first">Among the features you will find dependencies management and <code>M-x
el-get-list-packages</code>, that you should try as soon as possible.  Of course,
don't miss <code>M-x el-get-self-update</code> that eases the process somehow.</p>

<center>
<p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p>
</center>

<p>This shows the result of <code>M-x el-get-list-packages</code>.  The packages that don't
have a description are the ones from <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el&#124;tar)(%5C.gz)%3F%24">emacswiki</a>, which doesn't provide a listing
of the filename <em>and</em> the first line of the file (it usually follows the
format <code>;;; filename.el --- description here</code>).  As we don't want to mirror
the website just to be able to provide descriptions, we just don't have them
for now.</p>

<p>Another nice new feature, contributed by a user who wanted to teach themselves
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/index.html">elisp</a>, is the <code>el-get-user-package-directory</code> support.  Just place
some <code>init-my-package.el</code> files in there, and when <em>el-get</em> wants to init the <code>my-package</code>
package, it will load that file for you.  That helps manage your setup,
and I'm already using that in my own <code>~/.emacs.d/</code> repository.</p>


<h2>Upgrading</h2>

<p class="first">Upgrading needs to be done with some care, though, because you need to edit
your packaging setup.  The <code>el-get-sources</code> variable used to be both where to
set up extra recipes and the list of packages you want to have installed, and
several people rightfully insisted that I should change that.  I've been
slow to be convinced, but there it is, they were right.</p>

<p>So now, <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> works from the current status of packages and will init all
those packages you have installed.  Which means that you just <code>M-x
el-get-install</code> a package and don't think about it anymore.  If you need to
override this behavior, it's still possible to do so by specifying the whole
list of packages you want initialized (and installed if necessary) on the
<code>(el-get 'sync ...)</code> call.</p>

<p>That latter setup is useful if you want to share your el-get selection across
several machines.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 16 Sep 2011 14:13:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</guid>
</item>
<item>
  <title>PostgreSQL 9.1</title>
  <link>http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</link>
  <description><![CDATA[<p><a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> is out!  You don't have this new release in production
yet?  You haven't yet evaluated why you should consider migrating to it?
There are many good reasons to move to this release, and few pitfalls.</p>

<p>We are beginning to see articles picking up the news in the French press,
and I have the pleasure of mentioning the one from <a href="http://www.programmez.com/actualites.php?titre_actu=Sortie-de-PostgreSQL-91-&#33;&amp;id_actu=10190">programmez.com</a>,
which announces "an unmatched extension system".  As the developer of
<a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">Extensions</a> in PostgreSQL, I can only agree with them, and be flattered
too :)</p>

<p>Happy testing to all, and happy upgrading to the luckiest among you!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 14 Sep 2011 10:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</guid>
</item>
<item>
  <title>Avoiding SQL injections</title>
  <link>http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</link>
  <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in
PostgreSQL, and mentioned that using those techniques to protect the data
inserted into SQL queries is not a good idea, given that PostgreSQL offers
a much better suited feature.</p>

<p>We are facing a security problem here, one that is very well described in
the humorous <a href="http://xkcd.com/327/">Little Bobby Tables</a> strip, which I recommend you read.  The idea
is simple, but implementing countermeasures is riddled with subtle
pitfalls, unless you use the solution described below.</p>

<center>
<p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p>
</center>

<p>When we send a SQL query to PostgreSQL, it contains a jumble of SQL
keywords and user data.  In the query <code>SELECT colname FROM table WHERE pk = 1234;</code>
the element <code>1234</code> is a piece of data given to PostgreSQL.  When using other
data types, we talk about a <em>literal</em>, which may or may not be <em>decorated</em>.
An example?</p>

<pre class="src">
=# SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
          date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
      ?column?       | pg_typeof |    date    | pg_typeof
<span style="color: #888a85;">---------------------+-----------+------------+-----------
</span> undecorated literal | unknown   | 2011-09-07 | date
(1 row)
</pre>

<p>Data types aside (an undecorated literal is of type <em>unknown</em> until an
operation forces its type, which is what allows for polymorphism in
PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart
from the parameters embedded in it.  It knows how to do that, of course: you
just enclose the values in single quotes, or use the notation known as
<a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a>.  But if no precautions are taken, the user can terminate the
quoting sequence right from the form's input field…</p>

<p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is the standard PostgreSQL client library; it provides the connection
<em>API</em> and offers a function called <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>.  This function exposes a
mechanism available in the PostgreSQL communication protocol itself: it is
possible to send the SQL and the data it contains in two different parts of
the message rather than mixing them together.  That way, the server no
longer has to guess where the data begins and ends in the query; it only
has to look into the separate array containing the data when it needs it.</p>

<p>No more SQL injections!</p>

<p>Note: this function is exposed in most connection drivers, even in PHP,
whose popularity and exposure prompt me to give a more precise reference:
use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>.  Its benefit is not merely syntactic; it goes all the way
down to how data is exchanged between the client (your PHP code) and the
server (PostgreSQL).</p>
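
<p>To make the idea concrete, here is a minimal sketch of the same principle in
Python.  It uses the standard library's <code>sqlite3</code> module purely as a stand-in
so that the example runs without a server: it does not go through <code>libpq</code>, and
the <code>students</code> table and the hostile input are made up for the illustration.
What it shows is only the separation between the SQL text and the data:</p>

```python
import sqlite3

# In-memory SQLite database, standing in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (pk INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO students (name) VALUES ('Alice')")

# Hostile input in the spirit of the xkcd strip.
user_input = "Robert'); DROP TABLE students;--"

# The SQL text holds only a placeholder; the value travels separately,
# so it is never parsed as SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (user_input,))

names = [row[0] for row in conn.execute("SELECT name FROM students ORDER BY pk")]
print(names)
```

<p>The hostile string ends up stored as an ordinary name and the table is still
there.  With PostgreSQL, the equivalent is to pass the values through the
parameter array of your driver rather than building the query string
yourself.</p>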
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</guid>
</item>
<item>
  <title>Avoiding SQL injections</title>
  <link>http://tapoueh.org/blog/2011/09/07-requete-parametree.html</link>
  <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in
PostgreSQL, and mentioned that using those techniques to protect the data
inserted into SQL queries is not a good idea, given that PostgreSQL offers
a much better suited feature.</p>

<p>We are facing a security problem here, one that is very well described in
the humorous <a href="http://xkcd.com/327/">Little Bobby Tables</a> strip, which I recommend you read.  The idea
is simple, but implementing countermeasures is riddled with subtle
pitfalls, unless you use the solution described below.</p>

<center>
<p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p>
</center>

<p>When we send a SQL query to PostgreSQL, it contains a jumble of SQL
keywords and user data.  In the query <code>SELECT colname FROM table WHERE pk = 1234;</code>
the element <code>1234</code> is a piece of data given to PostgreSQL.  When using other
data types, we talk about a <em>literal</em>, which may or may not be <em>decorated</em>.
An example?</p>

<pre class="src">
=# SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
          date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
      ?column?       | pg_typeof |    date    | pg_typeof
<span style="color: #888a85;">---------------------+-----------+------------+-----------
</span> undecorated literal | unknown   | 2011-09-07 | date
(1 row)
</pre>

<p>Data types aside (an undecorated literal is of type <em>unknown</em> until an
operation forces its type, which is what allows for polymorphism in
PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart
from the parameters embedded in it.  It knows how to do that, of course: you
just enclose the values in single quotes, or use the notation known as
<a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a>.  But if no precautions are taken, the user can terminate the
quoting sequence right from the form's input field…</p>

<p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is the standard PostgreSQL client library; it provides the connection
<em>API</em> and offers a function called <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>.  This function exposes a
mechanism available in the PostgreSQL communication protocol itself: it is
possible to send the SQL and the data it contains in two different parts of
the message rather than mixing them together.  That way, the server no
longer has to guess where the data begins and ends in the query; it only
has to look into the separate array containing the data when it needs it.</p>

<p>No more SQL injections!</p>

<p>Note: this function is exposed in most connection drivers, even in PHP,
whose popularity and exposure prompt me to give a more precise reference:
use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>.  Its benefit is not merely syntactic; it goes all the way
down to how data is exchanged between the client (your PHP code) and the
server (PostgreSQL).</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-requete-parametree.html</guid>
</item>
<item>
  <title>PostgreSQL and debian</title>
  <link>http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</link>
  <description><![CDATA[<p>After talking about it for a very long time, work has finally begun!  I'm
talking about the <a href="https://github.com/dimitri/apt.postgresql.org">apt.postgresql.org</a> build system that will allow us, in the
long run, to offer <code>debian</code> versions of binary packages for <a href="http://www.postgresql.org/">PostgreSQL</a> and
its extensions, compiled for a bunch of debian and ubuntu versions.</p>

<p>We're now thinking of supporting the <code>i386</code> and <code>amd64</code> architectures for <code>lenny</code>,
<code>squeeze</code>, <code>wheezy</code> and <code>sid</code>, and also for <code>maverick</code> and <code>natty</code>, maybe <code>oneiric</code> too
while at it.</p>

<p>It's still the very beginning of the effort, and it was triggered by the
decision to move <code>sid</code> to <code>9.1</code>.  While it's a good decision in itself, I still
hate having to pick only one PostgreSQL version per debian stable release
when we have all the technical support we need to be able to support all
stable releases that <em>upstream</em> is willing to maintain. If you've been living
under a rock, or if you couldn't care less about <code>debian</code> choices, the problem
here for debian is ensuring security updates (and fixes) for PostgreSQL:
they promise in the social contract that they will handle the job just fine,
and don't want to have to do it without support from PostgreSQL if a <em>debian stable</em>
release contains a deprecated PostgreSQL version.</p>

<p>That opens the door for the PostgreSQL community to handle the packaging of its
solutions as a service to its debian users.  We intend to open with support
for <code>8.4</code>, <code>9.0</code> and <code>9.1</code>, and maybe <code>8.3</code> too, as <a href="http://qa.debian.org/developer.php?login=myon">Christoph Berg</a> is making good
progress on this front.  See, it's teamwork here!</p>

<p>We still have more work to do, and setting up the build environment so that
we are able to provide the packages for so many targets will indeed be
interesting. Getting there, one step at a time.</p>
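
<p>To get a feel for how many targets that is, here is a rough count in Python.
It assumes one build per architecture, distribution and PostgreSQL version,
which is an upper bound of my own making rather than the actual plan; the
"maybe" entries are counted separately:</p>

```python
from itertools import product

architectures = ["i386", "amd64"]
# Distributions and versions named above; the "maybe" ones are kept apart.
distributions = ["lenny", "squeeze", "wheezy", "sid", "maverick", "natty"]
pg_versions = ["8.4", "9.0", "9.1"]

firm = list(product(architectures, distributions, pg_versions))
with_maybes = list(
    product(architectures, distributions + ["oneiric"], pg_versions + ["8.3"])
)

print(len(firm), len(with_maybes))
```

<p>That count is per source package, so every extension we decide to build
multiplies it again.</p>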
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 05 Sep 2011 17:14:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</guid>
</item>
<item>
  <title>pg_restore -L &amp; pg_staging</title>
  <link>http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</link>
  <description><![CDATA[<p>On the <a href="http://archives.postgresql.org/pgsql-hackers">PostgreSQL Hackers</a> mailing list, <a href="http://people.planetpostgresql.org/andrew/">Andrew Dunstan</a> just proposed some
new options for <code>pg_dump</code> and <code>pg_restore</code> to ease our lives.  One of the
answers mentioned some scripts available to exploit the <a href="http://www.postgresql.org/docs/9.0/static/app-pgrestore.html">pg_restore</a>
listing that you play with using options <code>-l</code> and <code>-L</code>, or the long name
versions <code>--list</code> and <code>--use-list</code>.  The <a href="../../../pgsql/pgstaging.html">pg_staging</a> tool allows you to easily
exploit those lists too.</p>

<p>The <code>pg_restore</code> list is just a listing, one object per line, of all objects
contained in a <em>custom</em> dump, that is one made with <code>pg_dump -Fc</code>.  You can
then tweak this listing in order to comment out some objects (prepending a <code>;</code>
to the line where you find it), and give your hacked file back to <code>pg_restore
--use-list</code> so that it will skip them.</p>

<p>What's pretty useful here, among other things, is that a table will in
fact have more than one line in the listing.  One is for the <code>TABLE</code> definition,
another one for the <code>TABLE DATA</code>.  That is how <code>pg_staging</code> is able to provide you
with options for only restoring some <em>schemas</em>, some <em>schemas_nodata</em> and even
some <em>tablename_nodata_regexp</em>, to use the configuration option names
directly.</p>

<p>How do you do a very simple exclusion of some table's data when restoring a
dump, you ask?  There we go.  Let's first prepare an environment,
where I have only a <a href="http://www.postgresql.org/">PostgreSQL</a> server running.</p>

<pre class="src">
$ git clone git://github.com/dimitri/pg_staging.git
$ git clone git://github.com/dimitri/pgloader.git
$ for s in */*.sql; do psql -f $s; done
$ pg_dump -Fc &gt; pgloader.dump
</pre>

<p>Now that I have a dump with some nearly random SQL objects in it, let's filter
out the data of the tables named <em>reformat</em> and <em>parallel</em>.  We will take the
sample setup from the <code>pg_staging</code> project.  Going the quick route, we will
not even change the default sample database name that's used, which is
<code>postgres</code>.  After all, the <code>catalog</code> command of <code>pg_staging</code> that we're using
here is a <em>developer</em> command; you're supposed to be using <code>pg_staging</code> for a
lot more services than just this one.</p>

<pre class="src">
$ cp pg_staging/pg_staging.ini .
$ (echo <span style="color: #bc8f8f;">"schemas = public"</span>;
   echo <span style="color: #bc8f8f;">"tablename_nodata_regexp = parallel,reformat"</span>) \
  &gt;&gt; pg_staging.ini
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
   | python pg_staging/pg_staging.py -c pg_staging.ini
 ; Archive created at Mon Aug 29 17:17:49 2011
 ;
 ; [EDITED OUTPUT]
 ;
 ; Selected TOC Entries:
 ;
3; 2615 2200 SCHEMA - public postgres
1864; 0 0 COMMENT - SCHEMA public postgres
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1538; 1259 174951 TABLE public reformat dimitri
;1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
;1855; 0 174951 TABLE DATA public reformat dimitri
1834; 2606 174942 CONSTRAINT public parallel_pkey dimitri
1836; 2606 174950 CONSTRAINT public partial_pkey dimitri
1838; 2606 174955 CONSTRAINT public reformat_pkey dimitri
</pre>

<p>We can see that the objects are indeed skipped.  Now, here is how to
actually run the <code>pg_restore</code>:</p>

<pre class="src">
$ createdb foo
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
 |python pg_staging/pg_staging.py -c pg_staging.ini &gt; short.list
$ pg_restore -L short.list -d foo pgloader.dump
</pre>
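
<p>If you're curious what the filtering boils down to, here is a minimal sketch
in Python.  It is not the <code>pg_staging</code> code: it matches exact table names rather
than regular expressions, it assumes the entries look like the listing shown
above, and the <code>comment_out_table_data</code> function and <code>SAMPLE</code> data are made up
for the illustration:</p>

```python
import re


def comment_out_table_data(listing, table_names):
    """Prefix the TABLE DATA entries of the given tables with ';'.

    Assumes entries shaped like the sample output above:
    'dump_id; catalog_oid object_oid TABLE DATA schema table owner'.
    """
    skip = set(table_names)
    # Captures the schema and the table name of a TABLE DATA entry.
    entry = re.compile(r"^\d+; \d+ \d+ TABLE DATA (\S+) (\S+) ")
    result = []
    for line in listing.splitlines():
        match = entry.match(line)
        if match and match.group(2) in skip:
            line = ";" + line  # a leading ';' comments the entry out
        result.append(line)
    return "\n".join(result)


SAMPLE = """\
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
1855; 0 174951 TABLE DATA public reformat dimitri"""

print(comment_out_table_data(SAMPLE, ["parallel", "reformat"]))
```

<p>The <code>TABLE</code> entries are left alone, so the tables still get created, only
empty.</p>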

<p>The little bonus with using <code>pg_staging</code> is that when filtering out a <em>schema</em>
it will track all tables and triggers from that schema, and also the
functions used in the trigger definition.  Which is not as easy as it
sounds, believe me!</p>

<p>The practical use case is when filtering out <code>PGQ</code> and <code>Londiste</code>, then the <code>PGQ</code>
triggers will automatically be skipped by <code>pg_staging</code> rather than polluting
the <code>pg_restore</code> logs because the <code>CREATE TRIGGER</code> command could not find the
necessary implementation procedure.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 29 Aug 2011 18:05:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</guid>
</item>
<item>
  <title>Skytools, version 3</title>
  <link>http://tapoueh.org/blog/2011/08/26-skytools3.html</link>
  <description><![CDATA[<p>You can find <a href="http://packages.debian.org/source/experimental/skytools3">skytools3</a> in debian experimental already, it's in <em>release
candidate</em> status.  What's missing is the documentation, so here's an idea:
I'm going to write a blog post series about the new <a href="https://github.com/markokr/skytools">skytools</a> features, how to
use them, what they are good for, etc.  This first article of the series
will just list those new features.</p>

<p>Here are the slides from the <a href="http://www.char11.org/">CHAR(11)</a> talk I made last month, about that
very subject:</p>

<center>
<p><a class="image-link" href="../../../images/confs/CHAR_2011_Skytools3.pdf">
<img src="../../../images/confs/CHAR_2011_Skytools3.png"></a></p>
</center>


<p>The new version comes with a lot of new features.  <code>PGQ</code> now is able to
duplicate the queue events from one node to the next, so that it's able to
manage <em>switching over</em>.  To do that we have three types of nodes now, <em>root</em>,
<em>branch</em> and <em>leaf</em>.  <code>PGQ</code> also supports <em>cooperative consumers</em>, meaning that you
can share the processing load among many <em>consumers</em>, or workers.</p>

<p><code>Londiste</code> now benefits from the <em>switch over</em> feature, and is packed with new
little features like <code>add &lt;table&gt; --create</code>, the new <code>--trigger-flags</code> argument,
and the new <code>--handler</code> thing (to do e.g. partial table replication).  Let's
not forget the much awaited <code>execute &lt;script&gt;</code> command that allows you to include
<code>DDL</code> commands in the replication stream, nor the <em>parallel</em> <code>COPY</code> support that
will boost your initial setup.</p>

<p><code>walmgr</code> in the new version behaves correctly when using <a href="http://www.postgresql.org">PostgreSQL</a> 9.0.
Meaning that as soon as no more <em>WAL</em> files are available in the archives, it
returns an error code to the <em>archiver</em> so that the server switches to
<em>streaming</em> live from the <code>primary_conninfo</code>, then back to replaying the files
from the archive if the connection were to fail, etc.  All in all, it just
works.</p>
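
<p>The switch between archive and streaming relies on the contract PostgreSQL
sets for a <code>restore_command</code>: it has to exit with a non-zero status when the
requested file is not in the archive.  Here is a minimal sketch of that
contract in Python.  It is not the <code>walmgr</code> code; the <code>restore_wal</code> function
and the throwaway directory are made up for the illustration:</p>

```python
import os
import shutil
import tempfile


def restore_wal(archive_dir, wal_name, destination):
    """Copy one WAL file out of the archive; return a shell-style exit code.

    0 means the file was restored; 1 means it is not in the archive, which
    is the signal that lets the standby try streaming instead.
    """
    source = os.path.join(archive_dir, wal_name)
    if not os.path.isfile(source):
        return 1
    shutil.copyfile(source, destination)
    return 0


# Tiny demonstration with a throwaway directory standing in for the archive.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "logs.complete")
    os.mkdir(archive)
    with open(os.path.join(archive, "000000010000000200000092"), "w") as f:
        f.write("fake WAL contents")
    target = os.path.join(tmp, "RECOVERYXLOG")
    found = restore_wal(archive, "000000010000000200000092", target)
    missing = restore_wal(archive, "000000010000000200000093", target)
    print(found, missing)
```

<p>A command that returned success for a missing file would leave the standby
waiting on the archive rather than moving on to the <code>primary_conninfo</code>.</p>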

<p>Details to follow here, stay tuned!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 26 Aug 2011 21:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/26-skytools3.html</guid>
</item>
<item>
  <title>pgfincore in debian</title>
  <link>http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</link>
  <description><![CDATA[<p>As of pretty recently, <a href="http://villemain.org/projects/pgfincore">pgfincore</a> is now in debian, as you can see on its
<a href="http://packages.debian.org/sid/postgresql-9.0-pgfincore">postgresql-9.0-pgfincore</a> page.  The reason why it entered the <a href="http://www.debian.org/">debian</a>
archives is that it reached the <code>1.0</code> release!</p>

<p>Rather than talking about what <em>pgfincore</em> is all about (<em>A set of functions to
manage pages in memory from PostgreSQL</em>), I will talk about its packaging and
support as a <em>debian package</em>.  Here's the first example of a modern
multi-version packaging I have to offer.  <a href="https://github.com/dimitri/pgfincore/tree/master/debian">pgfincore packaging</a> supports
building for <code>8.4</code> and <code>9.0</code> and <code>9.1</code> out of the box, even if the only binary
you'll find in <em>debian</em> sid is the <code>9.0</code> one, as you can check on the
<a href="http://packages.debian.org/source/sid/pgfincore">pgfincore debian source package</a> page.</p>

<p>Also, this is the first package I've done properly using the newer version
of <a href="http://kitenet.net/~joey/code/debhelper/">debhelper</a>, which makes the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/rules">debian/rules</a> file easier than ever.  Let's have
a look at it:</p>

<pre class="src">
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">CURDIR</span>)
<span style="color: #b8860b;">TARGET</span> = $(<span style="color: #b8860b;">CURDIR</span>)/debian/pgfincore-%v
<span style="color: #b8860b;">PKGVERS</span> = $(<span style="color: #b8860b;">shell</span> dpkg-parsechangelog | awk -F <span style="color: #bc8f8f;">'[:-]'</span> <span style="color: #bc8f8f;">'/^Version:/ { print substr($$2, 2) }'</span>)
<span style="color: #b8860b;">EXCLUDE</span> = --exclude-vcs --exclude=debian

<span style="color: #7f007f;">include</span> <span style="color: #b8860b;">/usr/share/postgresql-common/pgxs_debian_control.mk</span>

<span style="color: #0000ff;">override_dh_auto_clean</span>: debian/control
        pg_buildext clean $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>
        dh_clean

<span style="color: #0000ff;">override_dh_auto_build</span>:
<span style="background-color: #ff69b4;">        #</span><span style="color: #b22222;"> </span><span style="color: #b22222;">build all supported version
</span>        pg_buildext build $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>

<span style="color: #0000ff;">override_dh_auto_install</span>:
<span style="background-color: #ff69b4;">        #</span><span style="color: #b22222;"> </span><span style="color: #b22222;">then install each of them
</span>        for v in <span style="color: #bc8f8f;">`pg_buildext supported-versions $(</span><span style="color: #b8860b;">SRCDIR</span><span style="color: #bc8f8f;">)`</span>; do \
                dh_install -ppostgresql-$$v-pgfincore ;\
        done

<span style="color: #0000ff;">orig</span>: clean
        cd .. &amp;&amp; tar czf pgfincore_$(<span style="color: #b8860b;">PKGVERS</span>).orig.tar.gz $(<span style="color: #b8860b;">EXCLUDE</span>) pgfincore

<span style="color: #0000ff;">%</span>:
        dh <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>
</pre>

<p>The <code>debian/rules</code> file is known to be the cornerstone of your debian
packaging, and usually is the most complex part of it.  It's a <code>Makefile</code> at
its heart, and we can see that thanks to the <code>debhelper</code> magic it's not that
complex to maintain anymore.</p>

<p>Then, this file relies on a bunch of helper commands, each of which
comes with its own man page and does a small part of the work.  The
overall idea behind <code>debhelper</code> is that what it does covers 90% of the cases
out there, and it's not aiming for more.  You have to <em>override</em> the parts where
its defaults are wrong.</p>

<p>Here for example the build system has to produce files for all three
supported versions of <a href="http://www.postgresql.org/">PostgreSQL</a>, which means invoking the same build system
three times with some changes in the <em>environment</em> (mainly setting the
<code>PG_CONFIG</code> variable correctly).  But even for that we have a <em>debian</em> facility,
that comes in the package <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a>, called <code>pg_buildext</code>.  As
long as your extension build system is <code>VPATH</code> friendly, it's all automated.</p>

<p>Please read that last sentence another time.  <code>VPATH</code> is the thing that allows
<code>Make</code> to find your source tree somewhere in the system, not in the current
working directory.  That allows you to cleanly build the same sources in
different build locations, which is exactly what we need here, and is
cleanly supported by <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PGXS</a>, the <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PostgreSQL Extension Building Infrastructure</a>.</p>

<p>Which means that the main <code>Makefile</code> of <em>pgfincore</em> had to be simplified, and
the code layout too.  Some advanced <code>Make</code> features such as <code>$(wildcard ...)</code>
and all will not work here.  See what we got at the end:</p>

<pre class="src">
ifndef VPATH
<span style="color: #b8860b;">SRCDIR</span> = .
else
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">VPATH</span>)
endif

<span style="color: #b8860b;">EXTENSION</span>    = pgfincore
<span style="color: #b8860b;">EXTVERSION</span>   = $(<span style="color: #b8860b;">shell</span> grep default_version $(<span style="color: #b8860b;">SRCDIR</span>)/$(<span style="color: #b8860b;">EXTENSION</span>).control | \
               sed -e <span style="color: #bc8f8f;">"s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/"</span>)

<span style="color: #b8860b;">MODULES</span>      = $(<span style="color: #b8860b;">EXTENSION</span>)
<span style="color: #b8860b;">DATA</span>         = sql/pgfincore.sql sql/uninstall_pgfincore.sql
<span style="color: #b8860b;">DOCS</span>         = doc/README.$(<span style="color: #b8860b;">EXTENSION</span>).rst

<span style="color: #b8860b;">PG_CONFIG</span>    = pg_config

<span style="color: #b8860b;">PG91</span>         = $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --version | grep -qE <span style="color: #bc8f8f;">"8\.|9\.0"</span> &amp;&amp; echo no || echo yes)

ifeq ($(<span style="color: #b8860b;">PG91</span>),yes)
<span style="color: #0000ff;">all</span>: pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql

<span style="color: #0000ff;">pgfincore--$(</span><span style="color: #0000ff;">EXTVERSION</span><span style="color: #0000ff;">).sql</span>: sql/pgfincore.sql
        cp $<span style="color: #5f9ea0;">&lt;</span> <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>

<span style="color: #b8860b;">DATA</span>        = pgfincore--unpackaged--$(<span style="color: #b8860b;">EXTVERSION</span>).sql pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
<span style="color: #b8860b;">EXTRA_CLEAN</span> = sql/$(<span style="color: #b8860b;">EXTENSION</span>)--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
endif

<span style="color: #b8860b;">PGXS</span> := $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --pgxs)
<span style="color: #7f007f;">include</span> $(<span style="color: #b8860b;">PGXS</span>)

<span style="color: #0000ff;">deb</span>:
        dh clean
        make -f debian/rules orig
        debuild -us -uc -sa
</pre>
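
<p>As a hedged aside, here is roughly what the <code>EXTVERSION</code> line above computes,
sketched in Python rather than in shell.  The function name
<code>read_default_version</code> and the sample control file content are made up for
illustration; only the <code>default_version = '...'</code> line format is taken from
the <code>sed</code> expression.</p>

```python
import re

# Same pattern as the sed expression: default_version = '<version>'
DEFAULT_VERSION_RE = re.compile(r"default_version\s*=\s*'([^']*)'")


def read_default_version(control_text):
    """Return the default_version found in a control file's text, or None."""
    for line in control_text.splitlines():
        match = DEFAULT_VERSION_RE.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical control file content, for illustration only.
sample_control = "comment = 'example extension'\ndefault_version = '1.0'\n"
print(read_default_version(sample_control))
```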

<p>No more <code>Make</code> magic to find source files.  Frankly though, when your sources
are 1 <code>c</code> file and 2 <code>sql</code> files, you don't need that much magic anyway.  You
just want to believe that a single generic <code>Makefile</code> will happily build any
project you throw at it, only requiring minor adjustment.  Well, the reality
is that you might need some more little adjustments if you want to benefit
from <code>VPATH</code> building, and having the binaries for <code>8.4</code> and <code>9.0</code> and <code>9.1</code> built
seamlessly in a simple loop.  Like we have here for <em>pgfincore</em>.</p>

<p>Now the <code>Makefile</code> still contains a little bit of magic, in order to parse the
extension version number from its <em>control file</em> and produce a <em>script</em> named
accordingly.  Then you'll notice a difference between the
<a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.1-pgfincore.install">postgresql-9.1-pgfincore.install</a> file and the
<a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.0-pgfincore.install">postgresql-9.0-pgfincore.install</a>.  We're just not shipping the same files:</p>

<pre class="src">
debian/pgfincore-9.0/pgfincore.so usr/lib/postgresql/9.0/lib
sql/pgfincore.sql usr/share/postgresql/9.0/contrib
sql/uninstall_pgfincore.sql usr/share/postgresql/9.0/contrib
</pre>

<p>As you can see here:</p>

<pre class="src">
debian/pgfincore-9.1/pgfincore.so usr/lib/postgresql/9.1/lib
debian/pgfincore-9.1/pgfincore*.sql usr/share/postgresql/9.1/extension
sql/pgfincore--unpackaged--1.0.sql usr/share/postgresql/9.1/extension
</pre>

<p>So, now that we uncovered all the relevant magic, packaging and building
your next extension so that it supports as many PostgreSQL major releases as
you need to will be that easy.</p>

<p>For reference, you might need to also tweak
<code>/usr/share/postgresql-common/supported-versions</code> so that it allows you to
build for all those versions you claim to support in the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/pgversions">debian/pgversions</a>
file.</p>

<pre class="src">
$ sudo dpkg-divert \
--divert /usr/share/postgresql-common/supported-versions.distrib \
--rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#! /bin/bash

dpkg -l postgresql-server-dev-* \
| awk -F '[ -]' '/^ii/ &amp;&amp; ! /server-dev-all/ {print $6}'
</pre>
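
<p>If the <code>awk</code> one-liner above looks cryptic, here is a rough Python
rendition of the same filtering, as a sketch only.  The function name
<code>supported_versions</code> and the sample <code>dpkg -l</code> lines are assumptions made up
for the example; the field splitting mirrors <code>-F '[ -]'</code> and <code>$6</code>.</p>

```python
import re


def supported_versions(dpkg_listing):
    """Roughly mimic: awk -F '[ -]' '/^ii/ && ! /server-dev-all/ {print $6}'."""
    versions = []
    for line in dpkg_listing.splitlines():
        if not line.startswith("ii") or "server-dev-all" in line:
            continue
        # awk fields are 1-based, so $6 is index 5 after splitting
        # on every single space or dash.
        fields = re.split(r"[ -]", line)
        if len(fields) > 5:
            versions.append(fields[5])
    return versions


# Made-up sample of `dpkg -l postgresql-server-dev-*` output lines.
sample = (
    "ii  postgresql-server-dev-8.4  8.4.8-1  development files\n"
    "ii  postgresql-server-dev-9.0  9.0.4-1  development files\n"
    "ii  postgresql-server-dev-all  118      extension build tool\n"
)
print(supported_versions(sample))
```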

<p>All of this will come in pretty handy when we finally sit down and work on a
way to provide binary packages for PostgreSQL and its extensions, and all
supported versions of those at that.  This very project is not dead, it's
just sleeping some more.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 19 Aug 2011 23:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</guid>
</item>
<item>
  <title>String escaping</title>
  <link>http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</link>
  <description><![CDATA[<p>Among the new features of the <a href="http://www.postgresql.org/about/news.1331">next release</a> of <a href="http://www.postgresql.org/">PostgreSQL</a>, the famous <code>9.1</code>,
one worth pointing out is the change in the default value of the
<code>standard_conforming_strings</code> setting, which becomes <em>true</em>.</p>

<p>Indeed, using escapes with the backslash character does not conform to
the SQL standard.  The <code>standard_conforming_strings</code> parameter controls
how PostgreSQL behaves when it reads a string literal in an SQL
query.</p>

<p>Let's look at a few examples:</p>

<pre class="src">
dimitri=# set standard_conforming_strings to true;
SET
dimitri=# select 'hop''';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# select 'hop\'';
dimitri'# ';
 ?column?
----------
 hop\';

(1 ligne)

dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# set standard_conforming_strings to false;
SET
dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# select 'hop\'';
ATTENTION:  utilisation non standard de \' dans une cha&#238;ne litt&#233;rale
LIGNE 1 : select 'hop\'';
                 ^
ASTUCE : Utilisez '' pour &#233;crire des guillemets dans une cha&#238;ne ou utilisez
la syntaxe de cha&#238;ne d'&#233;chappement (E'...').
 ?column?
----------
 hop'
(1 ligne)
</pre>

<p>There is a way to force PostgreSQL to accept backslash escapes
regardless of the value of <code>standard_conforming_strings</code>: the <code>E</code> prefix
notation.  It is recommended to always use it as soon as the string
contains backslashes used as escapes (of the single quote character, in
general).</p>
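
<p>To make the two notations concrete, here is a minimal Python sketch that
produces each form of literal.  The helper names <code>quote_standard</code> and
<code>quote_escape_syntax</code> are made up for this example and are not part of any
driver; note that the first form is only safe for values containing
backslashes when <code>standard_conforming_strings</code> is <code>on</code>.</p>

```python
def quote_standard(value):
    """Quote a string the SQL-standard way: double any single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_escape_syntax(value):
    """Quote a string with the E'...' syntax, escaping backslash and quote."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "E'" + escaped + "'"


print(quote_standard("hop'"))
print(quote_escape_syntax("hop'"))
```

<p>This is for illustration only: in application code, parameterized queries
remain the better choice.</p>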

<p>Finally, the <code>escape_string_warning</code> parameter disables warnings such as
the one shown in the last example above, when it is set to <code>off</code>.  Of
course, its default value is <code>on</code>.</p>

<p>Any occurrence of this <em>WARNING</em> while <code>escape_string_warning</code> is <code>on</code> means
that your application is not ready to migrate to <code>9.1</code> with its default
settings.  There are two possible actions: change the setting from its
new default value back to the previous one, or fix your
applications to use the <code>E</code> prefix wherever it is needed.</p>

<p>Setting <code>standard_conforming_strings</code> to <code>on</code> has another benefit besides
compliance with the SQL standard: protection against injections.  If it
is not possible to escape the single quote that terminates any
user-supplied string, it becomes hard to outsmart the
<em>parser</em>.  The best approach here remains, of course, to use parameterized queries,
to be covered in a future article.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 18 Aug 2011 19:01:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</guid>
</item>
<item>
  <title>el-get-list-packages</title>
  <link>http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</link>
  <description><![CDATA[<p>From the first days of <a href="../../../emacs/el-get.html">el-get</a> it was quite clear to me that we would reach
a point where users would want a nice listing including descriptions of the
packages, and a <em>major mode</em> allowing you to select packages to install,
remove and update.  It was also quite clear that I was not much interested
in doing it myself, even if I would appreciate having it done.</p>

<p>Well, the joy of Open Source &amp; Free Software (pick your own poison).
<a href="https://github.com/jglee1027">jglee1027</a> is this <em>GitHub</em> guy who did offer an implementation of said
facility, and who added descriptions for almost all of the now <code>402</code> recipes
that we have included with <a href="../../../emacs/el-get.html">el-get</a>.</p>

<p>Here's an image of what you get:</p>

<center>
<p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p>
</center>

<p>The packages with no description are fetched by <code>M-x el-get-emacswiki-refresh</code>,
which does not download all the <a href="http://emacswiki.org">emacswiki</a> content locally just to
parse each script's header and get a local description.  Maybe it's time to
ask for another page over there, like the <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el%7Ctar)(%5C.gz)%3F%24">emacswiki page index</a> but containing the
first line too.</p>

<p>For recipes we offer, this first line often looks like the following:</p>

<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">123-menu.el --- Simple menuing system, reminiscent of Lotus 123 in DOS
</span></pre>

<p>Of course some files over there are not following the stanza, but that would
be good enough already.</p>
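
<p>Extracting a description from such a first line is simple enough.  Here is
a small Python sketch of the idea, under the assumption that the line follows
the usual <code>;;; file --- description</code> layout; the <code>parse_header</code> name is made up
for the example, and this is not how <code>el-get</code> itself does it.</p>

```python
import re

# ";;; <file name> --- <one line description>"
HEADER_RE = re.compile(r"^;;;\s*(\S+)\s+---\s+(.*)$")


def parse_header(first_line):
    """Return (file_name, description), or None if the line does not match."""
    match = HEADER_RE.match(first_line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_header(
    ";;; 123-menu.el --- Simple menuing system, reminiscent of Lotus 123 in DOS"
))
```

<p>Files that do not follow the convention simply yield <code>None</code> here.</p>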

<p>All in all, I hope you enjoy <code>M-x el-get-list-packages</code>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 18 Aug 2011 18:10:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</guid>
</item>
<item>
  <title>pgloader tutorial</title>
  <link>http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</link>
  <description><![CDATA[<p>Drawing on the content of the articles in the <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> series, I took the
time to compile a complete tutorial, in English.  Judging by the
few emails I have been regularly receiving about <code>pgloader</code> for
some years now, this should help new users.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 15 Aug 2011 15:39:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</guid>
</item>
<item>
  <title>pgloader tutorial</title>
  <link>http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</link>
  <description><![CDATA[<p>To finish up the pgloader series, I've compiled all the information into a
single page, the long awaited <a href="http://tapoueh.org/pgsql/pgloader.html#sec5">pgloader tutorial</a>.  That should help lots of
users to get started with <code>pgloader</code>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 15 Aug 2011 15:33:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</guid>
</item>
<item>
  <title>pgloader constant cols</title>
  <link>http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</link>
  <description><![CDATA[<p>The previous articles in the <a href="../../../pgsql/pgloader.html">pgloader</a> series detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a>
then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, then what to expect from a <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a>
setup, and then <a href="http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html">pgloader reformating</a>.  Another need you might encounter when
you get to use <a href="../../../pgsql/pgloader.html">pgloader</a> is adding <em>constant</em> values into a table's column.</p>

<p>The basic situation where you need to do so is adding an <em>origin</em> field to
your table.  The value of that is not to be found in the data file itself,
typically, but known in the pgloader setup.  That could even be the <code>filename</code>
you are importing data from.</p>

<p>In <a href="../../../pgsql/pgloader.html">pgloader</a> that's called a <em>user defined column</em>.  Here's what the relevant
<a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> setup looks like:</p>

<pre class="src">
[<span style="color: #228b22;">udc</span>]
<span style="color: #b8860b;">table</span>           = udc
<span style="color: #b8860b;">format</span>          = text
<span style="color: #b8860b;">filename</span>        = udc/udc.data
<span style="color: #b8860b;">input_encoding</span>  = <span style="color: #bc8f8f;">'latin1'</span>
<span style="color: #b8860b;">field_sep</span>       = %
<span style="color: #b8860b;">columns</span>         = b:2, d:1, x:3, y:4
<span style="color: #b8860b;">udc_c</span>           = constant value
<span style="color: #b8860b;">copy_columns</span>    = b, c, d
</pre>

<p>And the data file is:</p>

<pre class="src">
1%5%foo%bar
2%10%bar%toto
3%4%toto%titi
4%18%titi%baz
5%2%baz%foo
</pre>

<p>And here's what the loaded table looks like:</p>

<pre class="src">
pgloader/examples$ pgloader -Tsc pgloader.conf udc
Table name        |    duration |    size |  copy rows |     errors
====================================================================
udc               |      0.201s |       - |          5 |          0

pgloader/examples$ psql --cluster 8.4/main pgloader -c <span style="color: #bc8f8f;">"table udc"</span>
 b  |       c        | d
----+----------------+---
  5 | constant value | 1
 10 | constant value | 2
  4 | constant value | 3
 18 | constant value | 4
  2 | constant value | 5
(5 rows)
</pre>

<p>Of course the configuration is not so straightforward as to process fields
in the data file in the order that they appear: after all, the
<a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file is also a test suite.</p>
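
<p>To spell out how the <code>columns</code>, <code>udc_c</code> and <code>copy_columns</code> settings above
combine, here is a minimal Python sketch of the mapping.  It is not pgloader's
actual code: the <code>build_rows</code> function and its parameters are made up for
illustration, and only the settings and data lines come from the example.</p>

```python
def build_rows(lines, field_sep, columns, constants, copy_columns):
    """Build rows in copy_columns order from separated text lines.

    columns maps a column name to its 1-based position in the data file;
    constants maps a column name to a fixed value.
    """
    rows = []
    for line in lines:
        fields = line.split(field_sep)
        row = []
        for name in copy_columns:
            if name in constants:
                row.append(constants[name])
            else:
                row.append(fields[columns[name] - 1])
        rows.append(tuple(row))
    return rows


data = ["1%5%foo%bar", "2%10%bar%toto"]
rows = build_rows(
    data,
    field_sep="%",
    columns={"b": 2, "d": 1, "x": 3, "y": 4},
    constants={"c": "constant value"},
    copy_columns=["b", "c", "d"],
)
print(rows)
```

<p>The first data line gives <code>b = 5</code>, <code>c = constant value</code> and <code>d = 1</code>, which is
the first row of the loaded table.</p>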

<p>Long story short: if you need to add some <em>constant</em> values into the target
table you're loading data to, <a href="../../../pgsql/pgloader.html">pgloader</a> will help you there!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 12 Aug 2011 11:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</guid>
</item>
<item>
  <title>Emacs Startup</title>
  <link>http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</link>
  <description><![CDATA[<p>Using <a href="http://www.gnu.org/software/emacs/">Emacs</a> we get to manage a larger and larger setup file (either <code>~/.emacs</code>
or <code>~/.emacs.d/init.el</code>), sometimes with lots of dependencies, and some
sub-files thanks to the <code>load</code> function or the <code>provide</code> and <code>require</code> mechanism.</p>

<p>Some users are even starting Emacs often enough for the startup time to be a
concern.  With an <code>emacs-uptime</code> (yes it's a command, you can <code>M-x
emacs-uptime</code>) of days to weeks (<code>10 days, 17 hours, 45 minutes, 34 seconds</code> as
of this writing), it's not something I really care about much.</p>

<p>But I know that some <a href="http://tapoueh.org/emacs/el-get.html">el-get</a> users still do care, and will use <code>el-get-is-lazy</code>
and do all their Emacs tweaking as <code>eval-after-load</code> blocks.  Trying to have
an idea of how much a <em>worst case</em> startup with <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> is, I have added the
following piece of <code>elisp</code> at the very end of my startup code:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:notify-startup-done</span> ()
  <span style="color: #bc8f8f;">" notify user that Emacs is now ready"</span>
  (el-get-notify
   <span style="color: #bc8f8f;">"Emacs is ready."</span>
   (format <span style="color: #bc8f8f;">"The init sequence took %g seconds."</span>
           (float-time (time-subtract after-init-time before-init-time)))))

(add-hook 'after-init-hook 'dim:notify-startup-done)
</pre>

<p>The <code>el-get-notify</code> function will adapt and either use the dbus implementation
from Emacs 24, or <a href="http://www.emacswiki.org/emacs/notify.el">notify.el</a> from <a href="http://www.emacswiki.org/">EmacsWiki</a> (just <code>M-x el-get-install</code> it if
you need it), or will use its own implementation of an Emacs <a href="http://growl.info/">Growl</a> client
(it's about 5 lines long), and barring all of that will use the <code>message</code>
function.</p>

<p>The reason I say <em>worst case</em> is that I have a lot of packages to initialize
at startup, and that I made absolutely no effort for this initialization to be
quick.  Still, my Emacs setup is taking about 20 seconds to boot.  Pretty
good I would say, for a weekly operation.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sat, 06 Aug 2011 14:58:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</guid>
</item>
<item>
  <title>pgloader reformating</title>
  <link>http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</link>
  <description><![CDATA[<p>Back to our series about <a href="../../../pgsql/pgloader.html">pgloader</a>.  The previous articles detailed
<a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, then what to expect from a
<a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> setup.  This article will detail how to <em>reformat</em> input
columns so that what <a href="http://www.postgresql.org/">PostgreSQL</a> sees is not what's in the data file, but the
result of a <em>transformation</em> from this data into something acceptable as an
<em>input</em> for the target data type.</p>

<p>Here's what the <a href="http://pgloader.projects.postgresql.org/">pgloader documentation</a> has to say about this <em>reformat</em>
parameter: <em>The value of this option is a comma separated list of columns to
rewrite, which are a colon separated list of column name, reformat module
name, reformat function name</em>.</p>

<p>And here's the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> section that deals with reformat:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">reformat</span>]
<span style="color: #eeeeec;">table</span>           = reformat
<span style="color: #eeeeec;">format</span>          = text
<span style="color: #eeeeec;">filename</span>        = reformat/reformat.data
<span style="color: #eeeeec;">field_sep</span>       = |
<span style="color: #eeeeec;">columns</span>         = id, timestamp
<span style="color: #eeeeec;">reformat</span>        = timestamp:mysql:timestamp
</pre>

<p>The documentation says some more about it, so check it out.  Also, the
<code>reformat_path</code> option (set either on the command line or in the configuration
file) is used to find the python module implementing the reformat function.
Please refer to the manual as to how to set it.</p>
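
<p>To make the <code>column:module:function</code> layout concrete, here is a minimal
sketch of how such a value could be split apart.  This is not pgloader's own
parser: the <code>parse_reformat</code> name is made up for the illustration, and only
the syntax quoted from the documentation above is assumed.</p>

```python
def parse_reformat(value):
    """Split 'col:module:function, ...' into (column, module, function) triples."""
    specs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        parts = [part.strip() for part in item.split(":")]
        if len(parts) != 3:
            raise ValueError("expected column:module:function, got %r" % item)
        specs.append(tuple(parts))
    return specs


# The value used in the [reformat] section shown above.
print(parse_reformat("timestamp:mysql:timestamp"))
```

<p>With the example section, that gives a single triple: the <code>timestamp</code> column
is rewritten by the <code>timestamp</code> function of the <code>mysql</code> module.</p>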

<p>Now, obviously, for the <em>reformat</em> to happen we need to write some code.
That's the whole point of the option: when you need something very specific
and are in a position to write the 5 lines of code needed to make it happen,
<a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> allows you to do just that.  Of course, the code needs to be
written in python here, so that you can even benefit from the
<a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> settings.</p>


<p>Let's see a reformat module example, as found in <a href="https://github.com/dimitri/pgloader/blob/master/reformat/mysql.py">reformat/mysql.py</a> in the
<code>pgloader</code> sources:</p>

<pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">Author: Dimitri Fontaine &lt;<a href="mailto:dim&#64;tapoueh.org">dim&#64;tapoueh.org</a>&gt;
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">
</span><span style="color: #888a85;"># </span><span style="color: #888a85;">pgloader mysql reformating module
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">
</span>
<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
    <span style="color: #ad7fa8; font-style: italic;">""" Reformat str as a PostgreSQL timestamp

    MySQL timestamps are like:  20041002152952
    We want instead this input: 2004-10-02 15:29:52
    """</span>
    <span style="color: #729fcf; font-weight: bold;">if</span> <span style="color: #729fcf;">len</span>(<span style="color: #729fcf;">input</span>) != 14:
        <span style="color: #eeeeec;">e</span> = <span style="color: #ad7fa8; font-style: italic;">"MySQL timestamp reformat input too short: %s"</span> % <span style="color: #729fcf;">input</span>
        reject.log(e, <span style="color: #729fcf;">input</span>)

    <span style="color: #eeeeec;">year</span>    = <span style="color: #729fcf;">input</span>[0:4]
    <span style="color: #eeeeec;">month</span>   = <span style="color: #729fcf;">input</span>[4:6]
    <span style="color: #eeeeec;">day</span>     = <span style="color: #729fcf;">input</span>[6:8]
    <span style="color: #eeeeec;">hour</span>    = <span style="color: #729fcf;">input</span>[8:10]
    <span style="color: #eeeeec;">minute</span>  = <span style="color: #729fcf;">input</span>[10:12]
    <span style="color: #eeeeec;">seconds</span> = <span style="color: #729fcf;">input</span>[12:14]

    <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % (year, month, day, hour, minute, seconds)
</pre>

<p>This reformat module will <em>transform</em> a <code>timestamp</code> representation as issued by
certain versions of MySQL into something that PostgreSQL is able to read as
a timestamp.</p>

<p>If you're in the camp that wants to write as little code as possible rather
than easy to read and maintain code, I guess you could write it this way
instead:</p>

<pre class="src">
<span style="color: #729fcf; font-weight: bold;">import</span> re
<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
    <span style="color: #ad7fa8; font-style: italic;">""" 20041002152952 -&gt; 2004-10-02 15:29:52 """</span>
    <span style="color: #eeeeec;">g</span> = re.match(r<span style="color: #ad7fa8; font-style: italic;">"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"</span>, <span style="color: #729fcf;">input</span>)
    <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % <span style="color: #729fcf;">tuple</span>([g.group(x+1) <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(6)])
</pre>
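
<p>Note that the short version raises an <code>AttributeError</code> on input that does not
match, because <code>re.match</code> then returns <code>None</code>.  A possible middle ground is
sketched below.  The <code>RecordingReject</code> class is a hypothetical stand-in for
the reject object, only there to make the example runnable: the sole thing taken
from the module above is the <code>reject.log(message, data)</code> call.  Returning
<code>None</code> for a rejected value is an assumption too.</p>

```python
import re

# Exactly 14 digits, split into year, month, day, hour, minute, seconds.
_MYSQL_TS = re.compile(r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$")


class RecordingReject(object):
    """Hypothetical stand-in for the reject object; it only records calls."""

    def __init__(self):
        self.entries = []

    def log(self, message, data):
        self.entries.append((message, data))


def timestamp(reject, input):
    """ 20041002152952 becomes 2004-10-02 15:29:52 """
    match = _MYSQL_TS.match(input)
    if match is None:
        reject.log("not a 14 digits MySQL timestamp: %s" % input, input)
        return None  # assumption: the caller handles rejected values
    return "%s-%s-%s %s:%s:%s" % match.groups()


reject = RecordingReject()
print(timestamp(reject, "20041002152952"))
print(timestamp(reject, "2004"))
```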

<p>Whenever you have an input file with data that PostgreSQL chokes upon, you
can solve this problem from <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> itself: no need to resort to scripting
and a pipeline of <a href="http://www.gnu.org/software/gawk/manual/gawk.html">awk</a> (which I use a lot in other cases, don't get me
wrong) or other tools.  See, you finally have an excuse to <a href="http://diveintopython.org/">Dive into Python</a>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 05 Aug 2011 11:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</guid>
</item>
<item>
  <title>Reformatting with pgloader</title>
  <link>http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</link>
  <description><![CDATA[<p>In our series of articles about <a href="http://tapoueh.org/tags/pgloader.html">pgloader</a>, the latest one details how
to use the <em>reformatting</em> feature of this tool.  In the context of an
<a href="http://fr.wikipedia.org/wiki/Extract_Transform_Load">ETL</a>, this corresponds to the <em>Transform</em> phase, which makes
<code>pgloader</code> a <em>simple</em> solution for your ETL needs.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 05 Aug 2011 11:26:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</guid>
</item>
<item>
  <title>See Tsung in action</title>
  <link>http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</link>
  <description><![CDATA[<p><a href="http://tsung.erlang-projects.org/">Tsung</a> is an open-source multi-protocol distributed load testing tool and a
mature project.  It's been available for about 10 years and is built with
the <a href="http://www.erlang.org/">Erlang</a> system.  It supports several protocols, including the <a href="http://www.postgresql.org/">PostgreSQL</a>
one.</p>

<p>When you want to benchmark your own application, to know how many more
clients it can handle or how much gain you will see with some new shiny
hardware, <a href="http://tsung.erlang-projects.org/">Tsung</a> is the tool to use.  It will allow you to <em>record</em> a number of
sessions then replay them at high scale.  <a href="http://pgfouine.projects.postgresql.org/tsung.html">pgfouine</a> supports Tsung and is
able to turn your PostgreSQL logs into Tsung sessions, too.</p>

<p>Tsung also got used in the video game world: their version of it is called
<a href="http://www.developer.unitypark3d.com/tools/utsung/">uTsung</a>, apparently using the <a href="http://www.developer.unitypark3d.com/index.html">uLink</a> game development facilities.  They even
made a video demo of uTsung that you might find interesting:</p>

<blockquote>
<p class="quoted"><a class="image-link" href="http://www.youtube.com/watch?v=rxBhqIP_7ls">
<img src="../../../images/utsung-demo.png"></a></p>
</blockquote>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 02 Aug 2011 10:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</guid>
</item>
<item>
  <title>Parallel pgloader</title>
  <link>http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</link>
  <description><![CDATA[<p>This article continues the series that began with <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then
detailed <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>.  We have some more fine points to talk about
here: today's article is about loading your data in parallel with <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>

<h2>several files at a time</h2>

<p class="first">Parallelism is implemented in 3 different ways in pgloader.  First, you can
load more than one file at a time thanks to the <code>max_parallel_sections</code>
parameter, which has to be set up in the <em>global section</em> of the configuration file.</p>

<p>This setting is quite simple and already allows the most common use case.</p>


<h2>several workers per file</h2>

<p class="first">The other use case is when you have huge files to load into the database.
Then you want to be able to have more than one process reading the file at
the same time.  Using <a href="../../../pgsql/pgloader.html">pgloader</a>, you already made the compromise of loading the
whole content in more than one transaction, so there's no further drawback
in having those multiple transactions per file spread over more than
one load <em>worker</em>.</p>

<p>There are basically two ways to split the work between several workers here,
and both are implemented in pgloader.</p>

<h3>N workers, N splits of the file</h3>

<pre class="src">
<span style="color: #eeeeec;">section_threads</span>    = 4
<span style="color: #eeeeec;">split_file_reading</span> = True
</pre>

<p>Set up this way, <a href="../../../pgsql/pgloader.html">pgloader</a> will launch 4 different <em>threads</em> (see the <strong>caveat</strong>
section of this article).  Each thread is then given a part of the input
data file and will run the whole usual pgloader processing on its own.  For
this to work you need to be able to <code>seek</code> in the input stream, which might
not always be convenient.</p>
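
<p>To see why <code>seek</code> is needed, here is a rough sketch of one way to cut a file
into N byte ranges that each start on a line boundary.  It is an illustration
only, not pgloader's code: the <code>split_offsets</code> name is made up, and the sketch
ignores escaped newlines inside a field, which a real loader would have to
handle.</p>

```python
import io


def split_offsets(f, n):
    """Return n (start, end) byte ranges of f, each beginning on a new line.

    f must be a seekable binary file object.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    bounds = [0]
    for i in range(1, n):
        f.seek(size * i // n)  # jump near the i-th fraction of the file
        f.readline()           # then move on to the end of the current line
        bounds.append(f.tell())
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


data = io.BytesIO(b"a\nbb\nccc\ndddd\n")
print(split_offsets(data, 2))
```

<p>Each worker would then seek to its own start offset and stop reading at its end
offset.  Some ranges may be empty when the file has fewer lines than workers.</p>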


<h3>one reader, N workers</h3>

<pre class="src">
<span style="color: #eeeeec;">section_threads</span>    = 4
<span style="color: #eeeeec;">split_file_reading</span> = False
<span style="color: #eeeeec;">rrqueue_size</span>       = 5000
</pre>

<p>With such a setup, <a href="../../../pgsql/pgloader.html">pgloader</a> will start 4 different worker <em>threads</em> that will
receive the data input in an internal <a href="http://docs.python.org/library/collections.html#deque-objects">python queue</a>.  Another active <em>thread</em>
will be responsible for reading the input file and filling the queues in a
<em>round robin</em> fashion, but will hand all the processing of the data to each
worker, of course.</p>
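
<p>The one reader, N workers layout can be sketched in a few lines.  This is
not pgloader's implementation: it uses the standard <code>queue.Queue</code> class
(Python 3 module name) with one bounded queue per worker, the
<code>round_robin_load</code> name is made up, and the upper-casing stands in for the real
per-line processing.</p>

```python
import itertools
import queue
import threading


def round_robin_load(lines, n_workers=4, queue_size=5000):
    """Feed lines to n_workers threads in turn; return what each one processed."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(n_workers)]
    results = [[] for _ in range(n_workers)]

    def worker(index):
        while True:
            item = queues[index].get()
            if item is None:  # sentinel: the reader is done
                break
            results[index].append(item.upper())  # placeholder processing

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for thread in threads:
        thread.start()

    # The reader: hand each line to the next queue, round robin.
    for target, line in zip(itertools.cycle(queues), lines):
        target.put(line)
    for target in queues:
        target.put(None)
    for thread in threads:
        thread.join()
    return results


print(round_robin_load(["a", "b", "c", "d", "e"], n_workers=2))
```

<p>The bounded queues play the role of <code>rrqueue_size</code> here: when a worker lags
behind, the reader blocks instead of filling the memory.</p>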


<h3>how many threads?</h3>

<p class="first">If you're using a mix and match of <code>max_parallel_sections</code> and <code>section_threads</code>
with <code>split_file_reading</code> set to <code>True</code> or <code>False</code>, it's not easy to know exactly
how many <em>threads</em> will run at any time in the loading.  How to ascertain
which sections will run in parallel when it depends on the timing of the
loading?</p>

<p>The advice here is the usual one: don't overestimate the capabilities of
your system unless you are in a position to check before by doing trial
runs.</p>



<h2>caveat</h2>

<p class="first">The current implementation of all the parallelism in <a href="../../../pgsql/pgloader.html">pgloader</a> has been done with
the <a href="http://docs.python.org/library/threading.html">python threading</a> API.  While this is easy enough to use when you want to
exchange data between threads, it suffers from the
<a href="http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock">Global Interpreter Lock</a> issue.  This means that while the code is written to do its
processing in parallel, the <em>runtime</em> only runs one thread of python code at a time.  You might still benefit
from the current implementation if you have hard to parse files, or custom
reformat modules that are part of the loading bottleneck.</p>


<h2>future</h2>

<p class="first">The solution would be to switch to using the newer <a href="http://docs.python.org/library/multiprocessing.html">python multiprocessing</a>
API, and some preliminary work has been done in pgloader to allow for that.
If you're interested in real parallel bulk loading, <a href="dim%20(at)%20tapoueh%20(dot)%20org">contact-me</a>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 01 Aug 2011 12:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</guid>
</item>
<item>
  <title>Configuring pgloader</title>
  <link>http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</link>
  <description><![CDATA[<p>I just published a post in English titled <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, which
complements the ongoing writing of a more complete <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader tutorial</a>.  Once
again, I have not taken the time to translate this article into French before
knowing whether that would interest you, dear readers.  If it does, just
let me know by mail (or <em>courriel</em>, after all) so that I add it to
my <code>TODO</code> list.</p>

<p>Happy reading!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</guid>
</item>
<item>
  <title>How to Setup pgloader</title>
  <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link>
  <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>; let's now see how to
write the <code>pgloader.conf</code> that instructs <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> about what to do.</p>

<p>This file is expected in the <code>INI</code> format, with a <em>global</em> section then one
section per file you want to import.  The <em>global</em> section defines some
default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p>

<p>The configuration setup is fully documented on the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> that
you can even easily find online.  Like all <em>unix</em> style man pages, though, it's
more a complete reference than introductory material.  Let's review.</p>

<h2>global section</h2>

<p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file of the source
files.  Well, some options are <em>debugger</em> only options, really, so I changed
their value so that what you see here is a better starting point.</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader

<span style="color: #eeeeec;">log_file</span>            = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span>    = INFO
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">lc_messages</span>         = C
<span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span>
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on
<span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB

<span style="color: #eeeeec;">copy_every</span>      = 15000

<span style="color: #eeeeec;">null</span>         = <span style="color: #ad7fa8; font-style: italic;">""</span>
<span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>

<span style="color: #eeeeec;">max_parallel_sections</span> = 4
</pre>

<p>You don't see all the connection setup, here <code>base</code> was enough.  You might
need to set up <code>host</code>, <code>port</code> and <code>user</code>, and maybe even <code>pass</code>, too, to be able to
connect to the PostgreSQL server.</p>

<p>The logging options allow you to set a file where to log all <code>pgloader</code>
messages, which are categorized as either <code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code> or
<code>CRITICAL</code>.  The options <code>log_min_messages</code> and <code>client_min_messages</code> are another
good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a> and allow you to set up the level of chatter
you want to see in the log file and on the interactive console (standard
output and standard error streams), respectively.</p>

<p>Please note that the <code>DEBUG</code> level will produce more than 3 times as much data
as the data file you're importing.  Unless you're a <code>pgloader</code> contributor, or
helping one to, well, <em>debug</em> it, you want to avoid setting the log chatter to
this value.</p>

<p>The <code>client_encoding</code> will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it
establishes.  You can now even set any parameter you want by using the
<code>pg_option_parameter_name</code> magic settings.  Note that the command line option
<code>--pg-options</code> (or <code>-o</code> for brevity) allows you to override that.</p>
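
<p>As an illustration of the <code>pg_option_parameter_name</code> convention, the sketch
below turns such entries into <code>SET</code> statements.  It only shows the naming
idea and is not how pgloader itself issues the commands: the
<code>pg_option_statements</code> name and the quoting strategy are assumptions.</p>

```python
def pg_option_statements(settings):
    """Turn pg_option_* entries into SET statements (illustration only)."""
    prefix = "pg_option_"
    statements = []
    for key in sorted(settings):
        if not key.startswith(prefix):
            continue
        # The parameter name comes from the configuration file and is
        # trusted as is in this sketch.
        name = key[len(prefix):]
        # Drop optional surrounding quotes, then quote as a SQL literal.
        value = settings[key].strip().strip("'\"")
        statements.append("SET %s TO '%s';" % (name, value.replace("'", "''")))
    return statements


for statement in pg_option_statements({
        "base": "pgloader",
        "pg_option_client_encoding": "'utf-8'",
        "pg_option_work_mem": "128MB"}):
    print(statement)
```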

<p>Then, the <code>copy_every</code> parameter is set to <code>5</code> in the examples, because the test
files contain fewer than 10 lines and we want to test several <em>batches</em>
of commits when using them.  So for your real loading, stick to the default
value (<code>10 000</code> lines per <code>COPY</code> command), or more.  You can play with this
parameter: depending on the network (or local access) and disk system you're
using, you might see improvements by reducing it or enlarging it.  There's not
so much a theory of operation as empirical testing and setting here.  For a
one-off operation, just remove the lines from the configuration.</p>
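
<p>The batching idea behind <code>copy_every</code> fits in a small generator.  This is a
sketch under the assumption that each batch maps to one <code>COPY</code> command; the
<code>batches</code> name is made up and nothing here talks to a database.</p>

```python
import itertools


def batches(rows, copy_every=10000):
    """Yield lists of at most copy_every rows, in input order."""
    iterator = iter(rows)
    while True:
        batch = list(itertools.islice(iterator, copy_every))
        if not batch:
            return
        yield batch


# With copy_every = 5 as in the examples, 12 rows make 3 batches.
for batch in batches(range(12), copy_every=5):
    print(len(batch))
```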

<p>The parameters <code>null</code> and <code>empty_string</code> are related to interpreting the data in
the text or <code>csv</code> files you have, and the documentation is quite clear about
them.  Note that you have global settings and per-section settings too.</p>
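
<p>With the global section shown above, an empty field is read as <code>NULL</code> and a
backslash followed by a space is read as an empty string.  Here is a minimal
sketch of that interpretation; the <code>interpret</code> name is made up, and treating the
quotes of the configuration values as mere delimiters is an assumption.</p>

```python
def interpret(field, null="", empty_string="\\ "):
    """Map a raw text field to None (NULL), '' or the field itself."""
    if field == null:
        return None
    if field == empty_string:
        return ""
    return field


print(repr(interpret("")))       # the null marker
print(repr(interpret("\\ ")))    # the empty_string marker
print(repr(interpret("clean")))  # regular data goes through untouched
```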

<p>The last parameter of this example, <code>max_parallel_sections</code>, is detailed later
in the article.</p>


<h2>files section</h2>

<p class="first">After the <em>global</em> section come as many sections as you have files to load,
plus the <em>template</em> sections, which are only there so that you can share a
bunch of parameters between several sections.  Picture a series of data files
all of the same format, where the only thing that changes is the <code>filename</code>.
Use a template section in this case!</p>

<p>Let's see an example:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span>     = True
<span style="color: #eeeeec;">format</span>       = text
<span style="color: #eeeeec;">datestyle</span>    = dmy
<span style="color: #eeeeec;">field_sep</span>    = |
<span style="color: #eeeeec;">trailing_sep</span> = True

[<span style="color: #8ae234; font-weight: bold;">simple</span>]
<span style="color: #eeeeec;">use_template</span>    = simple_tmpl
<span style="color: #eeeeec;">table</span>           = simple
<span style="color: #eeeeec;">filename</span>        = simple/simple.data
<span style="color: #eeeeec;">columns</span>         = a:1, b:3, c:2
<span style="color: #eeeeec;">skip_head_lines</span> = 2

<span style="color: #888a85;"># </span><span style="color: #888a85;">those reject settings are defaults one
</span><span style="color: #eeeeec;">reject_log</span>   = /tmp/simple.rej.log
<span style="color: #eeeeec;">reject_data</span>  = /tmp/simple.rej

[<span style="color: #8ae234; font-weight: bold;">partial</span>]
<span style="color: #eeeeec;">table</span>        = partial
<span style="color: #eeeeec;">format</span>       = text
<span style="color: #eeeeec;">filename</span>     = partial/partial.data
<span style="color: #eeeeec;">field_sep</span>    = %
<span style="color: #eeeeec;">columns</span>      = *
<span style="color: #eeeeec;">only_cols</span>    = 1-3, 5
</pre>

<p>That's 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections
so that we see one template example.  Of course, with a single section
using it, the template is only here for the sake of the example.</p>
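
<p>To picture what a template does, here is a small sketch that merges a template
section into the section using it, with the standard <code>configparser</code> module
(Python 3 name).  It is not pgloader's code: the <code>resolve</code> function is made up,
and letting the section's own options win over the template's is an assumption.</p>

```python
import configparser

CONF = """
[simple_tmpl]
template     = True
format       = text
field_sep    = |

[simple]
use_template = simple_tmpl
table        = simple
filename     = simple/simple.data
"""


def resolve(parser, section):
    """Return the options of section, completed with those of its template."""
    options = {}
    if parser.has_option(section, "use_template"):
        options.update(parser.items(parser.get(section, "use_template")))
    options.update(parser.items(section))  # the section's own options win
    options.pop("template", None)
    options.pop("use_template", None)
    return options


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONF)
print(sorted(resolve(parser, "simple").items()))
```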


<h2>data file format</h2>

<p class="first">The most important setting that you have to care about is the file format.
Your choice here is either <code>text</code>, <code>csv</code> or <code>fixed</code>.  Mostly, what we are given
nowadays is <code>csv</code>.  You might remember having read that the nice thing about
standards is that there's so many to choose from... well, the <code>csv</code> land is
one where it's pretty hard to find different producers that understand it
the same way.</p>

<p>So when you fail to have pgloader load your <em>mostly csv</em> files with a <code>csv</code>
setup, it's time to consider using <code>text</code> instead.  The <code>text</code> file format
accepts a lot of tunables to adapt to crazy situations, but is all <code>python</code>
code, whereas the <a href="http://docs.python.org/library/csv.html">python csv module</a> is a C-coded module and thus more efficient.</p>

<p>If you're wondering what kind of format we're talking about here, here's the
<a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using <code>^</code> (caret) as
the field separator:</p>

<pre class="src">
1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\ ^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre>

<p>And here's what we get by loading that:</p>

<pre class="src">
pgloader/examples$ pgloader -c pgloader.conf -s cluttered
Table name        |    duration |    size |  copy rows |     errors
====================================================================
cluttered         |      0.193s |       - |          6 |          0

pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
                                   : line text with
                                   : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
                                   : a last multiline
                                   : escaped line
                                   : with a missing
                                   : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
                                   : escaped value
                                   :
                                   : with empty line
                                   : embeded
(6 rows)
</pre>

<p>So when you have this kind of data, well, it might be that <code>pgloader</code> is still
able to help you!</p>

<p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> to know about each and every parameter
that you can define and the values accepted, etc.  And the <em>fixed</em> data format
is to be used when you're not given a field separator but field positions in
the file.  Yes, we still encounter those from time to time.  Who needs
variable size storage, after all?</p>
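
<p>For the <em>fixed</em> case, cutting a line by positions boils down to string
slicing.  The sketch below uses a made-up <code>(name, start, length)</code> layout and a
made-up <code>split_fixed</code> function; it does not reproduce the pgloader option
syntax, for which the man page is the reference.  Stripping the padding is an
assumption as well.</p>

```python
def split_fixed(line, positions):
    """Cut line into named fields; positions holds (name, start, length)."""
    return dict((name, line[start:start + length].strip())
                for name, start, length in positions)


layout = [("id", 0, 4), ("name", 4, 10), ("year", 14, 4)]
print(split_fixed("0001Alice     2011", layout))
```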
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid>
</item>
<item>
  <title>Emacs ANSI colors</title>
  <link>http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</link>
  <description><![CDATA[<p><a href="http://tapoueh.org/emacs/index.html">Emacs</a> comes with a pretty good implementation of a terminal emulator, <code>M-x
term</code>.  Well not that good actually, but given what I use it for, it's just
what I need.  Particularly if you add to that my <a href="http://tapoueh.org/emacs/cssh.html">cssh</a> tool, so that
connecting with <code>ssh</code> to a remote host is just a <code>C-=</code> (which runs the command
<code>cssh-term-remote-open</code>) away, and completes on the host name thanks to
<code>~/.ssh/known_hosts</code>.</p>

<p>Now, a problem that I still had to solve was the colors used in the
terminal.  As I'm using the <em>tango</em> color theme for emacs, the default <em>ANSI</em>
palette's blue color was not readable.  Here's how to fix that:</p>

<pre class="src">
   (<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ansi-color</span>)
   (setq ansi-color-names-vector
         (vector (frame-parameter nil 'background-color)
               <span style="color: #ad7fa8; font-style: italic;">"#f57900"</span> <span style="color: #ad7fa8; font-style: italic;">"#8ae234"</span> <span style="color: #ad7fa8; font-style: italic;">"#edd400"</span> <span style="color: #ad7fa8; font-style: italic;">"#729fcf"</span>
               <span style="color: #ad7fa8; font-style: italic;">"#ad7fa8"</span> <span style="color: #ad7fa8; font-style: italic;">"cyan3"</span> <span style="color: #ad7fa8; font-style: italic;">"#eeeeec"</span>)
         ansi-term-color-vector ansi-color-names-vector
         ansi-color-map (ansi-color-make-color-map))
</pre>

<p>Now your colors in an emacs terminal are easy to read, as you can see:</p>

<blockquote>
<p class="quoted"><img src="../../../images/emacs-tango-term-colors.png" alt=""></p>
</blockquote>

<p>Hope you enjoy!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 29 Jul 2011 10:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</guid>
</item>
<item>
  <title>How to Setup pgloader</title>
  <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link>
  <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>, let's now see how to
write the <code>pgloader.conf</code> that instructs <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> about what to do.</p>

<p>This file is expected in the <code>INI</code> format, with a <em>global</em> section then one
section per file you want to import.  The <em>global</em> section defines some
default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p>

<p>The configuration setup is fully documented on the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> that
you can even easily find online.  Like all <em>unix</em> style man pages, though, it's
more a complete reference than introductory material.  Let's review.</p>

<h2>global section</h2>

<p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file of the source
files.  Well, some options are <em>debugger</em> only options, really, so I changed
their value so that what you see here is a better starting point.</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader

<span style="color: #eeeeec;">log_file</span>            = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span>    = INFO
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">lc_messages</span>         = C
<span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span>
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on
<span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB

<span style="color: #eeeeec;">copy_every</span>      = 15000

<span style="color: #eeeeec;">null</span>         = <span style="color: #ad7fa8; font-style: italic;">""</span>
<span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>

<span style="color: #eeeeec;">max_parallel_sections</span> = 4
</pre>

<p>You don't see the whole connection setup here, as <code>base</code> was enough.  You might
need to set up <code>host</code>, <code>port</code> and <code>user</code>, and maybe even <code>pass</code>, too, to be able to
connect to the PostgreSQL server.</p>
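<p>To make the connection settings concrete, here is a minimal Python sketch
that reads a <code>[pgsql]</code> section and turns it into a libpq style connection
string.  The option names <code>base</code>, <code>host</code>, <code>port</code>, <code>user</code> and <code>pass</code> come from the
text above; the sample values and the helper names <code>connection_settings</code> and
<code>as_dsn</code> are assumptions made for the illustration, not pgloader's own code.</p>

```python
import configparser

# Hypothetical sample values: only the option names come from the article.
SAMPLE = """
[pgsql]
host = localhost
port = 5432
base = pgloader
user = dim
pass = secret
"""


def connection_settings(conf_text):
    """Map the [pgsql] connection options to libpq keyword names."""
    # interpolation=None keeps literal % signs, which a pgloader.conf
    # may contain (a section further down uses % as its field separator).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["pgsql"]

    settings = {"dbname": section["base"]}
    mapping = (("host", "host"), ("port", "port"),
               ("user", "user"), ("pass", "password"))
    for conf_key, dsn_key in mapping:
        if conf_key in section:
            settings[dsn_key] = section[conf_key]
    return settings


def as_dsn(settings):
    """Render the settings as a keyword=value connection string."""
    return " ".join(f"{key}={value}" for key, value in settings.items())


print(as_dsn(connection_settings(SAMPLE)))
# dbname=pgloader host=localhost port=5432 user=dim password=secret
```

<p>With only <code>base</code> set, as in the example file, the sketch yields just
<code>dbname=pgloader</code> and leaves every other setting to the client defaults.</p>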

<p>The logging options allow you to set a file where to log all <code>pgloader</code>
messages, which are categorized as either <code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code> or
<code>CRITICAL</code>.  The options <code>log_min_messages</code> and <code>client_min_messages</code> are another
good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a>.  They allow you to set the level of chatter
you want to see in the log file and on the interactive console (standard
output and standard error streams), respectively.</p>
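<p>The two thresholds map nicely onto the standard Python <code>logging</code> module, which
uses the same level names.  The following is only a sketch of the idea, assuming
that <code>log_min_messages</code> filters the log file and <code>client_min_messages</code> filters
the console; the function name <code>setup_logging</code> and the logger name are made up
for the example.</p>

```python
import logging
import sys


def setup_logging(log_file, log_min_messages="INFO",
                  client_min_messages="WARNING"):
    """Return a logger with one threshold per destination."""
    logger = logging.getLogger("pgloader-sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)  # accept everything, handlers filter
    logger.handlers.clear()

    # The log file receives messages at log_min_messages and above.
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(getattr(logging, log_min_messages))

    # The console receives messages at client_min_messages and above.
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(getattr(logging, client_min_messages))

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

<p>Calling <code>setup_logging("/tmp/pgloader.log")</code> with the values from the example
configuration would send <code>INFO</code> messages to the file only, while <code>WARNING</code> and
above would reach both the file and the console.</p>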

<p>Please note that the <code>DEBUG</code> level will produce more than 3 times as much data
as the data file you're importing.  Unless you're a <code>pgloader</code> contributor, or
helping one to, well, <em>debug</em> it, you want to avoid setting the log chatter to
this value.</p>

<p>The <code>client_encoding</code> will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it
establishes.  You can now even set any parameter you want by using the
<code>pg_option_parameter_name</code> magic settings.  Note that the command line option
<code>--pg-options</code> (or <code>-o</code> for brevity) allows you to override that.</p>

<p>Then, the <code>copy_every</code> parameter is set to <code>5</code> in the examples, because the test
files contain fewer than 10 lines and we want to test several <em>batches</em>
of commits when using them.  So for your real loading, stick to the default
parameter (<code>10 000</code> lines per <code>COPY</code> command), or more.  You can play with this
parameter: depending on the network (or local access) and disk system you're
using, you might see improvements by reducing it or enlarging it.  There's not
so much a theory of operation here as empirical testing and setting.  For a
one-off operation, just remove the line from the configuration.</p>
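<p>To picture what <code>copy_every</code> controls, here is a small Python sketch that cuts
a stream of rows into batches, each of which would be sent in its own <code>COPY</code>
command.  The helper name <code>batches</code> is an assumption for the illustration and
this is not pgloader's actual code.</p>

```python
from itertools import islice


def batches(rows, copy_every=10000):
    """Yield lists of at most copy_every rows, in input order."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, copy_every))
        if not batch:
            return
        yield batch


# A 7 line test file with copy_every = 5 is loaded in two batches.
sizes = [len(batch) for batch in batches(range(7), copy_every=5)]
print(sizes)  # [5, 2]
```

<p>That is why the example files use such a small value: with fewer than 10 lines,
anything larger would never exercise more than one batch.</p>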

<p>The parameters <code>null</code> and <code>empty_string</code> are related to interpreting the data in
the text or <code>csv</code> files you have, and the documentation is quite clear about
them.  Note that they can be set both globally and per section.</p>
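<p>As a rough illustration of those two markers, the following Python sketch
applies the values from the <em>global</em> section above (<code>null = ""</code> and
<code>empty_string = "\ "</code>) to a single raw field.  The function name
<code>interpret_field</code> is hypothetical, and checking the null marker first is an
assumption of this sketch.</p>

```python
def interpret_field(raw, null="", empty_string="\\ "):
    """Turn a raw text field into None, an empty string or itself."""
    if raw == null:
        return None       # will be loaded as SQL NULL
    if raw == empty_string:
        return ""         # will be loaded as an empty string
    return raw


print(interpret_field(""))       # None
print(interpret_field("clean"))  # clean
```

<p>With these settings a field that is really empty in the file becomes <code>NULL</code>,
and a backslash followed by a space is the way to ask for an empty string.</p>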

<p>The last parameter of this example, <code>max_parallel_sections</code>, is detailed later
in the article.</p>


<h2>files section</h2>

<p class="first">After the <em>global</em> section come as many sections as you have files to load,
plus the <em>template</em> sections, which are only there so that you can share a
bunch of parameters between several sections.  Picture a series of data files
all of the same format, where the only thing that changes is the <code>filename</code>.
Use a template section in this case!</p>

<p>Let's see an example:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span>     = True
<span style="color: #eeeeec;">format</span>       = text
<span style="color: #eeeeec;">datestyle</span>    = dmy
<span style="color: #eeeeec;">field_sep</span>    = |
<span style="color: #eeeeec;">trailing_sep</span> = True

[<span style="color: #8ae234; font-weight: bold;">simple</span>]
<span style="color: #eeeeec;">use_template</span>    = simple_tmpl
<span style="color: #eeeeec;">table</span>           = simple
<span style="color: #eeeeec;">filename</span>        = simple/simple.data
<span style="color: #eeeeec;">columns</span>         = a:1, b:3, c:2
<span style="color: #eeeeec;">skip_head_lines</span> = 2

<span style="color: #888a85;"># </span><span style="color: #888a85;">those reject settings are defaults one
</span><span style="color: #eeeeec;">reject_log</span>   = /tmp/simple.rej.log
<span style="color: #eeeeec;">reject_data</span>  = /tmp/simple.rej

[<span style="color: #8ae234; font-weight: bold;">partial</span>]
<span style="color: #eeeeec;">table</span>        = partial
<span style="color: #eeeeec;">format</span>       = text
<span style="color: #eeeeec;">filename</span>     = partial/partial.data
<span style="color: #eeeeec;">field_sep</span>    = %
<span style="color: #eeeeec;">columns</span>      = *
<span style="color: #eeeeec;">only_cols</span>    = 1-3, 5
</pre>

<p>That's 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections
so that we see one template example.  Of course, with a single section
using it, the template is only here for the sake of the example.</p>
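<p>One way to picture how a section picks up its template is the Python sketch
below.  It reuses the <code>simple_tmpl</code> and <code>simple</code> sections shown above; the
helper name <code>resolve_section</code> is hypothetical, and the rule that a section's
own settings win over the template's is an assumption of this sketch rather
than documented pgloader behaviour.</p>

```python
import configparser

CONF = """
[simple_tmpl]
template     = True
format       = text
datestyle    = dmy
field_sep    = |
trailing_sep = True

[simple]
use_template = simple_tmpl
table        = simple
filename     = simple/simple.data
columns      = a:1, b:3, c:2
"""


def resolve_section(parser, name):
    """Return the options of a section, merged with its template if any."""
    section = dict(parser[name])
    template_name = section.pop("use_template", None)
    if template_name is None:
        return section
    resolved = dict(parser[template_name])
    resolved.pop("template", None)  # marker only, not a loading option
    resolved.update(section)        # assumed: the section's settings win
    return resolved


# interpolation=None so that a literal % (as in field_sep = %) is accepted.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONF)
options = resolve_section(parser, "simple")
print(options["field_sep"], options["table"])  # | simple
```

<p>A section without <code>use_template</code>, such as <code>partial</code> above, simply comes back
unchanged.</p>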

<h3>data file format</h3>

<p class="first">The most important setting that you have to care about is the file format.
Your choice here is either <code>text</code>, <code>csv</code> or <code>fixed</code>.  Mostly, what we are given
nowadays is <code>csv</code>.  You might remember having read that the nice thing about
standards is that there are so many to choose from... well, the <code>csv</code> land is
one where it's pretty hard to find two producers that understand it
the same way.</p>

<p>So when you fail to have pgloader load your <em>mostly csv</em> files with a <code>csv</code>
setup, it's time to consider using <code>text</code> instead.  The <code>text</code> file format
accepts a lot of tunables to adapt to crazy situations, but is all <code>python</code>
code, whereas the <a href="http://docs.python.org/library/csv.html">python csv module</a> is coded in C and more efficient.</p>

<p>If you're wondering what kind of format we're talking about here, here's the
<a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using <code>^</code> (caret) as
the field separator:</p>

<pre class="src">
1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\ ^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre>

<p>And here's what we get by loading that:</p>

<pre class="src">
pgloader/examples$ pgloader -c pgloader.conf -s cluttered
Table name        |    duration |    size |  copy rows |     errors
====================================================================
cluttered         |      0.193s |       - |          6 |          0

pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
                                   : line text with
                                   : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
                                   : a last multiline
                                   : escaped line
                                   : with a missing
                                   : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
                                   : escaped value
                                   :
                                   : with empty line
                                   : embeded
(6 rows)
</pre>

<p>So when you have this kind of data, well, it might be that <code>pgloader</code> is still
able to help you!</p>
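<p>To give an idea of how such input can be put back together, here is a Python
sketch that rebuilds logical rows by counting field separators: with a trailing
separator, a row of 3 fields is complete once 3 separators have been seen, and
that also copes with the line where the escaping is missing.  This is an
assumption about one possible approach, not a description of pgloader's
implementation; the function name <code>read_rows</code> is made up, and a separator
embedded in the data is not handled.</p>

```python
def read_rows(lines, field_sep="^", n_fields=3):
    """Yield one list of fields per logical row of a cluttered file."""
    pending = ""
    for line in lines:
        pending += line
        # With a trailing separator, a complete row holds n_fields separators.
        if pending.count(field_sep) == n_fields:
            fields = pending.rstrip("\n").split(field_sep)[:n_fields]
            # An escaped newline (backslash, newline) is a plain newline.
            yield [field.replace("\\\n", "\n") for field in fields]
            pending = ""


# Rows 3 and 4 of the cluttered example, as they appear in the file.
SAMPLE = (
    "3^and\\\n"
    "a last multiline\\\n"
    "escaped line\n"
    "with a missing\\\n"
    "escaping^just to test^\n"
    "4^\\ ^empty value^\n"
)

rows = list(read_rows(SAMPLE.splitlines(keepends=True)))
print(len(rows))   # 2
print(rows[0][2])  # just to test
```

<p>The second field of row 3 comes back as five lines of text, and the fields are
still in file order here; reordering them to match the table columns is a
separate step.</p>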

<p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> to know about each and every parameter
that you can define and the values accepted, etc.  And the <em>fixed</em> data format
is to be used when you're not given a field separator but field positions in
the file.  Yes, we still encounter those from time to time.</p>



<h2>parallel processing</h2>

<h3>one reader, multiple workers</h3>


<h3>multiple workers, each reading</h3>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 29 Jul 2011 09:57:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid>
</item>
<item>
  <title>Next month partitions</title>
  <link>http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</link>
  <description><![CDATA[<p>When you partition your tables monthly, the question arises of when
to create the next partitions.  I tend to create them the week before the next
month and I have some nice <a href="http://www.nagios.org/">nagios</a> scripts to alert me in case I've forgotten
to do so.  How do you check that by hand at the end of a month?</p>

<p>Here's a catalog query to help you there:</p>

<pre class="src">
=&gt; select *
-&gt;   from
-&gt;   (
(&gt;   select <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span> as schemaname, count(*)::text as tablename
(&gt;     from pg_tables
(&gt;    where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(&gt;      and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(&gt;
(&gt;   union
(&gt;
(&gt;   select schemaname, substring(tablename,1,length(tablename)-6) || <span style="color: #ad7fa8; font-style: italic;">'201108'</span>
(&gt;     from pg_tables
(&gt;    where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(&gt;      and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(&gt;
(&gt;   except
(&gt;
(&gt;   select schemaname, tablename
(&gt;     from pg_tables
(&gt;    where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(&gt;      and tablename like to_char(now() + interval <span style="color: #ad7fa8; font-style: italic;">'1 month'</span>, <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(&gt;   ) as t
-&gt; order by schemaname &lt;&gt; <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span>, schemaname;
   schemaname   |       tablename
<span style="color: #888a85;">----------------+------------------------
</span> previous parts | 1
 central        | stats_entrantes_201108
(2 rows)
</pre>

<p>As you see, our partitions are named <code>_YYYYMM</code> so that it's easy to match
them in our queries, but I guess almost everyone does about the same here.
The <code>to_char</code> expressions only save us from manually entering <code>'%201108'</code> in
the query text.  And there's a trick so that we display how many partitions
we have this month, adding a line to the result...</p>
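<p>The same check can be sketched outside the database.  The Python below takes
a list of table names and a date, and returns the partitions that exist for
the current month but not yet for the next one.  It assumes the <code>_YYYYMM</code>
naming shown above; the function names are made up for the example, and unlike
the query it computes the next month suffix instead of hard coding it.</p>

```python
from datetime import date


def next_month_suffix(today):
    """Return the YYYYMM suffix of the month following today."""
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return f"{year:04d}{month:02d}"


def missing_partitions(tables, today):
    """List next month partitions that have no table yet."""
    current = today.strftime("%Y%m")
    upcoming = next_month_suffix(today)
    existing = set(tables)
    expected = [name[:-6] + upcoming
                for name in tables if name.endswith(current)]
    return sorted(name for name in expected if name not in existing)


tables = ["stats_entrantes_201106", "stats_entrantes_201107"]
print(missing_partitions(tables, date(2011, 7, 27)))
# ['stats_entrantes_201108']
```

<p>An empty list means every partition of the current month already has its
successor.</p>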
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 27 Jul 2011 22:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</guid>
</item>
<item>
  <title>How to Use pgloader</title>
  <link>http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</link>
  <description><![CDATA[<p>This question comes up regularly, and I thought I had given it a
satisfying answer with <a href="https://github.com/dimitri/pgloader/tree/master/examples">the pgloader examples</a>. That document
reads a bit like a <em>tutorial</em>, in English, and I detailed it in
the article <a href="22-how-to-use-pgloader.html">how to use pgloader</a> on this same site, also in English. If there
is enough demand, I will translate it into French.</p>

<p>Meanwhile, happy reading!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 22 Jul 2011 13:48:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</guid>
</item>
<item>
  <title>How To Use PgLoader</title>
  <link>http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</link>
  <description><![CDATA[<p>This question about <a href="../../../pgsql/pgloader.html">pgloader</a> usage comes in quite frequently, and I think the
examples <a href="https://github.com/dimitri/pgloader/tree/master/examples">README</a> goes a long way in answering it.  It's not exactly a
<em>tutorial</em> but is almost there. Let me paste it here for reference:</p>

<h2>installing pgloader</h2>

<p class="first">Either use the <a href="http://packages.debian.org/source/pgloader">debian package</a> or the one for your distribution of choice if
you use another one.  RedHat, CentOS, FreeBSD, OpenBSD and some more already
include a binary package that you can use directly.</p>

<p>Or you could <code>git clone https://github.com/dimitri/pgloader.git</code> and go from
there.  As it's all <code>python</code> code, it runs fine interpreted from the source
directory; you don't <em>need</em> to install it in a special place in your system.</p>


<h2>setting up the test environment</h2>

<p class="first">To use the examples, please first create a <code>pgloader</code> database, then for each example
create the tables it needs, then issue the pgloader command:</p>

<pre class="src">
$ createdb --encoding=utf-8 pgloader
$ cd examples
$ psql pgloader &lt; simple/simple.sql
$ ../pgloader.py -Tvc pgloader.conf simple
</pre>

<p>If you want to load data from all examples, create tables for all of them
first, then run pgloader without argument.</p>


<h2>example description</h2>

<p class="first">The provided examples are:</p>

<ul>
<li>simple

<p>This dataset shows the basic case, with trailing separator and data
reordering.</p></li>

<li>xzero

<p>Same as simple but using \0 as the null marker.</p></li>

<li>errors

<p>Same test, but with impossible dates. Should report some errors. If it
does not report errors, check you're not using psycopg 1.1.21.</p>

<p>Should report 3 errors out of 7 lines (4 updates).</p></li>

<li>clob

<p>This dataset shows some text large object importing to PostgreSQL text
datatype.</p></li>

<li>cluttered

<p>A dataset with newline escaped and multi-line input (without quoting).
Beware of data reordering, too.</p></li>

<li>csv

<p>A dataset with csv delimiter ',' and quoting '&quot;'.</p></li>

<li>partial

<p>A dataset from which we only load some columns of the provided one.</p></li>

<li>serial

<p>In this dataset the id field is omitted, it's a serial which will be
automatically set by PostgreSQL while COPYing.</p></li>

<li>reformat

<p>A timestamp column is formatted the way MySQL dumps its timestamps,
which is not the same as the way PostgreSQL reads them. The
reformat.mysql module is used to reformat the data on-the-fly.</p></li>

<li>udc

<p>A user defined column test, where not all file columns are used but
a new constant one, not found in the input datafile, is added while
loading data.</p></li>
</ul>
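<p>To illustrate what the <em>reformat</em> example is about, here is a Python sketch of
such an on-the-fly conversion.  It assumes the MySQL dump writes timestamps in
the compact 14 digit form <code>YYYYMMDDHHMMSS</code>, which the text above does not
spell out, and the function name <code>mysql_timestamp_to_pg</code> is hypothetical: it
is not the actual code of the <code>reformat.mysql</code> module.</p>

```python
from datetime import datetime


def mysql_timestamp_to_pg(value):
    """Rewrite YYYYMMDDHHMMSS (assumed input form) as YYYY-MM-DD HH:MM:SS."""
    parsed = datetime.strptime(value, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(mysql_timestamp_to_pg("20071119150718"))  # 2007-11-19 15:07:18
```

<p>Going through <code>datetime</code> rather than slicing the string means an impossible
date raises a <code>ValueError</code> instead of being passed on to PostgreSQL.</p>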


<h2>running the import</h2>

<p class="first">You can launch all those pgloader tests in one run, provided you created the
necessary tables:</p>

<pre class="src">
 $ for sql in */*sql; do psql pgloader &lt; $sql; done
 $ ../pgloader.py -Tsc pgloader.conf

  errors       WARNING  COPY error, trying to find on which line
  errors       WARNING  COPY data buffer saved in /tmp/errors.AhWvAv.pgloader
  errors       WARNING  COPY error recovery done (2/3) in 0.064s
  errors       WARNING  COPY error, trying to find on which line
  errors       WARNING  COPY data buffer saved in /tmp/errors.BclHtj.pgloader
  errors       WARNING  COPY error recovery done (1/1) in 0.054s
  errors       ERROR    3 errors found into [errors] data
  errors       ERROR    please read /tmp/errors.rej.log for errors log
  errors       ERROR    and /tmp/errors.rej for data still to process
  errors       ERROR    3 database errors occured
  reformat     WARNING  COPY error, trying to find on which line
  reformat     WARNING  COPY data buffer saved in /tmp/reformat.6P4WCD.pgloader
  reformat     WARNING  COPY error recovery done (1/4) in 0.034s
  reformat     ERROR    1 errors found into [reformat] data
  reformat     ERROR    please read /tmp/reformat.rej.log for errors log
  reformat     ERROR    and /tmp/reformat.rej for data still to process
  reformat     ERROR    1 database errors occured

  Table name        |    duration |    size |  copy rows |     errors
  ====================================================================
  allcols           |      0.025s |       - |          8 |          0
  clob              |      0.034s |       - |          7 |          0
  cluttered         |      0.061s |       - |          6 |          0
  csv               |      0.035s |       - |          6 |          0
  errors            |      0.113s |       - |          4 |          3
  fixed             |      0.045s |       - |          3 |          0
  partial           |      0.030s |       - |          7 |          0
  reformat          |      0.036s |       - |          4 |          1
  serial            |      0.029s |       - |          7 |          0
  simple            |      0.050s |       - |          7 |          0
  udc               |      0.020s |       - |          5 |          0
  ====================================================================
  Total             |      0.367s |       - |         64 |          4
</pre>

<p>Please note that the errors test should return 3 errors and the reformat test 1 error.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 22 Jul 2011 13:38:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</guid>
</item>
<item>
  <title>Emacs Cheat Sheet</title>
  <link>http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</link>
  <description><![CDATA[<p>I stumbled upon the following <em>cheat sheet</em> for <a href="http://www.gnu.org/software/emacs/">Emacs</a> yesterday, and it's
worth sharing.  I already learnt or discovered again some nice default
chords, like for example <code>C-x C-o runs the command delete-blank-lines</code> and
<code>C-M-o runs the command split-line</code>.  I guess I will use the latter one a lot.</p>

<center>
<p><a class="image-link" href="../../../images/emacs-cheat-sheet.png">
<img src="../../../images/emacs-cheat-sheet-tn.png"></a></p>
</center>

<p>Hope you'll like it!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 20 Jul 2011 10:44:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</guid>
</item>
<item>
  <title>Skytools3: the slides</title>
  <link>http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</link>
  <description><![CDATA[<p>Now that the <a href="http://char11.org/">CHAR(11)</a> conference is over, it is customary to publish
the <em>slides</em> that were used.  I presented <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a> <code>3.0</code>, whose next version
will be released as soon as I have had the time to finish reviewing (actually,
mostly writing) the documentation.</p>

<center>
<p><a class="image-link" href="../../../images/skytools3.pdf">
<img src="../../../images/skytools3-0.png"></a></p>
</center>

<p>The <em>slides</em> for all the talks should eventually be published online,
but that cannot happen as quickly as we would all
like.  So here is a little reading while waiting for the rest!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 19 Jul 2011 14:39:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</guid>
</item>
<item>
  <title>Skytools3 talk Slides</title>
  <link>http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</link>
  <description><![CDATA[<p>In case you're wondering, here are the slides from the <a href="http://char11.org/">CHAR(11)</a> talk I gave,
about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a> <code>3.0</code>, <em>soon</em> to be released.  That means as soon as I have
enough time available to polish (or write) the documentation.</p>

<center>
<p><a class="image-link" href="../../../images/skytools3.pdf">
<img src="../../../images/skytools3-0.png"></a></p>
</center>


<p>The slides for all the talks should eventually make their way to a central
place, but expect some noticeable delay here.  Sorry about that, and enjoy
the read meanwhile!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 19 Jul 2011 14:24:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</guid>
</item>
<item>
  <title>Elisp Breadcrumbs</title>
  <link>http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</link>
  <description><![CDATA[<p>A <a href="http://en.wikipedia.org/wiki/Breadcrumb_(navigation)">breadcrumb</a> is a navigation aid.  I just added one to this website, so that
it gets easier to browse from any article to its local and parent indexes
and back to <a href="../../../index.html">/dev/dim</a>, the root webpage of this site.</p>

<p>As it was not that much work to implement, here's the whole of it:</p>

<pre class="src">
<span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">Breadcrumb support
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-breadcrumb-to-current-page</span> ()
  <span style="color: #bc8f8f;">"Return a list of (name . link) from the index root page to current one"</span>
  (<span style="color: #7f007f;">let*</span> ((current (muse-current-file))
         (cwd     (file-name-directory current))
         (project (muse-project-of-file current))
         (root    (muse-style-element <span style="color: #da70d6;">:path</span> (caddr project)))
         (path    (tapoueh-path-to-root))
         (dirs    (split-string (file-relative-name current root) <span style="color: #bc8f8f;">"/"</span>)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">("blog" "2011" "07" "13-back-from-char11.muse")
</span>    (append
     (list (cons <span style="color: #bc8f8f;">"/dev/dim"</span> (concat path <span style="color: #bc8f8f;">"index.html"</span>)))
     (<span style="color: #7f007f;">loop</span> for p in (butlast dirs)
           collect (cons p (format <span style="color: #bc8f8f;">"%s%s/index.html"</span> path p))
           do (setq path (concat path p <span style="color: #bc8f8f;">"/"</span>))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb-div</span> ()
  <span style="color: #bc8f8f;">"The real HTML inserting"</span>
  (insert <span style="color: #bc8f8f;">"&lt;div id=\"breadcrumb\"&gt;"</span>)
  (<span style="color: #7f007f;">loop</span> for (name . link) in (tapoueh-breadcrumb-to-current-page)
        do (insert (format <span style="color: #bc8f8f;">"&lt;a href=%s&gt;%s&lt;/a&gt;"</span> link name) <span style="color: #bc8f8f;">" / "</span>))
  (insert <span style="color: #bc8f8f;">"&lt;/div&gt;\n"</span>))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb</span> ()
  <span style="color: #bc8f8f;">"Must run with current buffer being a muse article"</span>
  (<span style="color: #7f007f;">save-excursion</span>
    (beginning-of-buffer)
    (<span style="color: #7f007f;">when</span> (tapoueh-extract-directive <span style="color: #bc8f8f;">"author"</span> (muse-current-file))
      (re-search-forward <span style="color: #bc8f8f;">"&lt;body&gt;"</span> nil t) <span style="color: #b22222;">; </span><span style="color: #b22222;">find where the article content is
</span>      (re-search-forward <span style="color: #bc8f8f;">"&lt;h2&gt;"</span> nil t)   <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the title line
</span>      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div)

      (re-search-forward <span style="color: #bc8f8f;">"&lt;h2&gt;"</span> nil t 2) <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the TAG line
</span>      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div))))
</pre>

<p>This code is now called in the <code>:after</code> function of my <a href="http://www.emacswiki.org/emacs/EmacsMuse">Muse</a> project style, and
it gets the work done.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 14 Jul 2011 18:44:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</guid>
</item>
<item>
  <title>Back from CHAR(11)</title>
  <link>http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</link>
  <description><![CDATA[<h1>Back from CHAR(11)</h1>
<div class="date">Wednesday, July 13 2011, 17:30</div>
</div>
<div id="article">
<p>What better way to spend the train ride back from <a href="http://char11.org/schedule">CHAR(11)</a> than to
play reporter for the occasion?  In truth, sleeping would be a good idea,
given how late the evenings ran!</p>

<p>We had the pleasure of listening to <strong><em>Jan Wieck</em></strong> present a simplified
history of replication with <a href="http://www.postgresql.org/">PostgreSQL</a>.  As one of the pioneers of the
field himself, his point of view is most interesting.  He talked about the
evolution of replication solutions, and I can't help thinking that in many
ways <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>; Jan,
the author of Slony, seemed to agree.</p>

<p>Indeed, Skytools was born out of Slony's limitations.  Some of them still
exist, such as the lack of separation between the <strong><em>queuing</em></strong> layer
and the replication layer itself, and some have since been resolved,
such as the difficulty in coping with heavy write loads.  The two
solutions even share part of their implementation, since
PostgreSQL 8.3, with the <code>txid</code> and <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> data types.  Of course,
the goal of Skytools is to be as simple a solution as possible,
perfectly suited to a precise and bounded set of use cases,
whereas Slony tries to automatically solve the hardest problems
in the field, at the price of a very complex interface.</p>

<p>Of course, <strong><em>Jan</em></strong> took the time to objectively compare these
replication solutions with the one built into PostgreSQL, <em>Streaming Replication</em>
and <em>Hot Standby</em>.  We already had asynchronous binary replication;
PostgreSQL 9.1 brings us synchronous replication with per-transaction
control.  <a href="http://database-explorer.blogspot.com/">Simon Riggs</a>, author of the feature, stressed
the innovation this represents: no other project lets you control
the data durability guarantee with such flexible and precise
granularity!</p>

<p><a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> is a management solution for <em>clusters</em> running <em>Hot Standby</em>
and <em>Streaming Replication</em> (synchronous or not).  How it works was
detailed by <strong><em>Greg Smith</em></strong> and <strong><em>Cédric Villemain</em></strong>.  The former showed how
to design an architecture that spreads the read load,
and the latter how to get a fault-tolerant system thanks to
the automatic <em>failover</em> built into repmgr. This innovative solution was
developed in large part by 2ndQuadrant France, and we have already
stamped it <em>production ready</em>.</p>

<p><strong><em><a href="http://www.hagander.net/">Magnus Hagander</a></em></strong> has worked a lot on the <em>streaming</em> protocol used
for the replication built into PostgreSQL 9.1, as well as on the tools
that use this protocol.  He naturally presented that, and the idea
of a <em>proxy</em> relaying the binary stream of transaction logs came up again
in the discussions (we had already considered this in 2010; the
article <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> has some details on the subject).  With
synchronous replication, it becomes possible to design advanced,
robust and versatile architectures: the proxy could now take care of
both the archives and the <em>standby</em> servers.</p>

<p><a href="http://database-explorer.blogspot.com/">Simon Riggs</a> then offered us a retrospective of the last 7 years
of work he has done with PostgreSQL, from the implementation of <em>Point in
Time Recovery</em> to synchronous replication, by way of <em>Hot Standby</em>.  What
we have in PostgreSQL 9.0 already matches the most advanced that Oracle
offers in terms of data durability, and 9.1 takes
the next step.  That does nothing to slow down <strong><em>Simon</em></strong>, who was already talking about projects
for the next 10 years.</p>

<p>Finally, <a href="http://www.heroku.com/">Heroku</a> presented their incredible business.  They now
have more than <code>150 000</code> PostgreSQL instances in production,
demonstrating that our favorite <code>DBMS</code> is ready for hosting providers. <strong><em>Heroku</em></strong> is
designing and building a ready-to-use solution for the
famous <em>Cloud</em> that is so hard to define.  Here, it means being able
to add new read-only replicas on the fly to absorb
traffic spikes, create development instances in one click, etc.</p>

<p>This article covers only a small selection of the topics discussed at the
conference; I'm counting on <a href="http://blog.guillaume.lelarge.info/">Guillaume</a> to tell you about <a href="http://char11.org/schedule">CHAR(11)</a> too,
but you may have to wait until he is back from <a href="http://2011.rmll.info/">RMLL</a> (what energy!).</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresqlfr.html">PostgreSQLFr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">Skytools</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 13 Jul 2011 17:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</guid>
</item>
<item>
  <title>Back From CHAR(11)</title>
  <link>http://tapoueh.org/blog/2011/07/13-back-from-char11.html</link>
  <description><![CDATA[<h1>Back From CHAR(11)</h1>
<div class="date">Wednesday, July 13 2011, 17:15</div>
<div id="article">
<p><a href="http://char11.org/schedule">CHAR(11)</a> finished sometime during the night leading into today, if you consider
the <em>social events</em> to be part of it, which I definitely do.  This conference
has been a very good one, both on the organisation side of things and of
course for its content.</p>

<p>It began with a perspective on the evolution of replication solutions, by
<strong><em>Jan Wieck</em></strong> himself.  In some ways <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>, in the
sense that it reuses the same concepts and part of the design, and even shares
bits of the implementation (like the <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> datatype that was added
in PostgreSQL 8.3).  The evolution occurred in choosing a subset of the
features of Slony and then simplifying the user interface as much as
possible.  And with Skytools 3.0, those features that were removed but are
still useful for solving real-life problems are now available too.</p>

<p>Of course the talk also covered the other replication solutions (not just
the trigger-based ones), comparing <a href="http://wiki.postgresql.org/wiki/Setting_up_RServ_with_PostgreSQL_7.0.3">RServ</a> to <a href="http://bucardo.org/">Bucardo</a> for example.  And
then all those were compared to the <a href="http://www.postgresql.org/">PostgreSQL</a> core replication facilities,
which are quite a different animal.  It was a really nice <em>keynote</em>,
preparing the audience's minds to make the most out of all the other talks.</p>

<p>I will not review all the talks in detail, as I'm pretty sure some other
attendees will turn into reporters themselves: scaling the write load!</p>

<p>Still, <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> got its share of attention.  <a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">Greg Smith</a> and <a href="http://www.2ndquadrant.fr/">Cédric Villemain</a>
presented how to do both <strong>read scaling</strong> and <strong>auto failover</strong> management with
this tool, going into fine detail about how it works internally and how to
best design your service architecture for maximum <strong>data availability</strong>.  The
question and answer session stressed the fact that you cannot have
data availability with fewer than 3 production nodes.</p>

<p><a href="http://www.hagander.net/">Magnus Hagander</a> detailed how flexible the core protocol support for
replication (and streaming) really is.  That flexibility means that you can
quite easily speak this protocol from any application, and the idea of a <em>wal
proxy</em> popped up again (see the <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> article for my first
mention of the idea).  The main difference is that we now have
<em>synchronous replication</em> support, so that the proxy could be trusted both for
archiving and serving standbys.</p>

<p>Of course <a href="http://database-explorer.blogspot.com/">Simon</a> still has lots of ideas about the next 10 years of
replication-oriented projects for core PostgreSQL code, and his talk nicely summarized
the previous 7 years.  The future is bright, and guess what, it's beginning
today!</p>

<p>We also heard about <a href="http://www.heroku.com/">Heroku</a>, and these guys are doing crazy impressive
things.  Like running <code>150 000</code> PostgreSQL instances, for example, showing
that you can actually use our preferred database server in the hosting
business.  I expect that the maturing solutions and tool sets providing data
availability will soon be a game changer here.  What they are doing is
designing a <strong>flexible data architecture</strong> with strong guarantees (<strong>no data
loss</strong>).  The <em>cloud elasticity</em> is reaching out from the stateless services,
and <em>those guys</em> are making it happen now.</p>

<p>May you live in interesting times!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/conferences.html">Conferences</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 13 Jul 2011 17:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-back-from-char11.html</guid>
</item>
<item>
  <title>Muse setup revised</title>
  <link>http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</link>
  <description><![CDATA[<p>Most of you are probably reading my posts directly in your <code>RSS</code> reader
(mine is <a href="http://www.gnus.org/">gnus</a> thanks to the <a href="http://gwene.org/">Gwene</a> service), so you probably missed it, but I
just <em>pushed</em> a whole new version of <a href="http://tapoueh.org">my website</a>, still using <a href="https://github.com/alexott/muse">Emacs Muse</a> as the
engine.</p>

<p>My setup is tentatively called <a href="../../../tapoueh.el.html">tapoueh.el</a> and is browsable online.  It consists
of some tweaks on top of Muse, so that I can enjoy <a href="../../../tags/index.html">tags</a> and proper <a href="../../../rss/">rss</a>
support.  By <em>proper</em>, I mean that I want to be able to produce as many <em>topic</em>
<code>RSS</code> <em>feeds</em> as I want from a single <em>blog</em>, and thanks to the <em>tags</em> support that's now what
I have.</p>

<p>The <code>RSS</code> handling and the tagging system are ad hoc code, and this very
article begins like this:</p>

<pre class="src">
#author Dimitri Fontaine
#title  Muse setup revised
#date   20110705-19:55
#tags   Emacs Muse
</pre>

<p>All the information for the site navigation is taken from there, and at
long last the <code>RSS</code> I publish now contains proper <code>URLs</code> without abusing
<a href="../../../blog.dim.html">anchors</a>, as in the previous link, which is a compatibility page in case you
had some bookmarks.  The compat page only works with JavaScript (did you know
that <em>anchors</em> are not part of the <code>URL</code> that is sent to the server, so that you
can't apply <code>RedirectMatch</code> or other tweaks?), but all it needs is <em>2 lines of
code</em>, so I guess that's not so bad.</p>

<pre class="src">
<span style="color: #fcaf3e;">var</span> <span style="color: #fce94f;">anchor</span> = window.location.hash;
document.location.href=document.getElementById(anchor).href;
</pre>

<p>I hope you like the new setup as much as I do, even if I'm left with some
debugging to do.  That's the price to pay for doing it yourself I guess.
But I still don't know of a ready to use solution (as in <em>off the shelf</em>) that
meets my criteria for web publishing.  More on that topic another time.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 05 Jul 2011 19:55:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</guid>
</item>
<item>
  <title>Ready for CHAR(11)?</title>
  <link>http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</link>
  <description><![CDATA[<h1>Ready for CHAR(11)?</h1>
<div class="date">Monday, July 04 2011, 20:15</div>
<div id="article">
<p>Next week <strong>already</strong>, <a href="http://www.char11.org/">CHAR(11)</a> takes place: the conference dedicated to
<em>Clustering</em>, <em>High Availability</em> and <em>Replication</em> with <a href="http://www.postgresql.org/">PostgreSQL</a>.
It is in Europe, in Cambridge this time, and it is in English, even though
several compatriots will be in the audience.</p>

<p>If you haven't yet had a look at the <a href="http://www.char11.org/schedule">schedule</a>, I encourage you to do
so. Even if you hadn't planned to come… because there is plenty there to
change your mind!</p>

<p>It is already hard to follow the <a href="http://archives.postgresql.org/">PostgreSQL mailing lists</a> in
English, simply for lack of time, but sometimes the language barrier
can also play a part. So in case you haven't followed closely, let me
introduce the main speakers at this conference.</p>

<p><strong><em>Jan Wieck</em></strong> gives the first talk, a retrospective of replication
solutions for PostgreSQL. He started <a href="http://slony.info/">Slony</a> and remains very
active in its architecture and development.</p>

<p><strong><em>Greg Smith</em></strong>, a colleague at <a href="http://www.2ndquadrant.us/">2ndQuadrant</a>, is Mr. "low-level"
performance: his specialty is getting the best out of your hardware,
your server configuration, PostgreSQL itself, and the queries
you send it. His book <a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">PostgreSQL High Performance</a> is a
must-read, and as such has been <a href="http://blog.guillaume.lelarge.info/index.php/post/2011/05/01/%C2%AB-Bases-de-donn%C3%A9es-PostgreSQL&#44;-Gestion-des-performances-%C2%BB">translated into French</a>.</p>

<p>Next we have <strong><em>Magnus Hagander</em></strong>, who recently joined the <em>core team</em>
(the central organization of the project), and who has been contributing to
the PostgreSQL code for more than 10 years.</p>

<p><strong><em>Simon Riggs</em></strong>, also one of <a href="http://www.2ndquadrant.com/about/#riggs">our colleagues</a>, implemented <em>PITR</em>, transaction
log archiving, asynchronous replication and, for the next
version of PostgreSQL, synchronous replication.</p>

<p><strong><em>Hannu Krosing</em></strong> (guess <a href="http://www.2ndquadrant.com/">where</a> he works?) designed the architecture (and the
tools) that allow <a href="http://www.skype.com/">Skype</a> to claim infinite scalability, or at
any rate scalability announced as supporting up to <a href="http://highscalability.com/skype-plans-postgresql-scale-1-billion-users">1 billion users</a>.</p>

<p><strong><em>Koichi Suzuki</em></strong> leads the effort on the promising product <a href="http://postgres-xc.sourceforge.net/">Postgres-XC</a>, a fine
example of collaboration between different market players, here
<a href="http://www.enterprisedb.com/">EnterpriseDB</a> and <a href="https://www.oss.ecl.ntt.co.jp/ossc/">NTT Open Source Software Center</a>. Which shows once
more that <a href="http://fr.wikipedia.org/wiki/Open_source">Open Source</a> is firmly rooted in commercial companies.</p>

<p>Of course, Cédric and I, from the French branch of <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>, will be
there too. We will speak on subjects we know well, having
taken part in their development and deploying and
maintaining them in production: <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> and <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>.</p>

<p>And I'm skipping over other speakers, whose topics will be no less
interesting. In short, if <em>replication</em> and <em>cluster</em> are themes you
want to combine with PostgreSQL, this is the place to spend the beginning of
next week!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresqlfr.html">PostgreSQLfr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 04 Jul 2011 20:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</guid>
</item>




<item>
  <title>Multi-Version support for Extensions</title>
  <link>http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</link>
  <description><![CDATA[<h1>Multi-Version support for Extensions</h1>
<div class="date">Wednesday, June 29 2011, 09:50</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>We still have this problem to solve with extensions and their packaging:
how do you best organize things so that your extension is compatible both with
releases before <code>9.1</code> and with <code>9.1</code> and later releases of <a href="http://www.postgresql.org/">PostgreSQL</a>?</p>

<p>Well, I had to do it for the <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> contribution, and I wanted the following
to happen:</p>

<pre class="src">
dpkg-deb: building package `postgresql-8.3-ip4r' ...
dpkg-deb: building package `postgresql-8.4-ip4r' ...
dpkg-deb: building package `postgresql-9.0-ip4r' ...
dpkg-deb: building package `postgresql-9.1-ip4r' ...
</pre>

<p>And here's a simple enough way to achieve that.  First, you have to get your
packaging ready the usual way, and install the build dependencies.  Then,
since <code>/usr/share/postgresql-common/supported-versions</code> from the
latest <code>postgresql-common</code> package only returns <code>8.3</code> in <code>lenny</code> (yes, I'm
doing some <em>backporting</em> here), we have to tweak it.</p>

<pre class="src">
postgresql-server-dev-8.4
postgresql-server-dev-9.0
postgresql-server-dev-9.1
postgresql-server-dev-all

$ sudo dpkg-divert \
--divert /usr/share/postgresql-common/supported-versions.distrib \
--rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#! /bin/bash

dpkg -l postgresql-server-dev-* \
| awk -F '[ -]' '/^ii/ &amp;&amp; ! /server-dev-all/ {print $6}'
</pre>
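
<p>To see why <code>$6</code> is the version number, here is a minimal sketch that runs
the same kind of <code>awk</code> filter over a few made-up <code>dpkg -l</code> lines.  The sample
lines and the function names <code>sample_dpkg_output</code> and <code>list_versions</code> are
assumptions for illustration only, and the filter is written with an <code>if</code>
rather than the exact expression used above.  With these sample lines it should
print <code>8.4</code> and <code>9.1</code>.</p>

<pre class="src">
# Hypothetical dpkg -l lines; real output has a header and more columns.
sample_dpkg_output() {
    printf '%s\n' \
        'ii  postgresql-server-dev-8.4  8.4.8-0squeeze1  development files' \
        'ii  postgresql-server-dev-9.1  9.1~beta2-1      development files' \
        'ii  postgresql-server-dev-all  118              extension build tool' \
        'rc  postgresql-server-dev-9.0  9.0.4-1          development files'
}

# Splitting on every single space or hyphen gives the fields
# "ii", "", "postgresql", "server", "dev", "9.1", so field 6 is the version.
# Only installed packages (ii) are kept, and the -all helper is skipped.
list_versions() {
    awk -F '[ -]' '/^ii/ { if ($0 !~ /server-dev-all/) print $6 }'
}

sample_dpkg_output | list_versions
</pre>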

<p>Now we are allowed to build our extension for all those versions, so we add
<code>9.1</code> to the <code>debian/pgversions</code> file.  And <code>debuild</code> will do the right thing now,
thanks to <a href="http://manpages.debian.net/cgi-bin/man.cgi?query=pg_buildext">pg_buildext</a> from <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a>.</p>

<p>The problem we face is that what gets built is not an <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extension</a> in the <code>9.1</code> sense, so
things like <code>\dx</code> in <code>psql</code> and <a href="http://www.postgresql.org/docs/9.1/static/sql-createextension.html">CREATE EXTENSION</a> will not work out of the box.
First, we need a control file.  Then we need to remove the transaction
control from the install script (here, <code>ip4r.sql</code>), and finally, this script
needs to be called <code>ip4r--1.05.sql</code>.  Here's how I did it:</p>

<pre class="src">
$ cat ip4r.control
comment = 'IPv4 and IPv4 range index types'
default_version = '1.05'
relocatable = yes

$ cat debian/postgresql-9.1-ip4r.install
debian/ip4r-9.1/ip4r.so usr/lib/postgresql/9.1/lib
ip4r.control usr/share/postgresql/9.1/extension
debian/ip4r-9.1/ip4r.sql usr/share/postgresql/9.1/extension

$ cat debian/postgresql-9.1-ip4r.links
usr/share/postgresql/9.1/extension/ip4r.sql usr/share/postgresql/9.1/extension/ip4r--1.05.sql
</pre>

<p>Be careful not to forget to remove any and all <code>BEGIN;</code> and <code>COMMIT;</code> lines from
the <code>ip4r.sql</code> file.  While doing that I also removed support for <em>Rtree</em>, which
is not relevant for modern versions of PostgreSQL saith the script (post
<code>8.2</code>).  That means I'm not publishing this very work yet, but I wanted to
share the <code>debian/postgresql-9.1-extension.links</code> idea.</p>
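
<p>One possible way to strip those lines, as a sketch only: it assumes each
<code>BEGIN;</code> or <code>COMMIT;</code> sits alone on its own line in upper case, which may
not match every install script.  The function name <code>strip_txn_control</code> and
the sample input are made up for the example.</p>

<pre class="src">
# Drop lines that contain nothing but BEGIN; or COMMIT; (plus whitespace).
strip_txn_control() {
    sed -e '/^[[:space:]]*BEGIN;[[:space:]]*$/d' \
        -e '/^[[:space:]]*COMMIT;[[:space:]]*$/d'
}

# Illustrative input only, not the real ip4r.sql content.
printf '%s\n' 'BEGIN;' 'SELECT 1;' 'COMMIT;' | strip_txn_control
</pre>

<p>Reading the script on standard input and writing the result to standard
output keeps the original file untouched, so the result can be reviewed
before it replaces anything.</p>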

<p>Notice that I didn't change anything about the <code>.sql.in</code> make rule, so I
didn't have to use the support for <code>module_pathname</code> in the control file.</p>

<p>Now, after the usual <code>debuild</code> step, I can just <code>sudo debi</code> to install all the
just-built packages and <code>CREATE EXTENSION</code> will run fine.  And in <code>9.0</code> you get
the old way to install it, but it still works:</p>

<pre class="src">
$ psql -U postgres --cluster 9.0/main -1 \
-f /usr/share/postgresql/9.0/contrib/ip4r.sql
&lt;lots of chatter&gt;

$ psql -U postgres --cluster 9.1/main -c 'create extension ip4r;'
CREATE EXTENSION
</pre>

<p>That's it :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 29 Jun 2011 09:50:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</guid>
</item>
<item>
  <title>Don't be afraid of 'cl</title>
  <link>http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</link>
  <description><![CDATA[<h1>Don't be afraid of 'cl</h1>
<div class="date">Monday, June 20 2011, 00:15</div>
<div id="article">
<p>In this <a href="http://tsengf.blogspot.com/2011/06/confirm-to-quit-when-editing-files-from.html">blog article</a>, you're shown quite a long function that loops through
your buffers to find out if any of them is associated with a file whose full
name includes <code>&quot;projects&quot;</code>.  Well, you should not be afraid of using <code>cl</code>:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">cl</span>)
(<span style="color: #729fcf; font-weight: bold;">loop</span> for b being the buffers
      when (string-match <span style="color: #ad7fa8; font-style: italic;">"projects"</span> (or (buffer-file-name b) <span style="color: #ad7fa8; font-style: italic;">""</span>))
      return t)
</pre>

<p>If you want to collect the list of buffers whose file name matches your test,
then replace <code>return t</code> with <code>collect b</code> and you're done.  Really, this <code>loop</code> thing
is worth learning.</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 20 Jun 2011 00:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</guid>
</item>






<item>
  <title>Back from Ottawa, preparing for Cambridge</title>
  <link>http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</link>
  <description><![CDATA[<h1>Back from Ottawa, preparing for Cambridge</h1>
<div class="date">Monday, May 30 2011, 11:00</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>While <a href="http://blog.hagander.net/">Magnus</a> is all about <a href="http://2011.pgconf.eu/">PG Conf EU</a> already, you have to realize we've only just
landed back from <a href="http://www.pgcon.org/2011/">PG Con</a> in Ottawa.  My next stop in the annual conferences
is <a href="http://char11.org/">CHAR 11</a>, the <em>Clustering, High Availability and Replication</em> conference in
Cambridge, 11-12 July.  Yes, on the old continent this time.</p>

<p>This year's <em>pgcon</em> hot topics, for me, have been centered around a better
grasp of <a href="http://www.postgresql.org/docs/9.1/static/transaction-iso.html#XACT-SERIALIZABLE">SSI</a> and <em>DDL Triggers</em>.  Having those beasts in <a href="http://www.postgresql.org/">PostgreSQL</a> would
allow for auditing, finer privilege management and some more automated
replication facilities.  Imagine that <code>ALTER TABLE</code> is able to fire a <em>trigger</em>,
provided by <em>Londiste</em> or <em>Slony</em>, that will do what's needed on the cluster by
itself.  That would be awesome, wouldn't it?</p>

<p>At <em>CHAR 11</em> I'll be talking about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a>.  You know I've been working on
its <em>debian</em> packaging; now is the time to review the documentation and make
it something as good-looking as the monitoring systems are...</p>

<p>Well, expect some news and a nice big picture diagram overview soon: if my work
schedule leaves me any time, that's what I want to be working on now.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 30 May 2011 11:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</guid>
</item>
<item>
  <title>el-get 2.2</title>
  <link>http://tapoueh.org/blog/2011/05/26-el-get-22.html</link>
  <description><![CDATA[<h1>el-get 2.2</h1>
<div class="date">Thursday, May 26 2011, 12:00</div>
<div id="article">
<p>We spotted, a little too late for our own taste, a discrepancy in the
source tree: a work-in-progress patch landed in git just before the release of
<a href="https://github.com/dimitri/el-get">el-get</a> stable.  So we cleaned the tree (thanks again <a href="http://julien.danjou.info/">Julien</a>), branched a
stable maintenance tree, and released <code>2.2</code> from there.</p>

<p>You're back to enjoying <code>el-get</code> :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 May 2011 12:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-22.html</guid>
</item>
<item>
  <title>el-get 2.1</title>
  <link>http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-21.html</link>
  <description><![CDATA[<p>Current <a href="https://github.com/dimitri/el-get">el-get</a> status is stable, ready for daily use and packed with extra
features that make life easier.  There are some more things we could do, as
always, but they will be about smoothing things further.</p>

<h3>Latest released version</h3>

<p><a href="https://github.com/dimitri/el-get">el-get</a> version <code>2.1</code> is available, with a boatload of features, including
autoloads support, byte-compiling in an external <em>clean room</em> <a href="http://www.gnu.org/software/emacs/">Emacs</a> instance,
custom support, lazy initialisation support (deferring all <em>init</em> functions to
<code>eval-after-load</code>), and multi-repository <code>ELPA</code> support.</p>


<h3>Version numbering</h3>

<p class="first">Version strings are now inspired by how Emacs itself numbers its versions.
First is the major version number, then a dot, then the minor version
number.  The minor version number is <code>0</code> when still developing the next major
version.  So <code>3.0</code> is a developer release while <code>3.1</code> will be the next stable
release.</p>

<p>Please note that this versioning policy was picked while baking
<code>1.2~dev</code>, so <code>1.0</code> was in fact a <em>stable</em> release.  Ah, history.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 May 2011 10:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-21.html</guid>
</item>
<item>
  <title>el-get 2.1</title>
  <link>http://tapoueh.org/blog/2011/05/26-el-get-21.html</link>
  <description><![CDATA[<h1>el-get 2.1</h1>
<div class="date">Thursday, May 26 2011, 10:00</div>
</div>
<div id="article">
<p>Current <a href="https://github.com/dimitri/el-get">el-get</a> status is stable, ready for daily use and packed with extra
features that make life easier.  There are some more things we could do, as
always, but they will be about smoothing things further.</p>

<h3>Latest released version</h3>

<p><a href="https://github.com/dimitri/el-get">el-get</a> version <code>2.1</code> is available, with a boatload of features, including
autoloads support, byte-compiling in an external <em>clean room</em> <a href="http://www.gnu.org/software/emacs/">Emacs</a> instance,
custom support, lazy initialisation support (deferring all <em>init</em> functions to
<code>eval-after-load</code>), and multi-repository <code>ELPA</code> support.</p>


<h3>Version numbering</h3>

<p class="first">Version strings are now inspired by how Emacs itself numbers its versions.
First is the major version number, then a dot, then the minor version
number.  The minor version number is <code>0</code> when still developing the next major
version.  So <code>3.0</code> is a developer release while <code>3.1</code> will be the next stable
release.</p>

<p>Please note that this versioning policy was picked while baking
<code>1.2~dev</code>, so <code>1.0</code> was in fact a <em>stable</em> release.  Ah, history.</p>




<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 May 2011 10:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-21.html</guid>
</item>
<item>
  <title>Preparing for PGCON</title>
  <link>http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</link>
  <description><![CDATA[<h1>Preparing for PGCON</h1>
<div class="date">Thursday, May 12 2011, 10:30</div>
</div>
<div id="article">
<p>It's that time of the year again: the main international
<a href="http://www.pgcon.org/2011/">PostgreSQL Conference</a> is next week in Ottawa, Canada.  If previous years are
any indication, this will be a great event at which to meet a lot of the
members of your community.  The core team will be there, developers will be
there, and we will meet with users and their challenging use cases.</p>

<p>This is a very good time to review both what you did in the project over the
last 12 months, and what you plan to do next year.  To help with that,
several <em>meeting</em> events are organized.  They're like whole-day round tables
with a loose agenda and a limited number of invited people, featuring
very intense on-topic discussions about how to organize ourselves for
another great year of innovation in PostgreSQL.</p>

<p>Then we have two days full of talks where I usually learn some new aspect of
the project or of the product, and where ideas tend to just pop up in a
continuous race.  Being away from home and with people you see only once a
year (some of them more than that of course, hi European fellows!) seems to
allow for some broader thinking.</p>

<p>The talks I want to go to include
<a href="http://www.pgcon.org/2011/schedule/events/361.en.html">Database Scalability Patterns: Sharding for Unlimited Growth</a> by
<a href="http://www.pgcon.org/2011/schedule/speakers/20.en.html">Robert Treat</a>, <a href="http://www.pgcon.org/2011/schedule/events/366.en.html">Maintaining Terabytes</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/112.en.html">Selena Deckelmann</a>, <a href="http://www.pgcon.org/2011/schedule/events/307.en.html">NTT’s Case Report</a>
by <a href="http://www.pgcon.org/2011/schedule/speakers/192.en.html">Tetsuo Sakata</a>, <a href="http://www.pgcon.org/2011/schedule/events/350.en.html">Hacking the Query Planner</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/202.en.html">Tom Lane</a>.  That's for a first
day, right?</p>

<p>Then, on the second day, I notice <a href="http://www.pgcon.org/2011/schedule/events/311.en.html">Range Types</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/83.en.html">Jeff Davis</a>,
<a href="http://www.pgcon.org/2011/schedule/events/309.en.html">SP-GiST - a new indexing infrastructure for PostgreSQL</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/29.en.html">Oleg</a> and <a href="http://www.pgcon.org/2011/schedule/speakers/33.en.html">Teodor</a>,
<a href="http://www.pgcon.org/2011/schedule/events/337.en.html">The Write Stuff</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/110.en.html">Greg Smith</a> (a colleague at <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>).</p>

<p>I will miss <a href="http://www.pgcon.org/2011/schedule/events/333.en.html">Serializable Snapshot Isolation in Postgres</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/113.en.html">Kevin Grittner</a>
and <a href="http://www.pgcon.org/2011/schedule/speakers/197.en.html">Dan Ports</a>, unfortunately, because I'll be talking about
<a href="http://www.pgcon.org/2011/schedule/events/280.en.html">Extensions Development</a> at the same time.</p>

<p>Of course this list is just a first selection: hallway tracks are often
what guide me through talks or make me skip some.</p>

<p>See you there!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/extensions.html">Extensions</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 12 May 2011 10:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</guid>
</item>
<item>
  <title>Mailq modeline display</title>
  <link>http://tapoueh.org/blog/2011/05/blog/2011/05/05-mailq-modeline-display.html</link>
  <description><![CDATA[<p>If you've not been following along, you might have missed it: it appears to
me that even today, in 2011, mail systems work much better when set up the
old way, meaning with a local <a href="http://en.wikipedia.org/wiki/Mail_Transfer_Agent">MTA</a> for outgoing mail, plus some niceties
such as <a href="http://tapoueh.org/articles/news/_Postfix_sender_dependent_relayhost_maps.html">sender dependent relayhost maps</a>.</p>

<p>That's why I needed <a href="http://tapoueh.org/projects.html#sec21">M-x mailq</a> to display the <em>mail queue</em> and have some easy
shortcuts in order to operate it (mainly <code>f runs the command
mailq-mode-flush</code>, but per site and per id delivery are useful too).</p>

<p>Now, I also happen to set up outgoing mail routes to go through an <em>SSH
tunnel</em>, which thanks to both <a href="http://www.manpagez.com/man/5/ssh_config/">~/.ssh/config</a> and <a href="https://github.com/dimitri/cssh">cssh</a> (<code>C-= runs the
command cssh-term-remote-open</code>, with completion) is a couple of
keystrokes away to start.  I still sometimes forget to
start it, which causes mail to sit in the queue until I realise it has not been
delivered, which always takes a little too long.</p>

<p>A solution I've been thinking about is to add a little flag in the <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Mode-Line-Format.html">modeline</a>
in my <a href="http://www.gnus.org/">gnus</a> <code>*Group*</code> and <code>*Summary*</code> buffers.  The flag would show up as ✔ when
no mail is queued and waiting for me to open the tunnel, or ✘ as soon as the
queue is not empty.  Here's what it looks like:</p>

<center>
<p><img src="../../../images//mailq-modeline-display.png" alt=""></p>
</center>

<p>I'm pretty happy with the setup.  The flag is refreshed every minute,
and here's, as an example, how I set up <code>mailq</code> in my <a href="https://github.com/dimitri/el-get">el-get-sources</a>:</p>

<pre class="src">
         (<span style="color: #da70d6;">:name</span> mailq
                <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (mailq-modeline-display)))
</pre>

<p>I'm not sure how many of you dear readers are using a local MTA to deliver
your mails, but well, the ones who do (or consider doing so) might even find
this article useful!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 05 May 2011 14:10:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/blog/2011/05/05-mailq-modeline-display.html</guid>
</item>
<item>
  <title>Mailq modeline display</title>
  <link>http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</link>
  <description><![CDATA[<h1>Mailq modeline display</h1>
<div class="date">Thursday, May 05 2011, 14:10</div>
</div>
<div id="article">
<p>If you've not been following along, you might have missed it: it appears to
me that even today, in 2011, mail systems work much better when set up the
old way, meaning with a local <a href="http://en.wikipedia.org/wiki/Mail_Transfer_Agent">MTA</a> for outgoing mail, plus some niceties
such as <a href="http://tapoueh.org/articles/news/_Postfix_sender_dependent_relayhost_maps.html">sender dependent relayhost maps</a>.</p>

<p>That's why I needed <a href="http://tapoueh.org/projects.html#sec21">M-x mailq</a> to display the <em>mail queue</em> and have some easy
shortcuts in order to operate it (mainly <code>f runs the command
mailq-mode-flush</code>, but per site and per id delivery are useful too).</p>

<p>Now, I also happen to set up outgoing mail routes to go through an <em>SSH
tunnel</em>, which thanks to both <a href="http://www.manpagez.com/man/5/ssh_config/">~/.ssh/config</a> and <a href="https://github.com/dimitri/cssh">cssh</a> (<code>C-= runs the
command cssh-term-remote-open</code>, with completion) is a couple of
keystrokes away to start.  I still sometimes forget to
start it, which causes mail to sit in the queue until I realise it has not been
delivered, which always takes a little too long.</p>

<p>A solution I've been thinking about is to add a little flag in the <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Mode-Line-Format.html">modeline</a>
in my <a href="http://www.gnus.org/">gnus</a> <code>*Group*</code> and <code>*Summary*</code> buffers.  The flag would show up as ✔ when
no mail is queued and waiting for me to open the tunnel, or ✘ as soon as the
queue is not empty.  Here's what it looks like:</p>

<center>
<p><img src="../../../images//mailq-modeline-display.png" alt=""></p>
</center>

<p>I'm pretty happy with the setup.  The flag is refreshed every minute,
and here's, as an example, how I set up <code>mailq</code> in my <a href="https://github.com/dimitri/el-get">el-get-sources</a>:</p>

<pre class="src">
         (<span style="color: #729fcf;">:name</span> mailq
                <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (mailq-modeline-display)))
</pre>

<p>I'm not sure how many of you dear readers are using a local MTA to deliver
your mails, but well, the ones who do (or consider doing so) might even find
this article useful!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/modeline.html">modeline</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 05 May 2011 14:10:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</guid>
</item>
<item>
  <title>Tables and Views dependencies</title>
  <link>http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</link>
  <description><![CDATA[<h1>Tables and Views dependencies</h1>
<div class="date">Wednesday, May 04 2011, 11:45</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>Let's say you need to <code>ALTER TABLE foo ALTER COLUMN bar TYPE bigint;</code>, and
<a href="http://postgresql.org">PostgreSQL</a> is helpfully telling you that no, you can't, because such and such
<em>views</em> depend on the column.  The basic way to deal with that is to copy and
paste the names of the views involved from the error message, then prepare a
script wherein you first <code>DROP VIEW ...;</code> then <code>ALTER TABLE</code> and finally <code>CREATE
VIEW</code> again, all in the same transaction.</p>

<p>So you also have to copy and paste the view definitions.  With large view
definitions, it quickly gets cumbersome to do so.  When you're working
on operations, you have to bear in mind that cumbersome is a synonym for
<em>error prone</em>, so you want another solution if possible.</p>

<p>Oh, and the other drawback of this solution is that <code>ALTER TABLE</code> will first
take a <code>LOCK</code> on the table, locking out any activity.  And more than that, the
lock acquisition will queue behind current activity on the table, which
means waiting for a fairly long time and damaging the service quality on a
moderately loaded server.</p>

<p>It's possible to abuse the <a href="http://www.postgresql.org/docs/current/static/catalogs.html">system catalogs</a> in order to find all <em>views</em> that
depend on a given table, too.  For that, you have to play with <code>pg_depend</code> and
you have to know that internally, a <em>view</em> is in fact a <em>rewrite rule</em>.  Then
here's how to produce the two scripts that we need:</p>

<pre class="src">
=# \t
Showing only tuples.

=# \o /tmp/drop.sql
=# select <span style="color: #ad7fa8; font-style: italic;">'DROP VIEW '</span> || views || <span style="color: #ad7fa8; font-style: italic;">';'</span>
     from (select distinct(r.ev_class::regclass) as views
            from pg_depend d join pg_rewrite r on r.oid = d.objid
           where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass
             and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass
             and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass
             and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;

=# \o /tmp/create.sql
=# select <span style="color: #ad7fa8; font-style: italic;">'CREATE VIEW '</span> || views || E<span style="color: #ad7fa8; font-style: italic;">' AS \n'</span>
       || pg_get_viewdef(views, true) || <span style="color: #ad7fa8; font-style: italic;">';'</span>
     from (select distinct(r.ev_class::regclass) as views
          from pg_depend d join pg_rewrite r on r.oid = d.objid
         where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass
           and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass
           and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass
           and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;

=# \o
</pre>

<p>Replace <code>SCHEMA.TABLENAME</code> and <code>COLUMN_NAME</code> with your targets here and the
first query should give you one row per candidate view.  Well if you're not
using the <code>\o</code> trick, that is — if you do, check out the generated file
instead, with <code>\! cat /tmp/drop.sql</code> for example.</p>
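
<p>Once both files have been reviewed, the whole operation can run in a
single transaction, as outlined at the top of this article.  Here is a minimal
sketch, assuming the file names used with <code>\o</code> above and the <code>foo</code> and <code>bar</code>
names from the opening example; adapt both to your case:</p>

<pre class="src">
=# begin;
=# \i /tmp/drop.sql
=# alter table foo alter column bar type bigint;
=# \i /tmp/create.sql
=# commit;
</pre>

<p>If anything fails along the way, a <code>rollback;</code> leaves the views untouched.
The review step matters: multi-line view definitions may pick up display
decorations from the aligned output format, in which case generating the files
in unaligned mode (<code>\a</code>) is worth trying.</p>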

<p>Please note that this catalog query is not accurate, as it will select as a
candidate any view that will by chance both depend on the target table and
contain the <code>column_name</code> in its text definition.  So either filter out the
candidates properly (by careful proofreading, then another <code>WHERE</code> clause), or
just accept that you might <code>DROP</code> then <code>CREATE</code> again more <em>views</em> than need be.</p>
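
<p>A stricter filter is possible too: <code>pg_depend</code> records the column a
dependency refers to in <code>refobjsubid</code>, which can be joined against
<code>pg_attribute</code>.  The variant below is only a sketch and has not been validated
here, so compare its output with the query above before relying on it:</p>

<pre class="src">
=# select distinct r.ev_class::regclass as views
     from pg_depend d
          join pg_rewrite r on r.oid = d.objid
          join pg_attribute a on a.attrelid = d.refobjid
                             and a.attnum = d.refobjsubid
    where d.refclassid = 'pg_class'::regclass
      and d.refobjid = 'SCHEMA.TABLENAME'::regclass
      and d.classid = 'pg_rewrite'::regclass
      and a.attname = 'COLUMN_NAME';
</pre>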

<p>If you need some more details about the <code>\t \o</code> sequence you might be
interested in this older article about <a href="http://tapoueh.org/articles/blog/_Resetting_sequences._All_of_them&#44;_please&#33;.html">resetting sequences</a>.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/catalogs.html">catalogs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 04 May 2011 11:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</guid>
</item>
<item>
  <title>Extension module_pathname and .sql.in</title>
  <link>http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</link>
  <description><![CDATA[<h1>Extension module_pathname and .sql.in</h1>
<div class="date">Monday, May 02 2011, 17:30</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>While currently too busy at work to deliver many Open Source contributions,
let's debunk an old habit of <a href="http://www.postgresql.org/">PostgreSQL</a> extension authors.  It all comes down to
copy pasting from <em>contrib</em>, and there has been no reason to keep handling <code>$libdir</code>
this way since the <code>7.4</code> days.</p>

<p>Let's take an example here, with the <a href="https://github.com/dimitri/prefix">prefix</a> extension.  This one too will
need some love, but is still behind on my spare time todo list, sorry about
that.  So, in the <code>prefix.sql.in</code> we read</p>

<pre class="src">
  CREATE OR REPLACE FUNCTION prefix_range_in(cstring)
  RETURNS prefix_range
  AS <span style="color: #ad7fa8; font-style: italic;">'MODULE_PATHNAME'</span>
  LANGUAGE <span style="color: #ad7fa8; font-style: italic;">'C'</span> IMMUTABLE STRICT;
</pre>

<p>Two things are to change here.  First, the PostgreSQL <em>backend</em> will
understand just fine if you just say <code>AS '$libdir/prefix'</code>.  That means the
<code>sql</code> script has to know the name of the shared object library, but if it does,
you can directly maintain a <code>prefix.sql</code> script instead.</p>
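
<p>Applied to the function above, and assuming the shared object is installed
under the name <code>prefix</code>, a directly maintained <code>prefix.sql</code> could read:</p>

<pre class="src">
  CREATE OR REPLACE FUNCTION prefix_range_in(cstring)
  RETURNS prefix_range
  AS '$libdir/prefix'
  LANGUAGE C IMMUTABLE STRICT;
</pre>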

<p>The advantage is that you now can avoid a compatibility problem when you
want to support PostgreSQL from <code>8.2</code> to <code>9.1</code> in your extension (older than
that and it's <a href="http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy">no longer supported</a>).  You directly ship your script.</p>

<p>For compatibility, you could also use the <a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">control file</a> <code>module_pathname</code>
property.  But for <code>9.1</code> you then have to add an implicit <code>Make</code> rule so that the
script is derived from your <code>.sql.in</code>. And as you are managing several scripts
— so that you can handle <em>versioning</em> and <em>upgrades</em> — it can get hairy (<em>hint</em>,
you need to copy <code>prefix.sql</code> as <code>prefix--1.1.1.sql</code>, then change its name at
next revision, and think about <em>upgrade</em> scripts too).  The <code>module_pathname</code>
facility is better to keep for when managing more than a single extension in
the same directory, like the <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=contrib/spi/Makefile;h=0c11bfcbbd47b0c3ed002874bfefd9e2022cf5ac;hb=HEAD">SPI contrib</a> is doing.</p>

<p>Sure, maintaining an extension that targets both antique releases of
PostgreSQL and <a href="http://developer.postgresql.org/pgdocs/postgres/sql-createextension.html">CREATE EXTENSION</a> super-powered one(s) (not yet released) is a
little more involved than that.  We'll get back to that, as some people are
still pioneering the movement.</p>

<p>On my side, I'm working with a <a href="http://www.debian.org/">debian</a> <a href="http://qa.debian.org/developer.php?login=myon">developer</a> on how to best manage the
packaging of those extensions, and this work could end up as a specialized
<em>policy</em> document and a coordinated <em>team</em> of maintainers for all things
PostgreSQL in <code>debian</code>.  This will also give some more steam to the PostgreSQL
effort for debian packages: the idea is to maintain packages for all
supported versions (from <code>8.2</code> up to, soon, <code>9.1</code>), something <code>debian</code> itself cannot
commit to.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 02 May 2011 17:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</guid>
</item>
<item>
  <title>Emacs and PostgreSQL, PL line numbering</title>
  <link>http://tapoueh.org/blog/2011/04/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</link>
  <description><![CDATA[<p><span class="hack"> </span></p>

<p>A while ago I fixed and published <a href="https://github.com/dimitri/pgsql-linum-format">pgsql-linum-format</a> separately.
It numbers <code>PL/whatever</code> code lines when editing from <a href="http://www.gnu.org/software/emacs/">Emacs</a>, and
it's something very useful to turn on when debugging.</p>

<center>
<p><img src="../../../images//emacs-pgsql-linum.png" alt=""></p>
</center>


<p>The carets on the <em>fringe</em> in the emacs window are the result of
<code>(setq-default indicate-buffer-boundaries 'left)</code> and here it's
just overloading the image somehow.  But the idea is to just <code>M-x linum-mode</code>
when you need it, at least that's my usage of it.</p>

<p>You can use <a href="https://github.com/dimitri/el-get">el-get</a> to easily get (then update) this little <code>Emacs</code> extension.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sat, 23 Apr 2011 10:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</guid>
</item>
<item>
  <title>Emacs and PostgreSQL, PL line numbering</title>
  <link>http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</link>
  <description><![CDATA[<h1>Emacs and PostgreSQL, PL line numbering</h1>
<div class="date">Saturday, April 23 2011, 10:30</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>A while ago I fixed and published <a href="https://github.com/dimitri/pgsql-linum-format">pgsql-linum-format</a> separately.
It numbers <code>PL/whatever</code> code lines when editing from <a href="http://www.gnu.org/software/emacs/">Emacs</a>, and
it's something very useful to turn on when debugging.</p>

<center>
<p><img src="../../../images//emacs-pgsql-linum.png" alt=""></p>
</center>


<p>The carets on the <em>fringe</em> in the emacs window are the result of
<code>(setq-default indicate-buffer-boundaries 'left)</code> and here it's
just overloading the image somehow.  But the idea is to just <code>M-x linum-mode</code>
when you need it, at least that's my usage of it.</p>

<p>You can use <a href="https://github.com/dimitri/el-get">el-get</a> to easily get (then update) this little <code>Emacs</code> extension.</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/pgsql-linum-format.html">pgsql-linum-format</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sat, 23 Apr 2011 10:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</guid>
</item>
<item>
  <title>Emacs Kicker</title>
  <link>http://tapoueh.org/blog/2011/04/blog/2011/04/15-emacs-kicker.html</link>
  <description><![CDATA[<p>Following up on the very popular <a href="https://github.com/technomancy/emacs-starter-kit">emacs-starter-kit</a>, I'm now proposing the
<a href="https://github.com/dimitri/emacs-kicker">emacs-kicker</a>.  It's about the <code>.emacs</code> file you've seen in older posts here,
which I maintain for some colleagues.  After all, if they find it useful,
some more people might too, so I've decided to publish it.</p>

<p>What you'll find is a very simple <code>128</code>-line <a href="http://www.gnu.org/software/emacs/">Emacs</a> user init file, based on
<a href="https://github.com/dimitri/el-get">el-get</a> for external packages.  A not so <em>random</em> selection of those is used,
here's the list when you hide some details:</p>

<pre class="src">
 '(el-get                       <span style="color: #b22222;">; </span><span style="color: #b22222;">el-get is self-hosting
</span>   escreen                      <span style="color: #b22222;">; </span><span style="color: #b22222;">screen for emacs, C-\ C-h
</span>   php-mode-improved            <span style="color: #b22222;">; </span><span style="color: #b22222;">if you're into php...
</span>   psvn                         <span style="color: #b22222;">; </span><span style="color: #b22222;">M-x svn-status
</span>   switch-window                <span style="color: #b22222;">; </span><span style="color: #b22222;">takes over C-x o
</span>   auto-complete                <span style="color: #b22222;">; </span><span style="color: #b22222;">complete as you type with overlays
</span>   emacs-goodies-el             <span style="color: #b22222;">; </span><span style="color: #b22222;">the debian addons for emacs
</span>   yasnippet                    <span style="color: #b22222;">; </span><span style="color: #b22222;">powerful snippet mode
</span>   zencoding-mode               <span style="color: #b22222;">; </span><span style="color: #b22222;">http://www.emacswiki.org/emacs/ZenCoding
</span>   (<span style="color: #da70d6;">:name</span> buffer-move           <span style="color: #b22222;">; </span><span style="color: #b22222;">move buffers around in windows
</span>   (<span style="color: #da70d6;">:name</span> smex                  <span style="color: #b22222;">; </span><span style="color: #b22222;">a better (ido like) M-x
</span>   (<span style="color: #da70d6;">:name</span> magit                 <span style="color: #b22222;">; </span><span style="color: #b22222;">git meet emacs, and a binding
</span>   (<span style="color: #da70d6;">:name</span> goto-last-change      <span style="color: #b22222;">; </span><span style="color: #b22222;">move pointer back to last change
</span></pre>

<p>Another interesting thing to note in this <code>kicker</code> is the choice of some key
bindings that are (still) rather unusual, I guess.</p>

<pre class="src">
(global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-b"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-c"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #bc8f8f;">"C-x B"</span>) 'ibuffer)
</pre>

<p>Yes, you see that I've rebound <code>C-x C-c</code> to switching buffers.  That key is
really easy to use and I don't think that <code>M-x kill-emacs</code> deserves it.  Keys
that are so easy to use should be kept for frequent actions, and quitting
emacs is a once-a-day to once-a-month action here.  And you can still quit
from the window manager button or from the menu or from <code>M-x</code>.</p>

<p><em>Mac</em> users are not left behind either: you will see some settings that
are adapted to the system (like choosing another <em>font</em>, keeping the
<code>menu-bar</code> displayed, or not installing the darkish <code>tango-color-mode</code> on this system,
where it renders poorly in my opinion), as you can see here:</p>

<pre class="src">
(<span style="color: #7f007f;">if</span> (string-match <span style="color: #bc8f8f;">"apple-darwin"</span> system-configuration)
    (set-face-font 'default <span style="color: #bc8f8f;">"Monaco-13"</span>)
  (set-frame-font <span style="color: #bc8f8f;">"Monospace-10"</span>))

(<span style="color: #7f007f;">when</span> (string-match <span style="color: #bc8f8f;">"apple-darwin"</span> system-configuration)
  (setq mac-allow-anti-aliasing t)
  (setq mac-command-modifier 'meta)
  (setq mac-option-modifier 'none))
</pre>

<p>So all in all, I don't expect this <code>emacs-kicker</code> to please everyone, but I
expect it to be simple and rich enough (thanks to <a href="https://github.com/dimitri/el-get">el-get</a>), and it should be
a good <em>kick start</em> that's easy to adapt.</p>

<p>If you want to try it without installing it, it's very easy to do so.  Just
clone the <code>git</code> repository, then start an <code>Emacs</code> that will use it.  For
example that could be, using the excellent <a href="http://emacsformacosx.com/">Emacs For MacOSX</a>:</p>

<pre class="src">
 $ /Applications/Emacs.app/Contents/MacOS/Emacs -Q -l init.el
</pre>

<p>I hope some readers will find it useful! :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 15 Apr 2011 21:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/blog/2011/04/15-emacs-kicker.html</guid>
</item>
<item>
  <title>Emacs Kicker</title>
  <link>http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</link>
  <description><![CDATA[<h1>Emacs Kicker</h1>
<div class="date">Friday, April 15 2011, 21:30</div>
</div>
<div id="article">
<p>Following up on the very popular <a href="https://github.com/technomancy/emacs-starter-kit">emacs-starter-kit</a>, I'm now proposing the
<a href="https://github.com/dimitri/emacs-kicker">emacs-kicker</a>.  It's about the <code>.emacs</code> file you've seen in older posts here,
which I maintain for some colleagues.  After all, if they find it useful,
some more people might too, so I've decided to publish it.</p>

<p>What you'll find is a very simple <code>128</code>-line <a href="http://www.gnu.org/software/emacs/">Emacs</a> user init file, based on
<a href="https://github.com/dimitri/el-get">el-get</a> for external packages.  A not so <em>random</em> selection of those is used;
here's the list when you hide some details:</p>

<pre class="src">
 '(el-get                       <span style="color: #888a85;">; </span><span style="color: #888a85;">el-get is self-hosting
</span>   escreen                      <span style="color: #888a85;">; </span><span style="color: #888a85;">screen for emacs, C-\ C-h
</span>   php-mode-improved            <span style="color: #888a85;">; </span><span style="color: #888a85;">if you're into php...
</span>   psvn                         <span style="color: #888a85;">; </span><span style="color: #888a85;">M-x svn-status
</span>   switch-window                <span style="color: #888a85;">; </span><span style="color: #888a85;">takes over C-x o
</span>   auto-complete                <span style="color: #888a85;">; </span><span style="color: #888a85;">complete as you type with overlays
</span>   emacs-goodies-el             <span style="color: #888a85;">; </span><span style="color: #888a85;">the debian addons for emacs
</span>   yasnippet                    <span style="color: #888a85;">; </span><span style="color: #888a85;">powerful snippet mode
</span>   zencoding-mode               <span style="color: #888a85;">; </span><span style="color: #888a85;">http://www.emacswiki.org/emacs/ZenCoding
</span>   (<span style="color: #729fcf;">:name</span> buffer-move           <span style="color: #888a85;">; </span><span style="color: #888a85;">move buffers around in windows
</span>   (<span style="color: #729fcf;">:name</span> smex                  <span style="color: #888a85;">; </span><span style="color: #888a85;">a better (ido like) M-x
</span>   (<span style="color: #729fcf;">:name</span> magit                 <span style="color: #888a85;">; </span><span style="color: #888a85;">git meet emacs, and a binding
</span>   (<span style="color: #729fcf;">:name</span> goto-last-change      <span style="color: #888a85;">; </span><span style="color: #888a85;">move pointer back to last change
</span></pre>

<p>Another interesting thing to note in this <code>kicker</code> is the choice of some key
bindings that are, I guess, still rather unusual.</p>

<pre class="src">
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-b"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-c"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x B"</span>) 'ibuffer)
</pre>

<p>Yes, you see that I've rebound <code>C-x C-c</code> to switching buffers.  That key is
really easy to use and I don't think that <code>M-x kill-emacs</code> deserves it.  Keys
that are so easy to use should be kept for frequent actions, and quitting
emacs is a once-a-day to once-a-month action here.  And you can still quit
from the window manager button or from the menu or from <code>M-x</code>.</p>

<p>Also, <em>Mac</em> users are not left behind: you will see some settings that are
adapted to the system (like choosing another <em>font</em>, keeping the
<code>menu-bar</code> displayed, or not installing the darkish <code>tango-color-mode</code> on this system,
where it renders poorly in my opinion), as you can see here:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">if</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
    (set-face-font 'default <span style="color: #ad7fa8; font-style: italic;">"Monaco-13"</span>)
  (set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>))

(<span style="color: #729fcf; font-weight: bold;">when</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
  (setq mac-allow-anti-aliasing t)
  (setq mac-command-modifier 'meta)
  (setq mac-option-modifier 'none))
</pre>

<p>So all in all, I don't expect this <code>emacs-kicker</code> to please everyone, but I
expect it to be simple and rich enough (thanks to <a href="https://github.com/dimitri/el-get">el-get</a>), and it should be
a good <em>kick start</em> that's easy to adapt.</p>

<p>If you want to try it without installing it, it's very easy to do so.  Just
clone the <code>git</code> repository, then start an <code>Emacs</code> that will use it.  For
example that could be, using the excellent <a href="http://emacsformacosx.com/">Emacs For MacOSX</a>:</p>

<pre class="src">
 $ /Applications/Emacs.app/Contents/MacOS/Emacs -Q -l init.el
</pre>

<p>I hope some readers will find it useful! :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/emacs-kicker.html">emacs-kicker</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 15 Apr 2011 21:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</guid>
</item>
<item>
  <title>Some notes about Skytools3</title>
  <link>http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</link>
  <description><![CDATA[<h1>Some notes about Skytools3</h1>
<div class="date">Monday, April 11 2011, 11:30</div>
</div>
<div id="article">
<p>I've been working on <a href="http://github.com/markokr/skytools">skytools3</a> packaging lately.  I've been pushing quite a
lot of work into it, in order to have exactly what I needed out of the box,
after some 3 years of production experience with the products.  Plural,
yes, because even though <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a> and <a href="http://wiki.postgresql.org/wiki/PL/Proxy">plproxy</a> are only siblings of the project (same
developer team, separate life cycle and releases), <code>skytools</code> itself still
includes several sub-projects.</p>

<p>Here's what the <code>skytools3</code> packaging is going to look like:</p>

<pre class="src">
skytools3              Skytool's replication and queuing
python-pgq3            Skytool's PGQ python library
python-skytools3       python scripts framework for skytools
skytools-ticker3       PGQ ticker daemon service
skytools-walmgr3       high-availability archive and restore commands
postgresql-8.4-pgq3    PGQ server-side code (C module for PostgreSQL)
postgresql-9.0-pgq3    PGQ server-side code (C module for PostgreSQL)
</pre>

<p>This split is needed so that you can install your <em>daemons</em> (we call them
<em>consumers</em>) on machines other than the one where you run <a href="http://postgresql.org">PostgreSQL</a>.  But for the
<code>walmgr</code> part, it makes no sense to install it if you don't have a local
PostgreSQL service, as it's providing <code>archive</code> and <code>restore</code> commands.  As for
the <em>ticker</em>, you're free to run it on any machine really, so it gets its own
package (in <code>skytools3</code> the <em>ticker</em> is written in <code>C</code> and does not depend on the
python framework any more).</p>

<p>What you can't see here yet are the new goodies that wrap it up as a quality
<code>debian</code> package.  A new <code>skytools</code> user is created for you when you install the
<code>skytools3</code> package (which contains the services), along with a skeleton file
<code>/etc/skytools.ini</code> and a user directory <code>/etc/skytools/</code>.  Put your
services' configuration files in there, and register those services in the
<code>/etc/skytools.ini</code> file itself.  Then they will be taken care of in the <code>init</code>
sequence at startup and shutdown of your server.</p>

<p>The services will run under the <code>skytools</code> system user, and by default will
write their logs into <code>/var/log/skytools/</code>.  The <code>pidfile</code> will go into
<code>/var/run/skytools/</code>.  All integrated, automated.</p>

<p>The next big <em>TODO</em> is documentation, reviewing and polishing it, and I
think that <code>skytools3</code> will then be ready for public release.  Yes, you read
it right, it's happening this very year!  I'm very excited about it, and
have several architectures that will greatly benefit from the switch to
<code>skytools3</code>.  More on that later, though!  (Yes, my <em>to blog later</em> list is
getting quite long now).</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/restore.html">restore</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 11 Apr 2011 11:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</guid>
</item>
<item>
  <title>towards pg_staging 1.0</title>
  <link>http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</link>
  <description><![CDATA[<h1>towards pg_staging 1.0</h1>
<div class="date">Tuesday, March 29 2011, 15:30</div>
</div>
<div id="article">
<p>If you don't remember what <a href="pgstaging.html">pg_staging</a> is all about, it's a central
console from which to control all your <a href="http://www.postgresql.org/">PostgreSQL</a> databases.  Typically you
use it to manage your development and pre-production setup, where developers
ask you pretty often to install some newer dump from production for them,
and you want that operation streamlined and easy.</p>

<center>
<p><img src="../../../images//pg_staging.png" alt=""></p>
</center>


<h3>Usage</h3>

<p class="first">The typical session would be something like this:</p>

<pre class="src">
pg_staging&gt; databases foodb.dev
                    foodb      foodb_20100824 :5432
           foodb_20100209      foodb_20100209 :5432
           foodb_20100824      foodb_20100824 :5432
                pgbouncer           pgbouncer :6432
                 postgres            postgres :5432

pg_staging&gt; dbsizes foodb.dev
foodb.dev
           foodb_20100209: -1
           foodb_20100824: 104 GB
Total                    = 104 GB

pg_staging&gt; restore foodb.dev
...
pg_staging&gt; switch foodb.dev today
</pre>

<p>The list of supported commands is quite long now, and documented too (it
comes with two man pages).  The <code>restore</code> one is the most important and will
create the database, add it to the <code>pgbouncer</code> setup, fetch the backup named
<code>dbname.`date -I`.dump</code>, prepare a filtered object list (more on that below), load
<em>pre</em> <code>SQL</code> scripts, launch <code>pg_restore</code>, <code>VACUUM ANALYZE</code> the database when
configured to do so, load the <em>post</em> <code>SQL</code> scripts, then optionally <em>switch</em> the
<code>pgbouncer</code> setup to default to this new database.</p>
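
<p>As a minimal sketch of the backup naming convention mentioned above, the
following shell snippet builds the file name for a given database and date.
The <code>backup_name</code> helper is hypothetical and not part of <code>pg_staging</code>; the
default date assumes a <code>date</code> command that supports <code>-I</code>, as GNU <code>date</code> does.</p>

```shell
#!/bin/sh
# Hypothetical helper, not part of pg_staging: build "dbname.YYYY-MM-DD.dump".
# The date argument is optional and defaults to today, as printed by `date -I`.
backup_name() {
    dbname=$1
    day=${2:-$(date -I)}
    printf '%s.%s.dump\n' "$dbname" "$day"
}

# Name of the dump taken on 2010-08-24 for the foodb database
backup_name foodb 2010-08-24
```

<p>Called with only a database name, the helper would use the current date,
which matches what the <code>`date -I`</code> part of the name suggests.</p>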


<h3>Filtering</h3>

<p class="first">The newer option is called <code>tablename_nodata_regexp</code>, and here's its documentation in full:</p>

<blockquote>
<p class="quoted">
List of table names regexp (comma separated) to restore without
content. The <code>pg_restore</code> catalog <code>TABLE DATA</code> sections will get
filtered out.  The regexp is applied against <code>schemaname.tablename</code>
and non-anchored by default.</p>

</blockquote>

<p>This supplements the <code>schemas</code> and <code>schemas_nodata</code> options, which allow you
to restore only objects from a given set of <em>schemas</em> (filtering out triggers
that call functions from the excluded schemas, like
e.g. <a href="http://wiki.postgresql.org/wiki/Skytools">Londiste</a> ones) or to restore only the <code>TABLE</code> definitions while skipping
the <code>TABLE DATA</code> entries.</p>
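
<p>To illustrate the idea behind this kind of filtering, here is a rough shell
sketch that drops <code>TABLE DATA</code> entries from a catalog listing when
<code>schemaname.tablename</code> matches a regexp.  This is not how <code>pg_staging</code>
implements it: the <code>filter_nodata</code> function is hypothetical, it takes a single
regexp rather than a comma separated list, and the sample lines are made up,
assuming the usual <code>pg_restore -l</code> layout.</p>

```shell
#!/bin/sh
# Sketch only: drop TABLE DATA entries whose schema.table matches a regexp.
# Assumes catalog lines shaped like `pg_restore -l` output:
#   dumpid; tableoid oid TABLE DATA schema table owner
filter_nodata() {
    awk -v re="$1" '
        # Non-anchored match against "schemaname.tablename"
        $4 == "TABLE" && $5 == "DATA" && ($6 "." $7) ~ re { next }
        # Every other catalog entry is kept as-is
        { print }
    '
}

# Made-up catalog excerpt: the data of public.log_2010 gets filtered out,
# while its TABLE definition and everything about public.users are kept.
filter_nodata 'log_' <<'EOF'
201; 1259 16386 TABLE public users dim
202; 1259 16390 TABLE public log_2010 dim
2145; 0 16386 TABLE DATA public users dim
2146; 0 16390 TABLE DATA public log_2010 dim
EOF
```

<p>A listing filtered this way is the kind of file that <code>pg_restore -L</code> accepts
to restore only the entries that remain.</p>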


<h3>Setup</h3>

<p class="first">To set up your environment for <em>pg_staging</em>, you need to take some steps.  It's
not complex but it's fairly involved.  The benefit is this amazingly useful
central unique console to control as many databases as you need.</p>

<p>You need a <code>pg_staging.ini</code> file in which to describe your environment.  I
typically name the sessions in there by the name of the database to restore
followed by a <code>dev</code> or <code>preprod</code> extension.</p>

<p>You need to have all your backups available through <code>HTTP</code>, and as of now,
served by the famous <em>apache</em> <code>mod_dir</code> directory listing.  It's easy to add
support for other methods, but it has not been done yet.  You also need to
have a cluster wide <code>--globals-only</code> backup available somewhere so that you
can easily create the users etc you need from <code>pg_staging</code>.</p>

<p>You also need to run a <code>pgbouncer</code> daemon on each database server, allowing
you to bypass editing connection strings when you <code>switch</code> a new database
version live.</p>

<p>You also need to install the <em>client</em> script, have a local <code>pgstaging</code> system
user and allow it to run the client script as root, so that it's able to
control some services and edit <code>pgbouncer.ini</code> for you.</p>


<h3>Status</h3>

<p class="first">I'm still using it a lot (several times a week) to manage a whole
development and pre-production environment set, so the very low
<a href="https://github.com/dimitri/pg_staging">code activity</a> of the project is telling that it's pretty stable (the last series
of <em>commits</em> is all bug fixes and rounded corners).</p>

<p>Given that, I'm thinking in terms of <code>pg_staging 1.0</code> soon!  Now is a pretty
good time to try it and see how it can help you.</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/pg_staging.html">pg_staging</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 29 Mar 2011 15:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</guid>
</item>
<item>
  <title>Extensions in 9.1</title>
  <link>http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</link>
  <description><![CDATA[<h1>Extensions in 9.1</h1>
<div class="date">Tuesday, March 01 2011, 16:30</div>
</div>
<div id="article">
<p>If you've not been following closely you might have missed out on extensions
integration.  Well, <a href="http://en.wikipedia.org/wiki/Tom_Lane_(computer_scientist)">Tom</a> spent some time on the patches I've been preparing
for the last 4 months.  And not only did he commit most of the work but he
also enhanced some parts of the code (better factoring) and basically
finished it.</p>

<p>At the <a href="http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting">previous developer meeting</a> his advice was to avoid putting too much
into the very first version of the patch, so that it would stand a chance of
being integrated, and during the review process more than one major
<a href="http://www.postgresql.org/">PostgreSQL</a> contributor expressed worries about the size of the patch and the
number of features proposed.  Which is the usual process.</p>

<p>Then what happened is that <strong><em>Tom</em></strong> finally followed reasoning similar to mine
while working on the feature.  To maximize the benefits, once you have the
infrastructure in place, it's not that much more work to provide the really
interesting features.  What's complex is agreeing on exactly what their
specifications are.  And in the <em>little</em> time window we got in this commit fest
(well, we hijacked about 2 full weeks there), we managed to get there.</p>

<p>So in the end the result is quite amazing, and you can see that in the
documentation chapter about it:
<a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">35.15. Packaging Related Objects into an Extension</a>.</p>

<p>All the <em>contrib</em> modules that install <code>SQL</code> objects into databases for
you to use are now converted to <strong><em>Extensions</em></strong> too, and will get released
in <code>9.1</code> with an upgrade script that allows you to <em>upgrade from unpackaged</em>.
That means that once you've upgraded from a past PostgreSQL release up to
<code>9.1</code>, it will be a command away for you to register <em>extensions</em> as such.  I
expect third party <em>extension</em> authors (from <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> to <a href="http://pgfoundry.org/projects/temporal">temporal</a>) to release an
<em>upgrade-from-unpackaged</em> version of their work too.</p>
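
<p>As a hedged illustration of what that one command could look like, the shell
sketch below prints one <code>CREATE EXTENSION ... FROM unpackaged</code> statement per
module, which is the <code>9.1</code> syntax as I understand it.  The
<code>upgrade_from_unpackaged</code> function and the choice of <code>hstore</code> and <code>pg_trgm</code> as
examples are assumptions made for the illustration.</p>

```shell
#!/bin/sh
# Sketch: emit one CREATE EXTENSION ... FROM unpackaged statement per module.
# The function name and the module names used below are illustrative only.
upgrade_from_unpackaged() {
    for ext in "$@"; do
        printf 'CREATE EXTENSION %s FROM unpackaged;\n' "$ext"
    done
}

# Print the statements for two contrib modules already installed the old way
upgrade_from_unpackaged hstore pg_trgm
```

<p>The output is plain <code>SQL</code>, so it could be piped to <code>psql</code> against the database
where the old-style objects already live.</p>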

<p>Of course, a big use case of the <em>extensions</em> is also in-house <code>PL</code> code, and
having version numbers and multi-stage upgrade scripts there will be
fantastic too; I can't wait to work with such a tool set myself.  Some later
blog post will detail the benefits and usage.  I'm already trying to think
how much of this versioning and upgrade facility could be extended to classic
<code>DDL</code> objects…</p>

<p>So expect some more blog posts from me on this subject.  I will have to talk
about <em>debian packaging</em> an extension (it's getting damn easy with
<a href="http://packages.debian.org/squeeze/postgresql-server-dev-all">postgresql-server-dev-all</a>; yes, it has received some planning ahead), and
about how to package your own extension, manage upgrades, turn your current
<code>pre-9.1</code> extension into a <em>full blown extension</em>, and maybe how to stop
worrying about extensions when you're a DBA.</p>

<p>If you have some features you would want to discuss for next releases,
please do contact me!</p>

<p>Meanwhile, I'm very happy that this project of mine finally made it to <em>core</em>,
it's been long in the making.  Some years to talk about it and then finally
4 months of coding that I'll remember as a marathon.  Many thanks go to all
who helped here, from <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> to early reviewers to people I talked to
over beers at conferences… lots of people really.</p>

<p>To an extended PostgreSQL (and beyond) :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 01 Mar 2011 16:30:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</guid>
</item>
<item>
  <title>desktop-mode and readahead</title>
  <link>http://tapoueh.org/blog/2011/02/blog/2011/02/23-desktop-mode-and-readahead.html</link>
  <description><![CDATA[<p>I'm using <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Save-Mode.html#Desktop-Save-Mode">Desktop Save Mode</a> so that <a href="http://www.gnu.org/software/emacs/">Emacs</a> knows to open again all the
buffers I've been using.  That goes quite well with how often I start <code>Emacs</code>,
that is once a week or once a month.  Now, the last line of <code>M-x ibuffer</code> reads as
follows:</p>

<pre class="src">
    718 buffers         19838205                  668 files, 15 processes
</pre>

<p>That means that at startup, <code>Emacs</code> will load that many files.  In order not
to have to wait until it's done doing so, I've set things up this way:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">and the session
</span>(setq desktop-restore-eager 20
      desktop-lazy-verbose nil)
(desktop-save-mode 1)
(savehist-mode 1)
</pre>

<p>Problem is that it's still slow.  An idea I had was to use the <a href="https://fedorahosted.org/readahead/browser/README">readahead</a>
tool that some distributions use to reduce their boot time.  Of course this tool
is not expecting the same file format as <code>emacs-desktop</code> uses.  Still,
converting is quite easy with some <code>awk</code> magic.  Here's the result:</p>

<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-desktop.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">Allows to prepare a readahead file list from desktop-save
</span>
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">desktop</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim-desktop-file-readahead-list</span>
  <span style="color: #bc8f8f;">"~/.emacs.desktop.readahead"</span>
  <span style="color: #bc8f8f;">"*Where to save the emacs desktop `readahead` file list"</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim-desktop-filelist-command</span>
  <span style="color: #bc8f8f;">"gawk -F '[ \"]' '/desktop-.*-buffer/ {getline; if($4) print $4}' %s"</span>
  <span style="color: #bc8f8f;">"Command to run to prepare the readahead file list"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim-desktop-get-readahead-file-list</span> (<span style="color: #228b22;">&amp;optional</span> filename dir)
  <span style="color: #bc8f8f;">"get the file list for readahead from desktop file in DIR, or ~"</span>
  (<span style="color: #7f007f;">with-temp-file</span> (or filename dim-desktop-file-readahead-list)
    (insert
     (shell-command-to-string
      (format dim-desktop-filelist-command
              (expand-file-name desktop-base-file-name (or dir <span style="color: #bc8f8f;">"~"</span>)))))))

<span style="color: #b22222;">;; </span><span style="color: #b22222;">This will not work because the hook is run before the buffers are added to
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">the desktop file.
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">(add-hook 'desktop-save-hook 'dim-desktop-get-readahead-file-list)
</span>
<span style="color: #b22222;">;; </span><span style="color: #b22222;">so instead, advise the function
</span>(<span style="color: #7f007f;">defadvice</span> <span style="color: #0000ff;">desktop-save</span> (after desktop-save-readahead activate)
  <span style="color: #bc8f8f;">"Prepare a readahead(8) file for the desktop file"</span>
  (dim-desktop-get-readahead-file-list))

(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-desktop</span>)
</pre>
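
<p>To see what the command in <code>dim-desktop-filelist-command</code> extracts, here is a
small shell sketch that runs the same <code>awk</code> program on a made-up desktop file
excerpt.  The <code>list_desktop_files</code> wrapper is hypothetical, plain <code>awk</code> stands
in for <code>gawk</code>, and the sample input only assumes that each buffer entry is
followed by a line holding either the quoted file name or <code>nil</code>.</p>

```shell
#!/bin/sh
# Sketch: run the same awk program on a made-up desktop file excerpt.
# Fields are split on spaces and double quotes; after a matching line,
# getline loads the next line, whose 4th field is the quoted file name.
list_desktop_files() {
    awk -F '[ "]' '/desktop-.*-buffer/ {getline; if($4) print $4}' "$@"
}

# The second entry has no file (nil), so only the first path is printed.
list_desktop_files <<'EOF'
(desktop-create-buffer 206
  "/home/dim/.emacs"
  ".emacs"
  'emacs-lisp-mode
(desktop-create-buffer 206
  nil
  "*scratch*"
  'lisp-interaction-mode
EOF
```

<p>Because the field separator includes the space character, a file name
containing spaces would be cut short by this approach.</p>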

<p>The <code>awk</code> construct <code>getline</code> lets you process the next line of the input file,
which is very practical here (and in a host of other situations).  Now that
we have a file containing the list of files <code>Emacs</code> will load, we have to
tweak the system to <code>readahead</code> those disk blocks.  As I'm currently using <a href="http://kde.org/">KDE</a>
again, I've done it thusly:</p>

<pre class="src">
% cat ~/.kde/Autostart/readahead.emacs.sh
#! /bin/bash

# just readahead the emacs desktop files
# this file listing is maintained directly from Emacs itself
readahead ~/.emacs.desktop.readahead
</pre>

<p>So, well, it works.  The files that <code>Emacs</code> will need are pre-read, so at the
time the desktop really gets to them, I see no more disk activity (laptops
have an LED that shows it happening).  But the desktop loading time has not
changed...</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 23 Feb 2011 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/blog/2011/02/23-desktop-mode-and-readahead.html</guid>
</item>
<item>
  <title>desktop-mode and readahead</title>
  <link>http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</link>
  <description><![CDATA[<h1>desktop-mode and readahead</h1>
<div class="date">Wednesday, February 23 2011, 16:45</div>
</div>
<div id="article">
<p>I'm using <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Save-Mode.html#Desktop-Save-Mode">Desktop Save Mode</a> so that <a href="http://www.gnu.org/software/emacs/">Emacs</a> knows to open again all the
buffers I've been using.  That goes quite well with how often I start <code>Emacs</code>,
that is once a week or once a month.  Now, the last line of <code>M-x ibuffer</code> reads as
follows:</p>

<pre class="src">
    718 buffers         19838205                  668 files, 15 processes
</pre>

<p>That means that at startup, <code>Emacs</code> will load that many files.  In order not
to have to wait until it's done doing so, I've set things up this way:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">and the session
</span>(setq desktop-restore-eager 20
      desktop-lazy-verbose nil)
(desktop-save-mode 1)
(savehist-mode 1)
</pre>

<p>Problem is that it's still slow.  An idea I had was to use the <a href="https://fedorahosted.org/readahead/browser/README">readahead</a>
tool that some distributions use to reduce their boot time.  Of course this tool
is not expecting the same file format as <code>emacs-desktop</code> uses.  Still,
converting is quite easy with some <code>awk</code> magic.  Here's the result:</p>

<pre class="src">
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-desktop.el --- Dimitri Fontaine
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">Allows to prepare a readahead file list from desktop-save
</span>
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">desktop</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-file-readahead-list</span>
  <span style="color: #ad7fa8; font-style: italic;">"~/.emacs.desktop.readahead"</span>
  <span style="color: #888a85;">"*Where to save the emacs desktop `readahead` file list"</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-filelist-command</span>
  <span style="color: #ad7fa8; font-style: italic;">"gawk -F '[ \"]' '/desktop-.*-buffer/ {getline; if($4) print $4}' %s"</span>
  <span style="color: #888a85;">"Command to run to prepare the readahead file list"</span>)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim-desktop-get-readahead-file-list</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> filename dir)
  <span style="color: #888a85;">"get the file list for readahead from desktop file in DIR, or ~"</span>
  (<span style="color: #729fcf; font-weight: bold;">with-temp-file</span> (or filename dim-desktop-file-readahead-list)
    (insert
     (shell-command-to-string
      (format dim-desktop-filelist-command
              (expand-file-name desktop-base-file-name (or dir <span style="color: #ad7fa8; font-style: italic;">"~"</span>)))))))

<span style="color: #888a85;">;; </span><span style="color: #888a85;">This will not work because the hook is run before the buffers are added to
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">the desktop file.
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">(add-hook 'desktop-save-hook 'dim-desktop-get-readahead-file-list)
</span>
<span style="color: #888a85;">;; </span><span style="color: #888a85;">so instead, advise the function
</span>(<span style="color: #729fcf; font-weight: bold;">defadvice</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">desktop-save</span> (after desktop-save-readahead activate)
  <span style="color: #888a85;">"Prepare a readahead(8) file for the desktop file"</span>
  (dim-desktop-get-readahead-file-list))

(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-desktop</span>)
</pre>
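
<p>To see what that command extracts, here is a minimal shell sketch run against a made-up sample. The sample text is an assumption: it only mimics the layout the command relies on, where each <code>desktop-create-buffer</code> line is followed by a line holding the quoted file name, or <code>nil</code> for a buffer that visits no file. Plain <code>awk</code> is used instead of <code>gawk</code>, which should behave the same for this program.</p>

```shell
#!/bin/sh
# Hypothetical sample, NOT a real desktop file: it only reproduces the
# "matching line, then file name line" layout the awk program expects.
sample='(desktop-create-buffer 206
  "/home/dim/.emacs"
  ".emacs")

(desktop-create-buffer 206
  nil
  "*scratch*")

(desktop-create-buffer 206
  "/home/dim/notes.txt"
  "notes.txt")'

# On a matching line, getline replaces the current record with the next
# input line and splits it again on space or double quote. The two leading
# spaces and the opening quote give three empty fields, so the file name
# lands in $4. A "nil" line leaves $4 empty and is not printed.
files=$(printf '%s\n' "$sample" |
  awk -F '[ "]' '/desktop-.*-buffer/ { getline; if ($4) print $4 }')

printf '%s\n' "$files"
```

<p>Only the two file names are printed; the <code>*scratch*</code> entry is skipped because its fourth field is empty.</p>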

<p>The <code>awk</code> construct <code>getline</code> lets you read and process the next line of the input file right away,
which is very practical here (and in a host of other situations).  Now that
we have a file containing the list of files <code>Emacs</code> will load, we have to
tweak the system to <code>readahead</code> those disk blocks.  As I'm currently using <a href="http://kde.org/">KDE</a>
again, I've done it this way:</p>

<pre class="src">
% cat ~/.kde/Autostart/readahead.emacs.sh
#! /bin/bash

# just readahead the emacs desktop files
# this file listing is maintained directly from Emacs itself
readahead ~/.emacs.desktop.readahead
</pre>

<p>So, well, it works.  The files that <code>Emacs</code> will need are pre-read, so by the
time the desktop really gets to them, I see no more disk activity (laptops
have an LED that shows it happening).  But the desktop loading time has not
changed...</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/restore.html">restore</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 23 Feb 2011 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</guid>
</item>
<item>
  <title>Back from FOSDEM</title>
  <link>http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</link>
  <description><![CDATA[<h1>Back from FOSDEM</h1>
<div class="date">Monday, February 07 2011, 11:10</div>
<div id="article">
<p>This year we were in the main building of the conference, and apparently the
booth went very well, selling lots of <a href="http://postgresqleu.spreadshirt.net/">PostgreSQL merchandise</a> etc.  I had the
pleasure of meeting the community once again, but being there for only one day I
didn't spend as much time as I would have liked with some of the people there.</p>

<p>In case you're wondering, my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">extensions talk</a> went quite well, and several
people were kind enough to tell me they appreciated it!  It was recorded on
video, so we will soon have proof of how bad it really was
and how <em>polite</em> those people really are :)</p>

<p>I will soon be able to write an article series detailing what an Extension is
and how you deal with one, either as a user or as an author.  In fact the
goal is for any user to easily become an extension author, as I think lots
of people are already maintaining server side code but lack the tools to
manage it properly.  But that will begin once the patch is in, so that I
present <em>the real stuff</em> rather than what I proposed to the community… Stay
tuned!</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 07 Feb 2011 11:10:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</guid>
</item>
<item>
  <title>Going to FOSDEM</title>
  <link>http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</link>
  <description><![CDATA[<h1>Going to FOSDEM</h1>
<div class="date">Tuesday, February 01 2011, 13:35</div>
<div id="article">
<p>A quick blog entry to say that yes:</p>

<center>
<p><img src="../../../images//going-to-fosdem-2011.png" alt=""></p>
</center>


<p>And I will even give my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">Extensions talk</a>, which was a <a href="http://blog.hagander.net/archives/183-Feedback-from-PGDay.EU-the-speakers.html">success at pgday.eu</a>.  The
talk will be updated to include the latest developments of the extensions
feature, as some of it has already changed in the meantime, and to detail the plan
for the <code>ALTER EXTENSION ... UPGRADE</code> feature that I'd like to see included as
soon as <code>9.1</code>, but time is running short.</p>

<p>In fact the design for the <code>UPGRADE</code> has been done and reviewed already, but
consensus has yet to be reached on how to specify which upgrade file to
use when upgrading from a given version to another.  I've solved it in my
patch, of course, by adding properties to the extension's <em>control
file</em>. That's the best place for that setup, I think: it allows lots of
flexibility, leaves the extension's author in charge, and avoids hard
coding any kind of assumption about file naming or whatever.</p>

<p>The coming days and reviews will tell us more about how the design is received.
Meanwhile, we're working on finalizing the main extensions patch, which offers
<code>pg_dump</code> support.</p>

<p>See you at <a href="http://fosdem.org/2011/">FOSDEM</a>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 01 Feb 2011 13:35:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</guid>
</item>


<item>
  <title>Starting afresh with el-get</title>
  <link>http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</link>
  <description><![CDATA[<h1>Starting afresh with el-get</h1>
<div class="date">Tuesday, January 11 2011, 16:20</div>
<div id="article">
<p>It so happens that a colleague of mine wanted to start using <a href="http://www.gnu.org/software/emacs/">Emacs</a> but
couldn't get to it. He insists on having proper color themes in all
applications and some sensible defaults full of nifty add-ons everywhere,
and didn't want to have to learn that much about <em>Emacs</em> and <em>Emacs Lisp</em> to get
started. I'm not even sure that he will <a href="http://www.gnu.org/software/emacs/tour/">Take the Emacs tour</a>.</p>

<p>You might tell me that there's nothing we can do for such unfriendly
users. Well, here's what I did:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">emacs setup
</span>
(add-to-list 'load-path <span style="color: #ad7fa8; font-style: italic;">"~/.emacs.d/el-get/el-get"</span>)
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(setq
 el-get-sources
 '(el-get
   php-mode-improved
   psvn
   auto-complete
   switch-window

   (<span style="color: #729fcf;">:name</span> buffer-move
          <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-up&gt;"</span>)     'buf-move-up)
                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-down&gt;"</span>)   'buf-move-down)
                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-left&gt;"</span>)   'buf-move-left)
                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-right&gt;"</span>)  'buf-move-right)))

   (<span style="color: #729fcf;">:name</span> magit
          <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status)))

   (<span style="color: #729fcf;">:name</span> goto-last-change
          <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                   <span style="color: #888a85;">;; </span><span style="color: #888a85;">azerty keyboard here, don't use C-x C-/
</span>                   (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-_"</span>) 'goto-last-change)))))

(<span style="color: #729fcf; font-weight: bold;">when</span> window-system
   (add-to-list 'el-get-sources  'color-theme-tango))

(el-get 'sync)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">visual settings
</span>(setq inhibit-splash-screen t)
(menu-bar-mode -1)
(tool-bar-mode -1)
(scroll-bar-mode -1)

(line-number-mode 1)
(column-number-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">Use the clipboard, pretty please, so that copy/paste "works"
</span>(setq x-select-enable-clipboard t)

(set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>)

(global-hl-line-mode)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">follow external changes made to files
</span>(global-auto-revert-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">for colors in M-x shell
</span>(autoload 'ansi-color-for-comint-mode-on <span style="color: #ad7fa8; font-style: italic;">"ansi-color"</span> nil t)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">S-arrows to switch windows
</span>(windmove-default-keybindings)
(setq windmove-wrap-around t)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">find-file-at-point when it makes sense
</span>(setq ffap-machine-p-known 'accept) <span style="color: #888a85;">; </span><span style="color: #888a85;">no pinging
</span>(setq ffap-url-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable URL features in ffap
</span>(setq ffap-ftp-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable FTP features in ffap
</span>(define-key global-map (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-f"</span>) 'find-file-at-point)

(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ibuffer</span>)
(global-set-key <span style="color: #ad7fa8; font-style: italic;">"\C-x\C-b"</span> 'ibuffer)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">use iswitchb-mode for C-x b
</span>(iswitchb-mode)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">I can't remember having meant to use C-z as suspend-frame
</span>(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-z"</span>) 'undo)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">winner-mode to return to the previous layout with C-c &lt;left&gt;
</span>(winner-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">dired-x for C-x C-j
</span>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">dired-x</span>)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">full screen
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">fullscreen</span> ()
  (interactive)
  (set-frame-parameter nil 'fullscreen
                       (<span style="color: #729fcf; font-weight: bold;">if</span> (frame-parameter nil 'fullscreen) nil 'fullboth)))
(global-set-key [f11] 'fullscreen)
</pre>

<p>With just these 87 simple lines of setup (all included), my local user is
very happy to switch to using <a href="http://www.gnu.org/software/emacs/">our favorite editor</a>. And he's not even afraid
(yet) of his <code>~/.emacs</code>. I say that's a very good sign of where we are with
<a href="https://github.com/dimitri/el-get">el-get</a>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 11 Jan 2011 16:20:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</guid>
</item>
<item>
  <title>el-get 1.1, with 174 recipes</title>
  <link>http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</link>
  <description><![CDATA[<h1>el-get 1.1, with 174 recipes</h1>
<div class="date">Monday, December 20 2010, 16:45</div>
<div id="article">
<p>Yes, you read that right, <a href="https://github.com/dimitri/el-get">el-get</a> currently <em>features</em> <code>174</code> <a href="https://github.com/dimitri/el-get/tree/master/recipes">recipes</a>, and is now
reaching the <code>1.1</code> release. The reason for this release is mainly that I have
two big chunks of code to review and the current code has been very stable
for a while. It seems better to release the stable code that exists
now before shaking it up that much. If you're wondering when to jump in the
water and switch to using <em>el-get</em>, now is a pretty good time.</p>

<h3>New source types</h3>

<p class="first">We now have support for the <a href="http://www.archlinux.org/pacman/">pacman</a> package manager for <a href="http://www.archlinux.org/">archlinux</a>, and a
way to handle a package whose name differs between the recipe and the
distribution. We also have support for <a href="http://mercurial.selenic.com/">mercurial</a>, <a href="http://subversion.tigris.org/">subversion</a> and <a href="http://darcs.net/">darcs</a>.</p>

<p>Also, <a href="http://wiki.debian.org/Apt">apt-get</a> will sometimes prompt you to validate its choices: that's the
infamous <em>Do you want to continue?</em> prompt. We now handle that smoothly.</p>


<h3>(el-get 'sync)</h3>

<p class="first">In <code>1.1</code>, that really means <em>synchronous</em>. That means we install one package
after the other, and any error will stop it all. Before that, it was an
active wait loop over a parallel install: this option is still available
through calling <code>(el-get 'wait)</code>.</p>


<h3>No more <em>failed to install</em></h3>

<p class="first">Exactly. This error, which you may have encountered at some point, comes from trying to
install a package over a previous failed install attempt (network outage,
disk full, bad work-in-progress recipe, etc). After a while in the field it
was clear that no case was found where you would regret it if <a href="https://github.com/dimitri/el-get">el-get</a> just
removed the previous failed installation for you before going on to
install again, as asked. So that's now automatic.</p>


<h3>Featuring an overhauled :build facility</h3>

<p class="first">The <code>build</code> commands can now either be a list, as before, or a form that we
<em>evaluate</em> for you. That makes <em>recipes</em> easier to maintain, and here's an
example of that:</p>

<pre class="src">
(<span style="color: #729fcf;">:name</span> distel
       <span style="color: #729fcf;">:type</span> svn
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://distel.googlecode.com/svn/trunk/"</span>
       <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"doc"</span>
       <span style="color: #729fcf;">:build</span> `,(mapcar
                 (<span style="color: #729fcf; font-weight: bold;">lambda</span> (target)
                   (concat <span style="color: #ad7fa8; font-style: italic;">"make "</span> target <span style="color: #ad7fa8; font-style: italic;">" EMACS="</span> el-get-emacs))
                 '(<span style="color: #ad7fa8; font-style: italic;">"clean"</span> <span style="color: #ad7fa8; font-style: italic;">"all"</span>))
       <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"elisp"</span>)
       <span style="color: #729fcf;">:features</span> distel)
</pre>

<p>As you can see, that also allows for maintenance of multi-platform build
recipes, and multiple emacs versions too. It's still a little too much on
the <em>awkward</em> side of things, though, and that's part of the ongoing work
planned for the next version.</p>
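
<p>To make the evaluated <code>:build</code> form above more concrete, here is a small shell sketch of the command strings it produces, one per target. The value <code>/usr/bin/emacs</code> is a made-up stand-in for <code>el-get-emacs</code>, assumed here to hold the path of the Emacs binary. The loop only mirrors the <code>mapcar</code> over <code>clean</code> and <code>all</code>; it is not how <em>el-get</em> itself runs the build.</p>

```shell
#!/bin/sh
# Assumption: hypothetical stand-in for the value of el-get-emacs.
EMACS_BIN=/usr/bin/emacs

# Build one "make TARGET EMACS=..." string per target, in order, the same
# way the recipe maps its lambda over the list ("clean" "all").
cmds=$(for target in clean all; do
  printf 'make %s EMACS=%s\n' "$target" "$EMACS_BIN"
done)

# Print the resulting command list without running anything.
printf '%s\n' "$cmds"
```

<p>The point of evaluating the form is that the Emacs path gets filled in at install time rather than being hard coded in the recipe.</p>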


<h3>Misc improvements</h3>

<p class="first">We are now able to <code>byte-compile</code> your packages, and offer some more hooks
(<code>el-get-init-hooks</code> was requested along with a nice usage example). There's a new
<code>:localname</code> property that lets you pick where to save the local file when
using the <code>HTTP</code> method for retrieval, and that in turn allows fixing some
<em>recipes</em>.</p>

<pre class="src">
(<span style="color: #729fcf;">:name</span> xcscope
       <span style="color: #729fcf;">:type</span> http
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://cscope.cvs.sourceforge.net/viewvc/cscope/cscope/contrib/xcscope/xcscope.el?revision=1.14&amp;content-type=text%2Fplain"</span>
       <span style="color: #729fcf;">:localname</span> <span style="color: #ad7fa8; font-style: italic;">"xscope.el"</span>
       <span style="color: #729fcf;">:features</span> xcscope)
</pre>

<p>Oh and you even get <code>:before</code> user function support, even if needing it often
shows that you're doing it in a strange way. More often than not it's
possible to do all you need to in the <code>:after</code> function, but this tool is
there so that you spend less time on having a working environment, not more,
right? :)</p>


<h3>Switch notice</h3>

<p class="first">All in all, if you're already using <a href="https://github.com/dimitri/el-get">el-get</a> you should consider switching to
<code>1.1</code> (by issuing <code>M-x el-get-update</code> of course), and if you're hesitating, just
join the fun now!</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 20 Dec 2010 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</guid>
</item>












<item>
  <title>Dynamic Triggers in PLpgSQL</title>
  <link>http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</link>
  <description><![CDATA[<h1>Dynamic Triggers in PLpgSQL</h1>
<div class="date">Wednesday, November 24 2010, 16:45</div>
<div id="article">
<p>You certainly know that implementing <em>dynamic</em> triggers in <code>PLpgSQL</code> is
impossible. But I had a very bad night, having been up since 3:30 am
today, so that when a developer asked me about reusing the same trigger
function code for more than one table and for a dynamic column name, I
didn't remember that it was impossible.</p>

<p>Here's what you end up with in such cases, after a long time spent on the
problem (yes, overall, that was a slow day). Note that I'm abusing the <code>(record_literal).*</code>
notation a lot in there, and even the <code>(record_literal).column_name</code> one too.</p>

<pre class="src">
CREATE OR REPLACE FUNCTION public.update_timestamp()
 RETURNS TRIGGER
 LANGUAGE plpgsql
AS $f$
DECLARE
    ts_column varchar;
    old_timestamp timestamptz;
    attname name;
    n text;
    v text;
BEGIN
    IF TG_NARGS != 1
    THEN
        RAISE EXCEPTION <span style="color: #ad7fa8; font-style: italic;">'Trigger public.update_timestamp() called with % args'</span>,
                         TG_NARGS;
    END IF;

    ts_column := TG_ARGV[0];

    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT n.'</span> || ts_column
         || <span style="color: #ad7fa8; font-style: italic;">' FROM (SELECT ('</span>
         || quote_literal(OLD) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass
         || <span style="color: #ad7fa8; font-style: italic;">').*) as n'</span>
       INTO old_timestamp;

    <span style="color: #888a85;">-- build NEW record text
</span>    n := <span style="color: #ad7fa8; font-style: italic;">'('</span>;
    FOR attname IN
      EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT attname '</span>
           || <span style="color: #ad7fa8; font-style: italic;">'  FROM pg_class c left join pg_attribute a on a.attrelid = c.oid'</span>
           || <span style="color: #ad7fa8; font-style: italic;">' WHERE c.oid = $1 and attnum &gt; 0 order by attnum'</span>
       USING TG_RELID
    LOOP
        EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ('</span> || quote_literal(NEW) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').'</span> || attname INTO v;

        IF n != <span style="color: #ad7fa8; font-style: italic;">'('</span> THEN n := n || <span style="color: #ad7fa8; font-style: italic;">','</span>; END IF;

        IF attname = ts_column
           AND v::timestamptz IS NOT DISTINCT FROM old_timestamp
        THEN
                n := n || now();
        ELSE
                n := n || COALESCE(v, <span style="color: #ad7fa8; font-style: italic;">''</span>);
        END IF;
    END LOOP;
    n := n || <span style="color: #ad7fa8; font-style: italic;">')'</span>;

    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ($1::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').*'</span>
      INTO NEW
     USING n;

    RETURN NEW;
END;
$f$;
</pre>

<p>It's not pretty, and not fast. It's about <code>2 ms</code> per call on a table with <code>15</code>
columns, in some preliminary tests. But it sure was a nice challenge!</p>


<h2>Tags</h2>

<p><a href="../../../tags/plpgsql.html">plpgsql</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 24 Nov 2010 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</guid>
</item>
<item>
  <title>pg_basebackup</title>
  <link>http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</link>
  <description><![CDATA[<h1>pg_basebackup</h1>
<div class="date">Sunday, November 07 2010, 13:45</div>
</div>
<div id="article">
<p><a href="http://2ndquadrant.com/about/#krosing">Hannu</a> just gave me a good idea in <a href="http://archives.postgresql.org/pgsql-hackers/2010-11/msg00236.php">this email</a> on <a href="http://archives.postgresql.org/pgsql-hackers/">-hackers</a>, proposing that
<a href="https://github.com/dimitri/pg_basebackup">pg_basebackup</a> should get the <code>xlog</code> files again and again in a loop for the
whole duration of the <em>base backup</em>. That's now done in the aforementioned
tool, whose options have become a little more useful:</p>

<pre class="src">
Usage: pg_basebackup.py [-v] [-f] [-j jobs] <span style="color: #ad7fa8; font-style: italic;">"dsn"</span> dest

Options:
  -h, --help            show this help message and exit
  --version             show version and quit
  -x, --pg_xlog         backup the pg_xlog files
  -v, --verbose         be verbose and about processing progress
  -d, --debug           show debug information, including SQL queries
  -f, --force           remove destination directory if it exists
  -j JOBS, --jobs=JOBS  how many helper jobs to launch
  -D DELAY, --delay=DELAY
                        pg_xlog subprocess loop delay, see -x
  -S, --slave           auxilliary process
  --stdin               get list of files to backup from stdin
</pre>

<p>Yeah, as implementing the <code>xlog</code> idea required having some kind of
parallelism, I built on it and the script now has a <code>--jobs</code> option for you to
set how many processes to launch in parallel, each fetching some <code>base
backup</code> files over its own standard (<code>libpq</code>) <a href="http://www.postgresql.org/">PostgreSQL</a> connection, in
compressed chunks of <code>8 MB</code> (so what gets sent over is smaller than <code>8 MB</code>).</p>
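
<p>To make the chunking idea concrete, here is a minimal sketch in <a href="http://python.org/">python</a>. It is
not the code of the script itself: the <code>compressed_chunks</code> and <code>restore</code> names are
made up for the example, and using <code>zlib</code> is an assumption. It only shows
each slice of the input being compressed on its own before it would be sent:</p>

<pre class="src">
import io
import zlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB of uncompressed data per chunk


def compressed_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield each chunk_size slice of stream as an independent zlib blob."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        # Each chunk is compressed separately, so what travels over the
        # connection is usually much smaller than chunk_size.
        yield zlib.compress(data)


def restore(chunks):
    """Rebuild the original bytes from the compressed chunks."""
    return b"".join(zlib.decompress(c) for c in chunks)


if __name__ == "__main__":
    payload = b"PostgreSQL " * 1000
    # A tiny chunk size (an arbitrary value) so the demo yields several chunks.
    chunks = list(compressed_chunks(io.BytesIO(payload), chunk_size=4096))
    print(len(chunks), "chunks,", sum(len(c) for c in chunks), "bytes sent")
    assert restore(chunks) == payload
</pre>

<p>Compressing every chunk independently means the receiving side can
decompress and write each one as soon as it arrives, without waiting for the
rest of the file.</p>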

<p>The <code>xlog</code> loop will fetch any <code>WAL</code> file whose <code>ctime</code> changed again,
wholesale. It's easier this way, and tools to get optimized behavior already
do exist, either <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> or <a href="http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STREAMING-REPLICATION">walreceiver</a>.</p>
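
<p>The <code>ctime</code> check can be sketched as follows. This is only an illustration
under assumptions: it looks at a local directory with <code>os.stat()</code>, whereas the
script works through a PostgreSQL connection, and the <code>changed_files</code> name
and the <code>seen</code> dictionary are hypothetical.</p>

<pre class="src">
import os


def changed_files(directory, seen):
    """Return names in directory whose ctime differs from the one in seen.

    seen maps a file name to the ctime recorded on the previous pass and
    is updated in place, so repeated calls only report new changes.
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        ctime = os.stat(path).st_ctime
        if seen.get(name) != ctime:
            seen[name] = ctime
            changed.append(name)  # the whole file gets fetched again
    return changed
</pre>

<p>Calling such a function in a loop, sleeping for the <code>--delay</code> between two
passes, gives the <em>fetch it again, wholesale</em> behavior: no attempt is made to
find out which part of the file changed.</p>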

<p>The script is still a short, self-contained <a href="http://python.org/">python</a> file, it just went
from about <code>100</code> lines of code to about <code>400</code> lines. There's no external
dependency, all it needs is provided by a standard python installation. The
problem with that is that it's using <code>select.poll()</code>, which I think is not
available on Windows. Faced with either supporting every system or adding
dependencies, I chose what was easiest for me.</p>

<pre class="src">
    <span style="color: #729fcf; font-weight: bold;">import</span> select
    <span style="color: #729fcf; font-weight: bold;">import</span> sys

    <span style="color: #eeeeec;">p</span> = select.poll()
    p.register(sys.stdin, select.POLLIN)
</pre>

<p>If you get to try it, please tell me how it went: you should know or easily
discover my <em>email</em>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 07 Nov 2010 13:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</guid>
</item>


<item>
  <title>Introducing Extensions</title>
  <link>http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</link>
  <description><![CDATA[<h1>Introducing Extensions</h1>
<div class="date">Thursday, October 21 2010, 13:45</div>
</div>
<div id="article">
<p>After reading <a href="http://database-explorer.blogspot.com/2010/10/extensions-in-91.html">Simon's blog post</a>, I can't help but try to give some details
about what it is exactly that I'm working on. As he said, there are several
aspects to <em>extensions</em> in <a href="http://www.postgresql.org/">PostgreSQL</a>, and it all begins here:
<a href="http://www.postgresql.org/docs/9.0/interactive/extend.html">Chapter 35. Extending SQL</a>.</p>

<p>It's possible, and mostly simple enough, to add your own code or behavior to
PostgreSQL, so that it will use your code and your semantics while solving
user queries. That's highly useful and it's easy to understand how so when
you look at some projects like <a href="http://postgis.refractions.net/">PostGIS</a>, <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> (index searches of <code>ip</code> in a
<code>range</code>, not limited to <code>CIDR</code> notation), or our own <em>Key Value Store</em>, <a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a>.</p>

<h3>So, what's in an <em>Extension</em>?</h3>

<p class="first">An <em>extension</em> in its simple form is a <code>SQL</code> <em>script</em> that you load on your
database, but manage separately. Meaning you don't want the script to be
part of your backups. Often, that kind of script will create new datatypes
and operators, support functions, user functions and index support, and then
it would include some <code>C</code> code that ships in a <em>shared library object</em>.</p>

<p>As far as PostgreSQL is concerned, at least in the current version of my
patch, the extension is first a <em>meta</em> information file that allows you to
register it. We currently call that the <code>control</code> file. Then, it's an <code>SQL</code>
script that is <em>executed</em> by the server when you <code>create</code> the <em>extension</em>.</p>

<p>If it so happens that the <code>SQL</code> script depends on some <em>shared library objects</em>
file, this has to be present at the right place (<code>MODULE_PATHNAME</code>) for the
<em>extension</em> to be successfully created, but that's always been the case.</p>

<p>The problem with current releases of PostgreSQL, which the <em>extension</em> patch
solves, is the <code>pg_dump</code> and <code>pg_restore</code> support. As said above, you don't want
the <code>SQL</code> script to be part of your dump, because it's not maintained in your
database, but in some code repository out there. What you want is to be able
to install the <em>extension</em> again at the file system level, then <code>pg_restore</code> your
database, which depends on the extension being there.</p>

<p>And that's exactly what the <em>extension</em> patch provides. By now having a <code>SQL</code>
object called an <code>extension</code>, and maintained in the new <code>pg_extension</code> catalog,
we have an <code>Oid</code> to refer to. Which we do by recording a dependency between
any object created by the script and the <em>extension</em> <code>Oid</code>, so that <code>pg_dump</code> can
be instructed to skip those.</p>


<h3>Examples?</h3>

<p class="first">So, let's have a look at what you can do if you play with a patched
development server version, or if you play directly from the <code>git</code> repository
at
<a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension</a></p>

<pre class="src">
dim ~ createdb exts
dim ~ psql exts
psql (9.1devel)
Type <span style="color: #ad7fa8; font-style: italic;">"help"</span> for help.

dim=# \dx+
                                                        List of extensions
        Name        | | |                               Description
--------------------+-+-+-------------------------------------------------------------------------
 adminpack          | | | Administrative functions for PostgreSQL
 auto_username      | | | functions for tracking who changed a table
 autoinc            | | | functions for autoincrementing fields
 btree_gin          | | | GIN support for common types BTree operators
 btree_gist         | | | GiST support for common types BTree operators
 chkpass            | | | Store crypt()ed passwords
 citext             | | | case-insensitive character string type
 cube               | | | data type for representing multidimensional cubes
 dblink             | | | connect to other PostgreSQL databases from within a database
 dict_int           | | | example of an add-on dictionary template for full-text search
 dict_xsyn          | | | example of an add-on dictionary template for full-text search
 earthdistance      | | | calculating great circle distances on the surface of the Earth
 fuzzystrmatch      | | | determine similarities and distance between strings
 hstore             | | | storing sets of key/value pairs
 int_aggregate      | | | integer aggregator and an enumerator (obsolete)
 intarray           | | | one-dimensional arrays of integers: functions, operators, index support
 isn                | | | data types for the international product numbering standards
 lo                 | | | managing Large Objects
 ltree              | | | data type for hierarchical tree-like structure
 moddatetime        | | | functions for tracking last modification time
 pageinspect        | | | inspect the contents of database pages at a low level
 pg_buffercache     | | | examine the shared buffer cache in real time
 pg_freespacemap    | | | examine the free space map (FSM)
 pg_stat_statements | | | tracking execution statistics of all SQL statements executed
 pg_trgm            | | | determine the similarity of text, with indexing support
 pgcrypto           | | | cryptographic functions
 pgrowlocks         | | | show row locking information for a specified table
 pgstattuple        | | | obtain tuple-level statistics
 prefix             | | | Prefix Match Indexing
 refint             | | | functions for implementing referential integrity
 seg                | | | data type for representing line segments, or floating point intervals
 tablefunc          | | | various functions that return tables, including crosstab(text sql)
 test_parser        | | | example of a custom parser for full-text search
 timetravel         | | | functions for implementing time travel
 tsearch2           | | | backwards-compatible text search functionality (pre-8.3)
 unaccent           | | | text search dictionary that removes accents
(36 rows)
</pre>

<p>Ok, I've visibly edited the output to leave the <em>Version</em> and <em>Custom
Variable Classes</em> columns out. They take up a lot of screen space and are not
that useful here. Maybe the <em>classes</em> one will even get dropped from the
patch before reaching <code>9.1</code>, we'll see.</p>

<p>Let's pick an extension there and install it in our new database:</p>

<pre class="src">
exts=# create extension pg_trgm;
NOTICE:  Installing extension 'pg_trgm' from '/Users/dim/pgsql/exts/share/contrib/pg_trgm.sql', with user data
CREATE EXTENSION
exts=# \dx
                                           List of extensions
  Name   |  |  |                       Description
---------+--+--+---------------------------------------------------------
 pg_trgm |  |  | determine the similarity of text, with indexing support
(1 row)
</pre>

<p>See, that was easy enough. Here again, the extra columns have been
removed. So, what's in this extension, you will ask me, what are those
objects that you would normally (that is, before the patch) find in your
<code>pg_dump</code> backup script?</p>

<pre class="src">
exts=# select * from pg_extension_objects('pg_trgm');
    class     | classid | objid |                                                                objdesc
--------------+---------+-------+----------------------------------------------------------------------------------------------------------------------------------------
 pg_extension |    3996 | 18498 | extension pg_trgm
 pg_proc      |    1255 | 18499 | function set_limit(real)
 pg_proc      |    1255 | 18500 | function show_limit()
 pg_proc      |    1255 | 18501 | function show_trgm(text)
 pg_proc      |    1255 | 18502 | function similarity(text,text)
 pg_proc      |    1255 | 18503 | function similarity_op(text,text)
 pg_operator  |    2617 | 18504 | operator %(text,text)
 pg_type      |    1247 | 18505 | type gtrgm
 pg_proc      |    1255 | 18506 | function gtrgm_in(cstring)
 pg_proc      |    1255 | 18507 | function gtrgm_out(gtrgm)
 pg_type      |    1247 | 18508 | type gtrgm[]
 pg_proc      |    1255 | 18509 | function gtrgm_consistent(internal,text,integer,oid,internal)
 pg_proc      |    1255 | 18510 | function gtrgm_compress(internal)
 pg_proc      |    1255 | 18511 | function gtrgm_decompress(internal)
 pg_proc      |    1255 | 18512 | function gtrgm_penalty(internal,internal,internal)
 pg_proc      |    1255 | 18513 | function gtrgm_picksplit(internal,internal)
 pg_proc      |    1255 | 18514 | function gtrgm_union(bytea,internal)
 pg_proc      |    1255 | 18515 | function gtrgm_same(gtrgm,gtrgm,internal)
 pg_opfamily  |    2753 | 18516 | operator family gist_trgm_ops for access method gist
 pg_opclass   |    2616 | 18517 | operator class gist_trgm_ops for access method gist
 pg_amop      |    2602 | 18518 | operator 1 %(text,text) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18519 | function 1 gtrgm_consistent(internal,text,integer,oid,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18520 | function 2 gtrgm_union(bytea,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18521 | function 3 gtrgm_compress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18522 | function 4 gtrgm_decompress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18523 | function 5 gtrgm_penalty(internal,internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18524 | function 6 gtrgm_picksplit(internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18525 | function 7 gtrgm_same(gtrgm,gtrgm,internal) of operator family gist_trgm_ops for access method gist
 pg_proc      |    1255 | 18526 | function gin_extract_trgm(text,internal)
 pg_proc      |    1255 | 18527 | function gin_extract_trgm(text,internal,smallint,internal,internal)
 pg_proc      |    1255 | 18528 | function gin_trgm_consistent(internal,smallint,text,integer,internal,internal)
 pg_opfamily  |    2753 | 18529 | operator family gin_trgm_ops for access method gin
 pg_opclass   |    2616 | 18530 | operator class gin_trgm_ops for access method gin
 pg_amop      |    2602 | 18531 | operator 1 %(text,text) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18532 | function 1 btint4cmp(integer,integer) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18533 | function 2 gin_extract_trgm(text,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18534 | function 3 gin_extract_trgm(text,internal,smallint,internal,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18535 | function 4 gin_trgm_consistent(internal,smallint,text,integer,internal,internal) of operator family gin_trgm_ops for access method gin
(38 rows)
</pre>

<p>This function's main intended users are the <em>extension</em> authors themselves, so
that it's easy for them to figure out which system identifier (the <code>objid</code>
column) has been attributed to some <code>SQL</code> objects from their install
script. With this knowledge, you can prepare some <em>upgrade</em> scripts. But
that's for another patch altogether, so we'll get back to the matter in
another blog entry.</p>

<p>So we chose <a href="http://www.postgresql.org/docs/9.0/interactive/pgtrgm.html">trgm</a> as an example, let's follow the documentation and create a
test table and a custom index in there, just so that the extension is put to
good use. Then let's try to <code>DROP</code> our extension, because we're testing the
infrastructure, right?</p>

<pre class="src">
exts=# create table test(id bigint, name text);
CREATE TABLE
exts=# CREATE INDEX idx_test_name ON test USING gist (name gist_trgm_ops);
CREATE INDEX
exts=# drop extension pg_trgm;
ERROR:  cannot drop extension pg_trgm because other objects depend on it
DETAIL:  index idx_test_name depends on operator class gist_trgm_ops for access method gist
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
</pre>

<p>Of course PostgreSQL is smart enough here: the <em>extension</em> patch had nothing
special to do to achieve that, apart from recording the dependencies. Next,
as we didn't <code>drop extension pg_trgm cascade;</code>, it's still in the database. So
let's see what a <code>pg_dump</code> will look like. As it's quite a lot of text to
paste, let's see the <code>pg_restore</code> catalog instead. And that's a feature that
deserves to be better known, too.</p>

<pre class="src">
dim ~ pg_dump -Fc exts | pg_restore -l | grep -v '^;'
1812; 1262 18497 DATABASE - exts dim
1; 3996 18498 EXTENSION - pg_trgm
1813; 0 0 COMMENT - EXTENSION pg_trgm
6; 2615 2200 SCHEMA - public dim
1814; 0 0 COMMENT - SCHEMA public dim
1815; 0 0 ACL - public dim
320; 2612 11602 PROCEDURAL LANGUAGE - plpgsql dim
1521; 1259 18543 TABLE public test dim
1809; 0 18543 TABLE DATA public test dim
1808; 1259 18549 INDEX public idx_test_name dim
</pre>

<p>As you see, the only SQL objects from the extension that got into the backup are an <code>EXTENSION</code>
and its <code>COMMENT</code>. Nothing like the types or the functions that the <code>pg_trgm</code>
script creates.</p>
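
<p>The filtering idea can be pictured with a toy model in <a href="http://python.org/">python</a>. It is in no
way how <code>pg_dump</code> is implemented: the <code>CatalogObject</code> tuple and the
<code>dump_entries</code> function are hypothetical, and the catalog is reduced to a
plain list. It only shows that once each object records which extension it
depends on, skipping those objects is a simple test.</p>

<pre class="src">
from collections import namedtuple

# A toy stand-in for a catalog entry: what the object is, and which
# extension (if any) its recorded dependency points at.
CatalogObject = namedtuple("CatalogObject", "kind name extension")


def dump_entries(objects, extensions):
    """List what a dump would contain once dependencies are honoured."""
    entries = ["EXTENSION " + ext for ext in sorted(extensions)]
    for obj in objects:
        if obj.extension is None:  # not created by any extension script
            entries.append(obj.kind + " " + obj.name)
    return entries


if __name__ == "__main__":
    catalog = [
        CatalogObject("FUNCTION", "similarity(text,text)", "pg_trgm"),
        CatalogObject("TYPE", "gtrgm", "pg_trgm"),
        CatalogObject("TABLE", "test", None),
        CatalogObject("INDEX", "idx_test_name", None),
    ]
    for line in dump_entries(catalog, {"pg_trgm"}):
        print(line)
</pre>

<p>In this model, the function and the type that belong to <code>pg_trgm</code> are left out,
while the extension itself, the table and the index are listed.</p>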


<h3>What does it mean for extension authors?</h3>

<p class="first">In order to be an <em>extension</em>, you have to prepare a <em>control</em> file in which you
give the necessary information to register your script. This file must be
named <code>extension.control</code> if the script is named <code>extension.sql</code>, at least at
the moment. This file can benefit from some variable expansion too, as
the current <code>extension.sql.in</code> does: if you provide an
<code>extension.control.in</code> file, the term <code>VERSION</code> will be expanded to whatever
<code>$(VERSION)</code> is set to in your <code>Makefile</code>.</p>

<p>If you never wrote a <code>C</code> coded <em>extension</em> for PostgreSQL, this might look
complex and irrelevant. The bottom line is that you need a <code>Makefile</code> so that you can
benefit easily from the PostgreSQL infrastructure work and have the <code>make
install</code> operation put your files in the right place, including the new
<code>control</code> file.</p>


<h3>That's it for today, folks</h3>

<p class="first">The next blog entry will detail what happens with extensions providing <em>user
data</em>, and the <code>CREATE EXTENSION name WITH NO DATA;</code> variant. Stay tuned!</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/plpgsql.html">plpgsql</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 21 Oct 2010 13:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</guid>
</item>
<item>
  <title>Extensions: writing a patch for PostgreSQL</title>
  <link>http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</link>
  <description><![CDATA[<h1>Extensions: writing a patch for PostgreSQL</h1>
<div class="date">Friday, October 15 2010, 11:30</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>These days, thanks to my <a href="http://2ndquadrant.com/">community oriented job</a>, I'm working full time on a
<a href="http://www.postgresql.org/">PostgreSQL</a> patch to complete basic support for <a href="http://www.postgresql.org/docs/9/static/extend.html">extending SQL</a>. The first thing I
want to share is that patching the <em>backend code</em> is not as hard as one would
think. The second is that <a href="http://git-scm.com/">git</a> really helps.</p>

<p><em>“Not as hard as one would think</em>, are you kidding me?”, I hear some
say. Well, that's true. It's <code>C</code> code in there, but with a very good layer of
abstractions so that you're not dealing with subtle problems that much. Of
course it happens that you have to, and managing the memory isn't
optional. That said, <code>palloc()</code> and the <em>memory contexts</em> implementation make
that as easy as <em>in lots of cases, you don't have to think about it</em>.</p>

<p>PostgreSQL is very well known for its reliability, and that's not something
that just happened. All the source code is organized in a way that makes it
possible, so your main task is to write code that looks as much as possible
like the existing surrounding code. And we all know how to <em>copy paste</em>,
right?</p>

<p>So, my current work on the <em>extensions</em> is to make it so that if you install
<a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a> in your database (to pick an example), your backup won't contain any
<em>hstore</em> specific objects (types, functions, operators, index support objects,
etc) but rather a single line that tells PostgreSQL to install <em>hstore</em> again.</p>

<pre class="src">
CREATE EXTENSION hstore;
</pre>

<p>The feature already works in <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">my git branch</a> and I'm extracting infrastructure
work in there to ease review. That's when <code>git</code> helps a lot. What I've done is
create a new branch from the master one, then <a href="http://www.kernel.org/pub/software/scm/git/docs/git-cherry-pick.html">cherry pick</a> the patches of
interest. Well, sometimes you have to resort to helper tools. I've been told
after the fact that using <code>git cherry-pick -n</code> would have allowed the
following to be much simpler:</p>

<pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git cherry-pick 3f291b4f82598309368610431cf2a18d7b7a7950
error: could not apply 3f291b4... Implement dependency tracking for CREATE EXTENSION, and DROP EXTENSION ... CASCADE.
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add &lt;paths&gt;' or 'git rm &lt;paths&gt;'
hint: and commit the result with 'git commit -c 3f291b4'
dim ~/dev/PostgreSQL/postgresql-extension git status \
| awk '/modified/ &amp;&amp; ! /both/ &amp;&amp; ! /genfile/ {print $3}
       /deleted/ {print $5}
       /both/    {print $4}' \
| xargs echo git reset -- \
| sh
Unstaged changes after reset:
M       src/backend/catalog/dependency.c
M       src/backend/catalog/heap.c
M       src/backend/catalog/pg_aggregate.c
M       src/backend/catalog/pg_conversion.c
M       src/backend/catalog/pg_namespace.c
M       src/backend/catalog/pg_operator.c
M       src/backend/catalog/pg_proc.c
M       src/backend/catalog/pg_type.c
M       src/backend/commands/extension.c
M       src/backend/commands/foreigncmds.c
M       src/backend/commands/opclasscmds.c
M       src/backend/commands/proclang.c
M       src/backend/commands/tsearchcmds.c
M       src/backend/nodes/copyfuncs.c
M       src/backend/nodes/equalfuncs.c
M       src/backend/parser/gram.y
M       src/include/catalog/dependency.h
M       src/include/commands/extension.h
M       src/include/nodes/parsenodes.h
</pre>

<p>That's what I did to prepare a side branch containing only changes to a part
of my current work. I had to filter the diff so much only because I'm
committing in rather big steps, rather than very little chunks at a time. In
this case that means I had a single patch with several <em>units</em> of changes and
I wanted to extract only one. Well, it happens that even in such a case, <code>git</code>
is helping!</p>

<p>There's more to say about the <em>extension</em> related feature of course, but
that'll do it for this article. I'll just end with the following nice
<em>diffstat</em> of 4 days of work:</p>

<pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master..|wc -l
    3897
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master..|diffstat
 doc/src/sgml/extend.sgml               |   46 ++
 doc/src/sgml/ref/allfiles.sgml         |    2
 doc/src/sgml/ref/create_extension.sgml |   95 ++++
 doc/src/sgml/ref/drop_extension.sgml   |  115 +++++
 doc/src/sgml/reference.sgml            |    2
 src/backend/access/transam/xlog.c      |   95 ----
 src/backend/catalog/Makefile           |    1
 src/backend/catalog/dependency.c       |   25 +
 src/backend/catalog/heap.c             |    9
 src/backend/catalog/objectaddress.c    |   14
 src/backend/catalog/pg_aggregate.c     |    7
 src/backend/catalog/pg_conversion.c    |    7
 src/backend/catalog/pg_namespace.c     |   13
 src/backend/catalog/pg_operator.c      |    7
 src/backend/catalog/pg_proc.c          |    7
 src/backend/catalog/pg_type.c          |    8
 src/backend/commands/Makefile          |    3
 src/backend/commands/comment.c         |    6
 src/backend/commands/extension.c       |  688 +++++++++++++++++++++++++++++++++
 src/backend/commands/foreigncmds.c     |   19
 src/backend/commands/functioncmds.c    |    7
 src/backend/commands/opclasscmds.c     |   13
 src/backend/commands/proclang.c        |    7
 src/backend/commands/tsearchcmds.c     |   25 +
 src/backend/nodes/copyfuncs.c          |   22 +
 src/backend/nodes/equalfuncs.c         |   18
 src/backend/parser/gram.y              |   51 ++
 src/backend/tcop/utility.c             |   27 +
 src/backend/utils/adt/genfile.c        |  193 +++++++++
 src/backend/utils/init/postinit.c      |    3
 src/backend/utils/misc/Makefile        |    2
 src/backend/utils/misc/cfparser.c      |  113 +++++
 src/backend/utils/misc/guc-file.l      |   26 -
 src/backend/utils/misc/guc.c           |  160 ++++++-
 src/bin/pg_dump/common.c               |    6
 src/bin/pg_dump/pg_dump.c              |  520 ++++++++++++++++++++++--
 src/bin/pg_dump/pg_dump.h              |   10
 src/bin/pg_dump/pg_dump_sort.c         |    7
 src/bin/psql/command.c                 |    3
 src/bin/psql/describe.c                |   45 ++
 src/bin/psql/describe.h                |    3
 src/bin/psql/help.c                    |    1
 src/include/catalog/dependency.h       |    1
 src/include/catalog/indexing.h         |    6
 src/include/catalog/pg_extension.h     |   61 ++
 src/include/catalog/pg_proc.h          |   13
 src/include/catalog/toasting.h         |    1
 src/include/commands/extension.h       |   54 ++
 src/include/nodes/nodes.h              |    2
 src/include/nodes/parsenodes.h         |   20
 src/include/parser/kwlist.h            |    1
 src/include/utils/builtins.h           |    4
 src/include/utils/cfparser.h           |   18
 src/include/utils/guc.h                |   11
 src/makefiles/pgxs.mk                  |   21 -
 55 files changed, 2456 insertions(+), 188 deletions(-)
</pre>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/backup.html">backup</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 15 Oct 2010 11:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</guid>
</item>
<item>
  <title>Date puzzle for starters</title>
  <link>http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</link>
  <description><![CDATA[<h1>Date puzzle for starters</h1>
<div class="date">Friday, October 08 2010, 10:00</div>
</div>
<div id="article">
<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> <code>IRC</code> channel is a good place to be: for all the very good
help you can get there, because people there always want to be helpful, for
the occasional off-topic discussion, or to get to talk with core community
members. And to start your day, too.</p>

<p>This morning's question started simple: “how can I check if today is the
&quot;first sunday of the month&quot;, or &quot;the second tuesday of the month&quot; etc?”</p>

<p>The first version of the answer is quite simple too:</p>

<pre class="src">
dim=#   with begin(d) as (select date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, <span style="color: #ad7fa8; font-style: italic;">'today'</span>::date)::date)
dim-# select d + (7 - extract(dow from d)::int) % 7 as sunday from begin;
   sunday
<span style="color: #888a85;">------------
</span> 2010-10-03
(1 row)
</pre>

<p>So you just have to compare the result of that query with <code>'today'::date</code>
and there you go. The problem is that the question can also be read the
other way round: how do you express today in the <em>first</em> or <em>second</em> <em>day name</em> of
this month <em>format</em>? Once more, <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> to the rescue:</p>

<pre class="src">
select to_char(current_date,
               <span style="color: #ad7fa8; font-style: italic;">'"'</span> || ((ARRAY[<span style="color: #ad7fa8; font-style: italic;">'First'</span>,<span style="color: #ad7fa8; font-style: italic;">'Second'</span>,<span style="color: #ad7fa8; font-style: italic;">'Third'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fourth'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fifth'</span>])
                             [(extract(day from current_date)::integer - 1)/7 + 1]
                      )
                   || <span style="color: #ad7fa8; font-style: italic;">'" Day'</span>);
     to_char
<span style="color: #888a85;">------------------
</span> Second Friday
(1 row)
</pre>

<p>That's a straight answer to the question, read that way!</p>
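
<p>Both readings can be combined to answer the question as it was asked. What
follows is only a minimal sketch, not something from the channel discussion:
the column alias <code>is_first_sunday</code> is made up for the example, and the two
constants (0 for Sunday, 0 for the first occurrence) are the assumptions to
adapt to your own case.</p>

<pre class="src">
-- dow is 0 for Sunday; (day - 1) / 7 is the zero-based rank of today's
-- day of week within the month, using integer division as in the query above
select     extract(dow from current_date)::int = 0
       and (extract(day from current_date)::int - 1) / 7 = 0
       as is_first_sunday;
</pre>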

<p>But the part that I found nice to play with was my first reading of the
question, as I don't let go of my ideas that easily, you see… so what
about writing a function to return the date of any <em>nth</em> occurrence of a given
<em>day of week</em> in a <em>given month</em>, defaulting to this very month?</p>

<pre class="src">
create or replace function get_nth_dow_of_month
 (
  nth int,
  dow int,
  begin date default current_date
 )
 returns date
 language sql
 strict
 as
$$
with month(d) as (
  select generate_series(date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3),
                         date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3) + interval <span style="color: #ad7fa8; font-style: italic;">'1 month - 1 day'</span>,
                         interval <span style="color: #ad7fa8; font-style: italic;">'1 day'</span>)::date
),
     repeat as (
  select d, extract(dow from d) as dow, (d - date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3)::date) / 7 as repeat
    from month
)
select d
  from repeat
 where dow = $2 and repeat = $1;
$$;

dim=# select get_nth_dow_of_month(0, 0);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------
</span> 2010-10-03
(1 row)

dim=# select get_nth_dow_of_month(1, 4, <span style="color: #ad7fa8; font-style: italic;">'2010-09-12'</span>);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------
</span> 2010-09-09
(1 row)
</pre>

<p>So you see we just got the first Sunday of this month <code>(0, 0)</code> and the second
Thursday <code>(1, 4)</code> of the previous one: <em>nth</em> is counted from zero, and <em>dow</em>
follows <code>extract(dow from ...)</code>, where 0 is Sunday. Any date within a month
will do to tell which month you want to work in, given the way the function
abuses <code>date_trunc</code>.</p>

<p>Now the function as written is unfinished. You want to fix it in one of two
ways: either stop using <code>generate_series</code>, given that only one row is output
at a time, or fix the <code>API</code> so that you can ask for more than one <em>nth dow</em> at a
time. Of course, that was a starter for me, not a problem I need to solve
directly, and that was a good excuse for a blog entry, so I won't fix
it. That's left as an exercise to our interested readers!</p>
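
<p>For readers who want a head start on the first option, here is a minimal
sketch that replaces <code>generate_series</code> with plain date arithmetic. It is one
possible approach rather than the definitive fix: the function name
<code>get_nth_dow_of_month_arith</code>, the parameter name <code>day_in_month</code> and the CTE
names are made up for the example, and it keeps the zero-based <em>nth</em> of the
function above.</p>

<pre class="src">
create or replace function get_nth_dow_of_month_arith
 (
  nth int,
  dow int,
  day_in_month date default current_date
 )
 returns date
 language sql
 strict
 as
$$
with month_start(d) as (
  select date_trunc('month', $3)::date
),
     candidate(d) as (
  -- days from the 1st to the first requested dow, plus nth whole weeks
  select d + ($2 - extract(dow from d)::int + 7) % 7 + 7 * $1
    from month_start
   where $2 between 0 and 6
)
-- no row (hence NULL) when the candidate falls outside the month
select d
  from candidate
 where date_trunc('month', d) = date_trunc('month', $3);
$$;
</pre>

<p>Called with <code>(0, 0)</code> in October 2010, or with <code>(1, 4, '2010-09-12')</code>, this
should return the same dates as the two examples above, without building the
whole month first.</p>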


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 08 Oct 2010 10:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</guid>
</item>
<item>
  <title>Resuming work on Extensions, first little step</title>
  <link>http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</link>
  <description><![CDATA[<h1>Resuming work on Extensions, first little step</h1>
<div class="date">Thursday, October 07 2010, 17:15</div>
<div id="article">

<p>Yeah, I'm back to working on my part of the extension thing in <a href="http://www.postgresql.org/">PostgreSQL</a>.</p>

<p>First step is a little one, but as it has public consequences, I figured I'd
talk about it already. I've just refreshed my <code>git</code> repository to follow the
new <code>master</code> one, and you can see that here
<a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430</a>.</p>

<p>It's been easier than I feared, mainly:</p>

<pre class="src">
$ git --no-pager diff master..extension
$ git --no-pager format-patch master..extension
$ cp 0001-First-stab-at-writing-pg_execute_from_file-function.patch ..
$ git checkout master
$ git pull -f pgmaster
$ git reset --hard pgmaster/master
$ git checkout extension
$ git reset --hard master
$ git am -s ../0001-First-stab-at-writing-pg_execute_from_file-function.edit.patch
$ git status
$ git log --short | head
$ git log -n2 --oneline
$ git push -f
</pre>

<p>So that's still more steps than one would want to call dead simple, but
still. The <code>format-patch</code> command is there to save my work away (all patches
that are in the <em>extension</em> branch but not in <em>master</em>; well, that was only one
of them here). Then, as the master repository <code>URL</code> didn't change, I can simply
<code>pull</code> the changes in. Of course I had a nice message: <em>warning: no common commits</em>.</p>

<p>Once pulled, I trashed my local <code>master</code> and replaced it with the new official
one, that's <code>git reset --hard pgmaster/master</code>. Then I could trash the <em>extension</em>
branch the same way and have it point at the local <code>master</code> again.</p>

<p>Of course, <code>git am</code> wouldn't apply my patch as-is: there were some
underlying changes in the source files, as the identification tag changed
from <code>$PostgreSQL$</code> to, e.g., <code>src/backend/utils/adt/genfile.c</code>, and I had to
cope with that. Maybe there's some tool (<code>git am -3</code> ?) to do it automatically;
I just copy edited the <code>.patch</code> file.</p>

<p>Lastly, it's all about checking the result and publishing it. The
last line is <code>git push -f</code>, and that is where I trashed and replaced my
<a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=summary">postgresql-extension</a> community repository. I don't think anybody was
following it, but should that be the case, you will have to <em>reinit</em> your copy.</p>

<p>More blog posts to come about extensions, as I arranged to have some real
time to devote to the topic. At least I was able to arrange things so that I
can work on the subject for real, and the first thing I did, the very night
before it was meant to begin, was catch <em>tonsillitis</em>. Lost about a week, not
the project! Stay tuned!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 07 Oct 2010 17:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</guid>
</item>
<item>
  <title>el-get reaches 1.0</title>
  <link>http://tapoueh.org/blog/2010/10/blog/2010/10/07-el-get-reaches-10.html</link>
  <description><![CDATA[<p>It's been a week since the last commits in the <a href="http://github.com/dimitri/el-get">el-get repository</a>, and those
were all about fixing and adding recipes, and about notifications. Nothing
like <em>core plumbing</em> you see. Also, <code>0.9</code> was released on <em>2010-08-24</em> and felt
pretty complete already, then received lots of improvements. It's high time
to cross the line and call it <code>1.0</code>!</p>

<p>Now existing users will certainly be just moderately happy to see the tool
reach that version number, depending on whether they think more about the bugs
they want to see fixed (ftp is supported, only called http) and the new
features they want to see in (<em>info</em> documentation) or more about what <code>el-get</code>
does for them already today...</p>

<p>For the new users, or the yet-to-be-convinced users, let's take some time
and talk about <code>el-get</code>. A <em>FAQ</em>-like session might be best.</p>

<h3>How is el-get different from ELPA?</h3>

<p><a href="http://tromey.com/elpa/">ELPA</a> is the <em>Emacs Lisp Package Archive</em> and is also known as <code>package.el</code>, to
be included in Emacs 24. This allows Emacs Lisp extension authors to <em>package</em>
their work. That means they have to follow some guidelines and format their
contribution, then propose it for upload.</p>

<p>This requires licence checks (good) and for the <a href="http://elpa.gnu.org/">new official ELPA mirror</a> it
even requires dead-tree papers exchange and contracts and copyright
assignments, I believe.</p>


<h3>Why have both?</h3>

<p class="first">While <em>ELPA</em> is a great thing to have, it's so easy to find some high quality
Emacs extensions out there that are not part of the offer. Either authors are
not interested in uploading to ELPA, or they don't know how to properly
<em>package</em> for it (it's only simple for single file extensions, see).</p>

<p>So <code>el-get</code> is a pragmatic answer here. It's there because it so happens that
I don't depend only on emacs extensions that are available with Emacs
itself, in my distribution <code>site-lisp</code> and in <code>ELPA</code>. I need some more, and I
don't need it to be complex to find it, fetch it, init it and use it.</p>

<p>Of course I could try and package any extension I find I need and submit it
to <code>ELPA</code>, but really, to do that nicely I'd need to contact the extension
author (<em>upstream</em>) for them to accept my patch, or else consider a fork.</p>

<p>With <code>el-get</code> I propose distributed packaging if you will. Let's have a look
at two <em>recipes</em> here. First, the <code>el-get</code> one itself:</p>

<pre class="src">
(<span style="color: #da70d6;">:name</span> el-get
       <span style="color: #da70d6;">:type</span> git
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #da70d6;">:features</span> el-get
       <span style="color: #da70d6;">:compile</span> <span style="color: #bc8f8f;">"el-get.el"</span>)
</pre>

<p>Then a much more complex one, the <a href="http://bbdb.sourceforge.net/">bbdb</a> one:</p>

<pre class="src">
(<span style="color: #da70d6;">:name</span> bbdb
       <span style="color: #da70d6;">:type</span> git
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/barak/BBDB.git"</span>
       <span style="color: #da70d6;">:load-path</span> (<span style="color: #bc8f8f;">"./lisp"</span> <span style="color: #bc8f8f;">"./bits"</span>)
       <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make autoloads"</span> <span style="color: #bc8f8f;">"make"</span>)
       <span style="color: #da70d6;">:build/darwin</span> (<span style="color: #bc8f8f;">"./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs"</span> <span style="color: #bc8f8f;">"make autoloads"</span> <span style="color: #bc8f8f;">"make"</span>)
       <span style="color: #da70d6;">:features</span> bbdb
       <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (bbdb-initialize))
       <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"texinfo"</span>)
</pre>

<p>The idea is that it's much simpler to just come up with a recipe like this
than to patch existing code and upload it to <code>ELPA</code>. And anybody can share
their <em>recipes</em> very easily, with or without proposing them to me, even though
I would very much like to add some more to the official <code>el-get</code> list.</p>

<p>As a user, you don't even need to twiddle with recipes, mostly, because we
already have them for you. What you do instead is list them in
<code>el-get-sources</code>.</p>


<h3>So, show me how you use it?</h3>

<p class="first">Yeah, sure. Here's a sample of my <code>dim-packages.el</code> file, part of my <code>.emacs</code>
<em>suite</em>. Yeah a single <code>.emacs</code> does not suit me anymore, it's a complete
<code>.emacs.d</code> now, but that's because that's how I like it organised, you
know. So, here's the example:</p>

<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-packages.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">Set el-get-sources and call el-get to init all those packages we need.
</span>(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">el-get</span>)
(add-to-list 'el-get-recipe-path <span style="color: #bc8f8f;">"~/dev/emacs/el-get/recipes"</span>)

(setq el-get-sources
      '(cssh el-get switch-window vkill google-maps yasnippet verbiste mailq sicp

        (<span style="color: #da70d6;">:name</span> magit
               <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-z"</span>) 'magit-status)))

        (<span style="color: #da70d6;">:name</span> asciidoc
               <span style="color: #da70d6;">:type</span> elpa
               <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> ()
                        (autoload 'doc-mode <span style="color: #bc8f8f;">"doc-mode"</span> nil t)
                        (add-to-list 'auto-mode-alist '(<span style="color: #bc8f8f;">"\\.adoc$"</span> . doc-mode))
                        (add-hook 'doc-mode-hook '(<span style="color: #7f007f;">lambda</span> ()
                                                    (turn-on-auto-fill)
                                                    (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">asciidoc</span>)))))

        (<span style="color: #da70d6;">:name</span> goto-last-change
               <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> ()
                        (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-/"</span>) 'goto-last-change)))

        (<span style="color: #da70d6;">:name</span> auto-dictionary <span style="color: #da70d6;">:type</span> elpa)
        (<span style="color: #da70d6;">:name</span> gist            <span style="color: #da70d6;">:type</span> elpa)
        (<span style="color: #da70d6;">:name</span> lisppaste       <span style="color: #da70d6;">:type</span> elpa)))

(el-get) <span style="color: #b22222;">; </span><span style="color: #b22222;">that could/should be (el-get 'sync)
</span>(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-packages</span>)
</pre>

<p>Ok that's not all of it, but it should give you a nice idea about what
problem I solve with <code>el-get</code> and how. In my emacs startup sequence, somewhere
inside my <code>~/.emacs.d/init.el</code> file, I have a line that says <code>(require
'dim-packages)</code>. This will set <code>el-get-sources</code> to the list just above, then
call <code>(el-get)</code>, the main function.</p>

<p>This main function will check each given package and install it if necessary
(including <em>building</em> the package, as in <code>make autoloads; make</code>), then <em>init</em>
it. What <em>init</em> means exactly depends on what the recipe says. That can
include <em>byte-compiling</em> some files, caring about <em>load-path</em>, <em>load</em> and <em>require</em>
commands, caring about <em>Info-directory-list</em> and <code>ginstall-info</code> too, and some
more.</p>

<p>So in short, it will make it so that your emacs instance is ready for you to
use. And you get the choice to use the given <code>el-get</code> recipes as-is, like I
did for <code>cssh</code>, <code>el-get</code>, <code>switch-window</code> and others, up to <code>sicp</code>, or to tweak them
partly, like in the <code>magit</code> example where I've added a user init function (the
<code>:after</code> property) to bind <code>magit-status</code> to <code>C-x C-z</code> here. You can even embed a
full recipe inline in the <code>el-get-sources</code> variable, that's the case for each
item that gives its <code>:type</code> property, like <code>asciidoc</code> or <code>gist</code>.</p>

<p>And, as you see, we're using <code>ELPA</code> a lot in these sources, so <code>el-get</code> isn't
striving to replace it at all, it's just trying to accommodate a broader
world.</p>


<h3>I read that the el-get-install is asynchronous, tell me more.</h3>

<p class="first">Yeah, right, the example above says <code>(el-get)</code> at its end, and in the cases
when <code>el-get</code> has to install or build sources, this will be done
asynchronously. Which means that not only several sources will get processed
at once (using your multi cores, yeah) but that it will let emacs start up
as if it was ready.</p>

<p>It happens that's usually what I want, because I seldom add sources to my
setup, but in theory that can break your emacs. What I do is start it again
or fix things by hand. What you can do instead is <code>(el-get 'sync)</code> so that emacs
is blocked waiting for <code>el-get</code> to properly install and initialize all the
sources you've set up. Your choice, just add the <code>'sync</code> parameter there.</p>


<h3>Now, explain to me why it is better this way, again, please?</h3>

<p class="first">Well, before I wrote <code>el-get</code>, trying out a new extension, setting it up etc
was something quite involved, and that I had to redo on several
machines. The only way not to redo it was to include the extension's code
into my own <code>git</code> repository (my <code>emacs.d</code> is in <code>git</code>, of course).</p>

<p>And putting code I don't maintain into my own <code>git</code> repository is something I
frown upon. I have no business pretending I'll maintain the code, and I know
I will never think to check the <code>URL</code> where I've found it for updates. And
that's when I thought of noting down the <code>URL</code> somewhere in the first place.</p>

<p>Also, what about sharing the extension with friends? Not easy, at best.</p>

<p>Enter <code>el-get</code>, and I can just add an entry to <code>el-get-sources</code>, based on a file
somewhere in my own <code>el-get-recipe-path</code>. When I'm happy with this file, I can
contribute it to <code>el-get</code> proper or just send it over to any interested
recipient. Adding it to your sources is easy. Copy the file somewhere in your
<code>el-get-recipe-path</code>, add its name to your <code>el-get-sources</code>, then <code>M-x
el-get-install</code> it. Done. If you were given the <code>:after</code> function, it's all
set up already.</p>

<p>If you contribute the recipe to <code>el-get</code>, then <code>M-x el-get-update RET el-get
RET</code> and you get it on this other machine where you also use Emacs. Or you
can tell your friend to do the same and benefit from your <em>packaging</em>.</p>


<h3>Well, sounds good. What recipes do you have already?</h3>

<p class="first">I count <code>67</code> of them already. One of them is just a book in <em>info</em> format, with
no <em>elisp</em> at all, can you spot it?</p>

<pre class="src">
ELISP&gt; (directory-files <span style="color: #bc8f8f;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #bc8f8f;">"el$"</span>)

(<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"auto-complete-etags.el"</span> <span style="color: #bc8f8f;">"auto-complete-extension.el"</span>
<span style="color: #bc8f8f;">"auto-complete.el"</span> <span style="color: #bc8f8f;">"auto-install.el"</span> <span style="color: #bc8f8f;">"autopair.el"</span> <span style="color: #bc8f8f;">"bbdb.el"</span>
<span style="color: #bc8f8f;">"blender-python-mode.el"</span> <span style="color: #bc8f8f;">"color-theme-twilight.el"</span> <span style="color: #bc8f8f;">"color-theme.el"</span>
<span style="color: #bc8f8f;">"cssh.el"</span> <span style="color: #bc8f8f;">"django-mode.el"</span> <span style="color: #bc8f8f;">"el-get.el"</span> <span style="color: #bc8f8f;">"emacs-w3m.el"</span> <span style="color: #bc8f8f;">"emacschrome.el"</span>
<span style="color: #bc8f8f;">"emms.el"</span> <span style="color: #bc8f8f;">"ensime.el"</span> <span style="color: #bc8f8f;">"erc-highlight-nicknames.el"</span> <span style="color: #bc8f8f;">"erc-track-score.el"</span>
<span style="color: #bc8f8f;">"escreen.el"</span> <span style="color: #bc8f8f;">"filladapt.el"</span> <span style="color: #bc8f8f;">"flyguess.el"</span> <span style="color: #bc8f8f;">"gist.el"</span> <span style="color: #bc8f8f;">"google-maps.el"</span>
<span style="color: #bc8f8f;">"google-weather.el"</span> <span style="color: #bc8f8f;">"goto-last-change.el"</span> <span style="color: #bc8f8f;">"haskell-mode.el"</span>
<span style="color: #bc8f8f;">"highlight-parentheses.el"</span> <span style="color: #bc8f8f;">"hl-sexp.el"</span> <span style="color: #bc8f8f;">"levenshtein.el"</span> <span style="color: #bc8f8f;">"magit.el"</span>
<span style="color: #bc8f8f;">"mailq.el"</span> <span style="color: #bc8f8f;">"maxframe.el"</span> <span style="color: #bc8f8f;">"multi-term.el"</span> <span style="color: #bc8f8f;">"muse-blog.el"</span> <span style="color: #bc8f8f;">"nognus.el"</span>
<span style="color: #bc8f8f;">"nterm.el"</span> <span style="color: #bc8f8f;">"nxhtml.el"</span> <span style="color: #bc8f8f;">"offlineimap.el"</span> <span style="color: #bc8f8f;">"package.el"</span> <span style="color: #bc8f8f;">"popup-kill-ring.el"</span>
<span style="color: #bc8f8f;">"pos-tip.el"</span> <span style="color: #bc8f8f;">"pov-mode.el"</span> <span style="color: #bc8f8f;">"psvn.el"</span> <span style="color: #bc8f8f;">"pymacs.el"</span> <span style="color: #bc8f8f;">"rainbow-mode.el"</span>
<span style="color: #bc8f8f;">"rcirc-groups.el"</span> <span style="color: #bc8f8f;">"rinari.el"</span> <span style="color: #bc8f8f;">"ropemacs.el"</span> <span style="color: #bc8f8f;">"rt-liberation.el"</span> <span style="color: #bc8f8f;">"scratch.el"</span>
<span style="color: #bc8f8f;">"session.el"</span> <span style="color: #bc8f8f;">"sicp.el"</span> <span style="color: #bc8f8f;">"smex.el"</span> <span style="color: #bc8f8f;">"switch-window.el"</span> <span style="color: #bc8f8f;">"textile-mode.el"</span>
<span style="color: #bc8f8f;">"todochiku.el"</span> <span style="color: #bc8f8f;">"twitter.el"</span> <span style="color: #bc8f8f;">"twittering-mode.el"</span> <span style="color: #bc8f8f;">"undo-tree.el"</span>
<span style="color: #bc8f8f;">"verbiste.el"</span> <span style="color: #bc8f8f;">"vimpulse-surround.el"</span> <span style="color: #bc8f8f;">"vimpulse.el"</span> <span style="color: #bc8f8f;">"vkill.el"</span> <span style="color: #bc8f8f;">"xcscope.el"</span>
<span style="color: #bc8f8f;">"xml-rpc-el.el"</span> <span style="color: #bc8f8f;">"yasnippet.el"</span>)
</pre>


<h3>Ok, I want to try it, what's next?</h3>

<p class="first">Visit the following <code>URL</code> <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a> and follow the
install instructions. You're given a <em>scratch installer</em> there, that's some
<em>elisp</em> code you copy paste into <code>*scratch*</code> then execute there, and you have
<code>el-get</code> ready to serve.</p>

<p>An excellent idea I stole from <code>ELPA</code>!</p>


<h3>Hey, I already know what el-get is, what's new in 1.0?</h3>

<p class="first">The <em>changelog</em> is quite full of good stuff, really:</p>

<ul>
<li>Implement el-get recipes so that el-get-sources can be a simple list
of symbols. Now that there's an authoritative git repository, it's
easy to know where to share the recipes.</li>

<li>Add support for emacswiki directly, saving you from having to enter the URL</li>

<li>Implement package status on-disk saving so that installing over a
previously failed install is in theory possible. Currently `el-get'
will refrain from removing your package automatically, though.</li>

<li>Fix ELPA remove method, adding a &quot;removed&quot; state too.</li>

<li>Implement CVS login support.</li>

<li>Add lots of recipes</li>

<li>Add support for `system-type' specific build commands</li>

<li>Byte compile files from the load-path entries or :compile files</li>

<li>Implement support for git submodules with the command
`git submodule update --init --recursive`</li>

<li>Add catch-all post-install and post-update hooks</li>

<li>Add desktop notification on install/update.</li>
</ul>


<h3>I'm still using the deprecated emacswiki version, what now?</h3>

<p class="first">That version didn't have recipes, and the new version should be perfectly
happy with your current <code>el-get-sources</code>, so I recommend using the
<em>scratch installer</em> too. Don't forget to add <code>el-get</code> itself to your
<code>el-get-sources</code> list, of course!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 07 Oct 2010 13:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/blog/2010/10/07-el-get-reaches-10.html</guid>
</item>
<item>
  <title>el-get reaches 1.0</title>
  <link>http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</link>
  <description><![CDATA[<h1>el-get reaches 1.0</h1>
<div class="date">Thursday, October 07 2010, 13:30</div>
<div id="article">
<p>It's been a week since the last commits in the <a href="http://github.com/dimitri/el-get">el-get repository</a>, and those
were all about fixing and adding recipes, and about notifications. Nothing
like <em>core plumbing</em> you see. Also, <code>0.9</code> was released on <em>2010-08-24</em> and felt
pretty complete already, then received lots of improvements. It's high time
to cross the line and call it <code>1.0</code>!</p>

<p>Now existing users will certainly be just moderately happy to see the tool
reach that version number, depending on whether they think more about the bugs
they want to see fixed (ftp is supported, only called http) and the new
features they want to see in (<em>info</em> documentation) or more about what <code>el-get</code>
does for them already today...</p>

<p>For the new users, or the yet-to-be-convinced users, let's take some time
and talk about <code>el-get</code>. A <em>FAQ</em>-like session might be best.</p>

<h3>How is el-get different from ELPA?</h3>

<p><a href="http://tromey.com/elpa/">ELPA</a> is the <em>Emacs Lisp Package Archive</em> and is also known as <code>package.el</code>, to
be included in Emacs 24. This allows Emacs Lisp extension authors to <em>package</em>
their work. That means they have to follow some guidelines and format their
contribution, then propose it for upload.</p>

<p>This requires licence checks (good) and for the <a href="http://elpa.gnu.org/">new official ELPA mirror</a> it
even requires dead-tree papers exchange and contracts and copyright
assignments, I believe.</p>


<h3>Why have both?</h3>

<p class="first">While <em>ELPA</em> is a great thing to have, it's so easy to find some high quality
Emacs extensions out there that are not part of the offer. Either authors are
not interested in uploading to ELPA, or they don't know how to properly
<em>package</em> for it (it's only simple for single file extensions, see).</p>

<p>So <code>el-get</code> is a pragmatic answer here. It's there because it so happens that
I don't depend only on emacs extensions that are available with Emacs
itself, in my distribution <code>site-lisp</code> and in <code>ELPA</code>. I need some more, and I
don't need it to be complex to find it, fetch it, init it and use it.</p>

<p>Of course I could try and package any extension I find I need and submit it
to <code>ELPA</code>, but really, to do that nicely I'd need to contact the extension
author (<em>upstream</em>) for them to accept my patch, or else consider a fork.</p>

<p>With <code>el-get</code> I propose distributed packaging if you will. Let's have a look
at two <em>recipes</em> here. First, the <code>el-get</code> one itself:</p>

<pre class="src">
(<span style="color: #729fcf;">:name</span> el-get
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #729fcf;">:features</span> el-get
       <span style="color: #729fcf;">:compile</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span>)
</pre>

<p>Then a much more complex one, the <a href="http://bbdb.sourceforge.net/">bbdb</a> one:</p>

<pre class="src">
(<span style="color: #729fcf;">:name</span> bbdb
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/barak/BBDB.git"</span>
       <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"./lisp"</span> <span style="color: #ad7fa8; font-style: italic;">"./bits"</span>)
       <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:build/darwin</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:features</span> bbdb
       <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (bbdb-initialize))
       <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"texinfo"</span>)
</pre>

<p>The idea is that it's much simpler to just come up with a recipe like this
than to patch existing code and upload it to <code>ELPA</code>. And anybody can share
their <em>recipes</em> very easily, with or without proposing them to me, even though
I'd very much like to add more of them to the official <code>el-get</code> list.</p>

<p>As a user, you don't even need to twiddle with recipes, mostly, because we
already have them for you. What you do instead is list them in
<code>el-get-sources</code>.</p>


<h3>So, show me how you use it?</h3>

<p class="first">Yeah, sure. Here's a sample of my <code>dim-packages.el</code> file, part of my <code>.emacs</code>
<em>suite</em>. Yeah a single <code>.emacs</code> does not suit me anymore, it's a complete
<code>.emacs.d</code> now, but that's because that's how I like it organised, you
know. So, here's the example:</p>

<pre class="src">
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-packages.el --- Dimitri Fontaine
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">Set el-get-sources and call el-get to init all those packages we need.
</span>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(add-to-list 'el-get-recipe-path <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes"</span>)

(setq el-get-sources
      '(cssh el-get switch-window vkill google-maps yasnippet verbiste mailq sicp

        (<span style="color: #729fcf;">:name</span> magit
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status)))

        (<span style="color: #729fcf;">:name</span> asciidoc
               <span style="color: #729fcf;">:type</span> elpa
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                        (autoload 'doc-mode <span style="color: #ad7fa8; font-style: italic;">"doc-mode"</span> nil t)
                        (add-to-list 'auto-mode-alist '(<span style="color: #ad7fa8; font-style: italic;">"\\.adoc$"</span> . doc-mode))
                        (add-hook 'doc-mode-hook '(<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                                                    (turn-on-auto-fill)
                                                    (<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">asciidoc</span>)))))

        (<span style="color: #729fcf;">:name</span> goto-last-change
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                        (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-/"</span>) 'goto-last-change)))

        (<span style="color: #729fcf;">:name</span> auto-dictionary <span style="color: #729fcf;">:type</span> elpa)
        (<span style="color: #729fcf;">:name</span> gist            <span style="color: #729fcf;">:type</span> elpa)
        (<span style="color: #729fcf;">:name</span> lisppaste       <span style="color: #729fcf;">:type</span> elpa)))

(el-get) <span style="color: #888a85;">; </span><span style="color: #888a85;">that could/should be (el-get 'sync)
</span>(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-packages</span>)
</pre>

<p>Ok that's not all of it, but it should give you a nice idea about what
problem I solve with <code>el-get</code> and how. In my emacs startup sequence, somewhere
inside my <code>~/.emacs.d/init.el</code> file, I have a line that says <code>(require
'dim-packages)</code>. This will set <code>el-get-sources</code> to the list just above, then
call <code>(el-get)</code>, the main function.</p>

<p>This main function will check each given package and install it if necessary
(including <em>building</em> the package, as in <code>make autoloads; make</code>), then <em>init</em>
it. What <em>init</em> means exactly depends on what the recipe says. That can
include <em>byte-compiling</em> some files, caring about <em>load-path</em>, <em>load</em> and <em>require</em>
commands, caring about <em>Info-directory-list</em> and <code>ginstall-info</code> too, and some
more.</p>
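
<p>To make the <em>init</em> step more concrete, here is a minimal sketch of the kind of
work it involves. It is not <code>el-get</code>'s actual code: the function name
<code>my-sketch-init</code> and the <code>base-dir</code> argument are made up for illustration, and
only the <code>:load-path</code>, <code>:features</code> and <code>:after</code> properties from the recipes above
are handled.</p>

<pre class="src">
;; Hypothetical sketch, not el-get's implementation.
(defun my-sketch-init (recipe base-dir)
  "Init the package described by RECIPE, assumed installed under BASE-DIR."
  (let* ((name     (symbol-name (plist-get recipe :name)))
         (pkg-dir  (expand-file-name name base-dir))
         (paths    (or (plist-get recipe :load-path) '(".")))
         (features (plist-get recipe :features))
         (after    (plist-get recipe :after)))
    ;; Make every :load-path entry of the package visible to `require'.
    (dolist (path paths)
      (add-to-list 'load-path (expand-file-name path pkg-dir)))
    ;; Accept a single symbol (as in the recipes above) or, as an
    ;; assumption of this sketch, a list of symbols.
    (dolist (feature (if (listp features) features (list features)))
      (require feature))
    ;; Finally run the user init function, if any.
    (when after
      (funcall after))))
</pre>

<p>Given the <code>bbdb</code> recipe above and an assumed install directory, this would add
the <code>lisp</code> and <code>bits</code> directories of the package to <code>load-path</code>, require <code>bbdb</code>,
then call <code>(bbdb-initialize)</code>.</p>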

<p>So in short, it will make it so that your emacs instance is ready for you to
use. And you get the choice to use the given <code>el-get</code> recipes as-is, like I
did for <code>cssh</code>, <code>el-get</code>, <code>switch-window</code> and others, up to <code>sicp</code>, or to tweak them
partly, like in the <code>magit</code> example where I've added a user init function (the
<code>:after</code> property) to bind <code>magit-status</code> to <code>C-x C-z</code> here. You can even embed a
full recipe inline in the <code>el-get-sources</code> variable, that's the case for each
item that gives its <code>:type</code> property, like <code>asciidoc</code> or <code>gist</code>.</p>

<p>And, as you see, we're using <code>ELPA</code> a lot in these sources, so <code>el-get</code> isn't
striving to replace it at all, it's just trying to accommodate a broader
world.</p>


<h3>I read that the el-get-install is asynchronous, tell me more.</h3>

<p class="first">Yeah, right, the example above says <code>(el-get)</code> at its end, and in the cases
when <code>el-get</code> has to install or build sources, this will be done
asynchronously. Which means that not only several sources will get processed
at once (using your multi cores, yeah) but that it will let emacs start up
as if it was ready.</p>

<p>It happens that's usually what I want, because I seldom add sources to my
setup, but in theory that can break your emacs. What I do is start it again
or fix things by hand. What you can do instead is call <code>(el-get 'sync)</code>, so that emacs
blocks until <code>el-get</code> has properly installed and initialized all the
sources you've set up. Your choice, just add the <code>'sync</code> parameter there.</p>


<h3>Now, explain to me why it is better this way, again, please?</h3>

<p class="first">Well, before I wrote <code>el-get</code>, trying out a new extension, setting it up etc
was something quite involved, and that I had to redo on several
machines. The only way not to redo it was to include the extension's code
into my own <code>git</code> repository (my <code>emacs.d</code> is in <code>git</code>, of course).</p>

<p>And putting code I don't maintain into my own <code>git</code> repository is something I
frown upon. I have no business pretending I'll maintain the code, and I know
I will never think to check the <code>URL</code> where I've found it for updates. That's
when I thought of noting down the <code>URL</code> somewhere.</p>

<p>Also, what about sharing the extension with friends? Awkward, at best.</p>

<p>Enter <code>el-get</code>, and I can just add an entry to <code>el-get-sources</code>, based on a file
somewhere in my own <code>el-get-recipe-path</code>. When I'm happy with this file, I can
contribute it to <code>el-get</code> proper or just send it over to any interested
recipient. Adding it to your sources is easy. Copy the file somewhere in your
<code>el-get-recipe-path</code>, add its name to your <code>el-get-sources</code>, then <code>M-x
el-get-install</code> it. Done. If you were given the <code>:after</code> function, it's all
set up already.</p>
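
<p>As a rough sketch of that workflow, assume a hypothetical extension named
<code>my-extension</code>, hosted at a placeholder URL. The recipe file, saved as
<code>my-extension.el</code> in a recipe directory, could look like this, using only
properties shown in the recipes above:</p>

<pre class="src">
;; Hypothetical recipe; the package name and URL are placeholders.
(:name my-extension
       :type git
       :url "git://example.org/my-extension.git"
       :features my-extension)
</pre>

<p>The matching setup would then go in your init files. The recipe directory below
is the one used earlier in this article, adjust it to your own:</p>

<pre class="src">
(require 'el-get)
;; Tell el-get where to find the recipe file, then list the package by name.
(add-to-list 'el-get-recipe-path "~/dev/emacs/el-get/recipes")
(add-to-list 'el-get-sources 'my-extension)
</pre>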

<p>If you contribute the recipe to <code>el-get</code>, then <code>M-x el-get-update RET el-get
RET</code> and you get it on this other machine where you also use Emacs. Or you
can tell your friend to do the same and benefit from your <em>packaging</em>.</p>


<h3>Well, sounds good. What recipes do you have already?</h3>

<p class="first">I count <code>67</code> of them already. One of them is just a book in <em>info</em> format, with
no <em>elisp</em> at all, can you spot it?</p>

<pre class="src">
ELISP&gt; (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"el$"</span>)

(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-etags.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-extension.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"auto-complete.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-install.el"</span> <span style="color: #ad7fa8; font-style: italic;">"autopair.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"blender-python-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme-twilight.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"django-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacs-w3m.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacschrome.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ensime.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-highlight-nicknames.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"filladapt.el"</span> <span style="color: #ad7fa8; font-style: italic;">"flyguess.el"</span> <span style="color: #ad7fa8; font-style: italic;">"gist.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"google-weather.el"</span> <span style="color: #ad7fa8; font-style: italic;">"goto-last-change.el"</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-mode.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"highlight-parentheses.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"levenshtein.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"mailq.el"</span> <span style="color: #ad7fa8; font-style: italic;">"maxframe.el"</span> <span style="color: #ad7fa8; font-style: italic;">"multi-term.el"</span> <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nognus.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"nterm.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"offlineimap.el"</span> <span style="color: #ad7fa8; font-style: italic;">"package.el"</span> <span style="color: #ad7fa8; font-style: italic;">"popup-kill-ring.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"pos-tip.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pov-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pymacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rinari.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ropemacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rt-liberation.el"</span> <span style="color: #ad7fa8; font-style: italic;">"scratch.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"session.el"</span> <span style="color: #ad7fa8; font-style: italic;">"sicp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"smex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"switch-window.el"</span> <span style="color: #ad7fa8; font-style: italic;">"textile-mode.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"todochiku.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twitter.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twittering-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"undo-tree.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"verbiste.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse-surround.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span>
<span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>)
</pre>


<h3>Ok, I want to try it, what's next?</h3>

<p class="first">Visit the following <code>URL</code> <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a> and follow the
install instructions. You're given a <em>scratch installer</em> there, that's some
<em>elisp</em> code you copy paste into <code>*scratch*</code> then execute there, and you have
<code>el-get</code> ready to serve.</p>

<p>An excellent idea I stole from <code>ELPA</code>!</p>


<h3>Hey, I already know what el-get is, what's new in 1.0?</h3>

<p class="first">The <em>changelog</em> is quite full of good stuff, really:</p>

<ul>
<li>Implement el-get recipes so that el-get-sources can be a simple list
of symbols. Now that there's an authoritative git repository, where
to share the recipes is easy.</li>

<li>Add support for emacswiki directly, save from having to enter the URL</li>

<li>Implement package status on-disk saving so that installing over a
previously failed install is in theory possible. Currently `el-get'
will refrain from removing your package automatically, though.</li>

<li>Fix ELPA remove method, adding a &quot;removed&quot; state too.</li>

<li>Implement CVS login support.</li>

<li>Add lots of recipes</li>

<li>Add support for `system-type' specific build commands</li>

<li>Byte compile files from the load-path entries or :compile files</li>

<li>Implement support for git submodules with the command
`git submodule update --init --recursive`</li>

<li>Add catch-all post-install and post-update hooks</li>

<li>Add desktop notification on install/update.</li>
</ul>


<h3>I'm still using the deprecated emacswiki version, what now?</h3>

<p class="first">That version didn't have recipes, and the new version should be perfectly
happy with your current <code>el-get-sources</code>, so I recommend using the
<em>scratch installer</em> too. Don't forget to add <code>el-get</code> itself into your
<code>el-get-sources</code> list, of course!</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/rcirc.html">rcirc</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 07 Oct 2010 13:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</guid>
</item>












<item>
  <title>Regexp performances and Finite Automata</title>
  <link>http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</link>
  <description><![CDATA[
<div class="date">Sunday, September 26 2010, 21:00</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>The major reason why I dislike <a href="http://www.perl.org/">perl</a> so much, and <a href="http://www.ruby-lang.org">ruby</a> too, and the thing I'd
want different in the <a href="http://www.gnu.org/software/emacs/manual/elisp.html">Emacs Lisp</a> <code>API</code> so far, is how they push developers
toward using <a href="http://www.regular-expressions.info/">regexp</a>. You know the quote, don't you?</p>

<blockquote>
<p class="quoted">
Some people, when confronted with a problem, think “I know, I'll use regular
expressions.” Now they have two problems.</p>

</blockquote>

<p>That said, some situations require the use of <em>regexp</em>, or are so much
simpler to solve using them that the maintenance hell you're building
isn't that big a drag. The expressiveness they give is hard to match with any
other solution, to the point that I sometimes use them in my code (well, I use <a href="http://www.emacswiki.org/emacs/rx">rx</a>
to lighten the burden sometimes, just see this example).</p>

<pre class="src">
(rx bol (zero-or-more blank) (one-or-more digit) <span style="color: #ad7fa8; font-style: italic;">":"</span>)
<span style="color: #ad7fa8; font-style: italic;">"^[[:blank:]]*[[:digit:]]+:"</span>
</pre>
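
<p>As a quick check of what that pattern accepts, here is a minimal sketch using the
standard <code>string-match</code> function, which returns the index where the match starts
or <code>nil</code> when nothing matches. The sample strings are made up for illustration.</p>

<pre class="src">
(require 'rx)

(let ((re (rx bol (zero-or-more blank) (one-or-more digit) ":")))
  (list (string-match re "  42: answer")   ; blanks, digits, colon: matches
        (string-match re "answer: 42")))   ; no digits before the colon: no match
</pre>

<p>Evaluating this form gives <code>(0 nil)</code>.</p>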

<p>The thing you might want to know about <em>regexp</em> is that evaluating them is a
heavy task, usually involving <em>parsing</em> their representation, <em>compiling</em> it to
some executable code, and then <em>executing</em> the generated code. It's been shown in
the past (as early as 1968) that a <em>regexp</em> is just another way to write a
finite automaton, at least as long as you don't need <em>backtracking</em>. The
writing of this article is my reaction to reading
<a href="http://swtch.com/~rsc/regexp/regexp1.html">Regular Expression Matching Can Be Simple And Fast</a> (but is slow in Java,
Perl, PHP, Python, Ruby, ...), a very interesting article; see the
benchmarks in there.</p>

<p>The bulk of it is that we find mainly two categories of <em>regexp</em> engine in the
wild, those that are using <a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine">NFA</a> and <a href="http://en.wikipedia.org/wiki/Deterministic_finite_automaton">DFA</a> intermediate representation
techniques, and the others. Our beloved <a href="http://www.postgresql.org/">PostgreSQL</a> sure offers the feature,
it's the <code>~</code> and <code>~*</code> <a href="http://www.postgresql.org/docs/9.0/interactive/functions-matching.html">operators</a>. The implementation here is based on
<a href="http://www.arglist.com/regex/">Henry Spencer</a>'s work, which the aforementioned article says</p>

<blockquote>
<p class="quoted">
became very widely used, eventually serving as the basis for the slow
regular expression implementations mentioned earlier: Perl, PCRE, Python,
and so on.</p>

</blockquote>

<p>Having a look at the actual implementation shows that indeed, current
PostgreSQL code for <em>regexp</em> matching uses intermediate representations of
them as <code>NFA</code> and <code>DFA</code>. The code is quite complex, even more than I thought it
would be, and I didn't have the time it would take to check it against the
proposed one from the <em>simple and fast</em> article.</p>

<pre class="src">
postgresql/src/backend/regex
  -rw-r--r--   1 dim  staff   4362 Sep 25 20:59 COPYRIGHT
  -rw-r--r--   1 dim  staff    614 Sep 25 20:59 Makefile
  -rw-r--r--   1 dim  staff  28217 Sep 25 20:59 re_syntax.n
  -rw-r--r--   1 dim  staff  16589 Sep 25 20:59 regc_color.c
  -rw-r--r--   1 dim  staff   3464 Sep 25 20:59 regc_cvec.c
  -rw-r--r--   1 dim  staff  25036 Sep 25 20:59 regc_lex.c
  -rw-r--r--   1 dim  staff  16845 Sep 25 20:59 regc_locale.c
  -rw-r--r--   1 dim  staff  35917 Sep 25 20:59 regc_nfa.c
  -rw-r--r--   1 dim  staff  50714 Sep 25 20:59 regcomp.c
  -rw-r--r--   1 dim  staff  17368 Sep 25 20:59 rege_dfa.c
  -rw-r--r--   1 dim  staff   3627 Sep 25 20:59 regerror.c
  -rw-r--r--   1 dim  staff  27664 Sep 25 20:59 regexec.c
  -rw-r--r--   1 dim  staff   2122 Sep 25 20:59 regfree.c
</pre>

<p>So all in all, I'll continue avoiding <em>regexp</em> as much as I currently do, and
will maintain my tendency to use <a href="http://www.gnu.org/manual/gawk/gawk.html">awk</a> when I need them on files (it lets
me refine the search without resorting to more and more pipes in the
command line). And as far as resorting to <em>regexp</em> in PostgreSQL is
concerned, it seems that the code here is already just about top-notch. Once more.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 26 Sep 2010 21:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</guid>
</item>
<item>
  <title>Postfix sender_dependent_relayhost_maps</title>
  <link>http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</link>
  <description><![CDATA[
<div class="date">Thursday, September 23 2010, 14:30</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>The previous article about <a href="http://tapoueh.org/articles/news/_Scratch_that_itch:_M-x_mailq.html">M-x mailq</a> prompted several emails asking me
for details about the <a href="http://www.postfix.com/">Postfix</a> setup I'm talking about. The problem we're trying
to solve is having a local <code>MTA</code> to send mails, so that any old-style Unix
tool just works, instead of only the <code>MUA</code> you've spent time setting up.</p>

<p>Postfix makes it possible to do that quite easily, but it gets a little more
involved if you have more than one <em>relayhost</em> that you want to use depending
on your current <em>From</em> address. Think personal email against work email, or
avoiding your <code>ISP</code> network when sending your private mails, <em>hopping</em> directly
to a server you own or trust.</p>

<p>So how do you do just that? Let's see the relevant parts of <code>main.cf</code>.</p>

<pre class="src">
relayhost = your.default.relay.host.here
relay_domains = domain.org, work-domain.com, other-domain.info
smtp_sender_dependent_authentication = yes
sender_dependent_relayhost_maps = hash:/etc/postfix/relaymap
</pre>

<p>The <code>relaymap</code> looks like this:</p>

<pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">comments
</span>user@domain.org         mail.domain.org
local@work-domain.com   smtp.work-domain.com
<span style="color: #888a85;"># </span><span style="color: #888a85;">that requires a local tunnel started with ssh, see ~/.ssh/config
</span>me@other-domain.info    [127.0.0.1]:10025
</pre>

<p>You need to run <a href="http://www.postfix.org/postmap.1.html">postmap</a> on this file before reloading or restarting your local
instance of Postfix.</p>

<p>Also, you will want to encrypt the communication with your preferred relay
host; using <code>TLS</code> goes like this:</p>

<pre class="src">
smtp_sasl_auth_enable=yes
smtp_sasl_password_maps=hash:/etc/postfix/sasl-passwords
smtp_sasl_security_options = noanonymous
smtp_sasl_mechanism_filter = login, plain
smtp_sasl_type = cyrus

smtp_tls_session_cache_database = btree:${queue_directory}/smtp_scache
smtp_tls_loglevel = 2
smtp_use_tls = yes
smtp_tls_security_level = may
</pre>

<p>The password file needs to be processed by <code>postmap</code> too, should
have restricted read access, and looks like this:</p>

<pre class="src">
mail.domain.org        user@domain.org:password
smtp.work-domain.com   local@work-domain.com:h4ckm3
[<span style="color: #8ae234; font-weight: bold;">127.0.0.1</span>]:10025      me@other-domain.info:guess
</pre>

<p>Hope this helps you get started, at least that's a document I would have
enjoyed reading when I first started to set up my local relaying <code>MTA</code>.</p>

<p>Oh, and now that you have this, I hope you will enjoy my <code>M-x mailq</code> tool for
occasions when you're wondering why you're not receiving an answer back yet,
then start the ssh tunnel…</p>


<h2>Tags</h2>

<p><a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 23 Sep 2010 14:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</guid>
</item>






<item>
  <title>Scratch that itch: M-x mailq</title>
  <link>http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</link>
  <description><![CDATA[
<div class="date">Thursday, September 23 2010, 09:30</div>
<div id="article">
<p>Nowadays, most people would think that email is something simple: you just
set up your preferred client (that's called a <code>MUA</code>) with some information such
as the <code>smtp</code> host you want it to talk to (that's called an <code>MTA</code> and this one is
your <code>relayhost</code>). Then there's the whole receiving mail part, and that's <code>smtp</code>
again on the server side. Then there's how to get those mails, read them,
flag them, manage them, and that's better served by <code>IMAP</code>. Let's talk about
sending mails in <code>smtp</code> for this entry.</p>

<p>The traditional way to handle mail sending is to have your own <code>MTA</code> on each
system you use. There used to be a <em>sysadmin</em> team caring about all those
systems, but we're lost in the personal computer era now, which only means that
<strong><em>you</em></strong> are the sysadmin. So just about any Unix tool that wants to send a mail will
do so with the command <code>/usr/bin/sendmail</code> to queue the outgoing message.</p>

<p>My typical <em>workstation</em> setup includes a full-blown <code>MTA</code> (my choice is
<a href="http://www.postfix.com/">Postfix</a>) that will choose the next relay host depending on the message <em>From</em>
field: I don't want to trust any local default relayhost. Note that the
connection to the next relay is authenticated and runs over an encrypted protocol.</p>

<blockquote>
<p class="quoted">
We're getting there, really. But I don't know a better way to present a
piece of software, however small, than talking about the need that led to
its development.</p>

</blockquote>

<p>Some of my relaying goes through an <code>ssh</code> tunnel, and it happens that I send mail
after forgetting to set up the aforementioned tunnel. In this case, the
advantage is that it will not block my <code>MUA</code> (<a href="http://gnus.org/">gnus</a>, in quite good shape these
days, receiving lots of love), as the queueing happens as usual. The
drawback is that <a href="http://www.postfix.com/">Postfix</a> will <em>silently</em> queue the mail until it's able to
deliver it, which can take days.</p>

<p>Enter <code>M-x mailq</code>! OK, I could be doing <code>M-! mailq</code> and see <em>Mail queue is empty</em>
in the message area, but then as soon as the queue's not empty I need to
resort to some <em>shell</em> or <em>terminal</em> in order to <em>flush</em> the queue. That's after
setting up the tunnel, which is as easy as <code>C-= remote</code> in my case, see
<a href="http://github.com/dimitri/cssh">cssh</a>. Scratching that itch, I now only have to hit <code>f</code> here to flush the
queue. And from the <em>gnus</em> <code>*Group*</code> and <code>*Summary*</code> buffers, it's <code>M-q</code> to see the
mail queue.</p>

<p>Thanks to <a href="http://forum.ubuntu-fr.org/viewtopic.php?id=218883">http://forum.ubuntu-fr.org/viewtopic.php?id=218883</a> here's a visual
sample of the <code>mailq</code> mode, where you see the mail queue in colors and the
<em>keymap</em> you're offered.</p>

<center>
<p><img src="../../../images//mailq-el.png" alt=""></p>
</center>

<p>So you could even <em>flush</em> only a given <code>queue id</code> or a given <code>site</code>, or just <em>kill</em>
the current <code>id</code> or the current <code>site</code> so that it's a <code>C-y</code> away. I hope it's
useful for you too — oh, and it's already in the <a href="http://github.com/dimitri/el-get">el-get</a> recipes, of course!</p>
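
<p>To make the idea concrete outside of Emacs, here is a minimal Python sketch of what such a tool has to do: read the <code>mailq</code> listing, pick out the queue ids, and build the matching flush command. This is not how <code>mailq.el</code> is written. The function names and the sample listing are made up for illustration, and the queue id pattern is an assumption about the usual Postfix listing format.</p>

```python
import re

# Assumption: in the usual Postfix listing every entry starts at column 0
# with an alphanumeric queue id, optionally followed by "*" (active) or
# "!" (on hold). Header, reason and recipient lines start differently.
QUEUE_ID = re.compile(r"^([0-9A-Za-z]{6,})[*!]?\s")


def parse_queue_ids(listing):
    """Return the queue ids found in the text printed by mailq."""
    ids = []
    for line in listing.splitlines():
        match = QUEUE_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


def flush_command(queue_id=None, site=None):
    """Build a postqueue command line: one id, one site, or everything."""
    if queue_id is not None:
        return ["postqueue", "-i", queue_id]
    if site is not None:
        return ["postqueue", "-s", site]
    return ["postqueue", "-f"]


# Hypothetical listing, shaped like what mailq prints for one deferred mail.
SAMPLE = """\
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
A1B2C3D4E5     1234 Thu Sep 23 09:30:00  dim@example.org
(connect to 127.0.0.1[127.0.0.1]:2525: Connection refused)
                                         someone@example.com

-- 1 Kbytes in 1 Request.
"""

for queue_id in parse_queue_ids(SAMPLE):
    print(" ".join(flush_command(queue_id=queue_id)))
```

<p>The sketch only builds the command lines; actually running them (with <code>subprocess.run</code>, say) is left out on purpose, so that it can be tried on a machine without Postfix.</p>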


<h2>Tags</h2>

<p><a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 23 Sep 2010 09:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</guid>
</item>
<item>
  <title>switch-window reaches 0.8</title>
  <link>http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</link>
  <description><![CDATA[<h1>switch-window reaches 0.8</h1>
<div class="date">Monday, September 13 2010, 17:45</div>
<div id="article">
<p>I wanted to play with the idea of using the whole keyboard for my
<a href="http://github.com/dimitri/switch-window">switch-window</a> utility, but wondered how to get those keys in the right order
and all. Finally found <code>quail-keyboard-layout</code>, which seems to exist for such
uses, as you can see:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">loop</span> with layout = (split-string quail-keyboard-layout <span style="color: #ad7fa8; font-style: italic;">""</span>)
  for row from 1 to 4
  collect (<span style="color: #729fcf; font-weight: bold;">loop</span> for col from 1 to 12
                 collect (nth (+ 1 (* 2 col) (* 30 row)) layout)))

((<span style="color: #ad7fa8; font-style: italic;">"1"</span> <span style="color: #ad7fa8; font-style: italic;">"2"</span> <span style="color: #ad7fa8; font-style: italic;">"3"</span> <span style="color: #ad7fa8; font-style: italic;">"4"</span> <span style="color: #ad7fa8; font-style: italic;">"5"</span> <span style="color: #ad7fa8; font-style: italic;">"6"</span> <span style="color: #ad7fa8; font-style: italic;">"7"</span> <span style="color: #ad7fa8; font-style: italic;">"8"</span> <span style="color: #ad7fa8; font-style: italic;">"9"</span> <span style="color: #ad7fa8; font-style: italic;">"0"</span> <span style="color: #ad7fa8; font-style: italic;">"-"</span> <span style="color: #ad7fa8; font-style: italic;">"="</span>)
 (<span style="color: #ad7fa8; font-style: italic;">"q"</span> <span style="color: #ad7fa8; font-style: italic;">"w"</span> <span style="color: #ad7fa8; font-style: italic;">"e"</span> <span style="color: #ad7fa8; font-style: italic;">"r"</span> <span style="color: #ad7fa8; font-style: italic;">"t"</span> <span style="color: #ad7fa8; font-style: italic;">"y"</span> <span style="color: #ad7fa8; font-style: italic;">"u"</span> <span style="color: #ad7fa8; font-style: italic;">"i"</span> <span style="color: #ad7fa8; font-style: italic;">"o"</span> <span style="color: #ad7fa8; font-style: italic;">"p"</span> <span style="color: #ad7fa8; font-style: italic;">"["</span> <span style="color: #ad7fa8; font-style: italic;">"]"</span>)
 (<span style="color: #ad7fa8; font-style: italic;">"a"</span> <span style="color: #ad7fa8; font-style: italic;">"s"</span> <span style="color: #ad7fa8; font-style: italic;">"d"</span> <span style="color: #ad7fa8; font-style: italic;">"f"</span> <span style="color: #ad7fa8; font-style: italic;">"g"</span> <span style="color: #ad7fa8; font-style: italic;">"h"</span> <span style="color: #ad7fa8; font-style: italic;">"j"</span> <span style="color: #ad7fa8; font-style: italic;">"k"</span> <span style="color: #ad7fa8; font-style: italic;">"l"</span> <span style="color: #ad7fa8; font-style: italic;">";"</span> <span style="color: #ad7fa8; font-style: italic;">"'"</span> <span style="color: #ad7fa8; font-style: italic;">"\\"</span>)
 (<span style="color: #ad7fa8; font-style: italic;">"z"</span> <span style="color: #ad7fa8; font-style: italic;">"x"</span> <span style="color: #ad7fa8; font-style: italic;">"c"</span> <span style="color: #ad7fa8; font-style: italic;">"v"</span> <span style="color: #ad7fa8; font-style: italic;">"b"</span> <span style="color: #ad7fa8; font-style: italic;">"n"</span> <span style="color: #ad7fa8; font-style: italic;">"m"</span> <span style="color: #ad7fa8; font-style: italic;">","</span> <span style="color: #ad7fa8; font-style: italic;">"."</span> <span style="color: #ad7fa8; font-style: italic;">"/"</span> <span style="color: #ad7fa8; font-style: italic;">" "</span> <span style="color: #ad7fa8; font-style: italic;">" "</span>))
</pre>
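
<p>For readers who don't have an Emacs at hand, the same extraction can be sketched in Python. As far as I know <code>quail-keyboard-layout</code> is one flat string of six rows of thirty characters, where each key takes two characters (unshifted, then shifted). The <code>ROWS</code> below are my own transcription of the standard layout, and <code>key_rows</code> is a name made up for this sketch, so treat both as assumptions.</p>

```python
# Assumed copy of the rows of Emacs' standard quail layout. Every key takes
# two characters (unshifted, then shifted) and every row is padded to 30.
ROWS = [
    "",
    "  1!2@3#4$5%6^7&8*9(0)-_=+`~",
    "  qQwWeErRtTyYuUiIoOpP[{]}",
    "  aAsSdDfFgGhHjJkKlL;:'\"\\|",
    "  zZxXcCvVbBnNmM,<.>/?",
    "",
]
LAYOUT = "".join(row.ljust(30) for row in ROWS)


def key_rows(layout, rows=range(1, 5), cols=range(1, 13)):
    """Return the unshifted key of each (row, col) cell, row by row."""
    return [[layout[30 * row + 2 * col] for col in cols] for row in rows]


for row in key_rows(LAYOUT):
    print(row)
```

<p>Each cell is found at offset <code>30 * row + 2 * col</code>: thirty characters per row, two per key. Cells past the end of a short row simply come back as spaces.</p>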

<p>So now <code>switch-window</code> will use that (but only the first <code>10</code> letters) instead
of <em>hard-coding</em> numbers from 1 to 9 as labels and direct switches. That makes
it more suitable for <a href="http://github.com/dimitri/cssh">cssh</a> users too, I guess.</p>

<p>In other news, I think <a href="http://github.com/dimitri/el-get">el-get</a> is about ready for its <code>1.0</code> release. Please
test it and report any problem very soon before the release!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 13 Sep 2010 17:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</guid>
</item>
<item>
  <title>Window Functions example remix</title>
  <link>http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</link>
  <description><![CDATA[<h1>Window Functions example remix</h1>
<div class="date">Sunday, September 12 2010, 21:35</div>
<div id="article">

<p>The drawback of hosting a static-only website is, obviously, the lack of
comments. What actually happens, though, is that I receive very few comments
by direct mail. As I don't get another <em>spam</em> source to clean up, I'm left
unconvinced that's such a drawback. I still miss the chance, small as it is, of
seeing blog readers talk to each other directly, but I think a <code>tapoueh.org</code> mailing
list would be my answer, here...</p>

<p>Anyway, <a href="http://people.planetpostgresql.org/dfetter/">David Fetter</a> took the time to send me a comment by mail with a
cleaned-up rewrite of the previous entry's <code>SQL</code>; here it is for your pleasure!</p>

<pre class="src">
WITH t AS (
    SELECT
        o, w,
        CASE WHEN
            LAG(w) OVER(w) IS DISTINCT FROM w AND
            ROW_NUMBER() OVER (w) &gt; 1 <span style="color: #888a85;">/* Eliminate first change */</span>
        THEN 1
        END AS change
    FROM (
        VALUES
            (1, 5),
            (2, 10),
            (3, 7),
            (4, 7),
            (5, 7)
    ) AS data(o, w)
    WINDOW w AS (ORDER BY o) <span style="color: #888a85;">/* Factor out WINDOW */</span>
)
SELECT SUM(change) FROM t;
</pre>

<p>As you can see <strong><em>David</em></strong> chose to filter the first change in the subquery rather
than hacking it away with a simple <code>-1</code> at the outer level. I'm still
wondering which way is cleaner (that depends on how you look at the
problem), but I think I know which one is simpler! Thanks <strong><em>David</em></strong> for this
blog entry!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 12 Sep 2010 21:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</guid>
</item>
<item>
  <title>Window Functions example</title>
  <link>http://tapoueh.org/blog/2010/09/09-window-functions-example.html</link>
  <description><![CDATA[<h1>Window Functions example</h1>
<div class="date">Thursday, September 09 2010, 16:35</div>
<div id="article">
<p>So, when <code>8.4</code> came out there were all those comments about how getting
<a href="http://www.postgresql.org/docs/8.4/interactive/tutorial-window.html">window functions</a> was an awesome addition. Now, it seems that a lot of people
seeking help in <a href="http://wiki.postgresql.org/index.php?title=IRC">#postgresql</a> just don't know what kind of problem this
feature helps solve. I've already been using them in some cases here on
this blog, for getting a nice overview of
<a href="http://tapoueh.org/articles/blog/_Partitioning:_relation_size_per_%E2%80%9Cgroup%E2%80%9D.html">Partitioning: relation size per “group”</a>.</p>

<p>Now, another example use case came up on <code>IRC</code> today. I'll quote our user directly here:</p>

<blockquote>
<p class="quoted">
hey there, how can i count the number of (value) changes in one column?</p>
<p class="quoted">  example: a table with a column <em>weight</em>. let's say we have 5 rows, having
the following values for weight: <code>5, 10, 7, 7, 7</code>. the number of changes of
weight would be 2 here (from 5 to 10 and 10 to 7). any idea how I could do
that in SQL using PGSQL 8.4.4? GROUP BY or count(distinct weight)
obviously does not work. thx in advance</p>

</blockquote>

<p>Now, several of us began talking about <em>window functions</em> and about the fact
that you need some other column to identify the ordering of those weights,
obviously, because that's the only way to define what a change is in this
context. Let's have a first try at it.</p>

<pre class="src">
=# select o, w,
          case when lag(w) over(order by o) is distinct from w then 1 end as change
     from (values (1, 5), (2, 10), (3, 7), (4, 7), (5, 7)) as data(o, w);
 o | w  | change
<span style="color: #888a85;">---+----+--------
</span> 1 |  5 |      1
 2 | 10 |      1
 3 |  7 |      1
 4 |  7 |
 5 |  7 |
(5 rows)
</pre>

<p>Not too bad, but of course we are seeing a false change on the first line:
for any <em>window</em> of rows you define, the row before the first one, as given by <code>lag()
over()</code>, will be <code>NULL</code>. The easiest way to accommodate that is the following:</p>

<pre class="src">
=# select sum(change) -1 as changes
     from (select case when lag(w) over(order by o) is distinct from w
                       then 1
                   end as change
             from (values (1, 5),
                          (2, 10),
                          (3, 7),
                          (4, 7),
                          (5, 7)) as t(o, w)) as x;
 changes
<span style="color: #888a85;">---------
</span>       2
(1 row)
</pre>
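
<p>If the window function reads a bit dense, the same reasoning can be spelled out procedurally. The Python sketch below is only an illustration of what <code>lag()</code> and the final <code>- 1</code> are doing; the function names are mine, and <code>None</code> stands in for <code>NULL</code>.</p>

```python
def change_flags(rows):
    """Mimic lag(w) over (order by o): flag rows whose w differs from the previous one."""
    flags = []
    previous = None                      # lag() yields NULL before the first row
    for _, w in sorted(rows):            # order by o
        flags.append(1 if previous != w else None)
        previous = w
    return flags


def count_changes(rows):
    """Sum the flags, minus the false change reported on the first row."""
    flags = change_flags(rows)
    return sum(flag for flag in flags if flag is not None) - 1


DATA = [(1, 5), (2, 10), (3, 7), (4, 7), (5, 7)]
print(change_flags(DATA))
print(count_changes(DATA))
```

<p>With the five sample rows this flags the first three rows and counts 2 changes, as the queries above do. Unlike the SQL version, the sketch makes no attempt to handle an empty input gracefully.</p>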

<p>So don't be shy and go read about <a href="http://www.postgresql.org/docs/8.4/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS">window functions in SQL expressions</a> and
<a href="http://www.postgresql.org/docs/8.4/interactive/queries-table-expressions.html#QUERIES-WINDOW">window function processing</a> in the query table expressions. That's a very
nice tool to have, and my guess is that you will soon enough realize that the only
reason why you could think you don't need them is that you didn't
know they existed, or what you can do with them. <em>Sharpen your saw!</em> :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 09 Sep 2010 16:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/09-window-functions-example.html</guid>
</item>
<item>
  <title>Synchronous Replication</title>
  <link>http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</link>
  <description><![CDATA[<h1>Synchronous Replication</h1>
<div class="date">Monday, September 06 2010, 18:05</div>
<div id="article">
<p>Although the new asynchronous replication facility that ships with 9.0 ain't
released to the general public yet, our hero hackers are already working on the
synchronous version of it. One part of the facility is rather easy to design:
we want something comparable to <a href="http://www.drbd.org/">DRBD</a>'s flexibility, but specific to our
database world. So <em>synchronous</em> would mean either <em>recv</em>, <em>fsync</em> or <em>apply</em>,
depending on what you need the <em>standby</em> to have already done when the master
acknowledges the <code>COMMIT</code>. Let's call that the <em>service level</em>.</p>

<p>The part of the design that's not so easy is more interesting. Do we need to
register standbys and have the <em>service level</em> set up per standby? Can we get
some more flexibility and have the <em>service level</em> set on a per-transaction
basis? The idea here would be that the application knows which transactions
are meant to be extra-safe and which are not, the same way that you can set
<code>synchronous_commit</code> to <code>off</code> when dealing with web sessions, for example.</p>

<p><em>Why choose?</em> I hear you ask. Well, it's all about having more data safety,
and a typical setup would contain an asynchronous reporting server and a
local <em>failover</em> synchronous server. Then add a remote one, too. So even if we
pick the transaction-based facility, we still want to be able to choose at
setup time which server to fail over to. That means we don't want that much
flexibility now: we want to know where the data is safe, we don't want to
have to guess.</p>

<p>One way to solve that is to be able to set up a slave as being the failover
one, or say, the <code>sync</code> one. Now, the detail that ruins it all is that we need
a <em>timeout</em> to handle the worst cases, when a given slave loses its connectivity
(or power, say). Then the slave ain't in <em>sync</em> any more, and some people will
require that the service is still available (<em>timeout</em> but <code>COMMIT</code>) while others
will require that the service is down: don't accept a new transaction if you
can't make its data safe on the slave too.</p>

<p>The answer would be to have the master arbitrate between what the
transaction wants, what the slave is set up to provide, and what it's able
to provide at the time of the transaction. Given a transaction with a
<em>service level</em> of <em>apply</em> and a slave set up for being <em>async</em>, either the <code>COMMIT</code> does
not have to wait, because there's no known slave able to offer the needed
level, or the <code>COMMIT</code> cannot happen, for the very same reason.</p>

<p>Then I think it all flows quite naturally from there: while arbitrating,
the master could record which slave is currently offering what <em>service
level</em>, and offer that information in a system view too, of course.</p>

<p>The big question that's not answered in this proposal is how to configure
whether being unable to reach the wanted <em>service level</em> is an error or a
warning.</p>

<p>That too would be for the master to arbitrate, based on a per-standby
and a per-transaction setting, and in the general case it could be a <em>quorum</em>
setup: each slave is given a <em>weight</em> and each transaction a <em>quorum</em> to
reach. The master sums up the weights of the standbys that ack the
transaction at the needed <em>service level</em>, and the <code>COMMIT</code> happens as soon as
the quorum is reached, or is canceled as soon as the <em>timeout</em> is reached,
whichever comes first.</p>

<p>Such a model allows for very flexible setups, where each standby has a
<em>weight</em> and offers a given <em>service level</em>, and each transaction waits until a
<em>quorum</em> is reached. Giving the right weights to your standbys (like, powers
of two) allows you to set the quorum in a way that only one given standby is
able to acknowledge the most important transactions. But that's flexible
enough that you can change it at any time: it's just a <em>weight</em> that allows a <em>sum</em>
to be made, so my guess would be that it ends up in the <em>feedback loop</em> between the
standby and its master.</p>
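
<p>Here is a toy Python model of that arbitration, just to check that the rules hang together. Nothing in it comes from PostgreSQL: the names are invented, the ordering <em>async</em>, <em>recv</em>, <em>fsync</em>, <em>apply</em> from weakest to strongest is my assumption, and acknowledgements are simply given as a list of arrival times.</p>

```python
from dataclasses import dataclass

# Assumed ordering of service levels, weakest first.
LEVELS = {"async": 0, "recv": 1, "fsync": 2, "apply": 3}


@dataclass
class Standby:
    name: str
    weight: int
    level: str        # the service level this standby is set up to offer


def decide_commit(standbys, acks, wanted_level, quorum, timeout):
    """Return "commit" once the acked weights reach the quorum, else "cancel".

    acks is a list of (standby name, seconds elapsed when the ack arrived).
    """
    by_name = {standby.name: standby for standby in standbys}
    total = 0
    for name, elapsed in sorted(acks, key=lambda ack: ack[1]):
        if elapsed > timeout:
            break                         # timeout reached before the quorum
        standby = by_name[name]
        if LEVELS[standby.level] >= LEVELS[wanted_level]:
            total += standby.weight       # only good-enough standbys count
        if total >= quorum:
            return "commit"
    return "cancel"


STANDBYS = [
    Standby("local", 4, "apply"),
    Standby("remote", 2, "fsync"),
    Standby("reporting", 1, "async"),
]
ACKS = [("reporting", 0.01), ("local", 0.05), ("remote", 0.2)]

print(decide_commit(STANDBYS, ACKS, "fsync", quorum=4, timeout=1.0))
print(decide_commit(STANDBYS, ACKS, "apply", quorum=4, timeout=0.02))
```

<p>With weights of 4, 2 and 1, a quorum of 4 can only be met by the <code>local</code> standby, which is the powers-of-two trick. The first call commits as soon as <code>local</code> answers; the second one is canceled because the timeout fires first.</p>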

<p>The most appealing part of this proposal is that it doesn't look complex to
implement, and should allow for highly flexible setups. Of course, the devil
is in the details, and we're talking about latencies in the distributed
system here. That's also being discussed on the <a href="http://archives.postgresql.org/pgsql-hackers/">mailing list</a>.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 06 Sep 2010 18:05:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</guid>
</item>


<item>
  <title>Want to share your recipes?</title>
  <link>http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</link>
  <description><![CDATA[<h1>Want to share your recipes?</h1>
<div class="date">Tuesday, August 31 2010, 14:15</div>
<div id="article">
<p>Yes, that's another <a href="http://github.com/dimitri/el-get/">el-get</a> related entry. It seems to take a lot of my
attention these days. After having set up the <code>git</code> repository so that you can
update <code>el-get</code> from within itself (so that it's <em>self-contained</em>), the next
logical step is providing <em>recipes</em>.</p>

<p>By that I mean that <code>el-get-sources</code> entries will certainly look a lot alike
from one user to another. Let's take the <code>el-get</code> entry itself:</p>

<pre class="src">
(<span style="color: #729fcf;">:name</span> el-get
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #729fcf;">:features</span> <span style="color: #ad7fa8; font-style: italic;">"el-get"</span>)
</pre>

<p>I guess all <code>el-get</code> users will have just the same 4 lines in their
<code>el-get-sources</code>. So let's call that a <em>recipe</em>, and have <code>el-get</code> look for yours
in the <code>el-get-recipe-path</code> directories. A recipe is found by looking in those
directories in order, and its file must be named <code>package.el</code>. Now, <code>el-get</code> already
contains a handful of them, as you can see:</p>

<pre class="src">
ELISP&gt; (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"[</span><span style="color: #ad7fa8; font-style: italic;">^</span><span style="color: #ad7fa8; font-style: italic;">.]$"</span>)
(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span> <span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>)
</pre>

<p>Please note that you can have your own local recipes by adding directories
to <code>el-get-recipe-path</code>. So now your minimalistic <code>el-get-sources</code> list will
look like <code>'(el-get cssh screen)</code>, say. And if you want to override a recipe,
for instance to use the default one but still have a personal <code>:after</code>
function containing your own setup, then simply make your <code>el-get-sources</code>
entry a partial one: leave out <code>:type</code> and <code>el-get</code> will merge your local
overrides atop the default recipe.</p>
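
<p>The lookup and merge rules are easier to see in a few lines of code. The following Python sketch models them with plain dictionaries; it is not the actual <code>el-get</code> code, the two function names are invented, and each "directory" of the recipe path is just a dictionary from package name to recipe.</p>

```python
def find_recipe(name, recipe_path):
    """Return the first recipe called name, searching the directories in order."""
    for directory in recipe_path:
        if name in directory:
            return directory[name]
    raise KeyError(name)


def resolve_source(entry, recipe_path):
    """Turn one el-get-sources entry into a full package definition."""
    if isinstance(entry, str):        # bare package name: use the recipe as is
        return dict(find_recipe(entry, recipe_path))
    if "type" in entry:               # complete entry: nothing to merge
        return dict(entry)
    merged = dict(find_recipe(entry["name"], recipe_path))
    merged.update(entry)              # local overrides win over the recipe
    return merged


# Stand-ins for the recipe directories: a user one first, then the shipped one.
LOCAL = {}
SHIPPED = {
    "el-get": {
        "name": "el-get",
        "type": "git",
        "url": "git://github.com/dimitri/el-get.git",
        "features": "el-get",
    },
}

print(resolve_source({"name": "el-get", "after": "my-el-get-setup"}, [LOCAL, SHIPPED]))
```

<p>The partial entry carries no <code>type</code>, so it is completed from the shipped recipe and keeps its own <code>after</code> setting (<code>my-el-get-setup</code> being a made-up function name).</p>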

<p>Finally, the way to share your recipes is to send me an email with the
file, or to do the same over the <code>github</code> interface; I guess I'll still
receive a mail then.</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/rcirc.html">rcirc</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 31 Aug 2010 14:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</guid>
</item>
<item>
  <title>Happy Numbers</title>
  <link>http://tapoueh.org/blog/2010/08/blog/2010/08/30-happy-numbers.html</link>
  <description><![CDATA[<p>After discovering the excellent <a href="http://gwene.org/">Gwene</a> service, which allows you to subscribe
to <em>newsgroups</em> to read <code>RSS</code> content (<em>blogs</em>, <em>planets</em>, <em>commits</em>, etc), I came to
read this nice article about <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">Happy Numbers</a>. That's a little problem that
makes a good interview-style question, so I first solved it yesterday
evening in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/List-Processing.html#List-Processing">Emacs Lisp</a> as that's the language I use the most these days.</p>

<blockquote>
<p class="quoted">
A happy number is defined by the following process. Starting with any
positive integer, replace the number by the sum of the squares of its
digits, and repeat the process until the number equals 1 (where it will
stay), or it loops endlessly in a cycle which does not include 1. Those
numbers for which this process ends in 1 are happy numbers, while those
that do not end in 1 are unhappy numbers (or sad numbers).</p>

</blockquote>

<p>Now, what about implementing the same in pure <code>SQL</code>, for more fun? Now that's
interesting! After all, we didn't get <code>WITH RECURSIVE</code> for tree traversal
only, <a href="http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com">did we</a>?</p>

<p>Unfortunately, we need a little helper function first, if only to ease the
reading of the recursive query. I didn't try to inline it, but here it goes:</p>

<pre class="src">
create or replace function digits(x bigint)
  returns setof int
  language sql
as $$
  select substring($1::text from i for 1)::int
    from generate_series(1, length($1::text)) as t(i)
$$;
</pre>
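
<p>As a quick sanity check of that helper, here is a minimal sketch that assumes
the <code>digits()</code> function above is already installed. It sums the squares of the
digits of <code>97</code>, which is a single step of the happy number process:</p>

<pre class="src">
-- one step of the process: 9*9 + 7*7
select sum(d*d) from digits(97) as t(d);
 sum
-----
 130
(1 row)
</pre>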

<p>That was easy: the helper outputs one row per digit of the input number, and
rather than resorting to powers of ten, divisions and remainders, we
use the plain old text representation and <code>substring</code>. Now, to the real
problem. If you've read what a happy number is and have already read the
fine manual about <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">Recursive Query Evaluation</a>, it should be quite easy to
read the following:</p>

<pre class="src">
with recursive happy(n, seen) as (
    select 7::bigint, <span style="color: #bc8f8f;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
  select * from happy;
  n  |       seen
<span style="color: #b22222;">-----+------------------
</span>   7 | {}
  49 | {49}
  97 | {49,97}
 130 | {49,97,130}
  10 | {49,97,130,10}
   1 | {49,97,130,10,1}
(6 rows)

Time: 1.238 ms
</pre>

<p>That shows how it works for some <em>happy</em> number, and it's easy to test a
non-happy one, for example <code>17</code>. The query won't cycle thanks to the <code>seen</code>
array and the <code>having</code> filter, so the only difference between a <em>happy</em> and a
<em>sad</em> number is that in the former case the last line output by the
recursive query has <code>n = 1</code>. Let's turn this knowledge
into a proper function (because we want to pass the number we
test for happiness as an argument):</p>

<pre class="src">
create or replace function happy(x bigint)
  returns boolean
  language sql
as $$
with recursive happy(n, seen) as (
    select $1, <span style="color: #bc8f8f;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
  select n = 1 as happy
    from happy
order by array_length(seen, 1) desc nulls last
   limit 1
$$;
</pre>
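
<p>The ordering at the end of that function depends on how <code>array_length()</code>
treats an empty array. A quick check, as a sketch assuming PostgreSQL 8.4 or
later, where <code>array_length()</code> is available:</p>

<pre class="src">
-- array_length() of an empty array is NULL, not 0
select array_length('{}'::bigint[], 1) is null as empty_is_null;
 empty_is_null
---------------
 t
(1 row)
</pre>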

<p>We need the <code>desc nulls last</code> trick in the <code>order by</code> because the <code>array_length()</code>
of any dimension of an empty array is <code>NULL</code>, and we certainly don't want to
return all and any number as unhappy on the grounds that the query result
contains a line <code>input, {}</code>. Let's now play the same tricks as in the puzzle
article:</p>

<pre class="src">
=# select array_agg(x) as happy
     from generate_series(1, 50) as t(x)
    where happy(x);
              happy
<span style="color: #b22222;">----------------------------------
</span> {1,7,10,13,19,23,28,31,32,44,49}
(1 row)

Time: 24.527 ms

=# explain analyze select x
                     from generate_series(1, 10000) as t(x)
                    where happy(x);
                      QUERY PLAN
<span style="color: #b22222;">------------------------------------------------------------
</span> Function Scan on generate_series t
     (cost=0.00..265.00 rows=333 width=4)
     (actual time=2.938..3651.019 rows=1442 loops=1)
   Filter: happy((x)::bigint)
 Total runtime: 3651.534 ms
(3 rows)

Time: 3652.178 ms
</pre>

<p>(Yes, I tweaked the <code>EXPLAIN ANALYZE</code> output so that it fits the page width
here.) For what it's worth, finding the happy numbers among the first <code>10000</code> integers in <em>Emacs
Lisp</em> on the same laptop takes <code>2830 ms</code>, also running a recursive version of
the code.</p>

<h3>Update, the Emacs Lisp version, inline:</h3>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">happy?</span> (<span style="color: #228b22;">&amp;optional</span> n seen)
  <span style="color: #bc8f8f;">"return true when n is a happy number"</span>
  (interactive)
  (<span style="color: #7f007f;">let*</span> ((number    (or n (read-from-minibuffer
                           <span style="color: #bc8f8f;">"Is this number happy: "</span>)))
         (digits    (mapcar
                     'string-to-number
                     (split-string number <span style="color: #bc8f8f;">""</span> t)))
         (squares   (mapcar (<span style="color: #7f007f;">lambda</span> (x) (* x x)) digits))
         (happiness (apply '+ squares)))
    (<span style="color: #7f007f;">cond</span> ((eq 1 happiness)      t)
          ((memq happiness seen) nil)
          (t
           (happy? (number-to-string happiness)
                   (push happiness seen))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-happy-numbers</span> (<span style="color: #228b22;">&amp;optional</span> limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (interactive)
  (<span style="color: #7f007f;">let</span> ((count (or limit
                   (read-from-minibuffer
                    <span style="color: #bc8f8f;">"List of happy numbers from 1 to: "</span>)))
        happy)
    (<span style="color: #7f007f;">dotimes</span> (n (string-to-number count))
      (<span style="color: #7f007f;">when</span> (happy? (number-to-string (1+ n)))
        (push (1+ n) happy)))
    (nreverse happy)))
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 30 Aug 2010 11:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/30-happy-numbers.html</guid>
</item>
<item>
  <title>Happy Numbers</title>
  <link>http://tapoueh.org/blog/2010/08/30-happy-numbers.html</link>
  <description><![CDATA[<h1>Happy Numbers</h1>
<div class="date">Monday, August 30 2010, 11:00</div>
</div>
<div id="article">
<p>After discovering the excellent <a href="http://gwene.org/">Gwene</a> service, which allows you to subscribe
to <em>newsgroups</em> to read <code>RSS</code> content (<em>blogs</em>, <em>planets</em>, <em>commits</em>, etc), I came to
read this nice article about <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">Happy Numbers</a>. That's a little problem that
fits an interview-style question well, so I first solved it yesterday
evening in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/List-Processing.html#List-Processing">Emacs Lisp</a>, as that's the language I use the most these days.</p>

<blockquote>
<p class="quoted">
A happy number is defined by the following process. Starting with any
positive integer, replace the number by the sum of the squares of its
digits, and repeat the process until the number equals 1 (where it will
stay), or it loops endlessly in a cycle which does not include 1. Those
numbers for which this process ends in 1 are happy numbers, while those
that do not end in 1 are unhappy numbers (or sad numbers).</p>

</blockquote>

<p>Now, what about implementing the same in pure <code>SQL</code>, for more fun? Now that's
interesting! After all, we didn't get <code>WITH RECURSIVE</code> for tree traversal
only, <a href="http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com">did we</a>?</p>

<p>Unfortunately, we need a little helper function first, if only to ease the
reading of the recursive query. I didn't try to inline it, but here it goes:</p>

<pre class="src">
create or replace function digits(x bigint)
  returns setof int
  language sql
as $$
  select substring($1::text from i for 1)::int
    from generate_series(1, length($1::text)) as t(i)
$$;
</pre>
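
<p>As a quick sanity check of that helper, here is a minimal sketch that assumes
the <code>digits()</code> function above is already installed. It sums the squares of the
digits of <code>97</code>, which is a single step of the happy number process:</p>

<pre class="src">
-- one step of the process: 9*9 + 7*7
select sum(d*d) from digits(97) as t(d);
 sum
-----
 130
(1 row)
</pre>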

<p>That was easy: the helper outputs one row per digit of the input number, and
rather than resorting to powers of ten, divisions and remainders, we
use the plain old text representation and <code>substring</code>. Now, to the real
problem. If you've read what a happy number is and have already read the
fine manual about <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">Recursive Query Evaluation</a>, it should be quite easy to
read the following:</p>

<pre class="src">
with recursive happy(n, seen) as (
    select 7::bigint, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
  select * from happy;
  n  |       seen
<span style="color: #888a85;">-----+------------------
</span>   7 | {}
  49 | {49}
  97 | {49,97}
 130 | {49,97,130}
  10 | {49,97,130,10}
   1 | {49,97,130,10,1}
(6 rows)

Time: 1.238 ms
</pre>

<p>That shows how it works for some <em>happy</em> number, and it's easy to test a
non-happy one, for example <code>17</code>. The query won't cycle thanks to the <code>seen</code>
array and the <code>having</code> filter, so the only difference between a <em>happy</em> and a
<em>sad</em> number is that in the former case the last line output by the
recursive query has <code>n = 1</code>. Let's turn this knowledge
into a proper function (because we want to pass the number we
test for happiness as an argument):</p>

<pre class="src">
create or replace function happy(x bigint)
  returns boolean
  language sql
as $$
with recursive happy(n, seen) as (
    select $1, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
  select n = 1 as happy
    from happy
order by array_length(seen, 1) desc nulls last
   limit 1
$$;
</pre>
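
<p>The ordering at the end of that function depends on how <code>array_length()</code>
treats an empty array. A quick check, as a sketch assuming PostgreSQL 8.4 or
later, where <code>array_length()</code> is available:</p>

<pre class="src">
-- array_length() of an empty array is NULL, not 0
select array_length('{}'::bigint[], 1) is null as empty_is_null;
 empty_is_null
---------------
 t
(1 row)
</pre>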

<p>We need the <code>desc nulls last</code> trick in the <code>order by</code> because the <code>array_length()</code>
of any dimension of an empty array is <code>NULL</code>, and we certainly don't want to
return all and any number as unhappy on the grounds that the query result
contains a line <code>input, {}</code>. Let's now play the same tricks as in the puzzle
article:</p>

<pre class="src">
=# select array_agg(x) as happy
     from generate_series(1, 50) as t(x)
    where happy(x);
              happy
<span style="color: #888a85;">----------------------------------
</span> {1,7,10,13,19,23,28,31,32,44,49}
(1 row)

Time: 24.527 ms

=# explain analyze select x
                     from generate_series(1, 10000) as t(x)
                    where happy(x);
                      QUERY PLAN
<span style="color: #888a85;">------------------------------------------------------------
</span> Function Scan on generate_series t
     (cost=0.00..265.00 rows=333 width=4)
     (actual time=2.938..3651.019 rows=1442 loops=1)
   Filter: happy((x)::bigint)
 Total runtime: 3651.534 ms
(3 rows)

Time: 3652.178 ms
</pre>

<p>(Yes, I tweaked the <code>EXPLAIN ANALYZE</code> output so that it fits the page width
here.) For what it's worth, finding the happy numbers among the first <code>10000</code> integers in <em>Emacs
Lisp</em> on the same laptop takes <code>2830 ms</code>, also running a recursive version of
the code.</p>

<h3>Update, the Emacs Lisp version, inline:</h3>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">happy?</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> n seen)
  <span style="color: #888a85;">"return true when n is a happy number"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((number    (or n (read-from-minibuffer
                           <span style="color: #ad7fa8; font-style: italic;">"Is this number happy: "</span>)))
         (digits    (mapcar
                     'string-to-number
                     (split-string number <span style="color: #ad7fa8; font-style: italic;">""</span> t)))
         (squares   (mapcar (<span style="color: #729fcf; font-weight: bold;">lambda</span> (x) (* x x)) digits))
         (happiness (apply '+ squares)))
    (<span style="color: #729fcf; font-weight: bold;">cond</span> ((eq 1 happiness)      t)
          ((memq happiness seen) nil)
          (t
           (happy? (number-to-string happiness)
                   (push happiness seen))))))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">find-happy-numbers</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> limit)
  <span style="color: #888a85;">"find all happy numbers from 1 to limit"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((count (or limit
                   (read-from-minibuffer
                    <span style="color: #ad7fa8; font-style: italic;">"List of happy numbers from 1 to: "</span>)))
        happy)
    (<span style="color: #729fcf; font-weight: bold;">dotimes</span> (n (string-to-number count))
      (<span style="color: #729fcf; font-weight: bold;">when</span> (happy? (number-to-string (1+ n)))
        (push (1+ n) happy)))
    (nreverse happy)))
</pre>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 30 Aug 2010 11:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/30-happy-numbers.html</guid>
</item>
<item>
  <title>welcome el-get scratch installer</title>
  <link>http://tapoueh.org/blog/2010/08/blog/2010/08/27-welcome-el-get-scratch-installer.html</link>
  <description><![CDATA[<p><span class="hack"> </span></p>

<p>A very good remark from some users: installing and managing <code>el-get</code> should be
simpler. They wanted both an easy install of the thing, and a way to be able
to manage it afterwards (like, update the local copy against the
authoritative source). So I decided it was high time for getting the code
out of my <code>~/.emacs.d</code> git repository and up to a public place:
<a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a>.</p>

<p>Then, I added some documentation (a <code>README</code>), and then, a <code>*scratch*
installer</code>, following great ideas from <code>ELPA</code>. So have at it, it's a copy paste
away!</p>

<p>Don't forget to set up your <code>el-get-sources</code> and include there the <code>el-get</code>
source for updates; there's nothing magic about it, so it's up to you. You
may notice that it's not yet possible to init <code>el-get</code> from <code>el-get-sources</code>,
though; that's the drawback of the lack of magic. So you will still have to
add an explicit <code>(require 'el-get)</code> before you go and define your own
<code>el-get-sources</code>, then finally call <code>(el-get)</code>. I don't think that's a problem I need
to solve, though.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 27 Aug 2010 14:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/27-welcome-el-get-scratch-installer.html</guid>
</item>
<item>
  <title>welcome el-get scratch installer</title>
  <link>http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</link>
  <description><![CDATA[<h1>welcome el-get scratch installer</h1>
<div class="date">Friday, August 27 2010, 14:15</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>A very good remark from some users: installing and managing <code>el-get</code> should be
simpler. They wanted both an easy install of the thing, and a way to be able
to manage it afterwards (like, update the local copy against the
authoritative source). So I decided it was high time for getting the code
out of my <code>~/.emacs.d</code> git repository and up to a public place:
<a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a>.</p>

<p>Then, I added some documentation (a <code>README</code>), and then, a <code>*scratch*
installer</code>, following great ideas from <code>ELPA</code>. So have at it, it's a copy paste
away!</p>

<p>Don't forget to set up your <code>el-get-sources</code> and include there the <code>el-get</code>
source for updates; there's nothing magic about it, so it's up to you. You
may notice that it's not yet possible to init <code>el-get</code> from <code>el-get-sources</code>,
though; that's the drawback of the lack of magic. So you will still have to
add an explicit <code>(require 'el-get)</code> before you go and define your own
<code>el-get-sources</code>, then finally call <code>(el-get)</code>. I don't think that's a problem I need
to solve, though.</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 27 Aug 2010 14:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</guid>
</item>
<item>
  <title>Playing with bit strings</title>
  <link>http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</link>
  <description><![CDATA[<h1>Playing with bit strings</h1>
<div class="date">Thursday, August 26 2010, 17:45</div>
</div>
<div id="article">
<p>The idea of the day ain't directly from me, I'm just helping with a very
thin subpart of the problem. The problem, I can't say much about, let's just
assume you want to reduce the storage of <code>MD5</code> in your database, so you want
to abuse <a href="http://www.postgresql.org/docs/8.4/interactive/datatype-bit.html">bit strings</a>. Using them works fine, but the datatype
still lacks some facilities, for example converting to and from the hexadecimal
representation in text.</p>

<pre class="src">
create or replace function hex_to_varbit(h text)
 returns varbit
 language sql
as $$
  select (<span style="color: #ad7fa8; font-style: italic;">'X'</span> || $1)::varbit;
$$;

create or replace function varbit_to_hex(b varbit)
 returns text
 language sql
as $$
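  -- lpad keeps the leading zeros of each 32-bit chunk, which to_hex drops
  select array_to_string(array_agg(lpad(to_hex((b &lt;&lt; (32*o))::bit(32)::bigint), 8, <span style="color: #ad7fa8; font-style: italic;">'0'</span>)), <span style="color: #ad7fa8; font-style: italic;">''</span>)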
    from (select b, generate_series(0, n-1) as o
            from (select $1, octet_length($1)/4) as t(b, n)) as x
$$;
</pre>

<p>To understand the magic in the second function, let's walk through the tests
one could do when wanting to grasp how things work in the <code>bitstring</code> world
(along with some reading of the fine documentation).</p>

<pre class="src">
=# select ('101011001011100110010110'::varbit &lt;&lt; 0)::bit(8);
   bit
----------
 10101100
(1 row)

=# select ('101011001011100110010110'::varbit &lt;&lt; 8)::bit(8);
   bit
----------
 10111001
(1 row)

=# select ('101011001011100110010110'::varbit &lt;&lt; 16)::bit(8);
   bit
----------
 10010110
(1 row)

=# select * from *TEMP VERSION OF THE FUNCTION FOR TESTING*
 o |                b                 |    x
---+----------------------------------+----------
 0 | 10101100101111010001100011011011 | acbd18db
 1 | 01001100110000101111100001011100 | 4cc2f85c
 2 | 11101101111011110110010101001111 | edef654f
 3 | 11001100110001001010010011011000 | ccc4a4d8
(4 rows)
</pre>

<p>What do we get from that, you may ask? Let's see a little example:</p>

<pre class="src">
=# select hex_to_varbit(md5('foo'));
                                                          hex_to_varbit
----------------------------------------------------------------------------------------------------------------------------------
 10101100101111010001100011011011010011001100001011111000010111001110110111101111011001010100111111001100110001001010010011011000
(1 row)

=# select md5('foo'), varbit_to_hex(hex_to_varbit(md5('foo')));
               md5                |          varbit_to_hex
----------------------------------+----------------------------------
 acbd18db4cc2f85cedef654fccc4a4d8 | acbd18db4cc2f85cedef654fccc4a4d8
(1 row)
</pre>

<p>Storing <code>varbits</code> rather than the <code>text</code> form of the <code>MD5</code> allows us to go from
<code>6510 MB</code> down to <code>4976 MB</code> on a sample table containing 100 million
rows. We're targeting more than that, so that's a great win down here!</p>
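
<p>To get a feel for the per-value difference, one can compare the size of both
representations. This is only a sketch: it assumes the <code>hex_to_varbit</code>
function above is installed, and it measures a single in-memory value, not the
table and index sizes quoted above:</p>

<pre class="src">
-- size in bytes of the text form versus the varbit form of one MD5
select pg_column_size(md5('foo')) as text_size,
       pg_column_size(hex_to_varbit(md5('foo'))) as varbit_size;
</pre>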

<p>In case you wonder, when querying the main index on <code>varbit</code> rather than the one on
<code>text</code> for a single result row, the cost of doing the conversion with
<code>varbit_to_hex</code> seems to be around <code>28 µs</code>. We can afford it.</p>

<p>Hope this helps!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 Aug 2010 17:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</guid>
</item>
<item>
  <title>el-get news</title>
  <link>http://tapoueh.org/blog/2010/08/blog/2010/08/26-el-get-news.html</link>
  <description><![CDATA[<p>I've been receiving some requests for <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>, some of them even included a
patch. So now there's support for <code>bzr</code>, <code>CVS</code> and <code>http-tar</code>, augmenting the
existing support for <code>git</code>, <code>git-svn</code>, <code>apt-get</code>, <code>fink</code> and <code>ELPA</code> formats.</p>

<p>Also, the <code>install</code> and even the <code>build</code> are now completely <em>asynchronous</em>:
there's a pending bugfix for the building, which is now using
<a href="http://www.gnu.org/software/emacs/elisp/html_node/Asynchronous-Processes.html">start-process-shell-command</a>. The advantage of doing so is that you're free
to use Emacs as usual while <code>el-get</code> is having your piece of <code>elisp</code> code
compiled, which can take time.</p>

<p>The drawback is that it's not easy to do the associated setup at the right
time without support from <code>el-get</code>, so you have the new option <code>:after</code>, which
takes a <code>functionp</code> object: please consider using that to give your own
special setup for the external emacs bits and pieces you're using.</p>

<p>Let's see some examples of the new features:</p>

<pre class="src">
  (<span style="color: #da70d6;">:name</span> xml-rpc-el
         <span style="color: #da70d6;">:type</span> bzr
         <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"lp:xml-rpc-el"</span>)

  (<span style="color: #da70d6;">:name</span> haskell-mode
         <span style="color: #da70d6;">:type</span> http-tar
         <span style="color: #da70d6;">:options</span> (<span style="color: #bc8f8f;">"xzf"</span>)
         <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://projects.haskell.org/haskellmode-emacs/haskell-mode-2.8.0.</span><span style="color: #ffff00; background-color: #ff0000; font-weight: bold;">tar.gz"</span>
         <span style="color: #da70d6;">:load</span> <span style="color: #bc8f8f;">"haskell-site-file.el"</span>
         <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> ()
                  (add-hook 'haskell-mode-hook 'turn-on-haskell-doc-mode)
                  (add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)))

  (<span style="color: #da70d6;">:name</span> auctex
         <span style="color: #da70d6;">:type</span> cvs
         <span style="color: #da70d6;">:module</span> <span style="color: #bc8f8f;">"auctex"</span>
         <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">":pserver:anonymous@cvs.sv.gnu.org:/sources/auctex"</span>
         <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./autogen.sh"</span> <span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make"</span>)
         <span style="color: #da70d6;">:load</span>  (<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"preview/preview-latex.el"</span>)
         <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"doc"</span>)
</pre>

<p>As you can see, there are also the new options <code>:module</code> (only used by <code>CVS</code> so
far) and <code>:options</code> (only used by <code>http-tar</code> so far). With the latter method,
the <code>:options</code> key lets you support virtually any kind of <code>tar</code>
compression (<code>.tar.bz2</code>, etc).</p>

<p>The <code>CVS</code> support currently does not include authentication against the
anonymous <code>pserver</code>, because the only repository I've been asked to support
isn't using that, and the couple of servers that I know of
want either no password at the prompt or a dummy one. That's for another day,
if needed at all.</p>

<p>That pushes the little local hack to more than a thousand lines of <code>elisp</code>
code, and the next steps include proposing it to <a href="http://tromey.com/elpa/">ELPA</a> so that getting to use
it is easier than ever. You'd just have to choose whether to install <code>ELPA</code>
from <code>el-get</code> or the other way around.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 Aug 2010 16:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/26-el-get-news.html</guid>
</item>
<item>
  <title>el-get news</title>
  <link>http://tapoueh.org/blog/2010/08/26-el-get-news.html</link>
  <description><![CDATA[<h1>el-get news</h1>
<div class="date">Thursday, August 26 2010, 16:30</div>
</div>
<div id="article">
<p>I've been receiving some requests for <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>, some of them even included a
patch. So now there's support for <code>bzr</code>, <code>CVS</code> and <code>http-tar</code>, augmenting the
existing support for <code>git</code>, <code>git-svn</code>, <code>apt-get</code>, <code>fink</code> and <code>ELPA</code> formats.</p>

<p>Also, the <code>install</code> and even the <code>build</code> are now completely <em>asynchronous</em>:
there's a pending bugfix for the building, which is now using
<a href="http://www.gnu.org/software/emacs/elisp/html_node/Asynchronous-Processes.html">start-process-shell-command</a>. The advantage of doing so is that you're free
to use Emacs as usual while <code>el-get</code> is having your piece of <code>elisp</code> code
compiled, which can take time.</p>

<p>The drawback is that it's not easy to do the associated setup at the right
time without support from <code>el-get</code>, so you have the new option <code>:after</code>, which
takes a <code>functionp</code> object: please consider using that to give your own
special setup for the external emacs bits and pieces you're using.</p>

<p>Let's see some examples of the new features:</p>

<pre class="src">
  (<span style="color: #729fcf;">:name</span> xml-rpc-el
         <span style="color: #729fcf;">:type</span> bzr
         <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"lp:xml-rpc-el"</span>)

  (<span style="color: #729fcf;">:name</span> haskell-mode
         <span style="color: #729fcf;">:type</span> http-tar
         <span style="color: #729fcf;">:options</span> (<span style="color: #ad7fa8; font-style: italic;">"xzf"</span>)
         <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://projects.haskell.org/haskellmode-emacs/haskell-mode-2.8.0.</span><span style="color: #ffff00; background-color: #ff0000; font-weight: bold;">tar.gz"</span>
         <span style="color: #729fcf;">:load</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-site-file.el"</span>
         <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                  (add-hook 'haskell-mode-hook 'turn-on-haskell-doc-mode)
                  (add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)))

  (<span style="color: #729fcf;">:name</span> auctex
         <span style="color: #729fcf;">:type</span> cvs
         <span style="color: #729fcf;">:module</span> <span style="color: #ad7fa8; font-style: italic;">"auctex"</span>
         <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">":pserver:anonymous@cvs.sv.gnu.org:/sources/auctex"</span>
         <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./autogen.sh"</span> <span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
         <span style="color: #729fcf;">:load</span>  (<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"preview/preview-latex.el"</span>)
         <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"doc"</span>)
</pre>

<p>As you can see, there are also the new options <code>:module</code> (only used by <code>CVS</code> so
far) and <code>:options</code> (only used by <code>http-tar</code> so far). With the latter method,
the <code>:options</code> key lets you support virtually any kind of <code>tar</code>
compression (<code>.tar.bz2</code>, etc).</p>

<p>The <code>CVS</code> support currently does not include authentication against the
anonymous <code>pserver</code>, because the only repository I've been asked to support
isn't using that, and the couple of servers that I know of
want either no password at the prompt or a dummy one. That's for another day,
if needed at all.</p>

<p>That pushes the little local hack to more than a thousand lines of <code>elisp</code>
code, and the next steps include proposing it to <a href="http://tromey.com/elpa/">ELPA</a> so that getting to use
it is easier than ever. You'd just have to choose whether to install <code>ELPA</code>
from <code>el-get</code> or the other way around.</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 26 Aug 2010 16:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/26-el-get-news.html</guid>
</item>
<item>
  <title>el-get and dim-switch-window status update</title>
  <link>http://tapoueh.org/blog/2010/08/blog/2010/08/09-el-get-and-dim-switch-window-status-update.html</link>
  <description><![CDATA[<p><span class="hack"> </span></p>

<p>Thanks to the readers of <a href="http://planet.emacsen.org/">Planet Emacsen</a> who took the time to try the pieces
of emacs lisp found in my blog, and to comment on them, some bugs have been
fixed and new releases have appeared.</p>

<p><a href="http://tapoueh.org/projects.html#sec20">el-get</a> had a typo kind of bug in its support for <code>apt-get</code> and <code>fink</code>
packages, and I managed to break the <code>elpa</code> and <code>http</code> support when going <em>all
asynchronous</em> by forgetting to update the calling convention I'm using. While
fixing that, I also switched to <code>url-retrieve</code> so that the <code>http</code> support is
<em>asynchronous</em> too. That makes version <code>0.5</code>, available on the <a href="http://www.emacswiki.org/emacs/el-get.el">emacswiki el-get</a>
page.</p>

<p>Meanwhile <a href="http://tapoueh.org/projects.html#sec19">dim-switch-window.el</a> got some testers too and got updated with a
nice fix, or so I think. If you're using it with a small enough emacs frame,
or some very small windows in there, you'll have noticed that the numbers get
so big they don't fit anymore, and all you see while it's waiting for your
window number choice is... blank windows. Not very helpful. Thanks to the
following piece of code, that's no longer the case as of the current
version, available on the <a href="http://www.emacswiki.org/emacs/switch-window.el">emacswiki switch-window</a> page.</p>

<p>In short, where I used to blindly apply <code>dim:switch-window-increase</code> to the
big numbers to display, the code now checks that there's enough room for
them, and scales the <em>increase</em> level down if necessary. Very simple, and
effective too:</p>

<pre class="src">
    (<span style="color: #7f007f;">with-current-buffer</span> buf
      (text-scale-increase
       (<span style="color: #7f007f;">if</span> (&gt; (/ (float (window-body-height win))
                 dim:switch-window-increase)
              1)
           dim:switch-window-increase
         (window-body-height win)))
      (insert <span style="color: #bc8f8f;">"\n\n    "</span> (number-to-string num)))
</pre>

<p>Centering the text in the window's width is another story entirely, as
<code>text-scale-increase</code> ain't linear on this axis. I'd take any good idea. Here's
where I'm currently at, but it's not there yet:</p>

<pre class="src">
    (<span style="color: #7f007f;">with-current-buffer</span> buf
      (<span style="color: #7f007f;">let*</span> ((w (window-width win))
             (h (window-body-height win))
             (increased-lines (/ (float h) dim:switch-window-increase))
             (scale (<span style="color: #7f007f;">if</span> (&gt; increased-lines 1) dim:switch-window-increase h))
             (lines-before (/ increased-lines 2))
             (margin-left (/ w h) ))
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">increase to maximum dim:switch-window-increase
</span>        (text-scale-increase scale)
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">make it so that the hyuge number appears centered
</span>        (<span style="color: #7f007f;">dotimes</span> (i lines-before) (insert <span style="color: #bc8f8f;">"\n"</span>))
        (<span style="color: #7f007f;">dotimes</span> (i margin-left)  (insert <span style="color: #bc8f8f;">" "</span>))
        (insert (number-to-string num))))
</pre>

<p>So, if you're using one or the other (both?) of those utilities, update your
local version of them!</p>

<p>Note: I also fixed a bug in <a href="http://github.com/dimitri/rcirc-groups">rcirc-groups</a> this weekend, but I'll talk about
it in another entry, if I may.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 09 Aug 2010 15:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/09-el-get-and-dim-switch-window-status-update.html</guid>
</item>
<item>
  <title>el-get and dim-switch-window status update</title>
  <link>http://tapoueh.org/blog/2010/08/09-el-get-and-dim-switch-window-status-update.html</link>
  <description><![CDATA[<div class="date">Monday, August 09 2010, 15:35</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>Thanks to the readers of <a href="http://planet.emacsen.org/">Planet Emacsen</a> who took the time to try the pieces
of emacs lisp found in my blog, and to comment on them, some bugs have been
fixed and new releases have appeared.</p>

<p><a href="http://tapoueh.org/projects.html#sec20">el-get</a> had a typo kind of bug in its support for <code>apt-get</code> and <code>fink</code>
packages, and I managed to break the <code>elpa</code> and <code>http</code> support when going <em>all
asynchronous</em> by forgetting to update the calling convention I'm using. While
fixing that, I also switched to <code>url-retrieve</code> so that the <code>http</code> support is
<em>asynchronous</em> too. That makes version <code>0.5</code>, available on the <a href="http://www.emacswiki.org/emacs/el-get.el">emacswiki el-get</a>
page.</p>

<p>Meanwhile <a href="http://tapoueh.org/projects.html#sec19">dim-switch-window.el</a> got some testers too and got updated with a
nice fix, or so I think. If you're using it with a small enough emacs frame,
or some very small windows in there, you'll have noticed that the numbers get
so big they don't fit anymore, and all you see while it's waiting for your
window number choice is... blank windows. Not very helpful. Thanks to the
following piece of code, that's no longer the case as of the current
version, available on the <a href="http://www.emacswiki.org/emacs/switch-window.el">emacswiki switch-window</a> page.</p>

<p>In short, where I used to blindly apply <code>dim:switch-window-increase</code> to the
big numbers to display, the code now checks that there's enough room for
them, and scales the <em>increase</em> level down if necessary. Very simple, and
effective too:</p>

<pre class="src">
    (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> buf
      (text-scale-increase
       (<span style="color: #729fcf; font-weight: bold;">if</span> (&gt; (/ (float (window-body-height win))
                 dim:switch-window-increase)
              1)
           dim:switch-window-increase
         (window-body-height win)))
      (insert <span style="color: #ad7fa8; font-style: italic;">"\n\n    "</span> (number-to-string num)))
</pre>

<p>Centering the text in the window's width is another story entirely, as
<code>text-scale-increase</code> ain't linear on this axis. I'd take any good idea. Here's
where I'm currently at, but it's not there yet:</p>

<pre class="src">
    (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> buf
      (<span style="color: #729fcf; font-weight: bold;">let*</span> ((w (window-width win))
             (h (window-body-height win))
             (increased-lines (/ (float h) dim:switch-window-increase))
             (scale (<span style="color: #729fcf; font-weight: bold;">if</span> (&gt; increased-lines 1) dim:switch-window-increase h))
             (lines-before (/ increased-lines 2))
             (margin-left (/ w h) ))
        <span style="color: #888a85;">;; </span><span style="color: #888a85;">increase to maximum dim:switch-window-increase
</span>        (text-scale-increase scale)
        <span style="color: #888a85;">;; </span><span style="color: #888a85;">make it so that the hyuge number appears centered
</span>        (<span style="color: #729fcf; font-weight: bold;">dotimes</span> (i lines-before) (insert <span style="color: #ad7fa8; font-style: italic;">"\n"</span>))
        (<span style="color: #729fcf; font-weight: bold;">dotimes</span> (i margin-left)  (insert <span style="color: #ad7fa8; font-style: italic;">" "</span>))
        (insert (number-to-string num))))
</pre>

<p>So, if you're using one or the other (both?) of those utilities, update your
local version of them!</p>

<p>Note: I also fixed a bug in <a href="http://github.com/dimitri/rcirc-groups">rcirc-groups</a> this weekend, but I'll talk about
it in another entry, if I may.</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/rcirc.html">rcirc</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 09 Aug 2010 15:35:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/09-el-get-and-dim-switch-window-status-update.html</guid>
</item>
<item>
  <title>Editing constants in constraints</title>
  <link>http://tapoueh.org/blog/2010/08/09-editing-constants-in-constraints.html</link>
  <description><![CDATA[<div class="date">Monday, August 09 2010, 14:45</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>We're using constants in some constraints here, for example in cases where
several servers are replicating to the same <em>federating</em> one: each origin
server has its own schema, and everything is replicated nicely to the central
host, thanks to <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial#Federated_database">Londiste</a>, as you might have guessed already.</p>

<p>For bare-metal recovery scripts, I'm working on how to change those
constants in the constraints, so that <code>pg_dump -s</code> plus some schema tweaking
would kick-start a server. Here's a <code>PLpgSQL</code> snippet to do just that:</p>

<pre class="src">
  FOR rec IN EXECUTE
$s$
SELECT schemaname, tablename, conname, attnames, def
  FROM (
   SELECT n.nspname, c.relname, r.conname,
          (select array_accum(attname)
             from pg_attribute
            where attrelid = c.oid and r.conkey @&gt; array[attnum]) as attnames,
          pg_catalog.pg_get_constraintdef(r.oid, true)
   FROM pg_catalog.pg_constraint r
        JOIN pg_class c on c.oid = r.conrelid
        JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE r.contype = <span style="color: #ad7fa8; font-style: italic;">'c'</span>
ORDER BY 1, 2, 3
       ) as cons(schemaname, tablename, conname, attnames, def)
WHERE attnames @&gt; array[<span style="color: #ad7fa8; font-style: italic;">'server'</span>]::name[]
$s$
  LOOP
    rec.def := replace(rec.def, <span style="color: #ad7fa8; font-style: italic;">'server = '</span> || old_id,
                                <span style="color: #ad7fa8; font-style: italic;">'server = '</span> || new_id);

    sql := <span style="color: #ad7fa8; font-style: italic;">'ALTER TABLE '</span> || rec.schemaname || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || rec.tablename
        || <span style="color: #ad7fa8; font-style: italic;">' DROP CONSTRAINT '</span> || rec.conname;
    RAISE NOTICE <span style="color: #ad7fa8; font-style: italic;">'%'</span>, sql;
    RETURN NEXT;
    EXECUTE sql;

    sql := <span style="color: #ad7fa8; font-style: italic;">'ALTER TABLE '</span> || rec.schemaname || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || rec.tablename
        || <span style="color: #ad7fa8; font-style: italic;">' ADD '</span> || rec.def;
    RAISE NOTICE <span style="color: #ad7fa8; font-style: italic;">'%'</span>, sql;
    RETURN NEXT;
    EXECUTE sql;

  END LOOP;
</pre>

<p>This relies on the fact that our constraints are on the column <code>server</code>. Why
would this be any better than a <code>sed</code> one-liner, you might ask? I'm fed up
with pseudo-parsing scripts and with taking the risk that a simple
command will change data I didn't want to edit. I want context-aware tools,
pretty please, to <em>feel</em> safe.</p>

<p>Otherwise I might have gone with <code>pg_dump -s| sed -e 's:\(server =\)
17:\1 18:'</code> but this one-liner already contains too much useless magic
for my taste (the space before <code>17</code> ain't in the group match, to allow for
having <code>\1 18</code> on the right hand side). And this isn't parametrized yet, and
for that I'll need to talk to the database, as that's where I store the server
names and their ids (a <code>bigserial</code>; yes, the constraints are all generated from
scripts). I don't want to write an <em>SQL parser</em> and I don't want to play
loose, so the <code>PLpgSQL</code> approach is what I think is the best tool
here. Opinionated answers get to my mailbox!</p>
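
<p>One assumption worth spelling out about the <code>PLpgSQL</code> snippet above: <code>array_accum</code>
is not a built-in aggregate. The sketch below assumes it was created the way
the PostgreSQL documentation on user-defined aggregates shows it; on 8.4 and
later, the built-in <code>array_agg</code> should do the same job here.</p>

<pre class="src">
-- assumed definition of array_accum, as in the PostgreSQL documentation example
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);
</pre>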


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/plpgsql.html">plpgsql</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 09 Aug 2010 14:45:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/09-editing-constants-in-constraints.html</guid>
</item>
<item>
  <title>debian packaging PostgreSQL extensions</title>
  <link>http://tapoueh.org/blog/2010/08/06-debian-packaging-postgresql-extensions.html</link>
  <description><![CDATA[<div class="date">Friday, August 06 2010, 13:00</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>In trying to help an extension <em>debian packaging</em> effort, I've once again
offered to handle it. That's because I'm now beginning to know how to do it, as
you can see in my <a href="http://qa.debian.org/developer.php?login=dim%40tapoueh.org">package overview</a> page at the <em>debian QA</em> facility. There's
another reason why I volunteered here: yet another tool of mine is now
to be found in <em>debian</em>, and should greatly help <em>extension packaging</em>
there. You can already check the <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a> package page
if you're that impatient!</p>

<p>Back? Ok, so I used to have two main gripes against debian support for
<a href="http://www.postgresql.org/">PostgreSQL</a>. The first one, which is now feeling alone, is that the two projects'
<a href="http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy">release support policies</a> aren't compatible enough for debian stable to include
all currently supported stable PostgreSQL major versions. It's a real pity
that debian stable will only offer one major version, knowing that the
support for several of them is in there.</p>

<p>The problem is twofold: first, debian stable has to maintain any
distributed package. There's no <em>deprecation policy</em> allowing for dropping the
ball. So the other side of this coin is that debian developers must take on
themselves maintaining included software for as long as stable is not
renamed <code>oldstable</code>. And it so happens that no debian developer
feels like maintaining <em>end of lined</em> PostgreSQL releases without help from the
<a href="http://www.postgresql.org/community/contributors/">PostgreSQL Core Team</a>. Or, say, without an official statement that they would
help.</p>

<p>Now, why I don't like this situation is because I'm pretty sure there are very
few software development groups offering as long and reliable a maintenance
policy as PostgreSQL does, yet debian will still happily distribute
<em>unknown-maintenance-policy</em> pieces of code in its stable repositories. So the
<em>uncertainty</em> excuse is rather poor. And highly frustrating.</p>

<blockquote>
<p class="quoted">
<strong><em>Note:</em></strong> you have to admit that the debian stable management model copes very
well with all the debian included software. You can't release stable with
a new PostgreSQL major version unless each and every package depending on
PostgreSQL actually works with the newer version, and the debian
scripts take care of upgrading the cluster. Where it doesn't work well
is when you're using debian as a PostgreSQL server for a proprietary
application, which happens quite frequently too.</p>

</blockquote>

<p>This leads to my second main gripe against debian
support for PostgreSQL: the extensions. It so happens that PostgreSQL
extensions are developed to support several major versions from the
same source code. So typically, all you need to do is recompile the
extension against the new major version, and there you go.</p>

<p>Now, say the new debian stable is coming with <a href="http://packages.debian.org/squeeze/postgresql-8.4">8.4</a> rather than <a href="http://packages.debian.org/lenny/postgresql-8.3">8.3</a> as it used
to. You should be able to just build the extensions (like <a href="http://packages.debian.org/squeeze/postgresql-8.4-prefix">prefix</a>), without
changing the source package or dropping <code>postgresql-8.3-prefix</code> from the
distribution on the grounds that <code>8.3</code> ain't in debian stable anymore.</p>

<p>I've been ranting a lot about this state of affairs, and I finally provided a
patch to the <a href="http://packages.debian.org/sid/postgresql-common">postgresql-common</a> debian packaging, which made it into version
<code>110</code>: welcome <a href="http://packages.debian.org/sid/postgresql-server-dev-all">pg_buildext</a>. An example of how to use it can be found in the
git branch for <a href="http://github.com/dimitri/prefix">prefix</a>, where it shows up in the <a href="http://github.com/dimitri/prefix/blob/master/debian/pgversions">debian/pgversions</a> and <a href="http://github.com/dimitri/prefix/blob/master/debian/rules">debian/rules</a>
files.</p>

<p>As you can see, the <code>pg_buildext</code> tool allows you to list the PostgreSQL major
versions the extension you're packaging supports, and only those that are
both in your list and in the current debian supported major version list
will get built. <code>pg_buildext</code> will do a <code>VPATH</code> build of your extension, so it's
capable of building the same extension for multiple major versions of
PostgreSQL. Here's how it looks:</p>

<pre class="src">
        # build all supported version
        pg_buildext build $(SRCDIR) $(TARGET) <span style="color: #ad7fa8; font-style: italic;">"$(CFLAGS)"</span>

        # then install each of them
        for v in `pg_buildext supported-versions $(SRCDIR)`; do \
                dh_install -ppostgresql-$$v-prefix ;\
        done
</pre>

<p>And the files are to be found in those places:</p>

<pre class="src">
dim ~/dev/prefix cat debian/postgresql-8.3-prefix.install
debian/prefix-8.3/prefix.so usr/lib/postgresql/8.3/lib
debian/prefix-8.3/prefix.sql usr/share/postgresql/8.3/contrib

dim ~/dev/prefix cat debian/postgresql-8.4-prefix.install
debian/prefix-8.4/prefix.so usr/lib/postgresql/8.4/lib
debian/prefix-8.4/prefix.sql usr/share/postgresql/8.4/contrib
</pre>

<p>So you still need to maintain <a href="http://github.com/dimitri/prefix/blob/master/debian/pgversions">debian/pgversions</a> and the
<code>postgresql-X.Y-extension.*</code> files, but then a change in debian support for
PostgreSQL major versions will be handled automatically (there's a facility
to trigger automatic rebuild when necessary).</p>

<p>All this ranting to explain that pretty soon, the extension packages that I
maintain will no longer have to be patched when dropping a previously
supported major version of PostgreSQL. I'm breathing a little better, so
thanks a lot <a href="http://www.piware.de/category/debian/">Martin</a>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 06 Aug 2010 13:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/06-debian-packaging-postgresql-extensions.html</guid>
</item>
<item>
  <title>Querying the Catalog to plan an upgrade</title>
  <link>http://tapoueh.org/blog/2010/08/05-querying-the-catalog-to-plan-an-upgrade.html</link>
  <description><![CDATA[<div class="date">Thursday, August 05 2010, 11:00</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>Some user on <code>IRC</code> was reading the release notes in order to plan for a minor
upgrade of his <code>8.3.3</code> installation, and was puzzled about the potential need to
rebuild <code>GIST</code> indexes. That's from the <a href="http://www.postgresql.org/docs/8.3/static/release-8-3-5.html">8.3.5 release notes</a>, and from the
<a href="http://www.postgresql.org/docs/8.3/static/release-8-3-8.html">8.3.8 notes</a> you see that you need to consider <em>hash</em> indexes on <em>interval</em>
columns too. Now the question is, how do you find out if any such beasts are in
use in your database?</p>

<p>It happens that <a href="http://www.postgresql.org/">PostgreSQL</a> lets you find out those things by querying its
<a href="http://www.postgresql.org/docs/8.4/static/catalogs.html">system catalogs</a>. That might look hairy at first, but it's well worth getting
used to those system tables. You could compare that to the introspection and
reflection facilities of some programming languages, except much more useful,
because you're reaching the whole system at once. But, well, here it goes:</p>

<pre class="src">
SELECT schemaname, tablename, relname, amname, indexdef
  FROM pg_indexes i
       JOIN pg_class c ON i.indexname = c.relname and c.relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span>
       JOIN pg_am am ON c.relam = am.oid
 WHERE amname = <span style="color: #ad7fa8; font-style: italic;">'gist'</span>;
</pre>

<p>Now you could replace the <code>WHERE</code> clause with <code>WHERE amname IN ('gist', 'hash')</code>
to check both conditions at once. But what about narrowing down the
<em>hash</em> index rebuilds to schedule, as they only need to be done for indexes on
<code>interval</code> columns? Well let's try it:</p>

<pre class="src">
SELECT schemaname, tablename, relname as indexname, amname, indclass
  FROM pg_indexes i
       JOIN pg_class c on i.indexname = c.relname and c.relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span>
       JOIN pg_am am on c.relam = am.oid
       JOIN pg_index x on x.indexrelid = c.oid
 WHERE amname in (<span style="color: #ad7fa8; font-style: italic;">'btree'</span>, <span style="color: #ad7fa8; font-style: italic;">'gist'</span>)
       and schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>, <span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>);
</pre>

<p>We're not there yet, because as you notice, the catalogs are somewhat
optimized and not always in a normal form. That's good for the system's
performance, but it makes querying a bit awkward. What we want is to find out
from the <code>indclass</code> column (it's an <code>oidvector</code>) whether any of its entries applies
to an <code>interval</code> data type. There's a subtlety here as the index could store
<code>interval</code> data even if the column is not of an <code>interval</code> type itself, so we
have to find both cases.</p>

<p>Well the <em>subtlety</em> applies after you know what an <a href="http://www.postgresql.org/docs/8.4/static/xindex.html">operator class</a> is: <em>“An
operator class defines how a particular data type can be used with an
index”</em> is what the <a href="http://www.postgresql.org/docs/8.4/static/sql-createopclass.html">CREATE OPERATOR CLASS</a> manual page teaches us. What we
need to know here is that an index will talk to an operator class to get to
the data type, either the <em>column</em> data type or the index <em>storage</em> one.</p>

<pre class="src">
SELECT schemaname, tablename, relname as indexname, amname, indclass, opcname, typname
  FROM pg_indexes i
       JOIN pg_class c on i.indexname = c.relname and c.relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span>
       JOIN pg_am am on c.relam = am.oid
       JOIN pg_index x on x.indexrelid = c.oid
       JOIN pg_opclass o
         on string_to_array(x.indclass::text, <span style="color: #ad7fa8; font-style: italic;">' '</span>)::oid[] @&gt; array[o.oid]::oid[]
       JOIN pg_type t on o.opckeytype = t.oid
WHERE amname = <span style="color: #ad7fa8; font-style: italic;">'hash'</span> and t.typname = <span style="color: #ad7fa8; font-style: italic;">'interval'</span>

UNION ALL

SELECT schemaname, tablename, relname as indexname, amname, indclass, opcname, typname
  FROM pg_indexes i
       JOIN pg_class c on i.indexname = c.relname and c.relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span>
       JOIN pg_am am on c.relam = am.oid
       JOIN pg_index x on x.indexrelid = c.oid
       JOIN pg_opclass o
         on string_to_array(x.indclass::text, <span style="color: #ad7fa8; font-style: italic;">' '</span>)::oid[] @&gt; array[o.oid]::oid[]
       JOIN pg_type t on o.opcintype = t.oid
WHERE amname = <span style="color: #ad7fa8; font-style: italic;">'hash'</span> and t.typname = <span style="color: #ad7fa8; font-style: italic;">'interval'</span>;
</pre>

<p>Most certainly this query will return no rows for you, as <em>hash</em> indexes are
not widely used, mainly because they are not crash safe. To see some
results you could of course remove the <code>amname</code> restriction, which would show
that the query is working, but don't forget to add the restriction back to plan
for the upgrade!</p>

<p>But hey, why walk the extra mile here, you might ask? After all, in
the second query we would already have had the information we needed had
we added the <code>indexdef</code> column, albeit in a human reader friendly way: the
<em>resultset</em> would then contain the <code>CREATE INDEX</code> command you need to issue to
build the index from scratch. That would be enough for checking only the
catalog, but the extra mile allows you to produce a <code>SQL</code> script to build the
indexes that need your attention post upgrade. That last step is left as an
exercise for the reader, though.</p>
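
<p>For what it's worth, here is one possible take on that exercise. It's only a
sketch: it emits <code>REINDEX</code> commands rather than the <code>CREATE INDEX</code> text from
<code>indexdef</code>, which is a simplification, and it assumes that every <code>GIST</code> index
plus the <em>hash</em> indexes touching <code>interval</code> are the ones to rebuild. Check the
release notes against your own versions before running its output.</p>

<pre class="src">
-- one REINDEX command per index needing attention (sketch, see assumptions above)
SELECT DISTINCT
       'REINDEX INDEX ' || quote_ident(n.nspname) || '.'
                        || quote_ident(c.relname) || ';' AS command
  FROM pg_class c
       JOIN pg_namespace n ON n.oid = c.relnamespace
       JOIN pg_am am ON am.oid = c.relam
       JOIN pg_index x ON x.indexrelid = c.oid
       JOIN pg_opclass o
         ON string_to_array(x.indclass::text, ' ')::oid[] @&gt; array[o.oid]::oid[]
       -- matches either the column data type or the index storage type
       LEFT JOIN pg_type t
         ON t.oid IN (o.opcintype, o.opckeytype)
        AND t.typname = 'interval'
 WHERE c.relkind = 'i'
   AND (am.amname = 'gist'
        OR (am.amname = 'hash' AND t.oid IS NOT NULL))
 ORDER BY command;
</pre>

<p>Run through <code>psql -At</code>, the output is a plain script you can review first and
replay once the upgrade is done.</p>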


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/catalogs.html">catalogs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 05 Aug 2010 11:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/05-querying-the-catalog-to-plan-an-upgrade.html</guid>
</item>
<item>
  <title>el-get</title>
  <link>http://tapoueh.org/blog/2010/08/blog/2010/08/04-el-get.html</link>
  <description><![CDATA[<p>I've been using emacs for a long time, and a long time it took me to
consider learning <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/index.html">Emacs Lisp</a>. Before that, I didn't trust my level of
understanding enough to be comfortable in managing my setup efficiently.</p>

<p>One of the main problems of setting up <a href="http://www.gnu.org/software/emacs/">Emacs</a> is that not only do you tend to
accumulate so many tricks from <a href="http://www.emacswiki.org/">EmacsWiki</a> and <a href="http://planet.emacsen.org/">blog posts</a> that your <code>.emacs</code> has
to grow to a full <code>~/.emacs.d/</code> directory (starting at <code>~/.emacs.d/init.el</code>),
but you also end up depending on several <em>libraries</em> of code you're neither
authoring nor maintaining. Let's call them <em>packages</em>.</p>

<p>Some of them will typically be available on <a href="http://tromey.com/elpa/index.html">ELPA</a>, which allows you to
breathe and keep cool. But most of them, let's face it, are not there. Most
of the packages I use I tend to get either from <a href="http://www.debian.org/">debian</a> (see
<a href="http://packages.debian.org/sid/apt-rdepends">apt-rdepends</a> for the complete list of packages that depend on emacs;
unfortunately I can't find an online version of the tool to link to), or
from <code>ELPA</code>, or from their own <code>git</code> repository somewhere. Some of them I even
get directly from an <a href="http://www.splode.com/~friedman/software/emacs-lisp">obscure website</a> not maintained anymore, but always
there when you need it.</p>

<p>Of course, my emacs setup is managed in a private <code>git</code> repository. Some
people on <code>#emacs</code> are using <a href="http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html">git submodules</a> (or was it straight <em>import</em>) for
managing external repositories in there, but all I can say is that I frown
on this idea. I want an easy canonical list of packages I depend on to run
emacs, and I want this documentation to be usable as-is. Enter <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>!</p>

<p>As we're all damn lazy, here's a <em>visual</em> introduction to <code>el-get</code>:</p>

<pre class="src">
(setq el-get-sources
      '((<span style="color: #da70d6;">:name</span> bbdb
               <span style="color: #da70d6;">:type</span> git
               <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/barak/BBDB.git"</span>
               <span style="color: #da70d6;">:load-path</span> (<span style="color: #bc8f8f;">"./lisp"</span> <span style="color: #bc8f8f;">"./bits"</span>)
               <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"texinfo"</span>
               <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make"</span>))

        (<span style="color: #da70d6;">:name</span> magit
               <span style="color: #da70d6;">:type</span> git
               <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://github.com/philjackson/magit.git"</span>
               <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"."</span>
               <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./autogen.sh"</span> <span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make"</span>))

        (<span style="color: #da70d6;">:name</span> vkill
               <span style="color: #da70d6;">:type</span> http
               <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://www.splode.com/~friedman/software/emacs-lisp/src/vkill.el"</span>
               <span style="color: #da70d6;">:features</span> vkill)

        (<span style="color: #da70d6;">:name</span> yasnippet
               <span style="color: #da70d6;">:type</span> git-svn
               <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://yasnippet.googlecode.com/svn/trunk/"</span>)

        (<span style="color: #da70d6;">:name</span> asciidoc         <span style="color: #da70d6;">:type</span> elpa)
        (<span style="color: #da70d6;">:name</span> dictionary-el    <span style="color: #da70d6;">:type</span> apt-get)
        (<span style="color: #da70d6;">:name</span> emacs-goodies-el <span style="color: #da70d6;">:type</span> apt-get)))

(el-get)
</pre>

<p>So now you have pretty good documentation of the packages you want
installed, where to get them, and how to install them. For the <em>advanced</em>
methods (such as <code>elpa</code> or <code>apt-get</code>), you basically just need the package
name. When relying on a bare <code>git</code> repository, you need to give some more
information, such as the <code>URL</code> to <em>clone</em> and the <code>build</code> steps if any. You can
also list the <em>features</em> to <code>require</code> and where to find the <em>texinfo</em> documentation
of the package, for automatic inclusion into your local <em>Info</em> menu.</p>

<p>The good news is that not only do you now have a solid, readable description
of all that in a central place, but this very description is all <code>(el-get)</code> needs
to do its magic. This command will check that each and every package is
installed on your system (in <code>el-get-dir</code>) and if that's not the case, it will
actually install it. Then, it will <code>init</code> the packages: that means taking care
of the <code>load-path</code>, the <code>Info-directory-list</code> (and <em>dir</em> texinfo menu
building), the <em>loading</em> of the <code>emacs-lisp</code> files, and finally it will <code>require</code>
the <em>features</em>.</p>

<p>Here's a prettified <code>ielm</code> session that will serve as a demo:</p>

<pre class="src">
ELISP&gt; (el-get)
(<span style="color: #bc8f8f;">"aspell-en"</span> <span style="color: #bc8f8f;">"aspell-fr"</span> <span style="color: #bc8f8f;">"muse"</span> <span style="color: #bc8f8f;">"dictionary"</span> <span style="color: #bc8f8f;">"htmlize"</span> <span style="color: #bc8f8f;">"bbdb"</span> <span style="color: #bc8f8f;">"google-maps"</span>
<span style="color: #bc8f8f;">"magit"</span> <span style="color: #bc8f8f;">"emms"</span> <span style="color: #bc8f8f;">"nxhtml"</span> <span style="color: #bc8f8f;">"vkill"</span> <span style="color: #bc8f8f;">"xcscope"</span> <span style="color: #bc8f8f;">"yasnippet"</span> <span style="color: #bc8f8f;">"asciidoc"</span>
<span style="color: #bc8f8f;">"auto-dictionary"</span> <span style="color: #bc8f8f;">"css-mode"</span> <span style="color: #bc8f8f;">"gist"</span> <span style="color: #bc8f8f;">"lua-mode"</span> <span style="color: #bc8f8f;">"lisppaste"</span>)
</pre>

<p>With all the packages already installed, it runs fast enough that I
won't bother measuring the run time, which seems to be somewhere around one
second.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 04 Aug 2010 22:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/04-el-get.html</guid>
</item>
<item>
  <title>el-get</title>
  <link>http://tapoueh.org/blog/2010/08/04-el-get.html</link>
  <description><![CDATA[<h1>el-get</h1>
<div class="date">Wednesday, August 04 2010, 22:30</div>
</div>
<div id="article">
<p>I've been using emacs for a long time, and a long time it took me to
consider learning <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/index.html">Emacs Lisp</a>. Before that, I didn't trust my level of
understanding enough to be comfortable in managing my setup efficiently.</p>

<p>One of the main problems of setting up <a href="http://www.gnu.org/software/emacs/">Emacs</a> is that not only do you tend to
accumulate so many tricks from <a href="http://www.emacswiki.org/">EmacsWiki</a> and <a href="http://planet.emacsen.org/">blog posts</a> that your <code>.emacs</code> has
to grow to a full <code>~/.emacs.d/</code> directory (starting at <code>~/.emacs.d/init.el</code>),
but you also end up depending on several <em>libraries</em> of code you're neither
authoring nor maintaining. Let's call them <em>packages</em>.</p>

<p>Some of them will typically be available on <a href="http://tromey.com/elpa/index.html">ELPA</a>, which allows you to
breathe and keep cool. But most of them, let's face it, are not there. Most
of the packages I use, I tend to get either from <a href="http://www.debian.org/">debian</a> (see
<a href="http://packages.debian.org/sid/apt-rdepends">apt-rdepends</a> to get the complete list of packages that depend on emacs;
unfortunately I can't find an online version of the tool to link to), or
from <code>ELPA</code>, or from their own <code>git</code> repository somewhere. Some of them I even
get directly from an <a href="http://www.splode.com/~friedman/software/emacs-lisp">obscure website</a> that's not maintained anymore, but always
there when you need it.</p>

<p>Of course, my emacs setup is managed in a private <code>git</code> repository. Some
people on <code>#emacs</code> are using <a href="http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html">git submodules</a> (or was it straight <em>import</em>) for
managing external repositories in there, but all I can say is that I frown
on this idea. I want an easy canonical list of packages I depend on to run
emacs, and I want this documentation to be usable as-is. Enter <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>!</p>

<p>As we're all damn lazy, here's a <em>visual</em> introduction to <code>el-get</code>:</p>

<pre class="src">
(setq el-get-sources
      '((<span style="color: #729fcf;">:name</span> bbdb
               <span style="color: #729fcf;">:type</span> git
               <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/barak/BBDB.git"</span>
               <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"./lisp"</span> <span style="color: #ad7fa8; font-style: italic;">"./bits"</span>)
               <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"texinfo"</span>
               <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>))

        (<span style="color: #729fcf;">:name</span> magit
               <span style="color: #729fcf;">:type</span> git
               <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://github.com/philjackson/magit.git"</span>
               <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"."</span>
               <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./autogen.sh"</span> <span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>))

        (<span style="color: #729fcf;">:name</span> vkill
               <span style="color: #729fcf;">:type</span> http
               <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://www.splode.com/~friedman/software/emacs-lisp/src/vkill.el"</span>
               <span style="color: #729fcf;">:features</span> vkill)

        (<span style="color: #729fcf;">:name</span> yasnippet
               <span style="color: #729fcf;">:type</span> git-svn
               <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://yasnippet.googlecode.com/svn/trunk/"</span>)

        (<span style="color: #729fcf;">:name</span> asciidoc         <span style="color: #729fcf;">:type</span> elpa)
        (<span style="color: #729fcf;">:name</span> dictionary-el    <span style="color: #729fcf;">:type</span> apt-get)
        (<span style="color: #729fcf;">:name</span> emacs-goodies-el <span style="color: #729fcf;">:type</span> apt-get)))

(el-get)
</pre>

<p>So now you have pretty good documentation of the packages you want
installed, where to get them, and how to install them. For the <em>advanced</em>
methods (such as <code>elpa</code> or <code>apt-get</code>), you basically just need the package
name. When relying on a bare <code>git</code> repository, you need to give some more
information, such as the <code>URL</code> to <em>clone</em> and the <code>build</code> steps if any. You can
also list the <em>features</em> to <code>require</code> and where to find the <em>texinfo</em> documentation
of the package, for automatic inclusion into your local <em>Info</em> menu.</p>

<p>The good news is that not only do you now have a solid, readable description
of all that in a central place, but this very description is all <code>(el-get)</code> needs
to do its magic. This command will check that each and every package is
installed on your system (in <code>el-get-dir</code>) and if that's not the case, it will
actually install it. Then, it will <code>init</code> the packages: that means taking care
of the <code>load-path</code>, the <code>Info-directory-list</code> (and <em>dir</em> texinfo menu
building), the <em>loading</em> of the <code>emacs-lisp</code> files, and finally it will <code>require</code>
the <em>features</em>.</p>
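
<p>To make that flow a bit more concrete, here's a minimal sketch of the
<em>check</em> step only. It's not the actual <code>el-get</code> code: the names
<code>my-sources</code>, <code>my-package-installed-p</code> and <code>my-missing-packages</code> are made up
for the example, and it assumes that an installed package is simply a
directory named after the package under a directory you pass in.</p>

<pre class="src">
;; Hypothetical sketch, not el-get's implementation: the package list is
;; plain data, so finding what still needs installing is a simple loop.
(defvar my-sources
  '((:name bbdb     :type git)
    (:name asciidoc :type elpa))
  "Example package descriptions, in the same plist style as above.")

(defun my-package-installed-p (name dir)
  "Return non-nil when a directory called NAME exists under DIR."
  (file-directory-p (expand-file-name (symbol-name name) dir)))

(defun my-missing-packages (sources dir)
  "Return the names in SOURCES that have no directory under DIR."
  (let (missing)
    (dolist (source sources (nreverse missing))
      (let ((name (plist-get source :name)))
        (unless (my-package-installed-p name dir)
          (push name missing))))))
</pre>

<p>Evaluating <code>(my-missing-packages my-sources "/some/dir/")</code> then gives you the
names that would still need an install step before the <code>init</code> phase.</p>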

<p>Here's a prettified <code>ielm</code> session that will serve as a demo:</p>

<pre class="src">
ELISP&gt; (el-get)
(<span style="color: #ad7fa8; font-style: italic;">"aspell-en"</span> <span style="color: #ad7fa8; font-style: italic;">"aspell-fr"</span> <span style="color: #ad7fa8; font-style: italic;">"muse"</span> <span style="color: #ad7fa8; font-style: italic;">"dictionary"</span> <span style="color: #ad7fa8; font-style: italic;">"htmlize"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps"</span>
<span style="color: #ad7fa8; font-style: italic;">"magit"</span> <span style="color: #ad7fa8; font-style: italic;">"emms"</span> <span style="color: #ad7fa8; font-style: italic;">"nxhtml"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet"</span> <span style="color: #ad7fa8; font-style: italic;">"asciidoc"</span>
<span style="color: #ad7fa8; font-style: italic;">"auto-dictionary"</span> <span style="color: #ad7fa8; font-style: italic;">"css-mode"</span> <span style="color: #ad7fa8; font-style: italic;">"gist"</span> <span style="color: #ad7fa8; font-style: italic;">"lua-mode"</span> <span style="color: #ad7fa8; font-style: italic;">"lisppaste"</span>)
</pre>

<p>With all the packages already installed, it runs fast enough that I
won't bother measuring the run time, which seems to be somewhere around one
second.</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 04 Aug 2010 22:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/04-el-get.html</guid>
</item>
<item>
  <title>Database Virtual Machines</title>
  <link>http://tapoueh.org/blog/2010/08/03-database-virtual-machines.html</link>
  <description><![CDATA[<h1>Database Virtual Machines</h1>
<div class="date">Tuesday, August 03 2010, 13:30</div>
</div>
<div id="article">
<p>Today I'm being told once again about <a href="http://www.sqlite.org/">SQLite</a> as embedded database
software. That one ain't a <em>database server</em> but a <em>software library</em> that you
can use straight from your main program. I have yet to use it, but it looks
like <a href="http://www.sqlite.org/lang.html">its SQL support</a> is good enough for simple things, and that covers
<em>loads</em> of things. I guess read-only cache and configuration storage would be
the obvious ones, because it seems that <a href="http://www.sqlite.org/whentouse.html">SQLite use cases</a> don't include
<a href="http://www.sqlite.org/lockingv3.html">mixed concurrency</a>, that is workloads with concurrent readers and writers.</p>

<p>The part that got my full attention is
<a href="http://www.sqlite.org/vdbe.html">The Virtual Database Engine of SQLite</a>, as this blog title would imply. It
seems to be the same idea as what <a href="http://monetdb.cwi.nl/">MonetDB</a> calls their
<a href="http://monetdb.cwi.nl/MonetDB/Documentation/MAL-Synopsis.html">MonetDB Assembly Language</a>, and I've been trying to summarize some ideas about
it in my <a href="http://tapoueh.org/char10.html#sec11">Next Generation PostgreSQL</a> article.</p>

<p>The main thing is how to further optimize <a href="http://www.postgresql.org/">PostgreSQL</a> given what we have. It
seems that among the major road blocks in the performance work is how we get
the data from disk and to the client. We're still spending so much time in
the <code>CPU</code> that the disk bandwidth is not always saturated, and that's a
problem. Further thoughts are in the <a href="http://tapoueh.org/char10.html#sec11">full length article</a>, but that's just about
a one page section now!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 03 Aug 2010 13:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/03-database-virtual-machines.html</guid>
</item>


<item>
  <title>Partitioning: relation size per “group”</title>
  <link>http://tapoueh.org/blog/2010/07/26-partitioning-relation-size-per-group.html</link>
  <description><![CDATA[<h1>Partitioning: relation size per “group”</h1>
<div class="date">Monday, July 26 2010, 17:00</div>
</div>
<div id="article">
<p>This time, we are trying to figure out where the bulk of the data is on
disk. The trick is that we're using <a href="http://www.postgresql.org/docs/current/static/ddl-partitioning.html">DDL partitioning</a>, but we want a “nice”
view of size per <em>partition set</em>. That means that if you have, for example, a
parent table <code>foo</code> with partitions <code>foo_201006</code> and <code>foo_201007</code>, you would want
to see a single category <code>foo</code> containing the accumulated size of all the
partitions underneath <code>foo</code>.</p>

<p>Here we go:</p>

<pre class="src">
select groupe, pg_size_pretty(sum(bytes)::bigint) as size, sum(bytes)
  from (
select relkind as k, nspname, relname, tablename, bytes,
         case when relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span> and relname ~ <span style="color: #ad7fa8; font-style: italic;">'[0-9]{6}$'</span>
              then substring(relname from 1 for length(relname)-7)

              when relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span> and  tablename ~ <span style="color: #ad7fa8; font-style: italic;">'[0-9]{6}$'</span>
              then substring(tablename from 1 for length(tablename)-7)

              else <span style="color: #ad7fa8; font-style: italic;">'core'</span>
          end as groupe
  from (
  select nspname, relname,
         case when relkind = <span style="color: #ad7fa8; font-style: italic;">'i'</span>
              then (select relname
                      from pg_index x
                           join pg_class xc on x.indrelid = xc.oid
                           join pg_namespace xn on xc.relnamespace = xn.oid
                     where x.indexrelid = c.oid
                    )
              else null
           end as tablename,
         pg_size_pretty(pg_relation_size(c.oid)) as relation,
         pg_total_relation_size(c.oid) as bytes,
         relkind
    from pg_class c join pg_namespace n on c.relnamespace = n.oid
   where c.relkind in (<span style="color: #ad7fa8; font-style: italic;">'r'</span>, <span style="color: #ad7fa8; font-style: italic;">'i'</span>)
         and nspname in (<span style="color: #ad7fa8; font-style: italic;">'public'</span>, <span style="color: #ad7fa8; font-style: italic;">'archive'</span>)
         and pg_total_relation_size(c.oid) &gt; 32 * 1024
order by 5 desc
       ) as s
       ) as t
group by 1
order by 3 desc;
</pre>

<p>Note that if you run only the inner query, dropping the outer <code>select</code> along
with those last two lines, you will get a detailed view of the <em>indexes</em> and
<em>tables</em> that take the most space on disk at your place.</p>

<p>Now, what about using <a href="http://www.postgresql.org/docs/8.4/static/functions-window.html">window functions</a> here so that we get a more
detailed view of historical changes on each partition? With the growth in
percentage over the previous partition, accumulated size per partition and
per year, yearly sum, you name it. Here's another one you might want to try,
ready for some tuning (schema name, table name, etc):</p>

<pre class="src">
WITH s AS (
  select relname,
         pg_relation_size(c.oid) as size,
         pg_total_relation_size(c.oid) as tsize,
         substring(substring(relname from <span style="color: #ad7fa8; font-style: italic;">'[0-9]{6}$'</span>) for 4)::bigint as year
    from pg_class c
         join pg_namespace n on n.oid = c.relnamespace
   where c.relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span>
     <span style="color: #888a85;">-- and n.nspname = 'public'
</span>     <span style="color: #888a85;">-- and c.relname ~ 'stats'
</span>     and substring(substring(relname from <span style="color: #ad7fa8; font-style: italic;">'[0-9]{6}$'</span>) for 4)::bigint &gt;= 2008
order by relname
),
     sy AS (
  select relname,
         size,
         tsize,
         year,
         (sum(size) over w_year)::bigint as ysize,
         (sum(size) over w_month)::bigint as cumul,
         (lag(size) over (order by relname))::bigint as previous
    from s
  window w_year  as (partition by year),
         w_month as (partition by year order by relname)
),
     syp AS (
  select relname,
         size,
         tsize,
         rank() over (partition by year order by size desc) as rank,
         case when ysize = 0 then ysize
              else round(size / ysize::numeric * 100, 2) end as yp,
         case when previous = 0 then previous
              else round((size / previous::numeric - 1.0) * 100, 2) end as evol,
         cumul,
         year,
         ysize
    from sy
)
  SELECT relname,
         pg_size_pretty(size) as size,
         pg_size_pretty(tsize) as "+indexes",
         evol, yp as "% annuel", rank,
         pg_size_pretty(cumul) as cumul, year,
         pg_size_pretty(ysize) as "yearly sum",
         pg_size_pretty((sum(size) over())::bigint) as total
    FROM syp
ORDER BY relname;
</pre>

<p>Hope you'll find it useful, I certainly do!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 26 Jul 2010 17:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/26-partitioning-relation-size-per-group.html</guid>
</item>
<item>
  <title>dim-switch-window.el: fixes</title>
  <link>http://tapoueh.org/blog/2010/07/blog/2010/07/26-dim-switch-windowel-fixes.html</link>
  <description><![CDATA[<p>Thanks to amazing readers of <a href="http://planet.emacsen.org/">planet emacsen</a>, two annoyances of
<a href="http://www.emacswiki.org/emacs/switch-window.el">switch-window.el</a> have already been fixed! The first is that handling of <code>C-g</code>
isn't exactly an option after all, and the other is that you want to avoid
the buffer creation in the simple cases (1 or 2 windows only), because it's
the usual case.</p>

<p>I've received code to handle the second case, which I mostly merged. Thanks a
lot guys, the new version is on <a href="http://www.emacswiki.org">emacswiki</a> already!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 26 Jul 2010 11:55:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/blog/2010/07/26-dim-switch-windowel-fixes.html</guid>
</item>
<item>
  <title>dim-switch-window.el: fixes</title>
  <link>http://tapoueh.org/blog/2010/07/26-dim-switch-windowel-fixes.html</link>
  <description><![CDATA[<h1>dim-switch-window.el: fixes</h1>
<div class="date">Monday, July 26 2010, 11:55</div>
</div>
<div id="article">
<p>Thanks to amazing readers of <a href="http://planet.emacsen.org/">planet emacsen</a>, two annoyances of
<a href="http://www.emacswiki.org/emacs/switch-window.el">switch-window.el</a> have already been fixed! The first is that handling of <code>C-g</code>
isn't exactly an option after all, and the other is that you want to avoid
the buffer creation in the simple cases (1 or 2 windows only), because it's
the usual case.</p>

<p>I've received code to handle the second case, which I mostly merged. Thanks a
lot guys, the new version is on <a href="http://www.emacswiki.org">emacswiki</a> already!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/switch-window.html">switch-window</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 26 Jul 2010 11:55:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/26-dim-switch-windowel-fixes.html</guid>
</item>
<item>
  <title>dim-switch-window.el</title>
  <link>http://tapoueh.org/blog/2010/07/blog/2010/07/25-dim-switch-windowel.html</link>
  <description><![CDATA[<p>So it's Sunday and I'm thinking I'll get into <code>el-get</code> sometime later. Now is
the time to present <code>dim-switch-window.el</code> which implements a <em>visual</em> <code>C-x o</code>. I
know of only one way to present a <em>visual effect</em>, and that's with a screenshot:</p>

<center>
<p><img src="../../../images//emacs-switch-window.png" alt=""></p>
</center>

<p>So as you can see, it's all about showing a <em>big</em> number in each window,
tweaking each window's name, and waiting until the user presses one of the
expected keys, or timing out and staying on the same window as before <code>C-x o</code>. When
there are only 1 or 2 windows displayed, though, the right thing happens and
you see no huge number (in the former case, nothing happens, in the latter,
focus moves to the other window).</p>

<p>The code for that can be found on <a href="http://www.emacswiki.org/">emacswiki</a> under the name
<a href="http://www.emacswiki.org/emacs/switch-window.el">switch-window.el</a>. Hope you'll find it useful!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 25 Jul 2010 13:25:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/blog/2010/07/25-dim-switch-windowel.html</guid>
</item>
<item>
  <title>dim-switch-window.el</title>
  <link>http://tapoueh.org/blog/2010/07/25-dim-switch-windowel.html</link>
  <description><![CDATA[<h1>dim-switch-window.el</h1>
<div class="date">Sunday, July 25 2010, 13:25</div>
</div>
<div id="article">
<p>So it's Sunday and I'm thinking I'll get into <code>el-get</code> sometime later. Now is
the time to present <code>dim-switch-window.el</code> which implements a <em>visual</em> <code>C-x o</code>. I
know of only one way to present a <em>visual effect</em>, and that's with a screenshot:</p>

<center>
<p><img src="../../../images//emacs-switch-window.png" alt=""></p>
</center>

<p>So as you can see, it's all about showing a <em>big</em> number in each window,
tweaking each window's name, and waiting until the user presses one of the
expected keys, or timing out and staying on the same window as before <code>C-x o</code>. When
there are only 1 or 2 windows displayed, though, the right thing happens and
you see no huge number (in the former case, nothing happens, in the latter,
focus moves to the other window).</p>
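
<p>To give a rough idea of the key handling part, here's a minimal sketch. It's
not the code of <code>switch-window.el</code>: it draws no big numbers at all and only
prompts in the echo area, and the name <code>my-switch-window</code> and the 5 second
default timeout are made up for the example.</p>

<pre class="src">
;; Hypothetical sketch, not the switch-window.el code: no big numbers are
;; drawn, the window is only picked from a digit key read with a timeout.
(defun my-switch-window (timeout)
  "Select a window of the current frame by number, waiting TIMEOUT seconds."
  (interactive (list 5))
  (let ((windows (window-list)))
    (if (&lt;= (length windows) 2)
        ;; 1 window: nothing changes; 2 windows: jump to the other one.
        (other-window 1)
      (let* ((key (read-event "Move to window: " nil timeout))
             ;; ?1 selects the first window of the list, ?2 the second...
             (index (and (integerp key) (- key ?1))))
        ;; On timeout KEY is nil, so we stay on the current window.
        (when (and index (&lt;= 0 index) (&lt; index (length windows)))
          (select-window (nth index windows)))))))
</pre>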

<p>The code for that can be found on <a href="http://www.emacswiki.org/">emacswiki</a> under the name
<a href="http://www.emacswiki.org/emacs/switch-window.el">switch-window.el</a>. Hope you'll find it useful!</p>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Sun, 25 Jul 2010 13:25:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/25-dim-switch-windowel.html</guid>
</item>
<item>
  <title>ClusterSSH gets dsh support</title>
  <link>http://tapoueh.org/blog/2010/07/blog/2010/07/23-clusterssh-gets-dsh-support.html</link>
  <description><![CDATA[<p>If you don't know about <a href="cssh.html">ClusterSSH</a>, it's a project that builds on <code>M-x term</code>
and <code>ssh</code> to offer a nice and simple way to open remote terminals. It's
available in <a href="http://tromey.com/elpa/index.html">ELPA</a> and developed in the <a href="http://github.com/dimitri/cssh">github cssh</a> repository.</p>

<p>The default binding is <code>C-=</code>, which asks for the name of the server
to connect to, in the <em>minibuffer</em>, with completion. The host list used for
the completion comes from <code>tramp</code> and is pretty complete, all the more if
you've set up <code>~/.ssh/config</code> with <code>HashKnownHosts no</code>.</p>

<p>So the usual way to use <code>cssh.el</code> would be to just open a single remote
connection at a time. But of course you can open as many as you like, and
you get them all in a mosaic of <code>term</code> buffers in your emacs frame, with an input
window at the bottom to control them all. There were two ways to get there:
either open all remote hosts whose name matches a given regexp, using
<code>C-M-=</code>, or go to <code>IBuffer</code>, mark there the existing remote <code>terms</code> you
want to control all at once, then use <code>C-=</code>.</p>

<p>Well I've just added another mode of operation by supporting <em>enhanced</em> <a href="http://www.netfort.gr.jp/~dancer/software/dsh.html.en">dsh</a>
group files. In such files, you're supposed to have a remote host name per
line and that's it. We've added support for <code>@group</code> lines
so that you can <em>include</em> another group easily. To use the facility,
either open your <code>~/.dsh/group</code> directory in <code>dired</code> and type <code>C-=</code>
when on the right line, or simply use the global <code>C-=</code> you
already know and love. Then, type <code>@</code> and complete to any existing group found
in your <code>cssh-dsh-path</code> (it defaults to the right places, so chances are you
will never have to edit this one). And that's it, <a href="http://www.gnu.org/software/emacs/">Emacs</a> will open one <code>term</code>
per remote host you have in the <code>dsh</code> group you just picked. With a <code>*cssh*</code>
controller window, too.</p>

<p>Coming next, how I solved my <code>init.el</code> dependencies burden thanks to <code>el-get</code>!</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 23 Jul 2010 22:20:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/blog/2010/07/23-clusterssh-gets-dsh-support.html</guid>
</item>
<item>
  <title>ClusterSSH gets dsh support</title>
  <link>http://tapoueh.org/blog/2010/07/23-clusterssh-gets-dsh-support.html</link>
  <description><![CDATA[<h1>ClusterSSH gets dsh support</h1>
<div class="date">Friday, July 23 2010, 22:20</div>
</div>
<div id="article">
<p>If you don't know about <a href="cssh.html">ClusterSSH</a>, it's a project that builds on <code>M-x term</code>
and <code>ssh</code> to offer a nice and simple way to open remote terminals. It's
available in <a href="http://tromey.com/elpa/index.html">ELPA</a> and developed in the <a href="http://github.com/dimitri/cssh">github cssh</a> repository.</p>

<p>The default binding is <code>C-=</code>, which asks for the name of the server
to connect to, in the <em>minibuffer</em>, with completion. The host list used for
the completion comes from <code>tramp</code> and is pretty complete, all the more if
you've set up <code>~/.ssh/config</code> with <code>HashKnownHosts no</code>.</p>

<p>So the usual way to use <code>cssh.el</code> would be to just open a single remote
connection at a time. But of course you can open as many as you like, and
you get them all in a mosaic of <code>term</code> buffers in your emacs frame, with an input
window at the bottom to control them all. There were two ways to get there:
either open all remote hosts whose name matches a given regexp, using
<code>C-M-=</code>, or go to <code>IBuffer</code>, mark there the existing remote <code>terms</code> you
want to control all at once, then use <code>C-=</code>.</p>

<p>Well I've just added another mode of operation by supporting <em>enhanced</em> <a href="http://www.netfort.gr.jp/~dancer/software/dsh.html.en">dsh</a>
group files. In such files, you're supposed to have a remote host name per
line and that's it. We've added support for <code>@group</code> lines
so that you can <em>include</em> another group easily. To use the facility,
either open your <code>~/.dsh/group</code> directory in <code>dired</code> and type <code>C-=</code>
when on the right line, or simply use the global <code>C-=</code> you
already know and love. Then, type <code>@</code> and complete to any existing group found
in your <code>cssh-dsh-path</code> (it defaults to the right places, so chances are you
will never have to edit this one). And that's it, <a href="http://www.gnu.org/software/emacs/">Emacs</a> will open one <code>term</code>
per remote host you have in the <code>dsh</code> group you just picked. With a <code>*cssh*</code>
controller window, too.</p>
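
<p>The <code>@group</code> include idea can be sketched in a few lines. This is not the
<code>cssh.el</code> implementation, only an illustration: the function name
<code>my-dsh-group-hosts</code> and its arguments are made up, and it assumes a group
file holds one host name or one <code>@group</code> reference per line and nothing else.</p>

<pre class="src">
;; Hypothetical sketch, not the cssh.el code.
(require 'subr-x)                       ; for string-trim

(defun my-dsh-group-hosts (group dir &amp;optional seen)
  "Return the list of hosts of the dsh GROUP file found in DIR.
A line starting with \"@\" includes the group named after the \"@\".
SEEN is the list of groups already visited, to avoid include loops."
  (let ((file (expand-file-name group dir))
        (hosts '()))
    (when (and (not (member group seen))
               (file-readable-p file))
      (with-temp-buffer
        (insert-file-contents file)
        (dolist (line (split-string (buffer-string) "\n" t))
          (let ((entry (string-trim line)))
            (cond ((string= entry "") nil)
                  ((string-prefix-p "@" entry)
                   ;; Recurse into the included group, remembering our own
                   ;; name so that two groups including each other stop.
                   (setq hosts
                         (append hosts
                                 (my-dsh-group-hosts (substring entry 1) dir
                                                     (cons group seen)))))
                  (t (setq hosts (append hosts (list entry)))))))))
    hosts))
</pre>

<p>With a group file named, say, <code>web</code>, you would call
<code>(my-dsh-group-hosts "web" "~/.dsh/group")</code> and get back the flat list of hosts
to open a <code>term</code> for.</p>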

<p>Coming next, how I solved my <code>init.el</code> dependencies burden thanks to <code>el-get</code>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Fri, 23 Jul 2010 22:20:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/23-clusterssh-gets-dsh-support.html</guid>
</item>
<item>
  <title>Emacs and PostgreSQL</title>
  <link>http://tapoueh.org/blog/2010/07/22-emacs-and-postgresql.html</link>
  <description><![CDATA[<h1>Emacs and PostgreSQL</h1>
<div class="date">Thursday, July 22 2010, 09:30</div>
</div>
<div id="article">
<p>Those are my two all-time favorite pieces of Open Source Software. Or <a href="http://www.gnu.org/philosophy/free-sw.html">Free Software</a>
in the <a href="http://www.gnu.org/">GNU</a> sense of the word, as both the <em>BSD</em> and the <em>GPL</em> are labeled free
there. Even if I prefer <a href="http://www.debian.org/social_contract">The Debian Free Software Guidelines</a> as a global
definition and the <a href="http://sam.zoy.org/wtfpl/">WTFPL</a> license. But that's a digression.</p>

<p>I think that <a href="http://www.gnu.org/software/emacs/">Emacs</a> and <a href="http://www.postgresql.org/">PostgreSQL</a> share a lot in common. I'd begin with
the documentation, whose quality is amazing for both projects. Then of
course the extensibility with <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/Preface.html#Preface">Emacs Lisp</a> on the one hand and
<a href="http://www.postgresql.org/docs/8.4/static/extend.html">catalog-driven operations</a> on the other hand. Whether you're extending Emacs
or PostgreSQL you'll find that it's pretty easy to tweak the system <em>while
it's running</em>. The other comparison points are less important, like the fact
that both systems get about the same uptime on my laptop (currently <em>13
days, 23 hours, 57 minutes, 10 seconds</em>).</p>

<p>So of course I'm using <em>Emacs</em> to edit <em>PostgreSQL</em> <code>.sql</code> files, including stored
procedures. And it so happens that <a href="http://archives.postgresql.org/pgsql-hackers/2010-07/msg01067.php">line numbering in plpgsql</a> is not as
straightforward as one would naively think, to the point that we'd like to
have better tool support there. So I've extended Emacs <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Minor-Modes.html">linum-mode minor mode</a>
to also display the line numbers as computed by PostgreSQL, and here's what
it looks like:</p>

<center>
<p><img src="../../../images//emacs-pgsql-line-numbers.png" alt=""></p>
</center>

<p>Now, here's also the source code, <a href="https://github.com/dimitri/pgsql-linum-format">pgsql-linum-format</a>. Hope you'll enjoy!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/plpgsql.html">plpgsql</a> <a href="../../../tags/pgsql-linum-format.html">pgsql-linum-format</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 22 Jul 2010 09:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/22-emacs-and-postgresql.html</guid>
</item>
<item>
  <title>Background writers</title>
  <link>http://tapoueh.org/blog/2010/07/19-background-writers.html</link>
  <description><![CDATA[<h1>Background writers</h1>
<div class="date">Monday, July 19 2010, 16:30</div>
</div>
<div id="article">
<p>There's currently a thread on <a href="http://archives.postgresql.org/pgsql-hackers/">hackers</a> about <a href="http://archives.postgresql.org/pgsql-hackers/2010-07/msg00493.php">bg worker: overview</a> and a series
of 6 patches. Thanks a lot <strong><em>Markus</em></strong>! This is all about generalizing a concept
already in use in the <em>autovacuum</em> process, where you have an independent
subsystem that requires an autonomous <em>daemon</em> running and able to start
its own <em>workers</em>.</p>

<p>I've been advocating generalizing this concept for a while already, in
order to have <em>postmaster</em> able to tell subsystems when to shut down,
start, reload, etc. Some external processes are only external because
there's no need to include them <em>by default</em> in the database engine, not
because there's no sense in having them in there.</p>

<p>So even if <strong><em>Markus</em></strong>' work is mainly about generalizing <em>autovacuum</em> so that he
has a <em>coordinator</em> to ask for helper backends to handle broadcasting of
<em>writesets</em> for <a href="http://postgres-r.org/">Postgres-R</a>, it still could be a very good first step towards
something more general. What I'd like to see the generalization handle are
things like <a href="http://wiki.postgresql.org/wiki/PGQ_Tutorial">PGQ</a>, or the <em>pgagent scheduler</em>. In some cases, <a href="http://pgbouncer.projects.postgresql.org/doc/usage.html">pgbouncer</a> too.</p>

<p>What we're missing there is an <em>API</em> for everybody to be able to extend
PostgreSQL with its own background processes and workers. What would such a
beast look like? I have some preliminary thoughts about this in my
<a href="char10.html#sec16">Next Generation PostgreSQL</a> article, but those are still early thoughts. The
main idea is to steal as much as sensible from
<a href="http://www.erlang.org/doc/man/supervisor.html">Erlang Generic Supervisor Behaviour</a>, and maybe up to its
<a href="http://www.erlang.org/doc/design_principles/fsm.html">Generic Finite State Machines</a> <em>behavior</em>. In the <em>Erlang</em> world, a <em>behavior</em> is a
generic process.</p>

<p>The <em>FSM</em> approach would allow for any user daemon to provide an initial state
and register functions that would do some processing then change the
state. My feeling is that if those functions are exposed at the SQL level,
then you can <em>talk</em> to the daemon from anywhere (the Erlang ideas include a
globally —cluster wide— unique name). Of course the goal would be to
provide an easy way for the <em>FSM</em> functions to have a backend connected to the
target database handle the work for it, or be able to connect itself. Then
we'd need something else here, a way to produce events based on the clock. I
guess relying on <code>SIGALRM</code> is a possibility.</p>
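
<p>The idea of a daemon with an initial state and registered functions can be sketched in a few lines of Python. This is only an illustration of the <em>FSM</em> pattern, not a proposed PostgreSQL API: the <code>Daemon</code> class, its <code>register</code> and <code>send</code> methods and the <code>idle</code>/<code>working</code> states are all hypothetical names.</p>

```python
class Daemon:
    """A toy finite state machine: one current state, handlers keyed by (state, event)."""

    def __init__(self, initial_state):
        self.state = initial_state
        self.handlers = {}

    def register(self, state, event, handler):
        """Register the function to run when `event` arrives while in `state`."""
        self.handlers[(state, event)] = handler

    def send(self, event, payload=None):
        """Run the matching handler; whatever it returns becomes the new state."""
        handler = self.handlers.get((self.state, event))
        if handler is None:
            raise ValueError("no handler for %r in state %r" % (event, self.state))
        self.state = handler(payload)
        return self.state


log = []


def start_batch(payload):
    log.append("processing batch %s" % payload)
    return "working"


def finish_batch(payload):
    log.append("batch done")
    return "idle"


ticker = Daemon("idle")
ticker.register("idle", "tick", start_batch)
ticker.register("working", "done", finish_batch)

ticker.send("tick", 1)   # idle, then working
ticker.send("done")      # working, then idle
print(ticker.state, log)
```

<p>In the setting described above, <code>send</code> is the part that would be exposed at the SQL level, and the <code>tick</code> event is the kind of thing a clock-based source would produce.</p>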

<p>I'm not sure about how yet, but I think getting back in consultancy after
having opened <a href="http://2ndQuadrant.com">2ndQuadrant</a> <a href="http://2ndQuadrant.fr">France</a> has some influence on how I think about all
that. My guess is that those blog posts are a first step on a nice journey!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 19 Jul 2010 16:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/19-background-writers.html</guid>
</item>
<item>
  <title>Logs analysis</title>
  <link>http://tapoueh.org/blog/2010/07/13-logs-analysis.html</link>
  <description><![CDATA[<h1>Logs analysis</h1>
<div class="date">Tuesday, July 13 2010, 14:15</div>
</div>
<div id="article">
<p>Nowadays to analyze logs and provide insights, the most common tool to use
is <a href="http://pgfouine.projects.postgresql.org/">pgfouine</a>, which does an excellent job. But there have been some
improvements in logging capabilities that we're not benefiting from yet, and
I'm thinking about the <code>CSV</code> log format.</p>

<p>So the idea would be to turn <em>pgfouine</em> into a set of <code>SQL</code> queries against the
logs themselves once imported into the database. Wait. What about having our
next PostgreSQL version, which is meant (I believe) to include CSV support
in <em>SQL/MED</em>, to directly expose its logs as a system view?</p>

<p>A good thing would be to expose that as a ddl-partitioned table following
the log rotation scheme as set up in <code>postgresql.conf</code>, or maybe given in some
sort of a setup, in order to support <code>logrotate</code> users. At least some
facilities to do that would be welcome, and I'm not sure plain <em>SQL/MED</em> is
up to that when it comes to <em>source</em> partitioning.</p>

<p>Then all that remains to be done is a set of <code>SQL</code> queries and some static or
dynamic application to derive reports from there.</p>
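
<p>To give a feel for the kind of report such queries would produce, here is a small Python sketch that reads CSV log lines and totals the run time per normalized statement, much like a slow-query report. It rests on assumptions: the four-column layout (time, database, severity, message) is a simplification of the real <code>csvlog</code> format, which has many more columns, the sample rows are made up, and the normalization (replacing literals with <code>?</code>) is deliberately naive.</p>

```python
import csv
import io
import re
from collections import defaultdict

# Made-up sample in a simplified 4-column layout: time, database, severity, message.
SAMPLE = '''\
2010-07-13 14:15:01 CEST,app,LOG,"duration: 12.500 ms  statement: SELECT * FROM places WHERE usps = 'CA'"
2010-07-13 14:15:02 CEST,app,LOG,"duration: 7.500 ms  statement: SELECT * FROM places WHERE usps = 'NY'"
2010-07-13 14:15:03 CEST,app,ERROR,"relation ""foo"" does not exist"
'''

DURATION = re.compile(r"duration: ([0-9.]+) ms\s+statement: (.*)", re.S)
LITERAL = re.compile(r"'[^']*'|\b\d+\b")  # quoted strings and bare numbers


def report(csv_text):
    """Map each normalized statement to [number of calls, total duration in ms]."""
    totals = defaultdict(lambda: [0, 0.0])
    for log_time, database, severity, message in csv.reader(io.StringIO(csv_text)):
        match = DURATION.match(message)
        if match is None:
            continue  # not a "duration: ... statement: ..." line
        statement = LITERAL.sub("?", match.group(2))
        totals[statement][0] += 1
        totals[statement][1] += float(match.group(1))
    return dict(totals)


for statement, (calls, total_ms) in report(SAMPLE).items():
    print(calls, total_ms, statement)
```

<p>With the logs available as a table, the same aggregation would be a plain <code>GROUP BY</code> on the normalized statement.</p>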

<p>This is yet again an idea I have in mind but don't currently have time to
explore myself, so I talk about it here in the hope that others will share
the interest. Of course, now that I work at <a href="http://2ndQuadrant.com">2ndQuadrant</a>, you can make it so
that we consider the idea in more detail, up to implementing and
contributing it!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 13 Jul 2010 14:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/13-logs-analysis.html</guid>
</item>
<item>
  <title>Using indexes as column store?</title>
  <link>http://tapoueh.org/blog/2010/07/08-using-indexes-as-column-store.html</link>
  <description><![CDATA[<h1>Using indexes as column store?</h1>
<div class="date">Thursday, July 08 2010, 11:15</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>There's a big trend nowadays about using column storage as opposed to what
PostgreSQL is doing, which would be row storage. The difference is that if
you have the same column value in a lot of rows, you could get to a point
where you have this value only once in the underlying storage file. That
means high compression. Then you tweak the <em>executor</em> to be able to load this
value only once, not once per row, and you win another huge source of data
traffic (often enough, from disk).</p>

<p>Well, it occurs to me that maybe we could have column oriented storage
support without adding any new storage facility into PostgreSQL itself, just
using in new ways what we already have now. Column oriented storage looks
somewhat like an index, where any given value is meant to appear only
once. And you have <em>links</em> to know where to find the full row associated in
the main storage.</p>
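
<p>The analogy between a column store and an index can be shown with a toy Python sketch: each distinct value of a column is kept once, along with <em>links</em> (here, plain row positions) back to the full rows. The <code>column_index</code> function and the sample rows are made up for the illustration and say nothing about how PostgreSQL lays out its indexes on disk.</p>

```python
from collections import defaultdict


def column_index(rows, column):
    """Store each distinct value of `column` once, with the positions of its rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return dict(index)


# Made-up rows standing in for the main (row oriented) storage.
rows = [
    {"usps": "CA", "loc_name": "Los Angeles"},
    {"usps": "CA", "loc_name": "San Diego"},
    {"usps": "NY", "loc_name": "Albany"},
    {"usps": "CA", "loc_name": "Fresno"},
]

by_state = column_index(rows, "usps")

# Counting rows per value only reads the index, once per value, never the rows.
counts = {value: len(positions) for value, positions in by_state.items()}
print(by_state, counts)
```

<p>The value <code>CA</code> appears three times in the rows but only once in <code>by_state</code>, which is where the compression and the reduced data traffic would come from.</p>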

<p>There's work in progress to allow PostgreSQL to use indexes on their
own, without having to get to the main storage for checking the
visibility. That's known as the <a href="http://www.postgresql.org/docs/8.4/static/storage-vm.html">Visibility Map</a>, which is still only a hint
in released versions. The goal is to turn that into a crash-safe trustworthy
source in the future, so that we get <em>covering indexes</em>. That means we could use
an index alone and skip going to the full row in main storage to get the
visibility information.</p>

<p>Now, once we have that, we could consider using the indexes in more
queries. It could be a win to get the column values from the index when
possible and if you don't <em>output</em> more columns from the <em>heap</em>, return the
values from there. Scanning the index only once per value, not once per row.</p>

<p>There's a little more thought on this point in the <a href="char10.html#sec10">Next Generation PostgreSQL</a>
article I've been referencing already, should you be interested.</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 08 Jul 2010 11:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/08-using-indexes-as-column-store.html</guid>
</item>
<item>
  <title>MVCC in the Cloud</title>
  <link>http://tapoueh.org/blog/2010/07/06-mvcc-in-the-cloud.html</link>
  <description><![CDATA[<h1>MVCC in the Cloud</h1>
<div class="date">Tuesday, July 06 2010, 10:50</div>
</div>
<div id="article">
<p>At <a href="http://char10.org/">CHAR(10)</a> <strong><em>Markus</em></strong> had a talk about
<a href="http://char10.org/talk-schedule-details#talk13">Using MVCC for Clustered Database Systems</a> and explained how <a href="http://postgres-r.org/">Postgres-R</a> does
it. The scope of his project is to maintain a set of database servers in the
same state, eventually.</p>

<p>Now, what does it mean to get &quot;In the Cloud&quot;? Well there is more than one
answer I'm sure, mine would insist on including this &quot;Elasticity&quot; bit. What
I mean here is that it'd be great to be able to add or lose nodes and stay
<em>online</em>. Granted, that's what <em>Postgres-R</em> is providing. Does that make it
ready for the &quot;Cloud&quot;? Well, as it happens, I don't think so.</p>

<p>Once you have elasticity, you also want <em>scalability</em>. That could mean lots of
things, and <em>Postgres-R</em> already provides a great deal of it, at the connect
and reads level: you can do your business <em>unlimited</em> on any node, the others
will eventually (<em>eagerly</em>) catch-up, and you can do your <code>select</code> on any node
too, reading from the same data set. Eventually.</p>

<p>What's still missing here is the hard sell, <em>write scalability</em>. This is the
idea that you don't want to sustain the same <em>write load</em> on all the members
of the &quot;Cloud cluster&quot;. It happens that I have some ideas about how to go about
this, and this time I've been trying to write them down. You might be
interested in the <a href="http://tapoueh.org/char10.html#sec3">MVCC in the Cloud</a> part of my <a href="http://tapoueh.org/char10.html">Next Generation PostgreSQL</a>
notes.</p>

<p>My opinion is that if you want to distribute the data, this is a problem
that falls in the category of finding the data on disk. This problem is
already solved in the executor, it knows which operating system level file
to open and where to seek inside that in order to find a row value for a
given relation. So it should be possible to teach it that some relation's
storage isn't local, and that to get the data it needs to talk to another
PostgreSQL instance.</p>

<p>I would call that a <em>remote tablespace</em>. It allows for distributing both the
data and their processing, which could happen in parallel. Of course that
means there are now some latency concerns, and that some <em>JOIN</em>s will get slow if
you need to retrieve the data from the network each time. For that what I'm
thinking about is the possibility to manage a local copy of a remote
tablespace, which would be a <em>mirror tablespace</em>. But that's for another blog
post.</p>

<p>Oh, if that makes you think a lot of <a href="http://wiki.postgresql.org/wiki/SQL/MED">SQL/MED</a>, that would mean I did a good
enough job at explaining the idea. The main difference though would be to
ensure transaction boundaries over the local and remote data: it's one
single distributed database we're talking about here.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Jul 2010 10:50:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/06-mvcc-in-the-cloud.html</guid>
</item>
<item>
  <title>Back from CHAR(10)</title>
  <link>http://tapoueh.org/blog/2010/07/05-back-from-char10.html</link>
  <description><![CDATA[<h1>Back from CHAR(10)</h1>
<div class="date">Monday, July 05 2010, 09:30</div>
</div>
<div id="article">
<p>It surely does not feel like a full month and some more went by since we
were enjoying <a href="http://www.pgcon.org/2010/">PGCon 2010</a>, but in fact it was already the time for
<a href="http://char10.org/talk-schedule">CHAR(10)</a>. The venue was most excellent, as Oxford is a very beautiful
city. Also, the college was like a city in the city, and having the
accommodation all in there really smoothed it all.</p>

<p>From a more technical viewpoint, the <a href="http://char10.org/talk-schedule">range of topics</a> we talked about and the
even broader one in the <em>&quot;Hall Track&quot;</em> make my mind full of ideas, again. So
I'm preparing a quite lengthy article to summarise or present all those
ideas, and I think a post series should cover the points in there. When
trying to label things, it appears that my current obsessions are mainly
about <em>PostgreSQL in the Cloud</em> and <em>Further Optimising PostgreSQL</em>, so that's
what I'll be talking about in the coming days.</p>

<p>Meanwhile I'm going to search for existing solutions on how to use the
<a href="http://en.wikipedia.org/wiki/Paxos_algorithm">Paxos algorithm</a> to generate a reliable distributed sequence, using <a href="http://libpaxos.sourceforge.net/">libpaxos</a>
for example. The goal would be to see if it's feasible to have a way to
offer some global <code>XID</code> from a network of servers in a distributed fashion,
ideally in such a way that new members can join in at any point, and of
course that losing a member does not cause downtime for the online ones. It
sounds like this problem has been extensively researched and is solved,
either by the <em>Global Communication Systems</em> or the underlying
algorithms. Given our community's current lack of buy-in for <code>GCS</code>, my guess
is that bypassing them would be a pretty good move, even if that means
implementing a limited form of <code>GCS</code> ourselves.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/pgcon.html">pgcon</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 05 Jul 2010 09:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/07/05-back-from-char10.html</guid>
</item>






<item>
  <title>Back from PgCon2010</title>
  <link>http://tapoueh.org/blog/2010/05/27-back-from-pgcon2010.html</link>
  <description><![CDATA[<h1>Back from PgCon2010</h1>
<div class="date">Thursday, May 27 2010, 14:26</div>
</div>
<div id="article">
<p>This year's edition has been the <a href="http://www.pgcon.org/2010/">best pgcon</a> ever for me. Granted, it's only
my third time, but still :) As <a href="http://blog.endpoint.com/2010/05/pgcon-hall-track.html">Josh said</a> the <em>&quot;Hall Track&quot;</em> in particular was
very good, and the <a href="http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting">Dev Meeting</a> has been very effective!</p>

<h3>Extensions</h3>

<p class="first">This time I prepared some <a href="http://wiki.postgresql.org/wiki/Image:Pgcon2010-dev-extensions.pdf">slides to present the extension design</a> and I tried
hard to make it so that we get to agree on a plan, even recognizing it's not
solving all of our problems from the get go. I had been talking about the
concept and design with lots of people already, and continued to do so while in
Ottawa on Monday evening and through all Tuesday. So Wednesday, I felt
prepared. It proved to be a good thing, as I edited the slides with ideas from
several people I had the chance to expose my ideas to! Thanks <em>Greg Stark</em> and
<em>Heikki Linnakangas</em> for the part we talked about at the meeting, and a lot more
people for the things we'll have to solve later (Hi <em>Stefan</em>!).</p>

<p>So the current idea for <strong>extensions</strong> is for the <em>backend</em> support to start with a
file in <code>`pg_config --sharedir`/extensions/foo/control</code> containing
the <em>foo</em> extension's <em>metadata</em>. From that we know if we can install an extension
and how. Here's an example:</p>

<pre class="src">
name = foo
version = 1.0
custom_variable_classes = 'foo'
depends  = bar (&gt;= 1.1), baz
conflicts = bla (&lt; 0.8)
</pre>
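
<p>To show how little is needed to read such a <em>control</em> file, here is a hedged Python sketch that parses the example above into a dictionary and splits the <code>depends</code> and <code>conflicts</code> entries into (name, operator, version) triples. The format was only a proposal at the time, so the parsing rules here (one <code>key = value</code> per line, comma separated relations, optional version constraint in parentheses) are assumptions read off the example, and <code>parse_control</code> is a made-up name.</p>

```python
import re

CONTROL = """\
name = foo
version = 1.0
custom_variable_classes = 'foo'
depends  = bar (>= 1.1), baz
conflicts = bla (< 0.8)
"""

# "bar (>= 1.1)" gives ("bar", ">=", "1.1"); a bare "baz" gives ("baz", None, None)
RELATION = re.compile(r"^(\w+)(?:\s*\(\s*([<>=]+)\s*([\w.]+)\s*\))?$")


def parse_control(text):
    """Parse key = value lines, then split the dependency style entries."""
    meta = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        meta[key.strip()] = value.strip().strip("'")

    for key in ("depends", "conflicts"):
        relations = []
        for item in meta.get(key, "").split(","):
            match = RELATION.match(item.strip())
            if match:
                relations.append(match.groups())
        meta[key] = relations
    return meta


meta = parse_control(CONTROL)
print(meta["name"], meta["version"], meta["depends"], meta["conflicts"])
```

<p>From such a structure, deciding whether the extension can be installed comes down to checking each triple against the extensions already present.</p>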

<p>The other files should be <code>install.sql</code>, <code>uninstall.sql</code> and <code>foo.conf</code>. The only
command the user will have to type in order to use the extension in their
database will then be:</p>

<pre class="src">
  INSTALL EXTENSION foo;
</pre>

<p>For that to work all that needs to happen is for me to write the code. I'll
keep you informed as soon as I get a chance to resume my activities on the
<a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">git branch</a> I'm using. You can already find my first attempt at a
<code>pg_execute_from_file()</code> function <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=6eed4eca0179cbdeb737b9783084e9f03fcb7470">there</a>.</p>

<p>Building atop that backend support we already have two gentlemen competing on
features to offer to <a href="http://justatheory.com/computers/databases/postgresql/pgan-bikeshedding.html">distribute</a> and <a href="http://petereisentraut.blogspot.com/2010/05/postgresql-package-management.html">package</a> extensions! That will complete the
work just fine, thanks guys.</p>


<h3>Hot Standby</h3>

<p class="first">Heikki's talk about <a href="http://www.pgcon.org/2010/schedule/events/264.en.html">Built-in replication in PostgreSQL 9.0</a> left me with lots of
thinking. In particular it seems we need two projects out of core to complete
what <code>9.0</code> has to offer, namely something very simple to prepare a base backup
and something more involved to manage a pool of standbys.</p>

<h4>pg_basebackup</h4>

<p class="first">The idea I had listening to the talk was that it might be possible to ask the
server, in a single SQL query, for the list of all the files it's using. After
all, there are those <code>pg_ls_dir()</code> and <code>pg_read_file()</code> functions, we could put
them to good use. I couldn't get the idea out of my head, so I had to write
some code and see it running: <a href="http://github.com/dimitri/pg_basebackup">pg_basebackup</a> is there at <code>github</code>, grab a copy!</p>

<p>What it does is very simple: in about 100 lines of self-contained Python code
it gets all the files from a running server through a normal PostgreSQL
connection. That was my first <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">recursive query</a>. I had to create a new function
to get the file contents as the existing one returns text, and I want <code>bytea</code>
here, of course.</p>

<p>Note that the code depends on the <code>bytea</code> representation in use, so it's only
working with <code>9.0</code> as of now. Can be changed easily though, send a patch or just
ask me to do it!</p>

<p>Lastly, note that even if <code>pg_basebackup</code> will compress each chunk it sends over
the <code>libpq</code> connection, it won't be your fastest option around. Its only
advantage there is its simplicity. Get the code, run it with 2 arguments: a
connection string and a destination directory. There you are.</p>


<h4>wal proxy, wal relay</h4>

<p class="first">The other thing that we'll miss in <code>9.0</code> is the ability to both manage more than
a couple of <em>standby</em> servers and to manage failover gracefully. Here the idea
would be to have a proxy server acting as both a <em>walreceiver</em> and a
<em>walsender</em>. Its role would be to both <em>archive</em> the WAL and <em>relay</em> them to the real
standbys.</p>

<p>Then in case of master's failure, we could instruct this <em>proxy</em> to be fed from
the elected new master (manual procedure), the other standbys not being
affected. Well, apart from the fact that changing the <em>timeline</em> (which will happen
as soon as you promote a standby to master) while streaming is apparently not meant to be
supported. So the <em>proxy</em> would also disconnect all the <em>slaves</em> and have them
reconnect.</p>

<p>If we need such finesse, we could have the <code>restore_command</code> on the <em>standbys</em>
prepared so that it'll connect to the <em>proxy's archive</em>. Now on failover, the
<em>standbys</em> are disconnected from the stream, get a <code>WAL</code> file with a new <em>timeline</em>
from the <em>archive</em>, replay it, and reconnect.</p>

<p>That means that for a full <code>HA</code> scenario you could get on with three
servers. You're back to two servers at failover time and need to rebuild the
crashed master as a standby, running a base backup again.</p>

<p>If you've followed the idea, I hope you liked it! I still have to motivate some
volunteers so that some work gets done here, as I'm probably not the one to ask
as far as coding this is concerned, if you want it out before <code>9.1</code> kicks in!</p>



<h3>Queuing</h3>

<p class="first">We also had a nice <em>Hall Track</em> session with <em>Jan Wieck</em>, <em>Marko Kreen</em> and <em>Jim Nasby</em>
about how to get a single general (enough) queueing solution for PostgreSQL. It
happens that the Slony queueing ideas made their way into <code>PGQ</code> and that we'd
want to add some more capabilities to this one.</p>

<p>What we talked about was adding more interfaces (event producers, event format
translating at both ends of the pipe) and optimising how many events from the
past we keep in the queue for the subscribers, in a cascading environment.</p>

<p>It seems that the basic architecture of the queue is what <code>PGQ 3</code> provides
already, so it could even be not that much of a hassle to get something working
out of the ideas exchanged.</p>

<p>Of course, one of those ideas has been discussed at the <a href="http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting">Dev Meeting</a>, it's about
deriving the transaction commit order from the place which already has the
information rather than <em>reconstructing</em> it after the fact. We'll see how it
goes, but it started pretty well with a design mail thread.</p>


<h3>Other talks</h3>

<p class="first">I went to some other talks too, of course, unfortunately with an attention span
far from constant. Between the social events (you should read that as <em>beer
drinking evenings</em>) and the hall tracks, more than once my brain was less
present than my body in the talks. I won't risk commenting on them here, but
overall it was very good: in just about every talk, new ideas popped into my
head. And I love that.</p>


<h3>Conclusion: I'm addicted.</h3>

<p class="first">The social aspect of the conference has been very good too. Once more, a warm
welcome from the people that are central to the project, and who are so easily
available for a chat about any aspect of it! Or just for sharing a drink.</p>

<p>Meeting our users is very important too, and <a href="http://www.pgcon.org/2010/">pgcon</a> allows for that also. I've
met some people I'm used to talking to via <code>IRC</code>, and it was good fun sharing a beer
over there.</p>

<p>All in all, I'm very happy I made it to Ottawa despite the volcano activity,
there's so much happening over there! Thanks to all the people who made it
possible by either organizing the conference or attending it! See you next
year, I'm addicted...</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 27 May 2010 14:26:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/05/27-back-from-pgcon2010.html</guid>
</item>


<item>
  <title>Import fixed width data with pgloader</title>
  <link>http://tapoueh.org/blog/2010/04/27-import-fixed-width-data-with-pgloader.html</link>
  <description><![CDATA[<h1>Import fixed width data with pgloader</h1>
<div class="date">Tuesday, April 27 2010, 12:01</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>So, following previous blog entries about importing <em>fixed width</em> data, from
<a href="http://www.postgresonline.com/journal/index.php?/archives/157-Import-fixed-width-data-into-PostgreSQL-with-just-PSQL.html">Postgres Online Journal</a> and <a href="http://people.planetpostgresql.org/dfetter/index.php?/archives/58-psql&#44;-Paste&#44;-Perl-Pefficiency&#33;.html">David (perl) Fetter</a>, I couldn't resist following
the meme and showing how to achieve the same thing with <a href="http://pgloader.projects.postgresql.org/#toc9">pgloader</a>.</p>

<p>I can't say how much I dislike such things as the following, and I can't
help thinking that non-IT people are right to look at us the way they do when
encountering such prose.</p>

<pre class="src">
  map {s<span style="color: #ad7fa8; font-style: italic;">/\D*(\d+)-(\d+).*/$a.="A".(1+$2-$1). " "/</span>e} split(<span style="color: #ad7fa8; font-style: italic;">/\n/</span>,&lt;&lt;<span style="color: #ad7fa8; font-style: italic;">'EOT'</span>);
</pre>

<p>So, the <em>pgloader</em> way. First you need to have set up a database; I called it
<code>pgloader</code> here. Then you need the same <code>CREATE TABLE</code> as in the original
article; here it is for completeness:</p>

<pre class="src">
CREATE TABLE places(usps char(2) NOT NULL,
    fips char(2) NOT NULL,
    fips_code char(5),
    loc_name varchar(64));
</pre>

<p>Now the data file, which I took from here:
<a href="http://www.census.gov/tiger/tms/gazetteer/places2k.txt">http://www.census.gov/tiger/tms/gazetteer/places2k.txt</a>.</p>

<p>Then we translate the file description into <em>pgloader</em> setup:</p>

<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">host</span> = localhost
<span style="color: #eeeeec;">port</span> = 5432
<span style="color: #eeeeec;">base</span> = pgloader
<span style="color: #eeeeec;">user</span> = dim
<span style="color: #eeeeec;">pass</span> = None

<span style="color: #eeeeec;">log_file</span>            = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span>    = DEBUG
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'latin1'</span>
<span style="color: #eeeeec;">lc_messages</span>         = C
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on

[<span style="color: #8ae234; font-weight: bold;">fixed</span>]
<span style="color: #eeeeec;">table</span>           = places
<span style="color: #eeeeec;">format</span>          = fixed
<span style="color: #eeeeec;">filename</span>        = places2k.txt
<span style="color: #eeeeec;">columns</span>         = *
<span style="color: #eeeeec;">fixed_specs</span>     = usps:0:2, fips:2:2, fips_code:4:5, loc_name:9:64, p:73:9, h:82:9, land:91:14, water:105:14, ldm:119:12, wtm:131:12, lat:143:10, long:153:11
</pre>
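
<p>To make the <code>name:start:length</code> triplets of <code>fixed_specs</code> concrete, here's
roughly what the first four of them mean when spelled out in plain SQL. This is
only a sketch: <code>places_raw</code> is a made-up one-column staging table holding one
raw line of the file per row, and I'm assuming the offsets start at 0 (judging
by <code>usps:0:2</code>) whereas <code>substr()</code> counts from 1.</p>

<pre class="src">
-- Assumption: places_raw(line text) is a hypothetical staging table,
-- it is not something pgloader creates for you.
INSERT INTO places (usps, fips, fips_code, loc_name)
SELECT substr(line,  1,  2),   -- usps:0:2
       substr(line,  3,  2),   -- fips:2:2
       substr(line,  5,  5),   -- fips_code:4:5
       substr(line, 10, 64)    -- loc_name:9:64
  FROM places_raw;
</pre>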

<p>We're ready to import the data now:</p>

<pre class="src">
dim ~/PostgreSQL/examples pgloader -vsTc pgloader.conf
pgloader     INFO     Logger initialized
pgloader     WARNING  path entry '/usr/share/python-support/pgloader/reformat' does not exists, ignored
pgloader     INFO     Reformat path is []
pgloader     INFO     Will consider following sections:
pgloader     INFO       fixed
pgloader     INFO     Will load 1 section at a time
fixed        INFO     columns = *, got [('usps', 1), ('fips', 2), ('fips_code', 3), ('loc_name', 4)]
fixed        INFO     Loading threads: 1
fixed        INFO     closing current database connection
fixed        INFO     fixed processing
fixed        INFO     TRUNCATE TABLE places;
pgloader     INFO     All threads are started, wait for them to terminate
fixed        INFO     COPY 1: 10000 rows copied in 5.769s
fixed        INFO     COPY 2: 10000 rows copied in 5.904s
fixed        INFO     COPY 3: 5375 rows copied in 3.187s
fixed        INFO     No data were rejected
fixed        INFO      25375 rows copied in 3 commits took 14.907 seconds
fixed        INFO     No database error occured
fixed        INFO     closing current database connection
fixed        INFO     releasing fixed semaphore
fixed        INFO     Announce it's over

Table name        |    duration |    size |  copy rows |     errors
====================================================================
fixed             |     14.901s |       - |      25375 |          0
</pre>

<p>Note the <code>-T</code> option is for <code>TRUNCATE</code>, which you only need when you want to
redo the loading; I've come to always mention it in interactive usage. The
<code>-v</code> option is for some more <em>verbosity</em> and the <code>-s</code> for the <em>summary</em> at the end of
operations.</p>

<p>With the <code>pgloader.conf</code> and <code>places2k.txt</code> in the current directory, and an
empty table, just typing in <code>pgloader</code> at the prompt would have done the job.</p>

<p>Oh, the <code>pg_option_standard_conforming_strings</code> bit is from the <a href="http://github.com/dimitri/pgloader">git HEAD</a>, the
current released version has no support for setting any PostgreSQL knob
yet. Still, it's not necessary here, so you can forget about it.</p>

<p>You will also notice that <em>pgloader</em> didn't trim the data for you, which ain't
funny for the <code>loc_name</code> column. That's a drawback of the fixed width format
that you can work around in two ways here, either by means of an <code>UPDATE</code>:</p>

<pre class="src">
UPDATE places SET loc_name = trim(loc_name);
</pre>

<p>or by means of a custom reformat module for <em>pgloader</em>. I guess the latter solution is overkill, but
it allows for <em>pipe</em> style processing of the data and a single database write.</p>

<p>Send me a mail if you want me to show here how to set up such a reformatting
module in a future blog entry!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/pgloader.html">pgloader</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 27 Apr 2010 12:01:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/04/27-import-fixed-width-data-with-pgloader.html</guid>
</item>
<item>
  <title>pgloader activity report</title>
  <link>http://tapoueh.org/blog/2010/04/06-pgloader-activity-report.html</link>
  <description><![CDATA[<h1>pgloader activity report</h1>
<div class="date">Tuesday, April 06 2010, 09:10</div>
<div id="article">
<p>Yes. This <a href="http://pgloader.projects.postgresql.org/">pgloader</a> project is still maintained and somewhat
active. Development happens when I receive a complaint, either about a bug
in existing code or a feature in yet-to-write code. If you have a bug to
report, just send me an email!</p>

<p>If you're following the development of it, the sources just moved from <code>CVS</code>
at <a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgloader/pgloader/">pgfoundry</a> to <a href="http://github.com/dimitri/pgloader">http://github.com/dimitri/pgloader</a>. I will still put the
releases at <a href="http://pgfoundry.org/projects/pgloader">pgfoundry</a>, and the existing binary packages maintenance should
continue. See also the <a href="http://pgloader.projects.postgresql.org/dev/pgloader.1.html">development version documentation</a>, which contains not
yet released stuff.</p>

<p>This time it's about new features, the goal being to allow using <em>pgloader</em>
without describing all the file format related details in the
<code>pgloader.conf</code> file. This time around, <a href="http://database-explorer.blogspot.com/">Simon</a> is giving feedback and told me
he would appreciate it if pgloader worked more like the competition.</p>

<p>We're getting there with some new options. The first one is that rather than
only <code>Sections</code>, you can now give a <code>filename</code> as an argument. <em>pgloader</em> will
then create a configuration section for you, considering the file format to
be <code>CSV</code>, setting <code>columns = *</code>. The default <em>field separator</em> is <code>|</code>,
so you also have the <code>-f, --field-separator</code> option to set that from the
command line.</p>

<p>As if that wasn't enough, <em>pgloader</em> now supports any <a href="http://www.postgresql.org/">PostgreSQL</a> option either
in the configuration file (prefix the real name with <code>pg_option_</code>) or on the
command line, via the <code>-o, --pg-options</code> switch, that you can use more than
once. Command line setting will take precedence over any other setup, of
course. Consider for example <code>-o standard_conforming_strings=on</code>.</p>
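
<p>To see what such a setting changes, here's a small sketch you can try in
<code>psql</code>. It assumes (this is not spelled out above) that the option ends up
applied to the loading session much like a plain <code>SET</code> would be.</p>

<pre class="src">
SET standard_conforming_strings = on;

-- With the setting on, a backslash in a plain literal is just a backslash.
SELECT 'C:\temp' AS plain_literal;

-- E'' strings still process escapes, here \t is a tab character.
SELECT E'tab\there' AS escape_literal;
</pre>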

<p>While at it, some more options can now be set on the command line, including
<code>-t, --section-threads</code> and <code>-m, --max-parallel-sections</code> on the one hand and
<code>-r, --reject-log</code> and <code>-j, --reject-data</code> on the other hand. Those last two
must contain a <code>%s</code> placeholder which will get replaced by the <em>section</em> name,
or the <code>filename</code> if you skipped setting up a <em>section</em> for it.</p>

<p>Your <em>pgloader</em> usage is now more command line friendly than ever!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/pgloader.html">pgloader</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Apr 2010 09:10:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/04/06-pgloader-activity-report.html</guid>
</item>






<item>
  <title>Finding orphaned sequences</title>
  <link>http://tapoueh.org/blog/2010/03/17-finding-orphaned-sequences.html</link>
  <description><![CDATA[<h1>Finding orphaned sequences</h1>
<div class="date">Wednesday, March 17 2010, 13:35</div>
<div id="article">
<p>This time we're having a database where <em>sequences</em> were used, but not
systematically as a <em>default value</em> of a given column. It's mainly a historic
bad idea, but you know the usual excuse with bad ideas and bad code: the
first 6 months it's experimental, after that it's historic.</p>

<p>Still, here's a query for <code>8.4</code> that will allow you to list those <em>sequences</em>
you have that are not used as a default value in any of your tables:</p>

<pre class="src">
WITH seqs AS (
  SELECT n.nspname, relname as seqname
    FROM pg_class c
         JOIN pg_namespace n on n.oid = c.relnamespace
   WHERE relkind = <span style="color: #ad7fa8; font-style: italic;">'S'</span>
),
     attached_seqs AS (
  SELECT n.nspname,
         c.relname as tablename,
         (regexp_matches(pg_get_expr(d.adbin, d.adrelid),
                         <span style="color: #ad7fa8; font-style: italic;">'''([^'']+)'''</span>))[1] as seqname
    FROM pg_class c
         JOIN pg_namespace n on n.oid = c.relnamespace
         JOIN pg_attribute a on a.attrelid = c.oid
         JOIN pg_attrdef d on d.adrelid = a.attrelid
                            and d.adnum = a.attnum
                            and a.atthasdef
  WHERE relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span> and a.attnum &gt; 0
        and pg_get_expr(d.adbin, d.adrelid) ~ <span style="color: #ad7fa8; font-style: italic;">'^nextval'</span>
)

 SELECT nspname, seqname, tablename
   FROM seqs s
        LEFT JOIN attached_seqs a USING(nspname, seqname)
  WHERE a.tablename IS NULL;
</pre>
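
<p>To try the query, a throwaway setup along these lines should do. The names
<code>orphan_seq</code> and <code>t</code> are made up for the example, and it assumes the objects
live in a schema that is on your <code>search_path</code>, so that the default expression
shows the bare sequence name.</p>

<pre class="src">
-- hypothetical objects, for illustration only
CREATE SEQUENCE orphan_seq;              -- never used as a default value
CREATE TABLE t (id serial PRIMARY KEY);  -- creates t_id_seq and uses it as default

-- Running the query above should now report orphan_seq (with an empty
-- tablename) and should not report t_id_seq.

DROP TABLE t;
DROP SEQUENCE orphan_seq;
</pre>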

<p>I hope you don't need the query...</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/catalogs.html">catalogs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 17 Mar 2010 13:35:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/03/17-finding-orphaned-sequences.html</guid>
</item>
<item>
  <title>Emacs Muse hacking</title>
  <link>http://tapoueh.org/blog/2010/03/04-emacs-muse-hacking.html</link>
  <description><![CDATA[<h1>Emacs Muse hacking</h1>
<div class="date">Thursday, March 04 2010, 13:33</div>
<div id="article">
<p>Now you know what piece of software is used to publish this blog. I really
like it, the major mode makes it a great experience to be using this tool,
and the fact that you produce the <code>HTML</code> and <code>rsync</code> it all from within Emacs
(<code>C-c C-p</code> then <code>C-c C-r</code> with some easy <a href="http://git.tapoueh.org/?p=tapoueh.org.git;a=blob;f=dim-muse.el;hb=HEAD">elisp code</a>) is a big advantage as far
as I'm concerned. No need to resort to <code>shell</code> and <code>Makefile</code>.</p>

<p>What's new here is that I missed the <em>one page per article</em> trend that other
blog software offers, and the blog entries index too. I didn't want to
invest time into hacking Muse itself, that was my excuse for accepting the
situation. But I finally took a deeper look at the <a href="http://mwolson.org/static/doc/muse/Style-Elements.html#Style-Elements">Emacs Muse Manual</a>, and
found out about the <code>:after</code> and <code>:final</code> functions.</p>

<p>Those two functions will get run while in the output buffer, the <code>HTML</code>
formatted one. With the <code>:after</code> function, it's still possible to edit the
buffer content, for example to add a mini index to previous articles,
whereas with the <code>:final</code> function the buffer is <code>read-only</code> and already written
to disk, so it's too late to edit it. Still it's possible to cut it in pieces
and write a new file per article you find in there.</p>

<p>The code to realize my wishes is <a href="http://git.tapoueh.org/?p=tapoueh.org.git;a=summary">available</a> but has not been edited with
customisation in mind, so to use it you will have to edit some places rather
than just <code>setq</code> some <code>defcustom</code>. Well, if I have demand, I'll generalize the
code and share it on <a href="http://www.emacswiki.org/">Emacs Wiki</a> and <a href="http://tromey.com/elpa/">ELPA</a>. Meanwhile, happy hacking!</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 04 Mar 2010 13:33:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/03/04-emacs-muse-hacking.html</guid>
</item>




<item>
  <title>Getting out of SQL_ASCII, part 2</title>
  <link>http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.html</link>
  <description><![CDATA[<h1>Getting out of SQL_ASCII, part 2</h1>
<div class="date">Tuesday, February 23 2010, 17:30</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>So, if you followed the previous blog entry, now you have a new database
containing all the <em>static</em> tables encoded in <code>UTF-8</code> rather than
<code>SQL_ASCII</code>. Because if it was not yet the case, you now severely distrust
this non-encoding.</p>

<p>Now is the time to have a look at properly encoding the <em>live</em> data, those
stored in tables that continue to receive write traffic. The idea is to use
the <code>UPDATE</code> facilities of PostgreSQL to tweak the data, and to fix the
applications so as not to continue inserting badly encoded strings in there.</p>

<h3>Finding non UTF-8 data</h3>

<p class="first">First you want to find out the badly encoded data. You can do that with this
helper function that <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> gave me on IRC. I had a version from the
archives before that, but the <em>regexp</em> was hard to maintain and quote into a
<code>PL</code> function. This is avoided by two means, first one is to have a separate
pure <code>SQL</code> function for the <em>regexp</em> checking (so that you can index it should
you need to) and the other one is to apply the regexp to <code>hex</code> encoded
data. Here we go:</p>

<pre class="src">
create or replace function public.utf8hex_valid(str text)
 returns boolean
 language sql immutable
as $f$
   select $1 ~ $r$(?x)
                  ^(?:(?:[0-7][0-9a-f])
                     |(?:(?:c[2-9a-f]|d[0-9a-f])
                        |e0[ab][0-9a-f]
                        |ed[89][0-9a-f]
                        |(?:(?:e[1-9abcef])
                           |f0[9ab][0-9a-f]
                           |f[1-3][89ab][0-9a-f]
                           |f48[0-9a-f]
                          )[89ab][0-9a-f]
                       )[89ab][0-9a-f]
                    )*$
                $r$;
$f$;
</pre>
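
<p>Since the function takes the <code>hex</code> representation as its argument, a quick
sanity check only needs a few hand-written hex strings. The values below are my
own picks for illustration: <code>41</code> is an ASCII <code>A</code>, <code>c3a9</code> is an e-acute in
<code>UTF-8</code>, and <code>e974</code> is a <code>latin1</code> e-acute followed by a <code>t</code>.</p>

<pre class="src">
SELECT public.utf8hex_valid('41');    -- plain ASCII, valid
SELECT public.utf8hex_valid('c3a9');  -- two-byte UTF-8 sequence, valid
SELECT public.utf8hex_valid('e974');  -- e9 wants two continuation bytes, invalid
</pre>

<p>Note that the <em>regexp</em> only knows about lowercase hex digits, which is what
<code>encode(..., 'hex')</code> produces.</p>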

<p>Now some little scripting around it in order to skip intense manual and
boring work (and see, some more catalog queries). Don't forget we will have
to work on a per-column basis here...</p>

<pre class="src">
create or replace function public.check_encoding_utf8
 (
   IN schemaname text,
   IN tablename  text,
  OUT relname    text,
  OUT attname    text,
  OUT count      bigint
 )
 returns setof record
 language plpgsql
as $f$
DECLARE
  v_sql text;
BEGIN
  FOR relname, attname
   IN SELECT c.relname, a.attname
        FROM pg_attribute a
             JOIN pg_class c on a.attrelid = c.oid
             JOIN pg_namespace s on s.oid = c.relnamespace
             JOIN pg_roles r on r.oid = c.relowner
       WHERE s.nspname = schemaname
         AND atttypid IN (25, 1043) <span style="color: #888a85;">-- text, varchar
</span>         AND relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span>          <span style="color: #888a85;">-- ordinary table
</span>         AND r.rolname = <span style="color: #ad7fa8; font-style: italic;">'some_specific_role'</span>
         AND CASE WHEN tablename IS NOT NULL
                  THEN c.relname ~ tablename
                  ELSE true
              END
  LOOP
    v_sql := <span style="color: #ad7fa8; font-style: italic;">'SELECT count(*) '</span>
          || <span style="color: #ad7fa8; font-style: italic;">'  FROM ONLY '</span>|| schemaname || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || relname
          || <span style="color: #ad7fa8; font-style: italic;">' WHERE NOT public.utf8hex_valid(encode(textsend('</span>
          || attname
          || <span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>;

    <span style="color: #888a85;">-- RAISE NOTICE 'Checking: %.%', relname, attname;
</span>    <span style="color: #888a85;">-- RAISE NOTICE 'SQL: %', v_sql;
</span>    EXECUTE v_sql INTO count;
    RETURN NEXT;
  END LOOP;
END;
$f$;
</pre>

<p>Note that the <code>tablename</code> is compared using the <code>~</code> operator, so that's <em>regexp</em>
matching there too. Also note that I wanted only to check those tables that
are owned by a specific role, your case may vary.</p>

<p>The way I used this function was like this:</p>

<pre class="src">
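-- arguments: schema name ('public' is a placeholder), then a table name
-- regexp, or NULL to check every table in the schema
create table leon.check_utf8 as
 select *
   from public.check_encoding_utf8('public', NULL);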
</pre>

<p>Then you need to take action on those rows in the <code>leon.check_utf8</code> table which
have a <code>count &gt; 0</code>. Rinse and repeat, but you may soon realise building the
table over and over again is costly.</p>
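
<p>Listing what remains to be done is then a plain query on that table, for
example with the worst offenders first. This is only a sketch of how I'd read
the results, nothing more:</p>

<pre class="src">
-- columns that still contain data failing the UTF-8 check
SELECT relname, attname, count
  FROM leon.check_utf8
 WHERE count &gt; 0
 ORDER BY count DESC, relname, attname;
</pre>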


<h3>Cleaning up the data</h3>

<p class="first">Up for some more helper tools? Unless you really want to manually fix this
huge number of columns where some data ain't <code>UTF-8</code> compatible... here's some
more:</p>

<pre class="src">
create or replace function leon.nettoyeur
 (
  IN  action      text,
  IN  encoding    text,
  IN  tablename   text,
  IN  columname   text,

  OUT orig        text,
  OUT utf8        text
 )
 returns setof record
 language plpgsql
as $f$
DECLARE
  p_convert text;
BEGIN
  IF encoding IS NULL
  THEN
    p_convert := <span style="color: #ad7fa8; font-style: italic;">'translate('</span>
              || columname || <span style="color: #ad7fa8; font-style: italic;">', '</span>
              || $$<span style="color: #ad7fa8; font-style: italic;">'\211\203\202'</span>$$
              || <span style="color: #ad7fa8; font-style: italic;">', '</span>
              || $$<span style="color: #ad7fa8; font-style: italic;">'   '</span>$$
              || <span style="color: #ad7fa8; font-style: italic;">') '</span>;
  ELSE
    <span style="color: #888a85;">-- in 8.2, write convert using, in 8.3, the other expression
</span>    <span style="color: #888a85;">-- p_convert := 'convert(' || columname || ' using ' || conversion || ') ';
</span>    p_convert := <span style="color: #ad7fa8; font-style: italic;">'convert(textsend('</span> || columname || <span style="color: #ad7fa8; font-style: italic;">'), '''</span>|| encoding ||<span style="color: #ad7fa8; font-style: italic;">''', ''utf-8'' ) '</span>;
  END IF;

  IF action = <span style="color: #ad7fa8; font-style: italic;">'select'</span>
  THEN
    FOR orig, utf8
     IN EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT '</span> || columname || <span style="color: #ad7fa8; font-style: italic;">', '</span>
         || p_convert
         || <span style="color: #ad7fa8; font-style: italic;">'  FROM ONLY '</span> || tablename
         || <span style="color: #ad7fa8; font-style: italic;">' WHERE not public.utf8hex_valid('</span>
         || <span style="color: #ad7fa8; font-style: italic;">'encode(textsend('</span>|| columname ||<span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>
    LOOP
      RETURN NEXT;
    END LOOP;

  ELSIF action = <span style="color: #ad7fa8; font-style: italic;">'update'</span>
  THEN
    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'UPDATE ONLY '</span> || tablename
         || <span style="color: #ad7fa8; font-style: italic;">' SET '</span> || columname || <span style="color: #ad7fa8; font-style: italic;">' = '</span> || p_convert
         || <span style="color: #ad7fa8; font-style: italic;">' WHERE not public.utf8hex_valid('</span>
         || <span style="color: #ad7fa8; font-style: italic;">'encode(textsend('</span>|| columname ||<span style="color: #ad7fa8; font-style: italic;">'), ''hex''))'</span>;

    FOR orig, utf8
     IN SELECT *
          FROM leon.nettoyeur(<span style="color: #ad7fa8; font-style: italic;">'select'</span>, encoding, tablename, columname)
    LOOP
      RETURN NEXT;
    END LOOP;

  ELSE
    RAISE EXCEPTION <span style="color: #ad7fa8; font-style: italic;">'L&#233;on, Nettoyeur, veut de l''action.'</span>;

  END IF;
END;
$f$;
</pre>

<p>As you can see, this function lets you check the conversion process from a
given supposed encoding before actually converting the data in place. This
is very useful as even when you're pretty sure the non-utf8 data is <code>latin1</code>,
sometimes you find it's <code>windows-1252</code> or such. So double check before telling
<code>leon.nettoyeur()</code> to update your precious data!</p>

<p>Also, there's a facility to use <code>translate()</code> when none of the encodings match
your expectations. This is a skeleton just replacing invalid characters with
a <code>space</code>, tweak it at will!</p>
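
<p>A typical session with the function could look like the following sketch. The
table and column names (<code>myschema.mytable</code>, <code>mycolumn</code>) are placeholders, and
<code>latin1</code> is only a guess at the source encoding that you're expected to
verify with the first call.</p>

<pre class="src">
-- 1. preview: original value next to its converted form, nothing is written
SELECT * FROM leon.nettoyeur('select', 'latin1', 'myschema.mytable', 'mycolumn');

-- 2. once the preview looks right, convert in place
SELECT * FROM leon.nettoyeur('update', 'latin1', 'myschema.mytable', 'mycolumn');

-- 3. fallback when no encoding fits: a NULL encoding selects the translate() branch
SELECT * FROM leon.nettoyeur('select', NULL, 'myschema.mytable', 'mycolumn');
</pre>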


<h3>Conclusion</h3>

<p class="first">Enjoy your clean database now, even if it still accepts new data that will
probably not pass the checks, so we still have to be careful about that and
re-clean every day until the migration is effective. Or maybe add a <code>CHECK</code>
clause that will reject badly encoded data...</p>
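
<p>Such a <code>CHECK</code> clause could reuse the validation function from above. Here's a
minimal sketch, with the table, column and constraint names being placeholders.
Keep in mind that adding the constraint validates the existing rows too, so the
column has to be clean already.</p>

<pre class="src">
-- hypothetical names: myschema.mytable, mycolumn, mycolumn_is_utf8
ALTER TABLE myschema.mytable
  ADD CONSTRAINT mycolumn_is_utf8
      CHECK (public.utf8hex_valid(encode(textsend(mycolumn), 'hex')));
</pre>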

<p>In fact here we're using <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a> to replicate the <em>live</em> data from the old to
the new server, and that means the replication will break each time there's
new data written in non-utf8, as the new server is running <code>8.4</code>, which by
design ain't very forgiving. Our plan is to clean-up as we go (remove table
from the <em>subscriber</em>, fix it, add it again) and migrate as soon as possible!</p>

<p>Bonus points to those of you getting the convoluted reference :)</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/plpgsql.html">plpgsql</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 23 Feb 2010 17:30:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/02/23-getting-out-of-sql_ascii-part-2.html</guid>
</item>
<item>
  <title>Getting out of SQL_ASCII, part 1</title>
  <link>http://tapoueh.org/blog/2010/02/18-getting-out-of-sql_ascii-part-1.html</link>
  <description><![CDATA[<h1>Getting out of SQL_ASCII, part 1</h1>
<div class="date">Thursday, February 18 2010, 11:37</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>It happens that you have to manage databases <em>designed</em> by your predecessor,
and it even happens that the team used to not have a <em>DBA</em>. Those <em>hysterical
raisins</em> can lead to having a <code>SQL_ASCII</code> database. The horror!</p>

<p>What <code>SQL_ASCII</code> means, if you're not already familiar with the consequences
of such a choice, is that all the <code>text</code> and <code>varchar</code> data that you put in the
database is accepted as-is. No checks. At all. It's pretty nice when you're
lazy enough to not deal with <em>strange</em> errors in your application, but if
you think that it's a smart move, please go read
<a href="http://www.joelonsoftware.com/articles/Unicode.html">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>
by <a href="http://www.joelonsoftware.com/">Joel Spolsky</a> now. I said now, I'm waiting for you to get back here. Yes,
I'll wait.</p>

<p>The problem of course is not being able to read back the data you just stored,
which is seldom what you want when you use a database solution such as
<a href="http://www.postgresql.org/">PostgreSQL</a>.</p>

<p>Now, it happens too that it's high time to get off of <code>SQL_ASCII</code>, the
infamous. In our case we're lucky enough in that the data are all in fact
<code>latin1</code> or about that, and this comes from the fact that all the applications
connecting to the database are sharing some common code and setup. Then we
have some tables that can be tagged <em>archives</em> and some other <em>live</em>. This blog
post will only deal with the former category.</p>

<p>For those tables that are not receiving changes anymore, we will migrate
them by using a simple but time hungry method: <code>COPY OUT|recode|COPY IN</code>. I've
tried to use <code>iconv</code> for recoding our data, but it failed to do so in lots of
cases, so I've switched to using the <a href="http://www.gnu.org/software/recode/recode.html">GNU recode</a> tool, which works just fine.</p>
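
<p>Seen from the database side, the method boils down to the two ends of the
pipe below. This is only a sketch of the idea, not what the script literally
runs, and <code>archives.some_table</code> is a made-up table name.</p>

<pre class="src">
-- Step 1, on the old SQL_ASCII database: dump the rows as they are stored.
COPY archives.some_table TO STDOUT;

-- Step 2 happens outside the database: the dump goes through recode,
-- from latin1 to UTF-8.

-- Step 3, on the new UTF-8 database: load the recoded rows.
COPY archives.some_table FROM STDIN;
</pre>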

<p>The fact that it takes so much time doing the conversion is not really a
problem here, as you can do it <em>offline</em>, while the applications are still
using the <code>SQL_ASCII</code> database. So, here's the program's help:</p>

<pre class="src">
recode.sh [-npdf0TI] [-U user ] -s schema [-m mintable] pattern
     -d debug
     -n dry run, only print table names and expected files
     -s schema
     -m mintable, to skip already processed once
     -U connect to PostgreSQL as user
     -f force table loading even when export files do exist
     -0 only (re)load tables with zero-sized copy files
     -T Truncate the tables before COPYing recoded data
     -I Temporarily drop the indexes of the table while COPYing
pattern ^table_name_, e.g.
</pre>

<p>The <code>-I</code> option is neat enough to create the indexes in parallel, but with no
upper limit on the number of index creations launched. In our case it worked
well, so I didn't have to bother.</p>

<p>Take a look at the <a href="static/recode.sh">recode.sh</a> script, and don't hesitate editing it for your
purpose. It's missing some obvious options to be useful in the large.</p>

<p>We'll get back to the subject of this entry in <em>part 2</em>, dealing with how to
recode your data in the database itself, thanks to some insane regexp based
queries and helper functions. And thanks to a great deal of IRC based
helping, too.</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 18 Feb 2010 11:37:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/02/18-getting-out-of-sql_ascii-part-1.html</guid>
</item>
<item>
  <title>Resetting sequences. All of them, please!</title>
  <link>http://tapoueh.org/blog/2010/02/16-resetting-sequences-all-of-them-please.html</link>
  <description><![CDATA[<h1>Resetting sequences. All of them, please!</h1>
<div class="date">Tuesday, February 16 2010, 16:23</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>So, after restoring a production dump with intermediate filtering, none of
our sequences were set to the right value. I could have tried to review the
process of filtering the dump here, but it's a <em>one-shot</em> action and you know
what that sometimes means. With some pressure you don't script enough of it
and you just crawl more and more.</p>

<p>Still, I think how I solved it is worthy of a blog entry. Not that it's
about a super unusual <em>clever</em> trick, quite the contrary, because questions
involving this trick are often encountered on the support <code>IRC</code>.</p>

<p>The idea is to query the catalog for all sequences, and produce from there
the <code>SQL</code> command you will have to issue for each of them. Once you have this
query, it's quite easy to arrange from the <code>psql</code> prompt as if you had dynamic
scripting capabilities. Of course in <code>9.0</code> you will have <em>inline anonymous</em> <code>DO</code>
blocks.</p>

<pre class="src">
#&gt; \o /tmp/sequences.sql
#&gt; \t
Showing only tuples.
#&gt; YOUR QUERY HERE
#&gt; \o
#&gt; \t
Tuples only is off.
</pre>

<p>Once you have the <code>/tmp/sequences.sql</code> file, you can ask <code>psql</code> to execute its
commands as you're used to, that is, using <code>\i</code> in an explicit transaction block.</p>

<p>Now, the interesting part if you got here attracted by the blog entry title
is in fact the query itself. A nice way to start is to <code>\set ECHO_HIDDEN</code> then
describe some table: you now have a catalog example query to work with. Then
you tweak it somehow and get this:</p>

<pre class="src">
SELECT <span style="color: #ad7fa8; font-style: italic;">'select '</span>
        || trim(trailing <span style="color: #ad7fa8; font-style: italic;">')'</span>
           from replace(pg_get_expr(d.adbin, d.adrelid),
                        <span style="color: #ad7fa8; font-style: italic;">'nextval'</span>, <span style="color: #ad7fa8; font-style: italic;">'setval'</span>))
        || <span style="color: #ad7fa8; font-style: italic;">', (select max( '</span> || a.attname || <span style="color: #ad7fa8; font-style: italic;">') from only '</span>
        || nspname || <span style="color: #ad7fa8; font-style: italic;">'.'</span> || relname || <span style="color: #ad7fa8; font-style: italic;">'));'</span>
  FROM pg_class c
       JOIN pg_namespace n on n.oid = c.relnamespace
       JOIN pg_attribute a on a.attrelid = c.oid
       JOIN pg_attrdef d on d.adrelid = a.attrelid
                          and d.adnum = a.attnum
                          and a.atthasdef
 WHERE relkind = <span style="color: #ad7fa8; font-style: italic;">'r'</span> and a.attnum &gt; 0
       and pg_get_expr(d.adbin, d.adrelid) ~ <span style="color: #ad7fa8; font-style: italic;">'^nextval'</span>;
</pre>
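
<p>To make the string manipulation in that query concrete, here is the same transformation sketched in Python for a single column. The function name and the sample schema, table, column and sequence names are made up for the illustration; only the replace-and-trim logic mirrors the query.</p>

<pre class="src">
def setval_statement(default_expr, schema, table, column):
    """Build the setval() command the catalog query produces for one column."""
    # nextval('seq'::regclass) becomes setval('seq'::regclass
    # once the trailing parenthesis is trimmed
    head = default_expr.replace("nextval", "setval").rstrip(")")
    return ("select " + head
            + ", (select max( " + column + ") from only "
            + schema + "." + table + "));")


if __name__ == "__main__":
    # hypothetical column public.foo.id fed by the sequence foo_id_seq
    print(setval_statement("nextval('foo_id_seq'::regclass)",
                           "public", "foo", "id"))
    # prints: select setval('foo_id_seq'::regclass, (select max( id) from only public.foo));
</pre>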

<p>Coming next, a <code>recode</code> based script in order to get from <code>SQL_ASCII</code> to <code>UTF-8</code>,
and some strange looking queries too.</p>

<p>Stay tuned!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/catalogs.html">catalogs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 16 Feb 2010 16:23:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2010/02/16-resetting-sequences-all-of-them-please.html</guid>
</item>
<item>
  <title>pg_staging's bird view</title>
  <link>http://tapoueh.org/blog/2009/12/blog/2009/12/08-pg_stagings-bird-view.html</link>
  <description><![CDATA[<p>One of the most important pieces of feedback I got about the presentation of <a href="pgstaging.html">pgstaging</a>
was the lack of pictures, something like a bird's-eye view of how you operate
it. Well, thanks to <a href="http://ditaa.sourceforge.net/">ditaa</a> and Emacs <code>picture-mode</code> here it is:</p>

<center>
<p><img src="../../../images//pg_staging.png" alt=""></p>
</center>

<p>Hope you enjoy, it should not be necessary to comment much if I got to the
point!</p>

<p>Of course I committed the <a href="http://github.com/dimitri/pg_staging/blob/master/bird-view.txt">text source file</a> to the <code>Git</code> repository. The only
problem I ran into is that <code>ditaa</code> defaults to outputting a quite big right
margin containing only white pixels, and that didn't fit well, visually, in
this blog. So I had to resort to the <a href="http://www.imagemagick.org/script/command-line-options.php#crop">ImageMagick crop command</a> in order to avoid
any mouse usage in the production of this diagram.</p>

<pre class="src">
convert .../pg_staging/bird-view.png -crop <span style="color: #bc8f8f;">'!550'</span> bird-view.png
mv bird-view-0.png pg_staging.png
</pre>

<p>Quicker than learning to properly use a mouse, at least for me :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 08 Dec 2009 12:04:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/12/blog/2009/12/08-pg_stagings-bird-view.html</guid>
</item>
<item>
  <title>pg_staging's bird view</title>
  <link>http://tapoueh.org/blog/2009/12/08-pg_stagings-bird-view.html</link>
  <description><![CDATA[<h1>pg_staging's bird view</h1>
<div class="date">Tuesday, December 08 2009, 12:04</div>
</div>
<div id="article">
<p>One of the most important pieces of feedback I got about the presentation of <a href="pgstaging.html">pgstaging</a>
was the lack of pictures, something like a bird's-eye view of how you operate
it. Well, thanks to <a href="http://ditaa.sourceforge.net/">ditaa</a> and Emacs <code>picture-mode</code> here it is:</p>

<center>
<p><img src="../../../images//pg_staging.png" alt=""></p>
</center>

<p>Hope you enjoy, it should not be necessary to comment much if I got to the
point!</p>

<p>Of course I committed the <a href="http://github.com/dimitri/pg_staging/blob/master/bird-view.txt">text source file</a> to the <code>Git</code> repository. The only
problem I ran into is that <code>ditaa</code> defaults to outputting a quite big right
margin containing only white pixels, and that didn't fit well, visually, in
this blog. So I had to resort to the <a href="http://www.imagemagick.org/script/command-line-options.php#crop">ImageMagick crop command</a> in order to avoid
any mouse usage in the production of this diagram.</p>

<pre class="src">
convert .../pg_staging/bird-view.png -crop <span style="color: #ad7fa8; font-style: italic;">'!550'</span> bird-view.png
mv bird-view-0.png pg_staging.png
</pre>

<p>Quicker than learning to properly use a mouse, at least for me :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/pg_staging.html">pg_staging</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 08 Dec 2009 12:04:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/12/08-pg_stagings-bird-view.html</guid>
</item>
<item>
  <title>PGday.eu feedback</title>
  <link>http://tapoueh.org/blog/2009/12/blog/2009/12/01-pgdayeu-feedback.html</link>
  <description><![CDATA[<p>At <a href="http://2009.pgday.eu/">pgday</a> there was this form you could fill in to give speakers some <em>feedback</em>
about their talks. And that's a really nice way as a speaker to know what to
improve. And as <a href="http://blog.hagander.net/archives/157-Feedback-from-pgday.eu.html">Magnus</a> was searching for a nice-looking chart facility in Python
and I spoke about <a href="http://matplotlib.sourceforge.net/gallery.html">matplotlib</a>, I felt I had to publish something.</p>

<p>Here is my try at some nice graphics. Well I'll let you decide how nice the
result is:</p>

<center>
<p><a class="image-link" href="../../../images//feedback.png">
<img src="../../../images/feedback.png"></a></p>
</center>

<p>If you want to see the little python script I used, here it is: <a href="http://git.tapoueh.org/?p=pgconfs.git;a=blob;f=pgday_2009/feedback.py;hb=master">feedback.py</a>,
with the data embedded and all...</p>

<p>Now, how to read it? Well, the darker the color the better the score. For
example I had <code>5</code> people score me <code>5</code> for <em>Topic Importance</em> on the Hi-Media talk
(in French) and only <code>3</code> people at this same score and topic for the <code>pg_staging</code>
talk. The scores are from <code>1</code> to <code>5</code>, <code>5</code> being the best.</p>

<p>The committee accepted interesting enough topics and it seems I managed to
deliver acceptable content from there. Not very good content, because,
reading the comments, I missed some nice bird's-eye pictures to help the
audience get into the subject. As I'm unable to draw (with or without a
mouse) I plan to fix this in later talks by using <a href="http://ditaa.sourceforge.net/">ditaa</a>, the <em>DIagrams
Through Ascii Art</em> tool. I already used it and together with <a href="news.dim.html">Emacs</a>
<code>picture-mode</code> it's very nice.</p>

<p>Oh yes, the bottom line of this post is that there will be later talks. I seem
to like those and the audience feedback this time is saying that it's
not too bad for them. See you soon :)</p>

<h3>Update</h3>

<p class="first">I have added the <code>feedback.py</code> script now that each page here is published
separately.</p>

<pre class="src">
<span style="color: #b22222;">#</span><span style="color: #b22222;">! /usr/bin/env python
</span><span style="color: #b22222;">#</span><span style="color: #b22222;">
</span><span style="color: #b22222;"># </span><span style="color: #b22222;">http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
</span>
<span style="color: #7f007f;">from</span> pylab <span style="color: #7f007f;">import</span> *
<span style="color: #7f007f;">import</span> numpy <span style="color: #7f007f;">as</span> np

clf()
subplot(111)

<span style="color: #b8860b;">N</span> = 4

<span style="color: #b22222;"># </span><span style="color: #b22222;">http://html-color-codes.info/ for inspiration
</span><span style="color: #b8860b;">scoreColors</span>   = ((<span style="color: #bc8f8f;">'#F5D0A9'</span>, <span style="color: #bc8f8f;">'#F7BE81'</span>,
                  <span style="color: #bc8f8f;">'#FAAC58'</span>, <span style="color: #bc8f8f;">'#FF8000'</span>, <span style="color: #bc8f8f;">'#DF7401'</span>),
                 (<span style="color: #bc8f8f;">'#A9F5A9'</span>, <span style="color: #bc8f8f;">'#81F781'</span>,
                  <span style="color: #bc8f8f;">'#58FA58'</span>, <span style="color: #bc8f8f;">'#2EFE2E'</span>, <span style="color: #bc8f8f;">'#01DF01'</span>))

<span style="color: #b22222;"># </span><span style="color: #b22222;">data from the mail
</span><span style="color: #b8860b;">expHMScores</span>   = ((0, 0, 1, 2, 5),
                 (0, 0, 1, 3, 4),
                 (0, 0, 0, 0, 8),
                 (0, 0, 0, 3, 5))

<span style="color: #b8860b;">stagingScores</span> = ((0, 0, 0, 3, 3),
                 (0, 1, 1, 1, 3),
                 (0, 0, 1, 1, 4),
                 (0, 0, 0, 4, 2))

<span style="color: #b8860b;">ind</span> = np.arange(N)    <span style="color: #b22222;"># </span><span style="color: #b22222;">the x locations for the groups
</span><span style="color: #b8860b;">width</span> = 0.4       <span style="color: #b22222;"># </span><span style="color: #b22222;">the width of the bars: can also be len(x) sequence
</span>
<span style="color: #b8860b;">hd</span> = array([expHMScores[x][0] <span style="color: #7f007f;">for</span> x <span style="color: #7f007f;">in</span> <span style="color: #da70d6;">range</span>(0, 4)])
<span style="color: #b8860b;">hp</span> = bar(ind, hd, width, color = scoreColors[0][0])

<span style="color: #b8860b;">sd</span> = array([stagingScores[x][0] <span style="color: #7f007f;">for</span> x <span style="color: #7f007f;">in</span> <span style="color: #da70d6;">range</span>(0, 4)])
<span style="color: #b8860b;">sp</span> = bar(ind+width, sd, width, color = scoreColors[1][0])

<span style="color: #7f007f;">for</span> s <span style="color: #7f007f;">in</span> <span style="color: #da70d6;">range</span>(1, 5):
    <span style="color: #b8860b;">d</span> = array([expHMScores[x][s] <span style="color: #7f007f;">for</span> x <span style="color: #7f007f;">in</span> <span style="color: #da70d6;">range</span>(0, 4)])
    bar(ind, d, width,
        color = scoreColors[0][s], bottom = hd)
    <span style="color: #b8860b;">hd</span> += d

    <span style="color: #b8860b;">d</span> = array([stagingScores[x][s] <span style="color: #7f007f;">for</span> x <span style="color: #7f007f;">in</span> <span style="color: #da70d6;">range</span>(0, 4)])
    bar(ind+width, d, width,
        color = scoreColors[1][s], bottom = sd)
    <span style="color: #b8860b;">sd</span> += d

ylabel(<span style="color: #bc8f8f;">'Scores'</span>)
title(<span style="color: #bc8f8f;">'PGday 2009 feedback'</span>)
xticks(ind+width,
       (<span style="color: #bc8f8f;">'Topic Importance'</span>,
        <span style="color: #bc8f8f;">'Content Quality'</span>,
        <span style="color: #bc8f8f;">'Speaker knowledge'</span>,
        <span style="color: #bc8f8f;">'Speaker Quality'</span>) )

legend([hp[0], sp[0]], [<span style="color: #bc8f8f;">"Hi-Media"</span>, <span style="color: #bc8f8f;">"pg_staging"</span>])

grid(<span style="color: #da70d6;">True</span>)
savefig(<span style="color: #bc8f8f;">'feedback.png'</span>, dpi=75, orientation=<span style="color: #bc8f8f;">'portrait'</span>)
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 01 Dec 2009 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/12/blog/2009/12/01-pgdayeu-feedback.html</guid>
</item>
<item>
  <title>PGday.eu feedback</title>
  <link>http://tapoueh.org/blog/2009/12/01-pgdayeu-feedback.html</link>
  <description><![CDATA[<h1>PGday.eu feedback</h1>
<div class="date">Tuesday, December 01 2009, 16:45</div>
</div>
<div id="article">
<p>At <a href="http://2009.pgday.eu/">pgday</a> there was this form you could fill in to give speakers some <em>feedback</em>
about their talks. And that's a really nice way as a speaker to know what to
improve. And as <a href="http://blog.hagander.net/archives/157-Feedback-from-pgday.eu.html">Magnus</a> was searching for a nice-looking chart facility in Python
and I spoke about <a href="http://matplotlib.sourceforge.net/gallery.html">matplotlib</a>, I felt I had to publish something.</p>

<p>Here is my try at some nice graphics. Well I'll let you decide how nice the
result is:</p>

<center>
<p><a class="image-link" href="../../../images//feedback.png">
<img src="../../../images/feedback.png"></a></p>
</center>

<p>If you want to see the little python script I used, here it is: <a href="http://git.tapoueh.org/?p=pgconfs.git;a=blob;f=pgday_2009/feedback.py;hb=master">feedback.py</a>,
with the data embedded and all...</p>

<p>Now, how to read it? Well, the darker the color the better the score. For
example I had <code>5</code> people score me <code>5</code> for <em>Topic Importance</em> on the Hi-Media talk
(in French) and only <code>3</code> people at this same score and topic for the <code>pg_staging</code>
talk. The scores are from <code>1</code> to <code>5</code>, <code>5</code> being the best.</p>

<p>The committee accepted interesting enough topics and it seems I managed to
deliver acceptable content from there. Not very good content, because,
reading the comments, I missed some nice bird's-eye pictures to help the
audience get into the subject. As I'm unable to draw (with or without a
mouse) I plan to fix this in later talks by using <a href="http://ditaa.sourceforge.net/">ditaa</a>, the <em>DIagrams
Through Ascii Art</em> tool. I already used it and together with <a href="news.dim.html">Emacs</a>
<code>picture-mode</code> it's very nice.</p>

<p>Oh yes, the bottom line of this post is that there will be later talks. I seem
to like those and the audience feedback this time is saying that it's
not too bad for them. See you soon :)</p>

<h3>Update</h3>

<p class="first">I have added the <code>feedback.py</code> script now that each page here is published
separately.</p>

<pre class="src">
<span style="color: #888a85;">#</span><span style="color: #888a85;">! /usr/bin/env python
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">
</span><span style="color: #888a85;"># </span><span style="color: #888a85;">http://matplotlib.sourceforge.net/examples/pylab_examples/bar_stacked.html
</span>
<span style="color: #729fcf; font-weight: bold;">from</span> pylab <span style="color: #729fcf; font-weight: bold;">import</span> *
<span style="color: #729fcf; font-weight: bold;">import</span> numpy <span style="color: #729fcf; font-weight: bold;">as</span> np

clf()
subplot(111)

<span style="color: #eeeeec;">N</span> = 4

<span style="color: #888a85;"># </span><span style="color: #888a85;">http://html-color-codes.info/ for inspiration
</span><span style="color: #eeeeec;">scoreColors</span>   = ((<span style="color: #ad7fa8; font-style: italic;">'#F5D0A9'</span>, <span style="color: #ad7fa8; font-style: italic;">'#F7BE81'</span>,
                  <span style="color: #ad7fa8; font-style: italic;">'#FAAC58'</span>, <span style="color: #ad7fa8; font-style: italic;">'#FF8000'</span>, <span style="color: #ad7fa8; font-style: italic;">'#DF7401'</span>),
                 (<span style="color: #ad7fa8; font-style: italic;">'#A9F5A9'</span>, <span style="color: #ad7fa8; font-style: italic;">'#81F781'</span>,
                  <span style="color: #ad7fa8; font-style: italic;">'#58FA58'</span>, <span style="color: #ad7fa8; font-style: italic;">'#2EFE2E'</span>, <span style="color: #ad7fa8; font-style: italic;">'#01DF01'</span>))

<span style="color: #888a85;"># </span><span style="color: #888a85;">data from the mail
</span><span style="color: #eeeeec;">expHMScores</span>   = ((0, 0, 1, 2, 5),
                 (0, 0, 1, 3, 4),
                 (0, 0, 0, 0, 8),
                 (0, 0, 0, 3, 5))

<span style="color: #eeeeec;">stagingScores</span> = ((0, 0, 0, 3, 3),
                 (0, 1, 1, 1, 3),
                 (0, 0, 1, 1, 4),
                 (0, 0, 0, 4, 2))

<span style="color: #eeeeec;">ind</span> = np.arange(N)    <span style="color: #888a85;"># </span><span style="color: #888a85;">the x locations for the groups
</span><span style="color: #eeeeec;">width</span> = 0.4       <span style="color: #888a85;"># </span><span style="color: #888a85;">the width of the bars: can also be len(x) sequence
</span>
<span style="color: #eeeeec;">hd</span> = array([expHMScores[x][0] <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(0, 4)])
<span style="color: #eeeeec;">hp</span> = bar(ind, hd, width, color = scoreColors[0][0])

<span style="color: #eeeeec;">sd</span> = array([stagingScores[x][0] <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(0, 4)])
<span style="color: #eeeeec;">sp</span> = bar(ind+width, sd, width, color = scoreColors[1][0])

<span style="color: #729fcf; font-weight: bold;">for</span> s <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(1, 5):
    <span style="color: #eeeeec;">d</span> = array([expHMScores[x][s] <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(0, 4)])
    bar(ind, d, width,
        color = scoreColors[0][s], bottom = hd)
    <span style="color: #eeeeec;">hd</span> += d

    <span style="color: #eeeeec;">d</span> = array([stagingScores[x][s] <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(0, 4)])
    bar(ind+width, d, width,
        color = scoreColors[1][s], bottom = sd)
    <span style="color: #eeeeec;">sd</span> += d

ylabel(<span style="color: #ad7fa8; font-style: italic;">'Scores'</span>)
title(<span style="color: #ad7fa8; font-style: italic;">'PGday 2009 feedback'</span>)
xticks(ind+width,
       (<span style="color: #ad7fa8; font-style: italic;">'Topic Importance'</span>,
        <span style="color: #ad7fa8; font-style: italic;">'Content Quality'</span>,
        <span style="color: #ad7fa8; font-style: italic;">'Speaker knowledge'</span>,
        <span style="color: #ad7fa8; font-style: italic;">'Speaker Quality'</span>) )

legend([hp[0], sp[0]], [<span style="color: #ad7fa8; font-style: italic;">"Hi-Media"</span>, <span style="color: #ad7fa8; font-style: italic;">"pg_staging"</span>])

grid(<span style="color: #729fcf;">True</span>)
savefig(<span style="color: #ad7fa8; font-style: italic;">'feedback.png'</span>, dpi=75, orientation=<span style="color: #ad7fa8; font-style: italic;">'portrait'</span>)
</pre>



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/pg_staging.html">pg_staging</a> <a href="../../../tags/conferences.html">Conferences</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 01 Dec 2009 16:45:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/12/01-pgdayeu-feedback.html</guid>
</item>




<item>
  <title>prefix 1.1.0</title>
  <link>http://tapoueh.org/blog/2009/11/30-prefix-110.html</link>
  <description><![CDATA[<h1>prefix 1.1.0</h1>
<div class="date">Monday, November 30 2009, 12:10</div>
</div>
<div id="article">
<p>So I had two <a href="http://archives.postgresql.org/pgsql-general/2009-11/msg01042.php">bug</a> <a href="http://lists.pgfoundry.org/pipermail/prefix-users/2009-November/000005.html">reports</a> about <a href="prefix.html">prefix</a> in less than a week. It means several
things, one of them is that my code is getting used in the wild, which is
nice. The other side of the coin is that people do find bugs in there. This
one is about the behavior of the <code>btree opclass</code> of the type <code>prefix range</code>. We
cheat a lot there by simply having written one, because a range does not
have a strict ordering: is <code>[1-3]</code> before or after <code>[2-4]</code>? But when you know
you have no overlapping intervals in your <code>prefix_range</code> column, being able to
have it part of a <em>primary key</em> is damn useful.</p>

<p>Note: in <code>8.5</code> we should have a way to express <em>exclusion constraints</em> and have
PostgreSQL forbid overlapping entries for us. Not being there yet, you
could write a <em>constraint trigger</em> and use the <em>GiST index</em> to have nice speed
there, which is exactly what this <em>exclusion constraint</em> support is about.</p>

<p>It turns out the code change required is pretty simple:</p>

<pre class="src">
-    <span style="color: #729fcf; font-weight: bold;">return</span> (a-&gt;first == b-&gt;first) ? (a-&gt;last - b-&gt;last) : (a-&gt;first - b-&gt;first)<span style="color: #ffff00; background-color: #ff0000; font-weight: bold;">;</span>
+    <span style="color: #888a85;">/*</span><span style="color: #888a85;">
+     * we are comparing e.g. '1' and '12' (the shorter contains the
+     * smaller), so let's pretend '12' &lt; '1' as it contains less elements.
+     </span><span style="color: #888a85;">*/</span>
+    <span style="color: #729fcf; font-weight: bold;">return</span> (alen == mlen) ? 1 : -1;
</pre>

<p>This happens in the <em>compare support function</em> (see
<a href="http://www.postgresql.org/docs/8.4/interactive/xindex.html">Interfacing Extensions To Indexes</a>) so that means you now have to rebuild
your <code>prefix_range</code> btree indexes, hence the version number bump.</p>
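
<p>The ordering described in that comment, where a longer prefix sorts before the shorter prefix that contains it, can be illustrated with a small Python sort key. This is a sketch on plain strings under my own assumptions (the name <code>prefix_sort_key</code> is made up, and the range part of a <code>prefix_range</code> is ignored); it is not the C code of the extension.</p>

<pre class="src">
def prefix_sort_key(prefix):
    """Sort key placing a longer prefix before the shorter one containing it."""
    # The trailing infinity makes the end of a string compare greater than
    # any character, so '12' sorts before '1'.
    return [ord(char) for char in prefix] + [float("inf")]


if __name__ == "__main__":
    print(sorted(["1", "12", "2", "123"], key=prefix_sort_key))
    # prints: ['123', '12', '1', '2']
</pre>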


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 30 Nov 2009 12:10:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/11/30-prefix-110.html</guid>
</item>
<item>
  <title>Yet Another PostgreSQL tool hits debian</title>
  <link>http://tapoueh.org/blog/2009/11/25-yet-another-postgresql-tool-hits-debian.html</link>
  <description><![CDATA[<h1>Yet Another PostgreSQL tool hits debian</h1>
<div class="date">Wednesday, November 25 2009, 11:49</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>So there it is, this newer contribution of mine that I presented at <a href="http://2009.pgday.eu">PGDay</a> is
now in the <code>debian NEW</code> queue. <a href="pgstaging.html">pg_staging</a> will empower you with respect to what
you do about those nightly backups (<code>pg_dump -Fc</code> or something).</p>

<p>The tool provides a lot of commands to either <code>dump</code> or <code>restore</code> a database. It
comes with documentation covering just about all of it, except for the <em>londiste</em>
support part, which will be there in time for the <code>1.0.0</code> release. The <a href="http://github.com/dimitri/pg_staging/blob/master/TODO">Todo list</a>
is getting smaller and smaller, and the version you'll soon find in <code>debian sid</code>
is already called <code>0.9</code>.</p>

<p>So, how do you go about using this software, and what service does it implement?</p>

<h3>it's all about deriving a staging environment from your backups</h3>

<p class="first">To validate backups, you want to restore them and check the database you get
from them. And your developers will sometimes want to refresh the database
they're working with. And you could have both an integration environment and
a pre-live one: on the former, you develop new code atop a stable set of
data; while on the latter you test stable enough code (ready to go live) on
a set of data as near to live data as possible.</p>

<p>And you want to be flexible about it, so that it's not a full-time job to
handle restoring databases each and every day, for project A integration or
project B pre-live testing, or project C accounting snapshot. Or you name
it.</p>

<p>And of course you want to have a single point of control of all your
databases. Let's call it the <em>controller</em>.</p>


<h3>setting up pg_staging</h3>

<p class="first">The <a href="pgstaging.html">pg_staging</a> setup consists of one <code>pg_staging.ini</code> file wherein you
describe your different target databases (those <code>dev</code> and <code>prelive</code> ones), and
of course where to get the production backups from. Currently you have to
serve the backup files in a format suitable for <code>pg_restore</code> (that means you
use either <code>pg_dump -Ft</code> or <code>pg_dump -Fc</code>) from an <code>apache</code> folder. The
<code>HTML</code> directory listing it produces will get parsed.</p>

<p>So you set up the <code>DEFAULT</code> section with common settings, then one section per
target: the databases you want to restore. Tell <code>pg_staging</code> where they are
(<code>host</code>), etc, and it'll be able to drive them.</p>

<p>In order to be able to host more than a single restored dump on a staging
server, for the same database, we use <code>pgbouncer</code>:</p>

<pre class="src">
pg_staging&gt; pgbouncer some_db.dev
              some_db      some_db_20091029 :5432
     some_db_20090717      some_db_20090717 :5432
     some_db_20091029      some_db_20091029 :5432
</pre>

<p>As explained in the <code>pg_staging(1)</code> man page, you have to open a
non-interactive <code>SSH</code> connection from the <em>controller</em> to the <em>hosts</em> where the
databases will get restored. Then you have to do a minimal setup of pgbouncer
on the <em>hosts</em> with a <code>trust</code> connection. <code>pg_staging</code> uses it to add
newly restored databases and make them accessible. Then you can also
<code>switch</code> the new database to being the virtual <em>some_db</em> so that you avoid
editing any connection string in your software.</p>

<p>Also, install the <code>pgstaging-client</code> package on every host you target. The
client is a simple shell script that must run as root (<code>sudo</code> is used) in
order to replace your <code>pgbouncer</code> setup or manage your <code>londiste</code> services.</p>

<p>See <code>man 5 pg_staging</code> for available options, including <em>schemas</em> to filter out
either completely or just skipping data restoring in those.</p>


<h3>pg_staging usage</h3>

<p class="first">Now that you're all set up, you can begin to enjoy using <code>pgstaging</code>. Enter the
console and see what you have in there.</p>

<pre class="src">
$ pg_staging
Welcome to pg_staging 0.9.
pg_staging&gt; databases
...
pg_staging&gt; restore some_db.dev
...
pg_staging&gt; pgbouncer some_db.dev
...
pg_staging&gt; dbsizes --all some_db.dev
...
pg_staging&gt; psql some_db.dev
some_db_20091125=#
</pre>

<p>And as you can see in <code>man pg_staging</code> there are a lot of commands
already. You can for example obtain a new <em>pg_restore catalog</em> from a dump
file, with some <em>schemas</em> commented out. It will even comment out <code>triggers</code>
that are using a <code>function</code> which is defined in a filtered out <code>schema</code>, for
example a <code>PGQ</code> trigger. And much much more.</p>

<p><a href="pgstaging.html">pg_staging</a> will even allow you to <code>dump</code> your production databases, but
consider installing a separate instance of it on the machine serving the
backups to your local network thanks to an <code>apache</code> directory listing!</p>


<h3>Roadmap to <code>1.0.0</code></h3>

<p class="first">What remains to be done is testing, getting <code>PITR</code> based restoring to work,
and adding some documentation (a tutorial, which is what this blog post is about; and
<em>londiste</em> support). At this point, unless some reader here asks for a new
feature (set), I'll consider <code>pg_staging</code> ready for <code>1.0.0</code>. After all, we're
using it just about daily here :)</p>

<p>Consider commenting, you should be able to easily spot my private mail
address...</p>



<h2>Tags</h2>

<p><a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/pg_staging.html">pg_staging</a> <a href="../../../tags/9.1.html">9.1</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 25 Nov 2009 11:49:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/11/25-yet-another-postgresql-tool-hits-debian.html</guid>
</item>
<item>
  <title>PGDay.eu, Paris: it was awesome!</title>
  <link>http://tapoueh.org/blog/2009/11/09-pgdayeu-paris-it-was-awesome.html</link>
  <description><![CDATA[<h1>PGDay.eu, Paris: it was awesome!</h1>
<div class="date">Monday, November 09 2009, 09:50</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>moment. Lots of <a href="http://2009.pgday.eu/_media/group_2009_1.jpg?cache=">attendees</a>, lots of quality talks (<a href="http://wiki.postgresql.org/wiki/PGDay.EU%2C_Paris_2009">slides</a> are online), good
food, great party: all the ingredients were there!</p>

<p>It was also the occasion for me to give a first talk about this tool I've been
working on for months, called <a href="pgstaging.html">pg_staging</a>, which aims to empower those boring
production backups to help maintain <em>staging</em> environments (for your
developers and testers).</p>

<p>All in all, such events keep reminding me what it means exactly when we say
that one of the greatest things about <a href="http://www.postgresql.org/">PostgreSQL</a> is its community. If you
don't know what I'm talking about, consider <a href="http://www.postgresql.org/community/">joining</a>!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/pg_staging.html">pg_staging</a> <a href="../../../tags/conferences.html">Conferences</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 09 Nov 2009 09:50:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/11/09-pgdayeu-paris-it-was-awesome.html</guid>
</item>


<item>
  <title>Emacs Muse based publishing</title>
  <link>http://tapoueh.org/blog/2009/10/blog/2009/10/06-emacs-muse-based-publishing.html</link>
  <description><![CDATA[<p>As you might have noticed, this little blog of mine is not compromising much
and entirely maintained from Emacs. Until today, I had to resort to <code>term</code> to
upload my publications, though, as I've been too lazy to hack up the tools
integration for simply doing a single <code>rsync</code> command line. That was one time
too many:</p>

<pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim:muse-rsync-options</span> <span style="color: #bc8f8f;">"-avz"</span>
  <span style="color: #bc8f8f;">"rsync options"</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim:muse-rsync-source</span> <span style="color: #bc8f8f;">"~/dev/muse/out"</span>
  <span style="color: #bc8f8f;">"local path from where to rsync, with no ending /"</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim:muse-rsync-target</span>
  <span style="color: #bc8f8f;">"dim@tapoueh.org:/home/www/tapoueh.org/blog.tapoueh.org"</span>
  <span style="color: #bc8f8f;">"Remote URL to use as rsync target, with no ending /"</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim:muse-rsync-extra-subdirs</span>
  '(<span style="color: #bc8f8f;">"../css"</span> <span style="color: #bc8f8f;">"../images"</span> <span style="color: #bc8f8f;">"../pdf"</span>)
  <span style="color: #bc8f8f;">"static subdirs to rsync too, path from dim:muse-rsync-source, no ending /"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:muse-project-rsync</span> (<span style="color: #228b22;">&amp;optional</span> static)
  <span style="color: #bc8f8f;">"publish tapoueh.org using rsync"</span>
  (interactive <span style="color: #bc8f8f;">"P"</span>)
  (<span style="color: #7f007f;">let*</span> ((rsync-command (format <span style="color: #bc8f8f;">"rsync %s %s %s"</span>
                                dim:muse-rsync-options
                                (concat dim:muse-rsync-source <span style="color: #bc8f8f;">"/"</span>)
                                (concat dim:muse-rsync-target <span style="color: #bc8f8f;">"/"</span>))))
    (<span style="color: #7f007f;">with-current-buffer</span> (get-buffer-create <span style="color: #bc8f8f;">"*muse-rsync*"</span>)
      (erase-buffer)
      (insert (concat rsync-command <span style="color: #bc8f8f;">"\n"</span>))
      (message <span style="color: #bc8f8f;">"%s"</span> rsync-command)
      (insert (shell-command-to-string rsync-command))
      (insert <span style="color: #bc8f8f;">"\n"</span>)

      (<span style="color: #7f007f;">when</span> static
        (<span style="color: #7f007f;">dolist</span> (subdir dim:muse-rsync-extra-subdirs)
          (<span style="color: #7f007f;">let</span> ((cmd (format <span style="color: #bc8f8f;">"rsync %s %s %s"</span>
                             dim:muse-rsync-options
                             (concat dim:muse-rsync-source <span style="color: #bc8f8f;">"/"</span> subdir)
                             dim:muse-rsync-target)))
            (insert (concat cmd <span style="color: #bc8f8f;">"\n"</span>))
            (message <span style="color: #bc8f8f;">"%s"</span> cmd)
            (insert (shell-command-to-string cmd))
            (insert <span style="color: #bc8f8f;">"\n"</span>)))))))

(define-key muse-mode-map (kbd <span style="color: #bc8f8f;">"C-c R"</span>) 'dim:muse-project-rsync)
</pre>

<p>So now to publish this blog, it's just a <code>C-c R</code> away! :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Oct 2009 17:23:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/10/blog/2009/10/06-emacs-muse-based-publishing.html</guid>
</item>
<item>
  <title>Emacs Muse based publishing</title>
  <link>http://tapoueh.org/blog/2009/10/06-emacs-muse-based-publishing.html</link>
  <description><![CDATA[<h1>Emacs Muse based publishing</h1>
<div class="date">Tuesday, October 06 2009, 17:23</div>
</div>
<div id="article">
<p>As you might have noticed, this little blog of mine is not compromising much
and entirely maintained from Emacs. Until today, I had to resort to <code>term</code> to
upload my publications, though, as I've been too lazy to hack up the tools
integration for simply doing a single <code>rsync</code> command line. That was one time
too many:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim:muse-rsync-options</span> <span style="color: #ad7fa8; font-style: italic;">"-avz"</span>
  <span style="color: #888a85;">"rsync options"</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim:muse-rsync-source</span> <span style="color: #ad7fa8; font-style: italic;">"~/dev/muse/out"</span>
  <span style="color: #888a85;">"local path from where to rsync, with no ending /"</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim:muse-rsync-target</span>
  <span style="color: #ad7fa8; font-style: italic;">"dim@tapoueh.org:/home/www/tapoueh.org/blog.tapoueh.org"</span>
  <span style="color: #888a85;">"Remote URL to use as rsync target, with no ending /"</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim:muse-rsync-extra-subdirs</span>
  '(<span style="color: #ad7fa8; font-style: italic;">"../css"</span> <span style="color: #ad7fa8; font-style: italic;">"../images"</span> <span style="color: #ad7fa8; font-style: italic;">"../pdf"</span>)
  <span style="color: #888a85;">"static subdirs to rsync too, path from dim:muse-rsync-source, no ending /"</span>)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:muse-project-rsync</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> static)
  <span style="color: #888a85;">"publish tapoueh.org using rsync"</span>
  (interactive <span style="color: #ad7fa8; font-style: italic;">"P"</span>)
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((rsync-command (format <span style="color: #ad7fa8; font-style: italic;">"rsync %s %s %s"</span>
                                dim:muse-rsync-options
                                (concat dim:muse-rsync-source <span style="color: #ad7fa8; font-style: italic;">"/"</span>)
                                (concat dim:muse-rsync-target <span style="color: #ad7fa8; font-style: italic;">"/"</span>))))
    (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> (get-buffer-create <span style="color: #ad7fa8; font-style: italic;">"*muse-rsync*"</span>)
      (erase-buffer)
      (insert (concat rsync-command <span style="color: #ad7fa8; font-style: italic;">"\n"</span>))
      (message <span style="color: #ad7fa8; font-style: italic;">"%s"</span> rsync-command)
      (insert (shell-command-to-string rsync-command))
      (insert <span style="color: #ad7fa8; font-style: italic;">"\n"</span>)

      (<span style="color: #729fcf; font-weight: bold;">when</span> static
        (<span style="color: #729fcf; font-weight: bold;">dolist</span> (subdir dim:muse-rsync-extra-subdirs)
          (<span style="color: #729fcf; font-weight: bold;">let</span> ((cmd (format <span style="color: #ad7fa8; font-style: italic;">"rsync %s %s %s"</span>
                             dim:muse-rsync-options
                             (concat dim:muse-rsync-source <span style="color: #ad7fa8; font-style: italic;">"/"</span> subdir)
                             dim:muse-rsync-target)))
            (insert (concat cmd <span style="color: #ad7fa8; font-style: italic;">"\n"</span>))
            (message <span style="color: #ad7fa8; font-style: italic;">"%s"</span> cmd)
            (insert (shell-command-to-string cmd))
            (insert <span style="color: #ad7fa8; font-style: italic;">"\n"</span>)))))))

(define-key muse-mode-map (kbd <span style="color: #ad7fa8; font-style: italic;">"C-c R"</span>) 'dim:muse-project-rsync)
</pre>

<p>So now to publish this blog, it's just a <code>C-c R</code> away! :)</p>
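
<p>A note on the trailing slashes, since the variables above insist on having none:
with <code>rsync</code>, a source path ending in <code>/</code> means <em>the contents of that directory</em>,
while a source without it means <em>the directory itself</em>. That's why the main tree
gets a <code>/</code> appended and the extra subdirs don't. Here's a minimal sketch of the
command lines being built; the function name, host and target path are made-up
placeholders for illustration, not the real setup:</p>

<pre class="src">
;; Hypothetical helper with placeholder values, for illustration only.
(defun my-rsync-command-preview ()
  "Return the two kinds of rsync command lines as a list of strings."
  (let ((options "-avz")
        (source  "~/dev/muse/out")
        (target  "user@example.org:/srv/www/blog"))
    (list
     ;; main tree: trailing / on the source, so its contents land in target/
     (format "rsync %s %s %s" options (concat source "/") (concat target "/"))
     ;; extra subdir: no trailing /, so the css directory itself is copied
     (format "rsync %s %s %s" options (concat source "/" "../css") target))))

(my-rsync-command-preview)
</pre>

<p>Evaluating the last form only builds the strings, nothing gets transferred. The
first one ends with <code>~/dev/muse/out/ user@example.org:/srv/www/blog/</code> and the second
one with <code>~/dev/muse/out/../css user@example.org:/srv/www/blog</code>.</p>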


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Oct 2009 17:23:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/10/06-emacs-muse-based-publishing.html</guid>
</item>








<item>
  <title>prefix 1.0.0</title>
  <link>http://tapoueh.org/blog/2009/10/06-prefix-100.html</link>
  <description><![CDATA[<h1>prefix 1.0.0</h1>
<div class="date">Tuesday, October 06 2009, 15:56</div>
</div>
<div id="article">
<p>So there it is, at long last, the final <code>1.0.0</code> release of prefix! It's on its
way into the debian repository (targeting sid, in testing in 10 days) and
available on <a href="http://pgfoundry.org/frs/?group_id=1000352">pgfoundry</a> too.</p>

<p>In order to make it clear that I intend to maintain this version, the number
has 3 digits rather than 2... which is also what <a href="http://www.postgresql.org/support/versioning">PostgreSQL</a> users will
expect.</p>

<p>The only last-minute change is that you can now use the second version of the
two following (the one marked <code>+</code>) rather than the first one:</p>

<pre class="src">
-  create index idx_prefix on prefixes using gist(prefix gist_prefix_range_ops);
+  create index idx_prefix on prefixes using gist(prefix);
</pre>

<p>For your information, I'm thinking about leaving <code>pgfoundry</code> as far as the
source code management goes, because I'd like to be done with <code>CVS</code>. I'd still
use the release file hosting though, at least for now. It's a burden but it's
easier for the users to find the releases there when they are not using plain <code>apt-get
install</code>. That move would lead to hosting <a href="http://pgfoundry.org/projects/prefix/">prefix</a> and <a href="http://pgfoundry.org/projects/pgloader">pgloader</a> and the <a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/backports/">backports</a>
over there at <a href="http://github.com/dimitri">github</a>, where my next pet project, <code>pg_staging</code>, will be hosted
too.</p>

<p>The way I see this move away from <em>pgfoundry</em> is that if everybody does the same,
then migrating the facility to some better or more recent hosting software
will be easier. Maybe some other parts of the system are harder than the
sources to migrate, though. If that's the case I'll consider moving them out
too, maybe getting listed on the <a href="http://www.postgresql.org/download/product-categories">PostgreSQL Software Catalogue</a> will prove
enough as far as web presence goes?</p>



<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/backports.html">backports</a> <a href="../../../tags/pg_staging.html">pg_staging</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/pgloader.html">pgloader</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 06 Oct 2009 15:56:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/10/06-prefix-100.html</guid>
</item>
















<item>
  <title>Emacs is Twinkling here</title>
  <link>http://tapoueh.org/blog/2009/09/blog/2009/09/24-emacs-is-twinkling-here.html</link>
  <description><![CDATA[<p>So you have a <em>rolodex</em> like database in your Emacs, or you have this phone
number in a mail and you want to call it. It happens you have <code>VoIP</code> setup and
you're using <a href="http://www.twinklephone.com/">Twinkle</a> to make your calls. Maybe you'll then find this
function useful:</p>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">twinkle-call-symbol-or-region</span> ()
  <span style="color: #bc8f8f;">"Call the phone number at point (symbol seems good enough), or in region"</span>
  (interactive)
  (shell-command-to-string
   (format <span style="color: #bc8f8f;">"twinkle --cmd 'call %s'"</span>
           (replace-regexp-in-string
            <span style="color: #bc8f8f;">"[</span><span style="color: #bc8f8f;">^</span><span style="color: #bc8f8f;">0-9+]"</span> <span style="color: #bc8f8f;">""</span>
            (<span style="color: #7f007f;">if</span> (use-region-p)
                (buffer-substring (region-beginning) (region-end))
              (thing-at-point 'symbol))))))
</pre>

<p>It happens that <code>symbol</code> is better than <code>word</code> here because some phone numbers
begin with <code>+</code>. And some contain <code>/</code> or <code>.</code> as separators, or some other
variations (spaces) so that the number is easy to read for human eyes. <em>Twinkle</em>
will not like those, which is why the function strips everything but digits and <code>+</code>.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 24 Sep 2009 18:08:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/blog/2009/09/24-emacs-is-twinkling-here.html</guid>
</item>
<item>
  <title>Emacs is Twinkling here</title>
  <link>http://tapoueh.org/blog/2009/09/24-emacs-is-twinkling-here.html</link>
  <description><![CDATA[<h1>Emacs is Twinkling here</h1>
<div class="date">Thursday, September 24 2009, 18:08</div>
</div>
<div id="article">
<p>So you have a <em>rolodex</em> like database in your Emacs, or you have this phone
number in a mail and you want to call it. It happens you have <code>VoIP</code> setup and
you're using <a href="http://www.twinklephone.com/">Twinkle</a> to make your calls. Maybe you'll then find this
function useful:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">twinkle-call-symbol-or-region</span> ()
  <span style="color: #888a85;">"Call the phone number at point (symbol seems good enough), or in region"</span>
  (interactive)
  (shell-command-to-string
   (format <span style="color: #ad7fa8; font-style: italic;">"twinkle --cmd 'call %s'"</span>
           (replace-regexp-in-string
            <span style="color: #ad7fa8; font-style: italic;">"[</span><span style="color: #ad7fa8; font-style: italic;">^</span><span style="color: #ad7fa8; font-style: italic;">0-9+]"</span> <span style="color: #ad7fa8; font-style: italic;">""</span>
            (<span style="color: #729fcf; font-weight: bold;">if</span> (use-region-p)
                (buffer-substring (region-beginning) (region-end))
              (thing-at-point 'symbol))))))
</pre>

<p>It happens that <code>symbol</code> is better than <code>word</code> here because some phone numbers
begin with <code>+</code>. And some contain <code>/</code> or <code>.</code> as separators, or some other
variations (spaces) so that the number is easy to read for human eyes. <em>Twinkle</em>
will not like those, which is why the function strips everything but digits and <code>+</code>.</p>
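
<p>To see what the regexp does on its own, here's a small sketch. The helper name
and the phone numbers are hypothetical, and the <code>stringp</code> guard is an addition that
is not in the function above: <code>thing-at-point</code> can return <code>nil</code> when point is not on
a symbol.</p>

<pre class="src">
;; Hypothetical helper, not part of the function above.
(defun my-clean-phone-number (text)
  "Return TEXT with everything but digits and + removed, or nil."
  (when (stringp text)
    (replace-regexp-in-string "[^0-9+]" "" text)))

(my-clean-phone-number "+33 1.23/45 67")  ; "+331234567"
(my-clean-phone-number "01 23 45 67 89")  ; "0123456789"
(my-clean-phone-number nil)               ; nil
</pre>

<p>Since only digits and <code>+</code> survive the cleanup, nothing that the shell would
interpret can reach the <code>twinkle</code> command line.</p>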


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 24 Sep 2009 18:08:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/24-emacs-is-twinkling-here.html</guid>
</item>
<item>
  <title>Escreen integration</title>
  <link>http://tapoueh.org/blog/2009/09/blog/2009/09/22-escreen-integration.html</link>
  <description><![CDATA[<p>After having used <a href="http://www.morishima.net/~naoto/software/elscreen/">elscreen</a> for a long time, I'm now a very happy user of
<a href="http://www.splode.com/~friedman/software/emacs-lisp/#ui">escreen</a>, which feels much better integrated and allows to have one ring of
recently visited buffers per screen. Which is what you need when using a
<em>screen</em> like feature, really.</p>

<p>At first, it seemed so good as not to require any tweaking, but soon enough
I had to adapt it to my workflow. After all, being able to do this is exactly
why I'm using emacs :)</p>

<p>It began quite simply with things like <code>M-[</code> and <code>M-]</code> to navigate between screens,
and mouse wheel support too, but then I found that the <code>C-\ b</code> list of screens
could also cover what <code>C-\ a</code> does (it runs the command
<code>escreen-get-active-screen-numbers</code>) by just adding some <em>emphasis</em> to
the current escreen in use.</p>

<p>As soon as I had this, and seeing people's eyes blinking when working with me
in front of my computer, I wanted <em>escreen</em> switching to display where I
am in the minibuffer. You have to try the mouse wheel navigation to fully
appreciate it, I guess. Anyway, here it is:</p>

<pre class="src">
(load <span style="color: #bc8f8f;">"escreen"</span>)
(escreen-install)

<span style="color: #b22222;">;; </span><span style="color: #b22222;">add C-\ l to list screens with emphase for current one
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">escreen-get-active-screen-numbers-with-emphasis</span> ()
  <span style="color: #bc8f8f;">"what the name says"</span>
  (interactive)
  (<span style="color: #7f007f;">let</span> ((escreens (escreen-get-active-screen-numbers))
        (emphased <span style="color: #bc8f8f;">""</span>))

    (<span style="color: #7f007f;">dolist</span> (s escreens)
      (setq emphased
            (concat emphased (<span style="color: #7f007f;">if</span> (= escreen-current-screen-number s)
                                 (propertize (number-to-string s)
                                             <span style="color: #b22222;">;;</span><span style="color: #b22222;">'face 'custom-variable-tag) " ")
</span>                                             'face 'info-title-3)
                                             <span style="color: #b22222;">;;</span><span style="color: #b22222;">'face 'font-lock-warning-face)
</span>                                             <span style="color: #b22222;">;;</span><span style="color: #b22222;">'face 'secondary-selection)
</span>                               (number-to-string s))
                    <span style="color: #bc8f8f;">" "</span>)))
    (message <span style="color: #bc8f8f;">"escreen: active screens: %s"</span> emphased)))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-\\ l"</span>) 'escreen-get-active-screen-numbers-with-emphasis)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:escreen-goto-last-screen</span> ()
  (interactive)
  (escreen-goto-last-screen)
  (escreen-get-active-screen-numbers-with-emphasis))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:escreen-goto-prev-screen</span> (<span style="color: #228b22;">&amp;optional</span> n)
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (escreen-goto-prev-screen n)
  (escreen-get-active-screen-numbers-with-emphasis))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:escreen-goto-next-screen</span> (<span style="color: #228b22;">&amp;optional</span> n)
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (escreen-goto-next-screen n)
  (escreen-get-active-screen-numbers-with-emphasis))

(define-key escreen-map escreen-prefix-char 'dim:escreen-goto-last-screen)

(global-set-key (kbd <span style="color: #bc8f8f;">"M-["</span>) 'dim:escreen-goto-prev-screen)
(global-set-key (kbd <span style="color: #bc8f8f;">"M-]"</span>) 'dim:escreen-goto-next-screen)
(global-set-key (kbd <span style="color: #bc8f8f;">"C-\\ DEL"</span>) 'dim:escreen-goto-prev-screen)
(global-set-key (kbd <span style="color: #bc8f8f;">"C-\\ SPC"</span>) 'dim:escreen-goto-next-screen)

(global-set-key '[s-mouse-4] 'dim:escreen-goto-prev-screen)
(global-set-key '[s-mouse-5] 'dim:escreen-goto-next-screen)
</pre>

<p>Oh, and as I'm in the <em>terms in emacs</em> part of the universe (rather than using
<code>emacs -nw</code> in some terminal emulator, which means losing sync between the X clipboard
and the emacs selection), I had to add this too:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">add support for C-\ from terms
</span>(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">term</span>)
(define-key term-raw-map escreen-prefix-char escreen-map)
(define-key term-raw-map (kbd <span style="color: #bc8f8f;">"M-["</span>) 'dim:escreen-goto-prev-screen)
(define-key term-raw-map (kbd <span style="color: #bc8f8f;">"M-]"</span>) 'dim:escreen-goto-next-screen)
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 22 Sep 2009 23:04:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/blog/2009/09/22-escreen-integration.html</guid>
</item>
<item>
  <title>Escreen integration</title>
  <link>http://tapoueh.org/blog/2009/09/22-escreen-integration.html</link>
  <description><![CDATA[<h1>Escreen integration</h1>
<div class="date">Tuesday, September 22 2009, 23:04</div>
</div>
<div id="article">
<p>After having used <a href="http://www.morishima.net/~naoto/software/elscreen/">elscreen</a> for a long time, I'm now a very happy user of
<a href="http://www.splode.com/~friedman/software/emacs-lisp/#ui">escreen</a>, which feels much better integrated and allows to have one ring of
recently visited buffers per screen. Which is what you need when using a
<em>screen</em> like feature, really.</p>

<p>At first, it seemed so good as not to require any tweaking, but soon enough
I had to adapt it to my workflow. After all, being able to do this is exactly
why I'm using emacs :)</p>

<p>It began quite simply with things like <code>M-[</code> and <code>M-]</code> to navigate between screens,
and mouse wheel support too, but then I found that the <code>C-\ b</code> list of screens
could also cover what <code>C-\ a</code> does (it runs the command
<code>escreen-get-active-screen-numbers</code>) by just adding some <em>emphasis</em> to
the current escreen in use.</p>

<p>As soon as I had this, and seeing people's eyes blinking when working with me
in front of my computer, I wanted <em>escreen</em> switching to display where I
am in the minibuffer. You have to try the mouse wheel navigation to fully
appreciate it, I guess. Anyway, here it is:</p>

<pre class="src">
(load <span style="color: #ad7fa8; font-style: italic;">"escreen"</span>)
(escreen-install)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">add C-\ l to list screens with emphase for current one
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">escreen-get-active-screen-numbers-with-emphasis</span> ()
  <span style="color: #888a85;">"what the name says"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((escreens (escreen-get-active-screen-numbers))
        (emphased <span style="color: #ad7fa8; font-style: italic;">""</span>))

    (<span style="color: #729fcf; font-weight: bold;">dolist</span> (s escreens)
      (setq emphased
            (concat emphased (<span style="color: #729fcf; font-weight: bold;">if</span> (= escreen-current-screen-number s)
                                 (propertize (number-to-string s)
                                             <span style="color: #888a85;">;;</span><span style="color: #888a85;">'face 'custom-variable-tag) " ")
</span>                                             'face 'info-title-3)
                                             <span style="color: #888a85;">;;</span><span style="color: #888a85;">'face 'font-lock-warning-face)
</span>                                             <span style="color: #888a85;">;;</span><span style="color: #888a85;">'face 'secondary-selection)
</span>                               (number-to-string s))
                    <span style="color: #ad7fa8; font-style: italic;">" "</span>)))
    (message <span style="color: #ad7fa8; font-style: italic;">"escreen: active screens: %s"</span> emphased)))

(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-\\ l"</span>) 'escreen-get-active-screen-numbers-with-emphasis)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:escreen-goto-last-screen</span> ()
  (interactive)
  (escreen-goto-last-screen)
  (escreen-get-active-screen-numbers-with-emphasis))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:escreen-goto-prev-screen</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> n)
  (interactive <span style="color: #ad7fa8; font-style: italic;">"p"</span>)
  (escreen-goto-prev-screen n)
  (escreen-get-active-screen-numbers-with-emphasis))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:escreen-goto-next-screen</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> n)
  (interactive <span style="color: #ad7fa8; font-style: italic;">"p"</span>)
  (escreen-goto-next-screen n)
  (escreen-get-active-screen-numbers-with-emphasis))

(define-key escreen-map escreen-prefix-char 'dim:escreen-goto-last-screen)

(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"M-["</span>) 'dim:escreen-goto-prev-screen)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"M-]"</span>) 'dim:escreen-goto-next-screen)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-\\ DEL"</span>) 'dim:escreen-goto-prev-screen)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-\\ SPC"</span>) 'dim:escreen-goto-next-screen)

(global-set-key '[s-mouse-4] 'dim:escreen-goto-prev-screen)
(global-set-key '[s-mouse-5] 'dim:escreen-goto-next-screen)
</pre>

<p>Oh, and as I'm in the <em>terms in emacs</em> part of the universe (rather than using
<code>emacs -nw</code> in some terminal emulator, which means losing sync between the X clipboard
and the emacs selection), I had to add this too:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">add support for C-\ from terms
</span>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">term</span>)
(define-key term-raw-map escreen-prefix-char escreen-map)
(define-key term-raw-map (kbd <span style="color: #ad7fa8; font-style: italic;">"M-["</span>) 'dim:escreen-goto-prev-screen)
(define-key term-raw-map (kbd <span style="color: #ad7fa8; font-style: italic;">"M-]"</span>) 'dim:escreen-goto-next-screen)
</pre>
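
<p>The emphasis trick itself doesn't depend on <em>escreen</em> at all. Here's a standalone
sketch of it, assuming a plain list of screen numbers; the function name and the
<code>bold</code> face are choices made for this example, and unlike the code above it uses
<code>mapconcat</code>, so there is no trailing space:</p>

<pre class="src">
;; Hypothetical standalone version, for illustration only.
(defun my-emphasize-current (numbers current)
  "Return NUMBERS as a space separated string, with CURRENT in bold."
  (mapconcat (lambda (n)
               (if (= n current)
                   ;; attach a face to the digits of the current screen only
                   (propertize (number-to-string n) 'face 'bold)
                 (number-to-string n)))
             numbers
             " "))

(message "screens: %s" (my-emphasize-current '(0 1 2) 1))
</pre>

<p>The text is still <code>0 1 2</code>; only the <code>1</code> carries a face, which <code>message</code> keeps
when it displays the string in the echo area.</p>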



<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 22 Sep 2009 23:04:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/22-escreen-integration.html</guid>
</item>
<item>
  <title>Follow-up on dim:mailrc-add-entry</title>
  <link>http://tapoueh.org/blog/2009/09/blog/2009/09/07-follow-up-on-dimmailrc-add-entry.html</link>
  <description><![CDATA[<p><span class="hack"> </span></p>

<p>The function didn't allow for using more than one <code>mailrc</code> file, which isn't a
good idea, so I've just added that. Oh and for <code>gnus</code> integration what I need
is <code>(add-hook 'message-mode-hook 'mail-abbrevs-setup)</code> it seems... so that if
I type the alias it'll get automatically expanded. And to be real lazy and
avoid having to type in the entire alias, <code>mail-abbrev-complete-alias</code> to the
rescue, assigned to some easy to reach keys.</p>

<pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">message</span>)
(define-key message-mode-map (kbd <span style="color: #bc8f8f;">"C-'"</span>) 'mail-abbrev-complete-alias)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:mailrc-add-entry</span> (<span style="color: #228b22;">&amp;optional</span> prefix alias)
  <span style="color: #bc8f8f;">"read email at point and add it to an ~/.mailrc file"</span>
  (interactive <span style="color: #bc8f8f;">"P\nMalias: "</span>)
  (<span style="color: #7f007f;">let*</span> ((default-mailrc (file-name-nondirectory mail-personal-alias-file))
         (mailrc (<span style="color: #7f007f;">if</span> prefix (expand-file-name
                             (read-file-name
                              <span style="color: #bc8f8f;">"Add alias into file: "</span>
                              <span style="color: #bc8f8f;">"~/"</span>
                              default-mailrc
                              t
                              default-mailrc))
                   mail-personal-alias-file))
         (address (thing-at-point 'email-address))
         (buffer (find-file-noselect mailrc t)))
    (<span style="color: #7f007f;">when</span> address
      (<span style="color: #7f007f;">with-current-buffer</span> buffer
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">we don't support updating existing alias in the file
</span>        (<span style="color: #7f007f;">save-excursion</span>
          (goto-char (point-min))
          (<span style="color: #7f007f;">if</span> (search-forward (concat <span style="color: #bc8f8f;">"alias "</span> alias) nil t)
              (<span style="color: #ff0000; font-weight: bold;">error</span> <span style="color: #bc8f8f;">"Alias %s is already present in .mailrc"</span> alias)))

        (<span style="color: #7f007f;">save-current-buffer</span>
          (<span style="color: #7f007f;">save-excursion</span>
            (goto-char (point-max))
            (insert (format <span style="color: #bc8f8f;">"\nalias %s \"%s &lt;%s&gt;\""</span> alias (cdr address) (car address)))))))))
</pre>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 07 Sep 2009 12:50:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/blog/2009/09/07-follow-up-on-dimmailrc-add-entry.html</guid>
</item>
<item>
  <title>Follow-up on dim:mailrc-add-entry</title>
  <link>http://tapoueh.org/blog/2009/09/07-follow-up-on-dimmailrc-add-entry.html</link>
  <description><![CDATA[<h1>Follow-up on dim:mailrc-add-entry</h1>
<div class="date">Monday, September 07 2009, 12:50</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>The function didn't allow for using more than one <code>mailrc</code> file, which isn't a
good idea, so I've just added that. Oh and for <code>gnus</code> integration what I need
is <code>(add-hook 'message-mode-hook 'mail-abbrevs-setup)</code> it seems... so that if
I type the alias it'll get automatically expanded. And to be real lazy and
avoid having to type in the entire alias, <code>mail-abbrev-complete-alias</code> to the
rescue, assigned to some easy to reach keys.</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">message</span>)
(define-key message-mode-map (kbd <span style="color: #ad7fa8; font-style: italic;">"C-'"</span>) 'mail-abbrev-complete-alias)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:mailrc-add-entry</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> prefix alias)
  <span style="color: #888a85;">"read email at point and add it to an ~/.mailrc file"</span>
  (interactive <span style="color: #ad7fa8; font-style: italic;">"P\nMalias: "</span>)
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((default-mailrc (file-name-nondirectory mail-personal-alias-file))
         (mailrc (<span style="color: #729fcf; font-weight: bold;">if</span> prefix (expand-file-name
                             (read-file-name
                              <span style="color: #ad7fa8; font-style: italic;">"Add alias into file: "</span>
                              <span style="color: #ad7fa8; font-style: italic;">"~/"</span>
                              default-mailrc
                              t
                              default-mailrc))
                   mail-personal-alias-file))
         (address (thing-at-point 'email-address))
         (buffer (find-file-noselect mailrc t)))
    (<span style="color: #729fcf; font-weight: bold;">when</span> address
      (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> buffer
        <span style="color: #888a85;">;; </span><span style="color: #888a85;">we don't support updating existing alias in the file
</span>        (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
          (goto-char (point-min))
          (<span style="color: #729fcf; font-weight: bold;">if</span> (search-forward (concat <span style="color: #ad7fa8; font-style: italic;">"alias "</span> alias) nil t)
              (<span style="color: #f57900; font-weight: bold;">error</span> <span style="color: #ad7fa8; font-style: italic;">"Alias %s is already present in .mailrc"</span> alias)))

        (<span style="color: #729fcf; font-weight: bold;">save-current-buffer</span>
          (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
            (goto-char (point-max))
            (insert (format <span style="color: #ad7fa8; font-style: italic;">"\nalias %s \"%s &lt;%s&gt;\""</span> alias (cdr address) (car address)))))))))
</pre>
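
<p>One caveat with the <code>search-forward</code> test above: it matches any alias that merely starts with the new one, so adding <code>jo</code> is refused once <code>john</code> exists. Here's a minimal sketch of a stricter check; <code>dim:mailrc-alias-present-p</code> is a hypothetical helper, not part of the function above, and it assumes one <code>alias</code> entry per line:</p>

<pre class="src">
;; hypothetical helper: exact alias lookup in the current buffer
(defun dim:mailrc-alias-present-p (alias)
  "return non-nil when ALIAS is already defined in the current buffer"
  (save-excursion
    (goto-char (point-min))
    ;; anchor at line start and require whitespace after the alias
    ;; name, so that "jo" does not match "alias john ..."
    (re-search-forward
     (concat "^alias[ \t]+" (regexp-quote alias) "[ \t]")
     nil t)))
</pre>

<p>It could stand in for the <code>search-forward</code> call inside the <code>if</code> form.</p>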


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 07 Sep 2009 12:50:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/07-follow-up-on-dimmailrc-add-entry.html</guid>
</item>
<item>
  <title>Improving ~/.mailrc usage</title>
  <link>http://tapoueh.org/blog/2009/09/blog/2009/09/07-improving-mailrc-usage.html</link>
  <description><![CDATA[<p>So I've been advised to use <code>~/.mailrc</code> for keeping a basic address book in
Emacs, for use within <code>gnus</code> for example. I had to resort to the manual to
find out how to use the file aliases when I need them, that is when
composing a mail. For the record, here's what I had to do:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">mails and aliases
</span>(add-hook 'mail-mode-hook 'mail-abbrevs-setup)
(global-set-key (kbd <span style="color: #bc8f8f;">"C-c @"</span>) 'mail-abbrev-insert-alias)
</pre>

<p>That means I prefer hitting <code>C-c @</code>, then typing the alias in the minibuffer
(with completion) and thereafter seeing the full mail address in my
<code>message-mode</code> buffer. This looks like it'll change over time, but rather than
searching for a nice inline alias completion (<code>M-tab</code> maybe, but that's
already used by the <em>window manager</em>), I've tackled the problem of maintaining
the <code>~/.mailrc</code> file.</p>

<p>Lazy as I am (or I wouldn't be using Emacs this much), having to manually
select the email region in the buffer, open or switch to the <code>mailrc</code> buffer,
then paste my new entry, not forgetting to format it with an <code>alias foo</code> prefix
and to check whether the alias is already in use, didn't strike me as
appealing. Oh, and don't forget to add quotes where they belong, too.</p>

<p>Too much work that I wanted to automate. Here we go:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">automate adding mail at point to ~/.mailrc
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:mailrc-add-entry</span> (alias)
  <span style="color: #bc8f8f;">"read email at point"</span>
  (interactive <span style="color: #bc8f8f;">"Malias: "</span>)
  (<span style="color: #7f007f;">let</span> ((address (thing-at-point 'email-address))
        (buffer (find-file-noselect mail-personal-alias-file t)))
    (<span style="color: #7f007f;">when</span> address
      (<span style="color: #7f007f;">with-current-buffer</span> buffer
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">we don't support updating existing alias in the file
</span>        (<span style="color: #7f007f;">save-excursion</span>
          (goto-char (point-min))
          (<span style="color: #7f007f;">if</span> (search-forward (concat <span style="color: #bc8f8f;">"alias "</span> alias) nil t)
              (<span style="color: #ff0000; font-weight: bold;">error</span> <span style="color: #bc8f8f;">"Alias %s is already present in .mailrc"</span> alias)))

        (<span style="color: #7f007f;">save-current-buffer</span>
          (<span style="color: #7f007f;">save-excursion</span>
            (goto-char (point-max))
            (insert (format <span style="color: #bc8f8f;">"\nalias %s \"%s &lt;%s&gt;\""</span> alias (cdr address) (car address)))))))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-c C-@"</span>) 'dim:mailrc-add-entry)
</pre>

<p>Quite there, you'll notice that I'm using <code>thing-at-point 'email-address</code>, and
maybe you already know that <code>emacs23</code> does not provide this. It provides
<code>thing-at-point 'email</code> which will ignore real name and all. For example,
given a point somewhere inside the right part of <code>John Doe
&lt;johndoe@email.tld&gt;</code> the <code>'email</code> variant of <code>thing-at-point</code> will return
<code>johndoe@email.tld</code>. In words of one syllable: not what I want.</p>

<p>So after searching around for a solution, I saw <code>mail-header-parse-address</code>
from the API-oriented <code>mail-parse</code> library, and finally came up with this dead simple
solution which works fine enough for me:</p>

<pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">mail-parse</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">thing-at-point-bounds-of-email-address</span> ()
  <span style="color: #bc8f8f;">"return a cons of begin and end position of email address at point, including full name"</span>
  (<span style="color: #7f007f;">save-excursion</span>
    (<span style="color: #7f007f;">let*</span> ((search-point (point))
           (start (re-search-backward <span style="color: #bc8f8f;">"[:,]"</span> (line-beginning-position) 'move))
           (dummy (goto-char search-point))
           (end   (re-search-forward  <span style="color: #bc8f8f;">"[:,]"</span> (line-end-position) t)))
      (setq start (<span style="color: #7f007f;">if</span> start (+ 1 start)
                    (line-beginning-position)))
      (<span style="color: #7f007f;">unless</span> end (setq end (line-end-position)))
      (cons start end))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">thing-at-point-email-address</span> ()
  <span style="color: #bc8f8f;">"return full email address at point"</span>
  (<span style="color: #7f007f;">let*</span> ((bounds (thing-at-point-bounds-of-email-address))
         (email-address-text
          (<span style="color: #7f007f;">when</span> bounds (buffer-substring-no-properties (car bounds) (cdr bounds)))))
    (mail-header-parse-address email-address-text)))

(put 'email-address 'bounds-of-thing-at-point 'thing-at-point-bounds-of-email-address)
(put 'email-address 'thing-at-point 'thing-at-point-email-address)
</pre>

<p>Now, when I receive a mail and want to store an alias for it, I simply place
point somewhere in the mail then hit <code>C-c C-@</code>, and <em>voilà</em> my <code>~/.mailrc</code> is
up to date.</p>

<p>Hope it'll be useful for someone else, but at least I'm keeping annotated
history of the files :)</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 07 Sep 2009 01:29:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/blog/2009/09/07-improving-mailrc-usage.html</guid>
</item>
<item>
  <title>Improving ~/.mailrc usage</title>
  <link>http://tapoueh.org/blog/2009/09/07-improving-mailrc-usage.html</link>
  <description><![CDATA[<h1>Improving ~/.mailrc usage</h1>
<div class="date">Monday, September 07 2009, 01:29</div>
</div>
<div id="article">
<p>So I've been advised to use <code>~/.mailrc</code> for keeping a basic address book in
Emacs, for use within <code>gnus</code> for example. I had to resort to the manual to
find out how to use the file aliases when I need them, that is when
composing a mail. For the record, here's what I had to do:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">mails and aliases
</span>(add-hook 'mail-mode-hook 'mail-abbrevs-setup)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-c @"</span>) 'mail-abbrev-insert-alias)
</pre>

<p>That means I prefer hitting <code>C-c @</code>, then typing the alias in the minibuffer
(with completion) and thereafter seeing the full mail address in my
<code>message-mode</code> buffer. This looks like it'll change over time, but rather than
searching for a nice inline alias completion (<code>M-tab</code> maybe, but that's
already used by the <em>window manager</em>), I've tackled the problem of maintaining
the <code>~/.mailrc</code> file.</p>

<p>Lazy as I am (or I wouldn't be using Emacs this much), having to manually
select the email region in the buffer, open or switch to the <code>mailrc</code> buffer,
then paste my new entry, not forgetting to format it with an <code>alias foo</code> prefix
and to check whether the alias is already in use, didn't strike me as
appealing. Oh, and don't forget to add quotes where they belong, too.</p>

<p>Too much work that I wanted to automate. Here we go:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">automate adding mail at point to ~/.mailrc
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:mailrc-add-entry</span> (alias)
  <span style="color: #888a85;">"read email at point"</span>
  (interactive <span style="color: #ad7fa8; font-style: italic;">"Malias: "</span>)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((address (thing-at-point 'email-address))
        (buffer (find-file-noselect mail-personal-alias-file t)))
    (<span style="color: #729fcf; font-weight: bold;">when</span> address
      (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> buffer
        <span style="color: #888a85;">;; </span><span style="color: #888a85;">we don't support updating existing alias in the file
</span>        (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
          (goto-char (point-min))
          (<span style="color: #729fcf; font-weight: bold;">if</span> (search-forward (concat <span style="color: #ad7fa8; font-style: italic;">"alias "</span> alias) nil t)
              (<span style="color: #f57900; font-weight: bold;">error</span> <span style="color: #ad7fa8; font-style: italic;">"Alias %s is already present in .mailrc"</span> alias)))

        (<span style="color: #729fcf; font-weight: bold;">save-current-buffer</span>
          (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
            (goto-char (point-max))
            (insert (format <span style="color: #ad7fa8; font-style: italic;">"\nalias %s \"%s &lt;%s&gt;\""</span> alias (cdr address) (car address)))))))))

(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-c C-@"</span>) 'dim:mailrc-add-entry)
</pre>

<p>Quite there, you'll notice that I'm using <code>thing-at-point 'email-address</code>, and
maybe you already know that <code>emacs23</code> does not provide this. It provides
<code>thing-at-point 'email</code> which will ignore real name and all. For example,
given a point somewhere inside the right part of <code>John Doe
&lt;johndoe@email.tld&gt;</code> the <code>'email</code> variant of <code>thing-at-point</code> will return
<code>johndoe@email.tld</code>. In words of one syllable: not what I want.</p>

<p>So after searching around for a solution, I saw <code>mail-header-parse-address</code>
from the API-oriented <code>mail-parse</code> library, and finally came up with this dead simple
solution which works fine enough for me:</p>

<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">mail-parse</span>)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">thing-at-point-bounds-of-email-address</span> ()
  <span style="color: #888a85;">"return a cons of begin and end position of email address at point, including full name"</span>
  (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
    (<span style="color: #729fcf; font-weight: bold;">let*</span> ((search-point (point))
           (start (re-search-backward <span style="color: #ad7fa8; font-style: italic;">"[:,]"</span> (line-beginning-position) 'move))
           (dummy (goto-char search-point))
           (end   (re-search-forward  <span style="color: #ad7fa8; font-style: italic;">"[:,]"</span> (line-end-position) t)))
      (setq start (<span style="color: #729fcf; font-weight: bold;">if</span> start (+ 1 start)
                    (line-beginning-position)))
      (<span style="color: #729fcf; font-weight: bold;">unless</span> end (setq end (line-end-position)))
      (cons start end))))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">thing-at-point-email-address</span> ()
  <span style="color: #888a85;">"return full email address at point"</span>
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((bounds (thing-at-point-bounds-of-email-address))
         (email-address-text
          (<span style="color: #729fcf; font-weight: bold;">when</span> bounds (buffer-substring-no-properties (car bounds) (cdr bounds)))))
    (mail-header-parse-address email-address-text)))

(put 'email-address 'bounds-of-thing-at-point 'thing-at-point-bounds-of-email-address)
(put 'email-address 'thing-at-point 'thing-at-point-email-address)
</pre>
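
<p>To make the <code>car</code> and <code>cdr</code> calls in <code>dim:mailrc-add-entry</code> easier to follow, here's a small sketch of what the parser hands back. As far as I can tell, <code>mail-header-parse-address</code> returns a <code>(MAILBOX . DISPLAY-NAME)</code> pair; the <code>john</code> alias is made up for the example:</p>

<pre class="src">
(require 'mail-parse)

;; address is a (MAILBOX . DISPLAY-NAME) cons, so car is the
;; mailbox and cdr is the real name
(let ((address (mail-header-parse-address "John Doe &lt;johndoe@email.tld&gt;")))
  (format "alias %s \"%s &lt;%s&gt;\"" "john" (cdr address) (car address)))
;; evaluates to: alias john "John Doe &lt;johndoe@email.tld&gt;"
</pre>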

<p>Now, when I receive a mail and want to store an alias for it, I simply place
point somewhere in the mail then hit <code>C-c C-@</code>, and <em>voilà</em> my <code>~/.mailrc</code> is
up to date.</p>

<p>Hope it'll be useful for someone else, but at least I'm keeping annotated
history of the files :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 07 Sep 2009 01:29:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/09/07-improving-mailrc-usage.html</guid>
</item>
<item>
  <title>hstore-new &amp; preprepare reach debian too</title>
  <link>http://tapoueh.org/blog/2009/08/18-hstore-new--preprepare-reach-debian-too.html</link>
  <description><![CDATA[<h1>hstore-new &amp; preprepare reach debian too</h1>
<div class="date">Tuesday, August 18 2009, 09:14</div>
</div>
<div id="article">
<p><span class="hack"> </span></p>

<p>It seems like debian developers are back from their annual conference and holidays,
so they have had a look at the <code>NEW</code> queue and processed the packages in
there. Two of them were mine and waiting to get into <code>unstable</code>, <a href="http://packages.debian.org/hstore-new">hstore-new</a> and
<a href="http://packages.debian.org/preprepare">preprepare</a>.</p>

<p>Time to do some bug fixing already, as the <code>hstore-new</code> packaging is using a
<em>bash'ism</em> I shouldn't rely on (or so the debian buildfarm is <a href="https://buildd.debian.org/~luk/status/package.php?p=hstore-new">telling me</a>), and
for <code>preprepare</code> I was waiting for inclusion before improving the <code>GUC</code>
management, stealing some code from <a href="http://blog.endpoint.com/search/label/postgres">Selena</a>'s <a href="http://blog.endpoint.com/2009/07/pggearman-01-release.html">pgGearman</a> :)</p>

<p>As some of you wonder about <code>prefix 1.0</code> scheduling, it should soon get there
now that it's been in testing long enough and no bug has been reported. Of course
releasing <code>1.0</code> in August isn't good timing, so maybe I should just wait a few
more weeks.</p>


<h2>Tags</h2>

<p><a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/preprepare.html">preprepare</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 18 Aug 2009 09:14:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/08/18-hstore-new--preprepare-reach-debian-too.html</guid>
</item>
<item>
  <title>Some emacs nifties</title>
  <link>http://tapoueh.org/blog/2009/08/blog/2009/08/03-some-emacs-nifties.html</link>
  <description><![CDATA[<p>First, here's a way to insert at current position the last message printed
into the minibuffer... well not exactly, in <code>*Messages*</code> buffer in fact. I was
tired of doing it myself after invoking, e.g., <code>M-x emacs-version</code>.</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">print last message
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">current-message is already lost by the time this gets called
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:previous-message</span> (<span style="color: #228b22;">&amp;optional</span> nth)
  <span style="color: #bc8f8f;">"get last line of *Messages* buffer"</span>
  (<span style="color: #7f007f;">with-current-buffer</span> (get-buffer <span style="color: #bc8f8f;">"*Messages*"</span>)
    (<span style="color: #7f007f;">save-excursion</span>
      (goto-char (point-max))
      (forward-line (- (or nth 1)))
      (buffer-substring (line-beginning-position) (line-end-position)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:insert-previous-message</span> (<span style="color: #228b22;">&amp;optional</span> nth)
  <span style="color: #bc8f8f;">"insert last message of *Messages* to current position"</span>
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (insert (format <span style="color: #bc8f8f;">"%s"</span> (dim:previous-message nth))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-c m"</span>) 'dim:insert-previous-message)
</pre>

<p>Now I stumbled across <a href="http://planet.emacsen.org/">Planet Emacsen</a> and saw this <a href="http://curiousprogrammer.wordpress.com/2009/07/26/emacs-utility-functions/">Emacs Utility Functions</a>
post, containing a version of <code>duplicate-current-line</code> that I didn't
like... here's mine:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">duplicate current line
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">duplicate-current-line</span> (<span style="color: #228b22;">&amp;optional</span> n)
  <span style="color: #bc8f8f;">"duplicate current line, make more than 1 copy given a numeric argument"</span>
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">save-excursion</span>
    (<span style="color: #7f007f;">let</span> ((nb (or n 1))
          (current-line (thing-at-point 'line)))
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">when on last line, insert a newline first
</span>      (<span style="color: #7f007f;">when</span> (or (= 1 (forward-line 1)) (eq (point) (point-max)))
        (insert <span style="color: #bc8f8f;">"\n"</span>))

      <span style="color: #b22222;">;; </span><span style="color: #b22222;">now insert as many times as requested
</span>      (<span style="color: #7f007f;">while</span> (&gt; nb 0)
        (insert current-line)
        (setq nb (1- nb))))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-d"</span>) 'duplicate-current-line)
</pre>

<p>And a last one inspired by some strange <code>vim</code> behavior for which I fail to see
a need:</p>

<pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">on request by cyrilb, who missed it from vim
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">no global-set-key yet, still have to think I'll use it someday...
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">copy-char-from-prev-line</span> ()
  <span style="color: #bc8f8f;">"Copy char at same position on previous line, when such a line and position exists"</span>
  (interactive)
  (<span style="color: #7f007f;">let</span> ((c)
        (p (- (point) (line-beginning-position))))
    (<span style="color: #7f007f;">save-excursion</span>
      (<span style="color: #7f007f;">when</span> (eq 0 (forward-line -1))
        (<span style="color: #7f007f;">when</span> (&lt; (+ (point) p) (line-end-position))
          (forward-char p)
          (setq c (thing-at-point 'char)))))
    (<span style="color: #7f007f;">when</span> c
      (insert c))))
</pre>

<p>Next time I'll try to talk about <code>rcirc-groups</code> or <code>cssh</code> which have managed to
take some of my free time recently.</p>
]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 03 Aug 2009 15:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/08/blog/2009/08/03-some-emacs-nifties.html</guid>
</item>
<item>
  <title>Some emacs nifties</title>
  <link>http://tapoueh.org/blog/2009/08/03-some-emacs-nifties.html</link>
  <description><![CDATA[<h1>Some emacs nifties</h1>
<div class="date">Monday, August 03 2009, 15:15</div>
</div>
<div id="article">
<p>First, here's a way to insert at current position the last message printed
into the minibuffer... well not exactly, in <code>*Messages*</code> buffer in fact. I was
tired of doing it myself after invoking, e.g., <code>M-x emacs-version</code>.</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">print last message
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">current-message is already lost by the time this gets called
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:previous-message</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> nth)
  <span style="color: #888a85;">"get last line of *Messages* buffer"</span>
  (<span style="color: #729fcf; font-weight: bold;">with-current-buffer</span> (get-buffer <span style="color: #ad7fa8; font-style: italic;">"*Messages*"</span>)
    (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
      (goto-char (point-max))
      (forward-line (- (or nth 1)))
      (buffer-substring (line-beginning-position) (line-end-position)))))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim:insert-previous-message</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> nth)
  <span style="color: #888a85;">"insert last message of *Messages* to current position"</span>
  (interactive <span style="color: #ad7fa8; font-style: italic;">"p"</span>)
  (insert (format <span style="color: #ad7fa8; font-style: italic;">"%s"</span> (dim:previous-message nth))))

(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-c m"</span>) 'dim:insert-previous-message)
</pre>

<p>Now I stumbled across <a href="http://planet.emacsen.org/">Planet Emacsen</a> and saw this <a href="http://curiousprogrammer.wordpress.com/2009/07/26/emacs-utility-functions/">Emacs Utility Functions</a>
post, containing a version of <code>duplicate-current-line</code> that I didn't
like... here's mine:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">duplicate current line
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">duplicate-current-line</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> n)
  <span style="color: #888a85;">"duplicate current line, make more than 1 copy given a numeric argument"</span>
  (interactive <span style="color: #ad7fa8; font-style: italic;">"p"</span>)
  (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
    (<span style="color: #729fcf; font-weight: bold;">let</span> ((nb (or n 1))
          (current-line (thing-at-point 'line)))
      <span style="color: #888a85;">;; </span><span style="color: #888a85;">when on last line, insert a newline first
</span>      (<span style="color: #729fcf; font-weight: bold;">when</span> (or (= 1 (forward-line 1)) (eq (point) (point-max)))
        (insert <span style="color: #ad7fa8; font-style: italic;">"\n"</span>))

      <span style="color: #888a85;">;; </span><span style="color: #888a85;">now insert as many times as requested
</span>      (<span style="color: #729fcf; font-weight: bold;">while</span> (&gt; nb 0)
        (insert current-line)
        (setq nb (1- nb))))))

(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-S-d"</span>) 'duplicate-current-line)
</pre>

<p>And a last one inspired by some strange <code>vim</code> behavior for which I fail to see
a need:</p>

<pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">on request by cyrilb, who missed it from vim
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">no global-set-key yet, still have to think I'll use it someday...
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">copy-char-from-prev-line</span> ()
  <span style="color: #888a85;">"Copy char at same position on previous line, when such a line and position exists"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((c)
        (p (- (point) (line-beginning-position))))
    (<span style="color: #729fcf; font-weight: bold;">save-excursion</span>
      (<span style="color: #729fcf; font-weight: bold;">when</span> (eq 0 (forward-line -1))
        (<span style="color: #729fcf; font-weight: bold;">when</span> (&lt; (+ (point) p) (line-end-position))
          (forward-char p)
          (setq c (thing-at-point 'char)))))
    (<span style="color: #729fcf; font-weight: bold;">when</span> c
      (insert c))))
</pre>

<p>Next time I'll try to talk about <code>rcirc-groups</code> or <code>cssh</code> which have managed to
take some of my free time recently.</p>


<h2>Tags</h2>

<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/rcirc.html">rcirc</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 03 Aug 2009 15:15:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/08/03-some-emacs-nifties.html</guid>
</item>
<item>
  <title>prefix 1.0~rc2 in debian testing</title>
  <link>http://tapoueh.org/blog/2009/08/03-prefix-10rc2-in-debian-testing.html</link>
  <description><![CDATA[<h1>prefix 1.0~rc2 in debian testing</h1>
<div class="date">Monday, August 03 2009, 14:50</div>
</div>
<div id="article">
<p>At long last, <a href="http://packages.debian.org/search?searchon=sourcenames&amp;keywords=prefix">here it is</a>. With binary versions for both <code>postgresql-8.3</code> and
<code>postgresql-8.4</code>! Unfortunately my other packaging efforts are still waiting
in the <code>NEW</code> queue, but I hope to soon see <code>hstore-new</code> and <code>preprepare</code> enter
debian too.</p>

<p>Anyway, the plan for <code>prefix</code> is to now wait something like 2 weeks, then,
barring showstopper bugs, release the final <code>1.0</code> version. If you have a use
for it, now is a good time to test it!</p>

<p>To upgrade an existing <code>prefix</code> installation, the advice is to convert the data to
<code>text</code> instead of <code>prefix_range</code>, remove prefix support, install the new version,
then change the columns' data type back again:</p>

<pre class="src">
BEGIN;
  ALTER TABLE foo
     ALTER COLUMN prefix
             TYPE text USING text(prefix);

  DROP TYPE prefix_range CASCADE;
  \i prefix.sql

  ALTER TABLE foo
     ALTER COLUMN prefix
             TYPE prefix_range USING prefix_range(prefix);

  CREATE INDEX idx_foo_prefix ON foo
         USING gist(prefix gist_prefix_range_ops);
COMMIT;
</pre>

<p>Note: I just made <code>gist_prefix_range_ops</code> the default operator class for type
<code>prefix_range</code>, so specifying it will be optional in the final <code>1.0</code>. I got so
used to typing it I didn't realize we don't have to :)</p>
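
<p>A minimal sketch of what that note implies, reusing the <code>foo</code> table and
<code>idx_foo_prefix</code> index name from the upgrade script above, and assuming a <code>prefix</code>
version where <code>gist_prefix_range_ops</code> already is the default:</p>

<pre class="src">
-- Assumes gist_prefix_range_ops is the default GiST operator class
-- for prefix_range, so it no longer needs to be spelled out.
CREATE INDEX idx_foo_prefix ON foo USING gist(prefix);

-- The explicit form used in the upgrade script should keep working too:
-- CREATE INDEX idx_foo_prefix ON foo
--        USING gist(prefix gist_prefix_range_ops);
</pre>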


<h2>Tags</h2>

<p><a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/preprepare.html">preprepare</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Mon, 03 Aug 2009 14:50:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/08/03-prefix-10rc2-in-debian-testing.html</guid>
</item>
<item>
  <title>prefix 1.0~rc2-1</title>
  <link>http://tapoueh.org/blog/2009/07/09-prefix-10rc2-1.html</link>
  <description><![CDATA[<h1>prefix 1.0~rc2-1</h1>
<div class="date">Thursday, July 09 2009, 12:48</div>
</div>
<div id="article">
<p>I've been having problems building both the <code>postgresql-8.3-prefix</code> and
<code>postgresql-8.4-prefix</code> debian packages from the same source package, and
fixing the packaging issue forced me into modifying the main <code>prefix</code>
<code>Makefile</code>. So while reaching <code>rc2</code>, I tried to think about missing pieces easy
to add this late in the game: and there's one, that's a function
<code>length(prefix_range)</code>, so that you no longer have to cast to text in the
following widespread query:</p>

<pre class="src">
  SELECT foo, bar
    FROM prefixes
   WHERE prefix @&gt; <span style="color: #ad7fa8; font-style: italic;">'012345678'</span>
ORDER BY length(prefix) DESC
   LIMIT 1;
</pre>
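
<p>For comparison, here is a sketch of the same lookup as it presumably had to be
written before, with the explicit cast to <code>text</code> that the benchmark below also
uses. The <code>foo</code> and <code>bar</code> columns are the same placeholders as above:</p>

<pre class="src">
-- Same longest-prefix lookup, but casting to text to get a length,
-- as needed when length(prefix_range) is not available.
  SELECT foo, bar
    FROM prefixes
   WHERE prefix @&gt; '012345678'
ORDER BY length(prefix::text) DESC
   LIMIT 1;
</pre>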

<p>And here's a simple stupid benchmark of the new function, which ships in
<a href="http://prefix.projects.postgresql.org/prefix-1.0~rc2.tar.gz">prefix-1.0~rc2.tar.gz</a>. And it'll soon reach debian, if my QA dept agrees (my
<a href="http://julien.danjou.info/blog/">sponsor</a> is a QA dept all by himself!).</p>

<p>First some preparation:</p>

<pre class="src">
dim=#   create table prefixes (
dim(#          prefix    prefix_range primary key,
dim(#          name      text not null,
dim(#          shortname text,
dim(#          status    char default <span style="color: #ad7fa8; font-style: italic;">'S'</span>,
dim(#
dim(#          check( status in (<span style="color: #ad7fa8; font-style: italic;">'S'</span>, <span style="color: #ad7fa8; font-style: italic;">'R'</span>) )
dim(#   );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "prefixes_pkey" for
 table "prefixes"
CREATE TABLE
Time: 74,357 ms
dim=#   \copy prefixes from <span style="color: #ad7fa8; font-style: italic;">'prefixes.fr.csv'</span> with delimiter ; csv quote <span style="color: #ad7fa8; font-style: italic;">'"'</span>
Time: 200,982 ms
dim=# select count(*) from prefixes ;
 count
<span style="color: #888a85;">-------
</span> 11966
(1 row)
Time: 3,047 ms
</pre>

<p>And now for the micro-benchmark:</p>

<pre class="src">
dim=# \o /dev/null
dim=# select length(prefix) from prefixes;
Time: 16,040 ms
dim=# select length(prefix::text) from prefixes;
Time: 23,364 ms
dim=# \o
</pre>

<p>Hope you enjoy!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 09 Jul 2009 12:48:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/07/09-prefix-10rc2-1.html</guid>
</item>
<item>
  <title>prefix extension reaches 1.0 (rc1)</title>
  <link>http://tapoueh.org/blog/2009/06/23-prefix-extension-reaches-10-rc1.html</link>
  <description><![CDATA[<h1>prefix extension reaches 1.0 (rc1)</h1>
<div class="date">Tuesday, June 23 2009, 10:53</div>
</div>
<div id="article">
<p>At long last, after millions and millions of queries just here at work and
some more in other places, the <a href="prefix.html">prefix</a> project is reaching the <code>1.0</code> milestone. The
release candidate is being uploaded into debian at the time of this
writing, and is available here: <a href="http://prefix.projects.postgresql.org/prefix-1.0~rc1.tar.gz">prefix-1.0~rc1.tar.gz</a>.</p>

<p>If you have any use for it (as some <em>VoIP</em> companies have already), please
consider testing it, in order for me to release a shiny <code>1.0</code> next week! :)</p>

<p>Recent changes include getting rid of the square brackets in the output when they're
not necessary, fixing btree operators, and adding support for more operators in
the <code>GiST</code> support code (now supported: <code>@&gt;</code>, <code>&lt;@</code>, <code>=</code>, <code>&amp;&amp;</code>). Enjoy!</p>
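
<p>To give an idea of what those operators look like in a query, here is a sketch
against a hypothetical <code>prefixes</code> table with a <code>prefix_range</code> column named
<code>prefix</code>. The readings given in the comments (contains, is contained by, equals,
overlaps) are the usual PostgreSQL meanings of these operators and are an
assumption as far as the exact semantics of the extension go:</p>

<pre class="src">
-- @&gt;  rows whose prefix contains (is a prefix of) the given number
SELECT * FROM prefixes WHERE prefix @&gt; '0123456789';

-- &lt;@  rows whose prefix is contained by the given, shorter prefix
SELECT * FROM prefixes WHERE prefix &lt;@ '01'::prefix_range;

-- =   rows whose prefix is exactly the given one
SELECT * FROM prefixes WHERE prefix = '0123'::prefix_range;

-- &amp;&amp;  rows whose prefix overlaps the given one
SELECT * FROM prefixes WHERE prefix &amp;&amp; '012'::prefix_range;
</pre>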


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 23 Jun 2009 10:53:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/06/23-prefix-extension-reaches-10-rc1.html</guid>
</item>
<item>
  <title>PgCon 2009</title>
  <link>http://tapoueh.org/blog/2009/05/27-pgcon-2009.html</link>
  <description><![CDATA[<h1>PgCon 2009</h1>
<div class="date">Wednesday, May 27 2009, 14:30</div>
</div>
<div id="article">
<p>I can't really compare <a href="http://www.pgcon.org/2009/">PgCon 2009</a> with previous years' editions: the last time I
enjoyed the event was in 2006, in Toronto. But still I found the
experience to be a great one, and I hope I'll be there next year too!</p>

<p>I've met a lot of well-known people from the community, some of whom I already had
the chance to run into in Toronto or <a href="http://2008.pgday.org/en/">Prato</a>, but this was the first time I
got to talk to many of them about interesting projects and ideas. That alone
was awesome already, and we also had a lot of talks to listen to: as others
have said, it was really hard to have to choose only one room out
of three.</p>

<p>I'm now back home and seem to be recovering quite well from jet lag, and I've
even begun to work on the todo list from the conference. It mainly includes
<code>Skytools 3</code> testing and contributions (code and documentation),
<a href="http://wiki.postgresql.org/wiki/ExtensionPackaging">Extension Packaging</a> work (Stephen Frost seems to be willing to help, which I
highly appreciate) beginning with <a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00912.php">search_path issues</a>, and posting some
backtrace to help fix some <a href="http://archives.postgresql.org/pgsql-hackers/2009-05/msg00923.php">SPI_connect()</a> bug at <code>_PG_init()</code> time in an
extension.</p>

<p>The excellent <a href="http://wiki.postgresql.org/wiki/PgCon_2009_Lightning_talks">lightning talk</a> about <u>How not to Review a Patch</u> by Joshua
Tolley took me out of the <em>dim</em>, I'll try to be <em>bright</em> enough and participate
as a reviewer in later commit fests (well, maybe not the very next ones, as
some personal events on the agenda will take all my <em>&quot;free&quot;</em> time)...</p>

<p>Oh and the <a href="http://code.google.com/p/golconde/">Golconde</a> presentation gave some insights too: this queueing based
solution compares to the <code>listen/notify</code> mechanisms we already have in
<a href="http://www.postgresql.org/docs/current/static/sql-listen.html">PostgreSQL</a>, in the sense that it's not transactional, and the events are
kept in memory only to achieve very high distribution rates. So it's a very
fine solution to manage a distributed caching system, for example, but not
so much for asynchronous replication (you must not replicate events tied
to rolled back transactions).</p>

<p>So all in all, spending last week in Ottawa was a splendid way to get more
involved in the PostgreSQL community, which is a very fine place to
spend one's free time, if you ask me. See you soon!</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/skytools.html">skytools</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Wed, 27 May 2009 14:30:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/05/27-pgcon-2009.html</guid>
</item>
<item>
  <title>Prepared Statements and pgbouncer</title>
  <link>http://tapoueh.org/blog/2009/05/14-prepared-statements-and-pgbouncer.html</link>
  <description><![CDATA[<h1>Prepared Statements and pgbouncer</h1>
<div class="date">Thursday, May 14 2009</div>
</div>
<div id="article">
<p>On the performance mailing list, a recent <a href="http://archives.postgresql.org/pgsql-performance/2009-05/msg00026.php">thread</a> drew my attention. It
ended up being about using connection pooling software and prepared statements
in order to increase scalability of PostgreSQL when confronted with a lot of
concurrent clients all doing simple <code>select</code> queries. The advantage of the
<em>pooler</em> is to reduce the number of <em>backends</em> needed to serve the queries, thus
reducing PostgreSQL internal bookkeeping. Of course, my choice of software
here is clear: <a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer">PgBouncer</a> is an excellent top grade solution: it performs really
well (it won't parse queries) and is reliable and flexible.</p>

<p>The problem is that while combining <code>pgbouncer</code> and <a href="http://www.postgresql.org/docs/current/static/sql-prepare.html">prepared statements</a> is
possible, it requires the application to check at connection time if the
statements it's interested in are already prepared. This can be done by a
simple catalog query of this kind:</p>

<pre class="src">
  SELECT name
    FROM pg_prepared_statements
   WHERE name IN (<span style="color: #ad7fa8; font-style: italic;">'my'</span>, <span style="color: #ad7fa8; font-style: italic;">'prepared'</span>, <span style="color: #ad7fa8; font-style: italic;">'statements'</span>);
</pre>

<p>Well, this is simple but requires adding some application logic. What would
be great would be to only have to <code>EXECUTE my_statement(x, y, z)</code> and never
bother whether the <code>backend</code> connection is a fresh new one or an existing one, so
as to avoid having to check if the application should <code>prepare</code>.</p>
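
<p>To make that application logic concrete, here is a minimal sketch of what a
client has to do by itself on each new connection. The statement body is a made-up
placeholder that only adds its three arguments, so that the example needs no
table:</p>

<pre class="src">
-- 1. On each new connection, check whether the statement already exists
--    in the backend the pooler handed over.
SELECT name
  FROM pg_prepared_statements
 WHERE name = 'my_statement';

-- 2. Only when the query above returns no row, prepare the statement.
--    (Placeholder body: it just adds its arguments.)
PREPARE my_statement(int, int, int) AS
  SELECT $1 + $2 + $3 AS total;

-- 3. From then on the application can simply execute it.
EXECUTE my_statement(1, 2, 3);
</pre>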

<p>The <a href="http://preprepare.projects.postgresql.org/">preprepare</a> pgfoundry project is all about this: it comes with a
<code>prepare_all()</code> function which will take all statements present in a given
table (<code>SET preprepare.relation TO 'schema.the_table';</code>) and prepare them for
you. If you now tell <code>pgbouncer</code> to please call the function at <code>backend</code>
creation time, you're done (see <code>connect_query</code>).</p>
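
<p>A sketch of how the pieces might fit together follows. The <code>pool.statements</code>
table, its <code>name</code> and <code>statement</code> columns and the idea that each row stores a full
<code>PREPARE</code> command are assumptions made for illustration; the README is the
reference for the layout <code>preprepare</code> really expects:</p>

<pre class="src">
-- Hypothetical table holding the statements to prepare; the column
-- names and contents are assumptions, check the preprepare README.
CREATE SCHEMA pool;
CREATE TABLE pool.statements (
  name      text PRIMARY KEY,
  statement text NOT NULL
);

INSERT INTO pool.statements (name, statement)
     VALUES ('my_statement',
             'PREPARE my_statement(int, int, int) AS SELECT $1 + $2 + $3');

-- Point preprepare at that table, then prepare everything it lists.
-- With pgbouncer, the prepare_all() call is what connect_query would run.
SET preprepare.relation TO 'pool.statements';
SELECT prepare_all();

-- The application no longer has to check anything first.
EXECUTE my_statement(1, 2, 3);
</pre>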

<p>There's even a detailed <a href="http://preprepare.projects.postgresql.org/README.html">README</a> file, but no release yet (check out the code
in the <a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/preprepare/preprepare/">CVS</a>; the <code>pgfoundry</code> project page has <a href="http://pgfoundry.org/scm/?group_id=1000442">clear instructions</a> about how to do so).</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/preprepare.html">preprepare</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Thu, 14 May 2009 00:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/05/14-prepared-statements-and-pgbouncer.html</guid>
</item>
<item>
  <title>Skytools 3.0 reaches alpha1</title>
  <link>http://tapoueh.org/blog/2009/04/14-skytools-30-reaches-alpha1.html</link>
  <description><![CDATA[<h1>Skytools 3.0 reaches alpha1</h1>
<div class="date">Tuesday, April 14 2009</div>
</div>
<div id="article">
<p>It's time for <a href="http://wiki.postgresql.org/wiki/Skytools">Skytools</a> news again! First, we improved the documentation of
the current stable branch by hosting high level presentations and <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">tutorials</a> on
the <a href="http://wiki.postgresql.org/">PostgreSQL wiki</a>. Do check out the <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste Tutorial</a>, it seems that's
what people hesitating to try out londiste were missing the most.</p>

<p>The other things people miss a lot in the current stable Skytools (version
<code>2.1.9</code> currently) are cascading replication (which allows for <em>switchover</em> and
<em>failover</em>) and <code>DDL</code> support. The new incarnation of skytools, version <code>3.0</code>,
<a href="http://lists.pgfoundry.org/pipermail/skytools-users/2009-April/001029.html">reaches alpha1</a> today. It comes with full support for <em>cascading</em> and <em>DDL</em>, so
you might want to give it a try.</p>

<p>It's a rough release: documentation still has to be written for a large part
of it, and bugs are still to be fixed. But it's all in the Skytools spirit:
simple and efficient concepts, easy to use and maintain. Think about this
release as a <em>developer preview</em> and join us :)</p>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/skytools.html">skytools</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 14 Apr 2009 00:00:00 +0200</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/04/14-skytools-30-reaches-alpha1.html</guid>
</item>
<item>
  <title>Prefix GiST index now in 8.1</title>
  <link>http://tapoueh.org/blog/2009/02/10-prefix-gist-index-now-in-81.html</link>
  <description><![CDATA[<h1>Prefix GiST index now in 8.1</h1>
<div class="date">Tuesday, February 10 2009</div>
</div>
<div id="article">
<p>The <a href="http://blog.tapoueh.org/prefix.html">prefix</a> project is about matching a <em>literal</em> against <em>prefixes</em> in your
table, the typical example being a telecom routing table. Thanks to the
excellent work around <em>generic</em> indexes in PostgreSQL with <a href="http://www.postgresql.org/docs/current/static/gist-intro.html">GiST</a>, indexing
prefix matches is easy to support in an external module. Which is what
the <a href="http://prefix.projects.postgresql.org/">prefix</a> extension is all about.</p>

<p>Maybe you didn't come across this project before, so here's the typical
query you want to run to benefit from the special indexing, where the <code>@&gt;</code>
operator is read <em>contains</em> or <em>is a prefix of</em>:</p>

<pre class="src">
  SELECT * FROM prefixes WHERE prefix @&gt; <span style="color: #ad7fa8; font-style: italic;">'0123456789'</span>;
</pre>
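
<p>For that query to benefit from the special indexing, the table needs a <code>GiST</code>
index on the prefix column. Here is a minimal sketch, assuming a <code>prefix_range</code>
column and the <code>gist_prefix_range_ops</code> operator class; the table layout and the
index name are hypothetical, and the module version at hand may expose different
names:</p>

<pre class="src">
-- Hypothetical routing table; only the prefix column matters here.
CREATE TABLE prefixes (
  prefix prefix_range PRIMARY KEY,
  name   text NOT NULL
);

-- GiST index meant to support the @&gt; lookups.
CREATE INDEX idx_prefixes_prefix ON prefixes
       USING gist(prefix gist_prefix_range_ops);
</pre>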

<p>Now, a user asked about an <code>8.1</code> version of the module, as it's what some
distributions ship (here, Red Hat Enterprise Linux 5.2). It turned out it
was easy to support <code>8.1</code> when you already support <code>8.2</code>, so the <code>CVS</code> now hosts
<a href="http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/prefix/prefix/">8.1 support code</a>. And here's what the user asking about the feature has to
say:</p>

<blockquote>
<p class="quoted">
It's works like a charm now with 3ms queries over 200,000+ rows.  The speed
also stays less than 4ms when doing complex queries designed for fallback,
priority shuffling, and having multiple carriers.</p>

</blockquote>


<h2>Tags</h2>

<p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/prefix.html">prefix</a></p>


</div>

]]></description>
  <author>dim@tapoueh.org (Dimitri Fontaine)</author>
  <pubDate>Tue, 10 Feb 2009 00:00:00 +0100</pubDate>
  <guid isPermaLink="true">http://tapoueh.org/blog/2009/02/10-prefix-gist-index-now-in-81.html</guid>
</item>
<item>
  <title>Importing XML content from file</title>
  <link>http://tapoueh.org/blog/2009/02/05-importing-xml-content-from-file.html</link>
  <description><![CDATA[<h1>Importing XML content from file</h1>
<div class="date">Thursday, February 05 2009</div>
</div>
<div id="article">
<p>The problem was raised this week on <a href="http://www.postgresql.org/community/irc">IRC</a> and this time again I felt it would
be a good occasion for a blog entry: how to load an <code>XML</code> file content into a
single field?</p>

<p>The usual tool used to import files is <a href="http://www.postgresql.org/docs/current/interactive/sql-copy.html">COPY</a>, but it'll want each line of the
file to host a text representation of a database tuple, so it doesn't apply
to the case at hand. <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> was online and offered t