<?xml version="1.0" encoding="utf-8"?> <rss version="2.0">

<channel> <title>tail -f /dev/dim</title> <link>http://tapoueh.org/index.html</link> <description>Dimitri Fontaine's blog</description> <language>en-us</language> <generator>Emacs Muse</generator> <item> <title>from Parsing to Compiling</title> <link>http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</link> <description><![CDATA[<p>Last week came with two bank holidays in a row, and I took the opportunity to design a <em>command language</em> for <a href="../../../pgsql/pgloader.html">pgloader</a>. While doing that, I unexpectedly stumbled across a very nice <em>AHAH!</em> moment, and I now want to share it with you, dear reader.</p>

<center> <p><img src="../../../images/lightbulb.gif" alt=""></p> </center> <center> <p><em>AHAH, you'll see!</em></p> </center> <p>The general approach I'm following code-wise with that <em>command language</em> is to first build an API exposing the capabilities of the system, then plug the <em>command language</em> into that API thanks to a <em>parser</em>. It turns out that doing so in <em>Common Lisp</em> is really easy, and that you get a <em>compiler</em> for free too, while at it. Let's see about that.</p> <h3>A very simple toy example</h3> <p class="first">In this newsgroup article <a href="https://groups.google.com/forum/?fromgroups=#&#33;topic/comp.lang.lisp/JJxTBqf7scU">What is symbolic compoutation?</a>, <a href="http://informatimago.com/">Pascal Bourguignon</a> proposed a very simple piece of code:</p> <pre class="src"> (<span style="color: #fcaf3e;">defparameter</span> <span style="color: #fce94f;">*additive-color-graph*</span>
  '((red   (red white)   (green yellow) (blue magenta))
    (green (red yellow)  (green white)  (blue cyan))
    (blue  (red magenta) (green cyan)   (blue white))))

(<span style="color: #fcaf3e;">defun</span> <span style="color: #729fcf;">symbolic-color-add</span> (a b)
  (cadr (assoc a (cdr (assoc b *additive-color-graph*))))) </pre>
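<p>For readers more at home in Python, here is a rough transcription of that snippet. It is a sketch and not part of the original article. The nested association lists become nested dicts, which is a convenience assumption. The lookup order, <em>b</em> first and then <em>a</em>, follows the Lisp code.</p>

```python
# A rough Python transcription of the Lisp snippet above: the association
# lists become nested dicts, keyed first by color B, then by color A.
ADDITIVE_COLOR_GRAPH = {
    "red": {"red": "white", "green": "yellow", "blue": "magenta"},
    "green": {"red": "yellow", "green": "white", "blue": "cyan"},
    "blue": {"red": "magenta", "green": "cyan", "blue": "white"},
}


def symbolic_color_add(a, b, graph=ADDITIVE_COLOR_GRAPH):
    """Mirror (cadr (assoc a (cdr (assoc b graph)))): look up B, then A.

    Returns None for unknown colors, much like ASSOC yields NIL.
    """
    return graph.get(b, {}).get(a)


print(symbolic_color_add("red", "green"))  # yellow
```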

<p>This is an example of <em>symbolic computation</em>, and we're going to build a little <em>language</em> to express the data and the code. Not that we need one, mind you: it's just a really simple example leading us to the <em>ahah</em> moment you're now waiting for.</p> <p>Before we dive into the main topic, you have to realize that the previous code example actually works. It defines some data, using an implicit data structure made of nested lists. It also defines a function that knows how to find its way in that anonymous data structure so as to combine two colors:</p> <pre class="src"> TOY-PARSER&gt; (symbolic-color-add 'red 'green)
YELLOW </pre> <h3>A command language and parser</h3> <p class="first">I decided to go with the following <em>language</em>:</p> <pre class="src"> color red +red white +green yellow +blue magenta
color green +red yellow +green white +blue cyan
color blue +red magenta +green cyan +blue white

mix red and green </pre>

<p>And here's what part of the parser looks like, using the <a href="http://nikodemus.github.io/esrap/">esrap</a> <em>packrat</em> parser library:</p> <pre class="src"> (defrule color-name (and whitespaces (+ (alpha-char-p character)))
  (<span style="color: #729fcf;">:destructure</span> (ws name)
    (<span style="color: #fcaf3e;">declare</span> (ignore ws))          <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces
</span>    <span style="color: #888a85;">;; </span><span style="color: #888a85;">CL symbols default to upper case.
</span>    (intern (string-upcase (coerce name 'string)) <span style="color: #729fcf;">:toy-parser</span>)))

<span style="color: #888a85;">;;; </span><span style="color: #888a85;">parse string "+ red white"
</span>(defrule color-mix (and whitespaces <span style="color: #73d216;">"+"</span> color-name color-name)
  (<span style="color: #729fcf;">:destructure</span> (ws plus color-added color-obtained)
    (<span style="color: #fcaf3e;">declare</span> (ignore ws plus))     <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces and keywords
</span>    (list color-added color-obtained)))

<span style="color: #888a85;">;;; </span><span style="color: #888a85;">mix red and green
</span>(defrule mix-two-colors (and kw-mix color-name kw-and color-name)
  (<span style="color: #729fcf;">:destructure</span> (mix c1 and c2)
    (<span style="color: #fcaf3e;">declare</span> (ignore mix and))     <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore keywords
</span>    (list c1 c2))) </pre>
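<p>Here is a way to make the shape of the parser's output concrete without esrap. It is a hypothetical hand-written Python parser for the same little language. It is neither the toy-parser code nor a packrat parser: it is a regular-expression tokenizer plus one method per rule above. Each method returns plain nested lists, with upper-cased strings standing in for interned symbols.</p>

```python
import re

# Tokens: the "+" marker or a run of ASCII letters. Everything else,
# whitespace and newlines included, is skipped by findall() in this sketch.
TOKEN = re.compile(r"\+|[A-Za-z]+")


class Parser:
    """Hypothetical hand-written parser, one method per rule in the article."""

    def __init__(self, source):
        self.tokens = TOKEN.findall(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, word):
        if self.peek() != word:
            raise SyntaxError(f"expected {word!r}, got {self.peek()!r}")
        self.pos += 1

    def color_name(self):
        # Upcase the name, as interning a CL symbol would by default.
        name = self.peek()
        if name is None or not name.isalpha():
            raise SyntaxError(f"expected a color name, got {name!r}")
        self.pos += 1
        return name.upper()

    def color_mix(self):
        # "+red white" becomes ["RED", "WHITE"].
        self.expect("+")
        return [self.color_name(), self.color_name()]

    def color(self):
        # "color red +green yellow" becomes ["RED", ["GREEN", "YELLOW"]].
        self.expect("color")
        entry = [self.color_name()]
        while self.peek() == "+":
            entry.append(self.color_mix())
        return entry

    def mix_two_colors(self):
        # "mix red and green" becomes ["RED", "GREEN"].
        self.expect("mix")
        c1 = self.color_name()
        self.expect("and")
        return [c1, self.color_name()]

    def program(self):
        # Any number of color definitions, then exactly one mix.
        graph = []
        while self.peek() == "color":
            graph.append(self.color())
        mix = self.mix_two_colors()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing input {self.peek()!r}")
        return [graph, mix]


print(Parser("color red +green yellow mix green and red").program())
# [[['RED', ['GREEN', 'YELLOW']]], ['GREEN', 'RED']]
```

<p>Keeping the output to plain lists is the whole point. The result is data that the rest of the program can inspect, rewrite, or even treat as code.</p>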

<p>Those <em>rules</em> are not the whole parser: go have a look at the project on GitHub if you want to see the whole code, it's called <a href="https://github.com/dimitri/toy-parser">toy-parser</a> over there. The main idea here is to show that when we parse a line of our little language, we produce the simplest possible structured data: in Lisp, that's <em>symbols</em> and <em>lists</em>.</p> <p>The reason why that makes sense is the next rule:</p> <center> <p><img src="../../../images/the-one-ring.jpg" alt=""></p> </center> <center> <p><em>The one grammar rule to bind them all</em></p> </center> <pre class="src"> (defrule program (and colors mix-two-colors)

  (<span style="color: #729fcf;">:destructure</span> (graph (c1 c2))
    `(<span style="color: #fcaf3e;">lambda</span> ()
       (<span style="color: #fcaf3e;">let</span> ((*additive-color-graph* ',graph))
         (symbolic-color-add ',c1 ',c2))))) </pre>

<p>This is the one rule to bind them all. It uses a <em>quasiquote</em>, a basic Lisp syntax element that makes it very easy for the programmer to produce data that looks exactly like code. Let's see how it goes with a very simple example:</p> <pre class="src"> TOY-PARSER&gt; (pprint (parse 'program
                            <span style="color: #73d216;">"color red +green yellow mix green and red"</span>))

(LAMBDA NIL
  (LET ((*ADDITIVE-COLOR-GRAPH* '((RED (GREEN YELLOW)))))
    (SYMBOLIC-COLOR-ADD 'RED 'GREEN)))
</pre> <p>The parser produces structured (nested) data that really looks like Lisp code, right? So maybe we can just run that code...</p> <h3>What about a compiler now?</h3> <center> <p><img src="../../../images/aha.jpg" alt=""></p> </center> <center> <p><em>Here is my AHAH moment!</em></p> </center> <p>Let's see about actually running the code:</p> <pre class="src"> TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>)
                   (program (parse 'program code)))
              (compile nil program))
#&lt;Anonymous Function #x3020027CF0EF&gt;
NIL
NIL
TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>)
                   (program (parse 'program code)))
              (funcall (compile nil program)))
YELLOW </pre>
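<p>Python has no quasiquote, but the same trick can be approximated with its standard <em>ast</em> module and the built-in <em>compile()</em>. This sketch is an analogue built on assumptions, not the author's code. <em>quote()</em>, <em>generate()</em> and <em>compile_program()</em> are hypothetical helpers. They turn the parser's nested lists into the syntax tree of a zero-argument lambda, which Python's own compiler then compiles and evaluates into a callable. Note that CPython compiles to bytecode, not to native code as the Lisp implementation used here does.</p>

```python
import ast


def assoc(key, alist):
    # Like CL's ASSOC on lists whose first element is the key; None for NIL.
    return next((entry for entry in alist if entry[0] == key), None)


def symbolic_color_add(a, b, graph):
    # Same lookup order as the Lisp function: B first, then A.
    entry_b = assoc(b, graph)
    entry_a = assoc(a, entry_b[1:]) if entry_b else None
    return entry_a[1] if entry_a else None


def quote(value):
    # Turn nested lists of strings into AST literals, a bit like ',graph.
    if isinstance(value, list):
        return ast.List(elts=[quote(v) for v in value], ctx=ast.Load())
    return ast.Constant(value=value)


def generate(parsed):
    # Build the tree for: lambda: symbolic_color_add(c1, c2, graph)
    graph, (c1, c2) = parsed
    call = ast.Call(
        func=ast.Name(id="symbolic_color_add", ctx=ast.Load()),
        args=[quote(c1), quote(c2), quote(graph)],
        keywords=[],
    )
    no_args = ast.arguments(posonlyargs=[], args=[], vararg=None,
                            kwonlyargs=[], kw_defaults=[], kwarg=None,
                            defaults=[])
    tree = ast.Expression(body=ast.Lambda(args=no_args, body=call))
    return ast.fix_missing_locations(tree)


def compile_program(parsed):
    # Python's own compiler turns the tree into a code object; evaluating
    # it yields the function, with symbolic_color_add visible as a global.
    code = compile(generate(parsed), "toy-program", "eval")
    return eval(code, {"symbolic_color_add": symbolic_color_add})


# The parser output for "color red +green yellow mix green and red".
func = compile_program([[["RED", ["GREEN", "YELLOW"]]], ["GREEN", "RED"]])
print(func())  # YELLOW
```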

<p>So we have a string representing code in our very little language, and a parser that knows how to turn it into a nested list of atoms that looks like Lisp code. And as we have Lisp, we can compile that code at run-time with the same compiler that we used to build our parser, and then funcall the function we just built.</p> <p>Oh, and the function is actually compiled down to native code, of course:</p> <pre class="src"> TOY-PARSER&gt; (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix red and green"</span>)
                   (program (parse 'program code))
                   (func (compile nil program)))
              (time (<span style="color: #fcaf3e;">loop</span> repeat 1000 do (funcall func))))
(<span style="color: #fcaf3e;">LOOP</span> REPEAT 1000 DO (FUNCALL FUNC)) took 108 microseconds (0.000108 seconds) to run.
During that period, and with 4 available CPU cores,
     105 microseconds (0.000105 seconds) were spent in user mode
      13 microseconds (0.000013 seconds) were spent in system mode
NIL </pre>

<p>Yeah, it took the whole of 108 microseconds to actually run the code generated by our own <em>parser</em> <strong>a thousand times</strong>, on my laptop. That's about a tenth of a microsecond per call, which seems like the right ballpark for native code.</p> <h3>Conclusion</h3> <p class="first">The <a href="https://github.com/dimitri/toy-parser">toy-parser</a> code is there on <em>GitHub</em>, and you can load it using <a href="http://www.quicklisp.org/">Quicklisp</a>. Clone the repository into ~/quicklisp/local-projects/, then evaluate (ql:quickload &quot;toy-parser&quot;) and play with it after (in-package :toy-parser).</p> <p>The only thing I still want to say here is this: can your programming language of choice make it that easy?</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 13 May 2013 11:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</guid> </item> <item> <title>Nearest Big City</title> <link>http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</link> <description><![CDATA[<p>In this article, we want to find the town with the greatest number of inhabitants near a given location.</p>

<center> <p><img src="../../../images/global_accessibility-640.png" alt=""></p> </center> <h3>A very localized example</h3> <p class="first">We first need to find and import some data, and I found a <a href="http://www.lion1906.com/Pages/francais/utile/telechargements.html">CSV listing of French cities with coordinates and population</a>, along with some other numbers of interest for the exercise here.</p> <p>To import the data set, we first need a table, then a COPY command:</p> <pre class="src"> <span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">lion1906</span> (
  insee       <span style="color: #c17d11;">text</span>,
  nom         <span style="color: #c17d11;">text</span>,
  altitude    <span style="color: #c17d11;">integer</span>,
  code_postal <span style="color: #c17d11;">text</span>,
  longitude   <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>,
  latitude    <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>,
  pop99       <span style="color: #c17d11;">bigint</span>,
  surface     <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>
);

\<span style="color: #729fcf;">copy</span> lion1906 <span style="color: #fcaf3e;">from</span> <span style="color: #73d216;">'villes.csv'</span> <span style="color: #fcaf3e;">with</span> <span style="color: #729fcf;">csv</span> <span style="color: #729fcf;">header</span> <span style="color: #729fcf;">delimiter</span> <span style="color: #73d216;">';'</span> <span style="color: #729fcf;">encoding</span> <span style="color: #73d216;">'latin1'</span> </pre>

<p>With that data in place, we can find the 10 towns nearest to a place of our choosing. Let's pick <em>Villeurbanne</em>, which is in the <em>Lyon</em> area.</p> <pre class="src">
<span style="color: #fcaf3e;">select</span> code_postal, nom, pop99
  <span style="color: #fcaf3e;">from</span> lion1906
<span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) &lt;-&gt;
         (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude)
            <span style="color: #fcaf3e;">from</span> lion1906
           <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>)
   <span style="color: #fcaf3e;">limit</span> 10;

 code_postal |          nom           | pop99
-------------+------------------------+--------
 69100       | Villeurbanne           | 124215
 69300       | Caluire-et-Cuire       |  41233
 69120       | Vaulx-en-Velin         |  39154
 69580       | Sathonay-Camp          |   4336
 69140       | Rillieux-la-Pape       |  28367
 69000       | Lyon                   | 445452
 69500       | Bron                   |  37369
 69580       | Sathonay-Village       |   1693
 01700       | Neyron                 |   2157
 69660       | Collonges-au-Mont-d'Or |   3420
(10 rows)
</pre>
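<p>On PostgreSQL points, the &lt;-&gt; operator computes plain Euclidean distance. The query above therefore ranks towns by distance in the (longitude, latitude) plane, not by great-circle distance. Here is a minimal Python sketch of the same ranking. The coordinates and the <em>nearest()</em> helper are made up for illustration.</p>

```python
import heapq
import math

# Made-up sample rows: (name, longitude, latitude, population).
TOWNS = [
    ("Alpha", 0.0, 0.0, 1000),
    ("Beta", 1.0, 0.0, 50000),
    ("Gamma", 0.0, 2.0, 300),
    ("Delta", 5.0, 5.0, 900000),
]


def nearest(towns, origin, k):
    """The k towns closest to origin, like ORDER BY distance LIMIT k.

    Uses Euclidean distance in the (longitude, latitude) plane, as the
    PostgreSQL point distance operator does; heapq.nsmallest avoids
    sorting the whole list.
    """
    return heapq.nsmallest(k, towns, key=lambda t: math.dist((t[1], t[2]), origin))


print([t[0] for t in nearest(TOWNS, (0.0, 0.0), 3)])  # ['Alpha', 'Beta', 'Gamma']
```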

<p>Lyon is in that list, and we now want the query to return only that one, as it has the greatest number of inhabitants in the list:</p> <pre class="src"> <span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
  <span style="color: #fcaf3e;">select</span> code_postal, nom, pop99
    <span style="color: #fcaf3e;">from</span> lion1906
  <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) &lt;-&gt;
           (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude)
              <span style="color: #fcaf3e;">from</span> lion1906
             <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>)
     <span style="color: #fcaf3e;">limit</span> 10
)
<span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> pop99 <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1;

 code_postal | nom  | pop99
-------------+------+--------
 69000       | Lyon | 445452
(1 row)
</pre>
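<p>The same <em>nearest ten, then biggest one</em> shape can be tried out without PostgreSQL, for instance with Python's bundled SQLite. This is only a sketch with hypothetical rows. SQLite has neither a point type nor the &lt;-&gt; operator, so the squared Euclidean distance is spelled out by hand. Squaring preserves the ordering of non-negative distances, so it ranks rows the same way.</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table towns (nom text, longitude real, latitude real, pop99 integer)")
db.executemany(
    "insert into towns values (?, ?, ?, ?)",
    [  # made-up rows, for illustration only
        ("Origin", 0.0, 0.0, 1000),
        ("Near-small", 0.1, 0.0, 200),
        ("Near-big", 0.0, 0.2, 80000),
        ("Far-huge", 9.0, 9.0, 2000000),
    ],
)

# Nearest k towns first (squared distance, no sqrt needed to rank them),
# then keep the most populated one among those.
QUERY = """
with neighbours as (
    select nom, pop99
      from towns,
           (select longitude as x0, latitude as y0
              from towns where nom = :origin) as o
  order by (longitude - o.x0) * (longitude - o.x0)
         + (latitude - o.y0) * (latitude - o.y0)
     limit :k
)
select nom, pop99 from neighbours order by pop99 desc limit 1
"""

print(db.execute(QUERY, {"origin": "Origin", "k": 3}).fetchone())  # ('Near-big', 80000)
```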

<p>Well, thank you PostgreSQL, that was easy!</p> <p>Note that such queries can actually be served by an index: that's called a <em>KNN index</em>. PostgreSQL can use some kinds of indexes (a GiST index, for instance) to fetch rows directly in the order given by an expression such as ORDER BY a &lt;-&gt; b. That makes a <em>KNN</em> (k-nearest neighbours) search a realistic option in your application.</p> <h3>Let's get worldwide</h3> <p class="first">The real goal of our exercise is to associate every known town in the world with some big city nearby. So let's first fetch and import some worldwide data, this time from <a href="http://download.maxmind.com/download/worldcities/worldcitiespop.txt.gz">http://download.maxmind.com/download/worldcities/worldcitiespop.txt.gz</a>.</p> <center> <p><img src="../../../images/map_nearest_city_01.gif" alt=""></p> </center> <pre class="src"> <span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">maxmind_worldcities</span> (

  country_code <span style="color: #c17d11;">text</span>,
  city_lower   <span style="color: #c17d11;">text</span>,
  city_normal  <span style="color: #c17d11;">text</span>,
  region_code  <span style="color: #c17d11;">text</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">''</span>,
  population   <span style="color: #c17d11;">INT</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>,
  latitude     <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>,
  longitude    <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>
);

\<span style="color: #729fcf;">copy</span> maxmind_worldcities <span style="color: #fcaf3e;">FROM</span> <span style="color: #73d216;">'/tmp/worldcitiespop.txt'</span> <span style="color: #fcaf3e;">WITH</span> <span style="color: #729fcf;">DELIMITER</span> <span style="color: #73d216;">','</span> <span style="color: #729fcf;">QUOTE</span> E<span style="color: #73d216;">'\f'</span> <span style="color: #729fcf;">CSV</span> <span style="color: #729fcf;">HEADER</span> <span style="color: #729fcf;">ENCODING</span> <span style="color: #73d216;">'LATIN1'</span>;

<span style="color: #729fcf;">alter</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_worldcities</span> <span style="color: #729fcf;">add</span> <span style="color: #fcaf3e;">column</span> loc <span style="color: #c17d11;">point</span>;
<span style="color: #729fcf;">update</span> maxmind_worldcities <span style="color: #729fcf;">set</span> loc = <span style="color: #c17d11;">point</span>(longitude, latitude); </pre>

<p>This time I created an extra column holding the <em>location</em>, so that I don't have to compute it each time I need it, as I did before.</p> <p>Now is the time to test that data set, hopefully getting the same result as before, when we only had French cities loaded:</p> <pre class="src"> <span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
  <span style="color: #fcaf3e;">select</span> country_code, city_lower, population
    <span style="color: #fcaf3e;">from</span> maxmind_worldcities
   <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
  <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc &lt;-&gt; (<span style="color: #fcaf3e;">select</span> loc
                      <span style="color: #fcaf3e;">from</span> maxmind_worldcities
                     <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>)
     <span style="color: #fcaf3e;">limit</span> 10
)
<span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1;

 country_code | city_lower | population
--------------+------------+------------
 fr           | lyon       |     463700
(1 row)
</pre>

<p>Ok, looks like we're all set for the real problem. Now we want to pick, for each of those cities, its biggest neighbour among the nearest ones, so here's how to do that:</p> <pre class="src"> <span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities(country_code, region_code, city_lower);
<span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities <span style="color: #fcaf3e;">using</span> gist(loc);

<span style="color: #fcaf3e;">create</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_neighbours</span> <span style="color: #fcaf3e;">as</span>
<span style="color: #fcaf3e;">select</span> country_code, region_code, city_lower,
       (<span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (
          <span style="color: #fcaf3e;">select</span> country_code, city_lower, population
            <span style="color: #fcaf3e;">from</span> maxmind_worldcities
           <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
             <span style="color: #fcaf3e;">and</span> country_code = wc.country_code
             <span style="color: #fcaf3e;">and</span> region_code = wc.region_code
        <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc &lt;-&gt; wc.loc
           <span style="color: #fcaf3e;">limit</span> 10
        )
        <span style="color: #fcaf3e;">select</span> city_lower
          <span style="color: #fcaf3e;">from</span> neighbours
      <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span>
         <span style="color: #fcaf3e;">limit</span> 1
       ) <span style="color: #fcaf3e;">as</span> neighbour
  <span style="color: #fcaf3e;">from</span> maxmind_worldcities wc; </pre>
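<p>Here is a hypothetical in-memory Python equivalent that spells out the logic of that correlated subquery. For each city, take the 10 nearest cities with a known population in the same country and region, and keep the most populated one. The row layout, the sample data and the <em>biggest_neighbours()</em> helper are assumptions for illustration. As in the SQL, a city may well end up as its own biggest neighbour.</p>

```python
import heapq
import math
from collections import defaultdict

# Rows: (country_code, region_code, city_lower, (lon, lat), population),
# population being None when unknown, like NULL in maxmind_worldcities.
CITIES = [  # made-up sample rows
    ("xx", "01", "hamlet", (0.0, 0.0), None),
    ("xx", "01", "town", (0.5, 0.0), 5000),
    ("xx", "01", "metropolis", (1.0, 1.0), 900000),
    ("xx", "01", "capital", (50.0, 50.0), 5000000),
    ("xx", "02", "village", (0.1, 0.1), 300),
]


def biggest_neighbours(cities, k=10):
    # Only cities with a known population can be picked, and only within
    # the same (country_code, region_code), as in the WHERE clause.
    populated = defaultdict(list)
    for city in cities:
        if city[4] is not None:
            populated[city[0], city[1]].append(city)

    result = []
    for country, region, name, loc, _ in cities:
        # The k nearest candidates by Euclidean distance ...
        candidates = heapq.nsmallest(
            k, populated[country, region], key=lambda c: math.dist(c[3], loc))
        # ... then the most populated one; no candidate gives None, as the
        # scalar subquery would give NULL.
        biggest = max(candidates, key=lambda c: c[4], default=None)
        result.append((country, region, name, biggest[2] if biggest else None))
    return result


for row in biggest_neighbours(CITIES, k=2):
    print(row)
```

<p>Grouping the populated cities by (country_code, region_code) up front plays the role of the WHERE clause filters. Each group is still scanned in full, though, which is the part that an index-assisted KNN search can avoid.</p>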

<p>To be fair, I have to tell you that this query took almost 2 hours to complete on my laptop. As I'm doing that for a friend and a blog article, I've been lazy and didn't try to optimise it. It could be using LATERAL for sure; I don't know whether that would help much with performance, as I didn't try.</p> <p>With that in hand, we can now check some cities and their <em>biggest</em> neighbours, as in the following query:</p> <pre class="src"> <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>;

 country_code | region_code |  city_lower  | neighbour
--------------+-------------+--------------+-----------
 fr           | B9          | villeurbanne | lyon
(1 row)
</pre>

<p>And looking for New York City suburbs, I found a <em>chinatown</em>, which is apparently a pretty common name for smaller towns:</p> <pre class="src"> <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'chinatown'</span>;

 country_code | region_code | city_lower |   neighbour
--------------+-------------+------------+---------------
 sb           | 08          | chinatown  | honiara
 us           | CA          | chinatown  | san francisco
 us           | DC          | chinatown  | washington
 us           | HI          | chinatown  | honolulu
 us           | IL          | chinatown  | chicago
 us           | MT          | chinatown  | missoula
 us           | NV          | chinatown  | reno
 us           | NY          | chinatown  | new york
(8 rows)
</pre>

<h3>Big Cities in the big world</h3> <center> <p><img src="../../../images/Old-Photos-of-Big-Cities-21.jpg" alt=""></p> </center> <center> <p><em>We might need to change some of our views</em></p> </center> <p>So, let's see how many smaller towns a few randomly picked big cities have:</p> <pre class="src">
<span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*)
  <span style="color: #fcaf3e;">from</span> maxmind_neighbours
 <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">in</span> (<span style="color: #73d216;">'london'</span>, <span style="color: #73d216;">'new york'</span>, <span style="color: #73d216;">'moscow'</span>, <span style="color: #73d216;">'paris'</span>,
                     <span style="color: #73d216;">'tokyo'</span>, <span style="color: #73d216;">'sao polo'</span>, <span style="color: #73d216;">'chicago'</span>)
<span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour;

 country_code | region_code | neighbour | count
--------------+-------------+-----------+-------
 gb           | H9          | london    |     2
 jp           | 40          | tokyo     |   414
 us           | NY          | new york  |   131
 ca           | 08          | london    |    16
 ru           | 48          | moscow    |   245
 fr           | A8          | paris     |    16
 us           | IL          | chicago   |    13
(7 rows) </pre>
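<p>The GROUP BY with count(*) above amounts to counting identical (country_code, region_code, neighbour) triples. That is also why two different londons show up: they live in different regions. Here is a quick Python sketch with collections.Counter and made-up rows. Its most_common(n) method gives the same result as ORDER BY count DESC LIMIT n.</p>

```python
from collections import Counter

# Made-up rows of maxmind_neighbours: (country_code, region_code, neighbour),
# with None standing for a NULL neighbour.
ROWS = [
    ("fr", "A8", "paris"),
    ("fr", "A8", "paris"),
    ("gb", "H9", "london"),
    ("ca", "08", "london"),
    ("ca", "08", "london"),
    ("fr", "B9", None),
]

# GROUP BY country_code, region_code, neighbour with count(*),
# skipping NULL neighbours.
counts = Counter(row for row in ROWS if row[2] is not None)

# ORDER BY count DESC LIMIT n is what Counter.most_common(n) provides.
for (country, region, neighbour), n in counts.most_common(2):
    print(country, region, neighbour, n)
```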

<p>And now, to be fair, let's see which cities have the greatest number of towns nearby, with the following query:</p> <pre class="src">
<span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*)
  <span style="color: #fcaf3e;">from</span> maxmind_neighbours
 <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span>
<span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour
<span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> 4 <span style="color: #fcaf3e;">desc</span>
 <span style="color: #fcaf3e;">limit</span> 25;

 country_code | region_code | neighbour  | count
--------------+-------------+------------+-------
 cn           | 03          | nanchang   | 16759
 cn           | 26          | xian       | 12864
 id           | 18          | kupang     | 10715
 cn           | 24          | taiyuan    | 10550
 mm           | 11          | taunggyi   | 10253
 id           | 38          | makasar    |  9471
 ir           | 15          | ahvaz      |  9461
 id           | 01          | banda aceh |  9161
 cn           | 14          | lasa       |  8841
 cn           | 15          | lanzhou    |  8618
 ir           | 29          | kerman     |  8579
 id           | 26          | medan      |  7787
 ir           | 04          | iranshahr  |  7249
 ir           | 07          | shiraz     |  7219
 ma           | 55          | agadir     |  7121
 ir           | 42          | mashhad    |  7107
 af           | 08          | gazni      |  7011
 ir           | 33          | tabriz     |  6586
 cn           | 01          | hefei      |  6521
 bd           | 81          | dhaka      |  6480
 ir           | 08          | rasht      |  6471
 id           | 17          | mataram    |  6467
 id           | 33          | cilegon    |  6287
 af           | 23          | qandahar   |  6213
 cn           | 07          | fuzhou     |  6089
(25 rows)
</pre> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 02 May 2013 11:34:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</guid> </item> <item> <title>Emacs Conference</title> <link>http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</link> <description><![CDATA[<p>Yes, it did happen, for real, in London: the <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a>. It was Easter weekend, and yet more than 60 people met and spent a full day talking about <a href="http://www.gnu.org/software/emacs/">Emacs</a>. If you weren't there, a live stream was available. Soon enough (wait for about two weeks) the videos will be published, as <a href="http://sachachua.com/blog/">Sacha</a> is working on it.</p>

<center> <p><img src="../../../images/toplap-small.png" alt=""></p> </center> <p>The conference was packed with awesome content, really. Among the things I'm going home with are new thoughts, tips and tricks, and new modes to use in Emacs.</p> <p>The main new thought is all about learning to program. That's a problem space I have a growing interest in. The conference talk about <em>arxana</em> showed that it should be possible to build an environment where you learn programming with the excuse of having fun with maths. And after talking about music, its notation and <a href="http://www.lilypond.org/">lilypond</a>, it should even be possible to offer an interactive programming environment where not only can you play music live, as <a href="http://meta-ex.com/">Meta-Ex</a> is doing, but where the other output of your program would be the updated music scores.</p> <p>The main practical bit I'm going home with is <a href="http://www.foldr.org/~michaelw/projects/redshank">redshank</a>, <em>A collection of code-wrangling Emacs macros mostly geared towards Common Lisp, but some are useful for other Lisp dialects, too</em>. It complements <a href="http://mumble.net/~campbell/emacs/paredit.el">paredit</a> and makes some reformatting very easy.</p> <p>Lastly, I'm giving the dark background environment another try. I think I prefer the contrast and richer color set of the default Emacs color theme, but the black window has a classy visual effect too. And with <a href="https://github.com/jasonm23/emacs-mainline">main-line</a> the effect is quite awesome!</p> <center> <p><img src="../../../images/Emacs-Tango-2-Main-Line.png" alt=""></p> </center> <center> <p><em>Look at that!</em></p> </center> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 02 Apr 2013 09:56:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</guid> </item> <item> <title>The Need For Speed</title> <link>http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</link> <description><![CDATA[<p>Yesterday was the <a href="http://www.postgresql-sessions.org/en/5/start">fifth edition</a> of the conference organized by <em>dalibo</em>, where outside speakers are regularly invited. Yesterday's theme was both clear and very broad: performance.</p>

<p>I had the pleasure of giving a talk entitled "The Need for Speed". It puts the optimization effort back into its business context, so as to weigh costs against benefits and to know not only what to expect but also when to stop.</p> <center> <p><a class="image-link" href="../../../images/confs/the_need_for_speed.pdf"> <img src="../../../images/confs/the_need_for_speed-3.png"></a></p> </center> <p>Thanks to <em>dalibo</em> for this conference!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Mar 2013 09:49:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</guid> </item> <item> <title>Bulk Replication</title> <link>http://tapoueh.org/blog/2013/03/18-bulk-replication.html</link> <description><![CDATA[<p>In the previous article here, we talked about how to <em>properly</em> update more than one row at a time, under the title <a href="http://tapoueh.org/blog/2013/03/15-batch-update.html">Batch Update</a>. We considered performance, including network round trips, and looked at how our solution behaves when used concurrently.</p>

<center> <p><img src="../../../images/clock-key.jpg" alt=""></p> </center> <p>One case where we want to apply the previous article's approach is when replicating data with a <em>trigger-based solution</em>, such as <a href="http://wiki.postgresql.org/wiki/SkyTools">SkyTools</a> and <a href="https://github.com/markokr/skytools">londiste</a>. Well, maybe not in all cases: the amount of UPDATE traffic needs to be worth setting the solution up for. As soon as we know we're going to <em>replay</em> large enough batches of events, though, using the <em>batch update</em> tricks certainly makes sense.</p> <p>It so happens that londiste 3 includes the capability to use <em>handlers</em>. Those are plugins written in <em>python</em> (like all the client-side code from <em>SkyTools</em>) whose job is to handle the <em>processing</em> of the event batches. Several of them are included in the <a href="https://github.com/markokr/skytools/tree/master/python/londiste">londiste sources</a>, and one of them is named bulk.py.</p> <h3>Bulk loading data with londiste</h3> <p class="first">To use it, set in londiste.ini:</p> <pre class="src"> <span style="color: #fce94f;">handler_modules</span> = londiste.handlers.bulk </pre> <p>then add tables with one of those commands:</p> <pre class="src"> londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk"</span>
londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk(method=X)"</span> </pre> <p>The default method is 0, and the available methods are the following:</p> <p><em>correct</em> (0)</p> <ul> <li>inserts as COPY into table</li> <li>updates as COPY into temp table and single UPDATE from there</li> <li>deletes as COPY into temp table and single DELETE from there</li> </ul> <p><em>delete</em> (1)</p> <ul> <li>as <em>correct</em>, but <em>updates</em> are done as DELETE then COPY</li> </ul> <p><em>merged</em> (2)</p> <ul> <li>as <em>delete</em>, but merge <em>insert</em> rows with <em>update</em> rows</li> </ul> 
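<p>Here is one way to make the difference between those methods concrete. This hypothetical Python sketch turns a batch of row events into the list of bulk operations each method describes. It is neither londiste's bulk.py nor its API. The event format, the step labels and the order of the steps are assumptions of this sketch, which also assumes each primary key appears at most once per batch.</p>

```python
# Hypothetical planner, not londiste's bulk.py: it only restates the three
# methods listed above. Events are (op, key, row) tuples with op one of
# "I", "U" or "D"; each key is assumed to appear at most once per batch,
# and the order of the steps is this sketch's own choice.
CORRECT, DELETE, MERGED = 0, 1, 2


def plan(events, method=CORRECT):
    def pick(op, field):
        # field 1 is the primary key, field 2 the full row
        return [event[field] for event in events if event[0] == op]

    inserts, updates = pick("I", 2), pick("U", 2)
    steps = [("delete via temp table", pick("D", 1))]
    if method == CORRECT:
        steps.append(("copy into table", inserts))
        steps.append(("update via temp table", updates))
    else:
        # DELETE and MERGED: updated rows are deleted, then copied back in.
        steps.append(("delete via temp table", pick("U", 1)))
        if method == DELETE:
            steps.append(("copy into table", inserts))
            steps.append(("copy into table", updates))
        else:
            # MERGED: insert and update rows share a single COPY.
            steps.append(("copy into table", inserts + updates))
    # Steps with nothing to send are dropped.
    return [step for step in steps if step[1]]


batch = [("I", 1, (1, "a")), ("U", 2, (2, "b")), ("D", 3, None)]
for method in (CORRECT, DELETE, MERGED):
    print(method, plan(batch, method))
```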
<h3>Conclusion</h3> <center> <p><img src="../../../images/londiste.jpg" alt=""></p> </center> <p>Yes, by using that <em>handler</em>, which ships by default with <em>londiste</em>, you apply the previous article's tricks in your replication solution. And you can even choose to use it for only some of the tables you are replicating.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 18 Mar 2013 14:54:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/18-bulk-replication.html</guid> </item> <item> <title>Batch Update</title> <link>http://tapoueh.org/blog/2013/03/15-batch-update.html</link> <description><![CDATA[<p>Performance consulting involves some tricks that you have to teach over and over again. One of them is that SQL is much better at processing many rows in a single statement than at running as many statements, each one against a single row.</p>

<center> <p><img src="../../../images/Home-Brewing.jpg" alt=""></p> </center> <center> <p><em>Another kind of Batch to update</em></p> </center> <p>So when you need to UPDATE a bunch of rows from a given source, remember that you can actually use a JOIN in the <em>update</em> statement. Either the source of data is already in the database, in which case it's as simple as using the FROM clause in the <em>update</em> statement, or it's not, and we're getting back to that in a minute.</p> <h3>UPDATE FROM</h3> <p class="first">It's all about using that FROM clause in an <em>update</em> statement, right?</p> <pre class="src">

<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
<span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
<span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
<span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>; </pre>
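<p>To check that the single statement does the same job as a loop of per-row updates, here is a small experiment with Python's built-in sqlite3 module, used as a stand-in for PostgreSQL. SQLite only supports UPDATE ... FROM since version 3.33, so the set-based update is written with correlated subqueries instead; it is still one statement joining <em>target</em> and <em>source</em>. The table contents and function names are made up for the example.</p> <pre class="src">
```python
# Experiment with Python's sqlite3 module, standing in for PostgreSQL.
# SQLite only has UPDATE ... FROM since 3.33, so the set-based version uses
# correlated subqueries, which is still a single statement over the batch.
import sqlite3

SETUP = """
CREATE TABLE target (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL);
CREATE TABLE source (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL);
INSERT INTO target VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO source VALUES (1, 1), (3, 3);
"""


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def update_row_by_row(conn):
    # What we want to avoid: one UPDATE statement per source row.
    for sid, delta in conn.execute("SELECT id, counter FROM source").fetchall():
        conn.execute("UPDATE target SET counter = counter + ? WHERE id = ?",
                     (delta, sid))
    conn.commit()


def update_set_based(conn):
    # One UPDATE statement for the whole source table.
    conn.execute("""
        UPDATE target
           SET counter = counter + (SELECT s.counter FROM source s
                                     WHERE s.id = target.id)
         WHERE id IN (SELECT id FROM source)
    """)
    conn.commit()


def contents(conn):
    return conn.execute("SELECT id, counter FROM target ORDER BY id").fetchall()


loop_db, set_db = fresh_db(), fresh_db()
update_row_by_row(loop_db)
update_set_based(set_db)
print(contents(loop_db) == contents(set_db), contents(set_db))
# True [(1, 11), (2, 20), (3, 33)]
```
</pre>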

<p>Using that, you can update thousands of rows in the <em>target</em> table with a single statement, and you can't really get much faster than that.</p> <h3>Preparing the Batch</h3> <p class="first">Now, if the source data happens to live in your application process' memory, you might think the previous trick does you no good. Well, it turns out that pushing your in-memory data into the database and then joining against that now local source of data is generally faster than looping in the application and paying a whole network <em>round trip</em> per row.</p> <center> <p><img src="../../../images/round-trip.png" alt=""></p> </center> <center> <p><em>What about</em> <strong><em>that</em></strong> <em>round trip?</em></p> </center> <p>Let's see how it goes:</p> <pre class="src"> <span style="color: #729fcf;">BEGIN</span>;

<span style="color: #fcaf3e;">CREATE</span> <span style="color: #729fcf;">TEMP</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">source</span>(<span style="color: #fcaf3e;">LIKE</span> target <span style="color: #729fcf;">INCLUDING</span> <span style="color: #fcaf3e;">ALL</span>) <span style="color: #fcaf3e;">ON</span> <span style="color: #729fcf;">COMMIT</span> <span style="color: #729fcf;">DROP</span>;

<span style="color: #729fcf;">COPY</span> <span style="color: #729fcf;">source</span> <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">STDIN</span>;

<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
<span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
<span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
<span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>;

<span style="color: #729fcf;">COMMIT</span>; </pre>
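<p>The same pattern can be sketched in Python with the sqlite3 module, again as a stand-in for PostgreSQL: executemany() plays the role of COPY ... FROM STDIN, and as SQLite has no ON COMMIT DROP the temporary table is dropped explicitly. The batch_update function name and the (id, delta) row format are assumptions made for this illustration.</p> <pre class="src">
```python
# Hypothetical sketch with Python's sqlite3 module, standing in for PostgreSQL:
# executemany() plays the role of COPY ... FROM STDIN, and the temp table is
# dropped explicitly because SQLite has no ON COMMIT DROP.
import sqlite3

UPDATE_FROM_SOURCE = """
UPDATE target
   SET counter = counter + (SELECT sum(s.counter) FROM source s
                             WHERE s.id = target.id)
 WHERE id IN (SELECT id FROM source)
"""


def batch_update(conn, rows):
    """Add each (id, delta) pair of an in-memory batch to target.counter."""
    conn.execute("CREATE TEMP TABLE source (id INTEGER, counter INTEGER)")
    try:
        with conn:  # commit on success, roll back on error
            # Push the whole batch to the database in one call...
            conn.executemany("INSERT INTO source VALUES (?, ?)", rows)
            # ...then apply it with one set-based UPDATE; sum() folds
            # duplicate ids of the batch into a single delta.
            conn.execute(UPDATE_FROM_SOURCE)
    finally:
        conn.execute("DROP TABLE source")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
conn.commit()

batch_update(conn, [(1, 5), (3, 1), (3, 2)])  # id 3 appears twice
print(conn.execute("SELECT id, counter FROM target ORDER BY id").fetchall())
# [(1, 15), (2, 20), (3, 33)]
```
</pre> <p>Note how the duplicate id 3 in the batch gets folded into a single delta by sum(): with PostgreSQL's UPDATE ... FROM, a target row joining several source rows is updated from only one of them, and which one is not predictable.</p>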

<p>As we're talking about performance, the trick here is to use the <a href="http://www.postgresql.org/docs/9.2/static/sql-copy.html">COPY</a> protocol to fill in the <em>temporary table</em> we just created to hold our data. So we're now sending the whole data set to a temporary location in the database, then using that as the UPDATE source. That's way faster than running a separate UPDATE statement per row in your batch, even for small batches.</p> <p>Also, rather than using the SQL COPY command, you might want to look up the docs of the PostgreSQL driver you are using in your application: it most likely includes some higher level facilities to push data through the COPY streaming protocol.</p> <h3>Insert or Update</h3> <p class="first">Now, sometimes some of the rows in the batch have to be <em>updated</em> while others are new and must be inserted. How do you do that? Well, PostgreSQL 9.1 brings WITH support to all <a href="http://www.postgresql.org/docs/9.2/static/dml.html">DML</a> queries, including data-modifying statements inside the WITH clause itself, which means that you can do the following just fine:</p> <pre class="src"> <span style="color: #fcaf3e;">WITH</span> upd <span style="color: #fcaf3e;">AS</span> (

<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
<span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
<span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
<span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
<span style="color: #fcaf3e;">RETURNING</span> s.<span style="color: #729fcf;">id</span>
)
<span style="color: #729fcf;">INSERT</span> <span style="color: #fcaf3e;">INTO</span> target(<span style="color: #729fcf;">id</span>, counter)
<span style="color: #fcaf3e;">SELECT</span> s.<span style="color: #729fcf;">id</span>, <span style="color: #729fcf;">sum</span>(s.counter)
<span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
<span style="color: #fcaf3e;">LEFT</span> <span style="color: #fcaf3e;">JOIN</span> upd <span style="color: #729fcf;">t</span> <span style="color: #fcaf3e;">USING</span>(<span style="color: #729fcf;">id</span>)
<span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> <span style="color: #fcaf3e;">IS</span> <span style="color: #fcaf3e;">NULL</span>
<span style="color: #fcaf3e;">GROUP</span> <span style="color: #729fcf;">BY</span> s.<span style="color: #729fcf;">id</span>
<span style="color: #fcaf3e;">RETURNING</span> <span style="color: #729fcf;">id</span>; </pre>
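<p>SQLite has no data-modifying WITH, so the following sqlite3 sketch runs the two halves of the query as two statements inside one transaction: the UPDATE first, then an INSERT that anti-joins the batch against the <em>target</em> table, which is the variant the next paragraph recommends for bigger batches. The batch_upsert name and the table layout are assumptions for the example, not part of the original setup.</p> <pre class="src">
```python
# Hypothetical sketch with Python's sqlite3 module, standing in for PostgreSQL.
# SQLite has no data-modifying WITH, so the UPDATE and the INSERT run as two
# statements inside one transaction instead of a single statement.
import sqlite3

# Add the batch totals to the rows that already exist in target.
UPDATE_KNOWN = """
UPDATE target
   SET counter = counter + (SELECT sum(s.counter) FROM source s
                             WHERE s.id = target.id)
 WHERE id IN (SELECT id FROM source)
"""

# Anti-join: insert one row per batch id that target does not know yet.
INSERT_NEW = """
INSERT INTO target (id, counter)
     SELECT s.id, sum(s.counter)
       FROM source s
            LEFT JOIN target t ON t.id = s.id
      WHERE t.id IS NULL
   GROUP BY s.id
"""


def batch_upsert(conn, rows):
    """Insert or update target from an in-memory list of (id, delta) pairs."""
    conn.execute("CREATE TEMP TABLE source (id INTEGER, counter INTEGER)")
    try:
        with conn:  # both statements commit, or roll back, together
            conn.executemany("INSERT INTO source VALUES (?, ?)", rows)
            # The UPDATE has to run first: run after the INSERT, it would
            # also add the batch to the rows the INSERT just created.
            conn.execute(UPDATE_KNOWN)
            conn.execute(INSERT_NEW)
    finally:
        conn.execute("DROP TABLE source")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 10), (2, 20)])
conn.commit()

batch_upsert(conn, [(2, 5), (3, 1), (3, 2)])
print(conn.execute("SELECT id, counter FROM target ORDER BY id").fetchall())
# [(1, 10), (2, 25), (3, 3)]
```
</pre>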

<p>That query <em>updates</em> all the rows that are known to both the <em>target</em> and the <em>source</em> tables and returns the ids it took from the <em>source</em> in the operation, so that the next step of the query can do an <em>anti-join</em> and <em>insert</em> any row that was not taken care of by the <em>update</em> part of the statement.</p> <p>Note that when the batch gets bigger, it's usually better to join against the <em>target</em> table in the INSERT statement, because that table will have an <em>index</em> on the join key, while the <em>upd</em> result set has none.</p> <h3>Concurrency patterns</h3> <p class="first">Now, you will tell me that we just solved the UPSERT problem. Well, what happens if more than one transaction is trying to do the WITH (UPDATE) INSERT dance at the same time? It's a single <em>statement</em>, so it's a single <em>snapshot</em>. What can go wrong?</p> <center> <p><img src="../../../images/gophermegaphones.480.jpg" alt=""></p> </center> <center> <p><em>Concurrent processing</em></p> </center> <p>What happens is that as soon as the concurrent sources contain some data for the same <em>primary key</em>, you get a <em>duplicate key</em> error on the insert. As both transactions are concurrent, they see the same <em>target</em> table where the new data does not exist yet, and both conclude that they need to INSERT the new data into the <em>target</em> table.</p> <p>There are two things that you can do to avoid the problem. The first is to make sure that only one <em>batch update</em> runs at any time, by architecting your application around that constraint. That's the most effective way around the problem, but not the most practical.</p> <p>The other thing you can do is force the concurrent transactions to serialize one after the other, using an <a href="http://www.postgresql.org/docs/9.2/static/explicit-locking.html">explicit locking</a> statement:</p> <pre class="src"> <span style="color: #729fcf;">LOCK</span> <span style="color: #fcaf3e;">TABLE</span> target <span style="color: #fcaf3e;">IN</span> <span style="color: #729fcf;">SHARE</span> <span style="color: #729fcf;">ROW</span> <span style="color: #729fcf;">EXCLUSIVE</span> <span style="color: #729fcf;">MODE</span>; </pre> <p>That <em>lock level</em> conflicts with itself, so a second transaction asking for it waits until the first one commits or rolls back, while plain SELECT queries are not blocked (concurrent INSERT, UPDATE and DELETE on <em>target</em> are, though). It is not automatically acquired by any PostgreSQL command, so it only helps if every transaction you want to serialize takes it explicitly, inside the transaction and before running the <em>insert or update</em> statement. When you know you're not at risk (that is, when not playing the <em>insert or update</em> dance), you can omit taking that <em>lock</em>.</p> <h3>Conclusion</h3> <center> <p><a class="image-link" href="http://www.flickr.com/photos/asquarephotography/6841106459/in/photostream/"> <img src="../../../images/stack-of-old-books.jpg"></a></p> </center> <p>The SQL language has its quirks, that's true. It's been made for efficient data processing, and with recent enough <a href="http://www.postgresql.org/about/featurematrix/">PostgreSQL releases</a> you even have some advanced pipelining facilities included in the language. Properly learning how to make the most out of that old component of your programming stack still makes a lot of sense today!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Mar 2013 10:47:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/15-batch-update.html</guid> </item> <item> <title>Batch Update</title> <link>http://tapoueh.org/blog/2013/03/15-batch-update.html</link> <description><![CDATA[<p>Performance consulting involves some tricks that you have to teach over and over again. One of them is that SQL is much better at processing many rows in a single statement than at running as many statements, each one against a single row.</p>

<center> <p><img src="../../../images/Home-Brewing.jpg" alt=""></p> </center> <center> <p><em>Another kind of Batch to update</em></p> </center> <p>So when you need to UPDATE a bunch of rows from a given source, remember that you can actually use a JOIN in the <em>update</em> statement. Either the source of data is already in the database, in which case it's as simple as using the FROM clause in the <em>update</em> statement, or it's not, and we're getting back to that in a minute.</p> <h3>UPDATE FROM</h3> <p class="first">It's all about using that FROM clause in an <em>update</em> statement, right?</p> <pre class="src">

<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span>
<span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter
<span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s
<span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span>; </pre>

<p>Using that, you can update thousands of rows in the <em>target</em> table with a single statement, and you can't really get much faster than that.</p> <h3>Preparing the Batch</h3> <p class="first">Now, if the source data happens to live in your application process' memory, you might think the previous trick does you no good. Well, it turns out that pushing your in-memory data into the database and then joining against that now local source of data is generally faster than looping in the application and paying a whole network <em>round trip</em> per row.</p> <center> <p><img src="../../../images/round-trip.png" alt=""></p> </center> <center> <p><em>What about</em> <strong><em>that</em></strong> <em>round trip?</em></p> </center> <p>Let's see how it goes:</p> <pre class="src"> <span style="color: #da70d6;">BEGIN</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TEMP</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #da70d6;">source</span>(<span style="color: #7f007f;">LIKE</span> target <span style="color: #da70d6;">INCLUDING</span> <span style="color: #7f007f;">ALL</span>) <span style="color: #7f007f;">ON</span> <span style="color: #da70d6;">COMMIT</span> <span style="color: #da70d6;">DROP</span>;

<span style="color: #da70d6;">COPY</span> <span style="color: #da70d6;">source</span> <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">STDIN</span>;

<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span>
<span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter
<span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s
<span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span>;

<span style="color: #da70d6;">COMMIT</span>; </pre>

<p>As we're talking about performance, the trick here is to use the <a href="http://www.postgresql.org/docs/9.2/static/sql-copy.html">COPY</a> protocol to fill in the <em>temporary table</em> we just created to hold our data. So we're now sending the whole data set to a temporary location in the database, then using that as the UPDATE source. That's way faster than running a separate UPDATE statement per row in your batch, even for small batches.</p> <p>Also, rather than using the SQL COPY command, you might want to look up the docs of the PostgreSQL driver you are using in your application: it most likely includes some higher level facilities to push data through the COPY streaming protocol.</p> <h3>Insert or Update</h3> <p class="first">Now, sometimes some of the rows in the batch have to be <em>updated</em> while others are new and must be inserted. How do you do that? Well, PostgreSQL 9.1 brings WITH support to all <a href="http://www.postgresql.org/docs/9.2/static/dml.html">DML</a> queries, including data-modifying statements inside the WITH clause itself, which means that you can do the following just fine:</p> <pre class="src"> <span style="color: #7f007f;">WITH</span> upd <span style="color: #7f007f;">AS</span> (

<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span> <span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter, <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">RETURNING</span> s.<span style="color: #da70d6;">id</span> ) <span style="color: #da70d6;">INSERT</span> <span style="color: #7f007f;">INTO</span> target(<span style="color: #da70d6;">id</span>, counter) <span style="color: #7f007f;">SELECT</span> <span style="color: #da70d6;">id</span>, <span style="color: #da70d6;">sum</span>(counter) <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s <span style="color: #7f007f;">LEFT</span> <span style="color: #7f007f;">JOIN</span> upd <span style="color: #7f007f;">USING</span>(<span style="color: #da70d6;">id</span>) <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">GROUP</span> <span style="color: #da70d6;">BY</span> s.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">RETURNING</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> </pre>
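<p>To make the semantics of that statement concrete, here is a small Python model of it: a dict stands in for the <em>target</em> table and a list of (id, counter) pairs for the <em>source</em>. upsert_batch() is a hypothetical name and nothing here talks to PostgreSQL. If the source may hold several rows per id, note the asymmetry: the UPDATE part touches each matching target row once (PostgreSQL then uses one of the matching source rows, and which one is not predictable), while the INSERT part sums the remaining source rows per id, like the GROUP BY in the query.</p>

```python
from collections import defaultdict


def upsert_batch(target, source):
    """Model of WITH upd AS (UPDATE ... RETURNING) INSERT ... anti-join.

    target: dict mapping id -> counter, standing in for the target table.
    source: list of (id, counter) rows, standing in for the temp table.
    Returns (updated_ids, inserted_ids), both sorted.
    """
    # UPDATE ... FROM ... RETURNING s.id: one update per matching target row.
    # This model keeps the first matching source row; PostgreSQL picks one
    # of them in an unpredictable way.
    upd = {}
    for sid, delta in source:
        if sid in target and sid not in upd:
            upd[sid] = delta
    for sid, delta in upd.items():
        target[sid] += delta

    # INSERT ... SELECT s.id, sum(s.counter) ... LEFT JOIN upd ...
    # WHERE upd.id IS NULL GROUP BY s.id: the anti-join keeps new ids only.
    new_rows = defaultdict(int)
    for sid, delta in source:
        if sid not in upd:
            new_rows[sid] += delta
    target.update(new_rows)
    return sorted(upd), sorted(new_rows)


table = {1: 10, 2: 20}
print(upsert_batch(table, [(1, 5), (3, 1), (3, 2)]))  # ([1], [3])
print(table)  # {1: 15, 2: 20, 3: 3}
```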

<p>That query here is <em>updating</em> all the rows that are known in both the <em>target</em> and the <em>source</em>, and returns what we took from the <em>source</em> in the operation, so that we can do an <em>anti-join</em> in the next step of the query, where we're <em>inserting</em> any row that was not taken care of in the <em>update</em> part of the statement.</p> <p>Note that when the batch grows bigger it's usually better to join against the <em>target</em> table in the INSERT statement, because that table has an <em>index</em> on the join key, whereas the result of the upd query does not.</p> <h3>Concurrency patterns</h3> <p class="first">Now, you will tell me that we just solved the UPSERT problem. Well, what happens if more than one transaction is trying to do the WITH (UPDATE) INSERT dance at the same time? It's a single <em>statement</em>, so it's a single <em>snapshot</em>. What can go wrong?</p> <center> <p><img src="../../../images/gophermegaphones.480.jpg" alt=""></p> </center> <center> <p><em>Concurrent processing</em></p> </center> <p>What happens is that as soon as the concurrent sources contain some data for the same <em>primary key</em>, you get a <em>duplicate key</em> error on the insert. As both transactions are concurrent, each of them sees, in its own snapshot, a <em>target</em> table where the new data does not exist yet, and both conclude that they need to INSERT the new data into the <em>target</em> table.</p> <p>There are two things that you can do to avoid the problem. The first is to make sure that only one <em>batch update</em> runs at any time, by architecting your application around that constraint. That's the most effective way around the problem, but not the most practical.</p> <p>The other thing you can do is force the concurrent transactions to serialize one after the other, using an <a href="http://www.postgresql.org/docs/9.2/static/explicit-locking.html">explicit locking</a> statement at the beginning of each of them:</p> <pre class="src"> <span style="color: #da70d6;">LOCK</span> <span style="color: #7f007f;">TABLE</span> target <span style="color: #7f007f;">IN</span> <span style="color: #da70d6;">SHARE</span> <span style="color: #da70d6;">ROW</span> <span style="color: #da70d6;">EXCLUSIVE</span> <span style="color: #da70d6;">MODE</span>; </pre> <p>That <em>lock level</em> is not automatically acquired by any PostgreSQL command and it conflicts with itself, so it only helps when every transaction you want to serialize takes it explicitly. It also conflicts with the ROW EXCLUSIVE lock that plain UPDATE and INSERT statements take, so other writers to the <em>target</em> table will wait too. When you know you're not at risk (that is, when not playing the <em>insert or update</em> dance), you can omit taking that <em>lock</em>.</p> <h3>Conclusion</h3> <center> <p><a class="image-link" href="http://www.flickr.com/photos/asquarephotography/6841106459/in/photostream/"> <img src="../../../images/stack-of-old-books.jpg"></a></p> </center> <p>The SQL language has its quirks, that's true. It's been made for efficient data processing, and with recent enough <a href="http://www.postgresql.org/about/featurematrix/">PostgreSQL releases</a> you even have some advanced pipelining facilities included in the language. Properly learning how to make the most out of that old component of your programming stack still makes a lot of sense today!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Mar 2013 10:47:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/15-batch-update.html</guid> </item> <item> <title>Emacs Conference</title> <link>http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</link> <description><![CDATA[<p>The <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a> is happening, it's real, and it will take place at the end of this month in London. Check it out, and register at <a href="http://emacsconf.eventbrite.co.uk/">Emacs Conference Event Brite</a>. It's free and there's still some availability.</p>

<center> <p><img src="../../../images/emacs-rocks-logo.png" alt=""></p> </center> <center> <p><em>It's all about Emacs, and it rocks!</em></p> </center> <p>We have a great line-up for this conference, and I'm proud to be able to be there. If you've been paying attention while using <a href="http://www.gnu.org/software/emacs/">Emacs</a>, then you've already heard those names: <a href="http://sachachua.com/blog/">Sacha Chua</a> frequently blogs about how she improves her workflow thanks to <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>; <a href="https://github.com/jwiegley">John Wiegley</a> is a proficient Emacs contributor, maybe best known for his <a href="https://github.com/ledger/ledger">ledger</a> <em>Emacs Mode</em>; then we have <a href="http://www.lukego.com/">Luke Gorrie</a>, who hacked up <a href="http://wingolog.org/archives/2006/01/02/slime">SLIME</a> among other things; and we also have <a href="http://nic.ferrier.me.uk/">Nic Ferrier</a>, who is starting a revolution in how to use <em>Emacs Lisp</em> with <a href="http://elnode.org/">elnode</a>. And more! Including <a href="http://en.wikipedia.org/wiki/Steve_Yegge">Steve Yegge</a>!</p> <center> <p>See you there in London.</p> </center> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Mar 2013 13:58:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</guid> </item> <item> <title>HyperLogLog Unions</title> <link>http://tapoueh.org/blog/2013/02/26-hll-union.html</link> <description><![CDATA[<p>In yesterday's article we talked about <a href="http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html">PostgreSQL HyperLogLog</a> in some detail. The real magic of that extension was only skimmed over, though, and it deserves another very short article all by itself, in case you missed it.</p>

<center> <p><img src="../../../images/SetOperations.480.png" alt=""></p> </center> <center> <p><em>Which Set Operation do you want for counting unique values?</em></p> </center> <p>The first query here has the default level of magic in it, really. What happens is that each time we update the <em>HyperLogLog</em> <em>hash</em> value, we also update some internal data that allows us to compute its cardinality.</p> <pre class="src"> &gt; <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,

#users <span style="color: #7f007f;">as</span> daily, pg_column_size(users) <span style="color: #7f007f;">as</span> bytes <span style="color: #7f007f;">from</span> daily_uniques <span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;

    <span style="color: #228b22;">date</span>    |      daily       | bytes
------------+------------------+-------
 2013-02-22 | 401676.779509985 |  1287
 2013-02-23 | 660187.271908359 |  1287
 2013-02-24 | 869980.029947449 |  1287
 2013-02-25 | 580865.296677817 |  1287
 2013-02-26 | 240569.492722719 |  1287
(5 <span style="color: #da70d6;">rows</span>) </pre>

<p>And as advertised, the data is kept in a fixed-size data structure: 1287 bytes per day here, whatever the number of users seen. The magic here all happens at hll_add() time, the function you have to call to update the data.</p> <p>Now on to something way more magic!</p> <center> <p><img src="../../../images/aggregates2.jpg" alt=""></p> </center> <center> <p><em>Are those the aggregates you're looking for?</em></p> </center> <pre class="src"> &gt; <span style="color: #7f007f;">select</span> to_char(<span style="color: #228b22;">date</span>, <span style="color: #bc8f8f;">'YYYY/MM'</span>) <span style="color: #7f007f;">as</span> <span style="color: #da70d6;">month</span>,

round(#hll_union_agg(users)) <span style="color: #7f007f;">as</span> monthly <span style="color: #7f007f;">from</span> daily_uniques <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> 1;

  <span style="color: #da70d6;">month</span>  | monthly
---------+---------
 2013/02 | 1960380
(1 <span style="color: #da70d6;">row</span>) </pre>
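<p>Here is why that union works, as a minimal pure Python model of a <em>HyperLogLog</em> sketch. It is not postgresql-hll's actual storage format or hash function: SHA-1 and 4096 registers are assumptions chosen for the example. Each value is hashed; the first bits of the hash choose a register, and that register keeps the highest position of the first 1-bit seen in the rest of the hash. The union of two sketches is then just the register-wise maximum, which gives exactly the sketch you would have built from both streams of values at once.</p>

```python
import hashlib
import math

P = 12           # precision: 2**12 = 4096 registers (assumption for the example)
M = 2 ** P
W = 64 - P       # hash bits left once the register index is taken


def hash64(value):
    # 64-bit hash from SHA-1, standing in for postgresql-hll's own hashing.
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


def add(registers, value):
    h = hash64(value)
    index = h >> W                            # top P bits pick the register
    rank = W - (h % 2 ** W).bit_length() + 1  # position of the first 1-bit
    registers[index] = max(registers[index], rank)


def union(a, b):
    # Register-wise maximum: what a union of two sketches boils down to.
    return [max(x, y) for x, y in zip(a, b)]


def cardinality(registers):
    # Classic HyperLogLog estimate with the small-range (linear counting) fix.
    alpha = 0.7213 / (1 + 1.079 / M)
    estimate = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if zeros and 2.5 * M >= estimate:
        estimate = M * math.log(M / zeros)
    return estimate


day1, day2 = [0] * M, [0] * M
for i in range(6000):
    add(day1, f"user-{i}")
for i in range(4000, 10000):         # users 4000..5999 visit on both days
    add(day2, f"user-{i}")

monthly = union(day1, day2)
print(round(cardinality(day1)), round(cardinality(day2)), round(cardinality(monthly)))
```

<p>In this example, summing the two daily estimates counts the 2000 returning users twice, while the union estimates the 10000 distinct users, which is the kind of number hll_union_agg() computes per month in the query above.</p>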

<p>The <em>HyperLogLog</em> data structure allows the implementation of a <strong><em>union</em></strong> algorithm that is able to compute how many unique values you have registered over one day and the next, counting a value only once even when it shows up on both days. Generalize that to any number of values and, SQL being SQL, what you get is an <em>aggregate</em> that you can use in GROUP BY constructs and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. Have you read about them yet?</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 26 Feb 2013 12:44:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/26-hll-union.html</guid> </item> <item> <title>PostgreSQL HyperLogLog</title> <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link> <description><![CDATA[<p>If you've been following the newer statistics developments at home, you might have heard about this new <a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This technique is now available for PostgreSQL in the <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a> extension, available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon to be in Debian.</p>

<center> <p><img src="../../../images/cardinality1.jpg" alt=""></p> </center> <center> <p><em>How to Compute Cardinality?</em></p> </center> <h3>Installing postgresql-hll</h3> <p class="first">It's as simple as CREATE EXTENSION hll; really, even if to get there you must first have installed the <em>package</em> on your system. We did some packaging work for Debian and the result should appear soon in a distro near you.</p> <p>Then you also need to keep your data in some table; straight from the documentation, we can use this schema:</p> <pre class="src"> <span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
    <span style="color: #228b22;">DATE</span>  <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
    users hll
); </pre> <p>Then, to add some data whose <em>cardinality</em> you want to know, it's as simple as the following UPDATE statement:</p> <pre class="src"> <span style="color: #da70d6;">UPDATE</span> daily_uniques

<span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>)) <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>; </pre>
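<p>To picture what such an UPDATE does to the stored value, here is a minimal Python model of hll_add(). It is a sketch under assumptions, not the extension's code: the SHA-1 based hash_text() stands in for hll_hash_text(), a sketch is a list of 4096 small registers, and None plays the part of a NULL hll value, treating hll_add() like a STRICT function. It shows that the stored value keeps a fixed size, that adding the same IP address twice changes nothing, and that a value never initialized with hll_empty() stays NULL.</p>

```python
import hashlib

P = 12           # 2**12 = 4096 registers (assumption for the example)
M = 2 ** P
W = 64 - P       # hash bits left once the register index is taken


def hash_text(text):
    # Stand-in for hll_hash_text(): a 64-bit value taken from SHA-1.
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def hll_empty():
    return [0] * M


def hll_add(sketch, hashval):
    """Return a new sketch with hashval added; None in, None out (like NULL)."""
    if sketch is None or hashval is None:
        return None
    index = hashval >> W                             # top P bits: which register
    rank = W - (hashval % 2 ** W).bit_length() + 1   # first 1-bit in the rest
    updated = list(sketch)
    updated[index] = max(updated[index], rank)
    return updated


users = hll_empty()
for ip in ["123.123.123.123", "10.0.0.1", "123.123.123.123"]:
    users = hll_add(users, hash_text(ip))

print(len(users))                               # 4096
print(hll_add(None, hash_text("10.0.0.1")))     # None
```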

<p>So in our example, we want to count how many unique IP addresses we saw, and we do that by first creating a <em>hash</em> of that source data, then calling hll_add() with the current value and the hash result.</p> <p>The current value must first be initialized using hll_empty(): the UPDATE above only changes an existing row, so each day's row has to be inserted beforehand.</p> <h3>Concurrency</h3> <p class="first">The most attentive readers among you have already spotted it: using an UPDATE on the same row over and over again is a good recipe to kill any form of concurrency, because each UPDATE holds a lock on that row until its transaction ends. So you don't want to do that on your production setup, unless you don't mind those waiting UPDATEs piling up in your system.</p> <p>The idea is then to fill in a queue of updates and asynchronously update the daily_uniques table from that queue, possibly using the hll_add_agg aggregate that the extension provides, so that you do only one update per batch of values to process.</p> <h3>∅: Empty Set and NULL</h3> <center> <p><img src="../../../images/EmptySet_L.gif" alt=""></p> </center> <center> <p><em>Yes, there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">Unicode</a> entry for that, ∅</em></p> </center> <p>Now, what happens when the batch of new unique values you want to update from is itself empty? Well, I would have expected hll_add_agg over an empty set to return an empty hll value, the same as returned by hll_empty(), but it turns out it's returning NULL instead.</p> <p>And then hll_add(users, NULL) will happily return NULL. So the next UPDATE wipes out all the previous work, which is not nice. We had to cater for that case explicitly in the UPDATE query that's working from the batch of new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist showing off one of the most awesome PostgreSQL features here: a <em>CTE</em> attached to an UPDATE statement, which PostgreSQL 9.1 brought along with <em>writable CTEs</em>.</p> <pre class="src"> <span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
    <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>))
      <span style="color: #7f007f;">FROM</span> new_batch
)
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                     <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
                 <span style="color: #7f007f;">END</span>
  <span style="color: #7f007f;">FROM</span> hll
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>; </pre>
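<p>Here is a minimal Python sketch of the queue-and-batch approach described above, with the empty batch handled on both sides. Several things are assumptions: the values come from an in-process queue.Queue, the connection is a psycopg2 one, and instead of the article's new_batch table the batch is passed as a text array and expanded with unnest(). drain_batch() and flush() are hypothetical helper names.</p>

```python
import queue

# Same statement as above, reading the batch from a text array parameter
# rather than from a new_batch table (an assumption of this sketch).
UPDATE_SQL = """
WITH hll(agg) AS (
    SELECT hll_add_agg(hll_hash_text(b.value))
      FROM unnest(%s::text[]) AS b(value)
)
UPDATE daily_uniques
   SET users = CASE WHEN hll.agg IS NULL THEN users
                    ELSE hll_union(users, hll.agg)
               END
  FROM hll
 WHERE date = current_date
"""


def drain_batch(q, max_items=10000):
    """Take up to max_items pending values from q without blocking."""
    batch = []
    for _ in range(max_items):
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def flush(conn, q):
    """Apply one batch with a single UPDATE; return the number of values sent."""
    batch = drain_batch(q)
    if not batch:
        return 0  # nothing to add: skip the UPDATE (the CASE guards it anyway)
    with conn:  # psycopg2: one transaction per batch
        with conn.cursor() as cur:
            cur.execute(UPDATE_SQL, (batch,))
    return len(batch)
```

<p>Calling flush() from a single background worker means only one transaction at a time updates the day's row, which is what keeps that UPDATE from turning into a concurrency bottleneck.</p>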

<p>That's how you protect against an empty set being turned into a NULL. I think the real fix belongs in postgresql-hll itself, by making the hll_add_agg aggregate return hll_empty() on an empty set, and I will report that bug (with this very article as the detailed explanation of it).</p> <h3>Using postgresql-hll</h3> <p class="first">When using postgresql-hll on our production system, we were able to get some good looking numbers out of our daily_uniques table:</p> <pre class="src"> <span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (

    <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
           #users <span style="color: #7f007f;">as</span> daily,
           #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
      <span style="color: #7f007f;">from</span> daily_uniques
)
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         round(daily) <span style="color: #7f007f;">as</span> daily,
         round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
    <span style="color: #7f007f;">from</span> stats
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;

    <span style="color: #228b22;">date</span>    | daily  | percent
------------+--------+---------
 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>) </pre>
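<p>A note on those numbers: the percentages add up to more than 100% because the total is the size of the union over all days (the hll_union_agg() window over ()), and a visitor seen on several days counts once in the total but once per day in the daily column. Here is a tiny Python model of that query, using exact Python sets as a stand-in for the hll sketches, so its counts are exact where the real ones are estimates; daily_percentages() is a hypothetical name.</p>

```python
def daily_percentages(days):
    """days maps a date to the set of visitors seen that day.

    Returns {date: (daily, percent)} like the stats query: daily is the
    day's distinct count, percent compares it to the all-days distinct count.
    """
    total = len(set().union(*days.values()))  # like #hll_union_agg(users) over ()
    return {
        day: (len(ids), round(len(ids) / total * 100, 2))
        for day, ids in sorted(days.items())
    }


stats = daily_percentages({
    "2013-02-22": {"a", "b", "c"},
    "2013-02-23": {"b", "c", "d", "e"},
})
print(stats)  # {'2013-02-22': (3, 60.0), '2013-02-23': (4, 80.0)}
```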

<p>I couldn't resist showing off two of my favorite SQL constructs in that example query, which are <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. If that over() clause looks strange to you, take a minute now and go read about it. Yes, do that now, we're waiting.</p> <p>The data here shows that we set up the facility in the middle of the first day, and that the morning's activity is quite low.</p> <h3>Conclusion</h3> <center> <p><img src="../../../images/hll-dv-estimator.png" alt=""></p> </center> <center> <p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p> </center> <p>When using postgresql-hll you need to be careful not to kill your application's concurrency, and you need to protect yourself against the ∅ killer too. The other thing to keep in mind is that the numbers you get out of the hll technique are estimates within a given <em>precision</em>, and you might want to read some more about what that means for your intended usage of the feature.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid> </item> <item> <title>PostgreSQL HyperLogLog</title> <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link> <description><![CDATA[<p>If you've been following the newer statistics developments at home, you might have heard about this new <a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This technique is now available for PostgreSQL in the <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a> extension, available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon to be in Debian.</p>

<center> <p><img src="../../../images/cardinality1.jpg" alt=""></p> </center> <center> <p><em>How to Compute Cardinality?</em></p> </center> <h3>Installing postgresql-hll</h3> <p class="first">It's as simple as CREATE EXTENSION hll; really, even if to get there you must first have installed the <em>package</em> on your system. We did some packaging work for Debian and the result should appear soon in a distro near you.</p> <p>Then you also need to keep your data in some table; straight from the documentation, we can use this schema:</p> <pre class="src"> <span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
    <span style="color: #228b22;">DATE</span>  <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
    users hll
); </pre> <p>Then, to add some data whose <em>cardinality</em> you want to know, it's as simple as the following UPDATE statement:</p> <pre class="src"> <span style="color: #da70d6;">UPDATE</span> daily_uniques

<span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>)) <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>; </pre>

<p>So in our example, we want to count how many unique IP addresses we saw, and we do that by first creating a <em>hash</em> of that source data, then calling hll_add() with the current value and the hash result.</p> <p>The current value must first be initialized using hll_empty(): the UPDATE above only changes an existing row, so each day's row has to be inserted beforehand.</p> <h3>Concurrency</h3> <p class="first">The most attentive readers among you have already spotted it: using an UPDATE on the same row over and over again is a good recipe to kill any form of concurrency, because each UPDATE holds a lock on that row until its transaction ends. So you don't want to do that on your production setup, unless you don't mind those waiting UPDATEs piling up in your system.</p> <p>The idea is then to fill in a queue of updates and asynchronously update the daily_uniques table from that queue, possibly using the hll_add_agg aggregate that the extension provides, so that you do only one update per batch of values to process.</p> <h3>∅: Empty Set and NULL</h3> <center> <p><img src="../../../images/EmptySet_L.gif" alt=""></p> </center> <center> <p><em>Yes, there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">Unicode</a> entry for that, ∅</em></p> </center> <p>Now, what happens when the batch of new unique values you want to update from is itself empty? Well, I would have expected hll_add_agg over an empty set to return an empty hll value, the same as returned by hll_empty(), but it turns out it's returning NULL instead.</p> <p>And then hll_add(users, NULL) will happily return NULL. So the next UPDATE wipes out all the previous work, which is not nice. We had to cater for that case explicitly in the UPDATE query that's working from the batch of new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist showing off one of the most awesome PostgreSQL features here: a <em>CTE</em> attached to an UPDATE statement, which PostgreSQL 9.1 brought along with <em>writable CTEs</em>.</p> <pre class="src"> <span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
    <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>))
      <span style="color: #7f007f;">FROM</span> new_batch
)
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                     <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
                 <span style="color: #7f007f;">END</span>
  <span style="color: #7f007f;">FROM</span> hll
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>; </pre>

<p>That's how you protect against an empty set being turned into a NULL. I think the real fix belongs in postgresql-hll itself: the hll_add_agg aggregate should return hll_empty() on an empty set. I will report that bug, with this very article as the detailed explanation of it.</p> <h3>Using postgresql-hll</h3> <p class="first">Using postgresql-hll on our production system, we got some good-looking numbers out of our daily_uniques table:</p> <pre class="src"> <span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         #users <span style="color: #7f007f;">as</span> daily,
         #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
    <span style="color: #7f007f;">from</span> daily_uniques
)
<span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
       round(daily) <span style="color: #7f007f;">as</span> daily,
       round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
  <span style="color: #7f007f;">from</span> stats
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;

    <span style="color: #228b22;">date</span>    | daily  | percent
<span style="color: #b22222;">------------+--------+---------</span>
 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>) </pre>

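<p>As for precision, a standard result for HyperLogLog is that the relative standard error of an estimate is about 1.04/√m, where m is the number of registers. Assuming the extension's default of log2m = 11, that is m = 2048 registers (this default is an assumption here: check the postgresql-hll README for your version and your own settings), that's roughly 2.3%, or about ±9,200 users at one standard error on the 401677 figure for 2013-02-22. A quick Python computation:</p>

```python
import math


def hll_relative_std_error(log2m: int) -> float:
    """Approximate relative standard error of a HyperLogLog estimate
    with m = 2**log2m registers: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(2 ** log2m)


for log2m in (11, 14):
    print(log2m, f"{hll_relative_std_error(log2m):.1%}")
# 11 2.3%
# 14 0.8%

# One standard error on the 2013-02-22 estimate, assuming log2m = 11.
print(round(401677 * hll_relative_std_error(11)))
```

<p>Each extra bit of log2m doubles the number of registers, and so the storage, while dividing the error by about √2.</p>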
<p>I couldn't resist showing off two of my favorite SQL constructs in that example query: <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. The empty over() clause turns hll_union_agg() into a window function computed over all the rows of daily_uniques, so every day gets the overall distinct count next to its own. If that over() clause still looks strange to you, take a minute now and go read about it. Yes, do that now, we're waiting.</p> <p>The data shows that we set up the facility in the middle of the first day, and that today's row (2013-02-25) only reflects this morning's activity, which is quite low.</p> <h3>Conclusion</h3> <center> <p><img src="../../../images/hll-dv-estimator.png" alt=""></p> </center> <center> <p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p> </center> <p>When using postgresql-hll, you need to be careful not to kill your application's concurrency, and you need to protect yourself against the ∅ killer too. The other thing to keep in mind is that the numbers you get out of the hll technique are estimates within a given <em>precision</em>, so you might want to read up on what that means for your intended usage of the feature.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid> </item> <item> <title>Playing with pgloader</title> <link>http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</link> <description><![CDATA[<p>While making progress with both <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a> and <a href="http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html">Extension Templates</a>, I needed to take a little break. My current exercise for staying sane seems to mainly involve using <em>Common Lisp</em>, a programming language that ships with just about all the building blocks you need.</p>

<center> <p><img src="../../../images/made-with-lisp.png" alt=""></p> </center> <center> <p><em>Yes, that old language brings so much on the table</em></p> </center> <p>When using <em>Common Lisp</em>, you have an awesome interactive development environment where you can redefine function and objects <em>while testing them</em>. That means you don't have to quit the interpreter, reload the new version of the code and put the interactive test case together all over again after a change. Just evaluate the change in the interactive environement: functions are compiled incrementally over their previous definition, objects whose classes have changed are migrated live.</p> <p>See, I just said <em>objects</em> and <em>classes</em>. <em>Common Lisp</em> comes with some advanced <em>Object Oriented Programming</em> facilities named <a href="http://www.aiai.ed.ac.uk/~jeff/clos-guide.html">CLOS</a> and <a href="http://www.alu.org/mop/index.html">MOP</a> where the <em>Java</em> and <em>Python</em> and <em>C++</em> object models are just a subset of what you're being offered. Hint, those don't have <a href="http://en.wikipedia.org/wiki/Multiple_dispatch">Multiple Dispatch</a>.</p> <p>And you have a very sophisticated <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html">Condition System</a> where <em>Exceptions</em> are just a subset of what you can do (hint: have a look a <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html#restarts">restarts</a> and tell me you didn't wish your programming language of choice had them). And it continues that way for about any basic building bloc you might want to be using.</p> <h3>Loading data</h3> <p class="first">Back to <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> will you tell me. Right. 
I've been spending a couple of evenings hacking on the new version of pgloader in <em>Common Lisp</em>, and wanted to share some preliminary results.</p> <center> <p><img src="../../../images/toy-loader.320.jpg" alt=""></p> </center> <center> <p><em>Playing with the loader</em></p> </center> <p>The new <em>pgloader</em> is still pretty rough: if you're not used to developing in Common Lisp, you might not find it ready for use yet. I'm still working on the internal APIs, trying to make something clean and easy to use for a developer, and then I will provide some user-oriented external ways to play with it. I missed that step once with the <em>Python</em>-based version of the tool, and I don't want to make the same mistake again this time.</p> <p>So here's a test run with the current <em>pgloader</em>, on a fairly small data set: 226 MB of CSV files.</p> <pre class="src"> time python pgloader.py -R.. --summary -Tc ../pgloader.dbname.conf
Table name        |    duration |    size |  copy rows |     errors
===================================================================
aaaaaaaaaa_aaaa   |      2.148s |       - |      24595 |          0
bbbbbbbbbb_bbbb...|      0.609s |       - |        326 |          0
cccccccccc_cccc...|      2.868s |       - |      25126 |          0
dddddddddd_dddd...|      0.638s |       - |          8 |          0
eeeeeeeeee_eeee...|      2.874s |       - |      36825 |          0
ffffffffff_ffffff |      0.667s |       - |        624 |          0
gggggggggg_gggg...|      0.847s |       - |       5638 |          0
hhh_hhhhhhh       |      9.907s |       - |     120159 |          0
iii_iiiiiiiiiiiii |      0.574s |       - |        661 |          0
jjjjjjj           |      6.647s |       - |      30027 |          0
kkk_kkkkkkkkk     |      0.439s |       - |         12 |          0
lll_llllll        |      0.308s |       - |          4 |          0
mmmm_mmm          |      2.139s |       - |      29669 |          0
nnnn_nnnnnn       |      8.555s |       - |     100197 |          0
oooo_ooooo        |     13.781s |       - |      93555 |          0
pppp_ppppppp      |      8.275s |       - |      76457 |          0
qqqq_qqqqqqqqqqqq |      8.568s |       - |     126159 |          0
===================================================================
Total             |  01m09.902s |       - |     670042 |          0
</pre> <h3>Streaming data</h3> <p class="first">With the new code in <em>Common Lisp</em>, I could benefit from real multi threading and higher level abstraction to make it easy to use: <a href="http://lparallel.org/">lparallel</a> is a lib providing exactly what I need here, with <em>workers</em> and <em>queues</em> to communicate data in between them.</p> <p>What I'm doing is that two threads are separated, one is reading the data from either a CSV file or a <em>MySQL</em> database directly, and pushing that data in the queue; while the other thread is pulling data from the queue and writing it into our <a href="http://www.postgresql.org/">PostgreSQL</a> database.</p> <pre class="src"> CL-USER&gt; (pgloader.csv:import-database <span style="color: #bc8f8f;">"dbname"</span>
          :csv-path-root <span style="color: #bc8f8f;">"/path/to/csv/"</span>
          :separator #\Tab
          :quote #\"
          :escape <span style="color: #bc8f8f;">"\"\""</span>
          :null-as <span style="color: #bc8f8f;">":null:"</span>)
table name                          read  imported  errors      time
------------------------------  --------  --------  ------  --------
aaaaaaaaaa_aaaa                    24595     24595       0    0.995s
bbbbbbbbbb_bbbbbbbbb                 326       326       0    0.570s
cccccccccc_cccccccccccc            25126     25126       0    1.461s
dddddddddd_dddddddddd_dd               8         8       0    0.650s
eeeeeeeeee_eeeeeeeeee_eeeeeeee     36825     36825       0    1.664s
ffffffffff_ffffff                    624       624       0    0.707s
gggggggggg_ggggg_gggggggg           5638      5638       0    0.655s
hhh_hhhhhhh                       120159    120159       0    3.415s
iii_iiiiiiiiiiiii                    661       661       0    0.420s
jjjjjjj                            30027     30027       0    2.743s
kkk_kkkkkkkkk                         12        12       0    0.327s
lll_llllll                             4         4       0    0.315s
mmmm_mmm                           29669     29669       0    1.182s
nnnn_nnnnnn                       100197    100197       0    2.206s
oooo_ooooo                         93555     93555       0    9.683s
pppp_ppppppp                       76457     76457       0    5.349s
qqqq_qqqqqqqqqqqq                 126159    126159       0    2.495s
------------------------------  --------  --------  ------  --------
Total import time                 670042    670042       0   34.836s
NIL </pre>
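<p>The reader/writer split described above is the classic <em>producer/consumer</em> pattern. The sketch below shows only the shape of that design. It is written in Python, not Common Lisp, and uses the standard <em>queue</em> and <em>threading</em> modules instead of lparallel. The names <em>reader</em>, <em>writer</em> and <em>stream</em>, and the batch size, are hypothetical illustrations, not pgloader's API. In CPython the Global Interpreter Lock means these threads mostly overlap only while blocked on I/O, so the pattern itself is not what buys the speed.</p>

```python
import queue
import threading

END_OF_DATA = object()  # sentinel telling the writer thread to stop


def reader(rows, q, batch_size):
    """Producer: group incoming rows into batches and push them into the queue."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            q.put(batch)
            batch = []
    if batch:
        q.put(batch)  # last, partial batch
    q.put(END_OF_DATA)


def writer(q, write_batch):
    """Consumer: pull batches from the queue until the sentinel shows up."""
    while True:
        batch = q.get()
        if batch is END_OF_DATA:
            break
        write_batch(batch)


def stream(rows, write_batch, batch_size=1000, max_pending=10):
    """Connect one reader thread and one writer thread through a bounded queue."""
    # Bounded queue: the reader blocks when the writer falls behind.
    q = queue.Queue(maxsize=max_pending)
    threads = [
        threading.Thread(target=reader, args=(rows, q, batch_size)),
        threading.Thread(target=writer, args=(q, write_batch)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: the "database" here is just a Python list.
loaded = []
stream(range(2500), loaded.extend, batch_size=1000)
print(len(loaded))  # 2500
```

<p>The bounded queue gives back-pressure: a fast reader cannot pile up unlimited data in memory while the writer is busy with PostgreSQL.</p>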

<p>As you can see, the interface is still meant for interactive use by a developer. That's fine for now, but it will have to change down the road, once the APIs stabilize.</p> <p>Now, let's compare with reading directly from <em>MySQL</em>:</p> <pre class="src"> CL-USER&gt; (pgloader.mysql:stream-database <span style="color: #bc8f8f;">"dbname"</span>)
table name                          read  imported  errors      time
------------------------------  --------  --------  ------  --------
aaaaaaaaaa_aaaa                    24595     24595       0    0.887s
bbbbbbbbbb_bbbbbbbbb                 326       326       0    0.617s
cccccccccc_cccccccccccc            25126     25126       0    1.497s
dddddddddd_dddddddddd_dd               8         8       0    0.582s
eeeeeeeeee_eeeeeeeeee_eeeeeeee     36825     36825       0    1.697s
ffffffffff_ffffff                    624       624       0    0.748s
gggggggggg_ggggg_gggggggg           5638      5638       0    0.923s
hhh_hhhhhhh                       120159    120159       0    3.525s
iii_iiiiiiiiiiiii                    661       661       0    0.449s
jjjjjjj                            30027     30027       0    2.546s
kkk_kkkkkkkkk                         12        12       0    0.330s
lll_llllll                             4         4       0    0.323s
mmmm_mmm                           29669     29669       0    1.227s
nnnn_nnnnnn                       100197    100197       0    2.489s
oooo_ooooo                         93555     93555       0    9.148s
pppp_ppppppp                       76457     76457       0    6.713s
qqqq_qqqqqqqqqqqq                 126159    126159       0    4.571s
------------------------------  --------  --------  ------  --------
Total streaming time              670042    670042       0   38.272s
NIL </pre>

<p>The <em>streaming</em> here is a tad slower than <em>importing</em> from files. To compare them fairly, though, you have to add the time it takes to <em>export</em> the data out of its source in the first place. A quick test of the whole <em>export/import</em> dance shows a timing of 1m4.745s, and an <em>export only</em> test runs in 31.822s. So yes, streaming is a good thing to have here.</p> <h3>Conclusion</h3> <p class="first">The new code is already twice as fast as the <em>Python</em> version: 34.836s against 1m09.902s for the same 670042 rows.</p> <p>Some will say that I'm not comparing fairly to the <em>Python</em> version of pgloader here, because I could have implemented the streaming facility in <em>Python</em> too. Well, actually I did. The options are called <a href="http://tapoueh.org/pgsql/pgloader.html#sec13">section_threads</a> and <a href="http://tapoueh.org/pgsql/pgloader.html#sec15">split_file_reading</a>. With them set, a reader pushes data into a set of queues and several workers are each fed from their own queue. It didn't help performance at all. Once again, read about the infamous <a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a> to understand why not.</p> <center> <p><img src="../../../images/lisplogo_flag_128.png" alt=""></p> </center> <p>So it actually is a fair comparison. The new code is twice as fast as the previous one, after only a few hours of hacking and before spending any time on optimisation. Well, apart from using a <em>producer</em>, a <em>consumer</em> and a <em>queue</em>, which I pretty much needed anyway to stream between two database connections.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 12 Feb 2013 11:17:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</guid> </item> <item> <title>Marking whole word</title> <link>http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</link> <description><![CDATA[<p>I recently discovered another Emacs facility that I have since been using several times a day, and I wonder how I did without it before: C-M-SPC runs the command mark-sexp.</p>

<center> <p><img src="../../../images/sexp.gif" alt=""></p> </center> <center> <p><em>Well, mark-sexp apparently is related to the Sex Pistols</em></p> </center> <p>It's pretty simple actually. When you have the <em>point</em> at the beginning of a word or an identifier (containing numbers, dashes, underscores and other punctuation signs), you can select the <em>whole</em> of it in a single key chord!</p> <p>The best thing is that if you press the same key chord again, the selection expands to include the next expression. That works in plain text and in the (admittedly few) programming languages where I've tried it recently, and it does not depend that much on the programming language anyway.</p> <p>The full general solution here is to use something like <a href="https://github.com/magnars/expand-region.el">expand region</a>. Don't miss the <a href="http://emacsrocks.com/e09.html">Emacs Rocks Expand Region Episode</a>: it's less than 3 minutes long, and you will want to install <em>expand-region</em> after watching it. For easy installation, of course you are already using <a href="http://tapoueh.org/emacs/el-get.html">el-get</a>, right?</p> <p>Now, a friend asked this morning how to select the <em>current word</em> even when the point is in the middle of it. Going manually back to the beginning of it is no fun. I knew about thing-at-point and a little about how it works, but didn't find anything ready-made for that use case (hint: it needs to be an <em>interactive</em> command).</p> <p>Here's what I came up with, then:</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">mha:select-current-word</span> ()
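  <span style="color: #bc8f8f;">"Select the current word."</span>
  (interactive)
  ;; go to the start of the symbol at point and set an active mark there
  (beginning-of-thing 'symbol)
  (push-mark (point) nil t)
  ;; move to the end of the symbol, then swap point and mark so that
  ;; point sits at the start and the region covers the whole symbol
  (end-of-thing 'symbol)
  (exchange-point-and-mark))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-M-S-SPC"</span>) 'mha:select-current-word) </pre>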
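<p>For readers curious about what <em>beginning-of-thing</em> and <em>end-of-thing</em> compute for a <em>symbol</em>, here is a rough sketch in Python. It scans backward, then forward, from point over symbol characters. This is a simplification, not Emacs's implementation, which relies on the buffer's syntax table. The set of symbol characters below (letters, digits, dash and underscore) is an assumption, and <em>symbol_bounds</em> is a hypothetical name.</p>

```python
import string

# Assumed set of symbol constituents; Emacs derives this from the syntax table.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "-_")


def symbol_bounds(text, point):
    """Return (start, end) of the symbol touching index `point`, or None."""
    start = point
    # like beginning-of-thing: walk back over symbol characters
    while start > 0 and text[start - 1] in SYMBOL_CHARS:
        start -= 1
    end = point
    # like end-of-thing: walk forward over symbol characters
    while len(text) > end and text[end] in SYMBOL_CHARS:
        end += 1
    if start == end:
        return None  # no symbol at point
    return start, end


code = "(push-mark (point) nil t)"
start, end = symbol_bounds(code, 4)  # point in the middle of "push-mark"
print(code[start:end])  # push-mark
```

<p>Selecting the current word is then just a matter of putting the mark at one bound and point at the other, which is what the Emacs command above does.</p>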

<p>I picked C-M-S-SPC not because it's the easiest way to invoke the new command, but because to me it's a quite natural extension of the C-M-SPC that I use so often. Again, each time you want to <em>select</em> an identifier in some code of yours, you'd most certainly be better off using C-M-SPC.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Feb 2013 17:15:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</guid> </item> <item> <title>Live Upgrading PGQ</title> <link>http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</link> <description><![CDATA[<p>Some <a href="http://skytools.projects.pgfoundry.org/skytools-3.0/doc/">skytools</a> related news today, it's been a while. Those who were at my <a href="http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html">FOSDEM talk</a> about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a> might have heard that I really like working with <a href="http://wiki.postgresql.org/wiki/Skytools#PgQ">PGQ</a>. A new version was released a while ago, and the most recent version is now 3.1.3, as announced in the <a href="http://www.postgresql.org/message-id/CACMqXCLD2je5VFqUCzjwC2s5QQVYLe6-4awJaRvqLSBEVw8_MQ@mail.gmail.com">Skytools 3.1.3</a> email.</p>

<center> <p><img src="../../../images/software-upgrade.320.png" alt=""></p> </center> <center> <p><em>Upgrade time!</em></p> </center> <h3>Skytools 3.1.3 enters debian</h3> <p class="first">The first news is that <em>Skytools 3.1.3</em> entered <a href="http://packages.debian.org/search?keywords=skytools3">debian</a> today (I hope that by the time you reach that URL, it shows information matching the news here, but I might be early). As there's currently a <em>debian freeze</em> to release <em>wheezy</em> (and you can help <a href="http://www.debian.org/News/2012/20121110">squash some bugs</a>), this version is only getting uploaded to <em>experimental</em> for now. Thanks to the tireless work of <a href="http://www.df7cb.de/blog/2012/apt.postgresql.org.html">Christoph Berg</a> though, this version is already available from <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a>.</p> <h3>Upgrading to PGQ 3</h3> <p class="first">The other news is that I've been testing a <em>live upgrade</em> scenario from PGQ to PGQ3. It works pretty well, and it's quite simple to achieve too. Here's how.</p> <p>The first thing is to shut down the current <em>ticker</em> process. Then we install the new packages. This assumes that you followed the steps in the wiki page linked above; please go read <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a> again now if need be.</p> <pre class="src"> pgqadm.py ticker.ini -s
sudo apt-get install postgresql-9.1-pgq3 skytools3-ticker skytools3 </pre> <p>The ticker is not running anymore, and we have the right version of the software installed. The next step is to upgrade the database parts of PGQ:</p> <pre class="src"> psql -f /usr/share/skytools3/pgq.upgrade_2.1_to_3.0.sql ...
psql -1 -f /usr/share/postgresql/9.1/contrib/pgq.upgrade.sql ... </pre> <p>Of course replace those ... with options such as your actual connection string. 
I tend to always add -vON_ERROR_STOP=1 to all these commands, so that I don't depend on having the right .psqlrc on the particular server I'm connected to. Also remember that if you want to do that for more than one database, you need to run that pair of commands for each of them.</p> <p>Now it's time to start the new ticker. The main change from the previous one is that it is now a C program called pgqd. It knows how to tick for any number of <em>databases</em>, so you only need <em>one instance</em> running <em>per cluster</em> now.</p> <pre class="src"> sudo /etc/init.d/skytools3 start
tail -f /var/log/skytools/pgqd.log </pre> <p>Those two commands take for granted that you prepared the pgqd setup the <em>debian</em> and <em>skytools</em> way. That means adding your config in /etc/skytools3/pgqd.ini and editing /etc/skytools.ini accordingly, so that it's automatically taken into account at machine boot.</p> <p>Note that I actually exercised the procedure above while running a <a href="http://www.postgresql.org/docs/9.2/static/pgbench.html">pgbench</a> test replicated with londiste. Of course the replication lagged a little while no <em>ticker</em> was running, and then it caught up as fast as it could, in that case:</p> <pre class="src"> INFO {count: 245673, ignored: 0, duration: 422.104366064} </pre> <h3>Happy Hacking!</h3> <p class="first">So if you have any <em>batch processing</em> needs, remember to consider what PGQ has to offer. And yes, if you're running some cron job to compute things out of the database for you, you are doing some <em>batch processing</em>.</p> <center> <p><img src="../../../images/hayseed.jpg" alt=""></p> </center> <center> <p><em>Yes, I did search for Transactional Batch Processing</em></p> </center> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Feb 2013 15:52:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</guid> </item> <item> <title>Another Great FOSDEM</title> <link>http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</link> <description><![CDATA[<p>This year's FOSDEM was a great edition. In particular, the <a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDAY 2013</a> was a great way to begin a 3-day marathon of talking about PostgreSQL with people not only from our community but also from plenty of other Open Source communities too: users!</p>

<center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem-logo.png"></a><a class="image-link" href="http://fosdem2013.pgconf.eu/"> <img src="../../../images/postgresql-elephant.small.png"></a></p> </center> <center> <p><em>PostgreSQL at FOSDEM made for a great event</em></p> </center> <p>Having had the opportunity to meet more people from those other development communities, I really think we should reach out to them at their own conferences. Nearly every PostgreSQL community member I talked with about that idea seemed to agree, and was generally already thinking the same thing. And most are already doing it, in fact...</p> <p>I had the pleasure of giving two talks there, both in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>.</p> <h3>Event Triggers</h3> <p class="first">I'm currently in the middle of implementing <em>Event Triggers</em> for PostgreSQL, and have been for about the last 2 years. It's quite a complex feature to get right, so the patch itself is complex and large, which means the review process is complex and takes time.</p> <p>That also means that some parts of the design have already been redone completely at least 3 times. It also means that what got <em>committed</em> to the PostgreSQL code looks nothing like the design we decided should go in. 
That's just a fact of life, maybe, but it makes for a very long development process.</p> <p>We're now getting to the end of it though. This talk shows where we want to go with <em>Event Triggers</em>, where we are now, and what remains to be done for 9.3 if we want the feature to be useful at all.</p> <p>If you're interested in that development, have a look at the slide deck and feel free to ask me questions about what's not clear, preferably on the <a href="http://www.postgresql.org/list/pgsql-hackers/">pgsql-hackers</a> mailing list.</p> <center> <p><a class="image-link" href="../../../images/confs/Fosdem2013_Event_Triggers.pdf"> <img src="../../../images/confs/Fosdem2013_Event_Triggers.png"></a></p> </center> <center> <p><em>Event Triggers, The Real Mess™</em></p> </center> <p>The other way to get summarized and clear information about Event Triggers is the wiki page of the same name: <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a>.</p> <p>You will see that a lot has been done already: internal refactoring, new infrastructure and SQL level commands, and minimal PL/pgSQL support. Still, a lot remains to be done. The code has already been submitted several times, following several design directions given by careful review on hackers, and we still have some choices to make.</p> <h3>Implementing High-Availability</h3> <p class="first">This talk shows several ways to implement <em>High Availability</em> with PostgreSQL. The term is already overloaded, and it usually covers two very different things: <em>Service Availability</em> and <em>Data Availability</em>.</p> <p>In the talk, we show several techniques that you can use to address different sets of compromises between <em>scaling</em>, <em>load balancing</em>, <em>data availability</em> and <em>durability</em>, and <em>service availability</em>. 
The first two points could seem unrelated to the main topic, but <em>scaling</em> often is a simple enough way to achieve <em>service availability</em>... until you need to think about <em>sharding</em>, that is.</p> <center> <p><a class="image-link" href="../../../images/confs/Fosdem2013_High_Availability.pdf"> <img src="../../../images/confs/Fosdem2013_High_Availability.640.png"></a></p> </center> <center> <p><em>Implementing High Availability of Services and Data with PostgreSQL</em></p> </center> <p>So the talk is all about making compromises between them and getting to an architecture able to implement the chosen compromises. The talk was pretty well received, but it was delivered in a 50-minute slot. When addressing those problems at a customer's site, we usually take a whole day or three.</p> <p>In that time slot, I can't fully cover how to get to the right architecture for the compromises that matter to you while still actually presenting the techniques we're using.</p> <p>I think it might be useful to extract a single use-case or two from that talk and give a full 50-minute version about them. It would focus on one or two very clear compromises and how to achieve them in detail, rather than trying to present a full range of techniques and how to use them in different scenarios.</p> <h3>FOSDEM</h3> <p class="first">After talking with many people, it appears that for next year's edition I should propose a more general talk. It would aim at helping developers in other communities (python, ruby, etc) discover what's in it for them in PostgreSQL. 
This database is full of advanced features that are really easy to use, and the only problem when preparing such a talk is choosing the right subset...</p> <p>If you're running a local developer user group and would like to learn more about how PostgreSQL can help you on a daily basis, please do get in touch with me and let's schedule a presentation together!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Feb 2013 09:55:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</guid> </item> <item> <title>A Sunday at FOSDEM</title> <link>http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</link> <description><![CDATA[<p>The previous article <a href="29-FOSDEM-2013.html">FOSDEM 2013</a> said to be careful with the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom schedule</a> because one of my talks there might get swapped with a slot on the <a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDay 2013</a> which happens <strong><em>this Friday</em></strong> and has been sold out anyway.</p>

<p>Turns out it's not true, because we still depend on past century technologies somehow. Not everybody will be looking at the schedule on the web using a connected mobile device (you know, you've heard of them, those <em>tracking and surveillance devices</em>, if you want to believe <a href="http://stallman.org/rms-lifestyle.html">Stallman</a>), and as the schedule gets printed on little paper sheets, it's unfortunately too late to change it now.</p> <center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem.png"></a></p> </center> <center> <p><em>Those flyers are already printed on paper sheets, the schedule too</em></p> </center> <p>So it happens that I'll be speaking twice on Sunday and not at all on Friday.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 30 Jan 2013 10:50:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</guid> </item> <item> <title>FOSDEM 2013</title> <link>http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</link> <description><![CDATA[<p>This year again I'm going to <a href="https://fosdem.org/2013/">FOSDEM</a>, and to the extra special <a href="http://fosdem2013.pgconf.eu/">PostgreSQL FOSDEM day</a>. It will be the first time that I'm going to be at the event for the full week-end rather than just commuting in for the day.</p>

<center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem.png"></a></p> </center> <center> <p><em>I'm Going to the FOSDEM, hope to see you there!</em></p> </center> <p>And I'm presenting two talks over there, both currently scheduled on the Sunday in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>. We're talking about changing that though, so that one of them will in fact happen <strong><em>this Friday</em></strong> at the <a href="http://www.postgresql.eu/events/schedule/fosdem2013/">FOSDEM PGDay 2013</a>. That event has a different schedule, so consider watching for it.</p> <p>One of those two talks is about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a> (yes, with PostgreSQL). It's been quite well received in the places where I had the chance to give it before (namely <em>PGDay France</em> and <em>PG Conf Europe</em>). This will be a stripped-down version, so that it fits well in the 45-minute slot we have here.</p> <p>The other talk is going to be about <a href="https://fosdem.org/2013/schedule/event/postgresql_event_triggers/">Event Triggers</a>, a new feature in PostgreSQL 9.3 (due in September 2013, crossing fingers). The goal of that talk is to introduce what the feature is all about and a bunch of use cases that you can address with it. Along the way, it will certainly offer a peek into the PostgreSQL development cycle and community processes.</p> <center> <p><img src="../../../images/belgium-beers.jpg" alt=""></p> </center> <center> <p><em>See you in Brussels!</em></p> </center> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 29 Jan 2013 10:11:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</guid> </item> <item> <title>pgloader: what's next?</title> <link>http://tapoueh.org/blog/2013/01/28-pgloader-future.html</link> <description><![CDATA[<p><a href="../../../pgsql/pgloader.html">pgloader</a> is a tool to help loading data into <a href="http://www.postgresql.org/">PostgreSQL</a>, adding some error management to the <a href="http://www.postgresql.org/docs/9.2/interactive/sql-copy.html">COPY</a> command. COPY is the fast way of loading data into PostgreSQL and is transaction safe. That means that if a single error appears within your bulk of data, you will have loaded none of it. pgloader will submit the data again in smaller chunks until it's able to isolate the bad from the good, and then the good is loaded in.</p>
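<p>To make the retry idea concrete, here is a minimal Python sketch. It assumes a bisection strategy: split a rejected batch in two and retry each half. That is one way to "submit the data again in smaller chunks", and pgloader's actual chunking may well differ. The names load_batch and BatchError, and the digits-only validity rule, are hypothetical stand-ins for COPY and its errors.</p>

```python
class BatchError(Exception):
    """Raised by the hypothetical load_batch() when a batch is rejected."""


def load_batch(table, rows):
    """Stand-in for one COPY: loads every row or none of them.

    Toy rule for this sketch: a row is bad when it is not all digits.
    """
    if not all(row.isdigit() for row in rows):
        raise BatchError("batch rejected, nothing loaded")
    table.extend(rows)


def load_with_retry(table, rows, rejects):
    """Load rows, halving any rejected batch until the bad rows are isolated."""
    if not rows:
        return
    try:
        load_batch(table, rows)
    except BatchError:
        if len(rows) == 1:
            rejects.extend(rows)  # a batch of one that still fails is a bad row
            return
        middle = len(rows) // 2
        load_with_retry(table, rows[:middle], rejects)
        load_with_retry(table, rows[middle:], rejects)


table, rejects = [], []
load_with_retry(table, ["1", "2", "x", "4", "5", "y", "7"], rejects)
print(table)    # ['1', '2', '4', '5', '7']
print(rejects)  # ['x', 'y']
```

<p>With this approach, long runs of good rows still go in as large batches. Each isolated bad row costs roughly a logarithmic number of extra attempts.</p>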

<center> <p><img src="../../../images/PDL_Adapter-250.png" alt=""></p> </center> <center> <p><em>Not quite this kind of data loader</em></p> </center> <p>In a recent migration project where we freed data from MySQL into PostgreSQL, we used pgloader again. But the loading time was not fast enough for the service downtime window we had. Indeed, <a href="http://www.python.org/">Python</a> is not known for being the fastest solution around. It's easy to use and to ship to production, but sometimes being efficient when writing code is not enough: you also need the code to actually run fast.</p> <h3>Faster data loading</h3> <p class="first">So I began writing a little dedicated tool for that migration in <a href="http://cliki.net/">Common Lisp</a>, which is growing on me as my personal answer to the burning question: <em>python 2 or python 3</em>? I find that <em>Common Lisp</em> offers an even more dynamic programming environment and an easier language to use, and the result often has performance characteristics way beyond what I can get with python: between <a href="http://tapoueh.org/blog/2012/07/10-solving-sudoku.html">5 times faster</a> and <a href="http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html">121 times faster</a> in some admittedly silly benchmarks.</p> <p>Here, with real data, my one-shot attempt has been running more than <em>twice as fast</em> as the python version, after about a day of programming.</p> <center> <p><img src="../../../images/lisp-python.png" alt=""></p> </center> <center> <p><em>See what's happening now?</em></p> </center> <p>The other thing here is that I had attempted to get pgloader to work in parallel, but at the time I didn't know about the <a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a>, which, by the way, still hasn't been removed in Python 3. 
So my threading attempts at making pgloader work in parallel are pretty useless.</p> <p>In <em>Common Lisp</em>, on the other hand, I can just use the <a href="http://lparallel.org/">lparallel</a> library, which exposes threading facilities and some <em>queueing</em> facilities as a means to communicate data between workers, and have my code easily work in parallel for real.</p> <h3>Compatibility</h3> <p class="first">The only drawback that I can see here is that if you've been writing your own <em>reformatting modules</em> in python for pgloader (yes, you can <a href="http://tapoueh.org/pgsql/pgloader.html#sec21">implement your own reformatting module for pgloader</a>), then you would have to port them to <em>Common Lisp</em>. Drop me an email if that's your case.</p> <h3>Next version</h3> <p class="first">So, I think we're going to have a <em>pgloader 3</em> someday that will be way faster than the current one and bundle some more features: real parallel behavior, and the ability to fetch non-local data (connecting to MySQL directly, or over HTTP, S3, etc.). I'm also thinking about offering a COPY-like syntax to drive the loading, while at it. Also planned is the ability to discover the set of data to load all by itself when you want to load a whole database: think of it as a special <em>Migration</em> mode of operation.</p> <p>Some feature requests can't be solved easily while keeping the old .INI syntax cruft, so it's high time to implement some kind of real command language. I have several ideas about that, somewhere between the COPY syntax and the SQL*Loader configuration format, which is both clunky and quite powerful.</p> <p>After a beginning in TCL and a complete rewrite in python in 2005, it looks like 2013 is going to be the year of <em>pgloader 3</em>, in <em>Common Lisp</em>!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 28 Jan 2013 10:48:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/28-pgloader-future.html</guid> </item> <item> <title>Automated Setup for pgloader</title> <link>http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</link> <description><![CDATA[<p>Another day, another migration from <em>MySQL</em> to <a href="http://www.postgresql.org/">PostgreSQL</a>... or at least that's how it feels sometimes. This time again I've been using some quite old scripts to help me do the migration.</p>

<center> <p><img src="../../../images/dauphin-logo.jpg" alt=""></p> </center> <center> <p><em>That's how I feel for MySQL users</em></p> </center> <h3>Migrating the schema</h3> <p class="first">For the <em>schema</em> part, I've been using <a href="http://pgfoundry.org/projects/mysql2pgsql/">mysql2pgsql</a> with success for many years. This tool is not complete and will only do about <em>80%</em> of the work. As I think the schema should always be validated manually when doing a migration anyway, I consider that to be good news.</p> <h3>Getting the data out</h3> <p class="first">Then for the data part I keep on using <a href="../../../pgsql/pgloader.html">pgloader</a>. The data is never quite right, and the ability to filter out what you can't readily import into a <em>reject</em> file proves itself a must-have here. The problems you have in the exported MySQL data are quite serious:</p> <center> <p><img src="../../../images/data-unlocked.320.png" alt=""></p> </center> <center> <p><em>Can I have my data please?</em></p> </center> <p>First, date formatting is not compatible with what PostgreSQL expects, sometimes using 20130117143218 instead of what we expect: 2013-01-17 14:32:18. And of course, even when the format is right (that seems to depend on the MySQL server's version), you still have to transform 0000-00-00 00:00:00 into NULL.</p> <blockquote> <p class="quoted"> Before considering using that particular date rather than NULL when you don't have the information, you might want to remember that there's no <a href="http://en.wikipedia.org/wiki/0_(year)">year zero</a> in the calendar: it goes from year 1 BC straight to AD 1.</p> </blockquote> <p>Then, text encoding is often mixed up: even when the MySQL databases are said to be in <em>latin1</em> or <em>unicode</em>, you somehow always end up finding text in <em>win1252</em> or some other <em>code page</em> in there.</p> <p>And of course, MySQL provides no tool to export the data to CSV, 
so you have to come up with your own. The SELECT INTO OUTFILE command on the server produces non-conforming CSV (\n can appear in non-escaped field contents), and while the mysql client manual page details that it outputs CSV when stdout is not a terminal, it won't even try to quote fields or escape \t when they appear in the data.</p> <p>So, we use the little <a href="https://github.com/slardiere/mysqltocsv">mysqltocsv</a> script to export the data, and then use that data to feed <a href="../../../pgsql/pgloader.html">pgloader</a>.</p> <h3>Loading the data in</h3> <p class="first">Now, we have to write a configuration file so that pgloader knows what to load and where to find the data. What about generating the file from the database schema instead, using the query in <a href="generate-pgloader-config.sql">generate-pgloader-config.sql</a>:</p> <pre class="src"> <span style="color: #7f007f;">with</span> reformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname, attnum, attname, typname,
         <span style="color: #7f007f;">case</span> typname
              <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'timestamptz'</span>
              <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:timestamp'</span>
              <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'date'</span>
              <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:date'</span>
          <span style="color: #7f007f;">end</span> <span style="color: #7f007f;">as</span> reformat
    <span style="color: #7f007f;">from</span> pg_class <span style="color: #da70d6;">c</span>
         <span style="color: #7f007f;">join</span> pg_namespace n <span style="color: #7f007f;">on</span> n.oid = <span style="color: #da70d6;">c</span>.relnamespace
         <span style="color: #7f007f;">left</span> <span style="color: #7f007f;">join</span> pg_attribute <span style="color: #da70d6;">a</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">c</span>.oid = <span style="color: #da70d6;">a</span>.attrelid
         <span style="color: #7f007f;">join</span> pg_type <span style="color: #da70d6;">t</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">t</span>.oid = <span style="color: #da70d6;">a</span>.atttypid
   <span style="color: #7f007f;">where</span> <span style="color: #da70d6;">c</span>.relkind = <span style="color: #bc8f8f;">'r'</span>
     <span style="color: #7f007f;">and</span> attnum &gt; 0
     <span style="color: #7f007f;">and</span> n.nspname = <span style="color: #bc8f8f;">'public'</span>
),
     config_reformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname,
         <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
         <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'columns = '</span> || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'reformat = '</span> || array_to_string(<span style="color: #da70d6;">array_agg</span>(reformat), <span style="color: #bc8f8f;">', '</span>)
         || E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
    <span style="color: #7f007f;">from</span> reformat
   <span style="color: #7f007f;">where</span> reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">null</span>
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
     noreformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname, bool_and(reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">null</span>) <span style="color: #7f007f;">as</span> noreformating
    <span style="color: #7f007f;">from</span> reformat
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
     config_noreformat <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname,
         <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
         <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
         <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
         <span style="color: #bc8f8f;">'columns = '</span> || E<span style="color: #bc8f8f;">' \n'</span> ||
         E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
    <span style="color: #7f007f;">from</span> reformat <span style="color: #7f007f;">join</span> noreformat <span style="color: #7f007f;">using</span> (relname)
   <span style="color: #7f007f;">where</span> noreformating
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
     allconfs <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_reformat
   <span style="color: #7f007f;">union</span> <span style="color: #7f007f;">all</span>
  <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_noreformat
)
  <span style="color: #7f007f;">select</span> config
    <span style="color: #7f007f;">from</span> allconfs
   <span style="color: #7f007f;">where</span> relname <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">in</span> (<span style="color: #bc8f8f;">'tables'</span>, <span style="color: #bc8f8f;">'wedont'</span>, <span style="color: #bc8f8f;">'wantto'</span>, <span style="color: #bc8f8f;">'load'</span>)
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> relname; </pre>
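<p>If you'd rather not do the string building in SQL, the same INI sections can be produced from any table-to-columns listing. What follows is a minimal Python sketch of that idea. The SCHEMA dictionary is made-up example data standing in for what the pg_attribute/pg_type query returns. The section layout, the /path/to/csv/ placeholder and the excluded table names are copied from the query above.</p>

```python
# Hypothetical example data: columns of each table as (name, type) pairs.
# In practice this would come from the catalog query above.
SCHEMA = {
    "orders": [("id", "int4"), ("created_at", "timestamptz"), ("due", "date")],
    "products": [("id", "int4"), ("name", "text")],
}

# Type name -> mynull.py reformat function, as in the CASE expression.
REFORMAT = {"timestamptz": "mynull:timestamp", "date": "mynull:date"}

# Tables we don't want to load, as in the final WHERE clause.
SKIP = {"tables", "wedont", "wantto", "load"}


def section(relname, columns):
    """Return one pgloader config section for a table."""
    lines = [
        f"[{relname}]",
        f"table = {relname}",
        f"filename = /path/to/csv/{relname}.csv",
        "format = csv",
        "field_sep = \\t",  # a literal backslash-t, like the SQL version
        "columns = ",
    ]
    reformat = [f"{name}:{REFORMAT[typ]}" for name, typ in columns if typ in REFORMAT]
    if reformat:  # only emit the reformat line when some column needs it
        lines.append("reformat = " + ", ".join(reformat))
    return "\n".join(lines) + "\n"


config = "\n".join(
    section(relname, cols)
    for relname, cols in sorted(SCHEMA.items())
    if relname not in SKIP
)
print(config)
```

<p>As with the SQL version, you still have to prepend pgloader's global section yourself.</p>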

<p>To work with the generated setup, you will have to prepend a global section for pgloader and include a reformatting module in python, which I named <a href="mynull.py">mynull.py</a>:</p> <pre class="src"> <span style="color: #b22222;"># Author: Dimitri Fontaine &lt;<a href="mailto:dimitri&#64;2ndQuadrant.fr">dimitri&#64;2ndQuadrant.fr</a>&gt;
#
# pgloader mysql reformatting module
</span>
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">timestamp</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" Reformat str as a PostgreSQL timestamp

    MySQL timestamps are ok this time:  2012-12-18 23:38:12
    But may contain the infamous all-zero date, where we want NULL.
    """</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00 00:00:00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span>

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">date</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" date columns can also have '0000-00-00'"""</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span> </pre>
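<p>The module above only handles the all-zero date, but the compact 20130117143218 form mentioned earlier needs reformatting too. Here is a hypothetical variant of timestamp() that covers both cases. It keeps the same (reject, input) signature as the module above and, like the original, leaves reject unused. Treating 00000000000000 as the compact spelling of the zero date is my own assumption.</p>

```python
# Hypothetical variant of timestamp() from mynull.py, not part of the
# original module. Same (reject, input) signature; reject is unused.

def timestamp(reject, input):
    """Reformat str as a PostgreSQL timestamp.

    Maps the all-zero MySQL date to NULL (None), and rewrites the compact
    YYYYMMDDhhmmss form, e.g. 20130117143218, as 2013-01-17 14:32:18.
    """
    # Assumption: the zero date may also show up in compact form.
    if input in ('0000-00-00 00:00:00', '00000000000000'):
        return None

    # Compact form: exactly 14 digits, sliced into date and time parts.
    if len(input) == 14 and input.isdigit():
        return '%s-%s-%s %s:%s:%s' % (input[0:4], input[4:6], input[6:8],
                                      input[8:10], input[10:12], input[12:14])

    # Anything else is assumed to already be in PostgreSQL's format.
    return input


print(timestamp(None, '20130117143218'))       # 2013-01-17 14:32:18
print(timestamp(None, '0000-00-00 00:00:00'))  # None
```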

<p>Now you can launch pgloader and profit!</p> <h3>Conclusion</h3> <p class="first">There are plenty of tools to assist you in migrating away from MySQL and other databases. When you make that decision, you're not alone, and it's easy enough to find people to come and help you.</p> <p>While MySQL is Open Source and is not a <em>lock-in</em> from a licensing perspective, I still find it hard to swallow that no tools are provided for getting data out in a sane format, and that so many little inconsistencies exist in the product with respect to data handling (try having a NOT NULL column, then enjoy the default empty strings that have been put in there). So at this point, yes, I consider that moving to <a href="http://www.postgresql.org/">PostgreSQL</a> is a way to <em>free your data</em>:</p> <center> <p><img src="../../../images/free-our-open-data.jpg" alt=""></p> </center> <center> <p><em>Free your data!</em></p> </center> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 17 Jan 2013 14:32:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</guid> </item> <item> <title>Lost in scope</title> <link>http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</link> <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/288795520179240962">Mickael</a> on <em>twitter</em> I got to read an article about losing scope with some common programming languages. As the blog article <a href="https://my.smeuh.org/al/blog/lost-in-scope">Lost in scope</a> references <em>functional programming languages</em> and plays with both <em>Javascript</em> and <em>Erlang</em>, I thought I had to try it out with <em>Common Lisp</em> too.</p>

<center> <p><img src="../../../images/lambda.png" alt=""></p> </center> <center> <p><em>Let's have fun with lambda!</em></p> </center> <p>So, here we go with a simple Common Lisp attempt. The <em>Lost in scope</em> article begins by defining a very simple function returning a boolean value, true only when it's not Monday.</p> <h3>Monday is special</h3> <p class="first">Keep in mind that the following example has been chosen to be simple yet offer a case of <em>lexical binding shadowing</em>. It may look convoluted; focus on the day binding.</p> <pre class="src"> (<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*days*</span>
  '(monday tuesday wednesday thursday friday saturday sunday)
  <span style="color: #bc8f8f;">"List of days in the week"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">any-day-but-monday?</span> (day)
  <span style="color: #bc8f8f;">"Returns a generalized boolean, true unless DAY is 'monday"</span>
  (member day (remove-if (<span style="color: #7f007f;">lambda</span> (day) (eq day 'monday)) *days*))) </pre>

<p>So as you can see, in <em>Common Lisp</em> we simply use a list of symbols, rather than splitting a string into a list or an array of strings as in the examples with <em>python</em> and <em>ruby</em>.</p> <p>Now, a <em>generalized boolean</em> is either nil to mean false, or anything else to mean true, and in this example the return value of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/a_member.htm">member</a> is the sub-list that begins at the element that was found:</p> <pre class="src"> CL-USER&gt; (any-day-but-monday? 'monday)
NIL

CL-USER&gt; (any-day-but-monday? 'tuesday)
(TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY) </pre>
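<p>For comparison with the python examples the original article discusses, here is a rough Python transliteration. Python has no built-in that returns the tail of a list the way member does, so the member helper below is hypothetical. itertools.filterfalse plays the role of remove-if, since it drops the items for which the predicate is true.</p>

```python
from itertools import filterfalse

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def member(item, items):
    """Hypothetical helper mimicking Lisp MEMBER: the tail from item, or None."""
    for i, x in enumerate(items):
        if x == item:
            return items[i:]
    return None


def any_day_but_monday(day):
    """Truthy unless day is 'monday', like the Lisp any-day-but-monday?."""
    # The lambda's own 'day' parameter shadows the outer 'day' only inside
    # the lambda body; member still receives the function's argument.
    return member(day, list(filterfalse(lambda day: day == "monday", DAYS)))


print(any_day_but_monday("monday"))   # None
print(any_day_but_monday("tuesday"))  # ['tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
```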

<p>Oh, and as we work with <em>Common Lisp</em>, we have a real <a href="http://www.gigamonkeys.com/book/lather-rinse-repeat-a-tour-of-the-repl.html">REPL</a> in which to play directly with our code: no need to add <em>interactive</em> stanzas to the main program text file just to be able to play with it. In <a href="http://common-lisp.net/project/slime/">Emacs Slime</a> we just use C-M-x on a <em>form</em> to make it available in the <em>REPL</em>, or C-c C-l to load the whole file we're working on.</p> <p>So, we see that <em>Common Lisp</em> scoping rules silently do the right thing here. Within the <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_rm_rm.htm">remove-if</a> call we define a <em>lambda</em> function taking a single parameter called <em>day</em>. That parameter shadows the <em>any-day-but-monday?</em> function parameter, and the shadowing only happens within the <em>lexical scope</em> of the <em>lambda</em> we are creating. For a detailed discussion of that concept, I would refer you to the <a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node43.html">Scope and Extent</a> chapter of <em>Common Lisp the Language, 2nd Edition</em>.</p> <p>In <em>Common Lisp</em> we have both <em>lexical scope</em> and <em>dynamic extents</em>, and a variable defined with <em>defparameter</em> or <em>defvar</em>, or that you otherwise <a href="http://www.lispworks.com/documentation/HyperSpec/Body/s_declar.htm">declare</a> <em>special</em>, will have a <em>dynamic extent</em>. 
Hence this section title.</p> <h3>Closures</h3> <p class="first">Now, the <a href="https://my.smeuh.org/al/blog/lost-in-scope">lost in scope</a> article goes further, looking for a way around the scoping rules of the <em>python</em> and <em>ruby</em> languages, where, as far as I can see, the developer cannot easily tell the language which scoping rules to use on a case-by-case basis.</p> <p>First, let's reproduce the problem by using a single variable that all the closures share. Those closures are called <em>callbacks</em> in the original article, so I've kept using that name here.</p> <center> <p><img src="../../../images/callback.jpg" alt=""></p> </center> <pre class="src"> (<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks-all-sunday*</span>
  (<span style="color: #7f007f;">loop</span> for day in *days* collect (<span style="color: #7f007f;">lambda</span> () day))
  <span style="color: #bc8f8f;">"loop binds DAY only once"</span>) </pre>

<p>In that example, there's only a single variable day that we reuse throughout the <em>loop</em> construct, so when the loop ends we have a list of closures all referring to the same variable, whose value by then is sunday.</p> <pre class="src"> CL-USER&gt; (mapcar #'funcall *callbacks-all-sunday*)
(SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY) </pre> <h3>Closures, take 2</h3> <p class="first">Now, here is how to get what we want: a list of closures, each having its own variable.</p> <pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks*</span>
  (mapcar (<span style="color: #7f007f;">lambda</span> (day)
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">for each day, produce a separate closure
</span>            <span style="color: #b22222;">;; </span><span style="color: #b22222;">around its own lexical variable day
</span>            (<span style="color: #7f007f;">lambda</span> () day))
          *days*)
  <span style="color: #bc8f8f;">"A list of callbacks to return the current day..."</span>) </pre>
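<p>For comparison with the python examples the original article discusses, here is a minimal Python sketch of both closure versions. In the list comprehension, every lambda closes over the same day variable, so they all end up seeing its last value. The make_callback factory is a hypothetical name for the fix. Like the outer lambda in the Lisp code, it creates a fresh binding on each call.</p>

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Pitfall: one 'day' variable, rebound on each iteration, shared by every lambda.
callbacks_all_sunday = [lambda: day for day in DAYS]


def make_callback(day):
    # Each call gets its own local 'day', so each closure captures its own value.
    return lambda: day


callbacks = [make_callback(day) for day in DAYS]

print([f() for f in callbacks_all_sunday])  # ['sunday', 'sunday', 'sunday', 'sunday', 'sunday', 'sunday', 'sunday']
print([f() for f in callbacks])             # ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
```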

<p>And there we go:</p> <pre class="src"> CL-USER&gt; (mapcar #'funcall callbacks) (MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY) </pre> <h3>Conclusion</h3> <p class="first">Scoping rules are very important in any programming language, functional or not, and must be well understood by programmers. I find that once again, that topic has received a very deep thinking in <em>Common Lisp</em>, and the language is giving all the options to its developers.</p> <center> <p><img src="../../../images/scope.png" alt=""></p> </center> <center> <p><em>What are your language of choice scoping rules?</em></p> </center> <p>I want to stress that in <em>Common Lisp</em> the scope rules are very clearly defined in the standard documentation of the language. For instance, <em>defun</em> and <em>let</em> both introduce a lexical binding, <em>defvar</em> and <em>defparameter</em> introduce a <em>dynamic variable</em>.</p> <p>Also, as a user of the language you have the ability to <em>declare</em> any variable as being <em>special</em> in order to introduce yourself a <em>dynamic variable</em>. In C you can declare some variables as being <em>static</em>, which is something else and frown with a very different set of problems.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 09 Jan 2013 11:07:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</guid> </item> <item> <title>Extensions Templates</title> <link>http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</link> <description><![CDATA[<p>In a recent article titled <a href="../../2012/12/13-Inline-Extensions.html">Inline Extensions</a> we detailed the problem of how to distribute an extension's <em>package</em> to a remote server without having access to its file system at all. The solution to that problem is non-trivial, let's say. But thanks to the awesome <a href="http://www.postgresql.org/community/">PostgreSQL Community</a> we finally have some practical ideas on how to address the problem, as discussed on <a href="http://archives.postgresql.org/pgsql-hackers/">pgsql-hackers</a>, our development mailing list.</p>

<center> <p><img src="../../../images/community.jpg" alt=""> <em>PostgreSQL is first an Awesome Community</em></p> </center> <p>The solution we talked about is to use <em>templates</em>, and so I've been working on a patch to bring <em>templates for extensions</em> to PostgreSQL. As we're talking about 3 new system catalogs, that's a big patch in terms of lines of code. In terms of features, though, it's quite an easy one.</p> <p>Here's how it goes. Let's say you want to prepare the system to be able to CREATE EXTENSION pair; without having to install it as an <em>OS package</em>, for which you would need root access on the server where your PostgreSQL instance is running, which is not always easy, and sometimes not a good idea.</p> <h3>Installing an extension template</h3> <p class="first">With the <a href="http://www.postgresql.org/message-id/m2wqvoha0p.fsf%402ndQuadrant.fr">template patch</a> I just sent to the lists, what you can do is prepare a template with your extension's script and properties, then use it to install the extension.</p> <pre class="src"> <span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span>

<span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">default</span> <span style="color: #da70d6;">version</span> <span style="color: #bc8f8f;">'1.0'</span> <span style="color: #7f007f;">with</span> (<span style="color: #da70d6;">nosuperuser</span>, norelocatable, <span style="color: #da70d6;">schema</span> <span style="color: #da70d6;">public</span>) <span style="color: #7f007f;">as</span> $$ <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">pair</span> <span style="color: #7f007f;">AS</span> ( <span style="color: #da70d6;">k</span> <span style="color: #228b22;">text</span>, v <span style="color: #228b22;">text</span> );

<span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, <span style="color: #228b22;">text</span>) <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, anyelement) <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, anyelement) <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, <span style="color: #228b22;">text</span>) <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair;'</span>; $$; </pre>

<h3>Installing an extension from a template</h3> <p class="first">With the template installed in the catalogs, now you can go and install your extension:</p> <pre class="src"> foo&gt; <span style="color: #7f007f;">create</span> extension pair;
<span style="color: #7f007f;">CREATE</span> EXTENSION

foo&gt; \dx pair
        List <span style="color: #da70d6;">of</span> installed extensions
 <span style="color: #da70d6;">Name</span> | <span style="color: #da70d6;">Version</span> | <span style="color: #da70d6;">Schema</span> | Description
<span style="color: #b22222;">------+---------+--------+-------------
</span> pair | 1.0     | <span style="color: #da70d6;">public</span> |
(1 <span style="color: #da70d6;">row</span>)

foo&gt; \dx+ pair
   Objects <span style="color: #7f007f;">in</span> extension "pair"
          <span style="color: #da70d6;">Object</span> Description
<span style="color: #b22222;">---------------------------------------
</span> <span style="color: #da70d6;">function</span> pair(anyelement,anyelement)
 <span style="color: #da70d6;">function</span> pair(anyelement,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,anyelement)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">type</span> pair
(5 <span style="color: #da70d6;">rows</span>) </pre>

<p>The extension installation now happens from the catalog templates rather than the file system, which means you didn't need to be root on the system where the server is running. Also note that the example above was run while connected as the <em>database owner</em>, a user who is not a <em>superuser</em>. Requiring fewer privileges is always good news, right?</p> <h3>Managing upgrade scripts and extension update</h3> <p class="first">Now that the extension is installed, you might want to update it with some awesome new features. Let's have a look at that.</p> <center> <p><img src="../../../images/extension-update.png" alt=""></p> </center> <center> <p><em>Upload your Extension Update Scripts</em></p> </center> <p>Rather than making a new version of the extension package with the new files in it, then asking the operations team to make the new package available on the internal repositories and install it on the servers, you could now prepare and <em>QA</em> the new setup this way:</p> <pre class="src"> <span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span> <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.0'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.1'</span> <span style="color: #7f007f;">as</span> $$

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = <span style="color: #228b22;">text</span>, RIGHTARG = anyelement, <span style="color: #da70d6;">PROCEDURE</span> = pair);

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = anyelement, RIGHTARG = <span style="color: #228b22;">text</span>, <span style="color: #da70d6;">PROCEDURE</span> = pair);

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = anyelement, RIGHTARG = anyelement, <span style="color: #da70d6;">PROCEDURE</span> = pair);

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~&gt; (LEFTARG = <span style="color: #228b22;">text</span>, RIGHTARG = <span style="color: #228b22;">text</span>, <span style="color: #da70d6;">PROCEDURE</span> = pair); $$;

<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span>

<span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.1'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span> <span style="color: #7f007f;">as</span> $$ <span style="color: #da70d6;">comment</span> <span style="color: #7f007f;">on</span> extension pair <span style="color: #7f007f;">is</span> <span style="color: #bc8f8f;">'Simple Key Value Text Type'</span>; $$; </pre>

<p>Of course, the content is not the most realistic example, in particular version 1.2, which only adds a comment to the extension. I needed another version to test the automatic upgrade path with more than one step, though, so here we go.</p> <pre class="src"> foo&gt; <span style="color: #da70d6;">alter</span> extension pair <span style="color: #da70d6;">update</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span>;
ALTER EXTENSION

foo&gt; \dx pair
               List of installed extensions
 Name | Version | Schema |        Description
------+---------+--------+----------------------------
 pair | 1.2     | public | Simple Key Value Text Type
(1 row)

foo&gt; \dx+ pair
     Objects in extension "pair"
          Object Description
--------------------------------------
 function pair(anyelement,anyelement)
 function pair(anyelement,text)
 function pair(text,anyelement)
 function pair(text,text)
 operator ~&gt;(anyelement,anyelement)
 operator ~&gt;(anyelement,text)
 operator ~&gt;(text,anyelement)
 operator ~&gt;(text,text)
 type pair
(9 rows)
</pre>

<p>We did it!</p> <h3>Internals</h3> <p class="first">Let's have a look at those new catalogs:</p> <center> <p><img src="../../../images/octopus-anatomy.jpg" alt=""></p> </center> <center> <p><em>Oh, that's not quite the internals I expected...</em></p> </center> <p>Here we go now:</p> <pre class="src"> foo&gt; select * from pg_extension_control;
-[ RECORD 1 ]--+-------
ctlname        | pair
ctlowner       | 32926
ctldefault     | t
ctlrelocatable | f
ctlsuperuser   | f
ctlnamespace   | public
ctlversion     | 1.0
ctlrequires    |

foo&gt; select * from pg_extension_template;
-[ RECORD 1 ]--------------------------------------------------------
tplname    | pair
tplowner   | 32926
tplversion | 1.0
tplscript  |
    CREATE TYPE pair AS ( k text, v text );
    CREATE OR REPLACE FUNCTION pair(anyelement, text)
        RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
    CREATE OR REPLACE FUNCTION pair(text, anyelement)
        RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
    CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
        RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';
    CREATE OR REPLACE FUNCTION pair(text, text)
        RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';

foo&gt; select * from pg_extension_uptmpl;
-[ RECORD 1 ]--------------------------------------------------------
uptname   | pair
uptowner  | 32926
uptfrom   | 1.0
uptto     | 1.1
uptscript |
    CREATE OPERATOR ~&gt; (LEFTARG = text,
                        RIGHTARG = anyelement,
                        PROCEDURE = pair);
    CREATE OPERATOR ~&gt; (LEFTARG = anyelement,
                        RIGHTARG = text,
                        PROCEDURE = pair);
    CREATE OPERATOR ~&gt; (LEFTARG = anyelement,
                        RIGHTARG = anyelement,
                        PROCEDURE = pair);
    CREATE OPERATOR ~&gt; (LEFTARG = text,
                        RIGHTARG = text,
                        PROCEDURE = pair);
-[ RECORD 2 ]--------------------------------------------------------
uptname   | pair
uptowner  | 32926
uptfrom   | 1.1
uptto     | 1.2
uptscript |
    comment on extension pair is 'Simple Key Value Text Type';
</pre> <p>As you can see, there's nothing too complex here; it's quite straightforward. We need to keep the <em>creating</em> templates apart from the <em>updating</em> templates because we need <strong><em>unique</em></strong> keys, and a unique index can't enforce that over a NULL column: a creating template has no <em>from</em> version, and NULL values never compare equal to each other.</p> <pre class="src"> foo&gt; \d pg_extension_template
  Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_template"</span>
   Column   | Type | Modifiers
------------+------+-----------
 tplname    | name | not null
 tplowner   | oid  | not null
 tplversion | text |
 tplscript  | text |
Indexes:
    <span style="color: #bc8f8f;">"pg_extension_template_name_version_index"</span> UNIQUE, btree (tplname, tplversion)
    <span style="color: #bc8f8f;">"pg_extension_template_oid_index"</span> UNIQUE, btree (oid)

foo&gt; \d pg_extension_uptmpl
  Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_uptmpl"</span>
  Column   | Type | Modifiers
-----------+------+-----------
 uptname   | name | not null
 uptowner  | oid  | not null
 uptfrom   | text |
 uptto     | text |
 uptscript | text |
Indexes:
    <span style="color: #bc8f8f;">"pg_extension_uptmpl_name_from_to_index"</span> UNIQUE, btree (uptname, uptfrom, uptto)
    <span style="color: #bc8f8f;">"pg_extension_uptmpl_oid_index"</span> UNIQUE, btree (oid) </pre>

<h3>Next steps</h3> <p class="first">The basics are in place, but the patch is still far from finished. It needs pg_dump and psql support, support in the pg_available_extension_versions() function, an implementation of the ALTER TEMPLATE FOR EXTENSION commands whose syntax I only sketched in the grammar, and some more infrastructure to support the ALTER OWNER and ALTER RENAME commands.</p> <center> <p><img src="../../../images/patch-brewing.jpg" alt=""></p> </center> <center> <p><em>Warning: patch brewing here! Syntax and other key elements will change.</em></p> </center> <p>All that is pretty technical, though; what the patch really needs is some quality review and maybe some adjustments. I would be surprised if it didn't need adjustments, really: given the way the community works, we always need some. That's why the PostgreSQL product is so good!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 08 Jan 2013 17:53:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</guid> </item> <item> <title>Inline Extensions</title> <link>http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</link> <description><![CDATA[<p>We've had the CREATE EXTENSION feature in <a href="http://www.postgresql.org/">PostgreSQL</a> for a couple of releases now, so let's talk about where to go from here. The first goal of the extension facility has been to allow for a clean <em>dump</em> and <em>restore</em> process of <a href="http://www.postgresql.org/docs/9.2/static/contrib.html">contrib</a> modules. As such, it's been tailored to the needs of deploying files on the <em>file system</em>, because there's no escaping that when you have to ship <em>binary</em> and <em>executable</em> files, those infamous .so, .dll or .dylib things.</p>

<center> <p><img src="../../../images/dylibbundler.png" alt=""></p> </center> <p>Now that we have the <em>Extension</em> facility though, what we see is a growing number of users taking advantage of it for the purpose of managing in house procedural code and related objects. This code can be a bunch of <a href="http://www.postgresql.org/docs/9.2/static/plpgsql.html">PLpgSQL</a> or <a href="http://www.postgresql.org/docs/9.2/static/plpython.html">plpython</a> functions and as such you normaly create them directly from any application connection to PostgreSQL.</p> <p>So the idea would be to allow creating <em>Extensions</em> fully from a SQL command, including the whole set of objects it contains. More than one approach are possible to reach that goal, each with downsides and advantages. We will see them later in that document.</p> <p>Before that though, let's first review what the extension mechanism has to offer to its users when there's no <em>contrib like</em> module to manage.</p> <h3>A use case for next generation extensions</h3> <p class="first">The only design goal of the 9.1 PostgreSQL Extension feature has been to support a proper <em>dump &amp; restore</em> user experience when using <em>contrib modules</em> such as hstore or ltree. Building up on that, what do <em>Extensions</em> have to offer to non C developpers out there? In other words, what CREATE EXTENSION brings on the table that a bunch of <em>loose</em> objects does not? What problems can we now solve?</p> <center> <p><img src="../../../images/multi_function_equipment.jpg" alt=""></p> </center> <center> <p><em>A Multi Functions Equipment, All Bundled Together</em></p> </center> <p>A way to phrase it is to say that <em>Extensions</em> are user defined CASCADE support. <em>Extensions</em> brings extensibility to the pg_depend PostgreSQL internal dependency tracking system that CASCADE is built on. 
From that angle, <em>Extensions</em> are a way to manage dependencies of <em>SQL objects</em> in a way that allow you to manage them as a single entity.</p> <p>One of the existing problems this helps solving is the infamous lack of dependency tracking between function calls. Using <em>Extensions</em> when you deal with a set of functions acting as an API, you can at least protect that as a unit:</p> <pre class="src"> STATEMENT: drop function public.populate_record(anyelement,hstore);

ERROR: cannot drop function populate_record(anyelement,hstore) because extension hstore requires it HINT: You can drop extension hstore instead. </pre>
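<p>To make that dependency rule concrete, here's a toy model in Python of what extension membership buys you. ToyCatalog and its methods are hypothetical names, and the real pg_depend catalog records much richer information than this; the sketch only shows the rule that a member of an extension can't be dropped on its own, while dropping the extension removes all of its members at once, and a loose object stays unprotected.</p>

<pre class="src">
class DependentObjectsError(Exception):
    """Raised when dropping an object that an extension requires."""


class ToyCatalog:
    """Toy model of extension membership (hypothetical, not pg_depend)."""

    def __init__(self):
        self.objects = set()
        self.member_of = {}  # object name: owning extension name

    def create_extension(self, extension, members):
        for obj in members:
            self.objects.add(obj)
            self.member_of[obj] = extension

    def create_object(self, obj):
        # A loose object: nothing protects it from being dropped.
        self.objects.add(obj)

    def drop_object(self, obj):
        extension = self.member_of.get(obj)
        if extension is not None:
            raise DependentObjectsError(
                f"cannot drop {obj} because extension {extension} requires it"
            )
        self.objects.discard(obj)

    def drop_extension(self, extension):
        # Dropping the extension takes all of its members with it.
        members = [o for o, e in self.member_of.items() if e == extension]
        for obj in members:
            del self.member_of[obj]
            self.objects.discard(obj)
        return members


catalog = ToyCatalog()
catalog.create_extension(
    "hstore", ["type hstore", "function populate_record(anyelement,hstore)"]
)
try:
    catalog.drop_object("function populate_record(anyelement,hstore)")
except DependentObjectsError as err:
    print("ERROR:", err)
catalog.drop_extension("hstore")
print(catalog.objects)
</pre>

<p>Running it prints the same kind of error as in the log above, then an empty set once the extension itself has been dropped.</p>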

<p>You also get a version number and tool integration to manage extensions, with the psql \dx command and the equivalent feature in <a href="http://www.pgadmin.org/">pgAdmin</a>. Rolling your own version number management is not impossible, and some do that already. Here it's integrated, and the upgrade sequences are provided too (applying 1.1--1.2 then 1.2--1.3 automatically).</p> <p>Let's just say that it's very easy to understand the <em>traction</em> our users feel towards leveraging <em>Extensions</em> features in order to properly manage their set of stored procedures and SQL objects.</p> <h3>The <em>dump &amp; restore</em> experience</h3> <p class="first">The problem common to all those proposals is central to the whole idea of <em>Extensions</em> as we know them. The goal of building them has been to fix the <em>restoring</em> experience when using extensions in a database, and we managed to do that properly for contrib-like extensions.</p> <center> <p><img src="../../../images/fly.tn.png" alt=""></p> </center> <center> <p><em>A fly in the ointment</em></p> </center> <p>When talking about <em>Inline Extensions</em>, the fly in the ointment is how to properly manage their pg_dump behavior. The principle we built for <em>Extensions</em>, which is almost unique to them, is to <strong><em>omit</em></strong> them from the dump files. The only other objects that we filter out of the dump are the ones installed at server initialization time by <a href="http://www.postgresql.org/docs/9.2/static/app-initdb.html">initdb</a>, found in the pg_catalog and information_schema system <em>schemas</em>.</p> <p>At restore time, the dump file contains the CREATE EXTENSION command, so the PostgreSQL server will go fetch the <em>control</em> and <em>script</em> files on disk and process them, loading the database with the right set of SQL objects.</p> <p>Now we're talking about <em>Extensions</em> whose objects we might want to dump, so that at <em>restore</em> time we don't need to find them in unknown external resources: the fact that the extension is <em>Inline</em> means that the PostgreSQL server has no way to know where its content is coming from.</p> <p>The next proposals try to address that problem, with more or less success. So far none of them is entirely satisfying to me, even if a clear temporary winner has emerged on the <em>hackers</em> mailing list, summarized in the <a href="http://archives.postgresql.org/message-id/m2fw3judug.fsf@2ndQuadrant.fr">in-catalog Extension Scripts and Control parameters (templates?)</a> thread.</p> <h3>Inline Extension Proposals</h3> <p class="first">Now, on to some proposals to make the best out of our all time favorite PostgreSQL feature, the only one that makes no sense at all by itself...</p> <h4>Starting from an empty extension</h4> <p class="first">We already have the facility to add existing <em>loose</em> objects to an extension, and that's exactly what we use when we create an extension for something that was not an extension before, with the CREATE EXTENSION ... FROM 'unpackaged'; command.</p> <p>The hstore--unpackaged--1.0.sql file contains statements such as:</p> <pre class="src"> <span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">type</span> <span style="color: #0000ff;">hstore</span>;
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_in</span>(cstring);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_out</span>(hstore);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_recv</span>(internal);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_send</span>(hstore); </pre> <p>Opening CREATE EXTENSION so that it allows you to create a really <em>empty</em> extension would then let you fill it in as you need, with as many commands as you want to add objects to it. The <em>control</em> file properties would need to find their way into that design; that can surely be taken care of.</p> <center> <p><img src="../../../images/empty-extension.jpg" alt=""></p> </center> <center> <p><em>Look at me, an Empty Extension!</em></p> </center> <p>The main drawback here is that there's no separation anymore between the extension author, the distribution means, the DBA and the database user. When you want to install a third party <em>Extension</em> using only SQL commands, you could do it with that scheme by using a big script full of one-liner commands.</p> <p>So if you screw up your <em>copy/pasting</em> session (well, you should maybe reconsider your choice of tooling at this point, but that's another topic), you will end up with a perfectly valid <em>Extension</em> that does not contain what you wanted. As the end user, you have no clue about that until the first time using the extension fails.</p> <h4>CREATE EXTENSION AS</h4> <p class="first">The next idea is to embed the <em>Extension</em> script itself in the command, so as to get a cleaner command API (in my opinion at least) and a better error message when the paste is wrong. Of course, if your <em>paste</em> problem happens to be losing a line in the middle of the script, there is not much I can do for you...</p> <pre class="src"> <span style="color: #7f007f;">CREATE</span> EXTENSION hstore

<span style="color: #7f007f;">WITH</span> <span style="color: #da70d6;">parameter</span> = <span style="color: #da70d6;">value</span>, ... <span style="color: #7f007f;">AS</span> $$ <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_in</span>(cstring) <span style="color: #da70d6;">RETURNS</span> hstore

<span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_out</span>(hstore) <span style="color: #da70d6;">RETURNS</span> cstring <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_recv</span>(internal) <span style="color: #da70d6;">RETURNS</span> hstore <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_send</span>(hstore) <span style="color: #da70d6;">RETURNS</span> <span style="color: #228b22;">bytea</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;

<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span> (

INTERNALLENGTH = -1, <span style="color: #da70d6;">STORAGE</span> = extended, <span style="color: #da70d6;">INPUT</span> = hstore_in, <span style="color: #da70d6;">OUTPUT</span> = hstore_out, RECEIVE = hstore_recv, SEND = hstore_send); $$; </pre> <center> <p><em>An edited version of hstore--1.1.sql, shortened to save vertical space</em></p> </center>

<p>I've actually proposed a patch to implement that, as you can see in the <a href="https://commitfest.postgresql.org/action/patch_view?id=981">pg_dump --extension-script</a> commit fest entry. As the commit fest entry title gives away, the main problem we have with <em>Inline Extensions</em> is fitting them into the seamless <em>dump &amp; restore</em> experience that we are so happy to have now. More about that later, though.</p> <h4>Extension Templates</h4> <p class="first">Another idea is to keep working from control parameters and scripts to install and update extensions, but to look for them in two different places: either on the server's <em>File System</em> (when dealing with <em>contribs</em> and <em>shared libraries</em>, there's no other choice), or in the system catalogs.</p> <center> <p><img src="../../../images/templates.png" alt=""></p> </center> <center> <p><em>We Already Have TEXT SEARCH TEMPLATE After All</em></p> </center> <p>The idea would then be to have a new specific TEMPLATE SQL object used to <em>import</em> or <em>upload</em> your control file and your create and update scripts into the database, using nothing else than a SQL connection. Then at CREATE EXTENSION time the system would be able to work either from the file system or from the <em>template</em> catalogs.</p> <p>One obvious problem is how to deal with a unique namespace when we split the sources between the file system and the database, and when the file system is typically maintained with apt-get or yum commands.</p> <p>Then again, I would actually prefer that mechanism over the other proposals if the idea was to load the file system control and script files as TEMPLATEs themselves, and then only operate <em>Extensions</em> from <em>Templates</em>. But doing that would mean getting back to the situation where we are still not able to devise a good, simple and robust pg_dump policy for extensions and templates.</p> <h3>Conclusion</h3> <p class="first">I hope to find the right solution for my long-term plan in this release development cycle, but it looks like the right challenge to address now is finding the right compromise instead. The <em>Templates</em> idea already brings a lot to the table, if not the whole set of features I would like to see.</p> <center> <p><img src="../../../images/building-blocks.jpg" alt=""></p> </center> <center> <p><em>PostgreSQL: Building on Solid Foundations</em></p> </center> <p>What would mainly be missing is the ability for an <em>Extension</em> to switch from being file-based to being a template, either because the author decided to change the way he's shipping it, or because the user is switching from the <a href="http://pgxnclient.projects.pgfoundry.org/">pgxn client</a> to <em>proper</em> system packages. I guess that's something we can look into later, though.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 13 Dec 2012 11:34:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</guid> </item> <item> <title>Trigger Parameters</title> <link>http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</link> <description><![CDATA[<p>We have a not too active <a href="http://archives.postgresql.org/pgsql-fr-generale/2012-12/index.php">postgresql-fr-generale</a> mailing list where some interesting questions are asked by our subscribers. This article comes from such a question about how to deal with trigger parameters, which are nice to have, but static.</p>

<center> <p><img src="../../../images/trigger-wheels.big.jpg" alt=""></p> </center> <p>Another way to ask that question is saying that</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 06 Dec 2012 11:10:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</guid> </item> <item> <title>M-x ack</title> <link>http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</link> <description><![CDATA[<p>I've been asked about how to integrate the <a href="http://betterthangrep.com/">ack</a> tool (you know, the one that is <em>better than grep</em>) into Emacs today. Again. And I just realized that I didn't blog about my solution. That might explain why I keep getting asked about it after all...</p>

<p>So here it is, M-x ack:</p> <pre class="src"> <span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-ack.el -- Dimitri Fontaine</span>
<span style="color: #b22222;">;;</span>
<span style="color: #b22222;">;; </span><span style="color: #b22222;">http://stackoverflow.com/questions/2322389/ack-does-not-work-when-run-from-grep-find-in-emacs-on-windows</span>

(<span style="color: #7f007f;">defcustom</span> <span style="color: #b8860b;">ack-command</span> (or (executable-find <span style="color: #bc8f8f;">"ack"</span>)
                            (executable-find <span style="color: #bc8f8f;">"ack-grep"</span>))
  <span style="color: #bc8f8f;">"Command to use to call ack, e.g. ack-grep under debian"</span>
  <span style="color: #da70d6;">:type</span> 'file)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-command-line</span> (concat ack-command <span style="color: #bc8f8f;">" --nogroup --nocolor "</span>))
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-history</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-host-defaults-alist</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">ack</span> ()
  <span style="color: #bc8f8f;">"Like grep, but using ack-command as the default"</span>
  (interactive)
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Make sure grep has been initialized</span>
  (<span style="color: #7f007f;">if</span> (&gt;= emacs-major-version 22)
      (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">grep</span>)
    (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">compile</span>))
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Close STDIN to keep ack from going into filter mode</span>
  (<span style="color: #7f007f;">let</span> ((null-device (format <span style="color: #bc8f8f;">"&lt; %s"</span> null-device))
        (grep-command ack-command-line)
        (grep-history ack-history)
        (grep-host-defaults-alist ack-host-defaults-alist))
    (call-interactively 'grep)
    (setq ack-history             grep-history
          ack-host-defaults-alist grep-host-defaults-alist)))

(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-ack</span>) </pre>

<p>Enjoy!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 22 Nov 2012 17:36:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</guid> </item> <item> <title>CL Happy Numbers</title> <link>http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</link> <description><![CDATA[<p>A while ago I stumbled upon <a href="http://tapoueh.org/blog/2010/08/30-happy-numbers.html">Happy Numbers</a> as explained in <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">programming praxis</a>, and offered an implementation of them in SQL and in Emacs Lisp. Yeah, I know. Why not, though?</p>

<center> <p><img src="../../../images/happy-numbers.png" alt=""></p> </center> <p>Today I'm back on that topic and as I'm toying with <em>Common Lisp</em> I though it would be a good excuse to learn me some new tricks. As you can see from the earlier blog entry, last time I did attack the <em>digits</em> problem quite lightly. Let's try a better approach now.</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">digits</span> (n)

<span style="color: #bc8f8f;">"return the list of the digits of N"</span> (nreverse (<span style="color: #7f007f;">loop</span> for x = n then r for (r d) = (multiple-value-list (truncate x 10)) collect d until (zerop r)))) </pre>

<p>As you can see, I wanted to use a facility I like very much, the for x = n then r way to handle the first loop iteration differently from the next ones. But someone on #lisp hinted that there's a much better way to write the same code:</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">integer-digits</span> (integer)
  <span style="color: #bc8f8f;">"stassats version"</span>
  (nreverse
   (<span style="color: #7f007f;">loop</span> with remainder
         do (setf (values integer remainder) (truncate integer 10))
         collect remainder
         until (zerop integer)))) </pre>

<p>That code runs about twice as fast as the previous one and is easier to reason about. It uses setf with the <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_values.htm">setf values</a> form, something nice to discover as it seems to be quite powerful. Let's see how to use it, even if it's really simple:</p> <pre class="src"> CL-USER&gt; (integer-digits 12304501)
(1 2 3 0 4 5 0 1) </pre> <p>Let's move on to solving the <em>Happy Numbers</em> problem, though:</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sum-of-squares-of-digits</span> (integer)
  (<span style="color: #7f007f;">loop</span> with remainder
        do (setf (values integer remainder) (truncate integer 10))
        sum (* remainder remainder)
        until (zerop integer)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">happy?</span> (n <span style="color: #228b22;">&amp;optional</span> seen)
  <span style="color: #bc8f8f;">"return true when n is a happy number"</span>
  (<span style="color: #7f007f;">let</span> ((happiness (sum-of-squares-of-digits n)))
    (<span style="color: #7f007f;">cond</span> ((= 1 happiness) t)
          ((member happiness seen) nil)
          (t (happy? happiness (cons happiness seen))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-happy-numbers</span> (limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (happy? n) collect n)) </pre>

<p>And here's how it goes:</p> <pre class="src"> CL-USER&gt; (find-happy-numbers 100)
(1 7 10 13 19 23 28 31 32 44 49 68 70 79 82 86 91 94 97 100)

CL-USER&gt; (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000))
took 1,621,413 microseconds (1.621413 seconds) to run.
       116,474 microseconds (0.116474 seconds, 7.18%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,431,332 microseconds (1.431332 seconds) were spent in user mode
       145,941 microseconds (0.145941 seconds) were spent in system mode
 185,438,208 bytes of memory allocated.
 1 minor page faults, 0 major page faults, 0 swaps.
143071 </pre>

<p>Of course, this code is much faster than the versions I wrote before in both SQL and <em>Emacs Lisp</em>, the reason being that instead of writing the number into a <em>string</em> with (format nil &quot;~d&quot; number) and then using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_subseq.htm">subseq</a> to get the digits one after the other, we're now using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_floorc.htm">truncate</a>.</p> <p>Happy hacking!</p> <h3>Update</h3> <p class="first">It turns out that when solving a math-related problem, some mathematical insight helps. Who would have believed that? So if you want to easily get some more performance out of the previous code, just try this solution:</p> <pre class="src"> (<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">*depressed-squares*</span> '(0 4 16 20 37 42 58 89 145)
  <span style="color: #bc8f8f;">"see http://oeis.org/A039943"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">undepressed?</span> (n)
  <span style="color: #bc8f8f;">"same as happy?, using a static list of unhappy sums"</span>
  (<span style="color: #7f007f;">cond</span> ((= 1 n) t)
        ((member n *depressed-squares*) nil)
        (t (undepressed? (sum-of-squares-of-digits n)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-undepressed-numbers</span> (limit)

<span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span> (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (undepressed? n) collect n)) </pre>

<p>Time to compare:</p> <pre class="src"> CL-USER&gt; (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000))
took 1,938,048 microseconds (1.938048 seconds) to run.
       290,902 microseconds (0.290902 seconds, 15.01%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,778,021 microseconds (1.778021 seconds) were spent in user mode
       140,862 microseconds (0.140862 seconds) were spent in system mode
 185,438,208 bytes of memory allocated.
 3,320 minor page faults, 0 major page faults, 0 swaps.
143071

CL-USER&gt; (time (length (find-undepressed-numbers 1000000)))
(LENGTH (FIND-UNDEPRESSED-NUMBERS 1000000))
took 1,036,847 microseconds (1.036847 seconds) to run.
         5,372 microseconds (0.005372 seconds, 0.52%) of which was spent in GC.
During that period, and with 4 available CPU cores,
     1,018,708 microseconds (1.018708 seconds) were spent in user mode
        16,982 microseconds (0.016982 seconds) were spent in system mode
 2,289,152 bytes of memory allocated.
143071 </pre> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 20 Nov 2012 18:20:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</guid> </item> <item> <title>About Vimgolf</title> <link>http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</link> <description><![CDATA[<p>Following some <em>tweet</em>, I found myself desultorily watching an episode of the awesome <a href="http://vimeo.com/channels/222837">VimGolf in Emacs</a> video series by <a href="http://vimeo.com/timvisher">Tim Visher</a>. That series is about picking some challenge from <a href="http://vimgolf.com/">vimgolf</a> and implementing it with our favorite editor instead. Because <a href="http://emacsrocks.com/">Emacs Rocks</a>, guys.</p>

<center> <p><a class="image-link" href="http://emacsrocks.com/"> <img src="../../../images/emacs-rocks-logo.png"></a></p> </center> <p>Let me tell you upfront that I really dislike the whole idea of the <em>vim golf</em> challenge. I've been a user of both <em>Emacs</em> and <em>Vim</em> for many years, and finally decided to switch to <em>living in Emacs</em>; or if you prefer, climbing my way up from level 2 as in <a href="http://blog.vivekhaldar.com/post/3996068979/the-levels-of-emacs-proficiency">The Levels Of Emacs Proficiency</a>. The reason is that I found that in my case, using <em>Vim</em> meant spending more time thinking about <em>how</em> to do some editing operation than about the <em>problem</em> I wanted to solve by editing some text, most often code.</p> <p>Of course, the main effect of <em>Vim Golf</em> is to make you focus even more on the wrong problem. There's still a good side to it though: such challenges are good excuses to discover new features of your editor. So let's use that excuse to talk about some nice <em>Emacs</em> features.</p> <center> <p><a class="image-link" href="http://vimgolf.com/challenges/4dd3e19aec9eb6000100000d"> <img src="../../../images/vim_golf_logo.png"></a></p> </center> <center> <p><em>Vim Golf Challenge: Complete the hex array data</em></p> </center> <h3>The challenge</h3> <p class="first">The previous image links to a particular challenge, which is all about filling in an array with consecutive <em>hexadecimal</em> numbers written as 0xab, starting from a template containing only the 0x00 entry. The idea is of course to use the <em>Vim</em> feature that increments the <em>number at point</em>, available through the C-a keystroke.</p> <pre class="src"> <span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {

    0x00,
}; </pre>

<h3>A first solution</h3> <p><em>Emacs</em> does not ship with an <em>increment-number-at-point</em> command, much less one that would support <em>decimal</em>, <em>octal</em> and <em>hexadecimal</em> and even automatically recognize 0x as a prefix meaning that the next number is <em>hexadecimal</em>. But <em>Emacs</em> does ship with <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macros.html">Emacs Keyboard Macros</a>, and those have a counter, so it's easy enough to fill in numbers from 1 to 255 that way. M-1 F3 F3 , F4 records a macro whose counter starts at 1. Each time you hit F4, the macro inserts the current counter value, increments it and inserts a comma. You want to do that 254 more times, so you type C-u 2 5 4 F4 and <em>Emacs</em> just does that.</p> <p>Now, to transform those decimal numbers into their <em>hexadecimal</em> representation, you can use the advanced features of <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html">Emacs Regexp Replace</a>. Replace [0-9]+ with the result of the following <em>Emacs Lisp</em> code: \,(format &quot;0x%02x&quot; (string-to-number \&amp;)). The \&amp; in there is replaced by the matching text, so that does what we need here, turning 10 into 0x0a.</p> <h3>Let's get some better tools</h3> <p class="first">We could do better, though. I happen to already use a <em>key chord</em> to duplicate the current line, and we would need a function to <a href="http://www.emacswiki.org/emacs/IncrementNumber">Increment Number At Point</a>. The ones I found over at <a href="http://www.emacswiki.org/">EmacsWiki</a> were not to my taste, as they could not easily figure out which <em>base</em> to use. 
So here's a little <em>Emacs Lisp</em> example showing how to extend your favorite editor with some common <em>Vim</em> features, which is why <em>Emacs</em> ships with <em>Emacs Lisp</em> in the first place.</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">duplicate-current-line</span> (<span style="color: #228b22;">&amp;optional</span> n)

<span style="color: #bc8f8f;">"duplicate current line, make more than 1 copy given a numeric argument"</span>
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let</span> ((nb (or n 1))
        (current-line (thing-at-point 'line)))
    (<span style="color: #7f007f;">save-excursion</span>
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">when on last line, insert a newline first </span>
      (<span style="color: #7f007f;">when</span> (or (= 1 (forward-line 1)) (eq (point) (point-max)))
        (insert <span style="color: #bc8f8f;">"\n"</span>))

      <span style="color: #b22222;">;; </span><span style="color: #b22222;">now insert as many times as requested (nb, as n may be nil) </span>
      (<span style="color: #7f007f;">dotimes</span> (_ nb)
        (insert current-line)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">now move down as many lines as we inserted </span>
    (next-line nb)))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-d"</span>) 'duplicate-current-line) </pre>

<center> <p><a class="image-link" href="http://lisperati.com/"> <img src="../../../images/emacs-on-toaster.jpg"></a></p> </center> <pre class="src"> (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">cl</span>) <span style="color: #b22222;">; </span><span style="color: #b22222;">destructuring-bind and case are found there </span>
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:increment-number-at-point</span> (<span style="color: #228b22;">&amp;optional</span> prefix)

  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let*</span> ((beg (skip-chars-backward <span style="color: #bc8f8f;">"0-9a-fA-F"</span>))
         (hexa (<span style="color: #7f007f;">save-excursion</span>
                 (forward-char -2)
                 (looking-at-p <span style="color: #bc8f8f;">"0x"</span>)))
         <span style="color: #b22222;">;; </span><span style="color: #b22222;">force the prefix to hexa (4) when we see "0x" before the number </span>
         (prefix (<span style="color: #7f007f;">if</span> hexa 4 prefix))
         (end (re-search-forward <span style="color: #bc8f8f;">"[0-9a-fA-F]+"</span> nil t))
         (nstr (match-string 0))
         (l (- (match-end 0) (match-beginning 0)))
         (fmt (format <span style="color: #bc8f8f;">"%%0%d"</span> l)))
    (<span style="color: #7f007f;">destructuring-bind</span> (base format)
        (<span style="color: #7f007f;">case</span> prefix
          ((4) '(16 <span style="color: #bc8f8f;">"x"</span>))  <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u, hexadecimal </span>
          ((16) '(8 <span style="color: #bc8f8f;">"o"</span>))  <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u C-u, octal </span>
          (t '(10 <span style="color: #bc8f8f;">"d"</span>)))  <span style="color: #b22222;">; </span><span style="color: #b22222;">no (or any other) command prefix, decimal </span>
      (<span style="color: #7f007f;">let*</span> ((n (string-to-number nstr base))
             (n+1 (+ n 1))
             (fmt (format <span style="color: #bc8f8f;">"%s%s"</span> fmt format)))
        (replace-match (format fmt n+1))))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-c +"</span>) 'dim:increment-number-at-point) </pre>
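<p>The base-detection logic of dim:increment-number-at-point can be sketched outside Emacs too. The following Python function is a hypothetical illustration, not part of any Emacs package. A string stands in for the buffer, an index pos stands in for point (placed right after the digits), and the base argument plays the role of the prefix argument. As in the Lisp version, a 0x right before the digits forces hexadecimal, and the result is zero-padded to the original width.</p>

```python
# Hypothetical Python counterpart of dim:increment-number-at-point.
DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}
FORMATS = {8: "o", 10: "d", 16: "x"}


def _skip_back(text: str, pos: int, allowed: str) -> int:
    """Move left while the previous character is allowed (like skip-chars-backward)."""
    while pos > 0 and text[pos - 1] in allowed:
        pos -= 1
    return pos


def increment_number_at(text: str, pos: int, base: int = 10) -> str:
    """Return text with the number ending at pos incremented by one."""
    hex_start = _skip_back(text, pos, DIGITS[16])
    if text[max(hex_start - 2, 0):hex_start] == "0x":
        # A "0x" right before the digits forces hexadecimal, whatever base says.
        base, start = 16, hex_start
    else:
        start = _skip_back(text, pos, DIGITS[base])
    end = start
    while end < len(text) and text[end] in DIGITS[base]:
        end += 1
    if start == end:
        return text  # no number at point: leave the text alone
    digits = text[start:end]
    # Zero-pad to the original width, like the "%02x" format built in the Lisp code.
    new_digits = format(int(digits, base) + 1, "0%d%s" % (len(digits), FORMATS[base]))
    return text[:start] + new_digits + text[end:]


print(increment_number_at("    0x00,", 8))  # prints "    0x01,"
```

<p>Calling such a function once per duplicated line is essentially what the keyboard macro does in the next solution. Overflow simply widens the number, so 0xff becomes 0x100.</p>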

<blockquote> <p class="quoted"> So if you're using <em>Emacs</em> a lot but always found an excuse not to learn <em>Emacs Lisp</em>, I hope this article can be your excuse to do so…</p> </blockquote> <h3>Another solution</h3> <p class="first">Anyway, now that we are much better equipped, we can picture a better way to solve the problem. Instead of using a macro that inserts the next counter value, we can use one that duplicates the current line and increments the number at point (figuring out on its own that a number prefixed with 0x is <em>hexadecimal</em>), and then replay it 254 more times. Then it's all about reformatting the text so that it fits nicely on screen, and for that M-q (fill-paragraph) is exactly what we need. C-x f (set-fill-column) sets the maximum column we allow <em>Emacs</em> to reach before going to the next line.</p> <p>Our <em>Golf</em> then becomes a 19-keystroke solution if you start with the cursor at the ',' in the previous example:</p> <pre class="src"> C-x f 5 6 RET
F3 C-S-d C-c + F4
C-u 2 5 4 F4
C-SPC M-&lt; C-n M-q </pre> <p>The first line sets the <em>fill column</em>. The second line records a macro (in between F3 and F4) that duplicates the current line (using C-S-d) then increments the number at point (using C-c +). The third line replays that macro 254 times (C-u 2 5 4 F4). The fourth line selects all those <em>hexadecimal</em> numbers and fills the paragraph they form, so as to get:</p> <h3>All those tips for...</h3> <pre class="src"> <span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {

0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, }; </pre>
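<p>The regexp-replace route from the first solution can be sketched in Python as well, using re.sub and textwrap.fill as rough stand-ins for the Emacs regexp replace commands and M-q. The function name hex_array_body is hypothetical. For simplicity, it generates the decimal counter values from 0 instead of starting next to the existing 0x00 entry. textwrap does not wrap exactly like Emacs' fill-paragraph, so line breaks may differ from the block above.</p>

```python
import re
import textwrap


def hex_array_body(last: int = 255, width: int = 56) -> str:
    """Hypothetical sketch: decimal counter values, regexp-replaced to hex, then filled."""
    # Step 1: "0, 1, ..., 255," as a keyboard-macro counter could have inserted them.
    decimals = " ".join("%d," % i for i in range(last + 1))
    # Step 2: like replacing [0-9]+ with \,(format "0x%02x" (string-to-number \&)).
    hexes = re.sub(r"[0-9]+", lambda m: "0x%02x" % int(m.group(0)), decimals)
    # Step 3: like M-q with the fill column set to 56 (breaks only at spaces here).
    return textwrap.fill(hexes, width=width)


print(hex_array_body())
```

<p>re.sub does not rescan its replacement text, so the digits inside a freshly produced 0x0a are not matched again.</p>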

<h3>Conclusion</h3> <p class="first">Thanks to that challenge and the excuse it gave us, we now have a generic facility to increment a number at point in different common bases, which is a nice building block for all kinds of <em>Emacs Keyboard Macros</em>. We also now have a function to duplicate the line at point, which is something I use very often myself.</p> <p>More importantly, we've refreshed our memory on how to use some advanced replacement facilities wherein you can actually use inline <em>Emacs Lisp</em> code as the replacement pattern, and the most interested readers now have a good excuse to learn some more about <em>Emacs Lisp</em> programming.</p> <p>The main thing I want to say is that using <em>Emacs Keyboard Macros</em> is an interactive process: you don't have to pause your current activity to write some code in another language (here, that would be either <em>Vim Script</em> or <em>Emacs Lisp</em>) just to save a few minutes on a boring task.</p> <p>How effective you are at solving that challenge is, for me, not at all about measuring how many keystrokes you ended up using; it's all about being able to get some precious help from your working tools <strong><em>without</em></strong> having to stop focusing on the main problem you are solving.</p> <p>I would never write such <em>Emacs Lisp</em> code for a one-off editing session. I would only do that once I think I've done a boring task by hand one time too many already, like copying and pasting the pg_backend_pid() of the <a href="http://www.postgresql.org/">PostgreSQL</a> backend I'm working with at the psql prompt so that I can attach gdb to it. 
I'll get back to talking about <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a> later!</p> <p>I hope you enjoyed this article, whose goal is to help you on your journey through <a href="http://blog.vivekhaldar.com/post/3996068979/the-levels-of-emacs-proficiency">The Levels Of Emacs Proficiency</a>.</p> <h3>Update</h3> <p class="first">While looking at the docs for the <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macro-Counter.html#Keyboard-Macro-Counter">Keyboard Macro Counter</a> to check how to reset it without having to record the macro again, I stumbled on this part of the docs: C-x C-k C-f runs the command kmacro-set-format, which sets the format used when inserting the macro counter. So another way to solve our problem, with only the facilities that come with a bare Emacs, is the following:</p> <pre class="src"> C-x f 5 6 RET C-x C-k C-f 0x%02x RET C-1 F3 SPC F3 , F4 C-u 2 5 4 F4 DEL C-SPC C-a M-q </pre> <p>We're now at 30 keystrokes, many more than before, but it only uses stock Emacs features, and kmacro-set-format is a wonderful little tool you might well need in the future.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 11 Nov 2012 20:52:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</guid> </item> <item> <title>Editing SQL</title> <link>http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</link> <description><![CDATA[<p>It's hard to read my blog and not know I'm using <a href="http://www.gnu.org/software/emacs/#Platforms">Emacs</a>. It really is a great tool and has a lot in common with <a href="http://www.postgresql.org/">PostgreSQL</a> in terms of extensibility, documentation quality and community. And there's even a native implementation of the <a href="http://www.postgresql.org/docs/current/static/protocol.html">PostgreSQL Protocol</a> written in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>.</p>

<center> <p><a class="image-link" href="http://www.online-marketwatch.com/pgel/pg.html"> <img src="../../../images/pg-el.png"></a></p> </center> <p>One of the places where <em>Emacs</em> really shines is the interactive development environment you get when working on <em>Emacs Lisp</em> code. Evaluating a function is as easy as a single <em>key chord</em>, which evaluates the new definition and loads it into the running process. I can't tell you how many times I've missed that ability when editing C code.</p> <p>With <em>PostgreSQL</em> too we get a pretty interactive environment with the <a href="http://www.postgresql.org/docs/current/static/app-psql.html">psql</a> console application, or with <a href="http://www.pgadmin.org/">pgAdmin</a>. One feature from <em>pgAdmin</em> that I've often wished I had in <em>psql</em> is the ability to edit my query online and easily run it in the console, rather than either using the limited <em>readline</em> history editing features or launching a new editor process each time with \e. At the same time, I would much prefer using my usual <em>Emacs</em> editor to actually <em>edit</em> the query.</p> <p>If you've been reading this blog before, you know what to expect. My solution to the stated problem is available in <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a>, an <em>Emacs</em> package aimed at helping <em>PostgreSQL</em> developers. Most of the features in there are geared toward <em>core backend</em> developers, except for the one I want to talk about today (I guess I'll blog about the other ones too).</p> <center> <p><img src="../../../images/pgdevenv-el-eval-sql.png" alt=""></p> </center> <p>What you can see from that screenshot is that the selected query text has been sent to the <em>psql</em> buffer and executed there, and that the <em>psql</em> buffer echoes all queries sent to it. What you cannot see from that picture is the interaction needed to get there. 
Well, I've been implementing the <em>elisp</em> features that I was missing.</p> <p>First, movement: you can use C-M-a and C-M-e to move to the beginning and the end of the SQL query at point, like you do in C or in lisp in <em>Emacs</em>.</p> <p>Then, selection: C-M-h selects the SQL query at point, so you don't have to navigate yourself; <a href="https://github.com/dimitri/pgdevenv-el">pgdev-sql-mode</a> knows how to do that. Side note: pgdev-sql-mode is the name of the <em>minor mode</em> you need to activate in your SQL buffers to make the magic available.</p> <p>Last but not least, evaluation: as when editing lisp code, you can now use C-M-x to send the current query text to an associated <em>psql</em> buffer.</p> <p>Associating the <em>psql</em> buffer with an <em>SQL</em> buffer currently relies on the other <em>pgdevenv-el</em> features that this blog post is not about, and the setup is covered in the documentation: you have to let <em>pgdevenv-el</em> know where your PostgreSQL branches are installed locally so that it can prepare a <em>Shell</em> buffer with PGDATA and PGPORT already set for you. Currently, for C-M-x to work you need to open that buffer yourself beforehand, using C-c - n (which runs the command pgdev-open-shell), and type psql at the <em>Shell</em> prompt.</p> <p>What that means for me is that I can at least edit SQL (in <em>PostgreSQL</em> regression files and other places) in my usual <em>Emacs</em> buffer and refine it as I go until it does exactly what I need, without having to use the <em>readline</em> history editing or the \e command, which is not great when your <em>Shell</em> is already running inside <em>Emacs</em>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 06 Nov 2012 09:55:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</guid> </item> <item> <title>Concurrent Hello</title> <link>http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</link> <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/265191809100181504">Mickael</a> on <em>twitter</em> I ran into that article about implementing a very basic <em>Hello World!</em> program as a way to get into a new concurrent language or facility. The original article, titled <a href="http://himmele.blogspot.de/2012/11/concurrent-hello-world-in-go-erlang.html">Concurrent Hello World in Go, Erlang and C++</a> is all about getting to know <a href="http://golang.org/">The Go Programming Language</a> better.</p>

<p>To quote the article:</p> <blockquote> <p class="quoted"> The first thing I always do when playing around with a new software platform is to write a concurrent &quot;Hello World&quot; program. The program works as follows: One active entity (e.g. thread, Erlang process, Goroutine) has to print &quot;Hello &quot; and another one &quot;World!\n&quot; with the two active entities synchronizing with each other so that the output always is &quot;Hello World!\n&quot;.</p> </blockquote> <p>Here's my try in <a href="http://cliki.net/">Common Lisp</a> using <a href="http://lparallel.org/">lparallel</a> and some <em>local nicknames</em>, the whole 23 lines of it:</p> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-hello</span> (helloq worldq n)

  (<span style="color: #7f007f;">dotimes</span> (i n)
    (format t <span style="color: #bc8f8f;">"Hello "</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-world</span> worldq)
    (lq:pop-queue helloq))
  (lq:push-queue <span style="color: #da70d6;">:quit</span> worldq))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-world</span> (helloq worldq)

  (<span style="color: #7f007f;">when</span> (eq (lq:pop-queue worldq) <span style="color: #da70d6;">:say-world</span>)
    (format t <span style="color: #bc8f8f;">"World!~%"</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-hello</span> helloq)
    (say-world helloq worldq)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">hello-world</span> (n)

  (<span style="color: #7f007f;">let*</span> ((lp:*kernel* (lp:make-kernel 2)) <span style="color: #b22222;">; </span><span style="color: #b22222;">a new one each time, as we end it </span>
         (channel (lp:make-channel))
         (helloq (lq:make-queue))
         (worldq (lq:make-queue)))
    (lp:submit-task channel #'say-world helloq worldq)
    (lp:submit-task channel #'say-hello helloq worldq n)
    (lp:receive-result channel)
    (lp:receive-result channel)
    (lp:end-kernel))) </pre>

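<p>For comparison, here's a rough Python transliteration of the same handshake. It assumes threading and queue.Queue as stand-ins for lparallel's kernel, channel and queues. It is only a sketch, not a port of the go-hello-world project. The worker threads append to a shared list instead of printing, so that the ordering can be checked, and join plays the role of the two receive-result calls.</p>

```python
import queue
import threading


def say_hello(hello_q: queue.Queue, world_q: queue.Queue, n: int, out: list) -> None:
    for _ in range(n):
        out.append("Hello ")
        world_q.put("say-world")  # hand over to the other thread...
        hello_q.get()             # ...and block until "World!" has been written
    world_q.put("quit")           # tell the other thread to stop


def say_world(hello_q: queue.Queue, world_q: queue.Queue, out: list) -> None:
    while world_q.get() == "say-world":
        out.append("World!\n")
        hello_q.put("say-hello")


def hello_world(n: int) -> str:
    hello_q, world_q = queue.Queue(), queue.Queue()
    out: list = []
    workers = [
        threading.Thread(target=say_world, args=(hello_q, world_q, out)),
        threading.Thread(target=say_hello, args=(hello_q, world_q, n, out)),
    ]
    for worker in workers:
        worker.start()
    # Like the two receive-result calls: wait for both workers before returning.
    for worker in workers:
        worker.join()
    return "".join(out)


if __name__ == "__main__":
    print(hello_world(3), end="")
```

<p>Because each thread blocks on its own queue until the other one has written its half, the output is always a sequence of complete "Hello World!" lines. Waiting with join before returning is what keeps it from getting mixed with whatever the caller prints next.</p>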
<p>If you want to play locally with that code, I've pushed it to a <em>github</em> project named <a href="https://github.com/dimitri/go-hello-world">go-hello-world</a>, even though it's coded in <em>CL</em>. See the package.lisp in there for how I enabled the <em>local nicknames</em> lp and lq for the <em>lparallel</em> packages.</p> <h3>Beware of the REPL</h3> <p class="first">In a previous version of this very article, I said that I sometimes got an extra line feed in the output and didn't understand why. Some great Common Lisp folks pointed me in the right direction: it's the <em>REPL</em> output that gets intermingled with the program output, because the hello-world main function was returning before the work was done.</p> <p>I've added one receive-result call per worker so that it waits until the end of the program before returning to the <em>REPL</em>, and that indeed fixes it. One way to check that is to use the time macro, whose output always used to get intermingled with the program output. It's fixed now:</p> <pre class="src"> CL-USER&gt; (time (go-hello-world:hello-world 1000))
Hello World!
...
Hello World!
(GO-HELLO-WORLD:HELLO-WORLD 1000) took 27,886 microseconds (0.027886 seconds) to run.
1,593 microseconds (0.001593 seconds, 5.71%) of which was spent in GC.
During that period, and with 4 available CPU cores,
23,246 microseconds (0.023246 seconds) were spent in user mode
14,427 microseconds (0.014427 seconds) were spent in system mode
4,272 bytes of memory allocated.
10 minor page faults, 0 major page faults, 0 swaps.
( lparallel kernel shutdown manager(62) [Reset] #x30200109F65D&gt; ...)
CL-USER&gt; </pre>

<h3>Conclusion</h3> <p class="first">While the <em>Go</em> language seems to bring very interesting things to the table, such as better compilation units and tools, I still think that the concurrency primitives at its core are easy to find in other places, which is a good thing, as it means we know they work.</p> <p>That also means that we don't need to accept <em>Go</em> syntax as the only way to properly solve that <em>concurrency</em> problem; I much prefer doing so with <em>Common Lisp</em>'s (lack of?) syntax myself.</p> <h3>Update</h3> <p class="first">A previous version of this article was finished and published too quickly, and the conclusion was drawn from a buggy version of the program. It's all fixed now. Thanks a lot to the people who contributed comments so that I could fix it, and thanks again to <em>James M. Lawrence</em> for <a href="http://lparallel.org/">lparallel</a>!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 04 Nov 2012 23:04:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</guid> </item> <item> <title>PostgreSQL for developers</title> <link>http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</link> <description><![CDATA[<p>As <a href="http://blog.guillaume.lelarge.info/index.php/post/2012/11/01/Conf%C3%A9rence-%C3%A0-l-AFUP-Lyon">Guillaume</a> says, we enjoyed a great evening conference in Lyon two days ago, presenting PostgreSQL to developers. He spent the first hour presenting the project and the main things you want to know to start using <a href="http://www.postgresql.org/">PostgreSQL</a> in production, then I took the opportunity of talking to developers to show off some SQL.</p>

<center> <p><a class="image-link" href="../../../images/confs/developper-avec-pgsql.pdf"> <img src="../../../images/confs/developper-avec-pgsql-0.png"></a></p> </center> <p>That slide deck mainly contains SQL, but the rest of it is in French rather than English. Sorry for the inconvenience if that's not something you can read. Get me to talk at an English-speaking, developer-friendly conference and I'll translate it for you! :)</p> <p>The aim of that talk is to have people think about SQL as a real asset in their development tool set. SQL really should be compared to your application development language rather than to your UI formatting language: it's more like PHP or Python than it is like HTML.</p> <p>So the whole talk is about showing off some advanced SQL features, all available by default in released PostgreSQL versions. The main parts of the talk all come from an article in this blog: <a href="../10/05-reset-counter.html">Reset Counter</a>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 02 Nov 2012 16:22:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</guid> </item> <item> <title>Another awesome conf</title> <link>http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</link> <description><![CDATA[<p>Last week was <a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> in Prague, and it was awesome. Many thanks to the organisers, who managed to host a very smooth conference with 290 attendees, including speakers. That means you kept running into interesting people to talk to, and the <em>Hallway Track</em> in particular was a giant success.</p>

<center> <p><a class="image-link" href="http://www.flickr.com/photos/obartunov/8128604476/lightbox/"> <img src="../../../images/prague.jpg"></a></p> </center> <center> <p><em>Photo by <a href="http://www.sai.msu.su/~megera/">Oleg Bartunov</a></em></p> </center> <p>I had the chance to speak several times at that event, and you can get the slides from my <a href="../../../conferences.html">Conferences</a> page, which I try to keep up to date. I gave one talk about <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/318-implementing-high-availability/">Implementing High Availability</a> that was about 2 hours long (a double slot), <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/373-lightning-talks/">PGQ Cooperative Consumers</a>, which Marko Kreen co-presented with me, and <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/317-large-scale-mysql-migration-to-postgresql/">Large Scale MySQL Migration to PostgreSQL</a>, which I had already presented earlier this year.</p> <center> <p><a class="image-link" href="../../../images/high-availability.pdf"> <img src="../../../images/high-availability.png"></a></p> </center> <p>The next conference is in Lyon and will be in French; the talk is called <a href="http://lyon.afup.org/2012/10/17/presentation-de-postgresql-31102012-a-19h30/">Présentation de PostgreSQL</a> (Introducing PostgreSQL). The audience is going to be PHP developers interested in knowing more about PostgreSQL. I'll tell you how it goes!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 30 Oct 2012 12:50:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</guid> </item> <item> <title>Prefixes and Ranges</title> <link>http://tapoueh.org/blog/2012/10/16-prefix-update.html</link> <description><![CDATA[<p>It's been a long time since I last had some time to spend on the <a href="http://tapoueh.org/pgsql/prefix.html">prefix</a> PostgreSQL extension and its prefix_range data type. With PostgreSQL 9.2 out, some users wanted me to update the extension for that release, and hinted that it was high time I fixed that old bug for which I already had a patch.</p>

<center> <p><img src="../../../images/Prefix-Pro-Blend.jpg" alt=""></p> </center> <h3>prefix_range release 1.2.0</h3> <p class="first">I'm sorry it took that long. It's now done: you can get prefix 1.2.0 from <a href="https://github.com/dimitri/prefix">https://github.com/dimitri/prefix</a> or, if you want a <em>tagged</em> tarball, use this link: <a href="https://github.com/dimitri/prefix/tarball/v1.2.0">https://github.com/dimitri/prefix/tarball/v1.2.0</a>.</p> <p>The <em>changelog</em> is all about fixing an index search bug and updating the package to primarily be an extension for PostgreSQL 9.1 and 9.2. Of course, older major versions are still supported (all of them since 8.1, but please first consider upgrading PostgreSQL) if you want to install it <em>manually</em>, using the prefix--1.2.0.sql file.</p> <h3>Debian package</h3> <p class="first">And thanks to <a href="http://www.df7cb.de/">Christoph Berg</a>, the Debian package is already validated and has reached <em>Debian experimental</em>. We don't target <em>sid</em> these days because Debian is preparing a new stable release, so there's a freeze. I think. Anyway, take your prefix package from here if you need it: <a href="http://packages.debian.org/experimental/postgresql-9.1-prefix">http://packages.debian.org/experimental/postgresql-9.1-prefix</a>.</p> <h3>Range Types</h3> <p class="first">If you step back a little, there's an interesting question to answer here. Why isn't prefix_range a <a href="http://www.postgresql.org/docs/9.2/static/rangetypes.html">PostgreSQL Range Type</a>? Given the names, it seems like a pretty good candidate.</p> <p>Well, the thing is that to make a generic range type you need a total ordering on the range elements, and ideally a distance function that tells you how far apart any two elements of a range are.</p> <p>When talking about prefixes, I don't see how to do that. 
The prefix range ['abcd', 'abce') contains an infinite number of elements: all the <em>strings</em> that begin with the letters abcd. I guess that coming up with an ordering on text is possible, but what if any text element represents a prefix?</p> <p>What I mean is that in our case, the elements would be of type prefix, and 'abcd' is a prefix of 'abcdefg'. The question I want to answer is: given a table with prefixes 'abcd', 'abce' and 'abcde', which row has the longest prefix matching the literal 'abcdef'?</p> <p>I don't see how to abuse the <em>Range Types</em> mechanism to implement that, so if you have some ideas, please share them!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 16 Oct 2012 10:47:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/16-prefix-update.html</guid> </item> <item> <title>Reset Counter</title> <link>http://tapoueh.org/blog/2012/10/05-reset-counter.html</link> <description><![CDATA[<p>I've been given a nice puzzle that I think is a good blog article opportunity, as it involves some thinking and <em>window functions</em>.</p>

<h3>What's to solve</h3> <p class="first">Say we store in a table the readings of a <em>counter</em> that only ever increases, along with the time stamp of each measurement. So when you read 30 and later 40, it means we counted 10 more between the two readings: the first 30 are included again in the second counter value, 40.</p> <center> <p><a class="image-link" href="http://xkcd.com/363/"> <img src="../../../images/reset.png"></a></p> </center> <p>Now of course it's a real world counter. Think of the traffic counter of a network interface, if you want something real to play with in your mind. So the counter will sometimes reset, and you will read measurement sequences such as 40, 0, 20 if you happen to read just when the counter is reset, or, most of the time, something like 45, 25, 50.</p> <p>The question we want to answer is: given a series of such counter measures, including some resets, what is the current logical value of the counter?</p> <p>Given the sequence of measures 0, 10, 20, 30, 40, 0, 20, 30, 60 the result we want is 40 + 60, that is 100. Right?</p> <h3>Playing with some data</h3> <p class="first">Let's model a hypothetical dataset easy enough to play with. What about just the previous example? We also need to <em>time stamp</em> the measurements; let's just use a <em>tick</em> for now, as it's easier to think about:</p> <pre class="src">
create table measures(tick int, nb int);

insert into measures
     values (1, 0), (2, 10), (3, 20), (4, 30), (5, 40),
            (6, 0), (7, 20), (8, 30), (9, 60);
</pre>
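<p>Before writing any SQL, the rule can be spelled out as a small Python sketch (my own illustration, not part of the SQL solution; the function name logical_value is made up): a reading is kept when the next reading is lower, meaning the counter was reset right after it, or when it is the last reading, and the kept readings are summed.</p>

```python
def logical_value(readings):
    """Return the logical counter value for readings given in tick order.

    A reading is kept when it is the last one, or when the next reading is
    lower (the counter was reset right after it); kept readings are summed.
    """
    total = 0
    for i, nb in enumerate(readings):
        is_last = i == len(readings) - 1
        if is_last or nb > readings[i + 1]:
            total += nb
    return total


readings = [0, 10, 20, 30, 40, 0, 20, 30, 60]
print(logical_value(readings))  # prints 100, that is 40 + 60
```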

<p>Now that we have some data in a table to play with, let's try to find out the numbers we are interested in: we only want to keep the latest measure we read on the counter just before it wraps. That means the values where the <em>next one</em> (in tick or time stamp order) is less than the current counter value.</p> <p>As we are lucky enough to be playing with the awesome <a href="http://www.postgresql.org/">PostgreSQL</a>, which brings <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a> to the table, we can easily implement just what we said in a readable way:</p> <pre class="src">
select tick, nb,
       case when lead(nb) over w &lt; nb then nb
            when lead(nb) over w is null then nb
            else null
        end as max
  from measures
window w as (order by tick);
</pre>

<p>The first <em>case</em> branch is the exact translation of the problem as spelled out in English in the previous paragraph, where we stated that we want to keep the current counter value in case of a <em>wraparound</em>, so I guess it's easy enough to follow.</p> <center> <p><img src="../../../images/reset-circuit-thumbnail.jpg" alt=""></p> </center> <p>Then we have a couple of tricks in that query in order to massage the data as we want it. First, the last row of the output won't have a <em>lead</em>, so that <em>window function</em> call is going to return NULL. In that case, we keep the current counter value as if we just did a <em>wraparound</em>. And finally, when there's no <em>wraparound</em>, we don't care about the data. Well, for the purpose of knowing the current <em>logical</em> value of the counter, that is.</p> <p>And we get that encouraging result:</p> <pre class="src">
 tick | nb | max
------+----+-----
    1 |  0 |
    2 | 10 |
    3 | 20 |
    4 | 30 |
    5 | 40 |  40
    6 |  0 |
    7 | 20 |
    8 | 30 |
    9 | 60 |  60
</pre> <p>As you see, we have been able to create a new column out of the dataset, and that new column only contains the data we are interested in.</p> <h3>Finding the current counter value</h3> <p class="first">All we have to do now is sum this computed column's entries. Remember that the sum() aggregate function simply ignores nulls, so we don't have to turn them into a bunch of 0.</p> <pre class="src">
with t(tops) as (
  select case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select sum(tops) from t;
</pre>

<center> <p><img src="../../../images/reset-elect.jpg" alt=""></p> </center> <p>And here's the expected result:</p> <pre class="src">
 sum
-----
 100
</pre>

<p>Now what about testing with another set of data or two, just to be sure that our solution allows the counter to wrap more than once?</p> <pre class="src">
insert into measures
     values (10, 0), (11, 10), (12, 30), (13, 35), (14, 45),
            (15, 25), (16, 50), (17, 100), (18, 110);
</pre>

<p>Then we have:</p> <pre class="src">
with t(tops) as (
  select case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select sum(tops) from t;

 sum
-----
 255
(1 row)
</pre>

<p>All good!</p> <h3>Counter logical value over a given period</h3> <p class="first">Now of course what we want is to find the logical value of the counter for a given day's or month's worth of measures. We then need to pay attention to the value of the counter at the start of our period, so that we know to subtract it from the logical sum over the period.</p> <center> <p><img src="../../../images/reset-coin-counter.jpg" alt=""></p> </center> <p>Here's an SQL version of the same sentence, applied to the period between ticks 4 (included) and 14 (excluded), a completely arbitrary choice of mine:</p> <pre class="src">
with t as (
  select tick,
         first_value(nb) over w as first,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
   where tick &gt;= 4 and tick &lt; 14
  window w as (order by tick)
)
select sum(max) - min(first) as sum
  from t;
</pre>
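<p>The same period computation can be sketched in Python as a cross-check (an illustration only; period_value is a hypothetical name, and ticks are assumed to be integers): keep the readings whose tick falls in the half-open range [start, end), apply the same wraparound rule, then subtract the first reading of the period.</p>

```python
def period_value(measures, start, end):
    """Logical counter value for ticks in the half-open range [start, end).

    measures: iterable of (tick, nb) pairs, ticks being integers.
    """
    # range(start, end) is half-open too: start is included, end is not,
    # so back-to-back periods never count a boundary tick twice.
    readings = [nb for tick, nb in sorted(measures) if tick in range(start, end)]
    if not readings:
        return 0
    total = 0
    for i, nb in enumerate(readings):
        # Keep the last reading and every reading followed by a reset.
        if i == len(readings) - 1 or nb > readings[i + 1]:
            total += nb
    # The first reading was already on the counter when the period started.
    return total - readings[0]


measures = list(enumerate(
    [0, 10, 20, 30, 40, 0, 20, 30, 60,
     0, 10, 30, 35, 45, 25, 50, 100, 110], start=1))
print(period_value(measures, 4, 14))  # prints 105, that is 40 + 60 + 35 - 30
```

<p>Unlike the SQL query, which would return NULL for an empty period, this sketch returns 0 in that case.</p>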

<p>Here we are using the <em>first_value()</em> window function so that the first counter value of the period is available in every row of the <em>Common Table Expression</em> result set (the inner query introduced by the keyword WITH is called that way). And when doing the sum we're interested in at the outer level, we didn't forget to subtract that first value. We need an aggregate to get at it because we're doing a sum() aggregate at the same query level; as every row holds the same value, we used min(), and max() would have been just as good.</p> <p>Another important trick we're using in that query is how to express the date range. Never use between for that: it includes both boundaries, so consecutive periods would both count their shared boundary, and customers won't like your accounting process if you do that. Always combine an inclusive lower bound with an exclusive upper bound, as in the WHERE clause of the previous query.</p> <p>Let's have a quick look at the raw data in that range, using another nice <em>aggregate</em> that PostgreSQL comes with:</p> <pre class="src">
select array_agg(nb) from measures where tick &gt;= 4 and tick &lt; 14;

           array_agg
-------------------------------
 {30,40,0,20,30,60,0,10,30,35}
(1 row)
</pre>

<p>And now, here's the <em>logical counter value</em> for that period, as computed by the first_value() query:</p> <pre class="src">
 sum
-----
 105
(1 row)
</pre>

<p>We can verify it manually: we want 40 + 60 + 35 - 30, that is 105, so I think we're all good again. Don't forget we have to subtract the first measure of the period!</p> <h3>Extending the problem</h3> <p class="first">Another interesting problem, which we didn't have here but that I find interesting enough to extend this article with, is finding the ranges of time (here, ticks) within which the counter didn't reset.</p> <center> <p><img src="../../../images/reset-A2a.jpg" alt=""></p> </center> <p>The query is more complex because we need to split the data into partitions, each partition containing data from the same series of counter measures without wrapping. The usual trick is to self-join our data set so that each row gets the set of rows from the same partition; here we are instead going to use a <em>correlated subquery</em> to fetch the tick of the next <em>wraparound</em>:</p> <pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select tick, nb, max,
       (select t2.tick
          from tops t2
         where t2.tick &gt;= t1.tick and t2.max is not null
      order by t2.tick
         limit 1) as p
  from tops t1;

 tick | nb  | max | p
------+-----+-----+----
    1 |   0 |     |  5
    2 |  10 |     |  5
    3 |  20 |     |  5
    4 |  30 |     |  5
    5 |  40 |  40 |  5
    6 |   0 |     |  9
    7 |  20 |     |  9
    8 |  30 |     |  9
    9 |  60 |  60 |  9
   10 |   0 |     | 14
   11 |  10 |     | 14
   12 |  30 |     | 14
   13 |  35 |     | 14
   14 |  45 |  45 | 14
   15 |  25 |     | 18
   16 |  50 |     | 18
   17 | 100 |     | 18
   18 | 110 | 110 | 18
(18 rows)
</pre>
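<p>The correlated subquery finds, for each row, the first tick at or after it that carries a max value. As an alternative illustration (not what the query does internally; next_wrap_ticks is a hypothetical name), the same p column can be computed in Python with a single backward pass over the rows, remembering the most recent top seen so far.</p>

```python
def next_wrap_ticks(measures):
    """Return (tick, nb, max, p) tuples mirroring the query above.

    max is nb when the counter resets right after this reading (or when it
    is the last reading), else None; p is the tick of the first row at or
    after this one whose max is not None.
    """
    rows = sorted(measures)
    tops = []
    for i, (tick, nb) in enumerate(rows):
        is_top = i == len(rows) - 1 or nb > rows[i + 1][1]
        tops.append((tick, nb, nb if is_top else None))

    # Walk backwards, remembering the tick of the most recent top.
    result = []
    p = None
    for tick, nb, mx in reversed(tops):
        if mx is not None:
            p = tick
        result.append((tick, nb, mx, p))
    result.reverse()
    return result


measures = list(enumerate(
    [0, 10, 20, 30, 40, 0, 20, 30, 60,
     0, 10, 30, 35, 45, 25, 50, 100, 110], start=1))
for row in next_wrap_ticks(measures):
    print(row)
```

<p>The last reading always counts as a top, so every row ends up with a p value.</p>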

<p>With that as an input, it's then possible to build ranges of ticks covering non-wrapping sets of measures from our counter, and get for each range the logical value that the counter had at the end of it:</p> <pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w &lt; nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
),
     parts as (
  select tick, nb, max,
         (select t2.tick
            from tops t2
           where t2.tick &gt;= t1.tick and t2.max is not null
        order by t2.tick
           limit 1) as p
    from tops t1
),
     ranges as (
  select first_value(tick) over w as start,
         last_value(tick) over w as end,
         max(max) over w
    from parts
  window w as (partition by p order by tick)
)
select * from ranges where max is not null;

 start | end | max
-------+-----+-----
     1 |   5 |  40
     6 |   9 |  60
    10 |  14 |  45
    15 |  18 | 110
(4 rows)
</pre>
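<p>For comparison, here is a Python sketch (counter_ranges is a hypothetical name, not from the article) that builds the same ranges in one forward pass: a range opens at the first tick after a reset and closes at the next reading that is followed by a reset, or at the last reading.</p>

```python
def counter_ranges(measures):
    """Return (start, end, max) for each run of ticks without a reset."""
    rows = sorted(measures)
    ranges = []
    start = None
    for i, (tick, nb) in enumerate(rows):
        if start is None:
            start = tick  # first tick of a new non-wrapping run
        is_top = i == len(rows) - 1 or nb > rows[i + 1][1]
        if is_top:
            ranges.append((start, tick, nb))
            start = None
    return ranges


measures = list(enumerate(
    [0, 10, 20, 30, 40, 0, 20, 30, 60,
     0, 10, 30, 35, 45, 25, 50, 100, 110], start=1))
print(counter_ranges(measures))
# prints [(1, 5, 40), (6, 9, 60), (10, 14, 45), (15, 18, 110)]
```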

<h3>Conclusion</h3> <p class="first">What I hope to have shown here, apart from some <em>window function</em> tips and some nice use cases for <em>common table expressions</em>, is that as a developer, adding SQL to your tool set is a very good idea.</p> <center> <p><img src="../../../images/skill-set.jpg" alt=""></p> </center> <p>You don't want to have several parts of your code dealing with a logical counter like this, because you want the reporting, accounting, quota, billing and other software to all agree on the values. And you most probably want to avoid fetching a huge result set and processing it in application memory (it had better fit) when you could just get back a single row with a single integer column, right?</p> <p>If you find this SQL example to be beyond your limits, it's a good sign that you need to improve your skills so that SQL becomes a real asset among your multi-language, multi-paradigm developer talents.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Oct 2012 09:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/05-reset-counter.html</guid> </item> <item> <title>PostgreSQL 9.3</title> <link>http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</link> <description><![CDATA[<p><a href="http://www.postgresql.org/">PostgreSQL 9.2</a> is released! It's an awesome new release that I urge you to consider trying and adopting; even an upgrade from 9.1 should be well worth it, as your hardware could suddenly be able to process a much higher load. Indeed, better performance means more work done on the same budget, that's the name of the game!</p>

<p>As a <em>PostgreSQL contributor</em> though, the release of 9.2 mainly means to me that it's time to fully concentrate on preparing 9.3. Its development <em>season</em> has already begun, by the way, so some amount of work has already been done here.</p> <center> <p><img src="../../../images/event-trigger.jpg" alt=""></p> </center> <p>The list of things I want to be working on for that next release is quite long, and looks more like a Christmas list than anything else. Let's only talk about those things I might actually make happen rather than all the things I wish I could deliver in a single release...</p> <h3>Event Triggers</h3> <p class="first">We missed 9.2 by wanting to include too big a feature in one go, which led to too many choices to review and decide about, and also to some suboptimal choices that had to be reconsidered. Thanks to <a href="../06/24-back-from-pgcon.html">PGCON</a> in Ottawa earlier this year, I could meet in person with <strong>Robert Haas</strong> and we've been able to decide how to attack that big patch I had. The first step has been to <em>commit</em> only the infrastructure parts to the PostgreSQL tree, on which we will be able to build the feature itself.</p> <h4>Infrastructure</h4> <p class="first">What we already have today is the ability to run a <em>user defined function</em> when some event occurs, and as of now the only supported event is ddl_command_start. Also the <em>trigger</em> itself must be written in PL/pgSQL or in C, as the support for the other languages was not included in the patch.</p> <p>That leaves some work to be done in the next months, right?</p> <h4>PL support</h4> <p class="first">The <em>user defined function</em> will get some information from <em>magic variables</em> such as TG_EVENT and such. 
That allows easier integration of future information we want to add, without disrupting the existing <em>triggers</em> that you wrote (no API change), at the cost of having to write a specific integration per <em>procedural language</em>.</p> <p>So one of the first things to do now is to take the support for the other PLs that I had in my proposal and make a new patch with only that in there.</p> <h4>Fill-in more information</h4> <p class="first">Then again, this first infrastructure part was all about actually being able to run a user function, and it left behind most of the information I would like the function to have. The information already there is the command tag, the event name and the parsetree, which is only usable if you're writing your trigger in C, as we expect some users to be doing.</p> <p>To supplement that, we're talking about the Object ID that has been the target of the <em>event</em>, the schema it lives in when applicable, the Object Name, the Operation that's running (CREATE, ALTER, DROP), the Object Kind being the target of said operation (e.g. TABLE or FUNCTION), and the command string.</p> <h4>Publishing the Command String</h4> <p class="first">Publishing the <em>Command String</em> here is not an easy task, because we have to rebuild a normalized version of it. 
Or maybe we can go with passing the explicit context in which the command is running, such as the search_path.</p> <p>Even with an explicit context that would be easy enough to SET back again (in a remote server where you would be replicating the DDL, say), it would be better to normalize the <em>command string</em> so as to remove extra spaces and make it easier to parse and process from a <em>user defined function</em>.</p> <p>That part looks like where most of the work is going to happen in the next <em>commit fests</em>.</p> <h4>Events</h4> <p class="first">The other big thing I want to be working on with respect to this feature is the <em>event</em> support, which is basically <em>hard coded</em> to be ddl_command_start in the current state of the 9.3 code.</p> <p>We certainly will want to be able to run a <em>user defined function</em> not only at the very beginning of a <em>DDL command</em>, but also just before it finishes, so that the newly created object already exists, for example.</p> <p>We might also be interested in supporting triggers on more than DDL, but I doubt we will see that happening in 9.3, as some people in the community would go crazy about complex use cases. Time is limited, and I think this is better kept open for the next release, as the way our beloved PostgreSQL works is by delivering reliable features: quality first.</p> <h4>Use cases</h4> <p class="first">I'm always happy to hear about use cases for the features I'm working on, and this one has the potential to cover a non trivial amount of them. I can already think of <em>trigger based replication systems</em> and some integrated <em>extension network facilities</em>. With your help we can give those the place they should have: early days use cases in a great collection.</p> <h3>Extensions</h3> <center> <p><img src="../../../images/extensions-cords.jpg" alt=""></p> </center> <p>So yes, for me the first use case of <em>event triggers</em> is in relation with <em>extensions</em>. 
Surprise! There's still some more I want to do with <em>extensions</em>, so much that I could consider their implementation in 9.1 just an enabler. In 9.1 the game was to offer the best support we could design for existing contrib modules, with a very strong angle toward clean support for <em>dump</em> and <em>restore</em>.</p> <p>The typical contrib module exports in SQL a list of C coded functions, sometimes supporting a new datatype, sometimes a set of administration functions. It's quite rare that contrib modules are handling <em>user data</em> embedded in their SQL definition, and when it happens it's mostly <em>configuration</em> kind of data, such as with PostGIS.</p> <p>Now we want to fully support <em>extensions</em> that are maintaining their own <em>user data</em>, or even those that are all about them. The main difficulty here is that our current design of <em>dump</em> and <em>restore</em> support follows a model where installing the same extension in a new database is all covered by create extension foo;. This is a limited model of reality, which we need to expand.</p> <p>The first manifestation of those problems is in the SEQUENCE support in extensions, and that impacts one of my favorite extensions: <a href="http://wiki.postgresql.org/wiki/Skytools">PGQ</a>.</p> <h3>PostgreSQL releases</h3> <p><a href="http://www.postgresql.org/">PostgreSQL</a> just shipped an awesome release with 9.2, where we get tremendous performance optimisations and truly innovative features, such as RANGE TYPE. How could you not consider PostgreSQL as a part of your application stack, as the place to develop and host your features?</p> <p>While users are enjoying the newer release, contributors are already preparing the next one, hard at work again!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 15 Sep 2012 18:43:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</guid> </item> <item> <title>El-Get 4.1 is out</title> <link>http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</link> <description><![CDATA[<p>Please welcome the new stable version of <a href="https://github.com/dimitri/el-get#readme">El-Get</a>: the much awaited version 4.1 has now been branched for your pleasure. It's packed with lots of features to make your life easy, comes with an <em>Info</em> documentation book and even has a <em>logo</em>. That's no joke, I found one, at least:</p>

<center> <p><img src="../../../images/el-get.big.png" alt=""></p> </center> <h3>Why El-Get is relevant</h3> <p class="first">Emacs 24.1 is the first release that includes package.el, and it even allows the user to set up several sources from which to fetch packages. Those sources, such as <a href="http://marmalade-repo.org/">Marmalade</a>, host lots of third party code for Emacs. package.el makes it easy to <em>install</em> (some of) that software.</p> <p>This is a very fine way of getting extra features in your Emacs installation, and one that is supported out of the box. For a <em>package</em> to be listed, its sources need to be prepared, and you depend on the central website being up and running and accessible.</p> <p>El-Get is all about allowing you to easily cope with the still vast majority of Emacs Lisp extensions you can find out there, that is, non-packaged code that is only available through some more or less mainstream <em>distribution method</em>, ranging from <a href="http://emacswiki.org/">EmacsWiki</a> to <a href="http://github.com/">github</a>, including <em>bare HTTP</em> personal hosting.</p> <p>With El-Get, you fetch the package where it's located. 
There's no need for a central server to host packaged and released software, and it's easy to share your findings with friends, or even to <em>publish</em> any elisp code you write.</p> <p>El-Get will also take care of final steps that package.el chose not to support, such as including <em>Info</em> material in your info browser (remember C-h i runs the command info?), running ./configure &amp;&amp; make for you, <em>byte compiling</em> the sources you just retrieved, adding the necessary <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/Autoload.html">autoload</a> support, etc.</p> <p>And of course one of the <em>methods</em> supported by El-Get is ELPA, known as <em>Emacs Lisp Package Archive</em> and implemented by package.el.</p> <p>So definitely, you typically want both ELPA and <a href="http://github.com/dimitri/el-get">El-Get</a>.</p> <h3>El-Get 4.1 Changelog Summary</h3> <p class="first">The new El-Get release is packed with features. It really is. I will only list some of them now:</p> <ul> <li>Plenty of new recipes: we now have 590 of them managed in the El-Get source repository itself, and El-Get will download the current <a href="http://emacswiki.org/">EmacsWiki</a> list of emacs lisp files at install time too.</li> <li>The default installation and usage has been simplified a lot.</li> <li>More options are provided to set up El-Get packages, see el-get-user-package-directory for example.</li> <li>As part of the simplification, el-get-sources has been revisited and now serves only one goal.</li> <li>We dropped (el-get 'wait) which was a misconception and had been broken for a long time in the development version of El-Get.</li> <li>We made improvements in the error handling and in dealing with some corner cases that still happen often enough for users to report them. 
Please continue reporting them!</li> <li>More caching is done, with better dependency tracking and status management.</li> <li>Enhanced notification support, from DBUS to growl.</li> <li>Support for <em>checksums</em> with a lot of <em>source types.</em></li> <li>Completing our git support, <em>shallow</em> clones and <em>submodules</em> are there.</li> <li>Better support for github including the zip and tar releases.</li> <li>Ability to reload a package when it's been <em>updated</em>.</li> <li><em>Moar</em> features</li> </ul> <p>And most importantly, El-Get documentation is now almost complete and comes in the nice <em>Info</em> format I know you've been expecting for so long!</p> <h3>Using El-Get</h3> <p class="first">Here's a quick summary of what using El-Get is like for a new user in 4.1. If you're already using El-Get, see the section about upgrading. To install El-Get you need to paste these lines into your *scratch* buffer, then hit C-j after the last closing parenthesis:</p> <pre class="src">
(url-retrieve
 <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>
 (lambda (s)
   (goto-char (point-max))
   (eval-print-last-sexp)))
</pre>

<p>Then you can try M-x el-get-list-packages and browse through more than 2000 available packages. Mark the ones you want to install with i, then type x to see El-Get fetch and install all the packages you just selected. Here's a summary of what's available to you in the M-x el-get-list-packages buffer:</p> <pre class="src">
Major Mode Bindings:
SPC     el-get-package-menu-mark-unmark
?       el-get-package-menu-describe
d       el-get-package-menu-mark-delete
g       el-get-package-menu-revert
h       el-get-package-menu-quick-help
i       el-get-package-menu-mark-install
u       el-get-package-menu-mark-update
x       el-get-package-menu-execute
</pre> <p>Once a package is <em>installed</em>, El-Get will <em>initialize</em> it for you, and it will also do that step at every Emacs startup from then on, provided that you added to your ~/.emacs initialization file some lines that look a lot like the *scratch* code you pasted before:</p> <pre class="src">
;;
;; Here's a typical El-Get integration for your .emacs file:
;;
(add-to-list 'load-path <span style="color: #bc8f8f;">"~/.emacs.d/el-get/el-get"</span>)
(setq el-get-user-package-directory <span style="color: #bc8f8f;">"~/.emacs.d/packages.d/"</span>)

(unless (require 'el-get nil t)
  (with-current-buffer
      (url-retrieve-synchronously
       <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>)
    (goto-char (point-max))
    (eval-print-last-sexp)))

(el-get 'sync)
</pre>

<p>Then you can add files named like init-&lt;package&gt;.el in the el-get-user-package-directory directory; those files will get loaded when El-Get <em>initializes</em> &lt;package&gt;.</p> <p>You can also use M-x el-get-install if you want to bypass the full screen package listing; you will get completion on the package name.</p> <h3>Community and development</h3> <p class="first">The El-Get community has grown into a really cool place to participate in these days, with core and <em>recipe</em> contributions from more than 130 different people already, and with 526 stars on github and 184 forks. I almost can't believe it!</p> <pre class="src">
git --no-pager shortlog -n -s | wc -l
     137

git --no-pager shortlog -n -s | head -10
   734  Dimitri Fontaine
   336  Ryan C. Thompson
   114  Julien Danjou
   110  Dave Abrahams
    73  Ryan Thompson
    72  S&#233;bastien Gross
    42  Takafumi Arakaki
    27  Alex Ott
    25  Yakkala Yagnesh Raghava
    21  R&#252;diger Sonderfeld
</pre>

<p>Now that we have something that looks like a <em>core team</em> forming up, I'm thinking about scheduling much more aggressive stable releases. 4.1 has been very long in the making; I hope we now get a rapid release cycle leading us to 4.2 in quite a short time. As that's not an individual effort by any means, though, only history will tell.</p> <h3>The roadmap</h3> <p class="first">We have lots of ideas and some rough edges to address, so 4.1 is only a stop in the release history of El-Get. Next ideas include better error management in the face of rare corner cases and of external events, like when you ran rm -rf on a directory holding an El-Get managed extension: we should mark it <em>removed</em> and clean up the autoloads that came from it.</p> <h3>Upgrading to 4.1</h3> <p class="first">This item has received some treatment in the documentation. The basic idea is that el-get-sources is no longer what it used to be: it's now only an alternative source location for <em>recipes</em>, as it always should have been. Note that you can still <em>override</em> in there some properties that you want <em>merged</em> with an official <em>recipe</em>.</p> <p>The new thing about el-get-sources is that it is no longer the authoritative list of packages that El-Get manages. That list is now either given explicitly when calling the el-get function in your .emacs setup, or derived from the packages that are known to be <em>installed</em> on your system (as e.g. debian does).</p> <p>Also, given that it took us so much time to brew 4.1, a lot of packages have changed either their hosting location or even switched their SCM. 
In such cases an automatic update of the recipe will no longer be possible, and you might need to el-get-remove then el-get-install those packages to get them back.</p> <h3>Conclusion</h3> <p class="first">El-Get 4.1 is now ready for public consumption, so don't be shy: a lot of us have been running the development branch for a long time now, and I'm running 4.0.7.6901194 while writing this post. 4.0 is the development version of what is now released as 4.1.</p> <p>Many thanks to all who contributed to El-Get and to all our users, I'm very proud that together we worked out a very nice and complete tool!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 28 Aug 2012 11:43:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</guid> </item> <item> <title>Fast and stupid?</title> <link>http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</link> <description><![CDATA[<p>I stumbled onto an interesting article about performance when using python, called <a href="http://jiaaro.com/python-performance-the-easyish-way">Python performance the easy(ish) way</a>, where the author tries to get the best available performance out of the dumbest possible python code, trying to solve a very simple and stupid problem.</p>

<p>With so many <em>smart</em> qualifiers you can only guess that I loved the challenge. The idea is to write the simplest code possible and see how much smarter you need to be when you need performance. Let's have a try!</p> <h3>local python results</h3> <p class="first">Here's the code I used to benchmark the python solution:</p> <pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange2</span>(arg):
    <span style="color: #b8860b;">x</span> = <span style="color: #b8860b;">i</span> = 0
    <span style="color: #7f007f;">while</span> i &lt; arg:
        <span style="color: #b8860b;">x</span> += i
        <span style="color: #b8860b;">i</span> += 1
    <span style="color: #7f007f;">return</span> x

<span style="color: #7f007f;">import</span> ctypes
<span style="color: #b8860b;">ct_sumrange</span> = ctypes.CDLL(<span style="color: #bc8f8f;">'/Users/dim/dev/CL/jiaroo/sumrange.so'</span>)

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange_ctypes</span>(arg):
    <span style="color: #7f007f;">return</span> ct_sumrange.sumrange(arg)

<span style="color: #7f007f;">if</span> <span style="color: #da70d6;">__name__</span> == <span style="color: #bc8f8f;">"__main__"</span>:
    <span style="color: #7f007f;">import</span> timeit
    <span style="color: #b8860b;">t1</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange(10**10)'</span>)
    <span style="color: #b8860b;">t2</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange2(10**10)'</span>)
    <span style="color: #b8860b;">ct</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange_ctypes(10**10)'</span>)

    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'timing python sumrange(10**10)'</span>
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'xrange: %5fs'</span> % t1.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'while: %5fs'</span> % t2.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'ctypes: %5fs'</span> % ct.timeit(1)
</pre>

<p>Oh. And the C code too, sorry about that.</p> <pre class="src">
<span style="color: #da70d6;">#include</span> <span style="color: #bc8f8f;">&lt;stdio.h&gt;</span>

<span style="color: #228b22;">int</span> <span style="color: #0000ff;">sumrange</span>(<span style="color: #228b22;">int</span> <span style="color: #b8860b;">arg</span>) {
    <span style="color: #228b22;">int</span> <span style="color: #b8860b;">i</span>, <span style="color: #b8860b;">x</span>;
    x = 0;

    <span style="color: #7f007f;">for</span> (i = 0; i &lt; arg; i++) {
        x = x + i;
    }
    <span style="color: #7f007f;">return</span> x;
}
</pre>
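<p>None of the timings in this post check the value each version returns. As a side check of my own (not part of the original benchmark), Python's arbitrary precision integers give the exact answer through the closed form n(n-1)/2, which for n = 10**10 does not even fit in 64 bits, let alone in the C int used above. Also, if I read the ctypes documentation correctly, Python integers passed to a function without declared argtypes are masked to fit a C int, so with a 32-bit int the C function would receive 1410065408 rather than 10**10. The helper names exact_sumrange and wrap_to_int32 are made up for this sketch, which assumes a two's complement 32-bit int.</p>

```python
def exact_sumrange(n):
    """Exact value of 0 + 1 + ... + (n - 1), what sum(xrange(n)) returns."""
    return n * (n - 1) // 2


def wrap_to_int32(value):
    """Keep the low 32 bits of value, read back as a signed 32-bit int."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


n = 10 ** 10
print(exact_sumrange(n))            # 49999999995000000000
print(exact_sumrange(n) > 2 ** 64)  # True: does not fit in 64 bits
print(wrap_to_int32(n))             # 1410065408
```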

<p>Here's how I compiled it. The author of the inspiring article insisted on stupid optimisation targets, so I followed him:</p> <pre class="src">
gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c -O0
</pre> <p>And here's the result I got out of it:</p> <pre class="src">
python jiaroo.py
timing python sumrange(10**10)
xrange: 927.039917s
while: 2377.291237s
ctypes: 5.297124s
</pre> <p>Let's be fair, with -O2 we get much better results:</p> <pre class="src">
timing python sumrange(10**10)
ctypes: 1.065684s
</pre> <h3>Common Lisp to the rescue</h3> <p class="first">So, you'll ask me, what about trying Common Lisp?</p> <p>Here's the code I used, with three different tries:</p> <pre class="src">
<span style="color: #b22222;">;;;; </span><span style="color: #b22222;">jiaroo.lisp
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">See http://jiaaro.com/python-performance-the-easyish-way
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">The goal here is to find out if CL needs to resort to C for very simple
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">optimisation tricks like python apparently needs too, unless using pypy
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">(to some extent).
</span>
(<span style="color: #7f007f;">in-package</span> #<span style="color: #da70d6;">:jiaroo</span>)

<span style="color: #b22222;">;;; </span><span style="color: #b22222;">"jiaroo" goes here. Hacks and glory await!
</span>
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-loop</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 1 to MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">loop</span> for i fixnum from 1 to max
          do (incf sum i)
          finally (<span style="color: #7f007f;">return</span> sum))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">pk-sumrange</span> (max)
  (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max)
           (optimize speed))
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and fixnum unsigned-byte) sum))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (setf sum (logand (+ sum i) most-positive-fixnum)))))

(<span style="color: #7f007f;">defmacro</span> <span style="color: #0000ff;">timing</span> (<span style="color: #228b22;">&amp;body</span> forms)
  <span style="color: #bc8f8f;">"return both the result of body and how much real time was spent evaluating it"</span>
  (<span style="color: #7f007f;">let</span> ((start (gensym)) (end (gensym)) (result (gensym)))
    `(<span style="color: #7f007f;">let*</span> ((,start (get-internal-real-time))
            (,result (<span style="color: #7f007f;">progn</span> ,@forms))
            (,end (get-internal-real-time)))
       (values ,result (/ (- ,end ,start) internal-time-units-per-second)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">bench-sumrange</span> (power)
  <span style="color: #bc8f8f;">"print the execution time of the three previous functions"</span>
  (<span style="color: #7f007f;">let*</span> ((max (expt 10 power))
         (lp-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-loop max)) s))
         (dt-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-dotimes max)) s))
         (pk-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (pk-sumrange max)) s)))
    (format t <span style="color: #bc8f8f;">"timing common lisp sumrange 10**~d~%"</span> power)
    (format t <span style="color: #bc8f8f;">"loop: ~2,3fs ~%"</span> lp-time)
    (format t <span style="color: #bc8f8f;">"dotimes: ~2,3fs ~%"</span> dt-time)
    (format t <span style="color: #bc8f8f;">"pk dotimes: ~2,3fs ~%"</span> pk-time)))
</pre>
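<p>For readers coming from the python side: the timing macro above returns two values, the result of its body and the elapsed real time in seconds. A rough python counterpart might look like the sketch below. The helper name timing only mirrors the macro and is hypothetical. Because python has no macros, the helper takes a function and its arguments instead of a body, and time.perf_counter stands in for get-internal-real-time.</p>

```python
import time


def timing(func, *args):
    """Return func(*args) and the elapsed real time, in seconds.

    Hypothetical python counterpart of the Lisp timing macro: without
    macros, we pass the function and its arguments instead of a body.
    """
    start = time.perf_counter()
    result = func(*args)
    end = time.perf_counter()
    return result, end - start


def sumrange(arg):
    # python 3 spelling of the sum(xrange(arg)) version benchmarked above
    return sum(range(arg))


result, seconds = timing(sumrange, 10**6)
print(f"sumrange(10**6) = {result} in {seconds:.3f}s")
```

<p>In bench-sumrange, multiple-value-bind receives both values and keeps only the time. The tuple unpacking plays the same role here.</p>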

<p>And here are the results:</p> <pre class="src">
CL-USER&gt; (bench-sumrange 10)
timing common lisp sumrange 10**10
loop: 11.213s
dotimes: 7.642s
pk dotimes: 22.185s
NIL
</pre> <h3>Discussion</h3> <p class="first">So python is very slow, C is pretty fast, and Common Lisp sits in the middle. Honestly, I expected better performance from my beloved Common Lisp here, but I didn't try very hard: I used <a href="http://ccl.clozure.com/">Clozure Common Lisp</a>, which is not the fastest Common Lisp implementation around. For this very benchmark, if you're seeking speed, use either <a href="http://sbcl.org/">Steel Bank Common Lisp</a> or <a href="http://www.clisp.org/">CLISP</a>, the latter being known for a pretty fast bignums implementation (which you don't need in 64 bits in that game).</p> <p>On the other hand, I think that having to write a C plugin, and then deal with compiling and deploying it, in the middle of a python script is something to avoid. With Common Lisp you don't need to resort to that to bring the <em>runtime</em> down from 927.039917s for the python <em>xrange</em> implementation to 7.642s for the <em>dotimes</em> implementation. That's about 121 times faster.</p> <p>So while C is even better, and while I would like a Common Lisp guru to show me how to get better speed here, I still very much appreciate this solution.</p> <p>Let's look at the winning source code in <em>python</em> and <em>common lisp</em> to compare the programmer side of things: how hard was it really to get 121 times faster?</p> <pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))
</pre>

<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))
</pre>

<p>That's about it. Yes, we can see some <em>manual</em> optimisation directives here, and they add some <em>extra complexity</em>. That complexity is nowhere near shipping a compiled artifact that you need to build and deploy, though. Remember that in the optimised <em>python</em> case you need to know the full path to the sumrange.so file on the production system, so that's what we are comparing against.</p> <p>Here's what happens without the optimisation directives, and with a smaller target:</p> <pre class="src">
CL-USER&gt; (time (jiaroo:sumrange-dotimes (expt 10 9)))
(JIAROO:SUMRANGE-DOTIMES (EXPT 10 9)) took 722,592 microseconds (0.722592 seconds) to run.
During that period, and with 2 available CPU cores,
     714,709 microseconds (0.714709 seconds) were spent in user mode
       1,183 microseconds (0.001183 seconds) were spent in system mode
499999999500000000

CL-USER&gt; (time (<span style="color: #7f007f;">let</span> ((sum 0)) (<span style="color: #7f007f;">dotimes</span> (i (expt 10 9) sum) (incf sum i))))
(<span style="color: #7f007f;">LET</span> ((SUM 0)) (<span style="color: #7f007f;">DOTIMES</span> (I (EXPT 10 9) SUM) (INCF SUM I))) took 2,174,767 microseconds (2.174767 seconds) to run.
During that period, and with 2 available CPU cores,
   2,156,549 microseconds (2.156549 seconds) were spent in user mode
      10,225 microseconds (0.010225 seconds) were spent in system mode
499999999500000000
</pre>
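<p>The figures quoted in this article can be double-checked with a few lines of python. Both runs above print n(n-1)/2 for n = 10^9, because dotimes counts from 0 below n, just like python's xrange. The speed-up ratios come straight from the timings reported so far. A quick sketch:</p>

```python
n = 10**9
expected = n * (n - 1) // 2    # sum of 0, 1, ..., n-1
print(expected)                # what both (time ...) runs printed

# Ratios between timings reported in this article, in seconds.
python_xrange, lisp_dotimes = 927.039917, 7.642    # 10**10 benchmark
optimised, plain = 0.722592, 2.174767              # 10**9 runs above
print(round(python_xrange / lisp_dotimes))         # "about 121 times faster"
print(round(plain / optimised, 1))                 # "a 3 times speed-up"
```

<p>This prints 499999999500000000, then 121, then 3.0.</p>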

<p>We get a 3 times speed-up from those 2 lines of lisp optimisation directives, which is pretty good. And the gap only gets wider with a bigger target: I didn't have the patience to wait for the non-optimised 10^10 run to finish, so I killed it.</p> <h3>Conclusion</h3> <p class="first">Here is a case where I don't know how to reach C-level performance with Common Lisp, which may well just be because I don't know how to do it yet.</p> <p>Still, getting a 121 times speedup compared to the pure <em>python</em> version of the code is pretty good, and it encourages me to keep diving into Common Lisp.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 22 Aug 2012 16:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</guid> </item> <item> <title>Autumn 2012 Conferences</title> <link>http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</link> <description><![CDATA[<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> community hosts a number of <a href="../../../conferences.html">conferences</a> throughout the year, and the next ones I'm lucky enough to attend are fast approaching. First, next month, in September, we have <a href="http://postgresopen.org/2012/home/">Postgres Open</a> in Chicago, where my talk about <a href="http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html">Large Scale Migration from MySQL to PostgreSQL</a> has been selected!</p>

<center> <p><img src="../../../images/autumn-leave-480.jpg" alt=""></p> </center> <p>This talk shares insights into the why and the how of that migration: which problems couldn't be solved without moving away, and what the solution looks like now. The tools used to migrate the data, the methods and the new architecture are all detailed. And the new home, in the cloud!</p> <p>Not long after that, the European PostgreSQL community gives us a very nice occasion to get to Prague with <a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> (October 23-26). If you've been meaning to meet with the community, if you've been meaning to visit Prague someday, or any mix of those two very good reasons, think about booking that conference already.</p> <p>The <a href="http://2012.pgconf.eu/callforpapers/">call for papers for pgconf.eu</a> has been extended to August 7th, 2012. Consider sharing your insights too!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 02 Aug 2012 01:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</guid> </item> <item> <title>Solving Every Sudoku Puzzle</title> <link>http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</link> <description><![CDATA[<p>A while ago, <a href="http://norvig.com/">Peter Norvig</a> published a very nice article titled <a href="http://norvig.com/sudoku.html">Solving Every Sudoku Puzzle</a>, in which he presents a programmatic approach to solving that puzzle game.</p>

<center> <p><a class="image-link" href="http://en.wikipedia.org/wiki/Sudoku"> <img src="../../../images/sudoku.png"></a></p> </center> <p>The article is very well written, and it makes it easy to think that coming up with the code for such a solver is a very easy task: you apply some basic problem search principles and there you are. Which is partly true, in fact. Also, he uses python, which means that a lot of mundane programming activities, such as memory management, are no longer a concern.</p> <p>As I've been teaching myself <a href="http://www.cliki.net/Common%20Lisp">Common Lisp</a> for some weeks now, I thought I would like to read a lisp version of his code, and the article even has a section titled <em>Translations</em>. Unfortunately, no lisp version is available there. One might argue that <a href="http://clojure.org/">Clojure</a> is a decent enough lisp, but my current quest is really all about <em>Common Lisp</em>. So I had to write one myself.</p> <pre class="src">
CL-USER&gt; (sudoku:print-puzzle
           (sudoku:solve-grid <span style="color: #bc8f8f;">"530070000600195000098000060800060003400803001700020006060000280000419005000080079"</span>))
5 3 4 | 6 7 8 | 9 1 2
6 7 2 | 1 9 5 | 3 4 8
1 9 8 | 3 4 2 | 5 6 7
------+-------+------
8 5 9 | 7 6 1 | 4 2 3
4 2 6 | 8 5 3 | 7 9 1
7 1 3 | 9 2 4 | 8 5 6
------+-------+------
9 6 1 | 5 3 7 | 2 8 4
2 8 7 | 4 1 9 | 6 3 5
3 4 5 | 2 8 6 | 1 7 9
took 1,974 microseconds (0.001974 seconds) to run.
During that period, and with 2 available CPU cores,
     1,894 microseconds (0.001894 seconds) were spent in user mode
        88 microseconds (0.000088 seconds) were spent in system mode
 174,320 bytes of memory allocated.
</pre>

<h3>Comments on the python version</h3> <p class="first">Norvig's article is very well written, I think. By that I mean that by reading it you're confident that you've understood the problem and how the solution is articulated, so you almost think you don't really need to understand the code, as it's just an illustration of the text.</p> <p>Well, not so much. When you want to port the exact same algorithm, you have to understand exactly what the code is doing so that you don't end up implementing something else. All the more when, as I did, you want to use some other data structure.</p> <p>My goal was not to rewrite the code as-is, but to try and come up with <em>idiomatic</em> lisp code implementing Norvig's solution. So rather than using <em>strings</em> and <em>dictionaries</em> (in lisp, they are called <a href="http://www.lispworks.com/documentation/lw50/CLHS/Body/f_mk_has.htm">hash tables</a>), I've been using more natural data structures.</p> <p>The <em>python</em> code is really hard to follow, full of seasoned functional programming tricks. I mean avoiding <em>exceptions</em> and simply returning False whenever there's a problem, and using functions such as all and some to manage that. It certainly works, but it doesn't make the code any easier to read.</p> <p>To summarize, that code looks like it's been written by someone smart who didn't want to spend more than a couple of hours on it, and who took every trustworthy shortcut he knew to achieve that goal. Quality and readability certainly weren't the key motive. I was quite disappointed after reading such a very good article.</p> <h3>Comments on the common lisp version</h3> <p class="first">Keep in mind that I'm just a <em>Common Lisp</em> newbie. 
I've been given some good advice by knowledgeable people though, so with some luck my implementation is <em>lispy</em> enough.</p> <p>So we start by defining some data structures and low-level functions to build up the more complex ones, so that it's easier to read and debug. The <em>sudoku</em> puzzle is then a grid of digits, plus a grid of possible values for the places where the digit is still unknown.</p> <p>The way to represent that 9x9 grid is to use <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_mk_ar.htm">make-array</a>:</p> <pre class="src">
(make-array '(9 9)
            <span style="color: #da70d6;">:element-type</span> '(integer 0 9)
            <span style="color: #da70d6;">:initial-element</span> 0)
</pre>
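<p>To make the grid of digits concrete, here's a small python sketch. It maps the 81-character puzzle string passed to solve-grid earlier onto such a 9x9 grid, row by row, with 0 standing for an unknown digit. The parse_grid name is hypothetical and does not come from the Lisp code.</p>

```python
PUZZLE = ("530070000600195000098000060800060003400803001"
          "700020006060000280000419005000080079")


def parse_grid(text):
    """Turn an 81-digit string into a 9x9 list of rows, 0 meaning unknown."""
    if len(text) != 81 or not text.isdigit():
        raise ValueError("expected exactly 81 digits")
    digits = [int(ch) for ch in text]
    return [digits[row * 9:(row + 1) * 9] for row in range(9)]


grid = parse_grid(PUZZLE)
for row in grid[:3]:
    print(row)
```

<p>The first three rows printed are the top band of the puzzle, before any propagation happens.</p>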

<p>Then the possible values. I thought about using a bit-vector (and actually I did implement it that way), then I was told that the <em>Common Lisp</em> way to approach that is to use <a href="http://psg.com/~dlamkins/sl/chapter18.html">two's complement integer representation</a>, as we have plenty of functions operating on numbers that way. I didn't believe that would make the code simpler, but it really did, see:</p> <pre class="src">
CL-USER&gt; #b111111111
511
CL-USER&gt; (logcount #b111111111)
9
CL-USER&gt; (logcount 511)
9
CL-USER&gt; (logbitp 3 #b100100100)
NIL
CL-USER&gt; (logbitp 2 #b100100100)
T
CL-USER&gt; (format nil <span style="color: #bc8f8f;">"~2r"</span> (logxor #b111111111 (ash 1 4)))
<span style="color: #bc8f8f;">"111101111"</span>
CL-USER&gt; (logbitp 4 (logxor #b111111111 (ash 1 4)))
NIL
</pre> <p>With that in mind, we can write the following code:</p> <pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">count-remaining-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"How many possible values are left in there?"</span>
  <span style="color: #b22222;">;; </span><span style="color: #b22222;">we could raise an empty-values condition if we get 0...
</span>  (logcount possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">first-set-value</span> (possible-values)
  <span style="color: #bc8f8f;">"Return the index of the first set value in POSSIBLE-VALUES."</span>
  (+ 1 (floor (log possible-values 2))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">only-possible-value-is?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when the only value found in POSSIBLE-VALUES is VALUE"</span>
  (and (logbitp (- value 1) possible-values)
       (= 1 (logcount possible-values))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">list-all-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"Return a list of all possible values to explore"</span>
  (<span style="color: #7f007f;">loop</span> for i from 1 to 9
        when (logbitp (- i 1) possible-values)
        collect i))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">value-is-set?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when given VALUE is possible in POSSIBLE-VALUES"</span>
  (logbitp (- value 1) possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">unset-possible-value</span> (possible-values value)
  <span style="color: #bc8f8f;">"return an integer representing POSSIBLE-VALUES with VALUE unset"</span>
  (logxor possible-values (ash 1 (- value 1))))
</pre>
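<p>The same integer-as-bit-set trick carries over directly to python, whose integers are also arbitrary-precision and support the usual bitwise operators. Here's a sketch that mirrors the helpers above, storing value v in bit v-1. The python names are my own translations and are not part of the original code.</p>

```python
ALL_VALUES = 0b111111111  # bit v-1 set means value v (1..9) is still possible


def count_remaining_possible_values(possible_values):
    """Like logcount: how many values are still possible."""
    return bin(possible_values).count("1")


def first_set_value(possible_values):
    """Like (+ 1 (floor (log x 2))): 1-based position of the highest set bit."""
    return possible_values.bit_length()


def value_is_set(possible_values, value):
    """Like (logbitp (- value 1) x)."""
    return bool((possible_values >> (value - 1)) & 1)


def only_possible_value_is(possible_values, value):
    return (value_is_set(possible_values, value)
            and count_remaining_possible_values(possible_values) == 1)


def list_all_possible_values(possible_values):
    return [v for v in range(1, 10) if value_is_set(possible_values, v)]


def unset_possible_value(possible_values, value):
    """Like logxor with (ash 1 (- value 1)): this flips the bit, so only
    call it when value_is_set is true, as eliminate does."""
    return possible_values ^ (1 << (value - 1))


print(ALL_VALUES, count_remaining_possible_values(ALL_VALUES))
print(format(unset_possible_value(ALL_VALUES, 5), "b"))
```

<p>The two prints mirror the REPL session above: first 511 9, then 111101111.</p>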

<p>You can see here that I was also influenced by a recent read about <a href="http://gar1t.com/blog/2012/06/10/solving-embarrassingly-obvious-problems-in-erlang/">making it obvious</a>, also known as <a href="http://dieswaytoofast.blogspot.fr/2012/07/erlang-why-so-many-seemingly-identical.html">intentional programming</a>, following what <a href="http://armstrongonsoftware.blogspot.fr/">Joe Armstrong</a> has to say about it:</p> <blockquote> <p class="quoted"><em>Intentional programming is a name I give to a style of programming where the reader of a program can easily see what the programmer intended by their code. The intention of the code should be obvious from the names of the functions involved and not be inferred by analysing the structure of the code. (Reading the code should) precisely expresses the intention of the programmer—here no guesswork or program analysis is involved, we clearly read what was intended.</em></p> </blockquote> <p>Hence function names such as count-remaining-possible-values, which help when reading more complex code such as the following, the meat of the solution:</p> <pre class="src">
(<span style="color: #7f007f;">defmethod</span> <span style="color: #0000ff;">eliminate</span> ((puzzle puzzle) row col value)
  <span style="color: #bc8f8f;">"Eliminate given VALUE from possible values in cell ROWxCOL of PUZZLE, and propagate when needed"</span>
  (<span style="color: #7f007f;">with-slots</span> (grid values) puzzle
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">if already unset, work is already done
</span>    (<span style="color: #7f007f;">when</span> (value-is-set? (aref values row col) value)
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate the value from the set of possible values
</span>      (<span style="color: #7f007f;">let*</span> ((possible-values (unset-possible-value (aref values row col) value)))
        (setf (aref values row col) possible-values)

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now if we're left with a single possible value
</span>        (<span style="color: #7f007f;">when</span> (= 1 (count-remaining-possible-values possible-values))
          (<span style="color: #7f007f;">let</span> ((found-value (first-set-value possible-values)))
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">update the main grid
</span>            (setf (aref grid row col) found-value)

            <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate that value we just found in all peers
</span>            (eliminate-value-in-peers puzzle row col found-value)))

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now check if any unit has a single possible place for that value
</span>        (<span style="color: #7f007f;">loop</span> for (r . c) in (list-places-with-single-unit-solution puzzle row col value)
              do (assign puzzle r c value))))))
</pre>
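<p>For comparison with Norvig's python, here's a self-contained python sketch of the same two propagation rules, using the bit-set representation described above rather than strings. It is an illustration under my own assumptions, not a translation of the repository code. The helpers units, peers, has, assign and eliminate are hypothetical, and the sketch raises ValueError on a contradiction where Norvig's version returns False.</p>

```python
FULL = 0b111111111  # all of 1..9 still possible


def units(row, col):
    """The row, column and 3x3 box containing (row, col)."""
    box_r, box_c = 3 * (row // 3), 3 * (col // 3)
    return [
        [(row, c) for c in range(9)],
        [(r, col) for r in range(9)],
        [(r, c) for r in range(box_r, box_r + 3) for c in range(box_c, box_c + 3)],
    ]


def peers(row, col):
    """The 20 other cells sharing a unit with (row, col)."""
    cells = {cell for unit in units(row, col) for cell in unit}
    cells.discard((row, col))
    return cells


def has(possible_values, value):
    return bool((possible_values >> (value - 1)) & 1)


def assign(values, row, col, value):
    """Keep only VALUE at (row, col) by eliminating every other candidate."""
    for other in range(1, 10):
        if other != value:
            eliminate(values, row, col, other)


def eliminate(values, row, col, value):
    """Remove VALUE from the candidates at (row, col), then propagate."""
    if not has(values[row][col], value):
        return  # already eliminated: work is already done
    values[row][col] ^= 1 << (value - 1)
    remaining = values[row][col]
    if remaining == 0:
        raise ValueError(f"no candidate left at {(row, col)}")

    # a single candidate left: remove it from all peers
    if bin(remaining).count("1") == 1:
        found = remaining.bit_length()
        for r, c in peers(row, col):
            eliminate(values, r, c, found)

    # a unit with a single place left for VALUE: assign it there
    for unit in units(row, col):
        places = [(r, c) for r, c in unit if has(values[r][c], value)]
        if not places:
            raise ValueError(f"no place left for {value}")
        if len(places) == 1:
            assign(values, *places[0], value)


values = [[FULL] * 9 for _ in range(9)]
values[0][0] = 0b11            # only 1 or 2 still possible in the top-left cell
eliminate(values, 0, 0, 1)     # leaves 2 there and removes 2 from its peers
print(values[0][0] == 0b10, all(not has(values[r][c], 2) for r, c in peers(0, 0)))
```

<p>Solving a real puzzle would start from a grid where every cell is FULL and call assign for each known digit. As in Norvig's version, the harder puzzles still need search on top of this propagation.</p>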

<p>So that lisp code is quite verbose: at 389 lines, it almost doubles the 201 lines Norvig had. When clarity is part of the goal, that's hard to avoid, and I hope I made a good case that this is not due to lisp being overly verbose by itself.</p> <h3>Comments on the development environment</h3> <p class="first">Or why I even considered <em>Common Lisp</em> as an interesting language for that kind of exercise, and more. <em>I'll have to tell you about re-sharding data live with 16 threads and 256 databases, all in CL, someday</em>.</p> <p>So I've been doing some <em>Emacs Lisp</em> development for a while now, and the part that makes it so much fun is the instant reward. You write some code in your editor, type a key chord (usually C-M-x, which runs the command eval-defun) and your code is loaded up, ready to be tested. In <em>Emacs Lisp</em>, testing can simply mean using your editor and watching the new behavior take place, or playing in the M-x ielm console. When the code is not ready, it crashes, and you're left in the interactive debugger, where you can use C-x C-e (which runs the command eval-last-sexp) to evaluate any expression in your source and see its value in the current <em>debug frame</em>.</p> <p>That way of working is a huge productivity boost, and one I've missed a lot when getting back to writing C code for PostgreSQL. I can't C-M-x the current function and go write some SQL to test it right away. I have to <em>compile</em> the whole source tree, then <em>install</em> the new binaries, then <em>restart</em> the test server and then open up a <em>psql</em> console to interact with the new code. Of course I could just run make check and watch the results, but then if I attach a <em>debugger</em> it complains that the code on disk is more recent than the code in the <em>core dump</em>.</p> <p>What if you want the integrated facilities of <em>Emacs Lisp</em> in something made for general programming rather than suited to building a text editor? 
Don't get me wrong, you can probably find more production-ready code in <em>elisp</em> than in many other languages, just because Emacs has been around for about 35 years. Editor-targeted production code, though.</p> <p>This integrated development cycle is just the same when you're using <em>Common Lisp</em>. The awesome <a href="http://common-lisp.net/project/slime/">Superior Lisp Interaction Mode for Emacs</a> provides exactly that experience. Just run M-x slime, and then as you define your code you can C-M-x the function at point, see the compilation errors and warnings, if any, in the associated <em>REPL</em>, and just try your code. I tend to mostly play in the REPL, but it's possible to just use C-x C-e while typing too.</p> <h3>Performance</h3> <p class="first">Of course we do care! After all, the original article came with a quite detailed performance analysis, with graphs and all. I won't be reproducing that, sorry. I'll just show you what penalty you get for using an older language specification, much more dynamic and with more features than python, and with a great, scratch that, awesome development environment.</p> <p>Oh wait, it's the other way round: no penalty, it's actually so much faster!</p> <h4>Python version perfs</h4> <p class="first">The results I got on my desktop machine are about twice as fast as in the original article. I guess newer machines and a newer python have something to do with that:</p> <pre class="src">
dim ~/dev/CL/sudoku python sudoku.dim.py
All tests pass.
Solved 50 of 50 easy puzzles (avg 0.01 secs (151 Hz), max 0.01 secs).
Solved 95 of 95 hard puzzles (avg 0.02 secs (42 Hz), max 0.12 secs).
Solved 11 of 11 hardest puzzles (avg 0.01 secs (115 Hz), max 0.01 secs).
</pre>

<p>That makes an average of (50*151 + 95*42 + 11*115) / (50+95+11) = 82Hz.</p> <p>That seems pretty good, let's continue.</p> <p>As you can see, I've cut away the <em>random puzzle</em> part, because I was too lazy to implement it and it didn't seem all that interesting to me. If you think that's a problem that needs solving, I accept patches.</p> <h4>Common lisp version perfs</h4> <p class="first">When using <a href="http://sbcl.org/">SBCL</a> on the same machine, here's what I got:</p> <pre class="src">
(sudoku:solve-example-grids)
Solved 50 of 50 easy puzzles (avg .0021 sec (471.7 Hz), max 0.015 secs).
Solved 95 of 95 hard puzzles (avg .0022 sec (446.0 Hz), max 0.008 secs).
Solved 11 of 11 hardest puzzles (avg .0018 sec (550.0 Hz), max 0.003 secs).
</pre>
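<p>The puzzle-weighted average used above for the python figures, applied again to the SBCL figures, can be reproduced with a few lines of python. Each collection's rate is weighted by its number of puzzles, following the formula given earlier. The per-collection ratios show where the speed-up comes from.</p>

```python
# (number of puzzles, average rate in Hz) per collection, as reported above
python_runs = [(50, 151), (95, 42), (11, 115)]
sbcl_runs = [(50, 471.7), (95, 446.0), (11, 550.0)]


def weighted_average_hz(runs):
    """Average the per-collection rates, weighting each by its puzzle count."""
    total_puzzles = sum(count for count, _ in runs)
    return sum(count * hz for count, hz in runs) / total_puzzles


py_avg = weighted_average_hz(python_runs)
cl_avg = weighted_average_hz(sbcl_runs)
speedups = [cl_hz / py_hz for (_, py_hz), (_, cl_hz) in zip(python_runs, sbcl_runs)]

print(f"python {py_avg:.1f} Hz, sbcl {cl_avg:.1f} Hz, ratio {cl_avg / py_avg:.1f}")
print([round(s, 1) for s in speedups])
```

<p>This prints an average of 82.1Hz for python (rounded to 82Hz above) and 461.6Hz for SBCL, a ratio of 5.6. The per-collection speed-ups are 3.1, 10.6 and 4.8.</p>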

<p>Computing the average the same way, we now get 461.6Hz.</p> <p>Now, that's between 3 and more than <strong>10 times faster</strong> than the python version (comparing collection by collection), for a comparable effort, a much better development environment, and the same fully dynamic approach with no explicit compilation step.</p> <h3>Conclusion</h3> <p class="first">I guess I'm fond of <em>Common Lisp</em>, which I already saw coming (so did you, right?), and now I have a public article and code to share about why :)</p> <p>The code is hosted at <a href="https://github.com/dimitri/sudoku">https://github.com/dimitri/sudoku</a> if you're interested, with the files needed to reproduce the results, some docs, etc.</p> <p>Also, apart from using <em>integers</em> as <em>bitfields</em>, which I did more to be lispy than for performance, I made very little effort to optimize the code. It's quite naive in this respect, yet it gets an average of 461.6Hz rather than 82Hz, which is <strong><em>5.6 times faster</em></strong> on average.</p> <p>So yes, I will continue to invest some precious time in <em>Common Lisp</em> as a very good interactive scripting language, and maybe more than that.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 10 Jul 2012 20:37:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</guid> </item> <item> <title>PGDay France 2012</title> <link>http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</link> <description><![CDATA[<p>The French PostgreSQL Conference, <a href="http://www.pgday.fr/programme">pgday.fr</a>, was held yesterday in Lyon. We had a very good time and a great schedule, with a single track packed with 7 talks addressing a diverse set of PostgreSQL-related topics, from GIS to fuzzy logic, including replication.</p>

<p>You might have guessed it already: I talked about replication. Here's the slide deck I used. It's in French, sorry if you don't grok that language.</p> <center> <p><a class="image-link" href="../../../images/confs/PGDay_2012_Replications.pdf"> <img src="../../../images/confs/PGDay_2012_Replications.png"></a></p> </center> <p>The conference was very nice and went smoothly. Even though there were “only” 60 of us, I had the pleasure of meeting users with very different sets of needs. Very happy to have been there!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Jun 2012 16:17:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</guid> </item> <item> <title>M-x recompile</title> <link>http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</link> <description><![CDATA[<p>A friend of mine just asked me for advice on tweaking some Emacs features, and I think that's really typical of using Emacs: rather than getting used to the way things are shipped to you, you start wanting to adapt the tools to the way you want things to work. And you can call that awesome!</p>

<p>In this case we're talking about the M-x compile and M-x recompile functions. My friend bound the former to &lt;f11&gt; and wanted C-u &lt;f11&gt; to recompile with the exact same command line as the previous compile command.</p> <p>Well, to be honest, I didn't know about M-x recompile until after I wrote the following function, which triggers another compile with the last command used when called with C-u.</p> <pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-last-command</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-command-history</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">cyb-compile</span> (arg)
  <span style="color: #bc8f8f;">"Compile with given command, optionally recompile with last command"</span>
  (interactive <span style="color: #bc8f8f;">"P"</span>)
  (<span style="color: #7f007f;">if</span> arg
      (<span style="color: #7f007f;">progn</span>
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">arg given: compile with last command
</span>        (<span style="color: #7f007f;">unless</span> cyb-compile-last-command
          (<span style="color: #ff0000; font-weight: bold;">error</span> <span style="color: #bc8f8f;">"Can't recompile yet, no known last command"</span>))
        (compile cyb-compile-last-command))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">else branch, no arg given, ask for a command
</span>    (<span style="color: #7f007f;">let</span> ((command (read-string <span style="color: #bc8f8f;">"Compile with command: "</span>
                                 <span style="color: #bc8f8f;">"make -k"</span>
                                 'cyb-compile-command-history
                                 <span style="color: #bc8f8f;">"make -k"</span>)))
      (setq cyb-compile-last-command command)
      (compile command))))

(global-set-key (kbd <span style="color: #bc8f8f;">"&lt;f11&gt;"</span>) 'cyb-compile)
</pre>

<p>With that little bit of <em>Emacs Lisp</em> code we're driving Emacs the way we want to be working, and that's great! You can see it was a <em>quick hack</em>: if you wanted to use the function non-interactively, it would still prompt for the compile command, whereas the <em>Emacs Lisp</em> interactive special form would allow us to implement something way smarter here. Also, if we wanted to spend some more time on that feature, we should probably change the <em>error</em> condition so that it asks for the command rather than just complaining, which would certainly be more useful.</p> <p>Exercise left to the reader: rewrite it using recompile rather than reinventing it in a hurry! Beware of call-interactively though. Oh, and fix the aforementioned infelicities, too.</p> <p>To conclude, being able to write <em>Emacs Lisp</em> code to fix a usability problem in a hurry is a great strength of Emacs, and we're provided with the necessary tool set to reach completeness if we want to.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 01 Jun 2012 18:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</guid> </item> <item> <title>Back From PgCon</title> <link>http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</link> <description><![CDATA[<p>Last week was the annual <em>PostgreSQL Hackers</em> gathering in Canada, thanks to the awesome <a href="http://www.pgcon.org/">pgcon</a> conference. This year's edition was packed with good things, beginning with the <a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit">Cluster Summit</a>, followed the next day by the <a href="http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting">Developer Meeting</a>, itself followed (yes, on the same day) by the <a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting">In Core Replication Meeting</a>. That was a packed schedule!</p>

<center> <p><img src="../../../images/in-core-replication.jpg" alt=""></p> </center> <p>The <em>in core replication</em> project was presented with slides titled <a href="http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf">Future In-Core Replication for PostgreSQL</a> and got a very good reception. For instance, the people behind <a href="http://slony.info/">Slony</a> (<em>Jan Wieck</em>, <em>Christopher Browne</em> and <em>Steve Singer</em> were there) appreciated the concepts and were rather supportive of both the requirements and the design. They also welcomed the very early demo and results we already had to show, as a kind of proof of concept.</p> <p>After those first two days, the actual show could start. I had the honor of presenting a migration use case entitled <a href="http://www.pgcon.org/2012/schedule/events/431.en.html">Large Scale MySQL Migration</a>, about going from MySQL to PostgreSQL and from 37 to 256 shards, moving more than 6TB of data, including binary <em>blobs</em> that we had to process with pl/java. It was a quite involved migration project, and you can now read the slides here:</p> <center> <p><a class="image-link" href="../../../images/fotolog.pdf"> <img src="../../../images/fotolog.jpg"></a></p> </center> <p>I've heard that we should soon be able to enjoy audio and video recordings of the sessions, so if you couldn't make it this year for any reason, don't miss that: you will have loads of very interesting talks to virtually attend. I definitely will do that to catch up with some talks I couldn't attend. Having to pick one out of three is not an easy task, all the more when you add the providential <em>hallway track</em>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 24 May 2012 09:40:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</guid> </item> <item> <title>Clean PGQ Subconsumers</title> <link>http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</link> <description><![CDATA[<p>Now that you're all using the wonders of <a href="../03/12-PGQ-Cooperative-Consumers.html">Cooperative Consumers</a> to help you efficiently and reliably implement your business constraints and offload them from the main user transactions, you're reaching a point where you have to clean up your development environment (because that's what happens to development environments, right?), and you want a way to start again from a clean empty place.</p>

<center> <p><img src="../../../images/drop-queue.png" alt=""></p> </center> <p>Here we go. It used to be much simpler than that, so if you're still using <strong>PGQ</strong> from <strong>Skytools2</strong>, just jump to the next step.</p> <h3>Unregister Subconsumers</h3> <p class="first">This query finds the subconsumers listed by the system function pgq.get_consumer_info() and asks PGQ to please <em>unregister</em> them. This loses events along the way, even events from batches that are currently active.</p> <pre class="src">

with subconsumers as (
  select q1.queue_name, q2.consumer_name,
         substring(q1.consumer_name from <span style="color: #bc8f8f;">'%.#"%#"'</span> for <span style="color: #bc8f8f;">'#'</span>) as subconsumer_name
    from (select * from pgq.get_consumer_info() where lag is null) as q1
         join (select * from pgq.get_consumer_info() where lag is not null) as q2
           on q1.queue_name = q2.queue_name
)
select *,
       pgq_coop.unregister_subconsumer(queue_name, consumer_name, subconsumer_name, 1)
  from subconsumers; </pre>
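<p>The substring() expression above keeps whatever follows the dot in the consumer name. This relies on each subconsumer showing up in pgq.get_consumer_info() as the main consumer name, a dot, and then the subconsumer name, which is how pgq_coop registers them. As an illustration only, here is the same extraction in Python. The function name is hypothetical, and the sketch assumes names containing a single dot.</p>
<pre class="src">
```python
def split_coop_name(full_name):
    """Split a PGQ consumer name such as 'myconsumer.worker1' into
    (main consumer, subconsumer), mirroring the SQL expression
    substring(consumer_name from '%.#"%#"' for '#').

    Hypothetical helper for illustration: it assumes a single dot in the
    name, and returns None when there is no dot (not a subconsumer).
    """
    consumer, dot, subconsumer = full_name.partition(".")
    if not dot:
        return None  # the SQL substring() returns NULL in that case
    return consumer, subconsumer


print(split_coop_name("myconsumer.worker1"))  # ('myconsumer', 'worker1')
print(split_coop_name("myconsumer"))          # None
```
</pre>
<p>Note that partition() splits on the first dot only. How the SQL pattern behaves with several dots is not covered by this sketch.</p>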

<h3>Unregister Consumers</h3> <p class="first">Now that the first step is done, we have to <em>unregister</em> the main consumers. That part is easy, and it's what you already did before:</p> <pre class="src">
select queue_name, consumer_name,
       pgq.unregister_consumer(queue_name, consumer_name)
  from pgq.get_consumer_info(); </pre>

<h3>Drop queues</h3> <p class="first">And as we really want to clean up the mess, let's also drop the queues.</p> <pre class="src">
select queue_name, pgq.drop_queue(queue_name)
  from pgq.queue; </pre> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Apr 2012 15:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</guid> </item> <item> <title>PGQ Coop Consumers</title> <link>http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</link> <description><![CDATA[<p>I was working on a new <a href="http://www.postgresql.org/">PostgreSQL</a> architecture for a high-scale project that used to be among the top 10 most popular web sites (in terms of visitors). For it, I needed to be able to offload some processing from the main path: that's called a <em>batch job</em>. This needs to be <em>transactional</em>: don't run the job if the transaction was rolled back, process all the <em>events</em> that were part of the same transaction in the same transaction, etc.</p>
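<p>To make the transactional requirement concrete, here is a minimal Python sketch of one consumer iteration. It assumes a DB-API connection such as psycopg2's and the PGQ SQL functions pgq.next_batch(), pgq.get_batch_events() and pgq.finish_batch(). This is not the libphp-pgq code used in this project. The point is only that fetching a batch, handling its events and closing the batch all happen in a single transaction, so a failure rolls everything back.</p>
<pre class="src">
```python
def process_one_batch(conn, queue_name, consumer_name, handle_event):
    """Consume one PGQ batch inside a single transaction.

    `conn` is assumed to be a DB-API connection whose `with` block commits
    on success and rolls back on error, as psycopg2 connections do.
    `handle_event` is a hypothetical application callback.
    Returns the number of events handled, or None when no batch is ready.
    """
    with conn:
        cur = conn.cursor()
        # Ask PGQ for the next batch of events for this consumer.
        cur.execute("select pgq.next_batch(%s, %s)", (queue_name, consumer_name))
        batch_id = cur.fetchone()[0]
        if batch_id is None:
            return None  # no new tick yet, nothing to process
        cur.execute(
            "select ev_id, ev_type, ev_data from pgq.get_batch_events(%s)",
            (batch_id,),
        )
        events = cur.fetchall()
        for event in events:
            # Any exception raised here rolls back the whole transaction:
            # the batch is not closed and its events will be served again.
            handle_event(event)
        # Closing the batch commits together with the processing work.
        cur.execute("select pgq.finish_batch(%s)", (batch_id,))
        return len(events)
```
</pre>
<p>With cooperative consumers, each worker would instead call the pgq_coop variant of next_batch, which also takes a subconsumer name. Check the Skytools 3 pgq_coop documentation for the exact signatures.</p>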

<center> <p><img src="../../../images/workers.jpg" alt=""></p> </center> <p>That calls for using <a href="http://wiki.postgresql.org/wiki/PGQ_Tutorial">PGQ</a>, the <em>jobs queue</em> solution from <a href="http://wiki.postgresql.org/wiki/Skytools">Skytools</a> and the workhorse behind <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>. If PGQ is good enough to build a full trigger-based replication solution on top of it, it's certainly good enough for our custom processing, right? Well, you still need to check that your expectations are met, and happily that was the case in my implementation. It's a very common problem, and PGQ very often is a great solution to it.</p> <p>As this implementation is PHP centric, we've been using <a href="https://github.com/dimitri/libphp-pgq">libphp-pgq</a> to drive our background workers. Using PGQ in PHP has been very easy to set up, the only trap being not to forget to run the <em>ticker</em> process.</p> <p>It got interesting because of two elements. First, we're not running a single database instance here but a bunch of them... make it <em>256 databases</em>. With 5 queues to consume in each of them, that would be 1280 consumer processes. Even distributed over 16 servers, that's still 80 per server, which is way too many. Instead, we reused the <a href="https://github.com/markokr/skytools/blob/master/scripts/queue_mover.py">queue mover</a> script found in the Skytools distribution and adapted it to <em>forward</em> the events of the 1280 source queues to only 5 destination queues. We then process the events from this single location.</p> <p>Now it's easier to deal with, but we're still not quite there. Of course, with so many sources, concentrating them all into the same place means that a single consumer is not able to process the events as fast as they are produced.
That's where <em>cooperative consuming</em> shines: it's very easy to turn your <em>consumer</em> into a <em>cooperative</em> one, even on an existing and running queue, and that's what we did. So now we can choose how many <em>workers</em> we want per queue: one of them has 4 workers, while another one sees much less activity and a single worker is enough.</p> <center> <p><img src="../../../images/coop-workers.jpeg" alt=""></p> </center> <p>The queue mover script that knows how to subscribe to many queues from the same process is going to be contributed to Skytools proper, of course.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 12 Mar 2012 14:43:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</guid> </item> <item> <title>Extension White Listing</title> <link>http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</link> <description><![CDATA[<p>PostgreSQL 9.1 includes proper extension support, as you might well know if you have ever read this very blog. Some hosting providers run PostgreSQL at a big scale (hello <a href="https://postgres.heroku.com/blog">Heroku</a>!) and still run into small caveats that make their life difficult.</p>

<p>To be specific, only <em>superusers</em> are allowed to install stored procedures written in C. That impacts a lot of very useful PostgreSQL extensions, because all those shipped in the <em>contrib</em> package are written in C. Now, <a href="https://postgres.heroku.com/blog">Heroku</a> does not give <em>superuser</em> access to its hosted customers, in order to limit the number of ways they can shoot themselves in the foot. And given the PostgreSQL security model, being the <em>database owner</em> is mostly good enough for day to day operations.</p> <blockquote> <p class="quoted"> See Andrew's article <a href="http://people.planetpostgresql.org/andrew/index.php?/archives/259-Heroku&#44;-a-really-easy-way-to-get-a-database-in-a-hurry..html">Heroku, a really easy way to get a database in a hurry</a> for more context about Heroku's offering here.</p> </blockquote> <p>Mostly, but as we see, not completely. How can a non <em>superuser</em> still be allowed to install a C-coded extension in their own database? That's quite dangerous, as any bug causing a crash would force a restart of the whole PostgreSQL instance. So not only do you want to allow database owners to use CREATE EXTENSION, you also want to be able to review and explicitly <em>white list</em> the allowed extensions.</p> <p>Here we go: <a href="https://github.com/dimitri/pgextwlist">pgextwlist</a> is a PostgreSQL extension implementing just that idea. You have to tweak local_preload_libraries so that it gets loaded automatically and early enough. You also have to provide the list of authorized extensions in the extwlist.extensions setting.</p> <p>Let's see a usage example, straight from the documentation:</p> <pre class="src">
dim=&gt; select rolsuper from pg_roles where rolname = current_user;
 rolsuper
----------
 f
(1 row)

dim=&gt; create extension hstore;
WARNING:  =&gt; is deprecated as an operator name
DETAIL:  This name may be disallowed altogether in future versions of PostgreSQL.
CREATE EXTENSION
dim=&gt; create extension earthdistance;
ERROR:  extension "earthdistance" is not whitelisted
DETAIL:  Installing the extension "earthdistance" failed, because it is not on the whitelist of user-installable extensions.
HINT:  Your system administrator has allowed users to install certain extensions. SHOW extwlist.extensions;
dim=&gt; \dx
                           List of installed extensions
  Name   | Version |   Schema   |                   Description
---------+---------+------------+--------------------------------------------------
 hstore  | 1.0     | public     | data type for storing sets of (key, value) pairs
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

dim=&gt; drop extension hstore;
DROP EXTENSION </pre>
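<p>To sum up the setup, a postgresql.conf along the lines of local_preload_libraries = 'pgextwlist' and extwlist.extensions = 'hstore,cube' is what the paragraphs above describe (the extension list is just an example value). Keep in mind that PostgreSQL only loads local_preload_libraries entries from the plugins subdirectory of its library directory. The decision pgextwlist then makes is a membership test against that comma-separated list. The Python sketch below only models that decision: the real check is C code hooked into utility command processing, and the function name and whitespace handling are assumptions made for illustration.</p>
<pre class="src">
```python
def is_whitelisted(extension_name, setting):
    """Model of the pgextwlist decision: may a non-superuser run
    CREATE EXTENSION extension_name, given the extwlist.extensions value?

    Illustration only: the real check happens in C inside PostgreSQL,
    and trimming spaces around names is an assumption of this sketch.
    """
    allowed = {name.strip() for name in setting.split(",") if name.strip()}
    return extension_name in allowed


setting = "hstore,cube"  # example value, as it could appear in postgresql.conf
print(is_whitelisted("hstore", setting))         # True
print(is_whitelisted("earthdistance", setting))  # False
```
</pre>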

<p>As you can see, it allows non-<em>superusers</em> to install a white-listed extension written in C, and it refuses the others.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 08 Mar 2012 14:25:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</guid> </item> <item> <title>Battle Language à la Marmite</title> <link>http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</link> <description><![CDATA[<p>Last night I had the chance to take part in the <a href="http://jduchess.org/duchess-france/blog/battle-language-a-la-marmite/">Battle Language à la Marmite</a>. I had offered to talk about <a href="http://www.emacswiki.org/emacs/EmacsLisp">Emacs Lisp</a>, and that offer turned into carrying the banner for the whole <a href="http://www.lisp.org/index.html">Lisp</a> family. I happily used some content from <a href="http://www.lisperati.com/">Lisperati</a> in my presentation, and I recommend a detour to that site!</p>

<center> <p><a class="image-link" href="../../../images/confs/elisp.pdf"> <img src="../../../images/confs/elisp-1.png"></a></p> </center> <p>In this very quick presentation (only 5 minutes), I mentioned the <em>axiomatic</em> approach of <strong><em>John McCarthy</em></strong> when he <em>discovered</em> the language. You can read more about it on the site of <strong><em>Paul Graham</em></strong>, in his article <a href="http://www.paulgraham.com/rootsoflisp.html">The Roots of Lisp</a>, along with the associated code, an <a href="http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp">implementation of McCarthy's LISP in Common Lisp</a>.</p> <p>Thanks to <a href="http://jduchess.org/">Duchess</a> for a nice evening where we could exchange points of view and debate functional and object-oriented languages, the differences between Erlang, Haskell and Ruby, and a few other related topics!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 01 Mar 2012 14:49:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</guid> </item> <item> <title>pgbouncer munin plugin</title> <link>http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</link> <description><![CDATA[<p>It seems that if you search for a <a href="http://munin-monitoring.org/">munin</a> plugin for <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a>, it's easy enough to reach an old page of mine with an old version of my plugin and a broken link. Let's remedy that by publishing the newer version of the plugin here. To be honest, I thought it had already made its way into the official munin 1.4 set of plugins, but I've not been following closely enough.</p>

<center> <p><img src="../../../images/bouncing_elephant.gif" alt=""></p> </center> <p>As the plugin is 300 lines of Python code, it's not a good idea to just inline it here, so please grab it at <a href="../../../resources/pgbouncer_">pgbouncer_</a>.</p> <p>Note that once installed, the script name should follow the form pgbouncer_dbname_stats_requests or pgbouncer_dbname_pools, where of course dbname can contain any number of _ characters. This script supports quite old versions of <em>pgbouncer</em> that didn't accept the normal pq protocol. Back then you had to use psql to have any chance of getting the data from a script, and you could not just use a PostgreSQL driver such as <a href="http://initd.org/psycopg/">psycopg2</a>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 16 Nov 2011 14:00:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</guid> </item> <item> <title>Extensions in plain SQL</title> <link>http://tapoueh.org/blog/2011/10/31-extensions-sql.html</link> <description><![CDATA[<p>The <a href="http://2011.pgconf.eu/">European conference in Amsterdam</a> was a very good community event, with flawless organization in a welcoming hotel. I had the pleasure of talking there about extensions and their use in "in-house" application development, under the title <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>.</p>

<center> <p><a class="image-link" href="http://wiki.postgresql.org/images/f/f1/Using-extensions.pdf"> <img src="../../../images/using-extensions-10.png"></a></p> </center> <p>I suppose most of you missed my talk: at least, I only had a small handful of French people in the room, and I hope I have readers who were not in Amsterdam. Its idea is to use the mechanisms offered by extensions in order to maintain the PL code you use in production.</p> <p>Most of the time, these are procedures implementing part of the business logic of your applications, but so close to the data that they end up directly in the database. That's a good thing, especially since <em>PostgreSQL 9.1</em>, which provides fairly complete extension management.</p> <p>The point is to <em>package</em> your procedures by following the online documentation and its chapter <a href="http://docs.postgresqlfr.org/9.1/extend-extensions.html">35.15. Packaging Related Objects into an Extension</a>. Once that's done, you can deploy your set of stored procedures with the command CREATE EXTENSION mesprocs;, and then the psql command \dx lets you list the installed extensions and their version numbers.</p> <p>Updates are also handled with a dedicated SQL command, ALTER EXTENSION mesprocs UPDATE [TO version];. You just need to provide intermediate scripts named, for example, mesprocs--1.0--1.1.sql and mesprocs--1.1--1.2.sql, and PostgreSQL will know how to go from 1.0 to 1.2 by chaining them.</p> <p>There you go: you now know almost everything about my Amsterdam talk, and you can find the rest in the slides linked at the beginning of this article. Of course, I haven't reproduced here the interesting questions I was asked, and a good part of them went into my Christmas wish list for extensions.
If you want to be sure to find those under your tree, though, the best way is still to talk to me about it: sponsoring Open Source development is a fine thing to do :)</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 31 Oct 2011 14:22:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/31-extensions-sql.html</guid> </item> <item> <title>Back From Amsterdam</title> <link>http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</link> <description><![CDATA[<p>Another great conference took place last week: <a href="http://2011.pgconf.eu/">PostgreSQL Conference Europe 2011</a> was in Amsterdam, and so were plenty of us PostgreSQL geeks. I attended a lot of talks and learned some more about our project, its community and its features. More than that, it was a perfect occasion to meet with the community.</p>

<center> <p><img src="../../../images/ams-conf-room.jpg" alt=""></p> </center> <p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/2-dave-page/">Dave Page</a> talked about SQL/MED under the title <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/146-postgresql-at-the-center-of-your-dataverse/">PostgreSQL at the center of your dataverse</a>. He detailed what to expect from a <em>Foreign Data Wrapper</em> in PostgreSQL 9.1, then how to write your own. Wherever your data is currently managed, you can easily enough have PostgreSQL integrate it by fetching it to answer your queries. That means real-time data federation: you don't copy data around, you access it remotely when executing the query.</p> <p>I might need to come up with new <em>Foreign Data Wrappers</em> in the not too distant future. Now that I better grasp how much work it really is, it appears to be a good migration strategy too:</p> <pre class="src">

INSERT INTO real.table SELECT * FROM foreign.table; </pre>

<p>Another discovery is that <a href="http://code.google.com/p/plv8js/wiki/PLV8">PLv8</a> is apparently ready for public consumption. Using it can lead to <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/174-heralding-the-death-of-nosql/">Heralding the Death of NoSQL</a>, so use it with care.</p> <p>In the presentation of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a>, we mainly saw that mixing <em>synchronous</em> and <em>asynchronous</em> transactions in your application is the key to real performance across the ocean, as the speed of light is not infinite. From Baltimore to Amsterdam the latency cannot get much better than 100ms, and that's not what we call <em>instant</em> nowadays.</p> <p>Then again, depending on the number of concurrent queries to sync over the ocean link, the experimental setup was able to achieve several thousand queries per second. That validates the model we picked for <em>sync rep</em> and its implementation.</p> <p>If you want to read the slides again at home, or if you could not be there for some reason, most of the talks are now available online at the <a href="http://wiki.postgresql.org/wiki/PostgreSQL_Conference_Europe_Talks_2011">PostgreSQL Conference Europe Talks 2011</a> wiki page.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 26 Oct 2011 10:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</guid> </item> <item> <title>Implementing backups</title> <link>http://tapoueh.org/blog/2011/10/12-backup-strategy.html</link> <description><![CDATA[<p>I've been asked for my opinion on backup strategy and best practices, and it so happens that I have some kind of an opinion on the matter.</p>

<p>I tend to think best practice here begins with properly defining the <em>backup plan</em> you want to implement. It's quite a complex matter, so be sure to ask yourself about your needs: what do you want to be protected from?</p> <center> <p><img src="../../../images/online-backup.jpg" alt=""></p> </center> <p>The two main things you want protection from are hardware loss (crash, disaster, plane in the data center, fire, flood, etc.) and human error (UPDATE without a WHERE clause). Replication is an answer to the former, archiving and dumps to the latter. You generally need both.</p> <p>Often enough, “backups” include WAL <em>archiving</em> and <em>shipping</em> plus nightly or weekly <em>base backups</em>. They come with some retention, and with scripts or procedures ready to set up <a href="http://www.postgresql.org/docs/9.1/static/continuous-archiving.html">Point In Time Recovery</a> and recover some data without interfering with the WAL archiving and shipping. Of course, with PostgreSQL 9.0 and 9.1, <em>WAL Shipping</em> can be implemented with <em>streaming replication</em>, and you can even have a <em>Hot Standby</em>. But for backups you still want archiving.</p> <p>Mostly I still implement nightly pg_dump -Fc backups with a custom retention. For example: 1 backup a month kept 2 years, 1 backup a week kept 6 or 12 months, and 1 backup a night kept 1 to 2 weeks. That works when the database size allows the pg_dump run to fit within the <em>maintenance window</em>, if any.</p> <p>Don't forget that while pg_dump runs, you can't roll out <em>DDL changes</em> to the production system any more, so you want to be careful about this <em>maintenance window</em> thing. When you have one.</p> <p><em>Physical backups</em> don't lock <em>rollouts</em> away, but they often eat a good deal of the <em>IO bandwidth</em>, so you need to pick the right time to run them.
That's how you can end up with a weekly base backup plus WAL <em>archiving</em>.</p> <p>If you can't pg_dump production, maybe you can have <em>automated restore jobs</em> from the <em>physical backups</em> that you then pg_dump -Fc, so that you still have that. That can come in handy, really: you can't test your <em>major upgrade</em> out of a <em>physical backup</em>.</p> <p>Also, <strong><em>obviously</em></strong>, never consider your backup strategy implemented until you have either <em>automated restores</em> in place or a regular schedule to exercise them (<em>staging instances</em>, devel instances).</p> <p>Then, as far as the practical tools go, I tend to think that <a href="http://tapoueh.org/pgsql/pgstaging.html">pg_staging</a> is worth its installation complexity. For WAL archiving and base backups I recommend <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> from <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a>, a very handy tool. When using PostgreSQL 9.0 or 9.1, consider using <a href="http://packages.debian.org/experimental/skytools3-walmgr">walmgr3</a> so that it behaves nicely alongside <em>streaming replication</em>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 12 Oct 2011 22:22:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/12-backup-strategy.html</guid> </item> <item> <title>Extensions, applications</title> <link>http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</link> <description><![CDATA[<p>The <a href="http://2011.pgconf.eu/">annual PostgreSQL conference in Europe</a> takes place next week in Amsterdam. I hope you already have your tickets, as this edition promises to be a very good vintage!</p>

<p>So I will present how to use extensions. The English title is <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, and the idea is to see how to leverage extensions to better manage your database upgrades.</p> <p>The life cycle of production databases often includes a development database where the schema evolves at the pace of the developers' needs. From time to time, part of those changes gets consolidated (into <em>rollouts</em>, scripts mostly containing DDL) in order to deploy them to production, preferably with an intermediate staging step all the same.</p> <p>Knowing what is deployed in development and how to extract from it the script to run in production can sometimes be tedious. When it isn't, that's because the work has been done upstream, which is the sign of a good organization, with the overhead you can imagine.</p> <p>The <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extensions</a> found in PostgreSQL 9.1 allow you to better manage this kind of situation by optimizing the overhead. It doesn't disappear, but it becomes operational rather than remaining an organizational burden.</p> <p>Right, I'll leave you now, I have to get ready for the conference :)</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 10 Oct 2011 10:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</guid> </item> <item> <title>Scaling Stored Procedures</title> <link>http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</link> <description><![CDATA[<p>Recently in the news, <em>stored procedures</em> were used as an excuse for moving logic away from the database layer to the application layer. They were also an excuse for migrating from a powerful technology to a simpler one, now that there's no logic left in the database.</p>

<p>It's not the way I would typically approach scaling problems, and apparently I'm not alone in the <em>Stored Procedures</em> camp. Did you read this nice blog post <a href="http://ora-00001.blogspot.com/2011/07/mythbusters-stored-procedures-edition.html">Mythbusters: Stored Procedures Edition</a> already? Well, it takes place in another land than my comfort zone, but it still has some interesting things to say.</p> <p>I won't try to address all of the myths they attack in a single article. Let's pick the scalability problems: the two I have in mind are code management and performance. We are quite well equipped for that in PostgreSQL, really.</p> <p>For code maintenance we now have <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">PostgreSQL Extensions</a>. They allow you to pack all your procedures into separate <em>extensions</em>, and to maintain a version number and upgrade scripts for each of them. You can handle separate rollouts in development for going from 1.12 to 1.13 then 1.14. Then, after the developers have tested it more completely and changed their mind again on the best API they want to work with, comes 1.15, which is stamped ok for production. At this point, ALTER EXTENSION extension UPDATE will happily apply all the rollouts in sequence to upgrade from 1.12 straight to 1.15 in one go. And if you prefer to bake a special careful script to handle that big jump, you can also provide a specific extension--1.12--1.15.sql script.</p> <p>Of course you're managing all those files with your favorite <em>SCM</em>, which answers another myth from the blog post we are loosely following.</p> <center> <p><a class="image-link" href="http://postgresqlrussia.org/articles/view/131"> <img src="../../../images/Moskva_DB_Tools.v3.png"></a></p> </center> <p>I wanted to talk about the other side of the scalability problem, which is the operations side of it.
What happens when you need to scale the database in terms of its size and level of concurrent activity? PostgreSQL has earned a very good reputation for its ability to scale up, but what about scaling out? Surely, now that you've gone all in on <em>Stored Procedures</em>, you're in a very bad situation?</p> <p>Well, in fact, you're then in a very good position, thanks to <a href="http://wiki.postgresql.org/wiki/PL/Proxy">PLproxy</a>. This <em>extension</em> is a custom procedural language whose job is to handle a cluster of database shards that all expose the same PL API, and it's very good at doing that.</p> <p><em>Stored Procedures</em> are a very good tool to have. Be sure to get comfortable enough with them so that you can choose exactly when to use them. If you're not sure about that, we at <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> will be happy to help you there!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 06 Oct 2011 18:23:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</guid> </item> <item> <title>See you in Amsterdam</title> <link>http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</link> <description><![CDATA[<p>The next <a href="http://2011.pgconf.eu/">PostgreSQL conference</a> is approaching very fast now, and I hope you have your ticket already: it's a very promising event! If you want some help in deciding whether to register or not, just have another look at <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">the schedule</a>. Pick the talks you want to see. It's hard, given how packed with good ones the schedule is. When your mind is all set, review the list. Registered?</p>

<p>I'll be presenting another talk about extensions, but this time geared towards use cases, with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>. The idea is not to talk about how to make PostgreSQL play fair with extensions, including at <em>dump</em> and <em>restore</em> time. That's already done, and I've talked only too much about it. The idea this time is to figure out how much you get from this feature.</p> <p>Maybe you have felt that something is missing in your processes between pushing rollouts to devel environments and refining them as developers test and prepare something for the live databases. If so, we have something for you here, including how to easily compare the state of production and development without having to guess or reverse engineer anything.</p> <p>Yeah, extensions are all about getting even more professional! A great tool you'll be happy to master!</p> <p>And now I need to prepare a damn good slide deck, right? See you there! :)</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 04 Oct 2011 14:25:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</guid> </item> <item> <title>PostgreSQL in Amsterdam</title> <link>http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</link> <description><![CDATA[<p>In less than a month, the European PostgreSQL conference, <a href="http://2011.pgconf.eu/">pgconf.eu</a>, takes place. It's four days dedicated to your favorite DBMS, where you will be able to meet the European community: users, companies of all sizes, developers, and contributors of every kind.</p>

<p>It's the place to go to learn how the project works and to understand the impact of new releases on your architecture. It's also where you can have an in-depth technical discussion about that feature you would like to see in the next release, or simply realize the tremendous energy that is poured into this project!</p> <p>Of course <a href="http://2ndquadrant.fr/">2ndQuadrant</a> will be there, and we will present several of our <a href="http://www.2ndquadrant.com/fr/les-fonctionnalites-de-postgresql-91/">PostgreSQL 9.1 contributions</a>. It will start with a full-day training by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg</a>, <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/162-performance-from-start-to-crash/">Performance From Start to Crash</a>. If you want to learn how to approach the performance of a PostgreSQL server from the international <em>leader</em> in the field, author of the book <a href="http://www.amazon.fr/Bases-donn%C3%A9es-PostgreSQL-Gregory-Smith/dp/274402483X/ref=sr_1_1?ie=UTF8&amp;qid=1316183931&amp;sr=8-1">Bases de données PostgreSQL 9.0</a>, book your seat quickly!</p> <p>Regular talks start the next day, and over three days the list of talks by our <a href="http://www.2ndquadrant.com/fr/profil-de-lequipe/">2ndQuadrant team</a> is quite substantial. Let's have a look.</p> <p>We start with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/144-migration-to-postgresql-a-holistic-view/">Migration to PostgreSQL - a holistic view</a> by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/78-harald-armin-massa/">Harald Armin Massa</a>, who offers an interesting point of view on the reasons holding some migrations back.
Then <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/34-gianni-ciolli/">Gianni Ciolli</a> will present <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/159-look-out-the-window-functions-and-free-your-sql/">Look Out The Window Functions (and free your SQL)</a>, or how to solve complex problems simply when you have advanced tools at hand.</p> <p>Another talk not to miss, <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a>, details how to make the most of PostgreSQL 9.1 to get the data durability guarantees you want in your application. And this talk is given by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg Smith</a> and <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>. The latter developed <em>synchronous replication</em>, and <em>Hot Standby</em> before that. You won't find anyone in the world better placed to give this talk!</p> <p>Reading the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">pgconf.eu schedule</a> in order, the next two talks by our <a href="http://expert-postgresql.fr/">PostgreSQL experts</a> take place at the same time.
The choice won't be easy between <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/158-improving-vacuum-suction/">Improving VACUUM Suction</a> by Greg again, and a comparison of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/183-londiste-3-et-slony-21/">londiste 3 and slony 2.1</a> by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/57-cedric-villemain/">Cédric Villemain</a>, in French.</p> <p>Next comes <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, which I will present myself. You can find my talk on the page bearing my name: <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/14-dimitri-fontaine/">Dimitri Fontaine</a>. It's a talk in English detailing how to use extensions to maintain the <em>stored procedures</em> part of an application.</p> <p>And to end the second day of 2ndQuadrant talks, you can learn with Gianni about <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/160-debugging-complex-sql-queries-with-writable-ctes/">Debugging complex SQL queries with writable CTEs</a>, a feature contributed to the project by another <a href="http://www.2ndquadrant.com/fr/contact/">2ndQuadrant</a> consultant, Marko Tiikkaja.</p> <p>And there's still one more day to go! We're not lying when we say the schedule is packed! The last day of the conference is not the least interesting one, and I hope you'll have kept some energy to follow…</p> <p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>, who will present his vision of the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/199-postgresql-roadmap/">PostgreSQL Roadmap</a> for the coming years.
Ce n'est bien sûr que sa vision personnelle, mais lorsque l'on fait le bilan de ces 7 dernières années de <a href="http://www.2ndquadrant.com/fr/histoire-postgresql/">contributions à PostgreSQL</a>, on voit à quel point son opinion personnelle peut avoir du poids dans le développement du projet.</p> <p>À suivre, la présentation de Greg sur son sujet de prédilection : <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/157-bottom-up-database-benchmarking/">Bottom-up Database Benchmarking</a>. Tout ce que vous avez toujours voulu savoir sur les mesures de performances de vos bases de données, sans jamais oser le demander. Quelque chose dans ce style en tout cas :)</p> <p>Bien sûr d'autres présentations sont disponibles et retiendront votre attention, ce billet vous présente seulement celles qui seront données par les <a href="http://expert-postgresql.fr/">experts PostgreSQL</a> de <a href="http://www.2ndquadrant.com/fr/expertise-postgresql/">2ndQuadrant</a>. En vous souhaitant bonne conférence à tous, j'espère avoir le plaisir de vous retrouver à Amsterdam le mois prochain !</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 27 Sep 2011 11:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</guid> </item> <item> <title>Skytools3: walmgr</title> <link>http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</link> <description><![CDATA[<p>Let's begin the long overdue <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a> documentation effort. The code is waiting for you over at <a href="https://github.com/markokr/skytools">github</a>, and is stable and working. Why is it still in <em>release candidate</em> status, I hear you ask? Well, because it's missing updated documentation.</p>

<p><a href="http://packages.debian.org/experimental/skytools3-walmgr">WalMgr</a> is the Skytools component that manages <em>WAL shipping</em> for you, and archiving too. It knows how to prepare your master and standby setup, how to take a base backup and push it to the standby's system, how to archive the master's WAL files (on the standby) as they are produced, and how to have the standby restore from this archive.</p> <p>What's new in walmgr from Skytools 3 is its support for <em>Streaming Replication</em>, which made its way into PostgreSQL 9.0 and is even more useful in PostgreSQL 9.1 (better monitoring, synchronous replication option).</p> <h2>Getting ready</h2> <p class="first">Now, I'm using debian here, and a build virtual machine where I'm doing the <em>backporting</em> work. As <a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> is now out, let's use that.</p> <pre class="src"> :~$ pg_lsclusters
Version Cluster Port Status Owner    Data directory
8.4     main    5432 online postgres /var/lib/postgresql/8.4/main ...
9.0     main    5433 online postgres /var/lib/postgresql/9.0/main ...
9.1     main    5434 online postgres /var/lib/postgresql/9.1/main ...
</pre> <p>After some editing of the configuration files (enabling <em>hot standby</em> and switching pg_hba.conf to trust for the sake of this example), we can see that the cluster is ready to be abused:</p> <pre class="src"> :~$ sudo pg_ctlcluster 9.1 main restart
:~$ psql --cluster 9.1/main -U postgres \
       -c <span style="color: #ad7fa8; font-style: italic;">"select name, setting from pg_settings where name in ('max_wal_senders', 'wal_level')"</span>
      name       |   setting
-----------------+-------------
 max_wal_senders | 1
 wal_level       | hot_standby
(2 rows)

:~$ sudo mkdir -p /etc/walshipping/9.1/main /var/lib/postgresql/walshipping
:~$ sudo chown -R postgres:postgres /etc/walshipping /var/lib/postgresql/walshipping

:~$ ssh-keygen -t dsa
:~/.ssh$ cp id_dsa.pub authorized_keys
:~$ ssh localhost
</pre>

<p>So the order of operations is to prepare a standby, then have it restore from the archives, then activate WAL streaming and check that the setup allows the standby to switch back and forth between streaming and the archives.</p> <h2>Setting up walmgr</h2> <p class="first">To prepare the standby, we will take a <em>base backup</em> of the master. That step is handled by walmgr, so we first need to set it up. Here's the sample master.ini file:</p> <pre class="src"> [<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span> = wal-master
<span style="color: #eeeeec;">logfile</span> = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">pidfile</span> = /var/run/postgresql/%(job_name)s.pid
<span style="color: #eeeeec;">use_skylog</span> = 0

<span style="color: #eeeeec;">master_db</span> = port=5434 dbname=template1
<span style="color: #eeeeec;">master_data</span> = /var/lib/postgresql/9.1/main/
<span style="color: #eeeeec;">master_config</span> = /etc/postgresql/9.1/main/postgresql.conf
<span style="color: #eeeeec;">master_bin</span> = /usr/lib/postgresql/9.1/bin

<span style="color: #888a85;"># </span><span style="color: #888a85;">set this only if you can afford database restarts during setup and stop.</span>
<span style="color: #eeeeec;">master_restart_cmd</span> = pg_ctlcluster 9.1 main restart

<span style="color: #eeeeec;">slave</span> = 127.0.0.1
<span style="color: #eeeeec;">slave_config</span> = /etc/walshipping/9.1/main/standby.ini

<span style="color: #eeeeec;">walmgr_data</span> = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span> = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span> = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span> = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span> = %(walmgr_data)s/config.backup

<span style="color: #888a85;"># </span><span style="color: #888a85;">syncdaemon update frequency</span>
<span style="color: #eeeeec;">loop_delay</span> = 10.0
<span style="color: #888a85;"># </span><span style="color: #888a85;">use record based shipping available since 8.2</span>
<span style="color: #eeeeec;">use_xlog_functions</span> = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">pass -z to rsync, useful on low bandwidth links</span>
<span style="color: #eeeeec;">compression</span> = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">keep symlinks for pg_xlog and pg_log</span>
<span style="color: #eeeeec;">keep_symlinks</span> = 1

<span style="color: #888a85;"># </span><span style="color: #888a85;">tell walmgr to set wal_level to hot_standby during setup</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">hot_standby = 1</span>

<span style="color: #888a85;"># </span><span style="color: #888a85;">periodic sync</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">command_interval = 600</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">periodic_command = /var/lib/postgresql/walshipping/periodic.sh</span>
</pre>
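<p>walmgr is written in Python, and the %(job_name)s and %(walmgr_data)s references above look like Python ConfigParser interpolation. Assuming they behave the same way, here's a small sketch using Python 3's configparser to show what the derived paths expand to. It's only an illustration, not walmgr's own configuration code, and the expanded_settings() helper is hypothetical.</p>

```python
import configparser

# Excerpt of the master.ini shown above (only the keys this sketch needs).
MASTER_INI = """
[walmgr]
job_name = wal-master
logfile = /var/log/postgresql/%(job_name)s.log
pidfile = /var/run/postgresql/%(job_name)s.pid
walmgr_data = /var/lib/postgresql/walshipping/9.1/main
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
full_backup = %(walmgr_data)s/data.master
"""


def expanded_settings(ini_text, section="walmgr"):
    """Return one section as a dict, with %(name)s references expanded."""
    # ConfigParser uses BasicInterpolation by default: %(key)s is replaced
    # by the value of another key from the same section (or DEFAULT).
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {key: parser.get(section, key) for key in parser.options(section)}


settings = expanded_settings(MASTER_INI)
for key in ("logfile", "completed_wals", "full_backup"):
    print(key, "=", settings[key])
```

<p>Keeping every archive path relative to walmgr_data means that moving the archive only takes a one-line change, in both the master and the standby configuration.</p>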

<p>And the /etc/walshipping/9.1/main/standby.ini companion:</p> <pre class="src"> [<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span> = wal-standby
<span style="color: #eeeeec;">logfile</span> = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">use_skylog</span> = 0

<span style="color: #eeeeec;">slave_data</span> = /var/lib/postgresql/9.1/standby
<span style="color: #eeeeec;">slave_bin</span> = /usr/lib/postgresql/9.1/bin
<span style="color: #eeeeec;">slave_stop_cmd</span> = pg_ctlcluster 9.1 standby stop
<span style="color: #eeeeec;">slave_start_cmd</span> = pg_ctlcluster 9.1 standby start
<span style="color: #eeeeec;">slave_config_dir</span> = /etc/postgresql/9.1/standby/

<span style="color: #eeeeec;">walmgr_data</span> = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span> = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span> = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span> = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span> = %(walmgr_data)s/config.backup

<span style="color: #eeeeec;">backup_datadir</span> = no
<span style="color: #eeeeec;">keep_backups</span> = 0
<span style="color: #888a85;"># </span><span style="color: #888a85;">archive_command =</span>

<span style="color: #888a85;"># </span><span style="color: #888a85;">primary database connect string for hot standby -- enabling</span>
<span style="color: #888a85;"># </span><span style="color: #888a85;">this will cause the slave to be started in hot standby mode.</span>
<span style="color: #eeeeec;">primary_conninfo</span> = host=127.0.0.1 port=5434 user=postgres
</pre>
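<p>The standby's primary_conninfo has to point at the very cluster that master.ini reaches through master_db (port 5434 here). Both are libpq keyword=value connection strings. Here's a simplified sketch, with a hypothetical parse_conninfo() helper that deliberately ignores libpq's quoting rules, checking that the two files agree on the port:</p>

```python
# Values copied from master.ini (master_db) and standby.ini (primary_conninfo).
MASTER_DB = "port=5434 dbname=template1"
PRIMARY_CONNINFO = "host=127.0.0.1 port=5434 user=postgres"


def parse_conninfo(conninfo):
    """Split a simple libpq-style keyword=value string into a dict.

    Deliberately simplified: values must not be quoted or contain spaces,
    and there must be no spaces around '='.
    """
    params = {}
    for item in conninfo.split():
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("malformed conninfo item: %r" % item)
        params[key] = value
    return params


master = parse_conninfo(MASTER_DB)
standby = parse_conninfo(PRIMARY_CONNINFO)

# The standby should stream from the same cluster walmgr backs up.
print("master port:", master["port"], "standby streams from:", standby["port"])
print("ports match:", master["port"] == standby["port"])
```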

<p>And let's get started:</p> <pre class="src"> :~$ cp standby.ini /etc/walshipping/9.1/main/

:~$ walmgr3 -v master.ini setup
2011-09-21 16:57:05,685 30450 INFO Configuring WAL archiving
2011-09-21 16:57:05,687 30450 DEBUG found 'archive_mode' in config -- enabling it
2011-09-21 16:57:05,687 30450 DEBUG found 'wal_level' in config -- setting to 'archive'
2011-09-21 16:57:05,688 30450 DEBUG modifying configuration: {'archive_mode': 'on', 'wal_level': 'archive', 'archive_command': '/usr/bin/walmgr3 /var/lib/postgresql/master.ini xarchive %p %f'}
2011-09-21 16:57:05,688 30450 DEBUG found parameter archive_mode with value ''off''
2011-09-21 16:57:05,690 30450 DEBUG found parameter wal_level with value ''minimal''
2011-09-21 16:57:05,690 30450 DEBUG found parameter archive_command with value ''''
2011-09-21 16:57:05,691 30450 INFO Restarting postmaster
2011-09-21 16:57:05,691 30450 DEBUG Execute cmd: 'pg_ctlcluster 9.1 main restart'
2011-09-21 16:57:09,404 30450 DEBUG Execute cmd: 'ssh' '-Tn' '-o' 'Batchmode=yes' '-o' 'StrictHostKeyChecking=no' '127.0.0.1' '/usr/bin/walmgr3' '/etc/walshipping/9.1/main/standby.ini' 'setup'
2011-09-21 16:57:09,712 30450 INFO Done

postgres@squeeze64:~$ walmgr3 master.ini backup
2011-09-21 17:00:17,259 30702 INFO Backup lock obtained.
2011-09-21 17:00:17,277 30692 INFO Execute SQL: select pg_start_backup('FullBackup'); [port=5434 dbname=template1]
2011-09-21 17:00:17,791 30712 INFO Removing expired backup directory: /var/lib/postgresql/walshipping/9.1/main/data.master
2011-09-21 17:00:18,200 30692 INFO Checking tablespaces
2011-09-21 17:00:18,202 30692 INFO pg_log does not exist, skipping
2011-09-21 17:00:18,259 30692 INFO Backup conf files from /etc/postgresql/9.1/main
2011-09-21 17:00:18,590 30731 INFO First useful WAL file is: 000000010000000200000092
2011-09-21 17:00:19,901 30759 INFO Backup lock released.
2011-09-21 17:00:19,919 30692 INFO Full backup successful

:~$ walmgr3 /etc/walshipping/9.1/main/standby.ini listbackups

List of backups:

Backup set      Timestamp                Label       First WAL
--------------- ------------------------ ----------- ------------------------
data.master     2011-09-21 17:00:17 CEST FullBackup  000000010000000200000092
</pre>
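<p>The First WAL column holds a WAL segment file name: 24 hexadecimal digits, 8 for the timeline ID followed by 16 that locate the segment in the WAL stream (with the PostgreSQL 9.1 naming used here, an 8-digit log ID and an 8-digit segment number). A small sketch, using a hypothetical parse_wal_name() helper, that splits the name from the listing above:</p>

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into its three fields.

    Layout assumed here (PostgreSQL 9.1 naming): 8 hex digits of timeline
    ID, 8 of log ID, then 8 of segment number within that log.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


# "First WAL" of the data.master backup listed above.
tli, log_id, seg = parse_wal_name("000000010000000200000092")
print("timeline", tli, "log", log_id, "segment", seg)  # timeline 1 log 2 segment 146
```

<p>Every archived file from that name onwards is needed to bring the base backup to a consistent state, which is why walmgr reports it.</p>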

<p>Upcoming articles will show how to manage that archive and how to go from there to a <em>Hot Standby</em> fed by either <em>Streaming Replication</em> or <em>Archives</em>.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 21 Sep 2011 17:21:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</guid> </item> <item> <title>el-get-3.1</title> <link>http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</link> <description><![CDATA[<p>The <a href="https://github.com/dimitri/el-get">el-get</a> project releases its new stable version, 3.1. This new release fixes bugs, adds a host of new recipes (we have 420 of them and counting) and some nice new features too. You really want to upgrade.</p>

<h2>New features</h2> <p class="first">Among the new features you will find dependency management and M-x el-get-list-packages, which you should try as soon as possible. Of course, don't miss M-x el-get-self-update, which makes updating el-get itself easier.</p> <center> <p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p> </center> <p>This shows the result of M-x el-get-list-packages. The packages without a description are the ones from <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el&#124;tar)(%5C.gz)%3F%24">emacswiki</a>, which doesn't provide a listing of the filenames <em>and</em> the first line of each file (which usually follows the format ;;; filename.el --- description here). As we don't want to mirror the website just to be able to provide descriptions, we simply don't have them for now.</p> <p>Another nice new feature, contributed by a user who wanted to learn <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/index.html">elisp</a> on their own, is the el-get-user-package-directory support. Just place init-my-package.el files in there, and when <em>el-get</em> wants to init the my-package package, it will load that file for you. That helps manage your setup, and I'm already using it in my own ~/.emacs.d/ repository.</p> <h2>Upgrading</h2> <p class="first">Upgrading needs some care, though, because you have to edit your packaging setup. The el-get-sources variable used to be both where to set up extra recipes and the list of packages you want installed, and several people rightfully insisted that I should change that. I was slow to be convinced, but there it is: they were right.</p> <p>So now, <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> works from the current status of packages and will init all the packages you have installed. That means you just M-x el-get-install a package and don't think about it anymore. 
If you need to override this behavior, you can still do so by specifying the whole list of packages you want initialized (and installed if necessary) in the (el-get 'sync ...) call.</p> <p>That latter setup is useful if you want to share your el-get selection across several machines.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 16 Sep 2011 14:13:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</guid> </item> <item> <title>PostgreSQL 9.1</title> <link>http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</link> <description><![CDATA[<p><a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> is out! You don't have this new version in production yet? You haven't yet evaluated why you should consider upgrading to it? There are plenty of good reasons to move to this version, and few pitfalls.</p>

<p>We're starting to see articles relaying the news in the French press, and I'm pleased to mention the one from <a href="http://www.programmez.com/actualites.php?titre_actu=Sortie-de-PostgreSQL-91-&#33;&amp;id_actu=10190">programmez.com</a>, which announces "an unrivalled extension system". As the developer of <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">Extensions</a> in PostgreSQL, I can only agree with them, and feel flattered too :)</p> <p>Happy testing, everyone, and happy upgrading to the lucky ones!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 14 Sep 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</guid> </item> <item> <title>Avoiding SQL injections</title> <link>http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</link> <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in PostgreSQL, and mentioned that using those techniques to protect the data inserted into SQL queries was not a good idea, since PostgreSQL offers a much better suited feature.</p>

<p>We are dealing here with a security problem very well described in the humorous <a href="http://xkcd.com/327/">Little Bobby Tables</a> comic, which I recommend you read. The idea is simple, but implementing countermeasures is riddled with subtle pitfalls, unless you use the solution described below.</p> <center> <p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p> </center> <p>When you send an SQL query to PostgreSQL, it contains a jumble of SQL keywords and user data. In the query SELECT colname FROM table WHERE pk = 1234; the 1234 element is a piece of data provided to PostgreSQL. With other data types, we talk about a <em>literal</em>, which may or may not be <em>decorated</em>. An example?</p> <pre class="src"> # SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
          date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
      ?column?       | pg_typeof |    date    | pg_typeof
<span style="color: #888a85;">---------------------+-----------+------------+-----------
</span> undecorated literal | unknown   | 2011-09-07 | date
(1 row)
</pre>
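<p>Here's a minimal sketch of both the problem and the remedy described in this post. To run without a PostgreSQL server it uses Python's built-in sqlite3 module instead, which is a swapped-in engine, not what the post discusses; the students table and the input string are illustrative only. The ? placeholder plays the role that PQexecParams plays with libpq: the SQL text and the values reach the database engine separately.</p>

```python
import sqlite3

# Hypothetical form input, crafted to close the quoted string early and
# smuggle in a second statement (the xkcd "Bobby Tables" trick).
evil = "Robert'); DROP TABLE students;--"

# Unsafe: the value is spliced into the SQL text, so the engine cannot tell
# where the data ends and the SQL starts again.
unsafe_sql = "INSERT INTO students (name) VALUES ('%s');" % evil

victim = sqlite3.connect(":memory:")
victim.execute("CREATE TABLE students (name TEXT)")
victim.executescript(unsafe_sql)  # runs the INSERT, then the injected DROP
tables = victim.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print("tables left after injection:", tables)

# Safe: the SQL text and the value are handed over separately; the value is
# bound to the ? placeholder and is never parsed as SQL.
safe = sqlite3.connect(":memory:")
safe.execute("CREATE TABLE students (name TEXT)")
safe.execute("INSERT INTO students (name) VALUES (?)", (evil,))
rows = safe.execute("SELECT name FROM students").fetchall()
print("stored name:", rows[0][0])
```

<p>With the bound parameter, the whole hostile string is simply stored as a name, quotes and all.</p>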

<p>Beyond the data type aspect (an undecorated literal is of type <em>unknown</em> until some operation forces its type, which is what enables polymorphism in PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart from the parameters it contains. It of course knows how to do that: you just enclose values in single quotes, or use the so-called <a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a> notation. But without precautions, the user can close the quoted string from the form's input field…</p> <p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is PostgreSQL's standard client library; it provides the connection <em>APIs</em>, including a function named <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>. This function exposes a mechanism available in PostgreSQL's communication protocol itself: the SQL and the data it contains can be sent in two separate parts of the message rather than mixed together. The server then no longer has to guess where data starts and ends in the query; it simply looks into the separate array of values when it needs them.</p> <p>No more SQL injections!</p> <p>Note: this function is exposed in most client drivers, including PHP, whose popularity and exposure prompt me to give a more precise reference: use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>. Its benefit is not merely syntactic; it goes all the way down to how data is exchanged between the client (your PHP code) and the server (PostgreSQL).</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</guid> </item> <item> <title>Avoiding SQL injections</title> <link>http://tapoueh.org/blog/2011/09/07-requete-parametree.html</link> <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in PostgreSQL, and mentioned that using those techniques to protect the data inserted into SQL queries was not a good idea, since PostgreSQL offers a much better suited feature.</p>

<p>We are dealing here with a security problem very well described in the humorous <a href="http://xkcd.com/327/">Little Bobby Tables</a> comic, which I recommend you read. The idea is simple, but implementing countermeasures is riddled with subtle pitfalls, unless you use the solution described below.</p> <center> <p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p> </center> <p>When you send an SQL query to PostgreSQL, it contains a jumble of SQL keywords and user data. In the query SELECT colname FROM table WHERE pk = 1234; the 1234 element is a piece of data provided to PostgreSQL. With other data types, we talk about a <em>literal</em>, which may or may not be <em>decorated</em>. An example?</p> <pre class="src"> # SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
          date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
      ?column?       | pg_typeof |    date    | pg_typeof
<span style="color: #888a85;">---------------------+-----------+------------+-----------
</span> undecorated literal | unknown   | 2011-09-07 | date
(1 row)
</pre>

<p>Beyond the data type aspect (an undecorated literal is of type <em>unknown</em> until some operation forces its type, which is what enables polymorphism in PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart from the parameters it contains. It of course knows how to do that: you just enclose values in single quotes, or use the so-called <a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a> notation. But without precautions, the user can close the quoted string from the form's input field…</p> <p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is PostgreSQL's standard client library; it provides the connection <em>APIs</em>, including a function named <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>. This function exposes a mechanism available in PostgreSQL's communication protocol itself: the SQL and the data it contains can be sent in two separate parts of the message rather than mixed together. The server then no longer has to guess where data starts and ends in the query; it simply looks into the separate array of values when it needs them.</p> <p>No more SQL injections!</p> <p>Note: this function is exposed in most client drivers, including PHP, whose popularity and exposure prompt me to give a more precise reference: use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>. Its benefit is not merely syntactic; it goes all the way down to how data is exchanged between the client (your PHP code) and the server (PostgreSQL).</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-requete-parametree.html</guid> </item> <item> <title>PostgreSQL and debian</title> <link>http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</link> <description><![CDATA[<p>After talking about it for a very long time, work has finally begun! I'm talking about the <a href="https://github.com/dimitri/apt.postgresql.org">apt.postgresql.org</a> build system that will allow us, in the long run, to provide debian binary packages for <a href="http://www.postgresql.org/">PostgreSQL</a> and its extensions, compiled for a bunch of debian and ubuntu versions.</p>

<p>We're currently planning to support the i386 and amd64 architectures for lenny, squeeze, wheezy and sid, and also for maverick and natty, maybe oneiric too while at it.</p> <p>It's still the very beginning of the effort, and it was triggered by the decision to move sid to 9.1. While that's a good decision in itself, I still hate having to pick only one PostgreSQL version per debian stable release when we have all the technical support we need to support every stable release that <em>upstream</em> is willing to maintain. In case you've been living under a rock, or couldn't care less about debian choices: the problem for debian here is ensuring security (and bug fix) updates for PostgreSQL. In the social contract, they promise to handle that job just fine, and they don't want to have to do it without support from PostgreSQL if a <em>debian stable</em> release ships an end-of-life PostgreSQL version.</p> <p>That opens the door for the PostgreSQL community to handle the packaging of its solutions as a service to its debian users. We intend to start with support for 8.4, 9.0 and 9.1, and maybe 8.3 too, as <a href="http://qa.debian.org/developer.php?login=myon">Christoph Berg</a> is making good progress on this front. See, it's teamwork here!</p> <p>We still have more work to do, and setting up the build environment so that we can provide packages for so many targets will indeed be interesting. Getting there, one step at a time.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 05 Sep 2011 17:14:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</guid> </item> <item> <title>pg_restore -L &amp; pg_staging</title> <link>http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</link> <description><![CDATA[<p>On the <a href="http://archives.postgresql.org/pgsql-hackers">PostgreSQL Hackers</a> mailing list, <a href="http://people.planetpostgresql.org/andrew/">Andrew Dunstan</a> just proposed some new options for pg_dump and pg_restore to ease our lives. One of the answers mentioned some scripts available to exploit the <a href="http://www.postgresql.org/docs/9.0/static/app-pgrestore.html">pg_restore</a> listing, which you work with through the -l and -L options, or their long versions --list and --use-list. The <a href="../../../pgsql/pgstaging.html">pg_staging</a> tool lets you easily exploit those lists too.</p>

<p>The pg_restore list is just a listing, one object per line, of all the objects contained in a <em>custom</em> dump, that is one made with pg_dump -Fc. You can then tweak this listing to comment out some objects (prepending a ; to the lines where you find them), and give your hacked file back to pg_restore --use-list so that it will skip them.</p> <p>What's pretty useful here, among other things, is that a table actually has more than one line in the listing: one for the TABLE definition, another one for the TABLE DATA. That's how pg_staging is able to provide you with options for only restoring some <em>schemas</em>, some <em>schemas_nodata</em> and even some <em>tablename_nodata_regexp</em>, to use the configuration option names directly.</p> <p>How do you do a very simple exclusion of some tables' data when restoring a dump, you ask? Here we go. Let's first prepare an environment, where I have only a <a href="http://www.postgresql.org/">PostgreSQL</a> server running.</p> <pre class="src"> $ git clone git://github.com/dimitri/pg_staging.git
$ git clone git://github.com/dimitri/pgloader.git
$ for s in /.sql; do psql -f $s; done
$ pg_dump -Fc &gt; pgloader.dump
</pre> <p>Now I have a dump with some nearly random SQL objects in it; let's filter out the data of the tables named <em>reformat</em> and <em>parallel</em>. We will take the sample setup from the pg_staging project. Going the quick route, we will not even change the default sample database name that's used, which is postgres. After all, the catalog command of pg_staging that we're using here is a <em>developer</em> command; you're supposed to be using pg_staging for a lot more services than just this one.</p> <pre class="src"> $ cp pg_staging/pg_staging.ini .
$ (echo <span style="color: #bc8f8f;">"schemas = public"</span>;
   echo <span style="color: #bc8f8f;">"tablename_nodata_regexp = parallel,reformat"</span>) \
     &gt;&gt; pg_staging.ini
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
     | python pg_staging/pg_staging.py -c pg_staging.ini

; Archive created at Mon Aug 29 17:17:49 2011
;
; [EDITED OUTPUT]
;
; Selected TOC Entries:
;
3; 2615 2200 SCHEMA - public postgres
1864; 0 0 COMMENT - SCHEMA public postgres
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1538; 1259 174951 TABLE public reformat dimitri
;1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
;1855; 0 174951 TABLE DATA public reformat dimitri
1834; 2606 174942 CONSTRAINT public parallel_pkey dimitri
1836; 2606 174950 CONSTRAINT public partial_pkey dimitri
1838; 2606 174955 CONSTRAINT public reformat_pkey dimitri
</pre>
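<p>pg_staging does this filtering for you. As a rough illustration of the idea only, and not pg_staging's actual code, here's a hypothetical filter_toc() function that comments out the TABLE DATA entries of chosen tables in a pg_restore -l listing. It takes plain table names, whereas pg_staging's tablename_nodata_regexp option takes regular expressions, and it assumes the entry layout shown above.</p>

```python
import re

# TABLE DATA entries from a pg_restore -l listing, e.g.
# "1853; 0 174935 TABLE DATA public parallel dimitri"
TABLE_DATA = re.compile(r"^\d+; \d+ \d+ TABLE DATA (\S+) (\S+) ")


def filter_toc(lines, nodata_tables):
    """Comment out (prefix with ';') the TABLE DATA entries of nodata_tables.

    pg_restore -L skips every line starting with ';', so the table
    definitions are still restored, but not their contents.
    """
    result = []
    for line in lines:
        match = TABLE_DATA.match(line)
        if match and match.group(2) in nodata_tables:
            line = ";" + line
        result.append(line)
    return result


LISTING = """\
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
1855; 0 174951 TABLE DATA public reformat dimitri
""".splitlines()

for line in filter_toc(LISTING, {"parallel", "reformat"}):
    print(line)
```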

<p>We can see that the TABLE DATA entries of those two tables are indeed commented out. Here is how to actually run the pg_restore:</p> <pre class="src">
$ createdb foo
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
   | python pg_staging/pg_staging.py -c pg_staging.ini &gt; short.list
$ pg_restore -L short.list -d foo pgloader.dump
</pre>
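<p>To make the mechanism concrete, here is a minimal Python sketch of that filtering step. It is not pg_staging's actual code: the filter_listing name is hypothetical, and it assumes each regexp must match the whole table name. Like tablename_nodata_regexp, it only comments out TABLE DATA entries and keeps the TABLE definitions.</p>

```python
import re


def filter_listing(lines, nodata_regexps):
    """Return a pg_restore -l listing with some TABLE DATA entries commented out.

    Sketch only: a TABLE DATA line looks like
    '1853; 0 174935 TABLE DATA public parallel dimitri', and prefixing it
    with ';' makes pg_restore --use-list skip that entry.
    """
    # Assumption: any of the regexps must match the full table name.
    pattern = re.compile("(?:%s)$" % "|".join(nodata_regexps))
    filtered = []
    for line in lines:
        fields = line.split()
        is_table_data = (
            not line.startswith(";")
            and len(fields) > 6
            and fields[3:5] == ["TABLE", "DATA"]
        )
        # fields[5] is the schema, fields[6] the table name
        if is_table_data and pattern.match(fields[6]):
            line = ";" + line  # comment the entry out
        filtered.append(line)
    return filtered


listing = [
    "1536; 1259 174935 TABLE public parallel dimitri",
    "1853; 0 174935 TABLE DATA public parallel dimitri",
    "1854; 0 174943 TABLE DATA public partial dimitri",
    "1855; 0 174951 TABLE DATA public reformat dimitri",
]
for entry in filter_listing(listing, ["parallel", "reformat"]):
    print(entry)
```

<p>On these sample lines it comments out the same two TABLE DATA entries as the pg_staging output shown earlier, while partial and all TABLE definitions are left alone.</p>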

<p>The little bonus of using pg_staging is that when filtering out a <em>schema</em>, it tracks all the tables and triggers from that schema, and also the functions used in the trigger definitions. Which is not as easy as it sounds, believe me!</p> <p>The practical use case is filtering out PGQ and Londiste: the PGQ triggers are then automatically skipped by pg_staging, rather than polluting the pg_restore logs with errors because the CREATE TRIGGER command could not find the function implementing them.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 29 Aug 2011 18:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</guid> </item> <item> <title>Skytools, version 3</title> <link>http://tapoueh.org/blog/2011/08/26-skytools3.html</link> <description><![CDATA[<p>You can already find <a href="http://packages.debian.org/source/experimental/skytools3">skytools3</a> in debian experimental, in <em>release candidate</em> status. What's missing is the documentation, so here's an idea: I'm going to write a blog post series about the new <a href="https://github.com/markokr/skytools">skytools</a> features, how to use them, what they are good for, etc. This first article of the series simply lists those new features.</p>

<p>Here are the slides from the <a href="http://www.char11.org/">CHAR(11)</a> talk I gave last month on that very subject:</p> <center> <p><a class="image-link" href="../../../images/confs/CHAR_2011_Skytools3.pdf"> <img src="../../../images/confs/CHAR_2011_Skytools3.png"></a></p> </center> <p>The new version comes with a lot of new features. PGQ is now able to duplicate the queue events from one node to the next, so that it can manage <em>switching over</em>. To do that we now have three types of nodes: <em>root</em>, <em>branch</em> and <em>leaf</em>. PGQ also supports <em>cooperative consumers</em>, meaning that you can share the processing load among many <em>consumers</em>, or workers.</p> <p>Londiste now benefits from the <em>switch over</em> feature, and is packed with new little features like add &lt;table&gt; --create, the new --trigger-flags argument, and the new --handler thing (to do e.g. partial table replication). Let's not forget the much awaited execute &lt;script&gt; command that allows you to include DDL commands in the replication stream, nor the <em>parallel</em> COPY support that will speed up your initial setup.</p> <p>walmgr in the new version behaves correctly when used with <a href="http://www.postgresql.org">PostgreSQL</a> 9.0. That means that as soon as no more <em>WAL</em> files are available in the archive, it returns an error code to the <em>archiver</em> so that the server switches to <em>streaming</em> live from the primary_conninfo, then back to replaying files from the archive should the connection fail, etc. All in all, it just works.</p> <p>Details to follow here, stay tuned!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 26 Aug 2011 21:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/26-skytools3.html</guid> </item> <item> <title>pgfincore in debian</title> <link>http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</link> <description><![CDATA[<p>As of recently, <a href="http://villemain.org/projects/pgfincore">pgfincore</a> is in debian, as you can see on its <a href="http://packages.debian.org/sid/postgresql-9.0-pgfincore">postgresql-9.0-pgfincore</a> page. It entered the <a href="http://www.debian.org/">debian</a> archives because it reached the 1.0 release!</p>

<p>Rather than talking about what <em>pgfincore</em> is all about (<em>A set of functions to manage pages in memory from PostgreSQL</em>), I will talk about its packaging and support as a <em>debian package</em>. Here's the first example of modern multi-version packaging I have to offer: the <a href="https://github.com/dimitri/pgfincore/tree/master/debian">pgfincore packaging</a> supports building for 8.4, 9.0 and 9.1 out of the box, even if the only binary you'll find in <em>debian</em> sid is the 9.0 one, as you can check on the <a href="http://packages.debian.org/source/sid/pgfincore">pgfincore debian source package</a> page.</p> <p>Also, this is the first package I've done properly using the newer version of <a href="http://kitenet.net/~joey/code/debhelper/">debhelper</a>, which makes the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/rules">debian/rules</a> file simpler than ever. Let's have a look at it:</p> <pre class="src">
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">CURDIR</span>)
<span style="color: #b8860b;">TARGET</span> = $(<span style="color: #b8860b;">CURDIR</span>)/debian/pgfincore-%v
<span style="color: #b8860b;">PKGVERS</span> = $(<span style="color: #b8860b;">shell</span> dpkg-parsechangelog | awk -F <span style="color: #bc8f8f;">'[:-]'</span> <span style="color: #bc8f8f;">'/^Version:/ { print substr($$2, 2) }'</span>)
<span style="color: #b8860b;">EXCLUDE</span> = --exclude-vcs --exclude=debian

<span style="color: #7f007f;">include</span> <span style="color: #b8860b;">/usr/share/postgresql-common/pgxs_debian_control.mk</span>

<span style="color: #0000ff;">override_dh_auto_clean</span>: debian/control
	pg_buildext clean $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>
	dh_clean

<span style="color: #0000ff;">override_dh_auto_build</span>:
	<span style="color: #b22222;"># build all supported versions</span>
	pg_buildext build $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>

<span style="color: #0000ff;">override_dh_auto_install</span>:
	<span style="color: #b22222;"># then install each of them</span>
	for v in <span style="color: #bc8f8f;">`pg_buildext supported-versions $(</span><span style="color: #b8860b;">SRCDIR</span><span style="color: #bc8f8f;">)`</span>; do \
		dh_install -ppostgresql-$$v-pgfincore ;\
	done

<span style="color: #0000ff;">orig</span>: clean
	cd .. &amp;&amp; tar czf pgfincore_$(<span style="color: #b8860b;">PKGVERS</span>).orig.tar.gz $(<span style="color: #b8860b;">EXCLUDE</span>) pgfincore

<span style="color: #0000ff;">%</span>:
	dh <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>
</pre>
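<p>What the override_dh_auto_build rule delegates to pg_buildext boils down to one VPATH build per PostgreSQL version. The Python sketch below only builds the make command lines, as a rough approximation rather than pg_buildext's real implementation: the vpath_build_commands name is hypothetical, and it assumes Debian's /usr/lib/postgresql/VERSION/bin/pg_config layout and a build directory that already exists.</p>

```python
import shlex


def vpath_build_commands(srcdir, target, versions):
    """Return one make command line (as a list) per PostgreSQL version.

    Rough approximation of a VPATH build loop, not pg_buildext's code:
    %v in target is replaced by the version, make runs in that build
    directory (assumed to exist already), reads the Makefile from srcdir,
    finds the sources through VPATH and uses that version's pg_config.
    """
    commands = []
    for version in versions:
        builddir = target.replace("%v", version)
        pg_config = "/usr/lib/postgresql/%s/bin/pg_config" % version
        commands.append([
            "make", "-C", builddir,
            "-f", srcdir + "/Makefile",
            "VPATH=" + srcdir,
            "PG_CONFIG=" + pg_config,
        ])
    return commands


for cmd in vpath_build_commands("/build/pgfincore",
                                "/build/pgfincore/debian/pgfincore-%v",
                                ["8.4", "9.0", "9.1"]):
    print(shlex.join(cmd))
```

<p>Absolute paths matter here: VPATH is interpreted from inside each build directory, so a relative srcdir would point at the wrong place.</p>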

<p>The debian/rules file is known to be the cornerstone of your debian packaging, and usually is its most complex part. It's a Makefile at heart, and we can see that thanks to the debhelper magic it's not that complex to maintain anymore.</p> <p>This file relies on a bunch of helper commands, each of which comes with its own man page and does a small part of the work. The overall idea behind debhelper is that it covers 90% of the cases, and does not aim for more. You have to <em>override</em> the parts where its defaults are wrong for you.</p> <p>Here for example the build system has to produce files for all three supported versions of <a href="http://www.postgresql.org/">PostgreSQL</a>, which means invoking the same build system three times with some changes in the <em>environment</em> (mainly setting the PG_CONFIG variable correctly). But even for that we have a <em>debian</em> facility, called pg_buildext, that comes in the <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a> package. As long as your extension build system is VPATH friendly, it's all automated.</p> <p>Please read that last sentence again. VPATH is what allows Make to find your source tree somewhere else in the system rather than in the current working directory. That allows you to cleanly build the same sources in different build locations, which is exactly what we need here, and it is cleanly supported by <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PGXS</a>, the <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PostgreSQL Extension Building Infrastructure</a>.</p> <p>This means that the main Makefile of <em>pgfincore</em> had to be simplified, and the code layout too. Some advanced Make features such as $(wildcard ...) and the like will not work here. See what we ended up with:</p> <pre class="src">
ifndef VPATH
<span style="color: #b8860b;">SRCDIR</span> = .
else
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">VPATH</span>)
endif

<span style="color: #b8860b;">EXTENSION</span> = pgfincore
<span style="color: #b8860b;">EXTVERSION</span> = $(<span style="color: #b8860b;">shell</span> grep default_version $(<span style="color: #b8860b;">SRCDIR</span>)/$(<span style="color: #b8860b;">EXTENSION</span>).control \
               | sed -e <span style="color: #bc8f8f;">"s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/"</span>)

<span style="color: #b8860b;">MODULES</span> = $(<span style="color: #b8860b;">EXTENSION</span>)
<span style="color: #b8860b;">DATA</span> = sql/pgfincore.sql sql/uninstall_pgfincore.sql
<span style="color: #b8860b;">DOCS</span> = doc/README.$(<span style="color: #b8860b;">EXTENSION</span>).rst

<span style="color: #b8860b;">PG_CONFIG</span> = pg_config
<span style="color: #b8860b;">PG91</span> = $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --version | grep -qE <span style="color: #bc8f8f;">"8\.|9\.0"</span> &amp;&amp; echo no || echo yes)

ifeq ($(<span style="color: #b8860b;">PG91</span>),yes)
<span style="color: #0000ff;">all</span>: pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql

<span style="color: #0000ff;">pgfincore--$(</span><span style="color: #0000ff;">EXTVERSION</span><span style="color: #0000ff;">).sql</span>: sql/pgfincore.sql
	cp $<span style="color: #5f9ea0;">&lt;</span> <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>

<span style="color: #b8860b;">DATA</span> = pgfincore--unpackaged--$(<span style="color: #b8860b;">EXTVERSION</span>).sql pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
<span style="color: #b8860b;">EXTRA_CLEAN</span> = sql/$(<span style="color: #b8860b;">EXTENSION</span>)--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
endif

<span style="color: #b8860b;">PGXS</span> := $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --pgxs)
<span style="color: #7f007f;">include</span> $(<span style="color: #b8860b;">PGXS</span>)

<span style="color: #0000ff;">deb</span>:
	dh clean
	make -f debian/rules orig
	debuild -us -uc -sa
</pre>
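<p>The EXTVERSION and PG91 lines above hold the remaining bits of magic. Here is a Python sketch of the same two decisions, assuming a control file carrying a default_version = '1.0' line; the sample text and both function names are made up for illustration. The regexp is slightly stricter than the grep and sed pipeline, since it only looks at lines starting with default_version.</p>

```python
import re

# Hypothetical sample control file, for illustration only.
CONTROL = """\
comment = 'sample control file'
default_version = '1.0'
relocatable = true
"""


def extension_version(control_text):
    """Return the quoted default_version value, like the EXTVERSION pipeline."""
    match = re.search(r"^default_version\s*=\s*'([^']*)'", control_text, re.M)
    if match is None:
        raise ValueError("no default_version found")
    return match.group(1)


def is_91_or_later(pg_config_version):
    """Mirror the PG91 test: yes unless the version string mentions 8. or 9.0."""
    return re.search(r"8\.|9\.0", pg_config_version) is None


version = extension_version(CONTROL)
for server in ("PostgreSQL 8.4.8", "PostgreSQL 9.0.4", "PostgreSQL 9.1.0"):
    if is_91_or_later(server):
        # Only 9.1 gets the pgfincore--VERSION.sql extension script.
        print(server + ": pgfincore--%s.sql" % version)
    else:
        print(server + ": sql/pgfincore.sql")
```
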

<p>No more Make magic to find source files. Frankly though, when your sources are one C file and two SQL files, you don't need that much magic anyway. You just want to believe that a single generic Makefile will happily build any project you throw at it, only requiring minor adjustments. Well, the reality is that you might need a few more adjustments if you want to benefit from VPATH building, and have the binaries for 8.4, 9.0 and 9.1 built seamlessly in a simple loop, like we have here for <em>pgfincore</em>.</p> <p>Now the Makefile still contains a little bit of magic, in order to parse the extension version number from its <em>control file</em> and produce a <em>script</em> named accordingly. You'll also notice a difference between the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.1-pgfincore.install">postgresql-9.1-pgfincore.install</a> file and the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.0-pgfincore.install">postgresql-9.0-pgfincore.install</a> one: we're just not shipping the same files. Here is the 9.0 one:</p> <pre class="src">
debian/pgfincore-9.0/pgfincore.so usr/lib/postgresql/9.0/lib
sql/pgfincore.sql usr/share/postgresql/9.0/contrib
sql/uninstall_pgfincore.sql usr/share/postgresql/9.0/contrib
</pre> <p>And here is the 9.1 one:</p> <pre class="src">
debian/pgfincore-9.1/pgfincore.so usr/lib/postgresql/9.1/lib
debian/pgfincore-9.1/pgfincore*.sql usr/share/postgresql/9.1/extension
sql/pgfincore--unpackaged--1.0.sql usr/share/postgresql/9.1/extension
</pre> <p>So, now that we have uncovered all the relevant magic, packaging and building your next extension so that it supports as many PostgreSQL major releases as you need will be that easy.</p> <p>For reference, you might also need to tweak /usr/share/postgresql-common/supported-versions so that it allows you to build for all the versions you claim to support in the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/pgversions">debian/pgversions</a> file.</p> <pre class="src">
$ sudo dpkg-divert \
    --divert /usr/share/postgresql-common/supported-versions.distrib \
    --rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#!/bin/bash

dpkg -l postgresql-server-dev-* \
    | awk -F '[ -]' '/^ii/ &amp;&amp; ! /server-dev-all/ {print $6}'
</pre>
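<p>The awk filter keeps installed (ii) server-dev packages, skips the server-dev-all metapackage and prints the version part of the package name. A Python rendition of the same parsing follows; the sample dpkg -l lines are invented, and splitting on whitespace rather than on single spaces and dashes is a deliberate change that makes it independent of column padding.</p>

```python
# Invented sample of `dpkg -l postgresql-server-dev-*` output lines.
SAMPLE = """\
ii  postgresql-server-dev-8.4  8.4.8-1  development files for PostgreSQL 8.4
ii  postgresql-server-dev-9.0  9.0.4-1  development files for PostgreSQL 9.0
ii  postgresql-server-dev-all  118      extension build tool for multiple versions
un  postgresql-server-dev-9.1  none     (no description available)
"""

PREFIX = "postgresql-server-dev-"


def installed_server_dev_versions(dpkg_output):
    """Versions of installed postgresql-server-dev-VERSION packages."""
    versions = []
    for line in dpkg_output.splitlines():
        # Same filters as the awk script: installed lines only, no -all package.
        if not line.startswith("ii") or "server-dev-all" in line:
            continue
        package = line.split()[1]
        if package.startswith(PREFIX):
            versions.append(package[len(PREFIX):])
    return versions


# One version per line, as the supported-versions script is expected to print.
print("\n".join(installed_server_dev_versions(SAMPLE)))
```
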

<p>All of this will come in pretty handy when we finally sit down and work on a way to provide binary packages for PostgreSQL and its extensions, for all the supported versions of those at that. This very project is not dead, it's just sleeping some more.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 19 Aug 2011 23:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</guid> </item> <item> <title>String escaping</title> <link>http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</link> <description><![CDATA[<p>Among the new features of the <a href="http://www.postgresql.org/about/news.1331">upcoming release</a> of <a href="http://www.postgresql.org/">PostgreSQL</a>, the famous 9.1, one worth pointing out is the change in the default value of the standard_conforming_strings setting, which becomes <em>true</em>.</p>

<p>Indeed, using backslash escapes is not compliant with the SQL standard. The standard_conforming_strings parameter controls how PostgreSQL behaves when it reads a string literal in an SQL query.</p> <p>Let's see some examples:</p> <pre class="src">
dimitri=# set standard_conforming_strings to true;
SET
dimitri=# select 'hop''';
 ?column?
----------
 hop'
(1 row)

dimitri=# select 'hop\'';
dimitri'# ';
 ?column?
----------
 hop\';

(1 row)

dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 row)

dimitri=# set standard_conforming_strings to false;
SET
dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 row)

dimitri=# select 'hop\'';
WARNING:  nonstandard use of \' in a string literal
LINE 1: select 'hop\'';
               ^
HINT:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
 ?column?
----------
 hop'
(1 row)
</pre>
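<p>The rules at play in those examples can be modeled in a few lines of Python. This is a simplified sketch, not PostgreSQL's lexer: read_literal is a hypothetical helper, and escapes such as \n are kept as the bare character rather than translated into control characters.</p>

```python
def read_literal(sql, standard_conforming_strings):
    """Read the string literal at the start of sql; return (value, rest).

    Simplified model: '' always stands for one quote, and a backslash
    escapes the next character only in E'...' literals or when
    standard_conforming_strings is off. Returns None when the literal
    is not terminated (psql would keep prompting for more input).
    """
    backslash_escapes = not standard_conforming_strings
    if sql[:2] in ("E'", "e'"):
        backslash_escapes, i = True, 2
    elif sql[:1] == "'":
        i = 1
    else:
        raise ValueError("not a string literal")
    value = []
    while i < len(sql):
        ch = sql[i]
        if ch == "\\" and backslash_escapes:
            value.append(sql[i + 1:i + 2])  # keep the escaped character as-is
            i += 2
        elif ch == "'" and sql[i + 1:i + 2] == "'":
            value.append("'")  # doubled quote
            i += 2
        elif ch == "'":
            return "".join(value), sql[i + 1:]  # closing quote
        else:
            value.append(ch)
            i += 1
    return None


print(read_literal("'hop''';", True))    # doubled quote
print(read_literal("'hop\\'';", True))   # backslash is literal: still open
print(read_literal("E'hop\\'';", True))  # E prefix: backslash escapes
print(read_literal("'hop\\'';", False))  # pre-9.1 default behaviour
```

<p>With standard_conforming_strings on, the second call returns None: the backslash is an ordinary character, so '' reads as a doubled quote and the literal stays open, which is why psql showed the dimitri'# continuation prompt.</p>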

<p>There is a way to force PostgreSQL to accept backslash escapes regardless of the value of standard_conforming_strings: the E-prefixed notation. It is recommended to always use it whenever a string contains backslashes used as escapes (usually to escape the single quote character).</p> <p>Finally, the escape_string_warning parameter, when set to off, disables warnings such as the one shown in the last example above. Of course, its default value is on.</p> <p>Any occurrence of this <em>WARNING</em> while escape_string_warning is on means that your application is not ready to migrate to 9.1 with its default settings. There are two possible courses of action: set the parameter back from its new default value to the previous one, or fix your applications to use the E prefix wherever needed.</p> <p>Setting standard_conforming_strings to on has another advantage besides compliance with the SQL standard: protection against injections. If it is not possible to escape the single quote that terminates any user-supplied string, it becomes hard to outsmart the <em>parser</em>. The best approach here of course remains parameterized queries, the subject of an upcoming article.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 18 Aug 2011 19:01:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</guid> </item> <item> <title>el-get-list-packages</title> <link>http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</link> <description><![CDATA[<p>From the first days of <a href="../../../emacs/el-get.html">el-get</a> it was quite clear to me that we would reach a point where users would want a nice listing, including descriptions of the packages, and a <em>major mode</em> allowing them to select packages to install, remove and update. It was also quite clear that I was not much interested in doing it myself, even if I would appreciate having it done.</p>

<p>Well, the joy of Open Source &amp; Free Software (pick your own poison). <a href="https://github.com/jglee1027">jglee1027</a> is the <em>GitHub</em> user who offered an implementation of said facility, and who added descriptions for almost all of the 402 recipes that we now ship with <a href="../../../emacs/el-get.html">el-get</a>.</p> <p>Here's an image of what you get:</p> <center> <p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p> </center> <p>The packages with no description are the ones fetched by M-x el-get-emacswiki-refresh, which does not download all of the <a href="http://emacswiki.org">emacswiki</a> content locally just so that it can parse each script's header and get a local description. Maybe it's time to ask for another page over there, like the <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el%7Ctar)(%5C.gz)%3F%24">emacswiki page index</a> but containing the first line of each file too.</p> <p>For the recipes we offer, this first line often looks like the following:</p> <pre class="src"> <span style="color: #b22222;">;;; </span><span style="color: #b22222;">123-menu.el --- Simple menuing system, reminiscent of Lotus 123 in DOS </span></pre> <p>Of course some files over there do not follow that convention, but that would be good enough already.</p> <p>All in all, I hope you enjoy M-x el-get-list-packages!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 18 Aug 2011 18:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</guid> </item> <item> <title>pgloader tutorial</title> <link>http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</link> <description><![CDATA[<p>Drawing on the content of the articles in the <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> series, I took the time to compile a complete tutorial, in English. Judging by the emails I have been regularly receiving about pgloader for a few years now, it should help new users.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 15 Aug 2011 15:39:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</guid> </item> <item> <title>pgloader tutorial</title> <link>http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</link> <description><![CDATA[<p>To finish up the pgloader series, I've compiled all the information into a single page, the long awaited <a href="http://tapoueh.org/pgsql/pgloader.html#sec5">pgloader tutorial</a>. 
That should help lots of users to get started with pgloader.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 15 Aug 2011 15:33:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</guid> </item> <item> <title>pgloader constant cols</title> <link>http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</link> <description><![CDATA[<p>The previous articles in the <a href="../../../pgsql/pgloader.html">pgloader</a> series detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, then what to expect from a <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> setup, and then <a href="http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html">pgloader reformating</a>. Another need you might encounter when you get to use <a href="../../../pgsql/pgloader.html">pgloader</a> is adding <em>constant</em> values into a table's column.</p>

<p>The basic situation where you need to do so is adding an <em>origin</em> field to your table. Its value is typically not found in the data file itself, but is known in the pgloader setup. It could even be the name of the file you are importing data from.</p> <p>In <a href="../../../pgsql/pgloader.html">pgloader</a> that's called a <em>user defined column</em>. Here's what the relevant <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> setup looks like:</p> <pre class="src">
[<span style="color: #228b22;">udc</span>]
<span style="color: #b8860b;">table</span> = udc
<span style="color: #b8860b;">format</span> = text
<span style="color: #b8860b;">filename</span> = udc/udc.data
<span style="color: #b8860b;">input_encoding</span> = <span style="color: #bc8f8f;">'latin1'</span>
<span style="color: #b8860b;">field_sep</span> = %
<span style="color: #b8860b;">columns</span> = b:2, d:1, x:3, y:4
<span style="color: #b8860b;">udc_c</span> = constant value
<span style="color: #b8860b;">copy_columns</span> = b, c, d
</pre> <p>And the data file is:</p> <pre class="src">
1%5%foo%bar
2%10%bar%toto
3%4%toto%titi
4%18%titi%baz
5%2%baz%foo
</pre> <p>And here's what the loaded table looks like:</p> <pre class="src">
pgloader/examples$ pgloader -Tsc pgloader.conf udc
Table name        |    duration |    size |  copy rows |     errors
====================================================================
udc               |      0.201s |       - |          5 |          0

pgloader/examples$ psql --cluster 8.4/main pgloader -c <span style="color: #bc8f8f;">"table udc"</span>
 b  |       c        | d
----+----------------+---
  5 | constant value | 1
 10 | constant value | 2
  4 | constant value | 3
 18 | constant value | 4
  2 | constant value | 5
(5 rows)
</pre>
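<p>The mapping pgloader performs for this section can be sketched in Python as follows. This is an illustration, not pgloader's implementation: load_rows is a hypothetical name, columns maps each name to its 1-based field position as in the configuration, and the user defined column c simply receives its constant value.</p>

```python
def load_rows(lines, field_sep, columns, udc, copy_columns):
    """Map data file lines to the tuples sent to COPY.

    columns: column name to 1-based field position in the data file.
    udc: column name to constant value (user defined columns).
    copy_columns: order of the columns sent to COPY.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(field_sep)
        record = {name: fields[pos - 1] for name, pos in columns.items()}
        record.update(udc)  # constant values are not read from the file
        rows.append(tuple(record[name] for name in copy_columns))
    return rows


data = ["1%5%foo%bar", "2%10%bar%toto", "3%4%toto%titi",
        "4%18%titi%baz", "5%2%baz%foo"]
rows = load_rows(
    data,
    field_sep="%",
    columns={"b": 2, "d": 1, "x": 3, "y": 4},
    udc={"c": "constant value"},
    copy_columns=["b", "c", "d"],
)
for row in rows:
    print(row)
```

<p>The resulting tuples, in copy_columns order, match the table content shown above: x and y are read from the file but never sent to COPY.</p>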

<p>Of course, this configuration does not simply process the fields of the data file in the order they appear: after all, the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> setups double as a test suite.</p> <p>Long story short: if you need to add some <em>constant</em> values into the target table you're loading data into, <a href="../../../pgsql/pgloader.html">pgloader</a> will help you there!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 12 Aug 2011 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</guid> </item> <item> <title>Emacs Startup</title> <link>http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</link> <description><![CDATA[<p>Using <a href="http://www.gnu.org/software/emacs/">Emacs</a> we get to manage a larger and larger setup file (either ~/.emacs or ~/.emacs.d/init.el), sometimes with lots of dependencies, and some sub-files thanks to the load function or the provide and require mechanism.</p>

<p>Some users even start Emacs often enough for the startup time to be a concern. With an emacs-uptime (yes, it's a command: M-x emacs-uptime) of days to weeks (10 days, 17 hours, 45 minutes, 34 seconds as of this writing), it's not something I really care much about.</p> <p>But I know that some <a href="http://tapoueh.org/emacs/el-get.html">el-get</a> users do care, and will use el-get-is-lazy and do all their Emacs tweaking in eval-after-load blocks. To get an idea of how long a <em>worst case</em> startup with <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> takes, I have added the following piece of elisp at the very end of my startup code:</p> <pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:notify-startup-done</span> ()
  <span style="color: #bc8f8f;">"Notify the user that Emacs is now ready."</span>
  (el-get-notify <span style="color: #bc8f8f;">"Emacs is ready."</span>
                 (format <span style="color: #bc8f8f;">"The init sequence took %g seconds."</span>
                         (float-time (time-subtract after-init-time before-init-time)))))

(add-hook 'after-init-hook 'dim:notify-startup-done)
</pre>

<p>The el-get-notify function adapts: it either uses the dbus implementation from Emacs 24, or <a href="http://www.emacswiki.org/emacs/notify.el">notify.el</a> from <a href="http://www.emacswiki.org/">EmacsWiki</a> (just M-x el-get-install it if you need it), or its own implementation of an Emacs <a href="http://growl.info/">Growl</a> client (it's about 5 lines long), and barring all of that it falls back to the message function.</p> <p>The reason I say <em>worst case</em> is that I have a lot of packages to initialize at startup, and that I made absolutely no effort to make this initialization quick. Still, my Emacs setup takes about 20 seconds to boot. Pretty good I would say, for a weekly operation.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 06 Aug 2011 14:58:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</guid> </item> <item> <title>pgloader reformatting</title> <link>http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</link> <description><![CDATA[<p>Back to our series about <a href="../../../pgsql/pgloader.html">pgloader</a>. The previous articles detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a>, then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, and then what to expect from a <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> setup. This article will detail how to <em>reformat</em> input columns so that what <a href="http://www.postgresql.org/">PostgreSQL</a> sees is not what's in the data file, but the result of a <em>transformation</em> of this data into something acceptable as an <em>input</em> for the target data type.</p>

<p>Here's what the <a href="http://pgloader.projects.postgresql.org/">pgloader documentation</a> has to say about this <em>reformat</em> parameter: <em>The value of this option is a comma separated list of columns to rewrite, which are a colon separated list of column name, reformat module name, reformat function name</em>.</p> <p>And here's the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> section that deals with reformat:</p> <pre class="src">
[<span style="color: #8ae234; font-weight: bold;">reformat</span>]
<span style="color: #eeeeec;">table</span>     = reformat
<span style="color: #eeeeec;">format</span>    = text
<span style="color: #eeeeec;">filename</span>  = reformat/reformat.data
<span style="color: #eeeeec;">field_sep</span> =
<span style="color: #eeeeec;">columns</span>   = id, timestamp
<span style="color: #eeeeec;">reformat</span>  = timestamp:mysql:timestamp </pre>
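<p>To make that syntax concrete, here's a minimal python sketch that splits such a value into (column, module, function) triples. The parse_reformat_option name is hypothetical: this is my reading of the documented syntax, not the parsing code pgloader itself uses.</p> <pre class="src">
def parse_reformat_option(value):
    """Split a reformat option value into (column, module, function) triples.

    Hypothetical helper written from the documented syntax: a comma
    separated list of column:module:function entries.
    """
    specs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        parts = [part.strip() for part in entry.split(":")]
        if len(parts) != 3:
            raise ValueError("expected column:module:function, got %r" % entry)
        specs.append(tuple(parts))
    return specs


print(parse_reformat_option("timestamp:mysql:timestamp"))
# [('timestamp', 'mysql', 'timestamp')]
</pre>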

<p>The documentation has more to say about it, so check it out. Also, the reformat_path option (set either on the command line or in the configuration file) is used to find the python module implementing the reformat function; please refer to the manual for how to set it.</p> <p>Now, obviously, for the <em>reformat</em> to happen we need to write some code. That's the whole point of the option: when you need something very specific and are in a position to write the 5 lines of code needed to make it happen, <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> allows you to do just that. Of course, the code needs to be written in python here, and as it runs inside pgloader, you even benefit from the <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> settings.</p> <p>Let's see a reformat module example, as found in <a href="https://github.com/dimitri/pgloader/blob/master/reformat/mysql.py">reformat/mysql.py</a> in the pgloader sources:</p> <pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">Author: Dimitri Fontaine &lt;<a href="mailto:dim&#64;tapoueh.org">dim&#64;tapoueh.org</a>&gt;
</span><span style="color: #888a85;">#
</span><span style="color: #888a85;"># </span><span style="color: #888a85;">pgloader mysql reformating module
</span><span style="color: #888a85;">#
</span>
<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
    <span style="color: #ad7fa8; font-style: italic;">"""
    Reformat str as a PostgreSQL timestamp

    MySQL timestamps are like:  20041002152952
    We want instead this input: 2004-10-02 15:29:52
    """</span>
    <span style="color: #729fcf; font-weight: bold;">if</span> <span style="color: #729fcf;">len</span>(<span style="color: #729fcf;">input</span>) != 14:
        <span style="color: #eeeeec;">e</span> = <span style="color: #ad7fa8; font-style: italic;">"MySQL timestamp reformat input too short: %s"</span> % <span style="color: #729fcf;">input</span>
        reject.log(e, <span style="color: #729fcf;">input</span>)

    <span style="color: #eeeeec;">year</span>    = <span style="color: #729fcf;">input</span>[0:4]
    <span style="color: #eeeeec;">month</span>   = <span style="color: #729fcf;">input</span>[4:6]
    <span style="color: #eeeeec;">day</span>     = <span style="color: #729fcf;">input</span>[6:8]
    <span style="color: #eeeeec;">hour</span>    = <span style="color: #729fcf;">input</span>[8:10]
    <span style="color: #eeeeec;">minute</span>  = <span style="color: #729fcf;">input</span>[10:12]
    <span style="color: #eeeeec;">seconds</span> = <span style="color: #729fcf;">input</span>[12:14]

    <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % (year, month, day, hour, minute, seconds) </pre>
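<p>Since a reformat function is plain python, you can try it outside of pgloader with any object providing the reject.log(message, data) method used above. Here's a minimal sketch: the ListReject class is a hypothetical stand-in for pgloader's reject object, and this variant of timestamp returns None right after logging instead of going on with the slicing, which is my assumption about sensible behaviour rather than something the pgloader documentation specifies.</p> <pre class="src">
class ListReject(object):
    """Hypothetical stand-in for pgloader's reject object.

    It only records what would be logged, through the same
    reject.log(message, data) call as the mysql.py module.
    """

    def __init__(self):
        self.entries = []

    def log(self, message, data):
        self.entries.append((message, data))


def timestamp(reject, input):
    """Variant of the mysql.py reformat function that stops on bad input."""
    if len(input) != 14 or not input.isdigit():
        reject.log("not a 14 digit MySQL timestamp: %s" % input, input)
        return None  # assumption: the caller copes with a missing value
    return "%s-%s-%s %s:%s:%s" % (input[0:4], input[4:6], input[6:8],
                                  input[8:10], input[10:12], input[12:14])


reject = ListReject()
print(timestamp(reject, "20041002152952"))  # 2004-10-02 15:29:52
print(timestamp(reject, "2004"))            # None
print(reject.entries)  # [('not a 14 digit MySQL timestamp: 2004', '2004')]
</pre>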

<p>This reformat module will <em>transform</em> a timestamp representation as issued by certain versions of MySQL into something that PostgreSQL is able to read as a timestamp.</p> <p>If you're in the camp that prefers writing as little code as possible over easy to read and maintain code, I guess you could write it this way instead:</p> <pre class="src">
<span style="color: #729fcf; font-weight: bold;">import</span> re

<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
    <span style="color: #ad7fa8; font-style: italic;">""" 20041002152952 -&gt; 2004-10-02 15:29:52 """</span>
    <span style="color: #eeeeec;">g</span> = re.match(r<span style="color: #ad7fa8; font-style: italic;">"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"</span>, <span style="color: #729fcf;">input</span>)
    <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % <span style="color: #729fcf;">tuple</span>([g.group(x+1) <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(6)]) </pre>
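<p>The regexp version doesn't check that the digits make up a valid date, and when the input doesn't match at all, re.match returns None and the g.group() call fails with an AttributeError. If you'd rather catch such lines in pgloader than have PostgreSQL reject them, here's a sketch relying on the standard library datetime.strptime instead. This is my own variant, not code from the pgloader sources; the PrintReject class is a hypothetical stand-in for the reject object, and returning None after logging is again an assumption.</p> <pre class="src">
from datetime import datetime


class PrintReject(object):
    """Hypothetical stand-in for the reject object pgloader passes in."""

    def log(self, message, data):
        print(message)


def timestamp(reject, input):
    """Turn 20041002152952 into 2004-10-02 15:29:52, or reject the input."""
    try:
        # strptime alone also accepts shorter fields (a single digit for
        # the seconds, say), so insist on exactly 14 digits first.
        if len(input) != 14 or not input.isdigit():
            raise ValueError("expected 14 digits")
        parsed = datetime.strptime(input, "%Y%m%d%H%M%S")
    except ValueError:
        reject.log("invalid MySQL timestamp: %s" % input, input)
        return None  # assumption: the caller copes with a missing value
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(timestamp(PrintReject(), "20041002152952"))  # 2004-10-02 15:29:52
print(timestamp(PrintReject(), "20040230120000"))  # logs, then prints None
</pre>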

<p>Whenever you have an input file with data that PostgreSQL chokes on, you can solve this problem from <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> itself: no need to resort to scripting and a pipeline of <a href="http://www.gnu.org/software/gawk/manual/gawk.html">awk</a> (which I use a lot in other cases, don't get me wrong) or other tools. See, you finally have an excuse to <a href="http://diveintopython.org/">Dive into Python</a>!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Aug 2011 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</guid> </item> <item> <title>Reformatting with pgloader</title> <link>http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</link> <description><![CDATA[<p>In our series of articles about <a href="http://tapoueh.org/tags/pgloader.html">pgloader</a>, the latest one details how to use the <em>reformatting</em> feature of this tool. In <a href="http://fr.wikipedia.org/wiki/Extract_Transform_Load">ETL</a> terms, this corresponds to the <em>Transform</em> phase, which makes pgloader a <em>simple</em> solution for your ETL needs.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Aug 2011 11:26:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</guid> </item> <item> <title>See Tsung in action</title> <link>http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</link> <description><![CDATA[<p><a href="http://tsung.erlang-projects.org/">Tsung</a> is an open-source multi-protocol distributed load testing tool and a mature project. It's been available for about 10 years and is built with the <a href="http://www.erlang.org/">Erlang</a> system. It supports several protocols, including the <a href="http://www.postgresql.org/">PostgreSQL</a> one.</p>

<p>When you want to benchmark your own application, to know how many more clients it can handle or how much gain you will see with some shiny new hardware, <a href="http://tsung.erlang-projects.org/">Tsung</a> is the tool to use. It will allow you to <em>record</em> a number of sessions then replay them at high scale. <a href="http://pgfouine.projects.postgresql.org/tsung.html">pgfouine</a> supports Tsung and is able to turn your PostgreSQL logs into Tsung sessions, too.</p> <p>Tsung has also been used in the video game world: their version of it is called <a href="http://www.developer.unitypark3d.com/tools/utsung/">uTsung</a>, apparently using the <a href="http://www.developer.unitypark3d.com/index.html">uLink</a> game development facilities. They even made a video demo of uTsung that you might find interesting:</p> <blockquote> <p class="quoted"><a class="image-link" href="http://www.youtube.com/watch?v=rxBhqIP_7ls"> <img src="../../../images/utsung-demo.png"></a></p> </blockquote> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 02 Aug 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</guid> </item> <item> <title>Parallel pgloader</title> <link>http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</link> <description><![CDATA[<p>This article continues the series that began with <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> and then detailed <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>. We have some more fine points to talk about; today's article is about loading your data in parallel with <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>
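<p>Before going through the settings, here's the general idea of one reader handing lines over to several workers in a <em>round robin</em> fashion, as a minimal python sketch using the standard threading and queue modules. It is not pgloader's code: the article mentions a deque based internal queue, and real workers send their data to PostgreSQL rather than collecting lines. The load_round_robin function is hypothetical, and treating queue_size as a counterpart of the rrqueue_size setting is my assumption.</p> <pre class="src">
import queue
import threading


def load_round_robin(lines, nworkers=4, queue_size=5000):
    """Dispatch lines round robin from one reader to nworkers threads.

    Each worker gets its own bounded queue; here a worker merely
    collects the lines it receives instead of running COPY.
    """
    queues = [queue.Queue(maxsize=queue_size) for _ in range(nworkers)]
    results = [[] for _ in range(nworkers)]

    def worker(index):
        while True:
            line = queues[index].get()
            if line is None:  # sentinel: the reader is done
                break
            results[index].append(line)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for thread in threads:
        thread.start()

    # The reader: hand each line to the next worker's queue in turn.
    for number, line in enumerate(lines):
        queues[number % nworkers].put(line)
    for q in queues:
        q.put(None)
    for thread in threads:
        thread.join()
    return results


batches = load_round_robin(["line %d" % n for n in range(10)], nworkers=4, queue_size=5)
print([len(batch) for batch in batches])  # [3, 3, 2, 2]
</pre>
<p>Because of the Global Interpreter Lock discussed in the caveat section, such threads interleave rather than run python code truly in parallel.</p>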

<h2>several files at a time</h2> <p class="first">Parallelism is implemented in 3 different ways in pgloader. First, you can load more than one file at a time thanks to the max_parallel_sections parameter, which has to be set up in the <em>global section</em> of the file.</p> <p>This setting is quite simple and already covers the most common use case.</p> <h2>several workers per file</h2> <p class="first">The other use case is when you have huge files to load into the database. Then you want to be able to have more than one process reading the file at the same time. By using <a href="../../../pgsql/pgloader.html">pgloader</a>, you have already accepted the compromise of loading the whole content in more than one transaction, so there's no further drawback in spreading those multiple transactions per file over more than one load <em>worker</em>.</p> <p>There are basically two ways to split the work between several workers here, and both are implemented in pgloader.</p> <h3>N workers, N splits of the file</h3> <pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = True </pre> <p>Set up this way, <a href="../../../pgsql/pgloader.html">pgloader</a> will launch 4 different <em>threads</em> (see the <strong>caveat</strong> section of this article). Each thread is then given a part of the input data file and will run the whole usual pgloader processing on its own. For this to work you need to be able to seek in the input stream, which might not always be convenient.</p> <h3>one reader, N workers</h3> <pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = False
<span style="color: #eeeeec;">rrqueue_size</span> = 5000 </pre> <p>With such a setup, <a href="../../../pgsql/pgloader.html">pgloader</a> will start 4 different worker <em>threads</em> that will receive the data input in an internal <a href="http://docs.python.org/library/collections.html#deque-objects">python queue</a>. Another active <em>thread</em> will be responsible for reading the input file and filling the queues in a <em>round robin</em> fashion, while leaving all the processing of the data to the workers, of course.</p> <h3>how many threads?</h3> <p class="first">If you're using a mix and match of max_parallel_sections and section_threads with split_file_reading set to True or False, it's hard to know exactly how many <em>threads</em> will run at any time during the loading. How can you tell which sections will run in parallel when that depends on the timing of the loading?</p> <p>The advice here is the usual one: don't overestimate the capabilities of your system unless you are in a position to check beforehand by doing trial runs.</p> <h2>caveat</h2> <p class="first">The current implementation of all the parallelism in <a href="../../../pgsql/pgloader.html">pgloader</a> has been done with the <a href="http://docs.python.org/library/threading.html">python threading</a> API. While this is easy enough to use when you want to exchange data between threads, it suffers from the <a href="http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock">Global Interpreter Lock</a> issue. This means that while the code is organized to do its processing in parallel, the <em>runtime</em> is not so much: only one thread at a time gets to execute python code. You might still benefit from the current implementation if you have hard to parse files, or custom reformat modules that are part of the loading bottleneck.</p> <h2>future</h2> <p class="first">The solution would be to switch to the newer <a href="http://docs.python.org/library/multiprocessing.html">python multiprocessing</a> API, and some preliminary work has been done in pgloader to allow for that. If you're interested in real parallel bulk loading, <a href="dim%20(at)%20tapoueh%20(dot)%20org">contact me</a>!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 01 Aug 2011 12:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</guid> </item> <item> <title>Parallel pgloader</title> <link>http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</link> <description><![CDATA[<p>This article continues the series that began with <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> and then detailed <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>. We have some more fine points to talk about; today's article is about loading your data in parallel with <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>

<h2>several files at a time</h2> <p class="first">Parallelism is implemented in 3 different ways in pgloader. First, you can load more than one file at a time thanks to the max_parallel_sections parameter, which has to be set up in the <em>global section</em> of the file.</p> <p>This setting is quite simple and already covers the most common use case.</p> <h2>several workers per file</h2> <p class="first">The other use case is when you have huge files to load into the database. Then you want to be able to have more than one process reading the file at the same time. By using <a href="../../../pgsql/pgloader.html">pgloader</a>, you have already accepted the compromise of loading the whole content in more than one transaction, so there's no further drawback in spreading those multiple transactions per file over more than one load <em>worker</em>.</p> <p>There are basically two ways to split the work between several workers here, and both are implemented in pgloader.</p> <h3>N workers, N splits of the file</h3> <pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = True </pre> <p>Set up this way, <a href="../../../pgsql/pgloader.html">pgloader</a> will launch 4 different <em>threads</em> (see the <strong>caveat</strong> section of this article). Each thread is then given a part of the input data file and will run the whole usual pgloader processing on its own. For this to work you need to be able to seek in the input stream, which might not always be convenient.</p> <h3>one reader, N workers</h3> <pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = False
<span style="color: #eeeeec;">rrqueue_size</span> = 5000 </pre> <p>With such a setup, <a href="../../../pgsql/pgloader.html">pgloader</a> will start 4 different worker <em>threads</em> that will receive the data input in an internal <a href="http://docs.python.org/library/collections.html#deque-objects">python queue</a>. Another active <em>thread</em> will be responsible for reading the input file and filling the queues in a <em>round robin</em> fashion, while leaving all the processing of the data to the workers, of course.</p> <h3>how many threads?</h3> <p class="first">If you're using a mix and match of max_parallel_sections and section_threads with split_file_reading set to True or False, it's hard to know exactly how many <em>threads</em> will run at any time during the loading. How can you tell which sections will run in parallel when that depends on the timing of the loading?</p> <p>The advice here is the usual one: don't overestimate the capabilities of your system unless you are in a position to check beforehand by doing trial runs.</p> <h2>caveat</h2> <p class="first">The current implementation of all the parallelism in <a href="../../../pgsql/pgloader.html">pgloader</a> has been done with the <a href="http://docs.python.org/library/threading.html">python threading</a> API. While this is easy enough to use when you want to exchange data between threads, it suffers from the <a href="http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock">Global Interpreter Lock</a> issue. This means that while the code is organized to do its processing in parallel, the <em>runtime</em> is not so much: only one thread at a time gets to execute python code. You might still benefit from the current implementation if you have hard to parse files, or custom reformat modules that are part of the loading bottleneck.</p> <h2>future</h2> <p class="first">The solution would be to switch to the newer <a href="http://docs.python.org/library/multiprocessing.html">python multiprocessing</a> API, and some preliminary work has been done in pgloader to allow for that. If you're interested in real parallel bulk loading, <a href="dim%20(at)%20tapoueh%20(dot)%20org">contact me</a>!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 01 Aug 2011 12:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</guid> </item> <item> <title>Configuring pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</link> <description><![CDATA[<p>I have just published a post in English titled <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, which complements the more complete <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader tutorial</a> I am currently writing. Once again, I didn't take the time to translate this article into French before knowing whether you're interested, dear readers. If you are, just let me know by mail (or <em>courriel</em>, after all) so that I add it to my TODO list.</p>

<p>Happy reading!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</guid> </item> <item> <title>How to Setup pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link> <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>; let's now see how to write the pgloader.conf file that tells <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> what to do.</p>

<p>This file is expected in the INI format, with a <em>global</em> section then one section per file you want to import. The <em>global</em> section defines some default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p> <p>The configuration setup is fully documented in the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a>, which you can even easily find online. As with all <em>unix</em> style man pages, though, it's more of a complete reference than introductory material. Let's review.</p> <h2>global section</h2> <p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file from the source tree. Well, some options are really <em>debugging</em> only options, so I changed their values so that what you see here is a better starting point.</p> <pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader

<span style="color: #eeeeec;">log_file</span> = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span> = INFO
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">lc_messages</span> = C
<span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span>
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on
<span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB

<span style="color: #eeeeec;">copy_every</span> = 15000

<span style="color: #eeeeec;">null</span> = <span style="color: #ad7fa8; font-style: italic;">""</span>
<span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>

<span style="color: #eeeeec;">max_parallel_sections</span> = 4 </pre>
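<p>As the file is plain INI, python's standard configparser module can read it. Here's a minimal sketch of the pg_option_ prefix idea: every pg_option_* key of the [pgsql] section becomes a SET statement for the session. The session_settings helper is hypothetical, and stripping the quotes then re-quoting every value is my own choice, not necessarily what pgloader does.</p> <pre class="src">
import configparser

# A copy of the [pgsql] section above, so that the sketch is self-contained.
CONFIG = r"""
[pgsql]
base = pgloader
log_file = /tmp/pgloader.log
log_min_messages = INFO
client_min_messages = WARNING
lc_messages = C
pg_option_client_encoding = 'utf-8'
pg_option_standard_conforming_strings = on
pg_option_work_mem = 128MB
copy_every = 15000
null = ""
empty_string = "\ "
max_parallel_sections = 4
"""

PREFIX = "pg_option_"


def session_settings(text):
    """Return one SET statement per pg_option_* key of the [pgsql] section."""
    # interpolation=None: treat % signs in values literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    statements = []
    for key, value in parser.items("pgsql"):
        if key.startswith(PREFIX):
            name = key[len(PREFIX):]
            value = value.strip("'")  # the file may or may not quote values
            quoted = "'" + value.replace("'", "''") + "'"
            statements.append("SET %s TO %s;" % (name, quoted))
    return statements


for statement in session_settings(CONFIG):
    print(statement)
</pre>
<p>Quoting every value keeps settings such as 128MB valid SQL, since PostgreSQL does not accept that unit suffix on a bare number in a SET command.</p>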

<p>You don't see the whole connection setup here, as base was enough. You might need to set up host, port and user, and maybe even pass, to be able to connect to the PostgreSQL server.</p> <p>The logging options allow you to set a file where pgloader logs all its messages, which are categorized as either DEBUG, INFO, WARNING, ERROR or CRITICAL. The options log_min_messages and client_min_messages are another good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a> and allow you to set the level of chatter you want to see in the log file and on the interactive console (standard output and standard error streams), respectively.</p> <p>Please note that the DEBUG level will produce more than 3 times as much data as the data file you're importing. Unless you're a pgloader contributor or helping them, well, <em>debug</em> it, you want to avoid setting the log chatter to this value.</p> <p>The client_encoding will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it establishes. You can now even set any parameter you want by using the pg_option_parameter_name magic settings. Note that the command line option --pg-options (or -o for brevity) allows you to override that.</p> <p>Then, the copy_every parameter is set to 5 in the original examples file, because the test files contain fewer than 10 lines and we want to test several <em>batches</em> of commits when using them. So for your real loading, stick to the default parameters (10 000 lines per COPY command), or more. You can play with this parameter: depending on the network (or local access) and disk system you're using, you might see improvements by reducing or enlarging it. There's not so much theory of operation here as empirical testing and tuning. For a one-off operation, just remove the lines from the configuration.</p> <p>The parameters null and empty_string are related to interpreting the data in the text or csv files you have, and the documentation is quite clear about them. Note that you can set them globally and per section, too.</p> <p>The last parameter of this example, max_parallel_sections, is detailed later in the article.</p> <h2>files section</h2> <p class="first">After the <em>global</em> section come as many sections as you have files to load, plus the <em>template</em> sections, which are only there so that you can share a bunch of parameters between several sections. Picture a series of data files all in the same format, where the only thing that changes is the filename: use a template section in this case!</p> <p>Let's see an example:</p> <pre class="src">
[<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span> = True
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">datestyle</span> = dmy
<span style="color: #eeeeec;">field_sep</span> =
<span style="color: #eeeeec;">trailing_sep</span> = True

[<span style="color: #8ae234; font-weight: bold;">simple</span>]
<span style="color: #eeeeec;">use_template</span> = simple_tmpl
<span style="color: #eeeeec;">table</span> = simple
<span style="color: #eeeeec;">filename</span> = simple/simple.data
<span style="color: #eeeeec;">columns</span> = a:1, b:3, c:2
<span style="color: #eeeeec;">skip_head_lines</span> = 2

<span style="color: #888a85;"># </span><span style="color: #888a85;">those reject settings are the default ones
</span><span style="color: #eeeeec;">reject_log</span> = /tmp/simple.rej.log
<span style="color: #eeeeec;">reject_data</span> = /tmp/simple.rej

[<span style="color: #8ae234; font-weight: bold;">partial</span>]
<span style="color: #eeeeec;">table</span> = partial
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">filename</span> = partial/partial.data
<span style="color: #eeeeec;">field_sep</span> = %
<span style="color: #eeeeec;">columns</span> = *
<span style="color: #eeeeec;">only_cols</span> = 1-3, 5 </pre>
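<p>A template section lends itself to a simple merge: start from the template's settings, then let the section's own settings win. Here's a minimal python sketch with the standard configparser module, along with the expansion of an only_cols value such as 1-3, 5. Both resolve_section and expand_only_cols are hypothetical helpers, and the precedence of a section's own keys over its template's is my assumption rather than something taken from the pgloader documentation.</p> <pre class="src">
import configparser

# A trimmed copy of the sections above.
CONFIG = r"""
[simple_tmpl]
template = True
format = text
datestyle = dmy
field_sep =
trailing_sep = True

[simple]
use_template = simple_tmpl
table = simple
filename = simple/simple.data
columns = a:1, b:3, c:2

[partial]
table = partial
format = text
filename = partial/partial.data
field_sep = %
columns = *
only_cols = 1-3, 5
"""


def resolve_section(parser, name):
    """Return a section's settings, template values first, own values last."""
    settings = {}
    template = parser.get(name, "use_template", fallback=None)
    if template is not None:
        settings.update(parser.items(template))
        settings.pop("template", None)  # the result is a real section
    settings.update(parser.items(name))
    return settings


def expand_only_cols(spec):
    """Expand an only_cols value such as '1-3, 5' into [1, 2, 3, 5]."""
    columns = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            columns.extend(range(int(first), int(last) + 1))
        else:
            columns.append(int(part))
    return columns


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)
print(resolve_section(parser, "simple")["datestyle"])         # dmy
print(expand_only_cols(parser.get("partial", "only_cols")))  # [1, 2, 3, 5]
</pre>
<p>Note the interpolation=None argument: with configparser's default interpolation, reading the % used as field_sep in the partial section would raise an InterpolationSyntaxError.</p>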

<p>That's 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections so that we see one template example. Of course, with a single section using it, the template is only there for the sake of the example.</p> <h2>data file format</h2> <p class="first">The most important setting that you have to care about is the file format. Your choice here is either text, csv or fixed. Mostly, what we are given nowadays is csv. You might remember having read that the nice thing about standards is that there are so many to choose from... well, csv land is one where it's pretty hard to find two producers that understand it the same way.</p> <p>So when you fail to have pgloader load your <em>mostly csv</em> files with a csv setup, it's time to consider using text instead. The text file format accepts a lot of tunables to adapt to crazy situations, but it is implemented in pure python, whereas the <a href="http://docs.python.org/library/csv.html">python csv module</a> is written in C and is more efficient.</p> <p>If you're wondering what kind of format we're talking about here, here's the <a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using ^ (caret) as the field separator:</p> <pre class="src">
1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\ ^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre> <p>And here's what we get by loading that:</p> <pre class="src">
pgloader/examples$ pgloader -c pgloader.conf -s cluttered
Table name        |    duration |    size |  copy rows |     errors
====================================================================
cluttered         |      0.193s |       - |          6 |          0

pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
   |                               : line text with
   |                               : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
   |                               : a last multiline
   |                               : escaped line
   |                               : with a missing
   |                               : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
   |                               : escaped value
   |                               :
   |                               : with empty line
   |                               : embeded
(6 rows)
</pre>
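<p>To make the escaping rules of this example concrete, here's a minimal python sketch of a reader for it. It is not pgloader's parser, and it rests on assumptions of mine: three fields per record, a trailing ^ on every record, a line ending with a backslash continuing the current value, and a record still short of fields after an unescaped newline continuing on the next line too, which is how I read the <em>missing escaping</em> line above. The null and empty_string handling mirrors the global section shown earlier; read_records and convert are hypothetical names.</p> <pre class="src">
NULL = ""             # null = "" in the global section
EMPTY_STRING = "\\ "  # empty_string = "\ ", a backslash then a space

CLUTTERED = [
    "1^some multi\\",
    "line text with\\",
    "newline escaping^and some other data following^",
    "2^and another line^clean^",
    "3^and\\",
    "a last multiline\\",
    "escaped line",
    "with a missing\\",
    "escaping^just to test^",
    "4^\\ ^empty value^",
    "5^^null value^",
    "6^multi line\\",
    "escaped value\\",
    "\\",
    "with empty line\\",
    "embeded^last line^",
]


def convert(value):
    """Apply the null and empty_string settings to one raw field."""
    if value == NULL:
        return None
    if value == EMPTY_STRING:
        return ""
    return value


def read_records(lines, sep="^", nfields=3):
    """Yield one list of converted fields per logical record."""
    record = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            # Escaped newline: keep a newline in the value and read on.
            record += line[:-1] + "\n"
            continue
        record += line
        fields = record.split(sep)
        if fields[-1] == "":
            fields.pop()  # the trailing separator leaves an empty last item
        if nfields > len(fields):
            # Missing escaping: too few fields yet, keep the raw newline.
            record += "\n"
            continue
        yield [convert(value) for value in fields]
        record = ""


for fields in read_records(CLUTTERED):
    print(fields)
</pre>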

<p>So when you have this kind of data, well, pgloader might still be able to help you!</p> <p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> to learn about each and every parameter that you can define and the values they accept. As for the <em>fixed</em> data format, it is to be used when you're given field positions in the file rather than a field separator. Yes, we still encounter those from time to time. Who needs variable size storage, after all?</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid> </item> <item> <title>Emacs ANSI colors</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/29-emacs-ansi-colors.html</link> <description><![CDATA[<p><a href="http://tapoueh.org/emacs/index.html">Emacs</a> comes with a pretty good implementation of a terminal emulator, M-x term. Well, not that good actually, but given what I use it for, it's just what I need, particularly if you add my <a href="http://tapoueh.org/emacs/cssh.html">cssh</a> tool to it: connecting with ssh to a remote host is then just a C-= (which runs the command cssh-term-remote-open) away, with completion on the host name thanks to ~/.ssh/known_hosts.</p>

<p>Now, a problem that I still had to solve was the colors used in the terminal. As I'm using the <em>tango</em> color theme for emacs, the default <em>ANSI</em> palette's blue color was not readable. Here's how to fix that:</p> <pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">ansi-color</span>)

(setq ansi-color-names-vector
      (vector (frame-parameter nil 'background-color)
              <span style="color: #bc8f8f;">"#f57900"</span> <span style="color: #bc8f8f;">"#8ae234"</span> <span style="color: #bc8f8f;">"#edd400"</span> <span style="color: #bc8f8f;">"#729fcf"</span>
              <span style="color: #bc8f8f;">"#ad7fa8"</span> <span style="color: #bc8f8f;">"cyan3"</span> <span style="color: #bc8f8f;">"#eeeeec"</span>)
      ansi-term-color-vector ansi-color-names-vector
      ansi-color-map (ansi-color-make-color-map)) </pre>

<p>Now your colors in an emacs terminal are easy to read, as you can see:</p> <blockquote> <p class="quoted"><img src="../../../images/emacs-tango-term-colors.png" alt=""></p> </blockquote> <p>Hope you enjoy!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/29-emacs-ansi-colors.html</guid> </item> <item> <title>Emacs ANSI colors</title> <link>http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</link> <description><![CDATA[<p><a href="http://tapoueh.org/emacs/index.html">Emacs</a> comes with a pretty good implementation of a terminal emulator, M-x term. Well, not that good actually, but given what I use it for, it's just what I need, particularly if you add my <a href="http://tapoueh.org/emacs/cssh.html">cssh</a> tool to it: connecting with ssh to a remote host is then just a C-= (which runs the command cssh-term-remote-open) away, with completion on the host name thanks to ~/.ssh/known_hosts.</p>

<p>Now, a problem that I still had to solve was the colors used in the terminal. As I'm using the <em>tango</em> color theme for emacs, the default <em>ANSI</em> palette's blue color was not readable. Here's how to fix that:</p> <pre class="src">

(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ansi-color</span>) (setq ansi-color-names-vector (vector (frame-parameter nil 'background-color) <span style="color: #ad7fa8; font-style: italic;">"#f57900"</span> <span style="color: #ad7fa8; font-style: italic;">"#8ae234"</span> <span style="color: #ad7fa8; font-style: italic;">"#edd400"</span> <span style="color: #ad7fa8; font-style: italic;">"#729fcf"</span> <span style="color: #ad7fa8; font-style: italic;">"#ad7fa8"</span> <span style="color: #ad7fa8; font-style: italic;">"cyan3"</span> <span style="color: #ad7fa8; font-style: italic;">"#eeeeec"</span>) ansi-term-color-vector ansi-color-names-vector ansi-color-map (ansi-color-make-color-map)) </pre>
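<p>To see why the fifth entry of that vector matters, here is a minimal Python sketch, assuming the standard ANSI ordering (black, red, green, yellow, blue, magenta, cyan, white) in which the foreground codes 30 to 37 select entries 0 to 7. Blue is code 34, the one GNU ls typically uses for directory names. The PALETTE list copies the colors set above, with a hypothetical "black" standing in for the frame background color, which only Emacs knows at runtime; the foreground_colors function is illustrative only, not part of ansi-color.</p>

```python
import re

# Copy of the vector above; "black" is a stand-in for
# (frame-parameter nil 'background-color), which depends on the Emacs frame.
PALETTE = ["black", "#f57900", "#8ae234", "#edd400",
           "#729fcf", "#ad7fa8", "cyan3", "#eeeeec"]

# An SGR escape sequence looks like ESC [ params m, e.g. "\x1b[01;34m".
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def foreground_colors(text):
    """Return the palette entry selected by each foreground code (30 to 37)."""
    colors = []
    for match in SGR.finditer(text):
        for param in match.group(1).split(";"):
            if param and int(param) in range(30, 38):
                colors.append(PALETTE[int(param) - 30])
    return colors


# A directory name as GNU ls usually colors it: bold (01) blue (34).
print(foreground_colors("\x1b[01;34msrc\x1b[0m"))
```

<p>Bold (01) and reset (0) codes are simply ignored here, since only the 30 to 37 range indexes the palette.</p>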

<p>Now your colors in an emacs terminal are easy to read, as you can see:</p> <blockquote> <p class="quoted"><img src="../../../images/emacs-tango-term-colors.png" alt=""></p> </blockquote> <p>Hope you enjoy!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</guid> </item> <item> <title>How to Setup pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link> <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>; let's now see how to write the pgloader.conf file that tells <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> what to do.</p>

<p>This file is expected in the INI format, with a <em>global</em> section followed by one section per file you want to import. The <em>global</em> section defines some default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p> <p>The configuration setup is fully documented in the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a>, which you can even easily find online. Like all <em>unix</em> style man pages, though, it's more of a complete reference than introductory material. Let's review.</p> <h2>global section</h2> <p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file from the sources. Some options are really <em>debugging</em> only options, so I changed their values so that what you see here is a better starting point.</p> <pre class="src"> [<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader

<span style="color: #eeeeec;">log_file</span> = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span> = INFO
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">lc_messages</span> = C
<span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span>
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on
<span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB

<span style="color: #eeeeec;">copy_every</span> = 15000

<span style="color: #eeeeec;">null</span> = <span style="color: #ad7fa8; font-style: italic;">""</span>
<span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>

<span style="color: #eeeeec;">max_parallel_sections</span> = 4 </pre>
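<p>In the global section above, each key starting with pg_option_ names a PostgreSQL setting to apply on the connection. As a minimal sketch, assuming the prefix is simply stripped to get the parameter name, here is how such a section could be turned into SET commands with Python's standard configparser module. The session_settings and quote_literal helpers are hypothetical, not pgloader's code, and re-quoting every value as a string literal is an assumption too.</p>

```python
import configparser

# A trimmed copy of the global section shown above.
GLOBAL_SECTION = """
[pgsql]
base = pgloader
client_min_messages = WARNING
pg_option_client_encoding = 'utf-8'
pg_option_standard_conforming_strings = on
pg_option_work_mem = 128MB
copy_every = 15000
"""

PREFIX = "pg_option_"


def quote_literal(value):
    """Quote a value as an SQL string literal, dropping quotes already there."""
    value = value.strip("'")
    return "'" + value.replace("'", "''") + "'"


def session_settings(config_text, section="pgsql"):
    """Return one SET command per pg_option_* key of the given section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return ["SET %s TO %s;" % (key[len(PREFIX):], quote_literal(value))
            for key, value in parser.items(section)
            if key.startswith(PREFIX)]


for command in session_settings(GLOBAL_SECTION):
    print(command)
```

<p>Quoting the values keeps settings such as 128MB valid SQL, since PostgreSQL expects memory sizes with units as string literals in SET.</p>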

<p>Not all of the connection setup is shown here, as base was enough. You might need to set host, port and user, and maybe even pass, to be able to connect to the PostgreSQL server.</p> <p>The logging options allow you to set a file where all pgloader messages are logged; messages are categorized as either DEBUG, INFO, WARNING, ERROR or CRITICAL. The options log_min_messages and client_min_messages are another good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a>, and allow you to set the level of chatter you want to see on the interactive console (standard output and standard error streams) and in the log file.</p> <p>Please note that the DEBUG level will produce more than 3 times as much data as the data file you're importing. Unless you're a pgloader contributor, or helping one <em>debug</em> it, you want to avoid setting the log chatter to this value.</p> <p>The client_encoding will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it establishes. You can now even set any parameter you want by using the pg_option_parameter_name magic settings. Note that the command line option --pg-options (or -o for brevity) allows you to override that.</p> <p>Then, the copy_every parameter is set to 5 in the examples, because the test files contain fewer than 10 lines and we want to test several <em>batches</em> of commits when using them. So for your real loading, stick to the default parameters (10 000 lines per COPY command), or more. You can play with this parameter: depending on the network (or local access) and disk system you're using, you might see improvements by reducing or enlarging it. There's not so much theory of operation here as empirical testing and tuning.
For a one-off operation, just remove those lines from the configuration.</p> <p>The parameters null and empty_string are related to interpreting the data in the text or csv files you have, and the documentation is quite clear about them. Note that you have a global setting and a per-section setting too.</p> <p>The last parameter of this example, max_parallel_sections, is detailed later in the article.</p> <h2>files section</h2> <p class="first">After the <em>global</em> section come as many sections as you have files to load, plus the <em>template</em> sections, which are only there so that you can share a bunch of parameters between several sections. Picture a series of data files all of the same format, where the only thing that changes is the filename. Use a template section in this case!</p> <p>Let's see an example:</p> <pre class="src"> [<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span> = True
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">datestyle</span> = dmy
<span style="color: #eeeeec;">field_sep</span> =
<span style="color: #eeeeec;">trailing_sep</span> = True

[<span style="color: #8ae234; font-weight: bold;">simple</span>]
<span style="color: #eeeeec;">use_template</span> = simple_tmpl
<span style="color: #eeeeec;">table</span> = simple
<span style="color: #eeeeec;">filename</span> = simple/simple.data
<span style="color: #eeeeec;">columns</span> = a:1, b:3, c:2
<span style="color: #eeeeec;">skip_head_lines</span> = 2

<span style="color: #888a85;"># </span><span style="color: #888a85;">these reject settings are the defaults</span>
<span style="color: #eeeeec;">reject_log</span> = /tmp/simple.rej.log
<span style="color: #eeeeec;">reject_data</span> = /tmp/simple.rej

[<span style="color: #8ae234; font-weight: bold;">partial</span>]
<span style="color: #eeeeec;">table</span> = partial
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">filename</span> = partial/partial.data
<span style="color: #eeeeec;">field_sep</span> = %
<span style="color: #eeeeec;">columns</span> = *
<span style="color: #eeeeec;">only_cols</span> = 1-3, 5 </pre>
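<p>The template mechanism boils down to merging options. Here is a minimal Python sketch of one plausible reading of it, using configparser on a trimmed copy of the sections above: template sections are not loaded themselves, and a section naming one in use_template inherits its options, with its own values winning on conflicts. That precedence rule, the resolved_sections helper and the extra "other" section are assumptions for illustration, not pgloader's implementation.</p>

```python
import configparser

CONF = """
[simple_tmpl]
template     = True
format       = text
datestyle    = dmy
trailing_sep = True

[simple]
use_template    = simple_tmpl
table           = simple
filename        = simple/simple.data
columns         = a:1, b:3, c:2
skip_head_lines = 2

# hypothetical section overriding one template option
[other]
use_template = simple_tmpl
table        = other
datestyle    = ymd
"""


def resolved_sections(config_text):
    """Return {section: options} for loadable sections, templates merged in."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    result = {}
    for name in parser.sections():
        section = parser[name]
        if section.getboolean("template", fallback=False):
            continue  # templates only exist to be referenced
        options = {}
        template = section.get("use_template")
        if template:
            options.update(parser[template])  # inherited defaults first
            options.pop("template", None)
        options.update(section)  # the section's own values win
        options.pop("use_template", None)
        result[name] = options
    return result


for name, options in resolved_sections(CONF).items():
    print(name, options)
```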

<p>Those are 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections so that we see one template example. Of course, with a single section using the template, it's only there for the sake of the example.</p> <h3>data file format</h3> <p class="first">The most important setting that you have to care about is the file format. Your choice here is either text, csv or fixed. Mostly, what we are given nowadays is csv. You might remember having read that the nice thing about standards is that there are so many to choose from... well, csv land is one where it's pretty hard to find two different producers that understand it the same way.</p> <p>So when you fail to have pgloader load your <em>mostly csv</em> files with a csv setup, it's time to consider using text instead. The text file format accepts a lot of tunables to adapt to crazy situations, but it is all python code, whereas the <a href="http://docs.python.org/library/csv.html">python csv module</a> is a C-coded module, and more efficient.</p> <p>If you're wondering what kind of format we're talking about here, here's the <a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using ^ (caret) as the field separator:</p> <pre class="src"> 1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\
^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre> <p>And here's what we get by loading that:</p> <pre class="src"> pgloader/examples$ pgloader -c pgloader.conf -s cluttered
Table name          duration     size   copy rows     errors
====================================================================
cluttered             0.193s        -           6          0

pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
   |                               : line text with
   |                               : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
   |                               : a last multiline
   |                               : escaped line
   |                               : with a missing
   |                               : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
   |                               : escaped value
   |                               :
   |                               : with empty line
   |                               : embeded
(6 rows) </pre>
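<p>To make the escaping rules concrete, here is a minimal Python sketch of how such a file can be read, under my own assumptions rather than pgloader's actual algorithm: a trailing backslash escapes the newline that follows it, and since every record ends with a trailing ^, a logical record is complete once it holds as many separators as there are fields. That second rule is what lets the line with a missing escaping still load. The read_records function and its parameters are hypothetical.</p>

```python
# The cluttered example data, one physical line per list entry.
CLUTTERED = [
    "1^some multi\\",
    "line text with\\",
    "newline escaping^and some other data following^",
    "2^and another line^clean^",
    "3^and\\",
    "a last multiline\\",
    "escaped line",
    "with a missing\\",
    "escaping^just to test^",
    "4^\\",
    "^empty value^",
    "5^^null value^",
    "6^multi line\\",
    "escaped value\\",
    "\\",
    "with empty line\\",
    "embeded^last line^",
]


def read_records(lines, sep="^", nfields=3):
    """Group physical lines into records and split each one into fields."""
    records = []
    buffer = None
    for line in lines:
        if line.endswith("\\"):
            line = line[:-1]  # drop the escaping backslash, keep the newline
        buffer = line if buffer is None else buffer + "\n" + line
        # with a trailing separator, a complete record holds nfields separators
        if buffer.count(sep) >= nfields:
            records.append(buffer.split(sep)[:nfields])
            buffer = None
    return records


for record in read_records(CLUTTERED):
    print(record)
```

<p>The fields come out in file order; the reordering visible in the psql output is the job of the columns setting, which this sketch leaves out.</p>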

<p>So when you have that kind of data, well, pgloader might still be able to help you!</p> <p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> to learn about each and every parameter that you can define, the values accepted, etc. The <em>fixed</em> data format is to be used when you're not given a field separator but field positions in the file. Yes, we still encounter those from time to time.</p> <h2>parallel processing</h2> <h3>one reader, multiple workers</h3> <h3>multiple workers, each reading</h3> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 09:57:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid> </item> <item> <title>Next month partitions</title> <link>http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</link> <description><![CDATA[<p>When you partition your tables monthly, then comes the question of when to create the next partitions. I tend to create them just the week before next month, and I have some nice <a href="http://www.nagios.org/">nagios</a> scripts to alert me in case I've forgotten to do so. How do you check that by hand at the end of a month?</p>

<p>Here's a catalog query to help you there:</p> <pre class="src"> select *
  from (
         select <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span> as schemaname, count(*)::text as tablename
           from pg_tables
          where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
            and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)

         union

         select schemaname, substring(tablename,1,length(tablename)-6) || <span style="color: #ad7fa8; font-style: italic;">'201108'</span>
           from pg_tables
          where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
            and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)

         except

         select schemaname, tablename
           from pg_tables
          where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
            and tablename like to_char(now() + interval <span style="color: #ad7fa8; font-style: italic;">'1 month'</span>, <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
       ) as t
order by schemaname &lt;&gt; <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span>, schemaname;

   schemaname   |       tablename
<span style="color: #888a85;">----------------+------------------------</span>
 previous parts | 1
 central        | stats_entrantes_201108
(2 rows) </pre>
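<p>The logic of that query can also be sketched outside the database. The following Python is a minimal sketch, assuming partition names end with _YYYYMM; the function names and the sample table list are hypothetical. It mirrors the union and except parts: take this month's partitions, derive the name expected for next month, and remove the ones that already exist. Computing the month suffix arithmetically also covers the December to January rollover.</p>

```python
from datetime import date


def month_suffix(day, months_ahead=0):
    """Return the YYYYMM suffix of the month months_ahead after day's month."""
    index = day.year * 12 + (day.month - 1) + months_ahead
    return "%04d%02d" % (index // 12, index % 12 + 1)


def missing_next_partitions(tables, today):
    """Return (schema, table) pairs for next-month partitions not created yet.

    tables is a list of (schema, table) pairs, like rows of pg_tables.
    """
    this_month = month_suffix(today)
    next_month = month_suffix(today, 1)
    expected = {(schema, name[:-6] + next_month)
                for schema, name in tables if name.endswith(this_month)}
    existing = {(schema, name)
                for schema, name in tables if name.endswith(next_month)}
    return sorted(expected - existing)


# Hypothetical catalog content at the end of July 2011.
TABLES = [("central", "stats_entrantes_201107"),
          ("central", "stats_other_201107"),
          ("central", "stats_other_201108")]

print(missing_next_partitions(TABLES, date(2011, 7, 27)))
```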

<p>As you see, our partition names end with _YYYYMM so that it's easy to match them in our queries, but I guess about everyone does the same here. Then the to_char expressions are only there so that we don't have to type '%201108' manually in the query text. And there's a trick to also display how many partitions we have this month, by adding a line to the result...</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 27 Jul 2011 22:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</guid> </item> <item> <title>How to Use pgloader</title> <link>http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</link> <description><![CDATA[<p>This question comes up regularly, and I thought I had given it a satisfying answer with <a href="https://github.com/dimitri/pgloader/tree/master/examples">the pgloader examples</a>. That document reads a bit like a <em>tutorial</em>, in English, and I detailed it in the article <a href="22-how-to-use-pgloader.html">how to use pgloader</a> on this same site, in English. If there is enough demand, I will translate it into French.</p>

<p>In the meantime, happy reading!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 22 Jul 2011 13:48:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</guid> </item> <item> <title>How To Use PgLoader</title> <link>http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</link> <description><![CDATA[<p>This question about <a href="../../../pgsql/pgloader.html">pgloader</a> usage comes up quite frequently, and I think the examples <a href="https://github.com/dimitri/pgloader/tree/master/examples">README</a> goes a long way in answering it. It's not exactly a <em>tutorial</em>, but it's almost there. Let me paste it here for reference:</p>

<h2>installing pgloader</h2> <p class="first">Either use the <a href="http://packages.debian.org/source/pgloader">debian package</a> or the one for your distribution of choice if you use another one. RedHat, CentOS, FreeBSD, OpenBSD and some more already include a binary package that you can use directly.</p> <p>Or you could git clone https://github.com/dimitri/pgloader.git and go from there. As it's all python code, it runs fine interpreted from the source directory; you don't <em>need</em> to install it in a special place in your system.</p> <h2>setting up the test environment</h2> <p class="first">To use the examples, please first create a pgloader database, then the tables each example needs, then issue the pgloader command:</p> <pre class="src"> $ createdb --encoding=utf-8 pgloader
$ cd examples
$ psql pgloader &lt; simple/simple.sql
$ ../pgloader.py -Tvc pgloader.conf simple </pre> <p>If you want to load data from all examples, create tables for all of them first, then run pgloader without arguments.</p> <h2>example description</h2> <p class="first">The provided examples are:</p> <ul> <li>simple <p>This dataset shows the basic case, with a trailing separator and data reordering.</p></li> <li>xzero <p>Same as simple but using \0 as the null marker ()</p></li> <li>errors <p>Same test, but with impossible dates. Should report some errors.
If it does not report errors, check you're not using psycopg 1.1.21.</p> <p>Should report 3 errors out of 7 lines (4 updates).</p></li> <li>clob <p>This dataset shows some text large object importing to the PostgreSQL text datatype.</p></li> <li>cluttered <p>A dataset with escaped newlines and multi-line input (without quoting). Beware of data reordering, too.</p></li> <li>csv <p>A dataset with csv delimiter ',' and quoting '&quot;'.</p></li> <li>partial <p>A dataset from which we only load some of the provided columns.</p></li> <li>serial <p>In this dataset the id field is omitted; it's a serial which will be automatically set by PostgreSQL while COPYing.</p></li> <li>reformat <p>A timestamp column is formatted the way MySQL dumps its timestamps, which is not the same as the way PostgreSQL reads them. The reformat.mysql module is used to reformat the data on-the-fly.</p></li> <li>udc <p>A user defined column test, where not all the file columns are used, and a new constant one, not found in the input data file, is added while loading data.</p></li> </ul> <h2>running the import</h2> <p class="first">You can launch all those pgloader tests in one run, provided you created the necessary tables:</p> <pre class="src">

$ for sql in */*sql; do psql pgloader &lt; $sql; done
$ ../pgloader.py -Tsc pgloader.conf

errors      WARNING  COPY error, trying to find on which line
errors      WARNING  COPY data buffer saved in /tmp/errors.AhWvAv.pgloader
errors      WARNING  COPY error recovery done (2/3) in 0.064s
errors      WARNING  COPY error, trying to find on which line
errors      WARNING  COPY data buffer saved in /tmp/errors.BclHtj.pgloader
errors      WARNING  COPY error recovery done (1/1) in 0.054s
errors      ERROR    3 errors found into [errors] data
errors      ERROR    please read /tmp/errors.rej.log for errors log
errors      ERROR    and /tmp/errors.rej for data still to process
errors      ERROR    3 database errors occured
reformat    WARNING  COPY error, trying to find on which line
reformat    WARNING  COPY data buffer saved in /tmp/reformat.6P4WCD.pgloader
reformat    WARNING  COPY error recovery done (1/4) in 0.034s
reformat    ERROR    1 errors found into [reformat] data
reformat    ERROR    please read /tmp/reformat.rej.log for errors log
reformat    ERROR    and /tmp/reformat.rej for data still to process
reformat    ERROR    1 database errors occured

Table name          duration     size   copy rows     errors
====================================================================
allcols               0.025s        -           8          0
clob                  0.034s        -           7          0
cluttered             0.061s        -           6          0
csv                   0.035s        -           6          0
errors                0.113s        -           4          3
fixed                 0.045s        -           3          0
partial               0.030s        -           7          0
reformat              0.036s        -           4          1
serial                0.029s        -           7          0
simple                0.050s        -           7          0
udc                   0.020s        -           5          0
====================================================================
Total                 0.367s        -          64          4
</pre> <p>Please note that the errors test should return 3 errors and the reformat test 1 error.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 22 Jul 2011 13:38:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</guid> </item> <item> <title>Emacs Cheat Sheet</title> <link>http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</link> <description><![CDATA[<p>I stumbled upon the following <em>cheat sheet</em> for <a href="http://www.gnu.org/software/emacs/">Emacs</a> yesterday, and it's worth sharing. I already learnt or discovered again some nice default chords, like for example C-x C-o runs the command delete-blank-lines and C-M-o runs the command split-line. I guess I will use the later one a lot.</p>

<center> <p><a class="image-link" href="../../../images/emacs-cheat-sheet.png"> <img src="../../../images/emacs-cheat-sheet-tn.png"></a></p> </center> <p>Hope you'll like it!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 20 Jul 2011 10:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</guid> </item> <item> <title>Skytools3: the slides</title> <link>http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</link> <description><![CDATA[<p>Now that the <a href="http://char11.org/">CHAR(11)</a> conference is over, it's customary to publish the <em>slides</em> that were used. I presented <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a> 3.0, which will be released as soon as I have had the time to finish reviewing (actually, mostly writing) the documentation.</p>

<center> <p><a class="image-link" href="../../../images/skytools3.pdf"> <img src="../../../images/skytools3-0.png"></a></p> </center> <p>The <em>slides</em> of all the talks should eventually be published online, but that won't happen as quickly as we would all like. So here's a bit of reading while you wait!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 19 Jul 2011 14:39:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</guid> </item> <item> <title>Skytools3 talk Slides</title> <link>http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</link> <description><![CDATA[<p>In case you're wondering, here are the slides from the <a href="http://char11.org/">CHAR(11)</a> talk I gave, about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a> 3.0, <em>soon</em> to be released. That means as soon as I have enough time available to polish (or write) the documentation.</p>

<center> <p><a class="image-link" href="../../../images/skytools3.pdf"> <img src="../../../images/skytools3-0.png"></a></p> </center> <p>The slides for all the talks should eventually make their way to a central place, but expect some noticeable delay here. Sorry about that, and happy reading in the meantime!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 19 Jul 2011 14:24:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</guid> </item> <item> <title>Elisp Breadcrumbs</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</link> <description><![CDATA[<p>A <a href="http://en.wikipedia.org/wiki/Breadcrumb_(navigation)">breadcrumb</a> is a navigation aid. I just added one to this website, so that it gets easier to browse from any article to its local and parent indexes and back to <a href="../../../index.html">/dev/dim</a>, the root web page of this site.</p>

<p>As it was not that much work to implement, here's the whole of it:</p> <pre class="src"> <span style="color: #b22222;">;;;</span>
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">Breadcrumb support</span>
<span style="color: #b22222;">;;;</span>
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">cl-lib</span>) <span style="color: #b22222;">; </span><span style="color: #b22222;">for cl-loop</span>

<span style="color: #b22222;">;; </span><span style="color: #b22222;">tapoueh-path-to-root and tapoueh-extract-directive are not shown here.</span>

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-breadcrumb-to-current-page</span> ()
  <span style="color: #bc8f8f;">"Return a list of (name . link) from the index root page to current one"</span>
  (<span style="color: #7f007f;">let*</span> ((current (muse-current-file))
         (project (muse-project-of-file current))
         (root    (muse-style-element <span style="color: #da70d6;">:path</span> (caddr project)))
         (path    (tapoueh-path-to-root))
         (dirs    (split-string (file-relative-name current root) <span style="color: #bc8f8f;">"/"</span>)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">dirs looks like ("blog" "2011" "07" "13-back-from-char11.muse")</span>
    (append (list (cons <span style="color: #bc8f8f;">"/dev/dim"</span> (concat path <span style="color: #bc8f8f;">"index.html"</span>)))
            (<span style="color: #7f007f;">cl-loop</span> for p in (butlast dirs)
                     collect (cons p (format <span style="color: #bc8f8f;">"%s%s/index.html"</span> path p))
                     do (setq path (concat path p <span style="color: #bc8f8f;">"/"</span>))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb-div</span> ()
  <span style="color: #bc8f8f;">"The real HTML inserting"</span>
  (insert <span style="color: #bc8f8f;">"&lt;div id=\"breadcrumb\"&gt;"</span>)
  (<span style="color: #7f007f;">cl-loop</span> for (name . link) in (tapoueh-breadcrumb-to-current-page)
           do (insert (format <span style="color: #bc8f8f;">"&lt;a href=\"%s\"&gt;%s&lt;/a&gt;"</span> link name) <span style="color: #bc8f8f;">" / "</span>))
  (insert <span style="color: #bc8f8f;">"&lt;/div&gt;\n"</span>))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb</span> ()
  <span style="color: #bc8f8f;">"Must run with current buffer being a muse article"</span>
  (<span style="color: #7f007f;">save-excursion</span>
    (goto-char (point-min))
    (<span style="color: #7f007f;">when</span> (tapoueh-extract-directive <span style="color: #bc8f8f;">"author"</span> (muse-current-file))
      (re-search-forward <span style="color: #bc8f8f;">"&lt;body&gt;"</span> nil t) <span style="color: #b22222;">; </span><span style="color: #b22222;">find where the article content is</span>
      (re-search-forward <span style="color: #bc8f8f;">"&lt;h2&gt;"</span> nil t)   <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the title line</span>
      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div)

      (re-search-forward <span style="color: #bc8f8f;">"&lt;h2&gt;"</span> nil t 2) <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the TAG line</span>
      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div)))) </pre>

<p>This code is now called in the :after function of my <a href="http://www.emacswiki.org/emacs/EmacsMuse">Muse</a> project style, and it gets the work done.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 14 Jul 2011 18:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</guid> </item> <item> <title>Elisp Breadcrumbs</title> <link>http://tapoueh.org/blog/2011/07/14-elisp-breadcrumbs.html</link> <description><![CDATA[ (open-line 1) (tapoueh-insert-breadcrumb-div)))) </pre>

<p>This code is now called in the :after function of my <a href="http://www.emacswiki.org/emacs/EmacsMuse">Muse</a> project style, and it gets the work done.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a></p> </div>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 14 Jul 2011 18:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/14-elisp-breadcrumbs.html</guid> </item> <item> <title>Back from CHAR(11)</title> <link>http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</link> <description><![CDATA[<h1>Back from CHAR(11)</h1>

Wednesday, July 13 2011, 17:30 </div>
<p>What better way to spend the train ride back from <a href="http://char11.org/schedule">CHAR(11)</a> than to turn reporter for the occasion? Actually, sleeping would be a good idea, given how late the evenings went!</p>

<p>We had the pleasure of listening to <strong><em>Jan Wieck</em></strong> present a simplified history of replication with <a href="http://www.postgresql.org/">PostgreSQL</a>. As he is himself one of the pioneers of the field, his point of view is most interesting. He talked about the evolution of replication solutions, and I can't help thinking that in many ways <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>; Jan, the author of Slony, seemed to agree with that.</p> <p>Indeed, Skytools was born out of Slony's limitations. Some of them still exist, such as the lack of separation between the <strong><em>queuing</em></strong> layer and the replication layer itself, and some have been solved since, such as the difficulty of sustaining heavy write loads. And the two solutions even share part of their implementation since PostgreSQL 8.3, with the txid and <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> data types. Of course, the goal of Skytools is to be as simple a solution as possible, perfectly suited to a precise and bounded set of use cases, whereas Slony tries to automatically solve the hardest problems of the field, at the price of a very complex interface.</p> <p>Of course, <strong><em>Jan</em></strong> took the time to objectively compare those replication solutions with the one built into PostgreSQL, <em>Streaming Replication</em> and <em>Hot Standby</em>. We already had asynchronous binary replication; PostgreSQL 9.1 brings us synchronous replication with per-transaction control.
<a href="http://database-explorer.blogspot.com/">Simon Riggs</a>, the author of that feature, stressed how innovative it is: no other project lets you control the data durability guarantee with such flexible and precise granularity!</p> <p><a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> is an administration solution for <em>clusters</em> running <em>Hot Standby</em> and <em>Streaming Replication</em> (synchronous or not). <strong><em>Greg Smith</em></strong> and <strong><em>Cédric Villemain</em></strong> detailed how it works. The former showed how to set up an architecture that spreads the read load, and the latter how to get a fault-tolerant system thanks to the automatic <em>failover</em> built into repmgr. This innovative solution was largely developed by 2ndQuadrant France, and we have already stamped it <em>production ready</em>.</p> <p><strong><em><a href="http://www.hagander.net/">Magnus Hagander</a></em></strong> did a lot of work on the <em>streaming</em> protocol used by the replication built into PostgreSQL 9.1, as well as on the tools using that protocol. He naturally presented that work, and the idea of a <em>proxy</em> relaying the binary stream of transaction logs came back in the discussions (we had already considered it in 2010; the article <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> contains some elements on the topic).
With synchronous replication, it becomes possible to design advanced, robust and versatile architectures: the proxy could now take care of both the archives and the <em>standby</em> servers.</p> <p><a href="http://database-explorer.blogspot.com/">Simon Riggs</a> then gave us a retrospective of the last 7 years of his work on PostgreSQL, from the implementation of <em>Point in Time Recovery</em> to synchronous replication, by way of <em>Hot Standby</em>. What we have in PostgreSQL 9.0 already matches the most advanced data durability features Oracle offers, and 9.1 takes the next step. That doesn't slow <strong><em>Simon</em></strong> down in the least: he was already talking about the projects to come over the next 10 years.</p> <p>Finally, <a href="http://www.heroku.com/">Heroku</a> presented their amazing company. They now have more than 150,000 PostgreSQL instances in production, showing that our favorite DBMS is ready for hosting providers. <strong><em>Heroku</em></strong> is designing and building a turnkey solution for the famous, so-hard-to-define <em>Cloud</em>. Here, that means being able to add new read-only replicas on the fly to absorb traffic peaks, create development instances in one click, etc.</p> <p>This article only covers a small selection of the topics discussed at the conference. I'm counting on <a href="http://blog.guillaume.lelarge.info/">Guillaume</a> to tell you about <a href="http://char11.org/schedule">CHAR(11)</a> as well, but you may have to wait until he's back from the <a href="http://2011.rmll.info/">RMLL</a> (what energy!).</p> <h2>Tags</h2> <p><a href="../../../tags/postgresqlfr.html">PostgreSQLFr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">Skytools</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 13 Jul 2011 17:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</guid> </item> <item> <title>Back From CHAR(11)</title> <link>http://tapoueh.org/blog/2011/07/13-back-from-char11.html</link> <description><![CDATA[<h1>Back From CHAR(11)</h1>

Wednesday, July 13 2011, 17:15 </div>
<p><a href="http://char11.org/schedule">CHAR(11)</a> finished sometime during the night leading to today, if you consider the <em>social events</em> to be part of it, which I definitely do. This conference was a very good one, both on the organisation side of things and, of course, for its content.</p>

<p>It began with a perspective on the evolution of replication solutions, by <strong><em>Jan Wieck</em></strong> himself. In some ways <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>, in the sense that it reuses the same concepts and part of the design, and even shares bits of the implementation (like the <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> datatype that was added in PostgreSQL 8.3). The evolution consisted of choosing a subset of Slony's features and then simplifying the user interface as much as possible. And with Skytools 3.0, the features that were removed but are still useful to solve real-life problems are now available too.</p> <p>Of course the talk also covered the other replication solutions (not just the trigger-based ones), and compared <a href="http://wiki.postgresql.org/wiki/Setting_up_RServ_with_PostgreSQL_7.0.3">RServ</a> to <a href="http://bucardo.org/">Bucardo</a>, for example. Then all of those were compared to the <a href="http://www.postgresql.org/">PostgreSQL</a> core replication facilities, which are quite a different animal. It was a really nice <em>keynote</em>, preparing the audience's minds to make the most of all the other talks.</p> <p>I will not review all the talks in detail, as I'm pretty sure some other attendees will turn into reporters themselves: scaling the write load!</p> <p>Still, <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> got its share of attention. 
<a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">Greg Smith</a> and <a href="http://www.2ndquadrant.fr/">Cédric Villemain</a> presented how to do both <strong>read scaling</strong> and <strong>auto failover</strong> management with this tool, going into fine detail about how it works internally and how best to design your services architecture for maximum <strong>data availability</strong>. The questions and answers session insisted on the fact that you cannot have data availability with fewer than 3 production nodes.</p> <p><a href="http://www.hagander.net/">Magnus Hagander</a> detailed how flexible the core protocol support for replication (and streaming) really is. That flexibility means that you can quite easily speak this protocol from any application, and the idea of a <em>WAL proxy</em> popped up again (see the <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> article for my first mention of the idea). The main difference is that we now have <em>synchronous replication</em> support, so the proxy could be trusted both for archiving and for serving standbys.</p> <p>Of course <a href="http://database-explorer.blogspot.com/">Simon</a> still has lots of ideas for the next 10 years of replication-oriented projects in core PostgreSQL, and his talk nicely summarized the previous 7 years. The future is bright, and guess what, it begins today!</p> <p>We also heard about <a href="http://www.heroku.com/">Heroku</a>, and these guys are doing crazy impressive things, like running 150,000 PostgreSQL instances, showing that you can actually use our preferred database server in the hosting business. I expect the maturing solutions and tool sets providing data availability to soon be a game changer here. What they are doing is designing a <strong>flexible data architecture</strong> with strong guarantees (<strong>no data loss</strong>). 
<em>Cloud elasticity</em> is reaching beyond stateless services, and <em>those guys</em> are making it happen now.</p> <p>May you live in interesting times!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/conferences.html">Conferences</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 13 Jul 2011 17:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-back-from-char11.html</guid> </item> <item> <title>Muse setup revised</title> <link>http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</link> <description><![CDATA[<p>Most of you are probably reading my posts directly in your RSS reader (mine is <a href="http://www.gnus.org/">gnus</a> thanks to the <a href="http://gwene.org/">Gwene</a> service), so you probably missed it, but I just <em>pushed</em> a whole new version of <a href="http://tapoueh.org">my website</a>, still using <a href="https://github.com/alexott/muse">Emacs Muse</a> as the engine.</p>

<p>My setup is tentatively called <a href="../../../tapoueh.el.html">tapoueh.el</a> and is browsable online. It consists of some tweaks on top of Muse, so that I can enjoy <a href="../../../tags/index.html">tags</a> and proper <a href="../../../rss/">rss</a> support. By <em>proper</em>, I mean that I want to be able to produce as many <em>topic</em> RSS <em>feeds</em> as I like from a single <em>blog</em>, and thanks to the <em>tags</em> support that's now what I have.</p> <p>The RSS handling and the tagging system are ad hoc code, and this very article begins like this:</p> <pre class="src"> Dimitri Fontaine
Muse setup revised
20110705-19:55
Emacs Muse </pre> <p>All the information for the site navigation is taken from there, and at long last the RSS I publish contains proper URLs without abusing <a href="../../../blog.dim.html">anchors</a>, as in the previous link, which is a compatibility page in case you had some bookmarks. The compatibility page only works with JavaScript (did you know that <em>anchors</em> are not part of the URL sent to the server, so that you can't apply RedirectMatch or other server-side tweaks?), but all it needs is <em>2 lines of code</em>, so I guess that's not so bad.</p> <pre class="src"> <span style="color: #fcaf3e;">var</span> <span style="color: #fce94f;">anchor</span> = window.location.hash;
document.location.href=document.getElementById(anchor).href; </pre> <p>I hope you like the new setup as much as I do, even if I'm left with some debugging to do. That's the price to pay for doing it yourself, I guess. But I still don't know of a ready-to-use solution (as in <em>off the shelf</em>) that meets my criteria for web publishing. More on that topic another time.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 05 Jul 2011 19:55:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</guid> </item> <item> <title>Ready for CHAR(11)?</title> <link>http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</link> <description><![CDATA[<h1>Ready for CHAR(11)?</h1>

Monday, July 04 2011, 20:15 </div>
<p><a href="http://www.char11.org/">CHAR(11)</a>, the conference dedicated to <em>Clustering</em>, <em>High Availability</em> and <em>Replication</em> with <a href="http://www.postgresql.org/">PostgreSQL</a>, is <strong>already</strong> next week. It's in Europe, in Cambridge this time, and it's in English, even though several of our compatriots will be in the audience.</p>

<p>If you haven't looked at the <a href="http://www.char11.org/schedule">schedule</a> yet, I encourage you to do so, even if you weren't planning to come… because there's enough there to change your mind!</p> <p>Following the <a href="http://archives.postgresql.org/">PostgreSQL mailing lists</a> in English is already hard, simply for lack of time, and sometimes the language barrier gets in the way too. So in case you haven't been following closely, let me introduce the main speakers at this conference.</p> <p><strong><em>Jan Wieck</em></strong> gives the opening talk, a retrospective of replication solutions for PostgreSQL. He started <a href="http://slony.info/">Slony</a> and remains very active in its architecture and development.</p> <p><strong><em>Greg Smith</em></strong>, a colleague at <a href="http://www.2ndquadrant.us/">2ndQuadrant</a>, is Mr. "low-level" performance: his specialty is getting the most out of your hardware, your server configuration, PostgreSQL itself, and the queries you send it. 
His book <a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">PostgreSQL High Performance</a> is a must-read, and as such it has been <a href="http://blog.guillaume.lelarge.info/index.php/post/2011/05/01/%C2%AB-Bases-de-donn%C3%A9es-PostgreSQL&#44;-Gestion-des-performances-%C2%BB">translated into French</a>.</p> <p>Next we have <strong><em>Magnus Hagander</em></strong>, who recently joined the <em>core team</em> (the project's central organization) and has been contributing to the PostgreSQL code for more than 10 years.</p> <p><strong><em>Simon Riggs</em></strong>, also one of <a href="http://www.2ndquadrant.com/about/#riggs">our colleagues</a>, implemented <em>PITR</em>, transaction log archiving, asynchronous replication and, for the next PostgreSQL release, synchronous replication.</p> <p><strong><em>Hannu Krosing</em></strong> (guess <a href="http://www.2ndquadrant.com/">where</a> he works?) designed the architecture (and the tools) that allow <a href="http://www.skype.com/">Skype</a> to claim infinite scalability, or at least scalability announced to support up to <a href="http://highscalability.com/skype-plans-postgresql-scale-1-billion-users">1 billion users</a>.</p> <p><strong><em>Koichi Suzuki</em></strong> leads the effort behind the promising <a href="http://postgres-xc.sourceforge.net/">Postgres-XC</a> product, a fine example of collaboration between different market players, here <a href="http://www.enterprisedb.com/">EnterpriseDB</a> and <a href="https://www.oss.ecl.ntt.co.jp/ossc/">NTT Open Source Software Center</a>. This shows once again that <a href="http://fr.wikipedia.org/wiki/Open_source">Open Source</a> is firmly rooted in commercial companies.</p> <p>Of course, Cédric and I, from the French side of <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>, will be there. 
We will talk about topics we know well, having taken part in their development and deploying and maintaining them in production: <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> and <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>.</p> <p>And I'm skipping other speakers, whose talks will be no less interesting. In short, if <em>replication</em> and <em>clustering</em> are things you want to combine with PostgreSQL, that's the place to spend the beginning of next week!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresqlfr.html">PostgreSQLfr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Jul 2011 20:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</guid> </item>

<item>

<title>Multi-Version support for Extensions</title> <link>http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</link> <description><![CDATA[<h1>Multi-Version support for Extensions</h1>

Wednesday, June 29 2011, 09:50 </div>
<p><span class="hack"> </span></p>

<p>We still have this problem to solve with extensions and their packaging: how do you best organize things so that your extension is compatible both with releases of <a href="http://www.postgresql.org/">PostgreSQL</a> before 9.1 and with 9.1 and following releases?</p> <p>Well, I had to do it for the <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> contribution, and I wanted the following to happen:</p> <pre class="src"> dpkg-deb: building package `postgresql-8.3-ip4r' ...
dpkg-deb: building package `postgresql-8.4-ip4r' ...
dpkg-deb: building package `postgresql-9.0-ip4r' ...
dpkg-deb: building package `postgresql-9.1-ip4r' ... </pre> <p>And here's a simple enough way to achieve that. First, you have to get your packaging ready the usual way, and install the build dependencies. Then, since /usr/share/postgresql-common/supported-versions from the latest postgresql-common package only returns 8.3 on lenny (yes, I'm doing some <em>backporting</em> here), we have to tweak it.</p> <pre class="src"> postgresql-server-dev-8.4 postgresql-server-dev-9.0 postgresql-server-dev-9.1 postgresql-server-dev-all

$ sudo dpkg-divert \
    --divert /usr/share/postgresql-common/supported-versions.distrib \
    --rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#!/bin/bash

dpkg -l postgresql-server-dev-* \
    | awk -F '[ -]' '/^ii/ &amp;&amp; ! /server-dev-all/ {print $6}' </pre>
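<p>To make the awk filter above easier to read, here is a minimal Python sketch of the same selection logic, assuming dpkg -l output lines that start with a two-letter status such as ii. It swaps awk's field splitting for a regular expression, and both the supported_versions helper and the sample lines are hypothetical, for illustration only.</p>

```python
import re

# Hypothetical helper mirroring the awk filter: an installed ("ii")
# postgresql-server-dev-X.Y package yields its X.Y version. The -all
# metapackage never matches, because "all" is not an X.Y version.
SERVER_DEV = re.compile(r"^ii\s+postgresql-server-dev-(\d+\.\d+)\s")


def supported_versions(dpkg_lines):
    """Return the PostgreSQL versions whose server-dev package is installed."""
    versions = []
    for line in dpkg_lines:
        match = SERVER_DEV.match(line)
        if match:
            versions.append(match.group(1))
    return versions


if __name__ == "__main__":
    # Made-up sample of "dpkg -l" lines; "rc" means removed, config kept.
    sample = [
        "ii  postgresql-server-dev-8.3  (version)  (description)",
        "ii  postgresql-server-dev-8.4  (version)  (description)",
        "rc  postgresql-server-dev-9.0  (version)  (description)",
        "ii  postgresql-server-dev-9.1  (version)  (description)",
        "ii  postgresql-server-dev-all  (version)  (description)",
    ]
    for version in supported_versions(sample):
        print(version)
```

<p>Only installed server-dev packages carrying an X.Y version survive, so the -all metapackage and removed packages are skipped, which is what the awk script gets from its /^ii/ and ! /server-dev-all/ patterns.</p>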

<p>Now we are allowed to build our extension for all those versions, so we add 9.1 to the debian/pgversions file. And debuild will now do the right thing, thanks to <a href="http://manpages.debian.net/cgi-bin/man.cgi?query=pg_buildext">pg_buildext</a> from <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a>.</p> <p>The problem we face is that the build does not produce an <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extension</a> in the 9.1 sense, so things like \dx in psql and <a href="http://www.postgresql.org/docs/9.1/static/sql-createextension.html">CREATE EXTENSION</a> will not work out of the box. First, we need a control file. Then we need to remove the transaction control from the install script (here, ip4r.sql), and finally, this script needs to be called ip4r--1.05.sql. Here's how I did it:</p> <pre class="src"> $ cat ip4r.control
comment = 'IPv4 and IPv4 range index types'
default_version = '1.05'
relocatable = yes

$ cat debian/postgresql-9.1-ip4r.install
debian/ip4r-9.1/ip4r.so usr/lib/postgresql/9.1/lib
ip4r.control usr/share/postgresql/9.1/extension
debian/ip4r-9.1/ip4r.sql usr/share/postgresql/9.1/extension

$ cat debian/postgresql-9.1-ip4r.links
usr/share/postgresql/9.1/extension/ip4r.sql usr/share/postgresql/9.1/extension/ip4r--1.05.sql </pre>
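<p>The naming rule at work here is that CREATE EXTENSION looks up the script as name--version.sql in the extension directory, with the version taken from default_version in the control file. The following Python sketch derives the .links entry from the control file, assuming the simple key = value layout shown above; parse_control and links_entry are hypothetical helpers, not part of pg_buildext or debhelper.</p>

```python
def parse_control(text):
    """Parse a minimal control file made of key = value lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and single quotes around the value.
        settings[key.strip()] = value.strip().strip("'")
    return settings


def links_entry(extension, control_text, pg_version):
    """Build the debian/*.links line mapping ext.sql to ext--VERSION.sql."""
    version = parse_control(control_text)["default_version"]
    extdir = f"usr/share/postgresql/{pg_version}/extension"
    return f"{extdir}/{extension}.sql {extdir}/{extension}--{version}.sql"


if __name__ == "__main__":
    control = (
        "comment = 'IPv4 and IPv4 range index types'\n"
        "default_version = '1.05'\n"
        "relocatable = yes\n"
    )
    print(links_entry("ip4r", control, "9.1"))
```

<p>With the ip4r.control contents above, this prints the same line as debian/postgresql-9.1-ip4r.links, so bumping default_version is the only change needed to rename the script on the next release.</p>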

<p>Be careful to remove any and all BEGIN; and COMMIT; lines from the ip4r.sql file, which meant that I also removed support for <em>Rtree</em>, which according to the script is not relevant for modern versions of PostgreSQL (post 8.2). That means I'm not publishing this very work yet, but I wanted to share the debian/postgresql-9.1-extension.links idea.</p> <p>Notice that I didn't change anything about the .sql.in make rule, so I didn't have to use the support for module_pathname in the control file.</p> <p>Now, after the usual debuild step, I can just sudo debi to install all the just-built packages, and CREATE EXTENSION will run fine. And in 9.0 you get the old way to install it, but it still works:</p> <pre class="src"> $ psql -U postgres --cluster 9.0/main -1 \
    -f /usr/share/postgresql/9.0/contrib/ip4r.sql
&lt;lots of chatter&gt;

$ psql -U postgres --cluster 9.1/main -c 'create extension ip4r;'
CREATE EXTENSION </pre>
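<p>CREATE EXTENSION executes the script inside a transaction block, which is why the transaction control commands have to go. A minimal Python sketch of the "remove any and all BEGIN; and COMMIT; lines" step follows; it assumes those commands sit alone on their own lines, as described for ip4r.sql, and strip_transaction_control is a hypothetical helper name. It would not catch variants such as BEGIN WORK; or END;.</p>

```python
import re

# Matches a line holding only BEGIN; or COMMIT; (any case, optional spaces).
TRANSACTION_LINE = re.compile(r"^\s*(BEGIN|COMMIT)\s*;\s*$", re.IGNORECASE)


def strip_transaction_control(sql):
    """Return the script text without its standalone BEGIN;/COMMIT; lines."""
    kept = [line for line in sql.splitlines()
            if not TRANSACTION_LINE.match(line)]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    script = "BEGIN;\nCREATE TYPE ip4;\ncommit ;\n"
    print(strip_transaction_control(script), end="")
```

<p>Matching whole lines rather than searching for the keywords anywhere keeps statements such as function bodies that merely mention BEGIN untouched.</p>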

<p>That's it :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 29 Jun 2011 09:50:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</guid> </item> <item> <title>Don't be afraid of 'cl</title> <link>http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</link> <description><![CDATA[<h1>Don't be afraid of 'cl</h1>

Monday, June 20 2011, 00:15 </div>
<p>In this <a href="http://tsengf.blogspot.com/2011/06/confirm-to-quit-when-editing-files-from.html">blog article</a>, you're shown a rather long function that loops through your buffers to find out if any of them is associated with a file whose full name includes &quot;projects&quot;. Well, you should not be afraid of using cl:</p>

<pre class="src"> (<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">cl</span>)
(<span style="color: #729fcf; font-weight: bold;">loop</span> for b being the buffers
      when (string-match <span style="color: #ad7fa8; font-style: italic;">"projects"</span> (or (buffer-file-name b) <span style="color: #ad7fa8; font-style: italic;">""</span>))
      return t) </pre>
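<p>The loop above stops at the first matching buffer thanks to return t, while the collect b variant gathers every match. For readers more at home outside Emacs Lisp, here is a rough Python analogy of those two forms, using hypothetical file names, where None plays the part of (buffer-file-name b) returning nil. It tests for a plain substring, which agrees with string-match here only because "projects" contains no regexp special characters, and it collects file names where the Lisp version collects buffers.</p>

```python
def any_project_buffer(file_names):
    """Like `when ... return t`: true as soon as one name matches."""
    return any("projects" in (name or "") for name in file_names)


def project_buffers(file_names):
    """Like `when ... collect b`: every matching name, in order."""
    return [name for name in file_names if "projects" in (name or "")]


if __name__ == "__main__":
    # Hypothetical buffer file names; None stands for a non-file buffer.
    names = ["/home/me/projects/app.el", None, "/etc/hosts"]
    print(any_project_buffer(names))  # True
    print(project_buffers(names))     # ['/home/me/projects/app.el']
```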

<p>If you want to collect the list of buffers whose file name matches your test, then replace return t with collect b and you're done. Really, this loop thing is worth learning.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Jun 2011 00:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</guid> </item>

<item>

<title>Back from Ottawa, preparing for Cambridge</title> <link>http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</link> <description><![CDATA[<h1>Back from Ottawa, preparing for Cambridge</h1>

Monday, May 30 2011, 11:00 </div>
<p><span class="hack"> </span></p>

<p>While <a href="http://blog.hagander.net/">Magnus</a> is already all about <a href="http://2011.pgconf.eu/">PG Conf EU</a>, you have to realize we just landed back from <a href="http://www.pgcon.org/2011/">PG Con</a> in Ottawa. My next stop among the annual conferences is <a href="http://char11.org/">CHAR 11</a>, the <em>Clustering, High Availability and Replication</em> conference in Cambridge, 11-12 July. Yes, on the old continent this time.</p> <p>This year's <em>pgcon</em> hot topics, for me, have centered on getting a better grasp of <a href="http://www.postgresql.org/docs/9.1/static/transaction-iso.html#XACT-SERIALIZABLE">SSI</a> and <em>DDL Triggers</em>. Having those beasts in <a href="http://www.postgresql.org/">PostgreSQL</a> would allow for auditing, finer privilege management and more automated replication facilities. Imagine that ALTER TABLE is able to fire a <em>trigger</em>, provided by <em>Londiste</em> or <em>Slony</em>, that will do what's needed on the cluster by itself. That would be awesome, wouldn't it?</p> <p>At <em>CHAR 11</em> I'll be talking about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a>. You know I've been working on its <em>debian</em> packaging; now is the time to review the documentation and make it as good-looking as the monitoring system is...</p> <p>Well, expect some news and a nice big-picture diagram overview soon: if my work schedule leaves me any time, that's what I want to be working on now.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 30 May 2011 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</guid> </item> <item> <title>el-get 2.2</title> <link>http://tapoueh.org/blog/2011/05/26-el-get-22.html</link> <description><![CDATA[<h1>el-get 2.2</h1>

Thursday, May 26 2011, 12:00 </div>
<p>We spotted a discrepancy in the source tree a little too late for our own taste: a work-in-progress patch landed in git just before we released <a href="https://github.com/dimitri/el-get">el-get</a> stable. So we cleaned the tree (thanks again <a href="http://julien.danjou.info/">Julien</a>), branched a stable maintenance tree, and released 2.2 from there.</p>

<p>You're back to enjoying el-get :)</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 12:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-22.html</guid> </item> <item> <title>el-get 2.1</title> <link>http://tapoueh.org/blog/2011/05/26-el-get-21.html</link> <description><![CDATA[<h1>el-get 2.1</h1>

Thursday, May 26 2011, 10:00 </div>
<p>Current <a href="https://github.com/dimitri/el-get">el-get</a> status is stable, ready for daily use and packed with extra features that make life easier. There are some more things we could do, as always, but they will be about smoothing things further.</p>

<h3>Latest released version</h3> <p><a href="https://github.com/dimitri/el-get">el-get</a> version 2.1 is available, with a boatload of features, including autoloads support, byte-compiling in an external <em>clean room</em> <a href="http://www.gnu.org/software/emacs/">Emacs</a> instance, custom support, lazy initialisation support (deferring all <em>init</em> functions to eval-after-load), and multi-repository ELPA support.</p> <h3>Version numbering</h3> <p class="first">Version strings are now inspired by how Emacs itself numbers its versions. First is the major version number, then a dot, then the minor version number. The minor version number is 0 while the next major version is still being developed. So 3.0 is a developer release while 3.1 will be the next stable release.</p> <p>Please note that this versioning policy was picked while preparing 1.2~dev, so 1.0 was in fact a <em>stable</em> release. Ah, history.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-21.html</guid> </item> <item> <title>Preparing for PGCON</title> <link>http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</link> <description><![CDATA[<h1>Preparing for PGCON</h1>

Thursday, May 12 2011, 10:30 </div>
<p>It's this time of the year again: the main international <a href="http://www.pgcon.org/2011/">PostgreSQL Conference</a> is next week in Ottawa, Canada. If previous years are any indication, this will be a great event to meet many members of your community. The core team will be there, developers will be there, and we will meet users and their challenging use cases.</p>

<p>This is a very good time to review both what you did in the project over the last 12 months and what you plan to do next year. To help with that, several <em>meeting</em> events are organized. They're like a whole-day round table with a kind of agenda, a limited number of invited people, and very intense on-topic discussions about how to organize ourselves for another great year of innovation in PostgreSQL.</p> <p>Then we have two days full of talks where I usually learn some new aspect of the project or of the product, and where ideas tend to just pop up in a continuous race. Being away from home and with people you see only once a year (some of them more often than that of course, hi European fellows!) seems to allow for some broader thinking.</p> <p>The talks I want to go to include <a href="http://www.pgcon.org/2011/schedule/events/361.en.html">Database Scalability Patterns: Sharding for Unlimited Growth</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/20.en.html">Robert Treat</a>, <a href="http://www.pgcon.org/2011/schedule/events/366.en.html">Maintaining Terabytes</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/112.en.html">Selena Deckelmann</a>, <a href="http://www.pgcon.org/2011/schedule/events/307.en.html">NTT’s Case Report</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/192.en.html">Tetsuo Sakata</a>, and <a href="http://www.pgcon.org/2011/schedule/events/350.en.html">Hacking the Query Planner</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/202.en.html">Tom Lane</a>. 
That's for a first day, right?</p> <p>Then, on the second day, I notice <a href="http://www.pgcon.org/2011/schedule/events/311.en.html">Range Types</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/83.en.html">Jeff Davis</a>, <a href="http://www.pgcon.org/2011/schedule/events/309.en.html">SP-GiST - a new indexing infrastructure for PostgreSQL</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/29.en.html">Oleg</a> and <a href="http://www.pgcon.org/2011/schedule/speakers/33.en.html">Teodor</a>, and <a href="http://www.pgcon.org/2011/schedule/events/337.en.html">The Write Stuff</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/110.en.html">Greg Smith</a> (a colleague at <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>).</p> <p>Unfortunately, I will miss <a href="http://www.pgcon.org/2011/schedule/events/333.en.html">Serializable Snapshot Isolation in Postgres</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/113.en.html">Kevin Grittner</a> and <a href="http://www.pgcon.org/2011/schedule/speakers/197.en.html">Dan Ports</a>, because I'll be talking about <a href="http://www.pgcon.org/2011/schedule/events/280.en.html">Extensions Development</a> at the same time.</p> <p>Well, of course this list is just a first selection: the hallway track is often what guides me to some talks or makes me skip others.</p> <p>See you there!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/extensions.html">Extensions</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 12 May 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</guid> </item> <item> <title>Mailq modeline display</title> <link>http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</link> <description><![CDATA[<h1>Mailq modeline display</h1>

<p>Thursday, May 05 2011, 14:10</p>
<p>If you've not been following along, you might have missed it: it appears to me that even today, in 2011, mail systems work much better when set up the old way, meaning with a local <a href="http://en.wikipedia.org/wiki/Mail_Transfer_Agent">MTA</a> for outgoing mail, plus some niceties such as <a href="http://tapoueh.org/articles/news/_Postfix_sender_dependent_relayhost_maps.html">sender dependent relayhost maps</a>.</p>

<p>That's why I needed <a href="http://tapoueh.org/projects.html#sec21">M-x mailq</a> to display the <em>mail queue</em> and to provide some easy shortcuts to operate it (mainly f, which runs the command mailq-mode-flush, but per-site and per-id delivery are useful too).</p> <p>Now, I also happen to route outgoing mail through an <em>SSH tunnel</em>, which, thanks to both <a href="http://www.manpagez.com/man/5/ssh_config/">~/.ssh/config</a> and <a href="https://github.com/dimitri/cssh">cssh</a> (C-= runs the command cssh-term-remote-open, with completion), is only a couple of keystrokes away. Still, I sometimes forget to start it, which leaves mail held in the queue until I realise it hasn't been delivered, and that always takes just a little too long.</p> <p>The solution I came up with is to add a little flag to the <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Mode-Line-Format.html">modeline</a> of my <a href="http://www.gnus.org/">gnus</a> *Group* and *Summary* buffers. The flag shows up as ✔ when no mail is queued waiting for me to open the tunnel, and as ✘ as soon as the queue is not empty. Here's what it looks like:</p> <center> <p><img src="../../../images//mailq-modeline-display.png" alt=""></p> </center> <p>I'm pretty happy with this setup. The flag is refreshed every minute, and here's, as an example, how I set up mailq in my <a href="https://github.com/dimitri/el-get">el-get-sources</a>:</p> <pre class="src">

(<span style="color: #729fcf;">:name</span> mailq <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (mailq-modeline-display))) </pre>
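<p>For readers who would like the same kind of indicator outside of Emacs, the underlying check is simple. Here is a minimal Python sketch, not the actual mailq.el code: it assumes a Postfix-style mailq command that prints "Mail queue is empty" when nothing is waiting, and maps the output to the ✔ or ✘ flag. The function names are hypothetical.</p>

```python
import subprocess

# Assumption: Postfix's mailq prints this line when nothing is queued.
EMPTY_QUEUE_MARKER = "Mail queue is empty"


def queue_flag(mailq_output: str) -> str:
    """Map mailq output to the modeline flag."""
    if EMPTY_QUEUE_MARKER in mailq_output:
        return "✔"  # nothing waiting for the SSH tunnel
    return "✘"      # queue not reported empty: mail is waiting (or mailq failed)


def current_flag() -> str:
    """Run mailq once and return the flag (hypothetical helper)."""
    result = subprocess.run(["mailq"], capture_output=True, text=True, check=False)
    return queue_flag(result.stdout)
```

<p>Refreshing every minute is then a matter of calling current_flag() periodically; inside Emacs, a timer such as run-with-timer would play that role.</p>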

<p>I'm not sure how many of you, dear readers, use a local MTA to deliver your mail, but those who do (or are considering it) might find this article useful!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/modeline.html">modeline</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 05 May 2011 14:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</guid> </item> <item> <title>Tables and Views dependencies</title> <link>http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</link> <description><![CDATA[<h1>Tables and Views dependencies</h1>

<p>Wednesday, May 04 2011, 11:45</p>

<p>Let's say you need to <code>ALTER TABLE foo ALTER COLUMN bar TYPE bigint;</code>, and <a href="http://postgresql.org">PostgreSQL</a> helpfully tells you that no, you can't, because such and such <em>views</em> depend on the column. The basic way to deal with that is to copy and paste the names of the views involved from the error message, then prepare a script wherein you first <code>DROP VIEW ...;</code>, then <code>ALTER TABLE</code>, and finally <code>CREATE VIEW</code> again, all in the same transaction.</p> <p>So you also have to copy and paste the view definitions. With large view definitions, that quickly gets cumbersome. And when you're working in operations, you have to bear in mind that cumbersome is in fact a synonym for <em>error prone</em>, so you want another solution if possible.</p> <p>Oh, and the other drawback of this solution is that <code>ALTER TABLE</code> will first take a <code>LOCK</code> on the table, locking out any activity. Worse, the lock acquisition will queue behind current activity on the table, which can mean waiting for a fairly long time and degrading the service quality on a moderately loaded server in the meantime.</p> <p>It's also possible to abuse the <a href="http://www.postgresql.org/docs/current/static/catalogs.html">system catalogs</a> in order to find all <em>views</em> that depend on a given table. For that, you have to play with <code>pg_depend</code>, and you have to know that internally, a <em>view</em> is in fact a <em>rewrite rule</em>. Here's how to produce the two scripts that we need:</p> <pre class="src">
# \t
Showing only tuples.
# \o /tmp/drop.sql
# select <span style="color: #ad7fa8; font-style: italic;">'DROP VIEW '</span> || views || <span style="color: #ad7fa8; font-style: italic;">';'</span>
    from (select distinct(r.ev_class::regclass) as views
            from pg_depend d
                 join pg_rewrite r on r.oid = d.objid
           where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass
             and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass
             and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass
             and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;

# \o /tmp/create.sql
# select <span style="color: #ad7fa8; font-style: italic;">'CREATE VIEW '</span> || views || E<span style="color: #ad7fa8; font-style: italic;">' AS \n'</span>
         || pg_get_viewdef(views, true) || <span style="color: #ad7fa8; font-style: italic;">';'</span>
    from (select distinct(r.ev_class::regclass) as views
            from pg_depend d
                 join pg_rewrite r on r.oid = d.objid
           where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass
             and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass
             and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass
             and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;

# \o </pre> <p>Replace <code>SCHEMA.TABLENAME</code> and <code>COLUMN_NAME</code> with your targets, and the first query should give you one row per candidate view. That is, if you're not using the <code>\o</code> trick; if you are, check the generated file instead, for example with <code>\! cat /tmp/drop.sql</code>.</p> <p>Please note that this catalog query is not precise: it selects as a candidate any view that both depends on the target table and happens to contain <code>COLUMN_NAME</code> somewhere in its text definition. So either filter the candidates properly (proofread them, then add another <code>WHERE</code> clause), or just accept that you might <code>DROP</code> then <code>CREATE</code> again more <em>views</em> than needed.</p> <p>If you need more details about the <code>\t \o</code> sequence, you might be interested in this older article about <a href="http://tapoueh.org/articles/blog/_Resetting_sequences._All_of_them&#44;_please&#33;.html">resetting sequences</a>.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/catalogs.html">catalogs</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 04 May 2011 11:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</guid> </item> <item> <title>Extension module_pathname and .sql.in</title> <link>http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</link> <description><![CDATA[<h1>Extension module_pathname and .sql.in</h1>

<p>Monday, May 02 2011, 17:30</p>

<p>While I'm currently too busy at work to deliver much in the way of Open Source contributions, let's debunk an old habit of <a href="http://www.postgresql.org/">PostgreSQL</a> extension authors. It all comes down to copy-pasting from <em>contrib</em>, and there has been no reason to keep handling $libdir this way since the 7.4 days.</p> <p>Let's take an example, with the <a href="https://github.com/dimitri/prefix">prefix</a> extension. This one too will need some love, but it's still waiting on my spare-time todo list, sorry about that. So, in prefix.sql.in we read:</p> <pre class="src">

CREATE OR REPLACE FUNCTION prefix_range_in(cstring)
  RETURNS prefix_range
  AS <span style="color: #ad7fa8; font-style: italic;">'MODULE_PATHNAME'</span>
  LANGUAGE <span style="color: #ad7fa8; font-style: italic;">'C'</span> IMMUTABLE STRICT; </pre>
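<p>The MODULE_PATHNAME placeholder is only a text token that the build typically replaces with a $libdir path. As a hypothetical illustration, here is a minimal Python helper doing that substitution once, so that a plain prefix.sql can live in the repository instead of the .sql.in template. The function names are made up, and the module name prefix is taken from this example.</p>

```python
from pathlib import Path

PLACEHOLDER = "MODULE_PATHNAME"


def render_sql(template: str, module: str) -> str:
    """Replace each MODULE_PATHNAME token with $libdir/ plus the module name."""
    return template.replace(PLACEHOLDER, f"$libdir/{module}")


def convert(sql_in: Path, module: str) -> Path:
    """Write e.g. prefix.sql next to prefix.sql.in and return its path."""
    target = sql_in.with_suffix("")  # prefix.sql.in -> prefix.sql
    target.write_text(render_sql(sql_in.read_text(), module))
    return target
```

<p>For instance, convert(Path("prefix.sql.in"), "prefix") would write a prefix.sql whose functions say AS '$libdir/prefix', ready to be committed and shipped as is.</p>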

<p>Two things need to change here. First, the PostgreSQL <em>backend</em> understands just fine if you simply say AS '$libdir/prefix'. That requires the SQL script to know the name of the shared object library; second, once it does, you can directly maintain a prefix.sql script instead of a .sql.in template.</p> <p>The advantage is that you avoid a compatibility problem when you want your extension to support PostgreSQL from 8.2 to 9.1 (anything older is <a href="http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy">no longer supported</a>): you just ship your script.</p> <p>For compatibility, you could also use the <a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">control file</a> module_pathname property. But with 9.1 you then have to add an implicit Make rule so that the script is derived from your .sql.in. And as you are managing several scripts, so that you can handle <em>versioning</em> and <em>upgrades</em>, it can get hairy (<em>hint</em>: you need to copy prefix.sql as prefix--1.1.1.sql, rename it at the next revision, and think about <em>upgrade</em> scripts too). The module_pathname facility is better kept for when you manage more than a single extension in the same directory, as the <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=contrib/spi/Makefile;h=0c11bfcbbd47b0c3ed002874bfefd9e2022cf5ac;hb=HEAD">SPI contrib</a> does.</p> <p>Sure, maintaining an extension that targets both antique PostgreSQL releases and the <a href="http://developer.postgresql.org/pgdocs/postgres/sql-createextension.html">CREATE EXTENSION</a> super-powered one(s) (not yet released) is a little more involved than that. 
We'll get back to that, as some people are still pioneering the movement.</p> <p>On my side, I'm working with a <a href="http://www.debian.org/">debian</a> <a href="http://qa.debian.org/developer.php?login=myon">developer</a> on how best to manage the packaging of those extensions, and this work could end up as a specialized <em>policy</em> document and a coordinated <em>team</em> of maintainers for all things PostgreSQL in debian. This will also give some more steam to the PostgreSQL effort for debian packages: the idea is to maintain packages for all supported versions (from 8.2 up to the upcoming 9.1), something debian itself cannot commit to.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 02 May 2011 17:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</guid> </item>

<item>

<title>Emacs and PostgreSQL, PL line numbering</title> <link>http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</link> <description><![CDATA[<h1>Emacs and PostgreSQL, PL line numbering</h1>

<p>Saturday, April 23 2011, 10:30</p>

<p>A while ago I fixed <a href="https://github.com/dimitri/pgsql-linum-format">pgsql-linum-format</a> and published it separately. It allows you to number PL/whatever code lines when editing from <a href="http://www.gnu.org/software/emacs/">Emacs</a>, which is very useful to turn on when debugging.</p> <center> <p><img src="../../../images//emacs-pgsql-linum.png" alt=""></p> </center> <p>The carets on the <em>fringe</em> of the Emacs window come from (setq-default indicate-buffer-boundaries 'left), and here they just clutter the screenshot a bit. The idea is simply to M-x linum-mode when you need it; at least that's how I use it.</p> <p>You can use <a href="https://github.com/dimitri/el-get">el-get</a> to easily get (then update) this little Emacs extension.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/pgsql-linum-format.html">pgsql-linum-format</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 23 Apr 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</guid> </item> <item> <title>Emacs Kicker</title> <link>http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</link> <description><![CDATA[<h1>Emacs Kicker</h1>

<p>Friday, April 15 2011, 21:30</p>
<p>Following in the footsteps of the very popular <a href="https://github.com/technomancy/emacs-starter-kit">emacs-starter-kit</a>, I'm now proposing the <a href="https://github.com/dimitri/emacs-kicker">emacs-kicker</a>. It's basically the .emacs file you've seen in older posts here, which I maintain for some colleagues. After all, if they find it useful, other people might too, so I've decided to publish it.</p>

<p>What you'll find is a very simple 128-line <a href="http://www.gnu.org/software/emacs/">Emacs</a> user init file, relying on <a href="https://github.com/dimitri/el-get">el-get</a> for external packages. A not-so-<em>random</em> selection of those is used; here's the list, with some details hidden:</p> <pre class="src">

'(el-get <span style="color: #888a85;">; </span><span style="color: #888a85;">el-get is self-hosting </span>
  escreen <span style="color: #888a85;">; </span><span style="color: #888a85;">screen for emacs, C-\ C-h </span>
  php-mode-improved <span style="color: #888a85;">; </span><span style="color: #888a85;">if you're into php... </span>
  psvn <span style="color: #888a85;">; </span><span style="color: #888a85;">M-x svn-status </span>
  switch-window <span style="color: #888a85;">; </span><span style="color: #888a85;">takes over C-x o </span>
  auto-complete <span style="color: #888a85;">; </span><span style="color: #888a85;">complete as you type with overlays </span>
  emacs-goodies-el <span style="color: #888a85;">; </span><span style="color: #888a85;">the debian addons for emacs </span>
  yasnippet <span style="color: #888a85;">; </span><span style="color: #888a85;">powerful snippet mode </span>
  zencoding-mode <span style="color: #888a85;">; </span><span style="color: #888a85;">http://www.emacswiki.org/emacs/ZenCoding </span>
  (<span style="color: #729fcf;">:name</span> buffer-move <span style="color: #888a85;">; </span><span style="color: #888a85;">move buffers around in windows </span>
  (<span style="color: #729fcf;">:name</span> smex <span style="color: #888a85;">; </span><span style="color: #888a85;">a better (ido like) M-x </span>
  (<span style="color: #729fcf;">:name</span> magit <span style="color: #888a85;">; </span><span style="color: #888a85;">git meet emacs, and a binding </span>
  (<span style="color: #729fcf;">:name</span> goto-last-change <span style="color: #888a85;">; </span><span style="color: #888a85;">move pointer back to last change </span></pre>

<p>Another interesting thing to note in this kicker is a choice of key bindings that are rather unusual (for now), I guess.</p> <pre class="src">
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-b"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-c"</span>) 'ido-switch-buffer)
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x B"</span>) 'ibuffer) </pre> <p>Yes, you see that I've rebound C-x C-c to switching buffers. That key is really easy to reach, and I don't think quitting Emacs deserves it. Keys that are so easy to use should be kept for frequent actions, and quitting Emacs is a once-a-day to once-a-month action here. And you can still quit from the window manager button, from the menu, or from M-x.</p> <p><em>Mac</em> users are not left behind either: some settings are adapted to that system (like choosing another <em>font</em>, keeping the menu-bar displayed, or not installing the darkish tango-color-mode, which renders poorly there in my opinion), as you can see here:</p> <pre class="src">
(<span style="color: #729fcf; font-weight: bold;">if</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
    (set-face-font 'default <span style="color: #ad7fa8; font-style: italic;">"Monaco-13"</span>)
  (set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>))

(<span style="color: #729fcf; font-weight: bold;">when</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
  (setq mac-allow-anti-aliasing t)
  (setq mac-command-modifier 'meta)
  (setq mac-option-modifier 'none)) </pre>

<p>All in all, I don't expect this emacs-kicker to please everyone, but I do expect it to be simple yet rich enough (thanks to <a href="https://github.com/dimitri/el-get">el-get</a>), and it should be a good <em>kick start</em> that's easy to adapt.</p> <p>If you want to try it without installing it, that's easy: just clone the git repository, then start an Emacs that loads its init file. For example, using the excellent <a href="http://emacsformacosx.com/">Emacs For MacOSX</a>:</p> <pre class="src">

$ /Applications/Emacs.app/Contents/MacOS/Emacs -Q -l init.el </pre>

<p>I hope some readers will find it useful! :)</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/emacs-kicker.html">emacs-kicker</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Apr 2011 21:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</guid> </item> <item> <title>Some notes about Skytools3</title> <link>http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</link> <description><![CDATA[<h1>Some notes about Skytools3</h1>

<p>Monday, April 11 2011, 11:30</p>
<p>I've been working on <a href="http://github.com/markokr/skytools">skytools3</a> packaging lately, and I've put quite a lot of work into it, in order to get exactly what I need out of the box after some 3 years of production experience with the products. Plural, yes: even though <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a> and <a href="http://wiki.postgresql.org/wiki/PL/Proxy">plproxy</a> are sibling projects (same development team, separate life cycles and releases), skytools itself still includes several sub-projects.</p>

<p>Here's what the skytools3 packaging is going to look like:</p> <pre class="src">
skytools3            Skytool's replication and queuing
python-pgq3          Skytool's PGQ python library
python-skytools3     python scripts framework for skytools
skytools-ticker3     PGQ ticker daemon service
skytools-walmgr3     high-availability archive and restore commands
postgresql-8.4-pgq3  PGQ server-side code (C module for PostgreSQL)
postgresql-9.0-pgq3  PGQ server-side code (C module for PostgreSQL)
</pre> <p>This split is needed so that you can install your <em>daemons</em> (we call them <em>consumers</em>) on other machines than the ones where you run <a href="http://postgresql.org">PostgreSQL</a>. The walmgr part, however, makes no sense to install if you don't have a local PostgreSQL service, as it provides archive and restore commands. As for the <em>ticker</em>, you're free to run it on any machine really, hence its own package (in skytools3 the <em>ticker</em> is written in C and no longer depends on the python framework).</p> <p>What you can't see here yet are the new goodies that make it a quality debian package. A new skytools user is created for you when you install the skytools3 package (which contains the services), along with a skeleton file /etc/skytools.ini and a user directory /etc/skytools/. Put your services' configuration files in there, and register those services in the /etc/skytools.ini file itself. They will then be taken care of by the init sequence at startup and shutdown of your server.</p> <p>The services run under the skytools system user, and by default put their logs into /var/log/skytools/ and their pidfiles into /var/run/skytools/. All integrated, automated.</p> <p>The next big <em>TODO</em> is documentation, reviewing and polishing it, and then I think skytools3 will be ready for public release. Yes, you read that right, it's happening this very year! 
I'm very excited about it, and have several architectures that will greatly benefit from the switch to skytools3. More on that later, though! (Yes, my <em>to blog later</em> list is getting quite long now).</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/restore.html">restore</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 11 Apr 2011 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</guid> </item>

<item>

<title>towards pg_staging 1.0</title> <link>http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</link> <description><![CDATA[<h1>towards pg_staging 1.0</h1>

<p>Tuesday, March 29 2011, 15:30</p>
<p>In case you don't remember what <a href="pgstaging.html">pg_staging</a> is all about, it's a central console from which to control all your <a href="http://www.postgresql.org/">PostgreSQL</a> databases. Typically you use it to manage your development and pre-production setups, where developers pretty often ask you to install a newer dump from production for them, and you want that operation streamlined and easy.</p>

<center> <p><img src="../../../images//pg_staging.png" alt=""></p> </center> <h3>Usage</h3> <p class="first">The typical session would be something like this:</p> <pre class="src">
pg_staging&gt; databases foodb.dev
foodb            foodb_20100824  :5432
foodb_20100209   foodb_20100209  :5432
foodb_20100824   foodb_20100824  :5432
pgbouncer        pgbouncer       :6432
postgres         postgres        :5432

pg_staging&gt; dbsizes foodb.dev
foodb.dev
foodb_20100209: -1
foodb_20100824: 104 GB
Total = 104 GB

pg_staging&gt; restore foodb.dev
...
pg_staging&gt; switch foodb.dev today
</pre>
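<p>To make the filtering step of the restore command more concrete, here is a minimal Python sketch of what the tablename_nodata_regexp option documented below amounts to. It is not pg_staging's actual code: the filter_nodata and TABLE_DATA names are hypothetical, and it assumes the entry layout printed by pg_restore -l (dump id, table oid, object oid, type, schema, name, owner) with simple unquoted names. Matching TABLE DATA entries get commented out with a leading semicolon, which pg_restore -L then skips.</p>

```python
import re

# A pg_restore -l entry for table contents looks like
#   2063; 0 16386 TABLE DATA public foo dim
# i.e. dump id, table oid, object oid, type, schema, name, owner.
# Assumption: schema and table names are unquoted and contain no spaces.
TABLE_DATA = re.compile(r"^\d+; \d+ \d+ TABLE DATA (\S+) (\S+) ")


def filter_nodata(catalog, nodata_regexp):
    """Comment out the TABLE DATA entries whose schema.table matches.

    nodata_regexp is a comma separated list of regular expressions, each
    applied non-anchored (re.search) to "schemaname.tablename".
    """
    patterns = [re.compile(p.strip()) for p in nodata_regexp.split(",") if p.strip()]
    kept = []
    for line in catalog.splitlines():
        m = TABLE_DATA.match(line)
        if m and any(p.search(m.group(1) + "." + m.group(2)) for p in patterns):
            # pg_restore -L ignores list lines starting with a semicolon.
            line = ";" + line
        kept.append(line)
    return "\n".join(kept)
```

<p>The edited list is meant to be handed back to pg_restore -L, so the TABLE definitions still get created while the contents of the matching tables are skipped. Because the patterns are not anchored, a bare pattern such as foo would match both public.foo and public.foo_log.</p>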

<p>The list of supported commands is quite long now, and documented too (it comes with two man pages). The restore command is the most important one: it creates the database, adds it to the pgbouncer setup, fetches the backup named dbname.`date -I`.dump, prepares a filtered object list (more on that below), loads the <em>pre</em> SQL scripts, launches pg_restore, runs VACUUM ANALYZE on the database when configured to do so, loads the <em>post</em> SQL scripts, then optionally <em>switches</em> the pgbouncer setup to default to this new database.</p> <h3>Filtering</h3> <p class="first">The newest option is called tablename_nodata_regexp, and here's its documentation in full:</p> <blockquote> <p class="quoted"> List of table names regexp (comma separated) to restore without content. The pg_restore catalog TABLE DATA sections will get filtered out. The regexp is applied against schemaname.tablename and non-anchored by default.</p> </blockquote> <p>It supplements the schemas and schemas_nodata options, which allow you to restore only the objects from a given set of <em>schemas</em> (filtering out triggers that call functions living in the excluded schemas, such as <a href="http://wiki.postgresql.org/wiki/Skytools">Londiste</a> ones), or to restore only the TABLE definitions while skipping the TABLE DATA entries.</p> <h3>Setup</h3> <p class="first">Setting up your environment for <em>pg_staging</em> takes a few steps. It's not complex, but it is fairly involved. The benefit is this amazingly useful, unique central console to control as many databases as you need.</p> <p>You need a pg_staging.ini file in which to describe your environment. I typically name the sections in there after the database to restore, followed by a dev or preprod extension.</p> <p>You need to have all your backups available over HTTP, and as of now served by the famous <em>apache</em> mod_dir directory listing. It's easy to add support for other methods, but it has not been done yet. You also need a cluster-wide --globals-only backup available somewhere, so that you can easily create the users and other global objects you need from pg_staging.</p> <p>You also need to run a pgbouncer daemon on each database server, so that you don't have to edit connection strings when you switch a new database version live.</p> <p>You also need to install the <em>client</em> script, have a local pgstaging system user, and allow it to run the client script as root, so that it's able to control some services and edit pgbouncer.ini for you.</p> <h3>Status</h3> <p class="first">I'm still using it a lot (several times a week) to manage a whole set of development and pre-production environments, so the very low <a href="https://github.com/dimitri/pg_staging">code activity</a> of the project tells you that it's pretty stable (the last series of <em>commits</em> are all bug fixes and rounded corners).</p> <p>Given that, I'm thinking in terms of pg_staging 1.0 soon! Now is a pretty good time to try it and see how it can help you.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/pg_staging.html">pg_staging</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 29 Mar 2011 15:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</guid> </item> <item> <title>Extensions in 9.1</title> <link>http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</link> <description><![CDATA[<h1>Extensions in 9.1</h1>

Tuesday, March 01 2011, 16:30 </div>
<p>If you've not been following closely, you might have missed the integration of extensions. Well, <a href="http://en.wikipedia.org/wiki/Tom_Lane_(computer_scientist)">Tom</a> spent some time on the patches I've been preparing for the last 4 months. Not only did he commit most of the work, he also enhanced some parts of the code (better factoring) and basically finished it.</p>

<p>At the <a href="http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting">previous developer meeting</a> his advice was to avoid putting too much into the very first version of the patch, so that it would stand a chance of being integrated, and during the review process more than one major <a href="http://www.postgresql.org/">PostgreSQL</a> contributor expressed worries about the size of the patch and the number of features proposed, which is the usual process.</p> <p>Then what happened is that <strong><em>Tom</em></strong> ended up following the same reasoning as I did while working on the feature. To maximize the benefits, once you have the infrastructure in place, it's not that much more work to provide the really interesting features. What's complex is agreeing on exactly what their specifications are. And in the <em>little</em> time window we got during this commit fest (well, we hijacked about 2 full weeks there), we managed to get there.</p> <p>So in the end the result is quite amazing, as you can see in the documentation chapter about it: <a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">35.15. Packaging Related Objects into an Extension</a>.</p> <p>All the <em>contrib</em> modules that install SQL objects into databases are now converted to <strong><em>Extensions</em></strong> too, and will get released in 9.1 with an upgrade script that allows you to <em>upgrade from unpackaged</em>. That means that once you've upgraded from a past PostgreSQL release to 9.1, registering your existing installations as <em>extensions</em> will be a single command away. I expect third party <em>extension</em> authors (from <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> to <a href="http://pgfoundry.org/projects/temporal">temporal</a>) to release an <em>upgrade-from-unpackaged</em> version of their work too.</p> <p>Of course, a big use case of <em>extensions</em> is also in-house PL code, and having version numbers and multi-stage upgrade scripts there will be fantastic too; I can't wait to work with such a tool set myself. A later blog post will detail the benefits and usage. I'm already trying to think how much of this version and upgrade facility could be extended to classic DDL objects…</p> <p>So expect some more blog posts from me on this subject. I will have to talk about <em>debian packaging</em> an extension (it's getting damn easy with <a href="http://packages.debian.org/squeeze/postgresql-server-dev-all">postgresql-server-dev-all</a>, which has received some planning ahead), about how to package your own extension, manage upgrades, turn your current pre-9.1 extension into a <em>full blown extension</em>, and maybe how to stop worrying about extensions when you're a DBA.</p> <p>If you have some features you would like to discuss for the next releases, please do contact me!</p> <p>Meanwhile, I'm very happy that this project of mine finally made it into <em>core</em>; it's been long in the making: some years of talking about it, then finally 4 months of coding that I'll remember as a marathon. Many thanks go to all who helped here, from <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> to early reviewers to people I talked to over beers at conferences… lots of people really.</p> <p>To an extended PostgreSQL (and beyond) :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 01 Mar 2011 16:30:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</guid> </item>

<item> <title>desktop-mode and readahead</title> <link>http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</link> <description><![CDATA[<h1>desktop-mode and readahead</h1>

Wednesday, February 23 2011, 16:45 </div>
<p>I'm using <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Save-Mode.html#Desktop-Save-Mode">Desktop Save Mode</a> so that <a href="http://www.gnu.org/software/emacs/">Emacs</a> reopens all the buffers I've been using. That fits well with how rarely I start Emacs: once a week or once a month. Now, the last line of M-x ibuffer reads as follows:</p>

<pre class="src">

718 buffers 19838205 668 files, 15 processes </pre>

<p>That means that at startup, Emacs will load that many files. In order not to have to wait until it's done, I've set things up this way:</p> <pre class="src"> <span style="color: #888a85;">;; </span><span style="color: #888a85;">and the session </span>
(setq desktop-restore-eager 20
      desktop-lazy-verbose nil)
(desktop-save-mode 1)
(savehist-mode 1) </pre>

<p>The problem is that it's still slow. One idea I had was to use the <a href="https://fedorahosted.org/readahead/browser/README">readahead</a> tool, which some distributions use to reduce their boot time. Of course, this tool does not expect the same file format as emacs-desktop uses. Still, converting is quite easy with some awk magic. Here's the result:</p> <pre class="src"> <span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-desktop.el --- Dimitri Fontaine </span>
<span style="color: #888a85;">;;</span>
<span style="color: #888a85;">;; </span><span style="color: #888a85;">Prepare a readahead file list from desktop-save </span>

(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">desktop</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-file-readahead-list</span>
  <span style="color: #ad7fa8; font-style: italic;">"~/.emacs.desktop.readahead"</span>
  <span style="color: #888a85;">"Where to save the emacs desktop `readahead` file list"</span>)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">Each buffer entry starts with a (desktop-create-buffer ...) line; getline </span>
<span style="color: #888a85;">;; </span><span style="color: #888a85;">reads the next one, where splitting on spaces and quotes leaves the file in $4 </span>
(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-filelist-command</span>
  <span style="color: #ad7fa8; font-style: italic;">"gawk -F '[ \"]' '/desktop-.*-buffer/ {getline; if($4) print $4}' %s"</span>
  <span style="color: #888a85;">"Command to run to prepare the readahead file list"</span>)

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim-desktop-get-readahead-file-list</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> filename dir)
  <span style="color: #888a85;">"Get the file list for readahead from the desktop file in DIR, or ~"</span>
  (<span style="color: #729fcf; font-weight: bold;">with-temp-file</span> (or filename dim-desktop-file-readahead-list)
    (insert
     (shell-command-to-string
      (format dim-desktop-filelist-command
              (shell-quote-argument
               (expand-file-name desktop-base-file-name (or dir <span style="color: #ad7fa8; font-style: italic;">"~"</span>))))))))

<span style="color: #888a85;">;; </span><span style="color: #888a85;">This will not work because the hook runs before the buffers are added </span>
<span style="color: #888a85;">;; </span><span style="color: #888a85;">to the desktop file. </span>
<span style="color: #888a85;">;;</span>
<span style="color: #888a85;">;;</span><span style="color: #888a85;">(add-hook 'desktop-save-hook 'dim-desktop-get-readahead-file-list) </span>

<span style="color: #888a85;">;; </span><span style="color: #888a85;">so instead, advise the function; (ad-get-arg 0) is desktop-save's DIRNAME </span>
(<span style="color: #729fcf; font-weight: bold;">defadvice</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">desktop-save</span> (after desktop-save-readahead activate)
  <span style="color: #888a85;">"Prepare a readahead(8) file for the desktop file"</span>
  (dim-desktop-get-readahead-file-list nil (ad-get-arg 0)))

(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-desktop</span>) </pre>
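<p>For readers more at ease with Python than with awk, here is a minimal sketch of what the gawk command in dim-desktop-filelist-command extracts. It is not part of dim-desktop.el, and readahead_file_list is a hypothetical name. It assumes the desktop file layout written by Emacs 22 and later, where each buffer entry starts with a (desktop-create-buffer ...) line and the buffer's file name follows on the next line, indented by two spaces and double-quoted.</p>

```python
import re


def readahead_file_list(desktop_text, pattern=r"desktop-create-buffer"):
    """Return the file names listed in an Emacs desktop file.

    Mirrors the gawk one-liner: for each line matching PATTERN, read the
    following line (awk's getline), split it on single spaces and double
    quotes (awk's -F '[ "]'), and keep the fourth field if it is not empty.
    """
    files = []
    lines = iter(desktop_text.splitlines())
    for line in lines:
        if re.search(pattern, line):
            # Like getline: consume the next line; stop if the input ends here.
            following = next(lines, None)
            if following is None:
                break
            fields = re.split(r'[ "]', following)
            # fields[3] is awk's $4; buffers without a file (nil) have none.
            if len(fields) > 3 and fields[3]:
                files.append(fields[3])
    return files
```

<p>Splitting on every single space reproduces the awk behaviour exactly, including its limitation: a file name containing spaces would come out truncated. Joining the result with newlines gives the same one-name-per-line output as the gawk command.</p>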

<p>The awk getline construct reads the next line of the input file, which is very practical here (and in a host of other situations). Now that we have a file containing the list of files Emacs will load, we have to tweak the system to read those disk blocks ahead of time. As I'm currently using <a href="http://kde.org/">KDE</a> again, I've done it thusly:</p> <pre class="src"> % cat ~/.kde/Autostart/readahead.emacs.sh
#! /bin/bash

# just readahead the emacs desktop files
# this file listing is maintained directly from Emacs itself
readahead ~/.emacs.desktop.readahead </pre>

<p>So, well, it works. The files that Emacs will need are pre-read, so by the time the desktop really gets to them, I see no more disk activity (laptops have an LED that shows it). But the desktop loading time has not changed...</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/restore.html">restore</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 23 Feb 2011 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</guid> </item> <item> <title>Back from FOSDEM</title> <link>http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</link> <description><![CDATA[<h1>Back from FOSDEM</h1>

Monday, February 07 2011, 11:10 </div>
<p>This year we were in the main building of the conference, and apparently the booth went very well, selling lots of <a href="http://postgresqleu.spreadshirt.net/">PostgreSQL merchandise</a> etc. I had the pleasure of meeting the community once again, but as I was only there for one day, I didn't spend as much time as I would have liked with some of the people there.</p>

<p>In case you're wondering, my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">extensions talk</a> went quite well, and several people were kind enough to tell me they appreciated it! It was recorded on video, so we will soon have proof of how bad it really was and how <em>polite</em> those people really are :)</p> <p>I will soon be able to write a series of articles detailing what an extension is and how you deal with them, either as a user or as an author. In fact the goal is for any user to easily become an extension author, as I think lots of people are already maintaining server side code but are missing the tools to manage it properly. But that will begin once the patch is in, so that I present <em>the real stuff</em> rather than what I proposed to the community… Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 07 Feb 2011 11:10:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</guid> </item> <item> <title>Going to FOSDEM</title> <link>http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</link> <description><![CDATA[<h1>Going to FOSDEM</h1>

Tuesday, February 01 2011, 13:35 </div>
<p>A quick blog entry to say that yes:</p>

<center> <p><img src="../../../images//going-to-fosdem-2011.png" alt=""></p> </center> <p>And I will even give my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">Extensions talk</a>, which was a <a href="http://blog.hagander.net/archives/183-Feedback-from-PGDay.EU-the-speakers.html">success at pgday.eu</a>. The talk will be updated to include the latest developments of the extensions feature, as some of it has already changed since then, and to detail the plan for the ALTER EXTENSION ... UPGRADE feature that I'd like to see included as early as 9.1, but time is running out fast.</p> <p>In fact the design for the UPGRADE has already been done and reviewed, but we have yet to reach consensus on how to specify which upgrade file to use when upgrading from a given version to another. I've solved it in my patch, of course, by adding properties into the extension's <em>control file</em>. That's the best place for that setup, I think: it allows lots of flexibility, leaves the extension's author in charge, and avoids hard coding any kind of assumption about file naming or whatever.</p> <p>The coming days and reviews will tell us more about how the design is received. Meanwhile, we're working on finalizing the main extensions patch, offering pg_dump support.</p> <p>See you at <a href="http://fosdem.org/2011/">FOSDEM</a>!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 01 Feb 2011 13:35:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</guid> </item>

<item> <title>Starting afresh with el-get</title> <link>http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</link> <description><![CDATA[<h1>Starting afresh with el-get</h1>

Tuesday, January 11 2011, 16:20 </div>
<p>It so happens that a colleague of mine wanted to start using <a href="http://www.gnu.org/software/emacs/">Emacs</a> but couldn't get to it. He insists on having proper color themes in all applications and some sensible defaults full of nifty add-ons everywhere, and didn't want to have to learn that much about <em>Emacs</em> and <em>Emacs Lisp</em> to get started. I'm not even sure that he will <a href="http://www.gnu.org/software/emacs/tour/">Take the Emacs tour</a>.</p>

<p>You might tell me that there's nothing we can do for such unfriendly users. Well, here's what I did:</p> <pre class="src"> <span style="color: #888a85;">;; </span><span style="color: #888a85;">emacs setup </span>
(add-to-list 'load-path <span style="color: #ad7fa8; font-style: italic;">"~/.emacs.d/el-get/el-get"</span>)
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(setq

el-get-sources '(el-get php-mode-improved psvn auto-complete switch-window

(<span style="color: #729fcf;">:name</span> buffer-move <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-up&gt;"</span>) 'buf-move-up) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-down&gt;"</span>) 'buf-move-down) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-left&gt;"</span>) 'buf-move-left) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"&lt;C-S-right&gt;"</span>) 'buf-move-right)))

(<span style="color: #729fcf;">:name</span> magit <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status)))

(<span style="color: #729fcf;">:name</span> goto-last-change <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
<span style="color: #888a85;">;; </span><span style="color: #888a85;">azerty keyboard here, don't use C-x C-/ </span>
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-"</span>) 'goto-last-change)))))

(<span style="color: #729fcf; font-weight: bold;">when</span> window-system

(add-to-list 'el-get-sources 'color-theme-tango))

(el-get 'sync)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">visual settings </span>
(setq inhibit-splash-screen t)
(menu-bar-mode -1)
(tool-bar-mode -1)
(scroll-bar-mode -1)

(line-number-mode 1)
(column-number-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">Use the clipboard, pretty please, so that copy/paste "works" </span>
(setq x-select-enable-clipboard t)

(set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>)

(global-hl-line-mode)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">follow external changes to files </span>
(global-auto-revert-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">colors in M-x shell </span>
(autoload 'ansi-color-for-comint-mode-on <span style="color: #ad7fa8; font-style: italic;">"ansi-color"</span> nil t)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">S-arrows to switch windows </span>
(windmove-default-keybindings)
(setq windmove-wrap-around t)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">find-file-at-point when it makes sense </span>
(setq ffap-machine-p-known 'accept) <span style="color: #888a85;">; </span><span style="color: #888a85;">no pinging </span>
(setq ffap-url-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable URL features in ffap </span>
(setq ffap-ftp-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable FTP features in ffap </span>
(define-key global-map (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-f"</span>) 'find-file-at-point)

(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ibuffer</span>)
(global-set-key <span style="color: #ad7fa8; font-style: italic;">"\C-x\C-b"</span> 'ibuffer)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">use iswitchb-mode for C-x b </span>
(iswitchb-mode)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">I can't remember having meant to use C-z as suspend-frame </span>
(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-z"</span>) 'undo)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">winner-mode to go back to the previous layout with C-c &lt;left&gt; </span>
(winner-mode 1)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">dired-x for C-x C-j </span>
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">dired-x</span>)

<span style="color: #888a85;">;; </span><span style="color: #888a85;">full screen </span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">fullscreen</span> ()

(interactive) (set-frame-parameter nil 'fullscreen (<span style="color: #729fcf; font-weight: bold;">if</span> (frame-parameter nil 'fullscreen) nil 'fullboth))) (global-set-key [f11] 'fullscreen) </pre>

<p>With just these 87 lines of setup (all included), my local user is very happy to switch to <a href="http://www.gnu.org/software/emacs/">our favorite editor</a>. And he's not even afraid (yet) of his ~/.emacs. I'd say that's a very good sign of where we are with <a href="https://github.com/dimitri/el-get">el-get</a>!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 11 Jan 2011 16:20:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</guid> </item> <item> <title>el-get 1.1, with 174 recipes</title> <link>http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</link> <description><![CDATA[<h1>el-get 1.1, with 174 recipes</h1>

<p>Monday, December 20 2010, 16:45</p>
<p>Yes, you read that right: <a href="https://github.com/dimitri/el-get">el-get</a> currently <em>features</em> 174 <a href="https://github.com/dimitri/el-get/tree/master/recipes">recipes</a>, and is now reaching the 1.1 release. The main reason for this release is that I have two big chunks of code to review, and the current code has been very stable for a while. It seems better to release the stable code that exists now before shaking it up this much. If you're wondering when to jump in the water and switch to using <em>el-get</em>, now is a pretty good time.</p>

<h3>New source types</h3> <p class="first">We now have support for the <a href="http://www.archlinux.org/pacman/">pacman</a> package manager of <a href="http://www.archlinux.org/">archlinux</a>, and a way to handle a package name that differs between the recipe and the distribution. We also have support for <a href="http://mercurial.selenic.com/">mercurial</a>, <a href="http://subversion.tigris.org/">subversion</a> and <a href="http://darcs.net/">darcs</a>.</p> <p>Also, <a href="http://wiki.debian.org/Apt">apt-get</a> will sometimes prompt you to validate its choices: that's the infamous <em>Do you want to continue?</em> prompt. We now handle that smoothly.</p> <h3>(el-get 'sync)</h3> <p class="first">In 1.1, that really means <em>synchronous</em>: we install one package after the other, and any error stops it all. Before that, it was an active wait loop over a parallel install; this option is still available by calling (el-get 'wait).</p> <h3>No more <em>failed to install</em></h3> <p class="first">Exactly. This error, which you may have encountered, was due to trying to install a package over a previous failed install attempt (network outage, disk full, bad work-in-progress recipe, etc). After a while in the field, no case was found where you would regret having <a href="https://github.com/dimitri/el-get">el-get</a> simply remove the previous failed installation for you before installing again, as asked. So that's now automatic.</p> <h3>Featuring an overhauled :build facility</h3> <p class="first">The build commands can now either be a list, as before, or an expression that we <em>evaluate</em> for you. That makes <em>recipes</em> easier to maintain, and here's an example:</p> <pre class="src"> (<span style="color: #729fcf;">:name</span> distel

  <span style="color: #729fcf;">:type</span> svn
  <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://distel.googlecode.com/svn/trunk/"</span>
  <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"doc"</span>
  <span style="color: #729fcf;">:build</span> `,(mapcar (<span style="color: #729fcf; font-weight: bold;">lambda</span> (target)
                      (concat <span style="color: #ad7fa8; font-style: italic;">"make "</span> target <span style="color: #ad7fa8; font-style: italic;">" EMACS="</span> el-get-emacs))
                    '(<span style="color: #ad7fa8; font-style: italic;">"clean"</span> <span style="color: #ad7fa8; font-style: italic;">"all"</span>))
  <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"elisp"</span>)
  <span style="color: #729fcf;">:features</span> distel) </pre>
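<p>To picture what el-get gets out of that :build expression: once evaluated, the backquoted mapcar is just a list of shell commands, one make invocation per target. The following Python is only an illustration of that expansion, not el-get code; build_commands is a made-up name, and its el_get_emacs argument stands in for the el-get-emacs variable, assumed here to hold the path of the Emacs binary to build with.</p>

```python
def build_commands(el_get_emacs, targets=("clean", "all")):
    """Mimic the evaluated :build form of the distel recipe: one
    "make TARGET EMACS=..." shell command per target, in order.

    el_get_emacs is a stand-in for el-get's el-get-emacs variable
    (assumed to be the path of the Emacs binary)."""
    return ["make " + target + " EMACS=" + el_get_emacs for target in targets]


for command in build_commands("/usr/bin/emacs"):
    print(command)
```

<p>Building for another Emacs version or platform then only means evaluating the same expression with a different value, instead of maintaining a separate hard-coded list of commands.</p>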

<p>As you can see, evaluated build commands also make it possible to maintain multi-platform build recipes, and to support multiple Emacs versions too. It's still a little on the <em>awkward</em> side, though, and that's one of the things that will be worked on for the next version.</p> <h3>Misc improvements</h3> <p class="first">We are now able to byte-compile your packages, and we offer some more hooks (el-get-init-hooks was requested, along with a nice usage example). There's a new :localname property that lets you choose the name under which the local file is saved when retrieving over HTTP, and that in turn allowed us to fix some <em>recipes</em>.</p> <pre class="src"> (<span style="color: #729fcf;">:name</span> xcscope

  <span style="color: #729fcf;">:type</span> http
  <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://cscope.cvs.sourceforge.net/viewvc/cscope/cscope/contrib/xcscope/xcscope.el?revision=1.14&amp;content-type=text%2Fplain"</span>
  <span style="color: #729fcf;">:localname</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span>
  <span style="color: #729fcf;">:features</span> xcscope) </pre>
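<p>Why would a recipe need :localname at all? My assumption here, for illustration only, is that without it the downloaded file would be named after whatever follows the last slash of the URL, query string included, and (require 'xcscope) would then find no xcscope.el to load. This Python sketch contrasts that hypothetical default with an explicit localname; both function names are made up, and the URL is shortened to a single query parameter.</p>

```python
def naive_local_name(url):
    # Hypothetical default: everything after the last "/", query string included.
    return url.rsplit("/", 1)[-1]


def local_file_name(url, localname=None):
    # An explicit :localname wins; otherwise fall back to the naive default.
    return localname or naive_local_name(url)


url = ("http://cscope.cvs.sourceforge.net/viewvc/cscope/cscope/contrib/"
       "xcscope/xcscope.el?revision=1.14")
print(local_file_name(url))                # xcscope.el?revision=1.14
print(local_file_name(url, "xcscope.el"))  # xcscope.el
```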

<p>Oh, and you even get :before user function support, even though needing it is often a sign that you're doing things in a strange way. More often than not, it's possible to do all you need in the :after function, but this tool is there so that you spend less time getting a working environment, not more, right? :)</p> <h3>Switch notice</h3> <p class="first">All in all, if you're already using <a href="https://github.com/dimitri/el-get">el-get</a>, you should consider switching to 1.1 (by issuing M-x el-get-update, of course), and if you're hesitating, just join the fun now!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Dec 2010 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</guid> </item>

<item>

<title>Dynamic Triggers in PLpgSQL</title> <link>http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</link> <description><![CDATA[<h1>Dynamic Triggers in PLpgSQL</h1>

<p>Wednesday, November 24 2010, 16:45</p>
<p>You certainly know that implementing <em>dynamic</em> triggers in PLpgSQL is impossible. But I had a very bad night, being up since 3:30 am today, so when a developer asked me about reusing the same trigger function code for more than one table and with a dynamic column name, I didn't remember that it was impossible.</p>

<p>Here's what happens in such cases, after spending a long time on the problem (yes, overall, it was a slow day). Note that I'm abusing the (record_literal).* notation a lot in there, and the (record_literal).column_name notation too.</p> <pre class="src"> CREATE OR REPLACE FUNCTION public.update_timestamp()

  RETURNS TRIGGER
  LANGUAGE plpgsql
AS $f$
DECLARE
    ts_column     varchar;
    old_timestamp timestamptz;
    attname       name;
    n             text;
    v             text;
    sep           text := <span style="color: #ad7fa8; font-style: italic;">''</span>;
BEGIN
    IF TG_NARGS != 1 THEN
        RAISE EXCEPTION <span style="color: #ad7fa8; font-style: italic;">'Trigger public.update_timestamp() called with % args'</span>, TG_NARGS;
    END IF;

    ts_column := TG_ARGV[0];

    <span style="color: #888a85;">-- fetch the timestamp column from the OLD row</span>
    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT n.'</span> || ts_column
         || <span style="color: #ad7fa8; font-style: italic;">' FROM (SELECT ('</span>
         || quote_literal(OLD) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass
         || <span style="color: #ad7fa8; font-style: italic;">').*) as n'</span>
       INTO old_timestamp;

    <span style="color: #888a85;">-- build NEW record text, skipping dropped columns</span>
    n := <span style="color: #ad7fa8; font-style: italic;">'('</span>;
    FOR attname IN EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT attname '</span>
        || <span style="color: #ad7fa8; font-style: italic;">' FROM pg_class c left join pg_attribute a on a.attrelid = c.oid'</span>
        || <span style="color: #ad7fa8; font-style: italic;">' WHERE c.oid = $1 and attnum &gt; 0 and not attisdropped order by attnum'</span>
       USING TG_RELID
    LOOP
        EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ('</span> || quote_literal(NEW) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass
             || <span style="color: #ad7fa8; font-style: italic;">').'</span> || attname
           INTO v;

        <span style="color: #888a85;">-- no comma before the first field, even if its value is empty</span>
        n := n || sep;
        sep := <span style="color: #ad7fa8; font-style: italic;">','</span>;

        IF attname = ts_column AND v::timestamptz IS NOT DISTINCT FROM old_timestamp THEN
            <span style="color: #888a85;">-- the UPDATE left the timestamp unchanged: stamp it with now()</span>
            n := n || now();
        ELSE
            n := n || COALESCE(v, <span style="color: #ad7fa8; font-style: italic;">''</span>);
        END IF;
    END LOOP;

    n := n || <span style="color: #ad7fa8; font-style: italic;">')'</span>;

    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ($1::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').*'</span>
       INTO NEW USING n;

    RETURN NEW;
END;
$f$; </pre>
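<p>Stripped of the dynamic SQL, the loop in that trigger boils down to rebuilding NEW column by column as a record literal. Here is that core logic as a Python sketch, for illustration only: rows are plain dicts of text values (None standing for NULL), updated_at is a made-up column name, and the comparison is done on the raw text rather than after a cast to timestamptz. Like the trigger, it does no quoting, so values containing commas, double quotes or parentheses would produce a broken literal.</p>

```python
def build_new_record(columns, old_row, new_row, ts_column, now):
    """Rebuild NEW as a '(v1,v2,...)' record literal, as the trigger does.

    columns          -- column names in attnum order
    old_row, new_row -- dicts mapping column name to text value (None is NULL)
    ts_column        -- name of the timestamp column (the trigger argument)
    now              -- text to use for the current timestamp
    """
    fields = []
    for name in columns:
        value = new_row.get(name)
        # Like "IS NOT DISTINCT FROM": None == None is True, so the check is
        # NULL-safe; an UPDATE that left the timestamp untouched gets now().
        if name == ts_column and value == old_row.get(ts_column):
            fields.append(now)
        else:
            # NULL becomes an empty field, which a record literal reads as NULL.
            fields.append("" if value is None else value)
    return "(" + ",".join(fields) + ")"


old = {"id": "1", "name": "foo", "updated_at": "2010-11-24 16:45:00+01"}
new = {"id": "1", "name": None, "updated_at": "2010-11-24 16:45:00+01"}
print(build_new_record(["id", "name", "updated_at"], old, new,
                       "updated_at", "2010-11-24 17:00:00+01"))
# (1,,2010-11-24 17:00:00+01)
```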

<p>The trigger is neither pretty nor fast: it takes about 2 ms per call on a table with 15 columns, in some preliminary tests. But it sure was a nice challenge!</p> <h2>Tags</h2> <p><a href="../../../tags/plpgsql.html">plpgsql</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 24 Nov 2010 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</guid> </item> <item> <title>pg_basebackup</title> <link>http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</link> <description><![CDATA[<h1>pg_basebackup</h1>

<p>Sunday, November 07 2010, 13:45</p>
<p><a href="http://2ndquadrant.com/about/#krosing">Hannu</a> just gave me a good idea in <a href="http://archives.postgresql.org/pgsql-hackers/2010-11/msg00236.php">this email</a> on <a href="http://archives.postgresql.org/pgsql-hackers/">-hackers</a>, proposing that <a href="https://github.com/dimitri/pg_basebackup">pg_basebackup</a> should fetch the xlog files again and again, in a loop, for the whole duration of the <em>base backup</em>. That's now done in the aforementioned tool, whose options are now a little more useful:</p>

<pre class="src"> Usage: pg_basebackup.py [-v] [-f] [-j jobs] <span style="color: #ad7fa8; font-style: italic;">"dsn"</span> dest

Options: </pre>

<p>Yeah, as implementing the xlog idea required some kind of parallelism, I built on it, and the script now has a --jobs option to set how many processes to launch in parallel, each fetching base backup files over its own standard (libpq) <a href="http://www.postgresql.org/">PostgreSQL</a> connection, in compressed chunks of 8 MB (the 8 MB is measured before compression, so that's not 8 MB chunks being sent over).</p> <p>The xlog loop will fetch again, wholesale, any WAL file whose ctime changed. It's easier this way, and tools offering optimized behavior already exist, such as <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> or <a href="http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STREAMING-REPLICATION">walreceiver</a>.</p> <p>The script is still a short, self-contained <a href="http://python.org/">python</a> file; it just went from about 100 lines of code to about 400. There's no external dependency: all it needs is provided by a standard python installation. The catch is that it uses select.poll(), which I think is not available on Windows. Between supporting every system and adding dependencies, I chose what was easier for me.</p> <pre class="src">

<span style="color: #729fcf; font-weight: bold;">import</span> select
<span style="color: #729fcf; font-weight: bold;">import</span> sys

<span style="color: #eeeeec;">p</span> = select.poll()
p.register(sys.stdin, select.POLLIN) </pre>
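<p>The two mechanisms described above, the 8 MB compressed chunks and the loop refetching any WAL file whose ctime changed, can be sketched in a few lines of standard-library Python. This is not code from pg_basebackup.py: the function names are hypothetical, and where the real script reads everything over libpq connections, this sketch assumes direct access to a local pg_xlog-like directory.</p>

```python
import os
import zlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB of raw (uncompressed) data per chunk


def compressed_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the file at path as independently zlib-compressed chunks,
    each holding at most chunk_size bytes of raw data."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_size)
            if not raw:
                break
            yield zlib.compress(raw)


def wal_files_to_fetch(xlog_dir, seen_ctimes):
    """Return the files in xlog_dir that are new or whose ctime changed
    since the previous pass, updating seen_ctimes (name -> ctime) in place."""
    changed = []
    for name in sorted(os.listdir(xlog_dir)):
        path = os.path.join(xlog_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as archive_status
        ctime = os.stat(path).st_ctime
        if seen_ctimes.get(name) != ctime:
            seen_ctimes[name] = ctime
            changed.append(name)
    return changed
```

<p>The backup loop would then call wal_files_to_fetch() over and over until the base backup is done, shipping each returned file wholesale, for instance through compressed_chunks().</p>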

<p>If you get to try the script, please report back; you should know my <em>email</em>, or be able to find it easily!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 07 Nov 2010 13:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</guid> </item>

<item>

<title>Introducing Extensions</title> <link>http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</link> <description><![CDATA[<h1>Introducing Extensions</h1>

<p>Thursday, October 21 2010, 13:45</p>
<p>After reading <a href="http://database-explorer.blogspot.com/2010/10/extensions-in-91.html">Simon's blog post</a>, I can't help but try to give some details about what exactly I'm working on. As he said, there are several aspects to <em>extensions</em> in <a href="http://www.postgresql.org/">PostgreSQL</a>, and it all begins here: <a href="http://www.postgresql.org/docs/9.0/interactive/extend.html">Chapter 35. Extending SQL</a>.</p>

<p>It's possible, and mostly simple enough, to add your own code or behavior to PostgreSQL, so that it will use your code and your semantics while solving user queries. That's highly useful, as is easy to see when you look at projects like <a href="http://postgis.refractions.net/">PostGIS</a>, <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> (index searches of an ip in a range, not limited to CIDR notation), or our own <em>Key Value Store</em>, <a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a>.</p> <h3>So, what's in an <em>Extension</em>?</h3> <p class="first">In its simplest form, an <em>extension</em> is a SQL <em>script</em> that you load into your database but manage separately, meaning you don't want the script to be part of your backups. Often, that kind of script will create new datatypes and operators, support functions, user functions and index support, and it will often come with some C code that ships in a <em>shared library object</em>.</p> <p>As far as PostgreSQL is concerned, at least in the current version of my patch, an extension is first a <em>meta</em> information file that allows registering it. We currently call that the control file. Then, it's an SQL script that is <em>executed</em> by the server when you create the <em>extension</em>.</p> <p>If the SQL script happens to depend on some <em>shared library object</em> file, that file has to be present at the right place (MODULE_PATHNAME) for the <em>extension</em> to be successfully created, but that's always been the case.</p> <p>The problem with current releases of PostgreSQL that the <em>extension</em> patch solves is pg_dump and pg_restore support. As we said, you don't want the SQL script to be part of your dump, because it's not maintained in your database, but in some code repository out there. 
What you want is to be able to install the <em>extension</em> again at the file system level, then pg_restore your database, which depends on it being there.</p> <p>And that's exactly what the <em>extension</em> patch provides. Now that there is a SQL object called an extension, maintained in the new pg_extension catalog, we have an Oid to refer to. We use it to record a dependency between every object created by the script and the <em>extension</em> Oid, so that pg_dump can be instructed to skip those objects.</p> <h3>Examples?</h3> <p class="first">So, let's have a look at what you can do if you play with a patched development server, or directly from the git repository at <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension</a></p> <pre class="src"> dim ~ createdb exts
dim ~ psql exts
psql (9.1devel)
Type <span style="color: #ad7fa8; font-style: italic;">"help"</span> for help.

dim=# \dx+
                                      List of extensions
        Name        |                               Description
--------------------+-------------------------------------------------------------------------
 adminpack          | Administrative functions for PostgreSQL
 auto_username      | functions for tracking who changed a table
 autoinc            | functions for autoincrementing fields
 btree_gin          | GIN support for common types BTree operators
 btree_gist         | GiST support for common types BTree operators
 chkpass            | Store crypt()ed passwords
 citext             | case-insensitive character string type
 cube               | data type for representing multidimensional cubes
 dblink             | connect to other PostgreSQL databases from within a database
 dict_int           | example of an add-on dictionary template for full-text search
 dict_xsyn          | example of an add-on dictionary template for full-text search
 earthdistance      | calculating great circle distances on the surface of the Earth
 fuzzystrmatch      | determine similarities and distance between strings
 hstore             | storing sets of key/value pairs
 int_aggregate      | integer aggregator and an enumerator (obsolete)
 intarray           | one-dimensional arrays of integers: functions, operators, index support
 isn                | data types for the international product numbering standards
 lo                 | managing Large Objects
 ltree              | data type for hierarchical tree-like structure
 moddatetime        | functions for tracking last modification time
 pageinspect        | inspect the contents of database pages at a low level
 pg_buffercache     | examine the shared buffer cache in real time
 pg_freespacemap    | examine the free space map (FSM)
 pg_stat_statements | tracking execution statistics of all SQL statements executed
 pg_trgm            | determine the similarity of text, with indexing support
 pgcrypto           | cryptographic functions
 pgrowlocks         | show row locking information for a specified table
 pgstattuple        | obtain tuple-level statistics
 prefix             | Prefix Match Indexing
 refint             | functions for implementing referential integrity
 seg                | data type for representing line segments, or floating point intervals
 tablefunc          | various functions that return tables, including crosstab(text sql)
 test_parser        | example of a custom parser for full-text search
 timetravel         | functions for implementing time travel
 tsearch2           | backwards-compatible text search functionality (pre-8.3)
 unaccent           | text search dictionary that removes accents
(36 rows) </pre>

<p>OK, I've visibly edited the output to leave out the <em>Version</em> and <em>Custom Variable Classes</em> columns. They take up lots of screen space and are not that useful here. Maybe the <em>classes</em> one will even get dropped from the patch before it reaches 9.1; we'll see.</p> <p>Let's pick an extension there and install it in our new database:</p> <pre class="src"> exts=# create extension pg_trgm;
NOTICE: Installing extension 'pg_trgm' from '/Users/dim/pgsql/exts/share/contrib/pg_trgm.sql', with user data
CREATE EXTENSION
exts=# \dx

                         List of extensions
  Name   |                       Description
---------+---------------------------------------------------------
 pg_trgm | determine the similarity of text, with indexing support
(1 row) </pre>

<p>See, that was easy enough. Same thing here, the extra columns have been removed. So, you will ask me, what's in this extension? What are those objects that you would normally (that is, before the patch) find in your pg_dump backup script?</p> <pre class="src"> exts=# select * from pg_extension_objects('pg_trgm');
    class     | classid | objid |                objdesc
--------------+---------+-------+----------------------------------------------
 pg_extension |    3996 | 18498 | extension pg_trgm
 pg_proc      |    1255 | 18499 | function set_limit(real)
 pg_proc      |    1255 | 18500 | function show_limit()
 pg_proc      |    1255 | 18501 | function show_trgm(text)
 pg_proc      |    1255 | 18502 | function similarity(text,text)
 pg_proc      |    1255 | 18503 | function similarity_op(text,text)
 pg_operator  |    2617 | 18504 | operator %(text,text)
 pg_type      |    1247 | 18505 | type gtrgm
 pg_proc      |    1255 | 18506 | function gtrgm_in(cstring)
 pg_proc      |    1255 | 18507 | function gtrgm_out(gtrgm)
 pg_type      |    1247 | 18508 | type gtrgm[]
 pg_proc      |    1255 | 18509 | function gtrgm_consistent(internal,text,integer,oid,internal)
 pg_proc      |    1255 | 18510 | function gtrgm_compress(internal)
 pg_proc      |    1255 | 18511 | function gtrgm_decompress(internal)
 pg_proc      |    1255 | 18512 | function gtrgm_penalty(internal,internal,internal)
 pg_proc      |    1255 | 18513 | function gtrgm_picksplit(internal,internal)
 pg_proc      |    1255 | 18514 | function gtrgm_union(bytea,internal)
 pg_proc      |    1255 | 18515 | function gtrgm_same(gtrgm,gtrgm,internal)
 pg_opfamily  |    2753 | 18516 | operator family gist_trgm_ops for access method gist
 pg_opclass   |    2616 | 18517 | operator class gist_trgm_ops for access method gist
 pg_amop      |    2602 | 18518 | operator 1 %(text,text) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18519 | function 1 gtrgm_consistent(internal,text,integer,oid,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18520 | function 2 gtrgm_union(bytea,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18521 | function 3 gtrgm_compress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18522 | function 4 gtrgm_decompress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18523 | function 5 gtrgm_penalty(internal,internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18524 | function 6 gtrgm_picksplit(internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18525 | function 7 gtrgm_same(gtrgm,gtrgm,internal) of operator family gist_trgm_ops for access method gist
 pg_proc      |    1255 | 18526 | function gin_extract_trgm(text,internal)
 pg_proc      |    1255 | 18527 | function gin_extract_trgm(text,internal,smallint,internal,internal)
 pg_proc      |    1255 | 18528 | function gin_trgm_consistent(internal,smallint,text,integer,internal,internal)
 pg_opfamily  |    2753 | 18529 | operator family gin_trgm_ops for access method gin
 pg_opclass   |    2616 | 18530 | operator class gin_trgm_ops for access method gin
 pg_amop      |    2602 | 18531 | operator 1 %(text,text) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18532 | function 1 btint4cmp(integer,integer) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18533 | function 2 gin_extract_trgm(text,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18534 | function 3 gin_extract_trgm(text,internal,smallint,internal,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18535 | function 4 gin_trgm_consistent(internal,smallint,text,integer,internal,internal) of operator family gin_trgm_ops for access method gin
(38 rows) </pre>
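<p>Those objid values are what the dependency tracking hangs on: each object the script creates depends on the extension's Oid (18498 here), and pg_dump skips such members. As a toy model only, and certainly not how pg_dump is actually written, the Python below keeps an object unless it is recorded as an extension member. The Oids come from the listings in this post; objects_to_dump is a hypothetical name, and the dict standing in for the recorded dependencies is a simplification.</p>

```python
def objects_to_dump(objects, extension_members):
    """Return the objects a dump would include, skipping extension members.

    objects           -- list of (objid, description) pairs
    extension_members -- dict mapping a member objid to the objid of the
                         extension it depends on (a stand-in for the
                         dependency records the patch creates)
    """
    return [(objid, desc) for objid, desc in objects
            if objid not in extension_members]


objects = [
    (18498, "extension pg_trgm"),
    (18499, "function set_limit(real)"),
    (18505, "type gtrgm"),
    (18543, "table test"),
]
# Objects created by the pg_trgm script depend on the extension, 18498.
members = {18499: 18498, 18505: 18498}
for objid, desc in objects_to_dump(objects, members):
    print(objid, desc)
```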

<p>The main intended users of pg_extension_objects() are the <em>extension</em> authors themselves: it makes it easy for them to figure out which system identifier (the objid column) has been assigned to each SQL object from their install script. With this knowledge, you can prepare <em>upgrade</em> scripts. But that's for another patch altogether, so we'll get back to the matter in another blog entry.</p> <p>As we chose <a href="http://www.postgresql.org/docs/9.0/interactive/pgtrgm.html">trgm</a> as an example, let's follow its documentation and create a test table and a custom index, just so that the extension is put to good use. Then let's try to DROP our extension, because we're testing the infrastructure, right?</p> <pre class="src"> exts=# create table test(id bigint, name text);
CREATE TABLE
exts=# CREATE INDEX idx_test_name ON test USING gist (name gist_trgm_ops);
CREATE INDEX
exts=# drop extension pg_trgm;
ERROR: cannot drop extension pg_trgm because other objects depend on it
DETAIL: index idx_test_name depends on operator class gist_trgm_ops for access method gist
HINT: Use DROP ... CASCADE to drop the dependent objects too. </pre> <p>Of course PostgreSQL is smart enough here: the <em>extension</em> patch had nothing special to do to achieve that, apart from recording the dependencies. Next, as we didn't run drop extension pg_trgm cascade;, the extension is still in the database. So let's see what a pg_dump will look like. As it's quite a lot of text to paste, let's look at the pg_restore catalog instead, a feature that deserves to be better known, too.</p> <pre class="src">
dim ~ pg_dump -Fc exts | pg_restore -l | grep -v '^;'

1812; 1262 18497 DATABASE - exts dim
1; 3996 18498 EXTENSION - pg_trgm
1813; 0 0 COMMENT - EXTENSION pg_trgm
6; 2615 2200 SCHEMA - public dim
1814; 0 0 COMMENT - SCHEMA public dim
1815; 0 0 ACL - public dim
320; 2612 11602 PROCEDURAL LANGUAGE - plpgsql dim
1521; 1259 18543 TABLE public test dim
1809; 0 18543 TABLE DATA public test dim
1808; 1259 18549 INDEX public idx_test_name dim </pre>

<p>As you see, the only pg_trgm related objects that got into the backup are an EXTENSION entry and its COMMENT: none of the types or functions that the pg_trgm script creates appear in the listing.</p> <h3>What does it mean to extension authors?</h3> <p class="first">In order to be an <em>extension</em>, you have to prepare a <em>control</em> file in which you give the information needed to register your script. This file must be named extension.control if the script is named extension.sql, at least for now. This file can benefit from some variable expansion too, as the current extension.sql.in already does: if you provide an extension.control.in file, the term VERSION will be expanded to whatever $(VERSION) is set to in your Makefile.</p> <p>If you have never written a C coded <em>extension</em> for PostgreSQL, this might look complex and irrelevant. The bottom line is that you need a Makefile so that you can easily benefit from the PostgreSQL infrastructure work and have the make install operation put your files in the right place, including the new control file.</p> <h3>That's it for today, folks</h3> <p class="first">An upcoming blog entry will detail what happens with extensions providing <em>user data</em>, and the CREATE EXTENSION name WITH NO DATA; variant. Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/plpgsql.html">plpgsql</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 21 Oct 2010 13:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</guid> </item> <item> <title>Extensions: writing a patch for PostgreSQL</title> <link>http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</link> <description><![CDATA[<h1>Extensions: writing a patch for PostgreSQL</h1>

Friday, October 15 2010, 11:30 </div>
<p><span class="hack"> </span></p>

<p>These days, thanks to my <a href="http://2ndquadrant.com/">community oriented job</a>, I'm working full time on a <a href="http://www.postgresql.org/">PostgreSQL</a> patch to complete basic support for <a href="http://www.postgresql.org/docs/9/static/extend.html">extending SQL</a>. The first thing I want to share is that patching the <em>backend code</em> is not as hard as one would think. The second is that <a href="http://git-scm.com/">git</a> really helps.</p> <p><em>“Not as hard as one would think</em>, are you kidding me?”, I hear some say. Well, that's true. It's C code in there, but with a very good layer of abstractions, so you're not dealing with subtle problems that much. Of course it happens that you have to, and managing memory isn't optional. That said, palloc() and the <em>memory contexts</em> implementation make it so easy that <em>in lots of cases, you don't have to think about it</em>.</p> <p>PostgreSQL is very well known for its reliability, and that's not something that just happened. All the source code is organized in a way that makes it possible, so your main task is to write code that looks as much as possible like the existing surrounding code. And we all know how to <em>copy paste</em>, right?</p> <p>So, my current work on <em>extensions</em> is to make it so that if you install <a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a> in your database (to pick an example), your backup won't contain any <em>hstore</em> specific objects (types, functions, operators, index support objects, etc) but rather a single line that tells PostgreSQL to install <em>hstore</em> again.</p> <pre class="src"> CREATE EXTENSION hstore; </pre> <p>The feature already works in <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">my git branch</a> and I'm extracting the infrastructure work from there to ease review. That's when git helps a lot. 
What I've done is create a new branch from the master one, then <a href="http://www.kernel.org/pub/software/scm/git/docs/git-cherry-pick.html">cherry pick</a> the patches of interest. Sometimes, though, you have to resort to helper tools. I've been told after the fact that using git cherry-pick -n would have made the following much simpler:</p> <pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git cherry-pick 3f291b4f82598309368610431cf2a18d7b7a7950
error: could not apply 3f291b4... Implement dependency tracking for CREATE EXTENSION, and DROP EXTENSION ... CASCADE.
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add &lt;paths&gt;' or 'git rm &lt;paths&gt;'
hint: and commit the result with 'git commit -c 3f291b4'
dim ~/dev/PostgreSQL/postgresql-extension git status \
    | awk '/modified/ &amp;&amp; ! /both/ &amp;&amp; ! /genfile/ {print $3}
           /deleted/ {print $5}
           /both/ {print $4}' \
    | xargs echo git reset -- \
    | sh
Unstaged changes after reset:
M src/backend/catalog/dependency.c
M src/backend/catalog/heap.c
M src/backend/catalog/pg_aggregate.c
M src/backend/catalog/pg_conversion.c
M src/backend/catalog/pg_namespace.c
M src/backend/catalog/pg_operator.c
M src/backend/catalog/pg_proc.c
M src/backend/catalog/pg_type.c
M src/backend/commands/extension.c
M src/backend/commands/foreigncmds.c
M src/backend/commands/opclasscmds.c
M src/backend/commands/proclang.c
M src/backend/commands/tsearchcmds.c
M src/backend/nodes/copyfuncs.c
M src/backend/nodes/equalfuncs.c
M src/backend/parser/gram.y
M src/include/catalog/dependency.h
M src/include/commands/extension.h
M src/include/nodes/parsenodes.h </pre>
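<p>The awk filter above selects three kinds of paths from git status: plain modifications (except anything matching genfile, whose changes stay staged), deletions, and files in conflict (both modified). For each kind it prints the field where the path sits. Below is a minimal Python sketch of the same selection logic. It assumes the git 1.7 era git status output, where every line starts with #. The function names paths_to_reset, reset_command and field are hypothetical, for illustration only:</p>

```python
def field(fields, n):
    """Return awk's $n (1-based), or '' when the line is too short, as awk does."""
    return fields[n - 1] if len(fields) >= n else ""


def paths_to_reset(status_text):
    """Mimic the awk filter: collect the paths to hand over to `git reset --`.

    Assumes the git 1.7 era `git status` layout, where every line starts
    with '#', e.g. '#\tmodified:   src/foo.c'.
    """
    paths = []
    for line in status_text.splitlines():
        fields = line.split()  # awk's default splitting on runs of whitespace
        # /modified/ && ! /both/ && ! /genfile/ {print $3}
        if "modified" in line and "both" not in line and "genfile" not in line:
            paths.append(field(fields, 3))
        # /deleted/ {print $5}, e.g. '#\tdeleted by us:   path'
        if "deleted" in line:
            paths.append(field(fields, 5))
        # /both/ {print $4}, e.g. '#\tboth modified:   path'
        if "both" in line:
            paths.append(field(fields, 4))
    # xargs ignores the empty lines awk prints for too-short lines
    return [p for p in paths if p]


def reset_command(paths):
    """Build the command line that `xargs echo git reset --` would print."""
    return " ".join(["git", "reset", "--"] + paths)


if __name__ == "__main__":
    sample = "\n".join([
        "# On branch extension",
        "#\tmodified:   src/backend/catalog/heap.c",
        "#\tmodified:   src/backend/utils/adt/genfile.c",
        "#\tboth modified:      src/backend/parser/gram.y",
    ])
    # prints: git reset -- src/backend/catalog/heap.c src/backend/parser/gram.y
    print(reset_command(paths_to_reset(sample)))
```

<p>As in the shell pipeline, any path that is not listed keeps its staged changes. After the reset, the index therefore holds only the part of the cherry-picked commit you want to extract.</p>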

<p>That's what I did to prepare a side branch containing only the changes to one part of my current work. I had to filter the diff that much only because I commit in rather big steps rather than in small chunks. In this case, that means I had a single patch with several <em>units</em> of changes and I wanted to extract only one of them. Well, it happens that even in such a case, git helps!</p> <p>There's more to say about the <em>extension</em> related feature, of course, but that'll do for this article. I'll just end with the following nice <em>diffstat</em> of 4 days of work:</p> <pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master.. | wc -l
3897
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master.. | diffstat

doc/src/sgml/extend.sgml 46 ++
doc/src/sgml/ref/allfiles.sgml 2
doc/src/sgml/ref/create_extension.sgml 95 ++++
doc/src/sgml/ref/drop_extension.sgml 115 +++++
doc/src/sgml/reference.sgml 2
src/backend/access/transam/xlog.c 95 ---
src/backend/catalog/Makefile 1
src/backend/catalog/dependency.c 25 +
src/backend/catalog/heap.c 9
src/backend/catalog/objectaddress.c 14
src/backend/catalog/pg_aggregate.c 7
src/backend/catalog/pg_conversion.c 7
src/backend/catalog/pg_namespace.c 13
src/backend/catalog/pg_operator.c 7
src/backend/catalog/pg_proc.c 7
src/backend/catalog/pg_type.c 8
src/backend/commands/Makefile 3
src/backend/commands/comment.c 6
src/backend/commands/extension.c 688 +++++++++++++++++++++++++++++++++
src/backend/commands/foreigncmds.c 19
src/backend/commands/functioncmds.c 7
src/backend/commands/opclasscmds.c 13
src/backend/commands/proclang.c 7
src/backend/commands/tsearchcmds.c 25 +
src/backend/nodes/copyfuncs.c 22 +
src/backend/nodes/equalfuncs.c 18
src/backend/parser/gram.y 51 ++
src/backend/tcop/utility.c 27 +
src/backend/utils/adt/genfile.c 193 +++++++++
src/backend/utils/init/postinit.c 3
src/backend/utils/misc/Makefile 2
src/backend/utils/misc/cfparser.c 113 +++++
src/backend/utils/misc/guc-file.l 26 -
src/backend/utils/misc/guc.c 160 ++++++-
src/bin/pg_dump/common.c 6
src/bin/pg_dump/pg_dump.c 520 ++++++++++++++++++++++--
src/bin/pg_dump/pg_dump.h 10
src/bin/pg_dump/pg_dump_sort.c 7
src/bin/psql/command.c 3
src/bin/psql/describe.c 45 ++
src/bin/psql/describe.h 3
src/bin/psql/help.c 1
src/include/catalog/dependency.h 1
src/include/catalog/indexing.h 6
src/include/catalog/pg_extension.h 61 ++
src/include/catalog/pg_proc.h 13
src/include/catalog/toasting.h 1
src/include/commands/extension.h 54 ++
src/include/nodes/nodes.h 2
src/include/nodes/parsenodes.h 20
src/include/parser/kwlist.h 1
src/include/utils/builtins.h 4
src/include/utils/cfparser.h 18
src/include/utils/guc.h 11
src/makefiles/pgxs.mk 21 -

55 files changed, 2456 insertions(+), 188 deletions(-) </pre>

<h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/backup.html">backup</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Oct 2010 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</guid> </item> <item> <title>Date puzzle for starters</title> <link>http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</link> <description><![CDATA[<h1>Date puzzle for starters</h1>

Friday, October 08 2010, 10:00 </div>
<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> IRC channel is a good place to be: for the very good help you can get there, because people always want to remain helpful, for the occasional off-topic discussions, or to get to talk with community core members. And to start your day, too.</p>

<p>This morning's question started simple: “how can I check if today is the &quot;first sunday of the month&quot;. or &quot;the second tuesday of the month&quot; etc?”</p> <p>And the first version of the answer is quite simple too:</p> <pre class="src">
dim=# with begin(d) as (select date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, <span style="color: #ad7fa8; font-style: italic;">'today'</span>::date)::date)
dim-# select d + (7 - extract(dow from d)::int) % 7 as sunday from begin;
   sunday
<span style="color: #888a85;">------------</span>
 2010-10-03
(1 row) </pre>
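<p>The arithmetic behind this query is simple: take the first day of the month and add the number of days until the wanted weekday. If that count is computed as 7 - dow, it comes out as 7 rather than 0 when the month itself starts on that weekday, so a modulo 7 is needed to stay in the first week. Here is a small Python sketch of the same computation. The helper names pg_dow and first_dow_of_month are made up for illustration. Python's weekday() numbers Monday as 0, so the sketch shifts it to PostgreSQL's extract(dow ...) convention, where Sunday is 0:</p>

```python
import datetime


def pg_dow(day):
    """PostgreSQL's extract(dow ...): Sunday = 0 .. Saturday = 6."""
    return (day.weekday() + 1) % 7  # Python: Monday = 0 .. Sunday = 6


def first_dow_of_month(day, dow=0):
    """First date in day's month falling on dow (PostgreSQL numbering, 0 = Sunday)."""
    first = day.replace(day=1)  # same role as date_trunc('month', ...)
    # The modulo keeps the offset at 0 when the month starts on that weekday
    return first + datetime.timedelta(days=(dow - pg_dow(first)) % 7)


if __name__ == "__main__":
    # October 2010 starts on a Friday, August 2010 on a Sunday
    print(first_dow_of_month(datetime.date(2010, 10, 8)))  # 2010-10-03
    print(first_dow_of_month(datetime.date(2010, 8, 20)))  # 2010-08-01
```

<p>August 2010 starts on a Sunday. Without the modulo, the 7 - dow version answers 2010-08-08 for that month, which is the second Sunday.</p>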

<p>So you just have to compare the result with 'today'::date and there you go. The problem is that the question could also be read the other way round: how do you say what today is, in a <em>first</em> or <em>second</em> <em>day name</em> of this month <em>format</em>? Once more, <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> to the rescue:</p> <pre class="src"> select to_char(current_date,
               <span style="color: #ad7fa8; font-style: italic;">'"'</span> || ((ARRAY[<span style="color: #ad7fa8; font-style: italic;">'First'</span>,<span style="color: #ad7fa8; font-style: italic;">'Second'</span>,<span style="color: #ad7fa8; font-style: italic;">'Third'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fourth'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fifth'</span>])
                        [(extract(day from current_date)::integer - 1)/7 + 1])
               || <span style="color: #ad7fa8; font-style: italic;">'" Day'</span>);

    to_char
<span style="color: #888a85;">------------------</span>
 Second Friday
(1 row) </pre>
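<p>The ordinal in that query comes from integer division. Days 1 to 7 of a month are always the first occurrence of their weekday, days 8 to 14 the second, and so on, hence (day - 1)/7 + 1. Here is a Python sketch of the same idea, using a hypothetical nth_day_label function. A fixed English list of day names stands in for to_char's Day pattern, which also blank-pads its output:</p>

```python
import datetime

ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]  # indexed by date.weekday()


def nth_day_label(day):
    """Describe a date as e.g. 'Second Friday' of its month."""
    # Days 1-7 -> index 0 ("First"), 8-14 -> index 1 ("Second"), ...
    # Python lists are 0-based, hence no "+ 1" as with the SQL array
    ordinal = ORDINALS[(day.day - 1) // 7]
    return f"{ordinal} {DAY_NAMES[day.weekday()]}"


if __name__ == "__main__":
    print(nth_day_label(datetime.date(2010, 10, 8)))  # Second Friday
```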

<p>That's a straight answer to the question, read that way!</p> <p>But the part I found nice to play with was my first reading of the question, as I don't let go of my ideas that easily, you see… So what about writing a function that returns the date of any <em>nth</em> occurrence of a given <em>day of week</em> in a <em>given month</em>, defaulting to the current month?</p> <pre class="src"> create or replace function get_nth_dow_of_month

  (
    nth   int,
    dow   int,
    begin date default current_date
  )
  returns date
  language sql
  strict
as $$
  with month(d) as (
    select generate_series(date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3),
                           date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3) + interval <span style="color: #ad7fa8; font-style: italic;">'1 month - 1 day'</span>,
                           interval <span style="color: #ad7fa8; font-style: italic;">'1 day'</span>)::date
  ),
  repeat as (
    select d,
           extract(dow from d) as dow,
           (d - date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3)::date) / 7 as repeat
      from month
  )
  select d from repeat where dow = $2 and repeat = $1;
$$;

dim=# select get_nth_dow_of_month(0, 0);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------</span>
 2010-10-03
(1 row)

dim=# select get_nth_dow_of_month(1, 4, <span style="color: #ad7fa8; font-style: italic;">'2010-09-12'</span>);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------</span>
 2010-09-09
(1 row) </pre>
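<p>Rather than generating every day of the month and then filtering, the same answer can be computed directly. Find the first occurrence of the weekday in the month, then add 7 days per repetition: that is exactly the row the (d - first day) / 7 = nth filter selects. Here is a Python sketch of that closed form. The nth_dow_of_month function is hypothetical and uses the same conventions as get_nth_dow_of_month: nth counts from 0, dow 0 is Sunday, and any date picks the month:</p>

```python
import datetime


def nth_dow_of_month(nth, dow, day=None):
    """Date of the nth (0-based) occurrence of dow (0 = Sunday) in day's month.

    Mirrors get_nth_dow_of_month(nth, dow, begin): returns None where the
    SQL function would return NULL (no such day in that month).
    """
    if day is None:
        day = datetime.date.today()  # like: begin date default current_date
    if nth < 0 or not 0 <= dow <= 6:
        return None
    first = day.replace(day=1)
    first_dow = (first.weekday() + 1) % 7  # Sunday = 0, as extract(dow ...)
    offset = (dow - first_dow) % 7 + 7 * nth
    result = first + datetime.timedelta(days=offset)
    # Past the end of the month means there is no such occurrence
    return result if result.month == first.month else None


if __name__ == "__main__":
    print(nth_dow_of_month(0, 0, datetime.date(2010, 10, 8)))   # 2010-10-03
    print(nth_dow_of_month(1, 4, datetime.date(2010, 9, 12)))   # 2010-09-09
```

<p>It returns None where the SQL function would return NULL, for instance for a fifth Friday in September 2010. Asking for several nth values at once then boils down to calling it in a loop or a list comprehension.</p>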

<p>So you see we just got the first Sunday of this month (0, 0) and the second Thursday (1, 4) of the previous one. As the function is written, abusing date_trunc the way it does, any date within a month is a good way to tell which month you want to work in.</p> <p>Now, the function as written is unfinished. You want to fix it in one of two ways: either stop using generate_series, since only one row is ever output, or fix the API so that you can ask for more than one <em>nth dow</em> at a time. Of course, that was a starter for me, not a problem I need to solve directly, and it was a good excuse for a blog entry, so I won't fix it. That's left as an exercise for our interested readers!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/9.1.html">9.1</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Oct 2010 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</guid> </item> <item> <title>Resuming work on Extensions, first little step</title> <link>http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</link> <description><![CDATA[<h1>Resuming work on Extensions, first little step</h1>

Thursday, October 07 2010, 17:15 </div>
<p><span class="hack"> </span></p>

<p>Yeah, I'm back to working on my part of the extension work in <a href="http://www.postgresql.org/">PostgreSQL</a>.</p> <p>The first step is a little one, but as it has public consequences, I figured I'd talk about it already. I've just refreshed my git repository to follow the new master one, and you can see that here <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430</a>.</p> <p>It's been easier than I feared, mainly:</p> <pre class="src">
$ git --no-pager diff master..extension
$ git --no-pager format-patch master..extension
$ cp 0001-First-stab-at-writing-pg_execute_from_file-function.patch ..
$ git checkout master
$ git pull -f pgmaster
$ git reset --hard pgmaster/master
$ git checkout extension
$ git reset --hard master
$ git am -s ../0001-First-stab-at-writing-pg_execute_from_file-function.edit.patch
$ git status
$ git log --short head
$ git log -n2 --oneline
$ git push -f </pre>

<p>So that's still more steps than one would call dead simple, but still. The format-patch command is there to save my work (all patches that are in the <em>extension</em> branch but not in <em>master</em>: well, there was only one of them here). Then, as the master repository URL didn't change, I could simply pull the changes in. Of course I got a nice <em>warning: no common commits</em> message.</p> <p>Once pulled, I trashed my local copy and replaced it with the new official one, that's git reset --hard pgmaster/master; then I did the same in the <em>extension</em> branch, resetting it to the local master.</p> <p>Of course, git am wouldn't apply my patch as-is, as there were some underlying changes in the source files: the identification tag changed from $PostgreSQL$ to, e.g., src/backend/utils/adt/genfile.c, and I had to cope with that. Maybe there's some tool (git am -3?) to do it automatically; I just hand-edited the .patch file.</p> <p>Lastly, it's all about checking and publishing the result. The last line is git push -f, and that's when I trashed and replaced my <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=summary">postgresql-extension</a> community repository. I don't think anybody was following it, but should that be the case, you will have to <em>reinit</em> your copy.</p> <p>More blog posts about extensions are to come, as I arranged to have some real time to devote to the topic. At least I was able to arrange things so that I can work on the subject for real, and the first thing I did, the very night before it was meant to begin, was come down with <em>tonsillitis</em>. Lost about a week, not the project! Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 07 Oct 2010 17:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</guid> </item> <item> <title>el-get reaches 1.0</title> <link>http://tapoueh.org/blog/2010/10/blog/2010/10/07-el-get-reaches-10.html</link> <description><![CDATA[<p>It's been a week since the last commits in the <a href="http://github.com/dimitri/el-get">el-get repository</a>, and those were all about fixing and adding recipes, and about notifications. Nothing like <em>core plumbing</em> you see. Also, 0.9 was released on <em>2010-08-24</em> and felt pretty complete already, then received lots of improvements. It's high time to cross the line and call it 1.0!</p>

<p>Now existing users will certainly be only moderately happy to see the tool reach that version number. It depends on whether they think more about the bugs they want to see fixed (ftp is supported, it's just called http) and the new features they want to see in (<em>info</em> documentation), or more about what el-get already does for them today...</p> <p>For the new users, or the yet-to-be-convinced users, let's take some time and talk about el-get. A <em>FAQ</em> like session might be best.</p> <h3>How is el-get different from ELPA?</h3> <p><a href="http://tromey.com/elpa/">ELPA</a> is the <em>Emacs Lisp Package Archive</em> and is also known as package.el, to be included in Emacs 24. This allows Emacs Lisp extension authors to <em>package</em> their work. That means they have to follow some guidelines and format their contribution, then propose it for upload.</p> <p>This requires license checks (good) and for the <a href="http://elpa.gnu.org/">new official ELPA mirror</a> it even requires dead-tree paperwork, contracts and copyright assignments, I believe.</p> <h3>Why have both?</h3> <p class="first">While <em>ELPA</em> is a great thing to have, it's so easy to find high quality Emacs extensions out there that are not part of its offer. Either authors are not interested in uploading to ELPA, or they don't know how to properly <em>package</em> for it (it's only simple for single file extensions, see).</p> <p>So el-get is a pragmatic answer here. It's there because it so happens that I don't depend only on emacs extensions that are available with Emacs itself, in my distribution's site-lisp and in ELPA. 
I need some more, and I don't want finding, fetching, initializing and using them to be complex.</p> <p>Of course I could try and package any extension I find I need and submit it to ELPA, but really, to do that nicely I'd need to contact the extension author (<em>upstream</em>) for them to accept my patch, or else consider a fork.</p> <p>With el-get I propose distributed packaging, if you will. Let's have a look at two <em>recipes</em> here. First, the el-get one itself:</p> <pre class="src"> (<span style="color: #da70d6;">:name</span> el-get

       <span style="color: #da70d6;">:type</span> git
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #da70d6;">:features</span> el-get
       <span style="color: #da70d6;">:compile</span> <span style="color: #bc8f8f;">"el-get.el"</span>) </pre>

<p>Then a much more complex one, the <a href="http://bbdb.sourceforge.net/">bbdb</a> one:</p> <pre class="src"> (<span style="color: #da70d6;">:name</span> bbdb

       <span style="color: #da70d6;">:type</span> git
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/barak/BBDB.git"</span>
       <span style="color: #da70d6;">:load-path</span> (<span style="color: #bc8f8f;">"./lisp"</span> <span style="color: #bc8f8f;">"./bits"</span>)
       <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make autoloads"</span> <span style="color: #bc8f8f;">"make"</span>)
       <span style="color: #da70d6;">:build/darwin</span> (<span style="color: #bc8f8f;">"./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs"</span> <span style="color: #bc8f8f;">"make autoloads"</span> <span style="color: #bc8f8f;">"make"</span>)
       <span style="color: #da70d6;">:features</span> bbdb
       <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (bbdb-initialize))
       <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"texinfo"</span>) </pre>

<p>The idea is that it's much simpler to just come up with a recipe like this than to patch existing code and upload it to ELPA. And anybody can share their <em>recipes</em> very easily, with or without proposing them to me, even though I very much like adding more to the official el-get list.</p> <p>As a user, you mostly don't even need to twiddle with recipes, because we already have them for you. What you do instead is list them in el-get-sources.</p> <h3>So, show me how you use it?</h3> <p class="first">Yeah, sure. Here's a sample of my dim-packages.el file, part of my .emacs <em>suite</em>. Yeah, a single .emacs does not suit me anymore, it's a complete .emacs.d now, but that's because that's how I like it organised, you know. So, here's the example:</p> <pre class="src"> <span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-packages.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">Set el-get-sources and call el-get to init all those packages we need.
</span>(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">el-get</span>)
(add-to-list 'el-get-recipe-path <span style="color: #bc8f8f;">"~/dev/emacs/el-get/recipes"</span>)

(setq el-get-sources

'(cssh el-get switch-window vkill google-maps yasnippet verbiste mailq sicp

(<span style="color: #da70d6;">:name</span> magit <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-z"</span>) 'magit-status)))

(<span style="color: #da70d6;">:name</span> asciidoc <span style="color: #da70d6;">:type</span> elpa <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (autoload 'doc-mode <span style="color: #bc8f8f;">"doc-mode"</span> nil t) (add-to-list 'auto-mode-alist '(<span style="color: #bc8f8f;">"\\.adoc$"</span> . doc-mode)) (add-hook 'doc-mode-hook '(<span style="color: #7f007f;">lambda</span> () (turn-on-auto-fill) (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">asciidoc</span>)))))

(<span style="color: #da70d6;">:name</span> goto-last-change <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-/"</span>) 'goto-last-change)))

(<span style="color: #da70d6;">:name</span> auto-dictionary <span style="color: #da70d6;">:type</span> elpa) (<span style="color: #da70d6;">:name</span> gist <span style="color: #da70d6;">:type</span> elpa) (<span style="color: #da70d6;">:name</span> lisppaste <span style="color: #da70d6;">:type</span> elpa)))

(el-get) <span style="color: #b22222;">; </span><span style="color: #b22222;">that could/should be (el-get 'sync)
</span>
(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-packages</span>) </pre>

<p>Ok, that's not all of it, but it should give you a nice idea of what problem I solve with el-get and how. In my emacs startup sequence, somewhere inside my ~/.emacs.d/init.el file, I have a line that says (require 'dim-packages). This will set el-get-sources to the list just above, then call (el-get), the main function.</p> <p>This main function will check each given package and install it if necessary (including <em>building</em> the package, as in make autoloads; make), then <em>init</em> it. What <em>init</em> means exactly depends on what the recipe says. That can include <em>byte-compiling</em> some files, caring about <em>load-path</em>, <em>load</em> and <em>require</em> commands, caring about <em>Info-directory-list</em> and ginstall-info too, and some more.</p> <p>So in short, it will make it so that your emacs instance is ready for you to use. And you get the choice to use the given el-get recipes as-is, like I did for cssh, el-get, switch-window and others, up to sicp, or to tweak them partly, like in the magit example where I've added a user init function (the :after property) to bind magit-status to C-x C-z. You can even embed a full recipe inline in the el-get-sources variable; that's the case for each item that gives its :type property, like asciidoc or gist.</p> <p>And, as you see, we're using ELPA a lot in these sources, so el-get isn't striving to replace it at all, it's just trying to accommodate a broader world.</p> <h3>I read that el-get-install is asynchronous, tell me more.</h3> <p class="first">Yeah, right, the example above says (el-get) at its end, and in the cases when el-get has to install or build sources, this will be done asynchronously. This means that not only will several sources get processed at once (using your multiple cores, yeah) but also that emacs will start up as if it were ready.</p> <p>It happens that's usually what I want, because I seldom add sources to my setup, but in theory that can break your emacs. 
What I do is restart it or fix things by hand; what you can do instead is call (el-get 'sync) so that emacs is blocked waiting for el-get to properly install and initialize all the sources you've set up. Your choice, just add the 'sync parameter there.</p> <h3>Now, explain to me again why it is better this way, please?</h3> <p class="first">Well, before I wrote el-get, trying out a new extension, setting it up etc was quite involved, and something I had to redo on several machines. The only way not to redo it was to include the extension's code into my own git repository (my emacs.d is in git, of course).</p> <p>And putting code I don't maintain into my own git repository is something I frown upon. I have no business pretending I'll maintain the code, and I know I will never think to check the URL where I've found it for updates. That's when I thought about noting down the URL somewhere.</p> <p>Also, what about sharing the extension with friends? Awkward, at best.</p> <p>Enter el-get, and I can just add an entry to el-get-sources, based on a file somewhere in my own el-get-recipe-path. When I'm happy with this file, I can contribute it to el-get proper or just send it over to any interested recipient. Adding it to your sources is easy: copy the file somewhere into your el-get-recipe-path, add its name to your el-get-sources, then M-x el-get-install it. Done. If you were given the :after function, it's all set up already.</p> <p>If you contribute the recipe to el-get, then M-x el-get-update RET el-get RET and you get it on this other machine where you also use Emacs. Or you can tell your friend to do the same and benefit from your <em>packaging</em>.</p> <h3>Well, sounds good. What recipes do you have already?</h3> <p class="first">I count 67 of them already. 
One of them is just a book in <em>info</em> format, with no <em>elisp</em> at all, can you spot it?</p> <pre class="src"> ELISP&gt; (directory-files <span style="color: #bc8f8f;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #bc8f8f;">"el$"</span>)

(<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"auto-complete-etags.el"</span> <span style="color: #bc8f8f;">"auto-complete-extension.el"</span> <span style="color: #bc8f8f;">"auto-complete.el"</span> <span style="color: #bc8f8f;">"auto-install.el"</span> <span style="color: #bc8f8f;">"autopair.el"</span> <span style="color: #bc8f8f;">"bbdb.el"</span> <span style="color: #bc8f8f;">"blender-python-mode.el"</span> <span style="color: #bc8f8f;">"color-theme-twilight.el"</span> <span style="color: #bc8f8f;">"color-theme.el"</span> <span style="color: #bc8f8f;">"cssh.el"</span> <span style="color: #bc8f8f;">"django-mode.el"</span> <span style="color: #bc8f8f;">"el-get.el"</span> <span style="color: #bc8f8f;">"emacs-w3m.el"</span> <span style="color: #bc8f8f;">"emacschrome.el"</span> <span style="color: #bc8f8f;">"emms.el"</span> <span style="color: #bc8f8f;">"ensime.el"</span> <span style="color: #bc8f8f;">"erc-highlight-nicknames.el"</span> <span style="color: #bc8f8f;">"erc-track-score.el"</span> <span style="color: #bc8f8f;">"escreen.el"</span> <span style="color: #bc8f8f;">"filladapt.el"</span> <span style="color: #bc8f8f;">"flyguess.el"</span> <span style="color: #bc8f8f;">"gist.el"</span> <span style="color: #bc8f8f;">"google-maps.el"</span> <span style="color: #bc8f8f;">"google-weather.el"</span> <span style="color: #bc8f8f;">"goto-last-change.el"</span> <span style="color: #bc8f8f;">"haskell-mode.el"</span> <span style="color: #bc8f8f;">"highlight-parentheses.el"</span> <span style="color: #bc8f8f;">"hl-sexp.el"</span> <span style="color: #bc8f8f;">"levenshtein.el"</span> <span style="color: #bc8f8f;">"magit.el"</span> <span style="color: #bc8f8f;">"mailq.el"</span> <span style="color: #bc8f8f;">"maxframe.el"</span> <span style="color: #bc8f8f;">"multi-term.el"</span> <span style="color: #bc8f8f;">"muse-blog.el"</span> <span style="color: #bc8f8f;">"nognus.el"</span> <span style="color: #bc8f8f;">"nterm.el"</span> <span 
style="color: #bc8f8f;">"nxhtml.el"</span> <span style="color: #bc8f8f;">"offlineimap.el"</span> <span style="color: #bc8f8f;">"package.el"</span> <span style="color: #bc8f8f;">"popup-kill-ring.el"</span> <span style="color: #bc8f8f;">"pos-tip.el"</span> <span style="color: #bc8f8f;">"pov-mode.el"</span> <span style="color: #bc8f8f;">"psvn.el"</span> <span style="color: #bc8f8f;">"pymacs.el"</span> <span style="color: #bc8f8f;">"rainbow-mode.el"</span> <span style="color: #bc8f8f;">"rcirc-groups.el"</span> <span style="color: #bc8f8f;">"rinari.el"</span> <span style="color: #bc8f8f;">"ropemacs.el"</span> <span style="color: #bc8f8f;">"rt-liberation.el"</span> <span style="color: #bc8f8f;">"scratch.el"</span> <span style="color: #bc8f8f;">"session.el"</span> <span style="color: #bc8f8f;">"sicp.el"</span> <span style="color: #bc8f8f;">"smex.el"</span> <span style="color: #bc8f8f;">"switch-window.el"</span> <span style="color: #bc8f8f;">"textile-mode.el"</span> <span style="color: #bc8f8f;">"todochiku.el"</span> <span style="color: #bc8f8f;">"twitter.el"</span> <span style="color: #bc8f8f;">"twittering-mode.el"</span> <span style="color: #bc8f8f;">"undo-tree.el"</span> <span style="color: #bc8f8f;">"verbiste.el"</span> <span style="color: #bc8f8f;">"vimpulse-surround.el"</span> <span style="color: #bc8f8f;">"vimpulse.el"</span> <span style="color: #bc8f8f;">"vkill.el"</span> <span style="color: #bc8f8f;">"xcscope.el"</span> <span style="color: #bc8f8f;">"xml-rpc-el.el"</span> <span style="color: #bc8f8f;">"yasnippet.el"</span>) </pre>

<h3>Ok, I want to try it, what's next?</h3> <p class="first">Visit the following URL <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a> and follow the install instructions. You're given a <em>scratch installer</em> there: some <em>elisp</em> code you copy and paste into *scratch* and evaluate there, and you have el-get ready to serve.</p> <p>An excellent idea I stole from ELPA!</p> <h3>Hey, I already know what el-get is, what's new in 1.0?</h3> <p class="first">The <em>changelog</em> is quite full of good stuff, really:</p> <ul> <li>Implement el-get recipes so that el-get-sources can be a simple list of symbols. Now that there's an authoritative git repository, sharing the recipes is easy.</li> <li>Add support for emacswiki directly, saving you from having to enter the URL</li> <li>Implement package status on-disk saving so that installing over a previously failed install is in theory possible. Currently `el-get' will refrain from removing your package automatically, though.</li> <li>Fix ELPA remove method, adding a &quot;removed&quot; state too.</li> <li>Implement CVS login support.</li> <li>Add lots of recipes</li> <li>Add support for `system-type' specific build commands</li> <li>Byte compile files from the load-path entries or :compile files</li> <li>Implement support for git submodules with the command `git submodule update --init --recursive`</li> <li>Add catch-all post-install and post-update hooks</li> <li>Add desktop notification on install/update.</li> </ul> <h3>I'm still using the deprecated emacswiki version, what now?</h3> <p class="first">That version didn't have recipes, and the new version should be perfectly happy with your current el-get-sources, so I recommend using the <em>scratch installer</em> too. Don't forget to add el-get itself to your el-get-sources list, of course!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 07 Oct 2010 13:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/blog/2010/10/07-el-get-reaches-10.html</guid> </item> <item> <title>el-get reaches 1.0</title> <link>http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</link> <description><![CDATA[<h1>el-get reaches 1.0</h1>

Thursday, October 07 2010, 13:30 </div>
<p>It's been a week since the last commits in the <a href="http://github.com/dimitri/el-get">el-get repository</a>, and those were all about fixing and adding recipes, and about notifications. Nothing like <em>core plumbing</em>, you see. Also, 0.9 was released on <em>2010-08-24</em>, already felt pretty complete, and has received lots of improvements since. It's high time to cross the line and call it 1.0!</p>

<p>Now existing users will certainly be just moderately happy to see the tool reach that version number, depending on whether they think more about the bugs they want to see fixed (ftp is supported, it's just called http) and the new features they want to see in (<em>info</em> documentation), or more about what el-get already does for them today...</p> <p>For the new users, or the yet-to-be-convinced users, let's take some time and talk about el-get. A <em>FAQ</em>-like session might be best.</p> <h3>How is el-get different from ELPA?</h3> <p><a href="http://tromey.com/elpa/">ELPA</a> is the <em>Emacs Lisp Package Archive</em>, also known as package.el, to be included in Emacs 24. It allows Emacs Lisp extension authors to <em>package</em> their work. That means they have to follow some guidelines and format their contribution, then propose it for upload.</p> <p>This requires license checks (good), and for the <a href="http://elpa.gnu.org/">new official ELPA mirror</a> it even requires dead-tree paper exchanges, contracts and copyright assignments, I believe.</p> <h3>Why have both?</h3> <p class="first">While <em>ELPA</em> is a great thing to have, it's so easy to find high quality Emacs extensions out there that are not part of its offer. Either their authors are not interested in uploading to ELPA, or they don't know how to properly <em>package</em> for it (it's only simple for single-file extensions, you see).</p> <p>So el-get is a pragmatic answer here. It's there because it so happens that I don't depend only on emacs extensions that are available with Emacs itself, in my distribution site-lisp and in ELPA. 
I need some more, and I don't want finding, fetching, initializing and using them to be complex.</p> <p>Of course I could try and package any extension I find I need and submit it to ELPA, but really, to do that nicely I'd need to contact the extension author (<em>upstream</em>) for them to accept my patch, and failing that, consider a fork.</p> <p>With el-get I propose distributed packaging, if you will. Let's have a look at two <em>recipes</em> here. First, the el-get one itself:</p> <pre class="src">
(<span style="color: #729fcf;">:name</span> el-get
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #729fcf;">:features</span> el-get
       <span style="color: #729fcf;">:compile</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span>)
</pre>

<p>Then a much more complex one, the <a href="http://bbdb.sourceforge.net/">bbdb</a> one:</p> <pre class="src">
(<span style="color: #729fcf;">:name</span> bbdb
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/barak/BBDB.git"</span>
       <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"./lisp"</span> <span style="color: #ad7fa8; font-style: italic;">"./bits"</span>)
       <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:build/darwin</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:features</span> bbdb
       <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (bbdb-initialize))
       <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"texinfo"</span>)
</pre>
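<p>The :build/darwin property above is an instance of what the 1.0 changelog calls <em>system-type specific build commands</em>. The selection rule can be pictured with a small Python model. This is a hypothetical sketch, not el-get's actual Emacs Lisp code. Its one assumption is that a :build/SYSTEM-TYPE entry, when present, replaces the generic :build list. SYSTEM-TYPE here stands for the value of Emacs' system-type variable, such as darwin or gnu/linux. The function name build_commands is made up for the illustration.</p>

```python
from __future__ import annotations


def build_commands(recipe: dict, system_type: str) -> list[str]:
    """Return the shell commands to run when building `recipe`.

    Assumption: a ":build/SYSTEM-TYPE" entry, when present, takes
    precedence over the generic ":build" list; a recipe with neither
    has nothing to build.
    """
    specific = recipe.get(f":build/{system_type}")
    if specific is not None:
        return list(specific)
    return list(recipe.get(":build", []))


# The bbdb recipe from above, transcribed as a Python dict.
bbdb = {
    ":name": "bbdb",
    ":type": "git",
    ":build": ["./configure", "make autoloads", "make"],
    ":build/darwin": [
        "./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs",
        "make autoloads",
        "make",
    ],
}

print(build_commands(bbdb, "gnu/linux"))  # the generic :build list
print(build_commands(bbdb, "darwin"))     # the Mac OS X specific list
```

<p>Keeping the platform-specific list as a separate property means the common case stays a single :build entry. Only the platforms that really differ need to say so.</p>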

<p>The idea is that it's much simpler to just come up with a recipe like this than to patch existing code and upload it to ELPA. And anybody can share their <em>recipes</em> very easily, with or without proposing them to me, though I'd very much like to add more of them to the official el-get list.</p> <p>As a user, you mostly don't even need to twiddle with recipes, because we already have them for you. What you do instead is list them in el-get-sources.</p> <h3>So, show me how you use it?</h3> <p class="first">Yeah, sure. Here's a sample of my dim-packages.el file, part of my .emacs <em>suite</em>. Yeah, a single .emacs does not suit me anymore; it's a complete .emacs.d now, because that's how I like it organised, you know. So, here's the example:</p> <pre class="src">
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-packages.el --- Dimitri Fontaine
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">Set el-get-sources and call el-get to init all those packages we need.
</span>
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(add-to-list 'el-get-recipe-path <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes"</span>)

(setq el-get-sources
      '(cssh el-get switch-window vkill google-maps yasnippet verbiste mailq sicp

        (<span style="color: #729fcf;">:name</span> magit
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status)))

        (<span style="color: #729fcf;">:name</span> asciidoc
               <span style="color: #729fcf;">:type</span> elpa
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                        (autoload 'doc-mode <span style="color: #ad7fa8; font-style: italic;">"doc-mode"</span> nil t)
                        (add-to-list 'auto-mode-alist '(<span style="color: #ad7fa8; font-style: italic;">"\\.adoc$"</span> . doc-mode))
                        (add-hook 'doc-mode-hook '(<span style="color: #729fcf; font-weight: bold;">lambda</span> ()
                                                    (turn-on-auto-fill)
                                                    (<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">asciidoc</span>)))))

        (<span style="color: #729fcf;">:name</span> goto-last-change
               <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-/"</span>) 'goto-last-change)))

        (<span style="color: #729fcf;">:name</span> auto-dictionary <span style="color: #729fcf;">:type</span> elpa)
        (<span style="color: #729fcf;">:name</span> gist <span style="color: #729fcf;">:type</span> elpa)
        (<span style="color: #729fcf;">:name</span> lisppaste <span style="color: #729fcf;">:type</span> elpa)))

(el-get) <span style="color: #888a85;">; </span><span style="color: #888a85;">that could/should be (el-get 'sync)
</span>
(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-packages</span>)
</pre>

<p>Ok, that's not all of it, but it should give you a nice idea of what problem I solve with el-get and how. In my emacs startup sequence, somewhere inside my ~/.emacs.d/init.el file, I have a line that says (require 'dim-packages). This sets el-get-sources to the list just above, then calls (el-get), the main function.</p> <p>This main function will check each given package and install it if necessary (including <em>building</em> the package, as in make autoloads; make), then <em>init</em> it. What <em>init</em> means exactly depends on what the recipe says. That can include <em>byte-compiling</em> some files, caring about <em>load-path</em>, <em>load</em> and <em>require</em> commands, caring about <em>Info-directory-list</em> and ginstall-info too, and some more.</p> <p>So in short, it will make sure your emacs instance is ready for you to use. And you get the choice to use the given el-get recipes as-is, like I did for cssh, el-get, switch-window and others, up to sicp, or to tweak them partly, like in the magit example where I've added a user init function (the :after property) to bind magit-status to C-x C-z. You can even embed a full recipe inline in the el-get-sources variable: that's the case for each item that gives its :type property, like asciidoc or gist.</p> <p>And, as you see, we're using ELPA a lot in these sources, so el-get isn't striving to replace it at all; it's just trying to accommodate a broader world.</p> <h3>I read that the el-get-install is asynchronous, tell me more.</h3> <p class="first">Yeah, right, the example above says (el-get) at its end, and when el-get has to install or build sources, this is done asynchronously. This means that not only will several sources get processed at once (using your multiple cores, yeah), but emacs will also start up as if it were ready.</p> <p>It happens that's usually what I want, because I seldom add sources to my setup, but in theory that can break your emacs. 
What I do is start it again or fix it by hand; what you can do instead is use (el-get 'sync), so that emacs blocks waiting for el-get to properly install and initialize all the sources you've set up. Your choice, just add the 'sync parameter there.</p> <h3>Now, explain to me again why it is better this way, please?</h3> <p class="first">Well, before I wrote el-get, trying out a new extension, setting it up etc. was quite involved, and I had to redo it on several machines. The only way not to redo it was to include the extension's code in my own git repository (my emacs.d is in git, of course).</p> <p>And putting code I don't maintain into my own git repository is something I frown upon. I have no business pretending I'll maintain the code, and I know I will never think to check the URL where I found it for updates. That's when I thought about noting down the URL somewhere.</p> <p>Also, what about sharing the extension with friends? Not easy, at best.</p> <p>Enter el-get: I can just add an entry to el-get-sources, based on a file somewhere in my own el-get-recipe-path. When I'm happy with this file, I can contribute it to el-get proper or just send it over to any interested recipient. Adding it to your sources is easy: copy the file somewhere in your el-get-recipe-path, add its name to your el-get-sources, then M-x el-get-install it. Done. If you were given the :after function, it's all set up already.</p> <p>If you contribute the recipe to el-get, then M-x el-get-update RET el-get RET and you get it on this other machine where you also use Emacs. Or you can tell your friend to do the same and benefit from your <em>packaging</em>.</p> <h3>Well, sounds good. What recipes do you have already?</h3> <p class="first">I count 67 of them already. 
One of them is just a book in <em>info</em> format, with no <em>elisp</em> at all, can you spot it?</p> <pre class="src"> ELISP&gt; (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"el$"</span>)

(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-etags.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-extension.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-install.el"</span> <span style="color: #ad7fa8; font-style: italic;">"autopair.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span> <span style="color: #ad7fa8; font-style: italic;">"blender-python-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme-twilight.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme.el"</span> <span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"django-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacs-w3m.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacschrome.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ensime.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-highlight-nicknames.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span> <span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"filladapt.el"</span> <span style="color: #ad7fa8; font-style: italic;">"flyguess.el"</span> <span style="color: #ad7fa8; font-style: italic;">"gist.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-weather.el"</span> <span style="color: #ad7fa8; font-style: italic;">"goto-last-change.el"</span> <span style="color: #ad7fa8; font-style: 
italic;">"haskell-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"highlight-parentheses.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"levenshtein.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span> <span style="color: #ad7fa8; font-style: italic;">"mailq.el"</span> <span style="color: #ad7fa8; font-style: italic;">"maxframe.el"</span> <span style="color: #ad7fa8; font-style: italic;">"multi-term.el"</span> <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nognus.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nterm.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"offlineimap.el"</span> <span style="color: #ad7fa8; font-style: italic;">"package.el"</span> <span style="color: #ad7fa8; font-style: italic;">"popup-kill-ring.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pos-tip.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pov-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pymacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rinari.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ropemacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rt-liberation.el"</span> <span style="color: #ad7fa8; font-style: italic;">"scratch.el"</span> <span style="color: #ad7fa8; font-style: italic;">"session.el"</span> <span style="color: #ad7fa8; font-style: italic;">"sicp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"smex.el"</span> <span style="color: #ad7fa8; font-style: 
italic;">"switch-window.el"</span> <span style="color: #ad7fa8; font-style: italic;">"textile-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"todochiku.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twitter.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twittering-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"undo-tree.el"</span> <span style="color: #ad7fa8; font-style: italic;">"verbiste.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse-surround.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>) </pre>
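<p>Bare names in el-get-sources, like cssh or sicp, refer to recipe files such as the ones listed above. Earlier in this post, entries combine with recipes in three ways. A bare name uses its recipe as-is. An entry giving its own :type is a full inline recipe. An entry like the magit one adds properties such as :after on top of the recipe. That rule fits in a few lines of Python. This is a hypothetical model of the described behavior, not el-get's code. RECIPES and resolve_source are made-up names, and the recipe contents are trimmed to two properties each.</p>

```python
from __future__ import annotations

# Hypothetical stand-ins for recipe files found in el-get-recipe-path
# (cssh.el, magit.el, ...); only two properties are kept per recipe.
RECIPES = {
    "cssh": {":name": "cssh", ":type": "git"},
    "magit": {":name": "magit", ":type": "git"},
}


def resolve_source(entry, recipes=RECIPES) -> dict:
    """Turn one el-get-sources entry into a complete recipe.

    - a bare name (a symbol in Emacs Lisp, a str here) uses the recipe as-is;
    - a property list giving its own :type is a full inline recipe;
    - any other property list overrides properties of the named recipe.
    """
    if isinstance(entry, str):
        return dict(recipes[entry])
    if ":type" in entry:
        return dict(entry)
    merged = dict(recipes[entry[":name"]])  # copy, never mutate the recipe
    merged.update(entry)
    return merged


sources = [
    "cssh",
    {":name": "magit", ":after": "bind C-x C-z to magit-status"},
    {":name": "gist", ":type": "elpa"},
]

for source in sources:
    print(resolve_source(source))
```

<p>Copying the recipe before merging keeps the shared recipe untouched. Two entries tweaking the same package therefore cannot leak properties into each other.</p>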

<h3>Ok, I want to try it, what's next?</h3> <p class="first">Visit the following URL <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a> and follow the install instructions. You're given a <em>scratch installer</em> there: some <em>elisp</em> code you copy and paste into *scratch* and evaluate there, and you have el-get ready to serve.</p> <p>An excellent idea I stole from ELPA!</p> <h3>Hey, I already know what el-get is, what's new in 1.0?</h3> <p class="first">The <em>changelog</em> is quite full of good stuff, really:</p> <ul> <li>Implement el-get recipes so that el-get-sources can be a simple list of symbols. Now that there's an authoritative git repository, sharing the recipes is easy.</li> <li>Add support for emacswiki directly, saving you from having to enter the URL</li> <li>Implement package status on-disk saving so that installing over a previously failed install is in theory possible. Currently `el-get' will refrain from removing your package automatically, though.</li> <li>Fix ELPA remove method, adding a &quot;removed&quot; state too.</li> <li>Implement CVS login support.</li> <li>Add lots of recipes</li> <li>Add support for `system-type' specific build commands</li> <li>Byte compile files from the load-path entries or :compile files</li> <li>Implement support for git submodules with the command `git submodule update --init --recursive`</li> <li>Add catch-all post-install and post-update hooks</li> <li>Add desktop notification on install/update.</li> </ul> <h3>I'm still using the deprecated emacswiki version, what now?</h3> <p class="first">That version didn't have recipes, and the new version should be perfectly happy with your current el-get-sources, so I recommend using the <em>scratch installer</em> too. 
Don't forget to add el-get itself to your el-get-sources list, of course!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/rcirc.html">rcirc</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 07 Oct 2010 13:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</guid> </item>

<item>

<title>Regexp performances and Finite Automata</title> <link>http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</link> <description><![CDATA[<h1>Regexp performances and Finite Automata</h1>

Sunday, September 26 2010, 21:00 </div>
<p><span class="hack"> </span></p>

<p>The major reason why I dislike <a href="http://www.perl.org/">perl</a> so much, and <a href="http://www.ruby-lang.org">ruby</a> too, and the thing I'd most want changed in the <a href="http://www.gnu.org/software/emacs/manual/elisp.html">Emacs Lisp</a> API, is how they push developers into using <a href="http://www.regular-expressions.info/">regexp</a>. You know the quote, don't you?</p> <blockquote> <p class="quoted"> Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.</p> </blockquote> <p>That said, some situations require the use of <em>regexp</em>, or are so much simpler to solve with them that the maintenance hell you're building isn't that big a drag. Their expressiveness is hard to match with any other solution, to the point that I sometimes use them in my code (well, I sometimes use <a href="http://www.emacswiki.org/emacs/rx">rx</a> to lower the burden; see this example).</p> <pre class="src">
(rx bol (zero-or-more blank) (one-or-more digit) <span style="color: #ad7fa8; font-style: italic;">":"</span>)
<span style="color: #ad7fa8; font-style: italic;">"^[[:blank:]]*[[:digit:]]+:"</span>
</pre> <p>The thing you might want to know about <em>regexp</em> is that using them is a heavy task, usually involving <em>parsing</em> their representation, <em>compiling</em> it to some executable code, and then <em>executing</em> the generated code. It was shown as early as 1968 that a <em>regexp</em> is just another way to write a finite automaton, at least as long as you don't need <em>backtracking</em>. 
This article is my reaction to reading <a href="http://swtch.com/~rsc/regexp/regexp1.html">Regular Expression Matching Can Be Simple And Fast</a> (but is slow in Java, Perl, PHP, Python, Ruby, ...), a very interesting article; see the benchmarks in there.</p> <p>The gist of it is that we mainly find two categories of <em>regexp</em> engines in the wild: those using <a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine">NFA</a> and <a href="http://en.wikipedia.org/wiki/Deterministic_finite_automaton">DFA</a> intermediate representations, and the others. Our beloved <a href="http://www.postgresql.org/">PostgreSQL</a> sure offers the feature: it's the ~ and ~* <a href="http://www.postgresql.org/docs/9.0/interactive/functions-matching.html">operators</a>. Its implementation is based on <a href="http://www.arglist.com/regex/">Henry Spencer</a>'s work, about which the aforementioned article says it</p> <blockquote> <p class="quoted"> became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on.</p> </blockquote> <p>Having a look at the actual implementation shows that, indeed, current PostgreSQL <em>regexp</em> matching code uses NFA and DFA intermediate representations. The code is quite complex, even more than I thought it would be, and I didn't have the time it would take to compare it against the approach proposed in the <em>simple and fast</em> article.</p> <pre class="src"> postgresql/src/backend/regex </pre> <p>So all in all, I'll keep avoiding <em>regexp</em> as much as I currently do, and keep my tendency to use <a href="http://www.gnu.org/manual/gawk/gawk.html">awk</a> when I need them on files (it allows refining the search without resorting to more and more pipes on the command line). 
And as far as using <em>regexp</em> in PostgreSQL is concerned, it seems the code there is already pretty much top-notch. Once more.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>
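<p>The NFA idea from the <em>simple and fast</em> article can be made concrete with a minimal Python sketch: Thompson's construction plus state-set simulation. It is not PostgreSQL's engine, which derives from Henry Spencer's code. It also skips regexp parsing entirely. Fragments are built by hand with hypothetical helpers: sym, lit, cat, alt, star, plus and opt. Because all live states are advanced together, the matcher never backtracks. Its running time is roughly the text length times the number of states.</p>

```python
from functools import reduce


class State:
    """NFA state; edges are (predicate, next_state), None meaning epsilon."""

    def __init__(self):
        self.edges = []


# Each builder returns a fresh (start, accept) fragment, Thompson style.
# Fragments are wired in place: never reuse one fragment object twice.
def sym(pred):
    start, accept = State(), State()
    start.edges.append((pred, accept))
    return start, accept


def lit(char):
    return sym(lambda c: c == char)


def cat(a, b):
    a[1].edges.append((None, b[0]))
    return a[0], b[1]


def alt(a, b):
    start, accept = State(), State()
    start.edges += [(None, a[0]), (None, b[0])]
    a[1].edges.append((None, accept))
    b[1].edges.append((None, accept))
    return start, accept


def star(a):  # zero or more
    start, accept = State(), State()
    start.edges += [(None, a[0]), (None, accept)]
    a[1].edges += [(None, a[0]), (None, accept)]
    return start, accept


def plus(a):  # one or more
    start, accept = State(), State()
    start.edges.append((None, a[0]))
    a[1].edges += [(None, a[0]), (None, accept)]
    return start, accept


def opt(a):  # zero or one
    start, accept = State(), State()
    start.edges += [(None, a[0]), (None, accept)]
    a[1].edges.append((None, accept))
    return start, accept


def closure(states):
    """Every state reachable from `states` through epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        for pred, nxt in stack.pop().edges:
            if pred is None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def match_prefix(frag, text):
    """True if a prefix of text matches, like a regexp anchored with ^."""
    start, accept = frag
    current = closure({start})
    for ch in text:
        if accept in current:
            return True
        # Advance every live state at once: no backtracking.
        current = closure({nxt for state in current
                           for pred, nxt in state.edges
                           if pred is not None and pred(ch)})
        if not current:
            return False
    return accept in current


# The rx example: ^[[:blank:]]*[[:digit:]]+:
line_number = cat(cat(star(sym(lambda c: c in " \t")),
                      plus(sym(lambda c: c in "0123456789"))),
                  lit(":"))


def pathological(n):
    """a?^n a^n, the benchmark pattern of the article linked above."""
    parts = [opt(lit("a")) for _ in range(n)] + [lit("a") for _ in range(n)]
    return reduce(cat, parts)


print(match_prefix(line_number, "  42: some text"))
print(match_prefix(pathological(30), "a" * 30))
```

<p>The pathological pattern is the one the article uses to show exponential behavior in backtracking engines. Here it is just a set of a few hundred states, advanced once per input character.</p>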

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 26 Sep 2010 21:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</guid> </item> <item> <title>Postfix sender_dependent_relayhost_maps</title> <link>http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</link> <description><![CDATA[h1>Postfix sender_dependent_relayhost_maps</h1>

Thursday, September 23 2010, 14:30 </div>
<p><span class="hack"> </span></p>

<p>The previous article about <a href="http://tapoueh.org/articles/news/_Scratch_that_itch:_M-x_mailq.html">M-x mailq</a> prompted several mails asking me for details about the <a href="http://www.postfix.com/">Postfix</a> setup I was talking about. The problem we're trying to solve is having a local MTA to send mails, so that any old-style Unix tool just works, not only the MUA you've spent time setting up.</p> <p>Postfix makes that quite easy, but it gets a little more involved if you have more than one <em>relayhost</em> and want to pick one depending on your current <em>From</em> address. Think personal email versus work email, or avoiding your ISP network when sending your private mails by <em>hopping</em> directly to a server you own or trust.</p> <p>So how do you do just that? Let's see the relevant parts of main.cf.</p> <pre class="src">
relayhost = your.default.relay.host.here
relay_domains = domain.org, work-domain.com, other-domain.info
smtp_sender_dependent_authentication = yes
sender_dependent_relayhost_maps = hash:/etc/postfix/relaymap
</pre> <p>The relaymap looks like this:</p> <pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">comments
</span>user@domain.org          mail.domain.org
local@work-domain.com    smtp.work-domain.com
<span style="color: #888a85;"># </span><span style="color: #888a85;">that requires a local tunnel started with ssh, see ~/.ssh/config
</span>me@other-domain.info     [127.0.0.1]:10025
</pre> <p>You need to run <a href="http://www.postfix.org/postmap.1.html">postmap</a> on this file before you reload or restart your local instance of Postfix.</p> <p>You will also want to authenticate to your preferred relay hosts and to encrypt the communication with them. With SASL and TLS that goes like this (only the last setting of a parameter counts in main.cf, so smtp_sasl_mechanism_filter is given once):</p> <pre class="src">
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl-passwords
smtp_sasl_security_options = noanonymous
smtp_sasl_mechanism_filter = login, plain
smtp_sasl_type = cyrus

smtp_tls_session_cache_database = btree:${queue_directory}/smtp_scache
smtp_tls_loglevel = 2
smtp_use_tls = yes
smtp_tls_security_level = may
</pre>
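<p>To make the lookup order concrete, here is a small Python model of how a relaymap like the one above is consulted. It is a simplified sketch, not Postfix code. It assumes the order documented for sender_dependent_relayhost_maps: the full envelope sender first, then @domain, then the global relayhost. It also assumes the DUNNO result of Postfix 2.6 and later, lowercases keys, and ignores the finer points of Postfix table lookups. The names parse_map and next_relay are made up for this example.</p>

```python
def parse_map(text):
    """Parse the source text of a lookup table such as /etc/postfix/relaymap.

    One 'key value' pair per line; lines whose first non-blank character is
    '#' are comments. Keys are lowercased to keep lookups case-insensitive.
    """
    table = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        table[parts[0].lower()] = parts[1].strip() if len(parts) == 2 else ""
    return table


def next_relay(sender, relaymap, relayhost):
    """Choose the next hop for an envelope sender address.

    Tries the full address, then '@domain'; a DUNNO result ends the search;
    when nothing matches, the global relayhost from main.cf is used.
    """
    sender = sender.lower()
    domain_key = "@" + sender.rpartition("@")[2]
    for key in (sender, domain_key):
        if key in relaymap:
            if relaymap[key].upper() == "DUNNO":
                break
            return relaymap[key]
    return relayhost
```

<p>With the relaymap above, mail from me@other-domain.info goes to the local ssh tunnel on [127.0.0.1]:10025. A sender that is not in the table falls back to your.default.relay.host.here.</p>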

<p>The password file also needs to be processed with postmap, should have restricted read permissions, and looks like this:</p> <pre class="src">
mail.domain.org          user@domain.org:password
smtp.work-domain.com     local@work-domain.com:h4ckm3
[<span style="color: #8ae234; font-weight: bold;">127.0.0.1</span>]:10025        me@other-domain.info:guess
</pre> <p>I hope this helps you get started; at least it's the document I would have enjoyed reading when I first set up my local relaying MTA.</p> <p>Oh, and now that you have this, I hope you will enjoy my M-x mailq tool for those occasions when you wonder why you're not receiving an answer yet, and then remember to start the ssh tunnel…</p> <h2>Tags</h2> <p><a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 23 Sep 2010 14:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</guid> </item>

<item>

<title>Scratch that itch: M-x mailq</title> <link>http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</link> <description><![CDATA[<h1>Scratch that itch: M-x mailq</h1>

<div>Thursday, September 23 2010, 09:30</div>
<p>Nowadays, most people think that email is something simple: you just set up your preferred client (that's called a MUA) with some information such as the SMTP host you want it to talk to (that's called an MTA, and this one is your relayhost). Then there's the whole receiving part, which is SMTP again on the server side. Then there's how to fetch those mails, read them, flag them and manage them, and that's better served by IMAP. This entry is about sending mail with SMTP.</p>

<p>The traditional way to handle sending mail is to have your own MTA on each system you use. There used to be a <em>sysadmin</em> team caring about all those systems, but we're in the personal computer era now, which only means <strong><em>you</em></strong> are the sysadmin. So just about any Unix tool that wants to send a mail will run the command /usr/bin/sendmail to queue the outgoing message.</p> <p>My typical <em>workstation</em> setup includes a full-blown MTA (my choice is <a href="http://www.postfix.com/">Postfix</a>) that chooses the next relay host depending on the message <em>From</em> field: I don't want to trust any local default relayhost. Note that the next relay is contacted with authentication and over an encrypted connection.</p> <blockquote> <p class="quoted"> We're getting there, really. But I don't know a better way to present a piece of software, however small, than talking about the need that led to its development.</p> </blockquote> <p>Some of my relaying goes through an ssh tunnel, and it happens that I send mail having forgotten to set up said tunnel. In this case, the advantage is that it will not block my MUA (<a href="http://gnus.org/">gnus</a>, in quite good shape these days and receiving lots of love), as the queueing happens as usual. The drawback is that <a href="http://www.postfix.com/">Postfix</a> will <em>silently</em> queue the mail until it's able to deliver it, which can take days.</p> <p>Enter M-x mailq! OK, I could do M-! mailq and see <em>Mail queue is empty</em> in the message area. But as soon as the queue is not empty I need to resort to some <em>shell</em> or <em>terminal</em> in order to <em>flush</em> the queue. That comes after setting up the tunnel, which is as easy as C-= remote in my case, see <a href="http://github.com/dimitri/cssh">cssh</a>. Having scratched that itch, I now only have to hit f here to flush the queue.
And from the <em>gnus</em> *Group* and *Summary* buffers, it's M-q to see the mail queue.</p> <p>Thanks to <a href="http://forum.ubuntu-fr.org/viewtopic.php?id=218883">http://forum.ubuntu-fr.org/viewtopic.php?id=218883</a>, here's a visual sample of the mailq mode, where you see the mail queue in colors and the <em>keymap</em> you're offered.</p> <center> <p><img src="../../../images//mailq-el.png" alt=""></p> </center> <p>So you can even <em>flush</em> only a given queue id or a given site, or just <em>kill</em> the current id or the current site so that it's a C-y away. I hope it's useful for you too; oh, and it's already in the <a href="http://github.com/dimitri/el-get">el-get</a> recipes, of course!</p> <h2>Tags</h2> <p><a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>
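<p>For the curious, here is a rough Python sketch of what such a mode has to work with. It parses Postfix mailq output into queue entries and builds the postqueue command to flush everything (postqueue -f), one queue id (postqueue -i) or one site (postqueue -s). This is not how the Emacs mode is written. The sketch assumes the classic output layout:</p> <ul> <li>a header line;</li> <li>one block per message, starting with the queue id, followed by * for active or ! for held mail;</li> <li>recipients on indented lines;</li> <li>an optional parenthesised deferral reason.</li> </ul> <p>The names parse_mailq and flush_command are made up for this example.</p>

```python
import re

# First line of a queue entry: queue id (followed by "*" when active or "!"
# when on hold), size in bytes, arrival time, then the sender address.
ENTRY_RE = re.compile(
    r"^([0-9A-Za-z]+)([*!]?)\s+(\d+)\s+"
    r"(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d)\s+(\S+)$"
)


def parse_mailq(text):
    """Turn mailq output into a list of dicts, one per queued message."""
    entries, current = [], None
    for line in text.splitlines():
        if not line.strip() or line.startswith("-"):
            # blank separator, "-Queue ID-" header or "-- N Kbytes" footer
            current = None
            continue
        match = ENTRY_RE.match(line)
        if match:
            qid, status, size, arrival, sender = match.groups()
            current = {"qid": qid, "status": status, "size": int(size),
                       "arrival": arrival, "sender": sender,
                       "recipients": [], "reason": None}
            entries.append(current)
        elif current is not None:
            item = line.strip()
            if item.startswith("("):
                current["reason"] = item.strip("()")  # why delivery was deferred
            else:
                current["recipients"].append(item)
    return entries


def flush_command(qid=None, site=None):
    """postqueue invocation to flush one queue id, one site, or everything."""
    if qid is not None:
        return ["postqueue", "-i", qid]
    if site is not None:
        return ["postqueue", "-s", site]
    return ["postqueue", "-f"]
```

<p>Note that postqueue -s only works for destinations eligible for the Postfix "fast flush" service (see fast_flush_domains), and postqueue -i needs Postfix 2.4 or later.</p>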

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 23 Sep 2010 09:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</guid> </item> <item> <title>switch-window reaches 0.8</title> <link>http://tapoueh.org/blog/2010/09/blog/2010/09/13-switch-window-reaches-08.html</link> <description><![CDATA[<p>I wanted to play with the idea of using the whole keyboard for my <a href="http://github.com/dimitri/switch-window">switch-window</a> utility, but wondered how to get those keys in the right order. I finally found quail-keyboard-layout, which seems to exist for such uses, as you can see:</p>

<pre class="src"> (<span style="color: #7f007f;">loop</span> with layout = (split-string quail-keyboard-layout <span style="color: #bc8f8f;">""</span>)
       for row from 1 to 4
       collect (<span style="color: #7f007f;">loop</span> for col from 1 to 12
                     collect (nth (+ 1 (* 2 col) (* 30 row)) layout)))

;; the q, a and z rows of the result:
(<span style="color: #bc8f8f;">"q"</span> <span style="color: #bc8f8f;">"w"</span> <span style="color: #bc8f8f;">"e"</span> <span style="color: #bc8f8f;">"r"</span> <span style="color: #bc8f8f;">"t"</span> <span style="color: #bc8f8f;">"y"</span> <span style="color: #bc8f8f;">"u"</span> <span style="color: #bc8f8f;">"i"</span> <span style="color: #bc8f8f;">"o"</span> <span style="color: #bc8f8f;">"p"</span> <span style="color: #bc8f8f;">"["</span> <span style="color: #bc8f8f;">"]"</span>)
(<span style="color: #bc8f8f;">"a"</span> <span style="color: #bc8f8f;">"s"</span> <span style="color: #bc8f8f;">"d"</span> <span style="color: #bc8f8f;">"f"</span> <span style="color: #bc8f8f;">"g"</span> <span style="color: #bc8f8f;">"h"</span> <span style="color: #bc8f8f;">"j"</span> <span style="color: #bc8f8f;">"k"</span> <span style="color: #bc8f8f;">"l"</span> <span style="color: #bc8f8f;">";"</span> <span style="color: #bc8f8f;">"'"</span> <span style="color: #bc8f8f;">"\\"</span>)
(<span style="color: #bc8f8f;">"z"</span> <span style="color: #bc8f8f;">"x"</span> <span style="color: #bc8f8f;">"c"</span> <span style="color: #bc8f8f;">"v"</span> <span style="color: #bc8f8f;">"b"</span> <span style="color: #bc8f8f;">"n"</span> <span style="color: #bc8f8f;">"m"</span> <span style="color: #bc8f8f;">","</span> <span style="color: #bc8f8f;">"."</span> <span style="color: #bc8f8f;">"/"</span> <span style="color: #bc8f8f;">" "</span> <span style="color: #bc8f8f;">" "</span>) </pre>

<p>So switch-window now uses that (though only the first 10 letters) instead of <em>hard-coding</em> the numbers 1 to 9 as window labels and direct-switch keys. I guess that makes it more suitable for <a href="http://github.com/dimitri/cssh">cssh</a> users too.</p> <p>In other news, I think <a href="http://github.com/dimitri/el-get">el-get</a> is about ready for its 1.0 release. Please test it and report any problems very soon, before the release!</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 13 Sep 2010 17:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/blog/2010/09/13-switch-window-reaches-08.html</guid> </item> <item> <title>switch-window reaches 0.8</title> <link>http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</link> <description><![CDATA[<h1>switch-window reaches 0.8</h1>

<div>Monday, September 13 2010, 17:45</div>
<p>I wanted to play with the idea of using the whole keyboard for my <a href="http://github.com/dimitri/switch-window">switch-window</a> utility, but wondered how to get those keys in the right order. I finally found quail-keyboard-layout, which seems to exist for such uses, as you can see:</p>

<pre class="src"> (<span style="color: #729fcf; font-weight: bold;">loop</span> with layout = (split-string quail-keyboard-layout <span style="color: #ad7fa8; font-style: italic;">""</span>)
       for row from 1 to 4
       collect (<span style="color: #729fcf; font-weight: bold;">loop</span> for col from 1 to 12
                     collect (nth (+ 1 (* 2 col) (* 30 row)) layout)))

;; the q, a and z rows of the result:
(<span style="color: #ad7fa8; font-style: italic;">"q"</span> <span style="color: #ad7fa8; font-style: italic;">"w"</span> <span style="color: #ad7fa8; font-style: italic;">"e"</span> <span style="color: #ad7fa8; font-style: italic;">"r"</span> <span style="color: #ad7fa8; font-style: italic;">"t"</span> <span style="color: #ad7fa8; font-style: italic;">"y"</span> <span style="color: #ad7fa8; font-style: italic;">"u"</span> <span style="color: #ad7fa8; font-style: italic;">"i"</span> <span style="color: #ad7fa8; font-style: italic;">"o"</span> <span style="color: #ad7fa8; font-style: italic;">"p"</span> <span style="color: #ad7fa8; font-style: italic;">"["</span> <span style="color: #ad7fa8; font-style: italic;">"]"</span>)
(<span style="color: #ad7fa8; font-style: italic;">"a"</span> <span style="color: #ad7fa8; font-style: italic;">"s"</span> <span style="color: #ad7fa8; font-style: italic;">"d"</span> <span style="color: #ad7fa8; font-style: italic;">"f"</span> <span style="color: #ad7fa8; font-style: italic;">"g"</span> <span style="color: #ad7fa8; font-style: italic;">"h"</span> <span style="color: #ad7fa8; font-style: italic;">"j"</span> <span style="color: #ad7fa8; font-style: italic;">"k"</span> <span style="color: #ad7fa8; font-style: italic;">"l"</span> <span style="color: #ad7fa8; font-style: italic;">";"</span> <span style="color: #ad7fa8; font-style: italic;">"'"</span> <span style="color: #ad7fa8; font-style: italic;">"\\"</span>)
(<span style="color: #ad7fa8; font-style: italic;">"z"</span> <span style="color: #ad7fa8; font-style: italic;">"x"</span> <span style="color: #ad7fa8; font-style: italic;">"c"</span> <span style="color: #ad7fa8; font-style: italic;">"v"</span> <span style="color: #ad7fa8; font-style: italic;">"b"</span> <span style="color: #ad7fa8; font-style: italic;">"n"</span> <span style="color: #ad7fa8; font-style: italic;">"m"</span> <span style="color: #ad7fa8; font-style: italic;">","</span> <span style="color: #ad7fa8; font-style: italic;">"."</span> <span style="color: #ad7fa8; font-style: italic;">"/"</span> <span style="color: #ad7fa8; font-style: italic;">" "</span> <span style="color: #ad7fa8; font-style: italic;">" "</span>) </pre>
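<p>To see why taking every other character of quail-keyboard-layout yields the keys in keyboard order, here is a Python model of that string. It assumes the standard layout defined in quail.el (quail-keyboard-layout-standard): 6 rows of 30 characters, each key written as two characters (unshifted, then shifted), with blank top and bottom rows. A different quail-keyboard-layout-type would give different keys. The constant and function names are made up for this sketch.</p>

```python
# Model of Emacs' quail-keyboard-layout-standard: 6 rows of 30 characters,
# each key written as two characters (unshifted, shifted). "\x26" is "&" and
# "\x3c" is "<", escaped here only to keep the snippet HTML-friendly.
ROW_WIDTH = 30
LAYOUT = (
    " " * ROW_WIDTH
    + "  1!2@3#4$5%6^7\x268*9(0)-_=+`~  "
    + "  qQwWeErRtTyYuUiIoOpP[{]}    "
    + "  aAsSdDfFgGhHjJkKlL;:'\"\\|    "
    + "  zZxXcCvVbBnNmM,\x3c.>/?        "
    + " " * ROW_WIDTH
)


def keyboard_keys(rows=range(1, 5), cols=range(1, 13)):
    """Unshifted key at (row, col) sits at offset ROW_WIDTH * row + 2 * col."""
    return [[LAYOUT[ROW_WIDTH * row + 2 * col] for col in cols] for row in rows]
```

<p>Under this model, rows 2 to 4 give exactly the three lists shown above, and row 1 holds the digit keys.</p>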

<p>So switch-window now uses that (though only the first 10 letters) instead of <em>hard-coding</em> the numbers 1 to 9 as window labels and direct-switch keys. I guess that makes it more suitable for <a href="http://github.com/dimitri/cssh">cssh</a> users too.</p> <p>In other news, I think <a href="http://github.com/dimitri/el-get">el-get</a> is about ready for its 1.0 release. Please test it and report any problems very soon, before the release!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 13 Sep 2010 17:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</guid> </item> <item> <title>Window Functions example remix</title> <link>http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</link> <description><![CDATA[<h1>Window Functions example remix</h1>

<div>Sunday, September 12 2010, 21:35</div>
<p><span class="hack"> </span></p>

<p>The drawback of hosting a static-only website is, obviously, the lack of comments. What actually happens, though, is that I receive very few comments by direct mail. As I don't get yet another <em>spam</em> source to clean up, I'm not convinced that's such a drawback. I still miss the (admittedly low) probability of seeing blog readers exchange directly, but I think a tapoueh.org mailing list would be my answer here...</p> <p>Anyway, <a href="http://people.planetpostgresql.org/dfetter/">David Fetter</a> took the time to send me a comment by mail with a cleaned-up rewrite of the SQL from the previous entry; here it is for your pleasure!</p> <pre class="src"> WITH t AS (
    SELECT o, w,
           CASE WHEN LAG(w) OVER (w) IS DISTINCT FROM w
                 AND ROW_NUMBER() OVER (w) &gt; 1 <span style="color: #888a85;">/* Eliminate first change */</span>
                THEN 1
            END AS change
      FROM ( VALUES (1, 5), (2, 10), (3, 7), (4, 7), (5, 7) ) AS data(o, w)
    WINDOW w AS (ORDER BY o) <span style="color: #888a85;">/* Factor out WINDOW */</span>
)
SELECT SUM(change) FROM t; </pre>
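<p>For readers who think more easily in procedural code, here is a Python sketch of what David's query computes. It illustrates the result only, not how PostgreSQL evaluates window functions. The lag() helper stands in for LAG() over rows already sorted by the window's ORDER BY. Python's != plays the role of IS DISTINCT FROM: None != 5 is True, and None != None is False. The function names are made up for the example.</p>

```python
def lag(values, default=None):
    """LAG(x) over already-ordered rows: the previous value, default for the first."""
    return [default] + values[:-1]


def count_changes(rows):
    """rows: (o, w) pairs; mirrors the CASE ... END AS change column and SUM()."""
    ordered = sorted(rows)                       # WINDOW w AS (ORDER BY o)
    weights = [w for _, w in ordered]
    change = [
        1 if prev != w and row_number > 1 else None   # skip the first row
        for row_number, (prev, w) in enumerate(zip(lag(weights), weights), start=1)
    ]
    non_null = [c for c in change if c is not None]
    return sum(non_null) if non_null else None   # SUM() of only NULLs is NULL
```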

<p>As you can see, <strong><em>David</em></strong> chose to filter out the first change inside the WITH query rather than hacking it away with a simple -1 at the outer level. I'm still wondering which way is cleaner (that depends on how you look at the problem), but I think I know which one is simpler! Thanks <strong><em>David</em></strong> for this blog entry!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 12 Sep 2010 21:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</guid> </item> <item> <title>Window Functions example</title> <link>http://tapoueh.org/blog/2010/09/09-window-functions-example.html</link> <description><![CDATA[<h1>Window Functions example</h1>

<div>Thursday, September 09 2010, 16:35</div>
<p>When 8.4 came out, there were lots of comments about what an awesome addition <a href="http://www.postgresql.org/docs/8.4/interactive/tutorial-window.html">window functions</a> were. Yet it seems that a lot of people seeking help in <a href="http://wiki.postgresql.org/index.php?title=IRC">#postgresql</a> just don't know what kind of problems this feature helps solve. I've already used them in a few cases here on this blog, for example to get a nice overview in <a href="http://tapoueh.org/articles/blog/_Partitioning:_relation_size_per_%E2%80%9Cgroup%E2%80%9D.html">Partitioning: relation size per “group”</a>.</p>

<p>Another example use case came up on IRC today. I'll quote our user directly:</p> <blockquote> <p class="quoted"> hey there, how can i count the number of (value) changes in one column?</p> <p class="quoted"> example: a table with a column <em>weight</em>. let's say we have 5 rows, having the following values for weight: 5, 10, 7, 7, 7. the number of changes of weight would be 2 here (from 5 to 10 and 10 to 7). any idea how I could do that in SQL using PGSQL 8.4.4? GROUP BY or count(distinct weight) obviously does not work. thx in advance</p> </blockquote> <p>Several of us began talking about <em>window functions</em>, and about the fact that you obviously need some other column to define the ordering of those weights, because that's the only way to define what a change is in this context. Let's have a first try at it.</p> <pre class="src"> =# select o, w,
          case when lag(w) over(order by o) is distinct from w then 1 end as change
     from (values (1, 5), (2, 10), (3, 7), (4, 7), (5, 7)) as data(o, w);
 o | w  | change
<span style="color: #888a85;">---+----+--------</span>
 1 |  5 |      1
 2 | 10 |      1
 3 |  7 |      1
 4 |  7 |
 5 |  7 |
(5 rows) </pre>

<p>Not too bad, but of course we are seeing a false change on the first line. Whatever <em>window</em> of rows you define, the previous row as given by lag() over() is NULL for the first row, and NULL IS DISTINCT FROM any non-NULL weight. The easiest way to accommodate that is the following:</p> <pre class="src"> =# select sum(change) - 1 as changes
     from (select case when lag(w) over(order by o) is distinct from w then 1 end as change
             from (values (1, 5), (2, 10), (3, 7), (4, 7), (5, 7)) as t(o, w)) as x;
 changes
<span style="color: #888a85;">---------</span>
       2
(1 row) </pre>
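<p>The -1 correction relies on LAG() returning NULL for the first row, and on that NULL being distinct from the first weight. If the first weight were itself NULL, NULL IS DISTINCT FROM NULL would be false, no phantom change would be counted, and the -1 would undercount. A filter on the row number avoids that. Here is another way to look at the problem, as a Python sketch: compare each weight with its successor, so n weights give only n - 1 pairs and no phantom change appears. The function name is made up, and Python's != stands in for IS DISTINCT FROM, including for None.</p>

```python
def count_weight_changes(weights):
    """Count value changes between consecutive weights (already ordered by o).

    zip(weights, weights[1:]) yields the n - 1 consecutive pairs, so the first
    row has no predecessor to compare with and needs no correction. Like
    IS DISTINCT FROM, != treats None as a regular value.
    """
    return sum(1 for prev, cur in zip(weights, weights[1:]) if prev != cur)
```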

<p>So don't be shy and go read about <a href="http://www.postgresql.org/docs/8.4/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS">window functions in SQL expressions</a> and <a href="http://www.postgresql.org/docs/8.4/interactive/queries-table-expressions.html#QUERIES-WINDOW">window function processing</a> in the table expressions chapter. That's a very nice tool to have, and my guess is that you will soon enough realize that the only reason you thought you didn't need them is that you didn't know they existed, or what you can do with them. <em>Sharpen your saw!</em> :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 09 Sep 2010 16:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/09-window-functions-example.html</guid> </item> <item> <title>Synchronous Replication</title> <link>http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</link> <description><![CDATA[<h1>Synchronous Replication</h1>

<div>Monday, September 06 2010, 18:05</div>
<p>Although the new asynchronous replication facility that ships with 9.0 isn't released to the general public yet, our hacker heroes are already working on its synchronous version. Part of the facility is rather easy to design: we want something comparable to <a href="http://www.drbd.org/">DRBD</a> flexibility, but specific to our database world. So <em>synchronous</em> would mean either <em>recv</em>, <em>fsync</em> or <em>apply</em>, depending on what you need the <em>standby</em> to have already done when the master acknowledges the COMMIT. Let's call that the <em>service level</em>.</p>

<p>The part of the design that's not so easy is more interesting. Do we need to register standbys and have the <em>service level</em> set up per standby? Can we get some more flexibility and have the <em>service level</em> set on a per-transaction basis? The idea here would be that the application knows which transactions are meant to be extra-safe and which are not, the same way that you can set synchronous_commit to off when dealing with web sessions, for example.</p> <p><em>Why choose?</em> I hear you ask. Well, it's all about having more data safety, and a typical setup would contain an asynchronous reporting server and a local synchronous <em>failover</em> server. Then add a remote one, too. So even if we pick the transaction-based facility, we still want to be able to choose at setup time which server to fail over to. That means we don't want that much flexibility after all: we want to know where the data is safe, we don't want to have to guess.</p> <p>One way to solve that is to be able to set up a slave as the failover one, or say, the sync one. Now, the detail that ruins it all is that we need a <em>timeout</em> to handle worst cases, when a given slave loses its connectivity (or power, say). Then the slave isn't in <em>sync</em> any more. Some people will require that the service stays available (<em>timeout</em>, then COMMIT anyway). Others will require that the service goes down: don't accept a new transaction if you can't make its data safe on the slave too.</p> <p>The answer would be to have the master arbitrate between what the transaction wants, what the slave is set up to provide, and what it's able to provide at the time of the transaction. Given a transaction with a <em>service level</em> of <em>apply</em> and a slave set up as <em>async</em>, the COMMIT does not have to wait, because there's no known slave able to offer the needed level.
Or the COMMIT cannot happen, for the very same reason.</p> <p>Then I think it all flows quite naturally from there: while arbitrating, the master could record which slave currently offers which <em>service level</em>, and expose that information in a system view too, of course.</p> <p>The big question this proposal leaves open is how to configure whether being unable to reach the wanted <em>service level</em> is an error or a warning.</p> <p>That too would be for the master to arbitrate, based on per-standby and per-transaction settings. In the general case it could be a <em>quorum</em> setup: each slave is given a <em>weight</em> and each transaction a <em>quorum</em> to reach. The master sums up the weights of the standbys that acknowledge the transaction at the needed <em>service level</em>. The COMMIT happens as soon as the quorum is reached, or is canceled as soon as the <em>timeout</em> is reached, whichever comes first.</p> <p>Such a model allows for very flexible setups, where each standby has a <em>weight</em> and offers a given <em>service level</em>, and each transaction waits until a <em>quorum</em> is reached. Giving the right weights to your standbys (like, powers of two) allows you to set the quorum in a way that only one given standby is able to acknowledge the most important transactions. And it's flexible enough that you can change it at any time: it's just a <em>weight</em> that allows a <em>sum</em> to be made, so my guess would be that it ends up in the <em>feedback loop</em> between the standby and its master.</p> <p>The most appealing part of this proposal is that it doesn't look complex to implement, and it should allow for highly flexible setups. Of course, the devil is in the details, and we're talking about latencies in a distributed system here.
That's also being discussed on the <a href="http://archives.postgresql.org/pgsql-hackers/">mailing list</a>.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a></p>
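<p>The weight and quorum arbitration proposed in this entry can be sketched in a few lines of Python. This is only a model of the idea, not PostgreSQL code. The level ordering is an assumption made for the example: recv, then fsync, then apply, with async acknowledging nothing. The data shapes and the function name arbitrate are also assumptions. What to do on timeout (COMMIT anyway or fail) is left to the caller, which is the open question raised above.</p>

```python
# Service levels from the proposal, weakest first; a standby set up as
# "async" never counts towards a synchronous COMMIT (assumed ordering).
LEVELS = {"async": 0, "recv": 1, "fsync": 2, "apply": 3}


def arbitrate(standbys, acks, level, quorum, timeout):
    """Decide the outcome of one COMMIT under a weight/quorum model.

    standbys: name -> (weight, service level the standby is set up for)
    acks: (seconds since COMMIT, standby name) pairs in arrival order
    Returns ("commit", t) when the summed weights reach the quorum at time t,
    ("unreachable", 0.0) when the able standbys can never reach it, and
    ("timeout", timeout) otherwise; the caller decides what a timeout or an
    unreachable quorum means (COMMIT anyway, or error).
    """
    able = {name for name, (_, offered) in standbys.items()
            if LEVELS[offered] >= LEVELS[level]}
    if quorum > sum(standbys[name][0] for name in able):
        return ("unreachable", 0.0)
    total, counted = 0, set()
    for elapsed, name in acks:
        if elapsed > timeout:
            break
        if name in able and name not in counted:
            counted.add(name)
            total += standbys[name][0]
            if total >= quorum:
                return ("commit", elapsed)
    return ("timeout", timeout)
```

<p>With weights 4, 2 and 1, a quorum of 4 can only be met by the standby weighing 4 (2 + 1 = 3). That is the powers-of-two trick mentioned in the post.</p>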

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 06 Sep 2010 18:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</guid> </item>

<item>

<title>Want to share your recipes?</title> <link>http://tapoueh.org/blog/2010/08/blog/2010/08/31-want-to-share-your-recipes.html</link> <description><![CDATA[<p>Yes, that's another <a href="http://github.com/dimitri/el-get/">el-get</a> related entry; it seems to take a lot of my attention these days. After having set up the git repository so that you can update el-get from within itself (so that it's <em>self-contained</em>), the next logical step is providing <em>recipes</em>.</p>

<p>By that I mean that el-get-sources entries will certainly look very much alike from one user to another. Let's take the el-get entry itself:</p> <pre class="src"> (<span style="color: #da70d6;">:name</span> el-get
  <span style="color: #da70d6;">:type</span> git
  <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"git://github.com/dimitri/el-get.git"</span>
  <span style="color: #da70d6;">:features</span> <span style="color: #bc8f8f;">"el-get"</span>) </pre>

<p>I guess all el-get users will have just the same 4 lines in their el-get-sources. So let's call that a <em>recipe</em>, and have el-get look for yours in the el-get-recipe-path directories. Those directories are searched in order, and a recipe must be named package.el. el-get already contains a handful of them, as you can see:</p> <pre class="src"> ELISP&gt; (directory-files <span style="color: #bc8f8f;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #bc8f8f;">"[</span><span style="color: #bc8f8f;">^</span><span style="color: #bc8f8f;">.]$"</span>)
(<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"bbdb.el"</span> <span style="color: #bc8f8f;">"cssh.el"</span> <span style="color: #bc8f8f;">"el-get.el"</span> <span style="color: #bc8f8f;">"emms.el"</span> <span style="color: #bc8f8f;">"erc-track-score.el"</span>
 <span style="color: #bc8f8f;">"escreen.el"</span> <span style="color: #bc8f8f;">"google-maps.el"</span> <span style="color: #bc8f8f;">"haskell-mode.el"</span> <span style="color: #bc8f8f;">"hl-sexp.el"</span> <span style="color: #bc8f8f;">"magit.el"</span> <span style="color: #bc8f8f;">"muse-blog.el"</span>
 <span style="color: #bc8f8f;">"nxhtml.el"</span> <span style="color: #bc8f8f;">"psvn.el"</span> <span style="color: #bc8f8f;">"rainbow-mode.el"</span> <span style="color: #bc8f8f;">"rcirc-groups.el"</span> <span style="color: #bc8f8f;">"vkill.el"</span> <span style="color: #bc8f8f;">"xcscope.el"</span>
 <span style="color: #bc8f8f;">"xml-rpc-el.el"</span> <span style="color: #bc8f8f;">"yasnippet.el"</span>) </pre>
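<p>The lookup rule described here fits in a few lines: try each directory of el-get-recipe-path in order, and take the first package.el found. Here is a Python sketch of it, not el-get's actual code. find_recipe and its exists parameter are made up for the example; exists is there only so the function can be tried without touching the disk.</p>

```python
import os.path


def find_recipe(package, recipe_path, exists=os.path.isfile):
    """Return the first DIR/PACKAGE.el found along recipe_path, or None.

    Directories are tried in order, so a personal directory listed before
    el-get's own recipes directory shadows the bundled recipe.
    """
    for directory in recipe_path:
        candidate = os.path.join(os.path.expanduser(directory), package + ".el")
        if exists(candidate):
            return candidate
    return None
```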

<p>Please note that you can have your own local recipes by adding directories to el-get-recipe-path. So now your minimalistic el-get-sources list could look like '(el-get cssh escreen), say. And if you want to override a recipe, for instance to use the default one but still have a personal :after function containing your own setup, simply make your el-get-sources entry a partial one. When it has no :type, el-get will merge your local overrides atop the default recipe.</p> <p>Finally, to share your recipes, send me an email with the file, or do the same through the github interface; I guess I'll still receive a mail then.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 31 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/31-want-to-share-your-recipes.html</guid> </item> <item> <title>Want to share your recipes?</title> <link>http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</link> <description><![CDATA[<h1>Want to share your recipes?</h1>

<div>Tuesday, August 31 2010, 14:15</div>
<p>Yes, that's another <a href="http://github.com/dimitri/el-get/">el-get</a> related entry; it seems to take a lot of my attention these days. After having set up the git repository so that you can update el-get from within itself (so that it's <em>self-contained</em>), the next logical step is providing <em>recipes</em>.</p>

<p>By that I mean that el-get-sources entries will certainly look very much alike from one user to another. Let's take the el-get entry itself:</p> <pre class="src"> (<span style="color: #729fcf;">:name</span> el-get
  <span style="color: #729fcf;">:type</span> git
  <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
  <span style="color: #729fcf;">:features</span> <span style="color: #ad7fa8; font-style: italic;">"el-get"</span>) </pre>

<p>I guess all el-get users will have just the same 4 lines in their el-get-sources. So let's call that a <em>recipe</em>, and have el-get look for yours in the el-get-recipe-path directories. Those directories are searched in order, and a recipe must be named package.el. el-get already contains a handful of them, as you can see:</p> <pre class="src"> ELISP&gt; (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"[</span><span style="color: #ad7fa8; font-style: italic;">^</span><span style="color: #ad7fa8; font-style: italic;">.]$"</span>)
(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span> <span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span> <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>) </pre>
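<p>This entry's rule for partial entries is that an el-get-sources entry without :type gets merged atop the default recipe. That rule can be pictured with plain dictionaries. The Python sketch below models only that rule; it is not el-get's implementation. merge_source and its find_default argument are hypothetical names.</p>

```python
def merge_source(entry, find_default):
    """Resolve one el-get-sources entry into a complete recipe (dict of properties).

    entry is either a bare package name such as "cssh" or a dict of
    properties; find_default maps a package name to its recipe dict, or None.
    """
    if isinstance(entry, str):              # bare package name in el-get-sources
        entry = {":name": entry}
    if ":type" in entry:                    # complete entry: used as-is
        return dict(entry)
    default = find_default(entry[":name"])
    if default is None:
        raise KeyError("no recipe for " + entry[":name"])
    merged = dict(default)
    merged.update(entry)                    # local overrides win
    return merged
```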

<p>Please note that you can have your own local recipes by adding directories to el-get-recipe-path. So now your minimalistic el-get-sources list could look like '(el-get cssh escreen), say. And if you want to override a recipe, for instance to use the default one but still have a personal :after function containing your own setup, simply make your el-get-sources entry a partial one. When it has no :type, el-get will merge your local overrides atop the default recipe.</p> <p>Finally, to share your recipes, send me an email with the file, or do the same through the github interface; I guess I'll still receive a mail then.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/rcirc.html">rcirc</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 31 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</guid> </item> <item> <title>Happy Numbers</title> <link>http://tapoueh.org/blog/2010/08/blog/2010/08/30-happy-numbers.html</link> <description><![CDATA[<p>After discovering the excellent <a href="http://gwene.org/">Gwene</a> service, which allows you to subscribe to <em>newsgroups</em> to read RSS content (<em>blogs</em>, <em>planets</em>, <em>commits</em>, etc), I came across this nice article about <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">Happy Numbers</a>. That's a little problem that fits an interview-style question well, so I first solved it yesterday evening in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/List-Processing.html#List-Processing">Emacs Lisp</a>, as that's the language I use the most these days.</p>

<blockquote> <p class="quoted"> A happy number is defined by the following process. Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers, while those that do not end in 1 are unhappy numbers (or sad numbers).</p> </blockquote> <p>Now, what about implementing the same in pure SQL, for more fun? That's interesting! After all, we didn't get WITH RECURSIVE only for tree traversal, <a href="http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com">did we</a>?</p> <p>Unfortunately, we need a little helper function first, if only to make the recursive query easier to read. I didn't try to inline it, but here it goes:</p> <pre class="src"> create or replace function digits(x bigint)
  returns setof int
  language sql
as $$
  select substring($1::text from i for 1)::int
    from generate_series(1, length($1::text)) as t(i)
$$; </pre>
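<p>The same helper is easy to model in Python. Writing it both ways shows that the text-based approach and the powers-of-ten one agree. This sketch handles non-negative integers only; as in the SQL version, a minus sign would not convert to a digit. Both function names are made up.</p>

```python
def digits_text(n):
    """One item per digit, read off the decimal text like substring() does."""
    return [int(c) for c in str(n)]


def digits_arith(n):
    """Same digits from divisions and remainders by ten."""
    if n == 0:
        return [0]
    out = []
    while n:
        n, d = divmod(n, 10)
        out.append(d)
    return out[::-1]          # remainders come out least significant first
```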

<p>That was easy: it will output one row per digit of the input number — and rather than resorting to powers of ten and divisions and remainders, we do use plain old text representation and substring. Now, to the real problem. If you're read what is an happy number and already did read the fine manual about <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">Recursive Query Evaluation</a>, it should be quite easy to read the following:</p> <pre class="src"> with recursive happy(n, seen) as (

    select 7::bigint, <span style="color: #bc8f8f;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen from happy) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
select * from happy;

  n  |       seen
<span style="color: #b22222;">-----+------------------
</span>   7 | {}
  49 | {49}
  97 | {49,97}
 130 | {49,97,130}
  10 | {49,97,130,10}
   1 | {49,97,130,10,1}
(6 rows)

Time: 1.238 ms </pre>

<p>That shows how it works for a <em>happy</em> number, and it's easy to test a non-happy one, for example 17. The query won't cycle thanks to the seen array and the having filter, so the only difference between a <em>happy</em> and a <em>sad</em> number is that in the former case the last row output by the recursive query has n = 1. Let's turn this into a proper function, so that the number we test for happiness can be passed as an argument:</p> <pre class="src"> create or replace function happy(x bigint)

  returns boolean language sql as $$
with recursive happy(n, seen) as (
    select $1, <span style="color: #bc8f8f;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen from happy) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
select n = 1 as happy
  from happy
 order by array_length(seen, 1) desc nulls last
 limit 1
$$; </pre>

<p>We need the desc nulls last trick in the order by clause because array_length() returns NULL for an empty array, and a descending sort puts NULLs first by default. Without it, the function would pick the starting row (the input number with {}), and we certainly don't want to report every number but 1 as unhappy just because of that row. Let's now play the same tricks as in the puzzle article:</p> <pre class="src"> # select array_agg(x) as happy from generate_series(1, 50) as t(x) where happy(x);
              happy
<span style="color: #b22222;">----------------------------------
</span> {1,7,10,13,19,23,28,31,32,44,49}
(1 row)

Time: 24.527 ms

# explain analyze select x from generate_series(1, 10000) as t(x) where happy(x);
                      QUERY PLAN
<span style="color: #b22222;">------------------------------------------------------
</span> Function Scan on generate_series t
   (cost=0.00..265.00 rows=333 width=4)
   (actual time=2.938..3651.019 rows=1442 loops=1)
   Filter: happy((x)::bigint)
 Total runtime: 3651.534 ms
(3 rows)

Time: 3652.178 ms </pre>

<p>(Yes, I tweaked the EXPLAIN ANALYZE output so that it fits the page width here.) For what it's worth, finding the happy numbers between 1 and 10000 in <em>Emacs Lisp</em> on the same laptop takes 2830 ms, also using a recursive version of the code.</p> <h3>Update: the Emacs Lisp version, inline</h3> <pre class="src"> (<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">happy?</span> (<span style="color: #228b22;">&amp;optional</span> n seen)

  <span style="color: #bc8f8f;">"return true when n is a happy number"</span>
  (interactive)
  (<span style="color: #7f007f;">let*</span> ((number (or n (read-from-minibuffer <span style="color: #bc8f8f;">"Is this number happy: "</span>)))
         ;; split "49" into ("4" "9"): omit-nulls drops the empty strings at both ends
         (digits (mapcar 'string-to-number (split-string number <span style="color: #bc8f8f;">""</span> t)))
         (squares (mapcar (<span style="color: #7f007f;">lambda</span> (x) (* x x)) digits))
         (happiness (apply '+ squares)))
    (<span style="color: #7f007f;">cond</span> ((eq 1 happiness) t)
          ((memq happiness seen) nil)
          (t (happy? (number-to-string happiness) (push happiness seen))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-happy-numbers</span> (<span style="color: #228b22;">&amp;optional</span> limit)

  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (interactive)
  (<span style="color: #7f007f;">let</span> ((count (or limit (read-from-minibuffer <span style="color: #bc8f8f;">"List of happy numbers from 1 to: "</span>)))
        happy)
    (<span style="color: #7f007f;">dotimes</span> (n (string-to-number count))
      (<span style="color: #7f007f;">when</span> (happy? (number-to-string (1+ n)))
        (push (1+ n) happy)))
    (nreverse happy))) </pre> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 30 Aug 2010 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/30-happy-numbers.html</guid> </item> <item> <title>Happy Numbers</title> <link>http://tapoueh.org/blog/2010/08/30-happy-numbers.html</link> <description><![CDATA[<h1>Happy Numbers</h1>

Monday, August 30 2010, 11:00 </div>
<p>After discovering the excellent <a href="http://gwene.org/">Gwene</a> service, which lets you read RSS content (<em>blogs</em>, <em>planets</em>, <em>commits</em>, etc) by subscribing to <em>newsgroups</em>, I came across this nice article about <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">Happy Numbers</a>. That's a little problem that fits an interview-style question well, so I first solved it yesterday evening in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/List-Processing.html#List-Processing">Emacs Lisp</a>, as that's the language I use the most these days.</p>

<blockquote> <p class="quoted"> A happy number is defined by the following process. Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers, while those that do not end in 1 are unhappy numbers (or sad numbers).</p> </blockquote> <p>Now, what about implementing the same thing in pure SQL, for extra fun? Now that's interesting! After all, we didn't get WITH RECURSIVE for tree traversal only, <a href="http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com">did we</a>?</p> <p>Unfortunately, we first need a little helper function, if only to make the recursive query easier to read. I didn't try to inline it; here it is:</p> <pre class="src"> create or replace function digits(x bigint)

  returns setof int
  language sql
as $$
  select substring($1::text from i for 1)::int
    from generate_series(1, length($1::text)) as t(i)
$$; </pre>
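<p>The digits() helper simply walks the decimal text of its argument one character at a time. For a quick cross-check outside the database, here is a minimal Python sketch of the same idea, next to the divide-and-remainder approach that the text representation lets us avoid (digits_via_text and digits_via_divmod are hypothetical names, not part of the post). Both should agree on any non-negative integer:</p>

```python
from __future__ import annotations


def digits_via_text(x: int) -> list[int]:
    """Mirror of the SQL digits(): one digit per character of the decimal text."""
    return [int(c) for c in str(x)]


def digits_via_divmod(x: int) -> list[int]:
    """The arithmetic route the post avoids: peel digits off with divmod."""
    if x == 0:
        return [0]
    out = []
    while x > 0:
        x, d = divmod(x, 10)
        out.append(d)
    return out[::-1]  # divmod produces the least significant digit first


print(digits_via_text(130))    # [1, 3, 0]
print(digits_via_divmod(130))  # [1, 3, 0]
```

<p>The text version is shorter and needs no special case for zero, which is presumably why it was the natural choice in SQL too.</p>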

<p>That was easy: it outputs one row per digit of the input number. Rather than resorting to powers of ten, divisions and remainders, we simply use the plain old text representation and substring(). Now, on to the real problem. If you've read what a happy number is and have already read the fine manual about <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">Recursive Query Evaluation</a>, the following should be quite easy to read:</p> <pre class="src"> with recursive happy(n, seen) as (

    select 7::bigint, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen from happy) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
select * from happy;

  n  |       seen
<span style="color: #888a85;">-----+------------------
</span>   7 | {}
  49 | {49}
  97 | {49,97}
 130 | {49,97,130}
  10 | {49,97,130,10}
   1 | {49,97,130,10,1}
(6 rows)

Time: 1.238 ms </pre>
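<p>The recursive query is essentially a loop: each step replaces n by the sum of the squares of its digits and appends that sum to seen, and the having clause stops the recursion as soon as the new sum is already in seen. A minimal Python sketch of that loop (happy_trace is a hypothetical name) should reproduce the rows above:</p>

```python
def happy_trace(x):
    """Return the (n, seen) rows the recursive CTE would produce for x."""
    rows = [(x, [])]          # non-recursive term: select x, '{}'::bigint[]
    n, seen = x, []
    while True:
        s = sum(int(c) ** 2 for c in str(n))  # sum(d*d) over digits(n)
        if s in seen:                          # having not seen @> array[...]
            break                              # no new row: recursion stops
        seen = seen + [s]                      # h.seen || sum(d*d), a new list
        n = s
        rows.append((n, seen))
    return rows


for n, seen in happy_trace(7):
    print(n, seen)
```

<p>For a sad number such as 17 the loop also terminates, but the last row's n is not 1, which is exactly the test the post builds on next.</p>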

<p>That shows how it works for a <em>happy</em> number, and it's easy to test a non-happy one, for example 17. The query won't cycle thanks to the seen array and the having filter, so the only difference between a <em>happy</em> and a <em>sad</em> number is that in the former case the last row output by the recursive query has n = 1. Let's turn this into a proper function, so that the number we test for happiness can be passed as an argument:</p> <pre class="src"> create or replace function happy(x bigint)

  returns boolean language sql as $$
with recursive happy(n, seen) as (
    select $1, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen from happy) as h
  group by h.n, h.seen
    having not seen @&gt; array[sum(d*d)]
)
select n = 1 as happy
  from happy
 order by array_length(seen, 1) desc nulls last
 limit 1
$$; </pre>
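<p>The final select keeps a single row: the one with the longest seen array. Here is a minimal Python sketch of that order by ... limit 1 step, assuming None stands in for SQL NULL (array_length and pick_row below are hypothetical Python helpers mirroring the SQL, not PostgreSQL APIs). It shows what happens with and without nulls last, given that PostgreSQL sorts NULLs first in descending order by default:</p>

```python
def array_length(seen):
    """Like array_length(seen, 1) in PostgreSQL: NULL (None) for an empty array."""
    return len(seen) if seen else None


def pick_row(rows, nulls_last=True):
    """Emulate ORDER BY array_length(seen, 1) DESC [NULLS LAST] LIMIT 1."""
    def sort_key(row):
        length = array_length(row[1])
        # False sorts before True, so this flag decides where the NULL row goes:
        # after every other row with NULLS LAST, before them otherwise
        # (NULLS FIRST is PostgreSQL's default for DESC).
        null_flag = (length is None) if nulls_last else (length is not None)
        return (null_flag, -(length or 0))  # negated length: descending order
    return min(rows, key=sort_key)


rows = [(7, []), (49, [49]), (97, [49, 97]), (130, [49, 97, 130]),
        (10, [49, 97, 130, 10]), (1, [49, 97, 130, 10, 1])]
print(pick_row(rows)[0] == 1)                    # True: 7 is happy
print(pick_row(rows, nulls_last=False)[0] == 1)  # False: the (7, []) row wins
```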

<p>We need the desc nulls last trick in the order by clause because array_length() returns NULL for an empty array, and a descending sort puts NULLs first by default. Without it, the function would pick the starting row (the input number with {}), and we certainly don't want to report every number but 1 as unhappy just because of that row. Let's now play the same tricks as in the puzzle article:</p> <pre class="src"> # select array_agg(x) as happy from generate_series(1, 50) as t(x) where happy(x);
              happy
<span style="color: #888a85;">----------------------------------
</span> {1,7,10,13,19,23,28,31,32,44,49}
(1 row)

Time: 24.527 ms

# explain analyze select x from generate_series(1, 10000) as t(x) where happy(x);
                      QUERY PLAN
<span style="color: #888a85;">------------------------------------------------------
</span> Function Scan on generate_series t
   (cost=0.00..265.00 rows=333 width=4)
   (actual time=2.938..3651.019 rows=1442 loops=1)
   Filter: happy((x)::bigint)
 Total runtime: 3651.534 ms
(3 rows)

Time: 3652.178 ms </pre>
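<p>As a cross-check of the figures above, here is a plain iterative Python sketch (is_happy is a hypothetical name; unlike the SQL version it keeps visited values in a set and records the starting value too). It should find the same eleven happy numbers up to 50, and the same 1442 up to 10000 that the rows=1442 in the plan reports:</p>

```python
def is_happy(x):
    """True when iterating the digit-square sum from x reaches 1."""
    seen = set()                     # values already visited, like the seen array
    n = x
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(c) ** 2 for c in str(n))
    return n == 1


print([x for x in range(1, 51) if is_happy(x)])
# [1, 7, 10, 13, 19, 23, 28, 31, 32, 44, 49]
print(sum(1 for x in range(1, 10001) if is_happy(x)))
# 1442
```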

<p>(Yes, I tweaked the EXPLAIN ANALYZE output so that it fits the page width here.) For what it's worth, finding the happy numbers between 1 and 10000 in <em>Emacs Lisp</em> on the same laptop takes 2830 ms, also using a recursive version of the code.</p> <h3>Update: the Emacs Lisp version, inline</h3> <pre class="src"> (<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">happy?</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> n seen)

  <span style="color: #888a85;">"return true when n is a happy number"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((number (or n (read-from-minibuffer <span style="color: #ad7fa8; font-style: italic;">"Is this number happy: "</span>)))
         ;; split "49" into ("4" "9"): omit-nulls drops the empty strings at both ends
         (digits (mapcar 'string-to-number (split-string number <span style="color: #ad7fa8; font-style: italic;">""</span> t)))
         (squares (mapcar (<span style="color: #729fcf; font-weight: bold;">lambda</span> (x) (* x x)) digits))
         (happiness (apply '+ squares)))
    (<span style="color: #729fcf; font-weight: bold;">cond</span> ((eq 1 happiness) t)
          ((memq happiness seen) nil)
          (t (happy? (number-to-string happiness) (push happiness seen))))))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">find-happy-numbers</span> (<span style="color: #8ae234; font-weight: bold;">&amp;optional</span> limit)

  <span style="color: #888a85;">"find all happy numbers from 1 to limit"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((count (or limit (read-from-minibuffer <span style="color: #ad7fa8; font-style: italic;">"List of happy numbers from 1 to: "</span>)))
        happy)
    (<span style="color: #729fcf; font-weight: bold;">dotimes</span> (n (string-to-number count))
      (<span style="color: #729fcf; font-weight: bold;">when</span> (happy? (number-to-string (1+ n)))
        (push (1+ n) happy)))
    (nreverse happy))) </pre>

<h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 30 Aug 2010 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/30-happy-numbers.html</guid> </item> <item> <title>welcome el-get scratch installer</title> <link>http://tapoueh.org/blog/2010/08/blog/2010/08/27-welcome-el-get-scratch-installer.html</link> <description><![CDATA[<p><span class="hack"> </span></p>

<p>A very good remark from some users: installing and managing el-get should be simpler. They wanted both an easy way to install it and a way to manage it afterwards (for example, updating the local copy against the authoritative source). So I decided it was high time to get the code out of my ~/.emacs.d git repository and into a public place: <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a>.</p> <p>Then I added some documentation (a README), and then a *scratch* installer, following great ideas from ELPA. So have at it, it's a copy-paste away!</p> <p>Don't forget to set up your el-get-sources and to include the el-get source in there to get updates; there's nothing magic about it, so it's up to you. You may notice that it's not yet possible to initialize el-get from el-get-sources; that's the drawback of the lack of magic. So you still have to add an explicit (require 'el-get) before defining your own el-get-sources, and then finally call (el-get). I don't think that's a problem I need to solve, though.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 27 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/blog/2010/08/27-welcome-el-get-scratch-installer.html</guid> </item> <item> <title>welcome el-get scratch installer</title> <link>http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</link> <description><![CDATA[<h1>welcome el-get scratch installer</h1>

Friday, August 27 2010, 14:15 </div>
<p><span class="hack"> </span></p>

<p>A very good remark from some users: installing and managing el-get should be simpler. They wanted both an easy way to install it and a way to manage it afterwards (for example, updating the local copy against the authoritative source). So I decided it was high time to get the code out of my ~/.emacs.d git repository and into a public place: <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a>.</p> <p>Then I added some documentation (a README), and then a *scratch* installer, following great ideas from ELPA. So have at it, it's a copy-paste away!</p> <p>Don't forget to set up your el-get-sources and to include the el-get source in there to get updates; there's nothing magic about it, so it's up to you. You may notice that it's not yet possible to initialize el-get from el-get-sources; that's the drawback of the lack of magic. So you still have to add an explicit (require 'el-get) before defining your own el-get-sources, and then finally call (el-get). I don't think that's a problem I need to solve, though.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 27 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</guid> </item> <item> <title>Playing with bit strings</title> <link>http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</link> <description><![CDATA[<h1>Playing with bit strings</h1>

Thursday, August 26 2010, 17:45 </div>
<p>The idea of the day isn't directly mine; I'm just helping with a very thin part of the problem. I can't say much about the problem itself, so let's just assume you want to reduce the storage of MD5 hashes in your database, and that you want to abuse <a href="http://www.postgresql.org/docs/8.4/interactive/datatype-bit.html">bit strings</a> for that. Using them works fine, but the datatype is still missing some facilities, for example converting to and from a hexadecimal representation in text.</p>

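<pre class="src"> create or replace function hex_to_varbit(h text)
  returns varbit language sql as $$
  -- prefix with X so that the text parses as a hexadecimal bit string
  select (<span style="color: #ad7fa8; font-style: italic;">'X'</span> || $1)::varbit;
$$;

create or replace function varbit_to_hex(b varbit)
  returns text language sql as $$
  -- one 32-bit word per o: shift it to the front, keep 32 bits, print as hex;
  -- to_hex() drops leading zeros, so each word is padded back to 8 digits
  select array_to_string(array_agg(lpad(to_hex((b &lt;&lt; (32*o))::bit(32)::bigint), 8, <span style="color: #ad7fa8; font-style: italic;">'0'</span>)), <span style="color: #ad7fa8; font-style: italic;">''</span>)
    from (select b, generate_series(0, n-1) as o
            from (select $1, octet_length($1)/4) as t(b, n)) as x
$$; </pre>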

<p>To understand the magic in the second function, let's walk through the tests one could run to grasp how things work in the bit string world (with some help from the fine documentation, too).</p> <pre class="src"> # select ('101011001011100110010110'::varbit &lt;&lt; 0)::bit(8);
   bit
----------
 10101100
(1 row)

# select ('101011001011100110010110'::varbit &lt;&lt; 8)::bit(8);
   bit
----------
 10111001
(1 row)

# select ('101011001011100110010110'::varbit &lt;&lt; 16)::bit(8);
   bit
----------
 10010110
(1 row)

# select * from TEMP VERSION OF THE FUNCTION FOR TESTING
 o |                b                 |    x
---+----------------------------------+----------
 0 | 10101100101111010001100011011011 | acbd18db
 1 | 01001100110000101111100001011100 | 4cc2f85c
 2 | 11101101111011110110010101001111 | edef654f
 3 | 11001100110001001010010011011000 | ccc4a4d8
(4 rows) </pre>
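<p>In other words, shifting left by 32*o bits and casting to bit(32) keeps exactly the o-th 32-bit word. Here is a minimal Python sketch of that trick on strings of '0' and '1' (shift_left, as_bit and word_at are hypothetical helpers, modelling PostgreSQL's length-preserving &lt;&lt; operator and the right-truncating explicit ::bit(n) cast), checked against the outputs above:</p>

```python
def shift_left(bits, k):
    """Like varbit << k: the length is kept, zeros fill in from the right."""
    return (bits[k:] + "0" * k)[: len(bits)]


def as_bit(bits, width):
    """Like an explicit ::bit(width) cast: truncate, or zero-pad, on the right."""
    return bits[:width].ljust(width, "0")


def word_at(bits, o):
    """(b << (32*o))::bit(32): the o-th 32-bit word of b."""
    return as_bit(shift_left(bits, 32 * o), 32)


sample = "101011001011100110010110"
for k in (0, 8, 16):
    print(as_bit(shift_left(sample, k), 8))  # 10101100, then 10111001, then 10010110

md5_foo_bits = ("10101100101111010001100011011011" "01001100110000101111100001011100"
                "11101101111011110110010101001111" "11001100110001001010010011011000")
for o in range(len(md5_foo_bits) // 32):
    word = word_at(md5_foo_bits, o)
    print(o, word, format(int(word, 2), "08x"))
```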

<p>What do we get from that, you ask? Let's see a little example:</p> <pre class="src"> # select hex_to_varbit(md5('foo'));
                                                          hex_to_varbit
----------------------------------------------------------------------------------------------------------------------------------
 10101100101111010001100011011011010011001100001011111000010111001110110111101111011001010100111111001100110001001010010011011000
(1 row)

# select md5('foo'), varbit_to_hex(hex_to_varbit(md5('foo')));
               md5                |          varbit_to_hex
----------------------------------+----------------------------------
 acbd18db4cc2f85cedef654fccc4a4d8 | acbd18db4cc2f85cedef654fccc4a4d8
(1 row) </pre>
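<p>One thing worth double-checking before trusting the round trip on arbitrary hashes: to_hex() does not zero-pad, so a 32-bit word whose first hex digit is 0 comes back as seven characters unless it is padded to eight. Here is a minimal Python sketch of both conversions that makes the padding explicit (hex_to_bits and bits_to_hex are hypothetical names, not PostgreSQL functions):</p>

```python
def hex_to_bits(h):
    """Like ('X' || h)::varbit: four bits per hex digit, leading zeros kept."""
    return "".join(format(int(c, 16), "04b") for c in h)


def bits_to_hex(b):
    """Like varbit_to_hex(): one zero-padded 8-digit hex chunk per 32-bit word."""
    octets = (len(b) + 7) // 8       # octet_length(b)
    nwords = octets // 4             # octet_length(b)/4 words, as in the SQL
    words = (b[32 * o: 32 * o + 32] for o in range(nwords))
    # "08x" pads each word to 8 digits; plain hex formatting would not
    return "".join(format(int(w, 2), "08x") for w in words)


md5_foo = "acbd18db4cc2f85cedef654fccc4a4d8"
print(bits_to_hex(hex_to_bits(md5_foo)) == md5_foo)  # True

# A word starting with a 0 nibble shows why the padding matters:
print(format(int("0123abcd", 16), "x"))      # 123abcd, only 7 digits
print(bits_to_hex(hex_to_bits("0123abcd")))  # 0123abcd
```

<p>The MD5 of 'foo' happens to have no 32-bit word starting with a 0 nibble, so it cannot reveal the issue on its own; a test value like the one above does.</p>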

<p>Storing varbits rather than the text form of the MD5 allows us to go from 6510 MB down to 4976 MB on a sample table containing 100 million rows. We're targeting more than that, so that's a great win down here!</p> <p>In case you're wondering: when querying the main index on varbit rather than the one on text for a single result row, the conversion with varbit_to_hex seems to cost around 28 µs. We can afford it.</p> <p>Hope this helps!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>

]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Aug 2010 17:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</guid> </item> <item> <title>el-get news</title> <link>http://tapoueh.org/blog/2010/08/blog/2010/08/26-el-get-news.html</link> <description><![CDATA[<p>I've been receiving some requests for <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>, some of them even including a patch. So now there's support for bzr, CVS and http-tar, in addition to the existing support for git, git-svn, apt-get, fink and ELPA formats.</p>

<p>Also, the install and even the build are completely <em>asynchronous</em> (there's a pending bugfix for the build step, which now uses <a href="http://www.gnu.org/software/emacs/elisp/html_node/Asynchronous-Processes.html">start-process-shell-command</a>). The advantage is that you're free to use Emacs as usual while el-get compiles your piece of elisp code, which can take a while.</p> <p>The drawback is that it's hard to do the associated setup at the right time without support from el-get, so there's a new option, :after, which takes a functionp object (such as a lambda): please consider using it to provide your own setup for the external Emacs bits and pieces you're using.</p> <p>Let's see some examples of the new features:</p> <pre class="src">

(<span style="color: #da70d6;">:name</span> xml-rpc-el <span style="color: #da70d6;">:type</span> bzr <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"lp:xml-rpc-el"</span>)

(<span style="color: #da70d6;">:name</span> haskell-mode <span style="color: #da70d6;">:type</span> http-tar <span style="color: #da70d6;">:options</span> (<span style="color: #bc8f8f;">"xzf"</span>) <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://projects.haskell.org/haskellmode-emacs/haskell-mode-2.8.0.tar.gz"</span> <span style="color: #da70d6;">:load</span> <span style="color: #bc8f8f;">"haskell-site-file.el"</span> <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (add-hook 'haskell-mode-hook 'turn-on-haskell-doc-mode) (add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)))

(<span style="color: #da70d6;">:name</span> auctex <span style="color: #da70d6;">:type</span> cvs <span style="color: #da70d6;">:module</span> <span style="color: #bc8f8f;">"auctex"</span> <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">":pserver:anonymous@cvs.sv.gnu.org:/sources/auctex"</span> <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./autogen.sh"</span> <span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make"</span>) <span style="color: #da70d6;">:load</span> (<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"preview/preview-latex.el"</span>) <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"doc"</span>) </pre>

<p>As you can see, there are also the new options :module (only used by CVS so far) and :options (only used by http-tar so far). With the latter method, the :options key lets you support virtually any kind of tar compression (.tar.bz2, etc).</p> <p>The CVS support currently does not include authentication against the anonymous pserver, because the only repository I've been asked to support doesn't use it, and the couple of servers I know of either want no password at the prompt, or a dummy one. That's for another day, if needed at all.</p> <p>That pushes the little local hack to more than a thousand lines of elisp code, and the next steps include proposing it to <a href="http://tromey.com/elpa/">ELPA</a> so that getting started with it is easier than ever. You'd just have to choose whether to install ELPA from el-get or the other way around.</p> ]]></description>

<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Aug 2010 16:30:00 +0200</pubDate> <guid isPermaLink="true">