<channel> <title>tail -f /dev/dim</title> <link>http://tapoueh.org/index.html</link> <description>Dimitri Fontaine's blog</description> <language>en-us</language> <generator>Emacs Muse</generator> <item> <title>from Parsing to Compiling</title> <link>http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</link> <description><![CDATA[<p>Last week came with two bank holidays in a row, and I took the opportunity to design a <em>command language</em> for <a href="../../../pgsql/pgloader.html">pgloader</a>. While doing that, I unexpectedly stumbled across a very nice <em>AHAH!</em> moment, and I now want to share it with you, dear reader.</p>
<center> <p><img src="../../../images/lightbulb.gif" alt=""></p> </center> <center> <p><em>AHAH, you'll see!</em></p> </center> <p>The general approach I'm following code-wise with that <em>command language</em> is to first get a code API that exposes the capabilities of the system, then plug the <em>command language</em> into that API thanks to a <em>parser</em>. It turns out that doing so in <em>Common Lisp</em> is really easy, and that you can get a <em>compiler</em> for free too, while at it. Let's see about that.</p> <h3>A very simple toy example</h3> <p class="first">In the newsgroup article <a href="https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.lisp/JJxTBqf7scU">What is symbolic compoutation?</a>, <a href="http://informatimago.com/">Pascal Bourguignon</a> proposed a very simple piece of code:</p> <pre class="src"> (<span style="color: #fcaf3e;">defparameter</span> <span style="color: #fce94f;">*additive-color-graph*</span>
'((red (red white) (green yellow) (blue magenta)) (green (red yellow) (green white) (blue cyan)) (blue (red magenta) (green cyan) (blue white))))
(<span style="color: #fcaf3e;">defun</span> <span style="color: #729fcf;">symbolic-color-add</span> (a b)
  (cadr (assoc a (cdr (assoc b *additive-color-graph*))))) </pre>
<p>This is an example of <em>symbolic computation</em>, and we're going to build a little <em>language</em> to express the data and the code. Not that we need to build one, mind you; the point is to have a really simple example leading us to the <em>ahah</em> moment you're now waiting for.</p> <p>Before we dive into the main topic, you have to realize that the previous code example actually works: it defines some data, using an implicit data structure composed by nesting lists together, and defines a function that knows how to find its way in that anonymous data structure so as to combine two colors.</p> <pre class="src"> TOY-PARSER> (symbolic-color-add 'red 'green)
YELLOW </pre> <h3>A command language and parser</h3> <p class="first">I decided to go with the following <em>language</em>:</p> <pre class="src"> color red +red white +green yellow +blue magenta
color green +red yellow +green white +blue cyan
color blue +red magenta +green cyan +blue white
mix red and green </pre>
<p>And here's how some of the parser looks like, using the <a href="http://nikodemus.github.io/esrap/">esrap</a> <em>packrat</em> lib:</p> <pre class="src"> (defrule color-name (and whitespaces (+ (alpha-char-p character)))(<span style="color: #729fcf;">:destructure</span> (ws name) (<span style="color: #fcaf3e;">declare</span> (ignore ws)) <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces </span> <span style="color: #888a85;">;; </span><span style="color: #888a85;">CL symbols default to upper case. </span> (intern (string-upcase (coerce name 'string)) <span style="color: #729fcf;">:toy-parser</span>)))
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">parse string "+ red white" </span>(defrule color-mix (and whitespaces <span style="color: #73d216;">"+"</span> color-name color-name)
(<span style="color: #729fcf;">:destructure</span> (ws plus color-added color-obtained) (<span style="color: #fcaf3e;">declare</span> (ignore ws plus)) <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore whitespaces and keywords </span> (list color-added color-obtained)))
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">mix red and green </span>(defrule mix-two-colors (and kw-mix color-name kw-and color-name)
  (<span style="color: #729fcf;">:destructure</span> (mix c1 and c2)
    (<span style="color: #fcaf3e;">declare</span> (ignore mix and)) <span style="color: #888a85;">; </span><span style="color: #888a85;">ignore keywords </span>
    (list c1 c2))) </pre>
<p>Those <em>rules</em> are not the whole parser; have a look at the project on GitHub if you want to see the whole code, it's called <a href="https://github.com/dimitri/toy-parser">toy-parser</a> over there. The main idea here is to show that when we parse a line from our little language, we produce the simplest possible structured data: in Lisp that's <em>symbols</em> and <em>lists</em>.</p> <p>The reason why it makes sense to do that is the next rule:</p> <center> <p><img src="../../../images/the-one-ring.jpg" alt=""></p> </center> <center> <p><em>The one grammar rule to bind them all</em></p> </center> <pre class="src"> (defrule program (and colors mix-two-colors)
  (<span style="color: #729fcf;">:destructure</span> (graph (c1 c2))
    `(<span style="color: #fcaf3e;">lambda</span> ()
       (<span style="color: #fcaf3e;">let</span> ((additive-color-graph ',graph))
         (symbolic-color-add ',c1 ',c2))))) </pre>
<p>This rule is the complex one to bind them all. It's using a <em>quasiquote</em>, a basic Lisp syntax element allowing the programmer to very easily produce data that looks exactly like code. Let's see how it goes with a very simple example:</p> <pre class="src"> TOY-PARSER> (pprint (parse 'program
<span style="color: #73d216;">"color red +green yellow mix green and red"</span>))
(LAMBDA NIL
  (LET ((ADDITIVE-COLOR-GRAPH '((RED (GREEN YELLOW)))))
    (SYMBOLIC-COLOR-ADD 'RED 'GREEN)))
</pre> <p>The parser is producing structured (nested) data that really looks like Lisp code, right? So maybe we can just run that code...</p> <h3>What about a compiler now?</h3> <center> <p><img src="../../../images/aha.jpg" alt=""></p> </center> <center> <p><em>Here is my AHAH moment!</em></p> </center> <p>Let's see about actually running the code:</p> <pre class="src"> TOY-PARSER> (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>)
(program (parse 'program code))) (compile nil program)) Function #x3020027CF0EF> NIL NIL TOY-PARSER> (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix green and red"</span>) (program (parse 'program code))) (funcall (compile nil program))) YELLOW </pre>
funcall that function we just built.</p>
<p>Oh and the function is actually compiled down to native code, of course:</p>
<pre class="src">
TOY-PARSER> (<span style="color: #fcaf3e;">let*</span> ((code <span style="color: #73d216;">"color red +green yellow mix red and green"</span>)
(program (parse 'program code)) (func (compile nil program))) (time (<span style="color: #fcaf3e;">loop</span> repeat 1000 do (funcall func))))
(<span style="color: #fcaf3e;">LOOP</span> REPEAT 1000 DO (FUNCALL FUNC)) took 108 microseconds (0.000108 seconds) to run. During that period, and with 4 available CPU cores,
  105 microseconds (0.000105 seconds) were spent in user mode
  13 microseconds (0.000013 seconds) were spent in system mode
NIL </pre>
<p>Yeah, it took the whole of 108 microseconds to actually run the code
generated by our own <em>parser</em> <strong>a thousand times</strong>, on my laptop. I can believe
it's been compiled to native code, that seems like the right ballpark.</p>
<h3>Conclusion</h3>
<p class="first">The <a href="https://github.com/dimitri/toy-parser">toy-parser</a> code is there on <em>GitHub</em> and you can actually load it using
<a href="http://www.quicklisp.org/">Quicklisp</a>: clone the repository in ~/quicklisp/local-projects/ then
(ql:quickload "toy-parser"), and play with it in (in-package :toy-parser).</p>
<p>The only thing I still want to say here is this: can your programming
language of choice make it that easy?</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 13 May 2013 11:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/13-from-parser-to-compiler.html</guid> </item> <item> <title>Nearest Big City</title> <link>http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</link> <description><![CDATA[<p>In this article, we want to find the town with the greatest number of inhabitants near a given location.</p>
<center> <p><img src="../../../images/global_accessibility-640.png" alt=""></p> </center> <h3>A very localized example</h3> <p class="first">We first need to find and import some data, and I found at the following place a <a href="http://www.lion1906.com/Pages/francais/utile/telechargements.html">CSV listing of French cities with coordinates and population</a> and some numbers of interest for the exercise here.</p> <p>To import the data set, we first need a table, then a
COPY command:</p>
<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">lion1906</span> (
insee <span style="color: #c17d11;">text</span>, nom <span style="color: #c17d11;">text</span>, altitude <span style="color: #c17d11;">integer</span>, code_postal <span style="color: #c17d11;">text</span>, longitude <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>, latitude <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span>, pop99 <span style="color: #c17d11;">bigint</span>, surface <span style="color: #c17d11;">double</span> <span style="color: #c17d11;">precision</span> );
\<span style="color: #729fcf;">copy</span> lion1906 <span style="color: #fcaf3e;">from</span> <span style="color: #73d216;">'villes.csv'</span> <span style="color: #fcaf3e;">with</span> <span style="color: #729fcf;">csv</span> <span style="color: #729fcf;">header</span> <span style="color: #729fcf;">delimiter</span> <span style="color: #73d216;">';'</span> <span style="color: #729fcf;">encoding</span> <span style="color: #73d216;">'latin1'</span> </pre>
<p>With that data in place, we can find the 10 towns nearest to one of our choosing; let's pick <em>Villeurbanne</em>, which is in the region of <em>Lyon</em>.</p> <pre class="src"><span style="color: #fcaf3e;">select</span> code_postal, nom, pop99 <span style="color: #fcaf3e;">from</span> lion1906 <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) <-> (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude) <span style="color: #fcaf3e;">from</span> lion1906 <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>) <span style="color: #fcaf3e;">limit</span> 10;
| code_postal | nom                    |  pop99 |
|-------------+------------------------+--------|
| 69100       | Villeurbanne           | 124215 |
| 69300       | Caluire-et-Cuire       |  41233 |
| 69120       | Vaulx-en-Velin         |  39154 |
| 69580       | Sathonay-Camp          |   4336 |
| 69140       | Rillieux-la-Pape       |  28367 |
| 69000       | Lyon                   | 445452 |
| 69500       | Bron                   |  37369 |
| 69580       | Sathonay-Village       |   1693 |
| 01700       | Neyron                 |   2157 |
| 69660       | Collonges-au-Mont-d'Or |   3420 |
(10 rows) </pre>
<p>We find Lyon in our list in there, and we want the query now to return only that one as it has the greatest number of inhabitants in the list:</p> <pre class="src"> <span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (<span style="color: #fcaf3e;">select</span> code_postal, nom, pop99 <span style="color: #fcaf3e;">from</span> lion1906 <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> <span style="color: #c17d11;">point</span>(longitude, latitude) <-> (<span style="color: #fcaf3e;">select</span> <span style="color: #c17d11;">point</span>(longitude, latitude) <span style="color: #fcaf3e;">from</span> lion1906 <span style="color: #fcaf3e;">where</span> nom = <span style="color: #73d216;">'Villeurbanne'</span>) <span style="color: #fcaf3e;">limit</span> 10 ) <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> pop99 <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1;
| code_postal | nom  |  pop99 |
|-------------+------+--------|
| 69000       | Lyon | 445452 |
(1 row) </pre>
<p>Well, thank you PostgreSQL, that was easy!</p> <p>Note that you can actually index such queries; that's called a <em>KNN index</em>, for <em>K nearest neighbours</em>. PostgreSQL knows how to use some kinds of indexes to fetch data in the order of an expression such as ORDER BY a <-> b, which allows you to consider a <em>KNN</em>
search in your application.</p>
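<p>As a minimal sketch of such an index on the French data set above, assuming the <em>lion1906</em> table as created earlier: a GiST index on the very expression the query orders by. The index name is made up for this example, and whether the planner actually uses the index is best checked with EXPLAIN on your own PostgreSQL version.</p>
<pre class="src">
-- hypothetical index name; the indexed expression must match the ORDER BY one
create index lion1906_point_idx
    on lion1906 using gist (point(longitude, latitude));

-- look for an Index Scan on lion1906_point_idx in the plan
explain
 select code_postal, nom, pop99
   from lion1906
order by point(longitude, latitude)
         <-> (select point(longitude, latitude)
                from lion1906
               where nom = 'Villeurbanne')
   limit 10;
</pre>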
<h3>Let's get worldwide</h3>
<p class="first">The real scope of our exercise is to associate every known town in the world
with some big city around, so let's first fetch and import some worldwide
data this time, from
<a href="http://download.maxmind.com/download/worldcities/worldcitiespop.txt.gz">maxmind</a>.</p>
<center>
<p><img src="../../../images/map_nearest_city_01.gif" alt=""></p>
</center>
<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">maxmind_worldcities</span> (
country_code <span style="color: #c17d11;">text</span>, city_lower <span style="color: #c17d11;">text</span>, city_normal <span style="color: #c17d11;">text</span>, region_code <span style="color: #c17d11;">text</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">''</span>, population <span style="color: #c17d11;">INT</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>, latitude <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span>, longitude <span style="color: #c17d11;">float8</span> <span style="color: #fcaf3e;">DEFAULT</span> <span style="color: #73d216;">'0'</span> );
\<span style="color: #729fcf;">copy</span> maxmind_worldcities <span style="color: #fcaf3e;">FROM</span> <span style="color: #73d216;">'/tmp/worldcitiespop.txt'</span> <span style="color: #fcaf3e;">WITH</span> <span style="color: #729fcf;">DELIMITER</span> <span style="color: #73d216;">','</span> <span style="color: #729fcf;">QUOTE</span> E<span style="color: #73d216;">'\f'</span> <span style="color: #729fcf;">CSV</span> <span style="color: #729fcf;">HEADER</span> <span style="color: #729fcf;">ENCODING</span> <span style="color: #73d216;">'LATIN1'</span>;
<span style="color: #729fcf;">alter</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_worldcities</span> <span style="color: #729fcf;">add</span> <span style="color: #fcaf3e;">column</span> loc <span style="color: #c17d11;">point</span>; <span style="color: #729fcf;">update</span> maxmind_worldcities <span style="color: #729fcf;">set</span> loc = <span style="color: #c17d11;">point</span>(longitude, latitude); </pre>
<p>This time you can see that I created an extra column with the <em>location</em> in there, so that I don't have to compute it each time I need it, like I did before.</p> <p>Now is the time to test that data set and hopefully fetch the same result as before when we only had french cities loaded:</p> <pre class="src"> <span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> (<span style="color: #fcaf3e;">select</span> country_code, city_lower, population <span style="color: #fcaf3e;">from</span> maxmind_worldcities <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span> <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc <-> (<span style="color: #fcaf3e;">select</span> loc <span style="color: #fcaf3e;">from</span> maxmind_worldcities <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>) <span style="color: #fcaf3e;">limit</span> 10 ) <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1;
| country_code | city_lower | population |
|--------------+------------+------------|
| fr           | lyon       |     463700 |
(1 row) </pre>
<p>Ok, looks like we're all set for the real problem. Now we want to pick, for each of those cities, its nearest neighbour, so here's how to do that:</p> <pre class="src"> <span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities(country_code, region_code, city_lower);
<span style="color: #fcaf3e;">create</span> <span style="color: #729fcf;">index</span> <span style="color: #fcaf3e;">on</span> maxmind_worldcities <span style="color: #fcaf3e;">using</span> gist(loc);
<span style="color: #fcaf3e;">create</span> <span style="color: #fcaf3e;">table</span> <span style="color: #729fcf;">maxmind_neighbours</span> <span style="color: #fcaf3e;">as</span>
<span style="color: #fcaf3e;">select</span> country_code, region_code, city_lower, (<span style="color: #fcaf3e;">with</span> neighbours <span style="color: #fcaf3e;">as</span> ( <span style="color: #fcaf3e;">select</span> country_code, city_lower, population <span style="color: #fcaf3e;">from</span> maxmind_worldcities <span style="color: #fcaf3e;">where</span> population <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span> <span style="color: #fcaf3e;">and</span> country_code = wc.country_code <span style="color: #fcaf3e;">and</span> region_code = wc.region_code <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> loc <-> wc.loc <span style="color: #fcaf3e;">limit</span> 10) <span style="color: #fcaf3e;">select</span> city_lower <span style="color: #fcaf3e;">from</span> neighbours <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> population <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 1 ) <span style="color: #fcaf3e;">as</span> neighbour <span style="color: #fcaf3e;">from</span> maxmind_worldcities wc ; </pre>
<p>To be fair, I have to tell you that this query took almost 2 hours to complete on my laptop here, but as I'm doing that for a friend and a blog article, I've been lazy and didn't try to optimise it. It could be using
LATERAL for sure, but I don't know if that would help very much with
performance: I didn't try.</p>
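<p>For reference, here is a sketch of how the same query might be written with LATERAL, which requires PostgreSQL 9.3 or later. It has not been benchmarked against the query as written in this article, and the table name <em>maxmind_neighbours_lateral</em> is made up so as not to clash with the existing one. The <em>left join ... on true</em> keeps the towns for which no populated neighbour is found, as the scalar subquery does.</p>
<pre class="src">
create table maxmind_neighbours_lateral as
select wc.country_code, wc.region_code, wc.city_lower, n.neighbour
  from maxmind_worldcities wc
       left join lateral
       (
         -- biggest of the 10 nearest populated towns in the same region
         select nearest.city_lower as neighbour
           from (
                   select city_lower, population
                     from maxmind_worldcities
                    where population is not null
                      and country_code = wc.country_code
                      and region_code = wc.region_code
                 order by loc <-> wc.loc
                    limit 10
                ) as nearest
       order by nearest.population desc
          limit 1
       ) as n on true;
</pre>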
<p>With that in hand we can now check some cities and their <em>biggest</em>
neighbours, as in the following query:</p>
<pre class="src">
<span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'villeurbanne'</span>;
| country_code | region_code | city_lower   | neighbour |
|--------------+-------------+--------------+-----------|
| fr           | B9          | villeurbanne | lyon      |
(1 row) </pre>
<p>And looking for New York City suburbs I did find a <em>chinatown</em>, which is apparently a pretty common name for smaller towns:</p> <pre class="src"> <span style="color: #fcaf3e;">select</span> * <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> city_lower = <span style="color: #73d216;">'chinatown'</span>;
| country_code | region_code | city_lower | neighbour     |
|--------------+-------------+------------+---------------|
| sb           | 08          | chinatown  | honiara       |
| us           | CA          | chinatown  | san francisco |
| us           | DC          | chinatown  | washington    |
| us           | HI          | chinatown  | honolulu      |
| us           | IL          | chinatown  | chicago       |
| us           | MT          | chinatown  | missoula      |
| us           | NV          | chinatown  | reno          |
| us           | NY          | chinatown  | new york      |
(8 rows) </pre>
<h3>Big Cities in the big world</h3> <center> <p><img src="../../../images/Old-Photos-of-Big-Cities-21.jpg" alt=""></p> </center> <center> <p><em>We might need to change some of our views</em></p> </center> <p>So, let's see how many smaller towns each of those random big cities has:</p> <pre class="src"><span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*) <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">in</span> (<span style="color: #73d216;">'london'</span>, <span style="color: #73d216;">'new york'</span>, <span style="color: #73d216;">'moscow'</span>, <span style="color: #73d216;">'paris'</span>, <span style="color: #73d216;">'tokyo'</span>, <span style="color: #73d216;">'sao polo'</span>, <span style="color: #73d216;">'chicago'</span>) <span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour;
| country_code | region_code | neighbour | count |
|--------------+-------------+-----------+-------|
| gb           | H9          | london    |     2 |
| jp           | 40          | tokyo     |   414 |
| us           | NY          | new york  |   131 |
| ca           | 08          | london    |    16 |
| ru           | 48          | moscow    |   245 |
| fr           | A8          | paris     |    16 |
| us           | IL          | chicago   |    13 |
(7 rows) </pre>
<p>And now let's be fair and see which cities have the greatest number of towns nearby, with the following query:</p> <pre class="src"><span style="color: #fcaf3e;">select</span> country_code, region_code, neighbour, <span style="color: #729fcf;">count</span>(*) <span style="color: #fcaf3e;">from</span> maxmind_neighbours <span style="color: #fcaf3e;">where</span> neighbour <span style="color: #fcaf3e;">is</span> <span style="color: #fcaf3e;">not</span> <span style="color: #fcaf3e;">null</span> <span style="color: #fcaf3e;">group</span> <span style="color: #729fcf;">by</span> country_code, region_code, neighbour <span style="color: #fcaf3e;">order</span> <span style="color: #729fcf;">by</span> 4 <span style="color: #fcaf3e;">desc</span> <span style="color: #fcaf3e;">limit</span> 25;
| country_code | region_code | neighbour  | count |
|--------------+-------------+------------+-------|
| cn           | 03          | nanchang   | 16759 |
| cn           | 26          | xian       | 12864 |
| id           | 18          | kupang     | 10715 |
| cn           | 24          | taiyuan    | 10550 |
| mm           | 11          | taunggyi   | 10253 |
| id           | 38          | makasar    |  9471 |
| ir           | 15          | ahvaz      |  9461 |
| id           | 01          | banda aceh |  9161 |
| cn           | 14          | lasa       |  8841 |
| cn           | 15          | lanzhou    |  8618 |
| ir           | 29          | kerman     |  8579 |
| id           | 26          | medan      |  7787 |
| ir           | 04          | iranshahr  |  7249 |
| ir           | 07          | shiraz     |  7219 |
| ma           | 55          | agadir     |  7121 |
| ir           | 42          | mashhad    |  7107 |
| af           | 08          | gazni      |  7011 |
| ir           | 33          | tabriz     |  6586 |
| cn           | 01          | hefei      |  6521 |
| bd           | 81          | dhaka      |  6480 |
| ir           | 08          | rasht      |  6471 |
| id           | 17          | mataram    |  6467 |
| id           | 33          | cilegon    |  6287 |
| af           | 23          | qandahar   |  6213 |
| cn           | 07          | fuzhou     |  6089 |
(25 rows) </pre> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 02 May 2013 11:34:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/05/02-nearest-big-city.html</guid> </item> <item> <title>Emacs Conference</title> <link>http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</link> <description><![CDATA[<p>Yes it did happen, for real, in London: the <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a>. It was Easter weekend. Yet the conference managed to have more than 60 people meet together and spend a full day talking about <a href="http://www.gnu.org/software/emacs/">Emacs</a>. If you weren't there, a live stream was available and soon enough (wait for about two weeks) the video material will be published, as <a href="http://sachachua.com/blog/">sacha</a> is working on it.</p>
<center> <p><img src="../../../images/toplap-small.png" alt=""></p> </center> <p>The conference has been packed with awesome really. Among the things that I'm going home with are new thoughts, tricks and tips, and new modes to use in Emacs.</p> <p>The main new thought is all about learning to program. That's a problem space in which I have a growing interest, and the conference talk about <em>arxana</em> showed that it should be possible to build an environment where you can learn programming with the excuse of having fun with maths. And after talking about music and its notation and <a href="http://www.lilypond.org/">lilypond</a>, it should even be possible to offer some interactive programming environment where not only can you play music live as <a href="http://meta-ex.com/">Meta-Ex</a> is doing, but where the other output of your program would be the updated music scores.</p> <p>The main practical bit I'm going home with is <a href="http://www.foldr.org/~michaelw/projects/redshank">redshank</a>, <em>A collection of code-wrangling Emacs macros mostly geared towards Common Lisp, but some are useful for other Lisp dialects, too</em>. That complements <a href="http://mumble.net/~campbell/emacs/paredit.el">paredit</a> and allows you to do some reformatting very easily.</p> <p>Lastly, I'm back to giving the dark background environment a try now. I think I prefer the contrast and richer color sets of the default Emacs color theme, but the black window has some classy visual effect too.
And with <a href="https://github.com/jasonm23/emacs-mainline">main-line</a> the effect is quite awesome!</p> <center> <p><img src="../../../images/Emacs-Tango-2-Main-Line.png" alt=""></p> </center> <center> <p><em>Look at that!</em></p> </center> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 02 Apr 2013 09:56:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/04/02-Emacs-Conference.html</guid> </item> <item> <title>The Need For Speed</title> <link>http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</link> <description><![CDATA[<p>Yesterday was the <a href="http://www.postgresql-sessions.org/en/5/start">fifth edition</a> of the conference organised by <em>dalibo</em>, where outside speakers are regularly invited. Yesterday's theme was both clear and very broad: performance.</p>
<p>I had the pleasure of giving a talk titled "The Need for Speed", which puts the optimisation effort back into its business context, so as to study costs and benefits and to know not only what to expect but also when to stop.</p> <center> <p><a class="image-link" href="../../../images/confs/the_need_for_speed.pdf"> <img src="../../../images/confs/the_need_for_speed-3.png"></a></p> </center> <p>Thanks to <em>dalibo</em> for this conference!</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Mar 2013 09:49:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/29-the-need-for-speed.html</guid> </item> <item> <title>Bulk Replication</title> <link>http://tapoueh.org/blog/2013/03/18-bulk-replication.html</link> <description><![CDATA[<p>In the previous article here we talked about how to <em>properly</em> update more than one row at a time, under the title <a href="http://tapoueh.org/blog/2013/03/15-batch-update.html">Batch Update</a>. We did consider performance, including network round trips, and did look at the behavior of our results when used concurrently.</p>
<center> <p><img src="../../../images/clock-key.jpg" alt=""></p> </center> <p>A case where we want to apply the previous article's approach is when replicating data with a <em>trigger based solution</em>, such as <a href="http://wiki.postgresql.org/wiki/SkyTools">SkyTools</a> and <a href="https://github.com/markokr/skytools">londiste</a>. Well, maybe not in all cases, as we need to have an amount of
UPDATE
traffic worthy of setting up the solution. As soon as we know we're going
to <em>replay</em> large enough batches of events, though, using the
<em>batch update</em> tricks certainly makes sense.</p>
<p>It so happens that londiste 3 includes the capability to use <em>handlers</em>. Those
are plugins written in <em>python</em> (like all the client side code from <em>SkyTools</em>)
whose job is to handle the <em>processing</em> of the event batches. Several of them
are included in the <a href="https://github.com/markokr/skytools/tree/master/python/londiste">londiste sources</a>, and one of them is named bulk.py.</p>
<h3>Bulk loading data with londiste</h3>
<p class="first">To use it, set the following in londiste.ini:</p>
<pre class="src">
<span style="color: #fce94f;">handler_modules</span> = londiste.handlers.bulk
</pre>
<p>Then add the table with one of these commands:</p>
<pre class="src">
londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk"</span>
londiste3 add-table xx --handler=<span style="color: #73d216;">"bulk(method=X)"</span>
</pre>
<p>The default method is 0, and the available methods are the following:</p>
<p><em>correct</em> (0)</p>
<ul>
<li>inserts as COPY into the table</li>
<li>updates as COPY into a temp table and a single UPDATE from there</li>
<li>deletes as COPY into a temp table and a single DELETE from there</li>
</ul>
<p><em>delete</em> (1)</p>
<ul>
<li>as <em>correct</em>, but <em>updates</em> are done as DELETE then COPY</li>
</ul>
<p><em>merged</em> (2)</p>
<ul>
<li>as <em>delete</em>, but merge <em>insert</em> rows with <em>update</em> rows</li>
</ul>
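<p>To make the <em>correct</em> method more concrete, here is a rough SQL sketch of what applying a batch of updates in that style amounts to. This is an illustration rather than the handler's actual code: the columns <em>id</em> and <em>counter</em> and the temporary table name <em>xx_batch</em> are assumptions made for the example, and the data lines are meant to be fed through psql.</p>
<pre class="src">
begin;

-- assumed layout for the replicated table: xx(id, counter), id being the key
create temp table xx_batch (like xx) on commit drop;

-- the whole batch of changed rows travels in a single COPY
copy xx_batch from stdin with (format csv);
1,10
2,25
\.

-- then one UPDATE applies them all at once
update xx t
   set counter = b.counter
  from xx_batch b
 where t.id = b.id;

commit;
</pre>
<p>Deletes follow the same pattern, with a single DELETE ... USING the temporary table in place of the UPDATE.</p>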
<h3>Conclusion</h3>
<center>
<p><img src="../../../images/londiste.jpg" alt=""></p>
</center>
<p>Yes, by using that <em>handler</em>, which is provided by default in <em>londiste</em>, you
will apply the previous article's tricks in your replication solution. And you
can even choose to use that for only some of the tables you are replicating.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 18 Mar 2013 14:54:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/18-bulk-replication.html</guid> </item> <item> <title>Batch Update</title> <link>http://tapoueh.org/blog/2013/03/15-batch-update.html</link> <description><![CDATA[<p>Performance consulting involves some tricks that you have to teach over and over again. One of them is that SQL tends to be so much better at dealing with plenty of rows in a single statement when compared to running as many statements, each one against a single row.</p>
<center> <p><img src="../../../images/Home-Brewing.jpg" alt=""></p> </center> <center> <p><em>Another kind of Batch to update</em></p> </center> <p>So when you need to
UPDATE a bunch of rows from a given source, remember
that you can actually use a JOIN in the <em>update</em> statement. Either the source
of data is already in the database, in which case it's as simple as using
the FROM clause in the <em>update</em> statement, or it's not, and we're getting back
to that in a minute.</p>
<h3>UPDATE FROM</h3>
<p class="first">It's all about using that FROM clause in an <em>update</em> statement, right?</p>
<pre class="src">
<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
   <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
  <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
 <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
</pre>
<p>Using that, you can actually update thousands of rows in our <em>target</em> table in a single statement, and you can't really get faster than that.</p>
<h3>Preparing the Batch</h3>
<p class="first">Now, if you happen to have the source data in your application process' memory, the previous bit is not doing you any good, you think. Well, the trick is that pushing your in-memory data into the database and then joining against the now local source of data is generally faster than looping in the application and having to do a whole network <em>round trip</em> per row.</p>
<center>
<p><img src="../../../images/round-trip.png" alt=""></p>
</center>
<center>
<p><em>What about</em> <strong><em>that</em></strong> <em>round trip?</em></p>
</center>
<p>Let's see how it goes:</p>
<pre class="src">
<span style="color: #fcaf3e;">CREATE</span> <span style="color: #729fcf;">TEMP</span> <span style="color: #fcaf3e;">TABLE</span> <span style="color: #729fcf;">source</span>(<span style="color: #fcaf3e;">LIKE</span> target <span style="color: #729fcf;">INCLUDING</span> <span style="color: #fcaf3e;">ALL</span>) <span style="color: #fcaf3e;">ON</span> <span style="color: #729fcf;">COMMIT</span> <span style="color: #729fcf;">DROP</span>;
<span style="color: #729fcf;">COPY</span> <span style="color: #729fcf;">source</span> <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">STDIN</span>;
<span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
   <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
  <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
 <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
</pre>
<p>As we're talking about performance, the trick here is to use the <a href="http://www.postgresql.org/docs/9.2/static/sql-copy.html">COPY</a> protocol to fill in the <em>temporary table</em> we just created to hold our data. So we're now sending the whole data set to a temporary location in the database, then using that as the
UPDATE source. And that's way faster than
doing a separate UPDATE statement per row in your batch, even for small
batches.</p>
<p>Also, rather than issuing the SQL COPY command yourself, look up the
docs of the PostgreSQL driver you are currently using in your application:
it most likely includes some higher level facilities to deal with pushing the
data into the streaming protocol.</p>
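<p>As an illustration, here is a minimal sketch of what that can look like from Python. It assumes the <em>psycopg2</em> driver and its <code>copy_expert</code> cursor method, and it assumes that <em>target</em> only has the integer columns <code>id</code> and <code>counter</code>, so that no escaping is needed in the COPY text format. The function names <code>to_copy_text</code> and <code>push_batch</code> are made up for the example.</p>
<pre class="src">
import io


def to_copy_text(rows):
    """Serialize (id, counter) pairs into COPY's default text format."""
    buf = io.StringIO()
    for row_id, counter in rows:
        # COPY text format: tab-separated columns, one row per line.
        buf.write(f"{row_id}\t{counter}\n")
    buf.seek(0)
    return buf


def push_batch(conn, rows):
    """Load rows into a temp table, then apply them with a single UPDATE.

    conn is assumed to be an open psycopg2 connection.
    """
    with conn.cursor() as cur:
        cur.execute(
            "CREATE TEMP TABLE source(LIKE target INCLUDING ALL) ON COMMIT DROP"
        )
        # One streaming COPY instead of one round trip per row.
        cur.copy_expert("COPY source FROM STDIN", to_copy_text(rows))
        cur.execute(
            "UPDATE target t SET counter = t.counter + s.counter "
            "FROM source s WHERE t.id = s.id"
        )
    conn.commit()
</pre>
<p>Calling <code>push_batch(conn, [(1, 10), (2, 5)])</code> then costs three statements whatever the size of the batch.</p>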
<h3>Insert or Update</h3>
<p class="first">Now, sometimes some of the rows in the batch have to be <em>updated</em> while
others are new and must be inserted. How do you do that? Well, PostgreSQL
9.1 brings to the table WITH support for all <a href="http://www.postgresql.org/docs/9.2/static/dml.html">DML</a> queries, which means that
you can do the following just fine:</p>
<pre class="src">
<span style="color: #fcaf3e;">WITH</span> upd <span style="color: #fcaf3e;">AS</span> (
    <span style="color: #729fcf;">UPDATE</span> target <span style="color: #729fcf;">t</span>
       <span style="color: #729fcf;">SET</span> counter = <span style="color: #729fcf;">t</span>.counter + s.counter
      <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s
     <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> = s.<span style="color: #729fcf;">id</span>
 <span style="color: #fcaf3e;">RETURNING</span> s.<span style="color: #729fcf;">id</span>
)
<span style="color: #729fcf;">INSERT</span> <span style="color: #fcaf3e;">INTO</span> target(<span style="color: #729fcf;">id</span>, counter)
     <span style="color: #fcaf3e;">SELECT</span> s.<span style="color: #729fcf;">id</span>, <span style="color: #729fcf;">sum</span>(s.counter)
       <span style="color: #fcaf3e;">FROM</span> <span style="color: #729fcf;">source</span> s <span style="color: #fcaf3e;">LEFT</span> <span style="color: #fcaf3e;">JOIN</span> upd <span style="color: #729fcf;">t</span> <span style="color: #fcaf3e;">USING</span>(<span style="color: #729fcf;">id</span>)
      <span style="color: #fcaf3e;">WHERE</span> <span style="color: #729fcf;">t</span>.<span style="color: #729fcf;">id</span> <span style="color: #fcaf3e;">IS</span> <span style="color: #fcaf3e;">NULL</span>
   <span style="color: #fcaf3e;">GROUP</span> <span style="color: #729fcf;">BY</span> s.<span style="color: #729fcf;">id</span>
  <span style="color: #fcaf3e;">RETURNING</span> <span style="color: #729fcf;">id</span>
</pre>
<p>That query <em>updates</em> all the rows that are known in both the <em>target</em> and the <em>source</em>, and returns what we took from the <em>source</em> in the operation, so that we can do an <em>anti-join</em> in the next step of the query, where we're <em>inserting</em> any row that was not taken care of in the <em>update</em> part of the statement.</p>
<p>Note that when the batch grows larger, it's usually better to join against the <em>target</em> table in the
INSERT statement, because that will have an
<em>index</em> on the join key.</p>
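<p>If the <em>update then anti-join insert</em> logic is hard to picture, here is a small model of it on plain Python dictionaries. This is only a sketch of what the statement computes, not of how PostgreSQL executes it. It assumes each <code>id</code> appears once in the <em>source</em>, and <code>apply_batch</code> is a name made up for the example.</p>
<pre class="src">
def apply_batch(target, source):
    """Model the WITH (UPDATE) INSERT statement on two dicts of id to counter.

    Mutates target in place and returns the set of ids that were inserted.
    """
    # The "upd" part: ids known on both sides get their counter increased.
    updated = {row_id for row_id in source if row_id in target}
    for row_id in updated:
        target[row_id] += source[row_id]

    # The anti-join: whatever the update did not touch is a new row.
    inserted = set(source) - updated
    for row_id in inserted:
        target[row_id] = source[row_id]
    return inserted


target = {1: 10, 2: 20}
inserted = apply_batch(target, {2: 5, 3: 7})
# id 2 was updated in place, id 3 was missing and got inserted.
</pre>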
<h3>Concurrency patterns</h3>
<p class="first">Now, you will tell me that we just solved the UPSERT problem. Well, what
happens if more than one transaction is trying to do the WITH (UPDATE)
INSERT dance at the same time? It's a single <em>statement</em>, so it's a single
<em>snapshot</em>. What can go wrong?</p>
<center>
<p><img src="../../../images/gophermegaphones.480.jpg" alt=""></p>
</center>
<center>
<p><em>Concurrent processing</em></p>
</center>
<p>What happens is that as soon as the concurrent sources contain some data for
the same <em>primary key</em>, you get a <em>duplicate key</em> error on the insert. As both
transactions are concurrent, they are seeing the same <em>target</em> table where
the new data does not exist, and both will conclude that they need to
INSERT the new data into the <em>target</em> table.</p>
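<p>The race can be replayed with a toy model in Python. It is a deliberate simplification: each transaction decides what to insert from its own copy of the table, taken before the other one commits, and the <em>primary key</em> is modeled as a dictionary key. The function names are made up for the example.</p>
<pre class="src">
def plan_inserts(snapshot, source):
    """Ids a transaction decides to INSERT, judged from its own snapshot."""
    return {row_id for row_id in source if row_id not in snapshot}


def commit_inserts(table, source, to_insert):
    """Apply planned inserts; the primary key rejects ids that already exist."""
    for row_id in sorted(to_insert):
        if row_id in table:
            raise KeyError(f"duplicate key value: id={row_id}")
        table[row_id] = source[row_id]


table = {1: 10}
batch_a, batch_b = {2: 5}, {2: 7}

# Both transactions take their snapshot before either one commits.
plan_a = plan_inserts(dict(table), batch_a)
plan_b = plan_inserts(dict(table), batch_b)

commit_inserts(table, batch_a, plan_a)      # the first one wins
try:
    commit_inserts(table, batch_b, plan_b)  # the second one collides on id 2
    outcome = "ok"
except KeyError:
    outcome = "duplicate key"
</pre>
<p>Both plans contain <code>id</code> 2, so the second commit fails, and the counter from <code>batch_b</code> is lost unless the application retries.</p>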
<p>There are two things that you can do to avoid the problem. The first thing
is to make it so that you're doing only one <em>batch update</em> at any time, by
architecting your application around that constraint. That's the most
effective way around the problem, but not the most practical.</p>
<p>The other thing you can do is force the concurrent transactions to
serialize one after the other, using an <a href="http://www.postgresql.org/docs/9.2/static/explicit-locking.html">explicit locking</a> statement:</p>
<pre class="src">
<span style="color: #729fcf;">LOCK</span> <span style="color: #fcaf3e;">TABLE</span> target <span style="color: #fcaf3e;">IN</span> <span style="color: #729fcf;">SHARE</span> <span style="color: #729fcf;">ROW</span> <span style="color: #729fcf;">EXCLUSIVE</span> <span style="color: #729fcf;">MODE</span>;
</pre>
<p>That <em>lock level</em> is not automatically acquired by any PostgreSQL command, so
the only way it helps you is when you're doing that for every transaction
you want to serialize. When you know you're not at risk (that is, when not
playing the <em>insert or update</em> dance), you can omit taking that <em>lock</em>.</p>
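<p>In application code, that means taking the lock and running the batch statement inside the same transaction. Here is a hedged sketch that assumes a Python DB-API connection (as provided by <em>psycopg2</em>, for instance), where a transaction is opened implicitly by the first statement. The function name <code>run_locked_batch</code> is made up for the example.</p>
<pre class="src">
LOCK_SQL = "LOCK TABLE target IN SHARE ROW EXCLUSIVE MODE"


def run_locked_batch(conn, batch_sql):
    """Run one batch statement with the table lock taken first.

    The lock is held until commit or rollback, so concurrent callers
    of this function run their batch one after the other.
    """
    cur = conn.cursor()
    try:
        cur.execute(LOCK_SQL)
        cur.execute(batch_sql)
        conn.commit()
    except Exception:
        # Releases the lock too, then lets the caller see the error.
        conn.rollback()
        raise
    finally:
        cur.close()
</pre>
<p>Transactions that don't go through this function never take the lock, which is exactly the case described above where it no longer helps.</p>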
<h3>Conclusion</h3>
<center>
<p><a class="image-link" href="http://www.flickr.com/photos/asquarephotography/6841106459/in/photostream/">
<img src="../../../images/stack-of-old-books.jpg"></a></p>
</center>
<p>The SQL language has its quirks, that's true. It's been made for efficient
data processing, and with recent enough <a href="http://www.postgresql.org/about/featurematrix/">PostgreSQL releases</a> you even have
some advanced pipelining facilities included in the language. Properly
learning how to make the most out of that old component of your programming
stack still makes a lot of sense today!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Mar 2013 10:47:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/15-batch-update.html</guid> </item> <item> <title>Batch Update</title> <link>http://tapoueh.org/blog/2013/03/15-batch-update.html</link> <description><![CDATA[<p>Performance consulting involves some tricks that you have to teach over and over again. One of them is that SQL tends to be so much better at dealing with plenty of rows in a single statement when compared to running as many statements, each one against a single row.</p>
<center> <p><img src="../../../images/Home-Brewing.jpg" alt=""></p> </center> <center> <p><em>Another kind of Batch to update</em></p> </center>
<p>So when you need to
UPDATE a bunch of rows from a given source, remember
that you can actually use a JOIN in the <em>update</em> statement. Either the source
of data is already in the database, in which case it's as simple as using
the FROM clause in the <em>update</em> statement, or it's not, and we're getting back
to that in a minute.</p>
<h3>UPDATE FROM</h3>
<p class="first">It's all about using that FROM clause in an <em>update</em> statement, right?</p>
<pre class="src">
<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span>
   <span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter
  <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s
 <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span>
</pre>
<p>Using that, you can actually update thousands of rows in our <em>target</em> table in a single statement, and you can't really get faster than that.</p>
<h3>Preparing the Batch</h3>
<p class="first">Now, if you happen to have the source data in your application process' memory, the previous bit is not doing you any good, you think. Well, the trick is that pushing your in-memory data into the database and then joining against the now local source of data is generally faster than looping in the application and having to do a whole network <em>round trip</em> per row.</p>
<center>
<p><img src="../../../images/round-trip.png" alt=""></p>
</center>
<center>
<p><em>What about</em> <strong><em>that</em></strong> <em>round trip?</em></p>
</center>
<p>Let's see how it goes:</p>
<pre class="src">
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TEMP</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #da70d6;">source</span>(<span style="color: #7f007f;">LIKE</span> target <span style="color: #da70d6;">INCLUDING</span> <span style="color: #7f007f;">ALL</span>) <span style="color: #7f007f;">ON</span> <span style="color: #da70d6;">COMMIT</span> <span style="color: #da70d6;">DROP</span>;
<span style="color: #da70d6;">COPY</span> <span style="color: #da70d6;">source</span> <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">STDIN</span>;
<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span>
   <span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter
  <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s
 <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span>
</pre>
<p>As we're talking about performance, the trick here is to use the <a href="http://www.postgresql.org/docs/9.2/static/sql-copy.html">COPY</a> protocol to fill in the <em>temporary table</em> we just created to hold our data. So we're now sending the whole data set to a temporary location in the database, then using that as the
UPDATE source. And that's way faster than
doing a separate UPDATE statement per row in your batch, even for small
batches.</p>
<p>Also, rather than issuing the SQL COPY command yourself, look up the
docs of the PostgreSQL driver you are currently using in your application:
it most likely includes some higher level facilities to deal with pushing the
data into the streaming protocol.</p>
<h3>Insert or Update</h3>
<p class="first">Now, sometimes some of the rows in the batch have to be <em>updated</em> while
others are new and must be inserted. How do you do that? Well, PostgreSQL
9.1 brings to the table WITH support for all <a href="http://www.postgresql.org/docs/9.2/static/dml.html">DML</a> queries, which means that
you can do the following just fine:</p>
<pre class="src">
<span style="color: #7f007f;">WITH</span> upd <span style="color: #7f007f;">AS</span> (
<p>That query here is <em>updating</em> all the rows that are known in both the <em>target</em> and the <em>source</em> and returns what we took from the <em>source</em> in the operation, so that we can do an <em>anti-join</em> in the next step of the query, where we're <em>inserting</em> any row that was not taken care of in the <em>update</em> part of the statement.</p> <p>Note that when the batch gets to bigger size it's usually better to join against the <em>target</em> table in the<span style="color: #da70d6;">UPDATE</span> target <span style="color: #da70d6;">t</span> <span style="color: #da70d6;">SET</span> counter = <span style="color: #da70d6;">t</span>.counter + s.counter, <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> = s.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">RETURNING</span> s.<span style="color: #da70d6;">id</span> ) <span style="color: #da70d6;">INSERT</span> <span style="color: #7f007f;">INTO</span> target(<span style="color: #da70d6;">id</span>, counter) <span style="color: #7f007f;">SELECT</span> <span style="color: #da70d6;">id</span>, <span style="color: #da70d6;">sum</span>(counter) <span style="color: #7f007f;">FROM</span> <span style="color: #da70d6;">source</span> s <span style="color: #7f007f;">LEFT</span> <span style="color: #7f007f;">JOIN</span> upd <span style="color: #7f007f;">USING</span>(<span style="color: #da70d6;">id</span>) <span style="color: #7f007f;">WHERE</span> <span style="color: #da70d6;">t</span>.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">GROUP</span> <span style="color: #da70d6;">BY</span> s.<span style="color: #da70d6;">id</span> <span style="color: #7f007f;">RETURNING</span> <span style="color: #da70d6;">t</span>.<span 
style="color: #da70d6;">id</span> </pre>
INSERT statement, because that will have an
<em>index</em> on the join key.</p>
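<p>To follow what the statement does step by step, here is a small worked example. The table definitions and the sample values are assumptions made up for the illustration, and the <em>source</em> table is created without the primary key so that it can hold two rows for the same <em>id</em>:</p>
<pre class="src">
-- Hypothetical sample data, only here to trace the statement.
BEGIN;

CREATE TABLE target(id integer PRIMARY KEY, counter bigint NOT NULL);
INSERT INTO target VALUES (1, 10), (2, 20);

CREATE TEMP TABLE source(id integer, counter bigint) ON COMMIT DROP;
INSERT INTO source VALUES (2, 5), (3, 1), (3, 2);

WITH upd AS (
    UPDATE target t
       SET counter = t.counter + s.counter
      FROM source s
     WHERE t.id = s.id
 RETURNING s.id
)
INSERT INTO target(id, counter)
     SELECT s.id, sum(s.counter)
       FROM source s LEFT JOIN upd USING(id)
      WHERE upd.id IS NULL
   GROUP BY s.id;

SELECT id, counter FROM target ORDER BY id;
-- id 1 is untouched (10), id 2 is updated (20 + 5 = 25),
-- id 3 is inserted with the sum of its two source rows (1 + 2 = 3)

COMMIT;
</pre>
<p>The GROUP BY only matters for the rows being inserted. If the <em>source</em> holds several rows for an <em>id</em> that already exists in the <em>target</em>, the UPDATE part applies only one of them, so aggregate the batch first in that case.</p>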
<h3>Concurrency patterns</h3>
<p class="first">Now, you will tell me that we just solved the UPSERT problem. Well, what
happens if more than one transaction is trying to do the WITH (UPDATE)
INSERT dance at the same time? It's a single <em>statement</em>, so it's a single
<em>snapshot</em>. What can go wrong?</p>
<center>
<p><img src="../../../images/gophermegaphones.480.jpg" alt=""></p>
</center>
<center>
<p><em>Concurrent processing</em></p>
</center>
<p>What happens is that as soon as the concurrent sources contain some data for
the same <em>primary key</em>, you get a <em>duplicate key</em> error on the insert. As both
transactions are concurrent, they are seeing the same <em>target</em> table where
the new data does not exist yet, and both will conclude that they need to
INSERT the new data into the <em>target</em> table.</p>
<p>There are two things that you can do to avoid the problem. The first thing
is to make it so that you're doing only one <em>batch update</em> at any time, by
architecting your application around that constraint. That's the most
effective way around the problem, but not the most practical.</p>
<p>The other thing you can do is force the concurrent transactions to
serialize one after the other, using an <a href="http://www.postgresql.org/docs/9.2/static/explicit-locking.html">explicit locking</a> statement:</p>
<pre class="src">
<span style="color: #da70d6;">LOCK</span> <span style="color: #7f007f;">TABLE</span> target <span style="color: #7f007f;">IN</span> <span style="color: #da70d6;">SHARE</span> <span style="color: #da70d6;">ROW</span> <span style="color: #da70d6;">EXCLUSIVE</span> <span style="color: #da70d6;">MODE</span>;
</pre>
<p>That <em>lock level</em> is not automatically acquired by any PostgreSQL command, so
the only way it helps you is when you're doing that for every transaction
you want to serialize. When you know you're not at risk (that is, when not
playing the <em>insert or update</em> dance), you can omit taking that <em>lock</em>.</p>
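<p>As a sketch of how the pieces fit together, the lock has to be taken inside the same transaction as the <em>insert or update</em> statement, because PostgreSQL only releases table locks at the end of the transaction. The order below, loading the batch first and locking just before the merge, is an assumption meant to keep the lock window short under the default READ COMMITTED isolation level:</p>
<pre class="src">
BEGIN;

-- 1. load the batch, no lock on target needed yet
CREATE TEMP TABLE source(LIKE target INCLUDING ALL) ON COMMIT DROP;
COPY source FROM STDIN;

-- 2. from here on, concurrent batches wait for our COMMIT
LOCK TABLE target IN SHARE ROW EXCLUSIVE MODE;

-- 3. run the WITH upd AS (UPDATE ...) INSERT ... statement here

COMMIT;  -- the lock is released here
</pre>
<p>Plain readers of the <em>target</em> table are not blocked by that lock mode, only other writers and other transactions asking for the same lock.</p>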
<h3>Conclusion</h3>
<center>
<p><a class="image-link" href="http://www.flickr.com/photos/asquarephotography/6841106459/in/photostream/">
<img src="../../../images/stack-of-old-books.jpg"></a></p>
</center>
<p>The SQL language has its quirks, that's true. It's been made for efficient
data processing, and with recent enough <a href="http://www.postgresql.org/about/featurematrix/">PostgreSQL releases</a> you even have
some advanced pipelining facilities included in the language. Properly
learning how to make the most out of that old component of your programming
stack still makes a lot of sense today!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Mar 2013 10:47:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/15-batch-update.html</guid> </item> <item> <title>Emacs Conference</title> <link>http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</link> <description><![CDATA[<p>The <a href="http://emacsconf.herokuapp.com/">Emacs Conference</a> is happening, it's real, and it will take place at the end of this month in London. Check it out, and register at <a href="http://emacsconf.eventbrite.co.uk/">Emacs Conference Event Brite</a>. It's free and there's still some availability.</p>
<center> <p><img src="../../../images/emacs-rocks-logo.png" alt=""></p> </center>
<center> <p><em>It's all about Emacs, and it rocks!</em></p> </center>
<p>We have a great line-up for this conference, which makes me proud to be able to be there. If you've ever been paying attention when using <a href="http://www.gnu.org/software/emacs/">Emacs</a> then you've already heard those names: <a href="http://sachachua.com/blog/">Sacha Chua</a> is frequently blogging about how she manages to improve her workflow thanks to <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>, <a href="https://github.com/jwiegley">John Wiegley</a> is a proficient Emacs contributor maybe best known for his <a href="https://github.com/ledger/ledger">ledger</a> <em>Emacs Mode</em>, then we have <a href="http://www.lukego.com/">Luke Gorrie</a> who hacked up <a href="http://wingolog.org/archives/2006/01/02/slime">SLIME</a> among other things, we also have <a href="http://nic.ferrier.me.uk/">Nic Ferrier</a> who is starting a revolution in how to use <em>Emacs Lisp</em> with <a href="http://elnode.org/">elnode</a>. And more! Including <a href="http://en.wikipedia.org/wiki/Steve_Yegge">Steve Yegge</a>!</p>
<center> <p>See you there in London.</p> </center> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Mar 2013 13:58:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/03/04-Emacs-Conference.html</guid> </item> <item> <title>HyperLogLog Unions</title> <link>http://tapoueh.org/blog/2013/02/26-hll-union.html</link> <description><![CDATA[<p>In the article from yesterday we talked about <a href="http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html">PostgreSQL HyperLogLog</a> in some detail. The real magic of that extension was skimmed over though, and needs another very small article all by itself, in case you missed it.</p>
<center> <p><img src="../../../images/SetOperations.480.png" alt=""></p> </center>
<center> <p><em>Which Set Operation do you want for counting unique values?</em></p> </center>
<p>The first query here has the default level of magic in it, really. What happens is that each time we update the <em>HyperLogLog</em> <em>hash</em> value, we update some data that allows us to compute its cardinality.</p>
<pre class="src">
> <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
         #users <span style="color: #7f007f;">as</span> daily,
         pg_column_size(users) <span style="color: #7f007f;">as</span> bytes
    <span style="color: #7f007f;">from</span> daily_uniques
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;
|    <span style="color: #228b22;">date</span>    |      daily       | bytes |
|------------+------------------+-------|
| 2013-02-22 | 401676.779509985 |  1287 |
| 2013-02-23 | 660187.271908359 |  1287 |
| 2013-02-24 | 869980.029947449 |  1287 |
| 2013-02-25 | 580865.296677817 |  1287 |
| 2013-02-26 | 240569.492722719 |  1287 |
(5 <span style="color: #da70d6;">rows</span>) </pre>
<p>And as advertised, the data is kept in a fixed-size data structure. The magic here all happens at hll_add() time, the function you have to call to
update the data.</p>
<p>Now on to something way more magic!</p>
<center>
<p><img src="../../../images/aggregates2.jpg" alt=""></p>
</center>
<center>
<p><em>Are those the aggregates you're looking for?</em></p>
</center>
<pre class="src">
> <span style="color: #7f007f;">select</span> to_char(<span style="color: #228b22;">date</span>, <span style="color: #bc8f8f;">'YYYY/MM'</span>) <span style="color: #7f007f;">as</span> <span style="color: #da70d6;">month</span>,
         round(#hll_union_agg(users)) <span style="color: #7f007f;">as</span> monthly
    <span style="color: #7f007f;">from</span> daily_uniques
<span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> 1;
|  <span style="color: #da70d6;">month</span>  | monthly |
|---------+---------|
| 2013/02 | 1960380 |
(1 <span style="color: #da70d6;">row</span>) </pre>
<p>The <em>HyperLogLog</em> data structure allows the implementation of a <strong><em>union</em></strong> algorithm that is able to compute how many unique values you happen to have registered across both one day and the next. Extended to its general form, and in SQL, what you get is an <em>aggregate</em> that you can use in GROUP BY
constructs and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. Did you read about them yet?</p>
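<p>As a sketch of the window function flavor, here is a sliding count of unique users over the current day and the six days before it, against the same daily_uniques table. The <em>seven_days</em> window name and the <em>weekly</em> column alias are only illustrative choices:</p>
<pre class="src">
-- Each output row unions the hll values of up to 7 days,
-- then # turns the resulting hll into a cardinality estimate.
  select date,
         #hll_union_agg(users) over seven_days as weekly
    from daily_uniques
  window seven_days as (order by date rows 6 preceding);
</pre>
<p>Because the union works on the stored hll values, no raw user data has to be kept around to answer that kind of question.</p>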
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 26 Feb 2013 12:44:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/26-hll-union.html</guid> </item> <item> <title>PostgreSQL HyperLogLog</title> <link>http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</link> <description><![CDATA[<p>If you've been following the newer statistics developments at home, you might have heard about this new <a href="http://research.google.com/pubs/pub40671.html">State of The Art Cardinality Estimation Algorithm</a> called <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/">HyperLogLog</a>. This technique is now available for PostgreSQL in the extension <a href="http://blog.aggregateknowledge.com/2013/02/04/open-source-release-postgresql-hll/">postgresql-hll</a>, available at <a href="https://github.com/aggregateknowledge/postgresql-hll">https://github.com/aggregateknowledge/postgresql-hll</a> and soon to be in
debian.</p>
<center> <p><img src="../../../images/cardinality1.jpg" alt=""></p> </center>
<center> <p><em>How to Compute Cardinality?</em></p> </center>
<h3>Installing postgresql-hll</h3>
<p class="first">It's as simple as
CREATE EXTENSION hll; really, even if to get there you
must first have installed the <em>package</em> on your system. We did some packaging work
for debian and the result should appear soon in a distro near you.</p>
<p>Then you also need to keep your data in some table. Straight from the
documentation, we can use this schema:</p>
<pre class="src">
<span style="color: #b22222;">-- Create the destination table
</span><span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">TABLE</span> <span style="color: #0000ff;">daily_uniques</span> (
<span style="color: #228b22;">DATE</span> <span style="color: #228b22;">DATE</span> <span style="color: #7f007f;">UNIQUE</span>,
users hll
);
</pre>
<p>Then, to add some data whose <em>cardinality</em> you want to know, it's as
simple as the following UPDATE statement:</p>
<pre class="src">
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = hll_add(users, hll_hash_text(<span style="color: #bc8f8f;">'123.123.123.123'</span>))
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>
<p>So in our example what you see is that we want to work out how many unique IP addresses we saw, and we do that by first creating a <em>hash</em> of that source data, then calling
hll_add() with the current value and the hash result.</p>
<p>The current value must be initialized using hll_empty().</p>
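<p>A minimal sketch of that initialization, assuming you create one row per day before the first UPDATE of that day reaches the table:</p>
<pre class="src">
-- One row per day, starting from an empty hll value.
-- The UNIQUE constraint on date rejects a second row for the same day.
INSERT INTO daily_uniques(date, users)
     VALUES (current_date, hll_empty());
</pre>
<p>Without that row the UPDATE matches nothing, and the values you meant to add are silently lost.</p>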
<h3>Concurrency</h3>
<p class="first">The most awake readers among you have already spotted it: running an UPDATE
on the same row over and over again is a good recipe to kill any form of
concurrency, so you don't want to do that on your production setup unless
you don't care about those waiting UPDATEs piling up in your system.</p>
<p>The idea is then to fill in a queue of updates and asynchronously update the
daily_uniques table from that queue, possibly using the hll_add_agg
aggregate that the extension provides, so that you do only one update per
batch of values to process.</p>
<h3>∅: Empty Set and NULL</h3>
<center>
<p><img src="../../../images/EmptySet_L.gif" alt=""></p>
</center>
<center>
<p><em>Yes there's a <a href="http://www.unicodemap.org/details/0x2205/index.html">unicode</a> entry for that, ∅</em></p>
</center>
<p>Now, what happens when the batch of new unique values you want to update
from is itself empty? Well, I would have expected hll_add_agg over an empty
set to return an empty hll value, the same as returned by hll_empty(), but
it turns out it's returning NULL instead.</p>
<p>And then hll_add(users, NULL) will happily return NULL. So the next UPDATE
is cancelling all the previous work, which is not nice. We had to cater for
that case explicitly in the UPDATE query that's working from the batch of
new values to add to our current <em>HyperLogLog</em> hash entry, and I can't resist
showing off one of the most awesome PostgreSQL features here: <em>writable CTE</em>.</p>
<pre class="src">
<span style="color: #7f007f;">WITH</span> hll(agg) <span style="color: #7f007f;">AS</span> (
  <span style="color: #7f007f;">SELECT</span> hll_add_agg(hll_hash_text(<span style="color: #da70d6;">value</span>)) <span style="color: #7f007f;">FROM</span> new_batch
)
<span style="color: #da70d6;">UPDATE</span> daily_uniques
   <span style="color: #da70d6;">SET</span> users = <span style="color: #7f007f;">CASE</span> <span style="color: #7f007f;">WHEN</span> hll.agg <span style="color: #7f007f;">IS</span> <span style="color: #7f007f;">NULL</span> <span style="color: #7f007f;">THEN</span> users
                    <span style="color: #7f007f;">ELSE</span> hll_union(users, hll.agg)
               <span style="color: #7f007f;">END</span>
  <span style="color: #7f007f;">FROM</span> hll
 <span style="color: #7f007f;">WHERE</span> <span style="color: #228b22;">date</span> = <span style="color: #7f007f;">current_date</span>;
</pre>
<p>That's how you protect against an empty set being turned into a NULL. I
think the real fix belongs in postgresql-hll itself: the hll_add_agg
aggregate should return hll_empty() on an empty set. I will report that
bug (with this very article as the detailed explanation of it).</p>
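<p>If the NULL propagation is hard to picture, the following Python model may help. It is only a sketch under stated assumptions: None plays the role of NULL, a frozenset stands in for an hll value, and the three functions merely mimic the behavior described above (they share names with the extension's functions but are not its implementation).</p>

```python
def hll_add_agg(values):
    # Modeled on the behavior reported above: an empty input yields NULL (None).
    values = list(values)
    if not values:
        return None
    return frozenset(values)


def hll_union(left, right):
    # Assumption: modeled as a strict function, any NULL argument gives NULL.
    if left is None or right is None:
        return None
    return left | right


def guarded_update(users, batch):
    # Same shape as the CASE expression: keep users when the aggregate is NULL.
    agg = hll_add_agg(batch)
    return users if agg is None else hll_union(users, agg)


users = frozenset({"10.0.0.1"})
unguarded = hll_union(users, hll_add_agg([]))  # the previous work is lost
guarded = guarded_update(users, [])            # the previous work is kept
```

<p>With an empty batch the unguarded call ends up as None, while the guarded version returns the users value untouched.</p>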
<h3>Using postgresql-hll</h3>
<p class="first">When using postgresql-hll on the production system, we were able to get some
good looking numbers from our daily_uniques table:</p>
<pre class="src">
<span style="color: #7f007f;">with</span> stats <span style="color: #7f007f;">as</span> (
  <span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>, #users <span style="color: #7f007f;">as</span> daily,
         #hll_union_agg(users) <span style="color: #7f007f;">over</span>() <span style="color: #7f007f;">as</span> total
    <span style="color: #7f007f;">from</span> daily_uniques
)
<span style="color: #7f007f;">select</span> <span style="color: #228b22;">date</span>,
       round(daily) <span style="color: #7f007f;">as</span> daily,
       round((daily/total*100)::<span style="color: #228b22;">numeric</span>, 2) <span style="color: #7f007f;">as</span> percent
  <span style="color: #7f007f;">from</span> stats
 <span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> <span style="color: #228b22;">date</span>;

    <span style="color: #228b22;">date</span>    | daily  | percent
------------+--------+---------
 2013-02-22 | 401677 |   25.19
 2013-02-23 | 660187 |   41.41
 2013-02-24 | 869980 |   54.56
 2013-02-25 | 154996 |    9.72
(4 <span style="color: #da70d6;">rows</span>)
</pre>
<p>I couldn't resist showing off two of my favorite SQL constructs in that example query, which are the <a href="http://www.postgresql.org/docs/9.2/static/queries-with.html">Common Table Expressions</a> (or CTE) and <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a>. If that over() clause reads strange to you, take a minute
now and go read about it. Yes, do that now, we're waiting.</p>
<p>The data here shows that we set up the facility in the middle of the
first day, and that the morning's activity is quite low.</p>
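<p>For readers who prefer to see the arithmetic spelled out, here is the same "daily share of the overall total" computation as a small Python sketch. The data is a made-up toy set (not the production numbers above), exact Python sets stand in for hll values, and the names daily_uniques, total and report are only illustrative.</p>

```python
# Toy data: one set of visitors per day, standing in for the hll column.
daily_uniques = {
    "day-1": {"a", "b"},
    "day-2": {"b", "c", "d"},
    "day-3": {"a"},
}

# Counterpart of #hll_union_agg(users) over (): a single total over all rows,
# made available next to every individual row.
total = len(set().union(*daily_uniques.values()))

# Counterpart of the outer select: daily count and its share of the total.
report = [
    (day, len(users), round(len(users) / total * 100, 2))
    for day, users in sorted(daily_uniques.items())
]
```

<p>Note that the percentages need not add up to 100, here or in the query output above: a visitor seen on several days counts once per day but only once in the total.</p>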
<h3>Conclusion</h3>
<center>
<p><img src="../../../images/hll-dv-estimator.png" alt=""></p>
</center>
<center>
<p><em>The <a href="http://blog.aggregateknowledge.com/author/wwkae/">HyperLogLog DV estimator</a></em></p>
</center>
<p>When using postgresql-hll you need to be careful not to kill your
application's concurrency, and you need to protect yourself against
the ∅ killer too. The other thing to keep in mind is that the numbers you
get out of the hll technique are estimates within a given <em>precision</em>, and you
might want to read some more about what that means for your intended usage of
the feature.</p>
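<p>As a rough guide to that precision, the HyperLogLog literature gives a typical relative error of about 1.04/sqrt(m), where m is the number of registers. The sketch below just evaluates that formula. The function name is mine, and I am assuming the register count is 2 to the power of the extension's log2m setting, so check the postgresql-hll documentation for the values actually in effect on your system.</p>

```python
import math


def hll_relative_error(log2m):
    # Rule of thumb from the HyperLogLog literature: about 1.04 / sqrt(m),
    # with m registers. Assumption: m = 2 ** log2m.
    registers = 2 ** log2m
    return 1.04 / math.sqrt(registers)


# Compare a few register counts; more registers mean a tighter estimate.
errors = {log2m: hll_relative_error(log2m) for log2m in (10, 11, 14)}
```

<p>With 2048 registers (log2m of 11) this comes out at roughly 2.3%, which is the kind of margin to keep in mind when reading the daily numbers.</p>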
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 25 Feb 2013 10:23:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/25-postgresql-hyperloglog.html</guid> </item> <item> <title>Playing with pgloader</title> <link>http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</link> <description><![CDATA[<p>While making progress with both <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a> and <a href="http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html">Extension Templates</a>, I needed to take a little break. My current keeping-sane mental exercise seems to mainly involve using <em>Common Lisp</em>, a programming language that ships with about all the building blocks you need.</p>
<center> <p><img src="../../../images/made-with-lisp.png" alt=""></p> </center> <center> <p><em>Yes, that old language brings so much to the table</em></p> </center>
<p>When using <em>Common Lisp</em>, you have an awesome interactive development environment where you can redefine functions and objects <em>while testing them</em>. That means you don't have to quit the interpreter, reload the new version of the code and put the interactive test case together all over again after a change. Just evaluate the change in the interactive environment: functions are compiled incrementally over their previous definition, objects whose classes have changed are migrated live.</p>
<p>See, I just said <em>objects</em> and <em>classes</em>. <em>Common Lisp</em> comes with some advanced <em>Object Oriented Programming</em> facilities named <a href="http://www.aiai.ed.ac.uk/~jeff/clos-guide.html">CLOS</a> and <a href="http://www.alu.org/mop/index.html">MOP</a> where the <em>Java</em> and <em>Python</em> and <em>C++</em> object models are just a subset of what you're being offered. Hint, those don't have <a href="http://en.wikipedia.org/wiki/Multiple_dispatch">Multiple Dispatch</a>.</p>
<p>And you have a very sophisticated <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html">Condition System</a> where <em>Exceptions</em> are just a subset of what you can do (hint: have a look at <a href="http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html#restarts">restarts</a> and tell me you didn't wish your programming language of choice had them). And it continues that way for about any basic building block you might want to be using.</p>
<h3>Loading data</h3>
<p class="first">Back to <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a>, you will tell me. Right. I've been spending a couple of evenings hacking on the new version of pgloader in <em>Common Lisp</em>, and wanted to share some preliminary results.</p>
<center> <p><img src="../../../images/toy-loader.320.jpg" alt=""></p> </center> <center> <p><em>Playing with the loader</em></p> </center>
<p>The current status of the new <em>pgloader</em> is still pretty rough: if you're not used to developing in Common Lisp you might not find it ready for use yet. I'm still working on the internal APIs and trying to make something clean and easy to use for a developer, and then I will provide some external, user oriented ways to play with it. I missed that step once with the <em>Python</em> based version of the tool, and I don't want to make the same mistakes again this time.</p>
<p>So here's a test run with the current <em>pgloader</em>, on a small enough data set of
226 MB of CSV files.</p>
<pre class="src">
time python pgloader.py -R.. --summary -Tc ../pgloader.dbname.conf
Table name         |   duration | size | copy rows | errors
===========================================================
aaaaaaaaaa_aaaa    |     2.148s |    - |     24595 |      0
bbbbbbbbbb_bbbb... |     0.609s |    - |       326 |      0
cccccccccc_cccc... |     2.868s |    - |     25126 |      0
dddddddddd_dddd... |     0.638s |    - |         8 |      0
eeeeeeeeee_eeee... |     2.874s |    - |     36825 |      0
ffffffffff_ffffff  |     0.667s |    - |       624 |      0
gggggggggg_gggg... |     0.847s |    - |      5638 |      0
hhh_hhhhhhh        |     9.907s |    - |    120159 |      0
iii_iiiiiiiiiiiii  |     0.574s |    - |       661 |      0
jjjjjjj            |     6.647s |    - |     30027 |      0
kkk_kkkkkkkkk      |     0.439s |    - |        12 |      0
lll_llllll         |     0.308s |    - |         4 |      0
mmmm_mmm           |     2.139s |    - |     29669 |      0
nnnn_nnnnnn        |     8.555s |    - |    100197 |      0
oooo_ooooo         |    13.781s |    - |     93555 |      0
pppp_ppppppp       |     8.275s |    - |     76457 |      0
qqqq_qqqqqqqqqqqq  |     8.568s |    - |    126159 |      0
===========================================================
Total              | 01m09.902s |    - |    670042 |      0
CSV file or a <em>MySQL</em> database directly, and pushing that data
in the queue; while the other thread is pulling data from the queue and
writing it into our <a href="http://www.postgresql.org/">PostgreSQL</a> database.</p>
<pre class="src">
CL-USER> (pgloader.csv:import-database <span style="color: #bc8f8f;">"dbname"</span>
           :csv-path-root <span style="color: #bc8f8f;">"/path/to/csv/"</span>
           :separator #\Tab
           :quote #\"
           :escape <span style="color: #bc8f8f;">"\"\""</span>
           :null-as <span style="color: #bc8f8f;">":null:"</span>)
table name                           read   imported     errors       time
------------------------------  ---------  ---------  ---------  ---------
aaaaaaaaaa_aaaa                     24595      24595          0     0.995s
bbbbbbbbbb_bbbbbbbbb                  326        326          0     0.570s
cccccccccc_cccccccccccc             25126      25126          0     1.461s
dddddddddd_dddddddddd_dd                8          8          0     0.650s
eeeeeeeeee_eeeeeeeeee_eeeeeeee      36825      36825          0     1.664s
ffffffffff_ffffff                     624        624          0     0.707s
gggggggggg_ggggg_gggggggg            5638       5638          0     0.655s
hhh_hhhhhhh                        120159     120159          0     3.415s
iii_iiiiiiiiiiiii                     661        661          0     0.420s
jjjjjjj                             30027      30027          0     2.743s
kkk_kkkkkkkkk                          12         12          0     0.327s
lll_llllll                              4          4          0     0.315s
mmmm_mmm                            29669      29669          0     1.182s
nnnn_nnnnnn                        100197     100197          0     2.206s
oooo_ooooo                          93555      93555          0     9.683s
pppp_ppppppp                        76457      76457          0     5.349s
qqqq_qqqqqqqqqqqq                  126159     126159          0     2.495s
------------------------------  ---------  ---------  ---------  ---------
Total import time                  670042     670042          0    34.836s
NIL
</pre>
table name                           read   imported     errors       time
------------------------------  ---------  ---------  ---------  ---------
aaaaaaaaaa_aaaa                     24595      24595          0     0.887s
bbbbbbbbbb_bbbbbbbbb                  326        326          0     0.617s
cccccccccc_cccccccccccc             25126      25126          0     1.497s
dddddddddd_dddddddddd_dd                8          8          0     0.582s
eeeeeeeeee_eeeeeeeeee_eeeeeeee      36825      36825          0     1.697s
ffffffffff_ffffff                     624        624          0     0.748s
gggggggggg_ggggg_gggggggg            5638       5638          0     0.923s
hhh_hhhhhhh                        120159     120159          0     3.525s
iii_iiiiiiiiiiiii                     661        661          0     0.449s
jjjjjjj                             30027      30027          0     2.546s
kkk_kkkkkkkkk                          12         12          0     0.330s
lll_llllll                              4          4          0     0.323s
mmmm_mmm                            29669      29669          0     1.227s
nnnn_nnnnnn                        100197     100197          0     2.489s
oooo_ooooo                          93555      93555          0     9.148s
pppp_ppppppp                        76457      76457          0     6.713s
qqqq_qqqqqqqqqqqq                  126159     126159          0     4.571s
------------------------------  ---------  ---------  ---------  ---------
Total streaming time               670042     670042          0    38.272s
NIL
</pre>
1m4.745s. Now, if we do
an <em>export only</em> test, it runs in 31.822s. So yes, streaming is a good thing to
have here.</p>
<h3>Conclusion</h3>
<p class="first">We just got twice as fast as the Python version.</p>
<p>Some will say that I'm not comparing fairly to the <em>Python</em> version of
pgloader here, because I could have implemented the streaming facility in
<em>Python</em> too. Well, actually I did: the options are called <a href="http://tapoueh.org/pgsql/pgloader.html#sec13">section_threads</a> and
<a href="http://tapoueh.org/pgsql/pgloader.html#sec15">split_file_reading</a>, and you can set them so that a reader pushes data into a
set of queues while several workers each feed from their own queue. It
didn't help with performance at all. Once again, read about the infamous
<a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a> to understand why not.</p>
<center>
<p><img src="../../../images/lisplogo_flag_128.png" alt=""></p>
</center>
<p>So it actually is a fair comparison here, where the new code is twice as fast
as the previous one, with only a few hours of hacking and before spending any
time on optimisation. Well, apart from using a <em>producer</em>, a <em>consumer</em> and a
<em>queue</em>, which I almost had to have for streaming in between two database
connections anyway.</p>
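<p>To make the <em>producer</em>, <em>consumer</em> and <em>queue</em> idea concrete, here is a minimal sketch of the pattern. It is written in Python for illustration only and is not the Common Lisp code of the new pgloader; the names <em>reader</em>, <em>writer</em> and <em>stream_rows</em> are made up for this example. One thread reads rows and pushes them into a bounded queue while the other pulls them out and writes them, here into a plain list standing in for the PostgreSQL connection.</p>
<pre class="src">
import queue
import threading

DONE = object()  # sentinel telling the consumer that no more rows will come


def reader(rows, q):
    """Producer: push every source row into the queue, then the sentinel."""
    for row in rows:
        q.put(row)          # blocks while the queue is full
    q.put(DONE)


def writer(q, sink):
    """Consumer: pull rows from the queue until the sentinel shows up."""
    while True:
        row = q.get()
        if row is DONE:
            break
        sink.append(row)    # stand-in for writing the row to the database


def stream_rows(rows, maxsize=1000):
    """Run one reader thread and one writer thread around a bounded queue."""
    q = queue.Queue(maxsize=maxsize)
    sink = []
    threads = [
        threading.Thread(target=reader, args=(rows, q)),
        threading.Thread(target=writer, args=(q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


if __name__ == "__main__":
    print(len(stream_rows([(i, "row %d" % i) for i in range(5000)])))
</pre>
<p>The bounded queue keeps the reader from running far ahead of the writer. As explained above, in Python the Global Interpreter Lock limits what this arrangement buys you; the sketch only shows the shape of the pattern.</p>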
]]></description>
<center> <p><img src="../../../images/sexp.gif" alt=""></p> </center> <center> <p><em>Well,<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 12 Feb 2013 11:17:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/12-playing-with-pgloader.html</guid> </item> <item> <title>Marking whole word</title> <link>http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</link> <description><![CDATA[<p>I've discovered recently another Emacs facility that I since then use several times a day, and I wonder how I did without it before:
C-M-SPC runs the command mark-sexp.</p>
mark-sexp apparently is related to the Sex Pistols</em></p>
</center>
<p>It's pretty simple actually, when you have the <em>point</em> at the beginning of a
word or an identifier (containing numbers, dashes, underscores and other
punctuation signs), you can select the <em>whole</em> of it in a single key chord!</p>
<p>The best thing is that if you press the same key chord again, it will expand
to include the next expression. And that works in plain text and in most
programming languages where I've tried it, which admittedly is not that many recently. It
does not depend that much on the programming language anyway.</p>
<p>The full general solution here is to use something like <a href="https://github.com/magnars/expand-region.el">expand region</a>, don't
miss the <a href="http://emacsrocks.com/e09.html">Emacs Rocks Expand Region Episode</a>, it's less than 3 minutes and you
will want to install <em>expand-region</em> after that. For easy installing, of
course you are already using <a href="http://tapoueh.org/emacs/el-get.html">el-get</a> right?</p>
<p>Now, a friend just asked this morning how to select the <em>current word</em> even
when the point is currently in the middle of it. Going manually back to
the beginning of it is no fun. I knew about thing-at-point and a little
about how it works, but didn't find anything ready-made for that use case
(hint: it needs to be an <em>interactive</em> command).</p>
<p>Here's what I came up with, then:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">mha:select-current-word</span> ()
  <span style="color: #bc8f8f;">"Select the current word."</span>
  (interactive)
  (beginning-of-thing 'symbol)
  (push-mark (point) nil t)
  (end-of-thing 'symbol)
  (exchange-point-and-mark))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-M-SPC"</span>) 'mha:select-current-word)
</pre>
<p>I picked C-M-S-SPC not because it's the easiest way to invoke the new
command, but because to me it's a quite natural extension to the C-M-SPC
that I use so often. Again, each time you want to <em>select</em> an identifier in
some code of yours, you'd most certainly be better off using C-M-SPC.</p>
]]></description>
<center> <p><img src="../../../images/software-upgrade.320.png" alt=""></p> </center> <center> <p><em>Upgrade time!</em></p> </center> <h3>Skytools 3.1.3 enters debian</h3> <p class="first">First news is that <em>Skytools 3.1.3</em> has entered <a href="http://packages.debian.org/search?keywords=skytools3">debian</a> today (I hope that by the time you reach that URL, it's been updated to show information according to the news here, but I might be early). As there's currently a <em>debian freeze</em> to release <em>wheezy</em> (and you can help <a href="http://www.debian.org/News/2012/20121110">squash some bugs</a>), this version is only getting uploaded to <em>experimental</em> for now. Thanks to the tireless work of <a href="http://www.df7cb.de/blog/2012/apt.postgresql.org.html">Christoph Berg</a> though, this version is already available from <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a>.</p> <h3>Upgrading to PGQ 3</h3> <p class="first">The other news is that I've been testing a <em>live upgrade</em> scenario where we want to upgrade from<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Feb 2013 17:15:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Emacs-mark-word.html</guid> </item> <item> <title>Live Upgrading PGQ</title> <link>http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</link> <description><![CDATA[<p>Some <a href="http://skytools.projects.pgfoundry.org/skytools-3.0/doc/">skytools</a> related news today, it's been a while. Those of you who were at my <a href="http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html">FOSDEM talk</a> about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a> might have heard that I really like working with <a href="http://wiki.postgresql.org/wiki/Skytools#PgQ">PGQ</a>. A new version was released a while ago, and the most recent version is now
3.1.3, as announced in the <a href="http://www.postgresql.org/message-id/CACMqXCLD2je5VFqUCzjwC2s5QQVYLe6-4awJaRvqLSBEVw8_MQ@mail.gmail.com">Skytools 3.1.3</a> email.</p>
PGQ to PGQ3, and it works pretty well, and it's quite simple
to achieve too. Here's how.</p>
<p>So the first thing is to shut down the current <em>ticker</em> process. Then we
install the new packages, assuming that you followed the steps in the wiki
page pointed to above; please go read <a href="https://wiki.postgresql.org/wiki/Apt">apt.postgresql.org</a> again now if need be.</p>
<pre class="src">
pgqadm.py ticker.ini -s
sudo apt-get install postgresql-9.1-pgq3 skytools3-ticker skytools3
</pre>
<p>The ticker is not running anymore, and we have the right version of the software
installed. The next step is to upgrade the database parts of PGQ:</p>
<pre class="src">
psql -f /usr/share/skytools3/pgq.upgrade_2.1_to_3.0.sql ...
psql -1 -f /usr/share/postgresql/9.1/contrib/pgq.upgrade.sql ...
</pre>
<p>Of course replace those ... with options such as your actual connection
string. I tend to always add -vON_ERROR_STOP=1 to all these
commands, so that I don't depend on having the right .psqlrc on the
particular server I'm connected to. Also remember that if you want to do
that for more than one database, you need to actually run that pair of
commands for each of them.</p>
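<p>As a sketch of running that pair of commands for each database, the following Python snippet only builds the command lines; it does not execute anything. The database names are placeholders, and using -d to pick the database is an assumption about how you connect, so adapt it to your actual connection options. The file paths and -vON_ERROR_STOP=1 are the ones given above.</p>
<pre class="src">
# File names as given in the article; adjust to your installation.
UPGRADE_STEPS = [
    ["-f", "/usr/share/skytools3/pgq.upgrade_2.1_to_3.0.sql"],
    ["-1", "-f", "/usr/share/postgresql/9.1/contrib/pgq.upgrade.sql"],
]


def upgrade_commands(databases):
    """Return the psql command lines to run, two per database, in order."""
    commands = []
    for dbname in databases:
        for step in UPGRADE_STEPS:
            # -vON_ERROR_STOP=1 asks psql to stop at the first error.
            # -d DBNAME is an assumed way of selecting the database.
            commands.append(["psql", "-vON_ERROR_STOP=1", "-d", dbname] + step)
    return commands


if __name__ == "__main__":
    # "db1" and "db2" are hypothetical database names.
    for cmd in upgrade_commands(["db1", "db2"]):
        print(" ".join(cmd))
</pre>
<p>Keeping the commands as lists makes it easy to review them first and only then hand each one to something like subprocess.run.</p>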
<p>Now it's time to restart the new ticker. The main change from the previous
one is that it is now a C program called pgqd that knows how to tick for any
number of <em>databases</em>, so that you only need <em>one instance</em> around <em>per
cluster</em> now.</p>
<pre class="src">
sudo /etc/init.d/skytools3 start
tail -f /var/log/skytools/pgqd.log
</pre>
<p>Those two commands take for granted that you prepared the pgqd
setup the <em>debian</em> and <em>skytools</em> way, by adding your config in
/etc/skytools3/pgqd.ini and editing /etc/skytools.ini accordingly, so that
it's automatically taken into account at machine boot.</p>
<p>Note that I actually exercised the procedure above while running a
<a href="http://www.postgresql.org/docs/9.2/static/pgbench.html">pgbench</a> test replicated with londiste. Of course the replication
lagged a little while no <em>ticker</em> was running, and then it caught up as fast
as it could, in that case:</p>
<pre class="src">
INFO {count: 245673, ignored: 0, duration: 422.104366064}
</pre>
<h3>Happy Hacking!</h3>
<p class="first">So if you have any <em>batch processing</em> needs, remember to consider what PGQ has
to offer. And yes if you're running some cron job to compute things out of
the database for you, you are doing some <em>batch processing</em>.</p>
<center>
<p><img src="../../../images/hayseed.jpg" alt=""></p>
</center>
<center>
<p><em>Yes, I did search for Transactional Batch Processing</em></p>
</center>
]]></description>
<center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem-logo.png"></a><a class="image-link" href="http://fosdem2013.pgconf.eu/"> <img src="../../../images/postgresql-elephant.small.png"></a></p> </center> <center> <p><em>PostgreSQL at FOSDEM made for a great event</em></p> </center> <p>Having had the opportunity to meet more people from those other development communities, I really think we should go and reach out to them at their own conferences. Just about every PostgreSQL community member I talked with about that idea seemed to agree, and generally was already thinking the same thing. And most are already doing it, in fact...</p> <p>I had the pleasure to give two talks there, both in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>.</p> <h3>Event Triggers</h3> <p class="first">I'm currently in the middle of implementing <em>Event Triggers</em> for PostgreSQL and I have been for about the last 2 years. It's quite a complex feature to get right and so the patch itself is complex and large, which means the reviewing process is complex and takes time.</p> <p>That also means that some parts of the design have already been redone completely at least 3 times, and that what got <em>committed</em> to the PostgreSQL code is nothing like the design we had decided should go in. 
That's just a fact of life, maybe, but that makes for a very long development process.</p> <p>We're now getting to the end of it though, and this talk is showing both where we want to go with <em>Event Triggers</em>, where we are now and what remains to be done for 9.3 if we want the feature to be of any use.</p> <p>If you're interested in that development, have a look at the slide deck and possibly ask me some questions about what's not clear on the <a href="http://www.postgresql.org/list/pgsql-hackers/">pgsql-hackers</a> mailing list (preferably).</p> <center> <p><a class="image-link" href="../../../images/confs/Fosdem2013_Event_Triggers.pdf"> <img src="../../../images/confs/Fosdem2013_Event_Triggers.png"></a></p> </center> <center> <p><em>Event Triggers, The Real Mess™</em></p> </center> <p>The other way to get summarized and clear information about Event Triggers is the wiki page by the same name: <a href="http://wiki.postgresql.org/wiki/Event_Triggers">Event Triggers</a>.</p> <p>You will see that while a lot has been done (internal refactoring, adding new infrastructure and SQL level commands, and the minimum<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Feb 2013 15:52:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/08-PGQ-Live-Upgrade.html</guid> </item> <item> <title>Another Great FOSDEM</title> <link>http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</link> <description><![CDATA[<p>This year's FOSDEM has been a great edition, in particular the <a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDAY 2013</a> was a great way to begin a 3-day marathon of talking about PostgreSQL with people not only from our community but also from plenty of other Open Source communities: users!</p>
PLpgSQL support);
a lot remains to be done where the code has already been submitted several
times, following several design directions given by careful review on
hackers, and still we have some choices to make.</p>
<h3>Implementing High-Availability</h3>
<p class="first">This talk is showing several ways to implement <em>High Availability</em> with
PostgreSQL. The fact is that that term is overloaded already, and usually
covers two very different things which are <em>Service Availability</em> and <em>Data
Availability</em>.</p>
<p>In the talk, we're showing several techniques that you can use to address
different sets of compromises in between <em>scaling</em>, <em>load balancing</em>, <em>data
availability</em> and <em>durability</em>, and <em>service availability</em>. The first two points
could seem unrelated to the main topic, but <em>scaling</em> often is a simple enough
way to achieve <em>service availability</em>... until you need to think about
<em>sharding</em>, that is.</p>
<center>
<p><a class="image-link" href="../../../images/confs/Fosdem2013_High_Availability.pdf">
<img src="../../../images/confs/Fosdem2013_High_Availability.640.png"></a></p>
</center>
<center>
<p><em>Implementing High Availability of Services and Data with PostgreSQL</em></p>
</center>
<p>So the talk is all about making compromises in between them and getting to
an architecture able to implement the chosen compromises. While the talk
has been pretty well received, it was delivered in a 50-minute slot where we
usually take a whole day or three when addressing those problems at a
customer's site.</p>
<p>Some parts of how to get to the right architecture for the compromises that
are important for you can't be fully covered in that time slot, while still
being able to actually present the techniques that we're using.</p>
<p>I think it might be useful to extract a single use-case or two from that
talk, then have a full 50-minute version reduced to a single or a couple of
very clear compromises and how to achieve them in detail, rather than
trying to present a full range of techniques and how to use them in
different scenarios.</p>
<h3>FOSDEM</h3>
<p class="first">After having talked with many people, it appears that for next year's
edition I should be proposing a more general talk that aims at helping
developers in other communities (python, ruby, etc) discover what's in it for
them in PostgreSQL. This database is full of advanced features that are
really easy to use, and the only problem when preparing such a talk is
choosing the right subset...</p>
<p>If you're running a local developer user group and are interested in
learning some more about how PostgreSQL can help you on a daily basis,
please do get in touch with me and let's schedule a presentation together!</p>
]]></description>
<p>Turns out it's not true, because we still depend on past century technologies somehow. Not everybody will be looking at the schedule on the web using a connected mobile device (you know, you've heard of them, those <em>tracking and surveillance devices</em>, if you want to believe <a href="http://stallman.org/rms-lifestyle.html">Stallman</a>), and as the schedule gets printed on little paper sheets, it's unfortunately too late to change it now.</p> <center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem.png"></a></p> </center> <center> <p><em>Those flyers are already printed on paper sheets, the schedule too</em></p> </center> <p>So it happens that I'll be speaking twice on Sunday and not at all on Friday.</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Feb 2013 09:55:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/02/04-Another-great-FOSDEM.html</guid> </item> <item> <title>A Sunday at FOSDEM</title> <link>http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</link> <description><![CDATA[<p>The previous article <a href="29-FOSDEM-2013.html">FOSDEM 2013</a> said to be careful with the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom schedule</a> because one of my talks there might get swapped with a slot on the <a href="http://fosdem2013.pgconf.eu/">FOSDEM PGDay 2013</a> which happens <strong><em>this Friday</em></strong> and has been sold out anyway.</p>
<center> <p><a class="image-link" href="https://fosdem.org/2013/"> <img src="../../../images/fosdem.png"></a></p> </center> <center> <p><em>I'm Going to the FOSDEM, hope to see you there!</em></p> </center> <p>And I'm presenting two talks over there that are both currently scheduled on the Sunday in the <a href="https://fosdem.org/2013/schedule/track/postgresql/">PostgreSQL devroom</a>. We're talking about changing that though, so that one of those will in fact happen <strong><em>this Friday</em></strong> at the <a href="http://www.postgresql.eu/events/schedule/fosdem2013/">FOSDEM PGDay 2013</a>, which has a different schedule, so consider watching for that.</p> <p>One of those two talks is about <a href="https://fosdem.org/2013/schedule/event/postgresql_implementing_high_availability/">Implementing High Availability</a> (yes, with PostgreSQL). It's been quite well received in the places where I had the chance to give it before (namely <em>PGDay France</em> and <em>PG Conf Europe</em>), and it's going to be a stripped down version of it so that it fits well in the 45-minute slot we have here.</p> <p>The other talk is going to be about <a href="https://fosdem.org/2013/schedule/event/postgresql_event_triggers/">Event Triggers</a>, a feature new in PostgreSQL 9.3 (due in September 2013, crossing fingers) and while the goal of that talk is to introduce what the feature is all about and a bunch of use cases that you can address by using it, it will certainly offer a peek into the PostgreSQL development cycle and community processes.</p> <center> <p><img src="../../../images/belgium-beers.jpg" alt=""></p> </center> <center> <p><em>See you in Brussels!</em></p> </center> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 30 Jan 2013 10:50:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/30-A-Sunday-at-FOSDEM.html</guid> </item> <item> <title>FOSDEM 2013</title> 
<link>http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</link> <description><![CDATA[<p>This year again I'm going to <a href="https://fosdem.org/2013/">FOSDEM</a>, and to the extra special <a href="http://fosdem2013.pgconf.eu/">PostgreSQL FOSDEM day</a>. It will be the first time that I'm going to be at the event for the full week-end rather than just commuting in for the day.</p>
]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 29 Jan 2013 10:11:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/29-FOSDEM-2013.html</guid> </item> <item> <title>pgloader: what's next?</title> <link>http://tapoueh.org/blog/2013/01/28-pgloader-future.html</link> <description><![CDATA[<p><a href="../../../pgsql/pgloader.html">pgloader</a> is a tool to help loading data into <a href="http://www.postgresql.org/">PostgreSQL</a>, adding some error management to the <a href="http://www.postgresql.org/docs/9.2/interactive/sql-copy.html">COPY</a> command.
COPY is the fast way of loading data into PostgreSQL and is transaction safe. That means that if a single error appears within your bulk of data, you will have loaded none of it. pgloader will submit the data again in smaller chunks until it's able to isolate the bad from the good, and then the good is loaded in.</p>
<center> <p><img src="../../../images/PDL_Adapter-250.png" alt=""></p> </center> <center> <p><em>Not quite this kind of data loader</em></p> </center> <p>In a recent migration project where we freed data from MySQL into PostgreSQL, we used
pgloader again. But the loading time was not fast enough
for the service downtime window that we had here. Indeed <a href="http://www.python.org/">Python</a> is not known
for being the fastest solution around. It's easy to use and to ship to
production, but sometimes you not only want to be able to be efficient when
writing code, you also need the code to actually run fast too.</p>
<h3>Faster data loading</h3>
<p class="first">So I began writing a little dedicated tool for that migration in <a href="http://cliki.net/">Common Lisp</a>
which is growing on me as my personal answer to the burning question: <em>python
2 or python 3</em>? I find <em>Common Lisp</em> to offer an even more dynamic programming
environment, an easier language to use, and the result often has
performance characteristics way beyond what I can get with python. Between
<a href="http://tapoueh.org/blog/2012/07/10-solving-sudoku.html">5 times faster</a> and <a href="http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html">121 times faster</a> in some quite stupid benchmark.</p>
<p>Here, with real data, my one shot attempt has been running more than <em>twice
as fast</em> as the python version, after about a day of programming.</p>
<center>
<p><img src="../../../images/lisp-python.png" alt=""></p>
</center>
<center>
<p><em>See what's happening now?</em></p>
</center>
<p>The other thing here is that I've attempted to get pgloader to work in parallel,
but at the time I didn't know about the <a href="http://docs.python.org/3/c-api/init.html#threads">Global Interpreter Lock</a>, which
still hasn't been removed in Python 3, by the way. So my threading
attempts at making pgloader work in parallel are pretty useless.</p>
<p>Whereas in <em>Common Lisp</em> I can just use the <a href="http://lparallel.org/">lparallel</a> lib, which exposes
threading facilities and some <em>queueing</em> facilities as a means to communicate
data in between workers, and have my code easily work in parallel for real.</p>
<h3>Compatibility</h3>
<p class="first">The only drawback that I can see here is that if you've been writing your
own <em>reformatting modules</em> in python for pgloader (yes you can
<a href="http://tapoueh.org/pgsql/pgloader.html#sec21">implement your own reformatting module for pgloader</a>), then you would have to
port them to <em>Common Lisp</em>. Send me an email if that's your case.</p>
<h3>Next version</h3>
<p class="first">So, I think we're going to have a <em>pgloader 3</em> someday, that will be way
faster than the current one, and bundle some more features: real parallel
behavior, ability to fetch non local data (connecting to MySQL directly, or
HTTP, S3, etc); and I'm thinking about offering a COPY like syntax to drive
the loading too, while at it. Also, the ability to discover the set of data
to load all by itself when you want to load a whole database: think of it as
a special <em>Migration</em> mode of operations.</p>
<p>Some feature requests can't be solved easily when keeping the old .INI
syntax cruft, so it's high time to implement some kind of a real command
language. I have several ideas about those, in between the COPY syntax and
the SQL*Loader configuration format, which is both clunky and quite
powerful, too.</p>
<p>After a beginning in TCL and a complete rewrite in python in 2005, it looks
like 2013 is going to be the year of <em>pgloader 3</em>, in <em>Common Lisp</em>!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 28 Jan 2013 10:48:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/28-pgloader-future.html</guid> </item> <item> <title>Automated Setup for pgloader</title> <link>http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</link> <description><![CDATA[<p>Another day, another migration from <em>MySQL</em> to <a href="http://www.postgresql.org/">PostgreSQL</a>... or at least that's how it feels sometimes. This time again I've been using some quite old scripts to help me do the migration.</p> <center> <p><img src="../../../images/dauphin-logo.jpg" alt=""></p> </center> <center> <p><em>That's how I feel for MySQL users</em></p> </center> <h3>Migrating the schema</h3> <p class="first">For the <em>schema</em> parts, I've been using <a href="http://pgfoundry.org/projects/mysql2pgsql/">mysql2pgsql</a> with success for many years. This tool is not complete and will do only about <em>80%</em> of the work. As I think that the schema should always be validated manually when doing a migration anyway, I happen to think that it's good news.</p> <h3>Getting the data out</h3> <p class="first">Then for the data parts I keep on using <a href="../../../pgsql/pgloader.html">pgloader</a>. The data is never quite right, and the ability to filter out what you can't readily import in a <em>reject</em> file proves itself a must have here. The problems you have in the exported MySQL data are quite serious:</p> <center> <p><img src="../../../images/data-unlocked.320.png" alt=""></p> </center> <center> <p><em>Can I have my data please?</em></p> </center> <p>First, date formatting is not compatible with what PostgreSQL expects, sometimes using
20130117143218 instead of what we expect: 2013-01-17
14:32:18, and of course even when the format is right (that seems to depend
on the MySQL server's version), you still have to transform the 0000-00-00
00:00:00 into NULL.</p>
<blockquote>
<p class="quoted">
Before thinking about the usage of that particular date rather than
using NULL when you don't have the information, you might want to
remember that there's no <a href="http://en.wikipedia.org/wiki/0_(year)">year zero</a> in the calendar, it's year 1 BC and
then year 1.</p>
</blockquote>
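<p>As a minimal sketch of the date clean-up described above, and not the code used in the migration, the two cases can be handled in a few lines of Python. The function name <code>normalize_mysql_timestamp</code> and the set <code>ZERO_DATES</code> are hypothetical, and treating the compact all-zero form as NULL too is an assumption on my side:</p>

```python
from datetime import datetime

# All-zero values that should become NULL. The first two forms appear in the
# article; the compact one is an assumption added for symmetry.
ZERO_DATES = {"0000-00-00 00:00:00", "0000-00-00", "00000000000000"}


def normalize_mysql_timestamp(value):
    """Return a 'YYYY-MM-DD HH:MM:SS' string, or None for the all-zero date."""
    if value is None or value in ZERO_DATES:
        return None
    if len(value) == 14 and value.isdigit():
        # Compact form such as 20130117143218
        parsed = datetime.strptime(value, "%Y%m%d%H%M%S")
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    # Already in the format PostgreSQL expects
    return value
```

<p>Anything that is neither all-zero nor in the compact form is passed through unchanged, so that PostgreSQL itself gets to reject what it cannot parse.</p>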
<p>Then, text encoding is often mixed up: even when the MySQL databases are
said to be in <em>latin1</em> or <em>unicode</em>, you somehow always end up finding text in
<em>win1252</em> or some other <em>code page</em> in there.</p>
<p>And of course, MySQL provides no tool to export the data to CSV, so you have
to come up with your own. The SELECT INTO OUTFILE command on the server
produces non conforming CSV (\n can appear in non-escaped field contents),
and while the mysql client manual page details that it outputs CSV when
stdout is not a terminal, it won't even try to quote fields or escape \t
when they appear in the data.</p>
<p>So, we use the little <a href="https://github.com/slardiere/mysqltocsv">mysqltocsv</a> script to export the data, and then use
that data to feed <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>
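<p>To illustrate what a conforming export has to do with tabs and newlines found in the data, here is a small sketch using Python's standard <code>csv</code> module. This is not the mysqltocsv script itself; the helper names <code>rows_to_csv</code> and <code>csv_to_rows</code> are hypothetical:</p>

```python
import csv
import io


def rows_to_csv(rows, delimiter="\t"):
    """Serialize rows, quoting any field that contains the delimiter,
    a double quote or a line break."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def csv_to_rows(text, delimiter="\t"):
    """Parse the text back; quoted fields may span several lines."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

<p>A field with an embedded tab or newline survives the round trip because the writer quotes it, which is exactly what the unquoted output of the mysql client does not do.</p>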
<h3>Loading the data in</h3>
<p class="first">Now, we have to write down a configuration file for pgloader to know what to
load and where to find the data. What about generating the file from the
database schema instead, using the query in <a href="generate-pgloader-config.sql">generate-pgloader-config.sql</a>:</p>
<pre class="src">
<span style="color: #7f007f;">with</span> reformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname, attnum, attname, typname,
          <span style="color: #7f007f;">case</span> typname
               <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'timestamptz'</span>
               <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:timestamp'</span>
               <span style="color: #7f007f;">when</span> <span style="color: #bc8f8f;">'date'</span>
               <span style="color: #7f007f;">then</span> attname || <span style="color: #bc8f8f;">':mynull:date'</span>
           <span style="color: #7f007f;">end</span> <span style="color: #7f007f;">as</span> reformat
      <span style="color: #7f007f;">from</span> pg_class <span style="color: #da70d6;">c</span>
           <span style="color: #7f007f;">join</span> pg_namespace n <span style="color: #7f007f;">on</span> n.oid = <span style="color: #da70d6;">c</span>.relnamespace
           <span style="color: #7f007f;">left</span> <span style="color: #7f007f;">join</span> pg_attribute <span style="color: #da70d6;">a</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">c</span>.oid = <span style="color: #da70d6;">a</span>.attrelid
           <span style="color: #7f007f;">join</span> pg_type <span style="color: #da70d6;">t</span> <span style="color: #7f007f;">on</span> <span style="color: #da70d6;">t</span>.oid = <span style="color: #da70d6;">a</span>.atttypid
     <span style="color: #7f007f;">where</span> <span style="color: #da70d6;">c</span>.relkind = <span style="color: #bc8f8f;">'r'</span>
           <span style="color: #7f007f;">and</span> attnum > 0
           <span style="color: #7f007f;">and</span> n.nspname = <span style="color: #bc8f8f;">'public'</span>
),
 config_reformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname,
          <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
          <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
          <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'columns = '</span> || E<span style="color: #bc8f8f;">' \n'</span> ||
          <span style="color: #bc8f8f;">'reformat = '</span> || array_to_string(<span style="color: #da70d6;">array_agg</span>(reformat), <span style="color: #bc8f8f;">', '</span>)
          || E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
     <span style="color: #7f007f;">from</span> reformat
    <span style="color: #7f007f;">where</span> reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">null</span>
 <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
 noreformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname, bool_and(reformat <span style="color: #7f007f;">is</span> <span style="color: #7f007f;">null</span>) <span style="color: #7f007f;">as</span> noreformating
     <span style="color: #7f007f;">from</span> reformat
 <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
 config_noreformat <span style="color: #7f007f;">as</span> (
   <span style="color: #7f007f;">select</span> relname,
          <span style="color: #bc8f8f;">'['</span>||relname||<span style="color: #bc8f8f;">']'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'table = '</span> || relname || E<span style="color: #bc8f8f;">' \n'</span> ||
          <span style="color: #bc8f8f;">'filename = /path/to/csv/'</span> || relname || E<span style="color: #bc8f8f;">'.csv\n'</span> ||
          <span style="color: #bc8f8f;">'format = csv'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'field_sep = \t'</span> || E<span style="color: #bc8f8f;">'\n'</span> ||
          <span style="color: #bc8f8f;">'columns = '</span> || E<span style="color: #bc8f8f;">' \n'</span>
          || E<span style="color: #bc8f8f;">'\n'</span> <span style="color: #7f007f;">as</span> config
     <span style="color: #7f007f;">from</span> reformat
          <span style="color: #7f007f;">join</span> noreformat <span style="color: #7f007f;">using</span> (relname)
    <span style="color: #7f007f;">where</span> noreformating
 <span style="color: #7f007f;">group</span> <span style="color: #da70d6;">by</span> relname
),
 allconfs <span style="color: #7f007f;">as</span> (
    <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_reformat
 <span style="color: #7f007f;">union</span> <span style="color: #7f007f;">all</span>
    <span style="color: #7f007f;">select</span> relname, config <span style="color: #7f007f;">from</span> config_noreformat
)
<span style="color: #7f007f;">select</span> config
  <span style="color: #7f007f;">from</span> allconfs
 <span style="color: #7f007f;">where</span> relname <span style="color: #7f007f;">not</span> <span style="color: #7f007f;">in</span> (<span style="color: #bc8f8f;">'tables'</span>, <span style="color: #bc8f8f;">'wedont'</span>, <span style="color: #bc8f8f;">'wantto'</span>, <span style="color: #bc8f8f;">'load'</span>)
<span style="color: #7f007f;">order</span> <span style="color: #da70d6;">by</span> relname;
</pre>
<p>To work with the generated setup, you will have to prepend a global section for pgloader and include a reformatting module in python, which I named <a href="mynull.py">mynull.py</a>:</p>
<pre class="src">
<span style="color: #b22222;"># Author: Dimitri Fontaine <<a href="mailto:dimitri@2ndQuadrant.fr">dimitri@2ndQuadrant.fr</a>>
#
# pgloader mysql reformating module
</span>
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">timestamp</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" Reformat str as a PostgreSQL timestamp

    MySQL timestamps are ok this time: 2012-12-18 23:38:12
    But may contain the infamous all-zero date, where we want NULL.
    """</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00 00:00:00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span>

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">date</span>(reject, <span style="color: #da70d6;">input</span>):
    <span style="color: #bc8f8f;">""" date columns can also have '0000-00-00'"""</span>
    <span style="color: #7f007f;">if</span> <span style="color: #da70d6;">input</span> == <span style="color: #bc8f8f;">'0000-00-00'</span>:
        <span style="color: #7f007f;">return</span> <span style="color: #5f9ea0;">None</span>

    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">input</span>
</pre>
<p>Now you can launch pgloader and profit!</p>
<h3>Conclusion</h3>
<p class="first">There are plenty of tools to assist you in migrating away from MySQL and other
databases. When you make that decision, you're not alone, and it's easy
enough to find people to come and help you.</p>
<p>While MySQL is Open Source and is not a <em>lock in</em> from a licensing
perspective, I still find it hard to swallow that there are no provided tools
for getting data out in a sane format, and that so many little
inconsistencies exist in the product with respect to data handling (try to
have a NOT NULL column, then enjoy the default empty strings that have been
put in there). So at this point, yes, I consider that moving to <a href="http://www.postgresql.org/">PostgreSQL</a>
is a way to <em>free your data</em>:</p>
<center>
<p><img src="../../../images/free-our-open-data.jpg" alt=""></p>
</center>
<center>
<p><em>Free your data!</em></p>
</center>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 17 Jan 2013 14:32:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/17-pgloader-auto-setup.html</guid> </item> <item> <title>Lost in scope</title> <link>http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</link> <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/288795520179240962">Mickael</a> on <em>twitter</em> I got to read an article about losing scope with some common programming languages. As the blog article <a href="https://my.smeuh.org/al/blog/lost-in-scope">Lost in scope</a> references <em>functional programming languages</em> and plays with both <em>Javascript</em> and <em>Erlang</em>, I thought I had to try it out with <em>Common Lisp</em> too.</p> <center> <p><img src="../../../images/lambda.png" alt=""></p> </center> <center> <p><em>Let's have fun with lambda!</em></p> </center> <p>So, here we go with a simple Common Lisp attempt. The <em>Lost in scope</em> article begins with defining a very simple function returning a boolean value, only true when it's not
monday.</p>
<h3>Monday is special</h3>
<p class="first">Keep in mind that the following example has been chosen to be simple yet
offer a case of <em>lexical binding shadowing</em>. It looks convoluted. Focus on the
day binding.</p>
<pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*days*</span>
  '(monday tuesday wednesday thursday friday saturday sunday)
  <span style="color: #bc8f8f;">"List of days in the week"</span>)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">any-day-but-monday?</span> (day)
  <span style="color: #bc8f8f;">"Returns a generalized boolean, true unless DAY is 'monday"</span>
  (member day (remove-if (<span style="color: #7f007f;">lambda</span> (day) (eq day 'monday)) *days*)))
</pre>
<p>So as you can see, in <em>Common Lisp</em> we just get away with a list of symbols rather than a string that we split to have a list of strings, or an array of strings, as in the examples with <em>python</em> and <em>ruby</em>.</p>
<p>Now, the <em>generalized boolean</em> is either nil to mean false, or anything else
to mean true, and in that example the return value of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/a_member.htm">member</a> is a sub-list
that begins where the <em>member</em> was found:</p>
<pre class="src">
CL-USER> (any-day-but-monday? 'monday)
NIL
CL-USER> (any-day-but-monday? 'tuesday)
(TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY)
</pre>
<p>Oh, and as we work with <em>Common Lisp</em>, we have a real <a href="http://www.gigamonkeys.com/book/lather-rinse-repeat-a-tour-of-the-repl.html">REPL</a> in which to play directly with our code, no need to add <em>interactive</em> stanzas in the main program text file just to be able to play with it. In <a href="http://common-lisp.net/project/slime/">Emacs Slime</a> we just use C-M-x on a <em>form</em> to have it available in the <em>REPL</em>, or C-c C-l to load the
whole file we're working on.</p>
<p>So, we see that <em>Common Lisp</em> scoping rules are silently doing the right thing
here. Within the <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_rm_rm.htm">remove-if</a> call we define a <em>lambda</em> function taking a single
parameter called <em>day</em>. It so happens that this parameter is shadowing the
<em>any-day-but-monday?</em> function parameter, and that shadowing only happens in
the <em>lexical scope</em> of the <em>lambda</em> we are creating. For a detailed discussion
about that concept, I would refer you to the <a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node43.html">Scope and Extent</a> chapter of
<em>Common Lisp the Language, 2nd Edition</em>.</p>
<p>In <em>Common Lisp</em> we have both <em>lexical scope</em> and <em>dynamic extents</em>, and a
variable defined with <em>defparameter</em> or <em>defvar</em> or that you otherwise <a href="http://www.lispworks.com/documentation/HyperSpec/Body/s_declar.htm">declare</a>
<em>special</em> will have a <em>dynamic extent</em>. Hence this section title.</p>
<h3>Closures</h3>
<p class="first">Now, the <a href="https://my.smeuh.org/al/blog/lost-in-scope">lost in scope</a> article goes on looking for a way around
the scoping rules of the <em>python</em> and <em>ruby</em> languages, where the developer
cannot easily instruct the language about the scoping rules he wants to be
using in a case by case way, as far as I can see.</p>
<p>First, let's reproduce the problem by using a single variable that we bind
in all the closures. Those are called <em>callbacks</em> in the original article, so
I've kept using that name here.</p>
<center>
<p><img src="../../../images/callback.jpg" alt=""></p>
</center>
<pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks-all-sunday*</span>
  (<span style="color: #7f007f;">loop</span> for day in *days*
        collect (<span style="color: #7f007f;">lambda</span> () day))
  <span style="color: #bc8f8f;">"loop binds DAY only once"</span>)
</pre>
<p>In that example, there's only a single variable day that we reuse throughout the <em>loop</em> construct, so that when the loop ends, we have a list of closures all referring to the same variable, and this variable, by the end of the loop, has
sunday as its value.</p>
<pre class="src">
CL-USER> (mapcar #'funcall *callbacks-all-sunday*)
(SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY SUNDAY)
</pre>
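<p>For comparison, the same effect can be reproduced in Python, one of the languages the original article discusses. This is only a sketch: the names <code>DAYS</code>, <code>callbacks_all_sunday</code> and <code>make_callback</code> are mine and do not come from either article. Closures created in a comprehension all share one variable, and a helper function is one common way to give each closure its own binding:</p>

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# Every lambda closes over the same comprehension variable, so all of them
# see the value it holds once the iteration is over.
callbacks_all_sunday = [lambda: day for day in DAYS]


def make_callback(day):
    """Each call creates a fresh binding of day for the returned closure."""
    return lambda: day


callbacks = [make_callback(day) for day in DAYS]

print([callback() for callback in callbacks_all_sunday])
print([callback() for callback in callbacks])
```

<p>The first list prints sunday seven times, the second one prints each day in order, because <code>make_callback</code> introduces a new scope per call.</p>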
<h3>Closures, take 2</h3>
<p class="first">Now, here is the way to get what we want, that is a list of closures each
having its own variable:</p>
<pre class="src">
(<span style="color: #7f007f;">defparameter</span> <span style="color: #b8860b;">*callbacks*</span>
  (mapcar (<span style="color: #7f007f;">lambda</span> (day)
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">for each day, produce a separate closure
</span>            <span style="color: #b22222;">;; </span><span style="color: #b22222;">around its own lexical variable day
</span>            (<span style="color: #7f007f;">lambda</span> () day))
          *days*)
  <span style="color: #bc8f8f;">"A list of callbacks to return the current day..."</span>)
</pre>
<p>And there we go:</p>
<pre class="src">
CL-USER> (mapcar #'funcall *callbacks*)
(MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY)
</pre>
<h3>Conclusion</h3>
<p class="first">Scoping rules are very important in any programming language, functional or not, and must be well understood by programmers. I find that once again, that topic has received very deep thought in <em>Common Lisp</em>, and the language is giving all the options to its developers.</p>
<center> <p><img src="../../../images/scope.png" alt=""></p> </center> <center> <p><em>What are your language of choice's scoping rules?</em></p> </center>
<p>I want to stress that in <em>Common Lisp</em> the scope rules are very clearly defined in the standard documentation of the language. For instance, <em>defun</em> and <em>let</em> both introduce a lexical binding, <em>defvar</em> and <em>defparameter</em> introduce a <em>dynamic variable</em>.</p>
<p>Also, as a user of the language you have the ability to <em>declare</em> any variable as being <em>special</em> in order to introduce a <em>dynamic variable</em> yourself. In C you
can declare some variables as being <em>static</em>, which is something else and
fraught with a very different set of problems.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 09 Jan 2013 11:07:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2013/01/09-Lost-in-scope.html</guid> </item> <item> <title>Extensions Templates</title> <link>http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</link> <description><![CDATA[<p>In a recent article titled <a href="../../2012/12/13-Inline-Extensions.html">Inline Extensions</a> we detailed the problem of how to distribute an extension's <em>package</em> to a remote server without having access to its file system at all. The solution to that problem is non trivial, let's say. But thanks to the awesome <a href="http://www.postgresql.org/community/">PostgreSQL Community</a> we finally have some practical ideas on how to address the problem as discussed on <a href="http://archives.postgresql.org/pgsql-hackers/">pgsql-hackers</a>, our development mailing list.</p> <center> <p><img src="../../../images/community.jpg" alt=""> <em>PostgreSQL is first an Awesome Community</em></p> </center> <p>The solution we talked about is to use <em>templates</em>, and so I've been working on a patch to bring <em>templates for extensions</em> to PostgreSQL. As we're talking about 3 new system catalogs, that's a big patch in terms of lines of code. In terms of features though, it's quite an easy one.</p> <p>Here's how it goes. Let's say you want to prepare the system to be able to
CREATE EXTENSION pair; without having to install it as an <em>OS package</em> for
which you would need to get root access on the server where your PostgreSQL
instance is running, which is not always easy, and sometimes not a good
idea.</p>
<h3>Installing an extension template</h3>
<p class="first">With the <a href="http://www.postgresql.org/message-id/m2wqvoha0p.fsf%402ndQuadrant.fr">template patch</a> I just sent on the lists, what you can do is prepare
a template with your extension's script and properties, then use it to
install the extensions.</p>
<pre class="src">
<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span>
   <span style="color: #7f007f;">for</span> extension pair
       <span style="color: #7f007f;">default</span> <span style="color: #da70d6;">version</span> <span style="color: #bc8f8f;">'1.0'</span>
       <span style="color: #7f007f;">with</span> (<span style="color: #da70d6;">nosuperuser</span>, norelocatable, <span style="color: #da70d6;">schema</span> <span style="color: #da70d6;">public</span>)
<span style="color: #7f007f;">as</span> $$
  <span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">pair</span> <span style="color: #7f007f;">AS</span> ( <span style="color: #da70d6;">k</span> <span style="color: #228b22;">text</span>, v <span style="color: #228b22;">text</span> );

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, <span style="color: #228b22;">text</span>)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, anyelement)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(anyelement, anyelement)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair'</span>;

  <span style="color: #7f007f;">CREATE</span> <span style="color: #7f007f;">OR</span> <span style="color: #da70d6;">REPLACE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">pair</span>(<span style="color: #228b22;">text</span>, <span style="color: #228b22;">text</span>)
  <span style="color: #da70d6;">RETURNS</span> pair <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">SQL</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'SELECT ROW($1, $2)::pair;'</span>;
$$;
</pre>
<h3>Installing an extension from a template</h3>
<p class="first">With the template installed in the catalogs, now you can go and install your extension:</p>
<pre class="src">
foo> <span style="color: #7f007f;">create</span> extension pair;
<span style="color: #7f007f;">CREATE</span> EXTENSION
foo> \dx pair
        List <span style="color: #da70d6;">of</span> installed extensions
 <span style="color: #da70d6;">Name</span> | <span style="color: #da70d6;">Version</span> | <span style="color: #da70d6;">Schema</span> | Description
<span style="color: #b22222;">------+---------+--------+-------------
</span> pair | 1.0     | <span style="color: #da70d6;">public</span> |
(1 <span style="color: #da70d6;">row</span>)

foo> \dx+ pair
Objects <span style="color: #7f007f;">in</span> extension "pair"
          <span style="color: #da70d6;">Object</span> Description
<span style="color: #b22222;">--------------------------------------
</span> <span style="color: #da70d6;">function</span> pair(anyelement,anyelement)
 <span style="color: #da70d6;">function</span> pair(anyelement,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,anyelement)
 <span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
 <span style="color: #da70d6;">type</span> pair
(5 <span style="color: #da70d6;">rows</span>)
</pre>
<p>The extension installation is now happening from the catalog templates rather than the file system, which means you didn't need to be root on the
system where the server is running. Also note that the example above was
run when connected as the <em>database owner</em>, a user who is not the
<em>superuser</em>. Requiring fewer privileges is always good news, right?</p>
<h3>Managing upgrade scripts and extension update</h3>
<p class="first">Now that the extension is installed, you might want to update it with some
new awesome features. Let's have a look at that.</p>
<center>
<p><img src="../../../images/extension-update.png" alt=""></p>
</center>
<center>
<p><em>Upload your Extension Update Scripts</em></p>
</center>
<p>Rather than making a new version of the extension package with the new files
in there, then asking the operations team to make the new package available
on the internal repositories and install it on the servers, you could now
prepare and <em>QA</em> the new setup that way:</p>
<pre class="src">
<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span> <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.0'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.1'</span>
<span style="color: #7f007f;">as</span> $$
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~> (LEFTARG = <span style="color: #228b22;">text</span>, RIGHTARG = anyelement, <span style="color: #da70d6;">PROCEDURE</span> = pair);
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~> (LEFTARG = anyelement, RIGHTARG = <span style="color: #228b22;">text</span>, <span style="color: #da70d6;">PROCEDURE</span> = pair);
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~> (LEFTARG = anyelement, RIGHTARG = anyelement, <span style="color: #da70d6;">PROCEDURE</span> = pair);
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">OPERATOR</span> ~> (LEFTARG = <span style="color: #228b22;">text</span>, RIGHTARG = <span style="color: #228b22;">text</span>, <span style="color: #da70d6;">PROCEDURE</span> = pair); $$;
<span style="color: #7f007f;">create</span> <span style="color: #da70d6;">template</span> <span style="color: #7f007f;">for</span> extension pair <span style="color: #7f007f;">from</span> <span style="color: #bc8f8f;">'1.1'</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span>
<span style="color: #7f007f;">as</span> $$
<span style="color: #da70d6;">comment</span> <span style="color: #7f007f;">on</span> extension pair <span style="color: #7f007f;">is</span> <span style="color: #bc8f8f;">'Simple Key Value Text Type'</span>;
$$;
</pre>
<p>Of course it's not the most realistic example when you look at the content.
In particular, the 1.2 version only adds a comment to the extension. I
needed another version to test the automatic upgrade path with more than one
step though, so here we go.</p>
<pre class="src">
foo> <span style="color: #da70d6;">alter</span> extension pair <span style="color: #da70d6;">update</span> <span style="color: #7f007f;">to</span> <span style="color: #bc8f8f;">'1.2'</span>;
<span style="color: #da70d6;">ALTER</span> EXTENSION
foo> \dx pair
List <span style="color: #da70d6;">of</span> installed extensions
| <span style="color: #da70d6;">Name</span> | <span style="color: #da70d6;">Version</span> | <span style="color: #da70d6;">Schema</span> | Description |
<span style="color: #b22222;">------+---------+--------+----------------------------
| </span> pair | 1.2 | <span style="color: #da70d6;">public</span> | <span style="color: #da70d6;">Simple</span> <span style="color: #da70d6;">Key</span> <span style="color: #da70d6;">Value</span> <span style="color: #228b22;">Text</span> <span style="color: #da70d6;">Type</span> |
(1 <span style="color: #da70d6;">row</span>)
foo> \dx+ pair
Objects <span style="color: #7f007f;">in</span> extension "pair"
<span style="color: #da70d6;">Object</span> Description
<span style="color: #b22222;">-----------------------------------
</span><span style="color: #da70d6;">function</span> pair(anyelement,anyelement)
<span style="color: #da70d6;">function</span> pair(anyelement,<span style="color: #228b22;">text</span>)
<span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,anyelement)
<span style="color: #da70d6;">function</span> pair(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
<span style="color: #da70d6;">operator</span> ~>(anyelement,anyelement)
<span style="color: #da70d6;">operator</span> ~>(anyelement,<span style="color: #228b22;">text</span>)
<span style="color: #da70d6;">operator</span> ~>(<span style="color: #228b22;">text</span>,anyelement)
<span style="color: #da70d6;">operator</span> ~>(<span style="color: #228b22;">text</span>,<span style="color: #228b22;">text</span>)
<span style="color: #da70d6;">type</span> pair
(9 <span style="color: #da70d6;">rows</span>)
</pre>
<p>We did it!</p>
<h3>Internals</h3>
<p class="first">Let's have a look at those new catalogs:</p>
<center>
<p><img src="../../../images/octopus-anatomy.jpg" alt=""></p>
</center>
<center>
<p><em>Oh, that's not quite the internals I expected...</em></p>
</center>
<p>Here we go now:</p>
<pre class="src">
foo> select * from pg_extension_control;
-[ RECORD 1 ]--+------
| ctlname | pair |
| ctlowner | 32926 |
| ctldefault | t |
| ctlrelocatable | f |
| ctlsuperuser | f |
| ctlnamespace | public |
| ctlversion | 1.0 |
| ctlrequires |

foo> select * from pg_extension_template;
-[ RECORD 1 ]----------------------------------------------------------
| tplname | pair |
| tplowner | 32926 |
| tplversion | 1.0 |
| tplscript | |
| CREATE TYPE pair AS ( k text, v text ); | |
| CREATE OR REPLACE FUNCTION pair(anyelement, text) | |
| RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair'; | |
| CREATE OR REPLACE FUNCTION pair(text, anyelement) | |
| RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair'; | |
| CREATE OR REPLACE FUNCTION pair(anyelement, anyelement) | |
| RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair'; | |
| CREATE OR REPLACE FUNCTION pair(text, text) | |
| RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;'; | |

foo> select * from pg_extension_uptmpl;
-[ RECORD 1 ]----------------------------------------------------------
| uptname | pair |
| uptowner | 32926 |
| uptfrom | 1.0 |
| uptto | 1.1 |
| uptscript | |
| CREATE OPERATOR ~> (LEFTARG = text, | |
| RIGHTARG = anyelement, | |
| PROCEDURE = pair); | |
| CREATE OPERATOR ~> (LEFTARG = anyelement, | |
| RIGHTARG = text, | |
| PROCEDURE = pair); | |
| CREATE OPERATOR ~> (LEFTARG = anyelement, | |
| RIGHTARG = anyelement, | |
| PROCEDURE = pair); | |
| CREATE OPERATOR ~> (LEFTARG = text, | |
| RIGHTARG = text, | |
| PROCEDURE = pair); | |
-[ RECORD 2 ]----------------------------------------------------------
| uptname | pair |
| uptowner | 32926 |
| uptfrom | 1.1 |
| uptto | 1.2 |
| uptscript | |
| comment on extension pair is 'Simple Key Value Text Type'; | |
</pre>
<p>NULL columns.</p>
<pre class="src">
foo> \d pg_extension_template
Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_template"</span>
| Column | Type | Modifiers |
------------+------+-----------
| tplname | name | not null |
| tplowner | oid | not null |
| tplversion | text | |
| tplscript | text | |
Indexes:
<span style="color: #bc8f8f;">"pg_extension_template_name_version_index"</span> UNIQUE, btree (tplname, tplversion)
<span style="color: #bc8f8f;">"pg_extension_template_oid_index"</span> UNIQUE, btree (oid)

foo> \d pg_extension_uptmpl
Table <span style="color: #bc8f8f;">"pg_catalog.pg_extension_uptmpl"</span>
| Column | Type | Modifiers |
-----------+------+-----------
| uptname | name | not null |
| uptowner | oid | not null |
| uptfrom | text | |
| uptto | text | |
| uptscript | text |
Indexes:
<span style="color: #bc8f8f;">"pg_extension_uptmpl_name_from_to_index"</span> UNIQUE, btree (uptname, uptfrom, uptto)
<span style="color: #bc8f8f;">"pg_extension_uptmpl_oid_index"</span> UNIQUE, btree (oid)
</pre>
<h3>Next steps</h3>
<p class="first">Now that we have the basics in place, the patch is still far from finished. It needs
pg_dump and psql support, support for the function
pg_available_extension_versions(), implementing some ALTER TEMPLATE FOR
EXTENSION commands for which I only sketched the syntax in the grammar, and
some more infrastructure to be able to have ALTER OWNER and ALTER RENAME
commands.</p>
<center>
<p><img src="../../../images/patch-brewing.jpg" alt=""></p>
</center>
<center>
<p><em>Warning: patch brewing here! Syntax and other key elements will change.</em></p>
</center>
<p>All that is pretty technical though. What the patch really needs is some
quality review and maybe some adjustments. I would be surprised if it didn't
need adjustments, really: given the way the community works, we always
need some. That's why the PostgreSQL product is so good!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author>
<pubDate>Tue, 08 Jan 2013 17:53:00 +0100</pubDate>
<guid isPermaLink="true">http://tapoueh.org/blog/2013/01/08-Extensions-Templates.html</guid>
</item>
<item>
<title>Inline Extensions</title>
<link>http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</link>
<description><![CDATA[<p>We've had the CREATE EXTENSION feature in <a href="http://www.postgresql.org/">PostgreSQL</a> for a couple of releases now, so let's talk about where to go from here. The first goal of the extension facility has been to allow for a clean <em>dump</em> and <em>restore</em> process of <a href="http://www.postgresql.org/docs/9.2/static/contrib.html">contrib</a> modules. As such it's been tailored to the needs of deploying files on the <em>file system</em> because there's no escaping from that when you have to ship <em>binary</em> and <em>executable</em> files, those infamous .so, .dll or .dylib things.</p>
<center>
<p><img src="../../../images/dylibbundler.png" alt=""></p>
</center>
<p>Now that we have the <em>Extension</em> facility though, what we see is a growing number of users taking advantage of it for the purpose of managing in-house procedural code and related objects. This code can be a bunch of <a href="http://www.postgresql.org/docs/9.2/static/plpgsql.html">PLpgSQL</a> or <a href="http://www.postgresql.org/docs/9.2/static/plpython.html">plpython</a> functions, and as such you normally create them directly from any application connection to PostgreSQL.</p>
<p>So the idea would be to allow creating <em>Extensions</em> fully from a SQL command, including the whole set of objects it contains. More than one approach is possible to reach that goal, each with downsides and advantages. We will see them later in this document.</p>
<p>Before that though, let's first review what the extension mechanism has to offer to its users when there's no <em>contrib like</em> module to manage.</p>
<h3>A use case for next generation extensions</h3>
<p class="first">The only design goal of the
9.1 PostgreSQL Extension feature has been to
support a proper <em>dump & restore</em> user experience when using <em>contrib modules</em>
such as hstore or ltree. Building on that, what do <em>Extensions</em> have to
offer to non-C developers out there? In other words, what does CREATE EXTENSION
bring to the table that a bunch of <em>loose</em> objects does not? What problems
can we now solve?</p>
<center>
<p><img src="../../../images/multi_function_equipment.jpg" alt=""></p>
</center>
<center>
<p><em>A Multi Functions Equipment, All Bundled Together</em></p>
</center>
<p>A way to phrase it is to say that <em>Extensions</em> are user defined CASCADE
support. <em>Extensions</em> bring extensibility to the pg_depend PostgreSQL
internal dependency tracking system that CASCADE is built on. From that
angle, <em>Extensions</em> are a way to manage dependencies of <em>SQL objects</em> in a way
that allows you to manage them as a single entity.</p>
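<p>To make the "single entity" idea concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how pg_depend is implemented; the <code>Catalog</code> class and its method names are made up for illustration.</p>

```python
class Catalog:
    """Toy catalog: every object may be a member of at most one extension."""

    def __init__(self):
        self.objects = set()   # every object that currently exists
        self.member_of = {}    # object name -> owning extension name

    def create(self, obj, extension=None):
        """Create an object, optionally as a member of an extension."""
        self.objects.add(obj)
        if extension is not None:
            self.member_of[obj] = extension

    def drop_object(self, obj):
        """Refuse to drop an object that an extension still requires."""
        extension = self.member_of.get(obj)
        if extension is not None:
            raise RuntimeError(
                f"cannot drop {obj} because extension {extension} requires it")
        self.objects.discard(obj)

    def drop_extension(self, extension):
        """Drop the extension together with all of its member objects."""
        members = {o for o, e in self.member_of.items() if e == extension}
        for obj in members:
            del self.member_of[obj]
        self.objects -= members
        return members


catalog = Catalog()
catalog.create("function populate_record(anyelement,hstore)", extension="hstore")
catalog.create("type hstore", extension="hstore")
catalog.create("table loose_table")
```

<p>Dropping a member on its own raises an error, while dropping the extension removes all of its members in one go and leaves loose objects untouched.</p>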
<p>One of the existing problems this helps solve is the infamous lack of
dependency tracking between function calls. Using <em>Extensions</em> when you deal
with a set of functions acting as an API, you can at least protect that as a
unit:</p>
<pre class="src">
STATEMENT: drop function public.populate_record(anyelement,hstore);
ERROR: cannot drop function populate_record(anyelement,hstore) because extension hstore requires it
HINT: You can drop extension hstore instead.
</pre>
<p>And you also have a version number and tools integration to manage extensions, with the psql
\dx command and the equivalent feature in <a href="http://www.pgadmin.org/">pgAdmin</a>.
Coming up with your own version number management is not impossible, some do
that already. Here it's integrated and the upgrade sequences are offered too
(applying 1.1--1.2 then 1.2--1.3 automatically).</p>
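<p>Chaining upgrade scripts like that boils down to a path search over the available (from, to) pairs. The Python sketch below illustrates the idea with a breadth-first search; it is not PostgreSQL's actual code, and <code>find_update_path</code> and the script list are hypothetical names chosen for the example.</p>

```python
from collections import deque


def find_update_path(scripts, source, target):
    """Return the (from, to) steps leading from source to target,
    or None when no chain of update scripts connects them."""
    graph = {}
    for frm, to in scripts:
        graph.setdefault(frm, []).append(to)

    # Breadth-first search, remembering how each version was reached.
    came_from = {source: None}
    queue = deque([source])
    while queue:
        version = queue.popleft()
        if version == target:
            break
        for nxt in graph.get(version, []):
            if nxt not in came_from:
                came_from[nxt] = version
                queue.append(nxt)

    if target not in came_from:
        return None

    # Walk back from the target to rebuild the sequence of steps.
    steps = []
    node = target
    while came_from[node] is not None:
        steps.append((came_from[node], node))
        node = came_from[node]
    steps.reverse()
    return steps


# Hypothetical set of available update scripts.
scripts = [("1.0", "1.1"), ("1.1", "1.2"), ("1.2", "1.3")]
print(find_update_path(scripts, "1.1", "1.3"))
```

<p>With those three scripts, going from 1.1 to 1.3 yields the two steps 1.1--1.2 and 1.2--1.3, in that order.</p>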
<p>Let's just say that it's very easy to understand the <em>traction</em> our users feel
towards leveraging <em>Extensions</em> features in order to properly manage their set
of stored procedures and SQL objects.</p>
<h3>The <em>dump & restore</em> experience</h3>
<p class="first">The common problem of all those proposals is very central to the whole idea
of <em>Extensions</em> as we know them. The goal of building them has been to fix the
<em>restoring</em> experience when using extensions in a database, and we managed to
do that properly for contrib-like extensions.</p>
<center>
<p><img src="../../../images/fly.tn.png" alt=""></p>
</center>
<center>
<p><em>A fly in the ointment</em></p>
</center>
<p>When talking about <em>Inline Extensions</em>, the fly in the ointment is how to
properly manage their pg_dump behavior. The principle we built for
<em>Extensions</em>, and that is almost unique to them, is to <strong><em>omit</em></strong> them in the dump
files. The only other objects that we filter out of the dump are the ones
installed at server initialisation time, when using <a href="http://www.postgresql.org/docs/9.2/static/app-initdb.html">initdb</a>, to be found in
the pg_catalog and information_schema system <em>schemas</em>.</p>
<p>At restore time, the dump file contains the CREATE EXTENSION command so the
PostgreSQL server will go fetch the <em>control</em> and <em>script</em> files on disk and
process them, loading the database with the right set of SQL objects.</p>
<p>Now we're talking about <em>Extensions</em> which we would maybe want to dump the
objects of, so that at <em>restore</em> time we don't need to find them from unknown
external resources: the fact that the extension is <em>Inline</em> means that the
PostgreSQL server has no way to know where its content is coming from.</p>
<p>The next proposals are trying to address that problem, with more or less
success. So far none of them is entirely satisfying to me, even if a clear
temporary winner has emerged on the <em>hackers</em> mailing list, summarized in the
<a href="http://archives.postgresql.org/message-id/m2fw3judug.fsf@2ndQuadrant.fr">in-catalog Extension Scripts and Control parameters (templates?)</a> thread.</p>
<h3>Inline Extension Proposals</h3>
<p class="first">Now, on to some proposals to make the best out of our all time favorite
PostgreSQL feature, the only one that makes no sense at all by itself...</p>
<h4>Starting from an empty extension</h4>
<p class="first">We already have the facility to add existing <em>loose</em> objects to an extension,
and that's exactly what we use when we turn a set of objects that was not
an extension before into one, with the CREATE EXTENSION
... FROM 'unpackaged'; command.</p>
<p>The hstore--unpackaged--1.0.sql file contains statements such as:</p>
<pre class="src">
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">type</span> <span style="color: #0000ff;">hstore</span>;
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_in</span>(cstring);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_out</span>(hstore);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_recv</span>(internal);
<span style="color: #da70d6;">ALTER</span> EXTENSION hstore <span style="color: #da70d6;">ADD</span> <span style="color: #da70d6;">function</span> <span style="color: #0000ff;">hstore_send</span>(hstore);
</pre>
<p>Opening CREATE EXTENSION so that it allows you to create a really <em>empty</em>
extension would then allow you to fill-in as you need, with as many commands
as you want to add objects to it. The <em>control</em> file properties would need to
find their way in that design, that sure can be taken care of.</p>
<center>
<p><img src="../../../images/empty-extension.jpg" alt=""></p>
</center>
<center>
<p><em>Look me, an Empty Extension!</em></p>
</center>
<p>The main drawback here is that there's no separation anymore between the
extension author, the distribution means, the DBA and the database user.
When you want to install a third party <em>Extension</em> using only SQL commands,
you could do it with that scheme by using a big script full of one-liner
commands.</p>
<p>So that if you screw up your <em>copy/pasting</em> session (well you should maybe
reconsider your choice of tooling at this point, but that's another topic),
you will end up with a perfectly valid <em>Extension</em> that does not contain what
you wanted. As the end user, you have no clue about that until the first
time using the extension fails.</p>
<h4>CREATE EXTENSION AS</h4>
<p class="first">The next idea is to embed the <em>Extension</em> script itself in the command, so as
to get a cleaner command API (in my opinion at least) and a better error
message when the paste is wrong. Of course if your <em>paste</em> problem happens to
just be losing a line in the middle of the script there is not so much I
can do for you...</p>
<pre class="src">
<span style="color: #7f007f;">CREATE</span> EXTENSION hstore
<span style="color: #7f007f;">WITH</span> <span style="color: #da70d6;">parameter</span> = <span style="color: #da70d6;">value</span>, ...
<span style="color: #7f007f;">AS</span> $$
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span>;
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_in</span>(cstring) <span style="color: #da70d6;">RETURNS</span> hstore
<span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_out</span>(hstore) <span style="color: #da70d6;">RETURNS</span> cstring <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_recv</span>(internal) <span style="color: #da70d6;">RETURNS</span> hstore <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">FUNCTION</span> <span style="color: #0000ff;">hstore_send</span>(hstore) <span style="color: #da70d6;">RETURNS</span> <span style="color: #228b22;">bytea</span> <span style="color: #7f007f;">AS</span> <span style="color: #bc8f8f;">'MODULE_PATHNAME'</span> <span style="color: #da70d6;">LANGUAGE</span> <span style="color: #da70d6;">C</span> <span style="color: #da70d6;">STRICT</span> <span style="color: #da70d6;">IMMUTABLE</span>;
<span style="color: #7f007f;">CREATE</span> <span style="color: #da70d6;">TYPE</span> <span style="color: #0000ff;">hstore</span> (
INTERNALLENGTH = -1, <span style="color: #da70d6;">STORAGE</span> = extended,
<span style="color: #da70d6;">INPUT</span> = hstore_in, <span style="color: #da70d6;">OUTPUT</span> = hstore_out,
RECEIVE = hstore_recv, SEND = hstore_send);
$$;
</pre>
<center>
<p><em>An edited version of hstore--1.1.sql for vertical space concerns</em></p>
</center>
<center>
<p><em>TEXT SEARCH TEMPLATE After All</em></p>
</center>
<p>The idea would then be to have some new specific TEMPLATE SQL Object that
would be used to <em>import</em> or <em>upload</em> your control file and create and update
scripts in the database, using nothing else than a SQL connection. Then at
CREATE EXTENSION time the system would be able to work either from the file
system or the <em>template</em> catalogs.</p>
<p>One obvious problem is how to deal with a unique namespace when we split the
sources into the file system and the database, and when the file system is
typically maintained by using apt-get or yum commands.</p>
<p>Then again I would actually prefer that mechanism to the other
proposals if the idea was to load the file system control and script files
as TEMPLATEs themselves and then only operate <em>Extensions</em> from <em>Templates</em>. But
doing that would mean getting back to the situation where we still are not
able to devise a good, simple and robust pg_dump policy for extensions and
templates.</p>
<h3>Conclusion</h3>
<p class="first">I hope to find the right solution to my long term plan in this release
development cycle, but it looks like the right challenge to address now is
to find the right compromise instead. Using the <em>Templates</em> idea already
brings a lot to the table, if not the whole set of features I would like to
see.</p>
<center>
<p><img src="../../../images/building-blocks.jpg" alt=""></p>
</center>
<center>
<p><em>PostgreSQL: Building on Solid Foundations</em></p>
</center>
<p>What would be missing mainly would be the ability for an <em>Extension</em> to switch
from being file based to being a template, either because the author decided
to change the way he's shipping it, or because the user is switching from
using the <a href="http://pgxnclient.projects.pgfoundry.org/">pgxn client</a> to using <em>proper</em> system packages. I guess that's
something we can see about later, though.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author>
<pubDate>Thu, 13 Dec 2012 11:34:00 +0100</pubDate>
<guid isPermaLink="true">http://tapoueh.org/blog/2012/12/13-Inline-Extensions.html</guid>
</item>
<item>
<title>Trigger Parameters</title>
<link>http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</link>
<description><![CDATA[<p>We have a not too active <a href="http://archives.postgresql.org/pgsql-fr-generale/2012-12/index.php">postgresql-fr-generale</a> mailing list where some interesting questions are asked by our subscribers. This article comes from such a question about how to deal with trigger parameters, which are nice to have, but static.</p>
<center>
<p><img src="../../../images/trigger-wheels.big.jpg" alt=""></p>
</center>
<p>Another way to ask that question is saying that</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author>
<pubDate>Thu, 06 Dec 2012 11:10:00 +0100</pubDate>
<guid isPermaLink="true">http://tapoueh.org/blog/2012/12/06-parametrized-triggers.html</guid>
</item>
<item>
<title>M-x ack</title>
<link>http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</link>
<description><![CDATA[<p>I've been asked about how to integrate the <a href="http://betterthangrep.com/">ack</a> tool (you know, the one that is <em>better than grep</em>) into Emacs today. Again. And I just realized that I didn't blog about my solution. That might explain why I keep getting asked about it after all...</p>
<p>So here it is,
M-x ack:</p>
<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-ack.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">http://stackoverflow.com/questions/2322389/ack-does-not-work-when-run-from-grep-find-in-emacs-on-windows
</span>
(<span style="color: #7f007f;">defcustom</span> <span style="color: #b8860b;">ack-command</span> (or (executable-find <span style="color: #bc8f8f;">"ack"</span>)
                           (executable-find <span style="color: #bc8f8f;">"ack-grep"</span>))
  <span style="color: #bc8f8f;">"Command to use to call ack, e.g. ack-grep under debian"</span>
  <span style="color: #da70d6;">:type</span> 'file)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-command-line</span> (concat ack-command <span style="color: #bc8f8f;">" --nogroup --nocolor "</span>))
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-history</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">ack-host-defaults-alist</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">ack</span> ()
  <span style="color: #bc8f8f;">"Like grep, but using ack-command as the default"</span>
  (interactive)
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Make sure grep has been initialized
</span>  (<span style="color: #7f007f;">if</span> (>= emacs-major-version 22)
      (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">grep</span>)
    (<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">compile</span>))
  <span style="color: #b22222;">; </span><span style="color: #b22222;">Close STDIN to keep ack from going into filter mode
</span>  (<span style="color: #7f007f;">let</span> ((null-device (format <span style="color: #bc8f8f;">"< %s"</span> null-device))
        (grep-command ack-command-line)
        (grep-history ack-history)
        (grep-host-defaults-alist ack-host-defaults-alist))
    (call-interactively 'grep)
    (setq ack-history             grep-history
          ack-host-defaults-alist grep-host-defaults-alist)))

(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-ack</span>)
</pre>
<p>Enjoy!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author>
<pubDate>Thu, 22 Nov 2012 17:36:00 +0100</pubDate>
<guid isPermaLink="true">http://tapoueh.org/blog/2012/11/22-Emacs-Ack-Mode.html</guid>
</item>
<item>
<title>CL Happy Numbers</title>
<link>http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</link>
<description><![CDATA[<p>A while ago I stumbled upon <a href="http://tapoueh.org/blog/2010/08/30-happy-numbers.html">Happy Numbers</a> as explained in <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">programming praxis</a>, and offered an implementation of them in
SQL and in Emacs Lisp. Yeah, I know. Why not, though?</p>
<center>
<p><img src="../../../images/happy-numbers.png" alt=""></p>
</center>
<p>Today I'm back on that topic and as I'm toying with <em>Common Lisp</em> I thought it would be a good excuse to learn me some new tricks. As you can see from the earlier blog entry, last time I did attack the <em>digits</em> problem quite lightly. Let's try a better approach now.</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">digits</span> (n)
  <span style="color: #bc8f8f;">"return the list of the digits of N"</span>
  (nreverse
   (<span style="color: #7f007f;">loop</span> for x = n then r
         for (r d) = (multiple-value-list (truncate x 10))
         collect d
         until (zerop r))))
</pre>
<p>As you can see I wanted to use that facility I like very much, the for
x = n then r way to handle the first loop iteration differently from the
next ones. But I've been hinted on #lisp that there's a much better way to
write the same code:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">integer-digits</span> (integer)
  <span style="color: #bc8f8f;">"stassats version"</span>
  (nreverse
   (<span style="color: #7f007f;">loop</span> with remainder
         do (setf (values integer remainder) (truncate integer 10))
         collect remainder
         until (zerop integer))))
</pre>
<p>That code runs about twice as fast as the previous one and is easier to reason about. It's using
setf and the form <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_values.htm">setf values</a>, something nice to
discover as it seems to be quite powerful. Let's see how to use it, even if
it's really simple:</p>
<pre class="src">
CL-USER> (integer-digits 12304501)
(1 2 3 0 4 5 0 1)
</pre>
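<p>For readers less at ease with <em>loop</em>, the same divide-and-keep-the-remainder idea can be sketched in Python, where <code>divmod</code> plays the role of <code>truncate</code>. This is only an illustration for non-negative integers (the two functions differ on negative input), not a translation endorsed by the original code.</p>

```python
def integer_digits(n):
    """Return the decimal digits of a non-negative integer,
    most significant digit first."""
    digits = []
    while True:
        # divmod gives the quotient and the remainder in one step,
        # much like (setf (values integer remainder) (truncate integer 10)).
        n, remainder = divmod(n, 10)
        digits.append(remainder)
        if n == 0:
            break
    digits.reverse()  # digits were collected least significant first
    return digits


print(integer_digits(12304501))  # [1, 2, 3, 0, 4, 5, 0, 1]
```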
<p>Let's move on to solving the <em>Happy Numbers</em> problem though:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sum-of-squares-of-digits</span> (integer)
  (<span style="color: #7f007f;">loop</span> with remainder
        do (setf (values integer remainder) (truncate integer 10))
        sum (* remainder remainder)
        until (zerop integer)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">happy?</span> (n <span style="color: #228b22;">&optional</span> seen)
  <span style="color: #bc8f8f;">"return true when n is a happy number"</span>
  (<span style="color: #7f007f;">let*</span> ((happiness (sum-of-squares-of-digits n)))
    (<span style="color: #7f007f;">cond</span> ((= 1 happiness) t)
          ((member happiness seen) nil)
          (t
           (happy? happiness (push happiness seen))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-happy-numbers</span> (limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (happy? n) collect n))
</pre>
<p>And here's how it goes:</p>
<pre class="src">
CL-USER> (find-happy-numbers 100)
(1 7 10 13 19 23 28 31 32 44 49 68 70 79 82 86 91 94 97 100)

CL-USER> (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000)) took 1,621,413 microseconds (1.621413 seconds) to run.
116,474 microseconds (0.116474 seconds, 7.18%) of which was spent in GC.
During that period, and with 4 available CPU cores,
1,431,332 microseconds (1.431332 seconds) were spent in user mode
145,941 microseconds (0.145941 seconds) were spent in system mode
185,438,208 bytes of memory allocated.
1 minor page faults, 0 major page faults, 0 swaps.
143071
</pre>
<p>SQL
and <em>Emacs Lisp</em>, the reason being that instead of writing the number into a
<em>string</em> with (format nil "~d" number) then using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_subseq.htm">subseq</a> to get the digits one after the
other, we're now using <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_floorc.htm">truncate</a>.</p>
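<p>The arithmetic approach is not tied to <em>Common Lisp</em>. Here is a hedged Python sketch of the same algorithm, using <code>divmod</code> instead of string conversion and a set of already seen sums to detect cycles; the function names are mine and the code makes no claim about relative performance.</p>

```python
def sum_of_squares_of_digits(n):
    """Sum the squares of the decimal digits of a non-negative integer."""
    total = 0
    while True:
        n, digit = divmod(n, 10)
        total += digit * digit
        if n == 0:
            return total


def is_happy(n):
    """Return True when repeatedly summing squared digits reaches 1."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)  # remember n so that a cycle is detected
        n = sum_of_squares_of_digits(n)
    return n == 1


def find_happy_numbers(limit):
    """Return all happy numbers from 1 to limit, inclusive."""
    return [n for n in range(1, limit + 1) if is_happy(n)]


print(find_happy_numbers(100))
```

<p>Run with a limit of 100, it lists the same twenty numbers as the <em>Common Lisp</em> session above.</p>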
<p>Happy hacking!</p>
<h3>Update</h3>
<p class="first">It turns out that to solve math-related problems, some mathematical insight
helps. Who would have believed that? So if you want to easily get some
more performance out of the previous code, just try this solution:</p>
<pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">*depressed-squares*</span> '(0 4 16 20 37 42 58 89 145)
<span style="color: #bc8f8f;">"see http://oeis.org/A039943"</span>)
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">undepressed?</span> (n)
  <span style="color: #bc8f8f;">"same as happy?, using a static list of unhappy sums"</span>
  (<span style="color: #7f007f;">cond</span> ((= 1 n) t)
        ((member n *depressed-squares*) nil)
        (t
         (<span style="color: #7f007f;">let</span> ((h (sum-of-squares-of-digits n)))
           (undepressed? h)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">find-undepressed-numbers</span> (limit)
  <span style="color: #bc8f8f;">"find all happy numbers from 1 to limit"</span>
  (<span style="color: #7f007f;">loop</span> for n from 1 to limit when (undepressed? n) collect n))
</pre>
<p>Time to compare:</p>
<pre class="src">
CL-USER> (time (length (find-happy-numbers 1000000)))
(LENGTH (FIND-HAPPY-NUMBERS 1000000)) took 1,938,048 microseconds (1.938048 seconds) to run.
290,902 microseconds (0.290902 seconds, 15.01%) of which was spent in GC.
During that period, and with 4 available CPU cores,
1,778,021 microseconds (1.778021 seconds) were spent in user mode
140,862 microseconds (0.140862 seconds) were spent in system mode
185,438,208 bytes of memory allocated.
3,320 minor page faults, 0 major page faults, 0 swaps.
143071

CL-USER> (time (length (find-undepressed-numbers 1000000)))
(LENGTH (FIND-UNDEPRESSED-NUMBERS 1000000)) took 1,036,847 microseconds (1.036847 seconds) to run.
5,372 microseconds (0.005372 seconds, 0.52%) of which was spent in GC.
During that period, and with 4 available CPU cores,
1,018,708 microseconds (1.018708 seconds) were spent in user mode
16,982 microseconds (0.016982 seconds) were spent in system mode
2,289,152 bytes of memory allocated.
143071
CL-USER>
</pre>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author>
<pubDate>Tue, 20 Nov 2012 18:20:00 +0100</pubDate>
<guid isPermaLink="true">http://tapoueh.org/blog/2012/11/20-CL-Happy-Numbers.html</guid>
</item>
<item>
<title>About Vimgolf</title>
<link>http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</link>
<description><![CDATA[<p>Following some <em>tweet</em> I found myself desultorily watching an episode of the awesome <a href="http://vimeo.com/channels/222837">VimGolf in Emacs</a> video series by <a href="http://vimeo.com/timvisher">Tim Visher</a>. The series is about picking some challenge from <a href="http://vimgolf.com/">vimgolf</a> and implementing it with our favorite editor instead. Because <a href="http://emacsrocks.com/">Emacs Rocks</a> guys.</p>
0xab, where you begin with a template containing only the 0x00 entry. The
idea is of course to use the <em>Vim</em> feature that will increment the <em>number at
point</em>, and is available through the C-a keystroke.</p>
<pre class="src">
<span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {
0x00, }; </pre>
0x as a prefix meaning that the next number is <em>hexadecimal</em>. But
<em>Emacs</em> ships with <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macros.html">Emacs Keyboard Macros</a> and those have a counter, so it's easy
enough to fill in numbers from 1 to 255 that way: M-1 F3 F3 , F4 will
register a macro where the counter starts at 1, and each time you hit F4 it
will insert the current counter value, increment it and insert a comma. You
want to do that 254 times, so you do C-u 2 5 4 F4 and <em>Emacs</em> just does that.</p>
<p>Now, to transform those decimal numbers into their <em>hexadecimal</em>
representation, you can use advanced <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Replace.html">Emacs Regexp Replace</a> features. Replace
[0-9]+ with the result from the following <em>Emacs Lisp</em> code: \,(format
"0x%02x" (string-to-number \&)). The \& in there will be replaced by the
matching text, so that will do what we need here, turning 10 into 0x0a.</p>
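<p>To see what that <code>"0x%02x"</code> format string produces outside of the editor, here is a minimal sketch in Python rather than <em>Emacs Lisp</em>. The function name <code>hex_table</code> and the choice of 8 entries per line are assumptions made for the illustration, they are not part of the challenge.</p>

```python
def hex_table(count=256, per_line=8):
    """Return the body lines of the array, each entry formatted like 0x0a."""
    # Same format string as in the regexp replacement: two hex digits, zero padded.
    entries = ["0x%02x" % n for n in range(count)]
    return [
        ", ".join(entries[i:i + per_line]) + ","
        for i in range(0, count, per_line)
    ]


if __name__ == "__main__":
    print("unsigned int hex[] = {")
    for line in hex_table():
        print("  " + line)
    print("};")
```

<p>The formatting is the easy part; the golf is all about how few keystrokes it takes to get the editor to do the same thing.</p>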
<h3>Let's get some better tools</h3>
<p class="first">We could do better, though. I happen to already use a <em>key chord</em> to duplicate
the current line, and we would need a function to <a href="http://www.emacswiki.org/emacs/IncrementNumber">Increment Number At Point</a>.
Those I found over at <a href="http://www.emacswiki.org/">EmacsWiki</a> were not to my taste as they were not able
to easily figure out which <em>base</em> to use. So here's a little <em>Emacs Lisp</em>
example showing how to extend your favorite editor with some common <em>Vim</em>
features, which is why <em>Emacs</em> ships with <em>Emacs Lisp</em> in the first place.</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">duplicate-current-line</span> (<span style="color: #228b22;">&optional</span> n)
  <span style="color: #bc8f8f;">"duplicate current line, make more than 1 copy given a numeric argument"</span>
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let</span> ((nb (or n 1))
        (current-line (thing-at-point 'line)))
    (<span style="color: #7f007f;">save-excursion</span>
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">when on last line, insert a newline first</span>
      (<span style="color: #7f007f;">when</span> (or (= 1 (forward-line 1)) (eq (point) (point-max)))
        (insert <span style="color: #bc8f8f;">"\n"</span>))
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">now insert as many times as requested</span>
      (<span style="color: #7f007f;">dotimes</span> (_ nb)
        (insert current-line)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">now move down as many lines as we inserted</span>
    (next-line nb)))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-S-d"</span>) 'duplicate-current-line) </pre>
<center> <p><a class="image-link" href="http://lisperati.com/"> <img src="../../../images/emacs-on-toaster.jpg"></a></p> </center>
<pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">cl</span>)   <span style="color: #b22222;">; </span><span style="color: #b22222;">destructuring-bind is found there</span>

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:increment-number-at-point</span> (<span style="color: #228b22;">&optional</span> prefix)
  (interactive <span style="color: #bc8f8f;">"p"</span>)
  (<span style="color: #7f007f;">let*</span> ((beg (skip-chars-backward <span style="color: #bc8f8f;">"0-9a-fA-F"</span>))
         (hexa (<span style="color: #7f007f;">save-excursion</span>
                 (forward-char -2)
                 (looking-at-p <span style="color: #bc8f8f;">"0x"</span>)))
         <span style="color: #b22222;">;; </span><span style="color: #b22222;">force the prefix to hexa (4) when we see "0x" before the number</span>
         (prefix (<span style="color: #7f007f;">if</span> hexa 4 prefix))
         (end (re-search-forward <span style="color: #bc8f8f;">"[0-9a-fA-F]+"</span> nil t))
         (nstr (match-string 0))
         (l (- (match-end 0) (match-beginning 0)))
         (fmt (format <span style="color: #bc8f8f;">"%%0%d"</span> l)))
    (message <span style="color: #bc8f8f;">"PLOP: %d"</span> prefix)
    (<span style="color: #7f007f;">destructuring-bind</span> (base format)
        (<span style="color: #7f007f;">case</span> prefix
          ((1) '(10 <span style="color: #bc8f8f;">"d"</span>))   <span style="color: #b22222;">; </span><span style="color: #b22222;">no command prefix, decimal</span>
          ((4) '(16 <span style="color: #bc8f8f;">"x"</span>))   <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u, hexadecimal</span>
          ((16) '(8 <span style="color: #bc8f8f;">"o"</span>)))  <span style="color: #b22222;">; </span><span style="color: #b22222;">C-u C-u, octal</span>
      (<span style="color: #7f007f;">let*</span> ((n (string-to-number nstr base))
             (n+1 (+ n 1))
             (fmt (format <span style="color: #bc8f8f;">"%s%s"</span> fmt format)))
        (replace-match (format fmt n+1))))))

(global-set-key (kbd <span style="color: #bc8f8f;">"C-c +"</span>) 'dim:increment-number-at-point) </pre>
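<p>The base detection idea is easier to see outside of the editor. Here is a minimal sketch in Python, not a port of the command above: the name <code>increment_literal</code> is made up for the illustration, it works on a string rather than on the text at point, and it only knows about decimal and <code>0x</code> prefixed hexadecimal (no octal, no command prefix).</p>

```python
import re


def increment_literal(text):
    """Increment the first numeric literal in text, keeping its base and width."""
    # Try the 0x form first so that "0x1f" is not read as the decimal "0".
    match = re.search(r"0x[0-9a-fA-F]+|[0-9]+", text)
    if match is None:
        return text
    literal = match.group(0)
    if literal.startswith("0x"):
        digits = literal[2:]
        bumped = "0x%0*x" % (len(digits), int(digits, 16) + 1)
    else:
        bumped = "%0*d" % (len(literal), int(literal) + 1)
    return text[:match.start()] + bumped + text[match.end():]


if __name__ == "__main__":
    print(increment_literal("0x00,"))
    print(increment_literal("0x0f,"))
```

<p>As in the <em>Emacs Lisp</em> version, the width of the original number is reused as the zero padding of the result, which is what keeps <code>0x0f</code> turning into <code>0x10</code> rather than <code>0x010</code> or <code>16</code>.</p>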
<blockquote> <p class="quoted"> So if you're using <em>Emacs</em> a lot but always found an excuse not to grasp <em>Emacs Lisp</em>, I hope that article could be an excuse for you to do so…</p> </blockquote> <h3>Another solution</h3> <p class="first">Anyway, now that we are much better equipped, we can picture a better way to solve the problem. Instead of using a macro that inserts the next counter value, we can use one that duplicates the current line, increments the number at point (and figures out on its own that the number prefixed with 0x is
<em>hexadecimal</em>), and does that 254 times more. Then it's all about reformatting
the text so that it fits nicely on screen, and for that M-q, which runs
the command fill-paragraph, is exactly what we need. C-x f, which runs
the command set-fill-column, can be used to set the maximum column we allow
<em>Emacs</em> to reach before going to the next line.</p>
<p>Our <em>Golf</em> then becomes a 19-keystroke solution if you start with the cursor at
the ',' in the previous example:</p>
<pre class="src">
C-x f 5 6 RET
F3 C-S-d C-c + F4
C-u 2 5 4 F4
C-SPC M-< C-n M-q
</pre>
<p>First, set the <em>fill column</em>, then register a macro (in between F3 and F4)
that will duplicate current line (using C-S-d) then increment number at
point (using C-c +). Third line, replay that macro 254 times (C-u 2 5 4 F4).
Fourth line, select all those <em>hexadecimal</em> numbers and fill the paragraph
they form correctly, so as to get:</p>
<h3>All those tips for...</h3>
<pre class="src">
<span style="color: #228b22;">unsigned</span> <span style="color: #228b22;">int</span> <span style="color: #b8860b;">hex</span>[] = {
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, }; </pre>
pg_backend_pid() of the <a href="http://www.postgresql.org/">PostgreSQL</a> backend I'm working with at
the psql prompt so that I can attach gdb to it. I'll get back to talking
about <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a> later!</p>
<p>Hope you did enjoy that article, whose goal is to help you while you're
journeying in <a href="http://blog.vivekhaldar.com/post/3996068979/the-levels-of-emacs-proficiency">The Levels Of Emacs Proficiency</a>.</p>
<h3>Update</h3>
<p class="first">While looking at the docs for the <a href="http://www.gnu.org/software/emacs/manual/html_node/emacs/Keyboard-Macro-Counter.html#Keyboard-Macro-Counter">Keyboard Macro Counter</a> to check how to
reset it without having to record the macro again, I just stumbled on this
part of the docs: C-x C-k C-f runs the command kmacro-set-format. So another
way to solve our problem with only facilities that come with a bare Emacs is
the following:</p>
<pre class="src">
C-x f 5 6 RET
C-x C-k C-f 0x%02x RET
C-1 F3 SPC F3 , F4
C-u 2 5 4 F4
DEL C-SPC C-a M-q
</pre>
<p>We're now at 30 keystrokes, so much more than previously, but it's stock
Emacs features and that kmacro-set-format is a wonderful little tool you
might as well have a need for in the future.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 11 Nov 2012 20:52:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-About-vimgolf.html</guid> </item> <item> <title>Editing SQL</title> <link>http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</link> <description><![CDATA[<p>It's hard to read my blog yet not know I'm using <a href="http://www.gnu.org/software/emacs/#Platforms">Emacs</a>. It really is a great tool and has a lot to compare to <a href="http://www.postgresql.org/">PostgreSQL</a> in terms of extensibility, documentation quality and community. And there's even a native implementation of the <a href="http://www.postgresql.org/docs/current/static/protocol.html">PostgreSQL Protocol</a> written in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/">Emacs Lisp</a>.</p>
<center> <p><a class="image-link" href="http://www.online-marketwatch.com/pgel/pg.html"> <img src="../../../images/pg-el.png"></a></p> </center> <p>One of the things where <em>Emacs</em> really shines is that interactive development environment you get when working on some <em>Emacs Lisp</em> code. Evaluating a function is as easy as a single <em>key chord</em>, and that will both compile the function and load it in the running process. I can't tell you how many times I've been missing that ability when editing C code.</p> <p>With <em>PostgreSQL</em> too we get a pretty interactive environment with the <a href="http://www.postgresql.org/docs/current/static/app-psql.html">psql</a> console application, or with <a href="http://www.pgadmin.org/">pgAdmin</a>. One feature from <em>pgAdmin</em> that I've often wished I had in <em>psql</em> is the ability to edit my query online and easily run it in the console, rather than either using the <em>readline</em> limited history editing features or launching a new editor process each time with
\e. At the
same time I would much prefer using my usual <em>Emacs</em> editor to actually <em>edit</em>
the query.</p>
<p>If you've been reading that blog before you know what to expect. My solution
to the stated problem is available in <a href="https://github.com/dimitri/pgdevenv-el">pgdevenv-el</a>, an <em>Emacs</em> package aimed at
helping <em>PostgreSQL</em> developers. Most of the features in there are geared
toward the <em>core backend</em> developers, except for this one I want to talk about
today (I'll blog about the other ones too I guess).</p>
<center>
<p><img src="../../../images/pgdevenv-el-eval-sql.png" alt=""></p>
</center>
<p>What you can see from that screenshot is that the selected query text has
been sent to the <em>psql</em> buffer and executed over there, and that the <em>psql</em>
buffer is echoing all queries sent to it. What you cannot see straight from
that picture is the interaction to get there. Well, I've been implementing
some <em>elisp</em> features that I was missing.</p>
<p>First, movement: you can do C-M-a and C-M-e to navigate to the beginning and
the end of the SQL query at point, like you do in C or in lisp in <em>Emacs</em>.</p>
<p>Then, selection: you can do C-M-h to select the SQL query at point, you
don't have to navigate yourself, <a href="https://github.com/dimitri/pgdevenv-el">pgdev-sql-mode</a> knows how to do that. Side
note, pgdev-sql-mode is the name of the <em>minor mode</em> you need to activate in
your SQL buffers to have the magic available.</p>
<p>Last but not least, evaluation: as when editing lisp code, you can now use
C-M-x to send the current query text to an associated <em>psql</em> buffer.</p>
<p>The association of the <em>psql</em> buffer with an <em>SQL</em> buffer is currently done
thanks to the other <em>pgdevenv-el</em> features that this blog post is not talking
about, and the setup is addressed in the documentation: you have to let
<em>pgdevenv-el</em> know where your PostgreSQL branches are installed locally so that it
can prepare you a <em>Shell</em> buffer with PGDATA and PGPORT already set for you.
And currently, for C-M-x to work you need to open the buffer yourself
beforehand, using C-c - n (to run the command pgdev-open-shell), and type psql in
the <em>Shell</em> prompt.</p>
<p>What that means for me is that I can at least edit SQL (in <em>PostgreSQL</em>
regression files and other places) in my usual <em>Emacs</em> buffer and actually
refine it as I go until it does exactly what I need, without having to use
the <em>readline</em> history editing or the \e command, which is not great when your
<em>Shell</em> is already running inside <em>Emacs</em>.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 06 Nov 2012 09:55:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/06-Interactive-SQL.html</guid> </item> <item> <title>Concurrent Hello</title> <link>http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</link> <description><![CDATA[<p>Thanks to <a href="https://twitter.com/mickael/status/265191809100181504">Mickael</a> on <em>twitter</em> I ran into that article about implementing a very basic <em>Hello World!</em> program as a way to get into a new concurrent language or facility. The original article, titled <a href="http://himmele.blogspot.de/2012/11/concurrent-hello-world-in-go-erlang.html">Concurrent Hello World in Go, Erlang and C++</a> is all about getting to know <a href="http://golang.org/">The Go Programming Language</a> better.</p>
<p>To quote the article:</p> <blockquote> <p class="quoted"> The first thing I always do when playing around with a new software platform is to write a concurrent "Hello World" program. The program works as follows: One active entity (e.g. thread, Erlang process, Goroutine) has to print "Hello " and another one "World!\n" with the two active entities synchronizing with each other so that the output always is "Hello World!\n".</p> </blockquote> <p>Here's my try in <a href="http://cliki.net/">Common Lisp</a> using <a href="http://lparallel.org/">lparallel</a> and some <em>local nicknames</em>, the whole
23 lines of it:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-hello</span> (helloq worldq n)
  (<span style="color: #7f007f;">dotimes</span> (i n)
    (format t <span style="color: #bc8f8f;">"Hello "</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-world</span> worldq)
    (lq:pop-queue helloq))
  (lq:push-queue <span style="color: #da70d6;">:quit</span> worldq))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">say-world</span> (helloq worldq)
  (<span style="color: #7f007f;">when</span> (eq (lq:pop-queue worldq) <span style="color: #da70d6;">:say-world</span>)
    (format t <span style="color: #bc8f8f;">"World!~%"</span>)
    (lq:push-queue <span style="color: #da70d6;">:say-hello</span> helloq)
    (say-world helloq worldq)))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">hello-world</span> (n)
  (<span style="color: #7f007f;">let*</span> ((lp:*kernel* (lp:make-kernel 2)) <span style="color: #b22222;">; </span><span style="color: #b22222;">a new one each time, as we end it</span>
         (channel (lp:make-channel))
         (helloq (lq:make-queue))
         (worldq (lq:make-queue)))
    (lp:submit-task channel #'say-world helloq worldq)
    (lp:submit-task channel #'say-hello helloq worldq n)
    (lp:receive-result channel)
    (lp:receive-result channel)
    (lp:end-kernel)))
</pre>
<p>If you want to play locally with that code, I've uploaded it to a <em>github</em> project named <a href="https://github.com/dimitri/go-hello-world">go-hello-world</a>, even if it's coded in <em>CL</em>. See the
package.lisp in there for how I did enable the <em>local nicknames</em> lp and lq for
the <em>lparallel</em> packages.</p>
<h3>Beware of the REPL</h3>
<p class="first">In a previous version of this very article, I said that sometimes I get an
extra line feed in the output and I didn't understand why. Some great Common
Lisp folks gave me a hint about that: it's the <em>REPL</em> output that gets
intermingled with the program output, and that's because the hello-world
main function was returning before the whole thing was over.</p>
<p>I've added a receive-result call in it per worker so that it waits until the
end of the program before returning to the <em>REPL</em>, and that indeed fixes it. A
way to assert that is using the time macro, which was always intermingled
with the output before. It's fixed now:</p>
<pre class="src">
CL-USER> (time (go-hello-world:hello-world 1000))
Hello World!
...
Hello World!
(GO-HELLO-WORLD:HELLO-WORLD 1000)
took 27,886 microseconds (0.027886 seconds) to run.
1,593 microseconds (0.001593 seconds, 5.71%) of which was spent in GC. During that period, and with 4 available CPU cores, 23,246 microseconds (0.023246 seconds) were spent in user mode 14,427 microseconds (0.014427 seconds) were spent in system mode 4,272 bytes of memory allocated. 10 minor page faults, 0 major page faults, 0 swaps. ( lparallel kernel shutdown manager(62) [Reset] #x30200109F65D> ...) CL-USER> </pre>
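<p>For readers more at ease with threads than with <em>lparallel</em>, the same handshake can be sketched in Python. This is an illustration under assumptions, not a translation of the code above: it uses the standard <code>queue</code> and <code>threading</code> modules in place of the <em>lparallel</em> queues and kernel, and it collects the output in a list (the <code>out</code> parameter is made up for the example) so that the result is easy to check.</p>

```python
import queue
import threading


def say_hello(helloq, worldq, n, out):
    for _ in range(n):
        out.append("Hello ")
        worldq.put("say-world")
        helloq.get()              # block until "World!" has been written
    worldq.put("quit")


def say_world(helloq, worldq, out):
    # Keep answering until anything other than "say-world" shows up.
    while worldq.get() == "say-world":
        out.append("World!\n")
        helloq.put("say-hello")


def hello_world(n):
    out = []
    helloq, worldq = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=say_world, args=(helloq, worldq, out)),
        threading.Thread(target=say_hello, args=(helloq, worldq, n, out)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()             # wait for both workers before returning
    return "".join(out)


if __name__ == "__main__":
    print(hello_world(3), end="")
```

<p>The two <code>join()</code> calls play the role of the two receive-result calls: without them the main function returns while the workers are still writing, which is exactly the intermingled output problem described above.</p>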
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 04 Nov 2012 23:04:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/04-Concurrent-Hello.html</guid> </item> <item> <title>PostgreSQL for developers</title> <link>http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</link> <description><![CDATA[<p>As <a href="http://blog.guillaume.lelarge.info/index.php/post/2012/11/01/Conf%C3%A9rence-%C3%A0-l-AFUP-Lyon">Guillaume</a> says, we enjoyed a great evening conference in Lyon 2 days ago, presenting PostgreSQL to developers. He did the first hour presenting the project and the main things you want to know to start using <a href="http://www.postgresql.org/">PostgreSQL</a> in production, then I took the opportunity of talking to developers to show off some SQL.</p>
<center> <p><a class="image-link" href="../../../images/confs/developper-avec-pgsql.pdf"> <img src="../../../images/confs/developper-avec-pgsql-0.png"></a></p> </center> <p>That slide deck contains mainly SQL, but also some French rather than English. Sorry for the inconvenience if that's not something you can read. Get me to talk at an English developer friendly conference and I'll translate it for you! :)</p> <p>The aim of that talk is to have people think about SQL as a real asset in their development tool set. SQL really should get compared to your application development language rather than your UI formatting language, it's more like PHP or Python than it is like HTML.</p> <p>So the whole talk is about showing off some advanced SQL features, all provided by default in released PostgreSQL versions. The main parts of the talk all come from an article in this blog: <a href="../10/05-reset-counter.html">Reset Counter</a>.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 02 Nov 2012 16:22:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/11/02-Conference-AFUP-Lyon.html</guid> </item> <item> <title>Another awesome conf</title> <link>http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</link> <description><![CDATA[<p>Last week was <a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> in Prague, and it's been awesome. Many thanks to the organisers who did manage to host a very smooth conference with 290 attendees, including speakers. That means you kept walking into interesting people to talk to, and in particular the <em>Hallway Track</em> has been a giant success.</p>
<center> <p><a class="image-link" href="http://www.flickr.com/photos/obartunov/8128604476/lightbox/"> <img src="../../../images/prague.jpg"></a></p> </center> <center> <p><em>Photo by <a href="http://www.sai.msu.su/~megera/">Oleg Bartunov</a></em></p> </center> <p>I did have the chance to speak several times at that event, and you can get the slides at my <a href="../../../conferences.html">Conferences</a> page that I try to keep up to date. I did one talk about <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/318-implementing-high-availability/">Implementing High Availability</a> that was about 2 hours long (a double slot), <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/373-lightning-talks/">PGQ Cooperative Consumers</a> that Marko Kreen copresented with me and the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2012/session/317-large-scale-mysql-migration-to-postgresql/">Large Scale MySQL Migration to PostgreSQL</a> that I already presented before this year.</p> <center> <p><a class="image-link" href="../../../images/high-availability.pdf"> <img src="../../../images/high-availability.png"></a></p> </center> <p>Next conference is in Lyon and will be in French, the talk is called <a href="http://lyon.afup.org/2012/10/17/presentation-de-postgresql-31102012-a-19h30/">Présentation de PostgreSQL</a>. The audience is going to be composed of PHP developers interested to know more about PostgreSQL, I'll tell you how it goes!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 30 Oct 2012 12:50:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/30-Prague-Lyon.html</guid> </item> <item> <title>Prefixes and Ranges</title> <link>http://tapoueh.org/blog/2012/10/16-prefix-update.html</link> <description><![CDATA[<p>It's been a long time since I last had some time to spend on the <a href="http://tapoueh.org/pgsql/prefix.html">prefix</a> PostgreSQL extension and its prefix_range data type. With PostgreSQL 9.2 out, some users wanted me to update the extension for that release, and hinted that it was high time that I fix that old bug for which I already had a patch.</p>
<center> <p><img src="../../../images/Prefix-Pro-Blend.jpg" alt=""></p> </center>
<h3>prefix_range release 1.2.0</h3>
<p class="first">I'm sorry it took that long. It's now done, you can have prefix 1.2.0 from
<a href="https://github.com/dimitri/prefix">https://github.com/dimitri/prefix</a> or if you want a <em>tagged</em> tarball then you
can use this link: <a href="https://github.com/dimitri/prefix/tarball/v1.2.0">https://github.com/dimitri/prefix/tarball/v1.2.0</a>.</p>
<p>The <em>changelog</em> is all about fixing an index search bug and updating the
package to primarily be an extension for PostgreSQL 9.1 and 9.2. Of course
older Major Versions are still supported (all of them since 8.1, but please
first consider upgrading PostgreSQL) if you want to install it <em>manually</em>,
using the prefix--1.2.0.sql file.</p>
<h3>debian package</h3>
<p class="first">And thanks to <a href="http://www.df7cb.de/">Christoph Berg</a> the debian package is already validated and has
reached <em>debian experimental</em>. We don't target <em>sid</em> these days because debian
is preparing a new stable release, so there's a freeze. I think. Anyway,
take your prefix package from here if you need it:
<a href="http://packages.debian.org/experimental/postgresql-9.1-prefix">http://packages.debian.org/experimental/postgresql-9.1-prefix</a>.</p>
<h3>Range Types</h3>
<p class="first">If you step back a little there's an interesting question to answer here.
Why isn't prefix_range a <a href="http://www.postgresql.org/docs/9.2/static/rangetypes.html">PostgreSQL Range Type</a>? Given the names it seems
like a pretty good candidate.</p>
<p>Well the thing is that to make a generic range type you need to have a total
ordering on the range elements, and a distance function that tells you how
far any two elements of a range are from each other.</p>
<p>When talking about prefixes, I don't see how to do that. The prefix range
['abcd', 'abce') contains an infinity of elements, all the <em>strings</em> that
begin with the letters abcd. I guess that coming up with an ordering on text is
possible, but what if any text element represents a prefix?</p>
<p>I mean that in our case, the elements would be of type prefix, and 'abcd' is
a prefix of 'abcdefg'. The question I want to answer is: given a table
with prefixes 'abcd', 'abce' and 'abcde', which row in there has the longest
prefix matching the literal 'abcdef'?</p>
<p>I'm not seeing how to abuse the <em>Range Types</em> mechanism to implement that, so
if you have some ideas please share them!</p>
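<p>To make the question concrete, here is what "longest prefix matching a literal" means, as a naive sketch in Python. It has nothing to do with how the extension or its index support is implemented; the function name <code>longest_matching_prefix</code> is made up for the example, and a linear scan like this one is precisely what you want an index to avoid.</p>

```python
def longest_matching_prefix(prefixes, literal):
    """Return the longest entry of prefixes that literal starts with, or None."""
    matches = [p for p in prefixes if literal.startswith(p)]
    return max(matches, key=len) if matches else None


if __name__ == "__main__":
    table = ["abcd", "abce", "abcde"]
    print(longest_matching_prefix(table, "abcdef"))
```

<p>With the three prefixes from the example, both 'abcd' and 'abcde' match the literal 'abcdef', and the answer we want is the longer of the two.</p>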
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 16 Oct 2012 10:47:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/16-prefix-update.html</guid> </item> <item> <title>Reset Counter</title> <link>http://tapoueh.org/blog/2012/10/05-reset-counter.html</link> <description><![CDATA[<p>I've been given a nice puzzle that I think is a good blog article opportunity, as it involves some thinking and <em>window functions</em>.</p>
<h3>What's to solve</h3> <p class="first">Say we store in a table entries from a <em>counter</em> that only increases and the time stamp when we did the measurement. So that when you read
30 then later
40, that in fact means we counted 10 more at the second reading when compared to
the first; in other words the first 30 are counted again in the second
counter value, 40.</p>
<center>
<p><a class="image-link" href="http://xkcd.com/363/">
<img src="../../../images/reset.png"></a></p>
</center>
<p>Now of course it's a real world counter. Think network traffic counter on a
network interface, if you want something real to play with in your mind. So
the counter will sometimes reset and you will read sequences of measures such as
40, 0, 20 if you happen to read just when the counter is reset, or most of
the time that will look like 45, 25, 50.</p>
<p>The question we want to answer is, given a series of that counter measures,
including some resets, what is the current logical value of the counter?</p>
<p>Given the sequence of measures 0, 10, 20, 30, 40, 0, 20, 30, 60 the result
we want is 40 + 60, that is 100. Right?</p>
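<p>Before reaching for SQL, the rule can be spelled out in a few lines of Python. This is only a sketch of the definition used in this article: keep a measure when the next one is smaller (the counter was reset in between) or when it's the last one we have, then add up what was kept. The name <code>logical_value</code> is an assumption for the example.</p>

```python
def logical_value(measures):
    """Sum the last reading before each reset, plus the latest reading."""
    total = 0
    # Pair each measure with the one that follows it; the last one gets None.
    for current, following in zip(measures, measures[1:] + [None]):
        if following is None or following < current:
            total += current
    return total


if __name__ == "__main__":
    print(logical_value([0, 10, 20, 30, 40, 0, 20, 30, 60]))
```

<p>On the sequence above only 40 and 60 are kept, which gives the 100 we are after.</p>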
<h3>Playing with some data</h3>
<p class="first">Let's model a hypothetical dataset easy enough to play with. What about
just the previous example? We also need to <em>time stamp</em> the measurements,
let's just use a <em>tick</em> for now, as it's easier to think about:</p>
<pre class="src">
create table measures(tick int, nb int);

insert into measures
     values (1, 0), (2, 10), (3, 20), (4, 30), (5, 40),
            (6, 0), (7, 20), (8, 30), (9, 60);
</pre>
<p>Now that we have some data in a table to play with, let's try to find out the numbers we are interested in: we only want to keep the latest measure we read on the counter just before it wraps. That means values where the <em>next one</em> (in tick or time stamping order) is less than the current counter value.</p>
<p>As we are lucky enough to be playing with the awesome <a href="http://www.postgresql.org/">PostgreSQL</a> which brings <a href="http://www.postgresql.org/docs/9.2/static/tutorial-window.html">window functions</a> to the table, we can easily implement just what we said in a readable way:</p>
<pre class="src">
select tick, nb,
       case when lead(nb) over w < nb then nb
            when lead(nb) over w is null then nb
            else null
        end as max
  from measures
window w as (order by tick);
</pre>
<p>The first <em>case</em> is the exact translation of the problem as spelled in English in just the previous paragraph where we stated we want to keep the current counter value in case of a <em>wraparound</em>, so I guess it's easy enough to get at.</p>
<center> <p><img src="../../../images/reset-circuit-thumbnail.jpg" alt=""></p> </center>
<p>Then we have a couple of tricks in that query in order to massage the data as we want it. First, the last row of the output won't have a <em>lead</em>, that <em>window function</em> call is going to return
NULL. In that case, we keep the
current counter value as if we just did a <em>wraparound</em>. And finally, when
there's no <em>wraparound</em>, we don't care about the data. Well, for the purpose
of knowing the current <em>logical</em> value of the counter, that is.</p>
<p>And we get that encouraging result:</p>
<pre class="src">
| tick | nb | max |
|------+----+-----|
| 1 | 0 | |
| 2 | 10 | |
| 3 | 20 | |
| 4 | 30 | |
| 5 | 40 | 40 |
| 6 | 0 | |
| 7 | 20 | |
| 8 | 30 | |
| 9 | 60 | 60 |
</pre>
<p>The sum() aggregate function will simply discard nulls, so that we don't
have to turn them into a bunch of 0.</p>
<pre class="src">
with t(tops) as (
  select case when lead(nb) over w < nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select sum(tops) from t;
</pre>
<center> <p><img src="../../../images/reset-elect.jpg" alt=""></p> </center>
<p>And here's the expected result:</p>
<pre class="src">
 sum
<span style="color: #b22222;">-----</span>
 100
</pre>
<p>Now what about testing with another set of data or two, just to be sure that the counter is allowed to wrap more than once within our solution?</p>
<pre class="src">
insert into measures
     values (10, 0), (11, 10), (12, 30), (13, 35), (14, 45),
            (15, 25), (16, 50), (17, 100), (18, 110);
</pre>
<p>Then we have:</p>
<pre class="src">
with t(tops) as (
  select case when lead(nb) over w < nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select sum(tops) from t;
 sum
<span style="color: #b22222;">-----</span>
 255
(1 row)
</pre>
<p>All good!</p>
<h3>Counter logical value over a given period</h3>
<p class="first">Now of course what we want is to find the logical value of the counter for a given day's or month's worth of measures. We then need to pay attention to the value of the counter at the start of our period so that we know to subtract it from the logical sum over the period.</p>
<center> <p><img src="../../../images/reset-coin-counter.jpg" alt=""></p> </center>
<p>Here's an SQL version of the same sentence, applied to the period in between ticks
4 and 14, in a completely arbitrary choosing of mine:</p>
<pre class="src">
with t as (
  select tick,
         first_value(nb) over w as first,
         case when lead(nb) over w < nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
   where tick >= 4 and tick < 14
  window w as (order by tick)
)
select sum(max) - min(first) as sum from t;
</pre>
<p>Here we are using the <em>first_value()</em> window function to retain the counter's first value in the whole resultset of the <em>Common Table Expression</em> (that is what the inner query introduced by the keyword
WITH is called). And when doing the sum we're
interested in at the outer level, we didn't forget to subtract that first
value. We need an aggregate for it because we're already using the sum()
aggregate at the same query level; as we have the same value in each row of
the resultset we used min(), and max() would have been just as good.</p>
<p>Another important trick we're using in that query is how to express the date
range. Never use between for that, as you would end up counting boundaries
twice when two periods are adjacent, and your customers won't like your accounting process if you do that.
Always combine an inclusive lower boundary with an exclusive upper boundary, as in
the WHERE clause of the previous query.</p>
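<p>To make the boundary problem concrete, here's a minimal Python sketch. The list of ticks and the two helper functions are assumptions made up for the illustration; they only mimic what the two styles of WHERE clause would select for two adjacent periods meeting at tick 14.</p>

```python
# Hypothetical data: one measure per tick, ticks 1 through 18.
ticks = list(range(1, 19))


def between(lo, hi):
    """Both bounds inclusive, like SQL's BETWEEN."""
    return [t for t in ticks if lo <= t <= hi]


def half_open(lo, hi):
    """Inclusive lower bound, exclusive upper bound."""
    return [t for t in ticks if lo <= t < hi]


# Two adjacent periods that meet at tick 14.
closed = between(4, 14) + between(14, 18)
split = half_open(4, 14) + half_open(14, 18)

print(closed.count(14))  # 2: tick 14 is accounted for in both periods
print(split.count(14))   # 1: tick 14 belongs to exactly one period
```

<p>With the half-open form each tick is attributed to one period only, which is what you want when periods are billed one after the other.</p>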
<p>Let's have a quick look at the raw data in that range, using another nice
<em>aggregate</em> that PostgreSQL comes with:</p>
<pre class="src">
select array_agg(nb) from measures where tick >= 4 and tick < 14;
           array_agg
-------------------------------
 {30,40,0,20,30,60,0,10,30,35}
(1 row)
</pre>
<p>We can verify it manually: we want
40 + 60 + 35 - 30, which matches the result of the query over that period:</p>
<pre class="src">
 sum
-----
 105
(1 row)
</pre>
<p>I think we're all good
again. Don't forget we have to subtract the first measure from the period!</p>
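<p>The same manual check can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the query: the function name is made up, and the list is the array_agg output shown above.</p>

```python
def logical_value(measures):
    """Sum the counter values seen just before each reset (and the
    last value of the period), then subtract the first measure."""
    peaks = []
    for i, nb in enumerate(measures):
        is_last = i == len(measures) - 1
        # Same test as the SQL CASE: the next value is smaller, or there is none.
        if is_last or measures[i + 1] < nb:
            peaks.append(nb)
    return sum(peaks) - measures[0]


# Measures for ticks 4 to 13, as returned by array_agg above.
period = [30, 40, 0, 20, 30, 60, 0, 10, 30, 35]
print(logical_value(period))  # 105
```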
<h3>Extending the problem</h3>
<p class="first">Another interesting problem, that we didn't have here but that I find
interesting enough to extend this article, is finding the ranges of time
(here, ticks) within which the counter didn't reset.</p>
<center>
<p><img src="../../../images/reset-A2a.jpg" alt=""></p>
</center>
<p>The query is more complex because we need to split the data into partitions,
each partition containing data from the same counter series of measures
without wrapping. The usual trick is to self-join our data set so that for
each given row we have a set of rows from the same partition. Here we are going
to use a <em>correlated subquery</em> instead, to go fetch the tick of the next <em>wraparound</em> value:</p>
<pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w < nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
)
select tick, nb, max,
       (select tick
          from tops t2
         where t2.tick >= t1.tick and max is not null
      order by t2.tick
         limit 1) as p
  from tops t1;
 tick | nb  | max | p
------+-----+-----+----
    1 |   0 |     |  5
    2 |  10 |     |  5
    3 |  20 |     |  5
    4 |  30 |     |  5
    5 |  40 |  40 |  5
    6 |   0 |     |  9
    7 |  20 |     |  9
    8 |  30 |     |  9
    9 |  60 |  60 |  9
   10 |   0 |     | 14
   11 |  10 |     | 14
   12 |  30 |     | 14
   13 |  35 |     | 14
   14 |  45 |  45 | 14
   15 |  25 |     | 18
   16 |  50 |     | 18
   17 | 100 |     | 18
   18 | 110 | 110 | 18
(18 rows) </pre>
<p>With that as an input it's then possible to build ranges of ticks, each covering a non wrapping set of measures from our counter, and get for each range the logical value that the counter had at the end of it:</p>
<pre class="src">
with tops as (
  select tick, nb,
         case when lead(nb) over w < nb then nb
              when lead(nb) over w is null then nb
              else null
          end as max
    from measures
  window w as (order by tick)
),
parts as (
  select tick, nb, max,
         (select tick
            from tops t2
           where t2.tick >= t1.tick and max is not null
        order by t2.tick
           limit 1) as p
    from tops t1
),
ranges as (
  select first_value(tick) over w as start,
         last_value(tick) over w as end,
         max(max) over w
    from parts
  window w as (partition by p order by tick)
)
select * from ranges where max is not null;
 start | end | max
-------+-----+-----
     1 |   5 |  40
     6 |   9 |  60
    10 |  14 |  45
    15 |  18 | 110
(4 rows)
</pre>
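<p>For readers who prefer to see the partitioning spelled out procedurally, here's a rough Python sketch of the same idea. It is not how PostgreSQL evaluates the query; the function name is invented, and the (tick, nb) pairs are copied from the 18 rows listed earlier.</p>

```python
# (tick, nb) pairs, copied from the result set shown earlier.
measures = [
    (1, 0), (2, 10), (3, 20), (4, 30), (5, 40),
    (6, 0), (7, 20), (8, 30), (9, 60),
    (10, 0), (11, 10), (12, 30), (13, 35), (14, 45),
    (15, 25), (16, 50), (17, 100), (18, 110),
]


def non_wrapping_ranges(rows):
    """Return (start, end, max) for each run of ticks without a reset."""
    ranges = []
    if not rows:
        return ranges
    start = rows[0][0]
    for i, (tick, nb) in enumerate(rows):
        is_last = i == len(rows) - 1
        # A range ends when the next measure is smaller, or at the last row.
        if is_last or rows[i + 1][1] < nb:
            ranges.append((start, tick, nb))
            if not is_last:
                start = rows[i + 1][0]
    return ranges


for start, end, top in non_wrapping_ranges(measures):
    print(start, end, top)
```

<p>The four tuples it prints are the same four ranges as in the SQL result above.</p>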
<h3>Conclusion</h3> <p class="first">What I hope to have shown here, apart from some <em>window function</em> tips and some nice use cases for <em>common table expressions</em>, is that as a developer, adding SQL to your tool set is a very good idea.</p>
<center>
<p><img src="../../../images/skill-set.jpg" alt=""></p>
</center>
<p>You don't want to have several parts of your code dealing with a logical
counter like this, because you want the reporting, accounting, quota,
billing and other software to all agree on the values. And you most probably
want to avoid fetching a huge result set of data and processing it in the
application memory (it'd better fit) rather than just getting back a single
integer column, single row resultset, right?</p>
<p>If you find this SQL example to be out of your reach, it's a good sign that you
need to improve your skills so that SQL is a real asset among your developer's
multi-language, multi-paradigm talents.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Oct 2012 09:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/10/05-reset-counter.html</guid> </item> <item> <title>PostgreSQL 9.3</title> <link>http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</link> <description><![CDATA[<p><a href="http://www.postgresql.org/">PostgreSQL 9.2</a> is released! It's an awesome new release that I urge you to consider trying and adopting; an upgrade from even
9.1 should be well worth it, as your hardware could suddenly be able to process a much higher load. Indeed, better performance means more work done on the same budget, that's the name of the game!</p>
<p>As a <em>PostgreSQL contributor</em> though, the release of 9.2 mainly means to me
that it's time to fully concentrate on preparing 9.3. The development
<em>season</em> of which has already begun, by the way, so some amount of work has
already been done here.</p>
<center>
<p><img src="../../../images/event-trigger.jpg" alt=""></p>
</center>
<p>The list of things I want to be working on for that next release is quite
long, and looks more like a christmas list than anything else. Let's only
talk about those things I might as well make happen rather than all the
things I wish I was able to be delivering in a single release...</p>
<h3>Event Triggers</h3>
<p class="first">We missed 9.2 for wanting to include too big a feature in one go, leading to
too many choices to review and decide upon, for one, and also to
some non optimal choices that had to be reconsidered. Thanks to <a href="../06/24-back-from-pgcon.html">PGCON</a> in
Ottawa earlier this year, I could meet in person with <strong>Robert Haas</strong> and we've
been able to decide how to attack that big patch I had. The first step has
been to <em>commit</em> in the PostgreSQL tree only infrastructure parts, on which we
will be able to build the feature itself.</p>
<h4>Infrastructure</h4>
<p class="first">What we already have today is the ability to run a <em>user defined function</em> when
some event occurs, and an event can only be a ddl_command_start as of now.
Also the <em>trigger</em> itself must be written in PL/pgSQL or C, as the support
for the other languages was not included in the patch.</p>
<p>That leaves some work to be done in the next months, right?</p>
<h4>PL support</h4>
<p class="first">The <em>user defined function</em> will get some information from <em>magic variables</em>
such as TG_EVENT and such. That allows easier integration of future
information we want to add, without disrupting those existing <em>triggers</em> that
you wrote (no API change), at the cost of having to write a specific
integration per <em>procedural language</em>.</p>
<p>So one of the first things to do now is to take the support for the other
PLs that I had in my proposal and make a new patch with only that in there.</p>
<h4>Fill-in more information</h4>
<p class="first">Then again, this first infrastructure part was all about being actually able
to run a user function and left behind most of the information I would like
the function to have. The information already there is the command tag, the
event name and the parsetree that's only usable if you're writing your
trigger in C, which we expect some users to be doing.</p>
<p>To supplement that, we're talking about the Object ID that has been the
target of the <em>event</em>, the schema it lives in when applicable, the Object
Name, the Operation that's running (CREATE, ALTER, DROP), the Object Kind
being the target of said operation (e.g. TABLE or FUNCTION), and the command
string.</p>
<h4>Publishing the Command String</h4>
<p class="first">Publishing the <em>Command String</em> here is not an easy task, because we have to
rebuild a normalized version of it. Or maybe we can go with passing explicit
context in which the command is running, such as the search_path.</p>
<p>Even with an explicit context that would be easy enough to SET back again
(in a remote server where you would be replicating the DDL, say), it would
be better to normalize the <em>command string</em> so as to remove extra spaces and
make it easier to parse and process from a <em>user defined function</em>.</p>
<p>That part looks like where most of the work is going to happen in the next
<em>commit fests</em>.</p>
<h4>Events</h4>
<p class="first">The other big thing I want to be working on with respect to this feature is
the <em>event</em> support, which is basically <em>hard coded</em> to be ddl_command_start in
the current state of the 9.3 code.</p>
<p>We certainly will want to be able to run <em>user defined functions</em> not only at
the very beginning of a <em>DDL command</em>, but also just before it finishes so
that the newly created object already exists, for example.</p>
<p>We might also be interested in supporting triggers on more than DDL, but
I doubt we will see that happening in 9.3, as some people in the community
would go crazy about complex use cases. Time is limited, and I think this is
better kept open for the next release, as the way our beloved PostgreSQL
works is by delivering reliable features: quality first.</p>
<h4>Use cases</h4>
<p class="first">I'm always happy to hear about use cases for the features I'm working on,
and this one has the potential to be covering a non trivial amount of them.
I already can think of <em>trigger based replication systems</em> and some integrated
<em>extension network facilities</em>. With your help we can give those the place
they should have: early days use cases in a great collection.</p>
<h3>Extensions</h3>
<center>
<p><img src="../../../images/extensions-cords.jpg" alt=""></p>
</center>
<p>So yes, the first use case for <em>event triggers</em>, for me, relates to <em>extensions</em>.
Surprise! There's still some more I want to do with <em>extensions</em>, so much that
I could consider their implementation in 9.1 just an enabler. In 9.1 the
game has been to offer the best support we could design for existing contrib
modules, with a very strong angle toward clean support for <em>dump</em> and <em>restore</em>.</p>
<p>The typical contrib module exports in SQL a list of C coded functions,
sometimes supporting a new datatype, sometimes a set of administration
functions. It's quite rare that contrib modules are handling <em>user data</em>
embedded in their SQL definition, and when it happens it's mostly
<em>configuration</em> kind of data, such as with <a href="TODO:%20add%20the%20link">PostGIS</a>.</p>
<p>Now we want to fully support <em>extensions</em> that are maintaining their own <em>user
data</em>, or even those that are all about them. The main difficulty here is
that our current design of <em>dump</em> and <em>restore</em> support is following a model
where installing the same extension in a new database is all covered by
create extension foo;. This is a limited model of the reality, that we need
to expand.</p>
<p>The first manifestation of those problems is in the SEQUENCE support in
extensions, and that impacts one of my favorite extensions: <a href="http://wiki.postgresql.org/wiki/Skytools">PGQ</a>.</p>
<h3>PostgreSQL releases</h3>
<p><a href="http://www.postgresql.org/">PostgreSQL</a> just released an awesome release with 9.2, where we get
tremendous performance optimisations and truly innovative features, such as
RANGE TYPE. How could you not consider PostgreSQL as a part of your application
stack, a place where to develop and host your features?</p>
<p>While users are enjoying the newer release, contributors are already
preparing the next one, hard at work again!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 15 Sep 2012 18:43:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/09/15-PostgreSQL-9.3.html</guid> </item> <item> <title>El-Get 4.1 is out</title> <link>http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</link> <description><![CDATA[<p>Please welcome the new stable version of <a href="https://github.com/dimitri/el-get#readme">El-Get</a>: the much awaited
version 4.1 has now been branched for your pleasure. It's packed with lots of features to make your life easy, comes with an <em>Info</em> documentation book and even has a <em>logo</em>. That's no joke, I found one, at least:</p>
<center> <p><img src="../../../images/el-get.big.png" alt=""></p> </center> <h3>Why El-Get is relevant</h3> <p class="first">Emacs 24.1 is the first release that includes package.el, and it even allows
the user to set up several sources from which to fetch packages. Those sources,
such as <a href="http://marmalade-repo.org/">Marmalade</a>, are hosting lots of third party code for Emacs.
package.el makes it easy to <em>install</em> (partly) that software.</p>
<p>This is a very fine way of getting extra features in your Emacs
installation, and one that is supported out of the box. For a <em>package</em> to be
listed, its sources need to be prepared, and you need to rely on the central
website you now depend on to be up and running and accessible.</p>
<p>El-Get is all about allowing you to easily cope with the still vast majority
of Emacs Lisp extensions you can find out there, that is non packaged code
that is only available on some more or less mainstream <em>distribution method</em>,
ranging from <a href="http://emacswiki.org/">EmacsWiki</a> to <a href="http://github.com/">github</a> including <em>bare HTTP</em> personal hosting.</p>
<p>With El-Get, you fetch the package where it's located. There's no need for a
central server to host packaged and released software, and it's easy to
share your findings with friends, or even to <em>publish</em> any elisp code you
write.</p>
<p>El-Get will also take care of final steps that package.el did choose not to
support, such as including <em>Info</em> material in your info browser (remember C-h
i runs the command info?), running ./configure && make for you, <em>byte
compiling</em> the sources you just retrieved, adding the necessary <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/Autoload.html">autoload</a>
support, etc.</p>
<p>And of course one of the <em>methods</em> supported by El-Get is ELPA, known as <em>Emacs
Lisp Package Archive</em> and implemented by package.el.</p>
<p>So definitely, you typically want both ELPA and <a href="http://github.com/dimitri/el-get">El-Get</a>.</p>
<h3>El-Get 4.1 Changelog Summary</h3>
<p class="first">The new El-Get release is packed with features. It really is. I will only
list some of them now:</p>
<ul>
<li>Plenty of new recipes, we now have 590 of them managed in the El-Get source
repository itself, and El-Get will download the current <a href="http://emacswiki.org/">EmacsWiki</a> list of
emacs lisp files at install time too.</li>
<li>The default installation and usage has been simplified a lot.</li>
<li>More options are provided to setup El-Get packages, see
el-get-user-package-directory for example.</li>
<li>Part of the simplification, el-get-sources has been revisited and now
serves only one goal.</li>
<li>We dropped (el-get 'wait) which was a misconception and had been broken
for a long time in the development version of El-Get.</li>
<li>We made improvements in the error handling and in dealing with some
corner cases that still happen often enough for users to report them.
Please continue reporting them!</li>
<li>More caching is done, with a better dependency tracking and status
management.</li>
<li>Enhanced notification support, from DBUS to growl.</li>
<li>Support for <em>checksums</em> with a lot of <em>source types.</em></li>
<li>Completing our git support, <em>shallow</em> clones and <em>submodules</em> are there.</li>
<li>Better support for github including the zip and tar releases.</li>
<li>Ability to reload a package when it's been <em>updated</em>.</li>
<li><em>Moar</em> features</li>
</ul>
<p>And most importantly, El-Get documentation is now almost complete and comes
in the nice <em>Info</em> format I know you've been expecting for so long!</p>
<h3>Using El-Get</h3>
<p class="first">Here's a quick summary of what using El-Get is like, for a new user in 4.1.
If you're already using El-Get see the section about upgrading. To install
El-Get you need to paste those lines to your *scratch* buffer then hit C-j
after the last closing parenthesis:</p>
<pre class="src">
(url-retrieve
 <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>
 (lambda (s)
   (goto-char (point-max))
   (eval-print-last-sexp)))
</pre>
<p>Then you can try M-x el-get-list-packages and browse through more than 2000
available packages. Mark the ones you want to install with i then type x to
see El-Get fetch and install all those packages you just selected. Here's a
summary of what's available to you in the M-x el-get-list-packages buffer:</p>
<pre class="src">
Major Mode Bindings:
SPC el-get-package-menu-mark-unmark
? el-get-package-menu-describe
d el-get-package-menu-mark-delete
g el-get-package-menu-revert
h el-get-package-menu-quick-help
i el-get-package-menu-mark-install
u el-get-package-menu-mark-update
x el-get-package-menu-execute
</pre>
<p>Once a package is <em>installed</em>, El-Get will <em>initialize</em> it for you, and it will
also do that step at every Emacs startup from there on, provided that you
added some lines to your ~/.emacs initialization file, that look a lot like
the previous *scratch* code you did paste:</p>
<pre class="src">
;;
;; Here's a typical El-Get integration for your .emacs file:
;;
(add-to-list 'load-path <span style="color: #bc8f8f;">"~/.emacs.d/el-get/el-get"</span>)
(setq el-get-user-package-directory <span style="color: #bc8f8f;">"~/.emacs.d/packages.d/"</span>)
(unless (require 'el-get nil t)
(with-current-buffer (url-retrieve-synchronously <span style="color: #bc8f8f;">"https://raw.github.com/dimitri/el-get/master/el-get-install.el"</span>) (goto-char (point-max)) (eval-print-last-sexp)))
(el-get 'sync) </pre>
<p>Then you can add files named like init-&lt;package&gt;.el in the
el-get-user-package-directory directory; those files will get loaded when
El-Get <em>initializes</em> &lt;package&gt;.</p>
<p>You can also use M-x el-get-install if you want to bypass the full screen
package listing, you will get completion on the package name.</p>
<h3>Community and development</h3>
<p class="first">El-Get community grew to be a really cool place to be participating in these
days, with core and <em>recipe</em> contributions from more than 130 different people
already, and with 526 stars on github and 184 forks. I almost can't believe
it!</p>
<pre class="src">
git --no-pager shortlog -n -s | wc -l
     137
git --no-pager shortlog -n -s | head -10
   734  Dimitri Fontaine
   336  Ryan C. Thompson
   114  Julien Danjou
   110  Dave Abrahams
    73  Ryan Thompson
    72  Sébastien Gross
    42  Takafumi Arakaki
    27  Alex Ott
    25  Yakkala Yagnesh Raghava
    21  Rüdiger Sonderfeld
</pre>
<p>Now that we have something that looks like a <em>core team</em> forming up, I'm thinking about scheduling much more aggressive stable releases. 4.1 has been very long in the making, I hope to now have a rapid release cycle leading us to
4.2 in quite a short time. As that's not an individual effort by any
means, though, only history will tell.</p>
<h3>The roadmap</h3>
<p class="first">We have lots of ideas and some rough edges to address, so 4.1 is only a stop
in the release history of El-Get. Next ideas include better error management
in face of rare corner cases and in face of external events, like when you
did rm -rf a directory holding an El-Get managed extension: we should mark
it <em>removed</em> and clean up the autoloads that came from it.</p>
<h3>Upgrading to 4.1</h3>
<p class="first">This item has received some treatment in the documentation. The basic idea
is that el-get-sources is no longer what it used to be, it's now only an
alternative source location for <em>recipes</em>, like it should always have been.
Note that you can still <em>override</em> in there some properties that you want
<em>merged</em> with an official <em>recipe</em>.</p>
<p>The new thing about el-get-sources is that it will no longer be the
authoritative list of packages that El-Get manages. That list is now either
given explicitly when calling the el-get function in your .emacs setup, or
derived from the packages that are known to be <em>installed</em> on your system (like e.g.
debian is doing).</p>
<p>Also, given that it took us so much time to brew 4.1 a lot of packages have
changed either their hosting location or even switched their SCM. In such
cases an automatic update of the recipe will no longer be possible, you
might need to el-get-remove then el-get-install packages to get them back.</p>
<h3>Conclusion</h3>
<p class="first">El-Get 4.1 is now ready for public consumption, don't be shy: a
lot of us have been running the development branch for a long time now, and I'm running
4.0.7.6901194 while writing this post. 4.0 is the development version of
what is now released as 4.1.</p>
<p>Many thanks to all who contributed to El-Get and to all our users, I'm very
proud that together we worked out a very nice and complete tool!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 28 Aug 2012 11:43:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/28-el-get-new-stable-release.html</guid> </item> <item> <title>Fast and stupid?</title> <link>http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</link> <description><![CDATA[<p>I stumbled onto an interesting article about performance when using python, called <a href="http://jiaaro.com/python-performance-the-easyish-way">Python performance the easy(ish) way</a>, where the author tries to get the best available performance out of the dumbest possible python code, trying to solve a very simple and stupid problem.</p>
<p>With so many <em>smart</em> qualifiers you can only guess that I did love the challenge. The idea is to write the simplest code possible and see how much smarter you need to be when you need performance. Let's have a try!</p>
<h3>local python results</h3>
<p class="first">Here's the code I did use to benchmark the python solution:</p>
<pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange2</span>(arg):
    <span style="color: #b8860b;">x</span> = <span style="color: #b8860b;">i</span> = 0
    <span style="color: #7f007f;">while</span> i < arg:
        <span style="color: #b8860b;">x</span> += i
        <span style="color: #b8860b;">i</span> += 1
    <span style="color: #7f007f;">return</span> x

<span style="color: #7f007f;">import</span> ctypes
<span style="color: #b8860b;">ct_sumrange</span> = ctypes.CDLL(<span style="color: #bc8f8f;">'/Users/dim/dev/CL/jiaroo/sumrange.so'</span>)

<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange_ctypes</span>(arg):
    <span style="color: #7f007f;">return</span> ct_sumrange.sumrange(arg)

<span style="color: #7f007f;">if</span> <span style="color: #da70d6;">__name__</span> == <span style="color: #bc8f8f;">"__main__"</span>:
    <span style="color: #7f007f;">import</span> timeit
    <span style="color: #b8860b;">t1</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange(10**10)'</span>)
    <span style="color: #b8860b;">t2</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange2(10**10)'</span>)
    <span style="color: #b8860b;">ct</span> = timeit.Timer(<span style="color: #bc8f8f;">'import jiaroo; jiaroo.sumrange_ctypes(10**10)'</span>)

    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'timing python sumrange(10**10)'</span>
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'xrange: %5fs'</span> % t1.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'while: %5fs'</span> % t2.timeit(1)
    <span style="color: #7f007f;">print</span> <span style="color: #bc8f8f;">'ctypes: %5fs'</span> % ct.timeit(1)
</pre>
<p>Oh. And the C code too, sorry about that.</p>
<pre class="src">
<span style="color: #da70d6;">#include</span> <span style="color: #bc8f8f;">&lt;stdio.h&gt;</span>

<span style="color: #228b22;">int</span> <span style="color: #0000ff;">sumrange</span>(<span style="color: #228b22;">int</span> <span style="color: #b8860b;">arg</span>) {
    <span style="color: #228b22;">int</span> <span style="color: #b8860b;">i</span>, <span style="color: #b8860b;">x</span>;
    x = 0;

    <span style="color: #7f007f;">for</span> (i = 0; i < arg; i++) {
        x = x + i;
    }
    <span style="color: #7f007f;">return</span> x;
}
</pre>
<p>And here's how I did compile it. The author of the inspiring article insisted on stupid optimisation targets, I did follow him:</p>
<pre class="src">
gcc -shared -Wl,-install_name,sumrange.so -o sumrange.so -fPIC sumrange.c -O0
</pre>
<p>And here's the result I did get out of it:</p>
<pre class="src">
python jiaroo.py
timing python sumrange(10**10)
xrange: 927.039917s
while: 2377.291237s
ctypes: 5.297124s
</pre>
<p>Let's be fair, with
-O2 we get much better results:</p>
<pre class="src">
timing python sumrange(10**10)
ctypes: 1.065684s
</pre>
<h3>Common Lisp to the rescue</h3>
<p class="first">So let's have a try in Common Lisp, you will ask me, right?</p>
<p>Here's the code I did use, you can see three different tries:</p>
<pre class="src">
<span style="color: #b22222;">;;;; </span><span style="color: #b22222;">jiaroo.lisp
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">See http://jiaaro.com/python-performance-the-easyish-way
</span><span style="color: #b22222;">;;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">The goal here is to find out if CL needs to resort to C for very simple
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">optimisation tricks like python apparently needs to, unless using pypy
</span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">(to some extent).
</span>
(<span style="color: #7f007f;">in-package</span> #<span style="color: #da70d6;">:jiaroo</span>)
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">"jiaroo" goes here. Hacks and glory await!
</span>
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-loop</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 1 to MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    <span style="color: #b22222;">;; without the finally clause, loop would return NIL rather than the sum</span>
    (<span style="color: #7f007f;">loop</span> for i fixnum from 1 to max
          do (incf sum i)
          finally (return sum))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">pk-sumrange</span> (max)
  (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max)
           (optimize speed))
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and fixnum unsigned-byte) sum))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (setf sum (logand (+ sum i) most-positive-fixnum)))))

(<span style="color: #7f007f;">defmacro</span> <span style="color: #0000ff;">timing</span> (<span style="color: #228b22;">&body</span> forms)
  <span style="color: #bc8f8f;">"return both how much real time was spent in body and its result"</span>
  (<span style="color: #7f007f;">let</span> ((start (gensym))
        (end (gensym))
        (result (gensym)))
    `(<span style="color: #7f007f;">let*</span> ((,start (get-internal-real-time))
            (,result (<span style="color: #7f007f;">progn</span> ,@forms))
            (,end (get-internal-real-time)))
       (values ,result (/ (- ,end ,start) internal-time-units-per-second)))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">bench-sumrange</span> (power)
  <span style="color: #bc8f8f;">"print execution time of the three previous functions"</span>
  (<span style="color: #7f007f;">let*</span> ((max (expt 10 power))
         (lp-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-loop max)) s))
         (dt-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (sumrange-dotimes max)) s))
         (pk-time (<span style="color: #7f007f;">multiple-value-bind</span> (r s) (timing (pk-sumrange max)) s)))
    (format t <span style="color: #bc8f8f;">"timing common lisp sumrange 10**~d~%"</span> power)
    (format t <span style="color: #bc8f8f;">"loop: ~2,3fs ~%"</span> lp-time)
    (format t <span style="color: #bc8f8f;">"dotimes: ~2,3fs ~%"</span> dt-time)
    (format t <span style="color: #bc8f8f;">"pk dotimes: ~2,3fs ~%"</span> pk-time)))
</pre>
<p>And here are the results:</p>
<pre class="src">
CL-USER> (bench-sumrange 10)
timing common lisp sumrange 10**10
loop: 11.213s
dotimes: 7.642s
pk dotimes: 22.185s
NIL
</pre>
<h3>Discussion</h3>
<p class="first">So python is very slow. C is pretty fast. And Common Lisp is just in the middle. Honestly I expected better performance from my beloved Common Lisp here, but I didn't try very hard, using <a href="http://ccl.clozure.com/">Clozure Common Lisp</a> which is not the quickest Common Lisp implementation around. For this very benchmark, if you're seeking speed use either <a href="http://sbcl.org/">Steel Bank Common Lisp</a> or <a href="http://www.clisp.org/">CLISP</a> which is known to have a pretty fast bignum implementation (which you don't need in 64 bits in that game).</p>
<p>On the other hand, I think that having to go write a C plugin and deal with how to compile and deploy it in the middle of a python script is something to avoid. When using Common Lisp you don't need to resort to that for the <em>runtime</em> to get down from the python <em>xrange</em> implementation at
927.039917s to
the <em>dotimes</em> implementation taking 7.642s. That's about 121 times faster.</p>
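<p>The ratios are easy to recompute from the timings quoted in this article. Here's a small Python sketch doing just that; the variable names are mine, and the numbers are copied from the outputs above (keep in mind the C figure was measured through ctypes from python, so this is a rough comparison at best).</p>

```python
# Wall-clock timings in seconds, copied from the outputs shown in this article.
python_xrange = 927.039917   # python sum(xrange(10**10))
lisp_dotimes = 7.642         # Clozure CL, sumrange-dotimes
c_o2_ctypes = 1.065684       # C compiled with -O2, called through ctypes

lisp_vs_python = python_xrange / lisp_dotimes
c_vs_lisp = lisp_dotimes / c_o2_ctypes

print("dotimes vs xrange: %.1f times faster" % lisp_vs_python)
print("C -O2 vs dotimes:  %.1f times faster" % c_vs_lisp)
```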
<p>So while C is even better, and while I would like a Common Lisp guru to show
me how to get a better speed here, I still very much appreciate the solution
here.</p>
<p>Let's see the winning source code in <em>python</em> and <em>common lisp</em> to compare the
programmer side of things: how hard was it really to get 121 times faster?</p>
<pre class="src">
<span style="color: #7f007f;">def</span> <span style="color: #0000ff;">sumrange</span>(arg):
    <span style="color: #7f007f;">return</span> <span style="color: #da70d6;">sum</span>(<span style="color: #da70d6;">xrange</span>(arg))
</pre>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">sumrange-dotimes</span> (max)
  <span style="color: #bc8f8f;">"return the sum of numbers from 0 below MAX"</span>
  (<span style="color: #7f007f;">let</span> ((sum 0))
    (<span style="color: #7f007f;">declare</span> (type (and unsigned-byte fixnum) max sum)
             (optimize speed))
    (<span style="color: #7f007f;">dotimes</span> (i max sum)
      (incf sum i))))
</pre>
<p>That's about it. Yes we can see some <em>manual</em> optimisation directives here, which are <em>extra complexity</em>. Not to the same level as bringing a compiled artifact that you need to build and deploy, though. Remember that you will need to know the full path where to find the
sumrange.so file on
the production system, in the optimised <em>python</em> case, so that's what we are
comparing against.</p>
<p>Here's what happens with and without the optimisation, and with a smaller target:</p>
<pre class="src">
CL-USER> (time (jiaroo:sumrange-dotimes (expt 10 9)))
(JIAROO:SUMRANGE-DOTIMES (EXPT 10 9))
took 722,592 microseconds (0.722592 seconds) to run.
During that period, and with 2 available CPU cores,
     714,709 microseconds (0.714709 seconds) were spent in user mode
       1,183 microseconds (0.001183 seconds) were spent in system mode
499999999500000000

CL-USER> (time (<span style="color: #7f007f;">let</span> ((sum 0)) (<span style="color: #7f007f;">dotimes</span> (i (expt 10 9) sum) (incf sum i))))
(<span style="color: #7f007f;">LET</span> ((SUM 0)) (<span style="color: #7f007f;">DOTIMES</span> (I (EXPT 10 9) SUM) (INCF SUM I)))
took 2,174,767 microseconds (2.174767 seconds) to run.
During that period, and with 2 available CPU cores,
     2,156,549 microseconds (2.156549 seconds) were spent in user mode
        10,225 microseconds (0.010225 seconds) were spent in system mode
499999999500000000
</pre>
<p>We get a 3 times speed-up from those 2 lines of lisp optimisation
directives, which is pretty good. And the gap seems to widen with larger
inputs: I didn't have the patience to actually wait until the non optimised
10^10 run finished, I
killed it.</p>
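<p>As a quick sanity check on the figures quoted above, here is a small Python sketch (not part of the original benchmark) that recomputes the expected result with the closed form n(n-1)/2 and the two speed-up ratios from the timings given in this article. The function name <em>expected_sum</em> is made up for the illustration.</p>

```python
def expected_sum(n):
    """Closed form for 0 + 1 + ... + (n - 1), which is what the dotimes loop adds up."""
    return n * (n - 1) // 2

# value returned by both (time ...) runs for 10**9
print(expected_sum(10 ** 9))

# timings quoted in the article, in seconds
python_xrange = 927.039917
lisp_dotimes = 7.642
optimised, unoptimised = 0.722592, 2.174767

# python xrange versus the optimised dotimes, for 10**10
print(round(python_xrange / lisp_dotimes))
# with versus without the declarations, for 10**9
print(round(unoptimised / optimised, 1))
```

<p>The closed form also shows how quickly the sum grows with the target, which is worth keeping in mind when declaring a fixed-size integer type for it.</p>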
<h3>Conclusion</h3>
<p class="first">This is a case where I don't know how to reach C-level performance
with Common Lisp, which could just be because I don't know how to do it yet.</p>
<p>Still, getting a 121 times speedup when compared to the pure <em>python</em> version
of the code is pretty good and encourages me to continue diving into Common
Lisp.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 22 Aug 2012 16:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/20-performance-the-easiest-way.html</guid> </item> <item> <title>Autumn 2012 Conferences</title> <link>http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</link> <description><![CDATA[<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> community hosts a number of <a href="../../../conferences.html">conferences</a> throughout the year, and the next ones I'm lucky enough to get to are approaching fast now. First, next month in September, we have <a href="http://postgresopen.org/2012/home/">Postgres Open</a> in Chicago, where my talk about <a href="http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html">Large Scale Migration from MySQL to PostgreSQL</a> has been selected!</p> <center> <p><img src="../../../images/autumn-leave-480.jpg" alt=""></p> </center> <p>This talk shares hindsight about the why and the how of that migration, what problems couldn't be solved without moving away and how the solution now looks. The tools used for migrating the data, the methods, and the new architecture are detailed. And the new home, in the cloud!</p> <p>Not much later, the European PostgreSQL community is giving us a very nice occasion to get to Prague with <a href="http://2012.pgconf.eu/">PostgreSQL Conference Europe 2012</a> (October 23-26). If you've been meaning to meet with the community, if you've been meaning to visit Prague someday, or any mix of those two very good reasons, think about booking that conference already.</p> <p>The <a href="http://2012.pgconf.eu/callforpapers/">call for papers for pgconf.eu</a> has been extended to August 7th, 2012. Consider sharing your hindsight too!</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 02 Aug 2012 01:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/08/01-autumn-conferences.html</guid> </item> <item> <title>Solving Every Sudoku Puzzle</title> <link>http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</link> <description><![CDATA[<p><a href="http://norvig.com/">Peter Norvig</a> published a while ago a very nice article titled <a href="http://norvig.com/sudoku.html">Solving Every Sudoku Puzzle</a> wherein he presents a programmatic approach to solving that puzzle game.</p> <center> <p><a class="image-link" href="http://en.wikipedia.org/wiki/Sudoku"> <img src="../../../images/sudoku.png"></a></p> </center> <p>The article is very well written and makes it easy to think that coming up with the code for such a solver is a very easy task, you apply some basic problem search principles and there you are. Which is partly true, in fact. Also, he uses
python, and that means that a lot of trivial programming
activities are not a concern anymore, such as memory management.</p>
<p>As I've been teaching myself <a href="http://www.cliki.net/Common%20Lisp">Common Lisp</a> for some weeks now I thought I would
like to read a lisp version of his code, and the article even has a section
titled <em>Translations</em>. Unfortunately, no lisp version is available there. One
might argue that <a href="http://clojure.org/">Clojure</a> is a decent enough lisp, but my current quest is
all about <em>Common Lisp</em> really. So I had to write one myself.</p>
<pre class="src">
CL-USER> (sudoku:print-puzzle
          (sudoku:solve-grid <span style="color: #bc8f8f;">"530070000600195000098000060800060003400803001700020006060000280000419005000080079"</span>))
| 5 3 4 | 6 7 8 | 9 1 2 |
| 6 7 2 | 1 9 5 | 3 4 8 |
| 1 9 8 | 3 4 2 | 5 6 7 |
+-------+-------+-------+
| 8 5 9 | 7 6 1 | 4 2 3 |
| 4 2 6 | 8 5 3 | 7 9 1 |
| 7 1 3 | 9 2 4 | 8 5 6 |
+-------+-------+-------+
| 9 6 1 | 5 3 7 | 2 8 4 |
| 2 8 7 | 4 1 9 | 6 3 5 |
| 3 4 5 | 2 8 6 | 1 7 9 |
took 1,974 microseconds (0.001974 seconds) to run.
During that period, and with 2 available CPU cores,
     1,894 microseconds (0.001894 seconds) were spent in user mode
        88 microseconds (0.000088 seconds) were spent in system mode
 174,320 bytes of memory allocated.
</pre>
<h3>Comments on the python version</h3>
<p class="first">Norvig's article is very well written, I think. By that I mean that by reading it you're confident that you've understood the problem and how the solution is articulated, so you almost think you don't need to really try to understand the code, it's just an illustration of the text.</p>
<p>Well, not so much. When you want to port the exact same algorithm you have to understand exactly what the code is doing so that you're not implementing something else. All the more when, as I did, you want to use some other data structure.</p>
<p>My goal was not to rewrite the code as-is, but to try and come up with <em>idiomatic</em> lisp code implementing Norvig's solution. So rather than using <em>strings</em> and <em>dictionaries</em> (in lisp, they still call them a <a href="http://www.lispworks.com/documentation/lw50/CLHS/Body/f_mk_has.htm">hash table</a>) I've been using more natural data structures.</p>
<p>The <em>python</em> code is really not easy to follow, full of functional programming veteran tricks. I mean avoiding <em>exceptions</em> and simply returning False
whenever there's a problem, and using functions such as all and some to
manage that. It certainly works, but it doesn't make the code any easier to
read.</p>
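<p>To make that pattern concrete, here is a minimal Python sketch of the style, written for this article and much simpler than Norvig's actual code: each step returns False on a contradiction instead of raising, and all() stops at the first failing step. The names <em>remove_digit</em> and <em>remove_digits</em> are hypothetical.</p>

```python
def remove_digit(candidates, cell, digit):
    """Remove digit from the candidates of cell; return False on contradiction."""
    candidates[cell] = candidates[cell].replace(digit, "")
    if not candidates[cell]:
        return False          # nothing left: signal failure without an exception
    return candidates

def remove_digits(candidates, cell, digits):
    """Remove several digits; all() short-circuits on the first failing step."""
    if all(remove_digit(candidates, cell, d) for d in digits):
        return candidates
    return False

print(remove_digits({"A1": "123"}, "A1", "12"))
print(remove_digits({"A1": "123"}, "A1", "123"))
```

<p>Every caller has to remember to test the return value, which is where the readability cost comes from.</p>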
<p>To summarize, that code looks like it's been written by someone smart who
didn't want to spend more than a couple of hours on it, and took all the
known trustworthy shortcuts he could to achieve that goal. Quality and
readability certainly weren't the key motive. I've been quite disappointed after
reading a very good article.</p>
<h3>Comments on the common lisp version</h3>
<p class="first">Keep in mind that I'm just a <em>Common Lisp</em> newbie. I've been told some good
pieces of advice by knowledgeable people though, so with some luck my
implementation is somewhat <em>lispy</em> enough.</p>
<p>So we start by defining some data structures and low-level functions to
build up the more complex one, so that it's easier to read and debug. The
<em>sudoku</em> puzzle is then a grid of digits and a grid of possible values in
places where the digits are yet unknown.</p>
<p>The way to represent that 9x9 grid is by using <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/f_mk_ar.htm">make-array</a>:</p>
<pre class="src">
(make-array '(9 9)
<span style="color: #da70d6;">:element-type</span> '(integer 0 9) <span style="color: #da70d6;">:initial-element</span> 0) </pre>
<p>The natural way to represent the set of possible values of a cell is a bit-vector (and actually I
did implement it that way), then I've been told that the <em>Common Lisp</em> way to
approach that is using <a href="http://psg.com/~dlamkins/sl/chapter18.html">two's-complement integer representation</a>, as we have
plenty of functions to operate on numbers that way. I didn't believe that
would make the code simpler, but in fact it really did, see:</p>
<pre class="src">
CL-USER> #b111111111
511
CL-USER> (logcount #b111111111)
9
CL-USER> (logcount 511)
9
CL-USER> (logbitp 3 #b100100100)
NIL
CL-USER> (logbitp 2 #b100100100)
T
CL-USER> (format nil <span style="color: #bc8f8f;">"~2r"</span> (logxor #b111111111 (ash 1 4)))
<span style="color: #bc8f8f;">"111101111"</span>
CL-USER> (logbitp 4 (logxor #b111111111 (ash 1 4)))
NIL
</pre>
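<p>For readers who don't have a Lisp REPL at hand, the same bit manipulations can be sketched in Python. This is only an illustration of the integer-as-set idea, assuming bit i-1 stands for digit i; the helper names are made up here and are not part of the sudoku code. Note that the last helper clears the bit with an and-not rather than the logxor used in the REPL session, so calling it twice cannot set the bit back.</p>

```python
ALL_DIGITS = 0b111111111           # nine bits set: digits 1..9 are all possible

def logcount(n):
    """Number of bits set in n, like Common Lisp LOGCOUNT for n >= 0."""
    return bin(n).count("1")

def logbitp(index, n):
    """True when bit INDEX of n is set, like Common Lisp LOGBITP."""
    return (n // 2 ** index) % 2 == 1

def unset_digit(possible, digit):
    """Clear the bit for digit (bit digit-1), leaving the others untouched."""
    return possible & ~(2 ** (digit - 1))

print(ALL_DIGITS, logcount(ALL_DIGITS))
print(logbitp(3, 0b100100100), logbitp(2, 0b100100100))
print(format(unset_digit(ALL_DIGITS, 5), "b"))
```

<p>The three print calls mirror the first, middle and last exchanges of the REPL session above.</p>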
<p>With that in mind, we can write the following code:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">count-remaining-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"How many possible values are left in there?"</span>
  <span style="color: #b22222;">;; </span><span style="color: #b22222;">we could raise an empty-values condition if we get 0...</span>
  (logcount possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">first-set-value</span> (possible-values)
  <span style="color: #bc8f8f;">"Return the index of the first set value in POSSIBLE-VALUES."</span>
  <span style="color: #b22222;">;; </span><span style="color: #b22222;">same as (+ 1 (floor (log possible-values 2))), without going through floats</span>
  (integer-length possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">only-possible-value-is?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when the only value found in POSSIBLE-VALUES is VALUE"</span>
  (and (logbitp (- value 1) possible-values)
       (= 1 (logcount possible-values))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">list-all-possible-values</span> (possible-values)
  <span style="color: #bc8f8f;">"Return a list of all possible values to explore"</span>
  (<span style="color: #7f007f;">loop</span> for i from 1 to 9
     when (logbitp (- i 1) possible-values)
     collect i))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">value-is-set?</span> (possible-values value)
  <span style="color: #bc8f8f;">"Return a generalized boolean which is true when given VALUE is possible in POSSIBLE-VALUES"</span>
  (logbitp (- value 1) possible-values))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">unset-possible-value</span> (possible-values value)
  <span style="color: #bc8f8f;">"return an integer representing POSSIBLE-VALUES with VALUE unset"</span>
  (logxor possible-values (ash 1 (- value 1))))
</pre>
<p>You can see here that I was also under the influence of a recent reading about <a href="http://gar1t.com/blog/2012/06/10/solving-embarrassingly-obvious-problems-in-erlang/">making it obvious</a>, or so called <a href="http://dieswaytoofast.blogspot.fr/2012/07/erlang-why-so-many-seemingly-identical.html">intentional programming</a>, following what <a href="http://armstrongonsoftware.blogspot.fr/">Joe Armstrong</a> has to say about it:</p>
<blockquote> <p class="quoted"><em>Intentional programming is a name I give to a style of programming where the reader of a program can easily see what the programmer intended by their code. The intention of the code should be obvious from the names of the functions involved and not be inferred by analysing the structure of the code. (Reading the code should) precisely expresses the intention of the programmer—here no guesswork or program analysis is involved, we clearly read what was intended.</em></p> </blockquote>
<p>So there we go with function names such as count-remaining-possible-values,
that will help when reading some more complex code, as in the following, the
meat of the solution:</p>
<pre class="src">
(<span style="color: #7f007f;">defmethod</span> <span style="color: #0000ff;">eliminate</span> ((puzzle puzzle) row col value)
  <span style="color: #bc8f8f;">"Eliminate given VALUE from possible values in cell ROWxCOL of PUZZLE, and propagate when needed"</span>
  (<span style="color: #7f007f;">with-slots</span> (grid values) puzzle
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">if already unset, work is already done</span>
    (<span style="color: #7f007f;">when</span> (value-is-set? (aref values row col) value)
      <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate the value from the set of possible values</span>
      (<span style="color: #7f007f;">let*</span> ((possible-values
              (unset-possible-value (aref values row col) value)))
        (setf (aref values row col) possible-values)

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now if we're left with a single possible value</span>
        (<span style="color: #7f007f;">when</span> (= 1 (count-remaining-possible-values possible-values))
          (<span style="color: #7f007f;">let</span> ((found-value (first-set-value possible-values)))
            <span style="color: #b22222;">;; </span><span style="color: #b22222;">update the main grid</span>
            (setf (aref grid row col) found-value)

            <span style="color: #b22222;">;; </span><span style="color: #b22222;">eliminate that value we just found in all peers</span>
            (eliminate-value-in-peers puzzle row col found-value)))

        <span style="color: #b22222;">;; </span><span style="color: #b22222;">now check if any unit has a single possible place for that value</span>
        (<span style="color: #7f007f;">loop</span> for (r . c)
           in (list-places-with-single-unit-solution puzzle row col value)
           do (assign puzzle r c value))))))
</pre>
<p>In <em>Emacs Lisp</em> you write a function, evaluate it in place (C-M-x runs the command
eval-defun) and your code is loaded up, ready to be tested. In <em>Emacs Lisp</em>
the test can be simply using your editor and watching the new behavior
taking place, or playing in the M-x ielm console. When the code is not
ready, it crashes, and you're left in the interactive debugger, where you
can use C-x C-e (which runs the command eval-last-sexp) to evaluate any expression
in your source and see its value in the current <em>debug frame</em>.</p>
<p>That way of working is a huge productivity boost, one that I miss a lot
when getting back to writing C code for PostgreSQL. I can't C-M-x the
current function and go write some SQL to test it right away, I have to
<em>compile</em> the whole source tree, then <em>install</em> the new binaries, then <em>restart</em>
the test server and then open up a <em>psql</em> console to interact with the new
code. Of course I could just make check and watch the results, but then if I
attach a <em>debugger</em> it complains that the code on-disk is more recent than the
code in the <em>core dump</em>.</p>
<p>What if you want <em>Emacs Lisp</em> integrated facilities and something made for
general programming rather than suited to building a text editor? Don't get
me wrong, you can probably find more production ready code in <em>elisp</em> than in
many other languages, just because Emacs has been there for about 35 years.
Editor targeted production code, though.</p>
<p>This integrated development cycle is all the same when you're using <em>Common
Lisp</em>. The awesome <a href="http://common-lisp.net/project/slime/">Superior Lisp Interaction Mode for Emacs</a> is providing
exactly that experience. Just run M-x slime and then as you define your code
you can C-M-x the function at point, see the compilation errors and warnings
if any in the associated <em>REPL</em>, and just try your code. I tend to mostly play
in the command line, it's possible to just use C-x C-e while typing too.</p>
<h3>Performances</h3>
<p class="first">Of course we do care! After all the original article came with a quite
detailed performance analysis with graphs and all. I won't be reproducing
that, sorry. I'll just show you what penalty you get for using an older
language specification, much more dynamic and with more features than
python, and with a great, scratch that, awesome development environment.</p>
<p>Oh wait, that's the other way round, no penalty, it's actually so much
faster!</p>
<h4>Python version perfs</h4>
<p class="first">The results I got on my desktop machine are about twice as fast as in the
original article, I guess newer machines and newer python have something to
say for that:</p>
<pre class="src">
dim ~/dev/CL/sudoku python sudoku.dim.py
All tests pass.
Solved 50 of 50 easy puzzles (avg 0.01 secs (151 Hz), max 0.01 secs).
Solved 95 of 95 hard puzzles (avg 0.02 secs (42 Hz), max 0.12 secs).
Solved 11 of 11 hardest puzzles (avg 0.01 secs (115 Hz), max 0.01 secs).
</pre>
<p>That makes an average of (50*151 + 95*42 + 11*115) / (50+95+11) =
82Hz.</p>
<p>That seems pretty good, let's continue.</p>
<p>As you can see I've cut away the <em>random puzzle</em> part, that's because I was
too lazy to implement that part, which didn't seem all that interesting to
me. If you think that's a problem that needs solving, I accept patches.</p>
<h4>Common lisp version perfs</h4>
<p class="first">When using <a href="http://sbcl.org/">SBCL</a> on the same machine, what I got was:</p>
<pre class="src">
(sudoku:solve-example-grids)
Solved 50 of 50 easy puzzles (avg .0021 sec (471.7 Hz), max 0.015 secs).
Solved 95 of 95 hard puzzles (avg .0022 sec (446.0 Hz), max 0.008 secs).
Solved 11 of 11 hardest puzzles (avg .0018 sec (550.0 Hz), max 0.003 secs).
</pre>
<p>With the same way to compute the average, we now have 461.6Hz.</p>
<p>Now, that's between 3 times and more than <strong>10 times faster</strong> than the python
version (taken collection per collection), for a comparable effort, a much
better development environment, and the same all dynamic no explicit
compiling approach.</p>
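<p>The averages and ratios quoted in this section can be rechecked with a few lines of Python. This is only a sketch over the numbers printed by the two runs above; the function name <em>weighted_hz</em> is made up for the illustration.</p>

```python
def weighted_hz(results):
    """Average solving rate, weighted by the number of puzzles per collection."""
    total = sum(count for count, _ in results)
    return sum(count * hz for count, hz in results) / total

# (number of puzzles, puzzles solved per second), from the two runs shown above
python_run = [(50, 151), (95, 42), (11, 115)]
lisp_run = [(50, 471.7), (95, 446.0), (11, 550.0)]

py_avg, cl_avg = weighted_hz(python_run), weighted_hz(lisp_run)
print(round(py_avg), round(cl_avg, 1), round(cl_avg / py_avg, 1))

# ratio collection per collection: easy, hard, hardest
for (_, py_hz), (_, cl_hz) in zip(python_run, lisp_run):
    print(round(cl_hz / py_hz, 1))
```

<p>Weighting by the number of puzzles matters here, since the hard collection is both the largest and the one where the two versions differ the most.</p>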
<h3>Conclusion</h3>
<p class="first">I guess I'm fond of <em>Common Lisp</em>, which I already saw coming (so did you,
right?), and now I have some public article and code to share about why :)</p>
<p>The code is hosted at <a href="https://github.com/dimitri/sudoku">https://github.com/dimitri/sudoku</a> if you're
interested, with the necessary files to reproduce, some docs, etc.</p>
<p>Also, apart from using <em>integers</em> as <em>bitfields</em>, which I did more for being
lispy than for performance, I made very little effort to optimize the
code. It's quite naive in this respect, yet gives me an average of 461.6Hz
rather than 82Hz, that's <strong><em>5.6 times faster</em></strong> on average.</p>
<p>So yes, I will continue to invest some precious time in <em>Common Lisp</em> as a
very good interactive scripting language, and maybe more than that.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 10 Jul 2012 20:37:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/07/10-solving-sudoku.html</guid> </item> <item> <title>PGDay France 2012</title> <link>http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</link> <description><![CDATA[<p>The French PostgreSQL Conference, <a href="http://www.pgday.fr/programme">pgday.fr</a>, was yesterday in Lyon. We had a very good time and a great schedule with a single track packed with 7 talks, addressing a diverse set of PostgreSQL related topics, from GIS to fuzzy logic, including replication.</p> <p>You might have guessed it already, I did talk about replication. Here's the slide deck I used, it's in French, sorry if you don't grok that language.</p> <center> <p><a class="image-link" href="../../../images/confs/PGDay_2012_Replications.pdf"> <img src="../../../images/confs/PGDay_2012_Replications.png"></a></p> </center> <p>The conference was very nice and went smoothly. Even if there were “only” 60 of us, I had the pleasure to meet different users with very different sets of needs. Very happy to have been there!</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Jun 2012 16:17:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/08-pgdayfr-lyon.html</guid> </item> <item> <title>M-x recompile</title> <link>http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</link> <description><![CDATA[<p>A friend of mine just asked me for advice to tweak some Emacs features, and I think that's really typical of using Emacs: rather than getting used to the way things are shipped to you, when using Emacs, you start wanting to adapt the tools to the way you want things to be working instead. And you can call that the awesome!</p> <p>In this case we're talking about the
M-x compile and M-x recompile
functions. My friend bound the former to &lt;f11&gt; and wanted C-u &lt;f11&gt; to do a
recompile with the exact same command line as the previous compile command.</p>
<p>Well, to be honest, I didn't know about M-x recompile until after I wrote
the following function, made to trigger another compile with the last
command used if using C-u.</p>
<pre class="src">
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-last-command</span> nil)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">cyb-compile-command-history</span> nil)

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">cyb-compile</span> (arg)
  <span style="color: #bc8f8f;">"Compile with given command, optionally recompile with last command"</span>
  (interactive <span style="color: #bc8f8f;">"P"</span>)
  (<span style="color: #7f007f;">if</span> arg
      (<span style="color: #7f007f;">progn</span>
        <span style="color: #b22222;">;; </span><span style="color: #b22222;">arg given: compile with last command</span>
        (<span style="color: #7f007f;">unless</span> cyb-compile-last-command
          (<span style="color: #ff0000; font-weight: bold;">error</span> <span style="color: #bc8f8f;">"Can't recompile yet, no known last command"</span>))
        (compile cyb-compile-last-command))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">else branch, no arg given, ask for a command</span>
    (<span style="color: #7f007f;">let</span> ((command
           (read-string <span style="color: #bc8f8f;">"Compile with command: "</span>
                        <span style="color: #bc8f8f;">"make -k"</span>
                        'cyb-compile-command-history
                        <span style="color: #bc8f8f;">"make -k"</span>)))
      (setq cyb-compile-last-command command)
      (compile command))))

(global-set-key (kbd <span style="color: #bc8f8f;">"&lt;f11&gt;"</span>) 'cyb-compile)
</pre>
<p>With that little <em>Emacs Lisp</em> code we're driving Emacs the way we want to be working, and that's great! You can see it was a <em>quick hack</em> in that if you wanted to use the function non interactively it would still prompt for the command to use to compile, when the <em>Emacs Lisp</em> interactive special form would
allow us to implement something way smarter here. Also if we wanted to spend
some more time on that feature, we should probably tweak the <em>error</em> condition
to be asking for the command rather than just complaining, that would
certainly be more useful.</p>
<p>Exercise left to the reader, rewrite using recompile rather than reinventing
it in a hurry! Beware of call-interactively though. Oh and fix the
aforementioned infelicities, too.</p>
<p>To conclude, we see that writing <em>Emacs Lisp</em> code to fix a usability problem
in a hurry is a great strength of Emacs, and that we're provided with the
necessary tool set to be able to reach completeness if we wanted to do
so.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 01 Jun 2012 18:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/06/01-emacs-recompile.html</guid> </item> <item> <title>Back From PgCon</title> <link>http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</link> <description><![CDATA[<p>Last week was the annual <em>PostgreSQL Hackers</em> gathering in Canada, thanks to the awesome <a href="http://www.pgcon.org/">pgcon</a> conference.
This year's edition has been packed with good things, beginning with the <a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit">Cluster Summit</a>, followed the next day by the <a href="http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting">Developer Meeting</a>, itself followed (yes, on the same day) by the <a href="http://wiki.postgresql.org/wiki/PgCon2012CanadaInCoreReplicationMeeting">In Core Replication Meeting</a>. That was a packed schedule!</p>
<center> <p><img src="../../../images/in-core-replication.jpg" alt=""></p> </center>
<p>The <em>in core replication</em> project has been presented with slides titled <a href="http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf">Future In-Core Replication for PostgreSQL</a> and got a very good reception. For instance, people implementing <a href="http://slony.info/">Slony</a> (<em>Jan Wieck</em>, <em>Christopher Browne</em> and <em>Steve Singer</em> were here) appreciated the concepts and were rather supportive of both the requirements and the design, and appreciated the very early demo and results that we already had to show, as a kind of proof of concept.</p>
<p>After those first two days, we could start the actual show. I had the honor to present a migration use case entitled <a href="http://www.pgcon.org/2012/schedule/events/431.en.html">Large Scale MySQL Migration</a> where we're speaking about going from MySQL to PostgreSQL, from 37 to 256 shards, moving more than 6TB of data including binary <em>blobs</em> that we had to process with
pl/java. A quite involved migration project whose slides you can now
read here:</p>
<center>
<p><a class="image-link" href="../../../images/fotolog.pdf">
<img src="../../../images/fotolog.jpg"></a></p>
</center>
<p>I've heard that we should soon be able to enjoy audio and video recordings
of the sessions, so if you couldn't make it this year for any reason, don't
miss that, you will have loads of very interesting talks to virtually
attend. I definitely will do that to catch-up with some talks I couldn't
attend, having to pick one out of three is not an easy task, all the more
when you add the providential <em>hallway track</em>.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 24 May 2012 09:40:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/05/24-back-from-pgcon.html</guid> </item> <item> <title>Clean PGQ Subconsumers</title> <link>http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</link> <description><![CDATA[<p>Now that you're all using the wonders of <a href="../03/12-PGQ-Cooperative-Consumers.html">Cooperative Consumers</a> to help you efficiently and reliably implement your business constraints and offload them from the main user transactions, you're reaching a point where you have to clean up your development environment (because that's what happens to development environments, right?), and you want a way to start again from a clean empty place.</p>
<center> <p><img src="../../../images/drop-queue.png" alt=""></p> </center>
<p>Here we go. It used to be much simpler than that, so if you're still using <strong>PGQ</strong> from <strong>Skytools2</strong>, just jump to the next step.</p>
<h3>Unregister Subconsumers</h3>
<p class="first">That query will figure out subconsumers in the system function
pgq.get_consumer_info() and ask PGQ to please <em>unregister</em> them, losing events
along the way, even events from batches that are currently active.</p>
<pre class="src">
with subconsumers as (
  select q1.queue_name, q2.consumer_name,
         substring(q1.consumer_name from <span style="color: #bc8f8f;">'%.#"%#"'</span> for <span style="color: #bc8f8f;">'#'</span>) as subconsumer_name
    from (select * from pgq.get_consumer_info() where lag is null) as q1
    join (select * from pgq.get_consumer_info() where lag is not null) as q2
      on q1.queue_name = q2.queue_name
)
select *,
       pgq_coop.unregister_subconsumer(queue_name, consumer_name, subconsumer_name, 1)
  from subconsumers;
</pre>
<h3>Unregister Consumers</h3>
<p class="first">Now that the first step is done, we have to <em>unregister</em> the main consumers, which is easy and what you already did before:</p>
<pre class="src">
select queue_name, consumer_name,
       pgq.unregister_consumer(queue_name, consumer_name)
  from pgq.get_consumer_info();
</pre>
<pre class="src">
  from pgq.queue;
</pre>
]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Apr 2012 15:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/04/26-unregister-subconsumers.html</guid> </item> <item> <title>PGQ Coop Consumers</title> <link>http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</link> <description><![CDATA[<p>While working on a new <a href="http://www.postgresql.org/">PostgreSQL</a> architecture for a high scale project that used to be in the top 10 of internet popular web sites (in terms of visitors), I needed to be able to offload some processing from the main path: that's called a <em>batch job</em>. This needs to be <em>transactional</em>: don't run the job if we did
rollback; the transaction, process all <em>events</em> that were part of the same transaction in the same transaction, etc.</p>
<center> <p><img src="../../../images/workers.jpg" alt=""></p> </center>
<p>That calls for using <a href="http://wiki.postgresql.org/wiki/PGQ_Tutorial">PGQ</a>, the <em>jobs queue</em> solution from <a href="http://wiki.postgresql.org/wiki/Skytools">Skytools</a>, the power horse for <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>. If
PGQ is good enough to build a full trigger-based
replication solution on top of it, certainly it's good enough for our custom
processing, right? Well, you still need to check that your expectations are
met, and that was happily the case in my implementation. It's a very common
problem, and PGQ very often is a great solution to it.</p>
<p>As this implementation is PHP centric, we've been using <a href="https://github.com/dimitri/libphp-pgq">libphp-pgq</a> to drive
our background workers. Using PGQ in PHP has been very easy to set up, the
only trap being not to forget about running the <em>ticker</em> process.</p>
<p>It got interesting because of two elements. First, we're not running a
single database instance here but a bunch of them... make it <em>256 databases</em>.
Each of them having 5 queues to consume, that would be about 1280 consumer
processes, distributed on 16 servers that's still 80 per server, so way too
many. What we did instead is reuse the <a href="https://github.com/markokr/skytools/blob/master/scripts/queue_mover.py">queue mover</a> script found in the
Skytools distribution and adapt it to <em>forward</em> the events of the 1280 source
queues to only 5 destination queues. We then process the events from this
single location.</p>
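<p>The numbers above are easy to lay out in a few lines of Python. This is a back-of-the-envelope sketch, not the queue mover script itself: the queue names are placeholders, and the rule that each source queue forwards to the central queue of the same name is an assumption made for the illustration.</p>

```python
from collections import defaultdict

DATABASES = 256
SERVERS = 16
QUEUES = ["q1", "q2", "q3", "q4", "q5"]   # placeholder queue names

# naive setup: one consumer process per (database, queue) pair
sources = [(db, queue) for db in range(DATABASES) for queue in QUEUES]
print(len(sources), len(sources) // SERVERS)

# fan-in (assumed rule): a source queue forwards to the central queue of the same name
fan_in = defaultdict(list)
for db, queue in sources:
    fan_in[queue].append(db)
print(len(fan_in), len(fan_in["q1"]))
```

<p>Each destination queue then receives the events of 256 source queues, which is why a single consumer per destination is not enough.</p>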
<p>Now it's easier to deal with, but we're still not exactly there. Of course,
with so many sources, concentrating them all into the same place means that
a single consumer is not able to process the events as fast as they are
produced. That's where <em>cooperative consuming</em> shines, it's very easy to
turn your <em>consumer</em> into a <em>cooperative</em> one even on an existing and running
queue, and that's what we did. So now we can choose how many <em>workers</em> we want
per queue: one of them has 4 workers, another one sees not so much activity
and 1 worker still fits.</p>
<center>
<p><img src="../../../images/coop-workers.jpeg" alt=""></p>
</center>
<p>The queue mover script that knows how to subscribe to many queues from the
same process is going to be contributed to Skytools proper, of course.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 12 Mar 2012 14:43:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/12-PGQ-Cooperative-Consumers.html</guid> </item> <item> <title>Extension White Listing</title> <link>http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</link> <description><![CDATA[<p>PostgreSQL 9.1 includes proper extension support, as you might well know if you ever read this very blog here. Some hosting facilities are playing with PostgreSQL at big scale (hello <a href="https://postgres.heroku.com/blog">Heroku</a>!) and still meet with small caveats making their life uneasy.</p> <p>To be specific, only <em>superusers</em> are allowed to install C coded stored procedures, and that impacts a lot of very useful PostgreSQL extensions: all those shipped in the <em>contrib</em> package are coded in C. Now, <a href="https://postgres.heroku.com/blog">Heroku</a> is not giving away <em>superuser</em> access to their hosted customers in order to limit the number of ways they can shoot themselves in the foot. And given the PostgreSQL security model, being granted <em>database owner</em> is mostly good enough for day to day operation.</p> <blockquote> <p class="quoted"> See Andrew's article <a href="http://people.planetpostgresql.org/andrew/index.php?/archives/259-Heroku,-a-really-easy-way-to-get-a-database-in-a-hurry..html">Heroku, a really easy way to get a database in a hurry</a> for more context about Heroku's offering here.</p> </blockquote> <p>Mostly, but as we see, not completely good enough. How to arrange for a non <em>superuser</em> to be able to still install a C-coded extension in his own database? That's quite dangerous as any bug causing a crash would mean a whole PostgreSQL restart. So you not only want to empower
CREATE EXTENSION
to database owners, you also want to be able to review and explicitly <em>white
list</em> the allowed extensions.</p>
<p>Here we go: <a href="https://github.com/dimitri/pgextwlist">pgextwlist</a> is a PostgreSQL extension implementing just that
idea. You have to tweak local_preload_libraries so that it gets loaded
automatically and early enough, and you have to provide the list of
authorized extensions in the extwlist.extensions setting.</p>
<p>Let's see a usage example, straight from the documentation:</p>
<pre class="src">
dim=> select rolsuper from pg_roles where rolname = current_user;
select rolsuper from pg_roles where rolname = current_user;
 rolsuper
<span style="color: #b22222;">----------</span>
 f
(1 row)
dim=> create extension hstore;
create extension hstore;
WARNING: => is deprecated as an operator name
DETAIL: This name may be disallowed altogether in future versions of PostgreSQL.
CREATE EXTENSION
dim=> create extension earthdistance;
create extension earthdistance;
ERROR: extension "earthdistance" is not whitelisted
DETAIL: Installing the extension "earthdistance" failed, because it is not
on the whitelist of user-installable extensions.
HINT: Your system administrator has allowed users to install certain extensions. SHOW extwlist.extensions;

dim=> \dx
\dx
                           List of installed extensions
  Name   | Version |   Schema   |                   Description
<span style="color: #b22222;">---------+---------+------------+--------------------------------------------------</span>
 hstore  | 1.0     | public     | data type for storing sets of (key, value) pairs
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

dim=> drop extension hstore;
drop extension hstore;
DROP EXTENSION
</pre>
<p>As you can see, it allows non <em>superusers</em> to install an extension written in C.</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 08 Mar 2012 14:25:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/08-extension-white-listing.html</guid> </item> <item> <title>Battle Language à la Marmite</title> <link>http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</link> <description><![CDATA[<p>Yesterday evening I had the chance to take part in the <a href="http://jduchess.org/duchess-france/blog/battle-language-a-la-marmite/">Battle Language à la Marmite</a>, where I had offered to talk about <a href="http://www.emacswiki.org/emacs/EmacsLisp">Emacs Lisp</a>, an offer that turned into carrying the banner for the whole <a href="http://www.lisp.org/index.html">Lisp</a> family. I happily used some content from <a href="http://www.lisperati.com/">Lisperati</a> in my presentation, and I recommend a detour to that site!</p> <center> <p><a class="image-link" href="../../../images/confs/elisp.pdf"> <img src="../../../images/confs/elisp-1.png"></a></p> </center> <p>In this very short presentation (only 5 minutes) I mentioned the <em>axiomatic</em> approach <strong><em>John McCarthy</em></strong> took when he <em>discovered</em> the language; you can read more about it on <strong><em>Paul Graham</em></strong>'s site in his article <a href="http://www.paulgraham.com/rootsoflisp.html">The Roots of Lisp</a>, along with the associated code, an <a href="http://lib.store.yahoo.net/lib/paulgraham/jmc.lisp">implementation of McCarthy's LISP in Common Lisp</a>.</p> <p>Thanks to <a href="http://jduchess.org/">Duchess</a> for a great evening where we could exchange points of view and debate functional and object languages, the differences between Erlang, Haskell and Ruby, and a few other related topics!</p> ]]></description>
<center> <p><img src="../../../images/bouncing_elephant.gif" alt=""></p> </center> <p>As the plugin is 300 lines of python code, it's not a good idea to just inline it here, so please grab it at <a href="../../../resources/pgbouncer_">pgbouncer_</a>.</p> <p>You might need to know that the script name once installed should follow the form<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 01 Mar 2012 14:49:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2012/03/01-duchessfr-battle-language.html</guid> </item> <item> <title>pgbouncer munin plugin</title> <link>http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</link> <description><![CDATA[<p>It seems that if you search for a <a href="http://munin-monitoring.org/">munin</a> plugin for <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a> it's easy enough to reach an old page of mine with an old version of my plugin, and a broken link. Let's remedy that by publishing here the newer version of the plugin. To be honest, I though it already made its way into the official munin
1.4set of plugins, but I've not been following closely enough.</p>
pgbouncer_dbname_stats_requests or pgbouncer_dbname_pools, where of
course dbname can contain any number of _ characters. This script supports
quite old versions of <em>pgbouncer</em> that didn't accept the normal pq protocol:
back then you had to use psql to have any chance of getting the data from a
script, and you couldn't just use a PostgreSQL driver such as <a href="http://initd.org/psycopg/">psycopg2</a>.</p>
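<p>Because dbname may itself contain underscores, the script name cannot simply be split on _. Here is a minimal Python sketch of one way to recover the database name and the graph type, assuming only the two suffixes mentioned above exist. The helper parse_plugin_name is hypothetical and not taken from the plugin, which may well do this differently.</p>

```python
import os

PREFIX = "pgbouncer_"
# Only the two graph types named in the text; an assumption for this sketch.
KNOWN_GRAPHS = ("stats_requests", "pools")


def parse_plugin_name(script_name):
    """Split 'pgbouncer_<dbname>_<graph>' into (dbname, graph).

    The graph suffix is matched from the right, so underscores inside
    the database name are preserved.
    """
    name = os.path.basename(script_name)
    if not name.startswith(PREFIX):
        raise ValueError("not a pgbouncer plugin name: %r" % name)
    rest = name[len(PREFIX):]
    for graph in KNOWN_GRAPHS:
        tail = "_" + graph
        # Require a non-empty database name before the suffix.
        if rest.endswith(tail) and len(rest) > len(tail):
            return rest[:-len(tail)], graph
    raise ValueError("unknown graph type in: %r" % name)


if __name__ == "__main__":
    print(parse_plugin_name("pgbouncer_my_app_db_stats_requests"))
    print(parse_plugin_name("/etc/munin/plugins/pgbouncer_shop_pools"))
```

<p>Matching the known suffix from the right is what keeps a database called my_app_db in one piece.</p>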
]]></description>
<center> <p><a class="image-link" href="http://wiki.postgresql.org/images/f/f1/Using-extensions.pdf"> <img src="../../../images/using-extensions-10.png"></a></p> </center> <p>L'idée de ma présentation, que la plupart d'entre vous a loupé je suppose (en tout cas je n'avais qu'une petite poignée de français dans la salle, et j'espère avoir des lecteurs qui n'étaient pas à Amsterdam), l'idée est d'utiliser les mécanismes offerts par les extensions afin de maintenir le code<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 16 Nov 2011 14:00:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/11/16-pgbouncer-munin.html</guid> </item> <item> <title>Extensions en simple SQL</title> <link>http://tapoueh.org/blog/2011/10/31-extensions-sql.html</link> <description><![CDATA[<p>La <a href="http://2011.pgconf.eu/">conférence européenne à Amsterdam</a> était un très bon évènement de la communauté, avec une organisation impeccable dans un hôtel accueillant. J'ai eu le plaisir d'y parler des extensions et de leur usage dans le cadre du développement applicatif « interne », sous le titre <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>.</p>
PL que vous utilisez en production.</p>
<p>Most of the time these are procedures that implement part of the
business logic of your applications, but so close to the data that they
end up directly in the database: that's a good thing, particularly since
<em>PostgreSQL 9.1</em>. This version indeed offers fairly complete
extension management.</p>
<p>The task is to <em>package</em> your procedures by following the
online documentation and its chapter
<a href="http://docs.postgresqlfr.org/9.1/extend-extensions.html">35.15. Empaqueter des objets dans une extension</a> (packaging related objects into an extension). Once that is done, you
can deploy your set of stored procedures with the
command CREATE EXTENSION mesprocs;, and the psql command \dx then
lets you list the installed extensions and their version numbers.</p>
<p>Upgrades are also handled with a dedicated SQL command, namely
ALTER EXTENSION mesprocs UPDATE [TO version];. You only need to
provide intermediate scripts named for example mesprocs--1.0--1.1.sql
and mesprocs--1.1--1.2.sql, and PostgreSQL will know how to go from 1.0 to 1.2 by applying them in sequence.</p>
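<p>This naming convention is easy to process mechanically. The following Python sketch is not part of PostgreSQL; it uses a hypothetical helper, parse_script_name, to split such file names into extension name, source version and target version. It relies on the rule that extension and version names never contain the -- separator themselves.</p>

```python
def parse_script_name(filename):
    """Return (extension, old_version, new_version) for an extension script.

    'mesprocs--1.0--1.1.sql' is an update script from 1.0 to 1.1.
    'mesprocs--1.0.sql' is an install script, so old_version is None.
    """
    suffix = ".sql"
    if not filename.endswith(suffix):
        raise ValueError("not an SQL script: %r" % filename)
    parts = filename[:-len(suffix)].split("--")
    if any(part == "" for part in parts):
        raise ValueError("malformed script name: %r" % filename)
    if len(parts) == 2:
        extension, version = parts
        return extension, None, version
    if len(parts) == 3:
        extension, old_version, new_version = parts
        return extension, old_version, new_version
    raise ValueError("malformed script name: %r" % filename)


if __name__ == "__main__":
    for name in ("mesprocs--1.0.sql",
                 "mesprocs--1.0--1.1.sql",
                 "mesprocs--1.1--1.2.sql"):
        print(name, "->", parse_script_name(name))
```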
<p>There, you now know almost everything from my presentation in Amsterdam, and you can
find the rest in the slides linked at the top of the article. Of course I
have not reproduced here the interesting questions I was asked; a
good part of them have been added to my Christmas list for
extensions. If you want to be sure to find that under your tree,
however, the best way is still to talk to me about it: sponsoring
Open Source development is a fine thing to do :)</p>
]]></description>
<center> <p><img src="../../../images/ams-conf-room.jpg" alt=""></p> </center> <p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/2-dave-page/">Dave Page</a> talked about<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 31 Oct 2011 14:22:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/31-extensions-sql.html</guid> </item> <item> <title>Back From Amsterdam</title> <link>http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</link> <description><![CDATA[<p>Another great conference took place last week, <a href="http://2011.pgconf.eu/">PostgreSQL Conference Europe 2011</a> was in Amsterdam and plenty of us PostgreSQL geeks were too. I attended to lot of talks and did learn some more about our project, its community and its features, but more than that it was a perfect occasion to meet with the community.</p>
SQL/MED under the title
<a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/146-postgresql-at-the-center-of-your-dataverse/">PostgreSQL at the center of your dataverse</a> and detailed what to expert from
a <em>Foreign Data Wrapper</em> in PostgreSQL 9.1, then how to write your own.
Wherever you are currently managing your data, you can easily enough make it
so that PostgreSQL integrates them by means of fetching them to answer your
queries. Which means real time data federating: you don't copy data around,
you remote access them when executing the query.</p>
<p>I might need to come up with new <em>Foreign Data Wrappers</em> in the not too distant
future, now that I better grasp how much work that really is. It
appears to be a good migration strategy too:</p>
<pre class="src">
INSERT INTO real.table SELECT * FROM foreign.table;
</pre>
<p>Another discovery is that apparently <a href="http://code.google.com/p/plv8js/wiki/PLV8">PLv8</a> is ready for public consumption. Using it can lead to <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/174-heralding-the-death-of-nosql/">Heralding the Death of NoSQL</a>, so use it with care.</p> <p>In the presentation of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a> we mainly saw that mixing <em>synchronous</em> and <em>asynchronous</em> transactions in your application is the key to real performance across the ocean, as the speed of light is not infinite. From Baltimore to Amsterdam the latency cannot be better than
100ms and that's not the same as <em>instant</em> nowadays.</p>
<p>Then again, depending on the number of concurrent queries to sync over the
ocean link, the experimental setup was able to achieve several thousand
queries per second, which validates the model we picked for <em>sync rep</em> and
its implementation.</p>
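<p>The arithmetic behind that observation can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a measurement from the talk: it assumes each session waits one full round trip per synchronous commit and that sessions do not slow each other down. The session count of 300 is purely an illustrative figure.</p>

```python
def max_sync_commits_per_second(round_trip_ms, concurrent_sessions):
    """Upper bound on synchronous commits per second.

    Each session can commit at most once per round trip, so throughput
    grows with the number of sessions waiting in parallel.
    """
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    if concurrent_sessions < 0:
        raise ValueError("concurrent_sessions must not be negative")
    commits_per_session = 1000.0 / round_trip_ms
    return commits_per_session * concurrent_sessions


if __name__ == "__main__":
    # One session over a 100 ms link: at most 10 commits per second.
    print(max_sync_commits_per_second(100, 1))
    # Hypothetical 300 concurrent sessions over the same link.
    print(max_sync_commits_per_second(100, 300))
```

<p>Under those assumptions a single session is stuck at about ten commits per second, and only concurrency gets you into the thousands.</p>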
<p>If you want to read the slides again at home, or if you could not be there
for some reason, then most of the talks are now available online at the
<a href="http://wiki.postgresql.org/wiki/PostgreSQL_Conference_Europe_Talks_2011">PostgreSQL Conference Europe Talks 2011</a> wiki page.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 26 Oct 2011 10:08:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/26-back-from-amsterdam.html</guid> </item> <item> <title>Implementing backups</title> <link>http://tapoueh.org/blog/2011/10/12-backup-strategy.html</link> <description><![CDATA[<p>I've been asked about my opinion on backup strategy and best practices, and it so happens that I have some kind of an opinion on the matter.</p> <p>I tend to think best practice here begins with properly defining the <em>backup plan</em> you want to implement. It's quite a complex matter, so be sure to ask yourself about your needs: what do you want to be protected from?</p> <center> <p><img src="../../../images/online-backup.jpg" alt=""></p> </center> <p>The two main things you want protection from are hardware loss (crash disaster, plane in the data center, fire, water flood, etc) and human error (
UPDATE without a where clause). Replication is an answer to the former,
archiving and dumps to the latter. You generally need both.</p>
<p>Often enough “backups” include WAL <em>archiving</em> and <em>shipping</em> and nightly or
weekly <em>base backups</em>, with some retention and some scripts or procedures
ready to set up <a href="http://www.postgresql.org/docs/9.1/static/continuous-archiving.html">Point In Time Recovery</a> and recover some data without
interfering with the WAL archiving and shipping. Of course with PostgreSQL
9.0 and 9.1, the <em>WAL Shipping</em> can be implemented with <em>streaming replication</em>
and you can even have a <em>Hot Standby</em>. But for backups you still want
archiving.</p>
<p>Mostly I still implement pg_dump -Fc nightly backups with a custom retention
(for example, 1 backup a month kept 2 years, 1 backup a week kept 6 or 12
months, 1 backup a night kept 1 to 2 weeks), when the database size allows
the pg_dump run to remain constrained in the <em>maintenance window</em>, if any.</p>
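<p>Such a retention rule is simple enough to sketch in Python. The thresholds below pick one value from each range given above, and the choice of Sunday dumps as the weekly ones and first-of-the-month dumps as the monthly ones is an assumption made for this example. The helper keep_dump is hypothetical, not part of any tool mentioned here.</p>

```python
from datetime import date

NIGHTLY_DAYS = 14    # keep every nightly dump for 2 weeks
WEEKLY_DAYS = 183    # keep weekly dumps for about 6 months
MONTHLY_DAYS = 730   # keep monthly dumps for 2 years


def keep_dump(dump_day, today):
    """Tell whether the dump taken on dump_day should still be kept."""
    age = (today - dump_day).days
    if age < 0:
        raise ValueError("dump is dated in the future")
    if age <= NIGHTLY_DAYS:
        return True
    # Assumption: the Sunday dump doubles as the weekly backup.
    if dump_day.weekday() == 6 and age <= WEEKLY_DAYS:
        return True
    # Assumption: the dump of the 1st doubles as the monthly backup.
    if dump_day.day == 1 and age <= MONTHLY_DAYS:
        return True
    return False


if __name__ == "__main__":
    today = date(2011, 10, 12)
    for day in (date(2011, 10, 5), date(2011, 9, 20),
                date(2011, 9, 18), date(2011, 1, 1), date(2009, 10, 1)):
        print(day, "keep" if keep_dump(day, today) else "drop")
```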
<p>Don't forget that while pg_dump runs, you can't roll out <em>DDL changes</em> to the
production system any more, so you want to be careful about this
<em>maintenance window</em> thing. When you have one.</p>
<p><em>Physical backups</em> don't block <em>rollouts</em>, but they often eat up a good
deal of the <em>IO bandwidth</em>, so you need to pick the right time to run them.
That's how you end up with a once-a-week base backup plus WAL <em>archiving</em>.</p>
<p>If you can't pg_dump production, maybe you can have <em>automated restore jobs</em>
from the <em>physical backups</em> that you then pg_dump -Fc, so that you still have
that. That can come in handy, really: you can't test your <em>major upgrade</em> out
of a <em>physical backup</em>.</p>
<p>Also, <strong><em>obviously</em></strong>, never consider your backup strategy implemented until you
have either <em>automated restores</em> in place or a regular schedule to exercise
them (<em>staging instances</em>, devel instances).</p>
<p>Then as far as the practical tools go, I tend to think that <a href="http://tapoueh.org/pgsql/pgstaging.html">pg_staging</a> is
worth its installation complexity, and for WAL archiving and base backup I
recommend <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> from <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a>, which is a very handy tool. When using
PostgreSQL 9.0 or 9.1, consider using <a href="http://packages.debian.org/experimental/skytools3-walmgr">walmgr3</a> so that it behaves nicely
alongside <em>streaming replication</em>.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 12 Oct 2011 22:22:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/12-backup-strategy.html</guid> </item> <item> <title>Extensions, applications</title> <link>http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</link> <description><![CDATA[<p>The <a href="http://2011.pgconf.eu/">annual PostgreSQL conference in Europe</a> takes place next week in Amsterdam, and I hope you already have your tickets, because this edition promises to be a very good vintage!</p>
<p>So I will be presenting how to use extensions; the English title is <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, and the idea is to see how to take advantage of extensions to better manage your database upgrades.</p> <p>The life cycle of production databases often includes a development database where the schema evolves as the developers need it to, and from time to time part of those changes is consolidated (into <em>rollouts</em>, scripts containing mostly
DDL) so as to deploy them in production, if possible with an intermediate
step in preproduction all the same.</p>
<p>Knowing what is deployed in development and how to derive from it the script to
run in production can sometimes be tedious. When it isn't,
the work has been done upstream, which is the sign of a good
organization, with the extra costs one can imagine.</p>
<p>The <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extensions</a> as found in PostgreSQL 9.1 let you
better handle this kind of case by optimizing that extra cost: it does not disappear,
but becomes an operational matter rather than remaining an organizational burden.</p>
<p>Alright, I'll leave you now, I have to get ready for the conference :)</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 10 Oct 2011 10:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/10-extensions-applicatives.html</guid> </item> <item> <title>Scaling Stored Procedures</title> <link>http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</link> <description><![CDATA[<p>In the news recently <em>stored procedures</em> were used as an excuse for moving logic away from the database layer to the application layer, and to migrate away from a powerful technology to a simpler one, now that there's no logic anymore in the database.</p> <p>It's not the way I would typically approach scaling problems, and apparently I'm not alone in the <em>Stored Procedures</em> camp. Did you read this nice blog post <a href="http://ora-00001.blogspot.com/2011/07/mythbusters-stored-procedures-edition.html">Mythbusters: Stored Procedures Edition</a> already? Well, it happens in another land than the one where my comfort zone is, but it still has some interesting things to say.</p> <p>I won't try and address all of the myths they attack in a single article. Let's pick the scalability problems: the two I think about are code management and performance. We are quite well equipped for that in PostgreSQL, really.</p> <p>For code maintenance we now have <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">PostgreSQL Extensions</a>, which allow you to pack all your procedures into separate <em>extensions</em>, and to maintain a version number and upgrade procedures for each of them. You can handle separate rollouts in development for going from
1.12 to 1.13 then 1.14 and, after the
developers tested it more completely and changed their mind again on the
best API they want to work with, 1.15, which is stamped ok for production.
At this point, ALTER EXTENSION ... UPDATE will happily apply all the rollouts
in sequence to upgrade from 1.12 straight to 1.15 in one go. And if you
prefer to bake a special careful script to handle that big jump, you can
also provide a specific extension--1.12--1.15.sql script.</p>
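<p>To make the chaining concrete, here is a small Python sketch of how a sequence of update steps can be resolved, preferring the path with the fewest scripts. It is an illustration of the idea only, using a plain breadth-first search and a hypothetical helper named upgrade_path; it is not PostgreSQL's actual code.</p>

```python
from collections import deque


def upgrade_path(scripts, source, target):
    """Return the shortest list of (old, new) steps from source to target.

    scripts is an iterable of (old_version, new_version) pairs, one per
    available update script. Returns None when no path exists.
    """
    graph = {}
    for old, new in scripts:
        graph.setdefault(old, []).append(new)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            # Turn the list of versions into consecutive steps.
            return list(zip(path, path[1:]))
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    steps = [("1.12", "1.13"), ("1.13", "1.14"), ("1.14", "1.15")]
    print(upgrade_path(steps, "1.12", "1.15"))
    # With a dedicated extension--1.12--1.15.sql script, the direct jump wins.
    print(upgrade_path(steps + [("1.12", "1.15")], "1.12", "1.15"))
```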
<p>Of course you're managing all those files with your favorite <em>SCM</em>, to answer
to some other myth from the blog reference we are loosely following.</p>
<center>
<p><a class="image-link" href="http://postgresqlrussia.org/articles/view/131">
<img src="../../../images/Moskva_DB_Tools.v3.png"></a></p>
</center>
<p>I wanted to talk about the other side of the scalability problem, which is
the operations side of it. What happens when you need to scale the database
in terms of its size and level of concurrent activity? PostgreSQL earned a
very good reputation for being able to scale up, but what about scaling out?
Surely, now that you're all in on <em>Stored Procedures</em>, that's going to be
a very bad situation?</p>
<p>Well, in fact, you're then in a very good position here, thanks to <a href="http://wiki.postgresql.org/wiki/PL/Proxy">PLproxy</a>.
This <em>extension</em> is a custom procedural language whose job is to handle a
cluster of database shards that all expose the same PL API, and it's very
good at doing that.</p>
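<p>As a rough illustration of the routing idea, here is a Python sketch that maps a key to one of a power-of-two number of shards by masking a hash value. The hash function (zlib.crc32) and the helper shard_for are stand-ins chosen for this example; PL/Proxy setups usually hash inside the database, for instance with hashtext(), so the shard numbers computed here will not match a real cluster.</p>

```python
import zlib


def shard_for(key, shard_count):
    """Pick a shard number in [0, shard_count) for the given text key.

    shard_count must be a power of two so that masking the low bits of
    the hash spreads keys over every shard.
    """
    if shard_count < 1 or shard_count & (shard_count - 1):
        raise ValueError("shard_count must be a power of two")
    # crc32 is only a stand-in hash for this sketch.
    hashed = zlib.crc32(key.encode("utf-8"))
    return hashed & (shard_count - 1)


if __name__ == "__main__":
    for username in ("alice", "bob", "carol"):
        print(username, "-> shard", shard_for(username, 4))
```

<p>Since every shard exposes the same procedures, the caller only ever needs the key to know where a call should run.</p>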
<p><em>Stored Procedures</em> are a very good tool to have, be sure to get comfortable
enough with them so that you can choose exactly when to use them. If you're
not sure about that, we at <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> will be happy to help you there!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 06 Oct 2011 18:23:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/06-scaling-with-stored-procedures.html</guid> </item> <item> <title>See you in Amsterdam</title> <link>http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</link> <description><![CDATA[<p>The next <a href="http://2011.pgconf.eu/">PostgreSQL conference</a> is approaching very fast now, I hope you have your ticket already: it's a very promising event! If you want some help in deciding whether to register or not, just have another look at <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">the schedule</a>. Pick the talks you want to see. It's hard, given how packed with good ones the schedule is. When your mind is all set, review the list. Registered?</p> <p>I'll be presenting another talk about extensions, but this time I've geared it toward use cases, with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>. The idea is not to talk about how to make PostgreSQL play fair with extensions including at <em>dump</em> and <em>restore</em> times, that's already done and I've been talking only too much about it. The idea this time is to figure out how much you get from this feature.</p> <p>If you ever felt like something is missing in your processes between pushing rollouts in devel environments and refining them as developers are testing and preparing something for the live databases, then we have something for you here. Including how to easily compare state between production and development, but without having to guess or reverse engineer anything.</p> <p>Yeah, extensions are all about getting even more professional! A great tool you'll be happy to master!</p> <p>And now I need to prepare a damn good slide deck, right? See you there! :)</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 04 Oct 2011 14:25:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/10/04-see-you-in-Amsterdam.html</guid> </item> <item> <title>PostgreSQL in Amsterdam</title> <link>http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</link> <description><![CDATA[<p>In less than a month the European PostgreSQL conference, <a href="http://2011.pgconf.eu/">pgconf.eu</a>, takes place. It is four days devoted to your favorite DBMS, where you can meet the European community, made up of users, companies of all sizes, developers, and participants of every kind.</p>
<p>It's the place to go to learn how the project works, understand the impact of new versions on your architecture, have an in-depth technical discussion about that feature you would like to see land in the next release, or simply get a sense of the tremendous energy that goes into this project!</p> <p>Of course <a href="http://2ndquadrant.fr/">2ndQuadrant</a> will be there, and we will present several of our <a href="http://www.2ndquadrant.com/fr/les-fonctionnalites-de-postgresql-91/">PostgreSQL 9.1 contributions</a>. It starts with the full-day training by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg</a>, <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/162-performance-from-start-to-crash/">Performance From Start to Crash</a>: if you want to learn how to approach the performance of a PostgreSQL server from the international <em>leader</em> in the field, author of the book <a href="http://www.amazon.fr/Bases-donn%C3%A9es-PostgreSQL-Gregory-Smith/dp/274402483X/ref=sr_1_1?ie=UTF8&qid=1316183931&sr=8-1">Bases de données PostgreSQL 9.0</a>, book your seat quickly!</p> <p>The regular talks start the next day, and over three days the list of talks from our <a href="http://www.2ndquadrant.com/fr/profil-de-lequipe/">2ndQuadrant team</a> is rather hearty. Let's have a look.</p> <p>We begin with <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/144-migration-to-postgresql-a-holistic-view/">Migration to PostgreSQL - a holistic view</a> by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/78-harald-armin-massa/">Harald Armin Massa</a>, who offers an interesting point of view on the reasons that hold some migrations back.
Then <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/34-gianni-ciolli/">Gianni Ciolli</a> will present <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/159-look-out-the-window-functions-and-free-your-sql/">Look Out The Window Functions (and free your SQL)</a>, or how to solve complex problems simply when you have advanced tools at hand.</p> <p>Another talk not to be missed, <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/156-synchronous-replication-and-durability-tuning/">Synchronous Replication and Durability Tuning</a>, details how to make the most of PostgreSQL 9.1 to get the data durability guarantees you want in your application. This talk is given by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/81-greg-smith/">Greg Smith</a> and <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>. The latter developed <em>synchronous replication</em>, and <em>Hot Standby</em> before that. You won't find anyone in the world better placed to give this talk!</p> <p>The next two talks from our <a href="http://expert-postgresql.fr/">PostgreSQL experts</a>, as we keep reading the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/">pgconf.eu schedule</a> in order, take place at the same time.
The choice won't be easy between <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/158-improving-vacuum-suction/">Improving VACUUM Suction</a> by Greg again, and a comparison of <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/183-londiste-3-et-slony-21/">londiste 3 and slony 2.1</a> by <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/57-cedric-villemain/">Cédric Villemain</a>, in French.</p> <p>Next up is <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/138-extensions-are-good-for-business-logic/">Extensions are good for business logic</a>, which I will present myself; you can see my talk on the page that bears my name: <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/14-dimitri-fontaine/">Dimitri Fontaine</a>. It is a talk in English that details how to use extensions to maintain the <em>stored procedures</em> part of an application.</p> <p>And to close the second day of 2ndQuadrant talks, you can learn with Gianni about <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/160-debugging-complex-sql-queries-with-writable-ctes/">Debugging complex SQL queries with writable CTEs</a>, a feature contributed to the project by another <a href="http://www.2ndquadrant.com/fr/contact/">2ndQuadrant</a> consultant, Marko Tiikkaja.</p> <p>And there is still one day left! We are not lying when we say the schedule is full! The last day of the conference is not the least interesting, and I hope you will have kept a little energy to follow…</p> <p><a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/speaker/17-simon-riggs/">Simon Riggs</a>, who will present his vision of the <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/199-postgresql-roadmap/">PostgreSQL Roadmap</a> for the coming years.
It is of course only his personal vision, but when you take stock of these last 7 years of <a href="http://www.2ndquadrant.com/fr/histoire-postgresql/">contributions to PostgreSQL</a>, you see how much weight his personal opinion can carry in the development of the project.</p> <p>Next up, Greg's talk on his favorite subject: <a href="http://www.postgresql.eu/events/schedule/pgconfeu2011/session/157-bottom-up-database-benchmarking/">Bottom-up Database Benchmarking</a>. Everything you always wanted to know about measuring the performance of your databases, but never dared to ask. Something along those lines anyway :)</p> <p>Of course other talks are available and will catch your attention; this post only presents those given by the <a href="http://expert-postgresql.fr/">PostgreSQL experts</a> from <a href="http://www.2ndquadrant.com/fr/expertise-postgresql/">2ndQuadrant</a>. I wish you all a good conference, and I hope to have the pleasure of meeting you in Amsterdam next month!</p> ]]></description>
<p><a href="http://packages.debian.org/experimental/skytools3-walmgr">WalMgr</a> is the Skytools component that manages <em>WAL shipping</em> for you, and archiving too. It knows how to prepare your master and standby setup, how to take a base backup and push it to the standby's system, how to archive (at the satndby) master's WAL files as they are produced and have the standby restore from this archive.</p> <p>What's new in<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 27 Sep 2011 11:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/27-pgconf-eu.html</guid> </item> <item> <title>Skytools3: walmgr</title> <link>http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</link> <description><![CDATA[<p>Let's begin the <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a> documentation effort, which is long overdue. The code is waiting for you over at <a href="https://github.com/markokr/skytools">github</a>, and is stable and working. Why is it still in <em>release candidate</em> status, I hear you asking? Well because it's missing updated documentation.</p>
walmgr from Skytools 3 is its support for <em>Streaming
Replication</em> that made its way into PostgreSQL 9.0 and is even more useful in
PostgreSQL 9.1 (better monitoring, synchronous replication option).</p>
<h2>Getting ready</h2>
<p class="first">Now, I'm using debian here, and a build virtual machine where I'm doing the
<em>backporting</em> work. As <a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> is now out, let's use that.</p>
<pre class="src">
:~$ pg_lsclusters
Version Cluster Port Status Owner Data directory
8.4 main 5432 online postgres /var/lib/postgresql/8.4/main ...
9.0 main 5433 online postgres /var/lib/postgresql/9.0/main ...
9.1 main 5434 online postgres /var/lib/postgresql/9.1/main ...
</pre>
<p>After some editing of the configuration files (enabling <em>hot standby</em> and
switching pg_hba.conf to trust for the sake of this example), we can see
that the cluster is ready to be abused:</p>
<pre class="src">
:~$ sudo pg_ctlcluster 9.1 main restart
:~$ psql --cluster 9.1/main -U postgres \
-c <span style="color: #ad7fa8; font-style: italic;">"select name, setting from pg_settings where name in ('max_wal_senders', 'wal_level')"</span>
      name       |   setting
-----------------+-------------
 max_wal_senders | 1
 wal_level       | hot_standby
(2 rows)

:~$ sudo mkdir -p /etc/walshipping/9.1/main /var/lib/postgresql/walshipping
:~$ sudo chown -R postgres:postgres /etc/walshipping /var/lib/postgresql/walshipping

:~$ ssh-keygen -t dsa
:~/.ssh$ cp id_dsa.pub authorized_keys
:~$ ssh localhost
</pre>
<p>So the order of operations is to prepare a standby, then have it restore from the archives, then activate WAL streaming and check that the setup allows the standby to switch back and forth between streaming and the archives.</p>
<h2>Setting walmgr</h2>
<p class="first">To prepare the standby, we will do a <em>base backup</em> of the master. That step is handled by walmgr, so we first need to set it up. Here's the sample
master.ini file:</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span> = wal-master
<span style="color: #eeeeec;">logfile</span> = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">pidfile</span> = /var/run/postgresql/%(job_name)s.pid
<span style="color: #eeeeec;">use_skylog</span> = 0
<span style="color: #eeeeec;">master_db</span> = port=5434 dbname=template1
<span style="color: #eeeeec;">master_data</span> = /var/lib/postgresql/9.1/main/
<span style="color: #eeeeec;">master_config</span> = /etc/postgresql/9.1/main/postgresql.conf
<span style="color: #eeeeec;">master_bin</span> = /usr/lib/postgresql/9.1/bin

<span style="color: #888a85;"># </span><span style="color: #888a85;">set this only if you can afford database restarts during setup and stop.</span>
<span style="color: #eeeeec;">master_restart_cmd</span> = pg_ctlcluster 9.1 main restart

<span style="color: #eeeeec;">slave</span> = 127.0.0.1
<span style="color: #eeeeec;">slave_config</span> = /etc/walshipping/9.1/main/standby.ini

<span style="color: #eeeeec;">walmgr_data</span> = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span> = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span> = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span> = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span> = %(walmgr_data)s/config.backup

<span style="color: #888a85;"># </span><span style="color: #888a85;">syncdaemon update frequency</span>
<span style="color: #eeeeec;">loop_delay</span> = 10.0
<span style="color: #888a85;"># </span><span style="color: #888a85;">use record based shipping available since 8.2</span>
<span style="color: #eeeeec;">use_xlog_functions</span> = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">pass -z to rsync, useful on low bandwidth links</span>
<span style="color: #eeeeec;">compression</span> = 0

<span style="color: #888a85;"># </span><span style="color: #888a85;">keep symlinks for pg_xlog and pg_log</span>
<span style="color: #eeeeec;">keep_symlinks</span> = 1

<span style="color: #888a85;"># </span><span style="color: #888a85;">tell walmgr to set wal_level to hot_standby during setup</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">hot_standby = 1</span>

<span style="color: #888a85;"># </span><span style="color: #888a85;">periodic sync</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">command_interval = 600</span>
<span style="color: #888a85;">#</span><span style="color: #888a85;">periodic_command = /var/lib/postgresql/walshipping/periodic.sh</span>
</pre>
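<p>The <code>%(job_name)s</code> and <code>%(walmgr_data)s</code> references in that file use the interpolation syntax of Python's standard <code>configparser</code> module. Here's a minimal sketch of how such values expand, assuming plain <code>configparser</code> behaviour (Skytools may wrap it in its own configuration layer, and the trimmed-down sample below only keeps a few of the keys shown above):</p>

```python
import configparser

# A few keys copied from the master.ini sample; the rest is omitted.
SAMPLE = """
[walmgr]
job_name = wal-master
logfile = /var/log/postgresql/%(job_name)s.log
walmgr_data = /var/lib/postgresql/walshipping/9.1/main
completed_wals = %(walmgr_data)s/logs.complete
"""

parser = configparser.ConfigParser()  # default interpolation handles %(name)s
parser.read_string(SAMPLE)
section = parser["walmgr"]

# Values are expanded when they are read, using other keys of the section.
logfile = section["logfile"]
completed_wals = section["completed_wals"]
print(logfile)
print(completed_wals)
```

<p>Changing <code>walmgr_data</code> in one place is then enough to move all the directories that are derived from it.</p>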
<p>And the /etc/walshipping/9.1/main/standby.ini companion:</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">walmgr</span>]
<span style="color: #eeeeec;">job_name</span> = wal-standby
<span style="color: #eeeeec;">logfile</span> = /var/log/postgresql/%(job_name)s.log
<span style="color: #eeeeec;">use_skylog</span> = 0
<span style="color: #eeeeec;">slave_data</span> = /var/lib/postgresql/9.1/standby
<span style="color: #eeeeec;">slave_bin</span> = /usr/lib/postgresql/9.1/bin
<span style="color: #eeeeec;">slave_stop_cmd</span> = pg_ctlcluster 9.1 standby stop
<span style="color: #eeeeec;">slave_start_cmd</span> = pg_ctlcluster 9.1 standby start
<span style="color: #eeeeec;">slave_config_dir</span> = /etc/postgresql/9.1/standby/

<span style="color: #eeeeec;">walmgr_data</span> = /var/lib/postgresql/walshipping/9.1/main
<span style="color: #eeeeec;">completed_wals</span> = %(walmgr_data)s/logs.complete
<span style="color: #eeeeec;">partial_wals</span> = %(walmgr_data)s/logs.partial
<span style="color: #eeeeec;">full_backup</span> = %(walmgr_data)s/data.master
<span style="color: #eeeeec;">config_backup</span> = %(walmgr_data)s/config.backup

<span style="color: #eeeeec;">backup_datadir</span> = no
<span style="color: #eeeeec;">keep_backups</span> = 0
<span style="color: #888a85;"># </span><span style="color: #888a85;">archive_command =</span>

<span style="color: #888a85;"># </span><span style="color: #888a85;">primary database connect string for hot standby -- enabling</span>
<span style="color: #888a85;"># </span><span style="color: #888a85;">this will cause the slave to be started in hot standby mode.</span>
<span style="color: #eeeeec;">primary_conninfo</span> = host=127.0.0.1 port=5434 user=postgres
</pre>
<p>And let's get started:</p>
<pre class="src">
:~$ cp standby.ini /etc/walshipping/9.1/main/
:~$ walmgr3 -v master.ini setup
2011-09-21 16:57:05,685 30450 INFO Configuring WAL archiving
2011-09-21 16:57:05,687 30450 DEBUG found 'archive_mode' in config -- enabling it
2011-09-21 16:57:05,687 30450 DEBUG found 'wal_level' in config -- setting to 'archive'
2011-09-21 16:57:05,688 30450 DEBUG modifying configuration: {'archive_mode': 'on', 'wal_level': 'archive', 'archive_command': '/usr/bin/walmgr3 /var/lib/postgresql/master.ini xarchive %p %f'}
2011-09-21 16:57:05,688 30450 DEBUG found parameter archive_mode with value ''off''
2011-09-21 16:57:05,690 30450 DEBUG found parameter wal_level with value ''minimal''
2011-09-21 16:57:05,690 30450 DEBUG found parameter archive_command with value ''''
2011-09-21 16:57:05,691 30450 INFO Restarting postmaster
2011-09-21 16:57:05,691 30450 DEBUG Execute cmd: 'pg_ctlcluster 9.1 main restart'
2011-09-21 16:57:09,404 30450 DEBUG Execute cmd: 'ssh' '-Tn' '-o' 'Batchmode=yes' '-o' 'StrictHostKeyChecking=no' '127.0.0.1' '/usr/bin/walmgr3' '/etc/walshipping/9.1/main/standby.ini' 'setup'
2011-09-21 16:57:09,712 30450 INFO Done

postgres@squeeze64:~$ walmgr3 master.ini backup
2011-09-21 17:00:17,259 30702 INFO Backup lock obtained.
2011-09-21 17:00:17,277 30692 INFO Execute SQL: select pg_start_backup('FullBackup'); [port=5434 dbname=template1]
2011-09-21 17:00:17,791 30712 INFO Removing expired backup directory: /var/lib/postgresql/walshipping/9.1/main/data.master
2011-09-21 17:00:18,200 30692 INFO Checking tablespaces
2011-09-21 17:00:18,202 30692 INFO pg_log does not exist, skipping
2011-09-21 17:00:18,259 30692 INFO Backup conf files from /etc/postgresql/9.1/main
2011-09-21 17:00:18,590 30731 INFO First useful WAL file is: 000000010000000200000092
2011-09-21 17:00:19,901 30759 INFO Backup lock released.
2011-09-21 17:00:19,919 30692 INFO Full backup successful

:~$ walmgr3 /etc/walshipping/9.1/main/standby.ini listbackups
List of backups:
Backup set      Timestamp                Label       First WAL
--------------- ------------------------ ----------- ------------------------
data.master     2011-09-21 17:00:17 CEST FullBackup  000000010000000200000092
</pre>
<p>Following articles will show how to manage that archive and how to go from that to a <em>Hot Standby</em> fed by either <em>Streaming Replication</em> or <em>Archives</em>.</p>
]]></description>
<h2>New features</h2> <p class="first">Among the features you will find dependencies management and<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 21 Sep 2011 17:21:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/21-skytools-walmgr-part-1.html</guid> </item> <item> <title>el-get-3.1</title> <link>http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</link> <description><![CDATA[<p>The <a href="https://github.com/dimitri/el-get">el-get</a> project releases its new stable version,
3.1. This new release fixes bugs, adds a host of new recipes (we have 420 of them and counting) and some nice new features too. You really want to upgrade.</p>
M-x
el-get-list-packages, that you should try as soon as possible. Of course,
don't miss M-x el-get-self-update that eases the process somehow.</p>
<center>
<p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p>
</center>
<p>This shows the result of M-x el-get-list-packages. The packages that don't
have a description are the ones from <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el|tar)(%5C.gz)%3F%24">emacswiki</a>, which doesn't provide a listing
of the filename <em>and</em> the first line of the file (it usually follows the
format ;;; filename.el --- description here). As we don't want to mirror
the website just to be able to provide descriptions, we just don't have them
for now.</p>
<p>Another nice new feature, contributed by a user that wanted to self-learn
<a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/index.html">elisp</a>, is the el-get-user-package-directory support. Just place in there
some init-my-package.el files, and when <em>el-get</em> wants to init the my-package
package, it will load that file for you. That helps managing your setup,
and I'm already using that in my own ~/.emacs.d/ repository.</p>
<h2>Upgrading</h2>
<p class="first">The upgrading is to be done with some care, though, because you need to edit
your packaging setup. The el-get-sources variable used to be both where to
setup extra recipes and the list of packages you want to have installed, and
several people rightfully insisted that I should change that. I've been
slow to be convinced, but there it is, they were right.</p>
<p>So now, <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> works from the current status of packages and will init all
those packages you have installed. Which means that you just M-x
el-get-install a package and don't think about it anymore. If you need to
override this behavior, it's still possible to do so by specifying the whole
list of packages you want initialized (and installed if necessary) on the
(el-get 'sync ...) call.</p>
<p>That latter setup is useful if you want to share your el-get selection across
several machines.</p>
]]></description>
<p>We are starting to see articles in the French press picking up the news, and I'm pleased to mention the one from <a href="http://www.programmez.com/actualites.php?titre_actu=Sortie-de-PostgreSQL-91-!&id_actu=10190">programmez.com</a>, which announces "an unmatched extension system". As the developer of <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">Extensions</a> in PostgreSQL, I can only agree with them, and feel flattered too :)</p> <p>Happy testing to all, and happy upgrading to the luckiest among you!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 16 Sep 2011 14:13:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/16-el-get-3.1.html</guid> </item> <item> <title>PostgreSQL 9.1</title> <link>http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</link> <description><![CDATA[<p><a href="http://www.postgresql.org/about/news.1349">PostgreSQL 9.1</a> has hit the shelves! Don't have this new version in production yet? Haven't yet evaluated why you should consider migrating to it? There are many good reasons to move to this version, and few pitfalls.</p>
<p>We are facing here a security problem that is very well described in the humorous strip about <a href="http://xkcd.com/327/">Little Bobby Tables</a>, which I recommend you read. The idea is simple, but putting countermeasures in place is full of subtle pitfalls, unless you use the solution described below.</p> <center> <p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p> </center> <p>When you send a SQL query to PostgreSQL, it contains a jumble of SQL keywords and user data. In the query<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 14 Sep 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/19-sortie-de-9.1.html</guid> </item> <item> <title>Avoiding SQL injections</title> <link>http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</link> <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in PostgreSQL, and mentioned that using those techniques to protect the data inserted into SQL queries was not a good idea, given that PostgreSQL offers a much better suited feature.</p>
SELECT colname FROM table WHERE pk = 1234;
the element 1234 is data supplied to PostgreSQL. With other data types
we talk about a <em>literal</em>, which may or may not be
<em>decorated</em>. An example?</p>
<pre class="src">
# SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
|      ?column?       | pg_typeof |    date    | pg_typeof |
+---------------------+-----------+------------+-----------+
| undecorated literal | unknown   | 2011-09-07 | date      |
(1 row) </pre>
<p>Beyond the data type aspect (an undecorated literal is of type <em>unknown</em> until some operation forces its type, which is what allows polymorphism in PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart from the parameters it contains. It knows how to do that, of course: you just enclose the values in single quotes, or use the notation known as <a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a>. But if you take no precautions, the user can terminate the quoted sequence from the form's input field…</p> <p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is PostgreSQL's standard client library; it provides the connection <em>API</em> and offers a function named <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>. This function exposes a mechanism available in PostgreSQL's communication protocol itself: the SQL and the data it contains can be sent in two separate parts of the message rather than mixed together. 
That way, the server no longer has to guess where the data starts and ends in the query; it just looks in the separate array holding the data when it needs it.</p> <p>No more SQL injections!</p> <p>Note: this function is exposed in most connection drivers, even in PHP, whose popularity and exposure prompt me to give a more precise reference: use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>. Its benefit is not merely syntactic, it goes all the way down to how data is exchanged between the client (your PHP code) and the server (PostgreSQL).</p> ]]></description><p>We are facing here a security problem that is very well described in the humorous strip about <a href="http://xkcd.com/327/">Little Bobby Tables</a>, which I recommend you read. The idea is simple, but putting countermeasures in place is full of subtle pitfalls, unless you use the solution described below.</p> <center> <p><img src="http://imgs.xkcd.com/comics/exploits_of_a_mom.png" alt=""></p> </center> <p>When you send a SQL query to PostgreSQL, it contains a jumble of SQL keywords and user data. In the query<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-eviter-les-injections-sql.html</guid> </item> <item> <title>Avoiding SQL injections</title> <link>http://tapoueh.org/blog/2011/09/07-requete-parametree.html</link> <description><![CDATA[<p>Last time we talked about the <a href="http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html">string escaping</a> rules in PostgreSQL, and mentioned that using those techniques to protect the data inserted into SQL queries was not a good idea, given that PostgreSQL offers a much better suited feature.</p>
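<p>Here is a minimal sketch of that separation between the SQL text and its data, written in Python. To stay self-contained it uses the standard library's <code>sqlite3</code> module as a stand-in for a PostgreSQL driver, so this is not libpq and the mechanics differ; the principle of handing over the values apart from the query text is the same. The <code>students</code> table and its content are made up for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (pk INTEGER PRIMARY KEY, name TEXT)")

# The query text only contains placeholders; the values travel separately.
insert = "INSERT INTO students (pk, name) VALUES (?, ?)"
conn.execute(insert, (1234, "Alice"))

# A hostile value is stored as plain data, it is never parsed as SQL.
hostile = "Robert'); DROP TABLE students;--"
conn.execute(insert, (1235, hostile))

rows = conn.execute("SELECT name FROM students WHERE pk = ?", (1235,)).fetchall()
count = conn.execute("SELECT count(*) FROM students").fetchone()[0]
print(rows, count)
```

<p>The table survives and the odd-looking name comes back unchanged, because at no point was it spliced into the query string.</p>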
SELECT colname FROM table WHERE pk = 1234;
the element 1234 is data supplied to PostgreSQL. With other data types
we talk about a <em>literal</em>, which may or may not be
<em>decorated</em>. An example?</p>
<pre class="src">
# SELECT <span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>, pg_typeof(<span style="color: #ad7fa8; font-style: italic;">'undecorated literal'</span>),
date <span style="color: #ad7fa8; font-style: italic;">'today'</span>, pg_typeof(date <span style="color: #ad7fa8; font-style: italic;">'today'</span>);
|      ?column?       | pg_typeof |    date    | pg_typeof |
+---------------------+-----------+------------+-----------+
| undecorated literal | unknown   | 2011-09-07 | date      |
(1 row) </pre>
<p>Beyond the data type aspect (an undecorated literal is of type <em>unknown</em> until some operation forces its type, which is what allows polymorphism in PostgreSQL), we see here that PostgreSQL has to tell the SQL itself apart from the parameters it contains. It knows how to do that, of course: you just enclose the values in single quotes, or use the notation known as <a href="http://docs.postgresqlfr.org/9.0/sql-syntax.html#sql-syntax-dollar-quoting">dollar quoting</a>. But if you take no precautions, the user can terminate the quoted sequence from the form's input field…</p> <p><a href="http://docs.postgresql.fr/9.1/libpq.html">libpq</a> is PostgreSQL's standard client library; it provides the connection <em>API</em> and offers a function named <a href="http://docs.postgresql.fr/9.1/libpq-exec.html#libpq-pqexecparams">PQexecParams</a>. This function exposes a mechanism available in PostgreSQL's communication protocol itself: the SQL and the data it contains can be sent in two separate parts of the message rather than mixed together. 
That way, the server no longer has to guess where the data starts and ends in the query; it just looks in the separate array holding the data when it needs it.</p> <p>No more SQL injections!</p> <p>Note: this function is exposed in most connection drivers, even in PHP, whose popularity and exposure prompt me to give a more precise reference: use <a href="http://fr2.php.net/manual/en/function.pg-query-params.php">pg_query_params</a>. Its benefit is not merely syntactic, it goes all the way down to how data is exchanged between the client (your PHP code) and the server (PostgreSQL).</p> ]]></description><p>We're now thinking of supporting the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 07 Sep 2011 11:36:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/07-requete-parametree.html</guid> </item> <item> <title>PostgreSQL and debian</title> <link>http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</link> <description><![CDATA[<p>After talking about it for a very long time, work finally did begin! I'm talking about the <a href="https://github.com/dimitri/apt.postgresql.org">apt.postgresql.org</a> build system that will allow us, in the long run, to propose
debian versions of binary packages for <a href="http://www.postgresql.org/">PostgreSQL</a> and its extensions, compiled for a bunch of debian and ubuntu versions.</p>
i386 and amd64 architectures for lenny,
squeeze, wheezy and sid, and also for maverick and natty, maybe oneiric too
while at it.</p>
<p>It's still the very beginning of the effort, and it was triggered by the
decision to move sid to 9.1. While it's a good decision in itself, I still
hate to have to pick only one PostgreSQL version per debian stable release
when we have all the technical support we need to be able to support all
stable releases that <em>upstream</em> is willing to maintain. If you've been living
under a rock, or if you couldn't care less about debian choices, the problem
here for debian is ensuring security (and bug fix) updates for PostgreSQL:
they promise in the social contract that they will handle the job just fine, and
don't want to have to do it without support from PostgreSQL if a <em>debian stable</em>
release contains a deprecated PostgreSQL version.</p>
<p>That opens the door for the PostgreSQL community to handle the packaging of its
solutions as a service to its debian users. We intend to open with support
for 8.4, 9.0 and 9.1, and maybe 8.3 too, as <a href="http://qa.debian.org/developer.php?login=myon">Christoph Berg</a> is making good
progress on this front. See, it's teamwork here!</p>
<p>We still have more work to do, and setting up the build environment so that
we are able to provide the packages for so many targets will indeed be
interesting. Getting there, one step after another.</p>
]]></description>
<p>The<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 05 Sep 2011 17:14:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/09/05-apt-postgresql-org.html</guid> </item> <item> <title>pg_restore -L & pg_staging</title> <link>http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</link> <description><![CDATA[<p>On the <a href="http://archives.postgresql.org/pgsql-hackers">PostgreSQL Hackers</a> mailing lists, <a href="http://people.planetpostgresql.org/andrew/">Andrew Dunstan</a> just proposed some new options for
pg_dump and pg_restore to ease our lives. One of the answers was talking about some scripts available to exploit the <a href="http://www.postgresql.org/docs/9.0/static/app-pgrestore.html">pg_restore</a> listing that you play with using options -l and -L, or the long name versions --list and --use-list. The <a href="../../../pgsql/pgstaging.html">pg_staging</a> tool allows you to easily exploit those lists too.</p>
pg_restore list is just a listing, one object per line, of all the objects
contained in a <em>custom</em> dump, that is one made with pg_dump -Fc. You can
then tweak this listing in order to comment out some objects (prepending a ;
to the line where you find it), and give your hacked file back to pg_restore
--use-list so that it will skip them.</p>
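<p>To make that editing step concrete, here's a small sketch in Python of commenting out the TABLE DATA entries of some tables in such a listing. The <code>skip_table_data</code> helper is hypothetical: it is not part of pg_restore nor of pg_staging, and it assumes entries shaped like "id; oid oid TYPE schema name owner".</p>

```python
import re

# Matches entries such as "1853; 0 174935 TABLE DATA public parallel dimitri".
TABLE_DATA = re.compile(r"^\d+; \d+ \d+ TABLE DATA (\S+) (\S+) ")

def skip_table_data(listing, tables):
    """Return the listing with TABLE DATA entries of `tables` commented out."""
    out = []
    for line in listing.splitlines():
        match = TABLE_DATA.match(line)
        if match and match.group(2) in tables:
            line = ";" + line  # pg_restore --use-list ignores lines starting with ;
        out.append(line)
    return "\n".join(out)

sample = "\n".join([
    "1536; 1259 174935 TABLE public parallel dimitri",
    "1853; 0 174935 TABLE DATA public parallel dimitri",
    "1854; 0 174943 TABLE DATA public partial dimitri",
])
filtered = skip_table_data(sample, {"parallel"})
print(filtered)
```

<p>The table definition is kept and only its data line gets the leading semicolon, so the restored database still has the (empty) table.</p>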
<p>What's pretty useful here, among other things, is that a table will in
fact have more than one line in the listing. One is for the TABLE definition,
another one for the TABLE DATA. That's how pg_staging is able to provide you
with options for only restoring some <em>schemas</em>, some <em>schemas_nodata</em> and even
some <em>tablename_nodata_regexp</em>, to use the configuration option
names directly.</p>
<p>How do you do a very simple exclusion of some table's data when restoring a
dump, you may ask? Here we go. Let's first prepare an environment,
where I have only a <a href="http://www.postgresql.org/">PostgreSQL</a> server running.</p>
<pre class="src">
$ git clone git://github.com/dimitri/pg_staging.git
$ git clone git://github.com/dimitri/pgloader.git
$ for s in */*.sql; do psql -f $s; done
$ pg_dump -Fc > pgloader.dump
</pre>
<p>Now I have a dump with some nearly random SQL objects in it, let's filter
out the tables named <em>reformat</em> and <em>parallel</em> from that. We will take the
sample setup from the pg_staging project. Going the quick route, we will
not even change the default sample database name that's used, which is
postgres. After all, the catalog command of pg_staging that we're using
here is a <em>developer</em> command, you're supposed to be using pg_staging for a
lot more services than just this one.</p>
<pre class="src">
$ cp pg_staging/pg_staging.ini .
$ (echo <span style="color: #bc8f8f;">"schemas = public"</span>;
   echo <span style="color: #bc8f8f;">"tablename_nodata_regexp = parallel,reformat"</span>) \
  >> pg_staging.ini
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
  | python pg_staging/pg_staging.py -c pg_staging.ini
; Archive created at Mon Aug 29 17:17:49 2011
;
; [EDITED OUTPUT]
;
; Selected TOC Entries:
;
3; 2615 2200 SCHEMA - public postgres
1864; 0 0 COMMENT - SCHEMA public postgres
1536; 1259 174935 TABLE public parallel dimitri
1537; 1259 174943 TABLE public partial dimitri
1538; 1259 174951 TABLE public reformat dimitri
;1853; 0 174935 TABLE DATA public parallel dimitri
1854; 0 174943 TABLE DATA public partial dimitri
;1855; 0 174951 TABLE DATA public reformat dimitri
1834; 2606 174942 CONSTRAINT public parallel_pkey dimitri
1836; 2606 174950 CONSTRAINT public partial_pkey dimitri
1838; 2606 174955 CONSTRAINT public reformat_pkey dimitri
</pre>
<p>We can see that the objects are indeed skipped. Now, here is how to actually run
pg_restore with that listing:</p>
<pre class="src">
$ createdb foo
$ echo <span style="color: #bc8f8f;">"catalog postgres pgloader.dump"</span> \
  | python pg_staging/pg_staging.py -c pg_staging.ini > short.list
$ pg_restore -L short.list -d foo pgloader.dump
</pre>
<p>The little bonus with using
pg_staging is that when filtering out a <em>schema</em>
it will track all tables and triggers from that schema, and also the
functions used in the trigger definition. Which is not as easy as it
sounds, believe me!</p>
<p>The practical use case is when filtering out PGQ and Londiste, then the PGQ
triggers will automatically be skipped by pg_staging rather than polluting
the pg_restore logs because the CREATE TRIGGER command could not find the
necessary implementation procedure.</p>
]]></description>
<p>Here are the slides from the <a href="http://www.char11.org/">CHAR(11)</a> talk I made last month, about that very subject:</p> <center> <p><a class="image-link" href="../../../images/confs/CHAR_2011_Skytools3.pdf"> <img src="../../../images/confs/CHAR_2011_Skytools3.png"></a></p> </center> <p>The new version comes with a lot of new features.<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 29 Aug 2011 18:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/29-pgstaging-and-pgrestore-listing.html</guid> </item> <item> <title>Skytools, version 3</title> <link>http://tapoueh.org/blog/2011/08/26-skytools3.html</link> <description><![CDATA[<p>You can find <a href="http://packages.debian.org/source/experimental/skytools3">skytools3</a> in debian experimental already, it's in <em>release candidate</em> status. What's missing is the documentation, so here's an idea: I'm going to make a blog post series about <a href="https://github.com/markokr/skytools">skytools</a> next features, how to use them, what they are good for, etc. This first article of the series will just list what are those new features.</p>
PGQ now is able to
duplicate the queue events from one node to the next, so that it's able to
manage <em>switching over</em>. To do that we have three types of nodes now, <em>root</em>,
<em>branch</em> and <em>leaf</em>. PGQ also supports <em>cooperative consumers</em>, meaning that you
can share the processing load among many <em>consumers</em>, or workers.</p>
<p>Londiste now benefits from the <em>switch over</em> feature, and is packed with new
little features like add &lt;table&gt; --create, the new --trigger-flags argument,
and the new --handler thing (to do e.g. partial table replication). Let's
not forget the much awaited execute &lt;script&gt; command that allows you to include
DDL commands into the replication stream, nor the <em>parallel</em> COPY support that
will boost your initial setup.</p>
<p>walmgr in the new version behaves correctly when using <a href="http://www.postgresql.org">PostgreSQL</a> 9.0.
Meaning that as soon as no more <em>WAL</em> files are available in the archives, it
returns an error code to the <em>archiver</em> so that the server switches to
<em>streaming</em> live from the primary_conninfo, then back to replaying the files
from the archive if the connection were to fail, etc. All in all, it just
works.</p>
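<p>The exit-status contract described here can be sketched in a few lines of Python. This is only an illustration of the idea, not walmgr's code: the <code>restore_wal</code> function and the directory layout are assumptions made for the example.</p>

```python
import shutil
import tempfile
from pathlib import Path

def restore_wal(archive_dir, wal_name, dest_path):
    """Copy one WAL file out of the archive; return a shell-style status."""
    source = Path(archive_dir) / wal_name
    if not source.is_file():
        # Non-zero status: nothing left in the archive, so the standby
        # can fall back to streaming from primary_conninfo.
        return 1
    shutil.copy2(source, dest_path)
    return 0

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    archive.mkdir()
    (archive / "000000010000000200000092").write_bytes(b"wal")
    restored = Path(tmp) / "restored"
    found = restore_wal(archive, "000000010000000200000092", restored)
    missing = restore_wal(archive, "000000010000000200000093", restored)
    print(found, missing)
```

<p>A file present in the archive yields status 0, while the next file that has not been archived yet yields a non-zero status, which is the signal the server is waiting for.</p>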
<p>Details to follow here, stay tuned!</p>
]]></description>
<p>Rather than talking about what <em>pgfincore</em> is all about (<em>A set of functions to manage pages in memory from PostgreSQL</em>), I will talk about its packaging and support as a <em>debian package</em>. Here's the first example of a modern multi-version packaging I have to offer. <a href="https://github.com/dimitri/pgfincore/tree/master/debian">pgfincore packaging</a> supports building for<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 26 Aug 2011 21:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/26-skytools3.html</guid> </item> <item> <title>pgfincore in debian</title> <link>http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</link> <description><![CDATA[<p>As of pretty recently, <a href="http://villemain.org/projects/pgfincore">pgfincore</a> is now in debian, as you can see on its <a href="http://packages.debian.org/sid/postgresql-9.0-pgfincore">postgresql-9.0-pgfincore</a> page. The reason why it entered the <a href="http://www.debian.org/">debian</a> archives is that it reached the
1.0 release!</p>
8.4 and 9.0 and 9.1 out of the box, even if the only binary
you'll find in <em>debian</em> sid is the 9.0 one, as you can check on the
<a href="http://packages.debian.org/source/sid/pgfincore">pgfincore debian source package</a> page.</p>
<p>Also, this is the first package I've done properly using the newer version
of <a href="http://kitenet.net/~joey/code/debhelper/">debhelper</a>, which makes the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/rules">debian/rules</a> file easier than ever. Let's have
a look at it:</p>
<pre class="src">
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">CURDIR</span>)
<span style="color: #b8860b;">TARGET</span> = $(<span style="color: #b8860b;">CURDIR</span>)/debian/pgfincore-%v
<span style="color: #b8860b;">PKGVERS</span> = $(<span style="color: #b8860b;">shell</span> dpkg-parsechangelog | awk -F <span style="color: #bc8f8f;">'[:-]'</span> <span style="color: #bc8f8f;">'/^Version:/ { print substr($$2, 2) }'</span>)
<span style="color: #b8860b;">EXCLUDE</span> = --exclude-vcs --exclude=debian

<span style="color: #7f007f;">include</span> <span style="color: #b8860b;">/usr/share/postgresql-common/pgxs_debian_control.mk</span>

<span style="color: #0000ff;">override_dh_auto_clean</span>: debian/control
	pg_buildext clean $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>
	dh_clean

<span style="color: #0000ff;">override_dh_auto_build</span>:
<span style="background-color: #ff69b4;">	#</span><span style="color: #b22222;"> </span><span style="color: #b22222;">build all supported version</span>
	pg_buildext build $(<span style="color: #b8860b;">SRCDIR</span>) $(<span style="color: #b8860b;">TARGET</span>) <span style="color: #bc8f8f;">"$(</span><span style="color: #b8860b;">CFLAGS</span><span style="color: #bc8f8f;">)"</span>

<span style="color: #0000ff;">override_dh_auto_install</span>:
<span style="background-color: #ff69b4;">	#</span><span style="color: #b22222;"> </span><span style="color: #b22222;">then install each of them</span>
	for v in <span style="color: #bc8f8f;">`pg_buildext supported-versions $(</span><span style="color: #b8860b;">SRCDIR</span><span style="color: #bc8f8f;">)`</span>; do \
		dh_install -ppostgresql-$$v-pgfincore ;\
	done

<span style="color: #0000ff;">orig</span>: clean
	cd .. && tar czf pgfincore_$(<span style="color: #b8860b;">PKGVERS</span>).orig.tar.gz $(<span style="color: #b8860b;">EXCLUDE</span>) pgfincore

<span style="color: #0000ff;">%</span>:
	dh <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>
</pre>
<p>The debian/rules file is known to be the cornerstone of your debian
packaging, and is usually the most complex part of it. It's a Makefile at
its heart, and we can see that thanks to the debhelper magic it's not that
complex to maintain anymore.</p>
<p>Then, this file relies on a bunch of helper commands, each of which
comes with its own man page and does a small part of the work. The
overall idea behind debhelper is that what it does covers 90% of the cases
around, and it's not aiming for more. You have to <em>override</em> the parts where
its defaults are wrong.</p>
<p>Here for example the build system has to produce files for all three
supported versions of <a href="http://www.postgresql.org/">PostgreSQL</a>, which means invoking the same build system
three times with some changes in the <em>environment</em> (mainly setting the
PG_CONFIG variable correctly). But even for that we have a <em>debian</em> facility,
that comes in the package <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a>, called pg_buildext. As
long as your extension build system is VPATH friendly, it's all automated.</p>
<p>Please read that last sentence another time. VPATH is the thing that allows
Make to find your source tree somewhere in the system, not in the current
working directory. That allows you to cleanly build the same sources in
different build locations, which is exactly what we need here, and is
cleanly supported by <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PGXS</a>, the <a href="http://www.postgresql.org/docs/9.1/static/extend-pgxs.html">PostgreSQL Extension Building Infrastructure</a>.</p>
<p>Which means that the main Makefile of <em>pgfincore</em> had to be simplified, and
the code layout too. Some advanced Make features such as $(wildcard ...)
and the like will not work here. See what we got at the end:</p>
<pre class="src">
ifndef VPATH
<span style="color: #b8860b;">SRCDIR</span> = .
else
<span style="color: #b8860b;">SRCDIR</span> = $(<span style="color: #b8860b;">VPATH</span>)
endif
<span style="color: #b8860b;">EXTENSION</span> = pgfincore
<span style="color: #b8860b;">EXTVERSION</span> = $(<span style="color: #b8860b;">shell</span> grep default_version $(<span style="color: #b8860b;">SRCDIR</span>)/$(<span style="color: #b8860b;">EXTENSION</span>).control | \
             sed -e <span style="color: #bc8f8f;">"s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/"</span>)

<span style="color: #b8860b;">MODULES</span> = $(<span style="color: #b8860b;">EXTENSION</span>)
<span style="color: #b8860b;">DATA</span> = sql/pgfincore.sql sql/uninstall_pgfincore.sql
<span style="color: #b8860b;">DOCS</span> = doc/README.$(<span style="color: #b8860b;">EXTENSION</span>).rst

<span style="color: #b8860b;">PG_CONFIG</span> = pg_config

<span style="color: #b8860b;">PG91</span> = $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --version | grep -qE <span style="color: #bc8f8f;">"8\.|9\.0"</span> && echo no || echo yes)

ifeq ($(<span style="color: #b8860b;">PG91</span>),yes)
<span style="color: #0000ff;">all</span>: pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql

<span style="color: #0000ff;">pgfincore--$(</span><span style="color: #0000ff;">EXTVERSION</span><span style="color: #0000ff;">).sql</span>: sql/pgfincore.sql
	cp $<span style="color: #5f9ea0;"><</span> <span style="color: #0000ff;">$</span><span style="color: #5f9ea0;">@</span>

<span style="color: #b8860b;">DATA</span> = pgfincore--unpackaged--$(<span style="color: #b8860b;">EXTVERSION</span>).sql pgfincore--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
<span style="color: #b8860b;">EXTRA_CLEAN</span> = sql/$(<span style="color: #b8860b;">EXTENSION</span>)--$(<span style="color: #b8860b;">EXTVERSION</span>).sql
endif

<span style="color: #b8860b;">PGXS</span> := $(<span style="color: #b8860b;">shell</span> $(<span style="color: #b8860b;">PG_CONFIG</span>) --pgxs)
<span style="color: #7f007f;">include</span> $(<span style="color: #b8860b;">PGXS</span>)

<span style="color: #0000ff;">deb</span>:
	dh clean
	make -f debian/rules orig
	debuild -us -uc -sa
</pre>
<p>No more Make magic to find source files. Frankly though, when your sources
are 1 C file and 2 SQL files, you don't need that much magic anyway. You
just want to believe that a single generic Makefile will happily build any
project you throw at it, only requiring minor adjustments. Well, the reality
is that you might need a few more adjustments if you want to benefit
from VPATH building, and having the binaries for 8.4 and 9.0 and 9.1 built
seamlessly in a simple loop. Like we have here for <em>pgfincore</em>.</p>
<p>Now the Makefile still contains a little bit of magic, in order to parse the
extension version number from its <em>control file</em> and produce a <em>script</em> named
accordingly. Then you'll notice a difference between the
<a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.1-pgfincore.install">postgresql-9.1-pgfincore.install</a> file and the
<a href="https://github.com/dimitri/pgfincore/blob/master/debian/postgresql-9.0-pgfincore.install">postgresql-9.0-pgfincore.install</a>. We're just not shipping the same files:</p>
<pre class="src">
debian/pgfincore-9.0/pgfincore.so usr/lib/postgresql/9.0/lib
sql/pgfincore.sql usr/share/postgresql/9.0/contrib
sql/uninstall_pgfincore.sql usr/share/postgresql/9.0/contrib
</pre>
<p>As you can see here:</p>
<pre class="src">
debian/pgfincore-9.1/pgfincore.so usr/lib/postgresql/9.1/lib
debian/pgfincore-9.1/pgfincore*.sql usr/share/postgresql/9.1/extension
sql/pgfincore--unpackaged--1.0.sql usr/share/postgresql/9.1/extension
</pre>
<p>So, now that we uncovered all the relevant magic, packaging and building
your next extension so that it supports as many PostgreSQL major releases as
you need to will be that easy.</p>
<p>For reference, you might need to also tweak
/usr/share/postgresql-common/supported-versions so that it allows you to
build for all those versions you claim to support in the <a href="https://github.com/dimitri/pgfincore/blob/master/debian/pgversions">debian/pgversions</a>
file.</p>
<pre class="src">
$ sudo dpkg-divert \
--divert /usr/share/postgresql-common/supported-versions.distrib \
--rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#!/bin/bash

dpkg -l postgresql-server-dev-* \
| awk -F '[ -]' '/^ii/ && ! /server-dev-all/ {print $6}'
</pre>
<p>All of this will come in pretty handy when we finally sit down and work on a way to provide binary packages for PostgreSQL and its extensions, and all supported versions of those at that. This very project is not dead, it's just sleeping some more.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 19 Aug 2011 23:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/19-pgfincore-in-debian.html</guid> </item> <item> <title>Échappement de chaînes</title> <link>http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</link> <description><![CDATA[<p>Among the new features of the <a href="http://www.postgresql.org/about/news.1331">upcoming release</a> of <a href="http://www.postgresql.org/">PostgreSQL</a>, the famous
9.1, one worth pointing out is the change in the default value of the standard_conforming_strings variable, which becomes <em>true</em>.</p>
<p>Indeed, using the backslash character for escapes does not conform to the SQL standard. The standard_conforming_strings parameter controls how
PostgreSQL behaves when it reads a character string in an SQL query.</p>
<p>Let's look at a few examples:</p>
<pre class="src">
dimitri=# set standard_conforming_strings to true;
SET
dimitri=# select 'hop''';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# select 'hop\'';
dimitri'# ';
 ?column?
----------
 hop\';
(1 ligne)

dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# set standard_conforming_strings to false;
SET
dimitri=# select E'hop\'';
 ?column?
----------
 hop'
(1 ligne)

dimitri=# select 'hop\'';
ATTENTION:  utilisation non standard de \' dans une chaîne littérale
LIGNE 1 : select 'hop\'';
                 ^
ASTUCE : Utilisez '' pour écrire des guillemets dans une chaîne ou utilisez la syntaxe de chaîne d'échappement (E'...').
 ?column?
----------
 hop'
(1 ligne)
</pre>
<p>The notation that behaves the same way whatever the value of
standard_conforming_strings is the one prefixed with E. It is
recommended to always use it as soon as the character string
contains backslashes used as escapes (usually to escape the single
quote character).</p>
<p>Finally, the escape_string_warning parameter lets you disable
warnings such as the one shown in the last example above,
when it is set to off. Of course, its default value is on.</p>
<p>Any appearance of this <em>WARNING</em> (shown as ATTENTION in a French locale) while escape_string_warning is on means
that your application is not ready to migrate to 9.1 with its default
settings. There are two possible courses of action: change the setting from its
new default value back to the previous one, or fix your
applications so that they use the E prefix wherever needed.</p>
<p>Setting standard_conforming_strings to on has another benefit
besides SQL standard compliance: protection against injections. If it is
not possible to escape the single quote that ends any user-supplied
character string, it becomes difficult to outsmart the
<em>parser</em>. The best approach here of course remains using parameterized queries,
to be covered in a future article.</p>
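<p>As a rough illustration of the standard-conforming rule described above, here is a minimal Python sketch that quotes a string literal by doubling single quotes instead of using backslashes. The function name quote_literal is hypothetical (it is neither a PostgreSQL nor a driver API), and parameterized queries remain the better option in real applications.</p>

```python
def quote_literal(value):
    """Quote value as a standard-conforming SQL string literal.

    Hypothetical helper, for illustration only: with
    standard_conforming_strings set to on, a backslash is an ordinary
    character, so only the single quote needs escaping, and it is
    escaped by doubling it.
    """
    return "'" + value.replace("'", "''") + "'"


if __name__ == "__main__":
    print(quote_literal("hop'"))     # same literal as in the session above
    print(quote_literal("hop\\';"))  # the backslash is left untouched
```

<p>The sketch deliberately leaves backslashes alone: that is only correct when the server runs with standard_conforming_strings set to on.</p>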
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 18 Aug 2011 19:01:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-echappements-de-chaine.html</guid> </item> <item> <title>el-get-list-packages</title> <link>http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</link> <description><![CDATA[<p>From the first days of <a href="../../../emacs/el-get.html">el-get</a> it was quite clear to me that we would reach a point where users would want a nice listing including descriptions of the packages, and a <em>major mode</em> allowing you to select packages to install, remove and update. It was also quite clear that I was not much interested in doing it myself, even if I would appreciate having it done.</p>
<p>Well, the joy of Open Source & Free Software (pick your own poison). <a href="https://github.com/jglee1027">jglee1027</a> is this <em>GitHub</em> guy who did offer an implementation of said facility, and who added descriptions for almost all of the now 402 recipes
that we have included with <a href="../../../emacs/el-get.html">el-get</a>.</p>
<p>Here's an image of what you get:</p>
<center>
<p><img src="../../../images/emacs-el-get-list-packages.png" alt=""></p>
</center>
<p>The packages with no description are fetched by M-x el-get-emacswiki-refresh
which will not download all <a href="http://emacswiki.org">emacswiki</a> content locally just so that it can
parse the scripts' headers and have a local description. Maybe it's time to
ask for another page over there like <a href="http://www.emacswiki.org/cgi-bin/wiki?action=index;match=%5C.(el%7Ctar)(%5C.gz)%3F%24">emacswiki page index</a> but containing the
first line too.</p>
<p>For recipes we offer, this first line often looks like the following:</p>
<pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">123-menu.el --- Simple menuing system, reminiscent of Lotus 123 in DOS
</span></pre>
<p>Of course some files over there do not follow this convention, but that would
be good enough already.</p>
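<p>To make the idea concrete, here is a small Python sketch of how such a first line could be split into a file name and a description. It assumes the conventional Emacs Lisp header layout (three semicolons, the file name, two or three dashes, then the description); the names HEADER_RE and parse_header are made up for this example and are not part of el-get.</p>

```python
import re

# Conventional first line of an Emacs Lisp file:
#   ;;; name.el --- one line description
HEADER_RE = re.compile(r"^;;;\s*(?P<file>\S+)\s+-{2,3}\s+(?P<desc>.+?)\s*$")


def parse_header(first_line):
    """Return (file name, description), or None when the line does not match."""
    match = HEADER_RE.match(first_line)
    if match is None:
        return None
    return match.group("file"), match.group("desc")


if __name__ == "__main__":
    line = ";;; 123-menu.el --- Simple menuing system, reminiscent of Lotus 123 in DOS"
    print(parse_header(line))
```

<p>Files that do not follow the convention simply yield None, which matches the "good enough" spirit: those recipes would keep an empty description.</p>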
<p>All in all, I hope you enjoy M-x el-get-list-packages!</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 18 Aug 2011 18:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/18-el-get-list-packages.html</guid> </item> <item> <title>Tutoriel pgloader</title> <link>http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</link> <description><![CDATA[<p>Going back over the content of the articles in the <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> series, I took the time to compile a complete tutorial, in English. Judging by the few emails I have been regularly receiving about pgloader for some years now, this should help new users.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 15 Aug 2011 15:39:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-tutoriel-pgloader.html</guid> </item> <item> <title>pgloader tutorial</title> <link>http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</link> <description><![CDATA[<p>To finish up the pgloader series, I've compiled all the information into a single page, the long awaited <a href="http://tapoueh.org/pgsql/pgloader.html#sec5">pgloader tutorial</a>. That should help lots of users to get started with pgloader.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 15 Aug 2011 15:33:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/15-pgloader-tutorial.html</guid> </item> <item> <title>pgloader constant cols</title> <link>http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</link> <description><![CDATA[<p>The previous articles in the <a href="../../../pgsql/pgloader.html">pgloader</a> series detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, then what to expect from a <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> setup, and then <a href="http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html">pgloader reformating</a>. Another need you might encounter when you get to use <a href="../../../pgsql/pgloader.html">pgloader</a> is adding <em>constant</em> values into a table's column.</p>
<p>The basic situation where you need to do so is adding an <em>origin</em> field to your table. The value of that is not to be found in the data file itself, typically, but known in the pgloader setup. That could even be the
filename you are importing data from.</p>
<p>In <a href="../../../pgsql/pgloader.html">pgloader</a> that's called a <em>user defined column</em>. Here's what the relevant
<a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> setup looks like:</p>
<pre class="src">
[<span style="color: #228b22;">udc</span>]
<span style="color: #b8860b;">table</span> = udc
<span style="color: #b8860b;">format</span> = text
<span style="color: #b8860b;">filename</span> = udc/udc.data
<span style="color: #b8860b;">input_encoding</span> = <span style="color: #bc8f8f;">'latin1'</span>
<span style="color: #b8860b;">field_sep</span> = %
<span style="color: #b8860b;">columns</span> = b:2, d:1, x:3, y:4
<span style="color: #b8860b;">udc_c</span> = constant value
<span style="color: #b8860b;">copy_columns</span> = b, c, d
</pre>
<p>And the data file is:</p>
<pre class="src">
1%5%foo%bar
2%10%bar%toto
3%4%toto%titi
4%18%titi%baz
5%2%baz%foo
</pre>
<p>And here's what the loaded table looks like:</p>
<pre class="src">
pgloader/examples$ pgloader -Tsc pgloader.conf udc
| Table name | duration | size | copy rows | errors |
====================================================================
| udc | 0.201s | - | 5 | 0 |

pgloader/examples$ psql --cluster 8.4/main pgloader -c <span style="color: #bc8f8f;">"table udc"</span>
 b  |       c        | d
----+----------------+---
  5 | constant value | 1
 10 | constant value | 2
  4 | constant value | 3
 18 | constant value | 4
  2 | constant value | 5
(5 rows)
</pre>
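<p>The mapping from the configuration to the loaded rows can be sketched in a few lines of Python. This is only an illustration of the idea under simple assumptions, not pgloader's actual code: the build_row function and its arguments are hypothetical, and the dictionaries simply mirror the columns, udc_c and copy_columns settings shown above.</p>

```python
def build_row(line, field_sep, columns, constants, copy_columns):
    """Build one output row from one line of the data file.

    columns maps a column name to its 1-based position in the data file,
    constants maps a column name to its fixed (user defined) value.
    """
    fields = line.rstrip("\n").split(field_sep)
    values = {name: fields[pos - 1] for name, pos in columns.items()}
    values.update(constants)
    return [values[name] for name in copy_columns]


columns = {"b": 2, "d": 1, "x": 3, "y": 4}   # columns = b:2, d:1, x:3, y:4
constants = {"c": "constant value"}          # udc_c = constant value
copy_columns = ["b", "c", "d"]               # copy_columns = b, c, d

data = ["1%5%foo%bar", "2%10%bar%toto"]
rows = [build_row(line, "%", columns, constants, copy_columns) for line in data]

if __name__ == "__main__":
    for row in rows:
        print(row)
```

<p>Note how the first data field ends up in column d and the second in column b, which is why the loaded table does not follow the order of the fields in the file.</p>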
<p>Of course the configuration is not so straightforward as to process fields in the data file in the order that they appear; after all, the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> sections are also a test suite.</p> <p>Long story short: if you need to add some <em>constant</em> values into the target table you're loading data to, <a href="../../../pgsql/pgloader.html">pgloader</a> will help you there!</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 12 Aug 2011 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/12-pgloader-udc.html</guid> </item> <item> <title>Emacs Startup</title> <link>http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</link> <description><![CDATA[<p>Using <a href="http://www.gnu.org/software/emacs/">Emacs</a> we get to manage a larger and larger setup file (either
~/.emacs or ~/.emacs.d/init.el), sometimes with lots of dependencies, and some sub-files thanks to the load function or the provide and require mechanism.</p>
<p>Some users are even starting Emacs often enough for the startup time to be a concern. With an
emacs-uptime (yes it's a command, you can M-x
emacs-uptime) of days to weeks (10 days, 17 hours, 45 minutes, 34 seconds as
of this writing), it's not something I really care about much.</p>
<p>But I know that some <a href="http://tapoueh.org/emacs/el-get.html">el-get</a> users still do care, and will use el-get-is-lazy
and do all their Emacs tweaking as eval-after-load blocks. To get
an idea of how long a <em>worst case</em> startup with <a href="http://www.emacswiki.org/emacs/el-get">el-get</a> takes, I have added the
following piece of elisp at the very end of my startup code:</p>
<pre class="src">
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim:notify-startup-done</span> ()
  <span style="color: #bc8f8f;">" notify user that Emacs is now ready"</span>
  (el-get-notify
   <span style="color: #bc8f8f;">"Emacs is ready."</span>
   (format <span style="color: #bc8f8f;">"The init sequence took %g seconds."</span>
           (float-time
            (time-subtract after-init-time before-init-time)))))

(add-hook 'after-init-hook 'dim:notify-startup-done)
</pre>
<p>The el-get-notify function will adapt and either use the dbus implementation
from Emacs 24, or <a href="http://www.emacswiki.org/emacs/notify.el">notify.el</a> from <a href="http://www.emacswiki.org/">EmacsWiki</a> (just M-x el-get-install it if
you need it), or will use its own implementation of an Emacs <a href="http://growl.info/">Growl</a> client
(it's about 5 lines long), and barring all of that will use the message
function.</p>
<p>The reason I say <em>worst case</em> is that I have a lot of packages to initialize
at startup, and that I made absolutely no effort to make this initialization
quick. Still, my Emacs setup is taking about 20 seconds to boot. Pretty
good I would say, for a weekly operation.</p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 06 Aug 2011 14:58:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/06-emacs-startup-notification.html</guid> </item> <item> <title>pgloader reformating</title> <link>http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</link> <description><![CDATA[<p>Back to our series about <a href="../../../pgsql/pgloader.html">pgloader</a>. The previous articles detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, then what to expect from a <a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> setup. This article will detail how to <em>reformat</em> input columns so that what <a href="http://www.postgresql.org/">PostgreSQL</a> sees is not what's in the data file, but the result of a <em>transformation</em> from this data into something acceptable as an <em>input</em> for the target data type.</p>
<p>Here's what the <a href="http://pgloader.projects.postgresql.org/">pgloader documentation</a> has to say about this <em>reformat</em> parameter: <em>The value of this option is a comma separated list of columns to rewrite, which are a colon separated list of column name, reformat module name, reformat function name</em>.</p>
<p>And here's the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> section that deals with reformat:</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">reformat</span>]
<span style="color: #eeeeec;">table</span> = reformat
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">filename</span> = reformat/reformat.data
<span style="color: #eeeeec;">field_sep</span> = |
<span style="color: #eeeeec;">columns</span> = id, timestamp
<span style="color: #eeeeec;">reformat</span> = timestamp:mysql:timestamp
</pre>
<p>The documentation says some more about it, so check it out. Also, the reformat_path option (set either on the command line or in the configuration
file) is used to find the python module implementing the reformat function.
Please refer to the manual as to how to set it.</p>
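<p>To picture the format of the <em>reformat</em> option, here is a short Python sketch that splits such a value into (column, module, function) triples. It only follows the sentence quoted from the documentation; the parse_reformat name is hypothetical and this is not how pgloader itself is written.</p>

```python
def parse_reformat(value):
    """Split 'column:module:function, ...' into (column, module, function) tuples."""
    specs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        parts = [part.strip() for part in item.split(":")]
        if len(parts) != 3:
            raise ValueError("expected column:module:function, got %r" % item)
        specs.append(tuple(parts))
    return specs


if __name__ == "__main__":
    # value taken from the [reformat] section of examples/pgloader.conf
    print(parse_reformat("timestamp:mysql:timestamp"))
```

<p>With such triples at hand, a loader would presumably look up each module along reformat_path and call the named function for every value of the column; that last step is an assumption about the design, so check the manual for the details.</p>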
<p>Now, obviously, for the <em>reformat</em> to happen we need to write some code.
That's the whole point of the option: you need something very specific, you
are in a position to write the 5 lines of code needed to make it happen,
<a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> allows you to just do that. Of course, the code needs to be
written in python here, so that you can even benefit from the
<a href="http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html">parallel pgloader</a> settings.</p>
<p>Let's see a reformat module example, as found in <a href="https://github.com/dimitri/pgloader/blob/master/reformat/mysql.py">reformat/mysql.py</a> in the
pgloader sources:</p>
<pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">Author: Dimitri Fontaine <<a href="mailto:dim@tapoueh.org">dim@tapoueh.org</a>>
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">
</span><span style="color: #888a85;"># </span><span style="color: #888a85;">pgloader mysql reformating module
</span><span style="color: #888a85;">#</span><span style="color: #888a85;">
</span>
<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
    <span style="color: #ad7fa8; font-style: italic;">""" Reformat str as a PostgreSQL timestamp

    MySQL timestamps are like:  20041002152952
    We want instead this input: 2004-10-02 15:29:52
    """</span>
    <span style="color: #729fcf; font-weight: bold;">if</span> <span style="color: #729fcf;">len</span>(<span style="color: #729fcf;">input</span>) != 14:
        <span style="color: #eeeeec;">e</span> = <span style="color: #ad7fa8; font-style: italic;">"MySQL timestamp reformat input too short: %s"</span> % <span style="color: #729fcf;">input</span>
        reject.log(e, <span style="color: #729fcf;">input</span>)

    <span style="color: #eeeeec;">year</span>    = <span style="color: #729fcf;">input</span>[0:4]
    <span style="color: #eeeeec;">month</span>   = <span style="color: #729fcf;">input</span>[4:6]
    <span style="color: #eeeeec;">day</span>     = <span style="color: #729fcf;">input</span>[6:8]
    <span style="color: #eeeeec;">hour</span>    = <span style="color: #729fcf;">input</span>[8:10]
    <span style="color: #eeeeec;">minute</span>  = <span style="color: #729fcf;">input</span>[10:12]
    <span style="color: #eeeeec;">seconds</span> = <span style="color: #729fcf;">input</span>[12:14]

    <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % (year, month, day, hour, minute, seconds)
</pre>
<p>This reformat module will <em>transform</em> a
timestamp representation as issued by
certain versions of MySQL into something that PostgreSQL is able to read as
a timestamp.</p>
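<p>For readers who want to try the idea outside of pgloader, here is a self-contained Python sketch of the same transformation. It is a variant written for this example, not the code shipped in reformat/mysql.py: it validates the input with datetime.strptime, the ListReject class is a hypothetical stand-in for the reject object, and returning None on bad input is an assumption rather than documented pgloader behaviour.</p>

```python
from datetime import datetime


class ListReject:
    """Hypothetical stand-in for pgloader's reject object: it only records calls."""

    def __init__(self):
        self.entries = []

    def log(self, message, data):
        self.entries.append((message, data))


def timestamp(reject, input):
    """Turn 20041002152952 into 2004-10-02 15:29:52, or log and return None."""
    try:
        if len(input) != 14:
            raise ValueError("expected 14 digits")
        parsed = datetime.strptime(input, "%Y%m%d%H%M%S")
    except ValueError as e:
        reject.log("MySQL timestamp reformat failed: %s" % e, input)
        return None
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    reject = ListReject()
    print(timestamp(reject, "20041002152952"))
    print(timestamp(reject, "2004"), reject.entries)
```

<p>Parsing through datetime also rejects impossible dates such as month 13, which plain string slicing would let through.</p>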
<p>If you're in the camp that wants to write as little code as possible rather
than easy to read and maintain code, I guess you could write it this way
instead:</p>
<pre class="src">
<span style="color: #729fcf; font-weight: bold;">import</span> re
<span style="color: #729fcf; font-weight: bold;">def</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">timestamp</span>(reject, <span style="color: #729fcf;">input</span>):
<p>Whenever you have an input file with data that PostgreSQL chokes upon, you can solve this problem from <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> itself: no need to resort to scripting and a pipelines of <a href="http://www.gnu.org/software/gawk/manual/gawk.html">awk</a> (which I use a lot in other cases, don't get me wrong) or other tools. See, you finally have an excuse to <a href="http://diveintopython.org/">Dive into Python</a>!</p> ]]></description><span style="color: #ad7fa8; font-style: italic;">""" 20041002152952 -> 2004-10-02 15:29:52 """</span> <span style="color: #eeeeec;">g</span> = re.match(r<span style="color: #ad7fa8; font-style: italic;">"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"</span>, <span style="color: #729fcf;">input</span>) <span style="color: #729fcf; font-weight: bold;">return</span> <span style="color: #ad7fa8; font-style: italic;">'%s-%s-%s %s:%s:%s'</span> % <span style="color: #729fcf;">tuple</span>([g.group(x+1) <span style="color: #729fcf; font-weight: bold;">for</span> x <span style="color: #729fcf; font-weight: bold;">in</span> <span style="color: #729fcf;">range</span>(6)]) </pre>
<p>When you want to benchmark your own application, to know how many more clients it can handle or how much gain you will see with some new shiny hardware, <a href="http://tsung.erlang-projects.org/">Tsung</a> is the tool to use. It will allow you to <em>record</em> a number of sessions then replay them at high scale. <a href="http://pgfouine.projects.postgresql.org/tsung.html">pgfouine</a> supports Tsung and is able to turn your PostgreSQL logs into Tsung sessions, too.</p> <p>Tsung has been used in the video game world too: their version of it is called <a href="http://www.developer.unitypark3d.com/tools/utsung/">uTsung</a>, apparently using the <a href="http://www.developer.unitypark3d.com/index.html">uLink</a> game development facilities. They even made a video demo of uTsung, which you might find interesting:</p> <blockquote> <p class="quoted"><a class="image-link" href="http://www.youtube.com/watch?v=rxBhqIP_7ls"> <img src="../../../images/utsung-demo.png"></a></p> </blockquote> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Aug 2011 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformating-modules-for-pgloader.html</guid> </item> <item> <title>Reformatting with pgloader</title> <link>http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</link> <description><![CDATA[<p>In our series of articles about <a href="http://tapoueh.org/tags/pgloader.html">pgloader</a>, the latest one details how to use the <em>reformatting</em> feature of this tool. In the context of an <a href="http://fr.wikipedia.org/wiki/Extract_Transform_Load">ETL</a>, this corresponds to the <em>Transform</em> phase, which makes
pgloader a <em>simple</em> solution for your ETL needs.</p> ]]></description> <author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 05 Aug 2011 11:26:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/05-reformater-avec-pgloader.html</guid> </item> <item> <title>See Tsung in action</title> <link>http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</link> <description><![CDATA[<p><a href="http://tsung.erlang-projects.org/">Tsung</a> is an open-source multi-protocol distributed load testing tool and a mature project. It's been available for about 10 years and is built with the <a href="http://www.erlang.org/">Erlang</a> system. It supports several protocols, including the <a href="http://www.postgresql.org/">PostgreSQL</a> one.</p>
<h2>several files at a time</h2> <p class="first">Parallelism is implemented in 3 different ways in pgloader. First, you can load more than one file at a time thanks to the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 02 Aug 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/02-see-tsung-in-action.html</guid> </item> <item> <title>Parallel pgloader</title> <link>http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</link> <description><![CDATA[<p>This article continues the series that began with <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then detailed <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>. We have some more fine points to talk about here, today's article is about loading your data in parallel with <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>
max_parallel_sections
parameter, which has to be set in the <em>global section</em> of the file.</p>
<p>This setting is quite simple and already covers the most common use case.</p>
<h2>several workers per file</h2>
<p class="first">The other use case is when you have huge files to load into the database.
Then you want more than one process reading the file at
the same time. By using <a href="../../../pgsql/pgloader.html">pgloader</a>, you have already accepted the compromise of loading the
whole content in more than one transaction, so there is no further drawback
in spreading those multiple transactions per file over more than
one load <em>worker</em>.</p>
<p>There are basically two ways to split the work between several workers here,
and both are implemented in pgloader.</p>
<h3>N workers, N splits of the file</h3>
<pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = True
</pre>
<p>Set up this way, <a href="../../../pgsql/pgloader.html">pgloader</a> will launch 4 different <em>threads</em> (see the <strong>caveat</strong>
section of this article). Each thread is then given a part of the input
data file and will run the whole usual pgloader processing on its own. For
this to work you need to be able to seek in the input stream, which might
not always be convenient.</p>
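<p>To make the need for seeking concrete, here is a minimal sketch of how a file could be cut into one byte range per worker. It is not pgloader's actual code: the function name <em>split_offsets</em> is made up for the illustration, and it assumes that every record sits on a single physical line (escaped newlines would need more care).</p>

```python
import os


def split_offsets(path, workers):
    """Return one (start, end) byte range per worker, aligned on line ends.

    Hypothetical helper, not part of pgloader: it assumes one record per
    physical line and a seekable file.
    """
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, workers):
            # Jump near the i-th fraction of the file, never going backwards.
            f.seek(max(size * i // workers, bounds[-1]))
            f.readline()  # move forward to the start of the next line
            bounds.append(min(f.tell(), size))
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))
```

<p>Each worker would then seek to its own start offset and stop reading at its end offset, which is why a pipe or any other non-seekable input cannot be used in this mode.</p>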
<h3>one reader, N workers</h3>
<pre class="src">
<span style="color: #eeeeec;">section_threads</span> = 4
<span style="color: #eeeeec;">split_file_reading</span> = False
<span style="color: #eeeeec;">rrqueue_size</span> = 5000
</pre>
<p>With such a setup, <a href="../../../pgsql/pgloader.html">pgloader</a> will start 4 different worker <em>threads</em> that will
receive the data input in an internal <a href="http://docs.python.org/library/collections.html#deque-objects">Python queue</a>. Another active <em>thread</em>
will be responsible for reading the input file and filling the queues in a
<em>round robin</em> fashion, but will hand all the processing of the data to each
worker, of course.</p>
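<p>The one reader, N workers layout can be sketched as follows. This is an illustration only, not pgloader's implementation: the names <em>load_round_robin</em> and <em>process</em> are hypothetical, the reader runs in the calling thread rather than in a thread of its own, and the standard <em>queue.Queue</em> class is used here because it is bounded and thread safe, with <em>queue_size</em> playing the role of rrqueue_size.</p>

```python
import queue
import threading


def load_round_robin(lines, n_workers, queue_size, process):
    """Feed lines to n_workers threads in round robin order.

    Hypothetical sketch: `process` stands for whatever a worker does
    with one line (parsing, reformatting, preparing the COPY data).
    """
    queues = [queue.Queue(maxsize=queue_size) for _ in range(n_workers)]
    results = [[] for _ in range(n_workers)]

    def worker(idx):
        while True:
            line = queues[idx].get()
            if line is None:  # sentinel: the reader has no more input
                break
            results[idx].append(process(line))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()

    # The reader: hand lines out in turn, blocking when a queue is full.
    for n, line in enumerate(lines):
        queues[n % n_workers].put(line)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

<p>The bounded queues are what keep the reader from racing ahead of slow workers and filling the memory with unprocessed lines.</p>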
<h3>how many threads?</h3>
<p class="first">If you're using a mix and match of max_parallel_sections and section_threads
with split_file_reading set to True or False, it's not easy to know exactly
how many <em>threads</em> will run at any time during the loading. How can you tell
which sections will run in parallel when it depends on the timing of the
loading?</p>
<p>The advice here is the usual one: don't overestimate the capabilities of
your system unless you are in a position to check beforehand by doing trial
runs.</p>
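<p>You can at least compute a worst case. The sketch below assumes, and this is only an assumption made for the illustration, that a section costs section_threads threads when split_file_reading is True and one more (the reader) when it is False; the main thread and any other housekeeping threads are not counted. The function name <em>max_threads</em> is made up.</p>

```python
def max_threads(sections, max_parallel_sections):
    """Upper bound on worker and reader threads running at the same time.

    Hypothetical helper: `sections` is a list of dicts holding the
    section_threads and split_file_reading settings of each section.
    """
    costs = sorted(
        (s["section_threads"] + (0 if s["split_file_reading"] else 1)
         for s in sections),
        reverse=True,
    )
    # Worst case: the most expensive sections all run together.
    return sum(costs[:max_parallel_sections])
```

<p>With three sections costing 4, 5 and 1 threads and max_parallel_sections set to 2, the bound is 9 threads.</p>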
<h2>caveat</h2>
<p class="first">The current implementation of all the parallelism in <a href="../../../pgsql/pgloader.html">pgloader</a> has been done with
the <a href="http://docs.python.org/library/threading.html">Python threading</a> API. While this is easy enough to use when you want to
exchange data between threads, it suffers from the
<a href="http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock">Global Interpreter Lock</a> issue. This means that while the code is written to do its
processing in parallel, the <em>runtime</em> only executes Python code in one thread at a time. You might still benefit
from the current implementation if you have hard-to-parse files, or custom
reformat modules that are part of the loading bottleneck.</p>
<h2>future</h2>
<p class="first">The solution would be to switch to the newer <a href="http://docs.python.org/library/multiprocessing.html">Python multiprocessing</a>
API, and some preliminary work has been done in pgloader to allow for that.
If you're interested in real parallel bulk loading, <a href="dim%20(at)%20tapoueh%20(dot)%20org">contact me</a>!</p>
]]></description>
<h2>several files at a time</h2> <p class="first">Parallelism is implemented in 3 different ways in pgloader. First, you can load more than one file at a time thanks to the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 01 Aug 2011 12:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</guid> </item> <item> <title>Parallel pgloader</title> <link>http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</link> <description><![CDATA[<p>This article continues the series that began with <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">How To Use PgLoader</a> then detailed <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>. We have some more fine points to talk about here, today's article is about loading your data in parallel with <a href="../../../pgsql/pgloader.html">pgloader</a>.</p>
max_parallel_sections
parameter, which has to be set in the <em>global section</em> of the file.</p>
]]></description>
<p>Happy reading!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 01 Aug 2011 12:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/08/01-parallel-pgloader.html</guid> </item> <item> <title>Configuring pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</link> <description><![CDATA[<p>I have just published a post in English titled <a href="http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html">How to Setup pgloader</a>, which adds to the ongoing writing of a more complete <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader tutorial</a>. Once again, I did not take the time to translate this article into French before knowing whether you are interested, dear readers. If you are, just let me know by mail (or <em>courriel</em>, after all) and I will add that to my
TODO list.</p>
<p>This file is expected in the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-configurer-pgloader.html</guid> </item> <item> <title>How to Setup pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link> <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>, let's now see how to write the
pgloader.conf that instructs <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> about what to do.</p>
INI format, with a <em>global</em> section then one
section per file you want to import. The <em>global</em> section defines some
default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p>
<p>The configuration setup is fully documented on the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a>, which
you can easily find online. Like all <em>Unix</em>-style man pages, though, it's
more a complete reference than introductory material. Let's review.</p>
<h2>global section</h2>
<p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file from the source
tree. Some of the options there are really <em>debugging</em>-only options, so I changed
their values so that what you see here is a better starting point.</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader

<span style="color: #eeeeec;">log_file</span> = /tmp/pgloader.log
<span style="color: #eeeeec;">log_min_messages</span> = INFO
<span style="color: #eeeeec;">client_min_messages</span> = WARNING

<span style="color: #eeeeec;">lc_messages</span> = C
<span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span>
<span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on
<span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB

<span style="color: #eeeeec;">copy_every</span> = 15000

<span style="color: #eeeeec;">null</span> = <span style="color: #ad7fa8; font-style: italic;">""</span>
<span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>

<span style="color: #eeeeec;">max_parallel_sections</span> = 4
</pre>
<p>You don't see the whole connection setup: here base was enough. You might
need to set host, port and user, and maybe even pass, to be able to
connect to the PostgreSQL server.</p>
<p>The logging options allow you to set the file where all pgloader
messages are logged. Messages are categorized as either DEBUG, INFO, WARNING, ERROR or
CRITICAL. The options log_min_messages and client_min_messages are another
good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a> and allow you to set the level of chatter
you want to see in the log file and on the interactive console (standard
output and standard error streams), respectively.</p>
<p>Please note that the DEBUG level will produce more than 3 times as much data
as the data file you're importing. Unless you're a pgloader contributor, or
helping one to, well, <em>debug</em> it, you want to avoid setting the log chatter to
this level.</p>
<p>The client_encoding will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it
establishes. You can now even set any parameter you want by using the
pg_option_parameter_name magic settings. Note that the command line option
--pg-options (or -o for brevity) allows you to override that.</p>
<p>Then, the copy_every parameter is set to 5 in the examples, because the test
files contain fewer than 10 lines and we want to test several <em>batches</em>
of commits when using them. For your real loading, stick to the default
value (10 000 lines per COPY command), or more. You can play with this
parameter: depending on the network (or local access) and disk system you're
using, you might see improvements by reducing or enlarging it. There's not
so much a theory of operation as empirical testing and tuning here. For a
one-off operation, just remove the line from the configuration.</p>
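<p>What copy_every controls can be pictured with a small sketch: the input rows are cut into batches, and each batch would go into one COPY command followed by a commit. The function name <em>batches</em> is made up for this illustration and the code is not taken from pgloader.</p>

```python
from itertools import islice


def batches(rows, copy_every=10000):
    """Yield lists of at most copy_every rows, one list per COPY command.

    Hypothetical helper illustrating the batching, nothing more.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, copy_every))
        if not batch:
            return
        yield batch
```

<p>With copy_every set to 5, a test file of 8 lines gives two batches (5 rows, then 3), which is why such a small value is handy for exercising several commits in the examples.</p>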
<p>The parameters null and empty_string are related to interpreting the data in
the text or csv files you have, and the documentation is quite clear about
them. Note that you can set them globally and per section.</p>
<p>The last parameter of this example, max_parallel_sections, is detailed later
in the article.</p>
<h2>files section</h2>
<p class="first">After the <em>global</em> section come as many sections as you have files to load,
plus the <em>template</em> sections, which are only there so that you can share a
bunch of parameters between several sections. Picture a series of data files
all in the same format, where the only thing that changes is the filename.
Use a template section in this case!</p>
<p>Let's see an example:</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span> = True
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">datestyle</span> = dmy
<span style="color: #eeeeec;">field_sep</span> = |
<span style="color: #eeeeec;">trailing_sep</span> = True

[<span style="color: #8ae234; font-weight: bold;">simple</span>]
<span style="color: #eeeeec;">use_template</span> = simple_tmpl
<span style="color: #eeeeec;">table</span> = simple
<span style="color: #eeeeec;">filename</span> = simple/simple.data
<span style="color: #eeeeec;">columns</span> = a:1, b:3, c:2
<span style="color: #eeeeec;">skip_head_lines</span> = 2

<span style="color: #888a85;"># </span><span style="color: #888a85;">those reject settings are defaults one</span>
<span style="color: #eeeeec;">reject_log</span> = /tmp/simple.rej.log
<span style="color: #eeeeec;">reject_data</span> = /tmp/simple.rej

[<span style="color: #8ae234; font-weight: bold;">partial</span>]
<span style="color: #eeeeec;">table</span> = partial
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">filename</span> = partial/partial.data
<span style="color: #eeeeec;">field_sep</span> = %
<span style="color: #eeeeec;">columns</span> = *
<span style="color: #eeeeec;">only_cols</span> = 1-3, 5
</pre>
<p>That's 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections so that we see one template example. Of course, with a single section using it, the template is only there for the sake of the example.</p>
<h2>data file format</h2>
<p class="first">The most important setting that you have to care about is the file format. Your choice here is either text, csv or fixed. Mostly, what we are given
nowadays is csv. You might remember having read that the nice thing about
standards is that there are so many to choose from... well, csv land is
one where it's pretty hard to find two producers that understand it
the same way.</p>
<p>So when you fail to have pgloader load your <em>mostly csv</em> files with a csv
setup, it's time to consider using text instead. The text file format
accepts a lot of tunables to adapt to crazy situations, but is implemented entirely in Python,
whereas the <a href="http://docs.python.org/library/csv.html">Python csv module</a> is written in C and is more efficient.</p>
<p>If you're wondering what kind of format we're talking about here, here's the
<a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using ^ (caret) as
the field separator:</p>
<pre class="src">
1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\ ^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre>
<p>And here's what we get by loading that:</p>
<pre class="src">
pgloader/examples$ pgloader -c pgloader.conf -s cluttered
Table name        |    duration |    size |  copy rows |     errors
====================================================================
cluttered         |      0.193s |       - |          6 |          0

pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
                                   : line text with
                                   : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
                                   : a last multiline
                                   : escaped line
                                   : with a missing
                                   : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
                                   : escaped value
                                   :
                                   : with empty line
                                   : embeded
(6 rows)
</pre>
pgloader is still
able to help you!</p>
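<p>To give an idea of what reading such a file involves, here is a minimal sketch of a reader for the cluttered example above. It is not pgloader's text reader: the names <em>parse_records</em> and <em>convert</em> are made up, and the sketch assumes that a logical record only ends on a physical line that finishes with the trailing separator, which is enough to cope with the missing escaping in record 3. Fields come back in file order, without the column reordering seen in the table.</p>

```python
def convert(field, null, empty_string):
    """Map the configured null and empty_string markers to Python values."""
    if field == null:
        return None
    if field == empty_string:
        return ""
    return field


def parse_records(lines, sep="^", null="", empty_string="\\ "):
    """Yield one list of fields per logical record (hypothetical sketch)."""
    pending = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith(sep):
            # A trailing separator closes the record: join what we collected.
            pending.append(line[:-len(sep)])
            fields = "\n".join(pending).split(sep)
            pending = []
            yield [convert(f, null, empty_string) for f in fields]
        else:
            # The record continues: drop the escaping backslash when present.
            pending.append(line[:-1] if line.endswith("\\") else line)
```

<p>Fed with the six records of the example, this yields six lists of three fields each, with None for the null value of record 5 and an empty string for the escaped space of record 4.</p>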
<p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> for each and every parameter
that you can define and the values it accepts. The <em>fixed</em> data format
is to be used when you're given not a field separator but field positions in
the file. Yes, we still encounter those from time to time. Who needs
variable size storage, after all?</p>
]]></description>
<p>Now, a problem that I still had to solve was the colors used in the terminal. As I'm using the <em>tango</em> color theme for emacs, the default <em>ANSI</em> palette's blue color was not readable. Here's how to fix that:</p> <pre class="src"><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 15:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid> </item> <item> <title>Emacs ANSI colors</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/29-emacs-ansi-colors.html</link> <description><![CDATA[<p><a href="http://tapoueh.org/emacs/index.html">Emacs</a> comes with a pretty good implementation of a terminal emulator,
M-x term. Well, not that good actually, but given what I use it for, it's just what I need. Particularly if you add to that my <a href="http://tapoueh.org/emacs/cssh.html">cssh</a> tool, so that connecting with ssh to a remote host is just a C-= away (it runs the command cssh-term-remote-open), and completes on the host name thanks to ~/.ssh/known_hosts.</p>
<p>Now your colors in an emacs terminal are easy to read, as you can see:</p> <blockquote> <p class="quoted"><img src="../../../images/emacs-tango-term-colors.png" alt=""></p> </blockquote> <p>Hope you enjoy!</p> ]]></description>(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">ansi-color</span>) (setq ansi-color-names-vector (vector (frame-parameter nil 'background-color) <span style="color: #bc8f8f;">"#f57900"</span> <span style="color: #bc8f8f;">"#8ae234"</span> <span style="color: #bc8f8f;">"#edd400"</span> <span style="color: #bc8f8f;">"#729fcf"</span> <span style="color: #bc8f8f;">"#ad7fa8"</span> <span style="color: #bc8f8f;">"cyan3"</span> <span style="color: #bc8f8f;">"#eeeeec"</span>) ansi-term-color-vector ansi-color-names-vector ansi-color-map (ansi-color-make-color-map)) </pre>
<p>Now, a problem that I still had to solve was the colors used in the terminal. As I'm using the <em>tango</em> color theme for emacs, the default <em>ANSI</em> palette's blue color was not readable. Here's how to fix that:</p> <pre class="src"><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/29-emacs-ansi-colors.html</guid> </item> <item> <title>Emacs ANSI colors</title> <link>http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</link> <description><![CDATA[<p><a href="http://tapoueh.org/emacs/index.html">Emacs</a> comes with a pretty good implementation of a terminal emulator,
M-x term. Well, not that good actually, but given what I use it for, it's just what I need. Particularly if you add to that my <a href="http://tapoueh.org/emacs/cssh.html">cssh</a> tool, so that connecting with ssh to a remote host is just a C-= away (it runs the command cssh-term-remote-open), and completes on the host name thanks to ~/.ssh/known_hosts.</p>
<p>Now your colors in an emacs terminal are easy to read, as you can see:</p> <blockquote> <p class="quoted"><img src="../../../images/emacs-tango-term-colors.png" alt=""></p> </blockquote> <p>Hope you enjoy!</p> ]]></description>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ansi-color</span>) (setq ansi-color-names-vector (vector (frame-parameter nil 'background-color) <span style="color: #ad7fa8; font-style: italic;">"#f57900"</span> <span style="color: #ad7fa8; font-style: italic;">"#8ae234"</span> <span style="color: #ad7fa8; font-style: italic;">"#edd400"</span> <span style="color: #ad7fa8; font-style: italic;">"#729fcf"</span> <span style="color: #ad7fa8; font-style: italic;">"#ad7fa8"</span> <span style="color: #ad7fa8; font-style: italic;">"cyan3"</span> <span style="color: #ad7fa8; font-style: italic;">"#eeeeec"</span>) ansi-term-color-vector ansi-color-names-vector ansi-color-map (ansi-color-make-color-map)) </pre>
<p>This file is expected in the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-emacs-ansi-colors.html</guid> </item> <item> <title>How to Setup pgloader</title> <link>http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</link> <description><![CDATA[<p>In a previous article we detailed <a href="http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html">how to use pgloader</a>, let's now see how to write the
pgloader.conf that instructs <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> about what to do.</p>
INI format, with a <em>global</em> section then one
section per file you want to import. The <em>global</em> section defines some
default options and how to connect to the <a href="http://tapoueh.org/pgsql/index.html">PostgreSQL</a> server.</p>
<p>The configuration setup is fully documented on the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> that
you can even easily find online. As all <em>unix</em> style man pages, though, it's
more a complete reference than introductory material. Let's review.</p>
<h2>global section</h2>
<p class="first">Here's the <em>global</em> section of the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file of the source
files. Well, some options are <em>debugger</em> only options, really, so I changed
their value so that what you see here is a better starting point.</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">pgsql</span>]
<span style="color: #eeeeec;">base</span> = pgloader
<span style="color: #eeeeec;">log_file</span> = /tmp/pgloader.log <span style="color: #eeeeec;">log_min_messages</span> = INFO <span style="color: #eeeeec;">client_min_messages</span> = WARNING
<span style="color: #eeeeec;">lc_messages</span> = C <span style="color: #eeeeec;">pg_option_client_encoding</span> = <span style="color: #ad7fa8; font-style: italic;">'utf-8'</span> <span style="color: #eeeeec;">pg_option_standard_conforming_strings</span> = on <span style="color: #eeeeec;">pg_option_work_mem</span> = 128MB
<span style="color: #eeeeec;">copy_every</span> = 15000
<span style="color: #eeeeec;">null</span> = <span style="color: #ad7fa8; font-style: italic;">""</span> <span style="color: #eeeeec;">empty_string</span> = <span style="color: #ad7fa8; font-style: italic;">"\ "</span>
<span style="color: #eeeeec;">max_parallel_sections</span> = 4 </pre>
<p>You don't see all the connection setup, herebase was enough. You might
need to setup host, port and user, and maybe even pass, too, to be able to
connect to the PostgreSQL server.</p>
<p>The logging options allows you to set a file where to log all pgloader
messages, that are categorized as either DEBUG, INFO, WARNING, ERROR or
CRITICAL. The options log_min_messages and client_min_messages are another
good idea stolen from <a href="http://www.postgresql.org/">PostgreSQL</a> and allow you to setup the level of chatter
you want to see on the interactive console (standard output and standard
error streams) and on the log file.</p>
<p>Please note that the DEBUG level will produce more that 3 times as many data
as the data file you're importing. If you're not a pgloader contributor or
helping them, well, <em>debug</em> it, you want to avoid setting the log chatter to
this value.</p>
<p>The client_encoding will be <a href="http://www.postgresql.org/docs/current/static/sql-set.html">SET</a> by <a href="http://tapoueh.org/pgsql/pgloader.html">pgloader</a> on the PostgreSQL connection it
establish. You can now even set any parameter you want by using the
pg_option_parameter_name magic settings. Note that the command line option
--pg-options (or -o for brevity) allows you to override that.</p>
<p>Then, the copy_every parameter is set to 5 in the examples, because the test
files are containing less than 10 lines and we want to test several <em>batches</em>
of commits when using them. So for your real loading, stick to default
parameters (10 000 lines per COPY command), or more. You can play with this
parameter, depending on the network (or local access) and disk system you're
using you might see improvements by reducing it or enlarging it. There's no
so much theory of operation as empirical testing and setting here. For a
one-off operation, just remove the lines from the configuration.</p>
<p>The parameters null and empty_string are related to interpreting the data in
the text or csv files you have, and the documentation is quite clear about
them. Note that you have global setting and per-section setting too.</p>
<p>The last parameter of this example, max_parallel_sections, is detailed later
in the article.</p>
<h2>files section</h2>
<p class="first">After the <em>global</em> section come as many sections as you have file to load.
Plus the <em>template</em> sections, that are only there so that you can share a
bunch of parameters in more than one section. Picture a series of data file
all of the same format, the only thing that will change is the filename.
Use a template section in this case!</p>
<p>Let's see an example:</p>
<pre class="src">
[<span style="color: #8ae234; font-weight: bold;">simple_tmpl</span>]
<span style="color: #eeeeec;">template</span> = True
<span style="color: #eeeeec;">format</span> = text
<span style="color: #eeeeec;">datestyle</span> = dmy
| <span style="color: #eeeeec;">field_sep</span> = |
<span style="color: #eeeeec;">trailing_sep</span> = True
[<span style="color: #8ae234; font-weight: bold;">simple</span>] <span style="color: #eeeeec;">use_template</span> = simple_tmpl <span style="color: #eeeeec;">table</span> = simple <span style="color: #eeeeec;">filename</span> = simple/simple.data <span style="color: #eeeeec;">columns</span> = a:1, b:3, c:2 <span style="color: #eeeeec;">skip_head_lines</span> = 2
<span style="color: #888a85;"># </span><span style="color: #888a85;">those reject settings are defaults one </span><span style="color: #eeeeec;">reject_log</span> = /tmp/simple.rej.log <span style="color: #eeeeec;">reject_data</span> = /tmp/simple.rej
[<span style="color: #8ae234; font-weight: bold;">partial</span>] <span style="color: #eeeeec;">table</span> = partial <span style="color: #eeeeec;">format</span> = text <span style="color: #eeeeec;">filename</span> = partial/partial.data <span style="color: #eeeeec;">field_sep</span> = % <span style="color: #eeeeec;">columns</span> = * <span style="color: #eeeeec;">only_cols</span> = 1-3, 5 </pre>
<p>That's 2 of the examples from the <a href="https://github.com/dimitri/pgloader/blob/master/examples/pgloader.conf">examples/pgloader.conf</a> file, in 3 sections so that we see one template example. Of course, having a single section using the template, it's just here for the example.</p> <h3>data file format</h3> <p class="first">The most important setting that you have to care about is the file format. Your choice here is eithertext, csv or fixed. Mostly, what we are given
nowadays is csv. You might remember having read that the nice thing about
standards is that there's so many to choose from... well, the csv land is
one where it's pretty hard to find different producers that understand it
the same way.</p>
<p>So when you fail to have pgloader load your <em>mostly csv</em> files with a csv
setup, it's time to consider using text instead. The text file format
accept a lot of tunables to adapt to crazy situations, but is all python
code when the <a href="http://docs.python.org/library/csv.html">python csv module</a> is a C-coded module, more efficient.</p>
<p>If you're wondering what kind of format we're talking about here, here's the
<a href="https://github.com/dimitri/pgloader/blob/master/examples/cluttered/cluttered.data">cluttered pgloader example</a> for your reading pleasure, using ^ (caret) as
the field separator:</p>
<pre class="src">
1^some multi\
line text with\
newline escaping^and some other data following^
2^and another line^clean^
3^and\
a last multiline\
escaped line
with a missing\
escaping^just to test^
4^\ ^empty value^
5^^null value^
6^multi line\
escaped value\
\
with empty line\
embeded^last line^
</pre>
<p>And here's what we get by loading that:</p>
<pre class="src">
pgloader/examples$ pgloader -c pgloader.conf -s cluttered
| Table name | duration | size | copy rows | errors |
====================================================================
| cluttered | 0.193s | - | 6 | 0 |
pgloader/examples$ psql pgloader -c <span style="color: #ad7fa8; font-style: italic;">"table cluttered;"</span>
 a |               b               |        c
---+-------------------------------+------------------
 1 | and some other data following | some multi
   :                               : line text with
   :                               : newline escaping
 2 | clean                         | and another line
 3 | just to test                  | and
   :                               : a last multiline
   :                               : escaped line
   :                               : with a missing
   :                               : escaping
 4 | empty value                   |
 5 | null value                    |
 6 | last line                     | multi line
   :                               : escaped value
   :                               :
   :                               : with empty line
   :                               : embeded
(6 rows)
</pre>
pgloader is still
able to help you!</p>
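<p>To make the cluttered example above more concrete, here is a rough sketch of how such a file could be read. This is a simplification written for this article, not pgloader's implementation: the <code>read_records</code> and <code>reorder</code> names are hypothetical, escaped separators are not handled, and the <code>a:1, b:3, c:2</code> column mapping is assumed from the table output shown above. The idea is to keep appending physical lines until the record holds as many separators as we expect fields, which also copes with the line where the newline escaping is missing.</p>

```python
def read_records(lines, sep="^", n_fields=3):
    """Yield lists of n_fields values from 'text' format lines.

    Each field is closed by a trailing separator, so a record is
    complete once it contains n_fields separators.
    """
    pending = ""
    for line in lines:
        pending += line
        if pending.count(sep) < n_fields:
            continue  # record continues on the next physical line
        fields = pending.rstrip("\n").split(sep)[:n_fields]
        # backslash + newline is the escaped form of a newline
        yield [field.replace("\\\n", "\n") for field in fields]
        pending = ""


def reorder(fields, columns):
    """Map file positions (1-based) to column names."""
    return {name: fields[pos - 1] for name, pos in columns.items()}


SAMPLE = [
    "1^some multi\\\n",
    "line text with\\\n",
    "newline escaping^and some other data following^\n",
    "2^and another line^clean^\n",
]

for record in read_records(SAMPLE):
    print(reorder(record, {"a": 1, "b": 3, "c": 2}))
```

<p>Counting separators rather than trusting line endings is what makes the third record of the data file loadable at all.</p>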
<p>Please refer to the <a href="http://pgloader.projects.postgresql.org/">pgloader man page</a> to know about each and every parameter
that you can define and the values accepted, etc. And the <em>fixed</em> data format
is to be used when you're not given a field separator but field positions in
the file. Yes, we still encounter those from time to time.</p>
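<p>For the <em>fixed</em> case, the principle can be sketched in a few lines of Python. The layout below (a 2 character id, a 10 character name, an 8 character date) and the <code>split_fixed</code> helper are hypothetical, made up for this illustration; refer to the man page for the way pgloader itself expects the positions to be declared.</p>

```python
def split_fixed(line, positions):
    """Cut one line into stripped fields given (start, length) pairs."""
    return [line[start:start + length].strip()
            for start, length in positions]


# Hypothetical layout: id (2 chars), name (10 chars), date (8 chars).
LAYOUT = [(0, 2), (2, 10), (12, 8)]

fields = split_fixed("01Dimitri   20110729", LAYOUT)
print(fields)
```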
<h2>parallel processing</h2>
<h3>one reader, multiple workers</h3>
<h3>multiple workers, each reading</h3>
]]></description>
<p>Here's a catalog query to help you there:</p> <pre class="src"><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 29 Jul 2011 09:57:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/29-how-to-setup-pgloader.html</guid> </item> <item> <title>Next month partitions</title> <link>http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</link> <description><![CDATA[<p>When you do partition your tables monthly, then comes the question of when to create next partitions. I tend to create them just the week before next month and I have some nice <a href="http://www.nagios.org/">nagios</a> scripts to alert me in case I've forgotten to do so. How to check that by hand in the end of a month?</p>
> select *
-> from
-> (
(> select <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span> as schemaname, count(*)::text as tablename
(> from pg_tables
(> where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(> and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(>
(> union
(>
(> select schemaname, substring(tablename,1,length(tablename)-6) || <span style="color: #ad7fa8; font-style: italic;">'201108'</span>
(> from pg_tables
(> where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(> and tablename like to_char(now(), <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(>
(> except
(>
(> select schemaname, tablename
(> from pg_tables
(> where schemaname not in (<span style="color: #ad7fa8; font-style: italic;">'pg_catalog'</span>,<span style="color: #ad7fa8; font-style: italic;">'information_schema'</span>)
(> and tablename like to_char(now() + interval <span style="color: #ad7fa8; font-style: italic;">'1 month'</span>, <span style="color: #ad7fa8; font-style: italic;">'%YYYYMM'</span>)
(> ) as t
-> order by schemaname <> <span style="color: #ad7fa8; font-style: italic;">'previous parts'</span>, schemaname;
   schemaname   |       tablename
----------------+------------------------
 previous parts | 1
 central        | stats_entrantes_201108
(2 rows) </pre>
<p>As you see, our partitions are named with a _YYYYMM suffix so that it's easy to match
them in our queries, but I guess just about everyone does the same here.
The to_char expressions are only there to avoid typing '%201108' by hand in
the query text. And there's a trick so that we display how many partitions
we have this month, adding a line to the result...</p>
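<p>If the set logic is easier to follow outside of SQL, here is the same idea as a small Python sketch. It assumes the _YYYYMM naming convention described above; the helper names and the sample table list are made up for the illustration, and it works on a plain list of table names rather than on pg_tables.</p>

```python
from datetime import date


def month_suffix(day):
    """Format a date as YYYYMM, like to_char(now(), 'YYYYMM')."""
    return f"{day.year:04d}{day.month:02d}"


def next_month(day):
    """Return the first day of the month following the given date."""
    return date(day.year + (day.month == 12), day.month % 12 + 1, 1)


def missing_next_partitions(tables, today):
    """List next month's partitions that do not exist yet."""
    current = month_suffix(today)
    upcoming = month_suffix(next_month(today))
    expected = {name[:-6] + upcoming
                for name in tables if name.endswith(current)}
    return sorted(expected - set(tables))


# Made-up table list: logs already has its August partition.
TABLES = ["stats_entrantes_201107", "logs_201107", "logs_201108"]
print(missing_next_partitions(TABLES, date(2011, 7, 27)))
```

<p>The set difference plays the role of the <code>except</code> in the query: take this month's partitions, rename them for next month, and remove the ones that already exist.</p>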
]]></description>
<p>In the meantime, happy reading!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 27 Jul 2011 22:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/27-check-parts-for-next-month.html</guid> </item> <item> <title>Comment Utiliser pgloader</title> <link>http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</link> <description><![CDATA[<p>This is a question that comes up regularly, and one I thought I had answered satisfactorily with <a href="https://github.com/dimitri/pgloader/tree/master/examples">the pgloader examples</a>. That document reads a bit like a <em>tutorial</em>, in English, and I went through it in detail in the article <a href="22-how-to-use-pgloader.html">how to use pgloader</a> on this same site, also in English. If there is enough demand, I will translate it into French.</p>
<h2>installing pgloader</h2> <p class="first">Either use the <a href="http://packages.debian.org/source/pgloader">debian package</a> or the one for your distribution of choice if you use another one. RedHat, CentOS, FreeBSD, OpenBSD and some more already include a binary package that you can use directly.</p> <p>Or you could<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 22 Jul 2011 13:48:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-comment-utiliser-pgloader.html</guid> </item> <item> <title>How To Use PgLoader</title> <link>http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</link> <description><![CDATA[<p>This question about <a href="../../../pgsql/pgloader.html">pgloader</a> usage comes up quite frequently, and I think the examples <a href="https://github.com/dimitri/pgloader/tree/master/examples">README</a> goes a long way in answering it. It's not exactly a <em>tutorial</em> but is almost there. Let me paste it here for reference:</p>
git clone https://github.com/dimitri/pgloader.git and go from
there. As it's all python code, it runs fine interpreted from the source
directory, so you don't <em>need</em> to install it in a special place on your system.</p>
<h2>setting up the test environment</h2>
<p class="first">To use them, please first create a pgloader database, then for each example
the tables it needs, then issue the pgloader command:</p>
<pre class="src">
$ createdb --encoding=utf-8 pgloader
$ cd examples
$ psql pgloader < simple/simple.sql
$ ../pgloader.py -Tvc pgloader.conf simple
</pre>
<p>If you want to load data from all examples, create tables for all of them
first, then run pgloader without argument.</p>
<h2>example description</h2>
<p class="first">The provided examples are:</p>
<ul>
<li>simple
<p>This dataset shows the basic case, with trailing separator and data
reordering.</p></li>
<li>xzero
<p>Same as simple but using \0 as the null marker ( )</p></li>
<li>errors
<p>Same test, but with impossible dates. Should report some errors. If it
does not report errors, check you're not using psycopg 1.1.21.</p>
<p>Should report 3 errors out of 7 lines (4 updates).</p></li>
<li>clob
<p>This dataset shows some text large object importing to PostgreSQL text
datatype.</p></li>
<li>cluttered
<p>A dataset with newline escaped and multi-line input (without quoting).
Beware of data reordering, too.</p></li>
<li>csv
<p>A dataset with csv delimiter ',' and quoting '"'.</p></li>
<li>partial
<p>A dataset from which we only load some columns of the provided one.</p></li>
<li>serial
<p>In this dataset the id field is omitted, it's a serial which will be
automatically set by PostgreSQL while COPYing.</p></li>
<li>reformat
<p>A timestamp column is formatted the way MySQL dumps its timestamps,
which is not the same as the way PostgreSQL reads them. The
reformat.mysql module is used to reformat the data on-the-fly.</p></li>
<li>udc
<p>A user-defined column test, where not all file columns are used but
a new constant one, not found in the input datafile, is added while
loading data.</p></li>
</ul>
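<p>To picture what the reformat example in the list above is about, here is a minimal sketch of such an on-the-fly conversion. It is an illustration only, not the code of the reformat.mysql module: the function name is made up, and it assumes the input is the compact 14 digit form (YYYYMMDDHHMMSS) that some MySQL dumps use for timestamps.</p>

```python
def mysql_timestamp_to_pg(value):
    """Turn 'YYYYMMDDHHMMSS' into 'YYYY-MM-DD HH:MM:SS'.

    Returns None when the input is not 14 digits, so that the caller
    can decide to reject the row.
    """
    if len(value) != 14 or not value.isdigit():
        return None
    return "%s-%s-%s %s:%s:%s" % (value[0:4], value[4:6], value[6:8],
                                  value[8:10], value[10:12], value[12:14])


print(mysql_timestamp_to_pg("20041002152952"))
```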
<h2>running the import</h2>
<p class="first">You can launch all those pgloader tests in one run, provided you created the
necessary tables:</p>
<pre class="src">
$ for sql in */*sql; do psql pgloader < $sql; done
$ ../pgloader.py -Tsc pgloader.conf

errors WARNING COPY error, trying to find on which line
errors WARNING COPY data buffer saved in /tmp/errors.AhWvAv.pgloader
errors WARNING COPY error recovery done (2/3) in 0.064s
errors WARNING COPY error, trying to find on which line
errors WARNING COPY data buffer saved in /tmp/errors.BclHtj.pgloader
errors WARNING COPY error recovery done (1/1) in 0.054s
errors ERROR 3 errors found into [errors] data
errors ERROR please read /tmp/errors.rej.log for errors log
errors ERROR and /tmp/errors.rej for data still to process
errors ERROR 3 database errors occured
reformat WARNING COPY error, trying to find on which line
reformat WARNING COPY data buffer saved in /tmp/reformat.6P4WCD.pgloader
reformat WARNING COPY error recovery done (1/4) in 0.034s
reformat ERROR 1 errors found into [reformat] data
reformat ERROR please read /tmp/reformat.rej.log for errors log
reformat ERROR and /tmp/reformat.rej for data still to process
reformat ERROR 1 database errors occured

| Table name | duration | size | copy rows | errors |
====================================================================
| allcols | 0.025s | - | 8 | 0 |
| clob | 0.034s | - | 7 | 0 |
| cluttered | 0.061s | - | 6 | 0 |
| csv | 0.035s | - | 6 | 0 |
| errors | 0.113s | - | 4 | 3 |
| fixed | 0.045s | - | 3 | 0 |
| partial | 0.030s | - | 7 | 0 |
| reformat | 0.036s | - | 4 | 1 |
| serial | 0.029s | - | 7 | 0 |
| simple | 0.050s | - | 7 | 0 |
| udc | 0.020s | - | 5 | 0 |
====================================================================
| Total | 0.367s | - | 64 | 4 |
<center> <p><a class="image-link" href="../../../images/emacs-cheat-sheet.png"> <img src="../../../images/emacs-cheat-sheet-tn.png"></a></p> </center> <p>Hope you'll like it!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 22 Jul 2011 13:38:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/22-how-to-use-pgloader.html</guid> </item> <item> <title>Emacs Cheat Sheet</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/20-emacs-cheat-sheet.html</link> <description><![CDATA[<p>I stumbled upon the following <em>cheat sheet</em> for <a href="http://www.gnu.org/software/emacs/">Emacs</a> yesterday, and it's worth sharing. I already learnt or discovered again some nice default chords, like for example
C-x C-o runs the command delete-blank-lines and C-M-o runs the command split-line. I guess I will use the latter one a lot.</p>
<center> <p><a class="image-link" href="../../../images/emacs-cheat-sheet.png"> <img src="../../../images/emacs-cheat-sheet-tn.png"></a></p> </center> <p>Hope you'll like it!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 20 Jul 2011 10:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/20-emacs-cheat-sheet.html</guid> </item> <item> <title>Emacs Cheat Sheet</title> <link>http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</link> <description><![CDATA[<p>I stumbled upon the following <em>cheat sheet</em> for <a href="http://www.gnu.org/software/emacs/">Emacs</a> yesterday, and it's worth sharing. I already learnt or discovered again some nice default chords, like for example
C-x C-o runs the command delete-blank-lines and C-M-o runs the command split-line. I guess I will use the latter one a lot.</p>
<center> <p><a class="image-link" href="../../../images/skytools3.pdf"> <img src="../../../images/skytools3-0.png"></a></p> </center> <p>The <em>slides</em> for all the talks should eventually be published online, but that will not happen as quickly as we would all like. So here is some reading while you wait for the rest!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 20 Jul 2011 10:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/20-emacs-cheat-sheet.html</guid> </item> <item> <title>Skytools3 : les slides</title> <link>http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</link> <description><![CDATA[<p>Now that the <a href="http://char11.org/">CHAR(11)</a> conference is over, it is customary to publish the <em>slides</em> that were used. I presented <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a>
3.0, whose next version will be released as soon as I have had the time to finish reviewing (in fact mostly writing) the documentation.</p>
<center> <p><a class="image-link" href="../../../images/skytools3.pdf"> <img src="../../../images/skytools3-0.png"></a></p> </center> <p>The slides for all the talks should eventually make their way to a central place, but expect some noticeable delay here. Sorry about that, and enjoy the read meanwhile!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 19 Jul 2011 14:39:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-slides.html</guid> </item> <item> <title>Skytools3 talk Slides</title> <link>http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</link> <description><![CDATA[<p>In case you're wondering, here are the slides from the <a href="http://char11.org/">CHAR(11)</a> talk I gave, about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools</a>
3.0, <em>soon</em> to be released. That means as soon as I have enough time available to polish (or write) the documentation.</p>
<p>As it was not that much work to implement, here's the whole of it:</p> <pre class="src"> <span style="color: #b22222;">;;;</span><span style="color: #b22222;"> </span><span style="color: #b22222;">;;; </span><span style="color: #b22222;">Breadcrumb support </span><span style="color: #b22222;">;;;</span><span style="color: #b22222;"> </span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-breadcrumb-to-current-page</span> ()<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 19 Jul 2011 14:24:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/19-skytools3-talk-slides.html</guid> </item> <item> <title>Elisp Breadcrumbs</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</link> <description><![CDATA[<p>A <a href="http://en.wikipedia.org/wiki/Breadcrumb_(navigation)">breadcrumb</a> is a navigation aid. I just added one to this website, so that it gets easier to browse from any article to its local and parents indexes and back to <a href="../../../index.html">/dev/dim</a>, the root webpage of this site.</p>
  <span style="color: #bc8f8f;">"Return a list of (name . link) from the index root page to current one"</span>
  (<span style="color: #7f007f;">let*</span> ((current (muse-current-file))
         (cwd     (file-name-directory current))
         (project (muse-project-of-file current))
         (root    (muse-style-element <span style="color: #da70d6;">:path</span> (caddr project)))
         (path    (tapoueh-path-to-root))
         (dirs    (split-string (file-relative-name current root) <span style="color: #bc8f8f;">"/"</span>)))
    <span style="color: #b22222;">;; </span><span style="color: #b22222;">("blog" "2011" "07" "13-back-from-char11.muse")</span>
    (append
     (list (cons <span style="color: #bc8f8f;">"/dev/dim"</span> (concat path <span style="color: #bc8f8f;">"index.html"</span>)))
     (<span style="color: #7f007f;">loop</span> for p in (butlast dirs)
           collect (cons p (format <span style="color: #bc8f8f;">"%s%s/index.html"</span> path p))
           do (setq path (concat path p <span style="color: #bc8f8f;">"/"</span>))))))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb-div</span> ()
  <span style="color: #bc8f8f;">"The real HTML inserting"</span>
  (insert <span style="color: #bc8f8f;">"<div id=\"breadcrumb\">"</span>)
  (<span style="color: #7f007f;">loop</span> for (name . link) in (tapoueh-breadcrumb-to-current-page)
        do (insert (format <span style="color: #bc8f8f;">"<a href=%s>%s</a>"</span> link name) <span style="color: #bc8f8f;">" / "</span>))
  (insert <span style="color: #bc8f8f;">"</div>\n"</span>))

(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">tapoueh-insert-breadcrumb</span> ()
  <span style="color: #bc8f8f;">"Must run with current buffer being a muse article"</span>
  (<span style="color: #7f007f;">save-excursion</span>
    (beginning-of-buffer)
    (<span style="color: #7f007f;">when</span> (tapoueh-extract-directive <span style="color: #bc8f8f;">"author"</span> (muse-current-file))
      (re-search-forward <span style="color: #bc8f8f;">"<body>"</span> nil t) <span style="color: #b22222;">; </span><span style="color: #b22222;">find where the article content is</span>
      (re-search-forward <span style="color: #bc8f8f;">"<h2>"</span> nil t) <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the title line</span>
      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div)

      (re-search-forward <span style="color: #bc8f8f;">"<h2>"</span> nil t 2) <span style="color: #b22222;">; </span><span style="color: #b22222;">that's the TAG line</span>
      (beginning-of-line)
      (open-line 1)
      (tapoueh-insert-breadcrumb-div)))) </pre>
:after function of my <a href="http://www.emacswiki.org/emacs/EmacsMuse">Muse</a> project style, and
it gets the work done.</p>
]]></description>
<p>This code is now called in the<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 14 Jul 2011 18:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/14-elisp-breadcrumbs.html</guid> </item> <item> <title>Elisp Breadcrumbs</title> <link>http://tapoueh.org/blog/2011/07/14-elisp-breadcrumbs.html</link> <description><![CDATA[ (open-line 1) (tapoueh-insert-breadcrumb-div)))) </pre>
:after function of my <a href="http://www.emacswiki.org/emacs/EmacsMuse">Muse</a> project style, and
it gets the work done.</p>
<h2>Tags</h2>
<p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a></p>
</div>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 14 Jul 2011 18:44:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/14-elisp-breadcrumbs.html</guid> </item> <item> <title>De retour de CHAR(11)</title> <link>http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</link> <description><![CDATA[<h1>De retour de CHAR(11)</h1>
Wednesday, July 13 2011, 17:30 </div><p>What better way to spend the train ride back from <a href="http://char11.org/schedule">CHAR(11)</a> than to play reporter for the occasion? In truth, sleeping would be a good idea, given how late the evenings ran!</p> <p>We had the pleasure of listening to <strong><em>Jan Wieck</em></strong> present a simplified history of replication with <a href="http://www.postgresql.org/">PostgreSQL</a>. As he is one of the pioneers of the field himself, his point of view is most interesting. He talked about the evolution of replication solutions, and I cannot help thinking that in many ways <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>; Jan, the author of Slony, seemed to agree.</p> <p>Indeed, Skytools was born out of Slony's limitations. Some of them still exist, such as the lack of separation between the <strong><em>queuing</em></strong> layer and the replication layer itself, and some have been solved since, such as the difficulty of sustaining heavy write loads. The two solutions even share part of their implementation, since PostgreSQL 8.3, with the txid and <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> data types. Of course, the goal of Skytools is to be as simple a solution as possible, perfectly suited to a precise and bounded set of use cases, whereas Slony tries to automatically solve the hardest problems in the field, at the cost of a very complex interface.</p> <p>Of course, <strong><em>Jan</em></strong> took the time to compare these replication solutions objectively with the one built into PostgreSQL, <em>Streaming Replication</em> and <em>Hot Standby</em>. 
We already had asynchronous binary replication; PostgreSQL 9.1 brings synchronous replication with per-transaction control. <a href="http://database-explorer.blogspot.com/">Simon Riggs</a>, the author of the feature, stressed how innovative this is: no other project lets you control the data durability guarantee with such flexible and precise granularity!</p> <p><a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> is a management solution for <em>clusters</em> running <em>Hot Standby</em> and <em>Streaming Replication</em> (synchronous or not). How it works was detailed by <strong><em>Greg Smith</em></strong> and <strong><em>Cédric Villemain</em></strong>. The former showed how to design an architecture that spreads the read load, and the latter how to get a fault-tolerant system thanks to the automatic <em>failover</em> built into repmgr. This innovative solution was developed in large part by 2ndQuadrant France, and we have already stamped it <em>production ready</em>.</p> <p><strong><em><a href="http://www.hagander.net/">Magnus Hagander</a></em></strong> has worked a lot on the <em>streaming</em> protocol used for the replication built into PostgreSQL 9.1, as well as on the tools that use this protocol. Naturally he presented this, and the idea of a <em>proxy</em> relaying the binary stream of transaction logs came up again in the discussions (we had already considered this in 2010; the article <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> has some material on the subject). 
With synchronous replication, it becomes possible to design advanced, robust and versatile architectures: the proxy could now take care of both the archives and the <em>standby</em> servers.</p> <p><a href="http://database-explorer.blogspot.com/">Simon Riggs</a> then gave us a retrospective of his last 7 years of work on PostgreSQL, from the implementation of <em>Point in Time Recovery</em> to synchronous replication, by way of <em>Hot Standby</em>. What we have in PostgreSQL 9.0 already matches the most advanced data durability that Oracle offers, and 9.1 takes the next step. None of this slows <strong><em>Simon</em></strong> down: he was already talking about projects for the next 10 years.</p> <p>Finally, <a href="http://www.heroku.com/">Heroku</a> presented their incredible business. They now have more than 150 000 PostgreSQL instances in production, showing that our favourite DBMS is ready for hosting providers. <strong><em>Heroku</em></strong> is designing and building a turnkey solution for the famous, hard to define <em>Cloud</em>. Here that means being able to add new read-only replicas on the fly to absorb traffic peaks, create development instances in one click, and so on.</p> <p>This article only covers a small selection of the topics discussed at the conference; I am counting on <a href="http://blog.guillaume.lelarge.info/">Guillaume</a> to tell you about <a href="http://char11.org/schedule">CHAR(11)</a> too, but you may have to wait until he is back from the <a href="http://2011.rmll.info/">RMLL</a> (what energy!).</p> <h2>Tags</h2> <p><a href="../../../tags/postgresqlfr.html">PostgreSQLFr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">Skytools</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 13 Jul 2011 17:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-de-retour-de-char11.html</guid> </item> <item> <title>Back From CHAR(11)</title> <link>http://tapoueh.org/blog/2011/07/13-back-from-char11.html</link> <description><![CDATA[<h1>Back From CHAR(11)</h1>
Wednesday, July 13 2011, 17:15 </div><p><a href="http://char11.org/schedule">CHAR(11)</a> finished sometime during the night leading to today, if you consider the <em>social events</em> to be part of it, which I definitely do. This conference has been a very good one, both on the organisation side of things and of course for its content.</p> <p>It began with a perspective about the evolution of replication solutions, by <strong><em>Jan Wieck</em></strong> himself. In some way <a href="http://wiki.postgresql.org/wiki/SKytools">Skytools</a> is an evolution of <a href="http://slony.info/">Slony</a>, in the sense that it reuses the same concepts, a part of the design, and even shares bits of the implementation (like the <a href="http://www.postgresql.org/docs/8.3/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT">txid_snapshot</a> datatype that was added in PostgreSQL 8.3). The evolution occurred in choosing a subset of the features of Slony and then simplifying the user interface as much as possible. And with Skytools 3.0, those features that were removed but still are useful to solve real-life problems are now available too.</p> <p>Of course the talk did cover the other replication solutions (not just the trigger based ones), and did compare <a href="http://wiki.postgresql.org/wiki/Setting_up_RServ_with_PostgreSQL_7.0.3">RServ</a> to <a href="http://bucardo.org/">Bucardo</a> for example. And then all those were compared to the <a href="http://www.postgresql.org/">PostgreSQL</a> core replication facilities, which are quite a different animal. It was a really nice <em>keynote</em> here, preparing the audience's minds to make the most out of all the other talks.</p> <p>I will not review all the talks in detail, as I'm pretty sure some other attendees will turn into reporters themselves: scaling the write load!</p> <p>Still <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> got its share of attention. 
<a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">Greg Smith</a> and <a href="http://www.2ndquadrant.fr/">Cédric Villemain</a> did present both how to do <strong>read scaling</strong> and <strong>auto failover</strong> management with this tool, going into fine details about how it works internally and how to best design your services architecture for maximum <strong>data availability</strong>. The question and answers section led to insist on the fact that you cannot have data availability with less than 3 production nodes.</p> <p><a href="http://www.hagander.net/">Magnus Hagander</a> detailed how flexible the core protocol support for replication (and streaming) really is. That flexibility means that you can quite easily talk this protocol from any application, and the idea of a <em>wal proxy</em> did pop out again (see <a href="../../2010/05/27-back-from-pgcon2010.html">Back from PgCon2010</a> article for my first mentioning of the idea). The main difference is that we now have <em>synchronous replication</em> support, so that the proxy could be trusted both for archiving and serving standbys.</p> <p>Of course <a href="http://database-explorer.blogspot.com/">Simon</a> still has lots of ideas about the next 10 years of replication oriented projects for core PostgreSQL code, and his talk nicely summarized the previous 7 years. Future is bright, and guess what, it's beginning today!</p> <p>We also heard about <a href="http://www.heroku.com/">Heroku</a>, and these guys are doing crazy impressive things. Like running 150 000 PostgreSQL instances, for example, showing that you can actually use our preferred database server in the hosting business. I expect that the maturing solution and tool sets providing data availability are soon to be a game changer here. What they are doing is designing a <strong>flexible data architecture</strong> with strong guarantees (<strong>no data loss</strong>). 
The <em>cloud elasticity</em> is reaching out from the stateless services, and <em>those guys</em> are making it happen now.</p> <p>May you live in interesting times!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/conferences.html">Conferences</a></p>]]></description>
<p>My setup is tentatively called <a href="../../../tapoueh.el.html">tapoueh.el</a> and browsable online. It consists of some tweaks on top of Muse, so that I can enjoy <a href="../../../tags/index.html">tags</a> and proper <a href="../../../rss/">rss</a> support. By <em>proper</em>, I mean that I want to be able to produce as many <em>topic</em><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 13 Jul 2011 17:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/13-back-from-char11.html</guid> </item> <item> <title>Muse setup revised</title> <link>http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</link> <description><![CDATA[<p>Most of you are probably reading my posts directly in their
RSS reader tools (mine is <a href="http://www.gnus.org/">gnus</a> thanks to the <a href="http://gwene.org/">Gwene</a> service), so you probably missed it, but I just <em>pushed</em> a whole new version of <a href="http://tapoueh.org">my website</a>, still using <a href="https://github.com/alexott/muse">Emacs Muse</a> as the engine.</p> RSS <em>feeds</em> from a single <em>blog</em>, and thanks to the <em>tags</em> support that's now what I have.</p> <p>The RSS handling and the tagging system are ad hoc code, and this very article begins like this:</p> <pre class="src"> Dimitri Fontaine Muse setup revised 20110705-19:55 Emacs Muse </pre> <p>All the information for the site navigation is taken from there, and at long last the RSS I publish now contains proper URLs without abusing <a href="../../../blog.dim.html">anchors</a>, as in the previous link which is a compatibility page in case you had some bookmarks. The compat only works with javascript (did you know that <em>anchors</em> are not part of the URL that is sent to the server, so that you can't apply RedirectMatch or other tweaks?), but all it needs is <em>2 lines of code</em>, so I guess that's not so bad.</p> <pre class="src"> <span style="color: #fcaf3e;">var</span> <span style="color: #fce94f;">anchor</span> = window.location.hash; document.location.href=document.getElementById(anchor).href; </pre> <p>I hope you like the new setup as much as I do, even if I'm left with some debugging to do. That's the price to pay for doing it yourself I guess. But I still don't know of a ready to use solution (as in <em>off the shelf</em>) that meets my criteria for web publishing. More on that topic another time.</p> ]]></description><p>My setup is tentatively called <a href="../../../tapoueh.el.html">tapoueh.el</a> and browsable online. It consists of some tweaks on top of Muse, so that I can enjoy <a href="../../../tags/index.html">tags</a> and proper <a href="../../../rss/">rss</a> support. 
By <em>proper</em>, I mean that I want to be able to produce as many <em>topic</em><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 05 Jul 2011 19:55:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/05-muse-setup-revised.html</guid> </item> <item> <title>Muse setup revised</title> <link>http://tapoueh.org/blog/2011/07/blog/2011/07/05-muse-setup-revised.html</link> <description><![CDATA[<p>Most of you are probably reading my posts directly in their
RSSreader tools (mine is <a href="http://www.gnus.org/">gnus</a> thanks to the <a href="http://gwene.org/">Gwene</a> service), so you probably missed it, but I just <em>pushed</em> a whole new version of <a href="http://tapoueh.org">my website</a>, still using <a href="https://github.com/alexott/muse">Emacs Muse</a> as the engine.</p>RSS<em>feeds</em> from a single <em>blog</em>, and thanks to the <em>tags</em> support that's now what I have.</p> <p>TheRSShandling and the tagging system are adhoc code, and this very article begins like this:</p> <pre class="src"> Dimitri Fontaine <span style="font-size: 140%; font-weight: bold;"> Muse setup revised</span> 20110705-19:55 Emacs Muse </pre> <p>All the information for the site navigation are taken from there, and at long last theRSSI publish now contains properURLswithout abusing <a href="../../../blog.dim.html">anchors</a>, as in the previous link which is a compatibility page in case you had some bookmarks. The compat only works with javascript (did you know that <em>anchors</em> are not part of theURLthat is sent to the server, so that you can't applyRedirectMatchor other tweaks?), but all it needs is <em>2 lines of code</em>, so I guess that's not so bad.</p> <pre class="src"> <span style="color: #7f007f;">var</span> <span style="color: #b8860b;">anchor</span> = window.location.hash; document.location.href=document.getElementById(anchor).href; </pre> <p>I hope you like the new setup as much as I do, even if I'm left with some debugging to do. That's the price to pay for doing it yourself I guess. But I still don't know of a ready to use solution (as in <em>off the shelf</em>) that meet my criteria for web publishing. 
More on that topic another time.</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 05 Jul 2011 19:55:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/blog/2011/07/05-muse-setup-revised.html</guid> </item> <item> <title>Ready for CHAR(11)?</title> <link>http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</link> <description><![CDATA[<h1>Ready for CHAR(11)?</h1>
Monday, July 04 2011, 20:15 </div><p><a href="http://www.char11.org/">CHAR(11)</a>, the conference dedicated to <em>Clustering</em>, <em>High Availability</em> and <em>Replication</em> with <a href="http://www.postgresql.org/">PostgreSQL</a>, is <strong>already</strong> happening next week. It's in Europe, in Cambridge this time, and it's in English, even though several fellow French speakers will be in the audience.</p> <p>If you haven't had a look at the <a href="http://www.char11.org/schedule">schedule</a> yet, I encourage you to do so. Even if you weren't planning to come… because there is enough in there to change your mind!</p> <p>Following the <a href="http://archives.postgresql.org/">PostgreSQL mailing lists</a> in English is hard enough as it is, simply for lack of time, and sometimes the language barrier plays a part too. So in case you haven't been following closely, allow me to introduce the main speakers at this conference.</p> <p><strong><em>Jan Wieck</em></strong> gives the opening talk, a retrospective of replication solutions for PostgreSQL. He started <a href="http://slony.info/">Slony</a> and remains very active in its architecture and development.</p> <p><strong><em>Greg Smith</em></strong>, a colleague at <a href="http://www.2ndquadrant.us/">2ndQuadrant</a>, is Mr. "low-level" performance: his specialty is getting the most out of your hardware, your server configuration, PostgreSQL itself, and the queries you send to it. 
His book <a href="http://www.2ndquadrant.com/books/postgresql-9-0-high-performance/">PostgreSQL High Performance</a> is a must-read, and as such has been <a href="http://blog.guillaume.lelarge.info/index.php/post/2011/05/01/%C2%AB-Bases-de-donn%C3%A9es-PostgreSQL,-Gestion-des-performances-%C2%BB">translated into French</a>.</p> <p>Then we have <strong><em>Magnus Hagander</em></strong>, who recently joined the <em>core team</em> (the project's central organization) and has been contributing to the PostgreSQL code for more than 10 years.</p> <p><strong><em>Simon Riggs</em></strong>, also one of <a href="http://www.2ndquadrant.com/about/#riggs">our colleagues</a>, implemented <em>PITR</em>, transaction log archiving, asynchronous replication and, for the next PostgreSQL release, synchronous replication.</p> <p><strong><em>Hannu Krosing</em></strong> (guess <a href="http://www.2ndquadrant.com/">where</a> he works?) designed the architecture (and the tools) that allow <a href="http://www.skype.com/">Skype</a> to claim infinite scalability, or at any rate an announced capacity of up to <a href="http://highscalability.com/skype-plans-postgresql-scale-1-billion-users">1 billion users</a>.</p> <p><strong><em>Koichi Suzuki</em></strong> leads the work on the promising <a href="http://postgres-xc.sourceforge.net/">Postgres-XC</a> product, a fine example of collaboration between different market players, here <a href="http://www.enterprisedb.com/">EnterpriseDB</a> and <a href="https://www.oss.ecl.ntt.co.jp/ossc/">NTT Open Source Software Center</a>. Which shows once more that <a href="http://fr.wikipedia.org/wiki/Open_source">Open Source</a> is firmly rooted in commercial companies.</p> <p>Of course Cédric and I, from the French branch of <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>, will be there too. 
We will speak about topics we know well, having taken part in their development and deploying and maintaining them in production: <a href="http://projects.2ndquadrant.com/repmgr">repmgr</a> and <a href="http://wiki.postgresql.org/wiki/Londiste_Tutorial">Londiste</a>.</p> <p>And I'm skipping over other speakers, whose topics will be no less interesting. In short, if <em>replication</em> and <em>clustering</em> are topics you want to combine with PostgreSQL, this is the place to spend the beginning of next week!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresqlfr.html">PostgreSQLfr</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 04 Jul 2011 20:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/07/04-pret-pour-char11.html</guid> </item>
<item> <title>Multi-Version support for Extensions</title> <link>http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</link> <description><![CDATA[<h1>Multi-Version support for Extensions</h1>
Wednesday, June 29 2011, 09:50 </div><p><span class="hack"> </span></p> <p>We still have this problem to solve with extensions and their packaging: how to best organize things so that your extension is compatible both with releases before 9.1 and with 9.1 and following releases of <a href="http://www.postgresql.org/">PostgreSQL</a>?</p> <p>Well, I had to do it for the <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> contribution, and I wanted the following to happen:</p> <pre class="src">
dpkg-deb: building package `postgresql-8.3-ip4r' ...
dpkg-deb: building package `postgresql-8.4-ip4r' ...
dpkg-deb: building package `postgresql-9.0-ip4r' ...
dpkg-deb: building package `postgresql-9.1-ip4r' ...
</pre> <p>And here's a simple enough way to achieve that. First, you have to get your packaging ready the usual way, and install the build dependencies. Then, realizing that <code>/usr/share/postgresql-common/supported-versions</code> from the latest <code>postgresql-common</code> package will only return <code>8.3</code> in <code>lenny</code> (yes, I'm doing some <em>backporting</em> here), we have to tweak it.</p> <pre class="src">
postgresql-server-dev-8.4 postgresql-server-dev-9.0 postgresql-server-dev-9.1 postgresql-server-dev-all

$ sudo dpkg-divert \
    --divert /usr/share/postgresql-common/supported-versions.distrib \
    --rename /usr/share/postgresql-common/supported-versions

$ cat /usr/share/postgresql-common/supported-versions
#!/bin/bash

dpkg -l postgresql-server-dev-* \
| awk -F '[ -]' '/^ii/ && ! /server-dev-all/ {print $6}'
</pre>
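<p>To see why the filter prints <code>$6</code>, here is a minimal sketch that runs the same <code>awk</code> program over a few made-up lines shaped like <code>dpkg -l</code> output. The function names and the sample version strings are assumptions for illustration only. Because the field separator is the single-character class <code>[ -]</code>, the two spaces after <code>ii</code> produce an empty second field, so the version number lands in the sixth one.</p>

```shell
# Hypothetical helper: the same awk filter as above, reading
# `dpkg -l` style lines on stdin instead of calling dpkg.
list_pg_versions() {
    awk -F '[ -]' '/^ii/ && ! /server-dev-all/ {print $6}'
}

# Made-up sample lines; only the "status  name  version" shape matters.
sample_dpkg_output() {
    printf '%s\n' \
        'ii  postgresql-server-dev-8.4  8.4.8-1  development files' \
        'ii  postgresql-server-dev-9.1  9.1.0-1  development files' \
        'ii  postgresql-server-dev-all  117      extension build tool' \
        'rc  postgresql-server-dev-9.0  9.0.4-1  development files'
}

# Fields of the first line: "ii", "", "postgresql", "server", "dev", "8.4", ...
sample_dpkg_output | list_pg_versions
```

<p>Only fully installed packages (status <code>ii</code>) are listed, and the <code>-all</code> meta package is skipped, so the sample prints <code>8.4</code> and <code>9.1</code>.</p>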
<p>Now we are allowed to build our extension for all those versions, so we add <code>9.1</code> to the <code>debian/pgversions</code> file. And <code>debuild</code> will do the right thing now, thanks to <a href="http://manpages.debian.net/cgi-bin/man.cgi?query=pg_buildext">pg_buildext</a> from <a href="http://packages.debian.org/sid/postgresql-server-dev-all">postgresql-server-dev-all</a>.</p> <p>The problem we face is that what gets built is not an <a href="http://www.postgresql.org/docs/9.1/static/extend-extensions.html">extension</a> as in 9.1, so things like <code>\dx</code> in <code>psql</code> and <a href="http://www.postgresql.org/docs/9.1/static/sql-createextension.html">CREATE EXTENSION</a> will not work out of the box. First, we need a control file. Then we need to remove the transaction control from the install script (here, <code>ip4r.sql</code>), and finally, this script needs to be called <code>ip4r--1.05.sql</code>. Here's how I did it:</p> <pre class="src">
$ cat ip4r.control
comment = 'IPv4 and IPv4 range index types'
default_version = '1.05'
relocatable = yes

$ cat debian/postgresql-9.1-ip4r.install
debian/ip4r-9.1/ip4r.so usr/lib/postgresql/9.1/lib
ip4r.control usr/share/postgresql/9.1/extension
debian/ip4r-9.1/ip4r.sql usr/share/postgresql/9.1/extension

$ cat debian/postgresql-9.1-ip4r.links
usr/share/postgresql/9.1/extension/ip4r.sql usr/share/postgresql/9.1/extension/ip4r--1.05.sql
</pre>
<p>Be careful not to forget to remove any and all <code>BEGIN;</code> and <code>COMMIT;</code> lines from the <code>ip4r.sql</code> file, which meant that I also removed support for <em>Rtree</em>, which is not relevant for modern versions of PostgreSQL saith the script (post 8.2). That means I'm not publishing this very work yet, but I wanted to share the <code>debian/postgresql-9.1-extension.links</code> idea.</p> <p>Notice that I didn't change anything about the <code>.sql.in</code> make rule, so I didn't have to use the support for <code>module_pathname</code> in the control file.</p> <p>Now, after the usual <code>debuild</code> step, I can just <code>sudo debi</code> to install all the just-built packages and <code>CREATE EXTENSION</code> will run fine. And in 9.0 you get the old way to install it, but it still works:</p> <pre class="src">
$ psql -U postgres --cluster 9.0/main -1 \
    -f /usr/share/postgresql/9.0/contrib/ip4r.sql
&lt;lots of chatter&gt;

$ psql -U postgres --cluster 9.1/main -c 'create extension ip4r;'
CREATE EXTENSION
</pre>
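<p>As for stripping the transaction control lines mentioned earlier, here is one possible sketch. It is not how the packaging above was actually done: the helper name is hypothetical, and it assumes that <code>BEGIN;</code> and <code>COMMIT;</code> each sit alone on their own line, in upper case.</p>

```shell
# Hypothetical helper: drop lines made only of BEGIN; or COMMIT;
# (optionally surrounded by whitespace) from an SQL script on stdin.
strip_transaction_control() {
    sed -e '/^[[:space:]]*BEGIN;[[:space:]]*$/d' \
        -e '/^[[:space:]]*COMMIT;[[:space:]]*$/d'
}

# Tiny made-up script standing in for the real install script.
printf '%s\n' 'BEGIN;' 'SELECT 1;' 'COMMIT;' | strip_transaction_control
```

<p>With the real file, that would be something like <code>strip_transaction_control &lt; ip4r.sql</code>, with the result reviewed by hand before shipping it.</p>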
<p>That's it :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 29 Jun 2011 09:50:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/29-multi-version-support-for-extensions.html</guid> </item> <item> <title>Don't be afraid of 'cl</title> <link>http://tapoueh.org/blog/2011/06/blog/2011/06/20-dont-be-afraid-of-cl.html</link> <description><![CDATA[<p>In this <a href="http://tsengf.blogspot.com/2011/06/confirm-to-quit-when-editing-files-from.html">blog article</a>, you're shown a quite long function that loops through your buffers to find out if any of them is associated with a file whose full name includes <code>"projects"</code>. Well, you should not be afraid of using <code>cl</code>:</p> <pre class="src">
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">cl</span>)
(<span style="color: #7f007f;">loop</span> for b being the buffers
      when (string-match <span style="color: #bc8f8f;">"projects"</span> (or (buffer-file-name b) <span style="color: #bc8f8f;">""</span>))
      return t)
</pre> <p>If you want to collect the list of buffers whose name matches your test, then replace <code>return t</code> by <code>collect b</code> and you're done. Really, this <code>loop</code> thing is worth learning.</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Jun 2011 00:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/blog/2011/06/20-dont-be-afraid-of-cl.html</guid> </item> <item> <title>Don't be afraid of 'cl</title> <link>http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</link> <description><![CDATA[<h1>Don't be afraid of 'cl</h1>
Monday, June 20 2011, 00:15 </div><p>In this <a href="http://tsengf.blogspot.com/2011/06/confirm-to-quit-when-editing-files-from.html">blog article</a>, you're shown a quite long function that loops through your buffers to find out if any of them is associated with a file whose full name includes <code>"projects"</code>. Well, you should not be afraid of using <code>cl</code>:</p> <pre class="src">
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">cl</span>)
(<span style="color: #729fcf; font-weight: bold;">loop</span> for b being the buffers
      when (string-match <span style="color: #ad7fa8; font-style: italic;">"projects"</span> (or (buffer-file-name b) <span style="color: #ad7fa8; font-style: italic;">""</span>))
      return t)
</pre> <p>If you want to collect the list of buffers whose name matches your test, then replace <code>return t</code> by <code>collect b</code> and you're done. Really, this <code>loop</code> thing is worth learning.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Jun 2011 00:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/06/20-dont-be-afraid-of-cl.html</guid> </item>
<item> <title>Back from Ottawa, preparing for Cambridge</title> <link>http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</link> <description><![CDATA[<h1>Back from Ottawa, preparing for Cambridge</h1>
Monday, May 30 2011, 11:00 </div><p><span class="hack"> </span></p> <p>While <a href="http://blog.hagander.net/">Magnus</a> is all about <a href="http://2011.pgconf.eu/">PG Conf EU</a> already, you have to realize we've only just landed back from <a href="http://www.pgcon.org/2011/">PG Con</a> in Ottawa. My next stop in the annual conferences is <a href="http://char11.org/">CHAR 11</a>, the <em>Clustering, High Availability and Replication</em> conference in Cambridge, 11-12 July. Yes, on the old continent this time.</p> <p>This year's <em>pgcon</em> hot topics, for me, have been centered around a better grasp of <a href="http://www.postgresql.org/docs/9.1/static/transaction-iso.html#XACT-SERIALIZABLE">SSI</a> and <em>DDL Triggers</em>. Having those beasts in <a href="http://www.postgresql.org/">PostgreSQL</a> would allow for auditing, finer privilege management and some more automated replication facilities. Imagine that <code>ALTER TABLE</code> is able to fire a <em>trigger</em>, provided by <em>Londiste</em> or <em>Slony</em>, that will do what's needed on the cluster by itself. That would be awesome, wouldn't it?</p> <p>At <em>CHAR 11</em> I'll be talking about <a href="http://wiki.postgresql.org/wiki/SkyTools">Skytools 3</a>. You know I've been working on its <em>debian</em> packaging; now is the time to review the documentation and make something there as good looking as the monitoring systems are...</p> <p>Well, expect some news and a nice big picture diagram overview soon: if my work schedule leaves me any time, that's what I want to be working on now.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 30 May 2011 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/30-back-from-ottawa-preparing-for-cambridge.html</guid> </item> <item> <title>el-get 2.2</title> <link>http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-22.html</link> <description><![CDATA[<p>We spotted, a little too late for our own taste, a discrepancy in the source tree: a work-in-progress patch landed in git just before we released <a href="https://github.com/dimitri/el-get">el-get</a> stable. So we cleaned the tree (thanks again <a href="http://julien.danjou.info/">Julien</a>), branched a stable maintenance tree, and released 2.2 from there.</p> <p>You're back to enjoying <code>el-get</code> :)</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 12:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-22.html</guid> </item> <item> <title>el-get 2.2</title> <link>http://tapoueh.org/blog/2011/05/26-el-get-22.html</link> <description><![CDATA[<h1>el-get 2.2</h1>
Thursday, May 26 2011, 12:00 </div><p>We spotted, a little too late for our own taste, a discrepancy in the source tree: a work-in-progress patch landed in git just before we released <a href="https://github.com/dimitri/el-get">el-get</a> stable. So we cleaned the tree (thanks again <a href="http://julien.danjou.info/">Julien</a>), branched a stable maintenance tree, and released 2.2 from there.</p> <p>You're back to enjoying <code>el-get</code> :)</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 12:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-22.html</guid> </item> <item> <title>el-get 2.1</title> <link>http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-21.html</link> <description><![CDATA[<p>Current <a href="https://github.com/dimitri/el-get">el-get</a> status is stable, ready for daily use and packed with extra features that make life easier. There are some more things we could do, as always, but they will be about smoothing things further.</p> <h3>Latest released version</h3> <p><a href="https://github.com/dimitri/el-get">el-get</a> version 2.1 is available, with a boatload of features, including autoloads support, byte-compiling in an external <em>clean room</em> <a href="http://www.gnu.org/software/emacs/">Emacs</a> instance, custom support, lazy initialisation support (deferring all <em>init</em> functions to <code>eval-after-load</code>), and multi-repository ELPA support.</p> <h3>Version numbering</h3> <p class="first">Version strings are now inspired by how Emacs itself numbers its versions. First is the major version number, then a dot, then the minor version number. The minor version number is 0 while the next major version is still being developed. So 3.0 is a developer release while 3.1 will be the next stable release.</p> <p>Please note that this versioning policy was picked while baking 1.2~dev, so 1.0 was a <em>stable</em> release in fact. Ah, history.</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/blog/2011/05/26-el-get-21.html</guid> </item> <item> <title>el-get 2.1</title> <link>http://tapoueh.org/blog/2011/05/26-el-get-21.html</link> <description><![CDATA[<h1>el-get 2.1</h1>
Thursday, May 26 2011, 10:00 </div><p>Current <a href="https://github.com/dimitri/el-get">el-get</a> status is stable, ready for daily use and packed with extra features that make life easier. There are some more things we could do, as always, but they will be about smoothing things further.</p> <h3>Latest released version</h3> <p><a href="https://github.com/dimitri/el-get">el-get</a> version 2.1 is available, with a boatload of features, including autoloads support, byte-compiling in an external <em>clean room</em> <a href="http://www.gnu.org/software/emacs/">Emacs</a> instance, custom support, lazy initialisation support (deferring all <em>init</em> functions to <code>eval-after-load</code>), and multi-repository ELPA support.</p> <h3>Version numbering</h3> <p class="first">Version strings are now inspired by how Emacs itself numbers its versions. First is the major version number, then a dot, then the minor version number. The minor version number is 0 while the next major version is still being developed. So 3.0 is a developer release while 3.1 will be the next stable release.</p> <p>Please note that this versioning policy was picked while baking 1.2~dev, so 1.0 was a <em>stable</em> release in fact. Ah, history.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 May 2011 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/26-el-get-21.html</guid> </item> <item> <title>Preparing for PGCON</title> <link>http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</link> <description><![CDATA[<h1>Preparing for PGCON</h1>
Thursday, May 12 2011, 10:30 </div><p>It's this time of the year again: the main international <a href="http://www.pgcon.org/2011/">PostgreSQL Conference</a> is next week in Ottawa, Canada. If previous years are any indication, this will be a great event at which to meet a lot of the members of your community. The core team will be there, developers will be there, and we will meet with users and their challenging use cases.</p> <p>This is a very good time to review both what you did in the project these last 12 months, and what you plan to do next year. To help with that, several <em>meeting</em> events are organized. They're like a whole-day round table with a kind of an agenda, with a limited number of invited people in, and very intense on-topic discussions about how to organize ourselves for another great year of innovation in PostgreSQL.</p> <p>Then we have two days full of talks where I usually learn some new aspect of the project or of the product, and where ideas tend to just pop up in a continuous race. Being away from home and with people you see only once a year (some of them more than that of course, hi European fellows!) seems to allow for some broader thinking.</p> <p>The talks I want to go to include <a href="http://www.pgcon.org/2011/schedule/events/361.en.html">Database Scalability Patterns: Sharding for Unlimited Growth</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/20.en.html">Robert Treat</a>, <a href="http://www.pgcon.org/2011/schedule/events/366.en.html">Maintaining Terabytes</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/112.en.html">Selena Deckelmann</a>, <a href="http://www.pgcon.org/2011/schedule/events/307.en.html">NTT’s Case Report</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/192.en.html">Tetsuo Sakata</a>, <a href="http://www.pgcon.org/2011/schedule/events/350.en.html">Hacking the Query Planner</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/202.en.html">Tom Lane</a>. 
That's for a first day, right?</p> <p>Then, on the second day, I notice <a href="http://www.pgcon.org/2011/schedule/events/311.en.html">Range Types</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/83.en.html">Jeff Davis</a>, <a href="http://www.pgcon.org/2011/schedule/events/309.en.html">SP-GiST - a new indexing infrastructure for PostgreSQL</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/29.en.html">Oleg</a> and <a href="http://www.pgcon.org/2011/schedule/speakers/33.en.html">Teodor</a>, <a href="http://www.pgcon.org/2011/schedule/events/337.en.html">The Write Stuff</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/110.en.html">Greg Smith</a> (a colleague at <a href="http://www.2ndquadrant.fr/">2ndQuadrant</a>).</p> <p>I will miss <a href="http://www.pgcon.org/2011/schedule/events/333.en.html">Serializable Snapshot Isolation in Postgres</a> by <a href="http://www.pgcon.org/2011/schedule/speakers/113.en.html">Kevin Grittner</a> and <a href="http://www.pgcon.org/2011/schedule/speakers/197.en.html">Dan Ports</a>, unfortunately, because I'll be talking about <a href="http://www.pgcon.org/2011/schedule/events/280.en.html">Extensions Development</a> at the same time.</p> <p>Well of course this list is just a first selection; hallway tracks are often what guides me through talks or makes me skip some.</p> <p>See you there!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/extensions.html">Extensions</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 12 May 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/12-preparing-for-pgcon.html</guid> </item> <item> <title>Mailq modeline display</title> <link>http://tapoueh.org/blog/2011/05/blog/2011/05/05-mailq-modeline-display.html</link> <description><![CDATA[<p>If you've not been following along, you might have missed it: it appears to me that even today, in 2011, mail systems work much better when set up the old way. Meaning with a local <a href="http://en.wikipedia.org/wiki/Mail_Transfer_Agent">MTA</a> for outgoing mail. With some niceties, such as <a href="http://tapoueh.org/articles/news/_Postfix_sender_dependent_relayhost_maps.html">sender dependent relayhost maps</a>.</p> <p>That's why I needed <a href="http://tapoueh.org/projects.html#sec21">M-x mailq</a> to display the <em>mail queue</em> and have some easy shortcuts in order to operate it (mainly <code>f runs the command mailq-mode-flush</code>, but per site and per id delivery are useful too).</p> <p>Now, I also happen to set up outgoing mail routes to walk through an <em>SSH tunnel</em>, which thanks to both <a href="http://www.manpagez.com/man/5/ssh_config/">~/.ssh/config</a> and <a href="https://github.com/dimitri/cssh">cssh</a> (<code>C-= runs the command cssh-term-remote-open</code>, with completion) is a couple of keystrokes away to start. Well, I still happen to forget to start it, which causes mails to be held in a queue until I realise they're not delivered, which always takes just a bit too long.</p> <p>A solution I've been thinking about is to add a little flag in the <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Mode-Line-Format.html">modeline</a> in my <a href="http://www.gnus.org/">gnus</a> <code>*Group*</code> and <code>*Summary*</code> buffers. The flag would show up as ✔ when no mail is queued and waiting for me to open the tunnel, or ✘ as soon as the queue is not empty. Here's what it looks like:</p> <center> <p><img src="../../../images//mailq-modeline-display.png" alt=""></p> </center> <p>Well I'm pretty happy with the setup. The flag is refreshed every minute, and here's as an example how I set up <code>mailq</code> in my <a href="https://github.com/dimitri/el-get">el-get-sources</a> setup:</p> <pre class="src">
(<span style="color: #da70d6;">:name</span> mailq
       <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (mailq-modeline-display)))
</pre> <p>I'm not sure how many of you dear readers are using a local MTA to deliver your mails, but well, the ones who do (or consider doing so) might even find this article useful!</p> ]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 05 May 2011 14:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/blog/2011/05/05-mailq-modeline-display.html</guid> </item> <item> <title>Mailq modeline display</title> <link>http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</link> <description><![CDATA[<h1>Mailq modeline display</h1>
Thursday, May 05 2011, 14:10 </div><p>If you've not been following along, you might have missed it: it appears to me that even today, in 2011, mail systems work much better when set up the old way. Meaning with a local <a href="http://en.wikipedia.org/wiki/Mail_Transfer_Agent">MTA</a> for outgoing mail. With some niceties, such as <a href="http://tapoueh.org/articles/news/_Postfix_sender_dependent_relayhost_maps.html">sender dependent relayhost maps</a>.</p> <p>That's why I needed <a href="http://tapoueh.org/projects.html#sec21">M-x mailq</a> to display the <em>mail queue</em> and have some easy shortcuts in order to operate it (mainly <code>f runs the command mailq-mode-flush</code>, but per site and per id delivery are useful too).</p> <p>Now, I also happen to set up outgoing mail routes to walk through an <em>SSH tunnel</em>, which thanks to both <a href="http://www.manpagez.com/man/5/ssh_config/">~/.ssh/config</a> and <a href="https://github.com/dimitri/cssh">cssh</a> (<code>C-= runs the command cssh-term-remote-open</code>, with completion) is a couple of keystrokes away to start. Well, I still happen to forget to start it, which causes mails to be held in a queue until I realise they're not delivered, which always takes just a bit too long.</p> <p>A solution I've been thinking about is to add a little flag in the <a href="http://www.gnu.org/s/emacs/manual/html_node/elisp/Mode-Line-Format.html">modeline</a> in my <a href="http://www.gnus.org/">gnus</a> <code>*Group*</code> and <code>*Summary*</code> buffers. The flag would show up as ✔ when no mail is queued and waiting for me to open the tunnel, or ✘ as soon as the queue is not empty. Here's what it looks like:</p> <center> <p><img src="../../../images//mailq-modeline-display.png" alt=""></p> </center> <p>Well I'm pretty happy with the setup. 
The flag is refreshed every minute, and here's as an example how I set up <code>mailq</code> in my <a href="https://github.com/dimitri/el-get">el-get-sources</a> setup:</p> <pre class="src">
(<span style="color: #729fcf;">:name</span> mailq
       <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (mailq-modeline-display)))
</pre> <p>I'm not sure how many of you dear readers are using a local MTA to deliver your mails, but well, the ones who do (or consider doing so) might even find this article useful!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/modeline.html">modeline</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 05 May 2011 14:10:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/05-mailq-modeline-display.html</guid> </item> <item> <title>Tables and Views dependencies</title> <link>http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</link> <description><![CDATA[<h1>Tables and Views dependencies</h1>
Wednesday, May 04 2011, 11:45 </div><p><span class="hack"> </span></p> <p>Let's say you need toALTER TABLE foo ALTER COLUMN bar TYPE bigint;, and <a href="http://postgresql.org">PostgreSQL</a> is helpfully telling you that no you can't because such and such <em>views</em> depend on the column. The basic way to deal with that is to copy paste from the error message the names of the views involved, then prepare a script wherein you firstDROP VIEW ...;thenALTER TABLEand finallyCREATE VIEWagain, all in the same transaction.</p> <p>So you have to copy paste also the view definitions. With large view definitions, it quickly gets cumbersome to do so. Well when you're working on operations, you have to bear in mind that cumbersome is a synonym for <em>error prone</em>, in fact — so you want another solution if possible.</p> <p>Oh, and the other drawback of this solution is thatALTER TABLEwill first take aLOCKon the table, locking out any activity. And more than that, the lock acquisition will queue behind current activity on the table, which means waiting for a fairly long time and damaging the service quality on a moderately loaded server.</p> <p>It's possible to abuse the <a href="http://www.postgresql.org/docs/current/static/catalogs.html">system catalogs</a> in order to find all <em>views</em> that depend on a given table, too. For that, you have to play withpg_dependand you have to know that internally, a <em>view</em> is in fact a <em>rewrite rule</em>. Then here's how to produce the two scripts that we need:</p> <pre class="src"># \t Showing only tuples.# \o /tmp/drop.sql
# select <span style="color: #ad7fa8; font-style: italic;">'DROP VIEW '</span> || views || <span style="color: #ad7fa8; font-style: italic;">';'</span> from (select distinct(r.ev_class::regclass) as views from pg_depend d join pg_rewrite r on r.oid = d.objid where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;
# \o /tmp/create.sql
# select <span style="color: #ad7fa8; font-style: italic;">'CREATE VIEW '</span> || views || E<span style="color: #ad7fa8; font-style: italic;">' AS \n'</span> || pg_get_viewdef(views, true) || <span style="color: #ad7fa8; font-style: italic;">';'</span> from (select distinct(r.ev_class::regclass) as views from pg_depend d join pg_rewrite r on r.oid = d.objid where refclassid = <span style="color: #ad7fa8; font-style: italic;">'pg_class'</span>::regclass and refobjid = <span style="color: #ad7fa8; font-style: italic;">'SCHEMA.TABLENAME'</span>::regclass and classid = <span style="color: #ad7fa8; font-style: italic;">'pg_rewrite'</span>::regclass and pg_get_viewdef(r.ev_class, true) ~ <span style="color: #ad7fa8; font-style: italic;">'COLUMN_NAME'</span>) as x;
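# \o
-- Sketch only, not part of the original session: one way the two generated
-- scripts could be replayed around the ALTER TABLE from the introduction,
-- all in the same transaction. The table foo and column bar are the example
-- names used at the top of this article. Review both files first: if the
-- views depend on one another, the statements may need reordering.
# begin;
# \i /tmp/drop.sql
# alter table foo alter column bar type bigint;
# \i /tmp/create.sql
# commit;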
# \o </pre> <p>Replace <code>SCHEMA.TABLENAME</code> and <code>COLUMN_NAME</code> with your targets here and the first query should give you one row per candidate view. That is, if you're not using the <code>\o</code> trick; if you are, check the generated file instead, with <code>\! cat /tmp/drop.sql</code> for example.</p> <p>Please note that this catalog query is not accurate, as it will select as a candidate any view that happens to both depend on the target table and contain the <code>column_name</code> in its text definition. So either filter out the candidates properly (by careful proofreading, then another <code>WHERE</code> clause), or just accept that you might <code>DROP</code> then <code>CREATE</code> again more <em>views</em> than need be.</p> <p>If you need some more details about the <code>\t \o</code> sequence you might be interested in this older article about <a href="http://tapoueh.org/articles/blog/_Resetting_sequences._All_of_them,_please!.html">resetting sequences</a>.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/catalogs.html">catalogs</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 04 May 2011 11:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/04-tables-and-views-dependencies.html</guid> </item> <item> <title>Extension module_pathname and .sql.in</title> <link>http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</link> <description><![CDATA[<h1>Extension module_pathname and .sql.in</h1>
Monday, May 02 2011, 17:30 </div><p><span class="hack"> </span></p> <p>While I'm currently too busy at work to deliver many Open Source contributions, let's debunk an old habit of <a href="http://www.postgresql.org/">PostgreSQL</a> extension authors. It all comes down to copy and pasting from <em>contrib</em>, and there has been no reason to keep doing <code>$libdir</code> this way ever since the <code>7.4</code> days.</p> <p>Let's take an example here, with the <a href="https://github.com/dimitri/prefix">prefix</a> extension. This one too will need some love, but is still behind on my spare time todo list, sorry about that. So, in the <code>prefix.sql.in</code> we read</p> <pre class="src">CREATE OR REPLACE FUNCTION prefix_range_in(cstring)
RETURNS prefix_range
AS <span style="color: #ad7fa8; font-style: italic;">'MODULE_PATHNAME'</span>
LANGUAGE <span style="color: #ad7fa8; font-style: italic;">'C'</span> IMMUTABLE STRICT; </pre> <p>Two things are to change here. First, the PostgreSQL <em>backend</em> will understand just fine if you just say <code>AS '$libdir/prefix'</code>. So you have to know the name of the shared object library in the <code>sql</code> script, but if you do, you can directly maintain a <code>prefix.sql</code> script instead.</p> <p>The advantage is that you can now avoid a compatibility problem when you want to support PostgreSQL from <code>8.2</code> to <code>9.1</code> in your extension (older than that and it's <a href="http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy">no longer supported</a>). You directly ship your script.</p> <p>For compatibility, you could also use the <a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">control file</a> <code>module_pathname</code> property. But for <code>9.1</code> you then have to add an implicit <code>Make</code> rule so that the script is derived from your <code>.sql.in</code>. And as you are managing several scripts, so that you can handle <em>versioning</em> and <em>upgrades</em>, it can get hairy (<em>hint</em>, you need to copy <code>prefix.sql</code> as <code>prefix--1.1.1.sql</code>, then change its name at the next revision, and think about <em>upgrade</em> scripts too). The <code>module_pathname</code> facility is better kept for managing more than a single extension in the same directory, like the <a href="http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=contrib/spi/Makefile;h=0c11bfcbbd47b0c3ed002874bfefd9e2022cf5ac;hb=HEAD">SPI contrib</a> is doing.</p> <p>Sure, maintaining an extension that targets both antique releases of PostgreSQL and <a href="http://developer.postgresql.org/pgdocs/postgres/sql-createextension.html">CREATE EXTENSION</a> super-powered one(s) (not yet released) is a little more involved than that. We'll get back to that, as some people are still pioneering the movement.</p> <p>On my side, I'm working with a <a href="http://www.debian.org/">debian</a> <a href="http://qa.debian.org/developer.php?login=myon">developer</a> on how to best manage the packaging of those extensions, and this work could end up as a specialized <em>policy</em> document and a coordinated <em>team</em> of maintainers for all things PostgreSQL in <code>debian</code>. This will also give some more steam to the PostgreSQL effort for debian packages: the idea is to maintain packages for all supported versions (from <code>8.2</code> up to, soon, <code>9.1</code>), something <code>debian</code> itself cannot commit to.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 02 May 2011 17:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/05/02-extension-module_pathname-and-sqlin.html</guid> </item> <item> <title>Emacs and PostgreSQL, PL line numbering</title> <link>http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</link> <description><![CDATA[<h1>Emacs and PostgreSQL, PL line numbering</h1>
Saturday, April 23 2011, 10:30 </div><p><span class="hack"> </span></p> <p>A while ago I fixed and published <a href="https://github.com/dimitri/pgsql-linum-format">pgsql-linum-format</a> separately. It allows numbering <code>PL/whatever</code> code lines when editing from <a href="http://www.gnu.org/software/emacs/">Emacs</a>, and it's something very useful to turn on when debugging.</p> <center> <p><img src="../../../images//emacs-pgsql-linum.png" alt=""></p> </center> <p>The carets on the <em>fringe</em> in the emacs window are the result of <code>(setq-default indicate-buffer-boundaries 'left)</code> and here they're just overloading the image somehow. But the idea is to just <code>M-x linum-mode</code> when you need it, at least that's my usage of it.</p> <p>You can use <a href="https://github.com/dimitri/el-get">el-get</a> to easily get (then update) this little <code>Emacs</code> extension.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/pgsql-linum-format.html">pgsql-linum-format</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sat, 23 Apr 2011 10:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/23-emacs-and-postgresql-pl-line-numbering.html</guid> </item> <item> <title>Emacs Kicker</title> <link>http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</link> <description><![CDATA[<h1>Emacs Kicker</h1>
Friday, April 15 2011, 21:30 </div><p>Following up on the very popular <a href="https://github.com/technomancy/emacs-starter-kit">emacs-starter-kit</a>, I'm now proposing the <a href="https://github.com/dimitri/emacs-kicker">emacs-kicker</a>. It's about the <code>.emacs</code> file you've seen in older posts here, which I maintain for some colleagues. After all, if they find it useful, some more people might too, so I've decided to publish it.</p> <p>What you'll find is a very simple <code>128</code> lines <a href="http://www.gnu.org/software/emacs/">Emacs</a> user init file, based on <a href="https://github.com/dimitri/el-get">el-get</a> for external packages. A not so <em>random</em> selection of those is used, here's the list when you hide some details:</p> <pre class="src">'(el-get <span style="color: #888a85;">; </span><span style="color: #888a85;">el-get is self-hosting</span>
  escreen <span style="color: #888a85;">; </span><span style="color: #888a85;">screen for emacs, C-\ C-h</span>
  php-mode-improved <span style="color: #888a85;">; </span><span style="color: #888a85;">if you're into php...</span>
  psvn <span style="color: #888a85;">; </span><span style="color: #888a85;">M-x svn-status</span>
  switch-window <span style="color: #888a85;">; </span><span style="color: #888a85;">takes over C-x o</span>
  auto-complete <span style="color: #888a85;">; </span><span style="color: #888a85;">complete as you type with overlays</span>
  emacs-goodies-el <span style="color: #888a85;">; </span><span style="color: #888a85;">the debian addons for emacs</span>
  yasnippet <span style="color: #888a85;">; </span><span style="color: #888a85;">powerful snippet mode</span>
  zencoding-mode <span style="color: #888a85;">; </span><span style="color: #888a85;">http://www.emacswiki.org/emacs/ZenCoding</span>
  (<span style="color: #729fcf;">:name</span> buffer-move <span style="color: #888a85;">; </span><span style="color: #888a85;">move buffers around in windows</span>
  (<span style="color: #729fcf;">:name</span> smex <span style="color: #888a85;">; </span><span style="color: #888a85;">a better (ido like) M-x</span>
  (<span style="color: #729fcf;">:name</span> magit <span style="color: #888a85;">; </span><span style="color: #888a85;">git meet emacs, and a binding</span>
  (<span style="color: #729fcf;">:name</span> goto-last-change <span style="color: #888a85;">; </span><span style="color: #888a85;">move pointer back to last change</span>
</pre> <p>Another interesting thing to note in this <code>kicker</code> is a choice of some key bindings that are rather unusual (for now) I guess.</p> <pre class="src"> (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-b"</span>) 'ido-switch-buffer)
 (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-c"</span>) 'ido-switch-buffer)
 (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x B"</span>) 'ibuffer) </pre> <p>Yes, you see that I've rebound <code>C-x C-c</code> to switching buffers. That key is really easy to use and I don't think that <code>M-x kill-emacs</code> deserves it. Keys that are so easy to use should be kept for frequent actions, and quitting emacs is a once-a-day to once-a-month action here. And you can still quit from the window manager button or from the menu or from <code>M-x</code>.</p> <p>Also <em>Mac</em> users are not left behind: you will see some settings that are adapted to the system (like choosing another <em>font</em>, keeping the <code>menu-bar</code> displayed, or not installing the darkish <code>tango-color-mode</code> on this system, where it renders poorly in my opinion), as you can see here:</p> <pre class="src"> (<span style="color: #729fcf; font-weight: bold;">if</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
     (set-face-font 'default <span style="color: #ad7fa8; font-style: italic;">"Monaco-13"</span>)
   (set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>))

 (<span style="color: #729fcf; font-weight: bold;">when</span> (string-match <span style="color: #ad7fa8; font-style: italic;">"apple-darwin"</span> system-configuration)
   (setq mac-allow-anti-aliasing t)
   (setq mac-command-modifier 'meta)
   (setq mac-option-modifier 'none)) </pre> <p>So all in all, I don't expect this <code>emacs-kicker</code> to please everyone, but I expect it to be simple and rich enough (thanks to <a href="https://github.com/dimitri/el-get">el-get</a>), and it should be a good <em>kick start</em> that's easy to adapt.</p> <p>If you want to try it without installing it, that's very easy to do. Just clone the <code>git</code> repository then start an <code>Emacs</code> that will use it. For example that could be, using the excellent <a href="http://emacsformacosx.com/">Emacs For MacOSX</a>:</p> <pre class="src">$ /Applications/Emacs.app/Contents/MacOS/Emacs -Q -l init.el </pre> <p>I hope some readers will find it useful! :)</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/emacs-kicker.html">emacs-kicker</a></p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Apr 2011 21:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/15-emacs-kicker.html</guid> </item> <item> <title>Some notes about Skytools3</title> <link>http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</link> <description><![CDATA[<h1>Some notes about Skytools3</h1>
Monday, April 11 2011, 11:30 </div><p>I've been working on <a href="http://github.com/markokr/skytools">skytools3</a> packaging lately. I've been pushing quite a lot of work into it, in order to have exactly what I needed out of the box, after some 3 years of production experience with the products. Plural, yes, because even though <a href="http://wiki.postgresql.org/wiki/PgBouncer">pgbouncer</a> and <a href="http://wiki.postgresql.org/wiki/PL/Proxy">plproxy</a> are siblings of the project (same development team, separate life cycles and releases), <code>skytools</code> still includes several sub-projects.</p> <p>Here's what the <code>skytools3</code> packaging is going to look like:</p> <pre class="src">skytools3             Skytool's replication and queuing
python-pgq3           Skytool's PGQ python library
python-skytools3      python scripts framework for skytools
skytools-ticker3      PGQ ticker daemon service
skytools-walmgr3      high-availability archive and restore commands
postgresql-8.4-pgq3   PGQ server-side code (C module for PostgreSQL)
postgresql-9.0-pgq3   PGQ server-side code (C module for PostgreSQL) </pre> <p>This split is needed so that you can install your <em>daemons</em> (we call them <em>consumers</em>) on machines other than the one where you run <a href="http://postgresql.org">PostgreSQL</a>. But for the <code>walmgr</code> part, it makes no sense to install it if you don't have a local PostgreSQL service, as it's providing <code>archive</code> and <code>restore</code> commands. As for the <em>ticker</em>, you're free to run it on any machine really, hence its separate package (in <code>skytools3</code> the <em>ticker</em> is written in <code>C</code> and does not depend on the python framework any more).</p> <p>What you can't see here yet is the new goodies that wrap it up as a quality <code>debian</code> package. A new <code>skytools</code> user is created for you when you install the <code>skytools3</code> package (which contains the services), along with a skeleton file <code>/etc/skytools.ini</code> and a user directory <code>/etc/skytools/</code>. Put your services' configuration files in there, and register those services in the <code>/etc/skytools.ini</code> file itself. Then they will be taken care of in the <code>init</code> sequence at startup and shutdown of your server.</p> <p>The services will run under the <code>skytools</code> system user, and will by default put their logs into <code>/var/log/skytools/</code>. The <code>pidfile</code> will go into <code>/var/run/skytools/</code>. All integrated, automated.</p> <p>The next big <em>TODO</em> is about documentation, reviewing it and polishing it, and I think that <code>skytools3</code> will then be ready for public release. Yes, you read it right, it's happening this very year! I'm very excited about it, and have several architectures that will greatly benefit from the switch to <code>skytools3</code>. More on that later, though! (Yes, my <em>to blog later</em> list is getting quite long now).</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/restore.html">restore</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 11 Apr 2011 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/04/11-some-notes-about-skytools3.html</guid> </item>
<item> <title>towards pg_staging 1.0</title> <link>http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</link> <description><![CDATA[<h1>towards pg_staging 1.0</h1>
Tuesday, March 29 2011, 15:30 </div><p>If you don't remember what <a href="pgstaging.html">pg_staging</a> is all about, it's a central console from where to control all your <a href="http://www.postgresql.org/">PostgreSQL</a> databases. Typically you use it to manage your development and pre-production setup, where developers ask you pretty often to install some newer dump from production for them, and you want that operation streamlined and easy.</p> <center> <p><img src="../../../images//pg_staging.png" alt=""></p> </center> <h3>Usage</h3> <p class="first">The typical session would be something like this:</p> <pre class="src">pg_staging> databases foodb.dev
foodb            foodb_20100824 :5432
foodb_20100209   foodb_20100209 :5432
foodb_20100824   foodb_20100824 :5432
pgbouncer        pgbouncer      :6432
postgres         postgres       :5432
pg_staging> dbsizes foodb.dev
foodb.dev
  foodb_20100209: -1
  foodb_20100824: 104 GB
Total = 104 GB
pg_staging> restore foodb.dev
...
pg_staging> switch foodb.dev today </pre>
<p>The list of supported commands is quite long now, and documented too (it comes with two man pages). The <code>restore</code> one is the most important and will create the database, add it to the <code>pgbouncer</code> setup, fetch the backup named <code>dbname.`date -I`.dump</code>, prepare a filtered object list (more on that), load <em>pre</em> <code>SQL</code> scripts, launch <code>pg_restore</code>, <code>VACUUM ANALYZE</code> the database when configured to do so, load the <em>post</em> <code>SQL</code> scripts then optionally <em>switch</em> the <code>pgbouncer</code> setup to default to this new database.</p> <h3>Filtering</h3> <p class="first">The newer option is called <code>tablename_nodata_regexp</code>, and here's its documentation in full:</p> <blockquote> <p class="quoted"> List of table names regexp (comma separated) to restore without content. The <code>pg_restore</code> catalog <code>TABLE DATA</code> sections will get filtered out. The regexp is applied against <code>schemaname.tablename</code> and non-anchored by default.</p> </blockquote> <p>This supplements the <code>schemas</code> and <code>schemas_nodata</code> options, which allow you to restore only objects from a given set of <em>schemas</em> (filtering out triggers that call functions in the excluded schemas, like e.g. <a href="http://wiki.postgresql.org/wiki/Skytools">Londiste</a> ones) or to restore only the <code>TABLE</code> definitions while skipping the <code>TABLE DATA</code> entries.</p> <h3>Setup</h3> <p class="first">To set up your environment for <em>pg_staging</em>, you need to take some steps. It's not complex but it's fairly involved. The benefit is this amazingly useful central unique console to control as many databases as you need.</p> <p>You need a <code>pg_staging.ini</code> file in which to describe your environment. I typically name the sections in there by the name of the database to restore followed by a <code>dev</code> or <code>preprod</code> extension.</p> <p>You need to have all your backups available through <code>HTTP</code>, and as of now, served by the famous <em>apache</em> <code>mod_dir</code> directory listing. It's easy to add support for other methods, but it has not been done yet. You also need to have a cluster wide <code>--globals-only</code> backup available somewhere so that you can easily create the users etc. you need from <code>pg_staging</code>.</p> <p>You also need to run a <code>pgbouncer</code> daemon on each database server, allowing you to bypass editing connection strings when you <code>switch</code> a new database version live.</p> <p>You also need to install the <em>client</em> script, have a local <code>pgstaging</code> system user and allow it to run the client script as root, so that it's able to control some services and edit <code>pgbouncer.ini</code> for you.</p> <h3>Status</h3> <p class="first">I'm still using it a lot (several times a week) to manage a whole development and pre-production environment set, so the very low <a href="https://github.com/dimitri/pg_staging">code activity</a> of the project is telling that it's pretty stable (the last series of <em>commits</em> are all bug fixes and rounded corners).</p> <p>Given that, I'm thinking in terms of <code>pg_staging 1.0</code> soon! Now is a pretty good time to try it and see how it can help you.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/pg_staging.html">pg_staging</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 29 Mar 2011 15:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html</guid> </item> <item> <title>Extensions in 9.1</title> <link>http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</link> <description><![CDATA[<h1>Extensions in 9.1</h1>
Tuesday, March 01 2011, 16:30 </div><p>If you've not been following closely you might have missed out on extensions integration. Well, <a href="http://en.wikipedia.org/wiki/Tom_Lane_(computer_scientist)">Tom</a> spent some time on the patches I've been preparing for the last 4 months. And not only did he commit most of the work but he also enhanced some parts of the code (better factoring) and basically finished it.</p> <p>At the <a href="http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting">previous developer meeting</a> his advice was to avoid putting too much into the very first version of the patch, for it to stand a chance of being integrated, and during the review process more than one major <a href="http://www.postgresql.org/">PostgreSQL</a> contributor expressed worries about the size of the patch and the number of features proposed. Which is the usual process.</p> <p>Then what happened is that <strong><em>Tom</em></strong> finally followed a reasoning similar to mine while working on the feature. To maximize the benefits, once you have the infrastructure in place, it's not that much more work to provide the really interesting features. What's complex is agreeing on what exactly their specifications are. And in the <em>little</em> time window we got in this commit fest (well, we hijacked about 2 full weeks there), we managed to get there.</p> <p>So in the end the result is quite amazing, and you can see that in the documentation chapter about it: <a href="http://developer.postgresql.org/pgdocs/postgres/extend-extensions.html">35.15. Packaging Related Objects into an Extension</a>.</p> <p>All the <em>contrib</em> modules that install <code>SQL</code> objects into databases for you to use are now converted to <strong><em>Extensions</em></strong> too, and will get released in <code>9.1</code> with an upgrade script that allows you to <em>upgrade from unpackaged</em>. That means that once you've upgraded from a past PostgreSQL release up to <code>9.1</code>, it will be a command away for you to register <em>extensions</em> as such. I expect third party <em>extension</em> authors (from <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> to <a href="http://pgfoundry.org/projects/temporal">temporal</a>) to release an <em>upgrade-from-unpackaged</em> version of their work too.</p> <p>Of course, a big use case of the <em>extensions</em> is also in-house <code>PL</code> code, and having version numbers and multi-stage upgrade scripts there will be fantastic too, I can't wait to work with such a tool set myself. Some later blog post will detail the benefits and usage. I'm already trying to think how much of this version and upgrade facility could be expanded to classic <code>DDL</code> objects…</p> <p>So expect some more blog posts from me on this subject. I will have to talk about <em>debian packaging</em> an extension (it's getting damn easy with <a href="http://packages.debian.org/squeeze/postgresql-server-dev-all">postgresql-server-dev-all</a>, and yes it has received some planning ahead), and about how to package your own extension, manage upgrades, turn your current <code>pre-9.1</code> extension into a <em>full blown extension</em>, and maybe how to stop worrying about extensions when you're a DBA.</p> <p>If you have some features you would want to discuss for next releases, please do contact me!</p> <p>Meanwhile, I'm very happy that this project of mine finally made it to <em>core</em>, it's been long in the making. Some years to talk about it and then finally 4 months of coding that I'll remember as a marathon. Many thanks go to all who helped here, from <a href="http://www.2ndquadrant.com/">2ndQuadrant</a> to early reviewers to people I talked to over beers at conferences… lots of people really.</p> <p>To an extended PostgreSQL (and beyond) :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/pgcon.html">pgcon</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 01 Mar 2011 16:30:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/03/01-extensions-in-91.html</guid> </item>
<item> <title>desktop-mode and readahead</title> <link>http://tapoueh.org/blog/2011/02/blog/2011/02/23-desktop-mode-and-readahead.html</link> <description><![CDATA[<p>I'm using <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Save-Mode.html#Desktop-Save-Mode">Desktop Save Mode</a> so that <a href="http://www.gnu.org/software/emacs/">Emacs</a> knows to open again all the buffers I've been using. That goes quite well with how often I start Emacs, that is once a week or once a month. Now, the last line of M-x ibuffer reads as follows:</p> <pre class="src"> 718 buffers 19838205 668 files, 15 processes </pre>
<p>That means that at startup, Emacs will load that many files. In order not to have to wait until it's done doing so, I've set things up this way:</p> <pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">and the session
</span>(setq desktop-restore-eager 20
      desktop-lazy-verbose nil)
(desktop-save-mode 1)
(savehist-mode 1)
</pre>
<p>Problem is that it's still slow. An idea I had was to use the <a href="https://fedorahosted.org/readahead/browser/README">readahead</a> tool that helps reduce the boot time of some distributions. Of course this tool is not expecting the same file format as emacs-desktop uses. Still, converting is quite easy with some awk magic. Here's the result:</p> <pre class="src">
<span style="color: #b22222;">;;; </span><span style="color: #b22222;">dim-desktop.el --- Dimitri Fontaine
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">Prepare a readahead file list from desktop-save
</span>
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">desktop</span>)

(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim-desktop-file-readahead-list</span>
<span style="color: #bc8f8f;">"~/.emacs.desktop.readahead"</span> <span style="color: #bc8f8f;">"Where to save the emacs desktop `readahead` file list"</span>)
(<span style="color: #7f007f;">defvar</span> <span style="color: #b8860b;">dim-desktop-filelist-command</span>
<span style="color: #bc8f8f;">"gawk -F '[ \"]' '/desktop-.*-buffer/ {getline; if($4) print $4}' %s"</span> <span style="color: #bc8f8f;">"Command to run to prepare the readahead file list"</span>)
(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">dim-desktop-get-readahead-file-list</span> (<span style="color: #228b22;">&optional</span> filename dir)
  <span style="color: #bc8f8f;">"Get the file list for readahead from the desktop file in DIR, or ~"</span>
  (<span style="color: #7f007f;">with-temp-file</span> (or filename dim-desktop-file-readahead-list)
    (insert (shell-command-to-string
             (format dim-desktop-filelist-command
                     (expand-file-name desktop-base-file-name (or dir <span style="color: #bc8f8f;">"~"</span>)))))))
<span style="color: #b22222;">;; </span><span style="color: #b22222;">This will not work because the hook runs before the buffers are added to
</span><span style="color: #b22222;">;; </span><span style="color: #b22222;">the desktop file.
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">
</span><span style="color: #b22222;">;;</span><span style="color: #b22222;">(add-hook 'desktop-save-hook 'dim-desktop-get-readahead-file-list)
</span>
<span style="color: #b22222;">;; </span><span style="color: #b22222;">so instead, advise the function
</span>(<span style="color: #7f007f;">defadvice</span> <span style="color: #0000ff;">desktop-save</span> (after desktop-save-readahead activate)
<span style="color: #bc8f8f;">"Prepare a readahead(8) file for the desktop file"</span> (dim-desktop-get-readahead-file-list))
(<span style="color: #7f007f;">provide</span> '<span style="color: #5f9ea0;">dim-desktop</span>) </pre>
<p>The awk construct getline lets you process the next line of the input file, which is very practical here (and in a host of other situations). Now that we have a file containing the list of files Emacs will load, we have to tweak the system to readahead those disk blocks. As I'm currently using <a href="http://kde.org/">KDE</a> again, I've done it thusly:</p> <pre class="src">
% cat ~/.kde/Autostart/readahead.emacs.sh
#!/bin/bash

# just readahead the emacs desktop files
# this file listing is maintained directly from Emacs itself
readahead ~/.emacs.desktop.readahead
</pre>
<p>So, well, it works. The files that Emacs will need are pre-read, so by the time the desktop really gets to them, I see no more disk activity (laptops have an LED to see that happening). But the desktop loading time has not changed...</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 23 Feb 2011 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/blog/2011/02/23-desktop-mode-and-readahead.html</guid> </item> <item> <title>desktop-mode and readahead</title> <link>http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</link> <description><![CDATA[<h1>desktop-mode and readahead</h1>
<div>Wednesday, February 23 2011, 16:45 </div><p>I'm using <a href="http://www.gnu.org/software/emacs/manual/html_node/elisp/Desktop-Save-Mode.html#Desktop-Save-Mode">Desktop Save Mode</a> so that <a href="http://www.gnu.org/software/emacs/">Emacs</a> knows to open again all the buffers I've been using. That goes quite well with how often I start Emacs, that is once a week or once a month. Now, the last line of M-x ibuffer reads as follows:</p> <pre class="src"> 718 buffers 19838205 668 files, 15 processes </pre>
<p>That means that at startup, Emacs will load that many files. In order not to have to wait until it's done doing so, I've set things up this way:</p> <pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">and the session
</span>(setq desktop-restore-eager 20
      desktop-lazy-verbose nil)
(desktop-save-mode 1)
(savehist-mode 1)
</pre>
<p>Problem is that it's still slow. An idea I had was to use the <a href="https://fedorahosted.org/readahead/browser/README">readahead</a> tool that helps reduce the boot time of some distributions. Of course this tool is not expecting the same file format as emacs-desktop uses. Still, converting is quite easy with some awk magic. Here's the result:</p> <pre class="src">
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-desktop.el --- Dimitri Fontaine
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">Prepare a readahead file list from desktop-save
</span>
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">desktop</span>)

(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-file-readahead-list</span>
<span style="color: #ad7fa8; font-style: italic;">"~/.emacs.desktop.readahead"</span> <span style="color: #888a85;">"Where to save the emacs desktop `readahead` file list"</span>)
(<span style="color: #729fcf; font-weight: bold;">defvar</span> <span style="color: #eeeeec;">dim-desktop-filelist-command</span>
<span style="color: #ad7fa8; font-style: italic;">"gawk -F '[ \"]' '/desktop-.*-buffer/ {getline; if($4) print $4}' %s"</span> <span style="color: #888a85;">"Command to run to prepare the readahead file list"</span>)
(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">dim-desktop-get-readahead-file-list</span> (<span style="color: #8ae234; font-weight: bold;">&optional</span> filename dir)
  <span style="color: #888a85;">"Get the file list for readahead from the desktop file in DIR, or ~"</span>
  (<span style="color: #729fcf; font-weight: bold;">with-temp-file</span> (or filename dim-desktop-file-readahead-list)
    (insert (shell-command-to-string
             (format dim-desktop-filelist-command
                     (expand-file-name desktop-base-file-name (or dir <span style="color: #ad7fa8; font-style: italic;">"~"</span>)))))))
<span style="color: #888a85;">;; </span><span style="color: #888a85;">This will not work because the hook runs before the buffers are added to
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">the desktop file.
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">
</span><span style="color: #888a85;">;;</span><span style="color: #888a85;">(add-hook 'desktop-save-hook 'dim-desktop-get-readahead-file-list)
</span>
<span style="color: #888a85;">;; </span><span style="color: #888a85;">so instead, advise the function
</span>(<span style="color: #729fcf; font-weight: bold;">defadvice</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">desktop-save</span> (after desktop-save-readahead activate)
<span style="color: #888a85;">"Prepare a readahead(8) file for the desktop file"</span> (dim-desktop-get-readahead-file-list))
(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-desktop</span>) </pre>
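<p>On Emacs 24.4 and later, the same hook-up can also be written with advice-add instead of defadvice. This is only a sketch: it assumes the dim-desktop-get-readahead-file-list function from the listing above is loaded, and the wrapper name dim-desktop-save-readahead is made up for the example.</p>
<pre class="src">
;; Hypothetical wrapper: desktop-save passes arguments we do not need here.
(defun dim-desktop-save-readahead (&rest _args)
  "Refresh the readahead file list once `desktop-save' has run."
  (dim-desktop-get-readahead-file-list))

;; :after advice runs when desktop-save returns, so the desktop file
;; already lists the buffers when we read it.
(advice-add 'desktop-save :after #'dim-desktop-save-readahead)
</pre>
<p>Use one form or the other, not both, otherwise the list is simply generated twice on every save.</p>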
<p>The awk construct getline lets you process the next line of the input file, which is very practical here (and in a host of other situations). Now that we have a file containing the list of files Emacs will load, we have to tweak the system to readahead those disk blocks. As I'm currently using <a href="http://kde.org/">KDE</a> again, I've done it thusly:</p> <pre class="src">
% cat ~/.kde/Autostart/readahead.emacs.sh
#!/bin/bash

# just readahead the emacs desktop files
# this file listing is maintained directly from Emacs itself
readahead ~/.emacs.desktop.readahead
</pre>
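<p>If gawk is not at hand, the same list can be built from Emacs Lisp by reading the desktop file form by form. This is a minimal sketch, assuming that the desktop file is a sequence of top-level forms and that buffer entries look like (desktop-create-buffer VERSION "FILE" ...) or (desktop-append-buffer-args VERSION "FILE" ...); the function name is hypothetical.</p>
<pre class="src">
(defun dim-desktop-read-file-names (desktop-file)
  "Return the list of file names recorded in DESKTOP-FILE."
  (with-temp-buffer
    (insert-file-contents desktop-file)
    (goto-char (point-min))
    (let (files)
      (condition-case nil
          ;; `read' signals end-of-file once every form has been consumed.
          (while t
            (let ((form (read (current-buffer))))
              (when (and (consp form)
                         (memq (car form) '(desktop-create-buffer
                                            desktop-append-buffer-args))
                         ;; Buffers that visit no file have nil here.
                         (stringp (nth 2 form)))
                (push (nth 2 form) files))))
        (end-of-file nil))
      (nreverse files))))
</pre>
<p>Writing those names one per line into ~/.emacs.desktop.readahead should then give readahead the same input as the gawk version.</p>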
<p>So, well, it works. The files that Emacs will need are pre-read, so by the time the desktop really gets to them, I see no more disk activity (laptops have an LED to see that happening). But the desktop loading time has not changed...</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/restore.html">restore</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 23 Feb 2011 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/23-desktop-mode-and-readahead.html</guid> </item> <item> <title>Back from FOSDEM</title> <link>http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</link> <description><![CDATA[<h1>Back from FOSDEM</h1>
<div>Monday, February 07 2011, 11:10 </div><p>This year we were in the main building of the conference, and apparently the booth went very well, selling lots of <a href="http://postgresqleu.spreadshirt.net/">PostgreSQL merchandise</a> etc. I had the pleasure to once again meet with the community, but being there only 1 day I didn't spend as much time as I would have liked with some of the people there.</p> <p>In case you're wondering, my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">extension's talk</a> went quite well, and several people were kind enough to tell me they appreciated it! There was a video recording of it, so we will soon have proof showing how bad it really was and how <em>polite</em> those people really are :)</p> <p>I will soon be able to write an article series detailing what an Extension is and how you deal with them, either as a user or an author. Well in fact the goal is for any user to easily become an extension author, as I think lots of people are already maintaining server side code but missing the tools to manage it properly. But that will begin once the patch is in, so that I present <em>the real stuff</em> rather than what I proposed to the community… Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 07 Feb 2011 11:10:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/07-back-from-fosdem.html</guid> </item> <item> <title>Going to FOSDEM</title> <link>http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</link> <description><![CDATA[<h1>Going to FOSDEM</h1>
<div>Tuesday, February 01 2011, 13:35 </div><p>A quick blog entry to say that yes:</p> <center> <p><img src="../../../images//going-to-fosdem-2011.png" alt=""></p> </center> <p>And I will even do my <a href="http://fosdem.org/2011/schedule/event/pg_extension1">Extension's talk</a> which was a <a href="http://blog.hagander.net/archives/183-Feedback-from-PGDay.EU-the-speakers.html">success at pgday.eu</a>. The talk will be updated to include the latest developments of the extension feature, as some of it has changed already in between, and to detail the plan for the ALTER EXTENSION ... UPGRADE feature that I'd like to see included as soon as 9.1, but time is running so fast.</p> <p>In fact the design for the UPGRADE has been done and reviewed already, but we have yet to reach consensus on how to set up which upgrade file to use when upgrading from a given version to another. I've solved it in my patch, of course, by adding properties into the extension's <em>control file</em>. That's the best place to have that setup I think: it allows lots of flexibility, leaves the extension's author in charge, and avoids any hard coding of any kind of assumptions about file naming or whatever.</p> <p>The next days and reviews will tell us more about how the design is received. Meanwhile, we're working on finalizing the main extension's patch, offering pg_dump support.</p> <p>See you at <a href="http://fosdem.org/2011/">FOSDEM</a>!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/conferences.html">Conferences</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/fosdem.html">FOSDEM</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 01 Feb 2011 13:35:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/02/01-going-to-fosdem.html</guid> </item>
<item> <title>Starting afresh with el-get</title> <link>http://tapoueh.org/blog/2011/01/blog/2011/01/11-starting-afresh-with-el-get.html</link> <description><![CDATA[<p>It so happens that a colleague of mine wanted to start using <a href="http://www.gnu.org/software/emacs/">Emacs</a> but couldn't get to it. He insists on having proper color themes in all applications and some sensible defaults full of nifty add-ons everywhere, and didn't want to have to learn that much about <em>Emacs</em> and <em>Emacs Lisp</em> to get started. I'm not even sure that he will <a href="http://www.gnu.org/software/emacs/tour/">Take the Emacs tour</a>.</p>
<p>You would tell me that there's nothing we can do for such unfriendly users. Well, here's what I did:</p> <pre class="src">
<span style="color: #b22222;">;; </span><span style="color: #b22222;">emacs setup
</span>
(add-to-list 'load-path <span style="color: #bc8f8f;">"~/.emacs.d/el-get/el-get"</span>)
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">el-get</span>)
(setq
el-get-sources '(el-get php-mode-improved psvn auto-complete switch-window
(<span style="color: #da70d6;">:name</span> buffer-move <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (global-set-key (kbd <span style="color: #bc8f8f;">"<C-S-up>"</span>) 'buf-move-up) (global-set-key (kbd <span style="color: #bc8f8f;">"<C-S-down>"</span>) 'buf-move-down) (global-set-key (kbd <span style="color: #bc8f8f;">"<C-S-left>"</span>) 'buf-move-left) (global-set-key (kbd <span style="color: #bc8f8f;">"<C-S-right>"</span>) 'buf-move-right)))
(<span style="color: #da70d6;">:name</span> magit <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-z"</span>) 'magit-status)))
(<span style="color: #da70d6;">:name</span> goto-last-change <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> () <span style="color: #b22222;">;; </span><span style="color: #b22222;">azerty keyboard here, don't use C-x C-/ </span> (global-set-key (kbd <span style="color: #bc8f8f;">"C-x C-"</span>) 'goto-last-change)))))
(<span style="color: #7f007f;">when</span> window-system
(add-to-list 'el-get-sources 'color-theme-tango))
(el-get 'sync)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">visual settings </span>(setq inhibit-splash-screen t) (menu-bar-mode -1) (tool-bar-mode -1) (scroll-bar-mode -1)
(line-number-mode 1) (column-number-mode 1)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">Use the clipboard, pretty please, so that copy/paste "works" </span>(setq x-select-enable-clipboard t)
(set-frame-font <span style="color: #bc8f8f;">"Monospace-10"</span>)
(global-hl-line-mode)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">follow external changes to files
</span>(global-auto-revert-mode 1)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">for colors in M-x shell
</span>(autoload 'ansi-color-for-comint-mode-on <span style="color: #bc8f8f;">"ansi-color"</span> nil t)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">S-arrows to switch windows
</span>(windmove-default-keybindings)
(setq windmove-wrap-around t)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">find-file-at-point when it makes sense
</span>(setq ffap-machine-p-known 'accept) <span style="color: #b22222;">; </span><span style="color: #b22222;">no pinging
</span>(setq ffap-url-regexp nil) <span style="color: #b22222;">; </span><span style="color: #b22222;">disable URL features in ffap
</span>(setq ffap-ftp-regexp nil) <span style="color: #b22222;">; </span><span style="color: #b22222;">disable FTP features in ffap
</span>(define-key global-map (kbd <span style="color: #bc8f8f;">"C-x C-f"</span>) 'find-file-at-point)
(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">ibuffer</span>) (global-set-key <span style="color: #bc8f8f;">"\C-x\C-b"</span> 'ibuffer)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">use iswitchb-mode for C-x b </span>(iswitchb-mode)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">I can't remember having meant to use C-z as suspend-frame </span>(global-set-key (kbd <span style="color: #bc8f8f;">"C-z"</span>) 'undo)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">winner-mode to get back to the previous layout with C-c <left>
</span>(winner-mode 1)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">dired-x for C-x C-j
</span>(<span style="color: #7f007f;">require</span> '<span style="color: #5f9ea0;">dired-x</span>)
<span style="color: #b22222;">;; </span><span style="color: #b22222;">full screen
</span>(<span style="color: #7f007f;">defun</span> <span style="color: #0000ff;">fullscreen</span> ()
  (interactive)
  (set-frame-parameter nil 'fullscreen
                       (<span style="color: #7f007f;">if</span> (frame-parameter nil 'fullscreen) nil 'fullboth)))
(global-set-key [f11] 'fullscreen)
</pre>
<p>With just these simple 87 lines (all included) of setup, my local user is very happy to switch to using <a href="http://www.gnu.org/software/emacs/">our favorite editor</a>. And he's not even afraid (yet) of his ~/.emacs. I say that's a very good sign of where we are with <a href="https://github.com/dimitri/el-get">el-get</a>!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 11 Jan 2011 16:20:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/01/blog/2011/01/11-starting-afresh-with-el-get.html</guid> </item> <item> <title>Starting afresh with el-get</title> <link>http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</link> <description><![CDATA[<h1>Starting afresh with el-get</h1>
<div>Tuesday, January 11 2011, 16:20 </div><p>It so happens that a colleague of mine wanted to start using <a href="http://www.gnu.org/software/emacs/">Emacs</a> but couldn't get to it. He insists on having proper color themes in all applications and some sensible defaults full of nifty add-ons everywhere, and didn't want to have to learn that much about <em>Emacs</em> and <em>Emacs Lisp</em> to get started. I'm not even sure that he will <a href="http://www.gnu.org/software/emacs/tour/">Take the Emacs tour</a>.</p> <p>You would tell me that there's nothing we can do for such unfriendly users. Well, here's what I did:</p> <pre class="src">
<span style="color: #888a85;">;; </span><span style="color: #888a85;">emacs setup
</span>
(add-to-list 'load-path <span style="color: #ad7fa8; font-style: italic;">"~/.emacs.d/el-get/el-get"</span>)
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(setq el-get-sources '(el-get php-mode-improved psvn auto-complete switch-window
(<span style="color: #729fcf;">:name</span> buffer-move <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"<C-S-up>"</span>) 'buf-move-up) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"<C-S-down>"</span>) 'buf-move-down) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"<C-S-left>"</span>) 'buf-move-left) (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"<C-S-right>"</span>) 'buf-move-right)))
(<span style="color: #729fcf;">:name</span> magit <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status)))
(<span style="color: #729fcf;">:name</span> goto-last-change <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () <span style="color: #888a85;">;; </span><span style="color: #888a85;">azerty keyboard here, don't use C-x C-/ </span> (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-"</span>) 'goto-last-change)))))
(<span style="color: #729fcf; font-weight: bold;">when</span> window-system
(add-to-list 'el-get-sources 'color-theme-tango))
(el-get 'sync)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">visual settings </span>(setq inhibit-splash-screen t) (menu-bar-mode -1) (tool-bar-mode -1) (scroll-bar-mode -1)
(line-number-mode 1) (column-number-mode 1)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">Use the clipboard, pretty please, so that copy/paste "works" </span>(setq x-select-enable-clipboard t)
(set-frame-font <span style="color: #ad7fa8; font-style: italic;">"Monospace-10"</span>)
(global-hl-line-mode)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">follow external changes to files
</span>(global-auto-revert-mode 1)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">for colors in M-x shell
</span>(autoload 'ansi-color-for-comint-mode-on <span style="color: #ad7fa8; font-style: italic;">"ansi-color"</span> nil t)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">S-arrows to switch windows
</span>(windmove-default-keybindings)
(setq windmove-wrap-around t)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">find-file-at-point when it makes sense
</span>(setq ffap-machine-p-known 'accept) <span style="color: #888a85;">; </span><span style="color: #888a85;">no pinging
</span>(setq ffap-url-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable URL features in ffap
</span>(setq ffap-ftp-regexp nil) <span style="color: #888a85;">; </span><span style="color: #888a85;">disable FTP features in ffap
</span>(define-key global-map (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-f"</span>) 'find-file-at-point)
(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">ibuffer</span>) (global-set-key <span style="color: #ad7fa8; font-style: italic;">"\C-x\C-b"</span> 'ibuffer)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">use iswitchb-mode for C-x b </span>(iswitchb-mode)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">I can't remember having meant to use C-z as suspend-frame </span>(global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-z"</span>) 'undo)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">winner-mode to get back to the previous layout with C-c <left>
</span>(winner-mode 1)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">dired-x for C-x C-j
</span>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">dired-x</span>)
<span style="color: #888a85;">;; </span><span style="color: #888a85;">full screen
</span>(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">fullscreen</span> ()
  (interactive)
  (set-frame-parameter nil 'fullscreen
                       (<span style="color: #729fcf; font-weight: bold;">if</span> (frame-parameter nil 'fullscreen) nil 'fullboth)))
(global-set-key [f11] 'fullscreen)
</pre>
<p>With just these simple 87 lines (all included) of setup, my local user is very happy to switch to using <a href="http://www.gnu.org/software/emacs/">our favorite editor</a>. And he's not even afraid (yet) of his ~/.emacs. I say that's a very good sign of where we are with <a href="https://github.com/dimitri/el-get">el-get</a>!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/switch-window.html">switch-window</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 11 Jan 2011 16:20:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2011/01/11-starting-afresh-with-el-get.html</guid> </item> <item> <title>el-get 1.1, with 174 recipes</title> <link>http://tapoueh.org/blog/2010/12/blog/2010/12/20-el-get-11-with-174-recipes.html</link> <description><![CDATA[<p>Yes, you read it well, <a href="https://github.com/dimitri/el-get">el-get</a> currently <em>features</em> 174 <a href="https://github.com/dimitri/el-get/tree/master/recipes">recipes</a>, and is now reaching the 1.1 release. The reason for this release is mainly that I have two big chunks of code to review and the current code has been very stable for a while. It seems better to do a release with the stable code that exists now before shaking it this much. If you're wondering when to jump in the water and switch to using <em>el-get</em>, now is a pretty good time.</p>
<h3>New source types</h3> <p class="first">We now have support for the <a href="http://www.archlinux.org/pacman/">pacman</a> package management for <a href="http://www.archlinux.org/">archlinux</a>, and a way to handle a different package name in the recipe and in the distribution. We also have support for <a href="http://mercurial.selenic.com/">mercurial</a> and <a href="http://subversion.tigris.org/">subversion</a> and <a href="http://darcs.net/">darcs</a>.</p> <p>Also, <a href="http://wiki.debian.org/Apt">apt-get</a> will sometimes prompt you to validate its choices, that's the infamous <em>Do you want to continue?</em> prompt. We now handle that smoothly.</p>
<h3>(el-get 'sync)</h3> <p class="first">In 1.1, that really means <em>synchronous</em>. That means we install one package after the other, and any error will stop it all. Before that, it was an active wait loop over a parallel install: this option is still available through calling (el-get 'wait).</p>
<h3>No more <em>failed to install</em></h3> <p class="first">Exactly. This error you may have encountered sometimes is due to trying to install a package over a previous failed install attempt (network outage, disk full, bad work-in-progress recipe, etc). After a while in the field it was clear that no cases were found where you would regret it if <a href="https://github.com/dimitri/el-get">el-get</a> just removed the previous failed installation for you before going on to install again, as asked. So that's now automatic.</p>
<h3>Featuring an overhauled :build facility</h3> <p class="first">The build commands can now either be a list, as before, or some code that we <em>evaluate</em> for you. That allows for easier to maintain <em>recipes</em>, and here's an example of that:</p> <pre class="src">
(<span style="color: #da70d6;">:name</span> distel
       <span style="color: #da70d6;">:type</span> svn
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://distel.googlecode.com/svn/trunk/"</span>
       <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"doc"</span>
       <span style="color: #da70d6;">:build</span> `,(mapcar (<span style="color: #7f007f;">lambda</span> (target) (concat <span style="color: #bc8f8f;">"make "</span> target <span style="color: #bc8f8f;">" EMACS="</span> el-get-emacs)) '(<span style="color: #bc8f8f;">"clean"</span> <span style="color: #bc8f8f;">"all"</span>))
       <span style="color: #da70d6;">:load-path</span> (<span style="color: #bc8f8f;">"elisp"</span>)
       <span style="color: #da70d6;">:features</span> distel)
</pre>
<p>As you see that also allows for maintenance of multi-platform build recipes, and multiple emacs versions too. It's still a little too much on the <em>awkward</em> side of things, though, and that's part of the ongoing work that will happen for the next version.</p>
<h3>Misc improvements</h3> <p class="first">We are now able to byte-compile your packages, and offer some more hooks (el-get-init-hooks has been asked for with a nice usage example). There's a new :localname property that allows you to pick where to save the local file when using the HTTP method for retrieval, and that in turn allows fixing some <em>recipes</em>.</p> <pre class="src">
(<span style="color: #da70d6;">:name</span> xcscope
       <span style="color: #da70d6;">:type</span> http
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://cscope.cvs.sourceforge.net/viewvc/cscope/cscope/contrib/xcscope/xcscope.el?revision=1.14&content-type=text%2Fplain"</span>
       <span style="color: #da70d6;">:localname</span> <span style="color: #bc8f8f;">"xscope.el"</span>
       <span style="color: #da70d6;">:features</span> xcscope)
</pre>
<p>Oh and you even get :before user function support, even if needing it often shows that you're doing it in a strange way. More often than not it's possible to do all you need to in the :after function, but this tool is there so that you spend less time on having a working environment, not more, right? :)</p> <h3>Switch notice</h3> <p class="first">All in all, if you're already using <a href="https://github.com/dimitri/el-get">el-get</a> you should consider switching to 1.1 (by issuing M-x el-get-update of course), and if you're hesitating, just join the fun now!</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Dec 2010 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/12/blog/2010/12/20-el-get-11-with-174-recipes.html</guid> </item> <item> <title>el-get 1.1, with 174 recipes</title> <link>http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</link> <description><![CDATA[<h1>el-get 1.1, with 174 recipes</h1>
Monday, December 20 2010, 16:45 </div><p>Yes, you read that right, <a href="https://github.com/dimitri/el-get">el-get</a> currently <em>features</em> 174 <a href="https://github.com/dimitri/el-get/tree/master/recipes">recipes</a>, and is now reaching the 1.1 release. The reason for this release is mainly that I have two big chunks of code to review and the current code has been very stable for a while. It seems better to release the stable code that exists now before shaking it this much. If you're wondering when to jump in the water and switch to using <em>el-get</em>, now is a pretty good time.</p>
<h3>New source types</h3> <p class="first">We now have support for the <a href="http://www.archlinux.org/pacman/">pacman</a> package manager for <a href="http://www.archlinux.org/">archlinux</a>, and a way to handle a package name that differs between the recipe and the distribution. We also have support for <a href="http://mercurial.selenic.com/">mercurial</a>, <a href="http://subversion.tigris.org/">subversion</a> and <a href="http://darcs.net/">darcs</a>.</p> <p>Also, <a href="http://wiki.debian.org/Apt">apt-get</a> will sometimes prompt you to validate its choices, that's the infamous <em>Do you want to continue?</em> prompt. We now handle that smoothly.</p>
<h3>(el-get 'sync)</h3> <p class="first">In 1.1, that really means <em>synchronous</em>: we install one package after the other, and any error will stop it all. Before that, it was an active wait loop over a parallel install; this option is still available by calling (el-get 'wait).</p>
<h3>No more <em>failed to install</em></h3> <p class="first">Exactly. This error, which you may have encountered at some point, is due to trying to install a package over a previous failed install attempt (network outage, disk full, bad work-in-progress recipe, etc).
After a while in the field it became clear that no case had been found where you would regret it if <a href="https://github.com/dimitri/el-get">el-get</a> just removed the previous failed installation for you before going on to install again, as asked. So that's now automatic.</p>
<h3>Featuring an overhauled :build facility</h3> <p class="first">The build commands can now either be a list, as before, or some code that we <em>evaluate</em> for you. That allows for easier to maintain <em>recipes</em>, and here's an example of that:</p> <pre class="src">
(<span style="color: #729fcf;">:name</span> distel
 <span style="color: #729fcf;">:type</span> svn
 <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://distel.googlecode.com/svn/trunk/"</span>
 <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"doc"</span>
 <span style="color: #729fcf;">:build</span> `,(mapcar
           (<span style="color: #729fcf; font-weight: bold;">lambda</span> (target)
             (concat <span style="color: #ad7fa8; font-style: italic;">"make "</span> target <span style="color: #ad7fa8; font-style: italic;">" EMACS="</span> el-get-emacs))
           '(<span style="color: #ad7fa8; font-style: italic;">"clean"</span> <span style="color: #ad7fa8; font-style: italic;">"all"</span>))
 <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"elisp"</span>)
 <span style="color: #729fcf;">:features</span> distel)
</pre>
<p>As you see that also allows for maintenance of multi-platform build recipes, and multiple emacs versions too. It's still a little too much on the <em>awkward</em> side of things, though, and improving it is part of the ongoing work for the next version.</p>
<h3>Misc improvements</h3> <p class="first">We are now able to byte-compile your packages, and offer some more hooks (el-get-init-hooks was requested, along with a nice usage example). There's a new :localname property that lets you pick where to save the local file when using the HTTP method for retrieval, and that in turn allows fixing some <em>recipes</em>.</p> <pre class="src">
(<span style="color: #729fcf;">:name</span> xcscope
 <span style="color: #729fcf;">:type</span> http
 <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"http://cscope.cvs.sourceforge.net/viewvc/cscope/cscope/contrib/xcscope/xcscope.el?revision=1.14&content-type=text%2Fplain"</span>
 <span style="color: #729fcf;">:localname</span> <span style="color: #ad7fa8; font-style: italic;">"xscope.el"</span>
 <span style="color: #729fcf;">:features</span> xcscope)
</pre>
<p>Oh and you even get :before user function support, even if needing it often shows that you're doing it in a strange way. More often than not it's possible to do all you need to in the :after function, but this tool is there so that you spend less time on having a working environment, not more, right? :)</p>
<h3>Switch notice</h3> <p class="first">All in all, if you're already using <a href="https://github.com/dimitri/el-get">el-get</a> you should consider switching to 1.1 (by issuing M-x el-get-update of course), and if you're hesitating, just join the fun now!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/debian.html">debian</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 20 Dec 2010 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/12/20-el-get-11-with-174-recipes.html</guid> </item> <item>
<title>Dynamic Triggers in PLpgSQL</title> <link>http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</link> <description><![CDATA[<h1>Dynamic Triggers in PLpgSQL</h1>
Wednesday, November 24 2010, 16:45 </div><p>You certainly know that implementing <em>dynamic</em> triggers in PLpgSQL is impossible. But I had a very bad night, having been up since 3:30 am today, so when a developer asked me about reusing the same trigger function code for more than one table and with a dynamic column name, I didn't remember that it was impossible.</p> <p>Here's what happens in such cases, after a long time on the problem (yes, overall, that's a slow day). Note that I'm abusing the (record_literal).* notation a lot in there, and even the (record_literal).column_name one too.</p> <pre class="src">
CREATE OR REPLACE FUNCTION public.update_timestamp()
 RETURNS TRIGGER
 LANGUAGE plpgsql
AS $f$
DECLARE
  ts_column varchar;
  old_timestamp timestamptz;
  attname name;
  n text;
  v text;
BEGIN
  IF TG_NARGS != 1
  THEN
    RAISE EXCEPTION <span style="color: #ad7fa8; font-style: italic;">'Trigger public.update_timestamp() called with % args'</span>,
                    TG_NARGS;
  END IF;

  ts_column := TG_ARGV[0];

  EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT n.'</span> || ts_column
       || <span style="color: #ad7fa8; font-style: italic;">' FROM (SELECT ('</span> || quote_literal(OLD) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').*) as n'</span>
     INTO old_timestamp;

  <span style="color: #888a85;">-- build NEW record text </span>
  n := <span style="color: #ad7fa8; font-style: italic;">'('</span>;
  FOR attname IN EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT attname '</span>
                      || <span style="color: #ad7fa8; font-style: italic;">' FROM pg_class c left join pg_attribute a on a.attrelid = c.oid'</span>
                      || <span style="color: #ad7fa8; font-style: italic;">' WHERE c.oid = $1 and attnum > 0 order by attnum'</span>
                   USING TG_RELID
  LOOP
    EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ('</span> || quote_literal(NEW) || <span style="color: #ad7fa8; font-style: italic;">'::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').'</span> || attname INTO v;
    IF n != <span style="color: #ad7fa8; font-style: italic;">'('</span> THEN n := n || <span style="color: #ad7fa8; font-style: italic;">','</span>; END IF;
    IF attname = ts_column AND v::timestamptz IS NOT DISTINCT FROM old_timestamp
    THEN
      n := n || now();
    ELSE
      n := n || COALESCE(v, <span style="color: #ad7fa8; font-style: italic;">''</span>);
    END IF;
  END LOOP;

  n := n || <span style="color: #ad7fa8; font-style: italic;">')'</span>;
  EXECUTE <span style="color: #ad7fa8; font-style: italic;">'SELECT ($1::'</span> || TG_RELID::regclass || <span style="color: #ad7fa8; font-style: italic;">').*'</span> INTO NEW USING n;
  RETURN NEW;
END;
$f$;
</pre>
<p>It's not pretty, and not fast. It's about 2 ms per call on a table with 15 columns, in some preliminary tests. But it sure was a nice challenge!</p> <h2>Tags</h2> <p><a href="../../../tags/plpgsql.html">plpgsql</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Wed, 24 Nov 2010 16:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/24-dynamic-triggers-in-plpgsql.html</guid> </item> <item> <title>pg_basebackup</title> <link>http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</link> <description><![CDATA[<h1>pg_basebackup</h1>
Sunday, November 07 2010, 13:45 </div><p><a href="http://2ndquadrant.com/about/#krosing">Hannu</a> just gave me a good idea in <a href="http://archives.postgresql.org/pgsql-hackers/2010-11/msg00236.php">this email</a> on <a href="http://archives.postgresql.org/pgsql-hackers/">-hackers</a>, proposing that <a href="https://github.com/dimitri/pg_basebackup">pg_basebackup</a> should get the xlog files again and again in a loop for the whole duration of the <em>base backup</em>. That's now done in the aforementioned tool, whose options have become a little more useful:</p> <pre class="src">
Usage: pg_basebackup.py [-v] [-f] [-j jobs] <span style="color: #ad7fa8; font-style: italic;">"dsn"</span> dest

Options:
  -h, --help            show this help message and exit
  --version             show version and quit
  -x, --pg_xlog         backup the pg_xlog files
  -v, --verbose         be verbose and about processing progress
  -d, --debug           show debug information, including SQL queries
  -f, --force           remove destination directory if it exists
  -j JOBS, --jobs=JOBS  how many helper jobs to launch
  -D DELAY, --delay=DELAY
                        pg_xlog subprocess loop delay, see -x
  -S, --slave           auxilliary process
  --stdin               get list of files to backup from stdin
</pre>
<p>Yeah, as implementing the xlog idea required having some kind of parallelism, I built on it and the script now has a --jobs option for you to set how many processes to launch in parallel, each fetching some base backup files over its own standard (libpq) <a href="http://www.postgresql.org/">PostgreSQL</a> connection, in compressed chunks of 8 MB (so the chunks sent over the wire are not 8 MB).</p> <p>The xlog loop will fetch any WAL file whose ctime changed again, wholesale. It's easier this way, and tools to get optimized behavior already exist, either <a href="http://skytools.projects.postgresql.org/doc/walmgr.html">walmgr</a> or <a href="http://www.postgresql.org/docs/9.0/interactive/warm-standby.html#STREAMING-REPLICATION">walreceiver</a>.</p> <p>The script is still a short, self-contained <a href="http://python.org/">python</a> file, it just went from about 100 lines of code to about 400 lines. There's no external dependency, all it needs is provided by a standard python installation. The problem with that is that it's using select.poll(), which I think is not available on windows. Between supporting every system and adding to the dependencies, I chose what's easier for me.</p> <pre class="src">
<span style="color: #729fcf; font-weight: bold;">import</span> sys
<span style="color: #729fcf; font-weight: bold;">import</span> select

<span style="color: #eeeeec;">p</span> = select.poll()
p.register(sys.stdin, select.POLLIN)
</pre>
<p>If you get to try it, please report about it, you should know or easily discover my <em>email</em>!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/skytools.html">skytools</a> <a href="../../../tags/backup.html">backup</a></p>
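<p>To make the ctime-based xlog loop described above a little more concrete, here is a minimal Python sketch of the idea. It is not the code of pg_basebackup.py: it assumes the WAL files are visible as a local directory (the actual script goes through a PostgreSQL connection), and the names snapshot_ctimes, changed_since, xlog_loop and the fetch callback are all hypothetical, picked for illustration only.</p>

```python
import os
import time


def snapshot_ctimes(directory):
    """Map each regular file name in `directory` to its ctime in nanoseconds."""
    ctimes = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            ctimes[entry.name] = entry.stat().st_ctime_ns
    return ctimes


def changed_since(previous, current):
    """Return the sorted names that are new or whose ctime differs from `previous`."""
    return sorted(
        name for name, ctime in current.items() if previous.get(name) != ctime
    )


def xlog_loop(directory, fetch, rounds, delay=1.0):
    """Call `fetch(name)` for every file whose ctime changed, `rounds` times over.

    `fetch` is a hypothetical callback standing in for "copy this file again,
    wholesale"; a real tool would loop until the base backup is over rather
    than for a fixed number of rounds.
    """
    seen = {}
    for _ in range(rounds):
        current = snapshot_ctimes(directory)
        for name in changed_since(seen, current):
            fetch(name)
        seen = current
        time.sleep(delay)
```

<p>Comparing whole snapshots keeps the loop stateless apart from one dictionary, at the price of copying a changed file again in full, which matches the "wholesale" trade-off made above.</p>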
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 07 Nov 2010 13:45:00 +0100</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/11/07-pg_basebackup.html</guid> </item> <item>
<title>Introducing Extensions</title> <link>http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</link> <description><![CDATA[<h1>Introducing Extensions</h1>
Thursday, October 21 2010, 13:45 </div><p>After reading <a href="http://database-explorer.blogspot.com/2010/10/extensions-in-91.html">Simon's blog post</a>, I can't help but try to give some details about what it is exactly that I'm working on. As he said, there are several aspects to <em>extensions</em> in <a href="http://www.postgresql.org/">PostgreSQL</a>, and it all begins here: <a href="http://www.postgresql.org/docs/9.0/interactive/extend.html">Chapter 35. Extending SQL</a>.</p> <p>It's possible, and mostly simple enough, to add your own code or behavior to PostgreSQL, so that it will use your code and your semantics while solving user queries. That's highly useful, and it's easy to understand why when you look at projects like <a href="http://postgis.refractions.net/">PostGIS</a>, <a href="http://pgfoundry.org/projects/ip4r/">ip4r</a> (index searches of ip in a range, not limited to CIDR notation), or our own <em>Key Value Store</em>, <a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a>.</p> <h3>So, what's in an <em>Extension</em>?</h3> <p class="first">An <em>extension</em> in its simplest form is a SQL <em>script</em> that you load into your database, but manage separately. Meaning you don't want the script to be part of your backups. Often, that kind of script will create new datatypes and operators, support functions, user functions and index support, and then it would include some C code that ships in a <em>shared library object</em>.</p> <p>As far as PostgreSQL is concerned, at least in the current version of my patch, the extension is first a <em>meta</em> information file that allows registering it. We currently call that the control file. 
Then, it's an SQL script that is <em>executed</em> by the server when you create the <em>extension</em>.</p> <p>If it so happens that the SQL script depends on some <em>shared library objects</em> file, this has to be present at the right place (MODULE_PATHNAME) for the <em>extension</em> to be successfully created, but that's always been the case.</p> <p>The problem with current releases of PostgreSQL, which the <em>extension</em> patch is solving, is the pg_dump and pg_restore support. As we said, you don't want the SQL script to be part of your dump, because it's not maintained in your database, but in some code repository out there. What you want is to be able to install the <em>extension</em> again at the file system level, then pg_restore your database, which depends on the extension being there.</p> <p>And that's exactly what the <em>extension</em> patch provides. By now having a SQL object called an extension, maintained in the new pg_extension catalog, we have an Oid to refer to. Which we do by recording a dependency between any object created by the script and the <em>extension</em> Oid, so that pg_dump can be instructed to skip those.</p> <h3>Examples?</h3> <p class="first">So, let's have a look at what you can do if you play with a patched development server version, or if you play directly from the git repository at <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension</a></p> <pre class="src">
dim ~ createdb exts
dim ~ psql exts
psql (9.1devel)
Type <span style="color: #ad7fa8; font-style: italic;">"help"</span> for help.

dim=# \dx+
                              List of extensions
        Name        |                               Description
--------------------+-------------------------------------------------------------------------
 adminpack          | Administrative functions for PostgreSQL
 auto_username      | functions for tracking who changed a table
 autoinc            | functions for autoincrementing fields
 btree_gin          | GIN support for common types BTree operators
 btree_gist         | GiST support for common types BTree operators
 chkpass            | Store crypt()ed passwords
 citext             | case-insensitive character string type
 cube               | data type for representing multidimensional cubes
 dblink             | connect to other PostgreSQL databases from within a database
 dict_int           | example of an add-on dictionary template for full-text search
 dict_xsyn          | example of an add-on dictionary template for full-text search
 earthdistance      | calculating great circle distances on the surface of the Earth
 fuzzystrmatch      | determine similarities and distance between strings
 hstore             | storing sets of key/value pairs
 int_aggregate      | integer aggregator and an enumerator (obsolete)
 intarray           | one-dimensional arrays of integers: functions, operators, index support
 isn                | data types for the international product numbering standards
 lo                 | managing Large Objects
 ltree              | data type for hierarchical tree-like structure
 moddatetime        | functions for tracking last modification time
 pageinspect        | inspect the contents of database pages at a low level
 pg_buffercache     | examine the shared buffer cache in real time
 pg_freespacemap    | examine the free space map (FSM)
 pg_stat_statements | tracking execution statistics of all SQL statements executed
 pg_trgm            | determine the similarity of text, with indexing support
 pgcrypto           | cryptographic functions
 pgrowlocks         | show row locking information for a specified table
 pgstattuple        | obtain tuple-level statistics
 prefix             | Prefix Match Indexing
 refint             | functions for implementing referential integrity
 seg                | data type for representing line segments, or floating point intervals
 tablefunc          | various functions that return tables, including crosstab(text sql)
 test_parser        | example of a custom parser for full-text search
 timetravel         | functions for implementing time travel
 tsearch2           | backwards-compatible text search functionality (pre-8.3)
 unaccent           | text search dictionary that removes accents
(36 rows)
</pre>
<p>Ok, I've visibly edited the output to leave the <em>Version</em> and <em>Custom Variable Classes</em> columns out. They take up lots of screen space and are not that useful here. Maybe the <em>classes</em> one will even get dropped from the patch before reaching 9.1, we'll see.</p> <p>Let's pick an extension there and install it in our new database:</p> <pre class="src">
exts=# create extension pg_trgm;
NOTICE:  Installing extension 'pg_trgm' from '/Users/dim/pgsql/exts/share/contrib/pg_trgm.sql', with user data
CREATE EXTENSION
exts=# \dx
                        List of extensions
  Name   |                       Description
---------+---------------------------------------------------------
 pg_trgm | determine the similarity of text, with indexing support
(1 row)
</pre>
<p>See, that was easy enough. Same thing here, the extra columns have been removed. So, what's in this extension, you will ask me, what are those objects that you would normally (that is, before the patch) find in your pg_dump backup script?</p> <pre class="src">
exts=# select * from pg_extension_objects('pg_trgm');
    class     | classid | objid | objdesc
--------------+---------+-------+---------------------------------------------------------------------------------
 pg_extension |    3996 | 18498 | extension pg_trgm
 pg_proc      |    1255 | 18499 | function set_limit(real)
 pg_proc      |    1255 | 18500 | function show_limit()
 pg_proc      |    1255 | 18501 | function show_trgm(text)
 pg_proc      |    1255 | 18502 | function similarity(text,text)
 pg_proc      |    1255 | 18503 | function similarity_op(text,text)
 pg_operator  |    2617 | 18504 | operator %(text,text)
 pg_type      |    1247 | 18505 | type gtrgm
 pg_proc      |    1255 | 18506 | function gtrgm_in(cstring)
 pg_proc      |    1255 | 18507 | function gtrgm_out(gtrgm)
 pg_type      |    1247 | 18508 | type gtrgm[]
 pg_proc      |    1255 | 18509 | function gtrgm_consistent(internal,text,integer,oid,internal)
 pg_proc      |    1255 | 18510 | function gtrgm_compress(internal)
 pg_proc      |    1255 | 18511 | function gtrgm_decompress(internal)
 pg_proc      |    1255 | 18512 | function gtrgm_penalty(internal,internal,internal)
 pg_proc      |    1255 | 18513 | function gtrgm_picksplit(internal,internal)
 pg_proc      |    1255 | 18514 | function gtrgm_union(bytea,internal)
 pg_proc      |    1255 | 18515 | function gtrgm_same(gtrgm,gtrgm,internal)
 pg_opfamily  |    2753 | 18516 | operator family gist_trgm_ops for access method gist
 pg_opclass   |    2616 | 18517 | operator class gist_trgm_ops for access method gist
 pg_amop      |    2602 | 18518 | operator 1 %(text,text) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18519 | function 1 gtrgm_consistent(internal,text,integer,oid,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18520 | function 2 gtrgm_union(bytea,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18521 | function 3 gtrgm_compress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18522 | function 4 gtrgm_decompress(internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18523 | function 5 gtrgm_penalty(internal,internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18524 | function 6 gtrgm_picksplit(internal,internal) of operator family gist_trgm_ops for access method gist
 pg_amproc    |    2603 | 18525 | function 7 gtrgm_same(gtrgm,gtrgm,internal) of operator family gist_trgm_ops for access method gist
 pg_proc      |    1255 | 18526 | function gin_extract_trgm(text,internal)
 pg_proc      |    1255 | 18527 | function gin_extract_trgm(text,internal,smallint,internal,internal)
 pg_proc      |    1255 | 18528 | function gin_trgm_consistent(internal,smallint,text,integer,internal,internal)
 pg_opfamily  |    2753 | 18529 | operator family gin_trgm_ops for access method gin
 pg_opclass   |    2616 | 18530 | operator class gin_trgm_ops for access method gin
 pg_amop      |    2602 | 18531 | operator 1 %(text,text) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18532 | function 1 btint4cmp(integer,integer) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18533 | function 2 gin_extract_trgm(text,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18534 | function 3 gin_extract_trgm(text,internal,smallint,internal,internal) of operator family gin_trgm_ops for access method gin
 pg_amproc    |    2603 | 18535 | function 4 gin_trgm_consistent(internal,smallint,text,integer,internal,internal) of operator family gin_trgm_ops for access method gin
(38 rows)
</pre>
<p>This function's main intended users are the <em>extension</em> authors themselves, so that it's easy for them to figure out which system identifier (the objid column) has been attributed to some SQL objects from their install script. With this knowledge, you can prepare some <em>upgrade</em> scripts. But that's for another patch altogether, so we'll get back to the matter in another blog entry.</p> <p>So we chose <a href="http://www.postgresql.org/docs/9.0/interactive/pgtrgm.html">trgm</a> as an example, let's follow the documentation and create a test table and a custom index in there, just so that the extension is put to good use. Then let's try to DROP our extension, because we're testing the infrastructure, right?</p> <pre class="src">
exts=# create table test(id bigint, name text);
CREATE TABLE
exts=# CREATE INDEX idx_test_name ON test USING gist (name gist_trgm_ops);
CREATE INDEX
exts=# drop extension pg_trgm;
ERROR:  cannot drop extension pg_trgm because other objects depend on it
DETAIL:  index idx_test_name depends on operator class gist_trgm_ops for access method gist
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
</pre> <p>Of course PostgreSQL is smart enough here: the <em>extension</em> patch had nothing special to do to achieve that, apart from recording the dependencies. Next, as we didn't drop extension pg_trgm cascade;, it's still in the database. So let's see what a pg_dump will look like. As it's quite a lot of text to paste, let's see the pg_restore catalog instead. And that's a feature that deserves to be better known, too.</p> <pre class="src">
dim ~ pg_dump -Fc exts | pg_restore -l | grep -v '^;'
1812; 1262 18497 DATABASE - exts dim
1; 3996 18498 EXTENSION - pg_trgm
1813; 0 0 COMMENT - EXTENSION pg_trgm
6; 2615 2200 SCHEMA - public dim
1814; 0 0 COMMENT - SCHEMA public dim
1815; 0 0 ACL - public dim
320; 2612 11602 PROCEDURAL LANGUAGE - plpgsql dim
1521; 1259 18543 TABLE public test dim
1809; 0 18543 TABLE DATA public test dim
1808; 1259 18549 INDEX public idx_test_name dim
</pre>
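<p>The grep step in the command above only drops the comment lines, starting with a semicolon, that pg_restore -l writes in its listing. If you would rather post-process that listing from a script, here is a minimal Python sketch of the same filtering. The toc_entries name and the sample text are hypothetical, made up for illustration, and the only thing parsed is the leading dump id; the rest of each line is kept as plain text.</p>

```python
def toc_entries(listing):
    """Yield (dump_id, rest) for each non-comment line of a `pg_restore -l` listing."""
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            # Same effect as grep -v '^;', and blank lines are skipped too.
            continue
        dump_id, _, rest = line.partition(";")
        yield int(dump_id), rest.strip()


# Illustrative sample only: two comment lines, then entries shaped like the ones above.
sample = """\
;
; (header comment lines)
1812; 1262 18497 DATABASE - exts dim
1; 3996 18498 EXTENSION - pg_trgm
1813; 0 0 COMMENT - EXTENSION pg_trgm
"""

for dump_id, rest in toc_entries(sample):
    print(dump_id, rest)
```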
<p>As you see, the only SQL objects that got into the backup for the extension are an EXTENSION and its COMMENT. Nothing like the types or the functions that the pg_trgm script creates.</p> <h3>What does it mean for extension authors?</h3> <p class="first">In order to be an <em>extension</em>, you have to prepare a <em>control</em> file in which you give the information necessary to register your script. This file must be named extension.control if the script is named extension.sql, at least at the moment. This file can benefit from some variable expansion too, like the current extension.sql.in does, in that if you provide an extension.control.in file the term VERSION will be expanded to whatever $(VERSION) is set to in your Makefile.</p> <p>If you have never written a C coded <em>extension</em> for PostgreSQL, this might look complex and irrelevant. The bottom line is that you need a Makefile so that you can easily benefit from the PostgreSQL infrastructure work and have the make install operation put your files in the right place, including the new control file.</p> <h3>That's it for today, folks</h3> <p class="first">A future blog entry will detail what happens with extensions providing <em>user data</em>, and the CREATE EXTENSION name WITH NO DATA; variant. Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/ip4r.html">ip4r</a> <a href="../../../tags/plpgsql.html">plpgsql</a> <a href="../../../tags/backup.html">backup</a> <a href="../../../tags/restore.html">restore</a> <a href="../../../tags/prefix.html">prefix</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 21 Oct 2010 13:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/21-introducing-extensions.html</guid> </item> <item> <title>Extensions: writing a patch for PostgreSQL</title> <link>http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</link> <description><![CDATA[<h1>Extensions: writing a patch for PostgreSQL</h1>
Friday, October 15 2010, 11:30 </div><p><span class="hack"> </span></p> <p>These days, thanks to my <a href="http://2ndquadrant.com/">community oriented job</a>, I'm working full time on a <a href="http://www.postgresql.org/">PostgreSQL</a> patch to complete basic support for <a href="http://www.postgresql.org/docs/9/static/extend.html">extending SQL</a>. The first thing I want to share is that patching the <em>backend code</em> is not as hard as one would think. The second one is that <a href="http://git-scm.com/">git</a> really is helping.</p> <p><em>“Not as hard as one would think</em>, are you kidding me?”, I hear some say. Well, that's true. It's C code in there, but with a very good layer of abstractions so that you're not dealing with subtle problems that much. Of course it happens that you have to, and managing the memory isn't optional. That said, palloc() and the <em>memory contexts</em> implementation make that as easy as <em>in lots of cases, you don't have to think about it</em>.</p> <p>PostgreSQL is very well known for its reliability, and that's not something that just happened. All the source code is organized in a way that makes it possible, so your main task is to write code that looks as much as possible like the existing surrounding code. 
And we all know how to <em>copy paste</em>, right?</p> <p>So, my current work on the <em>extensions</em> is to make it so that if you install <a href="http://www.postgresql.org/docs/9.0/interactive/hstore.html">hstore</a> in your database (to pick an example), your backup won't contain any <em>hstore</em> specific objects (types, functions, operators, index support objects, etc) but rather a single line that tells PostgreSQL to install <em>hstore</em> again.</p> <pre class="src">
CREATE EXTENSION hstore;
</pre> <p>The feature already works in <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=shortlog;h=refs/heads/extension">my git branch</a> and I'm extracting infrastructure work from there to ease review. That's when git helps a lot. What I've done is create a new branch from the master one, then <a href="http://www.kernel.org/pub/software/scm/git/docs/git-cherry-pick.html">cherry pick</a> the patches of interest. Well, sometimes you have to resort to helper tools. I've been told after the fact that using git cherry-pick -n would have allowed the following to be much simpler:</p> <pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git cherry-pick 3f291b4f82598309368610431cf2a18d7b7a7950
error: could not apply 3f291b4... Implement dependency tracking for CREATE EXTENSION, and DROP EXTENSION ... CASCADE.
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add &lt;paths&gt;' or 'git rm &lt;paths&gt;'
hint: and commit the result with 'git commit -c 3f291b4'

dim ~/dev/PostgreSQL/postgresql-extension git status \
| awk '/modified/ && ! /both/ && ! /genfile/ {print $3} /deleted/ {print $5} /both/ {print $4}' \
| xargs echo git reset -- \
| sh
Unstaged changes after reset:
M	src/backend/catalog/dependency.c
M	src/backend/catalog/heap.c
M	src/backend/catalog/pg_aggregate.c
M	src/backend/catalog/pg_conversion.c
M	src/backend/catalog/pg_namespace.c
M	src/backend/catalog/pg_operator.c
M	src/backend/catalog/pg_proc.c
M	src/backend/catalog/pg_type.c
M	src/backend/commands/extension.c
M	src/backend/commands/foreigncmds.c
M	src/backend/commands/opclasscmds.c
M	src/backend/commands/proclang.c
M	src/backend/commands/tsearchcmds.c
M	src/backend/nodes/copyfuncs.c
M	src/backend/nodes/equalfuncs.c
M	src/backend/parser/gram.y
M	src/include/catalog/dependency.h
M	src/include/commands/extension.h
M	src/include/nodes/parsenodes.h
</pre>
<p>That's what I did to prepare a side branch containing only the changes to one part of my current work. I had to filter the diff so much only because I'm committing in rather big steps, rather than very little chunks at a time. In this case that means I had a single patch with several <em>units</em> of changes and I wanted to extract only one. Well, it happens that even in such a case, git is helping!</p> <p>There's more to say about the <em>extension</em> related feature of course, but that'll do it for this article. I'll just end with the following nice <em>diffstat</em> of 4 days of work:</p> <pre class="src">
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master.. | wc -l
3897
dim ~/dev/PostgreSQL/postgresql-extension git --no-pager diff master.. | diffstat
 doc/src/sgml/extend.sgml | 46 ++
 doc/src/sgml/ref/allfiles.sgml | 2
 doc/src/sgml/ref/create_extension.sgml | 95 ++++
 doc/src/sgml/ref/drop_extension.sgml | 115 +++++
 doc/src/sgml/reference.sgml | 2
 src/backend/access/transam/xlog.c | 95 ---
 src/backend/catalog/Makefile | 1
 src/backend/catalog/dependency.c | 25 +
 src/backend/catalog/heap.c | 9
 src/backend/catalog/objectaddress.c | 14
 src/backend/catalog/pg_aggregate.c | 7
 src/backend/catalog/pg_conversion.c | 7
 src/backend/catalog/pg_namespace.c | 13
 src/backend/catalog/pg_operator.c | 7
 src/backend/catalog/pg_proc.c | 7
 src/backend/catalog/pg_type.c | 8
 src/backend/commands/Makefile | 3
 src/backend/commands/comment.c | 6
 src/backend/commands/extension.c | 688 +++++++++++++++++++++++++++++++++
 src/backend/commands/foreigncmds.c | 19
 src/backend/commands/functioncmds.c | 7
 src/backend/commands/opclasscmds.c | 13
 src/backend/commands/proclang.c | 7
 src/backend/commands/tsearchcmds.c | 25 +
 src/backend/nodes/copyfuncs.c | 22 +
 src/backend/nodes/equalfuncs.c | 18
 src/backend/parser/gram.y | 51 ++
 src/backend/tcop/utility.c | 27 +
 src/backend/utils/adt/genfile.c | 193 +++++++++
 src/backend/utils/init/postinit.c | 3
 src/backend/utils/misc/Makefile | 2
 src/backend/utils/misc/cfparser.c | 113 +++++
 src/backend/utils/misc/guc-file.l | 26 -
 src/backend/utils/misc/guc.c | 160 ++++++-
 src/bin/pg_dump/common.c | 6
 src/bin/pg_dump/pg_dump.c | 520 ++++++++++++++++++++++--
 src/bin/pg_dump/pg_dump.h | 10
 src/bin/pg_dump/pg_dump_sort.c | 7
 src/bin/psql/command.c | 3
 src/bin/psql/describe.c | 45 ++
 src/bin/psql/describe.h | 3
 src/bin/psql/help.c | 1
 src/include/catalog/dependency.h | 1
 src/include/catalog/indexing.h | 6
 src/include/catalog/pg_extension.h | 61 ++
 src/include/catalog/pg_proc.h | 13
 src/include/catalog/toasting.h | 1
 src/include/commands/extension.h | 54 ++
 src/include/nodes/nodes.h | 2
 src/include/nodes/parsenodes.h | 20
 src/include/parser/kwlist.h | 1
 src/include/utils/builtins.h | 4
 src/include/utils/cfparser.h | 18
 src/include/utils/guc.h | 11
 src/makefiles/pgxs.mk | 21 -
 55 files changed, 2456 insertions(+), 188 deletions(-)
</pre>
<h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/backup.html">backup</a></p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 15 Oct 2010 11:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/15-extensions-writing-a-patch-for-postgresql.html</guid> </item> <item> <title>Date puzzle for starters</title> <link>http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</link> <description><![CDATA[<h1>Date puzzle for starters</h1>
<p>Friday, October 08 2010, 10:00</p>
<p>The <a href="http://www.postgresql.org/">PostgreSQL</a> IRC channel is a good place to be: for all the very good help you can get there, because people are always willing to help, for the off-topic discussions sometimes, or to get to talk with core community members. And to start up your day, too.</p>
<p>This morning's question started simple: “how can I check if today is the "first Sunday of the month", or "the second Tuesday of the month", etc.?”</p>
<p>And the first version of the answer is quite simple too:</p>
<pre class="src">
dim=# with begin(d) as (select date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, <span style="color: #ad7fa8; font-style: italic;">'today'</span>::date)::date)
dim-# select d + 7 - extract(dow from d)::int as sunday from begin;
   sunday
<span style="color: #888a85;">------------</span>
 2010-10-03
(1 row)
</pre>
<p>So you just have to compare the result of the query with 'today'::date and there you go. (One caveat: when the month itself starts on a Sunday, dow is 0 and this expression returns the second Sunday, so the offset would need to be wrapped, as in (7 - dow) % 7.) The problem is that the question could be read the other way round, like, what is today in <em>first</em> or <em>second</em> <em>day name</em> of this month <em>format</em>? Once more, <a href="http://blog.rhodiumtoad.org.uk/">RhodiumToad</a> to the rescue:</p>
<pre class="src">
select to_char(current_date,
               <span style="color: #ad7fa8; font-style: italic;">'"'</span>
               || ((ARRAY[<span style="color: #ad7fa8; font-style: italic;">'First'</span>,<span style="color: #ad7fa8; font-style: italic;">'Second'</span>,<span style="color: #ad7fa8; font-style: italic;">'Third'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fourth'</span>,<span style="color: #ad7fa8; font-style: italic;">'Fifth'</span>])
                   [(extract(day from current_date)::integer - 1)/7 + 1] )
               || <span style="color: #ad7fa8; font-style: italic;">'" Day'</span>);
     to_char
<span style="color: #888a85;">------------------</span>
 Second Friday
(1 row)
</pre>
<p>That's a straight answer to the question, read that way!</p>
<p>But the part that I found nice to play with was my first reading of the question, as I don't get to lose my ideas that easily, you see… so what about writing a function to return the date of any <em>nth</em> occurrence of a given <em>day of week</em> in a <em>given month</em>, defaulting to this very month?</p>
<pre class="src">
create or replace function get_nth_dow_of_month
 (
   nth   int,
   dow   int,
   begin date default current_date
 )
 returns date
 language sql
 strict
as $$
with month(d) as (
  select generate_series(date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3),
                         date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3) + interval <span style="color: #ad7fa8; font-style: italic;">'1 month - 1 day'</span>,
                         interval <span style="color: #ad7fa8; font-style: italic;">'1 day'</span>)::date
),
     repeat as (
  select d,
         extract(dow from d) as dow,
         (d - date_trunc(<span style="color: #ad7fa8; font-style: italic;">'month'</span>, $3)::date) / 7 as repeat
    from month
)
select d
  from repeat
 where dow = $2 and repeat = $1;
$$;

dim=# select get_nth_dow_of_month(0, 0);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------</span>
 2010-10-03
(1 row)

dim=# select get_nth_dow_of_month(1, 4, <span style="color: #ad7fa8; font-style: italic;">'2010-09-12'</span>);
 get_nth_dow_of_month
<span style="color: #888a85;">----------------------</span>
 2010-09-09
(1 row)
</pre>
<p>So you see we just got the first Sunday of this month (0, 0) and the second Thursday (1, 4) of the previous one. Note that <em>nth</em> counts from zero here, and that <em>dow</em> follows extract(dow from ...), where 0 is Sunday. Any date within a month is a good way to tell which month you want to work in, as the function's written, abusing date_trunc like it does.</p>
<p>Now the way the function is written is unfinished. You want to fix it in one of two ways. Either stop using generate_series to only output one row at a time, or fix the API so that you can ask for more than one <em>nth dow</em> at a time. Of course, that was a starter for me, not a problem I need to solve directly, and that was a good excuse for a blog entry, so I won't fix it. That's left as an exercise to our interested readers!</p>
<h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/9.1.html">9.1</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 08 Oct 2010 10:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/08-date-puzzle-for-starters.html</guid> </item> <item> <title>Resuming work on Extensions, first little step</title> <link>http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</link> <description><![CDATA[<h1>Resuming work on Extensions, first little step</h1>
<p>Thursday, October 07 2010, 17:15</p>
<p>Yeah, I'm back to working on my part of the extension thing in <a href="http://www.postgresql.org/">PostgreSQL</a>.</p>
<p>First step is a little one, but as it has public consequences, I figured I'd talk about it already. I've just refreshed my git repository to follow the new master one, and you can see that here: <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430">http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9a88e9de246218e93c04b6b97e1ef61d97925430</a>.</p>
<p>It's been easier than I feared, mainly:</p>
<pre class="src">
$ git --no-pager diff master..extension
$ git --no-pager format-patch master..extension
$ cp 0001-First-stab-at-writing-pg_execute_from_file-function.patch ..
$ git checkout master
$ git pull -f pgmaster
$ git reset --hard pgmaster/master
$ git checkout extension
$ git reset --hard master
$ git am -s ../0001-First-stab-at-writing-pg_execute_from_file-function.edit.patch
$ git status
$ git log --short head
$ git log -n2 --oneline
$ git push -f
</pre>
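<p>For readers who want to try that sequence without risking a real checkout, here is a minimal sketch that replays it in a throwaway repository. Everything specific in it is an assumption made for the example: the temporary directory, the file names, the commit messages and the identity passed to git config are all hypothetical, and git am -3 is used as one way to let git attempt a three-way merge when a patch no longer applies cleanly, which is not what was done for the real branch.</p>
<pre class="src">
set -e

# Throwaway playground (hypothetical paths and names throughout).
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Hacker"
git config user.email "hacker@example.org"

# One upstream commit on master.
printf 'upstream v1\n' | tee README
git add README
git commit -q -m "upstream: initial import"

# One local commit on the extension branch.
git checkout -q -b extension
printf 'select 1;\n' | tee extension.sql
git add extension.sql
git commit -q -m "First stab at the extension"

# Save the local work as a patch file outside the repository.
git format-patch -q -o .. master..extension

# Simulate upstream moving on, then rebuild the branch on top of it.
git checkout -q master
printf 'upstream v2\n' | tee README
git commit -q -a -m "upstream: new master"
git checkout -q extension
git reset -q --hard master
git am -q -3 -s ../0001-*.patch

# Check the result.
git log -n2 --oneline
</pre>
<p>The last command should list the replayed extension commit on top of the new upstream commit, which is the state the real listing reaches just before its forced push.</p>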
<p>So that's still more steps than one would want to call dead simple, but still. The format-patch command is there to save my work away (all patches that are in the <em>extension</em> branch but not in the <em>master</em> one; well, that was only one of them here). Then, as the master repository URL didn't change, I can simply pull the changes in. Of course I had a nice message, <em>warning: no common commits</em>.</p> <p>Once pulled, I trashed my local copy and replaced it with the new official one, that's git reset --hard pgmaster/master, then in the <em>extension</em> branch I could trash it and have it linked to the local master again.</p> <p>Of course, the git am method wouldn't apply my patch as-is: there were some underlying changes in the source files, the identification tag changed from $PostgreSQL$ to, e.g., src/backend/utils/adt/genfile.c, and I had to cope with that. Maybe there's some tool (git am -3?) to do it automatically, I just copy-edited the .patch file.</p> <p>Lastly, it's all about checking the result and publishing it. This last line is git push -f and is when I just trashed and replaced my <a href="http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=summary">postgresql-extension</a> community repository. I don't think anybody was following it, but should it be the case, you will have to <em>reinit</em> your copy.</p> <p>More blog posts to come about extensions, as I arranged to have some real time to devote to the topic. At least I was able to arrange things so that I can work on the subject for real, and the first thing I did, the very night before it was meant to begin, was catch a <em>tonsillitis</em>. Lost about a week, not the project! Stay tuned!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/extensions.html">Extensions</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 07 Oct 2010 17:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-resuming-work-on-extensions-first-little-step.html</guid> </item> <item> <title>el-get reaches 1.0</title> <link>http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</link> <description><![CDATA[<h1>el-get reaches 1.0</h1>
<p>Thursday, October 07 2010, 13:30</p>
<p>It's been a week since the last commits in the <a href="http://github.com/dimitri/el-get">el-get repository</a>, and those were all about fixing and adding recipes, and about notifications. Nothing like <em>core plumbing</em>, you see. Also, 0.9 was released on <em>2010-08-24</em> and felt pretty complete already, then received lots of improvements. It's high time to cross the line and call it 1.0!</p>
<p>Now existing users will certainly just be moderately happy to see the tool reach that version number, depending on whether they think more about the bugs they want to see fixed (ftp is supported, only called http) and the new features they want to see in (<em>info</em> documentation), or more about what el-get does for them already today...</p>
<p>For the new users, or the yet-to-be-convinced users, let's take some time and talk about el-get. A <em>FAQ</em>-like session might be best.</p>
<h3>How is el-get different from ELPA?</h3>
<p><a href="http://tromey.com/elpa/">ELPA</a> is the <em>Emacs Lisp Package Archive</em> and is also known as package.el, to be included in Emacs 24. This allows Emacs Lisp extension authors to <em>package</em> their work. That means they have to follow some guidelines and format their contribution, then propose it for upload.</p>
<p>This requires licence checks (good) and for the <a href="http://elpa.gnu.org/">new official ELPA mirror</a> it even requires dead-tree papers exchange and contracts and copyright assignments, I believe.</p>
<h3>Why have both?</h3>
<p class="first">While <em>ELPA</em> is a great thing to have, it's so easy to find some high quality Emacs extensions out there that are not part of the offer. Either authors are not interested in uploading to ELPA, or they don't know how to properly <em>package</em> for it (it's only simple for single file extensions, see).</p>
<p>So el-get is a pragmatic answer here. It's there because it so happens that I don't depend only on emacs extensions that are available with Emacs itself, in my distribution site-lisp and in ELPA. I need some more, and I don't need it to be complex to find it, fetch it, init it and use it.</p>
<p>Of course I could try and package any extension I find I need and submit it to ELPA, but really, to do that nicely I'd need to contact the extension author (<em>upstream</em>) for him to accept my patch, and then consider a fork.</p>
<p>With el-get I propose distributed packaging, if you will. Let's have a look at two <em>recipes</em> here. First, the el-get one itself:</p>
<pre class="src">
(<span style="color: #729fcf;">:name</span> el-get
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
       <span style="color: #729fcf;">:features</span> el-get
       <span style="color: #729fcf;">:compile</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span>)
</pre>
<p>Then a much more complex one, the <a href="http://bbdb.sourceforge.net/">bbdb</a> one:</p>
<pre class="src">
(<span style="color: #729fcf;">:name</span> bbdb
       <span style="color: #729fcf;">:type</span> git
       <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/barak/BBDB.git"</span>
       <span style="color: #729fcf;">:load-path</span> (<span style="color: #ad7fa8; font-style: italic;">"./lisp"</span> <span style="color: #ad7fa8; font-style: italic;">"./bits"</span>)
       <span style="color: #729fcf;">:build</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:build/darwin</span> (<span style="color: #ad7fa8; font-style: italic;">"./configure --with-emacs=/Applications/Emacs.app/Contents/MacOS/Emacs"</span> <span style="color: #ad7fa8; font-style: italic;">"make autoloads"</span> <span style="color: #ad7fa8; font-style: italic;">"make"</span>)
       <span style="color: #729fcf;">:features</span> bbdb
       <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (bbdb-initialize))
       <span style="color: #729fcf;">:info</span> <span style="color: #ad7fa8; font-style: italic;">"texinfo"</span>)
</pre>
<p>The idea is that it's much simpler to just come up with a recipe like this than to patch existing code and upload it to ELPA. And anybody can share their <em>recipes</em> very easily, with or without proposing them to me, even if I very much like to add some more to the official el-get list.</p>
<p>As a user, you mostly don't even need to fiddle with recipes, because we already have them for you. What you do instead is list them in el-get-sources.</p>
<h3>So, show me how you use it?</h3>
<p class="first">Yeah, sure. Here's a sample of my dim-packages.el file, part of my .emacs <em>suite</em>. Yeah, a single .emacs does not suit me anymore, it's a complete .emacs.d now, but that's because that's how I like it organised, you know. So, here's the example:</p>
<pre class="src">
<span style="color: #888a85;">;;; </span><span style="color: #888a85;">dim-packages.el --- Dimitri Fontaine
</span><span style="color: #888a85;">;;
</span><span style="color: #888a85;">;; </span><span style="color: #888a85;">Set el-get-sources and call el-get to init all those packages we need.
</span>(<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">el-get</span>)
(add-to-list 'el-get-recipe-path <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes"</span>)

(setq el-get-sources
'(cssh el-get switch-window vkill google-maps yasnippet verbiste mailq sic<span style="color: #ffff00; background-color: #ff0000; font-weight: bold;">p</span>
(<span style="color: #729fcf;">:name</span> magit <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-z"</span>) 'magit-status))<span style="color: #ffff00; background-color: #ff0000; font-weight: bold;">)</span>
(<span style="color: #729fcf;">:name</span> asciidoc <span style="color: #729fcf;">:type</span> elpa <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (autoload 'doc-mode <span style="color: #ad7fa8; font-style: italic;">"doc-mode"</span> nil t) (add-to-list 'auto-mode-alist '(<span style="color: #ad7fa8; font-style: italic;">"\\.adoc$"</span> . doc-mode)) (add-hook 'doc-mode-hook '(<span style="color: #729fcf; font-weight: bold;">lambda</span> () (turn-on-auto-fill) (<span style="color: #729fcf; font-weight: bold;">require</span> '<span style="color: #8ae234;">asciidoc</span>)))))
(<span style="color: #729fcf;">:name</span> goto-last-change <span style="color: #729fcf;">:after</span> (<span style="color: #729fcf; font-weight: bold;">lambda</span> () (global-set-key (kbd <span style="color: #ad7fa8; font-style: italic;">"C-x C-/"</span>) 'goto-last-change)))
(<span style="color: #729fcf;">:name</span> auto-dictionary <span style="color: #729fcf;">:type</span> elpa) (<span style="color: #729fcf;">:name</span> gist <span style="color: #729fcf;">:type</span> elpa) (<span style="color: #729fcf;">:name</span> lisppaste <span style="color: #729fcf;">:type</span> elpa)))
(el-get) <span style="color: #888a85;">; </span><span style="color: #888a85;">that could/should be (el-get 'sync)
</span>(<span style="color: #729fcf; font-weight: bold;">provide</span> '<span style="color: #8ae234;">dim-packages</span>) </pre>
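<p>The trailing comment in the listing above hints at a blocking variant. A minimal sketch of the end of the same file, assuming nothing beyond the (el-get 'sync) call that the comment itself names:</p> <pre class="src">
;; Sketch only: same el-get-sources as above, but block Emacs startup
;; until every listed package is installed and initialized.
(el-get 'sync)
(provide 'dim-packages)
</pre>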
<p>Ok that's not all of it, but it should give you a nice idea about what problem I solve with el-get and how. In my emacs startup sequence, somewhere inside my ~/.emacs.d/init.el file, I have a line that says (require 'dim-packages). This will set el-get-sources to the list just above, then call (el-get), the main function.</p> <p>This main function will check each given package and install it if necessary (including <em>building</em> the package, as in make autoloads; make), then <em>init</em> it. What <em>init</em> means exactly depends on what the recipe says. That can include <em>byte-compiling</em> some files, caring about <em>load-path</em>, <em>load</em> and <em>require</em> commands, caring about <em>Info-directory-list</em> and ginstall-info too, and some more.</p> <p>So in short, it will make it so that your emacs instance is ready for you to use. And you get the choice to use the given el-get recipes as-is, like I did for cssh, el-get, switch-window and others, up to sicp, or to tweak them partly, like in the magit example where I've added a user init function (the :after property) to bind magit-status to C-x C-z here. You can even embed a full recipe inline in the el-get-sources variable, that's the case for each item that gives its :type property, like asciidoc or gist.</p> <p>And, as you see, we're using ELPA a lot in these sources, so el-get isn't striving to replace it at all, it's just trying to accommodate a broader world.</p> <h3>I read that the el-get-install is asynchronous, tell me more.</h3> <p class="first">Yeah, right, the example above says (el-get) at its end, and in the cases when el-get has to install or build sources, this will be done asynchronously. Which means that not only will several sources get processed at once (using your multiple cores, yeah) but also that emacs will go on starting up as if everything were ready.</p> <p>It happens that's usually what I want, because I seldom add sources in my setup, but in theory that can break your emacs.
What I do is start it again or fix things by hand; what you can do instead is (el-get 'sync) so that emacs is blocked waiting for el-get to properly install and initialize all the sources you've set up. Your choice, just add the 'sync parameter there.</p> <h3>Now, explain to me why it is better this way, again, please?</h3> <p class="first">Well, before I wrote el-get, trying out a new extension, setting it up etc was something quite involved, and something I had to redo on several machines. The only way not to redo it was to include the extension's code into my own git repository (my emacs.d is in git, of course).</p> <p>And putting code I don't maintain into my own git repository is something I frown upon. I have no business pretending I'll maintain the code, and I know I will never think to check the URL where I've found it for updates. That's when I thought of noting down the URL somewhere.</p> <p>Also, what about sharing the extension with friends? Uneasy, at best.</p> <p>Enter el-get and I can just add an entry to el-get-sources, based on a file somewhere in my own el-get-recipe-path. When I'm happy with this file, I can contribute it to el-get proper or just send it over to any interested recipient. Adding it to your sources is easy. Copy the file somewhere in your el-get-recipe-path, add its name to your el-get-sources, then M-x el-get-install it. Done. If you were given the :after function, it's all set up already.</p> <p>If you contribute the recipe to el-get, then M-x el-get-update RET el-get RET and you get it on this other machine where you also use Emacs. Or you can tell your friend to do the same and benefit from your <em>packaging</em>.</p> <h3>Well, sounds good. What recipes do you have already?</h3> <p class="first">I count 67 of them already.
One of them is just a book in <em>info</em> format, with no <em>elisp</em> at all, can you spot it?</p> <pre class="src"> ELISP> (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"el$"</span>)(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-etags.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete-extension.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-complete.el"</span> <span style="color: #ad7fa8; font-style: italic;">"auto-install.el"</span> <span style="color: #ad7fa8; font-style: italic;">"autopair.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span> <span style="color: #ad7fa8; font-style: italic;">"blender-python-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme-twilight.el"</span> <span style="color: #ad7fa8; font-style: italic;">"color-theme.el"</span> <span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"django-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacs-w3m.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emacschrome.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ensime.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-highlight-nicknames.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span> <span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"filladapt.el"</span> <span style="color: #ad7fa8; font-style: italic;">"flyguess.el"</span> <span style="color: #ad7fa8; font-style: 
italic;">"gist.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-weather.el"</span> <span style="color: #ad7fa8; font-style: italic;">"goto-last-change.el"</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"highlight-parentheses.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"levenshtein.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span> <span style="color: #ad7fa8; font-style: italic;">"mailq.el"</span> <span style="color: #ad7fa8; font-style: italic;">"maxframe.el"</span> <span style="color: #ad7fa8; font-style: italic;">"multi-term.el"</span> <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nognus.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nterm.el"</span> <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"offlineimap.el"</span> <span style="color: #ad7fa8; font-style: italic;">"package.el"</span> <span style="color: #ad7fa8; font-style: italic;">"popup-kill-ring.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pos-tip.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pov-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"pymacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rinari.el"</span> <span style="color: #ad7fa8; font-style: italic;">"ropemacs.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rt-liberation.el"</span> <span 
style="color: #ad7fa8; font-style: italic;">"scratch.el"</span> <span style="color: #ad7fa8; font-style: italic;">"session.el"</span> <span style="color: #ad7fa8; font-style: italic;">"sicp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"smex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"switch-window.el"</span> <span style="color: #ad7fa8; font-style: italic;">"textile-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"todochiku.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twitter.el"</span> <span style="color: #ad7fa8; font-style: italic;">"twittering-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"undo-tree.el"</span> <span style="color: #ad7fa8; font-style: italic;">"verbiste.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse-surround.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vimpulse.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>) </pre>
<h3>Ok, I want to try it, what's next?</h3> <p class="first">Visit the following URL <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a> and follow the install instructions. You're given a <em>scratch installer</em> there, that's some <em>elisp</em> code you copy and paste into *scratch* then execute there, and you have el-get ready to serve.</p> <p>An excellent idea I stole from ELPA!</p> <h3>Hey, I already know what el-get is, what's new in 1.0?</h3> <p class="first">The <em>changelog</em> is quite full of good stuff, really:</p> <ul> <li>Implement el-get recipes so that el-get-sources can be a simple list of symbols. Now that there's an authoritative git repository, where to share the recipes is easy.</li> <li>Add support for emacswiki directly, saving you from having to enter the URL</li> <li>Implement package status on-disk saving so that installing over a previously failed install is in theory possible. Currently `el-get' will refrain from removing your package automatically, though.</li> <li>Fix ELPA remove method, adding a "removed" state too.</li> <li>Implement CVS login support.</li> <li>Add lots of recipes</li> <li>Add support for `system-type' specific build commands</li> <li>Byte compile files from the load-path entries or :compile files</li> <li>Implement support for git submodules with the command `git submodule update --init --recursive`</li> <li>Add catch-all post-install and post-update hooks</li> <li>Add desktop notification on install/update.</li> </ul> <h3>I'm still using the deprecated emacswiki version, what now?</h3> <p class="first">That version didn't have recipes, and the new version should be perfectly happy with your current el-get-sources, so I recommend using the <em>scratch installer</em> too.
Don't forget to add el-get itself into your el-get-sources list, of course!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/extensions.html">Extensions</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/rcirc.html">rcirc</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 07 Oct 2010 13:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/10/07-el-get-reaches-10.html</guid> </item> <item> <title>Regexp performances and Finite Automata</title> <link>http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</link> <description><![CDATA[<h1>Regexp performances and Finite Automata</h1>
<p>Sunday, September 26 2010, 21:00</p> <p>The major reason why I dislike <a href="http://www.perl.org/">perl</a> so much, and <a href="http://www.ruby-lang.org">ruby</a> too, and the thing I'd want different in the <a href="http://www.gnu.org/software/emacs/manual/elisp.html">Emacs Lisp</a> API so far, is how they push developers into using <a href="http://www.regular-expressions.info/">regexp</a>. You know the quote, don't you?</p> <blockquote> <p class="quoted"> Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.</p> </blockquote> <p>That said, some situations require the use of <em>regexp</em>, or are so much simpler to solve using them that the maintenance hell you're building ain't that big a drag. The given expressiveness is hard to match with any other solution, to the point that I sometimes use them in my code (well, I sometimes use <a href="http://www.emacswiki.org/emacs/rx">rx</a> to lower the burden, just see this example).</p> <pre class="src"> (rx bol (zero-or-more blank) (one-or-more digit) <span style="color: #ad7fa8; font-style: italic;">":"</span>)
<span style="color: #ad7fa8; font-style: italic;">"^[[:blank:]]*[[:digit:]]+:"</span> </pre> <p>The thing you might want to know about <em>regexp</em> is that computing them is a heavy task usually involving <em>parsing</em> their representation, <em>compiling</em> it to some executable code, and then <em>executing</em> the generated code. It has been shown (as early as 1968) that a <em>regexp</em> is just another way to write a finite automaton, at least as long as you don't need <em>backtracking</em>.
The writing of this article is my reaction to reading <a href="http://swtch.com/~rsc/regexp/regexp1.html">Regular Expression Matching Can Be Simple And Fast</a> (but is slow in Java, Perl, PHP, Python, Ruby, ...), a very interesting article; see the benchmarks in there.</p> <p>The bulk of it is that we find mainly two categories of <em>regexp</em> engine in the wild: those that use <a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine">NFA</a> and <a href="http://en.wikipedia.org/wiki/Deterministic_finite_automaton">DFA</a> intermediate representation techniques, and the others. Our beloved <a href="http://www.postgresql.org/">PostgreSQL</a> sure offers the feature, it's the ~ and ~* <a href="http://www.postgresql.org/docs/9.0/interactive/functions-matching.html">operators</a>. The implementation here is based on <a href="http://www.arglist.com/regex/">Henry Spencer</a>'s work, which the aforementioned article says</p> <blockquote> <p class="quoted"> became very widely used, eventually serving as the basis for the slow regular expression implementations mentioned earlier: Perl, PCRE, Python, and so on.</p> </blockquote> <p>Having a look at the actual implementation shows that indeed, current PostgreSQL code for <em>regexp</em> matching uses intermediate representations of them as NFA and DFA. The code is quite complex, even more than I thought it would be, and I didn't have the time it would take to check it against the proposed one from the <em>simple and fast</em> article.</p> <pre class="src"> postgresql/src/backend/regex
-rw-r--r-- 1 dim staff  4362 Sep 25 20:59 COPYRIGHT
-rw-r--r-- 1 dim staff   614 Sep 25 20:59 Makefile
-rw-r--r-- 1 dim staff 28217 Sep 25 20:59 re_syntax.n
-rw-r--r-- 1 dim staff 16589 Sep 25 20:59 regc_color.c
-rw-r--r-- 1 dim staff  3464 Sep 25 20:59 regc_cvec.c
-rw-r--r-- 1 dim staff 25036 Sep 25 20:59 regc_lex.c
-rw-r--r-- 1 dim staff 16845 Sep 25 20:59 regc_locale.c
-rw-r--r-- 1 dim staff 35917 Sep 25 20:59 regc_nfa.c
-rw-r--r-- 1 dim staff 50714 Sep 25 20:59 regcomp.c
-rw-r--r-- 1 dim staff 17368 Sep 25 20:59 rege_dfa.c
-rw-r--r-- 1 dim staff  3627 Sep 25 20:59 regerror.c
-rw-r--r-- 1 dim staff 27664 Sep 25 20:59 regexec.c
-rw-r--r-- 1 dim staff  2122 Sep 25 20:59 regfree.c </pre>
<p>So all in all, I'll continue avoiding <em>regexp</em> as much as I currently do, and will keep my tendency to use <a href="http://www.gnu.org/manual/gawk/gawk.html">awk</a> when I need them on files (it allows refining the search without resorting to more and more pipes in the command line). And as far as resorting to <em>regexp</em> in PostgreSQL is concerned, it seems that the code here is already about top-notch. Once more.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>
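<p>A postscript on the finite automaton idea discussed in this article: here is a minimal sketch, in Emacs Lisp, of matching by simulating an NFA on a set of current states, so that no backtracking is ever needed. Everything in it is an assumption made for illustration: the names <em>toy-nfa-transitions</em>, <em>toy-nfa-step</em> and <em>toy-nfa-match-p</em> are hypothetical, the automaton is hand-built for the pattern (a|b)*abb, and it is neither the PostgreSQL implementation nor the code from the <em>simple and fast</em> article.</p> <pre class="src">
;; Hypothetical hand-built NFA for the pattern (a|b)*abb.
;; Each entry is (STATE CHAR NEXT-STATE); state 0 is the start state
;; and state 3 the only accepting one.  State 0 has two transitions
;; on ?a, which is what makes this automaton non-deterministic.
(defvar toy-nfa-transitions
  '((0 ?a 0) (0 ?b 0) (0 ?a 1) (1 ?b 2) (2 ?b 3)))

(defun toy-nfa-step (states char)
  "Return the list of states reachable from any of STATES on CHAR."
  (let (next)
    (dolist (tr toy-nfa-transitions)
      (when (and (memq (nth 0 tr) states)
                 (eql (nth 1 tr) char)
                 (not (memq (nth 2 tr) next)))
        (push (nth 2 tr) next)))
    next))

(defun toy-nfa-match-p (string)
  "Return t when the whole STRING is accepted by the toy NFA."
  (let ((states '(0)))
    ;; One pass over the input, carrying every possible state along.
    (dolist (char (string-to-list string))
      (setq states (toy-nfa-step states char)))
    (and (memq 3 states) t)))

(toy-nfa-match-p "babb") ; accepted: the input ends in "abb"
(toy-nfa-match-p "abab") ; rejected
</pre> <p>The work done per input character is bounded by the size of the transition table, whatever the input looks like.</p>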
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 26 Sep 2010 21:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/26-regexp-performances-and-finite-automata.html</guid> </item> <item> <title>Postfix sender_dependent_relayhost_maps</title> <link>http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</link> <description><![CDATA[<h1>Postfix sender_dependent_relayhost_maps</h1>
<p>Thursday, September 23 2010, 14:30</p> <p>The previous article about <a href="http://tapoueh.org/articles/news/_Scratch_that_itch:_M-x_mailq.html">M-x mailq</a> prompted several mails asking me for details about the <a href="http://www.postfix.com/">Postfix</a> setup I'm talking about. The problem we're trying to solve is having a local MTA to send mail, so that any old-style Unix tool just works, instead of only the MUA you've spent time setting up.</p> <p>Postfix makes it possible to do that quite easily, but it gets a little more involved if you have more than one <em>relayhost</em> that you want to use depending on your current <em>From</em> address. Think personal email against work email, or avoiding your ISP network when sending your private mails, <em>hopping</em> directly onto a server you own or trust.</p> <p>So how do you do just that? Let's see the relevant parts of main.cf.</p> <pre class="src">
relayhost = your.default.relay.host.here
relay_domains = domain.org, work-domain.com, other-domain.info
smtp_sender_dependent_authentication = yes
sender_dependent_relayhost_maps = hash:/etc/postfix/relaymap
</pre> <p>The relaymap looks like this:</p> <pre class="src">
<span style="color: #888a85;"># </span><span style="color: #888a85;">comments
</span>user@domain.org mail.domain.org
local@work-domain.com smtp.work-domain.com
<span style="color: #888a85;"># </span><span style="color: #888a85;">that requires a local tunnel started with ssh, see ~/.ssh/config
</span>me@other-domain.info [127.0.0.1]:10025
</pre> <p>You need to run <a href="http://www.postfix.org/postmap.1.html">postmap</a> on this file before reloading or restarting your local instance of Postfix.</p> <p>Also, you will want to encrypt the communication with your preferred relay host; using TLS goes like this:</p> <pre class="src">
smtp_sasl_auth_enable=yes
smtp_sasl_password_maps=hash:/etc/postfix/sasl-passwords
smtp_sasl_mechanism_filter = digest-md5
smtp_sasl_security_options = noanonymous
smtp_sasl_mechanism_filter = login, plain
smtp_sasl_type = cyrus

smtp_tls_session_cache_database = btree:${queue_directory}/smtp_scache
smtp_tls_loglevel = 2
smtp_use_tls = yes
smtp_tls_security_level = may
</pre>
<p>The password file needs to be processed by postmap too, should be given limited read access, and looks like this:</p> <pre class="src">
mail.domain.org user@domain.org:password
smtp.work-domain.com local@work-domain.com:h4ckm3
[<span style="color: #8ae234; font-weight: bold;">127.0.0.1</span>]:10025 me@other-domain.info:guess
</pre> <p>Hope this helps you get started; at least that's a document I would have enjoyed reading when I first started to set up my local relaying MTA.</p> <p>Oh, and now that you have this, I hope you will enjoy my M-x mailq tool for occasions when you're wondering why you're not receiving an answer back yet, then start the ssh tunnel…</p> <h2>Tags</h2> <p><a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 23 Sep 2010 14:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-postfix-sender_dependent_relayhost_maps.html</guid> </item>
<item> <title>Scratch that itch: M-x mailq</title> <link>http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</link> <description><![CDATA[<h1>Scratch that itch: M-x mailq</h1>
<p>Thursday, September 23 2010, 09:30</p> <p>Nowadays, most people would think that email is something simple: you just set up your preferred client (that's called an MUA) with some information such as the smtp host you want it to talk to (that's called an MTA, and this one is your relayhost). Then there's all the receiving mails part, and that's smtp again on the server side. Then there's how to get those mails, read them, flag them, manage them, and that's better served by IMAP. Let's talk about sending mails in smtp for this entry.</p> <p>The traditional way to handle mail sending is to have your own MTA on each system you use. There used to be a <em>sysadmin</em> team caring about all those systems, but we're lost in the personal computer era now, which only means <strong><em>you</em></strong> are the sysadmin. So about any Unix tool that wants to send a mail will do so with the command /usr/bin/sendmail to queue the outgoing message.</p> <p>My typical <em>workstation</em> setup includes a full-blown MTA (my choice is <a href="http://www.postfix.com/">Postfix</a>) that will choose the next relay host depending on the message <em>From</em> field: I don't want to trust any local default relayhost. Note that the next relay is connected to with authentication and over an encrypted protocol.</p> <blockquote> <p class="quoted"> We're getting there, really. But I don't know a better way to present a piece of software, little as it may be, than talking about the need that led to its development.</p> </blockquote> <p>Some relaying I do atop an ssh tunnel, and it happens that I send mail having forgotten to set up the aforementioned tunnel. In this case, the advantage is that it will not block my MUA (<a href="http://gnus.org/">gnus</a>, in quite good shape these days, receiving lots of love), as the queueing happens as usual.
The drawback is that <a href="http://www.postfix.com/">Postfix</a> will <em>silently</em> queue the mail until it's able to deliver it, which can take days.</p> <p>Enter M-x mailq! Ok, I could be doing M-! mailq and see <em>Mail queue is empty</em> in the message area, but then as soon as the queue's not empty I need to resort to some <em>shell</em> or <em>terminal</em> in order to <em>flush</em> the queue (that's after setting up the tunnel, as easy as C-= remote in my case, see <a href="http://github.com/dimitri/cssh">cssh</a>). Scratching that itch, I now only have to hit f here to flush the queue. And from the <em>gnus</em> *Group* and *Summary* buffers, it's M-q to see the mail queue.</p> <p>Thanks to <a href="http://forum.ubuntu-fr.org/viewtopic.php?id=218883">http://forum.ubuntu-fr.org/viewtopic.php?id=218883</a> here's a visual sample of the mailq mode, where you see the mail queue in colors and the <em>keymap</em> you're offered.</p> <center> <p><img src="../../../images//mailq-el.png" alt=""></p> </center> <p>So you could even <em>flush</em> only a given queue id or a given site, or just <em>kill</em> the current id or the current site so that it's a C-y away. I hope it's useful for you too. Oh, and it's already in the <a href="http://github.com/dimitri/el-get">el-get</a> recipes, of course!</p> <h2>Tags</h2> <p><a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/mailq.html">mailq</a> <a href="../../../tags/postfix.html">postfix</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 23 Sep 2010 09:30:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/23-scratch-that-itch-m-x-mailq.html</guid> </item> <item> <title>switch-window reaches 0.8</title> <link>http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</link> <description><![CDATA[<h1>switch-window reaches 0.8</h1>
Monday, September 13 2010, 17:45 </div><p>I wanted to play with the idea of using the whole keyboard for my <a href="http://github.com/dimitri/switch-window">switch-window</a> utility, but wondered how to get those keys in the right order and all. Finally found quail-keyboard-layout, which seems to exist for such uses, as you can see:</p>
<pre class="src">
(<span style="color: #729fcf; font-weight: bold;">loop</span> with layout = (split-string quail-keyboard-layout <span style="color: #ad7fa8; font-style: italic;">""</span>)
      for row from 1 to 4
      collect (<span style="color: #729fcf; font-weight: bold;">loop</span> for col from 1 to 12

 (<span style="color: #ad7fa8; font-style: italic;">"q"</span> <span style="color: #ad7fa8; font-style: italic;">"w"</span> <span style="color: #ad7fa8; font-style: italic;">"e"</span> <span style="color: #ad7fa8; font-style: italic;">"r"</span> <span style="color: #ad7fa8; font-style: italic;">"t"</span> <span style="color: #ad7fa8; font-style: italic;">"y"</span> <span style="color: #ad7fa8; font-style: italic;">"u"</span> <span style="color: #ad7fa8; font-style: italic;">"i"</span> <span style="color: #ad7fa8; font-style: italic;">"o"</span> <span style="color: #ad7fa8; font-style: italic;">"p"</span> <span style="color: #ad7fa8; font-style: italic;">"["</span> <span style="color: #ad7fa8; font-style: italic;">"]"</span>)
 (<span style="color: #ad7fa8; font-style: italic;">"a"</span> <span style="color: #ad7fa8; font-style: italic;">"s"</span> <span style="color: #ad7fa8; font-style: italic;">"d"</span> <span style="color: #ad7fa8; font-style: italic;">"f"</span> <span style="color: #ad7fa8; font-style: italic;">"g"</span> <span style="color: #ad7fa8; font-style: italic;">"h"</span> <span style="color: #ad7fa8; font-style: italic;">"j"</span> <span style="color: #ad7fa8; font-style: italic;">"k"</span> <span style="color: #ad7fa8; font-style: italic;">"l"</span> <span style="color: #ad7fa8; font-style: italic;">";"</span> <span style="color: #ad7fa8; font-style: italic;">"'"</span> <span style="color: #ad7fa8; font-style: italic;">"\\"</span>)
 (<span style="color: #ad7fa8; font-style: italic;">"z"</span> <span style="color: #ad7fa8; font-style: italic;">"x"</span> <span style="color: #ad7fa8; font-style: italic;">"c"</span> <span style="color: #ad7fa8; font-style: italic;">"v"</span> <span style="color: #ad7fa8; font-style: italic;">"b"</span> <span style="color: #ad7fa8; font-style: italic;">"n"</span> <span style="color: #ad7fa8; font-style: italic;">"m"</span> <span style="color: #ad7fa8; font-style: italic;">","</span> <span style="color: #ad7fa8; font-style: italic;">"."</span> <span style="color: #ad7fa8; font-style: italic;">"/"</span> <span style="color: #ad7fa8; font-style: italic;">" "</span> <span style="color: #ad7fa8; font-style: italic;">" "</span>))
</pre>
<p>So now switch-window will use that (but only the first 10 letters) instead of <em>hard-coding</em> numbers from 1 to 9 as labels and direct switches. That makes it more suitable for <a href="http://github.com/dimitri/cssh">cssh</a> users too, I guess.</p> <p>In other news, I think <a href="http://github.com/dimitri/el-get">el-get</a> is about ready for its 1.0 release. Please test it and report any problem very soon, before the release!</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/release.html">release</a> <a href="../../../tags/switch-window.html">switch-window</a> <a href="../../../tags/cssh.html">cssh</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 13 Sep 2010 17:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/13-switch-window-reaches-08.html</guid> </item> <item> <title>Window Functions example remix</title> <link>http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</link> <description><![CDATA[<h1>Window Functions example remix</h1>
Sunday, September 12 2010, 21:35 </div><p><span class="hack"> </span></p> <p>The drawback of hosting a static-only website is, obviously, the lack of comments. What actually happens, though, is that I receive very few comments by direct mail. As I don't get another <em>spam</em> source to clean up, I'm left unconvinced that it's such a drawback. I do miss the (admittedly low) chance of seeing blog readers talk to each other directly, but I think a tapoueh.org mailing list would be my answer here...</p> <p>Anyway, <a href="http://people.planetpostgresql.org/dfetter/">David Fetter</a> took the time to send me a comment by mail with a cleaned-up rewrite of the previous entry's SQL. Here it is, for your pleasure!</p>
<pre class="src">
WITH t AS (
    SELECT o, w,
           CASE WHEN LAG(w) OVER(w) IS DISTINCT FROM w
                 AND ROW_NUMBER() OVER (w) > 1 <span style="color: #888a85;">/* Eliminate first change */</span>
                THEN 1
            END AS change
      FROM ( VALUES (1, 5), (2, 10), (3, 7), (4, 7), (5, 7) ) AS data(o, w)
    WINDOW w AS (ORDER BY o) <span style="color: #888a85;">/* Factor out WINDOW */</span>
)
SELECT SUM(change) FROM t;
</pre>
<p>As you can see, <strong><em>David</em></strong> chose to filter out the first change in the subquery rather than hacking it away with a simple -1 at the outer level. I'm still wondering which way is cleaner (that depends on how you look at the problem), but I think I know which one is simpler! Thanks, <strong><em>David</em></strong>, for this blog entry!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Sun, 12 Sep 2010 21:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/12-window-functions-example-remix.html</guid> </item> <item> <title>Window Functions example</title> <link>http://tapoueh.org/blog/2010/09/09-window-functions-example.html</link> <description><![CDATA[<h1>Window Functions example</h1>
Thursday, September 09 2010, 16:35 </div><p>So, when 8.4 came out there were all those comments about how getting <a href="http://www.postgresql.org/docs/8.4/interactive/tutorial-window.html">window functions</a> was an awesome addition. Now, it seems that a lot of people seeking help in <a href="http://wiki.postgresql.org/index.php?title=IRC">#postgresql</a> just don't know what kind of problem this feature helps solve. I've already used them in some cases here in this blog, for getting a nice overview of <a href="http://tapoueh.org/articles/blog/_Partitioning:_relation_size_per_%E2%80%9Cgroup%E2%80%9D.html">Partitioning: relation size per “group”</a>.</p> <p>Now, another example use case arose on IRC today. I'll quote our user directly here:</p> <blockquote> <p class="quoted"> hey there, how can i count the number of (value) changes in one column?</p> <p class="quoted"> example: a table with a column <em>weight</em>. let's say we have 5 rows, having the following values for weight: 5, 10, 7, 7, 7. the number of changes of weight would be 2 here (from 5 to 10 and 10 to 7). any idea how I could do that in SQL using PGSQL 8.4.4? GROUP BY or count(distinct weight) obviously does not work. thx in advance</p> </blockquote> <p>Now, several of us began talking about <em>window functions</em> and about the fact that you need some other column to identify the ordering of those weights, obviously, because that's the only way to define what a change is in this context. Let's have a first try at it.</p>
<pre class="src">
=# select o, w,
          case when lag(w) over(order by o) is distinct from w then 1 end as change
     from (values (1, 5), (2, 10), (3, 7), (4, 7), (5, 7)) as data(o, w);
 o | w  | change
<span style="color: #888a85;">---+----+--------
</span> 1 |  5 |      1
 2 | 10 |      1
 3 |  7 |      1
 4 |  7 |
 5 |  7 |
(5 rows)
</pre>
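<p>For readers less used to window functions, here is a minimal Python sketch of what the <em>lag()</em> comparison above computes. It is not part of the original discussion: the function name <em>change_flags</em> is made up for illustration, and it assumes the weights are already in the right order and are never missing.</p>

```python
def change_flags(values):
    """Return 1 where a value differs from the previous one, else None.

    Hypothetical helper mirroring the CASE WHEN lag(w) ... expression:
    the first row has no previous value, so it is flagged as a change,
    just as lag() yields NULL on the first row of the window.
    """
    flags = []
    previous = None  # stands in for the NULL returned by lag() on row 1
    for value in values:
        flags.append(1 if value != previous else None)
        previous = value
    return flags


weights = [5, 10, 7, 7, 7]
flags = change_flags(weights)
print(flags)                        # [1, 1, 1, None, None]
print(sum(f for f in flags if f))   # 3
```

<p>Summing the flags gives 3 where the user expected 2: the extra one comes from the first row, which has nothing to be compared against.</p>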
<p>Not too bad, but of course we are seeing a false change on the first line, since for any <em>window</em> of rows you define, the row before the first one, as given by lag() over(), will be NULL. The easiest way to accommodate that is the following:</p>
<pre class="src">
=# select sum(change) -1 as changes
     from (select case when lag(w) over(order by o) is distinct from w then 1 end as change
             from (values (1, 5), (2, 10), (3, 7), (4, 7), (5, 7)) as t(o, w)) as x;
 changes
<span style="color: #888a85;">---------
</span>       2
(1 row)
</pre>
<p>So don't be shy and go read about <a href="http://www.postgresql.org/docs/8.4/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS">window functions in SQL expressions</a> and <a href="http://www.postgresql.org/docs/8.4/interactive/queries-table-expressions.html#QUERIES-WINDOW">window function processing</a> in the query table expressions. That's a very nice tool to have, and my guess is that you will soon enough realize that the only reason you could think you don't need them is that you didn't know they existed, or what you can do with them. <em>Sharpen your saw!</em> :)</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 09 Sep 2010 16:35:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/09-window-functions-example.html</guid> </item> <item> <title>Synchronous Replication</title> <link>http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</link> <description><![CDATA[<h1>Synchronous Replication</h1>
Monday, September 06 2010, 18:05 </div><p>Although the new asynchronous replication facility that ships with 9.0 ain't released to the wide public yet, our hero hackers are already working on the synchronous version of it. One part of the facility is rather easy to design: we want something comparable to <a href="http://www.drbd.org/">DRBD</a> in flexibility, but specific to our database world. So <em>synchronous</em> would mean either <em>recv</em>, <em>fsync</em> or <em>apply</em>, depending on what you need the <em>standby</em> to have already done when the master acknowledges the COMMIT. Let's call that the <em>service level</em>.</p>
<p>The part of the design that's not so easy is more interesting. Do we need to register standbys and have the <em>service level</em> set up per standby? Can we get some more flexibility and have the <em>service level</em> set on a per-transaction basis? The idea here would be that the application knows which transactions are meant to be extra-safe and which are not, the same way that you can set synchronous_commit to off when dealing with web sessions, for example.</p>
<p><em>Why choose?</em> I hear you ask. Well, it's all about having more data safety, and a typical setup would contain an asynchronous reporting server and a local synchronous <em>failover</em> server. Then add a remote one, too. So even if we pick the transaction-based facility, we still want to be able to choose at setup time which server to fail over to. That means we don't want that much flexibility now: we want to know where the data is safe, we don't want to have to guess.</p>
<p>One way to solve that is to be able to set up a slave as being the failover one, or say, the sync one. Now, the detail that ruins it all is that we need a <em>timeout</em> to handle the worst cases, when a given slave loses its connectivity (or power, say). Now, the slave ain't in <em>sync</em> any more, and some people will require that the service is still available (<em>timeout</em> but COMMIT) while some will require that the service is down: don't accept a new transaction if you can't make its data safe on the slave too.</p>
<p>The answer would be to have the master arbitrate between what the transaction wants, what the slave is set up to provide, and what it's able to provide at the time of the transaction. Given a transaction with a <em>service level</em> of <em>apply</em> and a slave set up as <em>async</em>, the COMMIT does not have to wait, because there's no known slave able to offer the needed level. Or the COMMIT cannot happen, for the very same reason.</p>
<p>Then I think it all flows quite naturally from there: while arbitrating, the master could record which slave is currently offering what <em>service level</em>, and offer that information in a system view too, of course.</p>
<p>The big question this proposal leaves unanswered is how to set up whether being unable to reach the wanted <em>service level</em> is an error or a warning.</p>
<p>That, too, would be for the master to arbitrate, based on a per-standby and a per-transaction setting, and in the general case it could be a <em>quorum</em> setup: each slave is given a <em>weight</em> and each transaction a <em>quorum</em> to reach. The master sums up the weights of the standbys that ack the transaction at the needed <em>service level</em>, and the COMMIT happens as soon as the quorum is reached, or is canceled as soon as the <em>timeout</em> is reached, whichever comes first.</p>
<p>Such a model allows for very flexible setups, where each standby has a <em>weight</em> and offers a given <em>service level</em>, and each transaction waits until a <em>quorum</em> is reached. Giving the right weights to your standbys (like, powers of two) allows you to set the quorum in such a way that only one given standby is able to acknowledge the most important transactions. But that's flexible enough that you can change it at any time: it's just a <em>weight</em> that allows a <em>sum</em> to be made, so my guess would be that it ends up in the <em>feedback loop</em> between the standby and its master.</p>
<p>The most appealing part of this proposal is that it doesn't look complex to implement, and should allow for highly flexible setups. Of course, the devil is in the details, and we're talking about latencies in a distributed system here. That's also being discussed on the <a href="http://archives.postgresql.org/pgsql-hackers/">mailing list</a>.</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/release.html">release</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 06 Sep 2010 18:05:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/09/06-synchronous-replication.html</guid> </item> <item> <title>Want to share your recipes?</title> <link>http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</link> <description><![CDATA[<h1>Want to share your recipes?</h1>
Tuesday, August 31 2010, 14:15 </div><p>Yes, that's another <a href="http://github.com/dimitri/el-get/">el-get</a> related entry. It seems to take a lot of my attention these days. After having set up the git repository so that you can update el-get from within itself (so that it's <em>self-contained</em>), the next logical step is providing <em>recipes</em>.</p> <p>By that I mean that el-get-sources entries will certainly look a lot alike from one user to another. Let's take the el-get entry itself:</p>
<pre class="src">
(<span style="color: #729fcf;">:name</span> el-get
 <span style="color: #729fcf;">:type</span> git
 <span style="color: #729fcf;">:url</span> <span style="color: #ad7fa8; font-style: italic;">"git://github.com/dimitri/el-get.git"</span>
 <span style="color: #729fcf;">:features</span> <span style="color: #ad7fa8; font-style: italic;">"el-get"</span>)
</pre>
<p>I guess all el-get users will have just the same 4 lines in their el-get-sources. So let's call that a <em>recipe</em>, and have el-get look for yours in the el-get-recipe-path directories. A recipe is found by looking in those directories in order, and must be named package.el. Now, el-get already contains a handful of them, as you can see:</p>
<pre class="src">
ELISP> (directory-files <span style="color: #ad7fa8; font-style: italic;">"~/dev/emacs/el-get/recipes/"</span> nil <span style="color: #ad7fa8; font-style: italic;">"[</span><span style="color: #ad7fa8; font-style: italic;">^</span><span style="color: #ad7fa8; font-style: italic;">.]$"</span>)
(<span style="color: #ad7fa8; font-style: italic;">"auctex.el"</span> <span style="color: #ad7fa8; font-style: italic;">"bbdb.el"</span> <span style="color: #ad7fa8; font-style: italic;">"cssh.el"</span> <span style="color: #ad7fa8; font-style: italic;">"el-get.el"</span> <span style="color: #ad7fa8; font-style: italic;">"emms.el"</span> <span style="color: #ad7fa8; font-style: italic;">"erc-track-score.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"escreen.el"</span> <span style="color: #ad7fa8; font-style: italic;">"google-maps.el"</span> <span style="color: #ad7fa8; font-style: italic;">"haskell-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"hl-sexp.el"</span> <span style="color: #ad7fa8; font-style: italic;">"magit.el"</span> <span style="color: #ad7fa8; font-style: italic;">"muse-blog.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"nxhtml.el"</span> <span style="color: #ad7fa8; font-style: italic;">"psvn.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rainbow-mode.el"</span> <span style="color: #ad7fa8; font-style: italic;">"rcirc-groups.el"</span> <span style="color: #ad7fa8; font-style: italic;">"vkill.el"</span> <span style="color: #ad7fa8; font-style: italic;">"xcscope.el"</span>
 <span style="color: #ad7fa8; font-style: italic;">"xml-rpc-el.el"</span> <span style="color: #ad7fa8; font-style: italic;">"yasnippet.el"</span>)
</pre>
<p>Please note that you can have your own local recipes by adding directories to el-get-recipe-path. So now your minimalistic el-get-sources list will look like '(el-get cssh screen), say. And if you want to override a recipe, for instance to use the default one but still have a personal :after function containing your own setup, then simply make your el-get-sources entry a partial one: leave out :type, and el-get will merge your local overrides atop the default recipe.</p> <p>Finally, the way to share your recipes is to send me an email with the file, or to do the same over the github interface; I guess I'll still receive a mail then.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/muse.html">Muse</a> <a href="../../../tags/el-get.html">el-get</a> <a href="../../../tags/cssh.html">cssh</a> <a href="../../../tags/rcirc.html">rcirc</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Tue, 31 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/31-want-to-share-your-recipes.html</guid> </item> <item> <title>Happy Numbers</title> <link>http://tapoueh.org/blog/2010/08/30-happy-numbers.html</link> <description><![CDATA[<h1>Happy Numbers</h1>
Monday, August 30 2010, 11:00 </div><p>After discovering the excellent <a href="http://gwene.org/">Gwene</a> service, which allows you to subscribe to <em>newsgroups</em> to read RSS content (<em>blogs</em>, <em>planets</em>, <em>commits</em>, etc), I came to read this nice article about <a href="http://programmingpraxis.com/2010/07/23/happy-numbers/">Happy Numbers</a>. That's a little problem that fits an interview-style question well, so I first solved it yesterday evening in <a href="http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/List-Processing.html#List-Processing">Emacs Lisp</a>, as that's the language I use the most these days.</p> <blockquote> <p class="quoted"> A happy number is defined by the following process. Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy numbers, while those that do not end in 1 are unhappy numbers (or sad numbers).</p> </blockquote> <p>Now, what about implementing the same in pure SQL, for more fun? Now that's interesting! After all, we didn't get WITH RECURSIVE for tree traversal only, <a href="http://archives.postgresql.org/message-id/e08cc0400911042333o5361b21cu2c9438f82b1e55ce@mail.gmail.com">did we</a>?</p> <p>Unfortunately, we need a little helper function first, if only to ease the reading of the recursive query. I didn't try to inline it, but here it goes:</p>
<pre class="src">
create or replace function digits(x bigint)
  returns setof int
  language sql
as $$
  select substring($1::text from i for 1)::int
    from generate_series(1, length($1::text)) as t(i)
$$;
</pre>
<p>That was easy: it will output one row per digit of the input number, and rather than resorting to powers of ten and divisions and remainders, we use the plain old text representation and substring. Now, to the real problem. If you've read what a happy number is and have already read the fine manual about <a href="http://www.postgresql.org/docs/8.4/interactive/queries-with.html">Recursive Query Evaluation</a>, it should be quite easy to read the following:</p>
<pre class="src">
with recursive happy(n, seen) as (
    select 7::bigint, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @> array[sum(d*d)]
)
select * from happy;
  n  |       seen
<span style="color: #888a85;">-----+------------------
</span>   7 | {}
  49 | {49}
  97 | {49,97}
 130 | {49,97,130}
  10 | {49,97,130,10}
   1 | {49,97,130,10,1}
(6 rows)

Time: 1.238 ms
</pre>
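<p>To make the recursion easier to follow, here is a rough Python sketch of the same iteration. It is only an illustration, not the code used in this article: <em>digits</em> mirrors the SQL helper, and <em>happy_trace</em> is a made-up name that returns the same <em>(n, seen)</em> rows as the query above.</p>

```python
def digits(n):
    """Return the decimal digits of n, using its text representation."""
    return [int(c) for c in str(n)]


def happy_trace(n):
    """Return the (n, seen) rows the recursive query would output.

    Hypothetical helper: each step replaces n by the sum of the squares
    of its digits and stops as soon as that sum has been seen before,
    which is the role of the HAVING NOT seen @> array[...] filter.
    """
    rows = [(n, [])]
    seen = []
    while True:
        n = sum(d * d for d in digits(n))
        if n in seen:
            break
        seen = seen + [n]  # new list each step, like seen || sum(d*d)
        rows.append((n, seen))
    return rows


for row in happy_trace(7):
    print(row)
# (7, [])
# (49, [49])
# (97, [49, 97])
# (130, [49, 97, 130])
# (10, [49, 97, 130, 10])
# (1, [49, 97, 130, 10, 1])
```

<p>The loop ends one step after reaching 1, because 1 maps to itself and is by then already in the seen list.</p>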
<p>That shows how it works for some <em>happy</em> number, and it's easy to test for a non-happy one, like for example 17. The query won't cycle thanks to the seen array and the having filter, so the only difference between a <em>happy</em> and a <em>sad</em> number will be that in the former case the last line output by the recursive query will have n = 1. Let's expand this knowledge into a proper function (because we want to be able to have the number we test for happiness as an argument):</p> <pre class="src">
create or replace function happy(x bigint)
 returns boolean
 language sql
as $$
with recursive happy(n, seen) as (
    select $1, <span style="color: #ad7fa8; font-style: italic;">'{}'</span>::bigint[]
  union all
    select sum(d*d), h.seen || sum(d*d)
      from (select n, digits(n) as d, seen
              from happy
           ) as h
  group by h.n, h.seen
    having not seen @> array[sum(d*d)]
)
  select n = 1 as happy
    from happy
order by array_length(seen, 1) desc nulls last
   limit 1
$$;
</pre>
<p>We need the desc nulls last trick in the order by because the array_length() of any dimension of an empty array is NULL, and we certainly don't want to return all and any number as unhappy on the grounds that the query result contains the initial line (input, {}). Let's now play the same tricks as in the puzzle article:</p> <pre class="src">
# select array_agg(x) as happy from generate_series(1, 50) as t(x) where happy(x);
              happy
<span style="color: #888a85;">----------------------------------
</span> {1,7,10,13,19,23,28,31,32,44,49}
(1 row)

Time: 24.527 ms

# explain analyze select x from generate_series(1, 10000) as t(x) where happy(x);
                               QUERY PLAN
<span style="color: #888a85;">------------------------------------------------------------------------
</span> Function Scan on generate_series t  (cost=0.00..265.00 rows=333 width=4)
                     (actual time=2.938..3651.019 rows=1442 loops=1)
   Filter: happy((x)::bigint)
 Total runtime: 3651.534 ms
(3 rows)

Time: 3652.178 ms </pre>
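<p>The sad case mentioned earlier can be checked the same way. A minimal sketch, assuming the happy() function above is installed (the column aliases are made up for the example):</p> <pre class="src">
-- 7 reaches 1, while 17 ends up in the 4, 16, 37, 58, 89, 145, 42, 20 cycle
select happy(7) as seven, happy(17) as seventeen;
</pre>
<p>Given the list of happy numbers below 50 shown above, this is expected to return true for 7 and false for 17.</p>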
<p>(Yes, I tricked the EXPLAIN ANALYZE output so that it fits on the page width here). For what it's worth, finding the happy numbers among the first 10000 integers in <em>Emacs Lisp</em> on the same laptop takes 2830 ms, also running a recursive version of the code.</p> <h3>Update, the Emacs Lisp version, inline:</h3> <pre class="src">
(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">happy?</span> (<span style="color: #8ae234; font-weight: bold;">&optional</span> n seen)
  <span style="color: #888a85;">"return true when n is a happy number"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let*</span> ((number    (or n (read-from-minibuffer <span style="color: #ad7fa8; font-style: italic;">"Is this number happy: "</span>)))
         ;; split the string into one-character strings, dropping empty ones
         (digits    (mapcar 'string-to-number (split-string number <span style="color: #ad7fa8; font-style: italic;">""</span> t)))
         (squares   (mapcar (<span style="color: #729fcf; font-weight: bold;">lambda</span> (x) (* x x)) digits))
         (happiness (apply '+ squares)))
    (<span style="color: #729fcf; font-weight: bold;">cond</span> ((eq 1 happiness)      t)
          ((memq happiness seen) nil)
          (t
           (happy? (number-to-string happiness)
                   (push happiness seen))))))

(<span style="color: #729fcf; font-weight: bold;">defun</span> <span style="color: #edd400; font-weight: bold; font-style: italic;">find-happy-numbers</span> (<span style="color: #8ae234; font-weight: bold;">&optional</span> limit)
  <span style="color: #888a85;">"find all happy numbers from 1 to limit"</span>
  (interactive)
  (<span style="color: #729fcf; font-weight: bold;">let</span> ((count (or limit (read-from-minibuffer <span style="color: #ad7fa8; font-style: italic;">"List of happy numbers from 1 to: "</span>)))
        happy)
    (<span style="color: #729fcf; font-weight: bold;">dotimes</span> (n (string-to-number count))
      (<span style="color: #729fcf; font-weight: bold;">when</span> (happy? (number-to-string (1+ n)))
        (push (1+ n) happy)))
    (nreverse happy)))
</pre>
<h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a> <a href="../../../tags/emacs.html">Emacs</a></p>
]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Mon, 30 Aug 2010 11:00:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/30-happy-numbers.html</guid> </item> <item> <title>welcome el-get scratch installer</title> <link>http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</link> <description><![CDATA[<h1>welcome el-get scratch installer</h1>
Friday, August 27 2010, 14:15 </div><p><span class="hack"> </span></p> <p>A very good remark from some users: installing and managing el-get should be simpler. They wanted both an easy install of the thing, and a way to manage it afterwards (like updating the local copy against the authoritative source). So I decided it was high time to get the code out of my ~/.emacs.d git repository and up to a public place: <a href="http://github.com/dimitri/el-get">http://github.com/dimitri/el-get</a>.</p> <p>Then I added some documentation (a README), and then a *scratch* installer, following great ideas from ELPA. So have at it, it's a copy paste away!</p> <p>Don't forget to set up your el-get-sources and include the el-get source there for updates; there's nothing magic about it, so it's up to you. You may notice that it's not yet possible to init el-get from el-get-sources, though: that's the drawback of the lack of magic. So you will still have to add an explicit (require 'el-get) before you go and define your own el-get-sources, then finally call (el-get). I don't think that's a problem I need to solve, though.</p> <h2>Tags</h2> <p><a href="../../../tags/emacs.html">Emacs</a> <a href="../../../tags/el-get.html">el-get</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Fri, 27 Aug 2010 14:15:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/27-welcome-el-get-scratch-installer.html</guid> </item> <item> <title>Playing with bit strings</title> <link>http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</link> <description><![CDATA[<h1>Playing with bit strings</h1>
Thursday, August 26 2010, 17:45 </div><p>The idea of the day isn't directly from me, I'm just helping with a very thin subpart of the problem. I can't say much about the problem itself; let's just assume you want to reduce the storage of MD5 in your database, so you want to abuse <a href="http://www.postgresql.org/docs/8.4/interactive/datatype-bit.html">bit strings</a>. A solution to use them works fine, but the datatype is still missing some facilities, for example going from and to hexadecimal representation in text.</p> <pre class="src">
create or replace function hex_to_varbit(h text)
 returns varbit
 language sql
as $$
  select (<span style="color: #ad7fa8; font-style: italic;">'X'</span> || $1)::varbit;
$$;

create or replace function varbit_to_hex(b varbit)
 returns text
 language sql
as $$
  select array_to_string(array_agg(to_hex((b << (32*o))::bit(32)::bigint)), <span style="color: #ad7fa8; font-style: italic;">''</span>)
    from (select b, generate_series(0, n-1) as o
            from (select $1, octet_length($1)/4) as t(b, n)) as x
$$;
</pre>
<p>To understand the magic in the second function, let's walk through the tests one could do when wanting to grasp how things work in the bitstring world (with some reading of the fine documentation, too).</p> <pre class="src">
# select ('101011001011100110010110'::varbit << 0)::bit(8);
   bit
----------
 10101100
(1 row)

# select ('101011001011100110010110'::varbit << 8)::bit(8);
   bit
----------
 10111001
(1 row)

# select ('101011001011100110010110'::varbit << 16)::bit(8);
   bit
----------
 10010110
(1 row)

# select * from TEMP VERSION OF THE FUNCTION FOR TESTING
 o |                b                 |    x
---+----------------------------------+----------
 0 | 10101100101111010001100011011011 | acbd18db
 1 | 01001100110000101111100001011100 | 4cc2f85c
 2 | 11101101111011110110010101001111 | edef654f
 3 | 11001100110001001010010011011000 | ccc4a4d8
(4 rows) </pre>
<p>What do we get from that, you ask? Let's see a little example:</p> <pre class="src">
# select hex_to_varbit(md5('foo'));
                                                          hex_to_varbit
----------------------------------------------------------------------------------------------------------------------------------
 10101100101111010001100011011011010011001100001011111000010111001110110111101111011001010100111111001100110001001010010011011000
(1 row)

# select md5('foo'), varbit_to_hex(hex_to_varbit(md5('foo')));
               md5                |          varbit_to_hex
----------------------------------+----------------------------------
 acbd18db4cc2f85cedef654fccc4a4d8 | acbd18db4cc2f85cedef654fccc4a4d8
(1 row) </pre>
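<p>One caveat worth checking before relying on the round trip: to_hex() does not zero-pad its result, so a 32-bit chunk whose leading hexadecimal digits are zero would come back shorter than 8 characters. The following sketch is not part of the functions above; it only illustrates the behaviour on a made-up value, and one possible workaround using lpad():</p> <pre class="src">
-- to_hex drops leading zeros: this returns 'abc1234', 7 characters only
select to_hex(x'0abc1234'::bit(32)::bigint);

-- padding the chunk back to 8 hexadecimal digits gives '0abc1234'
select lpad(to_hex(x'0abc1234'::bit(32)::bigint), 8, '0');
</pre>
<p>The md5('foo') example happens not to hit this case, since none of its four chunks starts with a zero digit.</p>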
<p>Storing varbits rather than the text form of the MD5 allows us to go from 6510 MB down to 4976 MB on a sample table containing 100 million rows. We're targeting more than that, so that's a great win down here!</p> <p>In case you wonder, when querying the main index on varbit rather than the one on text for a single result row, the cost of doing the conversion with varbit_to_hex seems to be around 28 µs. We can afford it.</p> <p>Hope this helps!</p> <h2>Tags</h2> <p><a href="../../../tags/postgresql.html">PostgreSQL</a></p>]]></description>
<author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Aug 2010 17:45:00 +0200</pubDate> <guid isPermaLink="true">http://tapoueh.org/blog/2010/08/26-playing-with-bit-strings.html</guid> </item> <item> <title>el-get news</title> <link>http://tapoueh.org/blog/2010/08/blog/2010/08/26-el-get-news.html</link> <description><![CDATA[<p>I've been receiving some requests for <a href="http://www.emacswiki.org/emacs/el-get.el">el-get</a>, some of them even included a patch. So now there's support for bzr, CVS and http-tar, augmenting the existing support for git, git-svn, apt-get, fink and ELPA formats.</p> <p>Also, the install and even the build are now completely <em>asynchronous</em>; there's a pending bugfix for the building, which is now using <a href="http://www.gnu.org/software/emacs/elisp/html_node/Asynchronous-Processes.html">start-process-shell-command</a>. The advantage of doing so is that you're free to use Emacs as usual while el-get is having your piece of elisp code compiled, which can take time.</p> <p>The drawback is that it's not easy to do the associated setup at the right time without support from el-get, so you have the new option :after, which takes a functionp object: please consider using that to give your own special setup for the external emacs bits and pieces you're using.</p> <p>Let's see some examples of the new features:</p> <pre class="src">
(<span style="color: #da70d6;">:name</span> xml-rpc-el
       <span style="color: #da70d6;">:type</span> bzr
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"lp:xml-rpc-el"</span>)

(<span style="color: #da70d6;">:name</span> haskell-mode
       <span style="color: #da70d6;">:type</span> http-tar
       <span style="color: #da70d6;">:options</span> (<span style="color: #bc8f8f;">"xzf"</span>)
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">"http://projects.haskell.org/haskellmode-emacs/haskell-mode-2.8.0.tar.gz"</span>
       <span style="color: #da70d6;">:load</span> <span style="color: #bc8f8f;">"haskell-site-file.el"</span>
       <span style="color: #da70d6;">:after</span> (<span style="color: #7f007f;">lambda</span> ()
                (add-hook 'haskell-mode-hook 'turn-on-haskell-doc-mode)
                (add-hook 'haskell-mode-hook 'turn-on-haskell-indentation)))

(<span style="color: #da70d6;">:name</span> auctex
       <span style="color: #da70d6;">:type</span> cvs
       <span style="color: #da70d6;">:module</span> <span style="color: #bc8f8f;">"auctex"</span>
       <span style="color: #da70d6;">:url</span> <span style="color: #bc8f8f;">":pserver:anonymous@cvs.sv.gnu.org:/sources/auctex"</span>
       <span style="color: #da70d6;">:build</span> (<span style="color: #bc8f8f;">"./autogen.sh"</span> <span style="color: #bc8f8f;">"./configure"</span> <span style="color: #bc8f8f;">"make"</span>)
       <span style="color: #da70d6;">:load</span> (<span style="color: #bc8f8f;">"auctex.el"</span> <span style="color: #bc8f8f;">"preview/preview-latex.el"</span>)
       <span style="color: #da70d6;">:info</span> <span style="color: #bc8f8f;">"doc"</span>)
</pre>
<p>As you can see, there are also the new options :module (only used by CVS so far) and :options (only used by http-tar so far). With this latter method, the :options key allows you to have support for virtually any kind of tar compression (.tar.bz2, etc).</p> <p>The CVS support currently does not include authentication against the anonymous pserver, because the only repository I've been asked to support isn't using that, and the couple of servers that I know of either want no password at the prompt, or a dummy one. That's for another day, if needed at all.</p> <p>That pushes the little local hack to more than a thousand lines of elisp code, and the next steps include proposing it to <a href="http://tromey.com/elpa/">ELPA</a> so that getting to use it is easier than ever. You'd just have to choose whether to install ELPA from el-get or the other way around.</p> ]]></description><author>dim@tapoueh.org (Dimitri Fontaine)</author> <pubDate>Thu, 26 Aug 2010 16:30:00 +0200</pubDate> <guid isPermaLink="true">

