The problem was raised this week on IRC and this time again I felt it would be a good occasion for a blog entry: how to load an XML file content into a single field?

The usual tool used to import files is COPY, but it’ll want each line of the file to host a text representation of a database tuple, so it doesn’t apply to the case at hand. RhodiumToad was online and offered the following code to solve the problem:

create or replace function xml_import(filename text)
  returns xml
  language plpgsql as
        content bytea;
        loid oid;
        lfd integer;
        lsize integer;
        loid := lo_import(filename);
        lfd := lo_open(loid,262144);
        lsize := lo_lseek(lfd,0,2);
        perform lo_lseek(lfd,0,0);
        content := loread(lfd,lsize);
        perform lo_close(lfd);
        perform lo_unlink(loid);
        return xmlparse(document convert_from(content,'UTF8'));

As you can see, the trick here is to use the large objects API to load the file content into memory ( content variable), then to parse it knowing it’s an UTF8 encoded XML file and return an XML datatype object.