Posts Tagged ‘PL/SQL’

Parsing a CSV file in PL/SQL

The ability to parse a CSV file in PL/SQL seems like a simple requirement and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web.   Well if you have tried option A, you probably realized it gets real tricky when you have commas and double quotes in your actual data as well as having them as your deliminators and optionally enclosed by characters as well.  Plus all that substr‘ing and instr‘ing can really hurt your head after a while.  If you tried option B, then you probably discovered that there are some solutions out there, but they all seems to either incomplete, or just overly complex.

So I decided to write my own simple, yet complete CSV parser in PL/SQL.  It handles all data, both optionally enclosed by some character or not, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats.  And all this in less than 100 lines of code (with comments) and with only three distinct calls to substr() and NO calls to instr().

I wanted to share this in hopes that others find it useful.

create or replace procedure parse_csv(
  p_clob clob,
  p_delim varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED constant char(1) := chr(10);
  l_char char(1);
  l_lookahead char(1);
  l_pos number := 0;
  l_token varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete boolean := false;
  l_new_token boolean := true;
  l_enclosed boolean := false;
  l_lineno number := 1;
  l_columnno number := 1;


    -- increment position index
    l_pos := l_pos + 1;

    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos);

    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );

    -- if first character of new token is optionally enclosed character
    -- note that and skip it and get next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos);
    end if;
    l_new_token := false;

    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );

    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then

      if l_lookahead = p_optionally_enclosed then
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        l_pos := l_pos + 1;
        l_token_complete := true;
        l_enclosed := false;
      end if;

    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then
      l_token_complete := true;
      l_line_complete := true;

      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;

    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;

    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;

      l_token := l_token || l_char;
    end if;

    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' ||
                            nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;

    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;

And here is a little test procedure to show it working. I have made the end-of-line different for each like to demonstrate this will work with all EOL terminators. In real-life (I hope) your CSV file will have just one.

  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBD and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
  parse_csv( l_clob );

And when I run it I get…

R1C1: A
R1C2: B
R1C3: C
R1C4: D
R1C5: E
R1C6: F
R1C7: G
R1C8: H
R1C9: I
R2C1: 1
R2C2: 2,3
R2C3: 1"2
R2C4: "4,
R2C5: ,5
R3C1: 6
R3C2: this is a "test
R3C3: ",8
R3C4: 9",
R3C5: 10,"
R4C1: normal
R4C2: commas,,,in the field
R4C3: "enclosed"
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
R5C1: ,F"
R5C2: **null**
R5C3: **null**
R5C4: **null**
R5C5: abcde

I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.

Give it a try and let me know if you find a case that this code does not handle.


Categories: Oracle, PL/SQL, SQL Tags: , , , ,

Email BLOB Attachments From The Database

December 1, 2009 34 comments

Recently I was developing an oracle database application and had the requirement to send out an email with an attachment. Initially it sounded pretty easy. I’ll just use the APEX_MAIL package and its all handled for me. But then I remembered, this was not an APEX application I was developing. So then I thought, no problem, I’ll just use the UTL_MAIL package. There is an API to add attachments. But wait, you can only add VARCHAR and RAW attachments and there is a max length of 32K on them. Not very useful if you want to send a multi-megabyte document as an attachment ( ie. a blob). So I had to drop down to an even lower API, UTL_SMTP and handle it all myself. Oh, and the attachment needed to be zipped as well (to save space in the tubes of our interwebs). I found the exercise interesting and thought I would share it with you.

For this example we have a table

create table assets(
  id number primary key,
  name varchar2(100),
  content blob
  mime_type varchar2(100) );

And the code

procedure mail_asset(
  p_asset_id number,
  p_from varchar2,
  p_to varchar2,
  p_subject varchar2,
  p_message varchar2 ) is
  l_asset assets%rowtype;
  l_blob blob := to_blob('1');
  l_conn utl_smtp.connection;
  l_raw raw(57);
  l_len integer := 0;
  l_idx integer := 1;
  l_buff_size integer := 57;
  l_boundary varchar2(32) := sys_guid();
  l_attachment_name long;

  -- Connect
  l_conn := utl_smtp.open_connection( '' );
  utl_smtp.helo( l_conn, '' );
  utl_smtp.mail( l_conn, p_from );
  utl_smtp.rcpt( l_conn, p_to );

  -- Header
  utl_smtp.write_data( l_conn, 'From: ' || p_from || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'To: ' || p_to || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Subject: ' || p_subject ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'MIME-Version: 1.0' || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Content-Type: multipart/mixed; ' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, ' boundary= "' || l_boundary || '"' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );

  -- Body
  utl_smtp.write_data( l_conn, '--' || l_boundary || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Content-Type: text/plain;' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, ' charset=US-ASCII' || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );
  utl_smtp.write_data( l_conn, l_message || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );

  -- Fetch the asset info
  for c in ( select *
               from assets
              where asset_id = p_asset_id ) loop

    -- Compress if a content and not already zipped.
    if ( c.mime_type != 'application/zip' then
      utl_compress.lz_compress( src => c.content,
                                dst => l_blob );
      l_attachment_name := || '.gz';
      l_blob := c.content;
      l_attachment_name :=;
    end if;

    -- Attachment
    utl_smtp.write_data( l_conn, '--' || l_boundary || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Type: application/octet-stream'
                                 || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Disposition: attachment; ' ||
                                 utl_tcp.crlf );
    utl_smtp.write_data( l_conn, ' filename="' || l_attachment_name || '"'
                                 || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Transfer-Encoding: base64' ||
                                 utl_tcp.crlf );
    utl_smtp.write_data( l_conn, utl_tcp.crlf );

    -- Loop through the blob
    -- chuck it up into 57-byte pieces
    -- and base64 encode it and write it into the mail buffer
    l_len := dbms_lob.getlength(l_blob);
    while l_idx < l_len loop l_blob, l_buff_size, l_idx, l_raw );
      utl_smtp.write_raw_data( l_conn, utl_encode.base64_encode(l_raw) );
      utl_smtp.write_data( l_conn, utl_tcp.crlf );
      l_idx := l_idx + l_buff_size;
    end loop;
    utl_smtp.write_data( l_conn, utl_tcp.crlf );

  end loop;

  -- Close Email
  utl_smtp.write_data( l_conn, '--' || l_boundary || '--' ||
                                         utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf || '.' || utl_tcp.crlf );
  utl_smtp.close_data( l_conn );
  utl_smtp.quit( l_conn );

  -- smtp errors, close connection and reraise
  when utl_smtp.transient_error or
       utl_smtp.permanent_error then
    utl_smtp.quit( l_conn );

end mail_asset;

Just a few things about the code.

Line 15 – I use sys_guid() to generate a random 32 character string which I use as the boundary marker in the email message. You can use any character string you like.

Lines 20 and 21 – You will need to change to your smtp server.

Line 34 – You will notice that there is a space in front of boundary. That is on purpose. The reason for that is that Content-Type (Line 32) and the boundary could be on the same line like this:

Content-Type: multipart/mixed; boundary=”xxxxxxxxxxxx”

But for formatting sake, I broke it up into two lines, so it needs to be sent to the smtp server like

Content-Type: multipart/mixed;

^ (single leading space)

with a leading space on boundary. I also did it on lines 42 and 68.

Line 79 – No you can not do larger reads. Each line of the encoded attachment needs to be 57 bytes.

In my example, I will only get at most 1 attachment because I pass in the asset_id and that is defined as the primary key. But if the query at line 48 was changed to possibly return more than a single row, the code would still work and we would have an email with multiple attachments.

Enhanced APEX Shuttle Item

April 28, 2009 34 comments

Recently I have been really, really busy building some internal applications for my company. All of them using Oracle and APEX of course. To satisfy some of application’s requirements, and just for usability sake, I had to tweek the default behavior of the APEX shuttle item. The reason was the initial LOV that populated the shuttle contained over 500 values. About 490+ too many as far as I was concerned. The users felt the same way. They complained that it was too hard to find what they were looking for, even though the list was in alphabetical order and I agreed. So, I decided to add the functionality to allow the user to constrain the values in left-hand side of the shuttle. Similar to the Popup LOVs with a filter option, but I wanted to use AJAX to accomplish it to avoid a page submit. After a bunch of playing around and optimization, here is what I came up with:
Username/Password: demo/demo

I was surprised that it was not that hard to accomplish and there are very few moving parts. All you need is a javascript function to make the AJAX call, an On Demand process to fetch only those values that meet the filter’s constraint and a way to trigger it.

How It Works.

As the user types in the textbox the shuttle automagically filters itself. I use an onKeyUp event on the textbox P10_SEARCH_BY to trigger the AJAX call.

onKeyUp="populateShuttle( this,'P10_SHUTTLE');"

So everytime a KeyUp event happens in that textbox, a call to the populateShuttle function happens. That function is responsible for making the AJAX call to the On-Demand Process, which fetches the appropriate values, and then parsing the results and repopulating the left hand side of the shuttle.

function populateShuttle(filter,shuttleName)
  var right = $x(shuttleName+"_RIGHT");

  for ( var i=0;i<right .length; i++ )
    right.options&#91;i&#93;.selected = true;

  var req = new htmldb_Get( null, 19903, 
                          'APPLICATION_PROCESS=populateShuttle', 0 );

  req.addParam( 'x01', $v(right) );
  req.addParam( 'x02', $v(filter) );

  var resp = eval(req.get());

  var left = $x(shuttleName+"_LEFT");

  if (resp)
    for ( var i=0; i<resp.length; i++ )
      left.options&#91;i&#93; = new Option(resp&#91;i&#93;.data, resp&#91;i&#93;.id);	

  req = null;

Although APEX treats the shuttle as a single element, it is in fact made up of many HTML elements, 10 images and 2 multiselect list boxes.  With a little poking around the HTML of a generated APEX page with a shuttle item on it, I was able to determine that the actual DOM names of the multiselect list boxes of a shuttle were <strong>[SHUTTLE_NAME]_LEFT and <strong>[SHUTTLE_NAME]_RIGHT</strong>.   How convenient.  Accessing the actual elements was then trivial.  I just used the APEX supplied function <strong>$x()</strong> function to get the elements.

The first thing I do is get the right textbox and loop over its values, selecting each one.  This is so when I fetch its value, I get all the elements in the list.  I need this information so when I fetch the values for the left hand side of the shuttle, based on the user's filter, I do NOT bring back any values that have already been moved to the right hand side of the shuttle.  In my example say you had already selected KING and moved him to the right.  Then you type a 'K' in the filter_by textbox, you will only get back CLARK and BLAKE even though KING also meets the constraint of containing a 'K'.
Next, I get the value of the right hand side of the shuttle and the value of the filter_by textbox, using the <strong>$v()</strong> function and send the values to the On Demand process via the APEX global variables <strong>g_x01</strong> and <strong>g_x02</strong>.

The On Demand process is just a simple PL/SQL block that is a query loop.

  l_selected long  := wwv_flow.g_x01;
  l_search long := wwv_flow.g_x02;
  for c in (
    select ename, empno, rownum r
      from emp
     where regexp_like( ename, nvl(l_search,ename), 'i' )
       and nvl(instr(':'||l_selected||':',':'||empno||':'),0)=0 )
    htp.prn( case when c.r&gt;1 then',' else null end ||
             '{ id: ' || c.empno ||
             ', data: "' || c.ename || '"}' );
  end loop;

It fetches all the values that match the filter

where regexp_like( ename, nvl(l_search,ename), 'i' )

but not those already be selected by the user.

and nvl(instr(':'||l_selected||':',':'||empno||':'),0)=0)

Now it’s just a simple process of looping over the rows, packaging them up and sending them back. I have become a big fan of JSON ( JavaScript Object Notation ) , so that how I am packaging up the payload to pass back. Basically, I’m making the results into a javascript array of objects. Each object has an id and data element. The payload look something like this

[{id: 7839, data: "KING"}{id: 7698, data: "BLAKE"} ... ]

This makes it super simple easy to work with once it gets back to the calling javascript function. All you need to do is apply an eval() on it and its transformed into a true javascript array of objects that can be easily looped over and referenced.

In one line of code, I call the On Demand process, getting back the JSON packed result and eval() it.

var resp = eval(req.get());

Now the variable resp is an Array of objects with two elements in each object, id and data. Looping over the values and populating the left hand side of the shuttle is trivial now.

  if (resp)
    for ( var i=0; i<resp.length; i++ )
      left.options[i] = new Option(resp[i].data, resp[i].id);

And there you have it. I hope I have shown that with some minor tweeks, you can extend the functionality of the base APEX items to be much more functional and user friendly. As always, comments ( good and bad ) are always welcome. Read more…

Categories: APEX, Oracle, PL/SQL, SQL Tags: , , , ,

More Google Geocoding

August 11, 2008 7 comments

Ok, for those who liked the prior post on Google geocoding via HTTP, you will enjoy this as well.  It builds on what was done in that example but retrieves even more information from google.

Before we used an output type of CSV and that only gave us back the LATITUDE and LONGITUDE.  But there is more information that we can derive from the geocoding service if we use a different output type.  If we request the result in XML, then we are given the complete address as well as the longitude and latitude.  But we have to do a little extra work to parse out the results.

I have written a PL/SQL object type to do just that.  Why an object type and not a Package?  Personal preference.  In this case, it just seemed to feel right to make it an object.  With some simple code factoring, it could easily be converted into a package.  But since most PL/SQL developers don’t code object types, thought it might be interesting.

Let’s take a look at the object’s specification.

create or replace
type google_geocode_type as object (

  -- Attributes
  response xmlType,
  search varchar2(32767),

  -- Constructor Methods
  constructor function google_geocode_type(
    p_search varchar2 default null ) return self as result,

  -- Execute Geocode
  member procedure execute(
    p_search varchar2 default null ),

  -- Getter Routines
  member function get_result_count return number,

  member function get_geometry(
    p_sequence number default 1 ) return sdo_geometry,

  member function get_latitude(
    p_sequence number default 1 ) return number,

  member function get_longitude(
    p_sequence number default 1 ) return number,

  member function get_address(
    p_sequence number default 1 ) return varchar2,

  member function get_street(
    p_sequence number default 1 ) return varchar2,

  member function get_city(
    p_sequence number default 1 ) return varchar2,

  member function get_state(
    p_sequence number default 1 ) return varchar2,

  member function get_zipcode(
    p_sequence number default 1 ) return varchar2,

  member function get_county(
    p_sequence number default 1 ) return varchar2,

  member function get_country(
    p_sequence number default 1 ) return varchar2


The object’s body is a bit long and I will not post all of it here but will make it all available for download.  But I will post a few of the member functions and talk about what they are doing.

constructor function google_geocode_type(
  p_search varchar2 default null ) return self as result is
begin := p_search;

For those who are not familiar with PL/SQL object types, a constructor function is the method called when you create a new instance of the object (just like java).  Every object has a default constructor, but you can define your own so to manage the object’s creation as well as defaulting certain object attributes.  For me, I did not want the use to have to supply a value for the attribute RESPONSE so I defined my own.

The EXECUTE() method is very similar to the the prior example.  I made a minor change to handle the case where we get back more than 32K of data.

member procedure execute( p_search varchar2 default null ) is
  l_http_req  utl_http.req;
  l_http_resp utl_http.resp;
  l_response long;
  l_clob clob;
  if p_search is not null then := p_search;
  end if;
  l_http_req := utl_http.begin_request(
                  url => '' ||
                  '?q=' || utl_url.escape( ) ||  -- address to geocode
                  '&output=xml' ||  -- XML return type
                  '&key=abcdef' );  -- site key
  l_http_resp := utl_http.get_response( l_http_req );
      utl_http.read_text( l_http_resp, l_response );
      l_clob := l_clob || l_response;
    end loop;
    when utl_http.end_of_body then
  utl_http.end_response( l_http_resp );
  self.response := sys.xmlType.createXML( l_clob );
end execute;

Now depending on how specific the address you supply, you may get more than one answer.  I sent in a value of “1600 Pennsylvania Ave” and it returned 10 results.  If I made that search more specific, say “1600 Pennsylvania Ave, DC”, then I get just a single result.  But since we don’t know how many results we will get, we need to parse out all the results.  So the first thing we need to know is how many results came back.

member function get_result_count return number is
  l_count number;
  select count(*)
    into l_count
    from table( XMLSequence( extract(
                 'xmlns=""' ) ) );
  return l_count;
end get_result_count;

This method calculates how many results are in the XML returned by finding each <Placemark> section,  and then counting them using the XMLSequence and Table functions in SQL.

Now we know how many results, we can begin to ask for the individual values.

member function get_zipcode(
  p_sequence number default 1 ) return varchar2 is
  return self.response.extract(
           '//Placemark[@id="p' || p_sequence || '"]',
           'xmlns=""' ).extract(
             'xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0"' ).getStringVal();
end get_zipcode;

Here in the GET_ZIPCODE() method, we use the EXTRACT() method on the XMLType variable RESPONSE and XPATH searching to retrieve the proper zipcode.  I had to do and extract() of an extract() because the relavant peices of data were in different namespaces in the XML.  (My XPATH searching ability is not that strong so there may be an easier way?  If you know of one, let me know.)

All the other get methods are similiar to this example.

Now let’s see how we would use this.

  1  declare
  2    g google_geocode_type := new google_geocode_type();
  3  begin
  4    g.execute( '1910 Oracle way, reston' );
  5      dbms_output.put_line( 'lat/lon: ' ||
  6                            g.get_latitude()||'/'||
  7                            g.get_longitude() );
  8      dbms_output.put_line( 'Address: ' || g.get_address() );
  9      dbms_output.put_line( 'County: ' || g.get_county() );
 10* end;
SQL> /
lat/lon: -77.351976/38.954872
Address: 1910 Oracle Way, Reston, VA 20190, USA
County: Fairfax County

PL/SQL procedure successfully completed.

Given just the street and the city, google was able to determine the full address and the latitude and longitude.

Now let’s see what happens when we supply even less information

  1  declare
  2    g google_geocode_type := new google_geocode_type();
  3  begin
  4    g.execute( '1600 Pennsylvania Ave' );
  5    for i in 1 .. g.get_result_count() loop
  6      dbms_output.put_line( 'lat/lon: ' ||
  7                            g.get_latitude(i)||'/'||
  8                            g.get_longitude(i) );
  9      dbms_output.put_line( 'Address: ' || g.get_address(i) );
 10    end loop;
 11* end;
SQL> /
lat/lon: -90.229033/38.617594
Address: 1600 Pennsylvania Ave, St Louis, MO 63104, USA
lat/lon: -76.880242/42.031789
Address: 1600 Pennsylvania Ave, Pine City, NY 14871, USA
lat/lon: -82.984848/42.36331
Address: 1600 Pennsylvania St, Detroit, MI 48214, USA
lat/lon: -76.634388/39.30307
Address: 1600 Pennsylvania Ave, Baltimore, MD 21217, USA
lat/lon: -96.77534/32.759033
Address: 1600 Pennsylvania Ave, Dallas, TX 75215, USA
lat/lon: -81.620803/38.360844
Address: 1600 Pennsylvania Ave, Charleston, WV 25302, USA
lat/lon: -80.27183/40.687529
Address: 1600 Pennsylvania Ave, Monaca, PA 15061, USA
lat/lon: -79.8573/40.362383
Address: 1600 Pennsylvania Ave, West Mifflin, PA 15122, USA
lat/lon: -117.32709/34.084866
Address: 1600 Pennsylvania Ave, Colton, CA 92324, USA
lat/lon: -75.185584/40.121061
Address: 1600 Pennsylvania Ave, Oreland, PA 19075, USA

PL/SQL procedure successfully completed.

There sure are a lot of 1600 Pennsylvania Ave’s.  But if you notice, the most famous one is not even in the list?  Why?  I have NO idea???  But if we make our search a bit more specific…

  1  declare
  2    g google_geocode_type := new google_geocode_type();
  3  begin
  4    g.execute( '1600 Pennsylvania Ave, dc' );
  5    for i in 1 .. g.get_result_count() loop
  6      dbms_output.put_line( 'lat/lon: ' ||
  7                            g.get_latitude(i)||'/'||
  8                            g.get_longitude(i) );
  9      dbms_output.put_line( 'Address: ' || g.get_address(i) );
 10    end loop;
 11* end;
SQL> /
lat/lon: -77.036698/38.897102
Address: 1600 Pennsylvania Ave NW, Washington, DC 20006, USA

PL/SQL procedure successfully completed.

… we find that famous one that will be getting a new family moving in this January.

I hope this was interesting and helpful.  Again, if you want to get the code and try it out yourself, its right here.  And as always, if you have any questions, just ask.

Categories: PL/SQL Tags: , , , ,