Archive for the ‘SQL’ Category

Parsing a CSV file in PL/SQL

The ability to parse a CSV file in PL/SQL seems like a simple requirement and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web.   Well if you have tried option A, you probably realized it gets real tricky when you have commas and double quotes in your actual data as well as having them as your deliminators and optionally enclosed by characters as well.  Plus all that substr‘ing and instr‘ing can really hurt your head after a while.  If you tried option B, then you probably discovered that there are some solutions out there, but they all seems to either incomplete, or just overly complex.

So I decided to write my own simple, yet complete CSV parser in PL/SQL.  It handles all data, both optionally enclosed by some character or not, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats.  And all this in less than 100 lines of code (with comments) and with only three distinct calls to substr() and NO calls to instr().

I wanted to share this in hopes that others find it useful.

create or replace procedure parse_csv(
  p_clob clob,
  p_delim varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED constant char(1) := chr(10);
  l_char char(1);
  l_lookahead char(1);
  l_pos number := 0;
  l_token varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete boolean := false;
  l_new_token boolean := true;
  l_enclosed boolean := false;
  l_lineno number := 1;
  l_columnno number := 1;


    -- increment position index
    l_pos := l_pos + 1;

    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos);

    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );

    -- if first character of new token is optionally enclosed character
    -- note that and skip it and get next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos);
    end if;
    l_new_token := false;

    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );

    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then

      if l_lookahead = p_optionally_enclosed then
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        l_pos := l_pos + 1;
        l_token_complete := true;
        l_enclosed := false;
      end if;

    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then
      l_token_complete := true;
      l_line_complete := true;

      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;

    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;

    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;

      l_token := l_token || l_char;
    end if;

    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' ||
                            nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;

    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;

And here is a little test procedure to show it working. I have made the end-of-line different for each like to demonstrate this will work with all EOL terminators. In real-life (I hope) your CSV file will have just one.

  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBD and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
  parse_csv( l_clob );

And when I run it I get…

R1C1: A
R1C2: B
R1C3: C
R1C4: D
R1C5: E
R1C6: F
R1C7: G
R1C8: H
R1C9: I
R2C1: 1
R2C2: 2,3
R2C3: 1"2
R2C4: "4,
R2C5: ,5
R3C1: 6
R3C2: this is a "test
R3C3: ",8
R3C4: 9",
R3C5: 10,"
R4C1: normal
R4C2: commas,,,in the field
R4C3: "enclosed"
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
R5C1: ,F"
R5C2: **null**
R5C3: **null**
R5C4: **null**
R5C5: abcde

I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.

Give it a try and let me know if you find a case that this code does not handle.


Categories: Oracle, PL/SQL, SQL Tags: , , , ,

Email BLOB Attachments From The Database

December 1, 2009 34 comments

Recently I was developing an oracle database application and had the requirement to send out an email with an attachment. Initially it sounded pretty easy. I’ll just use the APEX_MAIL package and its all handled for me. But then I remembered, this was not an APEX application I was developing. So then I thought, no problem, I’ll just use the UTL_MAIL package. There is an API to add attachments. But wait, you can only add VARCHAR and RAW attachments and there is a max length of 32K on them. Not very useful if you want to send a multi-megabyte document as an attachment ( ie. a blob). So I had to drop down to an even lower API, UTL_SMTP and handle it all myself. Oh, and the attachment needed to be zipped as well (to save space in the tubes of our interwebs). I found the exercise interesting and thought I would share it with you.

For this example we have a table

create table assets(
  id number primary key,
  name varchar2(100),
  content blob
  mime_type varchar2(100) );

And the code

procedure mail_asset(
  p_asset_id number,
  p_from varchar2,
  p_to varchar2,
  p_subject varchar2,
  p_message varchar2 ) is
  l_asset assets%rowtype;
  l_blob blob := to_blob('1');
  l_conn utl_smtp.connection;
  l_raw raw(57);
  l_len integer := 0;
  l_idx integer := 1;
  l_buff_size integer := 57;
  l_boundary varchar2(32) := sys_guid();
  l_attachment_name long;

  -- Connect
  l_conn := utl_smtp.open_connection( '' );
  utl_smtp.helo( l_conn, '' );
  utl_smtp.mail( l_conn, p_from );
  utl_smtp.rcpt( l_conn, p_to );

  -- Header
  utl_smtp.write_data( l_conn, 'From: ' || p_from || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'To: ' || p_to || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Subject: ' || p_subject ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'MIME-Version: 1.0' || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Content-Type: multipart/mixed; ' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, ' boundary= "' || l_boundary || '"' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );

  -- Body
  utl_smtp.write_data( l_conn, '--' || l_boundary || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, 'Content-Type: text/plain;' ||
                               utl_tcp.crlf );
  utl_smtp.write_data( l_conn, ' charset=US-ASCII' || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );
  utl_smtp.write_data( l_conn, l_message || utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf );

  -- Fetch the asset info
  for c in ( select *
               from assets
              where asset_id = p_asset_id ) loop

    -- Compress if a content and not already zipped.
    if ( c.mime_type != 'application/zip' then
      utl_compress.lz_compress( src => c.content,
                                dst => l_blob );
      l_attachment_name := || '.gz';
      l_blob := c.content;
      l_attachment_name :=;
    end if;

    -- Attachment
    utl_smtp.write_data( l_conn, '--' || l_boundary || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Type: application/octet-stream'
                                 || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Disposition: attachment; ' ||
                                 utl_tcp.crlf );
    utl_smtp.write_data( l_conn, ' filename="' || l_attachment_name || '"'
                                 || utl_tcp.crlf );
    utl_smtp.write_data( l_conn, 'Content-Transfer-Encoding: base64' ||
                                 utl_tcp.crlf );
    utl_smtp.write_data( l_conn, utl_tcp.crlf );

    -- Loop through the blob
    -- chuck it up into 57-byte pieces
    -- and base64 encode it and write it into the mail buffer
    l_len := dbms_lob.getlength(l_blob);
    while l_idx < l_len loop l_blob, l_buff_size, l_idx, l_raw );
      utl_smtp.write_raw_data( l_conn, utl_encode.base64_encode(l_raw) );
      utl_smtp.write_data( l_conn, utl_tcp.crlf );
      l_idx := l_idx + l_buff_size;
    end loop;
    utl_smtp.write_data( l_conn, utl_tcp.crlf );

  end loop;

  -- Close Email
  utl_smtp.write_data( l_conn, '--' || l_boundary || '--' ||
                                         utl_tcp.crlf );
  utl_smtp.write_data( l_conn, utl_tcp.crlf || '.' || utl_tcp.crlf );
  utl_smtp.close_data( l_conn );
  utl_smtp.quit( l_conn );

  -- smtp errors, close connection and reraise
  when utl_smtp.transient_error or
       utl_smtp.permanent_error then
    utl_smtp.quit( l_conn );

end mail_asset;

Just a few things about the code.

Line 15 – I use sys_guid() to generate a random 32 character string which I use as the boundary marker in the email message. You can use any character string you like.

Lines 20 and 21 – You will need to change to your smtp server.

Line 34 – You will notice that there is a space in front of boundary. That is on purpose. The reason for that is that Content-Type (Line 32) and the boundary could be on the same line like this:

Content-Type: multipart/mixed; boundary=”xxxxxxxxxxxx”

But for formatting sake, I broke it up into two lines, so it needs to be sent to the smtp server like

Content-Type: multipart/mixed;

^ (single leading space)

with a leading space on boundary. I also did it on lines 42 and 68.

Line 79 – No you can not do larger reads. Each line of the encoded attachment needs to be 57 bytes.

In my example, I will only get at most 1 attachment because I pass in the asset_id and that is defined as the primary key. But if the query at line 48 was changed to possibly return more than a single row, the code would still work and we would have an email with multiple attachments.

Enhanced APEX Shuttle Item

April 28, 2009 34 comments

Recently I have been really, really busy building some internal applications for my company. All of them using Oracle and APEX of course. To satisfy some of application’s requirements, and just for usability sake, I had to tweek the default behavior of the APEX shuttle item. The reason was the initial LOV that populated the shuttle contained over 500 values. About 490+ too many as far as I was concerned. The users felt the same way. They complained that it was too hard to find what they were looking for, even though the list was in alphabetical order and I agreed. So, I decided to add the functionality to allow the user to constrain the values in left-hand side of the shuttle. Similar to the Popup LOVs with a filter option, but I wanted to use AJAX to accomplish it to avoid a page submit. After a bunch of playing around and optimization, here is what I came up with:
Username/Password: demo/demo

I was surprised that it was not that hard to accomplish and there are very few moving parts. All you need is a javascript function to make the AJAX call, an On Demand process to fetch only those values that meet the filter’s constraint and a way to trigger it.

How It Works.

As the user types in the textbox the shuttle automagically filters itself. I use an onKeyUp event on the textbox P10_SEARCH_BY to trigger the AJAX call.

onKeyUp="populateShuttle( this,'P10_SHUTTLE');"

So everytime a KeyUp event happens in that textbox, a call to the populateShuttle function happens. That function is responsible for making the AJAX call to the On-Demand Process, which fetches the appropriate values, and then parsing the results and repopulating the left hand side of the shuttle.

function populateShuttle(filter,shuttleName)
  var right = $x(shuttleName+"_RIGHT");

  for ( var i=0;i<right .length; i++ )
    right.options&#91;i&#93;.selected = true;

  var req = new htmldb_Get( null, 19903, 
                          'APPLICATION_PROCESS=populateShuttle', 0 );

  req.addParam( 'x01', $v(right) );
  req.addParam( 'x02', $v(filter) );

  var resp = eval(req.get());

  var left = $x(shuttleName+"_LEFT");

  if (resp)
    for ( var i=0; i<resp.length; i++ )
      left.options&#91;i&#93; = new Option(resp&#91;i&#93;.data, resp&#91;i&#93;.id);	

  req = null;

Although APEX treats the shuttle as a single element, it is in fact made up of many HTML elements, 10 images and 2 multiselect list boxes.  With a little poking around the HTML of a generated APEX page with a shuttle item on it, I was able to determine that the actual DOM names of the multiselect list boxes of a shuttle were <strong>[SHUTTLE_NAME]_LEFT and <strong>[SHUTTLE_NAME]_RIGHT</strong>.   How convenient.  Accessing the actual elements was then trivial.  I just used the APEX supplied function <strong>$x()</strong> function to get the elements.

The first thing I do is get the right textbox and loop over its values, selecting each one.  This is so when I fetch its value, I get all the elements in the list.  I need this information so when I fetch the values for the left hand side of the shuttle, based on the user's filter, I do NOT bring back any values that have already been moved to the right hand side of the shuttle.  In my example say you had already selected KING and moved him to the right.  Then you type a 'K' in the filter_by textbox, you will only get back CLARK and BLAKE even though KING also meets the constraint of containing a 'K'.
Next, I get the value of the right hand side of the shuttle and the value of the filter_by textbox, using the <strong>$v()</strong> function and send the values to the On Demand process via the APEX global variables <strong>g_x01</strong> and <strong>g_x02</strong>.

The On Demand process is just a simple PL/SQL block that is a query loop.

  l_selected long  := wwv_flow.g_x01;
  l_search long := wwv_flow.g_x02;
  for c in (
    select ename, empno, rownum r
      from emp
     where regexp_like( ename, nvl(l_search,ename), 'i' )
       and nvl(instr(':'||l_selected||':',':'||empno||':'),0)=0 )
    htp.prn( case when c.r&gt;1 then',' else null end ||
             '{ id: ' || c.empno ||
             ', data: "' || c.ename || '"}' );
  end loop;

It fetches all the values that match the filter

where regexp_like( ename, nvl(l_search,ename), 'i' )

but not those already be selected by the user.

and nvl(instr(':'||l_selected||':',':'||empno||':'),0)=0)

Now it’s just a simple process of looping over the rows, packaging them up and sending them back. I have become a big fan of JSON ( JavaScript Object Notation ) , so that how I am packaging up the payload to pass back. Basically, I’m making the results into a javascript array of objects. Each object has an id and data element. The payload look something like this

[{id: 7839, data: "KING"}{id: 7698, data: "BLAKE"} ... ]

This makes it super simple easy to work with once it gets back to the calling javascript function. All you need to do is apply an eval() on it and its transformed into a true javascript array of objects that can be easily looped over and referenced.

In one line of code, I call the On Demand process, getting back the JSON packed result and eval() it.

var resp = eval(req.get());

Now the variable resp is an Array of objects with two elements in each object, id and data. Looping over the values and populating the left hand side of the shuttle is trivial now.

  if (resp)
    for ( var i=0; i<resp.length; i++ )
      left.options[i] = new Option(resp[i].data, resp[i].id);

And there you have it. I hope I have shown that with some minor tweeks, you can extend the functionality of the base APEX items to be much more functional and user friendly. As always, comments ( good and bad ) are always welcome. Read more…

Categories: APEX, Oracle, PL/SQL, SQL Tags: , , , ,

Report Hierarchical Data With APEX

September 5, 2008 12 comments

Anyone familiar with Oracle and hearing the word hierarchical immediately thinks of the sql CONNECT BY clause.  And I would bet that when they think about displaying this data, they would use some sort of tree widget.  APEX has a built-in tree widget and for many applications, it works fine.  

The drawback to the built-in tree widget is that it brings back the entire dataset on the initial page render.  Good and Bad.  Saves round trips to the server, but could take a lot of initial time depending on the size of the data.  Showing/hiding branches of the tree with many elements also can make the application feel sluggish.  

On my current project, I needed to build an interface against just such a dataset.  So I needed an alternative to the built-in tree.  I looked at either writing my own tree using javascript and AJAX or using an already developed one.  Neither one appealed to me.  They both added development complexity and maintaining them weighed on my final decision.

What I chose to do was build a breadcrumb-like report displaying the path traversed and a second report displaying the elements at the current level.  All default APEX functionality.  Simple clean interface.  Easily developed and easily maintained.  You can check it out here.

Let me quickly walk you through how it was built:

1. Create a report on the base table constrained by the hidden item.  The query should look something like:

select *
  from emp
 where nvl(mgr,-1) = :p5_empno

I then linked the ENAME column back to the same page passing the EMPNO value into :P5_EMPNO.
2. Create a hidden Item to hold the manager’s id.  I put it in the step 1’s report region and called it P5_EMPNO and set its default value to -1.

3. Create a PATH report to manage the traversing the data.  This is where the magic happens.  I make use of the SYS_CONNECT_BY_PATH() function in conjunction with the START WITH…CONNECT BY clause. The query I used was:

select '<a href="f?p=' || :app_id || ':5:' ||
         :app_session || 
         '::::p5_empno:-1">Top</a> >>> ' || 
             '<a href="f?p=' || :app_id || ':5:' ||
             :app_session || '::::p5_empno:' || empno ||
             '">' || ename || '</a>', ' : ' ), 4 ) path 
  from emp 
 where empno = :p5_empno 
 start with mgr is null 
connect by prior empno = mgr

Some other tweeks to this region. No Pagination. Report Template of Value Attribute Pairs. Layout above the first report region. No region template.

4. ???

5. Profit!

Now you can use the main report to drill into the children of the row you selected, all the while maintaining the context of where you are in the hierarchy with the path.

It’s a quick simple technique, and 100% APEX. No additional javascript libraries or custom coding to accomplish a clean and simple interface. And it will be easily maintained, either by you, or by the developer that comes after you.

Let me know what you think.

Categories: APEX, SQL Tags: , , , ,