Archive

Posts Tagged ‘SQL’

Parsing a CSV file in PL/SQL

The ability to parse a CSV file in PL/SQL seems like a simple requirement and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web.   Well if you have tried option A, you probably realized it gets real tricky when you have commas and double quotes in your actual data as well as having them as your deliminators and optionally enclosed by characters as well.  Plus all that substr‘ing and instr‘ing can really hurt your head after a while.  If you tried option B, then you probably discovered that there are some solutions out there, but they all seems to either incomplete, or just overly complex.

So I decided to write my own simple, yet complete CSV parser in PL/SQL.  It handles all data, both optionally enclosed by some character or not, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats.  And all this in less than 100 lines of code (with comments) and with only three distinct calls to substr() and NO calls to instr().

I wanted to share this in hopes that others find it useful.

create or replace procedure parse_csv(
  p_clob clob,
  p_delim varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  --
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED constant char(1) := chr(10);
  --
  l_char char(1);
  l_lookahead char(1);
  l_pos number := 0;
  l_token varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete boolean := false;
  l_new_token boolean := true;
  l_enclosed boolean := false;
  --
  l_lineno number := 1;
  l_columnno number := 1;

begin

  loop
    -- increment position index
    l_pos := l_pos + 1;

    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos);

    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );

    -- if first character of new token is optionally enclosed character
    -- note that and skip it and get next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos);
    end if;
    l_new_token := false;

    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );

    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then

      if l_lookahead = p_optionally_enclosed then
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        l_pos := l_pos + 1;
        l_token_complete := true;
      else
        l_enclosed := false;
      end if;

    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then
      l_token_complete := true;
      l_line_complete := true;

      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;

    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;

    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;

    else
      l_token := l_token || l_char;
    end if;

    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' ||
                            nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;

    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;
/

And here is a little test procedure to show it working. I have made the end-of-line different for each like to demonstrate this will work with all EOL terminators. In real-life (I hope) your CSV file will have just one.

declare
  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBD and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
    '",F""",,,,abcde';
begin
  parse_csv( l_clob );
end;
/

And when I run it I get…

R1C1: A
R1C2: B
R1C3: C
R1C4: D
R1C5: E
R1C6: F
R1C7: G
R1C8: H
R1C9: I
-----
R2C1: 1
R2C2: 2,3
R2C3: 1"2
R2C4: "4,
R2C5: ,5
-----
R3C1: 6
R3C2: this is a "test
R3C3: ",8
R3C4: 9",
R3C5: 10,"
-----
R4C1: normal
R4C2: commas,,,in the field
R4C3: "enclosed"
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
-----
R5C1: ,F"
R5C2: **null**
R5C3: **null**
R5C4: **null**
R5C5: abcde
-----

I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.

Give it a try and let me know if you find a case that this code does not handle.

Enjoy.

Categories: Oracle, PL/SQL, SQL Tags: , , , ,

Report Hierarchical Data With APEX

September 5, 2008 12 comments

Anyone familiar with Oracle and hearing the word hierarchical immediately thinks of the sql CONNECT BY clause.  And I would bet that when they think about displaying this data, they would use some sort of tree widget.  APEX has a built-in tree widget and for many applications, it works fine.  

The drawback to the built-in tree widget is that it brings back the entire dataset on the initial page render.  Good and Bad.  Saves round trips to the server, but could take a lot of initial time depending on the size of the data.  Showing/hiding branches of the tree with many elements also can make the application feel sluggish.  

On my current project, I needed to build an interface against just such a dataset.  So I needed an alternative to the built-in tree.  I looked at either writing my own tree using javascript and AJAX or using an already developed one.  Neither one appealed to me.  They both added development complexity and maintaining them weighed on my final decision.

What I chose to do was build a breadcrumb-like report displaying the path traversed and a second report displaying the elements at the current level.  All default APEX functionality.  Simple clean interface.  Easily developed and easily maintained.  You can check it out here.

Let me quickly walk you through how it was built:

1. Create a report on the base table constrained by the hidden item.  The query should look something like:

select *
  from emp
 where nvl(mgr,-1) = :p5_empno

I then linked the ENAME column back to the same page passing the EMPNO value into :P5_EMPNO.
 
2. Create a hidden Item to hold the manager’s id.  I put it in the step 1’s report region and called it P5_EMPNO and set its default value to -1.

3. Create a PATH report to manage the traversing the data.  This is where the magic happens.  I make use of the SYS_CONNECT_BY_PATH() function in conjunction with the START WITH…CONNECT BY clause. The query I used was:

select '<a href="f?p=' || :app_id || ':5:' ||
         :app_session || 
         '::::p5_empno:-1">Top</a> >>> ' || 
         substr(
           sys_connect_by_path( 
             '<a href="f?p=' || :app_id || ':5:' ||
             :app_session || '::::p5_empno:' || empno ||
             '">' || ename || '</a>', ' : ' ), 4 ) path 
  from emp 
 where empno = :p5_empno 
 start with mgr is null 
connect by prior empno = mgr

Some other tweeks to this region. No Pagination. Report Template of Value Attribute Pairs. Layout above the first report region. No region template.

4. ???

5. Profit!

Now you can use the main report to drill into the children of the row you selected, all the while maintaining the context of where you are in the hierarchy with the path.

It’s a quick simple technique, and 100% APEX. No additional javascript libraries or custom coding to accomplish a clean and simple interface. And it will be easily maintained, either by you, or by the developer that comes after you.

Let me know what you think.

Categories: APEX, SQL Tags: , , , ,