The ability to parse a CSV file in PL/SQL seems like a simple requirement, and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web. Well, if you have tried option A, you probably realized it gets really tricky when your actual data contains commas and double quotes, the same characters that serve as the delimiter and the optional enclosing character. Plus all that substr'ing and instr'ing can really hurt your head after a while. If you tried option B, then you probably discovered that there are some solutions out there, but they all seem to be either incomplete or just overly complex.
So I decided to write my own simple, yet complete CSV parser in PL/SQL. It handles data whether or not it is enclosed by some character, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats. And all this in fewer than 100 lines of code (with comments) and with only three distinct calls to substr() and NO calls to instr().
I wanted to share this in hopes that others find it useful.
create or replace procedure parse_csv(
  p_clob clob,
  p_delim varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  --
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED constant char(1) := chr(10);
  --
  l_char char(1);
  l_lookahead char(1);
  l_pos number := 0;
  l_token varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete boolean := false;
  l_new_token boolean := true;
  l_enclosed boolean := false;
  --
  l_lineno number := 1;
  l_columnno number := 1;
begin
  loop
    -- increment position index
    l_pos := l_pos + 1;
    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos);
    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );
    -- if first character of new token is optionally enclosed character
    -- note that and skip it and get next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos);
    end if;
    l_new_token := false;
    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );
    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then
      if l_lookahead = p_optionally_enclosed then
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        l_pos := l_pos + 1;
        l_token_complete := true;
      else
        l_enclosed := false;
      end if;
    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then
      l_token_complete := true;
      l_line_complete := true;
      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;
    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;
    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;
    else
      l_token := l_token || l_char;
    end if;
    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' || nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;
    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;
/
And here is a little test procedure to show it working. I have made the end-of-line different for each line to demonstrate this will work with all EOL terminators. In real life (I hope) your CSV file will have just one.
declare
  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBC and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
    '",F""",,,,abcde';
begin
  parse_csv( l_clob );
end;
/
And when I run it I get…
R3C2: this is a "test
R4C2: commas,,,in the field
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.
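As a rough sketch of the "store the tokens as rows in a table" idea, something like the following could work. The table csv_tokens and the procedure store_token are hypothetical names made up for this illustration, and the 4000-character column is an assumption, so longer tokens get truncated here.

```sql
-- Hypothetical staging table; the name and columns are illustrative only.
create table csv_tokens (
  line_no   number,
  column_no number,
  token     varchar2(4000)
);

-- Hypothetical helper that stores one parsed token.
-- Tokens longer than 4000 characters are truncated to fit the column.
create or replace procedure store_token(
  p_lineno   number,
  p_columnno number,
  p_token    varchar2 ) is
begin
  insert into csv_tokens ( line_no, column_no, token )
  values ( p_lineno, p_columnno, substr( p_token, 1, 4000 ) );
end store_token;
/
```

Inside parse_csv, the dbms_output.put_line() call in the "process a new token" branch would then become store_token( l_lineno, l_columnno, l_token ); and the parsed data could be read back with select * from csv_tokens order by line_no, column_no. The caller decides when to commit.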
Give it a try and let me know if you find a case that this code does not handle.
Anyone familiar with Oracle who hears the word hierarchical immediately thinks of the SQL CONNECT BY clause. And I would bet that when they think about displaying this data, they would use some sort of tree widget. APEX has a built-in tree widget and for many applications, it works fine.
The drawback to the built-in tree widget is that it brings back the entire dataset on the initial page render. That is both good and bad: it saves round trips to the server, but the initial load could take a long time depending on the size of the data. Showing/hiding branches of the tree with many elements can also make the application feel sluggish.
What I chose to do was build a breadcrumb-like report displaying the path traversed and a second report displaying the elements at the current level. All default APEX functionality. Simple clean interface. Easily developed and easily maintained. You can check it out here.
Let me quickly walk you through how it was built:
1. Create a report on the base table constrained by the hidden item. The query should look something like:
select * from emp where nvl(mgr,-1) = :p5_empno
I then linked the ENAME column back to the same page passing the EMPNO value into :P5_EMPNO.
2. Create a hidden item to hold the manager's id. I put it in step 1's report region, called it P5_EMPNO, and set its default value to -1.
3. Create a PATH report to manage traversing the data. This is where the magic happens. I make use of the SYS_CONNECT_BY_PATH() function in conjunction with the START WITH…CONNECT BY clause. The query I used was:
select '<a href="f?p=' || :app_id || ':5:' || :app_session || '::::p5_empno:-1">Top</a> >>> ' ||
       substr( sys_connect_by_path( '<a href="f?p=' || :app_id || ':5:' || :app_session || '::::p5_empno:' || empno || '">' || ename || '</a>', ' : ' ), 4 ) path
  from emp
 where empno = :p5_empno
 start with mgr is null
connect by prior empno = mgr
Some other tweaks to this region: No Pagination. Report Template of Value Attribute Pairs. Layout above the first report region. No region template.
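To see what SYS_CONNECT_BY_PATH() contributes without the HTML link markup in the way, here is a stripped-down version of the path query. It is only a sketch and assumes the standard SCOTT.EMP demo data, where ADAMS has empno 7876; substitute any empno from your own table.

```sql
-- Assumes the standard SCOTT.EMP demo rows (ADAMS is empno 7876).
-- sys_connect_by_path() prefixes every level with the separator ' : ',
-- so substr( ..., 4 ) drops the leading three-character separator.
select substr( sys_connect_by_path( ename, ' : ' ), 4 ) path
  from emp
 where empno = 7876
 start with mgr is null
connect by prior empno = mgr;
-- With the stock demo rows this should return: KING : JONES : SCOTT : ADAMS
```

The PATH report does the same thing, except that each ENAME is wrapped in a link back to page 5.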
Now you can use the main report to drill into the children of the row you selected, all the while maintaining the context of where you are in the hierarchy with the path.
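To make the drill-down concrete, here is how the main report's predicate behaves for two values of P5_EMPNO. This is a sketch that assumes the standard SCOTT.EMP demo data (KING is empno 7839 with a null MGR) and that -1 is never used as a real EMPNO.

```sql
-- P5_EMPNO = -1 (the default): nvl() maps the null MGR to -1,
-- so this returns the top of the hierarchy.
select empno, ename, mgr
  from emp
 where nvl(mgr,-1) = -1;

-- P5_EMPNO = 7839 (after clicking KING): returns KING's direct reports.
select empno, ename, mgr
  from emp
 where nvl(mgr,-1) = 7839;
```

The nvl() wrapper is needed because a plain mgr = :p5_empno comparison can never match a row whose MGR is null.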
Let me know what you think.