Parsing a CSV file in PL/SQL
Parsing a CSV file in PL/SQL seems like a simple requirement, and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web. If you have tried option A, you probably realized it gets tricky when commas and double quotes appear in the actual data as well as serving as the delimiter and the "optionally enclosed by" character. Plus, all that substr'ing and instr'ing can really hurt your head after a while. If you tried option B, you probably discovered that there are some solutions out there, but they all seem to be either incomplete or overly complex.
So I decided to write my own simple, yet complete, CSV parser in PL/SQL. It handles all data, whether or not it is optionally enclosed by some character, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats. And all this in fewer than 100 lines of code (with comments), with only three distinct calls to substr() and NO calls to instr().
I wanted to share this in hopes that others find it useful.
create or replace procedure parse_csv(
  p_clob                clob,
  p_delim               varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  --
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED       constant char(1) := chr(10);
  --
  l_char           char(1);
  l_lookahead      char(1);
  l_pos            number := 0;
  l_token          varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete  boolean := false;
  l_new_token      boolean := true;
  l_enclosed       boolean := false;
  --
  l_lineno         number := 1;
  l_columnno       number := 1;
begin
  loop
    -- increment position index
    l_pos := l_pos + 1;
    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos );
    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );
    -- if first character of new token is optionally enclosed character
    -- note that and skip it and get next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos );
    end if;
    l_new_token := false;
    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );
    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then
      if l_lookahead = p_optionally_enclosed then
        -- doubled enclosing character: keep one literal copy
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        -- closing character followed by delimiter: token is done
        l_pos := l_pos + 1;
        l_token_complete := true;
      else
        l_enclosed := false;
        -- closing character is the last character of the input:
        -- finish the token and the line so the final field is not lost
        if l_pos = dbms_lob.getLength( p_clob ) then
          l_token_complete := true;
          l_line_complete := true;
        end if;
      end if;
    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and not l_enclosed then
      l_token_complete := true;
      l_line_complete := true;
      -- swallow the second character of a two-character EOL
      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;
    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;
    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;
    else
      l_token := l_token || l_char;
    end if;
    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' ||
                            nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;
    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;
/
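One detail worth keeping in mind when adapting the parser: dbms_lob.substr() takes the amount before the offset, the reverse of the built-in substr(), which takes the position before the length. Here is a minimal sketch of the difference. The sample string and variable names are made up for illustration, and it assumes server output is enabled in your client.

```plsql
declare
  -- illustrative sample data, not taken from the parser above
  l_text varchar2(20) := 'a,"b,c"';
  l_clob clob         := l_text;
begin
  -- dbms_lob.substr( lob, amount, offset ): one character at position 3
  dbms_output.put_line( dbms_lob.substr( l_clob, 1, 3 ) );  -- prints "
  -- substr( string, position, length ): same character, arguments swapped
  dbms_output.put_line( substr( l_text, 3, 1 ) );           -- prints "
end;
/
```

Both calls return the opening double quote of the second field, which is the character parse_csv would see at l_pos = 3.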
Here is a little test block to show parse_csv working. I have made the end-of-line terminator different for each line to demonstrate that this works with all EOL terminators. In real life (I hope) your CSV file will have just one.
declare
  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBC and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
    '",F""",,,,abcde';
begin
  parse_csv( l_clob );
end;
/
And when I run it I get…
R3C2: this is a "test
R4C2: commas,,,in the field
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.
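As a rough sketch of the "store the tokens as rows in a table" idea, something like the following could stand in for the dbms_output.put_line() call. The table csv_tokens and the procedure store_token are hypothetical names invented for this example, and the column layout is only one possible choice.

```plsql
-- hypothetical staging table; the name and columns are illustrative only
create table csv_tokens (
  line_no   number,
  column_no number,
  token     clob      -- clob so that long tokens are not truncated
);

-- hypothetical helper that records one parsed token
create or replace procedure store_token(
  p_lineno   number,
  p_columnno number,
  p_token    varchar2 ) is
begin
  insert into csv_tokens ( line_no, column_no, token )
  values ( p_lineno, p_columnno, p_token );
end store_token;
/
```

Inside parse_csv, the put_line call in the "process a new token" step would then become store_token( l_lineno, l_columnno, l_token ); and the caller can commit once parse_csv returns. Keeping the commit outside the parser leaves the whole file as a single transaction, so a failed load can simply be rolled back.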
Give it a try and let me know if you find a case that this code does not handle.