
Parsing a CSV file in PL/SQL

The ability to parse a CSV file in PL/SQL seems like a simple requirement, and one would think that you could either a) easily implement it yourself, or b) find some examples of it on the web.  Well, if you have tried option A, you probably realized it gets really tricky when you have commas and double quotes in your actual data as well as having them as your delimiters and optionally-enclosed-by characters.  Plus, all that substr'ing and instr'ing can really hurt your head after a while.  If you tried option B, then you probably discovered that there are some solutions out there, but they all seem to be either incomplete or overly complex.

So I decided to write my own simple, yet complete CSV parser in PL/SQL.  It handles all data, both optionally enclosed by some character or not, as well as both DOS (CR+LF) and UNIX (LF only) end-of-line file formats.  And all this in less than 100 lines of code (with comments) and with only three distinct calls to substr() and NO calls to instr().

I wanted to share this in hopes that others find it useful.

create or replace procedure parse_csv(
  p_clob clob,
  p_delim varchar2 default ',',
  p_optionally_enclosed varchar2 default '"' ) is
  CARRIAGE_RETURN constant char(1) := chr(13);
  LINE_FEED constant char(1) := chr(10);
  l_char char(1);
  l_lookahead char(1);
  l_pos number := 0;
  l_token varchar2(32767) := null;
  l_token_complete boolean := false;
  l_line_complete boolean := false;
  l_new_token boolean := true;
  l_enclosed boolean := false;
  l_lineno number := 1;
  l_columnno number := 1;
begin
  loop
    -- increment position index
    l_pos := l_pos + 1;

    -- get next character from clob
    l_char := dbms_lob.substr( p_clob, 1, l_pos);

    -- exit when no more characters to process
    exit when l_char is null or l_pos > dbms_lob.getLength( p_clob );

    -- if the first character of a new token is the optionally-enclosed
    -- character, note that, skip it, and get the next character
    if l_new_token and l_char = p_optionally_enclosed then
      l_enclosed := true;
      l_pos := l_pos + 1;
      l_char := dbms_lob.substr( p_clob, 1, l_pos);
    end if;
    l_new_token := false;

    -- get look ahead character
    l_lookahead := dbms_lob.substr( p_clob, 1, l_pos+1 );

    -- inspect character (and lookahead) to determine what to do
    if l_char = p_optionally_enclosed and l_enclosed then

      if l_lookahead = p_optionally_enclosed then
        l_pos := l_pos + 1;
        l_token := l_token || l_lookahead;
      elsif l_lookahead = p_delim then
        l_pos := l_pos + 1;
        l_token_complete := true;
        l_enclosed := false;
      else
        l_enclosed := false;
      end if;

    elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then
      l_token_complete := true;
      l_line_complete := true;

      if l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
        l_pos := l_pos + 1;
      end if;

    elsif l_char = p_delim and not l_enclosed then
      l_token_complete := true;

    elsif l_pos = dbms_lob.getLength( p_clob ) then
      l_token := l_token || l_char;
      l_token_complete := true;
      l_line_complete := true;

    else
      l_token := l_token || l_char;
    end if;

    -- process a new token
    if l_token_complete then
      dbms_output.put_line( 'R' || l_lineno || 'C' || l_columnno || ': ' ||
                            nvl(l_token,'**null**') );
      l_columnno := l_columnno + 1;
      l_token := null;
      l_enclosed := false;
      l_new_token := true;
      l_token_complete := false;
    end if;

    -- process end-of-line here
    if l_line_complete then
      dbms_output.put_line( '-----' );
      l_lineno := l_lineno + 1;
      l_columnno := 1;
      l_line_complete := false;
    end if;
  end loop;
end parse_csv;

And here is a little test procedure to show it working. I have made the end-of-line different for each line to demonstrate that this will work with all EOL terminators. In real life (I hope) your CSV file will have just one.

declare
  l_clob clob :=
    -- DOS EOL
    'A,B,C,D,E,F,G,H,I' || chr(13) || chr(10) ||
    -- Apple up to OS9 EOL
    '1,"2,3","1""2","""4,",",5"' || chr(13) ||
    -- Acorn BBC and RISC OS EOL
    '6,"this is a ""test",""",8","9"",","10,"""' || chr(10) || chr(13) ||
    -- Unix and OS X EOL
    'normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"' || chr(10) ||
    -- Line with EOF only
    '",F""",,,,abcde';
begin
  parse_csv( l_clob );
end;
/

And when I run it I get…

R1C1: A
R1C2: B
R1C3: C
R1C4: D
R1C5: E
R1C6: F
R1C7: G
R1C8: H
R1C9: I
R2C1: 1
R2C2: 2,3
R2C3: 1"2
R2C4: "4,
R2C5: ,5
R3C1: 6
R3C2: this is a "test
R3C3: ",8
R3C4: 9",
R3C5: 10,"
R4C1: normal
R4C2: commas,,,in the field
R4C3: "enclosed"
R4C4: random " double " quotes
R4C5: commas,,, " and double "" quotes
R5C1: ,F"
R5C2: **null**
R5C3: **null**
R5C4: **null**
R5C5: abcde

I think I have covered all the bases and possibilities for parsing a CSV file. You can easily modify the code to store the tokens as rows in a table or push them into an Apex collection for further processing later. I just used dbms_output.put_line() to show it working.
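For instance, a minimal sketch of the table-storage variant (the csv_data table and its columns here are hypothetical, not part of the original procedure):

```sql
-- Hypothetical staging table for the parsed tokens:
-- create table csv_data ( lineno number, columnno number, token varchar2(4000) );

-- Inside parse_csv, the token-complete block could then read:
if l_token_complete then
  insert into csv_data ( lineno, columnno, token )
  values ( l_lineno, l_columnno, l_token );
  l_columnno := l_columnno + 1;
  l_token := null;
  l_enclosed := false;
  l_new_token := true;
  l_token_complete := false;
end if;
```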

Give it a try and let me know if you find a case that this code does not handle.


Categories: Oracle, PL/SQL, SQL
  1. Dan
    April 3, 2012 at 6:29 pm

    Hi Chris,

    Did you see that APEX 4.1 now has this built in? It’s called Data Upload and it’s awesome


  2. Christopher Beck
    April 3, 2012 at 6:35 pm

    Hey Dan,

    Yup, I saw it and have used it in the past. But what I am currently working on could not take advantage of it, so I needed to implement my own parser for non-apex related projects. I know, *gasp*, non-apex? Is there such a thing? 😉


  3. April 3, 2012 at 7:33 pm

    Why is p_clob a varchar2 and not a clob? Especially since you use dbms_lob calls on it later. You'll be forcing the weight of implicit conversions for lob manipulations but still be restricted to 32K of input.

    p_clob varchar2,

    • Christopher Beck
      April 3, 2012 at 7:35 pm

      Oops. Nice catch. Just a typo on my part. Should be a clob. I'll fix it. Thanks for pointing that out.

  4. April 3, 2012 at 7:53 pm

    Glad to help. One thing I did for my clob parser was to pull out 32K varchar2 chunks and parse those using varchar2 functions rather than dbms_lob functions.

    I had to code extra steps to look for delimiters within my chunk so I wouldn't accidentally split a field; but even with that extra work I got a 20-25x speed improvement.

    So yes, it would add complexity and force additional instr/substr calls but you might want to consider it if you have to do lots of parsing on large targets.
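A rough sketch of that chunking idea (my reading of Sean's description, not his actual code; parse_buffer here is a hypothetical varchar2-based version of the parser):

```sql
declare
  l_len    integer := dbms_lob.getlength( p_clob );  -- p_clob as in parse_csv
  l_offset integer := 1;
  l_chunk  varchar2(32767);
  l_carry  varchar2(32767);  -- partial trailing line carried to the next chunk
  l_break  integer;
begin
  while l_offset <= l_len loop
    l_chunk  := l_carry || dbms_lob.substr( p_clob, 32767 - nvl(length(l_carry),0), l_offset );
    l_offset := l_offset + 32767 - nvl(length(l_carry),0);
    l_carry  := null;
    -- only hand complete lines to the varchar2 parser
    l_break  := instr( l_chunk, chr(10), -1 );
    if l_offset > l_len then
      parse_buffer( l_chunk );                        -- last chunk: parse everything left
    elsif l_break > 0 then
      l_carry := substr( l_chunk, l_break + 1 );      -- keep the partial line
      parse_buffer( substr( l_chunk, 1, l_break ) );  -- hypothetical varchar2 parser
    else
      l_carry := l_chunk;                             -- no line break yet; keep accumulating
    end if;
  end loop;
end;
```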

    • Matthew Layton
      October 22, 2015 at 4:50 pm

      Hi Sean,

      We used Chris's code for our development, but for the volume we're loading it's not quick enough. Have you got the code you made to pull out in chunks that you could share?

      Long shot but hope you can help.


  5. April 4, 2012 at 3:00 am

    Another way to skin this cat is to use External Tables, e.g.:

    CREATE TABLE mytable
    ( FIELD01 VARCHAR2(1000 BYTE)
    , FIELD02 VARCHAR2(1000 BYTE)
    , FIELD03 VARCHAR2(1000 BYTE)
    , FIELD04 VARCHAR2(1000 BYTE)
    , FIELD05 VARCHAR2(1000 BYTE)
    )
    ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY MYDIR
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY NEWLINE
        LOGFILE MYLOGDIR:'myfile.csv.log'
        BADFILE MYLOGDIR:'myfile.csv.bad'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      )
      LOCATION (MYDIR:'myfile.csv')
    );

    Mind you, this method requires you to know what the line endings will be ahead of time.

    • Christopher Beck
      April 4, 2012 at 11:59 am


      I love external tables, used them plenty, but they are limited in that they are static. They need to be created prior to loading. You need the CREATE DIRECTORY priv to access the file. You need a way to get the file from the user to the filesystem of the database server. UTL_FILE could write it once it's in the database, sure, but in many cases that is not allowed by the customer. I don't think you could use external tables on apex.oracle.com. So I offered up this solution to serve as a generic solution in any case. If you have all the access to the server you need, and you know more about the CSV being parsed, then external tables may be the way to go. But in the generic case, not really feasible.

  6. Rich
    April 4, 2012 at 4:09 am

    What would the advantage of this be over simply using an external table to process the csv file?

    • Christopher Beck
      April 4, 2012 at 12:18 pm


      See my response to Jeffery above. External tables are fine, if you have the access and privs to create them. In many cases, customers will not allow you to do either.

  7. April 4, 2012 at 7:47 am

    Hi Chris, check out my blog post on this topic from a couple of years ago:


    The latest version of the code can be found in the Alexandria Utility Library for PL/SQL:


    – Morten

    • Christopher Beck
      April 4, 2012 at 12:14 pm


      I don’t believe your code handles optionally enclosed values, but only handles comma separated values. I needed a parser that accounted for CSV data that was optionally enclosed with some character, delimited by another, and potentially having the data within each cell contain either or both of those characters as well. So although I did find your code when looking for solutions on the web, the parser was just too limited for my needs.

      • April 4, 2012 at 12:39 pm


        As I pointed out, the latest version of the csv_util_pkg package can be found in the Alexandria library, and this does indeed support optionally enclosed values.

        I just tested it with your example data:

        select *
        from table(csv_util_pkg.clob_to_csv('normal,"commas,,,in the field","""enclosed""","random "" double "" quotes","commas,,, "" and double """" quotes"'))

        And this splits the data into 5 columns:

        c001 = normal
        c002 = commas,,,in the field
        c003 = "enclosed"
        c004 = random " double " quotes
        c005 = commas,,, " and double "" quotes

        (I suppose I should remove the older code from the blog post and just direct people to download the latest library code.)

        – Morten

      • Christopher Beck
        April 4, 2012 at 1:33 pm

        Morten, Yea, that would have been better. The blog examples seem to show it only handling the simple cases. Wish I had realized. Probably would not have written my version then. Oh well. Thanks for pointing that out.

  8. July 11, 2012 at 1:03 pm

    This was very helpful! Thanks Chris!

  9. July 11, 2012 at 2:41 pm

    My file upload procedure creates a BLOB data type. Does this procedure require a CLOB? If so, how would I convert it?

    • Christopher Beck
      July 11, 2012 at 3:47 pm

      You could use a function like this to convert it.

        function blob_to_clob( p_lob in blob ) return clob is
           l_clob_result  clob := 'X';
           l_dest_offset  integer := 1;
           l_src_offset   integer := 1;
           l_lang_context integer := dbms_lob.default_lang_ctx;
           l_warning      integer;
        begin
           if p_lob is not null and length(p_lob) > 0 then
              dbms_lob.converttoclob(dest_lob     => l_clob_result,
                                     src_blob     => p_lob,
                                     amount       => dbms_lob.lobmaxsize,
                                     dest_offset  => l_dest_offset,
                                     src_offset   => l_src_offset,
                                     blob_csid    => dbms_lob.default_csid,
                                     lang_context => l_lang_context,
                                     warning      => l_warning);
              if l_warning != 0 then
                 dbms_output.put_line('Function blob_to_clob warning:' || l_warning);
                 return null;
              end if;
              return l_clob_result;
           end if;
           return null;
        exception
           when others then
              dbms_output.put_line('Function blob_to_clob error:' || SQLCODE);
              return null;
        end blob_to_clob;
  10. November 17, 2012 at 8:11 am

    Hi Chris,

    First, thanks for sharing this piece of code. It has come in useful to me. I have one remark.
    According to the definition of CSV, it is possible to enclose CR and LF characters within a field. When I offer such content to the parser, it creates a new line at the point of the CR and LF.
    Does your parser support this CSV feature?


    • Christopher Beck
      February 21, 2013 at 12:14 am

      Just fixed it. I modified line 56.

  11. Danny Mathis
    February 14, 2013 at 1:26 am

    Hey Chris,

    Awesome code! Though it doesn’t look like it correctly parses multi-line values. For instance:

    l_clob :=
    'a,b,c,"d' || chr(13) || chr(10) || 'more", e';

    The values should be: (a), (b), (c), (dCRLFmore), (e)
    But are instead: (a), (b), (c), (d), (more”), (e)

    I believe that’s legal, according to this pseudo-spec: http://tools.ietf.org/html/rfc4180#page-2

    Let me know if you think I’m mistaken.

    Thanks again!
    – Danny

    • Christopher Beck
      February 21, 2013 at 12:10 am

      Good catch. Seems I missed that possibility. But it’s an easy fix. Just change line 56 to

      elsif l_char in ( CARRIAGE_RETURN, LINE_FEED ) and NOT l_enclosed then

      That will then handle it.

      1 declare
      2 l_clob clob :=
      3 'a,b,c,"d' || chr(13) || chr(10) || 'more", e';
      4 begin
      5 parse_csv( l_clob );
      6* end;
      SQL> /
      R1C1: a
      R1C2: b
      R1C3: c
      R1C4: d
      more
      R1C5: e

      PL/SQL procedure successfully completed.

  12. March 18, 2014 at 2:36 pm

    Not working when the last token is enclosed in double quotes; then it will not move to the next row, and the token will contain the EOL. See this example:

    set serveroutput on
    declare
      l_clob clob :=
        '"Email","Phone",Manager Email'||chr(10)||
        '"danciu.anca.elena@oracle.com"," +40 21 367 8461","alexandra.popa@oracle.com"'||chr(10);
    begin
      parse_csv( l_clob );
    end;
    /

    First line is ok, but then everything is on one line.

    This helped to fix it:

    -- add additional branch after line 53 to handle enclosed tokens at end of line
    elsif l_lookahead in ( CARRIAGE_RETURN, LINE_FEED ) then
      l_enclosed := false;
    end if;

    • Christopher Beck
      March 18, 2014 at 3:33 pm

      Great catch. Not sure how this was not discovered before. But the check after line 53 should just be an else

      l_enclosed := false;

      It doesn’t matter what l_lookahead is at that point. If l_char = p_optionally_enclosed and l_lookahead is neither p_optionally_enclosed nor p_delim, then you are no longer enclosed, so l_enclosed := false.

  13. March 28, 2014 at 5:34 pm


  14. shailesh
    April 1, 2014 at 4:01 pm

    Thanks for the amazing code. Just one question: will this process a csv file with a blank line between the records? Is it a valid scenario?

    • Christopher Beck
      April 1, 2014 at 5:59 pm


      1 declare
      2 l_clob clob :=
      3 'A,B,C' || chr(13) || chr(10) ||
      4 chr(10) || chr(13) ||
      5 'D,E,F' || chr(10) || chr(13);
      6 begin
      7 parse_csv( l_clob );
      8 end;
      9 /

      R1C1: A
      R1C2: B
      R1C3: C
      R2C1: **null**
      R3C1: D
      R3C2: E
      R3C3: F

      PL/SQL procedure successfully completed.

  15. Richard
    November 3, 2014 at 1:01 pm

    Hi, I implemented this, and it works great… my main problem, however, is speed. With a 20,000 line CSV file, it takes nearly 8 minutes to extract. What can I do to make this significantly faster? Someone mentioned varchar2 functions, but I’m not sure how you would do that.

    • Christopher Beck
      November 3, 2014 at 3:49 pm

      Well, I know that in at least 12c, and I believe back in 11g as well, substr() and length() are overloaded to support clobs as well as varchar2. So instead of using dbms_lob.substr() and dbms_lob.getLength(), you could try using the native built-ins and see if you get a performance increase. I’d be interested in what you find. I may give it a try myself if I can find some time.

    • Christopher Beck
      November 3, 2014 at 3:53 pm

      Also, in reviewing the code, I could have been more efficient and determined the length of p_clob once, stored it in a local variable, and then referenced that variable instead of calling dbms_lob.getLength() over and over again.
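      The two changes would look roughly like this (note that the native substr() argument order, string/position/length, differs from dbms_lob.substr()'s lob/amount/offset):

```sql
-- in the declaration section: compute the length once
l_len number := length( p_clob );

-- in the loop body:
l_char := substr( p_clob, l_pos, 1 );       -- was dbms_lob.substr( p_clob, 1, l_pos )
exit when l_char is null or l_pos > l_len;  -- was dbms_lob.getLength( p_clob )
```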

    • Christopher Beck
      November 3, 2014 at 4:52 pm

      Well, I just ran a test on the code as it stands above (I just commented out the dbms_output.put_line() calls since I did not want to include that processing in the overall timing) for 20,000 rows, and it ran in 113 seconds on Oracle 12c. And that is before making any of the changes.

      I used…

      declare
        l_clob clob;
        l_start_time number;
      begin
        for i in 1 .. 20000 loop
          l_clob := l_clob || '11111,22222,333333,444444,555555,' ||
                    '666666,777777,888888,999999,101010101010' ||
                    'aaaaa,bbbbb,ccccc,ddddd,eeeee,fffff,ggggg' ||
                    'hhhhh,iiiiiii,jj,kkkkkkkkkkkk,llllllllllll,m' || chr(10);
        end loop;
        l_start_time := dbms_utility.get_time;
        parse_csv( l_clob );
        dbms_output.put_line( (dbms_utility.get_time - l_start_time)/100 );
      end;
      /

      Then I quickly made the changes I specified in the comments above and it ran in 69 seconds, so it looks like you will get a significant increase in performance using the built-ins. But the longer the line being processed, the longer it will take to parse, since you will perform that many more calls to substr().

  16. Mark Wooldridge
    November 5, 2015 at 9:38 pm

    Thanks for posting a very elegant solution. I can post the version of your code I updated to use a buffer but wanted to get your approval first. I would really like to get your input on my buffered implementation and if you find any potential problems or if you have suggestions for a better approach using a buffer.
    I had to add a section to catch the case where the CR was found at the end of a buffer and the first character of the next buffer was the line feed.

    running oracle 11g using virtualbox

    converted blob – clob size: 14583797
    –* line count: 88701
    –* parse time: 689.42

    original changing dbms_lob.substr to substr
    converted blob – clob size: 14583797
    –* line count: 88701
    –* parse time: 1451.97

    converted blob – clob size: 14583797
    –* line count: 88701
    –* parse time: 13.9

    • Christopher Beck
      November 9, 2015 at 4:34 pm

      Send me your code and I can take a look at it.

  17. pungaret
    December 31, 2015 at 10:07 pm

    Wondering if you had a chance to review the buffered solution Mark sent.
    I’m trying to find a solution that can parse 30K rows.

    • Christopher Beck
      January 12, 2016 at 6:29 pm

      I did and I’m working on the last of the testing and hope to have it published by end-of-week.

  18. Rick
    January 11, 2016 at 11:09 pm

    Chris – Can you please share the latest code that uses buffer approach?

    • Christopher Beck
      January 12, 2016 at 6:28 pm

      I’m trying. I know it’s been awhile, but been swamped with work. Hoping to finish up testing and publish by end-of-week.

  19. January 13, 2016 at 1:37 am

    How wide is your data? I’ve written code to parse a clob, and I’ve tested it on texts as large as the King James Bible and War and Peace. It’s not too hard to get decent performance if no single line is greater than 32K.

    • January 13, 2016 at 4:14 am

      I just realized that’s kind of a tease. The code I’m referring to can be found here:

      using the split_clob function there, and str2tbl (Kyte’s or others) to parse each line into rows.
      The queries below parse the bible (4,452,066 characters) in two ways: first into 100,222 lines,
      and then again into 851,334 words.

      Of course this is somewhat artificial because I’m not “doing” anything with the results, I’m simply pumping them as fast as sqlplus will read them.

      But, the point here was to illustrate the parsing speed, the performance of what is done with the data will of course vary on the app.

      I hope it helps!

      Oracle Database 12c Enterprise Edition Release – 64bit Production
      With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

      SQL> set timing on
      SQL> set autotrace traceonly
      SQL> set arraysize 5000
      SQL> select x.* from etexts,table(split_clob(text)) x
      2 where etexts.name = 'King James Bible';

      100222 rows selected.

      Elapsed: 00:00:00.56

      Execution Plan
      Plan hash value: 2639684787

      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
      | 0 | SELECT STATEMENT | | 8168 | 845K| 32 (0)| 00:00:01 |
      | 1 | NESTED LOOPS | | 8168 | 845K| 32 (0)| 00:00:01 |
      |* 2 | TABLE ACCESS FULL | ETEXTS | 1 | 104 | 3 (0)| 00:00:01 |
      | 3 | COLLECTION ITERATOR PICKLER FETCH| SPLIT_CLOB | 8168 | 16336 | 29 (0)| 00:00:01 |

      Predicate Information (identified by operation id):

      2 - filter("ETEXTS"."NAME"='King James Bible')

      29 recursive calls
      0 db block gets
      219 consistent gets
      1524 physical reads
      0 redo size
      4576710 bytes sent via SQL*Net to client
      764 bytes received via SQL*Net from client
      22 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      100222 rows processed

      SQL> select y.* from etexts,table(split_clob(text)) x,TABLE(str2tbl(x.COLUMN_VALUE,' ')) y
      2 where etexts.name = 'King James Bible';

      851334 rows selected.

      Elapsed: 00:00:02.76

      Execution Plan
      Plan hash value: 3493471110

      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
      | 0 | SELECT STATEMENT | | 66M| 6871M| 221K (1)| 00:00:09 |
      | 1 | NESTED LOOPS | | 66M| 6871M| 221K (1)| 00:00:09 |
      | 2 | NESTED LOOPS | | 8168 | 845K| 32 (0)| 00:00:01 |
      |* 3 | TABLE ACCESS FULL | ETEXTS | 1 | 104 | 3 (0)| 00:00:01 |
      | 4 | COLLECTION ITERATOR PICKLER FETCH| SPLIT_CLOB | 8168 | 16336 | 29 (0)| 00:00:01 |
      | 5 | COLLECTION ITERATOR PICKLER FETCH | STR2TBL | 8168 | 16336 | 27 (0)| 00:00:01 |

      Predicate Information (identified by operation id):

      3 - filter("ETEXTS"."NAME"='King James Bible')

      23 recursive calls
      2 db block gets
      250 consistent gets
      1524 physical reads
      0 redo size
      5356930 bytes sent via SQL*Net to client
      2414 bytes received via SQL*Net from client
      172 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      851334 rows processed

  20. Miroslav Kapusta
    May 10, 2019 at 2:12 pm

    Thank you for this utility.
    I have one remark. To make this work with UTF-8 data, you have to define the char variables like this:

    l_char char(1 char);
    l_lookahead char(1 char);

  21. March 19, 2020 at 9:42 pm

    Chris, your little utility here was just what I needed. I’ve adapted to my needs, and it works great; thank you!
