PL/SQL String Tokenizer

Posted by jt - 03/05/09 at 07:05 am

One of my goals for Log4Ora is to make it easy to use.   Has the ability for you to set the module name for logging.  This is handy in the logging so you can see where in the code the log message came from.  Downside of the Log4PLSQL implementation is you have to set the value manually.  Meaning, it might not be set, or forgotten to be reset, or is incorrect all together due to a cut an paste error.

In Log4Ora, I wanted this data, but I wanted to take the developer out of it, and fetch the information automatically via reflection.  But, PL/SQL does not have reflection.  Chet pointed me to an ‘Ask Tom’ utility, which does reflection, kind of.   Here is a link to the source code for ‘who_called_me’.

Tom’s utility fetches the program name from the output of the Oracle supplied function dbms_utility.format_callstack.  Here is an example of what the Oracle tool will produce:

----- PL/SQL Call Stack -----
object      line  object
handle    number  name
0x9f4ff770        27  package body LOG4ORA.LOG4_CORE
0x9f4ff770       243  package body LOG4ORA.LOG4_CORE
0x9f4ff770       215  package body LOG4ORA.LOG4_CORE
0x94df0178        23  package body LOG4ORA.LOG4
0xa7950748         5  package body JT.PACKAGE_A
0x7a259278         2  anonymous block

I setup a test of Tom’s utility, and it failed.  I traced the problem back to a substring function.

  IF ( cnt = 3 ) THEN
    lineno := TO_NUMBER(SUBSTR( line, 13, 6 ));
    line   := SUBSTR( line, 21 );

I imagine this worked fine on a previous version of Oracle.  But running in 11g, the columns of text had shifted.   The design was fragile and tied the program logic to specific positions in the string.  Should the string format change, it would break.

Java has a very handy String Tokenizer class which is used to convert a line of words into an array of words.  This allows you to easily parse space, comma, pipe, or tab separated values into an array.   I wanted to do something similar here.  PL/SQL does not have a built in function for this.  (Hey Oracle, this would be nice for a future release!)

After a little search on google, I found exactly what I was looking for here.  Oracle added Regular Expressions in the 10g release.  This is a very powerful tool.  The fifth comment on the blog post had exactly the function I was looking for.  It takes a string and the separater value and returns an array.

    FUNCTION Tokenizer (p_string IN VARCHAR2, p_separators IN VARCHAR2)
    RETURN DBMS_SQL.varchar2s IS
        l_token_tbl DBMS_SQL.varchar2s;
        pattern VARCHAR2(250);
        BEGIN
 
        pattern := '[^(' || p_separators || ')]+' ;
 
            SELECT   REGEXP_SUBSTR (p_string, pattern, 1, LEVEL) token
              BULK   COLLECT
              INTO   l_token_tbl
              FROM   DUAL
             WHERE   REGEXP_SUBSTR (p_string, pattern, 1, LEVEL) IS NOT NULL
        CONNECT BY   REGEXP_INSTR (p_string, pattern, 1, LEVEL) > 0;
 
        RETURN l_token_tbl;
    END Tokenizer;

With this in place, I no longer need the substring function. My logic changes to something like this:

                vTokens := tokenizer(line, ' ');
 
                -- module name will shift due to word count of type
                IF vTokens.LAST = 5 THEN
                     vModule_type := vTokens(3) || ' ' || vTokens(4);
                     vModule := vTokens(5);
                ELSE
                    vModule_type := vTokens(3);
                    vModule := vTokens(5);
                END IF;

My code is still very much tied to the string created by Oracle. But in this version, it is no longer sensitive to changes in white space or word size. Plus I think this is a little more straight forward an easier to work with.

Share

2 Responses to “PL/SQL String Tokenizer”

  1. Mark says:
    July 27th, 2009 at 11:20 am

    So what did you end up with for a function to return the name of the calling module name?

  2. Jessie says:
    March 1st, 2012 at 3:10 pm

    What do you recommend doing with larger strings? I believe there is a limit of 4000 characters for the tokenizer. It will generate a message

    SQL Error: ORA_01460: unimplemented or unreasonable conversion requested

Leave a Reply