PL/SQL String Tokenizer
Posted by jt - 03/05/09 at 07:05 amOne of my goals for Log4Ora is to make it easy to use. Has the ability for you to set the module name for logging. This is handy in the logging so you can see where in the code the log message came from. Downside of the Log4PLSQL implementation is you have to set the value manually. Meaning, it might not be set, or forgotten to be reset, or is incorrect all together due to a cut an paste error.
In Log4Ora, I wanted this data, but I wanted to take the developer out of it, and fetch the information automatically via reflection. But, PL/SQL does not have reflection. Chet pointed me to an ‘Ask Tom’ utility, which does reflection, kind of. Here is a link to the source code for ‘who_called_me’.
Tom’s utility fetches the program name from the output of the Oracle supplied function dbms_utility.format_callstack. Here is an example of what the Oracle tool will produce:
----- PL/SQL Call Stack ----- object line object handle number name 0x9f4ff770 27 package body LOG4ORA.LOG4_CORE 0x9f4ff770 243 package body LOG4ORA.LOG4_CORE 0x9f4ff770 215 package body LOG4ORA.LOG4_CORE 0x94df0178 23 package body LOG4ORA.LOG4 0xa7950748 5 package body JT.PACKAGE_A 0x7a259278 2 anonymous block |
I setup a test of Tom’s utility, and it failed. I traced the problem back to a substring function.
IF ( cnt = 3 ) THEN lineno := TO_NUMBER(SUBSTR( line, 13, 6 )); line := SUBSTR( line, 21 ); |
I imagine this worked fine on a previous version of Oracle. But running in 11g, the columns of text had shifted. The design was fragile and tied the program logic to specific positions in the string. Should the string format change, it would break.
Java has a very handy String Tokenizer class which is used to convert a line of words into an array of words. This allows you to easily parse space, comma, pipe, or tab separated values into an array. I wanted to do something similar here. PL/SQL does not have a built in function for this. (Hey Oracle, this would be nice for a future release!)
After a little search on google, I found exactly what I was looking for here. Oracle added Regular Expressions in the 10g release. This is a very powerful tool. The fifth comment on the blog post had exactly the function I was looking for. It takes a string and the separater value and returns an array.
FUNCTION Tokenizer (p_string IN VARCHAR2, p_separators IN VARCHAR2) RETURN DBMS_SQL.varchar2s IS l_token_tbl DBMS_SQL.varchar2s; pattern VARCHAR2(250); BEGIN pattern := '[^(' || p_separators || ')]+' ; SELECT REGEXP_SUBSTR (p_string, pattern, 1, LEVEL) token BULK COLLECT INTO l_token_tbl FROM DUAL WHERE REGEXP_SUBSTR (p_string, pattern, 1, LEVEL) IS NOT NULL CONNECT BY REGEXP_INSTR (p_string, pattern, 1, LEVEL) > 0; RETURN l_token_tbl; END Tokenizer; |
With this in place, I no longer need the substring function. My logic changes to something like this:
vTokens := tokenizer(line, ' '); -- module name will shift due to word count of type IF vTokens.LAST = 5 THEN vModule_type := vTokens(3) || ' ' || vTokens(4); vModule := vTokens(5); ELSE vModule_type := vTokens(3); vModule := vTokens(5); END IF; |
My code is still very much tied to the string created by Oracle. But in this version, it is no longer sensitive to changes in white space or word size. Plus I think this is a little more straight forward an easier to work with.
A bunch of random technology stuff that has my attention. I work with a lot of Oracle, Java, and dabble with various open source software packages.
July 27th, 2009 at 11:20 am
So what did you end up with for a function to return the name of the calling module name?
March 1st, 2012 at 3:10 pm
What do you recommend doing with larger strings? I believe there is a limit of 4000 characters for the tokenizer. It will generate a message
SQL Error: ORA_01460: unimplemented or unreasonable conversion requested