Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

PureData (a.k.a. Netezza, PostgreSQL) ERROR parsing problem

Status
Not open for further replies.
Dec 12, 2007
19
US
If anyone has the time to read this problem statement and can tell me how to use Perl to read the input records, write them as is and then read those to then format the output as shown below, I would really appreciate it. We have a lot of business users spending a lot of time trying to solve syntax errors and if I could figure out how to read this log file and process it using Perl, I know how to take the output and email to our business users shortly after they have a syntax error and this may save a lot of people a lot of time. Thanks in advance.

Our company bought an appliance from IBM called PureData for our data warehouse. PureData used to be known as Netezza and it appears that the software was originally PostgreSQL. I guess IBM named it PureData so it would sound like Teradata. It may sound like it but that's about it.

When people at our company run SQL (Structured Query Language) using this relational database and get a syntax error, this software returns a message like this.

error found "A" (at char 195) expecting a keyword

The problem is that this error does not display enough of the SQL statement to quickly find the syntax error and so people are manually counting the character offset to find where the problem is. Imagine if this char 195 was char 1,202 ??? Needless to say this is taking people a lot of time.

I found some PostgreSQL log files and the records below show an example of an SQL stateuemtn that has an ERROR.

How do I read and format records like this into lines like the example I show below that will have 99 characters per line with numbered lines above each SQL line so we can quickly fine the character offset into our SQL and see where the syntax error is located ???

Here are the input records that I would like to read and write into a file and then the additional formatted output records with the character offset count above each line.

Input Records to Read and Write

2014-11-23 13:39:16.171213 EST [62534] ERROR: 'select distinct
a.plcy_nb,
b.vhcl_ky
from
edwp06.edw.plcy_trm as a,
edwp06.edw.vhcl as b
where
a.plcy_ky = b.plcy_ky and
a.plcy_evnt_ky = b.plcy_evnt_ky and
a.rtd_st_cd = 'SD'
a.plcy_efctv_typ_cd = 'N' and
date(a.isu_actvty_ts) between '2014-10-01' and '2014-10-31''
error ^ found "A" (at char 195) expecting a keyword

Additional output records that I would like to write with the SQL statement formatted with 99 characters per output line. The example below has to work for SQL statements that have thousands of characters if not tens of thousands because we have some business users who come up with pages and pages of complex analytical SQL that joins many tables and is quite complex at times. Solving syntax errors in long SQL statements is taking a very long time with this database and the type of ERROR information it returns. (Warning, when I entered this there are 9 spaces between the numbers on the 3 lines above the 4th number line and the SQL appears below the correct numbers but when I previewed this everything is pushed back to the left so some sort of font or something must be doing this. Hopefully, you can understand what my idea is to try to solve this problem.)

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
EST 62534 ERROR 'select distinct a.plcy_nb, b.vhcl_ky from edwp06.edw.plcy_trm as a, edwp06.edw.vh
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
cl as b where a.plcy_ky = b.plcy_ky and a.plcy_evnt_ky = b.plcy_evnt_ky and a.rtd_st_cd = 'SD' a.plc
0 0 0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
y_efctv_typ_cd = 'N' and date(a.isu_actvty_ts) between '2014-10-01' and '2014-10-31''

Thanks again.
 
Give this a try. It's totally brute force, but for the sake of getting a job done over elegance --
Code:
#
# Reading in
#
undef %org_data ;
$line_num   = 0 ;
$char_count = 0 ;

while ($inline = <DATA>) {
   $line_str = substr('00000' . ++$line_num, -5) ;
   chomp $inline ;   # ?? Does your 'at char xxx' count return's ??

   # Your call if you want to count the charactes before or after
   # here.  Dump the data and see how it looks, then try the
   # other way.
   $org_data{$line_str}{'char_count'} = $char_count ;
   $org_data{$line_str}{'inline'}  = $inline ;

   $char_count += length($inline) ;

};

#
# Writing out
#
$buffer_100_chars = "" ;
@buffer_arr = () ;
foreach $line_str (sort keys %org_data) {
   $char_count = $org_data{$line_str}{'char_count'} ;
   $char_count = substr('00000' . $char_count, -5) ;

   $inline     = $org_data{$line_str}{'inline'} ;
   print "$line_str $char_count $inline\n" ;  # optional print to file handle

   # This may be overkill, but perl will have it's limits on
   # how many characters any given var chan hold.  Just taking
   # care of it in advance.
   $buffer_100_chars .= $inline ;
   while (length($buffer_100_chars) > 100) {
      $tmp_str = substr($buffer_100_chars, 0, 100) ;
      $buffer_100_chars = substr($buffer_100_chars, 100) ;
      push(@buffer_arr, $tmp_str) ;
   };
};
push(@buffer_arr, $buffer_100_chars) ; # tack on any remainder

$zero2ninetynine = "/123456789" x 10 ;
$tens = "" ; for ($i = 0; $i < 10 ; ++$i) { $tens .= "$i         " ;};
$tens =~ s/ +$// ;
$char_count = 0 ;
foreach $inline (@buffer_arr) {
   if ($char_count >= 1000) {
      $ref = int($char_count / 1000) % 10 ;
      $character_count_ref = "1000's    " . "$ref         " x 9 ;
      print "$character_count_ref\n" ;
   };
   if ($char_count >= 100) {
      $ref = int($char_count / 100) % 10 ;
      $character_count_ref = "100's     " . "$ref         " x 9 ;
      print "$character_count_ref\n" ;
   };
   print "$tens\n" ;
   print "$zero2ninetynine\n" ;
   print "$inline\n" ;
   $char_count += 100 ;
};
Caveats on this:
- I used <DATA> to read this in. You may have your own file handle. You may need to add open/close statements for your file.
- I see the original data ends in what appears to be keywords and/or recognizeable points where a trailing space may exist in the source but not in the saved file error file. This will throw character character count off.
- I used '/123456789' vs '0123456789' - For my eyes, the / is faster to see than 0
- Your posting appeared to strip some characters off the very first line of data (ie: "2014-11-23 13:39:16.171213" is missing). You may need to parse this off before going into the main read loop
- I gave this the basic works-for-me-your-results-may-vary type of testing.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top