Hi there,
Explanation about the problem, what the script is supposed to do and what it actually does.
We use document scanners to digitize and sort documents automatically: OCR zoning plus a VBScript (.vbs) put the PDFs into the correct subfolders on the server.
The script is supposed to check whether the OCR zone contains valid content. If it does, the file is classified into the matching folder; if not, it goes into a "to be sorted" folder.
When a paper document is scanned, 99% get sorted perfectly.
Rejected documents land in a specific "to be sorted" folder as TIFF and are re-injected manually. This happens when the initial process was interrupted by timeouts, network errors, etc.
My guess: the problem sits in one of the steps between "receiving Text Zone" and "Text zone after space trim". These steps seem to affect the paper scan and the re-processed TIFF file differently.
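One way to test this guess would be to compare the zone text from both cases character by character, since an invisible character that survives the trim would make two strings that look identical in the log compare as different. Here is a minimal sketch in Python (not our actual vbs). The function names and the two sample strings are made up for illustration and are not taken from our logs.

```python
def describe_chars(text):
    """Return (printable form, Unicode code point) for every character."""
    return [(repr(ch), f"U+{ord(ch):04X}") for ch in text]


def first_difference(a, b):
    """Return the index of the first differing position, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they differ where it ends.
        return min(len(a), len(b))
    return None


# Hypothetical sample values, NOT real log content.
paper = "22-08-0155-01"
tiff = "22-08-0155-01\u00a0"  # same reference plus a trailing non-breaking space

index = first_difference(paper, tiff)
if index is not None:
    # Show the code points from the first difference onwards.
    print(index, describe_chars(tiff)[index:])
```

If the two dumps differ only in such a character, the trim step would be the place to look, since a trim that only removes ordinary spaces leaves it in place.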
I'll attach the rar with the txt files (the post only allows one attachment, but it's helpful to see the difference between both cases).
The same script is used for paper scans and for re-processing the TIFF files that fall into a "rejected" folder for unknown failure reasons.
For context: our references are built like this: 22-08-0155-01, meaning year, month, file number and version. Our server folders look like this:
\\blablabla\missions\2022\08\0155\ <=== the file is supposed to land here. 99% of the files land here after paper scanning.
\\blablabla\missions\to be sorted\ <=== this is where literally everything that is re-processed lands. 10% would be okay (no input in the OCR zone, etc.), but currently nothing gets sorted. BUT, see the log file: you can see that the trimming works and the reference number is found, and yet... "match not found."
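To make the expected behaviour concrete, here is a minimal sketch of the mapping from reference to folder, written in Python rather than taken from our vbs. The function name, the regular expression and the assumption that a two-digit year always means 20xx are mine for illustration only.

```python
import re

# Reference layout: year-month-filenumber-version, e.g. 22-08-0155-01
REFERENCE_PATTERN = re.compile(r"(\d{2})-(\d{2})-(\d{4})-(\d{2})")


def target_folder(zone_text, root=r"\\blablabla\missions"):
    """Return the folder a scan should land in, based on the OCR zone text."""
    match = REFERENCE_PATTERN.search(zone_text)
    if match is None:
        # No valid reference in the zone: fall back to manual sorting.
        return root + "\\to be sorted\\"
    year, month, file_number, _version = match.groups()
    # Assumption: two-digit years belong to the 2000s.
    return f"{root}\\20{year}\\{month}\\{file_number}\\"


print(target_folder(" 22-08-0155-01 "))
print(target_folder("nothing useful in the zone"))
```

The first call gives the 2022\08\0155 folder and the second falls back to "to be sorted", which is what I expect from the real script for both paper scans and re-processed TIFFs.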
Sorry for the tons of text. As the problem is very specific, I thought you might want a lot of info to help isolate the potential error source.
Contents of the rar:
Error log tiff scan - the log I get when I re-process the document. Look for the "no match found !" message. That's the issue I'm trying to figure out.
Success log paper scan - same script running on a document scan - works like a charm.
Script - the magic happens here
Many thanks to those who took the time to read my post and still have the courage to look into the script & log files!
TF