Hi, I have a piece of code which I want to improve. The two methods are splitTextArea and splitGlossArea.
splitTextArea looks at the black and white parts of the page and uses them to separate the sentences (rows of text) from each other, detecting where each one is by its black text.
splitGlossArea does the same job for the annotations and comments.
To improve the code I have changed the value of backcolour1 in splitTextArea from 245 to 200. This increases the number of rows considered to be "on average white", but it does not work well on every type of image.
The program seems to struggle to separate the text from the gloss, and this seems to affect the text splitting. So I have looked at another method called "setTextEdge". It works out where the gap between the gloss and the text is by looking for the most common distance of the first non-white pixel from the left edge or the right edge, depending on whether it is a left or right page. The array "firstMode" records how many times each distance occurs, and the first most common distance is "posMode".
Some possible things I have planned to try but have not got round to (wondering if anyone out there can help!):
1. To do some sort of rolling average, as follows: print out the values in "firstMode" and adjust them so I can use them in the code to see what happens.
2. To consider only distances near where the text edge is expected (say between 100 and 200) rather than right.
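Both ideas can be tried on the "firstMode" counts alone, without touching the image code. Below is a rough sketch, assuming "firstMode" has already been filled the way setTextEdge fills it. The names FirstModeSketch, rollingSum and modeInRange and the sample counts are made up for illustration and are not part of the program.

```java
class FirstModeSketch {

    // Centred rolling sum over (2 * radius + 1) bins; bins outside the array count as 0.
    static int[] rollingSum(int[] hist, int radius) {
        int[] out = new int[hist.length];
        for (int i = 0; i < hist.length; i++) {
            int from = Math.max(0, i - radius);
            int to = Math.min(hist.length - 1, i + radius);
            int sum = 0;
            for (int j = from; j <= to; j++) {
                sum += hist[j];
            }
            out[i] = sum;
        }
        return out;
    }

    // Index of the first largest count in hist[lo..hi] (inclusive), or -1 if the range is empty.
    static int modeInRange(int[] hist, int lo, int hi) {
        lo = Math.max(lo, 0);
        hi = Math.min(hi, hist.length - 1);
        int best = -1;
        int bestCount = -1;
        for (int i = lo; i <= hi; i++) {
            if (hist[i] > bestCount) {
                bestCount = hist[i];
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up counts: a sharp spike from the gloss at 20,
        // and a text edge spread over a few neighbouring distances near 150.
        int[] firstMode = new int[300];
        firstMode[20] = 9;
        firstMode[148] = 3;
        firstMode[149] = 6;
        firstMode[150] = 7;
        firstMode[151] = 6;
        firstMode[152] = 3;

        int[] smooth = rollingSum(firstMode, 2);

        // Print the non-empty bins so raw and smoothed counts can be compared by eye.
        for (int i = 0; i < firstMode.length; i++) {
            if (firstMode[i] > 0) {
                System.out.println(i + ", " + firstMode[i] + ", " + smooth[i]);
            }
        }
        int last = firstMode.length - 1;
        System.out.println("raw mode      = " + modeInRange(firstMode, 0, last));
        System.out.println("smoothed mode = " + modeInRange(smooth, 0, last));
        System.out.println("windowed mode = " + modeInRange(firstMode, 100, 200));
    }
}
```

The rolling sum favours a cluster of nearby distances over a single isolated spike, and the window simply ignores everything outside the expected range. The radius of 2 and the 100 to 200 window are guesses that would need tuning against real pages.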
Here are the splitTextArea and splitGlossArea methods:
Code:
private static void splitTextArea(boolean leftPage) throws IOException
{
    int left = 0,
        right = 0 ;
    if (leftPage)
    {
        left = textEdge ;
        right = leftEdge + cropWidth ;
    }
    else
    {
        left = leftEdge ;
        right = textEdge ;
    }
    int rowWidth = right - left ;
    // int backcolour1 = 253 ;
    int backcolour1 = 200,
        max_light_lines = 1,
        max_dark_lines = 15,
        lightLineCounter = 0,
        darkLineCounter = 0 ;
    boolean inLightZone = true,
            inTextZone = false ;
    int lastY = topEdge,
        descender_space = 3 ;
    // PrintWriter pout2 = new PrintWriter(new FileWriter(outDirectory1 +
    //     File.separator + "xxx.txt")) ;
    noOfRows = 0 ;
    int clearance = 40 ;
    for (int i = topEdge + clearance ; i < topEdge + cropHeight - clearance ; i++)
    {
        int colour = getRowColour(left, right, i) ;
        // pout2.println(i + ", " + colour) ;
        if (colour >= backcolour1)
        {
            lightLineCounter++ ;
            darkLineCounter-- ;
        }
        else
        {
            darkLineCounter++ ;
            lightLineCounter-- ;
        }
        if (lightLineCounter > max_light_lines)
            lightLineCounter = max_light_lines ;
        if (lightLineCounter < 0)
            lightLineCounter = 0 ;
        if (darkLineCounter > max_dark_lines)
            darkLineCounter = max_dark_lines ;
        if (darkLineCounter < 0)
            darkLineCounter = 0 ;
        if ((lightLineCounter == max_light_lines) && (inTextZone))
        {
            inLightZone = true ;
            inTextZone = false ;
            int line = noOfRows + 1 ;
            int rowHeight = i + descender_space - lastY ;
            // message("(" + left + ", " + lastY + ", " + rowWidth + ", " +
            //     rowHeight + ")") ;
            pout1.println("(" + left + ", " + lastY + ", " + rowWidth +
                          ", " + rowHeight + ")") ;
            rows[noOfRows] = new IntegerQuad(left, lastY, rowWidth, rowHeight) ;
            noOfRows++ ;
            lastY = i + descender_space ;
        }
        else if ((darkLineCounter == max_dark_lines) && (inLightZone))
        {
            inLightZone = false ;
            inTextZone = true ;
        }
    }
    // pout2.close() ;
    message("no of lines = " + noOfRows) ;
    pout1.println("No of Lines = " + noOfRows) ;
    pout1.println() ;
} // end of method splitTextArea
private static void splitGlossArea(boolean leftPage)
{
    int left = 0,
        right = 0 ;
    if (leftPage)
    {
        left = leftEdge ;
        right = textEdge ;
    }
    else
    {
        left = textEdge ;
        right = leftEdge + cropWidth ;
    }
    int rowWidth = right - left ;
    if (rowWidth == 0)
    {
        message("WARNING: no gloss area found") ;
        noOfGlossRows = 0 ;
        return ;
    }
    // int backcolour1 = 253,
    int backcolour1 = 245,
        max_light_lines = 20,
        max_dark_lines = 10,
        lightLineCounter = 0,
        darkLineCounter = 0 ;
    boolean inLightZone = true,
            inTextZone = false ;
    int lastY = topEdge,
        descender_space = 17,
        ascender_space = 20 ;
    noOfGlossRows = 0 ;
    int clearance = 40 ;
    for (int i = topEdge + clearance ; i < topEdge + cropHeight - clearance ; i++)
    {
        int colour = getRowColour(left, right, i) ;
        if (colour >= backcolour1)
        {
            lightLineCounter++ ;
            darkLineCounter-- ;
        }
        else
        {
            darkLineCounter++ ;
            lightLineCounter-- ;
        }
        if (lightLineCounter > max_light_lines)
            lightLineCounter = max_light_lines ;
        if (lightLineCounter < 0)
            lightLineCounter = 0 ;
        if (darkLineCounter > max_dark_lines)
            darkLineCounter = max_dark_lines ;
        if (darkLineCounter < 0)
            darkLineCounter = 0 ;
        if ((lightLineCounter == max_light_lines) && (inTextZone))
        {
            inLightZone = true ;
            inTextZone = false ;
            int line = noOfGlossRows + 1 ;
            int rowHeight = i + descender_space - lastY ;
            // message("(" + left + ", " + lastY + ", " + rowWidth + ", " +
            //     rowHeight + ")") ;
            pout1.println("(" + left + ", " + lastY + ", " + rowWidth +
                          ", " + rowHeight + ")") ;
            glossRows[noOfGlossRows] = new IntegerQuad(left, lastY, rowWidth,
                                                       rowHeight) ;
            noOfGlossRows++ ;
            lastY = i + descender_space ;
        }
        else if ((darkLineCounter == max_dark_lines) && (inLightZone))
        {
            inLightZone = false ;
            inTextZone = true ;
            // this overrides other values of lastY,
            // to eliminate gloss whitespace
            lastY = i - max_dark_lines - ascender_space ;
        }
    }
    message("no of glosses = " + noOfGlossRows) ;
    pout1.println("No of Glosses = " + noOfGlossRows) ;
    pout1.close() ;
} // end of method splitGlossArea
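The light/dark counting in the two methods is the same apart from the constants (and the ascender adjustment in splitGlossArea), so it can be tried out on its own. Below is a minimal sketch, assuming the row colours have already been collected into an array. RowSplitSketch, splitRows and the sample page are made-up names and data, and the sketch leaves out the clearance, the ascender adjustment and the output to pout1.

```java
import java.util.ArrayList;
import java.util.List;

class RowSplitSketch {

    // Returns one {startY, height} pair per dark band found in rowColour.
    static List<int[]> splitRows(int[] rowColour, int backColour,
                                 int maxLightLines, int maxDarkLines,
                                 int descenderSpace) {
        List<int[]> bands = new ArrayList<>();
        int light = 0;
        int dark = 0;
        boolean inText = false;
        int lastY = 0;
        for (int y = 0; y < rowColour.length; y++) {
            // Each row pushes one counter up and the other down,
            // with both kept between 0 and their maximum.
            if (rowColour[y] >= backColour) {
                light = Math.min(light + 1, maxLightLines);
                dark = Math.max(dark - 1, 0);
            } else {
                dark = Math.min(dark + 1, maxDarkLines);
                light = Math.max(light - 1, 0);
            }
            if (light == maxLightLines && inText) {
                // Enough light rows after text: close the current band.
                inText = false;
                bands.add(new int[] {lastY, y + descenderSpace - lastY});
                lastY = y + descenderSpace;
            } else if (dark == maxDarkLines && !inText) {
                // Enough dark rows: we are inside a band of text.
                inText = true;
            }
        }
        return bands;
    }

    public static void main(String[] args) {
        // Made-up page: white rows (255) with two dark bands (100).
        int[] rowColour = new int[60];
        for (int y = 0; y < rowColour.length; y++) {
            boolean darkRow = (y >= 10 && y < 30) || (y >= 40 && y < 55);
            rowColour[y] = darkRow ? 100 : 255;
        }
        // Same constants as splitTextArea: 200, 1, 15, 3.
        for (int[] band : splitRows(rowColour, 200, 1, 15, 3)) {
            System.out.println("(" + band[0] + ", " + band[1] + ")");
        }
    }
}
```

Running it on a dump of the real getRowColour values (as the commented-out pout2 lines would produce) might make it easier to see how each constant changes the bands without re-processing the image.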
And here is the code for the setTextEdge method.
Code:
private static int setTextEdge(boolean leftPage)
{
    int[] firstMode = new int[imageWidth + 1] ;
    for (int i = 0 ; i <= imageWidth ; i++)
        firstMode[i] = 0 ;
    for (int i = 0 ; i < cropHeight ; i++)
    {
        int mx1 = 0 ;
        if (leftPage)
            while ((mx1 < cropWidth) &&
                   (getColour(leftEdge + mx1, topEdge + i) > 200))
                mx1++ ;
        else
            while ((mx1 < cropWidth) &&
                   (getColour(leftEdge + cropWidth - mx1, topEdge + i) > 200))
                mx1++ ;
        if (mx1 < (cropWidth / 2))
            firstMode[mx1]++ ;
    }
    int maxMode = -1,
        posMode = -1 ;
    for (int i = 0 ; i < imageWidth ; i++)
    {
        if (maxMode < firstMode[i])
        {
            maxMode = firstMode[i] ;
            posMode = i ;
        }
    }
    posMode -= 2 ;
    if (leftPage)
    {
        message("left edge of text is at " + (leftEdge + posMode)) ;
        return leftEdge + posMode ;
    }
    else
    {
        message("right edge of text is at " + (leftEdge + cropWidth - posMode)) ;
        return leftEdge + cropWidth - posMode ;
    }
} // end of method setTextEdge
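One thing that might be worth checking in the right-page branch: when mx1 is 0 it reads column leftEdge + cropWidth, which looks like one pixel past the crop if the crop covers columns leftEdge to leftEdge + cropWidth - 1. That is an assumption about how the crop is defined, not something the code above confirms. Below is a small stand-alone sketch of the same scan on a single row of grey values, with the right-hand scan starting at the last column. EdgeScanSketch and firstDarkDistance are made-up names.

```java
class EdgeScanSketch {

    // Same cut-off as setTextEdge: values above this count as white.
    static final int WHITE_THRESHOLD = 200;

    // Distance from the page edge to the first non-white pixel,
    // or row.length if the whole row is white.
    static int firstDarkDistance(int[] row, boolean leftPage) {
        int width = row.length;
        int d = 0;
        while (d < width) {
            // Left pages scan from column 0, right pages from the last column.
            int x = leftPage ? d : width - 1 - d;
            if (row[x] <= WHITE_THRESHOLD) {
                break;
            }
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        int[] row = {255, 255, 50, 255, 255, 255};
        System.out.println("from left:  " + firstDarkDistance(row, true));
        System.out.println("from right: " + firstDarkDistance(row, false));
    }
}
```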
Hope someone can help me.
Sorry about this being quite a long post
Reply as soon as you can
Thanks
Mark