
Large File Problem

Status
Not open for further replies.

clk430

Programmer
Sep 21, 2005
235
US
OK, so we have a once-a-month file from a trading partner (receive transaction) that can be anywhere from 400 MB to 4 GB. I discovered that Mercator choked on the last file, which was 400 MB, so I want to play with the PageCount and PageSize settings.

Default is:
PageSize - 64 KB (max of 1024)
PageCount - 8 (max of 9999)

Would I simply calculate PageSize x PageCount = 400 MB, or are there other ways to take care of this?
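Not Mercator syntax, but the arithmetic behind that question can be sketched in Python (the function name is made up, and the idea that the workspace should cover the input is an assumption; as noted later in the thread, these settings are a performance knob, not a hard size limit):

```python
# Back-of-the-envelope only: PageSize (KB) x PageCount gives the
# in-memory workspace. Whether that must cover the whole input is an
# assumption; paging can spill to disk, so these settings mainly
# affect performance rather than imposing a file-size ceiling.
def workspace_kb(page_size_kb, page_count):
    return page_size_kb * page_count

print(workspace_kb(64, 8))        # default: 512 KB
print(workspace_kb(1024, 9999))   # documented maximums: 10238976 KB (~9.8 GB)
```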
 
Probably the easiest way is to copy the map and import it into a V8.0 Design Studio. From there, using a standard input file, you can run a wizard that determines the best settings for you. Note those settings and reimport them into your standard build map.

Otherwise it is trial and error for PageSize/PageCount. Increase PageSize if it's a complex type tree, and PageCount if the file is large.

Often with large files or large output cards it's best not to use the Sink option but instead to use !create; we have seen out-of-memory failures with Sink.

Tim
 
The only V8.0 we have is tomato juice....not design studio.

I guess I'll just play around with the PageCount like you said, because the type tree is not that complex.
 
Ugh, nothing is working. I think 400 MB is just too big.
 
OK, I know others have had more success than I have playing with PageSize/PageCount. I've only realized minimal gain after loads of effort. V8.0 should change that: it's supposed to be able to determine optimal sizing, and it can also point out where you may have performance issues.

In your case, I suggest you look into using burst mode and evaluate both your map and tree structures.

Certain map designs will consume resources fast. For example (and this is very simplified), one of our first maps passed the entire file to the first-level f_map, then half of that to the next, and so on. In your case, assuming you don't pass any additional data to the f_maps, a 400 MB file will consume in excess of 1000 MB of resources. Add additional f_maps and/or output cards on top of that and boom: failure.
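The arithmetic in that example can be sketched in Python (illustrative only; the halving scheme comes from the description above, while the function name and level count are assumptions):

```python
# Each functional-map level holds its own copy of the data it receives,
# then passes half down to the next level, so the copies alone sum
# toward 2x the top-level pass (about 800 MB for a 400 MB input).
# The original input buffer plus extra f_maps and output cards push
# the total past the 1000 MB figure quoted above.
def total_held_mb(input_mb, levels):
    held, chunk = 0.0, float(input_mb)
    for _ in range(levels):
        held += chunk   # this level's copy
        chunk /= 2      # half is passed to the next level
    return held

print(total_held_mb(400, 10))   # ~799 MB in f_map copies alone
```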

In this case we redefined the type tree and passed down much smaller data chunks, then turned on burst mode. We stopped having problems and reduced overall processing time for the file in general. I mocked up a 2 GB file and tested it on a W2K server with 2 GB of memory and a 1.2 GHz CPU. It ran fine.

Other things you can try...

--When working with large files, minimize the use of SINK files. Define temporary outputs as File: !create.

--If executing a runmap with large files, don't 'echoin' the data. Pass files as addresses, e.g. ' -if1 '.

--If you are working with large files, don't have one map do all of the work unless you have to. I've seen maps that sort, concatenate common records/groups, remove trailing filler, split one large file into multiple smaller files, and so on.

There are probably a hundred ways to skin this.... If you're able, can you give a simple definition of your data structure? Let's see what people have to say.
 
Well, the map is already set up using SINK files and a plethora of echoin statements. It actually is a simple map that brings in this file, splits it (copies it), and routes it to two folders. One output card writes to a temp folder and the other writes to an archive. It runs out of memory on the second output card.


But like you said, there are probably a hundred ways to skin this... so I'll look into all your advice and fine-tune/rewrite accordingly. I really appreciate all your help.
 
If you can, test using !create instead of 'sink'... you might get past the second card.

"...One output card write to a temp folder and the other write to a archive. And it runs out of memory on the 2nd card output...."

Is the map using a rule to 'PUT' the data to each location, or is the output file & location defined in the output card settings?

Probably a dumb question, but, assuming you are archiving the output in the second card: are you archiving output card #1, or processing the data a second time?
 
Well, this map has one input parameter (the 400 MB file), which is the file (on source event).

It also has three output cards:
Out Card #1 settings = PUT-Target-Sink
Out Card #2 settings = PUT-Target-Sink
Out Card #3 settings = PUT-Target-File. Basically it just builds a trigger file.

In the map, the input card is the bytestream file.

For the output cards, the only difference between the first two is the destination (Archive folder vs. Temp).
The first out card command is:

=RUN("MyCopy","-WM -IE1S" + NUMBERTOTEXT( SIZE(File1)) + " " + File1 + "-OF1 " + "**our directory**\Archive\" + SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") + ".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) + ".t" +LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())))

The second out card is:

=RUN("MyCopy","-WM -IE1S" + NUMBERTOTEXT( SIZE(File1)) + " " + File1 + "-OF1 " + "**our directory**\Temp\" + SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") + ".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) + ".t" +LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())))

And the last one is simply:
=trigger


To me, it looks like we are processing/validating/building the data twice just to rename it.
I'm not sure what -WM, -IE1S, and -OF1 mean yet. Ugh.
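For what it's worth, the date/time suffix those RUN rules build is easier to read spelled out. Here's a hypothetical Python rendering, assuming LEAVEALPHANUM(DATETOTEXT(...)) reduces the date to "YYYYMMDD" (so RIGHT(...,2) is the day and LEFT(...,4) is the year) and the time reduces to "HHMMSS":

```python
# Hypothetical equivalent of the suffix logic in the RUN rules above.
# The "YYYYMMDD"/"HHMMSS" shapes are assumptions about what
# DATETOTEXT/TIMETOTEXT yield once non-alphanumerics are stripped.
from datetime import datetime

def archive_suffix(now):
    d = now.strftime("%Y%m%d")               # LEAVEALPHANUM(DATETOTEXT(...))
    t = now.strftime("%H%M%S")               # LEAVEALPHANUM(TIMETOTEXT(...))
    return ".d" + d[-2:] + d[:4] + ".t" + t  # ".d" + RIGHT(d,2) + LEFT(d,4) + ".t" + t

print(archive_suffix(datetime(2005, 9, 21, 14, 30, 5)))  # .d212005.t143005
```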
 
And the !create... I can't seem to find that parameter in the Event Server map settings.
 
target rule\put\target=file 'file_path\file_name'
onsuccess\!create
onfailure\(either rollback or create)


Hmmm... would this help? Add a second output card to "MyCopy", maybe call it 'Archive'. The Archive/OC2 rule would be '=OC1'. That way you aren't processing twice.

=RUN("MyCopy",
"-WM -IE1S" + NUMBERTOTEXT( SIZE(File1)) + " " + File1 +
" -OF1 " + "**our directory**\Archive\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())) +
" -OF2 " + "**our directory**\Temp\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())))



Assuming File1 is your input and is a physical file, this might free up RAM:

" -IF1 " + getresourcename( File1 )

instead of
" -IE1S " + NUMBERTOTEXT( SIZE(File1)) + " " + File1


also you could try

VALID(RUN("MyCopy",
" -IF1" + NUMBERTOTEXT( SIZE(File1)) + " " + File1 +
" -OF1 " + "**our directory**\Archive\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())) +
" -OF2 " + "**our directory**\Temp\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME()))),
("Failure on RUN - MyCopy (" + TEXT ( LASTERRORCODE ( ) ) + "):" + LASTERRORMSG ( ) ) )

 
SORRY, you'll have to test this anyway, but I think the rule should have been...

VALID(RUN("MyCopy",
" -IF1 " + getdirectory(File1) +
" -OF1 " + "**our directory**\Archive\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME())) +
" -OF2 " + "**our directory**\Temp\" +
SUBSTITUTE(GETFILENAME(File1), GETDIRECTORY(File1), "") +
".d" + RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2) +
LEFT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),4) +
".t" +
LEAVEALPHANUM(TIMETOTEXT(CURRENTTIME()))),
("Failure on RUN - MyCopy (" + TEXT ( LASTERRORCODE ( ) ) + "):" + LASTERRORMSG ( ) ) )
 
EyeTry, it worked! Well, kind of. I used -WU instead of -WM, and !create as well.

The map completed successfully and the file was placed in the correct folder; however, it had 0 bytes. In other words, the 400 MB of data disappeared. :)

Any ideas why it would create an empty file?
 
Nope. A simple rename of the file and a move to a different folder. My relationship with Mercator is one of love and hate.
 
Set StreamMaxMemLimit in the .ini file to 10 and make sure the path is valid. The page size and count settings are for performance, not for file size.

Depending on version and platform, there could be a 2 GB limit to the total size of data being handled; or, if on UNIX, you could have ulimit or system setting issues.

The best way to handle large files is to use burst mode, if the file is burstable. You might want to use a map to break the large file up into smaller sub-files before doing the complete validation and processing.
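Outside of Mercator, the pre-splitting idea can be sketched like this (Python, purely illustrative; the function and file names are made up, and byte-based chunking is only safe if your records are fixed-length or you realign on record boundaries afterwards):

```python
# Split one large input into fixed-size sub-files so each piece can be
# validated and processed separately. Chunks are cut on raw byte counts,
# so records spanning a boundary would need realignment in a real run.
import os

def split_file(path, chunk_bytes, out_dir):
    parts = []
    with open(path, "rb") as src:
        n = 0
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            part = os.path.join(out_dir, "part%04d.dat" % n)
            with open(part, "wb") as dst:
                dst.write(chunk)
            parts.append(part)
            n += 1
    return parts
```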



BocaBurger
<===========================||////////////////|0
The pen is mightier than the sword, but the sword hurts more!
 
Was there any interesting info in MyCopy's log file? Were the source file byte sizes what you'd expect? If your map is bursting, what were your burst settings?

If you are using functional maps to process the input, confirm that all of the f-map input requirements are being met. For example, if only 3 of 4 inputs contain data, your f-map will not be executed. Depending on where that condition occurs, you'd get a zero-byte output file.

----------------------------------------------------------
Hey, BTW......

RIGHT((LEAVEALPHANUM(DATETOTEXT(CURRENTDATE()))),2)

might be rewritten as

FROMDATETIME(CURRENTDATE(), "DD")
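The same simplification in Python terms, for comparison (illustrative only): pulling the day out with a format token instead of converting, stripping, and slicing the whole date string.

```python
# Both expressions yield the two-digit day; the second skips the
# convert-strip-slice detour entirely.
from datetime import date

d = date(2005, 9, 21)
via_slicing = d.strftime("%Y%m%d")[-2:]  # like RIGHT(LEAVEALPHANUM(DATETOTEXT(...)), 2)
via_token   = d.strftime("%d")           # like FROMDATETIME(CURRENTDATE(), "DD")
print(via_slicing, via_token)            # 21 21
```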



 