
BatchFile Output redirection > with Wscript.shell


stanlyn

Hi,

I'm having an issue running a batch file from Wscript.Shell that contains a redirector (">") to a file. The batch file works perfectly if run from Explorer by double-clicking it, or from the command line with cmd. When running it via a VFP program, it creates an empty file (0 KB). Here are the batch file contents:
Code:
whois.exe  -v  yahoo.com >C:\temp\output.txt

Using VFP's RUN command produces the same result.

Breaking the redirect out of the .bat file and passing it as an external parameter also produces the same result.

The only way I've been able to get a non-empty output.txt file is outside of VFP, at the command line or from Explorer.
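For context, here is a minimal sketch of the kind of Wscript.Shell call involved; the batch file name, the explicit cmd.exe /c wrapper, and the window-style/wait arguments are illustrative assumptions, not code from this thread (the ">" redirector is a cmd.exe feature, so a command interpreter has to be involved for it to apply):
Code:
* Sketch only - object, arguments and paths are assumptions:
loShell = CREATEOBJECT("WScript.Shell")
* 0 = hidden window, .T. = wait for the batch to finish
loShell.Run('cmd.exe /c "C:\temp\whois.bat"', 0, .T.)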

Environment is Windows 10 Pro 64-bit and VFP9 SP2.

Any ideas?

Thanks, Stanley
 
Chriss,
The initial response is nil; it's about 1 second before the 1st batch file is created and executed. It continues to create and execute the batch files with nowait as they are created. Once 500 are created, the program goes into cleanup mode by issuing ADIR() to get a list of the .txt files into a cursor, then iterating through them and deleting them, as well as the matching .bat file that was processed by that run. Once the cleanup has finished, it starts another batch of 500 and continues this over and over.

I still need some control over whether a file can be deleted safely, by issuing FOPEN() on it. If FOPEN() returns anything but -1, then that file has been processed and is safe to delete. The problem I'm having is that it always returns -1 on every file, even the finished ones where the corresponding whois.exe process is long gone from the process list.

One of my tests was batching 100 names, and the performance got worse, by a lot.

Any idea why FOPEN() returns -1 on all the processed and closed files?

See the slowness from the screenshot of the 100 batch list...

Thanks, Stanley

20211212_223412_pa2ow1.jpg

 
 https://files.engineering.com/getfile.aspx?folder=9a2540ef-3724-4a68-b942-76702e027909&file=20211212_221545.jpg
Are you starting the next batch only after all of the previous batch is processed? You should interleave adding processes and harvesting the result files.

If you get -1 every time from FOPEN(), then I'd say it's because you aren't passing it existing file names. Use FULLPATH(). And you're not always creating the same file, are you?
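A minimal sketch of that check (the folder and file name here are hypothetical):
Code:
* Resolve to a full path before FOPEN(), as suggested above:
lcFile   = FULLPATH("output\123456.txt")
lnHandle = FOPEN(lcFile, 12)   && 12 = read/write, unbuffered; -1 while whois.exe still holds the file
IF lnHandle >= 0
    = FCLOSE(lnHandle)
    * finished: safe to process and delete
ENDIF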

Just as a little manual check: after running one whois.exe I can get a file handle, no problem.

Chriss
 
Also, looking at the timing from your screenshots, isn't 4040 after 7:39 (with 100 files per batch) better than 4004 after 20:14 min (with 500 per batch)? Even much faster.

As said, I would adapt: measure each single process time. In itself it has no big meaning, as the response times vary so much, but when the average time rises, lower the number of concurrent processes; when it falls, raise the process number. The number to optimize is domains/minute, obviously.

Chriss
 
stanlyn said:
Rick,
Thanks for the reply and offer. You are way over my head, as I know nothing about C++ and calling its DLLs. I have also never used BINDEVENT() before, but am aware of it. I may still have to start learning it if I cannot get this fully solved.

The DLL would be referenced like this:
Code:
* Declare the functions in the DLL one time in your app:
DECLARE INTEGER cmd_window_launch IN cmdwindow.dll ;
    STRING  cLaunchFilename, ;
    STRING  cLaunchFolder, ;
    INTEGER nHwndCallback, ;
    INTEGER nMessageId

DECLARE INTEGER cmd_window_get_mail IN cmdwindow.dll ;
    STRING  @cMail, ;
    INTEGER nMailLength, ;
    INTEGER nMailId

* Then reference it in your code:
cmd_window_launch("c:\path\to\my.exe", "c:\temp\", thisForm.HWnd, 0x400)

* And in your thisForm.init() code:
BINDEVENT(thisForm.HWnd, 0x400, thisForm, "my_callback_handler")

* And on your my_callback_handler() function:
LPARAMETERS tnHwnd, tnMessageId, tnMailId, tnLength
LOCAL lcMail

    * Get the mail message
    lcMail = SPACE(tnLength)
    cmd_window_get_mail(@lcMail, tnLength, tnMailId)

    * Append it to our incoming stream
    thisForm.cStdIn = thisForm.cStdIn + lcMail

Something like that would be how it would work. The code inside the DLL does all the hard work. Your app just receives the fruits of that labor.

--
Rick C. Hodgin
 
Stanlyn,

One easy way to measure the turnover time is:
1. Create the output file empty with STRTOFILE().
2. Use >> instead of > so whois.exe appends to the empty text file instead of creating a new one.
3. There are now several ways to determine the file creation and last update times, and the difference between them is the turnover time (see the sketch below).
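For example, a minimal sketch of steps 1 and 2, assuming one output file per request (the path and file name are placeholders):
Code:
* 1. create the output file empty, just before RUNning the batch:
= STRTOFILE("", "C:\temp\output\123456.txt")
* 2. the batch line then appends instead of overwriting:
*    whois.exe -v yahoo.com >>C:\temp\output\123456.txt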

Rick,

sounds like an interesting DLL, not only in this case but in general.

Chriss
 
Chris Miller said:
Rick,
sounds like an interesting DLL, not only in this case but in general.

We haven't used it in a while. We've been using the modified 7-Zip DLL interface for a few years now so we haven't needed to run anything in the cmd shell. But until then, we had a few hundred customers using it daily.

--
Rick C. Hodgin
 
I see. Rick, the GitHub solution I pointed to also uses a cmd.exe process, created in this line:

Code:
= CreateProcess(THIS.GetSysDir() + "\" + MsdosShell, "",0,0, 1, 0,0, SYS(5)+SYS(2003), @cStartupInfo, @cProcInfo)

where StartupInfo contains some duplicated pipe handles (that's what stdin and stdout are: pipes). But you don't even need cmd.exe; you could also write your own console.exe that acts as the console, inherits the pipe handles, and then lets you run whois and read the output from stdout.

This could run several whois processes in parallel, but they'd all answer on the same stdout pipe, so their output might overlap; therefore I think you'd run multiple cmd.exe instances anyway.

Or is there some way to reduce that? Did you use this DLL to create several zip archives in parallel or unzip several files at once?

Chriss
 
Chris Miller said:
This could run several whois processes in parallel, but they'd all answer on the same stdout pipe, so their output might overlap; therefore I think you'd run multiple cmd.exe instances anyway.

I don't think I've ever done that before, but I would presume it would be handled one of two ways:
1. A coordinator app which reads the many STDERR/STDOUT handles, and combines them into a common feed.
2. It works like you suggest.

I know things like Visual Studio will schedule multiple compiles in parallel, and they all provide feedback to the same Output window, and if you run from the command line, to the same cmd window. So it's definitely doable; I just don't know how it would be done because I've never tried it.

I'll write a test app to see.

--
Rick C. Hodgin
 
Hi,

Here is an update.

For some reason my Win10Pro machine crashes everything, including Windows. After changing several things that might be causing the issue, it still crashes, and it's random. It may process 10,000 records, or maybe 18,000, and then all of a sudden everything goes idle until all the performance indicators flat-line; at that point the entire machine is unresponsive and must be hard rebooted.

The same code running on Server 2019 does not crash and is still running after a couple of hours now. It is currently doing 2000 records per batch with no wait time between batches. So far, very solid... I will test other batch sizes later, but for now I'm trying to get it stable and understand what is causing the crash on Win10Pro. The Win10Pro machine is much faster, so the speed shown in the photo reflects the slowness compared to the previous photos.

Anyone have an idea on how to troubleshoot this?

Chriss, every batch file is different, and each one creates its output text file with the same name as the batch file, which is the pk of the matching record, so I can get the data back into the correct record. I use this to figure out what has finished and to do cleanup.

Thanks, Chriss, for the tip that fixed the -1 value issue. The -1 returned from FOPEN() was indeed caused by not passing the full path. I incorrectly assumed that I would get a file-not-found type of error, but you don't.

Thanks, Stanley

Image...
Capture9_iu3ywg.png
 
And many hours later, it has proven to be stable, at least on the server OS...

Capture10_adkzc3.png
 
Stanlyn, I can't help you further if you stay with your design of always starting a batch of N processes and only varying N. Abandon that; it's not faster. I guess you didn't even notice that I pointed out that your idea of what is better is just plain wrong.

Only run as many processes in parallel as you have cores in the first place; you don't have 100 cores, even less 500 or 1000.

It's bound to fail; your server is likely only more stable because a server has more resources, including more CPU cores.

Yes, most of the time each whois process only waits for the server response, but that waiting is still bound to one CPU core. And yes, the OS runs more processes than there are cores, because the OS can multitask by switching between processes after a time slice, but since you say:

stanlyn said:
no wait time between batches

doesn't that mean you start processes until either the computer "says no" (crashes), or the more stable hardware or server OS saves you from the harm you're doing to yourself?

Again, the very common idea of multiprocessing is to orient yourself on the number of cores: create a pool of processes and a pool manager that starts processes until the pool is fully used, then monitors finishing processes and adds new ones to replace them. This way it's stable.

I gave you another idea of what to measure to decide the pool size. But what you're doing leads to crashing the system quite naturally. You also don't need to program a pool manager; it could simply be an array of a number of processes or batch files. You have a table of .bat files to start, don't you? I suggest you simply read back what I already suggested and take it more seriously.
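A minimal sketch of such a pool, assuming a whoisjobs table of prepared .bat files and a hypothetical HarvestFinishedFiles() helper that processes and erases any completed output files and returns how many it found:
Code:
* Pool-manager sketch - table, folder and helper names are assumptions:
lnMaxPool = 8                && orient this on the number of cores, then tune it
lnRunning = 0
SELECT whoisjobs
SCAN
    DO WHILE lnRunning >= lnMaxPool
        lnDone    = HarvestFinishedFiles()      && hypothetical: process + erase finished outputs
        lnRunning = lnRunning - lnDone
        IF lnDone = 0
            = INKEY(0.2)                        && brief pause so the loop doesn't spin
        ENDIF
    ENDDO
    lcCmd = "C:\temp\bat\" + ALLTRIM(whoisjobs.pk) + ".bat"   && pk assumed stored as character
    RUN /N &lcCmd                               && start asynchronously, don't wait
    lnRunning = lnRunning + 1
ENDSCAN
* afterwards: keep calling HarvestFinishedFiles() until lnRunning reaches 0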

Chriss
 
Chriss,

Chriss said:
Only run as many processes in parallel as you have cores in the first place; you don't have 100 cores, even less 500 or 1000.

It's bound to fail; your server is likely only more stable because a server has more resources, including more CPU cores.

No Chriss, it's not bound to fail... It's clear that we both are not understanding each other's processes.

My current run has now passed 318,000 processed records; it started last night on the old Win 2019 Std server hardware platform. As said before, the old, non-failing server is much, much less capable from a hardware perspective. The server is an old socket 1366 i7-920 with 8 GB from 10+ years ago. The Win10Pro workstation is an i7-8700K with 12 cores and 16 GB of memory.

Processing Status
330k_lwv1lw.png


Not Failing Server
ServerSpec_qk0e34.png


Failing Workstation
WSspec_rlislj.png


If I understand you correctly, you are asking that I reduce N down to less than 8 on the server or 12 on the workstation. Doing so would cause the performance to start matching the original serial design (do one, then the next, one after the other), which translates to a max of 120 records per minute, or 2 per second. I am now getting about 480 per minute on the slow server. It takes 500 seconds to do 2000 records, which is .25 seconds per record, yielding 480 per minute.

To test your suggestion, I will be doing a run with N set to 12 on the 16 core machine and will take note of performance, CPU usage, responsiveness, and other metrics, and report back. I will also test on another (less capable) Win10Pro machine to see if it's a Win10 issue.

As said earlier, I'm now just throwing different workloads (the batch N value) at it for stability testing. I also know that N is too high for production, as I have seen slow responsiveness in other processes. I'm stress-testing the code and resources. The old server is passing all tests with flying colors. The program is also passing with no errors on creating new records, doing lookups, parsing the returned values, looking up the matching records and updating them, and, if successful, deleting both the .bat and .txt files.

Here is what the program is doing... It starts with an empty (0-row) domain table and a list table containing only words and phrases that are used to create an FQDN domain record in the domain table. We iterate the list table, creating a new domain row and populating it with only the known values such as the FQDN, TLD, timestamp, list source, pk and others. Other fields such as status, the returned whois data, the size of that data, registrar, and expiration have to be updated later, once the whois lookup has completed. These are updated in the cleanup phase that occurs every N records (currently set to 2000).

1. Start iterating the list table.

2. While iterating, get the list value, then create a new domain row, save the FQDN based on the list record, and create the pk for that row. This pk is used later in creating the batch and output files.

3. The N value is the number of list records to process before we enter cleanup mode and start processing the finished .bat and .txt files. Therefore the list table is iterated 2000 rows before entering processing mode. Note that whois.exe is executed immediately after each batch file is created in the iteration process, so at this point the system has created 2000 new domain entries and has executed whois on each of them as they were created, and by the time we get to cleanup mode, most of them have finished. There are never 2000 whois executions happening at the same time. Notice the word "termination" in the resource monitor; you can see that most are terminated. It may take 5 to 10 seconds for terminated items to fall off the list, but they are not consuming any resources.

4. Once we enter this final processing and cleanup mode, I use ADIR() to get a list of all .txt files in the output folder and store their names in a temp cursor, then start iterating them top to bottom. The first check is whether a file has finished, by trying to open it with FOPEN(); if I can open it, we process it, as it has already finished. Note that if the .txt file cannot be FOPEN()ed, neither can the .bat, as the .bat is the parent that is still trying to fill the .txt file. If it has not finished (FOPEN() returns -1), I skip it so the next cleanup pass can deal with it. Processing includes updating the matching domain row by looking for the domain.pk that equals the JUSTFNAME() of the file found in the ADIR() array. Once the domain record is updated, I delete the cursor record and the .bat and .txt files based on the JUSTFNAME(), as they are now fully processed. The cursor of .txt file names is iterated until EOF('cursor') returns true (see the sketch after this list). By the time the process enters this inner finishing-and-cleanup phase, which is 100% VFP, most if not all of the whois commands have finished. The resource monitor shows this, as CPU drops to 20-40% from the high 90s it reaches when all the whois requests run concurrently. Testing shows that a maximum of about 100 whois commands are ever concurrent, and the number drops from there to zero. While these running whois commands are finishing/terminating/dropping off, VFP is busy opening the .txt files and updating the matching domain records.

5. Once EOF() on the ADIR() cursor returns true, we return to the main loop and continue with another batch of N (2000) list names, then repeat the above process. At this point we can see the CPU climb to near 100% as it starts running a whole new batch of whois processes.
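A minimal sketch of the cleanup pass described in step 4 (folder, table, and field names are hypothetical):
Code:
* Cleanup-pass sketch - names are assumptions, not the actual program:
lnCount = ADIR(laOut, "C:\temp\output\*.txt")
FOR lnI = 1 TO lnCount
    lcFile   = "C:\temp\output\" + laOut[lnI, 1]
    lnHandle = FOPEN(lcFile, 12)        && -1 while whois.exe still holds the file
    IF lnHandle < 0
        LOOP                            && not finished yet; the next cleanup pass retries it
    ENDIF
    = FCLOSE(lnHandle)
    lcPk    = JUSTSTEM(laOut[lnI, 1])   && file name = pk of the matching domain row
    lcWhois = FILETOSTR(lcFile)
    UPDATE domain SET whoisdata = lcWhois WHERE pk = VAL(lcPk)
    ERASE (lcFile)
    ERASE ("C:\temp\bat\" + lcPk + ".bat")
ENDFOR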

At this point, I am no closer to knowing why the much more powerful machine with more resources is crashing while the old, slower machine is not.

Thanks,
Stanley
 
Chriss,

I really do appreciate your input and feedback. Several issues have been resolved because of your posts, so please don't stop.

As we all know, there are many ways to skin something with VFP. You and others have several times suggested that I start over on a completely different path, with the learning curve of said path and all the issues that arise; I have resisted throwing away something that works on one machine, adheres to the KISS methodology, and is what I know. As said earlier, if necessary, I will explore and learn these other options.

The code as written days back may have been good. I just knew it was failing on the workstation, and after evaluating all the metrics available via the resource and process monitors, I see nothing that suggests an issue. There may be an issue with the failing workstation, hence the need to test on another workstation. This is what I was doing when I added it to the server. I probably should have tested it on a different workstation first, but we now know it runs without issue on the server platform, so that is good to know.

Testing has revealed the code and os is stable on the old server platform.

Thanks again,
Stanley
 
You're not getting a main factor of this: spawning new processes interleaves with the first ones ending, so turnover times overlap and the overall speed can be higher than you assume.

I already noticed, and you seem not to understand this, that your logs show that fewer processes at the same time mean better overall throughput.

I said you should orient the number of concurrent processes by the number of cores, but that's an orientation, not a fixed number.


One thing that would help me understand what's going on is if you post your process of running N whois requests. The way you said you do it, again quoted:
stanlyn said:
no wait time between batches

This is nonsensical to me; translated to code, you do:
Code:
Select whoisjobs
Scan
   Scan Next 1000
      * do whois
   Endscan
Endscan

That's equivalent to just one SCAN...ENDSCAN starting all whois requests without waiting for any of them. That asks for trouble, and the fact that you don't have it on an old server might simply be because it only has one core, or the OS itself is more stable, or the hardware copes better, maybe even because it is older and doesn't allow too much to be done in parallel.

You have all means of logging errors. I assume from what you tell me that the system doesn't actually error, it just crashes at some point. You could log the number of processes and see if the crash happens at a certain number. You have all the means for analyzing what's going on; I have nothing like that available to me.


I have told you what I suggest more times than I care to repeat; I don't even get what you don't get. I don't mean a fixed N, that's the first thing you got wrong. I suggested adding processes, even more than have finished, as long as the throughput is not going down because of it, and removing some when throughput goes down. I mention monitoring the throughput, measured in bytes of responses received per unit of time, because that's the factor that determines how fast you finish overall. Since you have your whatever number of whois queries to do, the throughput is the only number to worry about.

Please, really look back. No matter how much you like your old, simple way of doing it, you have now adapted to doing things in parallel instead of sequentially, haven't you? But you do it in too simple a way, not checking what's best. You don't find that out by testing a few fixed numbers of processes. You actually said you want to optimize the whole processing time. Well, that means optimizing the throughput. And to be able to do so, the first thing is to measure it. Not with number of batches and time; measure the bytes received per unit of time, that's the number that matters.


If you want to switch from sequential to parallel, then don't do it stupidly, do it adaptively, and you get a program that can automatically adjust itself to the hardware resources available.

Again, what is the throughput? It's the bytes of the output file divided by (the end time of receiving the file minus the start time of creating the process and file). And I showed you that you can create the empty file in FoxPro and let whois add to it with >>. Then you have the file creation time, the last write time, and the file size to measure.

And that's not complicated math, it's a division. It's more complicated to get the file times than it is to divide a file length by a duration, isn't it?
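A minimal sketch of that division, reading the two timestamps through the Scripting.FileSystemObject (the file name is a placeholder, and the empty file is assumed to have been created with STRTOFILE() just before RUN):
Code:
* Throughput of one finished request:
loFSO  = CREATEOBJECT("Scripting.FileSystemObject")
loFile = loFSO.GetFile("C:\temp\output\123456.txt")
lnSecs = MAX(loFile.DateLastModified - loFile.DateCreated, 1)   && duration in seconds
lnBytesPerSec = loFile.Size / lnSecs                            && this request's throughput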

If, as you say, you want to optimize the overall process, that's not done by running as many things in parallel as you can, but by getting the throughput as high as it can be. And you don't know the right number in advance; it depends on many factors that are not fixed, certainly not over the whole time the run takes. Your throughput can go down because other network sections of the company are currently using more of your overall internet bandwidth, or because many output files arrive at about the same time and you're now clogging the file system with writes. But the details don't matter on average, so concentrate on measuring the throughput in bytes per duration, not in number of whois requests per duration.

With the experience of how this number adjusts itself, you can start with more (or fewer) next time.

You can only optimize your overall time that way, by optimizing the average throughput, because, however you turn it around, the total time is simply the total amount of response data divided by the average throughput, by definition. Optimizing that is non-negotiable. So first of all measure that, and not how many whois requests you have already started.
 

Or take it from another perspective: just stabilizing the code. You judge your code as stable because the server run proves there's no conceptual problem.
Well, the workstation shows there is. I would only judge code as stable when it runs stably on any system.

The way to get there is to start smaller and work upward. My guess, and my experience, is that you don't get very much higher performance if you run far more processes than you have cores, even in a case where the major time in a process is spent waiting. If you measure throughput, you'd see an optimum at perhaps 2 or 3 times the number of cores, perhaps more, but it's easy to let this adjust itself by monitoring it.

The only reason your system becomes unstable is the number of processes that run in parallel. A server OS is programmed to run more stably, as it runs 24/7; a client OS does not do that. Server hardware has to be more stable for that reason, too. That's all there is to it, very likely. You'll only be convinced if you see it for yourself, so do that. Put it on another workstation; maybe you get luckier, and maybe from the differences you can finally see what makes one workstation more stable than another.

Chriss
 
Hi Chriss,

Just a quick update...

I have a stability test running on a different Win10Pro workstation. It has been processing whois requests for about 30 hours with no issues at all, and it is still running. I'll post screenshots later of its progress and specs.

It looks like I have an issue with the original workstation. Now that we know it isn't a difference between the workstation and server OSes, I can start to work on performance. I haven't paid much attention to performance since the crashing started; at that point, performance didn't matter if all I could get was 10-20k rows processed before a crash.

The stability question, or "why is this crashing", had to be answered first. I stayed focused on it because I did not want to mask a potential problem that slowing things down might hide, and in this case it looks like that would have masked a real underlying issue.

Now to performance and the use of the append (>>) operator. Are you suggesting that I add an additional routine that concatenates the whois output to a total.txt file every time I get whois results back, then get its byte size at the end of the cleanup cycle, do some math, adjust the N (batch) size, and then empty total.txt?

Instead of concatenating to the total.txt file (which will grow and grow, thereby getting slower and slower), wouldn't it be better to get only the size as each whois result comes in, add that to a variable, and work only with the variable? This should be the fastest and achieve the same result, but without all the overhead of writing to the OS via cmd.exe. Do you agree?

Yesterday, on the server, I ran into the 2 GB VFP limit at about 380,000 rows. I've started building out an MSSQL backend for it. At this point, little has been done on its frontend, as I needed these tests out of the way. I know a lot more now about its needed specs. So for now, it's speed, size, and the time limits imposed by the whois servers. See the new thread entitled "Changing VPN Servers in VFP Code" that I've started for dealing with that.

Thanks,
Stanley
 
You should really just read what I already said.

I'll try to summarize:

1. I suggested optimizing performance because it was a major concern even after you had crashes. Also, I don't just think it, I have now tested that you don't get better throughput from more than 20 processes in parallel. That's what I always suspected: if you don't spike the process count, you also won't have issues, and even if more processes work on another server or a client OS, you still overlook that it's slower with more processes, even where you read your data as showing improved performance.

2. I suggested >> to an output file per whois, not to a total output file. Just reread what I proposed. You can then measure the duration of a process by the file creation time, which is shortly before you do RUN, and the last write access time, which you can get even if you only obtain the file handle later. This duration is a perfect measure of the time it took, from starting the process up to the moment it finishes writing the output. Then take the length of the output, divide it by that duration, and you have the throughput of that single process.

You'll see throughput drop when you start too many processes. That can have multiple reasons, but they won't matter in detail: partly because starting a process concentrates attention on that new process, hindering already running processes from continuing their work; partly because all processes want to write to the same output directory; and partly because the more processes are waiting for responses, the more of them need internet access and the more processes the system has to iterate through, so each one takes longer and doesn't finish optimally.

The details don't matter. All you need to control the processes is a counter you increase with every RUN and decrease after erasing an output file you could open and read. In conjunction with the average throughput of, say, the last 10 whois requests that finished, you'll find that a certain number of processes, far below 100, gives the best throughput, and you can always adjust the number of parallel processes you want to allow, starting a new process only if the currently running count is, or becomes, lower than the max limit you set by throughput optimization.
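A minimal sketch of that feedback, with a hypothetical ThroughputOfLastN() helper that returns bytes per second over the last n finished requests; the bounds and step size are assumptions:
Code:
* Adaptive-cap sketch. One-time setup:
lnMinPool = 4
lnMaxPool = 8
lnAvgPrev = 0
* ... then, after each harvest of finished output files:
lnAvgNow = ThroughputOfLastN(10)
IF lnAvgNow >= lnAvgPrev
    lnMaxPool = lnMaxPool + 1                   && throughput held or rose: allow one more
ELSE
    lnMaxPool = MAX(lnMaxPool - 1, lnMinPool)   && it fell: back off, never below the minimum
ENDIF
lnAvgPrev = lnAvgNow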

I don't share your opinion that this masks the reason for the crashes. Even now that you've found another client runs stable, you're still running with a very unoptimal number of processes. Your unstable client will also run stable AND have better throughput with a low number of parallel processes. You'll never see that if you don't even test my advice.

I don't think you'll find the reason just by knowing the CPU, RAM or other system specifics; the core reason for a system becoming unresponsive is too much workload, and that's the number of parallel processes. Do you remember the time when CPUs only had one core? Do you remember how a single runaway endless loop within a VFP exe could spike the CPU load to 100% and make a system crash? This vanished when we got CPUs with 2 cores. The VFP process could then only jam one of the cores, as it was and still is running on a single core, so it could only push the CPU load to 50%; the other core allowed the system to keep iterating and switching processes as it does in general.

Chriss
 
Hi Chriss,

I agree with most of what you are suggesting, but I have not spent any time on the performance side of things. When I was doing it serially, it had not crashed, as I only let it run for 1000-5000 rows before killing the process. At that point it became apparent this would take too long, and I started working on a parallel version. Once I got it to where I could test its performance, it was clear that parallel was the way to go. I quickly saw the performance was good enough to continue, but first the crashing had to be understood and fixed, as stability is by far the most important part.

Almost all of your posts have been centered around performance, and thank you for that. When dealing with the random crashing, performance had already been determined to be sufficient to continue development, so I shifted my focus to understanding and fixing the crashing issue. Currently, I do not know exactly why, but I do know that it is isolated to my main dev machine, as it has never crashed on the other 2 test machines, all the way up to the 2 GB limit, which is around the 380k mark.

Right now I'm working on the 3rd issue that has been raised, the 2 GB limit, by porting everything to MSSQL.

Tonight I'm going to start a full all-night run using 20 as the batch size on the machine that has crashed in the past, just to see if it crashes Windows again, and record its throughput.

Currently, as I see it:
1. Issue #1 - Performance: resolved by going parallel and applying the principles we've talked about here.
2. Issue #2 - Stability: it is stable even when stress-tested very, very hard, as I was intentionally trying to kill it... and couldn't...
3. Issue #3 - 2 GB limit: porting to MSSQL Standard.
4. Issue #4 - Getting around the registrars' exceed limits.
5. Issue #5 - Who knows, but more, I'm sure...

Concerning #2 above: it appears to me that your way would cause the system to wait until the batch of 20 completes. If so, then all we need is a single whois request that takes 20-30 seconds and performance is killed. If not, then why not? I don't see its logic.

My way doesn't care if all 20 requests time out or take 30-60 seconds each to finish, as the work keeps getting done by NOT waiting. Any locked whois output files will be processed on the next cleanup. Note that "cleanup" is probably not the best word here, as this process tries to get a handle on the file and, if it can, parses the file, stuffs the data into the table, then deletes the file. If a handle cannot be acquired, the file is skipped, allowing it to finish or time out on its own terms, and it will be checked again on the next cleanup. I ran into this slowdown very early on when I was doing them serially, one after the other. I see your way as no different, except that they are being done at a batch level instead of individually.

Chriss said:
>> to an output file per whois, not to a total output file
Is the purpose of this to get a timestamp at the top of the batch? What if 5 of this batch takes 20 seconds each?

What are you concatenating into this file? You say it does not go into a single totals file, which suggests that each whois has its own output file. Correct? Why would you do that, as whois is already generating the response file, which is a single file per record?

By the time the program has created 20 records in the domain table, which has spun off 20 whois requests that are completely decoupled from VFP, you are asking VFP to wait until they finish before continuing? Wouldn't that be a performance killer in and of itself? I would think so...

1. Outer loop - Start a batch of 20

2. As each row in the list file is read, I create a new row in the domain table and stuff all known data into the domain row, then,

3. Create a matching batch file for each row that executes whois (e.g. 123456.bat), where 123456 represents the pk of the matching domain record, to be used later when processing the response files. This completely decouples the whois process from VFP so as not to slow it down.

4. Inner loop - Once all 20 rows in the batch are done, we enter cleanup mode, where we try to get a handle on the matching response file (e.g. 123456.txt) and, if successful, gather data from it and stuff it into the matching domain record via its file name, which matches the file name of the .bat file and, after a lookup, the pk of the domain record.

5. Outer loop continued - Once the 20 response files have been processed, we go and do another 20, over and over.

And I'm re-visiting and studying this thread, top to bottom, now, and will do so again when working on getting the best performance.

Thanks,
Stanley
 
stanlyn said:
It appears to me that your way would cause the system to wait until the batch of 20 completes. If so, then all we need is a single whois request that takes 20-30 seconds and performance is killed.

No. All a max process cap does is limit how many run in parallel; if one of them finishes, it can immediately be replaced by another process. So why do you think this waits for a whole batch of processes to finish?

The number of currently running processes is limited by a max value that's also not fixed but adapts to what makes sense, hence the measurement of something useful for detecting what makes the overall processing fastest. As I said again and again: the throughput.

Do you start to get the idea?

stanlyn said:
Why would you do that, as whois is already generating the response file, which is a single file per record?
Please read more thoroughly what I write. I said the VFP process controlling all this can generate an empty file. This file therefore has a creation timestamp that's BEFORE the whois process even starts, thus capturing the start time of the process, which is lost if you just measure the creation time of a file that whois creates AFTER it has started. That way you DO see how much more processes cause new process starts to take longer. Remember, any OS has to share CPU time among all running processes. More processes don't run faster, and the start of a process is, well, a process in itself; a process that can take longer just because so many processes are already running.

Taking the full duration means you measure the time spent on the whois request, not just the net internal time. Well, there's perhaps one bit to add: the time from detecting a closed, readable file to putting its content somewhere else, into a database perhaps. But that doesn't need to count toward the interval between the last write access and you detecting the complete file. It's not time wasted, but time spent on other things like starting a process or looking through all the currently existing output files.

You could also decide to leave all the files as they are and process them after all the whois requests are through, as that means you can concentrate on the whois calls; processing lots of files with much data into a database server shrinks your network bandwidth for the whois responses, so it is counterproductive.

stanlyn said:
What if 5 of this batch takes 20 seconds each?
Then a measure like the one suggested, the throughput of the last 5 whois requests (notice that's a moving window: with the next finished request, one of the five falls out), still applies. In the worst case you judge those five unfortunate requests as bad performance and reduce the number of processes, only to find out 10 requests later that it can get faster again with more processes. You adjust, and this is important at least to me, to the current performance, measured only from things that happened recently, not 5 hours ago when the whole system was in another state, perhaps also serving data to clients, or whatever happened then that reduced the performance, or perhaps made it better than now.

So the main difference between your way and my way of running this is that you work with a fixed number of processes (at least that's what you say and think; maybe you even start more processes than you're aware of, which is another thing to verify by measuring), while I, to come back to our difference, would let the system adapt this maximum number of concurrent processes from the experience of how they perform.

And to finish the thought: in doing so I don't just optimize the performance, I prevent situations where I can already see in advance that more processes make the whole system slower and then unstable until it may crash. If there is a bottleneck at some point that isn't caused by the server itself but by an internet quirk or by several servers reacting with exceed limits, then this adjustment may become too cautious from time to time, but it's also a matter of experience what to measure and how to react to it in detail.

In all my tests, the end phase, where no new process is started and the already running processes finish one after the other, is where the performance peaks, which alone shows me that starting processes is a bottleneck I didn't see coming. In that phase my algorithm was increasing the max allowed process count, which then wasn't used, as there were no further requests. At that point the whole system could also loop back and restart with the first whois, to continuously go through all the domains and refresh the information as frequently as possible.

It's a matter of several development iterations to see which measures make the most sense. For example, you could also use the expected size of the whois response for a domain you already checked to see whether it performed as usual, better, or worse, and take that into account when controlling how many things to run in parallel.

Again, I don't see why you think controlling the number of concurrent processes means starting that number and waiting for all of them to finish. As I said, once a process ends and you can get its output file, you decrease the number of currently running processes by one, which allows you to start another, at least if the current max cap stayed the same or grew. And if it shrank, then you have to wait for further processes to finish until the current number goes below the max cap, but that only happens when you detect that current performance went down.

It seems you think along the lines that the major time is spent at the whois servers anyway and the rest is fairly constant. Well, it isn't. There are times when the parallel output of files causes disk writes to take longer than when fewer processes try to write. This could also be optimized by using RAID or multiple drives.

I introduced a minimum process count too, so that the maximum process cap never goes down to 0; at any time at least that many processes run in parallel. Of course, when the current number sinks below the max you don't need to wait for all current processes to finish; who said that? I thought that was happening in your approach: in the way you described it, you run a batch of N whois requests and then continue by starting another N, without saying anything about the way you react to finished processes. This actually means, and still means, that you would start all the requests. You were saying that by the time you have started N, some of them have already finished. Yes, of course, but likely not all, and when you keep on starting, this just jams your queue.

The core loop should look for finished outputs, then trigger a new process (if allowed) and evaluate the found result(s). And since you start at 0 processes, this same main-loop logic finds that it can immediately start the max cap number of processes, in parallel of course. I don't know why you still don't see that logic working. It really just needs a counter that increases and decreases, a max cap, and perhaps a min cap too, so you never drop down to 0. Also, you don't have to change the max cap drastically just because of the current measurement. You could define a trend that starts at 0 and gets incremented. That increment doesn't even need to be an integer; it could be something like increasing the max cap by 0.5 processes per second, and when that leads to worse throughput you don't steer hectically in the other direction but let the trend sink in 0.01 steps, whatever turns out to be the best feedback loop.

Still, it is mainly a matter of maintaining one counter of currently started and not yet fully processed requests, plus a max cap to watch, and a measurement of the performance with these variables; no rocket science. In short, where you would run 10 tests with 10 different N values, the way I suggest varies N itself over the course of a night. You can later analyze a log to see which process count was the average during a night and make that the best guess for the starting number. You could see whether the process count tends to oscillate instead of converging to an optimum. And you might be able to see what else in the system causes the performance of your application to change, be it as simple as Windows downloading updates and then waiting for you to allow installing them. There is always a multitude of things happening on a system, even if you devote it to running your whois list and nothing else during the night.

Chriss
 
To get to another point separately: don't confuse the way I recommend working with my description of how I understand you work. And at the risk of false accusations: the way you say you start N requests, rely on many of them already being finished when you get to N, and then start the next N, if I take that very literally, means it won't matter what N is; you arrive at point N and continue from there.

To make an analogy, it wouldn't matter whether you say you go one kilometer after another or one mile after another when the whole distance is the same. As far as I can see, you didn't describe your logic for detecting and processing finished requests. You briefly said you'd FOPEN() the files and see whether you get access to them. That's a valid method I'd also use, but if you go through this in waves, starting N processes, then scanning for completed files, then starting a further N, you will not let the number drop very much, I think, so changing N wouldn't change much about how many processes you add.

And the more processes run, the longer it takes just to go through each single output file and check whether it's finished. With fewer processes you can detect the finish earlier, very near the actual event of the file being closed.

Another way of doing that is also described in (or at least not far from) one of the links I already gave: use file-system monitoring to be informed of file-closing events in near real time.

I just think you're putting too much weight on the KISS principle. Again, I mainly just keep one counter variable and one maximum cap variable that influence each other and/or are influenced by other measures like the throughput. That's not that complicated a concept.

Chriss
 
