ns1234
Programmer
- Dec 3, 2002
- 6
Hi All
I need suggestions on improving the design of my multithreaded application. The current (as-is) scenario is as follows:
We have a Java console application that processes data from N channels. The data is downloaded from the channels (e.g. a folder on the local system or LAN, a COM port, email, FTP). The data consists of text files of 2-3 KB or image files (up to 2 MB). There can be any combination of such channels (2 hot folders, 3 COM ports, 3 FTP servers, etc.). Data for each channel is processed according to filters and rules applied to its contents, then stored in the destination folders specified. Each channel can have N files. I'm using the CoR (Chain of Responsibility) pattern for this.
Currently the application does the following:
1. A client GUI application is used to configure the channels: basically the channel details (FTP server, user-id, passwd, …), the filters on the channel (i.e. filter1, filter2), and the rule to apply (like ToUpper, etc.). It creates an XML file for each active channel. Every channel is given a system-generated name (ch1, ch2, etc.) when it is configured by the client. This app is used once to configure the server.
The Server console consists of:
1. One ChannelManager thread runs and checks the number of channels present (basically the XML files: ch1.xml, ch2.xml, etc.).
2. This ChannelManager spawns, in a loop, one ChannelProc thread per channel to download data to that channel's folder (ch1, ch2, etc.) in a RAW folder on a local drive, e.g. Raw/ch1, Raw/ch2.
3. Another thread called FilterManager polls the RAW folder to find the number of channel folders to process and spawns that many FilterProcessor threads to parse the data based on the filter and move it to individual folders (ch1, ch2, etc.) in a FILTER folder on the local drive, i.e. Filter/ch1, Filter/ch2.
4. Another thread called Filter2Manager polls the FILTER folder to find the number of channel folders to process and spawns that many Filter2Processor threads to parse the data based on filter2 and move it to individual folders (ch1, ch2, etc.) in a FILTER2 folder, i.e. Filter2/ch1, Filter2/ch2.
5. Another thread called RuleManager polls the FILTER2 folder to check the number of channel folders to process and, for each channel, spawns a RuleProcessor thread to convert the data in that channel to the specific output (e.g. all uppercase, lowercase, etc.) and move it to individual folders (ch1, ch2, etc.) in a RULE folder, i.e. Rule1/ch1, Rule2/ch1.
So basically the idea is that there are 4 polling Manager threads, each of which spawns as many processor threads as there are channels to process. Since downloads can be scheduled by the user, the ChannelProc thread for a particular channel polls every 1 min (the minimum interval) and checks against the configured interval to decide whether to download. The other processor threads (FilterProcessor, Filter2Processor, etc.) die once processing of a particular file is complete and the data has been moved to the next folder. One channel can have N files at any given time. All rules are read from XML files and applied. For now the maximum number of processor threads has been fixed at 3 each for FilterProcessor, Filter2Processor and RuleProcessor, because with so many threads running, memory and CPU usage goes up.
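To make the filter1, filter2, rule sequence concrete, here is a rough in-memory sketch of the chain each file's content passes through. This is not our actual code: the `Stage` class and the two filter bodies are hypothetical stand-ins (our real filters come from the channel XML), and only the ToUpper rule is taken from the configuration described above.

```java
import java.util.Locale;

/** One link in the chain: transforms the content, then hands it to the next link. */
abstract class Stage {
    private Stage next;

    /** Appends a stage and returns it, so calls can be chained. */
    Stage then(Stage next) {
        this.next = next;
        return next;
    }

    /** Runs this stage and every stage after it. */
    final String handle(String content) {
        String out = process(content);
        return next == null ? out : next.handle(out);
    }

    protected abstract String process(String content);
}

// Hypothetical stand-in for filter1: strips leading and trailing whitespace.
class TrimFilter extends Stage {
    @Override protected String process(String content) { return content.trim(); }
}

// Hypothetical stand-in for filter2: collapses runs of whitespace into one space.
class CollapseSpacesFilter extends Stage {
    @Override protected String process(String content) { return content.replaceAll("\\s+", " "); }
}

// The ToUpper rule mentioned in the channel configuration.
class ToUpperRule extends Stage {
    @Override protected String process(String content) { return content.toUpperCase(Locale.ROOT); }
}

class ChainSketch {
    /** Builds filter1 -> filter2 -> rule and returns the head of the chain. */
    static Stage buildChain() {
        Stage head = new TrimFilter();
        head.then(new CollapseSpacesFilter()).then(new ToUpperRule());
        return head;
    }

    public static void main(String[] args) {
        // prints "HELLO WORLD"
        System.out.println(buildChain().handle("  hello   world "));
    }
}
```

In the real application each stage writes to its own folder instead of returning a string, which is why every stage currently needs its own manager and processor threads.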
THE PROBLEM:
The problem is that data comes in continuously from the COM ports, so those threads are never free. The more COM ports there are, the more data and the more threads, to the point that other threads don't get a chance to execute, since the maximum number of FilterProcessor, Filter2Processor and RuleProcessor threads is fixed. So if there are 2 COM channels receiving data continuously, plus a LAN folder or FTP server to download from, the COM threads are always occupied and the third channel is never processed. How do I use the sleep time between polls by the ChannelProc threads to process the other, unprocessed channels?
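One direction I have been wondering about, as a sketch only: keep the fixed number of worker threads, but have them take work through a dispatcher that rotates across channels, so a busy COM channel cannot monopolise them. Everything below (`RoundRobinDispatcher`, `submit`, `next`, the file names) is hypothetical and not part of the current application.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical helper: hands out pending files one channel at a time, in rotation. */
class RoundRobinDispatcher {
    private final Map<String, Queue<String>> pending = new HashMap<>();
    private final List<String> order = new ArrayList<>(); // rotation order of channels
    private int cursor = 0;                                // channel to try first next time

    /** Queues a file for a channel; a new channel joins the end of the rotation. */
    synchronized void submit(String channel, String file) {
        Queue<String> q = pending.get(channel);
        if (q == null) {
            q = new ArrayDeque<>();
            pending.put(channel, q);
            order.add(channel);
        }
        q.add(file);
    }

    /** Returns the next "channel/file" to process, or null if nothing is pending. */
    synchronized String next() {
        for (int i = 0; i < order.size(); i++) {
            int idx = (cursor + i) % order.size();
            Queue<String> q = pending.get(order.get(idx));
            if (!q.isEmpty()) {
                cursor = (idx + 1) % order.size(); // start after this channel next time
                return order.get(idx) + "/" + q.poll();
            }
        }
        return null;
    }
}

class FairDispatchDemo {
    /** Pulls work until the dispatcher is empty and records the order. */
    static List<String> drain(RoundRobinDispatcher d) {
        List<String> seen = new ArrayList<>();
        for (String item = d.next(); item != null; item = d.next()) {
            seen.add(item);
        }
        return seen;
    }

    public static void main(String[] args) {
        RoundRobinDispatcher d = new RoundRobinDispatcher();
        for (int i = 1; i <= 3; i++) d.submit("ch1", "a" + i); // busy COM channel
        for (int i = 1; i <= 3; i++) d.submit("ch2", "b" + i); // busy COM channel
        d.submit("ch3", "c1");                                  // quiet FTP channel
        // prints [ch1/a1, ch2/b1, ch3/c1, ch1/a2, ch2/b2, ch1/a3, ch2/b3]
        System.out.println(drain(d));
    }
}
```

With something like this, each of the 3 worker threads would loop on `next()` instead of being tied to one channel, so the third channel gets a turn even while both COM channels keep producing data. It does not by itself address data loss if the COM input outruns the workers.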
I need to redesign this system so that there is no data loss and yet all channels get processed. Any suggestions, tips, schemes or patterns are welcome.
I was thinking of using TimerTask for the filter processing, but if I use one Timer to process all channels, each channel will wait longer for its turn as the number of channels increases. Also, if one filter processing task fails, the others will also stop. If I spawn a new Timer for each channel, then again the number of threads goes up.
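For comparison, here is a rough sketch of an alternative to Timer that I am considering, assuming a Java version that has `java.util.concurrent`: a small `ScheduledExecutorService` shared by all channels, with each task wrapped so that an exception in one run is counted rather than propagated. The class and method names (`ChannelPollingSketch`, `guarded`, `healthyChannelKeepsRunning`) and the 10 ms delay are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ChannelPollingSketch {
    /** Wraps a task so a failure in one run is counted instead of cancelling later runs. */
    static Runnable guarded(Runnable task, AtomicInteger failures) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                failures.incrementAndGet(); // a real version would log the channel and file
            }
        };
    }

    /**
     * Schedules one always-failing channel task and one healthy one on a shared
     * 2-thread pool. Returns true if the healthy task completed `runs` executions
     * within 5 seconds.
     */
    static boolean healthyChannelKeepsRunning(int runs, AtomicInteger failures)
            throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        CountDownLatch healthyRuns = new CountDownLatch(runs);
        try {
            Runnable failing = () -> { throw new IllegalStateException("bad file"); };
            pool.scheduleWithFixedDelay(guarded(failing, failures), 0, 10, TimeUnit.MILLISECONDS);
            pool.scheduleWithFixedDelay(guarded(healthyRuns::countDown, failures), 0, 10, TimeUnit.MILLISECONDS);
            return healthyRuns.await(5, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow(); // stop both periodic tasks
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        boolean ok = healthyChannelKeepsRunning(3, failures);
        System.out.println("healthy channel kept running: " + ok);
    }
}
```

The pool size stays fixed no matter how many channels are scheduled, which is the property I could not get from one Timer per channel. Without the `guarded` wrapper, a periodic task that throws would have its own later runs suppressed by the executor, so the wrapper matters even here.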
Thanks a lot