southbeach
Programmer
Problem:
I have a number of CSV files sitting in a repository (say /home/csvdir/). There can be any number of files within the directory, and each file has about 18 columns per row, using a comma (hence CSV) as the column separator.
I have the code to scan through the directory and read the files into a single "massive" array - or not so massive ...
This process needs to scan through the rows, match them against filter/search criteria, and return a count based on multiple criteria. Say col01 = city, col02 = state, col11 = date, col16 = size; I need to match my search criteria against these columns.
As of this moment, my process takes way too long, since I am:
1. looping through the directory
2. loading each file's contents into an array
3. looping through the array one row at a time
4. exploding the row into variables using list() = explode()
5. matching the filter criteria against the exploded values
Now, imagine having several hundred instances searching for matching rows over several thousand rows. It is taking my dashboard about 20 seconds to load at only 120 instances, with the prospect of growing to close to 20 times that.
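For reference, here is a minimal sketch of the kind of loop I mean. The directory path matches my example above, but the criteria values and column positions are made up for illustration, not taken from my actual code:

<?php
// Hypothetical filter criteria -- names, values, and column positions are assumptions.
$wantCity  = 'Miami';
$wantState = 'FL';

$count = 0;
foreach (glob('/home/csvdir/*.csv') as $file) {                          // 1. loop through the directory
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // 2. load the file into an array
    foreach ($lines as $row) {                                           // 3. loop through one row at a time
        $cols = explode(',', $row);                                      // 4. explode the row (list() omitted for brevity)
        if ($cols[0] === $wantCity && $cols[1] === $wantState) {         // 5. match criteria against the values
            $count++;
        }
    }
}
echo $count;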
Question:
How can I improve on my approach to this problem? I am considering array_search(), but can I use it effectively if the matching values are spread across the row?
I am looking at array_search() to extract the key from the array, then explode only that row, hopefully skipping 90% of the looping process.
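From what I understand, array_search() returns the key of the first element that equals the needle exactly, so against whole row strings it would only ever hit a row that is literally equal to the search value. Something like preg_grep(), which pre-filters the rows containing the value before exploding, might be closer to what I have in mind. A rough sketch, again with made-up values and a hypothetical file name:

<?php
$rows = file('/home/csvdir/somefile.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// array_search() needs an exact whole-element match, so this fails on full rows:
$key = array_search('Miami', $rows);   // false unless some row is exactly "Miami"

// preg_grep() keeps only rows containing the value, so explode() runs on far fewer rows:
$candidates = preg_grep('/\bMiami\b/', $rows);
$count = 0;
foreach ($candidates as $row) {
    $cols = explode(',', $row);
    if ($cols[0] === 'Miami' && $cols[1] === 'FL') { // confirm it really was the city column
        $count++;
    }
}
echo $count;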
Thank you all in advance for your assistance!
--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.