
CSV files and/or array search

southbeach (Programmer), Jan 22, 2008

Problem:
I have a number of CSV files sitting in a repository (say /home/csvdir/) - there can be any number of files in the directory. Each file has about 18 columns per row, separated by commas (hence CSV).

I have the code to scan through the directory and read the files into a single "massive" array - or not so massive ...

This process needs to scan through and match the rows to filter/search criteria and return a count based on multiple criteria. Say col01 = city, col02 = state, col11 = date, col16 = size; I need to match my search criteria against these columns.

As of this moment, my process takes way too long since I am
1. looping through directory
2. loading files content into array
3. looping through array one row at a time
4. exploding the row into variables using list()=explode()
5. matching filter criteria against the exploded values (sketched below)
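
In skeleton form, the flow looks something like this (the filter values and date format here are just placeholders):

Code:
<?php
// Sketch of the current flow: loop the directory, load each file,
// explode each row, and test the filter columns one row at a time.
// 'Miami', 'FL' and the date are made-up example criteria.
$matches = 0;
foreach (glob('/home/csvdir/*.csv') as $file) {      // 1. loop through directory
    // 2. load file content into an array of raw lines
    $rows = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($rows as $row) {                        // 3. one row at a time
        $col = explode(',', $row);                   // 4. explode the row
        // 5. match filter criteria against the exploded values
        if ($col[0] == 'Miami' && $col[1] == 'FL' && $col[10] == '01/22/2008') {
            $matches++;
        }
    }
}
echo $matches;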

Now, imagine having several hundred instances searching for matching rows over several thousand rows ... It is taking my dashboard about 20 seconds to load at only 120 instances, with the prospect of growing to close to 20 times that.

Question:
How can I improve on my approach to this problem? I figure array_search() could help, but can I use it effectively if the matching values are spread across the row?

I am looking at array_search to extract the key from the array, then explode the row and hopefully skip 90% of the looping process.
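
As far as I can tell, array_search() only matches a whole array value exactly, so on unexploded rows it would only find a row whose entire string I already know. Something like preg_grep() looks closer to what I want - a rough sketch, where the pattern is just an example criterion:

Code:
<?php
// Sketch: prefilter the raw rows with preg_grep() so only candidate
// rows get exploded; keys of the original array are preserved.
// The pattern (state = FL) is only an example and could false-match
// 'FL' in another column, so the exploded check below still runs.
$candidates = preg_grep('/,FL,/', $rows);
foreach ($candidates as $key => $row) {
    $col = explode(',', $row);
    if ($col[0] == 'Miami' && $col[1] == 'FL') {
        // matched; $key is the row's position in the original array
    }
}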

Thank you all in advance for your assistance!

--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
Flat data files are clumsy. Is there a reason why this cannot be handled in a database?
 
spamjim, it is what I walked into. I have asked the same question but I keep losing the argument. To be more specific, the flat files are the product of a number of cron processes dumping flat files which are then used as the source to push web data. The data source is a "non-relational" 4GL database, so no SQL here. I am facing the task of writing code to mimic "relational" database features in a non-relational database environment.

--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
Two thoughts:

Apply the filter intelligently on the first pass - test each row as it is read instead of after loading everything (see the sketch below).
Have a cron job run periodically to scan the csvs and bung the data into appropriate tables.
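
For the first point, something along these lines avoids building the big array at all - a rough sketch, with placeholder filter values:

Code:
<?php
// Sketch of first-pass filtering: test each line as it is read,
// cheapest test first, instead of loading whole files into memory.
// 'FL', 'Miami' and the date are placeholder criteria.
$matches = 0;
foreach (glob('/home/csvdir/*.csv') as $file) {
    $fh = fopen($file, 'r');
    if ($fh === false) {
        continue;
    }
    while (($col = fgetcsv($fh)) !== false) {
        if ($col[1] != 'FL') {       // reject most rows immediately
            continue;
        }
        if ($col[0] == 'Miami' && $col[10] == '01/22/2008') {
            $matches++;
        }
    }
    fclose($fh);
}
echo $matches;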


 
I ended up writing code to:

1. Build a dynamic shell script
2. Run script which uses grep to decrease the data volume to rows containing data of interest
3. Parse the "much smaller" file where the result from grep was dumped (roughly as sketched below)
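
In outline, the new flow looks like this - the grep pattern and temp file path are placeholders, and the real script is built dynamically from the search criteria:

Code:
<?php
// Sketch of the grep-based flow: shell out to grep to shrink the
// data set, then parse only the much smaller result file.
// The search value and temp path are placeholders.
$needle  = escapeshellarg('FL');
$tmpfile = '/tmp/filtered.csv';
// 1-2. build and run the grep command across the repository
shell_exec("grep -h $needle /home/csvdir/*.csv > $tmpfile");
// 3. parse the much smaller result file
$matches = 0;
if (($fh = fopen($tmpfile, 'r')) !== false) {
    while (($col = fgetcsv($fh)) !== false) {
        if ($col[0] == 'Miami' && $col[1] == 'FL') {
            $matches++;
        }
    }
    fclose($fh);
}
echo $matches;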

Reduced processing time from 20 seconds to under 5 ... I also modified my user interface so that, as new entries are added, it does not refresh the data grid; instead it keeps the user in "Add Record Mode" and refreshes only once they choose to close the data entry form.

Thank you all for your suggestions!

--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
