
Merge CSV contents


drclaw (IS-IT--Management) · Sep 3, 2010
I apologize if this was covered in another post, but I wasn't able to find it in my searches. Maybe there's a better name for what I'm trying to do.

I'm trying to merge the contents of 2 different CSV files.

Basically I have 1 file that looks like this:

Code:
Field1,Field2,Field3
value1,      ,value3

And another that looks like this:

Code:
Field1,Field2,Field3
value1,value2,

In this example assume that value1 is equal in both files. What I'd ultimately like to end up with is a single CSV file that looks like this:

Code:
Field1,Field2,Field3
value1,value2,value3

Instead of 2 separate records in 2 different files, I'd like to combine the 2 files so that all the data is in a single record.

If these were 2 tables in a database, a join would work perfectly. I know that I could import the CSVs into tables in a DB, do the join, and output the results, but I'd like to keep this process as lightweight as possible.

Is there a way to do this in PS? Would VB be a better choice?

Thanks
 
This is untested, but something along these lines should work:

$dataOne = Import-Csv C:\Path\To\File1.csv
$dataTwo = Import-Csv C:\Path\To\File2.csv

# For each row of file 1, look up the row in file 2 with the same Field1
# and copy its Field2 value across
foreach ($row in $dataOne) {
    $row.Field2 = ($dataTwo | Where-Object { $_.Field1 -eq $row.Field1 }).Field2
}

# -NoTypeInformation keeps Export-Csv from prepending a "#TYPE" header line
$dataOne | Export-Csv C:\Path\To\Combined\File.csv -NoTypeInformation

[red]"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick[/red]
 
Thanks very much!

Tweaked it a little to fit my exact situation and it worked perfectly.

I might experiment with it a bit more. PS might not be the best choice for this type of thing, since both of those files will have around 2000 rows in them each day. It took about 20 minutes to complete on my work desktop, which is pretty well equipped.
 
[0] 2000 records in 20 minutes seems high. I'm not sure how much side processing is involved, but if the task is just updating rows matched on the index field (field1), my testing shows much less time is needed, even on my not-so-well-equipped, average development machine.

[1] I used this exact script to test updating $data1 with new data from $data2:
[tt]
$data1 = Import-Csv C:\xyz\abc_data.csv
$data2 = Import-Csv C:\xyz\abc_data2.csv
$dataout = "c:\xyz\abc_out.csv"

# Loop over the update file; for each row, find the matching base row by field1
foreach ($row2 in $data2) {
    $row1 = $data1 | Where-Object { $_.field1 -eq $row2.field1 }
    if ($null -ne $row1) {
        # Only overwrite a field when the update file actually has a value for it
        if ($row2.field2 -ne "") {
            $row1.field2 = $row2.field2
        }
        if ($row2.field3 -ne "") {
            $row1.field3 = $row2.field3
        }
    }
}

$data1 | Export-Csv $dataout -NoTypeInformation
[/tt]
[2] I use 2000 rows in $data1. In real data, $data2 can contain fewer rows, but I used 2000 rows there as well. Every index value of field1 in $data2 matches a row in $data1.

[2.1] Field2 and field3 can randomly be empty strings in $data2, in which case there is no need to update that field on the matching row in $data1.

[2.2] I run the foreach() loop over $data2. That can help shorten the time in real life if $data2 contains significantly fewer rows to update, but the testing sample does not rely on this.

[3] I get an average of about 3-4 minutes to process 2000 records in each of the base ($data1) and update ($data2) CSVs. Users certainly feel the wait, but nothing on the order of 20 minutes.

Can you test the above scheme and see how it performs on your machine?
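
[4] Most of whatever time remains likely goes into the Where-Object lookup, which scans all of $data1 once for every row of $data2. A minimal, untested sketch of a variant, assuming the same field1/field2/field3 layout and file paths as above: build a hashtable index of $data1 keyed on field1 first, so each lookup becomes a constant-time probe instead of a full scan.
[tt]
$data1 = Import-Csv C:\xyz\abc_data.csv
$data2 = Import-Csv C:\xyz\abc_data2.csv
$dataout = "c:\xyz\abc_out.csv"

# One-time pass: index the base rows by their field1 value
$index = @{}
foreach ($row1 in $data1) {
    $index[$row1.field1] = $row1
}

foreach ($row2 in $data2) {
    $row1 = $index[$row2.field1]    # constant-time probe instead of a full scan
    if ($null -ne $row1) {
        if ($row2.field2 -ne "") { $row1.field2 = $row2.field2 }
        if ($row2.field3 -ne "") { $row1.field3 = $row2.field3 }
    }
}

$data1 | Export-Csv $dataout -NoTypeInformation
[/tt]
This should scale roughly linearly with row count, so the gap versus the scan-per-row version will widen as the files grow beyond 2000 rows.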
 