Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Convert data structure 1

Status
Not open for further replies.

peac3

Technical User
Jan 17, 2009
226
AU
Hello C# legends,

I have raw data with a family structure vertical as below

Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID
1      A        XX1          0              A1
2      A        XX2          1              A12
3      A        XX3          2              A345
4      A        XX4          2              A456
5      A        XX5          3              A6788
6      A        XX6          2              A346
7      A        XX7          3              A6789
8      B        XX8          0              B1
9      B        XX9          1              B11 
10     B        X10          2              B345
11     B        X11          2              B567 
12     B        X12          3              B6789


We want to build parent-children relationships from the above structure to horizontal using ClientID as keys as per below.
The logic is the parent is the closest one level higher hierarchy order by ROW_N.

Hence expected output is below

Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
3      A        A12        XX2        A456       XX4          2
4      A        A456       XX4        A6788      XX5          3
5      A        A12        XX2        A346       XX6          2
6      A        A346       XX6       A6789       XX7          3
7      B        B1         XX8        B11        XX9          1
8      B        B11        XX9        B345       X10          2
9      B        B11        XX9        B567       X11          2
10     B        B567        X11       B6789      X12          3

SQL gurus say this is too complex for SQL hence I was wondering whether you guys could help the logic to build in C#?
 
Hi peac3,

I don't have MS SQL server, so I used MS Access.
First I created a MS Access database file named peac3.accdb with one table named Table_inp which contains your data.

Then I created C# Console Application which reads the MS Access table and writes out results on the console. The next step would be to write the result into other table.

The logic is similar as in in the awk script posted here:

Program.cs
Code:
using System;
using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace db_example
{

    public class Parent
    {
        public string FamilyID;
        public string ParentID;
        public string ParentName;
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            string conn_string = "PROVIDER=Microsoft.ACE.OLEDB.12.0" +
                  "; DATA SOURCE=peac3.accdb" +
                  "; Persist Security Info=False;";
            OleDbConnection dbcon = new OleDbConnection(conn_string);

            string query = "select * from Table_inp";
            OleDbCommand cmd = new OleDbCommand(query);
            cmd.Connection = dbcon;

            dbcon.Open();

            // declare parent dictionary
            var parent_dict = new Dictionary<String, Parent>();

            var reader = cmd.ExecuteReader();
            int count = 0;
            while (reader.Read())
            {
                var HierarchyLevel = (int)reader["HierarchyLevel"];
                var FamilyID = reader["FamilyID"].ToString();
                var ClientID = reader["ClientID"].ToString();
                var CompanyName = reader["CompanyName"].ToString();
                var current_key = FamilyID + "#" + HierarchyLevel.ToString();
                var parent_key = FamilyID + "#" + (HierarchyLevel - 1).ToString();

                if (parent_dict.ContainsKey(parent_key))
                {
                    count++;
                    if (count == 1) {
                        Console.WriteLine(
                            "ROW\tFamID\tParID\tParNam\tChldID\tChldNam\tHier");
                    }
                    var parent_data =
                        parent_dict[parent_key].FamilyID + "\t" +
                        parent_dict[parent_key].ParentID + "\t" +
                        parent_dict[parent_key].ParentName;

                    var out_line =
                        count.ToString("0#") + "\t" +
                        parent_data + "\t" +
                        ClientID + "\t" +
                        CompanyName + "\t" +
                        HierarchyLevel.ToString();

                    Console.WriteLine(out_line);
                }

                // create new parent object from current record
                // to store it into the parrent dictionary
                Parent parent = new Parent
                {
                    FamilyID = FamilyID,
                    ParentID = ClientID,
                    ParentName = CompanyName
                };

                // store current record into parent dictionary
                parent_dict[current_key] = parent;  
            }
            dbcon.Close();
        }
    }
}
The output is this:
peac3_xmqfil.png
 
Excellent, you are such a legend.

Quick question, why are you not sorting it by ROW_N or is it already by default?
 
Good question :)
Yes, it didn't occur to me because it is default sorted by ROW_N in my MS Access table
peac3_access_table_cewl89.png


So change the query to
Code:
var query = "select * from Table_inp order by ROW_N";
 
Hi Mikrom,

I just realised the CLIENTID can be duplicated as the company name could be similar, we need to show the duplication in the child.
But when we choose the parent, we need to dedup it by the highest match rank.

For example below

Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID MatchRank
1      A        XX1          0              A1
2      A        XX2          1              A12
3      A        XX3          2              A345        
4      A        XX4ABC       2              A456A        2
5      A        XX4ABC       2              A456B        1
6      A        XX4ABC       2              A456C        3
7      A        XX5          3              A6788
8      A        XX6          2              A346
9      A        XX7          3              A6789
10     B        XX8          0              B1
11     B        XX9          1              B11 
12     B        X10          2              B345
13     B        X11          2              B567 
14     B        X12          3              B6789

expected output would be
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
[highlight #FCE94F]3      A        A12        XX2        A456A       XX4ABC       2
4      A        A12        XX2        A456B       XX4ABC       2
5      A        A12        XX2        A456C       XX4ABC       2
6      A        A456B      XX4ABC     A6788      XX5          3[/highlight]
7      A        A12        XX2        A346       XX6          2
8      A        A346       XX6       A6789       XX7          3
9      B        B1         XX8        B11        XX9          1
10      B        B11        XX9        B345       X10          2
11      B        B11        XX9        B567       X11          2
12     B        B567        X11       B6789      X12          3

Any idea how to filter by this exception?
 
Hi peac3,
I don't understand what you want to achieve.

Where did you get ParentID = A456BBC? We don't have such a name in the input table.
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
...
4      A        A456BBC    XX4        A6788      XX5          3
...

Why shouldn't A6788 have A456 as its parent when in the input table A456 is its predecessor one level lower?
Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID MatchRank
...
6      A        XX4          2              A456        
7      A        XX5          3              A6788
...

What does mean match rank and which match rank is lower in this case, 1 or 2 ?
 
Sorry mikrom,
it was like Sunday after midnight when I typed this, so it has some typos.

We just received a new data set to handle duplicate company names.
In that case, they rank it for us to identify the client who has the biggest deal amount and flag it in MatchRank.
With match rank, then we should identify which client ID should be the parent.

The vanilla rule is still picked up from the higher row with additional exceptions below.
if there are duplicate companies, list them all as children then pick up the highest rank as parent.

I have rectified the sample accordingly

Where did you get ParentID = A456BBC? We don't have such a name in the input table.

sorry typo, the parentID should be A456B because this one is the highest match rank (number 1), and the parent name should be XX4ABC -> as per row 6 (I have corrected this)

Why shouldn't A6788 have A456 as its parent when in the input table A456 is its predecessor one level lower?

because A456A, A456B and A456C have the same company name hence we need to pick up the highest match rank.

What does mean match rank and which match rank is lower in this case, 1 or 2?

lower number is higher hence 1 is the highest...

I have rectified the sample as below


Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID MatchRank
1      A        XX1          0              A1
2      A        XX2          1              A12
3      A        XX3          2              A345        
4      A        XX4ABC       2              A456A        2
5      A        XX4ABC       2              A456B        1
6      A        XX4ABC       2              A456C        3
7      A        XX5          3              A6788
8      A        XX6          2              A346
9      A        XX7          3              A6789
10     B        XX8          0              B1
11     B        XX9          1              B11 
12     B        X10          2              B345
13     B        X11          2              B567 
14     B        X12          3              B6789

expected output would be
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
[highlight #FCE94F]3      A        A12        XX2        A456A       XX4ABC       2
4      A        A12        XX2        A456B       XX4ABC       2
5      A        A12        XX2        A456C       XX4ABC       2
6      A        A456B      XX4ABC     A6788      XX5          3[/highlight]
7      A        A12        XX2        A346       XX6          2
8      A        A346       XX6       A6789       XX7          3
9      B        B1         XX8        B11        XX9          1
10      B        B11        XX9        B345       X10          2
11      B        B11        XX9        B567       X11          2
12     B        B567        X11       B6789      X12          3
 
So, when saving into the parrent dictionary you also have to take MatchRank into account.
 
So, when saving into the parrent dictionary you also have to take MatchRank into account.

Correct, but I am not quite sure whether I understand your logic codingly,
pseudo-code or example would be nice :)
 
I tried this
Input:
2024-03-18_02h02_32_pxvg6p.png


Output:
2024-03-18_02h06_35_zqf1ja.png


Program Code:
Code:
using System;
using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;
using System.Runtime.InteropServices.ComTypes;
using System.Security.Policy;
using System.Text;
using System.Threading.Tasks;

namespace db_read_example
{
    public class Parent
    {
        public string FamilyID;
        public string ParentID;
        public string ParentName;
        public string MatchRank;
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            var db_file = "c:\\00_mikrom\\Work\\C#\\peac3.accdb";
            var conn_string = "PROVIDER=Microsoft.ACE.OLEDB.12.0" +
                  "; DATA SOURCE=" + db_file +
                  "; Persist Security Info=False;";
            OleDbConnection dbcon = new OleDbConnection(conn_string);

            var query = "select * from Table_inp order by ROW_N";
            OleDbCommand cmd = new OleDbCommand(query);
            cmd.Connection = dbcon;

            dbcon.Open();

            // declare parent dictionary
            var parent_dict = new Dictionary<String, Parent>();

            var reader = cmd.ExecuteReader();
            int count = 0;
            while (reader.Read())
            {
                var HierarchyLevel = (int)reader["HierarchyLevel"];
                var FamilyID = reader["FamilyID"].ToString();
                var ClientID = reader["ClientID"].ToString();
                var CompanyName = reader["CompanyName"].ToString();
                var current_key = FamilyID + "#" + HierarchyLevel.ToString();
                var parent_key = FamilyID + "#" + (HierarchyLevel - 1).ToString();
                var MatchRank = reader["MatchRank"].ToString();

                // check current match rank
                bool current_matchrank_numeric = false;
                int current_matchrank = 0;
                if (!string.IsNullOrWhiteSpace(MatchRank))
                {
                    current_matchrank_numeric =
                    int.TryParse(MatchRank, out current_matchrank);
                }

                if (parent_dict.ContainsKey(parent_key))
                {
                    count++;
                    if (count == 1)
                    {
                        // write header
                        Console.WriteLine("ROW\tFamID\tParID\tParNam\t" +
                            "ChldID\tChldNam\tHier");
                    }

                    var parent_data =
                        parent_dict[parent_key].FamilyID + "\t" +
                        parent_dict[parent_key].ParentID + "\t" +
                        parent_dict[parent_key].ParentName;

                    var out_line =
                        count.ToString("0#") + "\t" +
                        parent_data + "\t" +
                        ClientID + "\t" +
                        CompanyName + "\t" +
                        HierarchyLevel.ToString();

                    Console.WriteLine(out_line);
                }

                // create new parent object from current record
                // to store it into the parrent dictionary
                Parent parent = new Parent
                {
                    FamilyID = FamilyID,
                    ParentID = ClientID,
                    ParentName = CompanyName,
                    MatchRank = MatchRank
                };

                // store current record into parent dictionary
                if (parent_dict.ContainsKey(current_key))
                {
                    if (!string.IsNullOrWhiteSpace(parent_dict[current_key].MatchRank))
                    {
                        bool previous_matchrank_numeric = false;
                        int previous_matchrank = 0;
                        previous_matchrank_numeric =
                             int.TryParse(parent_dict[current_key].MatchRank,
                                          out previous_matchrank);
                        if (current_matchrank <= previous_matchrank)
                        {
                            parent_dict[current_key] = parent;
                        }
                    }
                    else
                    {
                        parent_dict[current_key] = parent;
                    }
                }
                else
                {
                    parent_dict[current_key] = parent;
                }
            }
            dbcon.Close();
        }
    }
}
 
Thanks Mikrom,

You are such a legend!

After discussion with our stakeholders, if the client names are duplicated, they do not want to pick based on the biggest deal only (match rank), there are other considerations.
Hence solution-wise, I will build a web front end for them to cherry-pick manually and it will be captured into our reference table.

Yes, they changed their mind :(

So I have built the reference table as simple as below.

Code:
FamilyID ClientName ClientID
A          XX4ABC    A456B

Hence for exception cases as in duplicate names, this reference table above will determine uniquely which parent should be (replace the match rank logic).
So when saving into the parent dictionary we no longer need to Match Rank into account but we take it from the reference table instead.

How is the best way to incorporate it into your c# code above?

Thank you so much in advance,
 
If you have changed the basic data structure, you will not be able to use the program I posted above. You then have to solve it in your own way.
 
Sorry mikrom, I was throwing a solution after I was talking to my stakeholders and yesterday tek-tips was down.

No, the basic data structure does not change. I just need to merge my reference table into the main table.
Hence to handle duplicate names, I just need to replace the column MatchRank with UserManualFlag as below.


Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID UserManualFlag
1      A        XX1          0              A1
2      A        XX2          1              A12
3      A        XX3          2              A345        
4      A        XX4ABC       2              A456A        
5      A        XX4ABC       2              A456B        Y
6      A        XX4ABC       2              A456C        
7      A        XX5          3              A6788
8      A        XX6          2              A346
9      A        XX7          3              A6789
10     B        XX8          0              B1
11     B        XX9          1              B11 
12     B        X10          2              B345
13     B        X11          2              B567 
14     B        X12          3              B6789


expected output would be the same as before
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
[highlight #FCE94F]3      A        A12        XX2        A456A       XX4ABC       2
4      A        A12        XX2        A456B       XX4ABC       2
5      A        A12        XX2        A456C       XX4ABC       2
6      A        A456B      XX4ABC     A6788      XX5          3[/highlight]
7      A        A12        XX2        A346       XX6          2
8      A        A346       XX6       A6789       XX7          3
9      B        B1         XX8        B11        XX9          1
10      B        B11        XX9        B345       X10          2
11      B        B11        XX9        B567       X11          2
12     B        B567        X11       B6789      X12          3

So, when saving into the parent dictionary, how the best way to incorporate UserManualFlag='Y' into account

Thank you Sir
 
With the UserManualFlag it seems to be a modification of MatchRank.
 
This seems to work.

Program Code:

Code:
using System;
using System.Collections.Generic;
using System.Data.OleDb;

namespace db_read_example
{
    public class Parent
    {
        public string FamilyID;
        public string ParentID;
        public string ParentName;
        public string UserManualFlag;
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            var db_file = "c:\\00_mikrom\\Work\\C#\\peac3.accdb";
            var conn_string = "PROVIDER=Microsoft.ACE.OLEDB.12.0" +
                  "; DATA SOURCE=" + db_file +
                  "; Persist Security Info=False;";
            OleDbConnection dbcon = new OleDbConnection(conn_string);

            var query = "select * from Table_inp order by ROW_N";
            OleDbCommand cmd = new OleDbCommand(query);
            cmd.Connection = dbcon;

            dbcon.Open();

            // declare parent dictionary
            var parent_dict = new Dictionary<String, Parent>();

            var reader = cmd.ExecuteReader();
            int count = 0;
            var previous_key = "";
            while (reader.Read())
            {
                var HierarchyLevel = (int)reader["HierarchyLevel"];
                var FamilyID = reader["FamilyID"].ToString();
                var ClientID = reader["ClientID"].ToString();
                var CompanyName = reader["CompanyName"].ToString();
                var current_key = FamilyID + "#" + HierarchyLevel.ToString();
                var parent_key = FamilyID + "#" + (HierarchyLevel - 1).ToString();
                var UserManualFlag = reader["UserManualFlag"].ToString();

                if (parent_dict.ContainsKey(parent_key))
                {
                    count++;
                    if (count == 1)
                    {
                        // write header
                        Console.WriteLine("ROW\tFamID\tParID\tParNam\t" +
                            "ChldID\tChldNam\tHier");
                    }

                    var parent_data =
                        parent_dict[parent_key].FamilyID + "\t" +
                        parent_dict[parent_key].ParentID + "\t" +
                        parent_dict[parent_key].ParentName;

                    var out_line =
                        count.ToString("0#") + "\t" +
                        parent_data + "\t" +
                        ClientID + "\t" +
                        CompanyName + "\t" +
                        HierarchyLevel.ToString();

                    Console.WriteLine(out_line);
                }

                // create new parent object from current record
                // to store it into the parrent dictionary
                Parent parent = new Parent
                {
                    FamilyID = FamilyID,
                    ParentID = ClientID,
                    ParentName = CompanyName,
                    UserManualFlag = UserManualFlag
                };

                // store current record into parent dictionary
                if (parent_dict.ContainsKey(current_key) && (previous_key == current_key))
                {
                    if (parent_dict[current_key].UserManualFlag.Trim().ToUpper() != "Y")
                    {
                        parent_dict[current_key] = parent;
                    }
                }
                else
                {
                    parent_dict[current_key] = parent;
                }
                previous_key = current_key; 
            }
            dbcon.Close();
        }
    }
}

Input:

2024-03-19_00h57_24_hprguy.png


Output:

2024-03-19_01h02_46_vgljx1.png
 
Fantastic, you are such a legend.
I tried for a few companies, it works... let me try it with hundreds thousands of companies
 
Hi Mikrom,

We need some little enhancement in your codes.
if there was some not duplicated value after the duplicated, it should be the parents the one right above it, not the one with Manual Flag = 'y'

As per below example

Data like below
Code:
ROW_N FamilyID CompanyName HierarchyLevel ClientID UserManualFlag
1      A        XX1          0              A1
2      A        XX2          1              A12
3      A        XX3          2              A345        
4      A        XX4ABC       2              A456A        
5      A        XX4ABC       2              A456B        Y
6      A        XX4ABC       2              A456C        
[COLOR=#73D216]7      A        XX4-1       2              A456D[/color]        
8      A        XX5          3              A6788
9      A        XX6          2              A346
10      A        XX7          3              A6789
11     B        XX8          0              B1
12     B        XX9          1              B11 
13     B        X10          2              B345
14     B        X11          2              B567 
15     B        X12          3              B6789



your code will produce below, to dedup the duplicated, however, your codes won't detect after duplication there is a singular value which is (A456D)
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
3      A        A12        XX2        A456A       XX4ABC       2
4      A        A12        XX2        A456B       XX4ABC       2
5      A        A12        XX2        A456C       XX4ABC       2
6      A        A12        XX2        A456D       XX4-1       2
[highlight #FCE94F]7      A        A456B      XX4ABC     A6788      XX5          3[/highlight]
8      A        A12        XX2        A346       XX6          2
9      A        A346       XX6       A6789       XX7          3
10      B        B1         XX8        B11        XX9          1
11      B        B11        XX9        B345       X10          2
12      B        B11        XX9        B567       X11          2
13     B        B567        X11       B6789      X12          3


The expected output would be below
Code:
ROW_N FamilyID ParentID ParentName ChildrenID ChildrenName HierarchyLevel 
1      A        A1         XX1        A12        XX2          1
2      A        A12        XX2        A345       XX3          2
3      A        A12        XX2        A456A       XX4ABC       2
4      A        A12        XX2        A456B       XX4ABC       2
5      A        A12        XX2        A456C       XX4ABC       2
6      A        A12        XX2        A456D       XX4-1       2
[highlight #8AE234]7      A        A456D      XX4-1      A6788      XX5          3[/highlight]
8      A        A12        XX2        A346       XX6          2
9      A        A346       XX6       A6789       XX7          3
10      B        B1         XX8        B11        XX9          1
11      B        B11        XX9        B345       X10          2
12      B        B11        XX9        B567       X11          2
13     B        B567        X11       B6789      X12          3

Any idea on how to enhance this?

Thanks in advance
 
Hi peac3,

If you remove the Y flag from the 5th rows, then the result will be ok.

2024-04-09_16h59_30_khyu3m.png


2024-04-09_16h50_06_yjyt2j.png
 
If you remove the Y flag from the 5th rows, then the result will be ok.

I know, but I need to keep that flag because the company name may be the parents in different families and that flag is to dedup which one is the right ClientID.

I think I have found a solution by comparing it with the previous name...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top