Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Loading a binary tree from a file 3

Status
Not open for further replies.

Trevoke

Programmer
Jun 6, 2002
1,142
US
Hello,
I'm working on a Project Euler problem ( and one of the problems uses a binary tree stored in a file. All the nodes are fully leafed-out, up until the last row.
I can't think of a way to populate a binary tree from a file like that - usually recursion is in order but I don't know how to code a way to traverse the tree that corresponds to a way of parsing the file!
While I enjoy making my code as open-ended as possible, since this is a one-off for this particular problem, I can put as many constraints into the code as possible (I know how many levels the tree has, etc) to make this work.

Any help is appreciated, even a "google this" and a "try this kind of tree traversal".. Thanks!

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
Where is the file format and structure description?
 
Ah, of course. My apologies.
File is ASCII, .txt and the tree is written line by line in there as such:
3
7 5
2 4 6
8 5 9 3

! While writing out the tree, I realized that it's not a binary tree but a different structure! I'd understood the problem incorrectly. I'm still trying to load this but now I don't even know what C++ structure I should be using.. Back to the drawing board for me!

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
Ohh... A tree is a collection of nodes and edges. Do you think that four lines in your post above described the tree structure?..
 
First line has one number.
Second line has two numbers. (both linked to first number in first line ?)
Third line has three numbers. (first and second linked to first number in second line, second and third linked to second number in second line ?)
Fourth line has four numbers. (first and second linked to first number in third line, second and third linked to second number in third line, third and fourth linked to third number in third line ?)
Fifth line has five numbers.
Sixth line has six numbers.
Seventh line has seven numbers.

Maybe a tree is not the right word; in which case, I apologize for my ignorance (a little knowledge is a dangerous thing). In which case.. What is it?
The problem indicates I have to work my way down the structure, finding the path with the highest possible sum.

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
OK, you have some kind of DGA (Directed Acyclic Graph), a generalization of trees (see The file contains weights of verticies.

It's not a problem to load data from the file into the dynamically formed structure. There are lots of methods to do that. The true problem is: how to coordinate this structure with the main algorithm. For example, if you want to traverse the grid not only up-to-down but also from leaves to the root (one of possible approaches), you need not only downcast left/right links but also direct upward left/right links (possibly closed) in every node. It's unclear now what additional vars must been placed in nodes (for example, to keep tracks of probe paths).

It's time to think a little...
 
ArkM,
Thank you, this was a very informative link. I clearly bit off more than I can currently chew on my own with this problem.. I'll do research online to try and find a way to load the data into this DGA. For this problem's purposes, I only need to be able to traverse the tree downward, but I need each parent to have two children..

Unless I can somehow bypass the tree implementation and just do the downward traversal while reading from file.. Hmmm..

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
It doesn't sound all that complicated to me (yet...). You are right, the structure you have simply expands into a binary tree as you read it (personally I view trees upside down, so please forgive me for bottom-level leaves!):

At each ascii line, you add a row of leaves to the tree. The first bottom-level leaf now becomes a node pointing to two new leaves, corresponding to the first two numbers of the ascii row you are reading. You then discard the first ascii number, and continue with the next two and the next bottom-level leaf, to build up a complete new bottom level.

The only information needed at each node is the current running sum of all numbers encountered to reach that node (so when you make the two new leaves, they are the sums of the current numbers under consideration and the value of the parent leaf).

When you've built the entire tree, you look along the bottom row to find which number is biggest.

The beauty is that the tree structure is irrelevant. You can even throw away all non-bottom-level nodes, and not bother storing the tree at all, because the order of the bottom-level leaves is effectively one big binary number consisting of left-right choices at each level. This is because you've built it strictly in order.

A good practical way to implement this might be as a linked list where you replace each list-item by two at each stage. The list length obviously doubles for each line of ascii text, but since you know the file length, you can check if this is a problem, and split things into blocks.

But I'm sure there are other solutions, and since this is a 10-sec instant reply, the other solutions are likely to be better.
 
Well, now the DAG monster is ready to...debug?..

I have used VC++ 9 so all STL includes are placed in stdafx.h header. If you have another compiler, add stdafx.h to the project (get includes list from dagstaff.cpp) or erase #include stdafx.h and add system includes explicitly.
Alas, I can't prepare docs/guides/manuals now. Start from dagtest.cpp and go on (At first copy/paste all sources;)...
dagstuff.h:
Code:
/** @file
 *  Special DAG structure template declarations.
 *  Copyright (C) ArkM, 2008. All rights reserved.
 *  Alpha version 2008-09-25/2.
 *  Free for non-commercial use.
 */
#ifndef DAGSTUFF_H
#define DAGSTUFF_H
/// Misc class predeclaration.
class DagLoaderImpl;

/// Helper to load DAG file with double values
/** It's impossible to declare it explicitly. */
class DagLoader {
public:
    virtual ~DagLoader() { close(); }
    static DagLoader* create();
    void Delete();
    bool open(const char* fname);
    int countLines();
    bool getLine();
    bool getNext(double& x);
    void close();
private:
    DagLoader():pImpl(0) {}
    DagLoaderImpl* pImpl;
};

/// Special DAG structure based on the two-way tree.
/** It's possible to include any user data in verticies.
 *  By default it's an int var == 0.
 *  Weight type: any number type.
 *  Not copiable.
 */
template<typename Weight = int, class UserData = typename Weight>
class DagBase {
public:
    DagBase(): pool(0), rows(0), loader(*DagLoader::create()), pRow(0)
    {}
    virtual ~DagBase() {
        delete pool;
        delete pRow;
    }
    /// The DAG root has level index zero.
    /** columns index started from zero. */
    Weight weight(int level, int column) const {
        return (pRow[level]+column)->weight;
    }
    UserData& data(int level, int column) {
        return (pRow[level]+column)->data;
    }
    const UserData& data(int level, int column) const {
        return (pRow[level]+column)->data;
    }
    UserData* dataPtr(int level, int column = 0) {
        return &(pRow[level]+column)->data;
    }
    const UserData* dataPtr(int level, int column = 0) const {
        return &(pRow[level]+column)->data;
    }
    int levels() const { return rows; }
    /// Load DAG file.
    bool load(const char* fname) {
        bool res = loader.open(fname);
        if (!res)
            return false;
        int n = loader.countLines();
        if (n <= 0)
            return false;
        int i;
        int level = 0;
        Weight w;
        double x;

        int cells = n *(n+1)/2;
       
        delete [] pool;
        delete [] pRow;
        DagNode* p;
        pool = p = new DagNode[cells];
        pRow = new DagNode*[n];
        // Avoid STL dependencies in the header...
        //pRow.reserve(n);
        while (loader.getLine()) {
            //pRow.push_back(p);
            pRow[level] = p;
            n = ++level;
            for (i = 0; i < level && loader.getNext(x); ++i) {
                w = static_cast<Weight>(x);
                p++ ->weight = w;
            }
            if (i < level) {
                loader.close();
                res = false;
                break;
            }
        }
        if (res)
            rows = n;
        return res;
    }
    /// Probe DAG cursor class (DAG navigation).
    /** @todo Add indicies check up... */
    class cursor {
    public:
        typedef DagBase<Weight,UserData> Dag;
        /// Must be attached before using!
        cursor():i(0), j(0), n(0), pdag(0)
        {}
        explicit cursor(Dag& dag):
          i(0), j(0), n(dag.levels()), pdag(&dag)
        {}
        
        void attachDag(Dag& dag) {
            i = j = 0;
            pdag = &dag;
        }
        int getLevel()  const { return i; }
        int getColumn() const { return j; }

        bool toRoot() {
            i = j = 0;
            return n > 0;
        }
        bool move(int level, int column) {
            if (level >= 0 && level < n &&
                column >= 0 && n - column >= 1) {
                i = level; j = column;
                return true;
            }
            return false;
        }
        bool toBottomLeft() {
            return move(n-1,0);
        }
        bool mightDown() const { return n - i >= 1; }
        bool mightUp()   const { return i > 0; }
        bool mightLeft() const { return j > 0; }
        bool mightRight()const { return i - j >= 1; }

        bool downLeft() {
            if (n - i > 1) {
                ++i;
                return true;
            }
            return false;
        }
        bool operator++() { return downLeft(); } // ++c
        bool downRight() {
            if (n - i >= 1) {
                ++i, ++j;
                return true;
            }
            return false;
        }
        bool operator++(int) { return downRight(); } // c++
        bool upLeft() {
            if (i > 0 && j > 0) {
                --i, --j;
                return true;
            }
            return false;
        }
        bool operator --() { return upLeft(); } // --c
        bool upRight() {
            if (i > 0 && i - j >= 1) {
                --i;
                return true;
            }
            return false;
        }
        bool operator --(int) { return upRight(); } // c--

        Weight weight() const { 
            return pdag?pdag->weight(i,j):Weight(0); 
        }
        UserData& data() { return pdag->data(); }
        operator bool() const { return i < n; }
    protected:
        int i, j, n;
        Dag* pdag;
    }; // end of cursor definition.
protected:
    struct DagNode {
        DagNode():weight(0), data(0)
        {}
        Weight weight;
        UserData data;
        /// dag[i][j] gets vertex weight.
        Weight operator[](int j) const {
            return this[j].weight;
        }
    };
    DagNode* pool;
    // No std::vector dependency now...
    //std::vector<DagNode*> pRow;
    DagNode** pRow;
private:
    DagLoader& loader;
    int rows;
    DagBase(const DagBase&);
    DagBase& operator =(const DagBase&);
};
#endif
dagstuff.cpp:
Code:
/** @file
 *  DAG file loader implementation.
 */
#include "stdafx.h"
/** STL header dependencies:
#include <iostream>
#include <ifstream>
#include <sstream>
#include <string>
*/
using namespace std;

#include "dagstuff.h"

class DagLoaderImpl {
public:
    DagLoaderImpl():level(0) {
        line.reserve(4096); // size does no matter...
    }
    ifstream f;
    std::string line;
    std::istringstream str;
    int level;
};

bool DagLoader::open(const char* fname) {
    if (!fname)
        return false;
    if (!pImpl)
        pImpl = new DagLoaderImpl;
    DagLoaderImpl& impl = *pImpl;
    if (impl.f.is_open())
        impl.f.close();
    impl.f.open(fname);
    if (!impl.f.is_open())
        return false;
    impl.level = 0;
    impl.line.erase();
    return true;
}

DagLoader* DagLoader::create() {
    return new DagLoader();
}

bool DagLoader::getLine() {
    bool res = false;
    if (pImpl && getline(pImpl->f,pImpl->line)) {
        pImpl->str.clear();
        pImpl->str.str(pImpl->line);
        pImpl->str.seekg(0);
        pImpl->level++;
        res = true;
    }
    return res;
}

bool DagLoader::getNext(double& x) {
    bool res = false;
    if (pImpl && (pImpl->str >> x))
        res = true;
    return res;
}
/// Count lines in the opened file.
int DagLoader::countLines() {
    if (!pImpl)
        return 0;
    DagLoaderImpl& impl = *pImpl;
    ifstream& f = pImpl->f;
    if (!f)
        return 0;
    int n;
    for (n = 0; getline(f,impl.line); ++n)
        ;
    f.clear();
    f.seekg(0);
    return n;
}

void DagLoader::close() {
    if (pImpl) {
        if (pImpl->f.is_open())
            pImpl->f.close();
        pImpl->level = 0;
        pImpl->line.erase();
    }
}

void DagLoader::Delete() {
    delete this;
}
dagtest.cpp:
Code:
/** @file
 *  Start your program bootstrapping from this file...
 */
#include "stdafx.h"

using std::cout;
using std::endl;

#include "dagstuff.h"

//typedef DagBase<int,int> Dag; // debug stage...
/// DAG class with full dag[i][j] support, int weights
class Dag: public DagBase<int> {
public:
    typedef DagBase<int> DagBase;
    const DagBase::DagNode& operator[](int i) const {
        return *pRow[i];
    }
};
// Test DAG file name:
const char* fName = "euler.txt";
/* euler.txt contents:
3
7 5
2 4 6
8 5 9 3
*/
void TestDag() {
    Dag dag;
    dag.load(fName);
    int levels = dag.levels();
    cout << "DAG levels " << levels << endl;
    // Print DAG weights:
    for (int i = 0; i < levels; ++i) {
        int n = i + 1;
        for (int j = 0; j < n; ++j)
            cout << ' ' << dag[i][j];
        cout << '\n';
    }
    cout.flush();
    Dag::cursor cur(dag);
    cur++;//cur.downRight();
    ++cur;//cur.downLeft();
    cout << "weight[" << cur.getLevel() << "][" 
        << cur.getColumn() << "] "
        << cur.weight() << endl;
}
/* Output:
DAG levels 4
 3
 7 5
 2 4 6
 8 5 9 3
weight[2][1] 4
*/
 
Wow ..
That is a lot more than I expected to get! I shall get to the studying! Thanks a lot; will let you know how I fare.

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
Thanks for a really interesting thread.

Faced with such a beautiful general implementation of a data structure, I feel obliged to flesh out my rather horrible explanation with at least some pseudo-code. So here it is. This does absolutely nothing except find the path through your triangle of numbers that gives the largest outcome. It's absolutely not general.

To find the path
================
PathArray = array[0] starting with one value, 1
OldArray = array[0] starting with value of first line of ascii
While there are more lines of ascii text
Create NewArr twice the size of OldArr
NewIndex = 0
OldIndex = 0
PathIndex = 0
Read Number1
While numbers remain on current ascii line
Read Number2
Loop PathArray[PathIndex] times:
NewArr[NewIndex] = OldArr[OldIndex] + Number1
NewArr[NewIndex+1] = OldArr[OldIndex] + Number2
NewIndex = NewIndex + 2
OldIndex = OldIndex + 1
End Loop
Number1 = Number2
Increment PathIndex
EndWhile
Replace OldArr by NewArr, and get rid of previous OldArr
Create New PathArr one item longer than previous
a = 0
for i = 0 to end of old PathArr
New PathArr = Old PathArr + a
a = Old PathArr
EndFor
New PathArr[i+1] = 1
EndWhile

To find the best path and print it
==================================

i = location of largest value in final NewArr[0..n]
c = number of ascii lines - 1 (i.e. number of choices)
while c > 0
if (i mod 2) = 0 print "L" else print "R"
i = i divided by 2
endwhile

I've only tested this on the back of an envelope, but I think it works. PathArr is very old-fashioned stuff, just triangular numbers to show by how many paths through the tree you can reach the current two ascii numbers. It holds values such as
1
1 1
1 2 1
1 3 3 1
etc.

Given the 4 line input
3
7 5
2 4 6
8 5 9 3
you should find array values
20 17 19 23 17 21 23 17
The largest value is 23, for instance in position 3, which corresponds to left-right-right choices (3+7+4+9), and 3 in binary is 011, which is decoded to left-right-right
You can also get 23 from position 6, i.e. binary 110, or right-right-left(3+5+6+9)

Sorry if this is all rubbish, it's late here...
 
No, no! Has anyone red-flagged their own post before for being a dreadful algorithm? The method above does work, but it looks at every possible path through the mesh of numbers, which is hideously inefficient, scaling awfully as the tree gets bigger.

(1) For a start, the array PathArr stores the number of ways to arrive at any given pair of numbers. The algorithm follows forwards all of these ways, but they are all the same from this point. It's therefore only necessary to follow one of them, using the value of the best path so far.

(2) But even better, one should start at the other end of the tree. If we write the same example as before, but bottom-to-top:
8 5 9 3
2 4 6
7 5
3
It's obvious that when we reach 2 in the 2nd line, we are going to choose 8 rather than 5 as our onward path. Therefore the true value of the node "2" is 8+2, and it will be followed by a left choice. We therefore process the first two lines by adding the better choice in each case:
(8) (5) (9) (3) <--- this line no longer needed
10L 13R 15L
7 5
3
Now continue to the next line:
(8) (5) (9) (3)
(10L) (13R) (15L)
20Rr 20Rl
3
Now on the final line it's obvious that both directions yield the same result; we have a sum of 23 and routes LRR or RRL.

Personally I'd implement this by importing the whole ascii file into a 2d matrix, either ragged or a half-occupied square one, and keeping a spare matrix to contain Ls and Rs. Alternatively, if space were an issue, it would be possible to maintain only the two working lines, but this would involve reading the ascii file backwards.

The lessons for me are (1) don't post quick solutions; (2) be careful about letting the data structure dictate the algorithm.
 
First of all, thank you, lionelhill, for your remark on my improvisation. You are right, better fit data structures to algorithmes than vice versa (see my previous posts;) but I have this template so that's a time to solve the problem in practice (if not - what for these efforts?;).

Please, look at cursor::mightBlaBla family in dagstuff.h; must be:
Code:
    bool mightDown() const { return n - i > 1; }
    bool mightUp()   const { return i > 0; }
    bool mightLeft() const { return j > 0; }
    bool mightRight()const { return i - j >= 1; }
It's the same approach in dynamic programming style:
Code:
/* DAG
    3
   7 5
  2 4 6
 8 5 9 3
*/
#include "dagstaff.h"
/// The problem solver data type (UserData)
struct Data {
    /// UserData class must accept UserData(0).
    explicit Data(void* = 0):sum(0), goleft(false) {}
    //int  sum;   //< Sum of the optimal downward path.
    /// Better use double to prevent integer overflow
    double sum;
    bool goleft;//< Optimal downward direction flag.
};
/// Problem DAG data structure:
class Pyramid: public DagBase<int,Data> {
public:
    typedef DagBase<int,Data> DagBase;
    /// I'm too lazy to think about this op packaging
    const DagBase::DagNode& operator[](int i) const {
        return *pRow[i];
    }
};
/**
 *  Solve the max problem with O(N) complexity.
 *  Trivial modifications for min etc possible.
 *  Ave Mr.Bellman for Dynamic Programming!..
 *  See [URL unfurl="true"]http://en.wikipedia.org/wiki/Dynamic_programming[/URL]
 */
void Solver(const char* fname) {
    Pyramid grid;

    if (!fname || !grid.load(fname)) {
        cout << "*** Can\' load DAG" << endl;
        return;
    }
    int levels = grid.levels();
    if (levels <= 1) {
        cout << "Trivial solution only.\n" << endl;
        return;
    }

    int leaves = levels - 1;
    Data* pdata = grid.dataPtr(leaves);

    /// Initialize leaves level (sum = weight).
    for (int i = 0; i < levels; ++i) {
        grid.data(leaves,i).sum = grid.weight(leaves,i);
    }
    /// Now we are ready to go...
    Pyramid::cursor cur(grid);
    double sleft, sright; // avoid int overflow
    bool goleft;

    cur.toBottomLeft();
    while (cur.mightUp()) {
        sleft = cur.data().sum;
        cur--;
        cur++;
        sright = cur.data().sum;
        --cur;
        // Mark optimal path direction in the prev row.
        cur.data().goleft = goleft = (sleft > sright);
        cur.data().sum = cur.weight()
            + (goleft?sleft:sright);
        cur++; // move to the next vertex on the curr level
        if (!cur.mightRight()) // end of curr level?
            cur.move(cur.getLevel()-1,0); // level up
    }
    // Done. Print the problem solution.
    cur.toRoot();
    cout << "Payoff: " << cur.data().sum << "; Path:\n";
    double sum = 0;
    for (;;) { // What's a pleasure this loop forever!..
        sum += cur.weight();
        cout << cur.getLevel() << ',' << cur.getColumn() 
            << " (" << cur.weight() << ") " << sum;
        if (cur.mightDown())
            cout << (cur.data().goleft?"\t/\n":"\t\\\n");
        else {
            cout << "\t$\n";
            break;
        }
        cur.data().goleft? ++cur: cur++;
    } while (cur.mightDown());
    cout << "That\'s all, folks!\n";
    cout.flush();
}
/// Test run...
void TestAlg() {
    Solver("euler.txt");
}
/* Output:
Payoff: 23; Path:
0,0 (3) 3       \
1,1 (5) 8       \
2,2 (6) 14      /
3,2 (9) 23      $
That's all, folks!
*/
Generally speaking no need in special data structures for this alg. It's possible to implement it in a common (even Fortran-like) style as an array numbers mill.

So that's how it is...
 
Ohh, the last line of for (;;) printing loop must be simple }, not } while...
It's interesting: no C++ syntax (or semantics;) error in that case. Yes, do-while loop is a bad thing (it was here before the last version with for(;;) loop...
 
Nice! One of the most educational tek-tips experiences I've had in a long time. Many thanks, ArkM for the fully-fleshed out solution and pointing me at DAGs & Mr Bellman, and Trevoke for starting the whole thread in the first place.
 
Most definitely.. Many thanks to both of you :) I was clearly making my own problem bigger than it needed to be, but it allowed me to learn rather important principles!

Tao Te Ching Discussions : Chapter 9 (includes links to previous chapters)
What is the nature of conflict?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top