Search without an extra field 1

Koen Piller · Aug 28, 2020

Hi,

Based on a previous thread I have a similar problem:

My dbf consists of a ID which is a little bit different constructed as we know usualy: it is unique, with the following construction:
a.b.c.d.e
where a is the master and is always 1
where b is the first related record to a
where c is the first related record to b
a.s.o.
so you can end up ID's like:
1.1
1.1.2.1
1.1.2.2
1.2.1
1.3.1
which I would like to order:
1.1.
1.2.1
1.3.1
1.1.2.1
1.1.2.2
a possibility would be to transfer this to a number, where as, since a relation of > 20 is impossible, the number should like for 1.1.2.1 -> 1010201. I have constructed a procedure for doing this and added an extra record to my dbf. Works OK. But with the solution of "Anglisize" of Griff in mind, could we avoid this extra record and construct a similar function for this indexing?

Stay healthy,

Koen

Olaf Doschke · Aug 28, 2020

How is that stored? Do you have 4 fields, each int?

As each individual value will stay low you get to 4 hex digits, perhaps, which would even only need 2 bytes. Even if you allow up to 256 values per level you'd get this into one integer just by CHR():

INDEX ON CHR(a)+CHR(b)+CHR(c)+CHR(d) TAG hierarchy

And when a really always is one, you don't even need to include it:

INDEX ON CHR(b)+CHR(c)+CHR(d) TAG hierarchy

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Koen Piller · Aug 29, 2020

Olaf,

I have field IdentificationID c(30) and I have field IdenIndex n(20,0) The IdentificationID can go up to 15 digits, (14x number + 14 dots) So I have 2 fields only. Your solution would suggest to use n Number of IdentificationID's. This is a family tree project, each new digit represents a descendent.
It is a system which works OK, I only want to have a different sorting as described, which can be arranged by transforming the IdentificationID from C to N.

Stay healthy,

Koen

GriffMG · Aug 29, 2020

Do your numbers always fall in the range 1-9 or can they go to double/treble digits or more?

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

Olaf Doschke · Aug 29, 2020

Well, I was expecting 4 fields from what you first described.

So the overall hierarchy id only is given with parent records? A single record does only have its own supposition in IdenIndex and a reference to the parent in IdentificationID c(30)?

Then you'd have to index a combination of those: IdentificationID+CHR(IdenIndex). Provided IdenIndex will stay at least below 256. The question is why its n(20,0) when you say each level will stay a low number, or are you trying to build up the number to index in there? Can you give some sample for better understanding, because it still doesn't get clear where you store what part of say "1.2.3.1", I now assume IdentificationID would store "1.2.3" and IdenIndex would be 1 to combine to "1.2.3.1"

The usual solution for compound indexes would take BINTOC(integerfield), but if numbers are low CHR

only uses up 1 byte instead of 4 and makes the index expression shorter.

If you have "1.2.3" in IdentificationID you might get into trouble with out-of-order sorting when there can also be "1.10.2" for indexing you'd need something as you said yourself in your first post, that keeps the number right aligned with leading 0 and then when everything is aligned that way can also drop the dots and put everything into a single number. Otherwise such an example sorts in wrong, obviously:
"1.1.1"
"1.10.1"
"1.10.2"
"1.2.1"
"1.2.2"
"1.3.1"
...

Computing a single int from this firs part plus a 4th number you could go up to 99 in each level and have an 8 digit number. An int would suffice. But 4 CHR() are also just 4 bytes and offer up to 256 levels. Even if it's not more than 10 a single byte per level is not too much and its extensible for more levels, just 1 byte per level, index limit is 240 bytes...

It might not matter when anything stays even below 10 you could even store up to 8 levls in a single integer or if the root level only is 1 use the fact you can even store any number between 1 and 2 billion and set the first digit 1. But you won't need it if it can only be 1. You could store n-1 and start counting from 0 in this computed integer, then the 9th digit of an int could be 0 or 1 and store up to 2 levels, the root number could then be 1 or 2 stored as 0 or 1.

If that's all right assumptions you could go back to my suggestion of four CHR(), but no, not with numbers of several records, whatever number of levels are in IdentificationID need to come from there and the last one comes from IdenIndex, I assume. And then you could PADL() this to the max number of levels with CHR(0).

A completely other way is the question how I would store a hierarchy and be able to sort recorded by it without needing to traverse multiple records. I'd still rather want no special id like your char field ID. Just mainly an autoinc and a parentid as foreign key. This only sorts in chronological order of records populating and 1.2.2 might be inserted after 1.3.1. You need some extra info, but that could be the levelof a record. Wehn I'D then index on level,parentid,id I'd have all root records at the top followed by 2nd level records etc., so this isn't necessarily the order of how nodes appear in a treeview with all nodex expanded, but it'd give me enough overview. So I would not insist on haveing an index that sorts a tree in the same order a treeview would display it. Because that's what a treeview does, I don't need a browse to do that.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Olaf Doschke · Aug 29, 2020

One question from my side. If at least some assumptions I have are correct and you store the hierarchy id of the parent node record, then how will you cope with such situations. Say you have the following hierarchy ids stored in records:

1.1.1
1.1.2
1.1.3
1.2.1
1.1.4
1.2.2
1.3.1
1.4.1
1.2.3

Now you see 1.2.2 is totally misplaced and actually belongs into the 1.1 branch or even some completely different set of 2.x records (though ou say root always is 1 I think of the general word section numbering case).

Then removing 1.2.2 is one thing and setting its fields. But this cascades to all 1.2.x records. In this case just 1.2.3, but it's enough that it effects other records. Any delete or update would need to cascade through records of the same branch and renumber them. And then this also affects all child data from there, if further levels exist.

That is a reason you don't store this in your data and you only have the parent ference and that srays valid forever, no matter what else happens with nodex.

It's the very general reason to have ids that don't have any information about the data itself, actually not even chronological order. In theory, a sequence number id should be replaceable with a GUID/UUID at any time for the general case an id wouldn't even sort data with no hierarchy or only fixed levels of hierarchy in a table dedicated to each level like orders and orderitems where clearly the orderitem level just needs other attributes than the order itself. I am relaxing that rule a bit as I don't see a big minus in using the fact sequences order themselves. I would just always accept the modification of data to leave gaps, so this has to work out with them existing to conform to at least almost all rules about primary keys.

You can always overrule things for good reasons, and hierarchy ids are. SQL Server actually has a hierarchyid type for the special case of files tables, where files are coming from a directory tree also having a hierarchy, so they thought of a special data type for this, you're not alone in this. I'd have to dig in what they do there, But you can also read about this here:

https://docs.microsoft.com/en-US/sq...a-type-method-reference?view=sql-server-ver15

It starts a bit disappointing as it says:

hierarchyid said:
A column of type hierarchyid does not automatically represent a tree. It is up to the application to generate and assign hierarchyid values in such a way that the desired relationship between rows is reflected in the values.

So it doesn't automate the wish to sort data the way a treeview would, it even leaves it as task to you. But you might find some ideas in there.

The case of cascaded updates necessary is not a fun case, so I'd just lie with some but not the full order. It is simple enough to develop a treeview "browse" form to show data. Computers are fast enough to expand a whole table into such a view and then you don't need to be able to sort by an index, the sort order just eveolves naturally in the nodex of a treeview populated node by node with just the parentid known in each step, not the whole hierarchy above.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

GriffMG · Aug 29, 2020

I think the number of 'words' is key to this, so shorter (less words) numbers take priority over longer ones - so you end up with the word count being the first part of
the index key. On your limited dataset, the code below works well

Code:

FUNCTION HIERACHY
	PARAMETERS M.KEYFIELD
	PRIVATE M.KEYFIELD,M.STRING,M.WORDCOUNT,I
	M.WORDCOUNT = GETWORDCOUNT(M.KEYFIELD,".")
	M.STRING = STR(M.WORDCOUNT,3,0)
	FOR I = 1 TO M.WORDCOUNT
		M.STRING = M.STRING + STR(VAL(GETWORDNUM(M.KEYFIELD,I,".")),3,0)
	NEXT
	RETURN(M.STRING)

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

Koen Piller · Aug 29, 2020

Griff,

your question:

Griftt said:
Do your numbers always fall in the range 1-9 or can they go to double/treble digits or more?

The numbers always fall in a range of 1 up, upto, theoretialy 99, in practice never > 15 since the numbers represent the sequence number of the descendent of the previous number.
So you have e.g. 1.2.1.3
so going form left to right:
3 is the third descendent of 1 and i is the first descendent of 2 and 2 is the second descendent of 1.
And a mother with >15 childs is possible but > 99 impossible
Capito?

Stay healthy,

Koen

GriffMG · Aug 29, 2020

In that case, this will do the job nicely:

Code:

FUNCTION HIERACHY
	PARAMETERS M.KEYFIELD
	PRIVATE M.KEYFIELD,M.STRING,M.WORDCOUNT
	M.WORDCOUNT = GETWORDCOUNT(M.KEYFIELD,".")
	M.STRING = STR(M.WORDCOUNT,2,0)
	FOR I = 1 TO M.WORDCOUNT
		M.STRING = M.STRING + STR(VAL(GETWORDNUM(M.KEYFIELD,I,".")),2,0)
	NEXT
RETURN(M.STRING)

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

Olaf Doschke · Aug 29, 2020

You could also store the hierarchy that way in the first place. Let the id of a child be parent*100+childlevel, then you don't need a function at all, you handle this in the first insert.

Or, as you keep the parent hierarchy and the current level separate, this happens delayed by one layer, ie you compute parent+level of a parent record and use that as parentid of the child record l´plus its first record has 1, then siblings get 2,3,4...

Keep it to be a number and not make a string out of it. You'd just have more complex join conditions.

childtable.parentid = parenttable.parentid*100+level

Which shows the composition of parentids starts with the grandparentid.

Edit: Anyway you do it numerically, the composition of numbers would only sort same root levels like the tree does, when you always put the first level on the leftmost, most significant digits.
Sou you'd limit the levels to 4 and put the first level with factor 1000000 (1 million), the second level with factor 10000 (ten thousand), then 100 and finally 1. You'd need decimal places for the next levels. A floatgiv3es about 15.x decimal places, you can ensure 14, so you'd be able to do 7 levels with a double float. N(20) can store 10 levels, indeed, but even when your hierarchy id is just 3 levels like 1.2.2 you'd put the 1 in the two most significant places of the number to compute as representing it. There will be no id below 1 million then. And it also wouldn't make sense to start counting with 0 as I previously suggested, as 0 then is ambiguous, does it mean 1? 1.1? 1.1.1?

You also have room for more than enough siblings or children with 2 digits as you say yourself.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Koen Piller · Aug 29, 2020

Olaf,
Good idea however , sorry I cannot, the hierarchy ID is an imported field.

Griff,

Your procedure 'Hierarchy' is just what I want, works like a charm. I owe you another one! Thanks a lot.

Stay healthy,

Koen

GriffMG · Aug 29, 2020

No problem, a fun exercise to keep the noodle working

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

GriffMG · Aug 29, 2020

Point to note, the latter version of Hierachy() only allows for 99 'words' or generations - about 2500 years?

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

Olaf Doschke · Aug 29, 2020

Shouldn't 1.1.1 and 1.1.2 sort between 1.1 and 1.2?

The idea to put the word coznt in front doesn't solve that. it sorts 1.1 and 1.2 before the first child 1.1.1, for example.
It would just need a right padded value, but you need to decide for a maximum hierahcy length, not just the max number per node, the number of maximum child level.

And as Griffs function still returns strings you can index up to almost arbitrary levels. It just won't sort them as sections would wort in a Word document, for example.
And be sure your index doesn't start on a short node, the cDX tag VFP cdreates will orient itself on the length of the first indexed entry, which in general poses a problem with variable string length expressions.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

GriffMG · Aug 29, 2020

I think it sorts in the order he specified Olaf.
It's not the order you would expect in a word document you are quite correct.

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.

Olaf Doschke · Aug 29, 2020

Well, not only that, it doesn't gain much anything in comparison to just indexing the keyvalue itself. Or bintoc(level)+keyvalue

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Olaf Doschke · Aug 29, 2020

Koen said:
1.1.2.1 -> 1010201

Wasn't this the original idea of transformation?
I don'T know where the idea changed, but I surely also contributed other ideas, including the level in some form, too. Nothing wrong with that. The wordcount in front doesn't help with sorting, these hierarchies as is, though, so I'm just not sure whether Koen just didn't look into that detail yet.

It's simple enough to remove it, too.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Koen Piller · Aug 30, 2020

Hi,

Olaf said:
Shouldn't 1.1.1 and 1.1.2 sort between 1.1 and 1.2?

No the idea is to have a sequence:
1.1
1.2
1.3
1.1.1
1.1.2
1.1.1.1
1.1.1.2
1.1.2.1
1.1.2.2
1.1.2.3
~
1.1.2.10
1.1.2.11
1.2.1
I think Griff's Hierarchy() is doing just that. Have implemented on a big .dbf with values up to 1.3.1.4.11.1.2. and it works as I would like it to do.
Stay healthy,
Koen

Olaf Doschke · Aug 30, 2020

Usually, any subsection comes between main sections, ie 1.2 only comes directly afer 1.1 on that level of nodes, but expand 1.1 (or write nested sections in word with such a numbering) 1.2 only comes after any 1.1.x

If you don't need or even don't want that it's strange, but your decision, of course.

Bye, Olaf.

Olaf Doschke Software Engineering

https://www.doschke.name

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Search without an extra field 1

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor