Olaf Doschke
Programmer
Hi all,
I'm quite a novice to data warehousing.
How would you model a data warehouse for searching recipes? There will be two kinds of searches:
a) Given relative amounts of ingredients that have to be in a recipe, find the recipes within those ranges (e.g. search for recipes having 10-20% apples and 20-40% sugar).
b) Given a full recipe, find the "most similar" recipes (e.g. what recipe is most similar to "apple pie"?).
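To make the two search types concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the sample recipes, the function names, and the choice of cosine similarity as the meaning of "most similar" are all made up for this example, and each recipe is held as a small ingredient-to-fraction mapping rather than anything warehouse-like.

```python
import math

# Hypothetical sample data: each recipe maps ingredient -> fraction (0.0 to 1.0).
# Ingredients that are absent are simply not stored (implicitly 0).
RECIPES = {
    "apple pie": {"apples": 0.40, "flour": 0.30, "sugar": 0.20, "butter": 0.10},
    "apple crumble": {"apples": 0.45, "flour": 0.25, "sugar": 0.20, "butter": 0.10},
    "apple jam": {"apples": 0.15, "sugar": 0.35, "water": 0.50},
    "shortbread": {"flour": 0.50, "butter": 0.30, "sugar": 0.20},
}


def range_search(recipes, ranges):
    """Search a): names of recipes whose fractions fall inside every given range."""
    return [
        name
        for name, ingredients in recipes.items()
        if all(
            low <= ingredients.get(ingredient, 0.0) <= high
            for ingredient, (low, high) in ranges.items()
        )
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse ingredient vectors."""
    dot = sum(value * b.get(ingredient, 0.0) for ingredient, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(recipes, query_name):
    """Search b): the other recipe with the highest cosine similarity (an assumed measure)."""
    query = recipes[query_name]
    candidates = [
        (cosine_similarity(query, other), name)
        for name, other in recipes.items()
        if name != query_name
    ]
    return max(candidates)[1]


apple_sugar = range_search(RECIPES, {"apples": (0.10, 0.20), "sugar": (0.20, 0.40)})
closest = most_similar(RECIPES, "apple pie")
```

Both functions scan every recipe, so this only pins down what the searches should return, not how to make them fast.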
Recipes can have a varying number of ingredients out of thousands of raw materials, so a recipe could be represented as an N-dimensional point in the space of raw materials, where each raw material is a dimension (N in the range of 10,000!) and each value goes from 0% to 100%.
But most recipes of course have only 10-20 raw materials and are 0% in all other dimensions (raw materials). So shouldn't it be possible to store this in several cubes of lower dimension?
Although a) could also be done on production data, the amount of data makes this quite slow. There are three tables involved: the main table is recipes, then there are subrecipes, and these have ingredients, some of which may themselves be recipes or subrecipes. This can of course be denormalized to recipes with ingredients of a certain amount. I think that would be a good first step.
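As a rough sketch of that denormalization step, the nested structure can be expanded recursively until only raw materials remain. The data layout below is an assumption for illustration (one mapping in which every key is a recipe or subrecipe and everything else is a raw material), not the real three-table schema, and it assumes the nesting contains no cycles.

```python
from collections import defaultdict

# Hypothetical composition data: fractions of each component within its parent.
# Any name that appears as a key is a recipe or subrecipe; all others are raw materials.
COMPOSITION = {
    "apple pie": {"pie crust": 0.5, "apple filling": 0.5},
    "pie crust": {"flour": 0.6, "butter": 0.4},
    "apple filling": {"apples": 0.8, "sugar": 0.2},
}


def flatten(name, composition, weight=1.0, totals=None):
    """Expand a recipe into raw material -> overall fraction (assumes no cycles)."""
    if totals is None:
        totals = defaultdict(float)
    for component, fraction in composition[name].items():
        if component in composition:
            # Nested recipe or subrecipe: recurse with the scaled-down weight.
            flatten(component, composition, weight * fraction, totals)
        else:
            # Raw material: accumulate, since it may occur in several subrecipes.
            totals[component] += weight * fraction
    return totals


def fact_rows(recipe_names, composition):
    """One (recipe, raw material, fraction) row per non-zero ingredient."""
    return [
        (name, material, fraction)
        for name in recipe_names
        for material, fraction in sorted(flatten(name, composition).items())
    ]


flat_pie = dict(flatten("apple pie", COMPOSITION))
rows = fact_rows(["apple pie"], COMPOSITION)
```

The resulting rows have one entry per raw material actually used, so a recipe with 10-20 raw materials costs 10-20 rows rather than a value in each of the roughly 10,000 dimensions.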
But then?
Building some clusters of recipes with similar ingredients?
Perhaps building a one-dimensional cube for each ingredient (would be equivalent to a simple index)?
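The "one index per ingredient" idea might look roughly like the following sketch: for each ingredient, a list of recipes sorted by fraction, searched by bisection, with the per-ingredient matches intersected. The sample data and all names are hypothetical, and the sketch assumes every range has a lower bound above 0%, because recipes that lack an ingredient do not appear in that ingredient's list at all.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: recipe -> {ingredient: fraction}, zeros not stored.
RECIPES = {
    "apple pie": {"apples": 0.40, "flour": 0.30, "sugar": 0.20, "butter": 0.10},
    "apple crumble": {"apples": 0.45, "flour": 0.25, "sugar": 0.20, "butter": 0.10},
    "apple jam": {"apples": 0.15, "sugar": 0.35, "water": 0.50},
    "shortbread": {"flour": 0.50, "butter": 0.30, "sugar": 0.20},
}


def build_index(recipes):
    """Per ingredient: parallel lists of fractions (sorted) and recipe names."""
    postings = {}
    for name, ingredients in recipes.items():
        for ingredient, fraction in ingredients.items():
            postings.setdefault(ingredient, []).append((fraction, name))
    index = {}
    for ingredient, pairs in postings.items():
        pairs.sort()
        index[ingredient] = ([f for f, _ in pairs], [n for _, n in pairs])
    return index


def lookup(index, ingredient, low, high):
    """Recipes whose fraction of one ingredient lies in [low, high]."""
    fractions, names = index.get(ingredient, ([], []))
    start = bisect_left(fractions, low)
    end = bisect_right(fractions, high)
    return set(names[start:end])


def indexed_range_search(index, ranges):
    """Intersect the per-ingredient matches; assumes every low bound is > 0."""
    result = None
    for ingredient, (low, high) in ranges.items():
        matches = lookup(index, ingredient, low, high)
        result = matches if result is None else result & matches
    return result or set()


index = build_index(RECIPES)
found = indexed_range_search(index, {"apples": (0.10, 0.20), "sugar": (0.20, 0.40)})
```

This covers search a) only; it does not by itself answer the similarity question in b).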
I could build clusters per type of recipe, as those should be more similar to each other, but then you don't know...
How would you model this?
Bye, Olaf.