Hi,
The project I'm working on is a web service that provides a page that can be used to send an e-mail to our customer base. The e-mail addresses are gathered from 6 different sources in our system, and each source has 6 different categories. The user will choose a category, and then any combination of sources. I need to calculate and display the number of unique e-mail addresses the user's combination choice will result in. E-mail addresses may be duplicated across multiple sources, and even within a source's categories.
I'm trying to find an efficient way to provide summary data to the webpage that doesn't involve sending the 36 lists of up to 2000 e-mail addresses each and doing the de-duplication in JavaScript whenever the user changes their choice.
I can't just provide a count of e-mail addresses for each category in each source, because if the user selects multiple categories, and I add all the selected category counts together, the total will be too high due to it including duplicates.
One idea I had was to generate a master de-duplicated index of e-mails, then send 36 bit-strings where each bit is set if the e-mail address at that bit's position in the master list is present in that source/category list. So if the master de-duplicated list had 5000 e-mails in it, each bit-string would have 5000 characters in it, and character 401, for example, would be '1' if the e-mail in masterList[401] was present in that source/category, and '0' if it was not. That would cut down on the data sent, but it would still be a computational task in JavaScript.
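To make the bit-string idea concrete, here is a rough sketch of how it might look. It packs the bits into Uint32Array words instead of '0'/'1' characters (5000 bits is 625 bytes, so 36 masks come to roughly 22 KB before any encoding), and it counts a selection by OR-ing the chosen masks and counting the set bits. The names buildMasks, popcount32 and countUnion are made up for illustration, and lower-casing addresses before comparing them is an assumption about how duplicates should be matched.

```javascript
// Sketch only: hypothetical helper names, not an existing API.

// Build one packed bitmask per list against a de-duplicated master index.
function buildMasks(lists) {
  const index = new Map(); // e-mail -> position in the master list
  for (const list of lists) {
    for (const email of list) {
      const key = email.toLowerCase(); // assumption: case-insensitive match
      if (!index.has(key)) index.set(key, index.size);
    }
  }
  const words = Math.ceil(index.size / 32);
  const masks = lists.map((list) => {
    const mask = new Uint32Array(words);
    for (const email of list) {
      const pos = index.get(email.toLowerCase());
      mask[pos >>> 5] |= 1 << (pos & 31); // set bit `pos`
    }
    return mask;
  });
  return { masterSize: index.size, masks };
}

// Count the set bits in one 32-bit word.
function popcount32(x) {
  x = x - ((x >>> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >>> 2) & 0x33333333);
  x = (x + (x >>> 4)) & 0x0f0f0f0f;
  return Math.imul(x, 0x01010101) >>> 24;
}

// Unique addresses in the union of the selected lists (by list index).
function countUnion(masks, selected) {
  if (selected.length === 0) return 0;
  const words = masks[0].length;
  let total = 0;
  for (let w = 0; w < words; w++) {
    let acc = 0;
    for (const i of selected) acc |= masks[i][w]; // union of this word
    total += popcount32(acc);
  }
  return total;
}
```

With a 5000-entry master list each mask is 157 words, so a selection of k lists costs about 157 * k OR operations plus 157 bit counts each time the user changes their choice.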
Then I figured it would be simpler to just calculate every combination of category for each source. 6 sources and 6 categories = 384 summaries (6 * 2^6). What I don't like about this is that it has the potential to skyrocket if more categories are added (each category doubles the number of summaries that must be generated).
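For reference, this is roughly what I mean by pre-calculating every combination for one source. The function name subsetCounts is made up, and the approach of first recording which categories each address belongs to is just one way to do it. counts[subset] holds the unique-address total for the categories whose bits are set in subset (bit i = category i), so 6 categories give 2^6 = 64 entries per source.

```javascript
// Sketch only: subsetCounts is a hypothetical name.
// categoryLists: array of n arrays of e-mail strings for ONE source.
function subsetCounts(categoryLists) {
  const n = categoryLists.length;
  const size = 1 << n;

  // Map each address to the bitmask of categories that contain it.
  const membership = new Map();
  categoryLists.forEach((list, i) => {
    for (const email of list) {
      membership.set(email, (membership.get(email) || 0) | (1 << i));
    }
  });

  // hist[m] = number of addresses whose category set is exactly m.
  const hist = new Array(size).fill(0);
  for (const m of membership.values()) hist[m]++;

  // An address is counted for `subset` if it is in at least one
  // of the selected categories, i.e. its mask overlaps `subset`.
  const counts = new Array(size).fill(0);
  for (let subset = 0; subset < size; subset++) {
    for (let m = 1; m < size; m++) {
      if (m & subset) counts[subset] += hist[m];
    }
  }
  return counts;
}

// Example: categories 0 and 1 share "b", so their union is 3, not 4.
const counts = subsetCounts([["a", "b"], ["b", "c"], ["c", "d", "a"]]);
console.log(counts[0b011]); // categories 0 and 1 together
```

The table has 2^n entries per source, which is exactly the doubling per added category that worries me.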
I reckon there's got to be a smart shortcut to all this that involves sending less data, and a faster algorithm to calculate the totals I'm after.
Any thoughts are appreciated.
Thanks.