Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Many to Many join but need one unique to one unique match - mysql newbie

Status
Not open for further replies.

Jacque

Technical User
Nov 9, 2001
301
US
First let me say what I'm working with is not optimal or preferred - (heavy on the sarcasm and lots of [hairpull3])
Also, I have limited table design experience and would great appreciate any help and/or suggestions and I'm fairly new to MySQL. I know there has got to be a way to accomplish this, I'm so close....

Here is the background of what I'm dealing with. (Sorry in advance for the length)
I have two tables - one contains call data and the other contains survey data.
Each has it's own id BUT there is no connection between the two tables except for date/time and there is a timezone difference between the two tables.
In the call table there are separate call time and date fields - both starting and ending and a field identifying that the call was transferred to a survey.
In the survey table there are separate call time and date - both starting and ending

My task is to create a linking table using the call end date/time and the timezone converted survey call start date/time to join on within a 0 to 5 second gap. This is the step is where I run into problems with the one to one unique matching. I have situations like the following where multiple calls can match multiple surveys.

Let's say I have two calls -
Call#1 9/1/2014 10:02:51
Call#2 9/1/2014 10:02:54

That can match two surveys -
Survey#1 9/1/2014 10:02:55
Survey#2 9/1/2014 10:02:56

What I'm running into is that a really small number of the surveys are not being loaded into the CallSurvey table and I think it's because they can match multiple calls but I'm not sure how to redesign it so that if Call#2 shows Survey#1 in the CallSurvey table, when it comes time for Call#1 to update it's Survey_ID, then it can only choose Survey#2. Later you will understand why that is how they should link up based upon the order of the updates.

I created a linking table, it contains the call id, survey id and a flag field to identify calls that connected to the survey - sometimes the call is transferred but the call is hung up or dropped so I need to capture that.

CREATE TABLE IF NOT EXISTS CallSurvey (
Call_ID varchar(254) NOT NULL,
Survey_ID int(11) ,
Completed int(11) NOT NULL Default 0, -- Was originally set as a bit but the tool I'm using for reporting on this data does not handle bit fields well so I reverted back to int(11)
PRIMARY KEY (`Call_ID`),
UNIQUE INDEX (`Survey_ID`)

I drop if exists, then create tables with just the fields I need for both the call and survey data. In the call table I create 5 1 second incremented call end times. I load them with the date and time fields concatenated. In the survey table I convert the timezone to allow them to match and in the case of the call data, load just those calls that have been identified as being transferred. See below:

CREATE TABLE `Call_CONVERTED` (
`Call_ID` varchar(254) NOT NULL,
`Call End Time_CST` datetime DEFAULT NULL,
`Call ET plus 1` datetime DEFAULT NULL,
`Call ET plus 2` datetime DEFAULT NULL,
`Call ET plus 3` datetime DEFAULT NULL,
`Call ET plus 4` datetime DEFAULT NULL,
`Call ET plus 5` datetime DEFAULT NULL,
`Date_Call_Ended` date DEFAULT NULL, -- used for validation
`Time_Call_Ended` time DEFAULT NULL, -- used for validation
`Survey` varchar(15) DEFAULT NULL,
PRIMARY KEY `Call_ID` (`Call_ID`),
KEY `Call End Time_CST` (`Call End Time_CST`),
INDEX `Call ET plus 1` (`Call ET plus 1`),
INDEX `Call ET plus 2` (`Call ET plus 2`),
INDEX `Call ET plus 3` (`Call ET plus 3`),
INDEX `Call ET plus 4` (`Call ET plus 4`),
INDEX `Call ET plus 5` (`Call ET plus 5`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


INSERT INTO Call_CONVERTED
(SELECT
Call_ID,
STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s') AS `Call End Time_CST`,
Date_add(STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s'), INTERVAL 1 SECOND) as `Call ET plus 1`,
Date_add(STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s'), INTERVAL 2 SECOND) as `Call ET plus 2`,
Date_add(STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s'), INTERVAL 3 SECOND) as `Call ET plus 3`,
Date_add(STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s'), INTERVAL 4 SECOND) as `Call ET plus 4`,
Date_add(STR_TO_DATE(concat(Date_Call_Ended, ' ', Time_Call_Ended), '%Y-%m-%d %H:%i:%s'), INTERVAL 5 SECOND) as `Call ET plus 5`,
Date_Call_Ended,
Time_Call_Ended,
Survey
FROM Calls
WHERE Survey = '1234567890');

CREATE TABLE `Survey_CONVERTED` (
`survey_id` int(11) NOT NULL,
`Call Start Time_Converted` datetime DEFAULT NULL,
`call_date` date DEFAULT NULL,
`call_time_start` time DEFAULT NULL,
PRIMARY KEY `survey_id` (`survey_id`),
KEY `Call Start Time_Converted` (`Call Start Time_Converted`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


INSERT INTO Survey_CONVERTED
(SELECT
survey_id,
Date_ADD(STR_TO_DATE(concat(call_date, ' ', call_time_start), '%Y-%m-%d %H:%i:%s'), INTERVAL 1 Hour) AS `Call Start Time_Converted`,
call_date, -- used for validation
call_time_start -- used for validation
FROM Survey);

Next I loaded all the calls into the linking table.
<Each update statement runs separately - I have to do that based on the tool I'm using, so they are in running in order of Call_CONVERTED.Call End Time = Survey_CONVERTED.Call Start Time_Converted Matching exactly, then the one second difference and so on until it completes the update of the 5 second difference.>

Below is an example of an updates using the 3 second difference timeframe.

UPDATE IGNORE CallSurvey
INNER JOIN Call_CONVERTED
ON CallSurvey.Call_ID = Call_CONVERTED.Call_ID
INNER JOIN Survey_CONVERTED
ON Call_CONVERTED.`Call ET plus 3` = Survey_CONVERTED.`Call Start Time_Converted`
SET survey_id = Survey_CONVERTED.survey_id,
Completed = 1;

I tried adding a where clause to limit the available surveys to include in the update but I wasn't sure how to limit them. I'm sure at this point it's something simple but I'm missing it. Any help you can provide would be appreciated. Thanks.
 
are you after something like this ?

Code:
select s.*,c.id
from   surveys s
join   calls c
on     s.startTime between c.starttme and date_add(c.starttime, interval 5 second)
and    s.endTime between date_sub(c.endtime, interval 5 second) and c.endtime

 
Hi jpadie,
Thanks for responding, unfortunately that's not the issue. My issue is that some of the calls date/time can 'match' multiple surveys and vice versa, and the data that meets this criteria sometimes does not update the CallSurvey table, I don't know why it doesn't update and haven't come up with a way to guarantee that it does work as expected.

I can only have one call match one survey, so if a survey has been matched to a call, the next call which could also match it - would instead 'skip over/pass by' that survey and get matched with the next appropriate survey.

Yeah, it's hokey, but it's what I have to work with. [morning]

Again, thanks for any help.
 
can you then check for a null field in you join table?
 
You mention that this is a one-to-one relationship, but is it really? So one call record links to only one survey record and vice versa? Is there a reason you wouldn't use a unique record id in the tables then?

Max Hugen
Australia
 
Hi jpadie, yes thanks - I realized that and fixed that afterwards but I'm still losing a few records when I update the table.

Hi maxhugen, "Many to Many join but need one unique to one unique match" is how I titled this craziness. Unfortunately this situation came about due to lack of planning, 'merging' of disparate systems and management saying "hey this is good information, we need it now, it takes too long for it to be pulled manually (yes someone actually did it manually), Jacque go make it happen" (of course I'm paraphrasing some of that and being a tad bit sarcastic). This is supposed to be corrected in the near future but until that comes to fruition, I've got to deal with the hand I've been dealt. The CallSurvey table I've created needs to be one-to-one relationship (1 Call matching 1 Survey) however there really is nothing but timing to base that relationship on. Because of this, I can run into calls that can match more than one survey and surveys that can match more than one call. I thought that setting the Call_ID as my primary key and the Survey_ID as a unique index would allow me to accomplish this but no, I'm still missing a few surveys.

I was hoping that there was a way to use something like an existence test but all my attempts give me an error message that says
"Error Code: 1093. You can't specify target table 'CallSurvey' for update in FROM clause"
Currently the only thing in the where clause of the update statements is the null test.

UPDATE IGNORE CallSurvey
INNER JOIN Call_CONVERTED
ON CallSurvey.Call_ID = Call_CONVERTED.Call_ID
INNER JOIN Survey_CONVERTED
ON Call_CONVERTED.`Call ET plus 3` = Survey_CONVERTED.`Call Start Time_Converted`
SET survey_id = Survey_CONVERTED.survey_id,
Completed = 1
WHERE CallSurvey.survey_id is null;

Thanks for any help you can provide.
 
Could you create a temporary table, and then join on that?

Max Hugen
Australia
 
Can we take a step back to the two raw data tables?

How do you define that a survey relates to a particular call?

I assume that there are multiple inbound lines so you can have many calls extant at any one moment?/ so are you just making a guess based on the proximity of start/stop of each event?

If so then perhaps an iterative process of starting with time deltas of 1 second and expanding outwards until there is a one-one match?

Further there must be a call for every survey but not necessarily the other ways round (hang ups). So if there are two matching records for calls for a given survey does it really matter that they are wrongly assigned? Surely you would not be associating the caller id with the survey questions as that would be almost certainly a breach of the data privacy regs (if you are in the EU); and if there were consent then the app would have been created to associate right from the start.

 
Hi maxhugen, yeah I thought of that too, it just seemed liked that was an extraneous step that I shouldn't need - but then I'm no expert....[surprise]

Hi jpadie,
jpadie said:
How do you define that a survey relates to a particular call?

Call
All calls are stored.​
All calls have an unique id.​
Calls transferred to a survey must contain '1234567890' in the Survey field.​
-- sadly this is the only 'real' relationship to the survey
The call start date, start time, end date, end time are stored.​

Surveys
All surveys are stored .​
All surveys have a unique id.​
The survey call start date, start time, end date, end time are stored.​

jpadie said:
I assume that there are multiple inbound lines so you can have many calls extant at any one moment?/ so are you just making a guess based on the proximity of start/stop of each event?
yes and yes

jpadie said:
If so then perhaps an iterative process of starting with time deltas of 1 second and expanding outwards until there is a one-one match?
That's why I set it up the way I did...

jpadie said:
Further there must be a call for every survey but not necessarily the other ways round (hang ups).
There is a call for every survey and we do track hangups or abandoned calls - those load the survey field in the Call table and a survey id in the Survey table with all response fields showing as null.

jpadie said:
So if there are two matching records for calls for a given survey does it really matter that they are wrongly assigned? Surely you would not be associating the caller id with the survey questions as that would be almost certainly a breach of the data privacy regs (if you are in the EU); and if there were consent then the app would have been created to associate right from the start.

No it doesn't matter if they are wrongly assigned, I wouldn't have any way to identify that they are wrongly assigned.
Calls are associated with phone numbers and account numbers if known because those are specifically dealing with services.
Surveys are anonymous and not associated with any personal identifier.


Still the problem remains that a really small number of the surveys are not being loaded into the CallSurvey table, I'm not sure how to redesign it so that if multiple calls can match multiple surveys, when the first such call matches a survey, that survey is no longer available for matching and when it comes time for the next call (same situation) to update it's Survey_ID, then it can choose any other qualifying survey.

Thanks for all your help.
 
But if you associate a survey with a call then you are associating a survey with the calling phone number which is personal data for most purposes in the EU. So the surveys are not anonymous. Which perhaps was why the disconnect was designed in the first place?

I think you can solve your issue anyway by approaching the join table population from the survey side rather then the call side.
 
I have to associate a call with a survey - there is no choice in that matter - what we are reporting on is the number of calls, calls that successfully transfer to surveys and the responses recorded in the surveys and how that data trends. Are the service ratings improving or not? Where do we need to improve? Our concern is where we need to improve and where are we doing well as well. There is absolutely no data reported on the caller or account - note that those fields may or may not be filled in and may be incorrect if manually entered, depending upon how the call was made - inbound, outbound, transferred or automated and because there is no truly unique linking, there is no guarantee that the call to the survey match is valid or correct - this is explicitly understood. We recognize all of these issues and no, it was not thought through, it was not designed to be this way, they didn't plan it this way - they didn't plan it at all and I'm having to deal with that fall out.

I tried approaching it from the survey side and ran into slightly different and worse issues issues but I don't have time to get into those at the moment because I'm being told the reports are due by EONW. Yikes.....Thanks for all your help, if I get those last few stragglers figured out, I'll post it.

[americanflag]
 
I'm happy to help further. It would be useful to see perhaps a hundred record from each table (or say all records for a given day) so that we can see the issues at the level of the data. If you could provide this in a SQL dump format together with the schema then that would be useful.

If you can't share this publicly then feel free to send to me offline. You can easily find me via my website at rathercurious.net. and my profile hopefully provides some comfort about my bona fides!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top