-----------------------------------------------------------
SCHEMA
-----------------------------------------------------------
CREATE TABLE reports
(
    report_id INTEGER     NOT NULL PRIMARY KEY,
    report    VARCHAR(50) NOT NULL UNIQUE
    -- ...
);
CREATE TABLE fields
(
    field_id INTEGER     NOT NULL PRIMARY KEY,
    field    VARCHAR(50) NOT NULL UNIQUE
    -- ...
);
CREATE TABLE report_fields
(
    report_id INTEGER NOT NULL REFERENCES reports,
    field_id  INTEGER NOT NULL REFERENCES fields,
    -- ...
    CONSTRAINT pk_report_fields
        PRIMARY KEY (report_id, field_id)
);
CREATE TABLE report_data1
(
    record_id INTEGER NOT NULL PRIMARY KEY,
    field1    VARCHAR(100),
    field2    VARCHAR(100),
    field3    INTEGER,
    field4    NUMERIC(25, 9),
    field5    DATETIME,
    field6    DATETIME
    -- ...
);
CREATE TABLE report_data2
(
    record_id INTEGER NOT NULL REFERENCES report_data1,
    field_id  INTEGER NOT NULL REFERENCES fields,
    -- exactly one of these columns (potentially, but unlikely, more)
    -- is populated, depending on the field's data type:
    "integer" INTEGER,
    "numeric" NUMERIC(25, 9),
    "date"    DATETIME,
    "text"    VARCHAR(100),
    CONSTRAINT pk_report_data2
        PRIMARY KEY (record_id, field_id)
);
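For concreteness, a single logical record is split between the two tables like this (all values invented for illustration):

```sql
-- Fixed part of the record, one row per record:
INSERT INTO report_data1 (record_id, field1, field3, field5)
VALUES (1, 'abc', 7, '2012-02-10 12:00:00');

-- Variable part, one row per (record, field) pair, with only the
-- column matching the field's data type populated:
INSERT INTO report_data2 (record_id, field_id, "integer") VALUES (1, 1, 101555);
INSERT INTO report_data2 (record_id, field_id, "text")    VALUES (1, 2, 'note');
INSERT INTO report_data2 (record_id, field_id, "text")    VALUES (1, 3, 'confirmed');
```

Building a report means folding the `report_data2` rows back into columns alongside the fixed `report_data1` columns.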
-----------------------------------------------------------
I know of two ways of converting rows to columns required to build a report:
-----------------------------------------------------------
Version 1
-----------------------------------------------------------
SELECT D.record_id,
       D.field1     "column 1",
       D.field3     "column 2",
       D.field5     "column 3",
       D1."integer" "column 4",
       D2."text"    "column 5",
       D3."text"    "column 6"
  FROM report_data1 D
  JOIN report_data2 D1
    ON D1.record_id = D.record_id
   AND D1.field_id  = 1
  JOIN report_data2 D2
    ON D2.record_id = D.record_id
   AND D2.field_id  = 2
  JOIN report_data2 D3
    ON D3.record_id = D.record_id
   AND D3.field_id  = 3
 WHERE D.field5 BETWEEN TIMESTAMP '2012-02-01 00:00:00'
                    AND TIMESTAMP '2012-02-29 23:59:59'
   AND D1."integer" = 101555
   AND D3."text"    = 'confirmed'
-----------------------------------------------------------
Version 2
-----------------------------------------------------------
SELECT D1.record_id,
       D1.field1 "column 1",
       D1.field3 "column 2",
       D1.field5 "column 3",
       MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) "column 4",
       MIN(CASE D2.field_id WHEN 2 THEN D2."text"    END) "column 5",
       MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) "column 6"
  FROM report_data1 D1
  JOIN report_data2 D2
    ON D2.record_id = D1.record_id
 WHERE D1.field5 BETWEEN TIMESTAMP '2012-02-01 00:00:00'
                     AND TIMESTAMP '2012-02-29 23:59:59'
 GROUP BY D1.record_id, D1.field1, D1.field3, D1.field5
HAVING MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) = 101555
   AND MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) = 'confirmed'
-----------------------------------------------------------
Version 1 is not a workable solution: the number of report columns may reach 90, whereas Sybase limits a query to 50 tables. Version 2 does not scale: when roughly 50,000 rows are expected, it may return in under 3 minutes; at just over 170,000 rows, it never returns, ultimately failing for lack of space in tempdb.
Interestingly, the filtering conditions perform acceptably in both versions, although version 1 is somewhat better; ultimately it is the number of columns that is the problem. Version 1 simply cannot be written because of the number of tables required. With version 2, Sybase creates many work tables (their number grows with the number of columns), and I could not find a way to convince it that the null-skipping MIN() is never expected to aggregate more than one value per record. I tried the PLAN clause, but it provides no way to reduce the number of work tables.
The questions, then, are: is there another way of converting rows to columns? If not, is there a way to make the above work? The current solution extracts data from Sybase piecemeal and then completes the query outside the database, so to speak, but since the tables are being populated while they are read, the resulting datasets are inconsistent.
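For anyone wanting to reproduce the Version 2 mechanics without a Sybase instance, here is a minimal sketch of the same conditional-aggregation pivot run against SQLite instead. The table shapes are trimmed down and all values are invented; it only demonstrates that MIN() over a CASE per field_id folds the attribute rows back into columns.

```python
# Version-2 pivot (conditional aggregation) on SQLite; illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE report_data1 (
        record_id INTEGER NOT NULL PRIMARY KEY,
        field1    VARCHAR(100),
        field5    TEXT  -- DATETIME kept as ISO-8601 text in SQLite
    );
    CREATE TABLE report_data2 (
        record_id INTEGER NOT NULL,
        field_id  INTEGER NOT NULL,
        "integer" INTEGER,
        "text"    VARCHAR(100),
        PRIMARY KEY (record_id, field_id)
    );
    INSERT INTO report_data1 VALUES (1, 'abc', '2012-02-10 12:00:00');
    INSERT INTO report_data2 (record_id, field_id, "integer") VALUES (1, 1, 101555);
    INSERT INTO report_data2 (record_id, field_id, "text")    VALUES (1, 2, 'note');
    INSERT INTO report_data2 (record_id, field_id, "text")    VALUES (1, 3, 'confirmed');
""")

# Each CASE yields a value only for its own field_id; MIN() skips the
# NULLs from the other rows, leaving one pivoted row per record_id.
rows = cur.execute("""
    SELECT D1.record_id,
           D1.field1,
           MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) AS col4,
           MIN(CASE D2.field_id WHEN 2 THEN D2."text"    END) AS col5,
           MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) AS col6
      FROM report_data1 D1
      JOIN report_data2 D2 ON D2.record_id = D1.record_id
     WHERE D1.field5 BETWEEN '2012-02-01 00:00:00' AND '2012-02-29 23:59:59'
     GROUP BY D1.record_id, D1.field1
    HAVING MIN(CASE D2.field_id WHEN 1 THEN D2."integer" END) = 101555
       AND MIN(CASE D2.field_id WHEN 3 THEN D2."text"    END) = 'confirmed'
""").fetchall()
print(rows)  # [(1, 'abc', 101555, 'note', 'confirmed')]
```

This says nothing about Sybase's work-table behaviour, of course; it only pins down what result the query is supposed to produce.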