Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL for multi-column?row cross-tab - not exactly

Status
Not open for further replies.

Tbull1001

Technical User
Dec 10, 2003
5
US
Environment is MySQL 3.23.33, PHP Version 4.3.2.

Have included a couple of extra columns here in an attempt to clarify what
I am trying to do. In the table rid is a unique, auto-increment. The fields
date_created and date_updated are datetime format yyyymmdd hh:mm:ss. Passing a begining and ending date format of mm-yyyy as begdate 06-2003 and endate 11-2003 (those can be 1-24 months and cross a year boundry) into a single table query like ;

Code:
SELECT (date_format(date_created,'%m - %Y')) as RptMonth
    , SUM(IF(date_created != \"0000-00-00\", 1,0)) AS NEW
    , date_format(date_created,'%m - %Y')) as Created
    , SUM(IF(date_updated != \"0000-00-00\", 1,0)) AS Upd
    , date_format(date_updated,'%m - %Y')) as Updated
      from activity_table
      where (
  (date_format(date_created,'%Y%m') between \"$begdate\" and \"$enddate\") or
  (date_format(date_updated,'%Y%m') between \"$begdate\" and \"$enddate\")
       )
          group by rid


Grouping by rid just to ensure that all rows are returned and to use the
SUM function. Injected > and >> manually in the following table to highlight
what I am trying to get as results. First result example set is a series of rows
(small representative set is shown here).

Code:
Activity  For period of Jun/2003 to Nov/2003
RptMonth   New    Created       Upd        Updated    Total
07 - 2003   1    07 - 2003       0       00 - 0000         1
07 - 2003   1    07 - 2003       0       00 - 0000         2
07 - 2003   1    07 - 2003       0       00 - 0000         3
07 - 2003   1    07 - 2003       0       00 - 0000         4
07 - 2003   1    07 - 2003       0       00 - 0000         5
07 - 2003   1 >  07 - 2003       1    >> 09 - 2003         6
08 - 2003   1    08 - 2003       0       00 - 0000         7
08 - 2003   1    08 - 2003       0       00 - 0000         8
08 - 2003   1    08 - 2003       0       00 - 0000         9
09 - 2003   1    09 - 2003       0       00 - 0000        10
10 - 2003   1    10 - 2003       0       00 - 0000        11
10 - 2003   1    10 - 2003       0       00 - 0000        12
10 - 2003   1    10 - 2003       0       00 - 0000        13
10 - 2003   1    10 - 2003       0       00 - 0000        14
10 - 2003   1    10 - 2003       0       00 - 0000        15
11 - 2003   1    11 - 2003       0       00 - 0000        16


I am certain that a query can return a result set that is like my desired output, but I am absolutely stuck on getting the results set. Desired output would be a table that looks like the following, note that the > and >> wouldn't actually be displayed it is added here, just to show where that row highlighted above would get counted. Also the Total is decremented by that 1. In the actual output I wouldn't even display the Created or Updated date.

Code:
Activity  For period of Jun/2003 to Nov/2003
RptMonth    New     Created       Upd      Updated    Total
07 - 2003    6    > 07 - 2003       0    00 - 0000     6
08 - 2003    3      08 - 2003       0    00 - 0000     9
09 - 2003    1      09 - 2003       1 >> 09 - 2003    10
10 - 2003    5      10 - 2003       0    00 - 0000    15
11 - 2003    1      11 - 2003       0    00 - 0000    16


I have tripped and ripped and fallen all over myself on this - i admit that i have even posted this other places but i need a new set of brain cells - maybe i have to concat, transform and invert by a giraffe then sum by date_xxx and beg-enddate or a lengthy passed array or a mysterious moon phase (gasp):

Code:
SELECT (date_format(date_created,'%m - %Y')) as RptMonth
    , SUM(IF(date_created != \"0000-00-00\", 1,0)) AS NEW
    , date_format(date_created,'%m - %Y')) as Created
    , SUM(IF(date_updated != \"0000-00-00\", 1,0)) AS Upd
    , date_format(date_updated,'%m - %Y')) as Updated
      from activity_table
      where (
  (date_format(date_created,'%Y%m') between \"$begdate\" and \"$enddate\") or
  (date_format(date_updated,'%Y%m') between \"$begdate\" and \"$enddate\")
       )
          group by RptMonth



Anybody offer any thoughts on this? Welcome all of them, as your idea might just kick my brains cells in another direction.

TIA
email nospam1001 at nonags dot com
 
I have two suggestions:

1. get rid of the 'group by rid' in your first sql statement.
It isn't doing anything except slowing your query down.
2. Whenever you use an aggregator, such as SUM, AVG, MIN, MAX,
COUNT, etc., you must include ALL select fields in the 'group by'
clause that don't have an aggregator function.
That means that your sql statements are each missing three
'group by' fields. (group by 1,3,5)

You never come out and specifically say exactly what problem
you're having.
 
vanekl,
Thank you for your responese.
1. The 'group by rid' in the first sql is only there to allow all of the rows to print out. As you have correctly identified a SUM is a grouping/aggregator and that is why it is in the query. If there is no group by the query will fail.

2. I beg to differ with you on your statement regarding "ALL" select fields must be used in the 'group by'. That is not required. It is also demonstrated by the first sql which is 'grouping by rid', which is unique in every row (a grouping of 1), that guarantees that every row will be returned - as intended for this example. In actuality, any field in any query can be used as a 'group by' as can several fields or even calculations - not just those used in an aggregator.

The "problem" that I am trying to solve is to create the second result table in the example from the data represented in the first. Notice that all of the months are represented by a single line. The challenge is that each row has a creation date and an updated date. Those are what need to be 'counted'. The SUM(IF ... will count correctly, it is finding a way to count each field in separate months that is the hurdle.

A row can be updated at a much later time as is the case in the row with the > and >>. That row is created in July and needs to be counted as NEW in the 07-2003 row. It is updated in Sep and needs to be counted as Upd (Updated) in the 09-2003 row. Hope that is slightly clearer.

Thank you again for responding.
 
You want something like this,

SELECT
(date_format(r1.date_created,'%m - %Y')) as RptMonth
, count(distinct r1.id) AS NEW
, count(distinct r2.id) AS Upd
from activity_table r1 left join activity_table r2 on
( date_format(r2.date_updated,'%m - %Y') =
date_format(r1.date_created,'%m - %Y')
)
group by 1;

I ran this on my MySQL database and it worked
with the expected results. The only problem with
this query is it drops 'updated' counts out if there
isn't at least one record created in the same month,
but if you have a lot of DB activity then this shouldn't
be a problem.
Add where clause to filter.
 
The self-join is the path that I needed. You probably noticed that there is another event that can occur.

If a begin and end date don't include the created date, but do have the updated date a record gets generated that is outside of the range. As an example if the requested range is Aug03 - Feb04, July still gets a hit because of the updated event in Sep. Will have to test for a missing month or a record showing up outside of the requested range but I think that can be a couple of programmatic if-else when the records are returned into the report.

What is most important is that thanks to your help I can add the "WHERE" and grouping by RptMonth from the 2nd select in the original post and get the db engine do the heavy lifting (as it should)!

Thanks!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top