Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Optimize a query to create a view

Status
Not open for further replies.

elsenorjose

Technical User
Oct 29, 2003
684
US
Hello everyone...

I'm trying to create a view that feeds a reporting application (Crystal Reports) using the following query:

select
market,
station_id,
cprog,
count(distinct c_ip) as 'users',
date_time,
count(track) as 'sessions',
sum(bytes)/1000000000 as 'gbytes',
sum(x_duration) as 'xduration'
from live_play_table
group by market, station_id, cprog, c_ip

The reason I want to use a view is that when Crystal accesses just a base table, it retrieves all rows it thinks it needs (in this case 15-20 million rows) to perform its calculations. By creating a view that summarizes the data, I restrict the number of records returned down to the order of several thousand vs. several million.

The problem is that by including the distinct count of IPs (c_ip), the performance of this query is abysmal. It takes hours to retrieve the data. This is true in Query Browser as well as in Crystal so I'm sure it's a DB side issue.

Is there any other way to include the distinct count of IP addresses in the view while improving the performance of the query? If there is no c_ip group in the query, I retrieve just a few thousand rows and the query executes very quickly. If I have the c_ip group in there, I retrieve a few million rows...still fewer than Crystal retrieves from a base table but it results in a very slow execution.
 
the more important question is, you have two different queries, one where c_ip is in the GROUP BY, and one where it isn't, and you're talking about them as though they were the same query with different speeds, whereas in fact they are two fundamentally different queries which just happen, according to you, to run at different speeds

i think the more fundamental question is, which query should you be running, because the totals are completely different -- which set of totals is correct?

r937.com | rudy.ca
 
Yeah, you're right. That was totally convoluted.

I NEED to group on c_ip to get the distinct count of IPs. Then, I query the view to generate my metrics.

So...

create or replace view v_live
as
select
market,
station_id,
cprog,
count(distinct c_ip) as 'users',
date_time,
count(track) as 'sessions',
sum(bytes)/1000000 as 'mb',
sum(x_duration) as 'xduration'
from live_play_table
group by date(date_time), market, station_id, cprog, c_ip

Creates the view.

select
case when market regexp '-[[:alpha:]]{2}$' then market
when market = '-' then 'ZZZ-Missing Value'
else 'ZZZ-Invalid Value' end as market,
case when station_id like '%-FM' or station_id like '%-IP' or station_id like '%-AM' then station_id
when station_id = '-' then 'ZZZ-Missing Value'
else 'ZZZ-Invalid Value' end as 'Station',
case when cprog = '-' then 'ZZZ-Missing Value'
when cprog like '%\%%' then 'ZZZ-Invalid Value'
else cprog end as cprog,
sum(users) as 'distinct_users',
sum(sessions) as 'total_sessions',
round(sum(bytes),2) as 'total_mb',
round(sum(xduration)/60/60,2) as 'ATH'
from v_live
WHERE YEARweek(Date_time) = YEARweek(CURRENT_DATE - INTERVAL 7 DAY)
group by market, station_id, cprog

Provides the report.


The main reason I went with a view, as I stated, was that querying the base table once it's loaded with a month's worth of data brings my system to its knees. It creates a very large temp table and consumes literally all available memory (4GB). When I run the same report on the view it takes FOREVER but doesn't hit a memory wall.

Thanks for aligning my thinking a bit better :)
 
I NEED to group on c_ip to get the distinct count of IPs
no you don't :)


try this --
Code:
select market
     , station_id
     , cprog
     , date(date_time)     
     , count(distinct c_ip)       as users
     , count(track)               as sessions
     , sum(bytes)/1000000         as mb
     , sum(x_duration)            as xduration
  from live_play_table
group
    by market
     , station_id
     , cprog
     , date(date_time)

r937.com | rudy.ca
 
Thanks for the query. I've been testing it and come up with strange results.

After creating the view with your query, I do a sum(users) and get 182039.

However, the count(distinct c_ip) from the base table is 146706.

Shouldn't the results be the same?

Thanks.
 
sorry, can't really say without seeing sample rows

i'm quite lost as far as understanding your tables goes

r937.com | rudy.ca
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top