
Which is Faster: Left(), Substring(), or Cast()

Status
Not open for further replies.

Qmoto (MIS) · Oct 21, 2004 · 74 · US
I'm working with a smallish recordset of just over 2 million records. In order to select the correct information, one of the select statements needs only the 5 leftmost characters.

Currently I'm using LEFT(col3, 5)

Does anyone have any advice as to which of the three methods would be faster, and possibly an explanation of why?

Code:
The three methods...

LEFT(MyColumn, 5)

SUBSTRING(MyColumn, 1, 5)

CAST(MyColumn AS CHAR(5))

Thanks!
 
Are you using this for a where clause?

If it is, and you have a clustered index on MyColumn, I think you should use LIKE. Your options will all find records using an index scan, and you want to be using an index seek.
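A minimal sketch of that seek-vs-scan difference (the table and column names here are placeholders, not from your schema):

Code:
-- Wrapping the indexed column in a function hides it from the
-- index, so the optimizer must scan every row:
SELECT * FROM MyTable WHERE LEFT(MyColumn, 5) = 'ABCDE'

-- A LIKE with a constant prefix is sargable, so an index on
-- MyColumn can be used for a seek instead:
SELECT * FROM MyTable WHERE MyColumn LIKE 'ABCDE%'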

Perhaps we could better help you if you explained how this is used in your query?

Hope this gets you started,

Alex

Ignorance of certain subjects is a great part of wisdom
 
Thanks Alex,

This is actually going to be used in a view, more specifically on a zipcode column on a join in a view.

Confused yet... :0)

Here's the code I'm using
Code:
LEFT JOIN
	ZIPFILE ZIPF
	ON
	MBAL.ALCPTX = ZIPF.CITY	AND
	LEFT(MBAL.ALCVCD, 5) = ZIPF.ZIP5

The column being trimmed to 5 characters is replicated from our AS/400 and does not have an index built for it, and I'm not sure what the implications of adding a clustered index on that column would be. I'd have to see how many times a day that file gets updated.
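One way to keep the function call out of that join, assuming you're allowed to alter the replicated table (the column and index names below are made up for illustration; a nonclustered index sidesteps the clustered-index reorganization question):

Code:
-- Computed column holding the first 5 characters of the zip field.
-- LEFT() is deterministic, so the computed column can be indexed.
ALTER TABLE MBAL ADD ALCVCD5 AS LEFT(ALCVCD, 5)
CREATE NONCLUSTERED INDEX IX_MBAL_ALCVCD5 ON MBAL (ALCVCD5)

-- The join then compares plain indexed columns:
-- LEFT JOIN ZIPFILE ZIPF
--   ON MBAL.ALCPTX = ZIPF.CITY
--  AND MBAL.ALCVCD5 = ZIPF.ZIP5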
 
Warning- long post...

I think performance for your three methods is exactly the same (after my limited testing). I learned yesterday that you can use computed columns in SQL, so I tried creating one and building an index on it. Using this method I was able to squeeze out a little more speed. For only 1,900 records this translated to a bit more than a 60% decrease in cost over the alternatives. For a large recordset this might be a viable option.

Try running this and look at your execution plan for the last 4 queries (hit ctrl+k prior to running). It's kind of a lot of work to set up, but you might be able to get better performance using it (if speed is really that big of a concern). (make sure you play with this in a safe testing environment)

Hope it helps,

Alex

PS - to get a better idea of the query costs, first run the create table portion (up to
---end creating test data), then run the queries and the drop tables. I was seeing about 29% cost for each of the first three, and around 11% for the last one (using the computed column as index).

Code:
-- test table: identity key plus a varchar value column
create table tester (ID int identity(1,1), val varchar(15))


declare @loopcnt int

set @loopcnt = 1


-- 100 iterations x 19 names per insert = 1,900 rows
while @loopcnt < 101
begin
insert into tester
select 'Alex' + convert(varchar(3), @loopcnt)
union all select 'Qmoto'+ convert(varchar(3), @loopcnt)
union all select 'Sven'+ convert(varchar(3), @loopcnt)
union all select 'Scott'+ convert(varchar(3), @loopcnt)
union all select 'Steven'+ convert(varchar(3), @loopcnt)
union all select 'Alan'+ convert(varchar(3), @loopcnt)
union all select 'Bart'+ convert(varchar(3), @loopcnt)
union all select 'Bjorn'+ convert(varchar(3), @loopcnt)
union all select 'Chris'+ convert(varchar(3), @loopcnt)
union all select 'Barry'+ convert(varchar(3), @loopcnt)
union all select 'Asthma'+ convert(varchar(3), @loopcnt)
union all select 'SQL'+ convert(varchar(3), @loopcnt)
union all select 'Word'+ convert(varchar(3), @loopcnt)
union all select 'I'+ convert(varchar(3), @loopcnt)
union all select 'Think'+ convert(varchar(3), @loopcnt)
union all select 'This'+ convert(varchar(3), @loopcnt)
union all select 'is'+ convert(varchar(3), @loopcnt)
union all select 'enough'+ convert(varchar(3), @loopcnt)
union all select 'B'+ convert(varchar(3), @loopcnt)

set @loopcnt = @loopcnt + 1
end

-- computed column on the first character of val,
-- then a clustered index built on it
alter table tester add mytestval as left(val, 1)
create clustered index idx_tester on tester (mytestval)


go

-- small lookup table of first letters to join against
create table tester2 (ID int identity(1,1), val varchar(15) primary key clustered)
insert into tester2
select 'S'
union all select 'B'


---end creating test data

select a.* from tester a inner join tester2 b on left(a.val, 1) = b.val

select a.* from tester a inner join tester2 b on substring(a.val, 1, 1) = b.val

select a.* from tester a inner join tester2 b on cast(a.val as char(1)) = b.val

select a.* from tester a inner join tester2 b on a.mytestval = b.val

drop table tester
drop table tester2


Ignorance of certain subjects is a great part of wisdom
 
Hmmm very interesting... thanks for the test data, let me play with it for an evening and see what I can come up with.

I appreciate the help.

 
No problem. It was something I needed to play with myself anyway :) Let me know what you find out.

Alex



Ignorance of certain subjects is a great part of wisdom
 
WARNING: LONG (AND WIDE) POST

It appears that the results are similar, although I didn't drop the tables each time, because I upped the iterations to 150,000, which gave me 2,849,981 rows (closer to the particular view I'm working with).

Even though there were 2.8 million rows, the queries only selected something like 1.1 million rows... but each query returned the same number of rows, so I did not worry about it.

I tested each query with the index and without the index. I find it interesting to note that the only query which didn't seem to be affected by the index was the CAST() version.

Other than that, it would appear that the other versions of the query (LEFT, SUBSTRING) perform exactly the same. So it appears that for my purposes, I can pick and choose.

Thanks again for the testing code and the help,
Steve

Code:
[QUERY ONE]
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
select a.* from tester a inner join tester2 b on left(a.val, 1) = b.val

	[w/o index]
	18 seconds to select 1199992 rows
	select (0%) <-- Parallelism (9%) <-- Hash Match (50%) <-- Compute Scalar (1%) <-- Compute Scalar (1%) <-- Table Scan (40%)

	[w/ index ]
	17 seconds to select 1199992 rows
	select (0%) <-- Compute Scalar (1%) <-- Parallelism (25%) <-- Nested Loops (inner join) (29%) <-- Clustered Index Seek (44%)


[QUERY TWO]
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
select a.* from tester a inner join tester2 b on substring(a.val, 1, 1) = b.val

	[w/o index]
	18 seconds to select 1199992 rows
	select (0%) <-- Parallelism (9%) <-- Hash Match (50%) <-- Compute Scalar (1%) <-- Compute Scalar (1%) <-- Table Scan (40%)

	[w/ index ]
	17 seconds to select 1199992 rows
	select (0%) <-- Compute Scalar (1%) <-- Parallelism (25%) <-- Nested Loops (29%) <-- Clustered Index Seek (44%)


[QUERY THREE]
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
select a.* from tester a inner join tester2 b on cast(a.val as char(1)) = b.val

	[w/o index]
	18 seconds to select 1199992 rows
	select (0%) <-- Parallelism (0%) <-- Hash Match (52%) <-- Compute Scalar (1%) <-- Compute Scalar (1%) <-- Table Scan (47%)

	[w/ index ]
	18 seconds to select 1199992 rows
	select (0%) <-- Parallelism (0%) <-- Hash Match (52%) <-- Compute Scalar (1%) <-- Clustered Index Scan (47%)


[QUERY FOUR]
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
select a.* from tester a inner join tester2 b on a.mytestval = b.val

	[w/o index]
	18 seconds to select 1199992 rows
	select (0%) <-- Parallelism (9%) <-- Hash Match (50%) <-- Compute Scalar (1%) <-- Compute Scalar (1%) <-- Table Scan (40%)

	[w/ index ]
	17 seconds to select 1199992 rows
	select (0%) <-- Compute Scalar (1%) <-- Parallelism (25%) <-- Nested Loops (29%) <-- Clustered Index Seek (44%)
 
The reason it did not return everything is that there are only two starting letters in tester2, so no worries there.

To tell you the truth, I was expecting much better performance out of query 4. The largest table I built to try it with had only 19,000 records, but I was seeing query costs a little over 30% for #1-3 and a little under 8% for #4 on that table. These were so quick that I could not time them, though; I was only looking at relative cost within the batch of 4.

Glad you figured it out though.

Alex

Ignorance of certain subjects is a great part of wisdom
 