Flexible WHERE criteria for a data object? 3

CosmicCharlie · Jun 30, 2006

I am developing a data tier for the first time, using the Abstract Factory design pattern. I have a working set of classes that return DataTables, DataReaders, and DataRows. I am also designing a Data Object, inherited from an abstract class, to support handling data for specific tables. The idea behind the Data Object is that it is dedicated to a single table, though nothing enforces this. Internally the data object calls various CRUD stored procedures associated with the table.

The problem I have is knowing how to design the data object so that it will support flexible searches of the table. Suppose it is a Customer table, and a user may want to search by Customer Name and State, or by City only, or any other combination of criteria. I cannot think of an elegant way to support this. I could create a WhereClause property, but that would require the client to have too much knowledge of the internal database design. It would also require me to use SQL commands in code, which I am trying to avoid.

I also considered creating a Parameter subclass in my data object, then creating a stored procedure with an optional parameter for every field. This would work, except I get stuck on how to handle numeric fields. I know of no numeric wildcards like there are for strings.

Is there any way to elegantly allow users search on any combination of fields they like without sacrificing solid OO design?

Charlie

BobRodes · Jul 2, 2006

Hi Charlie,

You might find it interesting that your Parameter idea is pretty much what ADO does: the ADO command class has a dependent Parameters class. Also, most of the classes have a dependent Properties class that allows custom definition of properties to provider implementations. Now, I would consider not a Parameter subclass, but a dependent Parameter class. I suspect this is what you mean, but a subclass would be a class that inherits the interface of the base class, and this is a class that should be able to be instantiated only by the base class. In UML terms, this is a "composition association," meaning that the lifetime of the set of parameter objects is tied to the lifetime of the Data object. You may also say that the Data object "is composed of" zero to many parameter objects. (Which is not to say "composed of and only of"; the point is that the parameter object is associated with only one data object, which is responsible for its creation and destruction.)

<I know of no numeric wildcards like there are for strings.

I'm thinking that what you are saying is that your client would search string data using standard wildcard protocols, and that you don't see an equivalent protocol for numeric data. However, there is a fairly common protocol using commas and dashes and such in a list, e. g. "1-17, 23, 25, >37" can be parsed to mean "one to 17 or 23 or 25 or greater than 37". The point is that you have adopted a search protocol for strings already; the fact that it is assumed that the semantics of that protocol are understood by the client already (I wouldn't do that, by the way) doesn't change that. You'll need to do the same for numerics and any other types (boolean, for example) you're using.

Another way of implementing the search capability would be to have a different stored procedure for each of the commands. The thing to keep in mind when determining how to implement this is to minimize decision logic, while minimizing also communications between objects.

HTH

Bob

CosmicCharlie · Jul 3, 2006

Hi, Bob. Thank you for your thoughtful reply. I think my original question was a little unclear. Let me see if my response helps get to the root of what I am looking for.

First, thank you for correcting my use of the term “subclass.” Indeed, I meant a dependent class. Thank you for clarifying that.

Second, regarding wildcards, I really am looking for the numeric equivalent to the string percent sign, which I fear does not exist. The circumstance that prompts this question is one where my users want to search a table either by a certain varchar field, or by a certain integer field, or by both. Without a numeric wildcard, I was forced to create a stored procedure like this:

ALTER PROCEDURE dbo.uspAgencySearch
@AgencyName varchar(50) = '%',
@CategoryID int = -1
AS
IF @CategoryID = -1
EXECUTE dbo.uspAgencySearchWITHOUTCategoryID @AgencyName
ELSE
EXECUTE dbo.uspAgencySearchWithCategoryID @AgencyName, @CategoryID

I dislike this approach, but can live with it in this limited circumstance. However the problem got me wondering about the way to implement a more flexible search situation where there could be any combination of numeric parameters passed in. What would I do then? The problem would become unmanageable with dozens of nearly identical stored procedures.

In that context, I am not sure I understand your comment about using commas and dashes. I tried passing in a string that was used in an IN criterion, but for that to work, T-SQL would have to allow concatenating strings to create SQL like VB does, which to my knowledge it does not. I also tried using a CASE statement in a WHERE clause, but I could not make that work. I would love to know how to conditionally create a WHERE clause in the middle of a stored procedure.

Third, in general, I am looking for a way to create a genuinely flexible data object that will allow a business object to specify whatever criteria it wants from the data object, and have that data object create custom SQL for it. There is a limit as to how far you can go on this, of course, but there has to be a more flexible way than to create a stored procedure for every possible combination of search criteria.

Since writing this initial post, I got the idea of creating a dependent WHERECLAUSE class that would allow an business object to enter any number of field-operator-value sets of data. The data object would then parse these into a WHERE clause and pass SQL to the data engine. This has its limitations too, of course, and if anyone has tried it, I would like to know how it worked.

Thank you for reading this very long question.

Charlie

BobRodes · Jul 3, 2006

Actually, if you think about it, the idea of creating a stored procedure for each possible combination of search criteria is precisely the methodology by which overloading is accomplished. So, it's actually quite common, although it does seem unwieldy. In fact, your code is doing the equivalent of overloading uspAgencySearch to accept two kinds of parameters.

I like your whereclause class idea. In a certain sense, it's an implementation of the one-to-many paradigm. You might take it one further: create a whereclause class with a property that is an array of searchfield classes, and a method that parses these. Then have your searchfield class be a pair of fieldname/searchvalue properties. Your whereclause class could accept a dataset of some sort, error check the searchfields against it, and return a valid where clause.

As for arcane T-SQL questions, I'm not the best. You might try reposting them in forum183. Those guys live for that stuff. With my limited knowledge of CASE statements, I can only say that I would think of it as a WHERE clause in a CASE statement rather than the reverse.

HTH

Bob

dragonwell · Jul 3, 2006

Hi Charlie,
Most search implementations I've seen use some form of dynamic SQL, especially when there's a lot of variablity in parameters. I like to use .NET code to build up the where clause before submitting it to the server, but it can also be done in sql (build up the query in a nvarchar variable and execute it using sp_executesql). This way, you can have a "Searcher" class that knows how to take something like a url querystring and turn it into a sql query.

CosmicCharlie · Jul 3, 2006

Thank you, Dragonwell! I looked sp_executesql up in the Help files and this will do exactly what I need. You have just solved a huge problem for me, and moved my development far forward.

And thank you, Bob, for your ideas and insights. These will also help me a lot in developing the kinds of classes I want to use in my projects.

Charlie

rac2 · Jul 5, 2006

Hello Charlie,

Sounds like you have an answer you like. But there is another solution to the optional search criteria requirement.

This is to have two properties and parameters for each search criterion, one is an indicator variable which tells whether the criterion should be used to limit the selected rows, the other is the usual variable that carries the value to be searched for.

Always provide values for both variables.
E.g.

Code:

SELECT * FROM MyTable
WHERE 
      (@ignoreColA OR ColA = @searchValueForColA)
      AND
      (@ignoreColB OR ColB = @searchValueForColB)
      AND
      etc.

The query has 4 parameters, a pair of parameters for each of two criteria.

@ignoreColA and @ignoreColB are boolean types. Set the parameter to false if the criterion should be used, true if it should be ignored.

@searchValueForColA and @searchValueForColB are appropriate datatypes for their columns, numeric or string. When the column is used as a criterion, the value of these parameters will control the result set; other wise the value will have no effect, even NULL could be used.

Why do it this way?

You will gain a bit of efficiency since the query will always be the same, it will not be compiled each time it is used. Unlike the build-it-on-the-fly methods.

Second, there actually is a second fact to be represented for each possible criterion, to use it or not. The indicator variable represents that fact.

CosmicCharlie · Jul 5, 2006

Awesome idea, Rac2! Very simple and straightforward. So simple, in fact, that it would be hard to come up with. I like it a lot!

I already implemented my solution using sp_executesql, which I was happy to do because I learned a lot, but this strikes me as a preferable method in my situation. The actual business problem required me to create some aliases combining several columns and inserting some parens and punctuation. Writing a plain SQL statement that does that isn't difficult, but writing it as plain text to a varchar variable that then must be interpreted as SQL just wouldn't work. I ended up having to shift the column aliasing to my VB code, which I hated doing.

If the SQL had been more straightforward, I think sp_executesql would have been the best choice. However, your solution is better suited to the problem at hand. I could write my convoluted SQL and still get the benefits of varying criteria.

Thanks for the idea. I will use this in the future, definitely.

Charlie

BobRodes · Jul 5, 2006

I like that idea too, rac2. I don't like using sp_executesql when avoidable for the reason you mention.

dhaggard · Oct 2, 2006

There is a problem with Rac2's solution. It's simple and straightforward, but it takes a monster performance hit.

This search method causes the query to use a table scan instead of an index lookup.

In the "where" line:
(@ignoreColA OR ColA = @searchValueForColA)

If "@ignoreColA" is false, then table scan is used to resolve "ColA = @searchValueForColA)." This is because the SQL engine sees the entire line as a single operation. It is not able to see that it can do an index lookup on the second half of the line.

If the table is very big, this is a performance killer! The dynamic SQL approach is far better for performance.

CosmicCharlie · Oct 2, 2006

Thank you, dhaggard, for this observation. That is indeed an important thing to consider, and I hadn't thought of it. I'll be sure to keep that in mind as I decide which option to use under different circumstances.

Charlie

rac2 · Oct 3, 2006

... a monster performance hit ...

Interesting opinion. Could you provide us with numbers? What is a "big table"? What is the break-even point?

I looked at the execution plans for a couple of tables I work with, one with 400 rows and one with 400000. For queries with and without the indicator variable, 4 queries {big, small} x {indicator variable, no indicator variable}.

Code:

Table Size  Indicator Variable Gross Cost
small          Yes                0.44%
large          Yes               49.56%
small          No                 0.44%
large          No                49.56%

It seems that table size makes a big difference and that the indicator variable does not.

The execution plans in all cases consisted of a SELECT and CLUSTERED INDEX SCAN. There were subtle differences between the queries with and without the indicatior variable.

Code:

                             SCAN         SELECT   
Table Size  Indicator Var    Est Row Cnt  Est Cost
small          Yes               1        0.000069
large          Yes             446        0.069825
small          No                1        0.000069
large          No                1        0.069780

Of course these are relative numbers, hard to know in human terms what impression they might make on a person using our applications.

sp_executesql has a couple of features which minimize or eliminate the compilation step present in dynamic SQL when using EXECUTE, thus making the performance issue moot within T-SQL.

I understood CosmicCharlie to be building strings within his application, not within a T-SQL stored procedure. In that case I believe building strings in Java or C is a bit costly, although probably not monstrously so.

ThatRickGuy · Oct 3, 2006

Even with the overhead of string creation outside of the database, it's still a write once, execute many. If you are dynamicly generating SQL and looping through different iterations in T-SQL, there's going to be a pretty nasty performance hit (in theory, I have no numbers).

-Rick

VB.Net Forum forum796 forum855 ASP.NET Forum
[monkey]

I believe in killer coding ninja monkeys. [monkey]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Flexible WHERE criteria for a data object? 3

CosmicCharlie

Programmer

BobRodes

Instructor

CosmicCharlie

Programmer

BobRodes

Instructor

dragonwell

Programmer

CosmicCharlie

Programmer

rac2

Programmer

CosmicCharlie

Programmer

BobRodes

Instructor

dhaggard

Programmer

CosmicCharlie

Programmer

rac2

Programmer

ThatRickGuy

Programmer

Similar threads

Part and Inventory Search

Sponsor