Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Grouping on Similar Merchant Names 3

Status
Not open for further replies.

Jcfx

MIS
Oct 8, 2004
134
US
CR10, Sql DB

Basic Month summary report with MerchantName, TranAmount and Trandate.

Grouping on MerchantName currently.

Would like to clean up the report and consolidate the Grouping based on MerchantName similarities. Sample Data:

MerchantName
7-Eleven
7 Eleven
7Eleven
A1 Auto
A-1Auto
A-1 Auto
A 1 Auto
#765 Meijer
#456 Meijer
#123 Meijer
Amoco Oil 123456
Amoco Oil 456789
Pizza Hut 1234567T
PizzaHut #23456
Pizza Hut#23445
Pizza Hut
Wal Mart
Wal-Mart Stores
WalMart #3452
234 Sears
Sears #23475

"Rolling" those Merchant names to group like:
7-Eleven
A1 Auto
Meijer
Amoco Oil
Pizza Hut
Wal Mart
Sears

There isn't much of a consistant pattern other than hitting the first alpha and counting x spaces, then looking for similarities.

Any Suggestions?


Julie
CR 9,10 CE10 Sql DB
 
Hi,
Since it would take a lot of pre-processing and lots of If..Then..Else and Replace() and other parsing steps to do that on-demand, it would be better and probably easier to work on cleaning the data at the database level and not have the reporting tool try to do it every time it gets the data.

After the initial cleaning, you might need to find a way to prevent future data entries from being so varied..






[profile]

To Paraphrase:"The Help you get is proportional to the Help you give.."
 
Thanks for the reply, this data comes from electronic data transfer import from our CC vendor, no data entry involved, what you are seeing is actually a lot cleaner than the original :(



Julie
CR 9,10 CE10 Sql DB
 
I've done something very similar in the past. What I did was strip out all of the punctuation, spaces, and in your case you'd need to strip the numbers as well. That would get you pretty close. I used a series of loops to accomplish this.
 
You can't strip numbers, you'd lose the 7 from 7-eleven, etc.

What you have is a bad data entry process, and some heads should roll as a result. Certainly a report is NOT the place to correct the data, but you might approach it using soundex({table.company}).

AFter that it would require a lot of coding and testing to clean it up, and since you'll still have dirty data, you should place this squarely on the dba and the screen hangers (coders) to clean up what they allowed in the first place.

There are also 3rd party firms that can clean data, in any case, it's not simple in Crystal to display what your mind can glean, there are too many possibilities.

-k
 
I didnt say it was a perfect solution, it will get you close.
 
Thanks again all.

And for Synapsevampire, it's Halloween dress up today, Maybe my nerf quarterstaff isn't just for show...



Julie
CR 9,10 CE10 Sql DB
 
I'm with you, Julie, nerf is very spongelike, all you need is a vat-o-suplhuric acid...

A start to code might be not using soundex:

numbervar counter;
stringvar Output;
for counter:= 1 to len({table.field}) do(
if asc(mid({table.field},counter,1)) in [65 to 90]
or
asc(mid({table.field},counter,1)) in [97 to 122]
or
asc(mid({table.field},counter,1)) in [48 to 57]
or
asc(mid({table.field},counter,1)) = 32
then
Output:=Ouput + mid({table.field},counter,1)
);
Output

Now you have just letters, spaces and numbers.

Now you need to guess how to flip the data around based on the data entry mess.

-k
 
I've learned alot about punctuation stripping today, thanks to all.

I used this formula to strip my punctuation:
Code:
Uppercase(replace(replace(replace(replace(replace({Mer.MerchantName},".",""),",",""),"-","")," ",""),"#",""))

and I am playing with those results before I saw SynapseVampire's last reply, but I wanted to try his code anyway, to see what it did. Please bear with this newbie:)

My research today found a another similar formula,and both choked with 'A String can be at most 65534 characters long'. I see that it is building one large string containing everything but the punctuation.

I take it the next step would be to start pulling the alphas from that string and throwing them into another variable?

The other formula I found and have been trying to break apart, in case others might be looking:
Code:
WhilePrintingRecords;
StringVar MyField;
StringVar Strip;
StringVar Alphabet;

MyField := {Table.Field};
Alphabet:= "abcdefghijklmnopqrstuvwxyz"; 

While Length(MyField) > 0 
Do
    (
        If IsNumeric(Left(MyField,1)) 
        Then Strip := Strip + Left(MyField,1)
        Else
            If LowerCase(Left(MyField,1)) in Alphabet
            Then Strip := Strip + Left(MyField,1);
        MyField:= Mid(MyField,2);
    );
    
Strip;

Appreciate any clarification, I like understanding what I am using so I can think it through myself the next time, instead of just copying and pasting and then being scre.. erm at a loss. :)

Julie
CR 9,10 CE10 Sql DB
 
I think that you'll do better to analyze what I posted to allow for more precise extraction.

What's posted above is OK, but it's a tad primitive and not comprehensive enough for company names.

Study the ASC technique I posted:

-k
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top