Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing Non-Numeric Characters from a String 1

Status
Not open for further replies.

idbr

MIS
May 1, 2003
247
GB
Hi,

I need to take a string and remove any non-numeric characters (whilst preserving any decimalisation), e.g.

Bob is 50
Price = 29.99
I have 312 tasks

Should become

50
29.99
312

My data sits in a single field in an Access table.

I was hoping someone might have a more creative solution then looking at each individual character in the string - any suggestions?

I run A2002 on an XP machine.

Thanks, Iain
 
Looks like an assignment for school...With the above example I would use the Split() function, with delimeter being a single space. I will let you figure out the rest.
 
Something like this ?
Public Function getDigits(myField)
Dim i As Long, x As String, s As String
If Len(Trim(Nz(myField, ""))) = 0 Then
getDigits = myField
Exit Function
End If
s = ""
For i = 1 To Len(myField)
x = Mid(myField, i, 1)
If (x >= "0" And x <= "9") Or x = "." Then
s = s & x
End If
Next i
getDigits = s
End Function

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
Creepers, ooh - get you! A mere schoolboy such as I can only wonder at your genius :)

PHV, I've been running something similar (apologies, should have said). I was hoping there might be something that didn't have to examine each character as this is pretty slow and I'm dealing with 100k + records at a time.

I hoped there might be a function like say nz where I can convert text to null.

Thanks, Iain
 
My comment was not ment to offend you. What does Get You! suppose to mean?
Seriously... I would use the split() function. There will be not need to look at each character.

 
I really don't know if something like this will be faster:
Public Function getDigits(myField)
Dim i As Long, a
If Len(Trim(Nz(myField, ""))) = 0 Then
getDigits = myField
Exit Function
End If
a = Split(myfield)
For i = 0 To UBound(a)
If IsNumeric(a(i)) Then
getDigits = a(i)
Exit Function
End If
Next i
getDigits = ""
End Function

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
'Get you' is a camp English expression of love. I think I might be falling for you... Relax, relax, I'm only kidding! It's just a silly phrase :)

I've taken a look at the help file (you might guess at this point that I'm not exactly a hardcore programmer) and I can see how split() would work for delimited strings (assuming the space is the delimiter) but what about the example below:

Price299.99
Bob2345
23.27%
£4.20 per hour

I need

299.99
2345
23.27
4.20

but would return

Price299.99
Bob2345
23.27%
£4.20

would I not?
 
I'm afraid the character by character approach is the best I can offer.
 
Hi...

I enjoyed your dialog with PHV

I was doing this in the midst of that and thought I'd post it anyway. You are probably doing something similar.

Public Function ParseString(MyString)
Dim a, b, c As Variant
Dim x As Long
a = MyStirng
For x = 1 To Len(a)
b = Mid(a, x, 1)
If b = Chr$(46) Or IsNumeric(b) Then c = c + b
Next x
ParseString = c
End Function
 
Perhaps a little faster (avoid string concatenation):
Public Function getDigits(myField)
Dim s As Long, l As Long, x As String
If Len(Trim(Nz(myField, ""))) = 0 Then
getDigits = myField
Exit Function
End If
s = 1
Do While s <= Len(myField)
x = Mid(myField, s, 1)
If (x >= "0" And x <= "9") Or x = "." Then
Exit Do
End If
s = s + 1
Loop
l = s
Do While l <= Len(myField)
x = Mid(myField, l, 1)
If (x < "0" Or x > "9") And x <> "." Then
Exit Do
End If
l = l + 1
Loop
getDigits = Mid(myField, s, l - s)
End Function

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
I've just started to play a little with RegExp (regular expressions) - and DON'T LAUGH! I doubt the efficiency on single fields exceeds the character by character approach, but just for the fun of it.

I think I'd try declaring the regexp object as a public, then let the function(s) instantiate it in their first run. Which might save some time.

Using late binding, so there's no need to set a reference (would have been to the Microsoft VBScript Regular Expressions #.#)

[tt]Option Compare Database
Option Explicit
Public gre As Object

' initialisation sub
Public Sub InstantGRE()
Set gre = CreateObject("VBScript.RegExp")
gre.MultiLine = True
gre.Global = True
End Sub

' getting a single number from a string
Public Function GetMyNum(ByVal vstrInString As String) As String
Dim mc As Object
If (gre Is Nothing) Then
InstantGRE
End If
gre.Pattern = "[\d]+\.+[\d]+|[\d]+"
Set mc = gre.Execute(vstrInString)
If (mc.Count > 1) Then
GetMyNum = mc(0)
End If
Set mc = Nothing
End Function

' getting more than one number from a string,
' separated by commas
Public Function GetMyNums(ByVal vstrInString As String) As String
Dim mc As Object
Dim m As Object
Dim strTmp As String
If (gre Is Nothing) Then
InstantGRE
End If
gre.Pattern = "[\d]+\.+[\d]+|[\d]+"
Set mc = gre.Execute(vstrInString)
For Each m In mc
strTmp = strTmp & m.Value & ", "
Next m
If (Len(strTmp) > 0) Then '
GetMyNums = Left(strTmp, Len(strTmp) - 2)
End If
Set m = Nothing
Set mc = Nothing
End Function[/tt]

- note, since the parameter is string, no Null values will be passed to the functions. One can pass variants (and test for Null and give some appropriate return value)

Roy-Vidar
 
Gee, at least one typo, "If (mc.Count > 1) Then" should have been "If (mc.Count > 0) Then"...

Roy-Vidar
 
Just thought I'd join in this banter and put a potential spanner in the works. Creepers made the comment that this looked like a school project - if so fine, no problem. However in the real world you could get entries like this:
Bob is 50 but thinks he's 19
Price = 29.99 but the cost is 9.99
I have 312 tasks, only 2 of which are worth doing

If that's a possibility Iain, there's a lot more work to be done!
Simon Rouse
 
DrSimons challenge is a much more "worthy" challenge of demonstrating the power of Regular Expressions (which I'm only at the very, very beginning of discovering). Try pasting the above functions into a standard module, using the last function in the immidiate pane should produce:

[tt]? GetMyNums("Bob is 50 but thinks he's 19") -> 50, 19
? GetMyNums("Price = 29.99 but the cost is 9.99") -> 29.99, 9.99
? GetMyNums("I have 312 tasks, only 2 of which are worth doing") -> 312, 2
? GetMyNums("Testing - ab12c34de5.4c678") -> 12, 34, 5.4, 678[/tt]

As this is my first feeble attempts at RegExp, there might of course be something the pattern does not account for. The concatination of the returned string is performed in the for each loop, and could easily be adapted to performing something else on the numbers (calculation?).

Roy-Vidar
 
Wow, what have I started? :) I'm going to take a look at all of these approaches and I'll get back to see which works quickest.

Just for clarity I am waaaaay past school age! I'm using the examples above because the data I'm working with is confidential and can't be posted.

It's basically free format memo notes that contain numeric values I need to extract to a table field. Hence the need for a quick solution, as the source string can be quite long.

Dr Simon, you're so very right - there's lots of fun to be had... The data should only contain 1 numeric per record - *if* the operators have followed instructions. Bearing in mind my experience of users I forsee headaches... :)

Roy, your post says:

I've just started to play a little with RegExp (regular expressions) - and DON'T LAUGH! I doubt the efficiency on single fields exceeds the character by character approach, but just for the fun of it.

I think I'd try declaring the regexp object as a public, then let the function(s) instantiate it in their first run. Which might save some time.

Using late binding, so there's no need to set a reference (would have been to the Microsoft VBScript Regular Expressions #.#)

Could you explain the concept a bit more - I'm intermediate with VBA at best :)

Thanks to everyone for the input - much appreciated!

Iain
 
If you are taking my point seriously then
Bob is 50 but thinks he's got the mind of a 19 day old
Price = 29.99 but the cost is 9.99 less
I have 312 tasks, and 2 are not important

... will give exactly the same results with Roy-Vidar's code. What you want to do is intellectually interesting, but the whole concept seems to be fraught with potential ambiguity. Such concept don't really fit into databases they would fit closer to Knowledge Base Programming. If you really need to extract meaning you should have 3 fields:
Age (Numeric)
Price (Currency)
Number_of_tasks (Integer)
So I suppose my question to you Ian is - why are you doing this?
Simon Rouse
 
Ah, I'm afraid I've thrown you with my examples Simon. They're just random things put together to show what I want to do.

The data I'm working with will contain a value for a certain characteristic of that record. The previous incumbent of my desk didn't see fit to include a field for this value and so is (in my opinion) an utter b******. :) The value will fit a standard format ###.## (when forced to it, the format can be different in the text).

At the moment I'm running the string through a select case statement (checking the character number), assembling a string of numeric values (strNum). I've covered "/" & "-" to include date seperators in case there are date values in there.

Once I hit a non-numeric I append strNum to a new row in a target table, (using the ID field of the master record to identify which record it belongs to) and then continue processing the string to the end.

I'll then process these values again to see if they fit the valid format for the data I'm extracting (dates won't and neither will any string over 6 digits, e.g. telephone numbers). The fun will come where I get a number of valid values!

I'm sorry to be so vague :)

Iain
 
I understand - you're just tidying up the crap. Been there, done that. Good luck.
 
I throw my support behind RoyVidar. This is an ideal use of Regular Expressions, and will be the most efficient method of solving this problem.

Good Luck
--------------
To get the most from your Tek-Tips experience, please read FAQ181-2886
As a circle of light increases so does the circumference of darkness around it. - Albert Einstein
 
In my environment (P4 2.6 GHz,512Mb RAM, XPpro sp1, Off2003)
1000000 iterations for "Price299.99"
RoyVidar GetMyNum: 22 seconds
PHV last getDigits: 15 seconds
Anyone interested to do some bench ?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top