Urgent problem: Search for words with underscore

Status
Not open for further replies.

MakeItSo

Programmer
Oct 21, 2003
3,316
DE
Hi friends,

Usually I don't have problems with something this seemingly easy; perhaps I'm simply mind-blocked, but:

I need to find all underscore-connected words in a text and don't know how! [sadeyes]

Example of the strings I need to find:
{ABC_here_comes_some_variable_name12&23}

Yes: the opening and closing curly brackets enclose ALL these strings.
No: this is nothing special; in fact they enclose ALL strings, including those I do not wish to find (which are the vast majority).

As you can see, these strings also may contain the ampersand character as well as numbers.

Can you please help me find an appropriate wildcard search pattern - or regex search pattern - to loop through my text and highlight all these strings?

Working on Word XP.

Can also be VBA.

Thanks a lot!
Andy


[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
*SIGH*

Forget about it - blockade is gone, problem solved.

For all with a similar problem, here's the solution in VBA.
If anyone can give me a better, esp. FASTER solution, you're welcome to contribute!
:)
Code:
With Selection.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "^?_^?" 'find combination character underscore character
    .Wrap = wdFindStop
    .MatchWildcards = False
    .Execute
End With
Do While Selection.Find.Found
    'now move a bit to select the entire word_word
    Selection.Start = Selection.Start + 1
    Selection.Collapse wdCollapseStart
    Selection.MoveLeft unit:=wdWord, Count:=1
    Selection.MoveRight unit:=wdWord, Count:=3, Extend:=wdExtend
    'check whether _ or & follows:
    Do While (ActiveDocument.Range(Selection.Range.End, Selection.Range.End + 1).Text = "_" Or ActiveDocument.Range(Selection.Range.End, Selection.Range.End + 1).Text = "&")
        Selection.MoveRight unit:=wdWord, Count:=2, Extend:=wdExtend
    Loop
    Selection.Range.Style = ActiveDocument.Styles("tw4winExternal")
    Selection.Collapse wdCollapseEnd
    Selection.Find.Execute
Loop

The code is comparatively slow because it works with the Selection (move, extend...), but at least it works.
Any improvement is greatly welcome.

Cheers!
Andy

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Hi Andy,

So what you really want to find is strings beginning with '{' and ending with '}' that might (or is that will - what if there's only one "word"?) contain underscores. Correct?

And then format those strings with the 'tw4winExternal' Style. Correct?

Is there any prospect that your string delimiters (ie '{' and '}') will be unmatched?

Cheers

[MS MVP - Word]
 
Hi macropod!

I am only looking for strings containing underscores and sometimes ampersands, yes.
I will neglect one-word-strings without underscores.
The string delimiters are always there, without exception.

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Hi Andy,

OK, try the following:
Code:
Sub ReformatStrings()
With ActiveDocument.Content.Find
  .ClearFormatting
  .Text = "[{]*[}]"
  With .Replacement
    .ClearFormatting
    .Style = ActiveDocument.Styles("tw4winExternal")
    .Text = "^&"
  End With
  .Forward = True
  .Wrap = wdFindContinue
  .Format = True
  .MatchCase = False
  .MatchWholeWord = False
  .MatchAllWordForms = False
  .MatchSoundsLike = False
  .MatchWildcards = True
  .Execute Replace:=wdReplaceAll
End With
End Sub
Provided the delimiters are present as you say, this will even pick up the strings that lack the underscores.

Cheers

[MS MVP - Word]
 
Hi Andy,

I should have mentioned:
If you really only want the strings with the underscores, change '.Text = "[{]*[}]"' to '.Text = "[{]*[_]*[}]"'.

Cheers

[MS MVP - Word]
 
No macropod,

this will certainly not do what I am trying to do.
Let me clarify a bit:

This is what the document looks like

------
...
{text a}
{text b}
{yadda_yadda}
{text c}
{another_yadda_12&34}
...
------

Now I ONLY want to catch the yadda lines. The others may not be touched!
As I said: the curly brackets limit EVERY string and cannot be used to identify the underscore lines.

Thanks a lot for trying though!
:)

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Ah, posted before your second post.
But that will only get those strings with 1 underscore.
Not feasible either, I'm afraid.
[ponder]

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
No Andy,

It will get the strings with at least one underscore!

Cheers

[MS MVP - Word]
 
Nope, does not work.
In my example text above, it highlights everything up to and including the yadda line, so the selection spans several lines.
Nice try though.

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Hi Andy,

I've retested the code and it works as advertised.

If you're getting spurious ranges being bolded then it's because the braces aren't always matched (eg {yadda {yadda_yadda_yadda}) or you've got strings with underscores as well as characters you want to exclude (eg {yadda_yadda yadda}).

Cheers

[MS MVP - Word]
 
Sorry.
I just realised that there are in fact strings outside the limit markers containing underscores too.
Anyway: that blows the whole thing. A one-liner wildcard search does not do the trick, just as I feared.

Probably a regular expression will be the winner.
That will have to wait though.
For now, I've prepared the doc with my slow-working but working vba looping.

[sigh]
Life could be so easy...

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Possibly...
Code:
Dim r As Range
Set r = ActiveDocument.Range
With r.Find
   .MatchWildcards = True
   .Text = "[{]*[}]"
   Do While .Execute(Forward:=True) = True
      If InStr(1, r.Text, "_") > 0 Then
         r.Style = "Gerry"
      End If
   Loop
End With
If, as you say, you DO have strings outside of { } - although you did state:

As I said: the curly brackets limit EVERY string and cannot be used to identify the underscore lines.

then it will only look for strings inside the curly brackets.

Once it finds those strings, it checks to see if there is an underscore.

If there is, that string is made the style "Gerry". Note that this will include the curly brackets. If this is not desired, then re-adjust the Range appropriately.
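
A minimal sketch of one way to re-adjust the Range so that the curly brackets themselves are left alone. It assumes every match starts with "{" and ends with "}", and that "Gerry" is a character style (a paragraph style would affect the whole paragraph anyway). The variable name inner is made up for illustration:
Code:
Dim r As Range
Dim inner As Range 'hypothetical helper range, not part of the code above
Set r = ActiveDocument.Range
With r.Find
   .MatchWildcards = True
   .Text = "[{]*[}]"
   Do While .Execute(Forward:=True) = True
      If InStr(1, r.Text, "_") > 0 Then
         Set inner = r.Duplicate 'work on a copy so the search range r stays untouched
         inner.MoveStart Unit:=wdCharacter, Count:=1 'step past the opening "{"
         inner.MoveEnd Unit:=wdCharacter, Count:=-1 'step back before the closing "}"
         inner.Style = "Gerry"
      End If
   Loop
End With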

Another alternative. If these are separate paragraphs, then:
Code:
Dim oPara As Paragraph
For Each oPara In ActiveDocument.Paragraphs
   If InStr(oPara.Range.Text, "_") > 0 Then
      oPara.Style = "Gerry"
   End If
Next

In both cases (assuming for testing purposes that the "Gerry" style is bold):

{text a}
{text b}
{yadda_yadda}
{text c}
{another_yadda_12&34}

becomes:

{text a}
{text b}
[b]{yadda_yadda}[/b]
{text c}
[b]{another_yadda_12&34}[/b]

Now, if you DO have strings (or paragraphs) with underscores, that are NOT between curly brackets, then....do another test.
Code:
Dim oPara As Paragraph
For Each oPara In ActiveDocument.Paragraphs
   If InStr(oPara.Range.Text, "_") > 0 _
      And Left(oPara.Range.Text, 1) = "{" Then
      oPara.Style = "Gerry"
   End If
Next

This will turn:

{text a}
{text b}
{yadda_yadda}
{text c}
{another_yadda_12&34}
not_in_curly brackets
{ text yadda yadda}
{more_in_a_bunch_of bracket with some text not _}


into:

{text a}
{text b}
[b]{yadda_yadda}[/b]
{text c}
[b]{another_yadda_12&34}[/b]
not_in_curly brackets
{ text yadda yadda}
[b]{more_in_a_bunch_of bracket with some text not _}[/b]

If you use the Range method (as opposed to the Paragraph method), there is no need to change anything, as it only looks in strings between curly brackets.





faq219-2884

Gerry
My paintings and sculpture
 
Hi Andy,

Perhaps:
Code:
Option Explicit

Sub ReformatStrings()
Application.ScreenUpdating = False
Dim i As Long 'Long, since Words.Count can exceed the Integer limit of 32,767
Dim StrWord As String
With ActiveDocument
  With .Content.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "_"
    .Replacement.Text = "^-"
    .Forward = True
    .Wrap = wdFindContinue
    .Format = False
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
    .Execute Replace:=wdReplaceAll
  End With
  With .Content.Find
    .Text = "&"
    .Replacement.Text = "^-^-"
    .Execute Replace:=wdReplaceAll
  End With
  For i = 1 To .Words.Count - 3
    If .Words(i) = "{" Then
      If .Words(i + 2) = "}" Then
        If InStr(.Words(i + 1), " ") = 0 And InStr(.Words(i + 1), Chr(160)) = 0 Then
          If InStr(.Words(i + 1), Chr(31)) > 0 Then
            StrWord = .Words(i) & .Range.Words(i + 1) & .Range.Words(i + 2)
            i = i + 2 'skip the matched words; Next then moves on to the word after "}"
            With .Content.Find
              .Text = StrWord
              .Replacement.Style = ActiveDocument.Styles("tw4winExternal")
              .Replacement.Text = "^&"
              .Format = True
              .Execute Replace:=wdReplaceAll
            End With
          End If
        End If
      End If
    End If
  Next
  With .Content.Find
    .Text = "^-^-"
    .Replacement.Text = "&"
    .Replacement.ClearFormatting
    .Format = False
    .Execute Replace:=wdReplaceAll
  End With
  With .Content.Find
    .Text = "^-"
    .Replacement.Text = "_"
    .Execute Replace:=wdReplaceAll
  End With
End With
Application.ScreenUpdating = True
End Sub
Basically, the above code replaces the underscores and ampersands with optional hyphens, thus turning the strings concerned into a single word. The code then tests each set of 3 words to see if they start and end with the braces, and contain optional hyphens and no spaces. If so, the style is applied. Finally, the underscores and ampersands are restored.

Cheers

[MS MVP - Word]
 
Thank you both!

I am going to fiddle around a bit more, but I think a mix between paragraph-cycling and the optional hyphens will be good.

Re: the EVERY string statement: yes, that was very misleading, sorry. The text in question consists of software strings (hence I said "strings"), but actually it is string blocks. Most of them DO consist of one string alone, but some don't.
I cannot go by the paragraph alone, as there are a few occurrences of such underscored strings within a sentence. That poses no problem at all, since these all begin with a distinct character ($). They need different handling - and a different style - anyway.
However, treating the ones I wanted to catch in THIS cycle per entire paragraph, without further checking, would also override my "in sentence" check...

But paragraph-checking will speed the thing up considerably!
Thanks Gerry!

And I do like the idea with the optional hyphens, because there are none in the entire text. No danger of accidentally catching a wrong word - that will speed it up too...
:)

Thanks macropod!

Will let you know once I have optimized the code.

Cheers,
Andy

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
This is how I am currently doing it, and it's working nicely.

Code:
With ActiveDocument.Content.Find        
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "_"
    .Replacement.Text = "^-"
    .MatchWildcards = False
    .Format = False
    .Execute Replace:=wdReplaceAll
    
    .Text = "&"
    .Replacement.Text = "^-^-"
    .Execute Replace:=wdReplaceAll
End With

Set ran = ActiveDocument.Range
With ran.Find
    .ClearFormatting
    .Format = False
    .Text = "^-"
    .MatchWildcards = False
    .Execute
End With
Do While ran.Find.Found
    ran.Words(1).Style = intrn
    Set ran = ActiveDocument.Range(ran.Words(1).End, ActiveDocument.Range.End)
    With ran.Find
        .Text = "^-"
        .Forward = True
        .Execute
    End With
Loop

Thanks to all!

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
Seems to me that you could get much the same effect as your final example by using something like:
Code:
[blue]    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "[0-9a-zA-Z&]{1,}_[0-9a-zA-Z&_]{1,}"
        .Replacement.Text = ""
        .Replacement.Font.Bold = True 'for example. You can revert to Style here if you prefer
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With[/blue]
 
Strongm,

this looks exactly like what I've been looking for!
Alas, Word tells me something like "... contains invalid wildcard ..."

Can you combine character ranges, number ranges and single characters like this in one square bracket?
[ponder]

I have re-checked the rules for wildcard search and according to them, your search pattern should do fine - but it doesn't.
[3eyes]

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 
MakeItSo,

I believe you are in Germany; if you have German settings on your system then the separator character will be, I think, a semi-colon instead of a comma:

[blue][tt].Text = "[0-9a-zA-Z&]{1[/tt][highlight yellow][tt];[/tt][/highlight][tt]}_[0-9a-zA-Z&_]{1[/tt][highlight yellow][tt];[/tt][/highlight][tt]}"[/tt][/blue]
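
If the macro has to run on machines with different regional settings, one possible approach is to read the list separator at run time with Application.International(wdListSeparator) and build the pattern from it. This is only a sketch and has not been tested on Word XP; sep is just an illustrative variable name:
Code:
Dim sep As String
sep = Application.International(wdListSeparator) '"," or ";" depending on regional settings
With ActiveDocument.Content.Find
    .ClearFormatting
    .Replacement.ClearFormatting
    'same pattern as above, with the separator filled in for the current system
    .Text = "[0-9a-zA-Z&]{1" & sep & "}_[0-9a-zA-Z&_]{1" & sep & "}"
    .Replacement.Text = ""
    .Replacement.Font.Bold = True 'for example; a Style could be used here instead
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchWildcards = True
    .Execute Replace:=wdReplaceAll
End With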

Enjoy,
Tony

------------------------------------------------------------------------------------
We want to help you; help us to do it by reading this: Before you ask a question.

I'm working (slowly) on my own website
 
Perfect Tony!
As so often, it is the simple things that get us.
[tongue]

Works like a charm.
:)

[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell
 