Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Reformat a Report Extracted from a .PDF File

Status
Not open for further replies.

pcdaugs

Technical User
Jan 11, 2008
14
US
Hello Everyone,

I have a limited CRM that will produce two lists, producers and clients, in a contiguous format but the data does not show which producer wrote which client. To solve this I have another report(from the CRM) that is only available by printing it to a .PDF file and then extracting it to excel so the format is horrible. I am writing a program that will search this report and find the producers and their clients in the document and then copy them into a new sheet showing the one to many relationship of the producer to client. The final product should look like the following:

Cell1 Cell2 Cell3 Cell4 Cell5 Cell6
PFirst PMiddle PLast CFirst CMiddle CLast
John A Dowe Jack J Anders
John A Dowe Paul C Johnson
Allen T Hawkins Amy P Andrews
Allen T Hawkins John J Baker
Allen T Hawkins Dawn G Jones

Anyway, the code I have below has been cut and pasted in and developed from trial and error to pull out the producers and parse their names in to the three cells I am looking for. Now I am moving on to adding the clients to these producers and I have run into two problems.

PROBLEM 1: The producers and clients do not always follow the same name pattern. Generally the producer will follow "Producer: "First MiddleInitial Last" pattern and the client "First MiddleInitial Last" pattern but there are cases where both the producer and the client do not have Middle Initials.

PROBLEM 2: The relationship between the producer and the client is one to many and the report list the producer once and then lists the clients below. I need to list the producer once for each occurrence of their clients in the new sheet.

If anyone can give me a hand with this I would greatly appreciate it. My code is below.

Thanks in Advance!

pcdaugs

Code:
Sub FindProducerClient()
     
    Dim rFndP1   As Excel.Range ' Producer found range of Sheet2
    Dim rFndP2   As Excel.Range ' Next Producer found range of Sheet2
    Dim rFndC    As Excel.Range ' Client found range of Sheet2
    Dim sAdr   As String ' Address of first found cell
    Dim astr1()  As String ' Zero-based parsing buffer for rFndP1
    Dim astr2()  As String ' Zero-based parsing buffer for rFndC
     
    Dim rOut    As Excel.Range ' Output range on Sheet1
     
    'Clear output sheet and write header
    Sheet1.Cells.ClearContents
    Sheet1.Range("A1").Resize(, 6) = Array("ProducerFN", "ProducerMI", "ProducerLN", _
    "ClientFN", "ClientMI", "ClientLN")
     
    'Finds the producer cell location
    Set rFndP1 = Sheet2.Cells.Find( _
    What:="Producer:", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    
    'Finds the next producer cell location
    Set rFndP2 = Sheet2.Cells.FindNext(rFndP1)
    
    'Ends the Sub
    If rFndP1 Is Nothing Then Exit Sub
    
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'Because the report is extracted from a .PDF file it does not list the client with any type
    'of identifier. Although, if you look in column B you will find the text "LTC" in the same row
    'as the client's name. This Find method searches for "LTC" and then I offset this cell location
    'to pull the client's name.
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    Set rFndC = Sheet2.Range(rFndP1, rFndP2).Find( _
    What:="LTC", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
 
    ' save the address of the first found cell
    sAdr = rFndP1.Address
     
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'THIS DO WHILE LOOP parses out the producers and the clients names into three cells which will
    'later be use with data validation to add the producer's name to a client list
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'PROBLEM1: The producers and clients do not always follow the same name pattern. Generally the producer
    'will follow "Producer: First MiddleInitial Last" pattern and the client "First MiddleInitial Last"
    'pattern but there are case cases where both the producer and the client do not have Middle Initials.
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'PROBLEM2: I need to list the producer once for each occurrence of their clients. This relationship
    'between the producer and client is one to many and in the report it will list the producer and
    'then the clients below.
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    Do
        astr1 = Split(WorksheetFunction.Trim(rFndP1.Value), " ")
        astr2 = Split(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value), " ")
        Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
        rOut(1, "A").Value = astr1(1)
        rOut(1, "B").Value = astr1(2)
        rOut(1, "C").Value = astr1(3)
        rOut(1, "D").Value = astr2(1)
        rOut(1, "E").Value = astr2(2)
        rOut(1, "F").Value = astr2(3)
        Set rFndP1 = Sheet2.Cells.FindNext(rFndP1)
    Loop While rFndP1.Address <> sAdr
     
End Sub
 



Without knowing the structure of the pdf, it's kind of difficult to offer help. I'm sure that this thread was read, because, I read it before, but I had nothing to offer.

I have tried to parse a pdf, with great difficulty and with mixed results.

Bite the bullet and purchase the latest vertion of an Adobe product or some other vendor, that can convert a pdf to Excel.

Is there ANY possibility of getting the to the source data that was used to produce the pdf?

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello All,

I guess I made the mistake in explaining that the .pdf file has already been extracted into excel document. Now I am searching the excel document for the producer and client and relating them to each other. If any one can help me in pulling the client out tied to their producer, it would be much appreciated!

Thanks in Advanced!

pcdaugs
 




Slight oversight. Hmmmmmmm?

What does the source data look like?

Where are the client and producer?

Exactly what data is in what columns?

Please be CLEAR, CONCISE and COMPLETE.

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

I do appreciate the help, I will be sure to reread my posts twice before I post. Maybe that way I will avoid the misunderstanding.

Anyway, here are the statements I can make about the report once it is extracted into excel.

~The Producer is found most of the time in column A but can be found in column B too.

~The Producer is easily searched by way of the Find method because it ALWAYS follows these two formats "Producer: John A. Doe" and "Producer: John Doe". Searching for the "Producer:" finds them all.

~The Client is ALWAYS in column A.

~The Client follows the following two formats, "John A. Doe" and "John Doe".

~The Client is easily searched based on the text "LTC" in the cell adjacent the client's name in column B.

I have based my code on these statements, which has seemed to work pretty well. In the code below I have been able to pull the producers and clients out and input them into the proper cell location.

The problem that I am running into is my code will find the first producer and then only print that producer tied with all of the clients it finds. I have not figured out how I can limit the client find method to only find the clients listed between the first and second producers and so on.

Would it make more sense to break apart the producer find method, client find method and the actual output?

Thanks in Advance!

pcdaugs

As far as the actual report looks something like this.

Column A Column B
Producer 1


Client LTC

Client LTC

Producer 2

Client LTC





Client LTC


Producer 3


Client LTC

Code:
Sub FindProducerClient()
     
    Dim rFndP1   As Excel.Range ' Producer found range of Sheet2
    Dim rFndP2   As Excel.Range ' Next Producer found range of Sheet2
    Dim rFndC    As Excel.Range ' Client found range of Sheet2
    Dim sAdrP    As String ' Address of first found Producer Cell
    Dim sAdrC    As String ' Address of first found Client Cell
    Dim astr1()  As String ' Zero-based parsing buffer for rFndP1
    Dim astr2()  As String ' Zero-based parsing buffer for rFndC
     
    Dim rOut    As Excel.Range ' Output range on Sheet1
     
    'Clear output sheet and write header
    Sheet1.Cells.ClearContents
    Sheet1.Range("A1").Resize(, 6) = Array("ProducerFN", "ProducerMI", "ProducerLN", _
    "ClientFN", "ClientMI", "ClientLN")
     
    'finds first occurence of "Producer:"
    Set rFndP1 = Sheet2.Cells.Find( _
    What:="Producer:", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    
    'finds next occurence of "Producer:"
    Set rFndP2 = Sheet2.Cells.FindNext(rFndP1)
    If rFndP1 Is Nothing Then Exit Sub
    
    'finds the first occurence of the client by searching for "LTC".
    'I was trying to pass a range for the method to search in between.
    'This range being the first producer found and the second found.
    'This has not worked.
    Set rFndC = Sheet2.Range(rFndP1, rFndP1).Find( _
    What:="LTC", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    
 
    ' save the address of the first found cell
    sAdrP = rFndP1.Address
    
    'THIS DO WHILE LOOP parses out the producers and the clients names into three cells which will
    'later be use with data validation to add the producer's name to a client list

    Do
        
        'zero-based parsing buffer for producer
        astr1 = Split(WorksheetFunction.Trim(rFndP1.Value), " ")
        
        'zero-based parsing buffer for client. This rFndC is based on finding
        'the value "LTC" so I use Offset to grab the value of the Cell immediately
        'to the left of the rFndC cell.
        astr2 = Split(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value), " ")
        
        'rOut expression moves the row that the producer and client are going to
        'set in down for each new record.
        Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
        
        rOut(1, "A").Value = astr1(1)
        rOut(1, "B").Value = astr1(2)
        rOut(1, "C").Value = astr1(3)
        
        'the if statement here will count the number of words in the client cell.
        'if the number of words are equal to 2 then it will set the first and last
        'name in the approprate column. if it is not equal to 2 then it will parse
        'the text as having a middle initial.
        If 2 = Len(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value)) - _
        Len(WorksheetFunction.Substitute(rFndC.Offset(0, -1).Value, " ", "")) + 1 Then
        
        rOut(1, "D").Value = astr2(0)
        rOut(1, "F").Value = astr2(1)
        
        Else
        rOut(1, "D").Value = astr2(0)
        rOut(1, "E").Value = astr2(1)
        rOut(1, "F").Value = astr2(2)
        End If
            
        Set rFndP1 = Sheet2.Cells.FindNext(rFndP1)
    Loop While rFndP1.Address <> sAdrP
     
End Sub

 




Use the Find method to search for Producer:
Code:
dim found as range, sClient as string
set found = cells.find("Producer:")
if not found is nothing then
  'found something
   
end if
use the findnext of After:= argument to search for the next in your loop.

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

I am already using the FindNext(rFndP1) argument to find the next producer. The problem I am having is searching for the clients that are tied to the specific producer. Example:

Report in Excel
Column 1 Column 2
Producer1

Client LTC


Client LTC

Client LTC

Producer 2

Client LTC

Producer 3

Client LTC

Client LTC

I have the code to search for each of the producers and then parse their name into three cells.
Code:
    Dim rfnd    As Excel.Range  ' found range of Sheet2
    Dim sAdr    As String ' address of first found cell
    Dim astr()  As String ' zero-based parsing buffer for rFnd
     
    Dim rOut    As Excel.Range ' output range on Sheet1


    Set rfnd = Sheet2.Cells.Find( _
    What:="Producer:", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext,           
    MatchCase:=False)
    If rfnd Is Nothing Then Exit Sub

    Do
        'zero-based parsing buffer
        astr = Split(WorksheetFunction.Trim(rfnd.Value), " ")
        Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
        rOut(1, "A").Value = astr(1)
        rOut(1, "B").Value = astr(2)
        rOut(1, "C").Value = astr(3)
        Set rfnd = Sheet2.Cells.FindNext(rfnd)
     Loop While rfnd.Address <> sAdr

What I need to do now is to add a Find method to search for the client and parse their names out. I can find the clients names by using Find("LTC") which will find the row that the client's name is in. Then I can use offset(0,-1) to grab the client's name in column A at the same row number.

Code:
    Set rFndC = Sheet2.Cells().Find( _
    What:=[b]"LTC"[/b], LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, _
    MatchCase:=False)

    'zero-based parsing buffer
    astr2 = Split(WorksheetFunction.Trim(rFndC.[b]Offset(0, -1)[/b].Value), " ")

This code for finding "LTC" works but I need to limit this find method to only find the clients related to their producer. I was thinking that could use the producer.address to find the first two occurrences of producer and the cell location and pass this range to a countif method with the criteria to count "LTC". See the code below. This code does not work. Primarily because I can't figure out how to pass the address locations of the first two producers to the countif method. One of the problems I with this is I need to give it the range in column B and some of the producers have cell locations in column A.

Code:
Do
    Set rFndC = Sheet2.Cells().Find( _
    What:="LTC", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    
    astr2 = Split(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value), " ")
    Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
    
    Set rFndC = Sheet2.Cells.FindNext(rFndC)
    iCount = iCount + 1
Loop While iCount <>[b] WorksheetFunction.CountIf(Producer1.address _
     Producer2.address, "LTC")[/b]

Anyway, if you, Skip, or someone else can solve my problem with the countif method I would greatly appreciate it. I have done a lot of research online into how to pass the right range to countif but I just can seem to get it to work.

Thanks in Advanced,

pcdaugs
 





Once you FIND a specific producer, you must find the next PRODUCER:, the position of which will limit your accumulating the clients.

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

I understand that I need to find the next producer and I have done that with the following code in bold.

Code:
    Set rfnd = Sheet2.Cells.Find( _
    What:="Producer:", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext,           
    MatchCase:=False)
    If rfnd Is Nothing Then Exit Sub
    [b]Set rfnd1 = Sheet2.Cells.FindNext(rfnd)[/b]

The problem that I am having is passing these two variable rfnd and rfnd1 to the countif(range, criteria) as a range. every time I pass it as a range I get a compiler error of mismatched types. Anyway suggestions as to how to pass these variables to countif?

The other problem is that the range needs to search column B.

Thanks,

pcdaugs
 


Code:
Loop While iCount <> WorksheetFunction.CountIf(range(Producer1, Producer2), "LTC")

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

Boy, is this a DUH moment! passing it as a range would have been smart...lol well I am very greatful for your help. Anyway, I just played around with this code and it does what I want it to do but there is still one hitch that I thought I would ask about.

Producer1 and Producer2 are most often found in column A but can be found in Column B. For the producers found in column B passing producer1 or 2 as a range does the trick but when the variable is found in column A countif can not find the text "LTC" because it is always in column B. Any suggestion as to how I might force the variable to point to column B?

I thought I might use .offset(0,1) but that will through off the producer when it is found in column B to start.

Thoughts?

Thanks Much,

pcdaugs
 




Are there LTCs in column C? If not...
Code:
Loop While iCount <> WorksheetFunction.CountIf(range(Producer1, Producer2.offset(0,1)), "LTC")
or...
Code:
if Producer2.column = 1 then set producer2 = Producer2.offset(0,1)




Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

Again, you help has been GREAT! And just as it seems I am out of the woods and finished when another issue rears it ugly head. Boy, I forgot how much thought goes into programing! Anyway, the If statement works great I even was able to find a couple of errors in my programing because of it.

I have found two issues. I will start with the more difficult one. When I use the if statement to offset the column in the cell address it will do this search and find the two clients tied to the first producer and the icount and countif statement equal the do while loop ends. This is great but when it moves to the bold code below it will find the second producer loop back to the first Do and only find one client then error out on the italic line with a subscript out of range(runtime error 9). When I do a little more research into the problem I realized the Producer1 variable has changed its value to "LTC" instead of looking for "Producer:". So the error is cause because I am calling the second word in a one word string. That string being "LTC" instead of "Producer: John P. Andrews".

Code:
Sub FindProducerClient()
     
    Dim Producer1   As Excel.Range ' Producer found range of Sheet2
    Dim Producer2   As Excel.Range ' Next Producer found range of Sheet2
    Dim rFndC    As Excel.Range ' Client found range of Sheet2
    Dim rCount1  As Excel.Range
    Dim rCount2  As Excel.Range
    Dim sAdrP    As String ' Address of first found Producer Cell
    Dim astr1()  As String ' Zero-based parsing buffer for rFndP1
    Dim astr2()  As String ' Zero-based parsing buffer for rFndC
    Dim iCount   As Integer
     
    Dim rOut    As Excel.Range ' Output range on Sheet1
     
    'Clear output sheet and write header
    Sheet1.Cells.ClearContents
    Sheet1.Range("A1").Resize(, 6) = Array("ProducerFN", "ProducerMI", "ProducerLN", _
    "ClientFN", "ClientMI", "ClientLN")
     
    Set Producer1 = Sheet2.Cells.Find( _
    What:="Producer:", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    Set Producer2 = Sheet2.Cells.FindNext(Producer1)
    If rFndP1 Is Nothing Then Exit Sub
    
    Set rFndC = Sheet2.Cells().Find( _
    What:="LTC", LookIn:=xlValues, LookAt:=xlPart, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False)
    
    ' save the address of the first found cell
    sAdrP = rFndP1.Address
    
    
    Do
    
        If Producer1.Column = 1 Then
        Set rCount1 = Producer1.Offset(0, 1)
        Else
        Set rCount1 = Producer1
        End If
            
        If Producer2.Column = 1 Then
        Set rCount2 = Producer2.Offset(0, 1)
        Else
        Set rCount2 = Producer2
        End If


            Do
                
                astr1 = Split(WorksheetFunction.Trim(Producer1.Value), " ")
                astr2 = Split(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value), " ")
                Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
                
                Set rOut = Sheet1.Cells(Rows.Count, "A").End(xlUp).Offset(1)
                [b][i]rOut(1, "A").Value = astr1(1)[/i][/b]
                rOut(1, "B").Value = astr1(2)
                rOut(1, "C").Value = astr1(3)
                
                If 2 = Len(WorksheetFunction.Trim(rFndC.Offset(0, -1).Value)) - _
                Len(WorksheetFunction.Substitute(rFndC.Offset(0, -1).Value, " ", "")) + 1 Then
                
                rOut(1, "D").Value = astr2(0)
                rOut(1, "F").Value = astr2(1)
                
                Else
                rOut(1, "D").Value = astr2(0)
                rOut(1, "E").Value = astr2(1)
                rOut(1, "F").Value = astr2(2)
                End If

                Set rFndC = Sheet2.Cells.FindNext(rFndC)
                iCount = iCount + 1
            Loop While iCount < WorksheetFunction.CountIf(Range(rCount1, rCount2), "LTC")
        
        
        [b]Set Producer1 = Producer2
        Set Producer2 = Sheet2.Cells.FindNext(Producer2)[/b]
        iCount = 0
        
    Loop While Producer1.Address <> sAdrP
     
End Sub

The thing I can't figure out is how Producer1 ever gets set to search for the text "LTC". Anyway thoughts?

Thanks Again,

pcdaugs
 




The problem is
Code:
Set Producer1 = Producer2
if you have offset, you must "reverse offset"
Code:
Set Producer1 = Producer2.offset(0,-1)
using the same logic.

Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

Actually, I setup two new variables so that didn't have to reverse the offset.

Code:
    Dim rCount1  As Excel.Range
    Dim rCount2  As Excel.Range

        If Producer1.Column = 1 Then
        Set rCount1 = Producer1.Offset(0, 1)
        Else
        Set rCount1 = Producer1
        End If
            
        If Producer2.Column = 1 Then
        Set rCount2 = Producer2.Offset(0, 1)
        Else
        Set rCount2 = Producer2
        End If

Then I pass rCount1 and rCount2 as ranges to the countif function.

Code:
Loop While iCount < WorksheetFunction.CountIf(Range(rCount1, rCount2), "LTC")

When I set Producer1 = Producer2 it is the same as the following code.

Code:
Set Producer1 = Sheet2.Cells.FindNext(Producer1)

So when the (Producer1) gets set to find the next occurrence of "LTC" is bewildering! Anyway, thoughts to why this would set itself up like that?

Thanks for sticking with me.

pcdaugs
 
Alright, I just used F8 to cycle through my code and found that the code is work beautiful up until I hit the line below that is located between the ends of the nested Do While Loops. For some reason this code assumes you want to find the next value from the most recent find method instance. Is there a way to toggle back and forth between these two find method instances?

Code:
Set Producer1 = Producer2
[b]Set Producer2 = Sheet2.Cells.FindNext(Producer2)[/b]

Thanks,

pcdaugs
 




Any RANGE object has MANY properties, ONE of which is he Value property. It is the Value property that you are using to FIND.

I would modify each FIND statement to include the Value property for the search value.



Skip,

[glasses] When a diminutive clarvoyant had disappeared from detention, headlines read...
Small Medium at Large[tongue]
 
Hello Skip,

I went through and added producer1.value as below. All it did was through runtime errors. Did I do this right or you suggesting something else?

Code:
Set Producer2 = Sheet2.Cells.FindNext(Producer2.Value)

Thanks,

Paul
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top