Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Power Query Extract from pdf 1

Status
Not open for further replies.

GPnew

Programmer
Nov 12, 2001
56
NZ
I am extracting data using power from a pdf file with headers and was wondering if there was an easy way to transform the data into a data cube in power query
eg currently it imports like this
Dept Sales
Jo Bloggs 120
John Smith 100
Dept Sales Total 220

Dept Admin
Ned Kelly 80
Butch Cassidy 200
Dept Admin Total 280

And I would like to see it like this is put the dept in the same data line with no header and total lines

Sales Jo Bloggs 120
Sales John Smith 100
Admin Ned Kelly 80
Admin Butch Cassidy 200

Thanks


 
I'm not sure what is precise format of your entry data. Assuming that it is a single column string, with some preprocessing in excel:
- add header 'Data'
- convert the range to table and name it 'tData'

the M query that converts input to 3-column table:
Code:
let
    Source = Excel.CurrentWorkbook(){[Name="tData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Data", type text}}),
    ExtractDeptNotTotal = Table.AddColumn(#"Changed Type", "Dept", each if Text.Contains([Data], "Dept")=true and Text.Contains([Data], "Total")=false then Text.Replace([Data],"Dept ","") else null),
    FillDeptDown = Table.FillDown(ExtractDeptNotTotal,{"Dept"}),
    ExtractAmount = Table.AddColumn(FillDeptDown, "Amount", each Text.Reverse(Text.BeforeDelimiter(Text.Reverse([Data]), " "))),
    ExtractName = Table.AddColumn(ExtractAmount, "Name", each Text.Replace([Data],Text.Combine({" ",[Amount]}),"")),
    ReorderColumns = Table.ReorderColumns(ExtractName,{"Data", "Dept", "Name", "Amount"}),
    RemoveNullAndEmptyData = Table.SelectRows(ReorderColumns, each [Data] <> null and [Data] <> ""),
    RemoveDeptInData = Table.SelectRows(RemoveNullAndEmptyData, each not Text.StartsWith([Name], "Dept")),
    RemoveData = Table.RemoveColumns(RemoveDeptInData,{"Data"}),
    AmountToNumber = Table.TransformColumnTypes(RemoveData,{{"Amount", type number}})
in
    AmountToNumber

combo
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top