Wednesday, June 17, 2009

How to get the first 3 words of the first page of any PDF file using QTP

See main post here.

strFileName = "C:\Readme.pdf"
Set AcroApp = CreateObject("AcroExch.App")
AcroApp.Show
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
AcroAVDoc.Open strFileName,""
Set AcroAVDoc = AcroApp.GetActiveDoc
Set AcroPDDoc = AcroAVDoc.GetPDDoc

For i = 0 To 0 ' first page only; page numbers are zero-based

' AcquirePage: acquires the specified page. The first page in a PDDoc is always 0.
' Returns an AcroExch.PDPage object, or Nothing if the page could not be acquired.

Set PageNumber = AcroPDDoc.AcquirePage(i)

' Create the HiliteList object

Set PageText = CreateObject("AcroExch.HiliteList")
PageText.Add 0, 3 ' offset 0, length 3: the first 3 words of the page

' Create the text selection (AcroTextSelect) from the highlight list

Set AcroTextSelect = PageNumber.CreateWordHilite(PageText)

' GetNumText: gets the number of text elements in a text selection. Use this method to determine
' how many times to call the PDTextSelect.GetText method to obtain all of a text selection's text.

For j = 0 To AcroTextSelect.GetNumText - 1
PText = PText & AcroTextSelect.GetText(j)
Next
Next

msgbox PText
AcroAVDoc.Close True
AcroApp.Exit
Set AcroPDDoc = Nothing
Set AcroAVDoc = Nothing
Set AcroApp = Nothing
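
If you need this in more than one test, the same steps can be wrapped in a function. The sketch below is only an illustration: the name GetFirstWords and its nWords parameter are hypothetical and not part of the original script. It assumes Acrobat (not just the free Reader) is installed, so that the AcroExch objects can be created. It returns an empty string if the file cannot be opened.

```vbscript
' Hypothetical helper: returns the first nWords words of page 0 of strFile,
' or "" if the document cannot be opened or no text selection is created.
Function GetFirstWords(strFile, nWords)
    Dim app, avDoc, pdDoc, page, hilite, sel, k, result
    result = ""
    Set app = CreateObject("AcroExch.App")
    Set avDoc = CreateObject("AcroExch.AVDoc")
    ' AVDoc.Open returns True (-1) when the file was opened successfully
    If avDoc.Open(strFile, "") Then
        Set pdDoc = avDoc.GetPDDoc
        Set page = pdDoc.AcquirePage(0)          ' first page is page 0
        Set hilite = CreateObject("AcroExch.HiliteList")
        hilite.Add 0, nWords                     ' offset 0, length nWords
        Set sel = page.CreateWordHilite(hilite)
        If Not sel Is Nothing Then
            For k = 0 To sel.GetNumText - 1
                result = result & sel.GetText(k)
            Next
        End If
        avDoc.Close True                         ' close without saving
    End If
    app.Exit
    GetFirstWords = result
End Function

MsgBox GetFirstWords("C:\Readme.pdf", 3)
```

Passing a different value for nWords changes how many words are collected, so the same function can check a longer heading on the first page.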