Wednesday, June 17, 2009

Reading data from PDF and writing to a text file using QTP.

See main post here.

' Example: read the first three words of the first page of abc2.pdf and append them to r.txt.

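strFileName = "C:\abc2.pdf"

' Acrobat IAC objects (these require Adobe Acrobat; Adobe Reader does not provide them)
Set AcroApp = CreateObject("AcroExch.App")
Set AcroAVDoc = CreateObject("AcroExch.AVDoc")

' Open the PDF in the viewer, then get its underlying PDDoc
AcroAVDoc.Open strFileName, ""
Set AcroAVDoc = AcroApp.GetActiveDoc
Set AcroPDDoc = AcroAVDoc.GetPDDoc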

Content = ""

' Read the first page only (page indexes are zero-based).
' To read every page, use: For i = 0 To AcroPDDoc.GetNumPages - 1
For i = 0 To 0

    ' AcquirePage: acquires the specified page. The first page in a PDDoc is always page 0.
    ' It returns the acquired page object (a PDPage), or Nothing if it fails.
    Set PageNumber = AcroPDDoc.AcquirePage(i)

    ' Create the HiliteList object that describes which words to select
    Set PageContent = CreateObject("AcroExch.HiliteList")
    PageContent.Add 0, 3 ' start at word 0 and take 3 words: the first three words of the page

    ' Create the text selection (AcroTextSelect) from the word highlight list
    Set AcroTextSelect = PageNumber.CreateWordHilite(PageContent)

    ' GetNumText: gets the number of text elements in a text selection. Use it to determine
    ' how many times to call PDTextSelect.GetText to obtain all of the selection's text.
    For j = 0 To AcroTextSelect.GetNumText - 1
        Content = Content & AcroTextSelect.GetText(j)
    Next

    ' Release the text selection
    AcroTextSelect.Destroy
Next

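MsgBox Content

' Append the extracted text to r.txt (the file is created if it does not exist)
strFile = "c:\r.txt"
strText = Content

Const ForAppending = 8
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile(strFile, ForAppending, True)
objTextFile.WriteLine strText
objTextFile.Close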

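' Close the document without saving (bNoSave = True) and release the objects
AcroAVDoc.Close True
Set AcroPDDoc = Nothing
Set AcroAVDoc = Nothing
Set AcroApp = Nothing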
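The same steps can be wrapped in two reusable procedures. This is a minimal sketch, assuming Adobe Acrobat (Standard or Pro, not just Reader) is installed so the AcroExch objects exist. ReadWords and AppendLine are hypothetical helper names introduced here, not Acrobat or QTP functions; the Acrobat calls inside them are the same ones the script above uses.

```vbscript
' Hypothetical helpers (illustrative names, not part of the Acrobat or QTP APIs).
' Assumes Adobe Acrobat is installed; Adobe Reader does not expose AcroExch objects.

' Returns the text of the first wordCount words on the zero-based page pageIndex
' of an already opened PDDoc, or "" if the page or its text cannot be acquired.
Function ReadWords(pdDoc, pageIndex, wordCount)
    Dim page, hilite, sel, j, text
    text = ""
    Set page = pdDoc.AcquirePage(pageIndex)
    If Not page Is Nothing Then
        Set hilite = CreateObject("AcroExch.HiliteList")
        hilite.Add 0, wordCount              ' offset 0, length wordCount (counted in words)
        Set sel = page.CreateWordHilite(hilite)
        If Not sel Is Nothing Then
            For j = 0 To sel.GetNumText - 1
                text = text & sel.GetText(j)
            Next
            sel.Destroy                      ' release the text selection
        End If
    End If
    ReadWords = text
End Function

' Appends one line of text to filePath, creating the file if it does not exist.
Sub AppendLine(filePath, lineText)
    Const ForAppending = 8
    Dim fso, ts
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(filePath, ForAppending, True)
    ts.WriteLine lineText
    ts.Close
End Sub
```

With the PDF opened as in the script above, Content = ReadWords(AcroPDDoc, 0, 3) followed by AppendLine "c:\r.txt", Content does the same job. Calling ReadWords for i = 0 To AcroPDDoc.GetNumPages - 1 would instead collect the first words of every page.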