I'm writing a script that needs Tesseract. I downloaded a library that is a full 10 years old from here (I couldn't find anything newer):
www.autoitscript.com/forum/topic/89542-tesseract-screen-ocr-udf/
As expected, errors come up all over the place when it is used with Tesseract v4. I don't know the code well, so I can't fix anything more complicated than "change the file path here" and the like. Does anyone have an updated version?
-------------------------
In case nobody does:
Here is the library
Error 1
In the function _TesseractScreenCapture, with $show_capture = 1 (for debugging), the error shown in the attachment is thrown.
Never mind the built-in debugging. It doesn't work without it either.
Error 2
The function returns an empty array, because the .tiff file is a single pixel located exactly in the center.
I couldn't figure out anything beyond that. Why does this happen?
P.S. I don't know whether this matters, but the script is meant to automate routine work in a web application where the data cannot be obtained any way other than through Tesseract (the digits cannot be selected). And yes, all of this runs in Chrome.
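For reference, here is a minimal sketch of how the function might be called to check what comes back. It assumes the library is saved as Tesseract.au3 next to the test script, that C:\Temp exists and is writable, and that tesseract.exe sits under @ProgramFilesDir & "\Tesseract-OCR" (the path the library hard-codes). The temp folder and the variable name are illustrative assumptions, not part of the library.

```autoit
#include "Tesseract.au3" ; assumption: the library code, saved next to this script

; Assumption: C:\Temp exists, is writable, and contains no spaces in its path.
_TesseractTempPathSet("C:\Temp")

; Capture the whole screen with the default arguments.
; With an empty delimiter the library returns a string; with a delimiter it returns an array.
Local $vResult = _TesseractScreenCapture()

If IsArray($vResult) Then
    ConsoleWrite("Array with " & UBound($vResult) & " element(s)" & @CRLF)
Else
    ConsoleWrite("Text: " & $vResult & @CRLF)
EndIf
```

Passing @CRLF as the second argument should switch the return value to an array, which is the case where the empty result shows up.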
Code:
#include-once
#Include <Array.au3>
#Include <File.au3>
#include <GDIPlus.au3>
#include <ScreenCapture.au3>
#include <WinAPI.au3>
#include <ScrollBarConstants.au3>
#include <WindowsConstants.au3>
#Include <GuiComboBox.au3>
#Include <GuiListBox.au3>
#Region Header
#cs
Title: Tesseract UDF Library for AutoIt3
Filename: Tesseract.au3
Description: A collection of functions for capturing text in applications.
Author: seangriffin
Version: V0.6
Last Update: 17/03/09
Requirements: AutoIt3 3.2 or higher,
Tesseract 2.01.
Changelog: ---------15/02/09---------- v0.1
Initial release.
---------15/02/09---------- v0.2
Changed path to tesseract.exe to @ProgramFilesDir.
Added scaling as input to _TesseractCapture.
Fixed indentation.
Changed CaptureHWNDToTIFF to input window and control IDs.
---------16/02/09---------- v0.3
Added the parameter $get_last_capture to _TesseractCapture.
Added the parameter $show_capture to _TesseractCapture.
---------16/02/09---------- v0.4
Added the function _TesseractFind.
---------21/02/09---------- v0.5
Updated _TesseractCapture to remove a listbox selection entirely,
and return it after the text capture is done.
---------17/03/09---------- v0.6
Split the function "_TesseractCapture" into 3 functions:
_TesseractScreenCapture
_TesseractWinCapture
_TesseractControlCapture
Split the function "_TesseractFind" into 3 functions:
_TesseractScreenFind
_TesseractWinFind
_TesseractControlFind
Renamed the function "CaptureHWNDToTIFF" to "CaptureToTIFF",
and modified it to allow for handling of the screen, windows
and controls.
Added the function "_TesseractTempPathSet".
#ce
#EndRegion Header
#Region Global Variables and Constants
Global $last_capture
Global $tesseract_temp_path = "C:\"
#EndRegion Global Variables and Constants
#Region Core functions
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractTempPathSet()
; Description ...: Sets the location where Tesseract functions temporarily store their files.
; You must have read and write access to this location.
; The default location is "C:\".
; Syntax.........: _TesseractTempPathSet($temp_path)
; Parameters ....: $temp_path - The path to use for temporary file storage.
; This path must not contain any spaces (see "Remarks" below).
; Return values .: On Success - Returns 1.
; On Failure - Returns 0.
; Author ........: seangriffin
; Modified.......:
; Remarks .......: The current version of Tesseract doesn't support paths with spaces.
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
func _TesseractTempPathSet($temp_path)
$tesseract_temp_path = $temp_path
Return 1
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractScreenCapture()
; Description ...: Captures text from the screen.
; Syntax.........: _TesseractScreenCapture($get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $get_last_capture - Retrieve the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not retrieve the last capture (default)
; 1 = retrieve the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be returned if this isn't provided.
; An array of delimited text will be returned if this is provided.
; Eg. Use @CRLF to return the items of a listbox as an array.
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the screen.
; $top_indent - A number of pixels to indent the capture from the
; top of the screen.
; $right_indent - A number of pixels to indent the capture from the
; right of the screen.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the screen.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns an array of text that was captured.
; On Failure - Returns an empty array.
; Author ........: seangriffin
; Modified.......:
; Remarks .......: Use the default values for first time use. If the text recognition accuracy is low,
; I suggest setting $show_capture to 1 and rerunning. If the screenshot of the
; window or control includes borders or erroneous pixels that may interfere with
; the text recognition process, then use $left_indent, $top_indent, $right_indent and
; $bottom_indent to adjust the portion of the screen being captured, to
; exclude these non-textual elements.
; If text accuracy is still low, increase the $scale parameter. In general, the higher
; the scale the clearer the font and the more accurate the text recognition.
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
func _TesseractScreenCapture($get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
Local $tInfo
dim $aArray, $final_ocr[1], $xyPos_old = -1, $capture_scale = 3
Local $tSCROLLINFO = DllStructCreate($tagSCROLLINFO)
DllStructSetData($tSCROLLINFO, "cbSize", DllStructGetSize($tSCROLLINFO))
DllStructSetData($tSCROLLINFO, "fMask", $SIF_ALL)
if $last_capture = "" Then
$last_capture = ObjCreate("Scripting.Dictionary")
EndIf
; if last capture is requested, and one exists.
if $get_last_capture = 1 and $last_capture.item(0) <> "" Then
return $last_capture.item(0)
EndIf
$capture_filename = _TempFile($tesseract_temp_path, "~", ".tif")
$ocr_filename = StringLeft($capture_filename, StringLen($capture_filename) - 4)
$ocr_filename_and_ext = $ocr_filename & ".txt"
CaptureToTIFF("", "", "", $capture_filename, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent)
ShellExecuteWait(@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename)
; If no delimiter is specified, then return a string
if StringCompare($delimiter, "") = 0 Then
$final_ocr = FileRead($ocr_filename_and_ext)
Else
_FileReadToArray($ocr_filename_and_ext, $aArray)
_ArrayDelete($aArray, 0)
; Append the recognised text to a final array
_ArrayConcatenate($final_ocr, $aArray)
EndIf
; If the captures are to be displayed
if $show_capture = 1 Then
GUICreate("Tesseract Screen Capture. Note: image displayed is not to scale", 640, 480, 0, 0, $WS_SIZEBOX + $WS_SYSMENU) ; creates a resizable preview window at the top-left corner of the screen (left = 0, top = 0)
GUISetBkColor(0xE0FFFF)
$Obj1 = ObjCreate("Preview.Preview.1")
$Obj1_ctrl = GUICtrlCreateObj($Obj1, 0, 0, 640, 480)
$Obj1.ShowFile ($capture_filename, 1)
GUISetState()
if IsArray($final_ocr) Then
_ArrayDisplay($aArray, "Tesseract Text Capture")
Else
MsgBox(0, "Tesseract Text Capture", $final_ocr)
EndIf
GUIDelete()
EndIf
FileDelete($ocr_filename & ".*")
; Cleanup
if IsArray($final_ocr) And $cleanup = 1 Then
; Cleanup the items
for $final_ocr_num = 1 to (UBound($final_ocr)-1)
; Remove erroneous characters
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ".", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], "'", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ",", "")
$final_ocr[$final_ocr_num] = StringStripWS($final_ocr[$final_ocr_num], 3)
Next
; Remove duplicate and blank items
for $each in $final_ocr
$found_item = _ArrayFindAll($final_ocr, $each)
; Remove blank items
if IsArray($found_item) Then
if StringCompare($final_ocr[$found_item[0]], "") = 0 Then
_ArrayDelete($final_ocr, $found_item[0])
EndIf
EndIf
; Remove duplicate items
for $found_item_num = 2 to UBound($found_item)
_ArrayDelete($final_ocr, $found_item[$found_item_num-1])
Next
Next
EndIf
; Store a copy of the capture
if $last_capture.item(0) = "" Then
$last_capture.item(0) = $final_ocr
EndIf
Return $final_ocr
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractWinCapture()
; Description ...: Captures text from a window.
; Syntax.........: _TesseractWinCapture($win_title, $win_text = "", $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $win_title - The title of the window to capture text from.
; $win_text - Optional: The text of the window to capture text from.
; $get_last_capture - Retrieve the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not retrieve the last capture (default)
; 1 = retrieve the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be returned if this isn't provided.
; An array of delimited text will be returned if this is provided.
; Eg. Use @CRLF to return the items of a listbox as an array.
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the window.
; $top_indent - A number of pixels to indent the capture from the
; top of the window.
; $right_indent - A number of pixels to indent the capture from the
; right of the window.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the window.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns an array of text that was captured.
; On Failure - Returns an empty array.
; Author ........: seangriffin
; Modified.......:
; Remarks .......: Use the default values for first time use. If the text recognition accuracy is low,
; I suggest setting $show_capture to 1 and rerunning. If the screenshot of the
; window or control includes borders or erroneous pixels that may interfere with
; the text recognition process, then use $left_indent, $top_indent, $right_indent and
; $bottom_indent to adjust the portion of the window being captured, to
; exclude these non-textual elements.
; If text accuracy is still low, increase the $scale parameter. In general, the higher
; the scale the clearer the font and the more accurate the text recognition.
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
func _TesseractWinCapture($win_title, $win_text = "", $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
Local $tInfo
dim $aArray, $final_ocr[1], $xyPos_old = -1, $capture_scale = 3
Local $tSCROLLINFO = DllStructCreate($tagSCROLLINFO)
DllStructSetData($tSCROLLINFO, "cbSize", DllStructGetSize($tSCROLLINFO))
DllStructSetData($tSCROLLINFO, "fMask", $SIF_ALL)
if $last_capture = "" Then
$last_capture = ObjCreate("Scripting.Dictionary")
EndIf
$hwnd = WinGetHandle($win_title, $win_text)
; if last capture is requested, and one exists.
if $get_last_capture = 1 and $last_capture.item(Number($hwnd)) <> "" Then
return $last_capture.item(Number($hwnd))
EndIf
; Perform the text recognition
$capture_filename = _TempFile($tesseract_temp_path, "~", ".tif")
$ocr_filename = StringLeft($capture_filename, StringLen($capture_filename) - 4)
$ocr_filename_and_ext = $ocr_filename & ".txt"
CaptureToTIFF($win_title, $win_text, "", $capture_filename, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent)
ShellExecuteWait(@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename)
; If no delimiter is specified, then return a string
if StringCompare($delimiter, "") = 0 Then
$final_ocr = FileRead($ocr_filename_and_ext)
Else
_FileReadToArray($ocr_filename_and_ext, $aArray)
_ArrayDelete($aArray, 0)
; Append the recognised text to a final array
_ArrayConcatenate($final_ocr, $aArray)
EndIf
; If the captures are to be displayed
if $show_capture = 1 Then
GUICreate("Tesseract Screen Capture. Note: image displayed is not to scale", 640, 480, 0, 0, $WS_SIZEBOX + $WS_SYSMENU) ; creates a resizable preview window at the top-left corner of the screen (left = 0, top = 0)
GUISetBkColor(0xE0FFFF)
$Obj1 = ObjCreate("Preview.Preview.1")
$Obj1_ctrl = GUICtrlCreateObj($Obj1, 0, 0, 640, 480)
$Obj1.ShowFile ($capture_filename, 1)
GUISetState()
if IsArray($final_ocr) Then
_ArrayDisplay($aArray, "Tesseract Text Capture")
Else
MsgBox(0, "Tesseract Text Capture", $final_ocr)
EndIf
GUIDelete()
EndIf
FileDelete($ocr_filename & ".*")
; Cleanup
if IsArray($final_ocr) And $cleanup = 1 Then
; Cleanup the items
for $final_ocr_num = 1 to (UBound($final_ocr)-1)
; Remove erroneous characters
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ".", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], "'", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ",", "")
$final_ocr[$final_ocr_num] = StringStripWS($final_ocr[$final_ocr_num], 3)
Next
; Remove duplicate and blank items
for $each in $final_ocr
$found_item = _ArrayFindAll($final_ocr, $each)
; Remove blank items
if IsArray($found_item) Then
if StringCompare($final_ocr[$found_item[0]], "") = 0 Then
_ArrayDelete($final_ocr, $found_item[0])
EndIf
EndIf
; Remove duplicate items
for $found_item_num = 2 to UBound($found_item)
_ArrayDelete($final_ocr, $found_item[$found_item_num-1])
Next
Next
EndIf
; Store a copy of the capture
if $last_capture.item(Number($hwnd)) = "" Then
$last_capture.item(Number($hwnd)) = $final_ocr
EndIf
Return $final_ocr
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractControlCapture()
; Description ...: Captures text from a control.
; Syntax.........: _TesseractControlCapture($win_title, $win_text = "", $ctrl_id = "", $get_last_capture = 0, $delimiter = "", $expand = 1, $scrolling = 1, $cleanup = 1, $max_scroll_times = 5, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $win_title - The title of the window to capture text from.
; $win_text - Optional: The text of the window to capture text from.
; $ctrl_id - Optional: The ID of the control to capture text from.
; The text of the window will be returned if one isn't provided.
; $get_last_capture - Retrieve the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not retrieve the last capture (default)
; 1 = retrieve the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be returned if this isn't provided.
; An array of delimited text will be returned if this is provided.
; Eg. Use @CRLF to return the items of a listbox as an array.
; $expand - Optional: Expand the control before capturing text from it?
; 0 = do not expand the control
; 1 = expand the control (default)
; $scrolling - Optional: Scroll the control to capture all its text?
; 0 = do not scroll the control
; 1 = scroll the control (default)
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $max_scroll_times - The maximum number of scrolls to capture in a control
; If a control has a very long scroll bar, the text recognition
; process will take too long. Use this value to restrict
; the amount of text to recognise in a long control.
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the control.
; $top_indent - A number of pixels to indent the capture from the
; top of the control.
; $right_indent - A number of pixels to indent the capture from the
; right of the control.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the control.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns an array of text that was captured.
; On Failure - Returns an empty array.
; Author ........: seangriffin
; Modified.......:
; Remarks .......: Use the default values for first time use. If the text recognition accuracy is low,
; I suggest setting $show_capture to 1 and rerunning. If the screenshot of the
; window or control includes borders or erroneous pixels that may interfere with
; the text recognition process, then use $left_indent, $top_indent, $right_indent and
; $bottom_indent to adjust the portion of the control being captured, to
; exclude these non-textual elements.
; If text accuracy is still low, increase the $scale parameter. In general, the higher
; the scale the clearer the font and the more accurate the text recognition.
; Related .......:
; Link ..........:
; Example .......: Yes
;
; ;==========================================================================================
func _TesseractControlCapture($win_title, $win_text = "", $ctrl_id = "", $get_last_capture = 0, $delimiter = "", $expand = 1, $scrolling = 1, $cleanup = 1, $max_scroll_times = 5, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
Local $tInfo
dim $aArray, $final_ocr[1], $xyPos_old = -1, $capture_scale = 3
Local $tSCROLLINFO = DllStructCreate($tagSCROLLINFO)
DllStructSetData($tSCROLLINFO, "cbSize", DllStructGetSize($tSCROLLINFO))
DllStructSetData($tSCROLLINFO, "fMask", $SIF_ALL)
if $last_capture = "" Then
$last_capture = ObjCreate("Scripting.Dictionary")
EndIf
; if a control ID is specified, then get its HWND
if StringCompare($ctrl_id, "") <> 0 Then
$hwnd = ControlGetHandle($win_title, $win_text, $ctrl_id)
; if expansion of the control is required.
if $expand = 1 and StringCompare($delimiter, "") <> 0 Then
$hwnd2 = $hwnd
If _GUICtrlComboBox_GetComboBoxInfo($hwnd, $tInfo) Then
$hwnd = DllStructGetData($tInfo, "hList")
EndIf
; Expand the control.
_GUICtrlComboBox_ShowDropDown($hwnd2, True)
EndIf
EndIf
; if last capture is requested, and one exists.
if $get_last_capture = 1 and $last_capture.item(Number($hwnd)) <> "" Then
return $last_capture.item(Number($hwnd))
EndIf
; Text recognition improves a lot if the current selection and focus rectangle are removed.
; The following code will remove the selection.
; After text recognition the selection is returned.
$sel_index = _GUICtrlListBox_GetCurSel($hwnd)
; The following two lines should remove the current selection and focus rectangle
; in all cases.
_GUICtrlListBox_SetCurSel($hWnd, -1)
_GUICtrlListBox_SetCaretIndex($hWnd, -1)
; Scroll to the top
DllCall("user32.dll", "int", "SendMessage", "hwnd", $hwnd, "int", $WM_VSCROLL, "int", $SB_TOP, "int", 0)
for $i = 1 to $max_scroll_times
if $i > 1 Then
; Scroll the list down one page
DllCall("user32.dll", "int", "SendMessage", "hwnd", $hwnd, "int", $WM_VSCROLL, "int", $SB_PAGEDOWN, "int", 0)
EndIf
; Get the position of the scroll bar
DllCall("user32.dll", "int", "GetScrollInfo", "hwnd", $hwnd, "int", $SB_VERT, "ptr", DllStructGetPtr($tSCROLLINFO))
$xyPos = DllStructGetData($tSCROLLINFO, "nPos")
; If the scroll bar hasn't moved, we have finished scrolling
if $xyPos_old = $xyPos then ExitLoop
$xyPos_old = $xyPos
; Perform the text recognition
WinActivate($win_title)
$capture_filename = _TempFile($tesseract_temp_path, "~", ".tif")
$ocr_filename = StringLeft($capture_filename, StringLen($capture_filename) - 4)
$ocr_filename_and_ext = $ocr_filename & ".txt"
CaptureToTIFF($win_title, $win_text, $hwnd, $capture_filename, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent)
ShellExecuteWait(@ProgramFilesDir & "\Tesseract-OCR\tesseract.exe", $capture_filename & " " & $ocr_filename)
; Return the current selection (if one existed)
if $sel_index > -1 Then
_GUICtrlListBox_SetCurSel($hwnd, $sel_index)
EndIf
; If no delimiter is specified, then return a string
if StringCompare($delimiter, "") = 0 Then
$final_ocr = FileRead($ocr_filename_and_ext)
$i = $max_scroll_times
Else
_FileReadToArray($ocr_filename_and_ext, $aArray)
_ArrayDelete($aArray, 0)
; Append the recognised text to a final array
_ArrayConcatenate($final_ocr, $aArray)
EndIf
; If the captures are to be displayed
if $show_capture = 1 Then
GUICreate("Tesseract Screen Capture. Note: image displayed is not to scale", 640, 480, 0, 0, $WS_SIZEBOX + $WS_SYSMENU) ; creates a resizable preview window at the top-left corner of the screen (left = 0, top = 0)
GUISetBkColor(0xE0FFFF)
$Obj1 = ObjCreate("Preview.Preview.1")
$Obj1_ctrl = GUICtrlCreateObj($Obj1, 0, 0, 640, 480)
$Obj1.ShowFile ($capture_filename, 1)
GUISetState()
if IsArray($final_ocr) Then
_ArrayDisplay($aArray, "Tesseract Text Capture")
Else
MsgBox(0, "Tesseract Text Capture", $final_ocr)
EndIf
GUIDelete()
EndIf
FileDelete($ocr_filename & ".*")
Next
; Cleanup
if IsArray($final_ocr) And $cleanup = 1 Then
; Cleanup the items
for $final_ocr_num = 1 to (UBound($final_ocr)-1)
; Remove erroneous characters
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ".", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], "'", "")
$final_ocr[$final_ocr_num] = StringReplace($final_ocr[$final_ocr_num], ",", "")
$final_ocr[$final_ocr_num] = StringStripWS($final_ocr[$final_ocr_num], 3)
Next
; Remove duplicate and blank items
for $each in $final_ocr
$found_item = _ArrayFindAll($final_ocr, $each)
; Remove blank items
if IsArray($found_item) Then
if StringCompare($final_ocr[$found_item[0]], "") = 0 Then
_ArrayDelete($final_ocr, $found_item[0])
EndIf
EndIf
; Remove duplicate items
for $found_item_num = 2 to UBound($found_item)
_ArrayDelete($final_ocr, $found_item[$found_item_num-1])
Next
Next
EndIf
; Store a copy of the capture
if $last_capture.item(Number($hwnd)) = "" Then
$last_capture.item(Number($hwnd)) = $final_ocr
EndIf
Return $final_ocr
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractScreenFind()
; Description ...: Finds the location of a string within text captured from the screen.
; Syntax.........: _TesseractScreenFind($find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $find_str - The text (string) to find.
; $partial - Optional: Find the text using a partial match?
; 0 = use a full text match
; 1 = use a partial text match (default)
; $get_last_capture - Search within the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not use the last capture (default)
; 1 = use the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be searched if this isn't provided.
; The index of the item found will be returned if this is provided.
; Eg. Use @CRLF to find an item in a listbox.
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the screen.
; $top_indent - A number of pixels to indent the capture from the
; top of the screen.
; $right_indent - A number of pixels to indent the capture from the
; right of the screen.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the screen.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns the location of the text that was found.
; If $delimiter is "", then the character position of the text found
; is returned.
; If $delimiter is not "", then the element of the array where the
; text was found is returned.
; On Failure - Returns an empty array.
; Author ........: seangriffin
; Modified.......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
func _TesseractScreenFind($find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Get all the text from the screen
$recognised_text = _TesseractScreenCapture($get_last_capture, $delimiter, $cleanup, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent, $show_capture)
if IsArray($recognised_text) Then
$index_found = _ArraySearch($recognised_text, $find_str, 0, 0, 0, $partial)
Else
if $partial = 1 Then
$index_found = StringInStr($recognised_text, $find_str)
Else
if StringCompare($recognised_text, $find_str) = 0 Then
$index_found = 1
Else
$index_found = 0
EndIf
EndIf
EndIf
Return $index_found
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractWinFind()
; Description ...: Finds the location of a string within text captured from a window.
; Syntax.........: _TesseractWinFind($win_title, $win_text = "", $find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $win_title - The title of the window to find text in.
; $win_text - Optional: The text of the window to find text in.
; $find_str - The text (string) to find.
; $partial - Optional: Find the text using a partial match?
; 0 = use a full text match
; 1 = use a partial text match (default)
; $get_last_capture - Search within the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not use the last capture (default)
; 1 = use the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be searched if this isn't provided.
; The index of the item found will be returned if this is provided.
; Eg. Use @CRLF to find an item in a listbox.
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the window.
; $top_indent - A number of pixels to indent the capture from the
; top of the window.
; $right_indent - A number of pixels to indent the capture from the
; right of the window.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the window.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns the location of the text that was found.
; If $delimiter is "", then the character position of the text found
; is returned.
; If $delimiter is not "", then the element of the array where the
; text was found is returned.
; On Failure - Returns an empty array.
; Author ........: seangriffin
; Modified.......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
func _TesseractWinFind($win_title, $win_text = "", $find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $cleanup = 1, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Get all the text from the window
$recognised_text = _TesseractWinCapture($win_title, $win_text, $get_last_capture, $delimiter, $cleanup, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent, $show_capture)
if IsArray($recognised_text) Then
$index_found = _ArraySearch($recognised_text, $find_str, 0, 0, 0, $partial)
Else
if $partial = 1 Then
$index_found = StringInStr($recognised_text, $find_str)
Else
if StringCompare($recognised_text, $find_str) = 0 Then
$index_found = 1
Else
$index_found = 0
EndIf
EndIf
EndIf
Return $index_found
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: _TesseractControlFind()
; Description ...: Finds the location of a string within text captured from a control.
; Syntax.........: _TesseractControlFind($win_title, $win_text = "", $ctrl_id = "", $find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $expand = 1, $scrolling = 1, $cleanup = 1, $max_scroll_times = 5, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
; Parameters ....: $win_title - The title of the window to find text in.
; $win_text - Optional: The text of the window to find text in.
; $ctrl_id - Optional: The ID of the control to find text in.
; The text of the window will be used if one isn't provided.
; $find_str - The text (string) to find.
; $partial - Optional: Find the text using a partial match?
; 0 = use a full text match
; 1 = use a partial text match (default)
; $get_last_capture - Search within the text of the last capture, rather than
; performing another capture. Useful if the text in
; the window or control hasn't changed since the last capture.
; 0 = do not use the last capture (default)
; 1 = use the last capture
; $delimiter - Optional: The string that delimits elements in the text.
; A string of text will be searched if this isn't provided.
; The index of the item found will be returned if this is provided.
; Eg. Use @CRLF to find an item in a listbox.
; $expand - Optional: Expand the control before searching it?
; 0 = do not expand the control
; 1 = expand the control (default)
; $scrolling - Optional: Scroll the control to search all its text?
; 0 = do not scroll the control
; 1 = scroll the control (default)
; $cleanup - Optional: Remove invalid text recognised
; 0 = do not remove invalid text
; 1 = remove invalid text (default)
; $max_scroll_times - The maximum number of scrolls to capture in a control
; If a control has a very long scroll bar, the text recognition
; process will take too long. Use this value to restrict
; the amount of text to recognise in a long control.
; $scale - Optional: The scaling factor of the screenshot prior to text recognition.
; Increase this number to improve accuracy.
; The default is 2.
; $left_indent - A number of pixels to indent the capture from the
; left of the control.
; $top_indent - A number of pixels to indent the capture from the
; top of the control.
; $right_indent - A number of pixels to indent the capture from the
; right of the control.
; $bottom_indent - A number of pixels to indent the capture from the
; bottom of the control.
; $show_capture - Display screenshot and text captures
; (for debugging purposes).
; 0 = do not display the screenshot taken (default)
; 1 = display the screenshot taken and exit
; Return values .: On Success - Returns the location of the text that was found.
; If $delimiter is "", then the character position of the text found
; is returned.
; If $delimiter is not "", then the element of the array where the
; text was found is returned.
; On Failure - Returns 0 if a string was searched, or -1 if an array was searched.
; Author ........: seangriffin
; Modified.......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: Yes
;
; ;==========================================================================================
Func _TesseractControlFind($win_title, $win_text = "", $ctrl_id = "", $find_str = "", $partial = 1, $get_last_capture = 0, $delimiter = "", $expand = 1, $scrolling = 1, $cleanup = 1, $max_scroll_times = 5, $scale = 2, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0, $show_capture = 0)
    Local $recognised_text, $index_found
    ; Get all the text from the control
    $recognised_text = _TesseractControlCapture($win_title, $win_text, $ctrl_id, $get_last_capture, $delimiter, $expand, $scrolling, $cleanup, $max_scroll_times, $scale, $left_indent, $top_indent, $right_indent, $bottom_indent, $show_capture)
    If IsArray($recognised_text) Then
        ; Array result: $partial is passed as the compare mode of _ArraySearch (0 = exact, 1 = partial)
        $index_found = _ArraySearch($recognised_text, $find_str, 0, 0, 0, $partial)
    Else
        If $partial = 1 Then
            ; String result, partial match: character position of $find_str (0 if not found)
            $index_found = StringInStr($recognised_text, $find_str)
        Else
            ; String result, full match: 1 if the whole text equals $find_str, otherwise 0
            If StringCompare($recognised_text, $find_str) = 0 Then
                $index_found = 1
            Else
                $index_found = 0
            EndIf
        EndIf
    EndIf
    Return $index_found
EndFunc
; #FUNCTION# ;===============================================================================
;
; Name...........: CaptureToTIFF()
; Description ...: Captures an image of the screen, a window or a control, and saves it to a TIFF file.
; Syntax.........: CaptureToTIFF($win_title = "", $win_text = "", $ctrl_id = "", $sOutImage = "", $scale = 1, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0)
; Parameters ....: $win_title - The title of the window to capture an image of.
; $win_text - Optional: The text of the window to capture an image of.
; $ctrl_id - Optional: The ID of the control to capture an image of.
; An image of the window will be returned if one isn't provided.
; $sOutImage - The filename to store the image in.
; $scale - Optional: The scaling factor of the capture.
; $left_indent - A number of pixels to indent the screen capture from the
; left of the window or control.
; $top_indent - A number of pixels to indent the screen capture from the
; top of the window or control.
; $right_indent - A number of pixels to indent the screen capture from the
; right of the window or control.
; $bottom_indent - A number of pixels to indent the screen capture from the
; bottom of the window or control.
; Return values .: None
; Author ........: seangriffin
; Modified.......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: No
;
; ;==========================================================================================
Func CaptureToTIFF($win_title = "", $win_text = "", $ctrl_id = "", $sOutImage = "", $scale = 1, $left_indent = 0, $top_indent = 0, $right_indent = 0, $bottom_indent = 0)
    Local $hWnd, $hwnd2, $hDC, $hBMP, $hBitmap2, $hImage1, $hGraphic, $CLSID, $tParams, $pParams, $tData, $i = 0, $hImage2, $pos[4]
    ; Upper-case file extension of the output image, used to look up the encoder
    Local $Ext = StringUpper(StringMid($sOutImage, StringInStr($sOutImage, ".", 0, -1) + 1))
    Local $giTIFColorDepth = 24
    Local $giTIFCompression = $GDIP_EVTCOMPRESSIONNONE
    ; If capturing a control
    If StringCompare($ctrl_id, "") <> 0 Then
        $hwnd2 = ControlGetHandle($win_title, $win_text, $ctrl_id)
        $pos = ControlGetPos($win_title, $win_text, $ctrl_id)
    Else
        ; If capturing a window
        If StringCompare($win_title, "") <> 0 Then
            $hwnd2 = WinGetHandle($win_title, $win_text)
            $pos = WinGetPos($win_title, $win_text)
        Else
            ; If capturing the desktop
            $hwnd2 = ""
            $pos[0] = 0
            $pos[1] = 0
            $pos[2] = @DesktopWidth
            $pos[3] = @DesktopHeight
        EndIf
    EndIf
    ; Stop if the window or control was not found ($pos is then not an array)
    If Not IsArray($pos) Then Return SetError(1, 0, 0)
    ; Capture an image of the window / control
    If IsHWnd($hwnd2) Then
        WinActivate($win_title, $win_text)
        $hBitmap2 = _ScreenCapture_CaptureWnd("", $hwnd2, 0, 0, -1, -1, False)
    Else
        $hBitmap2 = _ScreenCapture_Capture("", 0, 0, -1, -1, False)
    EndIf
    _GDIPlus_Startup()
    ; Convert the image to a bitmap
    $hImage2 = _GDIPlus_BitmapCreateFromHBITMAP($hBitmap2)
    ; Create the (scaled) target bitmap, reduced by the right and bottom indents
    $hWnd = _WinAPI_GetDesktopWindow()
    $hDC = _WinAPI_GetDC($hWnd)
    $hBMP = _WinAPI_CreateCompatibleBitmap($hDC, ($pos[2] * $scale) - ($right_indent * $scale), ($pos[3] * $scale) - ($bottom_indent * $scale))
    _WinAPI_ReleaseDC($hWnd, $hDC)
    $hImage1 = _GDIPlus_BitmapCreateFromHBITMAP($hBMP)
    $hGraphic = _GDIPlus_ImageGetGraphicsContext($hImage1)
    ; Draw the capture into the target bitmap, shifted by the left and top indents
    _GDIPlus_GraphicsDrawImageRect($hGraphic, $hImage2, 0 - ($left_indent * $scale), 0 - ($top_indent * $scale), ($pos[2] * $scale) + $left_indent, ($pos[3] * $scale) + $top_indent)
    $CLSID = _GDIPlus_EncodersGetCLSID($Ext)
    ; Set TIFF parameters
    $tParams = _GDIPlus_ParamInit(2)
    $tData = DllStructCreate("int ColorDepth;int Compression")
    DllStructSetData($tData, "ColorDepth", $giTIFColorDepth)
    DllStructSetData($tData, "Compression", $giTIFCompression)
    _GDIPlus_ParamAdd($tParams, $GDIP_EPGCOLORDEPTH, 1, $GDIP_EPTLONG, DllStructGetPtr($tData, "ColorDepth"))
    _GDIPlus_ParamAdd($tParams, $GDIP_EPGCOMPRESSION, 1, $GDIP_EPTLONG, DllStructGetPtr($tData, "Compression"))
    If IsDllStruct($tParams) Then $pParams = DllStructGetPtr($tParams)
    ; Save TIFF and cleanup (graphics context first, then images, then GDI bitmaps)
    _GDIPlus_ImageSaveToFileEx($hImage1, $sOutImage, $CLSID, $pParams)
    _GDIPlus_GraphicsDispose($hGraphic)
    _GDIPlus_ImageDispose($hImage1)
    _GDIPlus_ImageDispose($hImage2)
    _WinAPI_DeleteObject($hBMP)
    _WinAPI_DeleteObject($hBitmap2)
    _GDIPlus_Shutdown()
EndFunc
Error 1
In the function _TesseractScreenCapture, with $show_capture = 1 (for debugging), the error shown in the attachment is thrown.
Never mind the built-in debugging; it doesn't work without it either.
Error 2
The function returns an empty array, because the .tiff file is a single pixel exactly in the center.
I couldn't get any further than that. Why does this happen?
P.S. I don't know whether this matters, but the script is meant to automate routine work in a web application where the data can't be obtained in any way other than through Tesseract (the digits can't be selected). And yes, all of this is in Chrome.
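One thing that can be checked in isolation is the size of the bitmap that CaptureToTIFF creates. The sketch below only repeats the width and height arithmetic from the _WinAPI_CreateCompatibleBitmap call in that function. _DebugCaptureSize is a hypothetical helper name, not part of the UDF, and whether this arithmetic is actually the cause of the one-pixel TIFF is an assumption that still has to be verified.

```autoit
; Hypothetical helper (not part of Tesseract.au3): repeats the size arithmetic
; that CaptureToTIFF passes to _WinAPI_CreateCompatibleBitmap.
Func _DebugCaptureSize($width, $height, $scale = 1, $right_indent = 0, $bottom_indent = 0)
    Local $size[2]
    $size[0] = ($width * $scale) - ($right_indent * $scale)   ; bitmap width in pixels
    $size[1] = ($height * $scale) - ($bottom_indent * $scale) ; bitmap height in pixels
    Return $size
EndFunc

; Desktop capture with scale 2 and no indents, as in a plain screen capture
Local $size = _DebugCaptureSize(@DesktopWidth, @DesktopHeight, 2, 0, 0)
ConsoleWrite("Capture bitmap: " & $size[0] & " x " & $size[1] & @CRLF)
```

If the printed size is sensible for the values passed in the script, the indents and scale are probably not the problem, and the capture or the TIFF encoding step would be the next thing to look at.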