models
Module models
Classes
AreasType(value, names=None, *, module=None, qualname=None, type=None, start=1)- Determines the type of regions detected by the model.
Used in get_text_areas to select which regions are returned: paragraph, line, or word coordinates.
Ancestors (in MRO)
- enum.Enum
Class variables
LINES- Sets regions as lines
PARAGRAPHS- Sets regions as paragraphs
WORDS- Sets regions as words
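The long signature shown for each enumeration on this page, (value, names=None, *, module=None, qualname=None, type=None, start=1), is the call signature that every enum.Enum subclass inherits from the Python standard library; it is not specific to this module. The sketch below uses a stand-in class, AreasTypeSketch, to show the two usual ways of obtaining a member. The member names come from the list above, but the numeric values are assumptions made for illustration only.

```python
from enum import Enum


class AreasTypeSketch(Enum):
    """Stand-in for models.AreasType; the numeric values are assumed."""
    PARAGRAPHS = 1
    LINES = 2
    WORDS = 3


# Access by attribute name (the form normally used in application code).
by_attribute = AreasTypeSketch.LINES

# Access by member name given as a string, e.g. read from a config file.
by_name = AreasTypeSketch["LINES"]

# Calling the class with a value looks the member up; this is what the
# "value" parameter in the documented signature refers to.
by_value = AreasTypeSketch(by_attribute.value)
```

All three expressions refer to the same member object, so they can be compared with `is`.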
DetectAreasMode(value, names=None, *, module=None, qualname=None, type=None, start=1)- Determines the type of neural network used for areas detection.
Used in RecognitionSettings to specify which type of image you want to recognize.
Ancestors (in MRO)
- enum.Enum
Class variables
COMBINE- Detects paragraphs with text and then uses another NN model to detect areas inside the paragraphs. Better for images with a complex structure.
CURVED_TEXT- Detects lines and recognizes text on curved images. Preferred mode for photos of book and magazine pages.
DOCUMENT- Detects paragraphs using the NN model for documents. Better for multicolumn documents and for documents with pictures or other non-text objects.
NONE- Doesn’t detect paragraphs. Better for a simple one-column document without pictures.
PHOTO- Detects paragraphs using the NN model for photos. Better for images with a lot of pictures and other non-text objects.
TABLE- Detects cells with text. Preferable mode for images with table structure.
TEXT_IN_WILD- A neural network specialized in extracting words from low-quality images such as street photos, license plates, passport photos, meter photos, and photos with noisy backgrounds.
Format(value, names=None, *, module=None, qualname=None, type=None, start=1)- Format in which the recognition result is saved as a document.
Ancestors (in MRO)
- enum.Enum
Class variables
DOCX- Saves the result as an Office Open XML Word processing ML Document (macro-free).
EPUB- Saves the document as an EPUB file.
HTML- Saves the document as an HTML file.
JSON- Saves the result as plain text written in JavaScript Object Notation.
PDF- Saves the result as a PDF (Adobe Portable Document Format) document.
PDF_NO_IMG- Saves the document as a searchable PDF (Adobe Portable Document Format) document without the image.
RTF- Saves the document as an RTF file.
TEXT- Saves the result in the plain text format.
XLSX- Saves the result as an Excel (2007 and later) workbook.
XML- Saves the result as an XML Document.
ImageData(javaClass)
Ancestors (in MRO)
- aspose.helper.BaseJavaClass
Methods
initParams(self):
InputType(value, names=None, *, module=None, qualname=None, type=None, start=1)- Types of images / documents for processing / recognition.
Ancestors (in MRO)
- enum.Enum
Class variables
BASE64- Base64 string with the image, or path to a .txt file with the Base64 content. Supports GIF, PNG, JPEG, BMP, TIFF.
DIRECTORY- Path to the directory. Nested archives and folders are not supported. Supports GIF, PNG, JPEG, BMP, TIFF. By default, all images are processed.
PDF- Scanned PDF document from a file or from a binary array.
SINGLE_IMAGE- Supports GIF, PNG, JPEG, BMP, TIFF, JFIF, binary array.
TIFF- Multipage TIFF, TIF document from file or from InputStream.
URL- Link to the image. Supports GIF, PNG, JPEG, BMP, TIFF.
ZIP- Full name of the ZIP archive. Nested archives and folders are not supported. Supports GIF, PNG, JPEG, BMP, TIFF, JFIF. By default, all images are processed.
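Because the input type passed to the container must match what is later added to it, calling code often has to decide on a type from a path or link. The following is a minimal sketch of such a decision, not part of the library: InputTypeSketch is a stand-in enum whose member names are copied from the list above, and guess_input_type with its extension rules is a hypothetical helper written for illustration.

```python
from enum import Enum, auto
from pathlib import PurePosixPath


class InputTypeSketch(Enum):
    """Stand-in for models.InputType; only the member names are from the docs."""
    SINGLE_IMAGE = auto()
    PDF = auto()
    TIFF = auto()
    URL = auto()
    DIRECTORY = auto()
    ZIP = auto()
    BASE64 = auto()


def guess_input_type(source: str) -> InputTypeSketch:
    """Hypothetical helper: pick an input type from a path or link."""
    lowered = source.lower()
    if lowered.startswith(("http://", "https://")):
        return InputTypeSketch.URL
    suffix = PurePosixPath(lowered).suffix
    if suffix == ".pdf":
        return InputTypeSketch.PDF
    if suffix in (".tif", ".tiff"):
        # Assumption: TIFF files are treated as multipage documents.
        return InputTypeSketch.TIFF
    if suffix == ".zip":
        return InputTypeSketch.ZIP
    if suffix == ".txt":
        # Assumption: a .txt file holds Base64 content, as described for BASE64.
        return InputTypeSketch.BASE64
    if suffix == "":
        # Assumption: a path without an extension is a directory.
        return InputTypeSketch.DIRECTORY
    return InputTypeSketch.SINGLE_IMAGE
```

The rules are deliberately simple; a real application may need to inspect the file contents rather than trust the extension.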
Language(value, names=None, *, module=None, qualname=None, type=None, start=1)- Language model for the recognition.
Ancestors (in MRO)
- enum.Enum
Class variables
BEL- Belarusian alphabet
BUL- Bulgarian alphabet
CHI- Chinese alphabet
CYRILLIC- Multi-language (Cyrillic alphabet) support
CZE- Czech alphabet
DAN- Danish alphabet
DEU- German alphabet
DUM- Dutch alphabet
ENG- English alphabet
EST- Estonian alphabet
FIN- Finnish alphabet
FRA- French alphabet
HIN- Hindi alphabet
ITA- Italian alphabet
KAZ- Kazakh alphabet
LATIN- Multi-language (Latin alphabet) support
LAV- Latvian alphabet
LIT- Lithuanian alphabet
NONE- Multi-language support
NOR- Norwegian alphabet
POL- Polish alphabet
POR- Portuguese alphabet
RUM- Romanian alphabet
RUS- Russian alphabet
SLK- Slovak alphabet
SLV- Slovene alphabet
SPA- Spanish alphabet
SRP- Serbian alphabet
SRP_HRV- Serbo-Croatian alphabet
SWE- Swedish alphabet
UKR- Ukrainian alphabet
ModelsConverter()
Methods
convertInputTypeToJava(jType):
convertToJavaAreasMode(jType):
convertToJavaAreasType(jType):
convertToJavaFormat(jType):
convertToJavaLanguage(jType):
convertToJavaSpellCheckLanguage(jType):
OcrInput(type: models.InputType, filters: models.PreprocessingFilter = None)- Main class to collect images.
Constructor that creates the container and sets the type of images / documents and the filters for further processing / recognition. @param type: The type of images / documents that will be added to the container. @param filters: The processing filters that will be applied for further processing or recognition.
Methods
add(self, fullPath: str, startPage: int = None, pagesNumber: int = None)- Add the path or URI containing the image for recognition / processing. The type of the image must correspond to the type specified in the constructor. @param fullPath: Path to the image/ document / folder / archive. @param startPage: The first page/image for processing / recognition. Use for documents, zip, folders. @param pagesNumber: The total amount of pages/images for processing / recognition. Use for documents, zip, folders. Default = all.
addStream(self, image_data_binary, startPage: int = None, pagesNumber: int = None)- Add the InputStream containing the image for recognition / processing.
The type of the image must correspond to the type specified in the constructor.
\code
ocr_input = OcrInput(InputType.SINGLE_IMAGE)
# imgPath and api are assumed to be defined by the caller
with open(imgPath, "rb") as file:
    image_data_binary = file.read()
ocr_input.addStream(image_data_binary)
result = api.recognize(ocr_input, RecognitionSettings())
\endcode
@param image_data_binary: Binary data containing the image or document. @param startPage: The first page/image for processing / recognition. Use for documents, zip, folders. @param pagesNumber: The total amount of pages/images for processing / recognition. Use for documents, zip, folders. Default = all.
add_base64(self, base64: str)- Add the base64 string containing the image for recognition / processing. The type of the image must correspond to the type specified in the constructor. @param base64: Base64 string with single image.
clear(self)- Clears the collection, setting the amount of items for processing / recognition to 0.
clear_filters(self)- Remove all filters.
get(self, index: int) ‑> models.ImageData- Returns information about processed / recognized image. @param index: Position of the image in the List. @return: The object of ImageData.
getJavaClass(self):
init(self, javaClass):
size(self)- Amount of items for processing / recognition. @return: Amount of items.
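The add_base64 method above expects the image as a Base64 string. As a minimal sketch of how such a string can be produced with the Python standard library, the helper below encodes raw image bytes. The name to_base64_string is hypothetical, and the eight-byte PNG signature stands in for the contents of a real image file.

```python
import base64


def to_base64_string(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII Base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# The 8-byte PNG file signature, used here instead of a real image.
sample = b"\x89PNG\r\n\x1a\n"
encoded = to_base64_string(sample)
```

In real use the bytes would come from reading the image file in binary mode, and the resulting string would then be passed to add_base64 on a container created with InputType.BASE64.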
PreprocessingFilter()- Base class for image processing commands.
Ancestors (in MRO)
- aspose.helper.BaseJavaClass
Class variables
JAVA_CLASS_NAME:
Static methods
auto_denoising()- Enables the use of an additional neural network to improve the image - reduce noise. Useful for images with scan artifacts, distortion, spots, flares, gradients, foreign elements. @return: AutoDenoisingFilter object.
auto_dewarping()- Automatically corrects geometric distortions in the image. Extremely resource intensive! @return: AutoDewarpingFilter object.
auto_skew()- Enables the automatic image skew correction. @return: AutoSkewFilter object.
binarize()- Converts an image to a black-and-white image. Binary images are images whose pixels have only two possible intensity values. They are normally displayed as black and white. Numerically, the two values are often 0 for black and 255 for white. Binary images are produced by automatically thresholding an image. @return: BinarizeFilter object.
binarize_and_dilate()- Dilation adds pixels to the boundaries of objects in an image. @return: DilateFilter object.
contrast_correction()- Contrast correction filter. @return: ContrastCorrectionFilter object.
invert()- Automatically inverts colors in a document image. @return: InvertFilter object.
median()- The median filter runs through each element of the image and replaces each pixel with the median of its neighboring pixels. @return: MedianFilter object.
resize(width: int, height: int)- Rescale image - upscale or downscale image resolution. @param width: The new width of the image. @param height: The new height of the image. @return: ResizeFilter object.
rotate(angle: float)- Rotate original image. @param angle: Angle of rotation. Value from -360 to 360. @return: RotateFilter object.
scale(ratio: float)- Rescale image - upscale or downscale image resolution. Interpolation type: bilinear or nearest neighbor. @param ratio: The scaling factor. Recommended values: from 0.1 to 1 to shrink, from 1 to 10 to enlarge. @return: ScaleFilter object.
threshold(value: int)- Create a binary image based on setting a threshold value on the pixel intensity of the original image. @param value: The max value. @return: BinarizeFilter object.
to_grayscale()- Converts an image to a grayscale image. A grayscale image has 256 levels of intensity (0 to 255). @return: GrayscaleFilter object.
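To make the threshold and median descriptions above concrete, here is a small pure-Python sketch of both operations on a grayscale image stored as a list of rows. It illustrates the general techniques only and is not the library's implementation: the function names, the 3x3 window, the handling of image borders, and the choice to treat the threshold as exclusive are all assumptions.

```python
from statistics import median_low


def threshold_sketch(pixels, value):
    """Pixels strictly brighter than value become 255, all others 0."""
    return [[255 if p > value else 0 for p in row] for row in pixels]


def median_sketch(pixels):
    """3x3 median filter; border pixels use only the neighbors that exist."""
    height, width = len(pixels), len(pixels[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                pixels[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns a value present in the window.
            row.append(median_low(window))
        result.append(row)
    return result


# A dark 3x3 image with one bright speck in the middle.
image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
binary = threshold_sketch(image, 128)
smoothed = median_sketch(image)
```

Thresholding keeps the bright speck as the only white pixel, while the median filter removes it because every window is dominated by the dark background.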
Methods
add(self, filter)- Add filter to collection for further preprocessing. @param filter: PreprocessingFilter object.
getJavaClass(self):
SpellCheckError(javaClass)- Represents a misspelled word with additional data.
Ancestors (in MRO)
- aspose.helper.BaseJavaClass
Methods
initParams(self):
SpellCheckLanguage(value, names=None, *, module=None, qualname=None, type=None, start=1)- Dictionary language for spell-check correction.
Ancestors (in MRO)
- enum.Enum
Class variables
CZE- Czech dictionary
DAN- Danish dictionary
DEU- German dictionary
DUM- Dutch dictionary
ENG- English dictionary
EST- Estonian dictionary
FIN- Finnish dictionary
FRA- French dictionary
ITA- Italian dictionary
LAV- Latvian dictionary
LIT- Lithuanian dictionary
POL- Polish dictionary
POR- Portuguese dictionary
RUM- Romanian dictionary
SLK- Slovak dictionary
SLV- Slovene dictionary
SPA- Spanish dictionary
SWE- Swedish dictionary
SuggestedWord(javaClass)- Spelling suggestion returned from get_spell_check_error_list.
Ancestors (in MRO)
- aspose.helper.BaseJavaClass
Methods
initParams(self):