Image To Text ^0.3.0
Image to text is a computer vision task that involves generating text from images. The task accepts image inputs and returns text related to the content of the image. The most common applications of image to text are image captioning and Optical Character Recognition (OCR).
Task ID
image-to-text
Default Model
Xenova/vit-gpt2-image-captioning
Use Cases
Image to text models find application in various scenarios, including:
- Image Captioning: Generating textual descriptions of images for visually impaired users or for content generation.
- Optical Character Recognition (OCR): Extracting text from images of documents, receipts, and other printed materials for digitization and indexing.
- Content Moderation: Analyzing images for text content to filter out inappropriate or harmful content.
Running an Inference Session
Here's how to perform image to text using the pipeline:
```php
use function Codewithkyrian\Transformers\Pipelines\pipeline;

$captioner = pipeline('image-to-text');

$result = $captioner('path/to/image.jpg');
```
Output:

```php
[
    ["generated_text" => "A close up of a cat sitting on a bed"]
]
```
The task being performed here is image captioning, determined by the model used. If you want to perform OCR instead, use a model specifically trained for that task, e.g. Xenova/trocr-small-handwritten, and pass in an image of a single line of handwritten text.
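Selecting an OCR model explicitly might look like the sketch below. It assumes the second argument to pipeline() accepts a Hugging Face model identifier, and the file path is a placeholder:

```php
use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Assumption: the second argument selects the model to download and run.
$ocr = pipeline('image-to-text', 'Xenova/trocr-small-handwritten');

// A cropped image containing a single line of handwritten text (hypothetical path).
$result = $ocr('path/to/handwritten-line.jpg');
```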
Pipeline Input Options
When running the image-to-text pipeline, you can pass the following options:
texts (string)
The image(s) to extract text from. It can be a local file path, a file resource, a URL to an image (local or remote), or an array of these inputs. It's the first argument, so there's no need to pass it as a named argument.

```php
$result = $captioner('https://example.com/image.jpg');
```
All other options are the same as the ones in the text2text-generation pipeline, including the streamer.
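Because the first argument also accepts an array, you can caption several images in one call. This is a minimal sketch; the file paths are placeholders:

```php
// Any mix of local paths, file resources, or URLs should work per the option above.
$results = $captioner([
    'path/to/receipt-1.jpg',
    'path/to/receipt-2.jpg',
]);
```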
Pipeline Output
The output is an array where each element corresponds to an input image and contains a key generated_text with the text generated from, or detected in, that image.
```php
[
    [
        "generated_text" => "A close up of a cat sitting on a bed"
    ]
]
```
The number of elements in the output array corresponds to the number of input images.
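Reading the results back out is straightforward. The sketch below assumes the documented shape, with one element per input image and a generated_text key on each (using $results from the batched example above):

```php
foreach ($results as $i => $item) {
    // Each element holds the text produced for the corresponding input image.
    echo "Image {$i}: {$item['generated_text']}\n";
}
```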
Additional Notes
- The accuracy of the extracted text depends on the quality of the input image and the model's training data.
- The model may not be able to detect text in images with complex backgrounds, low resolution, or unusual fonts.
- Use non-quantized models for better text extraction results.
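If the quantized default proves too lossy for your images, you can request full-precision weights. This is a hedged sketch: it assumes pipeline() exposes a quantized named argument, so verify against your installed version's signature:

```php
use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Assumption: pipeline() accepts a `quantized` named argument to disable the quantized weights.
$captioner = pipeline('image-to-text', 'Xenova/vit-gpt2-image-captioning', quantized: false);
```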