Pipelines

Pipelines are a core feature of TransformersPHP, designed to simplify the use of machine learning models for various natural language processing (NLP) tasks. They encapsulate the entire process of running a model, from input preprocessing to post-processing the output, making it easy to integrate advanced NLP capabilities into your applications.

Creating a Pipeline

To create a pipeline, you use the pipeline function, specifying the task you want to perform. Here's a basic example for sentiment analysis:

php
use function Codewithkyrian\Transformers\Pipelines\pipeline;

$classifier = pipeline('sentiment-analysis');

This initializes a pipeline for sentiment analysis, automatically handling model downloading, caching, input processing, and output interpretation.

Pipeline with Options

Besides passing the task you want to perform, you can also customize the pipeline instance creation with some additional options. For instance, you can specify a different model to use instead of the default model:

php
$classifier = pipeline('sentiment-analysis', 'nlptown/bert-base-multilingual-uncased-sentiment');

Beyond the task and model name, you can further tailor your pipeline with additional named arguments. Here's a breakdown of these options:

task

Specifies the task you wish the pipeline to execute. Refer to the list of supported tasks for available options.

modelName

Specifies the model to be used by the pipeline. You can use any ONNX model from the Hugging Face model repository that is compatible with the specified task. You can also use your own custom models, provided you've prepared them as instructed. If not provided, the default model for the task will be used. For example:

php
$generator = pipeline('text-generation', 'Xenova/codegen-350M-mono');

quantized

A boolean value indicating whether to use a quantized version of the model. Quantization reduces the model size and speeds up inference but may slightly decrease accuracy. This option defaults to false.
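
For example, to opt in to the quantized version of the default model, pass the option as a named argument:

php
$classifier = pipeline('sentiment-analysis', quantized: true);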

config

Allows you to pass a custom configuration for the pipeline. This could include specific model parameters or preprocessing options. Providing a custom config can help tailor the pipeline's behavior to better fit your application's requirements.
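
As a minimal sketch, assuming the config is accepted as an associative array, it might look like this:

php
// Hypothetical config values for illustration only; check the model's
// config.json for the keys it actually uses.
$classifier = pipeline('sentiment-analysis', config: [
    'max_position_embeddings' => 512,
]);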

cacheDir

While it's typically recommended to set the cache directory globally, this option lets you change the directory where models are saved and looked up for this pipeline instance.
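
For example, to keep models in a project-local directory (the path here is illustrative):

php
$classifier = pipeline('sentiment-analysis', cacheDir: __DIR__.'/models');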

revision

This specifies the model version to use. It can be a branch name, a tag name, or a commit id. Because Hugging Face uses a git-based system for storing models and other artifacts, the revision can be any identifier allowed by git.
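
For example, to pin the pipeline to a particular revision (the value here is illustrative; any branch name, tag, or commit hash works):

php
$classifier = pipeline('sentiment-analysis', 'nlptown/bert-base-multilingual-uncased-sentiment', revision: 'main');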

modelFilename

This specifies the filename of the model in the repository. It's mainly used for decoder-only models. It defaults to decoder_model_merged, but you can set it to another filename if the repository doesn't use that nomenclature.
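
For example, if a repository stores its ONNX weights under a different filename (the name below is hypothetical):

php
$generator = pipeline('text-generation', 'Xenova/codegen-350M-mono', modelFilename: 'decoder_model');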

Running a Pipeline

Once you've created a pipeline, running it is straightforward. All pipelines are designed to accept input text as their primary argument. Here's how to run a pipeline for some common NLP tasks.

Basic Usage

For tasks like sentiment analysis, text generation, or named entity recognition (NER), you typically provide a string or an array of strings as input. Here's an example using the sentiment analysis pipeline created earlier:

php
$result = $classifier("TransformersPHP makes NLP easy and accessible.");

Handling Multiple Inputs

Most pipelines can also process multiple inputs in a single call, which is especially useful for batch processing. Provide an array of strings to analyze multiple texts at once:

php
$results = $classifier([
    "I love using TransformersPHP for my projects.",
    "The weather today is dreadful."
]);
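
The pipeline returns one result per input, in the same order, so you can loop over the results directly. The shape of each result for this task is shown under Pipeline Output below:

php
foreach ($results as $i => $result) {
    // Each result is an associative array with 'label' and 'score' keys.
    echo "Text {$i}: {$result['label']} ({$result['score']})\n";
}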

Additional Options

Additional arguments can be passed when running a pipeline to customize its behavior, but they depend heavily on the task the pipeline performs. For example, for translation, you can specify the source and target languages:

php
$translator = pipeline('translation', 'Xenova/m2m100_418M');

$result = $translator('I love TransformersPHP!', srcLang: 'en', tgtLang: 'fr');

Details on the specific options available for each pipeline task are provided within the documentation for that task.

Pipeline Output

The output generated by a pipeline varies based on the task it's performing and the nature of the input provided. For example:

For the classifier with a single input, the output looks like this:

php
['label' => 'POSITIVE',  'score' => 0.9995358059835]

and for the classifier with multiple inputs:

php
[
    ['label' => 'POSITIVE',  'score' => 0.99980061678407],
    ['label' => 'NEGATIVE',  'score' => 0.99842234422764],
]

and for the translation task:

php
['translation_text' => 'J\'aime TransformersPHP!']
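
Since every pipeline returns plain PHP arrays, you can read any field with ordinary array access:

php
echo $result['translation_text']; // J'aime TransformersPHP!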

Supported Tasks

Natural Language Processing

| Task | ID | Description |
| --- | --- | --- |
| Fill-Mask | fill-mask | Masking some of the words in a sentence and predicting which words should replace those masks. |
| Question Answering | question-answering | Retrieve the answer to a question from a given text. |
| Sentence Similarity | sentence-similarity | Determining how similar two texts are. |
| Summarization | summarization | Producing a shorter version of a document while preserving its important information. |
| Table Question Answering | table-question-answering | Answering a question about information from a given table. |
| Text Classification | text-classification or sentiment-analysis | Assigning a label or class to a given text. |
| Text Generation | text-generation | Producing new text by predicting the next word in a sequence. |
| Text-to-text Generation | text2text-generation | Converting one text sequence into another text sequence. |
| Token Classification | token-classification or ner | Assigning a label to each token in a text. |
| Translation | translation | Converting text from one language to another. |
| Zero-Shot Classification | zero-shot-classification | Classifying text into classes that are unseen during training. |

Computer Vision

| Task | ID | Description |
| --- | --- | --- |
| Depth Estimation | depth-estimation | Predicting the depth of objects present in an image. |
| Image Classification | image-classification | Assigning a label or class to an entire image. |
| Zero-Shot Image Classification | zero-shot-image-classification | Classifying images into classes that are unseen during training. |
| Image Segmentation | image-segmentation | Dividing an image into segments where each pixel is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. |
| Image-to-Image | image-to-image | Transforming a source image to match the characteristics of a target image or a target image domain. |
| Mask Generation | mask-generation | Generating masks for the objects in an image. |
| Object Detection | object-detection | Identifying objects of certain defined classes within an image. |
| Zero-Shot Object Detection | zero-shot-object-detection | Detecting objects in images that are unseen during training. |

Multimodal

| Task | ID | Description |
| --- | --- | --- |
| Document Question Answering | document-question-answering | Answering questions on document images. |
| Feature Extraction | feature-extraction | Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. |
| Image Feature Extraction | image-feature-extraction | Extracting features from images. |
| Image-to-Text | image-to-text | Output text from a given image. |
| Text-to-Image | text-to-image | Generates images from input text. |
| Visual Question Answering | visual-question-answering | Answering open-ended questions based on an image. |
| Zero-Shot Audio Classification | zero-shot-audio-classification | Classifying audios into classes that are unseen during training. |
| Zero-Shot Image Classification | zero-shot-image-classification | Classifying images into classes that are unseen during training. |
| Zero-Shot Object Detection | zero-shot-object-detection | Identify objects of classes that are unseen during training. |

Supported Model Architectures

TransformersPHP supports a wide range of model architectures for various NLP tasks. If the specific model you're interested in isn't listed here, you can open an issue on the repository so we can add support for it. Here's a list of currently tested and supported model architectures:

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  3. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
  4. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  5. BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
  6. BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  7. BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  8. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  9. CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
  10. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  11. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  12. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  13. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  14. DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
  15. Donut (from NAVER), released together with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
  16. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  17. FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
  18. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  19. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  20. GPTBigCode (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
  21. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  22. MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
  23. OWL-ViT (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
  24. OWLv2 (from Google AI) released with the paper Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
  25. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  26. RoBERTa-PreLayerNorm (from Facebook) released with the paper fairseq: A Fast, Extensible Toolkit for Sequence Modeling by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.
  27. RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
  28. SigLIP (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
  29. Swin2SR (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
  30. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  31. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  32. TrOCR (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
  33. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  34. YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.