

Along with supporting dedicated AI hardware for training, Optimum also provides inference optimizations for various frameworks and platforms. Optimum enables the usage of popular compression techniques such as quantization and pruning by supporting ONNX Runtime along with Intel Neural Compressor (INC). More information here.

🤗 Optimum can be installed using pip as follows:

```bash
python -m pip install optimum
```

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you can install the base library from source as follows:

```bash
python -m pip install git+https://github.com/huggingface/optimum.git
```

For the accelerator-specific features, you can install them by appending #egg=optimum to the pip command, e.g.

```bash
python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum
```
Quickstart

At its core, 🤗 Optimum uses configuration objects to define parameters for optimization on different accelerators. These objects are then used to instantiate dedicated optimizers, quantizers, and pruners.
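The same pattern applies to graph optimization. As a rough sketch of how a configuration object drives a dedicated optimizer (the ORTOptimizer and OptimizationConfig names, the optimization_level argument, and the export() signature are assumed to mirror the quantizer API shown below and may differ between Optimum versions):

```python
from optimum.onnxruntime import ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# The model we wish to optimize (a Hub id or a local directory)
model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# Configuration object describing the graph optimizations to apply
optimization_config = OptimizationConfig(optimization_level=99)

optimizer = ORTOptimizer.from_pretrained(model_checkpoint, feature="sequence-classification")
# Export an optimized ONNX graph alongside the original one
optimizer.export(
    onnx_model_path="model.onnx",
    onnx_optimized_model_output_path="model-optimized.onnx",
    optimization_config=optimization_config,
)
```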
Quantization

For example, here's how you can apply dynamic quantization with ONNX Runtime:

```python
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer

# The model we wish to quantize
model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)
```

In this example, we've quantized a model from the Hugging Face Hub, but it could also be a path to a local model directory. The feature argument in the from_pretrained() method corresponds to the type of task that we wish to quantize the model for. The result of applying the export() method is a model-quantized.onnx file that can be used to run inference.

Here's an example of how to load an ONNX Runtime model and generate predictions with it:

```python
from functools import partial
from datasets import Dataset
from optimum.onnxruntime.model import ORTModel

# Load quantized model
ort_model = ORTModel("model-quantized.onnx", quantizer._onnx_config)
# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": ["I love burritos!"]})

# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
# Extract logits!
ort_outputs.predictions
```
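If you need class predictions rather than the raw logits, one small follow-up step (assuming ort_outputs.predictions holds the logits as a NumPy array, as produced by the snippet above) is to take the argmax over the class axis:

```python
import numpy as np

# ort_outputs.predictions contains the raw logits returned by evaluation_loop;
# the index of the largest logit per row is the predicted label id.
predicted_class_ids = np.argmax(ort_outputs.predictions, axis=-1)
print(predicted_class_ids)
```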


Similarly, you can apply static quantization by simply setting is_static to True when instantiating the QuantizationConfig object:

```python
qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)
```

Static quantization relies on feeding batches of data through the model to estimate the activation quantization parameters ahead of inference time. To support this, 🤗 Optimum allows you to provide a calibration dataset.
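To make the calibration step concrete, here is a rough sketch that reuses the quantizer and preprocess_fn defined above. It assumes the ORTQuantizer exposes helpers along the lines of get_calibration_dataset() and fit(), together with an AutoCalibrationConfig for choosing the calibration method; the exact names, arguments, and behaviour may differ between Optimum versions, so treat this as an outline rather than a drop-in recipe:

```python
from functools import partial
from optimum.onnxruntime.configuration import AutoCalibrationConfig

# Build a small calibration set from the SST-2 training split, tokenized the
# same way as in the inference example above.
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=quantizer.tokenizer),
    num_samples=50,
    dataset_split="train",
)
# MinMax calibration estimates the activation ranges from the calibration batches
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    onnx_model_path="model.onnx",
)
# Export the statically quantized model using the computed activation ranges
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)
```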
