Azure Machine Learning: Fine-Tuning Large Language Models in Parallel for Q&A and Summarization
4 min read · Sep 23, 2023
Using GPU compute to fine-tune large language models, and DeepSpeed to parallelize the fine-tuning
Introduction
- Fine-tune large language models in Azure ML
- Using GPU clusters
- Using existing samples from the Azure Machine Learning SDK examples (a workspace connection sketch follows this list)
- I had to request a quota increase in Azure ML to run this experiment
- Using an open-source dataset
- Summarization URL: https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/summarization/news-summary.ipynb
- QnA URL: https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/question-answering/extractive-qa.ipynb
- I am using the Python 3 kernel
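Both notebooks start by connecting to the workspace and to the azureml system registry that hosts the foundation models. Here is a minimal sketch, assuming the azure-ai-ml SDK v2; the subscription, resource group, and workspace names are placeholders, and t5-small is used as an example model name:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# client scoped to the workspace (placeholder identifiers - replace with your own)
workspace_ml_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<WORKSPACE_NAME>",
)

# client scoped to the "azureml" system registry, where the foundation models live
registry_ml_client = MLClient(credential, registry_name="azureml")

# fetch the foundation model to fine-tune
foundation_model = registry_ml_client.models.get("t5-small", label="latest")
print(f"Using {foundation_model.name}, version {foundation_model.version}")
```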
First run
- Using 2 VMs, each with 4 A100 GPUs
- Using SKU: Standard_NC96ads_A100_v4 (96 cores, 880 GB RAM, 256 GB disk)
- We are using 2 GPU VMs to run the fine-tuning horizontally across nodes
- A few changes are needed in step 5
- Summarization URL: https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/summarization/news-summary.ipynb
- In step 3, I need to add the above SKU to the compute allow list; a cluster-creation sketch follows the code below
```python
import ast

if "computes_allow_list" in foundation_model.tags:
    computes_allow_list = ast.literal_eval(
        foundation_model.tags["computes_allow_list"]
    )  # convert string to Python list
    # computes_allow_list.append("STANDARD_NC24ADS_A100_V4")
    computes_allow_list.append("Standard_NC48ads_A100_v4")
    computes_allow_list.append("Standard_NC96ads_A100_v4")
    print(f"Please create a compute from the above list - {computes_allow_list}")
else:
    computes_allow_list = None
    print("Computes allow list is not part of model tags")
```
- To optimize the speed of the run, we change the batch size to 8 in step 5 (both per_device_train_batch_size and per_device_eval_batch_size):
```python
# Training parameters
training_parameters = dict(
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    metric_for_best_model="rouge1",
    # auto_find_batch_size=True
)
print(f"The following training parameters are enabled - {training_parameters}")

# Optimization parameters - as these are packaged with the model itself, retrieve them from the tags
if "model_specific_defaults" in foundation_model.tags:
    optimization_parameters = ast.literal_eval(
        foundation_model.tags["model_specific_defaults"]
    )  # convert string to Python dict
else:
    optimization_parameters = dict(
        apply_lora="true", apply_deepspeed="true", apply_ort="true"
    )
print(f"The following optimizations are enabled - {optimization_parameters}")
```
- Next, add the following at the end of the pipeline creation cell:
```python
# set the pytorch and mlflow model outputs to mount mode
pipeline_object.jobs["summarization_pipeline"]["outputs"]["pytorch_model_folder"].mode = "mount"
pipeline_object.jobs["summarization_pipeline"]["outputs"]["mlflow_model_folder"].mode = "mount"
```
- The above sets the PyTorch and MLflow model outputs to mount mode
- Otherwise the job errors at the end of the fine-tuning step
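With the output modes set, the pipeline can be submitted and its logs streamed until the run finishes. A minimal sketch, assuming the pipeline_object from the notebook; the experiment name is hypothetical:

```python
# submit the fine-tuning pipeline and stream logs until it completes
pipeline_job = workspace_ml_client.jobs.create_or_update(
    pipeline_object,
    experiment_name="news-summary-finetune",  # hypothetical experiment name
)
workspace_ml_client.jobs.stream(pipeline_job.name)
```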
- Compute
- Summary information
- Metrics
- GPU utilization
- GPU memory utilization
- GPU energy usage
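The GPU utilization, memory, and energy charts come from the job's monitoring view in Azure ML studio. The training metrics, such as the rouge1 score picked as metric_for_best_model above, can also be pulled programmatically with MLflow; a sketch, assuming a recent mlflow plus azureml-mlflow are installed and using the hypothetical experiment name from the submission sketch:

```python
import mlflow

# point MLflow at the workspace's tracking server
workspace = workspace_ml_client.workspaces.get(workspace_ml_client.workspace_name)
mlflow.set_tracking_uri(workspace.mlflow_tracking_uri)

# list logged metrics for the fine-tuning experiment (hypothetical name)
runs = mlflow.search_runs(experiment_names=["news-summary-finetune"])
metric_columns = [c for c in runs.columns if c.startswith("metrics.")]
print(runs[metric_columns])
```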
Second run
- Using 2 VMs, each with 2 A100 GPUs
- Using SKU: Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
- We are using 2 GPU VMs to run the fine-tuning horizontally across nodes
- Same notebook as before
- Make sure the changes from steps 3 and 5 in the previous run are applied; the only compute difference is the SKU (see the sketch below)
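Against the earlier cluster sketch, this run is a one-line difference:

```python
# same cluster creation as before, with the smaller SKU
compute = AmlCompute(
    name="gpu-a100-cluster-nc48",  # placeholder cluster name
    size="Standard_NC48ads_A100_v4",
    min_instances=0,
    max_instances=2,
)
workspace_ml_client.compute.begin_create_or_update(compute).result()
```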
- Compute
- Summary information
- Metrics
- GPU utilization
- GPU memory utilization
- GPU energy usage
Summary
- Now both runs are complete:
- First one with Standard_NC96ads_A100_v4 (96 cores, 880 GB RAM, 256 GB disk)
- Second one with Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
Original article: Samples2023/AzureML/finetuningamlparallel.md at main · balakreshnan/Samples2023 (github.com)