Azure Machine Learning: Fine-Tuning Large Language Models in Parallel for Q&A and Summarization

Balamurugan Balakreshnan
4 min read · Sep 23, 2023


Using GPU compute to fine-tune large language models, with DeepSpeed to parallelize the fine-tuning

Introduction

This walkthrough fine-tunes a large language model for summarization on Azure Machine Learning GPU compute, using LoRA, DeepSpeed, and ONNX Runtime training to speed up and parallelize the fine-tuning, and compares runs on two A100 SKUs.

First run

  • In step 3 of the notebook, extend the computes allow list with the A100 SKUs:

import ast

if "computes_allow_list" in foundation_model.tags:
    computes_allow_list = ast.literal_eval(
        foundation_model.tags["computes_allow_list"]
    )  # convert string to python list
    # computes_allow_list.append("STANDARD_NC24ADS_A100_V4")
    computes_allow_list.append("Standard_NC48ads_A100_v4")
    computes_allow_list.append("Standard_NC96ads_A100_v4")
    print(f"Please create a compute from the above list - {computes_allow_list}")
else:
    computes_allow_list = None
    print("Computes allow list is not part of model tags")
  • To speed up the run, change the batch size in step 5:
# Training parameters
training_parameters = dict(
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=2e-5,
    metric_for_best_model="rouge1",
    # auto_find_batch_size=True
)
print(f"The following training parameters are enabled - {training_parameters}")

# Optimization parameters - as these are packaged with the model itself, let's retrieve them
if "model_specific_defaults" in foundation_model.tags:
    optimization_parameters = ast.literal_eval(
        foundation_model.tags["model_specific_defaults"]
    )  # convert string to python dict
else:
    optimization_parameters = dict(
        apply_lora="true", apply_deepspeed="true", apply_ort="true"
    )
print(f"The following optimizations are enabled - {optimization_parameters}")
  • Change the batch size to 8 (as in the training parameters above).
  • Add the following at the end of the pipeline creation cell:
# set the PyTorch and MLflow model output modes to mount
pipeline_object.jobs["summarization_pipeline"]["outputs"]["pytorch_model_folder"].mode = "mount"
pipeline_object.jobs["summarization_pipeline"]["outputs"]["mlflow_model_folder"].mode = "mount"
  • The above sets the PyTorch and MLflow model outputs to mount mode, so the fine-tuned model folders are written directly to storage while the job runs.
  • Otherwise the job errors at the end of the fine-tuning step when it tries to upload the large checkpoints.
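With the output modes set, the pipeline can be submitted and its logs streamed. A minimal sketch, assuming the workspace_ml_client from earlier; the experiment name is illustrative:

# submit the pipeline job and stream logs until it completes
pipeline_job = workspace_ml_client.jobs.create_or_update(
    pipeline_object, experiment_name="summarization-finetune"  # illustrative name
)
workspace_ml_client.jobs.stream(pipeline_job.name)

Once the job is running, the studio run page is worth reviewing for: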
  • Compute
  • Summary information
  • Metrics (also retrievable programmatically; see the sketch after this list)
  • GPU utilization
  • GPU memory utilization
  • GPU energy usage
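The metrics (for example rouge1, the metric_for_best_model set above) can be pulled with MLflow once the job completes. A hedged sketch, assuming the pipeline_job handle from the submission step:

import mlflow

# point MLflow at the workspace tracking server
# (requires the azureml-mlflow package)
workspace = workspace_ml_client.workspaces.get(workspace_ml_client.workspace_name)
mlflow.set_tracking_uri(workspace.mlflow_tracking_uri)

# for Azure ML jobs the MLflow run id matches the job name
run = mlflow.get_run(pipeline_job.name)
print(run.data.metrics)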

Second run

  • This run uses two A100 VMs (4 GPUs in total) to scale the fine-tuning out horizontally.
  • SKU: Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
  • Same notebook as before.
  • Make sure the changes from steps 3 and 5 of the previous run are applied here as well; the multi-node settings are sketched after this list.
  • Compute
  • Summary information
  • Metrics
  • GPU utilization
  • GPU memory utilization
  • GPU energy usage
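To fan the fine-tuning out across both VMs, the node and GPU counts on the fine-tuning step have to be raised. A hedged sketch; the input names below (num_nodes_finetune, number_of_gpu_to_use_finetuning) follow the Azure ML foundation-model fine-tuning sample notebooks and may differ across component versions:

# request 2 nodes with 2 A100 GPUs each, 4 GPU workers in total
# (input names are illustrative and may vary by component version)
pipeline_object.jobs["summarization_pipeline"].inputs.num_nodes_finetune = 2
pipeline_object.jobs["summarization_pipeline"].inputs.number_of_gpu_to_use_finetuning = 2

DeepSpeed (enabled via apply_deepspeed above) then distributes the training across all four GPU workers.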

Summary

  • Now compare the two runs:
  • First run: Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
  • Second run: Standard_NC96ads_A100_v4 (96 cores, 880 GB RAM, 256 GB disk)

Original article: Samples2023/AzureML/finetuningamlparallel.md at main · balakreshnan/Samples2023 (github.com)
