Fine-tuning Phi-2 with Azure Machine Learning and Uploading to Hugging Face — Cricket IPL Dataset
4 min read · Apr 6, 2024
Introduction
- Let’s use Azure Machine Learning to fine-tune the Phi-2 model and upload it to Hugging Face
- Using an open-source dataset for public documentation
- Using the Phi-2 model
- Then uploading to my new repo
- Creating my own cricket dataset from IPL data
Requirements
- Azure Account
- Azure Machine Learning Service
- Compute instance with GPU compute
- Using Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
- The model is about 6 GB
- So we need enough GPU memory to load the model and train (a quick GPU check is sketched after the library installs below)
- Need a Hugging Face token with write access
- A read-access token is enough to download the model from Hugging Face
- To upload to Hugging Face we need a write-access token
Steps
- First, let's create the compute instance
- Use a GPU SKU for model fine-tuning
- Log into the terminal and create a conda environment
conda create -n finetune python=3.10 anaconda
ipython kernel install --user --name finetune --display-name "finetune"
- Now create a new notebook and select the finetune kernel
- Install the necessary libraries
%pip install transformers
%pip install trl
%pip install datasets
%pip install protobuf
%pip install evaluate
%pip install peft
%pip install bitsandbytes
%pip install accelerate
%pip install sentencepiece
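- Optional: a quick GPU check (a minimal sketch, not part of the original article) to confirm the A100 is visible and has enough memory for the roughly 6 GB model plus training state
import torch
# Verify a CUDA GPU is available and report its total memory
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"Total memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU visible - check the compute instance SKU")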
- Now the code to fine-tune
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
pipeline,
logging,
)
from peft import LoraConfig
from trl import SFTTrainer
- Set up the model information: names for the input (base) and output (fine-tuned) models
# Model from Hugging Face hub
base_model = "microsoft/phi-2"
# Fine-tuned model
new_model = "phi2cricketipl"
- Convert the JSONL to the Llama 2 prompt format (a conversion sketch follows the loading code below)
- This will help us fine-tune with both Phi-2 and Llama 2
- Load the dataset
from datasets import load_dataset
dataset = load_dataset('csv', data_files='llama2_crick_dataset.csv', split="train")
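- For reference, here is a minimal sketch of the JSONL-to-Llama-2 conversion step mentioned above; the input file name and the question/answer field names are assumptions, not from the original article
import json
import csv
# Wrap each question/answer pair in the Llama 2 instruction format and
# write a single "text" column that SFTTrainer will train on
with open("ipl_qa.jsonl") as fin, open("llama2_crick_dataset.csv", "w", newline="") as fout:
    writer = csv.DictWriter(fout, fieldnames=["text"])
    writer.writeheader()
    for line in fin:
        record = json.loads(line)
        text = f"<s>[INST] {record['question']} [/INST] {record['answer']} </s>"
        writer.writerow({"text": text})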
- Now log in to Hugging Face
from huggingface_hub import notebook_login
notebook_login()
- Use the write token, since it allows both downloading the model and uploading the fine-tuned version
- Set up the bitsandbytes config
compute_dtype = getattr(torch, "float16")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the weights in 4-bit to cut memory
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=compute_dtype,  # run the compute in fp16
    bnb_4bit_use_double_quant=False,       # no nested quantization
)
- Set up the base model
model = AutoModelForCausalLM.from_pretrained(
base_model,
quantization_config=quant_config,
device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1
- Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
- Set up the LoRA config
peft_params = LoraConfig(
    lora_alpha=16,         # scaling factor applied to the LoRA updates
    lora_dropout=0.1,      # dropout on the LoRA layers
    r=64,                  # rank of the low-rank update matrices
    bias="none",
    task_type="CAUSAL_LM",
)
- Set up the training parameters
training_params = TrainingArguments(
output_dir="./results",
num_train_epochs=1,
per_device_train_batch_size=1,
gradient_accumulation_steps=1,
optim="paged_adamw_32bit",
save_steps=25,
logging_steps=25,
learning_rate=2e-4,
weight_decay=0.001,
fp16=False,
bf16=False,
max_grad_norm=0.3,
max_steps=-1,
warmup_ratio=0.03,
group_by_length=True,
lr_scheduler_type="constant",
report_to="tensorboard"
)
- Make sure the batch size is set to 1
- Otherwise it will not fit into GPU memory
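- If memory is still tight, one option (not from the original article) is to raise gradient_accumulation_steps while keeping the per-device batch size at 1; the small sketch below reports peak GPU memory after a short training run
import torch
# Peak GPU memory used so far versus the card's total capacity
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB of {total_gb:.1f} GB")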
- Set up the trainer
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_params,
dataset_text_field="text",
max_seq_length=None,
tokenizer=tokenizer,
args=training_params,
packing=False,
)
- Now train the model (fine-tuning)
trainer.train()
- Save the output to upload (this saves the LoRA adapter weights; we merge them back into the base model below before uploading)
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
- Test the output
logging.set_verbosity(logging.CRITICAL)
prompt = "What is Mayank Agarwal's batting average?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
- Now upload to Hugging Face
from peft import PeftModel
- Set up the model to upload
# Reload model in FP16 and merge it with LoRA weights
load_model = AutoModelForCausalLM.from_pretrained(
base_model,
low_cpu_mem_usage=True,
return_dict=True,
torch_dtype=torch.float16,
device_map={"": 0},
)
model = PeftModel.from_pretrained(load_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
- Log in to Hugging Face
from huggingface_hub import notebook_login
notebook_login()
- Set up the model repo name
hugginfacemoderepo = 'Balab2021/phi2'
new_model = "phi2-chat-g"
- Upload the model to the Hub
model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
- Here is the Hugging Face link to the model
- Hugging Face URL: https://huggingface.co/Balab2021/phi2cricketipl
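- Once uploaded, the model can be pulled straight from the Hub for inference; here is a minimal sketch using the repo from the link above
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Load the fine-tuned model and tokenizer directly from the Hugging Face Hub
repo_id = "Balab2021/phi2cricketipl"
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200)
print(pipe("<s>[INST] What is Mayank Agarwal's batting average? [/INST]")[0]["generated_text"])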
Note: this is for learning and documentation purposes only.
Original article — Samples2024/finetuning/cricketphi2.md at main · balakreshnan/Samples2024 (github.com)