Fine-tuning Phi-2 with Azure Machine Learning and Uploading to Hugging Face — Cricket IPL Dataset
4 min read · Apr 6, 2024
Introduction
- Let’s use Azure Machine Learning to fine-tune the Phi-2 model and upload it to Hugging Face
- Using an open-source dataset for public documentation
- Using the Phi-2 model
- Then uploading to my new repo
- Creating my own cricket dataset from IPL data
Requirements
- Azure Account
- Azure Machine Learning Service
- Compute instance with GPU compute
- Using Standard_NC48ads_A100_v4 (48 cores, 440 GB RAM, 128 GB disk)
- The model is about 6 GB
- So we need enough GPU memory to load the model and train (a quick GPU check is sketched after the library installs below)
- Need a Hugging Face token with write access
- A read-access token is enough to download the model from Hugging Face
- To upload to Hugging Face we need a write-access token
Steps
- First, let's create the compute instance
- Use a GPU SKU for model fine-tuning
- Log into the terminal and create a conda environment
conda create -n finetune python=3.10 anaconda
ipython kernel install --user --name finetune --display-name "finetune"
- Now create a new notebook and select the finetune kernel
- Install the necessary libraries
%pip install transformers
%pip install trl
%pip install datasets
%pip install protobuf
%pip install evaluate
%pip install peft
%pip install bitsandbytes
%pip install accelerate
%pip install sentencepiece
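- Optional: a quick GPU check (a minimal sketch, not part of the original article) to confirm the A100 is visible and has enough memory for the roughly 6 GB model plus training state
import torch
# Verify a CUDA GPU is available and report its total memory
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"Total memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU visible - check the compute instance SKU")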
- Now the code to fine-tune
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
pipeline,
logging,
)
from peft import LoraConfig
from trl import SFTTrainer
- Set up the model information: names for the input (base) and output (fine-tuned) models
# Model from Hugging Face hub
base_model = "microsoft/phi-2"
# Fine-tuned model
new_model = "phi2cricketipl"
- Convert the JSONL to the Llama 2 prompt format (a conversion sketch follows the loading code below)
- This will help us fine-tune with both Phi-2 and Llama 2
- Load the dataset
from datasets import load_dataset
dataset = load_dataset('csv', data_files='llama2_crick_dataset.csv', split="train")
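- For reference, here is a minimal sketch of the JSONL-to-Llama-2 conversion step mentioned above; the input file name and the question/answer field names are assumptions, not from the original article
import json
import csv
# Wrap each question/answer pair in the Llama 2 instruction format and
# write a single "text" column that SFTTrainer will train on
with open("ipl_qa.jsonl") as fin, open("llama2_crick_dataset.csv", "w", newline="") as fout:
    writer = csv.DictWriter(fout, fieldnames=["text"])
    writer.writeheader()
    for line in fin:
        record = json.loads(line)
        text = f"<s>[INST] {record['question']} [/INST] {record['answer']} </s>"
        writer.writerow({"text": text})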
- Now log in to Hugging Face
from huggingface_hub import notebook_login
notebook_login()
- Use the write token, since it allows both downloading the model and uploading the fine-tuned version
- Set up the bitsandbytes config
compute_dtype = getattr(torch, "float16")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the weights in 4-bit to cut memory
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=compute_dtype,  # run the compute in fp16
    bnb_4bit_use_double_quant=False,       # no nested quantization
)
- Set up the base model
model = AutoModelForCausalLM.from_pretrained(
base_model,
quantization_config=quant_config,
device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1
- Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
- Set up the LoRA config
peft_params = LoraConfig(
    lora_alpha=16,         # scaling factor applied to the LoRA updates
    lora_dropout=0.1,      # dropout on the LoRA layers
    r=64,                  # rank of the low-rank update matrices
    bias="none",
    task_type="CAUSAL_LM",
)
- Set up the training parameters
training_params = TrainingArguments(
output_dir="./results",
num_train_epochs=1,
per_device_train_batch_size=1,
gradient_accumulation_steps=1,
optim="paged_adamw_32bit",
save_steps=25,
logging_steps=25,
learning_rate=2e-4,
weight_decay=0.001,
fp16=False,
bf16=False,
max_grad_norm=0.3,
max_steps=-1,
warmup_ratio=0.03,
group_by_length=True,
lr_scheduler_type="constant",
report_to="tensorboard"
)
- Make sure the batch size is set to 1
- Otherwise it will not fit into GPU memory
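- If memory is still tight, one option (not from the original article) is to raise gradient_accumulation_steps while keeping the per-device batch size at 1; the small sketch below reports peak GPU memory after a short training run
import torch
# Peak GPU memory used so far versus the card's total capacity
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB of {total_gb:.1f} GB")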
- Set up the trainer
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_params,
dataset_text_field="text",
max_seq_length=None,
tokenizer=tokenizer,
args=training_params,
packing=False,
)
- Now train the model (fine-tuning)
trainer.train()
- Save the output to upload (this saves the LoRA adapter weights; we merge them back into the base model below before uploading)
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
- Test the output
logging.set_verbosity(logging.CRITICAL)
prompt = "What is Mayank Agarwal's batting average?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
- Now upload to Hugging Face
from peft import PeftModel
- Set up the model to upload
# Reload model in FP16 and merge it with LoRA weights
load_model = AutoModelForCausalLM.from_pretrained(
base_model,
low_cpu_mem_usage=True,
return_dict=True,
torch_dtype=torch.float16,
device_map={"": 0},
)
model = PeftModel.from_pretrained(load_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
- Log in to Hugging Face
from huggingface_hub import notebook_login
notebook_login()
- Set up the model repo name
hugginfacemoderepo = 'Balab2021/phi2'
new_model = "phi2-chat-g"
- Upload the model to the Hub
model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
- Here is the Hugging Face link to the model
- Hugging Face URL: https://huggingface.co/Balab2021/phi2cricketipl
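- Once uploaded, the model can be pulled straight from the Hub for inference; here is a minimal sketch using the repo from the link above
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Load the fine-tuned model and tokenizer directly from the Hugging Face Hub
repo_id = "Balab2021/phi2cricketipl"
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=200)
print(pipe("<s>[INST] What is Mayank Agarwal's batting average? [/INST]")[0]["generated_text"])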
Note: this is for learning and documentation purposes only.
Original article — Samples2024/finetuning/cricketphi2.md at main · balakreshnan/Samples2024 (github.com)