Automated Machine Learning End-to-End Open Hack

Published in Analytics Vidhya on May 17, 2021

Citizen Data Science Open Hack

Introduction

This Automated Machine Learning hackathon teaches how to use automated machine learning to build models and deploy them to production.

Agenda

Introduction to Openhack
Introduction to Azure Machine Learning: 1 hour
Introduction to Machine Learning Ops: 1 hour
Openhack use case introduction: 4 hours
Deploy model: 1 hour
Clean up: 15 minutes
Recap: 1 hour

Use Case

The use case is predicting population growth for years to come. Population forecasts help economists plan the next generation of supply chains across the world.
This information also allows countries and states to plan for healthcare, consumer demand, and urban development.
Build an end-to-end pipeline
Move data from the source into Azure using Azure Data Factory
Process the data as you move it into Azure Storage
We can use data flows in Data Factory to prepare the data for machine learning
Data processing here is limited to ETL/data engineering
There is no need to format data for ML algorithms
Tabular data is enough
Make sure all the features and the label are available
The goal is to showcase DataOps + MLOps
This is an end-to-end process: get the data, process it, then create machine learning models and consume them
Note: the accuracy of the model is not important
We are using regression as a sample to predict population
Assumption: the source data is in Azure Open Datasets; validate the configuration before starting
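The requirement that every feature and the label be present can be checked before handing the table to automated ML. The following is a minimal sketch, not part of the hack itself: the feature names are the ones used when testing the endpoint later in this walkthrough, the label name "population" is an assumption, and the two sample rows are made-up placeholders rather than real census records.

```python
# Feature names as used when testing the deployed endpoint in this walkthrough.
FEATURES = ["decennialTime", "zipCode", "race", "sex", "minAge", "maxAge"]
# Assumption: the label column is called "population".
LABEL = "population"


def missing_columns(rows, required):
    """Return the required column names that are absent or empty in any row."""
    missing = set()
    for row in rows:
        for name in required:
            if row.get(name) in (None, ""):
                missing.add(name)
    # Keep the order of `required` so the result is stable.
    return [name for name in required if name in missing]


# Made-up placeholder rows, for illustration only.
sample_rows = [
    {"decennialTime": "2010", "zipCode": "77480", "race": "WHITE ALONE",
     "sex": "Female", "minAge": 55, "maxAge": 59, "population": 100},
    {"decennialTime": "2010", "zipCode": "77480", "race": "WHITE ALONE",
     "sex": "Male", "minAge": 55, "maxAge": 59},  # label left out on purpose
]

print(missing_columns(sample_rows, FEATURES + [LABEL]))
```

An empty list means the table has everything a regression run needs; anything else names the columns to fix in the Data Factory step.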

Architecture

Azure Resources

Azure Account: https://azure.microsoft.com/en-us/free/?WT.mc_id=A261C142F
Create a resource group called automlopenhack: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal#create-resource-groups
Create an Azure Storage account called automloutput: https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal
Create an Azure Data Factory called automladfopenhack: https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal#create-a-data-factory
Azure Open Datasets configuration
Create an Azure Machine Learning service: https://docs.microsoft.com/en-us/azure/machine-learning/quickstart-create-resources
Also create a container registry to store the container images built for model deployment
Create a compute cluster: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python
Create an inference cluster on AKS: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-kubernetes?tabs=azure-portal#create-a-new-aks-cluster
List of resources created

Steps

Create Data factory copy pipeline
Create a Blob Storage connection for the source
For the SAS URI, enter https://azureopendatastorage.blob.core.windows.net/
For the SAS key, type “” (an empty string)
Run Test connection to confirm that the connection succeeds
Create a blob source
Save the input as azureopendataset

Choose the container
Choose or type the container name “censusdatacontainer”
Type or select the folder name “release/us_population_zip/”
For the file name, type “*.parquet” to match the Parquet files, or select the specific file if it appears in the list
File names and types can change, since this is an open dataset.
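To make the container, folder, and wildcard settings concrete, here is a small sketch of how a pattern such as “*.parquet” narrows a folder listing. It uses Python's `fnmatch` module, which only approximates Data Factory's wildcard handling, and it looks only at files directly inside the folder. The blob names in `listing` are hypothetical, since the real file names can change.

```python
from fnmatch import fnmatchcase

ACCOUNT_URL = "https://azureopendatastorage.blob.core.windows.net/"
CONTAINER = "censusdatacontainer"
FOLDER = "release/us_population_zip/"
PATTERN = "*.parquet"


def select_files(blob_names, folder=FOLDER, pattern=PATTERN):
    """Return full URLs of blobs directly under `folder` that match `pattern`."""
    selected = []
    for name in blob_names:
        if not name.startswith(folder):
            continue  # different folder
        file_name = name[len(folder):]
        if "/" in file_name:
            continue  # nested subfolder, ignored in this sketch
        if fnmatchcase(file_name, pattern):
            selected.append(ACCOUNT_URL + CONTAINER + "/" + name)
    return selected


# Hypothetical listing; the actual file names in the open dataset may differ.
listing = [
    "release/us_population_zip/part-00000.snappy.parquet",
    "release/us_population_zip/_SUCCESS",
    "release/us_population_county/part-00000.snappy.parquet",
]

for url in select_files(listing):
    print(url)
```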

Name the output connection ADLSoutput
Select the subscription and storage account name

Click Next twice
Keep everything as default
Click Finish

The copy sometimes takes 15 to 20 minutes or more, depending on the data size. To speed it up, we can give the copy more Data Integration Units (DIUs) or parallelize the copy activity.
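Both speed-up options map to settings on the copy activity. As a hedged sketch, the helper below builds that fragment of the activity definition, using the `dataIntegrationUnits` and `parallelCopies` property names from the copy activity's `typeProperties`. The helper name and the values 8 and 4 are illustrative assumptions, not recommendations from this hack.

```python
import json


def copy_performance_settings(data_integration_units, parallel_copies):
    """Build the performance-related part of a copy activity's typeProperties."""
    for label, value in (("data_integration_units", data_integration_units),
                         ("parallel_copies", parallel_copies)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
    return {
        "dataIntegrationUnits": data_integration_units,
        "parallelCopies": parallel_copies,
    }


# Illustrative values only; suitable numbers depend on data size and budget.
settings = copy_performance_settings(data_integration_units=8, parallel_copies=4)
print(json.dumps({"typeProperties": settings}, indent=2))
```

The same two values can also be set in the Settings tab of the copy activity in the Data Factory UI, so the JSON is mainly useful when the pipeline is kept in source control.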

Model Development

Now move on to the Azure Machine Learning setup

Let's create the setup
First, create a dataset and a datastore

Create a compute cluster named cpu-compute
A range of 0 to 2 nodes is sufficient

Wait for the experiment to complete
It usually takes 2 to 3 hours

Check the explanation or feature importance to see which columns had the most impact on prediction
The logs contain model information

Those are the columns we need to use
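Picking the most influential columns from a feature importance view amounts to sorting by score. The sketch below shows that step in isolation; the scores are placeholders made up for illustration and are NOT results from this experiment, so substitute the values shown in the explanation tab of your own run.

```python
def top_features(importances, k):
    """Return the k feature names with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Placeholder scores for illustration only; not output from the experiment.
placeholder_importances = {
    "decennialTime": 0.05,
    "zipCode": 0.40,
    "race": 0.25,
    "sex": 0.02,
    "minAge": 0.18,
    "maxAge": 0.10,
}

print(top_features(placeholder_importances, 3))
```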

Model Deployment

Select the best model and deploy it
Deploy to AKS cluster
Create AKS cluster
Go to Compute and create inference cluster

Select the VM configuration
I am choosing Dev/Test since this is for a hackathon
For production, note that the AKS cluster must have at least 12 virtual CPUs in total for HA
Production config below

Then click create
AKS cluster creation started

Click aksdeploy, shown to the left of the Succeeded status, to go to the endpoint information
Once the deployment finishes, an endpoint is created

Click Test tab
Type in the following information

decennialTime: 2020
zipCode: 77480
race: WHITE ALONE
sex: Female
minAge: 56
maxAge: 59
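The same test can be sent from code instead of the Test tab. The sketch below builds the HTTP request without sending it, and it rests on several assumptions: the scoring URI and key are hypothetical placeholders to be replaced with the values from the endpoint's Consume tab, the body uses a `{"data": [...]}` wrapper that may differ for your deployment, and treating decennialTime and zipCode as strings is a guess about the column types.

```python
import json
import urllib.request

# Hypothetical placeholders; copy the real values from the endpoint's Consume tab.
SCORING_URI = "http://example.invalid/api/v1/service/aksdeploy/score"
API_KEY = "replace-with-endpoint-key"

# Same inputs as typed into the Test tab; the value types are assumptions.
record = {
    "decennialTime": "2020",
    "zipCode": "77480",
    "race": "WHITE ALONE",
    "sex": "Female",
    "minAge": 56,
    "maxAge": 59,
}


def build_request(uri, key, records):
    """Build a POST request carrying the records as a JSON body."""
    # Assumption: the scoring script expects a top-level "data" list.
    body = json.dumps({"data": records}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,
    }
    return urllib.request.Request(uri, data=body, headers=headers, method="POST")


request = build_request(SCORING_URI, API_KEY, [record])
print(request.data.decode("utf-8"))
# To call a real endpoint, pass `request` to urllib.request.urlopen.
```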

Cleanup

Delete all the resources
Delete the entire resource group to remove all the components

Comments and Feedback

Originally published at https://github.com.