Automated Machine Learning End-to-End Open Hack

Balamurugan Balakreshnan
Published in Analytics Vidhya
6 min read · May 17, 2021

Citizen Data science open hack

Introduction

Automated Machine Learning hackathon: learn how to use automated machine learning to build models and deploy them to production.

Agenda

  • Introduction to Openhack
  • Introduction to Azure Machine Learning (1 hour)
  • Introduction to Machine Learning Ops (1 hour)
  • Openhack use case introduction (4 hours)
  • Deploy model (1 hour)
  • Clean up (15 minutes)
  • Recap (1 hour)

Use Case

  • Predict population growth for the years to come. Population forecasts help economists plan the next generation of supply chains across the world.
  • This information also allows countries and states to plan for future healthcare, consumer demand, and even urban development.
  • Build an end-to-end pipeline
  • Move data from the source into Azure using Azure Data Factory
  • Process the data as it moves into Azure storage
  • We can use data flows in Data Factory to prepare the data for machine learning
  • The scope of data processing is ETL/data engineering
  • There is no need to format data specifically for ML algorithms
  • Tabular data is enough
  • Make sure all the features and the label are available
  • The goal is to showcase DataOps + MLOps
  • This is an end-to-end process: get the data, process it, create machine learning models, and consume them
  • Note: the accuracy of the model is not important
  • We are using regression as a sample to predict population
  • Assumption: the source data comes from Azure Open Datasets; validate the configuration
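The requirement that the tabular data carries every feature and the label can be checked before training. The following is only a minimal sketch, not the hackathon's own code: the feature names are the ones typed into the endpoint test in the Model Deployment section, while the label name `population`, the helper `missing_columns`, and the sample rows are assumptions made for illustration.

```python
# Feature names as used in the endpoint test in this article.
FEATURES = ["decennialTime", "zipCode", "race", "sex", "minAge", "maxAge"]
# Assumed label column name; adjust it to match the actual data set.
LABEL = "population"


def missing_columns(rows, required=tuple(FEATURES + [LABEL])):
    """Return the required column names that are absent or empty in any row."""
    missing = set()
    for row in rows:
        for name in required:
            value = row.get(name)
            if value is None or value == "":
                missing.add(name)
    return sorted(missing)


# Made-up sample rows, for illustration only.
sample = [
    {"decennialTime": "2010", "zipCode": "77480", "race": "WHITE ALONE",
     "sex": "Female", "minAge": 56, "maxAge": 59, "population": 100},
    {"decennialTime": "2010", "zipCode": "77480", "race": "WHITE ALONE",
     "sex": "Male", "minAge": 56, "maxAge": 59},  # label is missing here
]
print(missing_columns(sample))  # ['population']
```

A check like this is strict about empty values; if some columns are legitimately empty in the source data, relax the test for those columns.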

Architecture

Azure Resources

Steps

  • Choose the container
  • Choose or type the container name “censusdatacontainer”
  • Type or select the folder name “release/us_population_zip/”
  • For the file name, type “*.parquet” to pick up the parquet files, or select the matching file if it is listed
  • File names and types can change, since it’s an open data set.
  • Name it ADLSoutput
  • Select the subscription and storage account name
  • Click Next and Next
  • Keep everything as default
  • Click Finish

The copy sometimes takes 15 to 20 minutes or more, depending on the data size. To speed it up, we can give the copy activity more Data Integration Units (DIUs) or parallelize it.
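Both speed-ups correspond to settings in the copy activity's JSON definition, `dataIntegrationUnits` and `parallelCopies`. The sketch below builds that fragment in Python so the shape is visible; the activity name `CopyCensusData`, the helper `copy_performance_settings`, and the values 16 and 4 are illustrative assumptions, not the settings used in the hackathon.

```python
import json


def copy_performance_settings(diu=None, parallel_copies=None):
    """Build the performance part of a copy activity's typeProperties.

    A value left as None is omitted, so the service picks its own default.
    """
    settings = {}
    if diu is not None:
        settings["dataIntegrationUnits"] = diu
    if parallel_copies is not None:
        settings["parallelCopies"] = parallel_copies
    return settings


activity = {
    "name": "CopyCensusData",  # hypothetical activity name
    "type": "Copy",
    # Source, sink, inputs, and outputs are left out of this fragment.
    "typeProperties": copy_performance_settings(diu=16, parallel_copies=4),
}
print(json.dumps(activity, indent=2))
```

Higher values cost more per run, so it is probably worth raising them step by step and comparing copy durations.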

Model Development

Now set up Azure Machine Learning

  • Let’s create the setup
  • First, create a dataset and a datastore
  • Create a compute cluster named cpu-compute
  • 0 to 2 nodes is enough
  • Wait for the experiment to complete
  • It usually takes 2 to 3 hours
  • Check the explanation or feature importance to see which columns had the most impact on the prediction
  • The run also logs model information
  • The columns with the highest importance are the ones we need to use
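Once the feature importance values are visible in the explanation, picking the columns to keep is a simple ranking. This is a sketch of that step only: the helper `top_features` is hypothetical and the importance scores are made-up placeholders, not results from the experiment.

```python
def top_features(importances, k=3):
    """Return the k feature names with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Placeholder scores for illustration only; read the real ones from the
# explanation of the best model.
example_importances = {
    "zipCode": 0.42,
    "race": 0.31,
    "minAge": 0.12,
    "sex": 0.08,
    "maxAge": 0.05,
    "decennialTime": 0.02,
}
print(top_features(example_importances, k=3))  # ['zipCode', 'race', 'minAge']
```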

Model Deployment

  • Select the best model and deploy it
  • Deploy to an AKS cluster
  • Create the AKS cluster
  • Go to Compute and create an inference cluster
  • Select the VM configuration
  • I am choosing Dev/Test since this is for a hackathon
  • For production, note that the AKS cluster must provide at least 12 virtual CPUs (cores) in total, typically spread across 3 or more nodes for HA
  • Production config below
  • Then click Create
  • AKS cluster creation starts
  • Click aksdeploy, to the left of the Succeeded status, to go to the endpoint information
  • Once the deployment completes, an endpoint is created
  • Click the Test tab
  • Type in the following information:
    decennialTime: 2020
    zipCode: 77480
    race: WHITE ALONE
    sex: Female
    minAge: 56
    maxAge: 59
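The same test can be sent from code instead of the Test tab. The sketch below is written under assumptions: it wraps the record in a `{"data": [...]}` body, which is a common shape for these scoring endpoints but should be checked against the endpoint's own documentation, and it guesses the value types (strings for `decennialTime` and `zipCode`, integers for the ages). The helpers `build_scoring_payload` and `score` are hypothetical, and the scoring URI and key must come from the endpoint information page.

```python
import json
import urllib.request


def build_scoring_payload(records):
    """Wrap input records in a {"data": [...]} JSON body (assumed request shape)."""
    return json.dumps({"data": list(records)})


def score(scoring_uri, api_key, body):
    """POST the JSON body to the endpoint and return the decoded response text."""
    request = urllib.request.Request(
        scoring_uri,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# The values typed into the Test tab above; the types are assumptions.
record = {
    "decennialTime": "2020",
    "zipCode": "77480",
    "race": "WHITE ALONE",
    "sex": "Female",
    "minAge": 56,
    "maxAge": 59,
}
body = build_scoring_payload([record])
print(body)
# To call the endpoint, pass its real URI and key:
# print(score("<scoring-uri>", "<api-key>", body))
```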

Cleanup

  • Delete all the resources
  • Drop the entire Resource group to get rid of all the components

Comments and Feedback

Originally published at https://github.com.
