SynapseML (Open Source) in Azure Synapse Analytics Spark

Balamurugan Balakreshnan
Feb 26, 2022


How to run SynapseML in Azure Synapse Analytics Spark

Prerequisites

  • Azure Subscription
  • Azure Synapse Analytics Workspace
  • SynapseML installation instructions

Steps

  • Create a new Spark 3.1 cluster
  • For Spark 3.1, the Spark configuration step shown below is important
  • Create a new notebook
  • Select the cluster created above
  • Add the configuration below; it has to be in the first cell of the notebook
%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.9.4",
        "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12",
        "spark.yarn.user.classpath.first": "true"
    }
}
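If the same settings need to be reused with a different SynapseML version, the JSON body of the configuration cell can be generated rather than edited by hand. The following is a minimal sketch and not part of the original walkthrough: the helper name `build_synapseml_conf` is hypothetical, and it only assembles the same keys shown above using the standard `json` module. The printed text would still have to be pasted under `%%configure -f` in the first cell.

```python
import json

# Hypothetical helper (not part of SynapseML or Synapse): builds the JSON body
# used with the %%configure -f cell magic for a given SynapseML version.
EXCLUDES = [
    "org.scala-lang:scala-reflect",
    "org.apache.spark:spark-tags_2.12",
    "org.scalactic:scalactic_2.12",
    "org.scalatest:scalatest_2.12",
]


def build_synapseml_conf(version="0.9.4"):
    """Return the session configuration as a dict, mirroring the cell above."""
    return {
        "name": "synapseml",
        "conf": {
            "spark.jars.packages": f"com.microsoft.azure:synapseml_2.12:{version}",
            "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
            "spark.jars.excludes": ",".join(EXCLUDES),
            "spark.yarn.user.classpath.first": "true",
        },
    }


conf = build_synapseml_conf()
print(json.dumps(conf, indent=2))
```

Keeping the exclusions in a list makes it harder to lose a comma when one entry changes; whether a given version string actually exists in the repository is not checked here.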
  • Now import the libraries, then load the data, train a model and evaluate it
# Imports needed by the sample (the original omitted pandas and numpy)
import os
import urllib.request

import numpy as np
import pandas as pd
from pyspark.ml.classification import LogisticRegression
from synapse.ml import *
from synapse.ml.cognitive import *
from synapse.ml.train import ComputeModelStatistics, TrainClassifier

# Download the Adult Census Income dataset if it is not already present
dataFile = "AdultCensusIncome.csv"
if not os.path.isfile(dataFile):
    urllib.request.urlretrieve("https://mmlspark.azureedge.net/datasets/" + dataFile, dataFile)

# The column names used here include a leading space, as in the original sample
data = spark.createDataFrame(pd.read_csv(dataFile, dtype={" hours-per-week": np.float64}))
data.show(5)
data = data.select([" education", " marital-status", " hours-per-week", " income"])
train, test = data.randomSplit([0.75, 0.25], seed=123)

# Train a logistic regression classifier with income as the label
model = TrainClassifier(model=LogisticRegression(), labelCol=" income").fit(train)

# Score the held-out test set and show the accuracy
prediction = model.transform(test)
metrics = ComputeModelStatistics().transform(prediction)
metrics.select('accuracy').show()
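
To make the final number easier to interpret: accuracy is the fraction of test rows whose predicted label matches the true label. The sketch below illustrates that calculation in plain Python on a handful of made-up label pairs. The values and the `accuracy` function are illustrative assumptions, not output from the model above and not SynapseML's implementation.

```python
# Illustrative only: made-up (true label, predicted label) pairs,
# not results from the model trained above.
pairs = [
    ("<=50K", "<=50K"),
    ("<=50K", ">50K"),
    (">50K", ">50K"),
    ("<=50K", "<=50K"),
]


def accuracy(label_prediction_pairs):
    """Fraction of pairs where the prediction equals the true label."""
    if not label_prediction_pairs:
        raise ValueError("need at least one (label, prediction) pair")
    correct = sum(1 for label, pred in label_prediction_pairs if label == pred)
    return correct / len(label_prediction_pairs)


print(accuracy(pairs))  # 3 of 4 pairs match -> 0.75
```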

Originally published at https://github.com.
