SynapseML (Open Source) in Azure Synapse Analytics Spark

How to run SynapseML in Azure Synapse Analytics Spark

Prerequisites
  • Azure subscription
  • Azure Synapse Analytics workspace
  • SynapseML installation instructions

Steps
  • Create a new Spark 3.1 pool (cluster) in the workspace
  • For Spark 3.1, the Spark configuration step below is important
  • Create a new notebook
  • Attach the notebook to the Spark pool created above
  • Put the following configuration in the first cell of the notebook
%%configure -f
{
    "name": "synapseml",
    "conf": {
        "spark.jars.packages": "",
        "spark.jars.repositories": "",
        "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12",
        "spark.yarn.user.classpath.first": "true"
    }
}
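The configuration cell is a JSON document placed after the `%%configure -f` line: a session `name` plus a `conf` map of Spark properties. A minimal sketch, assuming you want to generate or sanity-check that body in Python before pasting it: `build_configure_body` is a hypothetical helper, not part of Synapse or SynapseML, and the package coordinates and repository URL stay as empty placeholders, just as in the cell above.

```python
import json


def build_configure_body(packages: str, repositories: str) -> str:
    """Return the JSON body for a Synapse `%%configure -f` cell.

    Hypothetical helper: keys mirror the configuration cell in this post.
    `packages` and `repositories` are placeholders for the Maven
    coordinates and repository URL, which are not spelled out here.
    """
    body = {
        "name": "synapseml",
        "conf": {
            "spark.jars.packages": packages,
            "spark.jars.repositories": repositories,
            # Artifacts excluded from dependency resolution, same list as the cell above
            "spark.jars.excludes": ",".join([
                "org.scala-lang:scala-reflect",
                "org.apache.spark:spark-tags_2.12",
                "org.scalactic:scalactic_2.12",
                "org.scalatest:scalatest_2.12",
            ]),
            # Ask YARN to put user-supplied jars first on the classpath
            "spark.yarn.user.classpath.first": "true",
        },
    }
    # json.dumps guarantees balanced braces and correctly quoted strings
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Paste the printed text below a `%%configure -f` line in the first notebook cell
    print(build_configure_body(packages="", repositories=""))
```

Generating the body with `json.dumps` mainly guards against the missing-brace and stray-comma mistakes that make the magic reject the cell.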
  • Now run the imports and the sample training code
import numpy as np
import pandas as pd
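import os
import urllib.request

# Download the Adult Census Income sample dataset if it is not already present
dataFile = "AdultCensusIncome.csv"
if not os.path.isfile(dataFile):
    urllib.request.urlretrieve("" + dataFile, dataFile)

# Load the CSV into a Spark DataFrame; column names in this file keep a leading space
data = spark.createDataFrame(pd.read_csv(dataFile, dtype={" hours-per-week": np.float64}))
# Keep the feature columns used for training plus the label column
data = data.select([" education", " marital-status", " hours-per-week", " income"])
# 75/25 train/test split with a fixed seed for reproducibility
train, test = data.randomSplit([0.75, 0.25], seed=123)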
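# Train a logistic regression model through SynapseML's TrainClassifier wrapper
from synapse.ml.train import TrainClassifier
from pyspark.ml.classification import LogisticRegression
model = TrainClassifier(model=LogisticRegression(), labelCol=" income").fit(train)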
# Score the test set and show the accuracy
from synapse.ml.train import ComputeModelStatistics
prediction = model.transform(test)
metrics = ComputeModelStatistics().transform(prediction)
metrics.select('accuracy').show()
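`randomSplit([0.75, 0.25], seed=123)` sends roughly 75% of rows to `train` and 25% to `test`; Spark normalizes the weights if they do not sum to 1, and the fixed seed makes the split repeatable. The plain-Python sketch below illustrates that idea with cumulative weight boundaries and one seeded random draw per row. It is an illustration only: `random_split` is a hypothetical helper, and Spark samples per partition, so it will not reproduce Spark's row-by-row assignment.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], weights: Sequence[float], seed: int) -> List[List[T]]:
    """Assign each row to one split with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and sum to a positive value")
    # Normalize and accumulate, e.g. [0.75, 0.25] -> [0.75, 1.0]
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    rng = random.Random(seed)  # same seed -> same assignment on every run
    splits: List[List[T]] = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform in [0.0, 1.0)
        for i, bound in enumerate(bounds):
            if u < bound:
                splits[i].append(row)
                break
        else:
            splits[-1].append(row)  # guard against floating-point rounding below 1.0
    return splits


if __name__ == "__main__":
    train, test = random_split(list(range(1000)), [0.75, 0.25], seed=123)
    print(len(train), len(test))  # roughly a 75/25 split; exact counts depend on the seed
```

Because each row is drawn independently, the split sizes are only approximately 75/25, which is also why the training and test row counts from `randomSplit` vary slightly with the seed.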

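The `accuracy` value selected from the metrics DataFrame is the fraction of test rows whose predicted label equals the true label. A minimal plain-Python sketch of that calculation, with hypothetical helper names and made-up labels, also tallies the binary confusion counts it can be derived from, accuracy = (TP + TN) / total. It illustrates the metric, not how `ComputeModelStatistics` is implemented.

```python
from collections import Counter
from typing import Sequence


def confusion_counts(labels: Sequence, predictions: Sequence, positive) -> Counter:
    """Tally true/false positives and negatives for a binary classifier."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for y, p in zip(labels, predictions):
        if p == positive:
            counts["tp" if y == positive else "fp"] += 1
        else:
            counts["fn" if y == positive else "tn"] += 1
    return counts


def accuracy(labels: Sequence, predictions: Sequence) -> float:
    """Fraction of rows where the prediction matches the label."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


if __name__ == "__main__":
    labels = [0, 1, 0, 0]       # made-up ground truth
    predictions = [0, 0, 0, 1]  # made-up model output
    print(accuracy(labels, predictions))  # 0.5
    print(confusion_counts(labels, predictions, positive=1))
```

On an imbalanced label such as income brackets, accuracy alone can look high while the minority class is mostly missed, so the confusion counts are worth checking alongside it.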
