Using Dataflow and Synapse Spark to analyze data and build Spark ML models

Using TPCH data

Prerequisites
Azure subscription
Azure Storage Account
Azure Synapse Analytics

Load TPCH data
I had to limit the line item data because the full set was 150 billion rows.

Dataset rows
Orders: 15,000,000,000
Customers: 1,500,000,000
Lineitems: 279,286,998

Goal
Use Dataflow to analyze the data
Use a join and create year, month and day columns
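
The join and date-part step in the goal can be sketched in plain Python to show the intended logic. This is a minimal sketch, not the Dataflow or Spark implementation: it assumes the line items are inner-joined to orders on the order key and that the year, month and day columns come from the order date. The column names (`o_orderkey`, `o_orderdate`, `l_orderkey`, and so on) follow the standard TPC-H schema, and the sample rows and the `join_with_date_parts` helper are hypothetical.

```python
from datetime import date

# Hypothetical sample rows; column names follow the TPC-H schema.
orders = [
    {"o_orderkey": 1, "o_custkey": 10, "o_orderdate": date(1996, 1, 2)},
    {"o_orderkey": 2, "o_custkey": 11, "o_orderdate": date(1997, 12, 31)},
]
lineitems = [
    {"l_orderkey": 1, "l_linenumber": 1, "l_extendedprice": 100.0},
    {"l_orderkey": 1, "l_linenumber": 2, "l_extendedprice": 50.0},
    {"l_orderkey": 3, "l_linenumber": 1, "l_extendedprice": 75.0},  # no matching order
]


def join_with_date_parts(orders, lineitems):
    """Inner-join line items to orders and add year, month and day columns.

    Assumption: the join key is the order key and the date parts are
    derived from o_orderdate.
    """
    orders_by_key = {o["o_orderkey"]: o for o in orders}
    joined = []
    for item in lineitems:
        order = orders_by_key.get(item["l_orderkey"])
        if order is None:
            continue  # inner join: drop line items without a matching order
        d = order["o_orderdate"]
        joined.append(
            {**item, **order, "year": d.year, "month": d.month, "day": d.day}
        )
    return joined


rows = join_with_date_parts(orders, lineitems)
for row in rows:
    print(row["l_orderkey"], row["l_linenumber"], row["year"], row["month"], row["day"])
```

At the row counts listed above, the same logic would run as a join transformation followed by a derived-column transformation rather than in a Python loop; the sketch only pins down which columns are combined and how the date is split.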