Introducing the R language, statistics, data mining, and machine learning with R, and using R in SQL Server and the Microsoft BI stack
Author and Instructor: Dejan Sarka
Dejan Sarka, MCT and SQL Server MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. Besides projects, he spends about half of his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group. He is the main author or coauthor of eighteen books about databases, SQL Server, and data science. He has also developed many courses and seminars for Microsoft, SolidQ, and Pluralsight.
R is the most popular environment and language for statistical analyses, data mining, and machine learning. A managed and scalable version of R runs in SQL Server and Azure ML.
As an open source project, R is the most popular analytical engine and programming language for data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open source development also means less control over the code. Finally, the free R engine is not scalable.
Microsoft added support for R code in SQL Server 2016 and continues to support it in later versions. A parallelized, highly scalable execution engine is used to execute the R scripts. In addition, not every library is allowed in SQL Server and Azure ML.
Attendees of this course learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. The lifecycle of a data science project is explained in detail. The attendees learn how to perform a data overview and how to do the most tedious task in a project, data preparation. After the data overview and preparation, the analytical part begins with intermediate statistics for analyzing associations between pairs of variables. Then the course introduces more advanced methods for researching linear dependencies.
Too many variables in a model can become a problem of their own. The course shows how to do feature selection, starting with the basics of matrix calculations. Then the course switches to more advanced data mining and machine learning analyses, including supervised and unsupervised learning. The course also introduces currently popular topics, including forecasting, text mining, and reinforcement learning. Finally, the attendees learn how to use R code in SQL Server, Azure ML, and Power BI.
Attendees should have a basic understanding of data analysis and basic familiarity with SQL Server tools.