### Course: Advanced Analytics with Transact-SQL (SQL-AA)

#### Description

If you want to learn how to get information from your data with Transact-SQL, or shortly T-SQL language, then this course is the course for you. It will teach you how to calculate statistical measures from descriptive statistics including centers, spreads, skewness and kurtosis of a distribution, find the associations between pairs of variables, including calculating the linear regression formula, calculate the confidence level with definite integration, find the amount of information in your variables, and do also some machine learning or data science analysis, including predictive modeling and text mining.

T-SQL language in latest editions of SQL Server, Azure SQL Database, and Azure Synapse Analytics, has so many business intelligence (BI) improvements that it might become your primary analytic database system. Many database developers and administrators are already proficient with T-SQL. Occasionally they need to analyze the data with statistical or data science methods, but they do not want to or have time to learn a completely new language for these tasks. In addition, they need to analyze huge amounts of data, where specialized languages like R and Python might not be fast enough. SQL Server has been optimized for work with big datasets for decades.

In order to get the maximum out of these language constructs, you need to learn how to properly use them. This in-depth course shows extremely efficient statistical queries that use the window functions and are optimized through algorithms that use mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained as well.

Any serious analysis starts with data preparation. The course introduces some common data preparation tasks and shows how to implement them in T-SQL. No analysis is good without good data quality. The course introduces data quality issues, and shows how you can check for completeness and accuracy with T-SQL, and how to measure improvements of data quality over time. And since the talk is already about the time, the course shows how you can optimize queries with temporal data, for example when you search for overlapping intervals. More advanced time-oriented information includes hazard and survival analysis.

Then the course switches to the currently most fashionable topic, the data science. Some of quite advanced algorithms can also be implemented in T-SQL. The reader learns about the market basket analysis with association rules using different measures like support and confidence, and even sequential market basket analysis, when there is a sequence in the basket. Then the course shows how to develop predictive models with a mixture of k-nearest neighbor and decision trees algorithms and with Bayesian inference analysis.

Analyzing text, or text mining, is another modern topic. However, many people do not realize that you can do really a lot of text mining also in pure T-SQL. SQL Server can also become a text mining engine. The course shows how to analyze text in multiple natural languages with pure T-SQL, using also features from the full-text search (FTS).

In short, this course teaches you how to use T-SQL for:

- Statistical analysis
- Data science methods
- Text mining.

#### What Attendees Will Learn?

- Describe the distribution of a variable with statistical measures.
- Find associations between pairs of variables.
- Evaluate the quality of the data.
- Analyze data over time.
- Do the market basket analysis.
- Predict outcome of a target variable by using few input variables.
- Extract the key words from text data in order to categorize the texts.

#### Audience

Advanced Analytics with Transact-SQL is a course for database developers and database administrators who want to take their T-SQL programming skills to the max. It is for the attendees who want to analyze huge amounts of data in an efficient way by using their existing knowledge of the T-SQL language. It is also for the attendees who want to improve their querying by teaching them new and original optimization techniques.

#### Modules:

- Descriptive Statistics
- Associations between Pairs of Variables
- Data Preparation
- Data Quality
- Time-Oriented Data
- Time-Oriented Analyses
- Data Mining
- Text Mining

### EmbRace R (E-R)

#### Delivery format

Instructor-led seminar in class with discussions. There are no guided labs; however, the attendees get all of the code and access to the virtual machines, so they can test the code in the evenings.

#### Short Description

R is the most popular environment and language for statistical analyses, data mining, and machine learning. Managed and scalable version of R runs in SQL Server and Azure ML. Learn how to program in R to squeeze the information from your data. This course is both, an independent seminar, and a complement to the Python for SQL Server Specialists seminar, meaning that the two courses overlap only partially; in each course, some different algorithms and techniques are introduced.

#### Target Audience

The target audience is everybody that wants to start analyzing data with R. Database developers that deal with SQL Server and code in T-SQL and want to move more to advanced analytics can get the most of this course.

#### Acquired Skills

After completing the course, the delegates are able to start analyzing their data with statistical and machine learning methods immediately. In addition, they can also prepare the data accordingly for the target analysis, and deploy the solution in SQL Server. Besides practical skills, the delegates also learn the basics of the mathematics behind the algorithms.

#### Abstract

As being an open-source development, R is the most popular analytical engine and programming language for data scientists worldwide. The number of libraries with new analytical functions is enormous and continuously growing. However, there are also some drawbacks. R is a programming language, so you have to learn it to use it. Open-source development also means less control over code. Finally, the free R engine is not scalable.

Microsoft added support for R code in SQL Server 2016 and, Azure Machine Learning, or Azure ML, and in Power BI. A parallelized highly scalable execution engine is used to execute the R scripts. In addition, not every library is allowed in these two environments.

Attendees of this seminar learn to program with R from the scratch. Basic R code is introduced using the free R engine and RStudio IDE. Then the seminar shows some more advanced data manipulations, matrix calculations and statistical analysis together with graphing options. The mathematics behind is briefly explained as well. Then the seminar switches more advanced data mining and machine learning analyses. Attendees also learn how to use the R code in SQL Server.

#### Modules:

- Introduction to R
- Data overview and manipulation
- Basic and advanced visualizations
- Data mining and machine learning methods
- Scalable R in SQL Server

### Python for Data Analysis (P-DA)

#### Delivery format

Instructor-led seminar in class with discussions. There are no guided labs; however, the attendees get all of the code and access to the virtual machines, so they can test the code in the evenings.

#### Short Description

Although R is the most popular environment and language for statistical analyses, data mining, and machine learning, Python as a more general language might be even more popular. Lately, Python is more widely used for data science as well. SQL Server 2017 adds support for running Python code inside the Database Engine. This course is both, an independent seminar, and a complement to the EmbRaceR seminar, meaning that the two courses overlap only partially; in each course, some different algorithms and techniques are introduced.

#### Target Audience

The target audience is everybody that wants to start developing with Python and use the language for machine learning. However, the course is focused on data science, not general development. Database developers that deal with SQL Server and code in T-SQL and want to move more to advanced analytics can get the most of this course.

#### Acquired Skills

After completing the course, the delegates are able to start analyzing their data with statistical and machine learning methods immediately. In addition, they can also prepare the data accordingly for the target analysis, and deploy the solution in SQL Server. Besides practical skills, the delegates also learn the basics of the mathematics behind the algorithms.

#### Abstract

Python is more organized language than R. In last years, many data analytics libraries for Python evolved, and thus Python is catching up with R even in the data science area.

Microsoft added support for Python code in SQL Server in version 2017. Now you can use either R or Python inside the Database Engine for advanced tasks like predictive analytics. Therefore, you can use the language that suits you better. Statisticians and mathematicians might prefer R, while developers tend to be more Python oriented. Python has also become overwhelming analytical language in the Azure cloud.

Attendees of this seminar learn to program with Python from the scratch. Basic Python code is introduced using the Python engine installed with SQL Server and Visual Studio. The seminar shows some more advanced data manipulations, matrix calculations and statistical analysis together with graphing options. The mathematics behind is briefly explained as well. Then the seminar switches to more advanced data mining and machine learning analyses. Finally, the seminar introduces how you can use Python in SQL Server and in Azure.

#### Modules:

- Introduction to Python
- Data overview and manipulation
- Basic and advanced visualizations
- Data mining and machine learning methods
- Scalable Python in SQL Server, Power BI, and Azure ML