The workshop runs four hours a day on two consecutive days, from 8:00 to 12:00.
Don't just use statistics, data mining, and machine learning without understanding how they work. Gain insight into the most popular algorithms.
Advanced data analysis techniques are gaining popularity. With modern statistics, data mining, and machine learning engines, products, and packages, such as SQL Server Analysis Services (SSAS), R, Python, and Azure ML, data mining has become a black box. It is possible to use data mining without knowing how it works. However, not knowing how the algorithms work can lead to many problems, including choosing the wrong algorithm for a task and misinterpreting the results. This seminar explains how the most popular data mining algorithms work, when to use which algorithm, and the advantages and drawbacks of each. Demonstrations show how to use the algorithms in SQL Server Analysis Services, in the R and Python languages, in SQL Server Machine Learning Services, with the native Azure ML algorithms, with R in Power BI, and with R and Python algorithms in Azure ML. Attendees also learn how to evaluate different predictive and unsupervised models.
Algorithms explained include Naïve Bayes, Decision Trees, Random Forests, Gradient Boosted Trees, Neural Networks, Logistic Regression, Perceptron Model, Linear Regression, Regression Trees, Ordinal Regression, Poisson Regression, Principal Component Analysis, K-Nearest Neighbors, Support Vector Machines, Hierarchical Clustering, K-Means Clustering, Gaussian Mixture Models Clustering, Association Rules, Sequence Clustering, Auto-Regressive Trees with Cross-Prediction (ARTXP), Auto-Regressive Integrated Moving Average (ARIMA), and Time Series.
The seminar also explains introductory statistics, including descriptive statistics, correlations, and linear associations. Information theory is touched on briefly as well. All of these methods are useful for understanding the data before later analysis and for advanced data profiling. Mining unstructured data, specifically text, is covered in the course too.
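As a rough illustration of the kind of data overview mentioned above, the following Python sketch computes a mean, a standard deviation, a Pearson correlation, and the Shannon entropy of a discrete variable. It is not taken from the seminar material; the helper names `pearson` and `entropy` and the sample values are assumptions made up for this example.

```python
import math
from collections import Counter
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def entropy(values):
    """Shannon entropy, in bits, of a discrete variable."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sample data, invented for illustration only.
age = [23, 31, 38, 45, 52, 60]
income = [21, 30, 41, 44, 55, 58]
segment = ["a", "a", "b", "b", "c", "c"]

summary = {"mean": mean(age), "stdev": stdev(age)}  # descriptive statistics
r = pearson(age, income)   # strength of the linear association
h = entropy(segment)       # how evenly the categories are spread

print(summary, round(r, 3), round(h, 3))
```

A correlation close to 1 or -1 points to a strong linear association, and an entropy close to the maximum for the number of categories means the values are spread evenly, which is typically more informative for analysis than a variable dominated by one value.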
The training focuses on the theoretical concepts of advanced analytics. The first goal is for attendees to fully understand how the algorithms work, how to use them correctly, how to prepare the data, and how to interpret the results. The software is used only to demonstrate the concepts and enrich them with examples. This helps a lot in understanding how to work with data: how to prepare useful derived variables, smooth the values of a variable appropriately, discretize them correctly, and so on. Attendees can and should be able to use different tools in the future.
Attendees should have a basic understanding of data analysis and relational data models. Knowledge of statistics and mathematics is highly desirable for getting the most out of this training.
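To make the remark about discretizing a variable correctly more concrete, here is a minimal Python sketch that compares two common strategies, equal-width and equal-frequency binning. It is an illustration under stated assumptions rather than the seminar's own code: the function names and the sample values are hypothetical.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values.

    Simplification: tied values may end up in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical skewed variable with one outlier, invented for illustration.
income = [18, 20, 22, 25, 27, 30, 35, 120]

print(equal_width_bins(income, 3))      # [0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(income, 3))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

With a skewed variable, a single outlier pushes almost every value into the first equal-width bin and leaves the middle bin empty, while equal-frequency binning keeps the bins balanced. Which behavior is correct depends on the algorithm and on the meaning of the variable.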
Every attendee gets a .PDF printout of all slides and all of the code shown in the seminar.
Author and Instructor
Dejan Sarka, MCT and Data Platform MVP, is an independent trainer and consultant who focuses on the development of database and business intelligence applications. Besides projects, he spends about half of his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group. Dejan Sarka is the main author or coauthor of seventeen books about databases and SQL Server. He has also developed many courses and seminars for Microsoft, SolidQ, and Pluralsight.
- Introduction to data mining and / or machine learning
- Data preparation and overview
- Classification, prediction and estimation algorithms
- Forecasting, unsupervised algorithms, and text mining
- 08:00 - 12:00