R and Data Manipulation: Big Data Application

 

Venue

Adya Hotel, Langkawi Island, Malaysia

Date

5 – 6 November 2017

Time

08:30 to 17:00

 

Organised by:

The 3rd International Conference on Computing, Mathematics and Statistics (iCMS2017) UiTM Kedah

 

Trainer and Facilitator:  Matthias Templ, PhD

Dr Matthias Templ is the owner of data analysis company OG. He obtained his PhD in Technical Mathematics from Vienna University of Technology, where he is an associate professor. Dr Templ currently lectures at the Institute of Data Analysis and Process Design, Zurich University of Applied Sciences, Switzerland. He is also a consultant to Palacký University Olomouc, Czech Republic, and a researcher at the Quality and Methods Unit at Statistics Austria. His other consulting work includes projects for the World Bank and the OECD.

His main research interests are imputation, statistical disclosure control, visualization, compositional data analysis, computational statistics, the teaching of statistics and cluster analysis. He has participated in many European research projects, including the AMELI project for adequate estimation of poverty and social exclusion. Dr Templ has more than 280 scientific contributions in books, scientific journals and conferences as well as more than 40 papers in well-known indexed scientific journals.

He is the author and maintainer of several R packages for official statistics, such as the sdcMicro package for statistical disclosure control, the VIM package for visualization and imputation of missing values, the simPop package for synthetic population simulation, and the robCompositions package for robust analysis of compositional data. He is also a co-author of many other packages. In addition, Dr Templ is the editor-in-chief of the Austrian Journal of Statistics; associate editor of Springer's journal Wirtschafts- und Sozialstatistisches Archiv; editor of the journal The Register; and a member of the editorial boards of three other journals: JWARP, Transactions on Data Privacy, and Frontiers in Biomedical Physics.

 

WHAT IS R? 

R provides a powerful and comprehensive platform for visualizing data, understanding and evaluating statistical models, and effectively communicating research results to both technical and nontechnical audiences. R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. R is ready to use for applications in data science, data analytics and big data sets. After the two-day workshop, participants are expected to be able to write efficient R code and use R packages to perform data analytics, including work with big data sets, using packages such as dplyr, Rcpp and ggplot2.

 

WHO SHOULD ATTEND? 

R is widely used in academia, especially in the fields of computational biology, applied science, quantitative finance and business intelligence. R is capable of solving challenging problems; among its strengths are its powerful built-in tools for inferential statistics, its compact modelling syntax, and its data visualization capabilities. In addition, R's open source nature and its extensibility via add-on "packages" have allowed it to keep up with the leading edge in academic research. This workshop on R and Data Manipulation is suitable and relevant for lecturers, researchers, engineers, students, industry professionals and scientists of any discipline who wish to explore R and big data. Since this is a continuation of our first R pre-conference workshop in 2015, prior experience with R is required. The instructor will not cover the very basics of R but will instead give an introductory course on efficient methods in R. If you have no basic knowledge of R, please prepare yourself by:

1.    doing the examples on http://tryr.codeschool.com/ and/or in the swirl R package beforehand;

2.    reading http://adv-r.had.co.nz/ up to and including the chapter on functions.

 

COURSE OUTLINE

  1. Indexing, vectorized computations and data manipulation with base R. The aim is to use efficient R code and functions.
  2. Efficient and fast data manipulation with dplyr (and optionally data.table) to speed up your data manipulation and to use a consistent approach throughout. Seven useful functions when working with dplyr and the use of pipeline syntax. Application of these functions to example data.
  3. How to make your computations faster. A very basic introduction to profiling, Rcpp and parallel computing. This part is lecture-only, without hands-on exercises for participants.
  4. Some comments and issues when working with big data sets. We touch on the issues of keeping data in a database and outline some approaches for dealing with big data sets. Open discussion.
  5. Data visualization using ggplot2 to produce nice-looking graphics with less code. The main aim is to understand the concepts behind ggplot2's implementation of the grammar of graphics (aesthetics, geometric objects, statistical objects, scales, themes). The differences from the base graphics package are shown, and participants will be trained to use ggplot2 on example data sets.

 

WORKSHOP SCHEDULE

 

DAY 1: SUNDAY (5th Nov 2017)

DAY 2: MONDAY (6th Nov 2017)

8.15 – 8.45

Day 1: Registration

Day 2: Registration

8.45 – 10.45

Day 1: Brief R Introduction.

Day 2: How to make your computations faster. A very basic introduction to profiling, Rcpp, and parallel computing.

10.45 – 11.00

COFFEE BREAK AND DISCUSSION

11.00 – 13.00

Day 1: Indexing, vectorized computations and data manipulation with base R. The aim is to use efficient R code and functions.

Day 2: Some comments and issues when working with big data sets. We touch on the issues of keeping data in a database and outline some approaches for dealing with big data sets.

13.00 – 14.00

LUNCH AND DISCUSSION

14.00 – 17.00

Day 1: Efficient and fast data manipulation with dplyr (and optionally data.table) to speed up your data manipulation and to use a consistent approach throughout. Seven useful functions when working with dplyr and the use of pipeline syntax. Application of these functions to example data.

Day 2: Data visualization using ggplot2 to produce nice-looking graphics with less code. The main aim is to understand the concepts behind ggplot2's implementation of the grammar of graphics (aesthetics, geometric objects, statistical objects, scales, themes). The differences from the base graphics package are shown, and participants will be trained to use ggplot2 on example data sets.

17.00

TEA BREAK

 

If you have any queries on R, please email: (email address not shown)

For other queries, please use the official email address.

To register, please go to the registration page.