Big Data Methods with R
The “Big Data Methods with R” training course is an excellent choice for organisations that want to build on their existing R skills and extend them to R’s connectivity with a wide variety of Big Data tools, storage solutions (e.g. SQL/NoSQL databases) and processing engines (Hadoop, Spark, h2o etc.).
During this training course, your attendees will gain essential know-how on using the R language to manage, manipulate and analyse out-of-memory data (data too large to fit in RAM) and datasets stored in distributed file systems or large databases, and on writing fast, parallel R code that allows algorithms and data processing to scale. The course also serves as a good introduction to Cloud Computing (Amazon Web Services and Microsoft Azure) and the growing ecosystem of tools that support Big Data analytics, including methods for large-scale statistical and machine learning (h2o, Spark).
This course can be combined with our “Machine Learning with R” and/or “Deep Learning with R” courses to form a powerful training programme that covers the most exciting topics and cutting-edge technologies within the data science community.
Basic course information
Minimum recommended duration: 4-5 full days or 8-10 half-days (can be spread across multiple weeks)
Programming languages used: R (plus basics of SQL and NoSQL e.g. MongoDB queries)
Minimum number of attendees: 5
Course level: For pre-intermediate/intermediate users of R.
Pre-requisites: Pre-intermediate/intermediate skills in data management, processing and analytics in the R language are recommended for delegates attending this course. An understanding of basic concepts in statistics, Big Data analytics, data storage architectures and Big Data tools would be beneficial. We advise that delegates take our “Applied Data Science with R” course before this one.
IT recommendations: This course is implemented using a Mind Project computing cluster and the only requirement is that the attendees have access to a PC/laptop with a stable broadband/Internet connection and a standard web browser installed e.g. Chrome, Mozilla Firefox, Internet Explorer, Opera, Safari. Please contact us should you wish to use a different setup for your course.
Programme outline
The programme for each in-house training course is discussed and agreed individually with the client. The proposed contents of the course may include (but are not limited to) the following concepts and topics:
Built-in core R methods and third-party R packages that support parallel computing to boost the speed and data processing capabilities of the R language,
Selected approaches to expand memory capabilities through a range of R packages e.g. ff, ffbase, bigmemory, bigtabulate, biganalytics, biglm etc.,
Speeding up data wrangling, exploratory and data analytics tasks with tidyverse (dplyr) and data.table packages,
Working with large data sets in the Cloud (Microsoft Azure and Amazon Web Services) through R deployed on the server – accessing data from Amazon S3 buckets, web-scraping methods,
Connecting to and extracting, aggregating and managing data with leading relational SQL-based database management systems (RDBMSs) using a variety of R packages (including dplyr, RMySQL, DBI, ROracle etc.),
Applying NoSQL queries to access, transform and manipulate large data sets in MongoDB through rmongodb, RMongo and mongolite packages,
Managing the Hadoop Distributed File System (HDFS), HBase and Hive databases, and implementing the MapReduce framework for scalable out-of-memory data processing and analysis straight from the R console,
Improving the data flow and speed of processing of Big Data through R’s connectivity with the Apache Spark engine (SparkR and sparklyr packages),
Applying scalable approaches to machine learning and predictive analytics using R with Spark, Hive and h2o,
Implementing selected Big Data tools in the Big Data Product Cycle with R.
Customise the course
We can adapt our in-house training courses to address your specific needs and requirements e.g.:
The course can be designed to include your own data. If this is not possible, e.g. due to data security issues, we can customise the course to contain exercises that address similar problems,
The course period can be spread across multiple weeks/months depending on your needs and availability – this will allow your delegates to revise and practise their new skills before the next session and give them additional time to internalise the material presented,
The course can include a custom project spread across several weeks/months with a follow-up session at the end of the period,
As all our in-house training courses are quoted individually, the final cost quotation will be based on several factors: the number of attendees, days of training (plus additional support/project guidance if needed), location of the training, complexity of IT setup and the extent of course customisation.
Arrange this course at your organisation
If you are interested in this in-house training course, please press the Ask For Quote button at the top of the page to request a quote based on your specific needs and the desired outcomes of the training.
In your enquiry please include the following information:
contact details of the person who should receive the quote,
number of delegates you would like to train,
approximate number of days (or half-days) you would like to arrange the course for (including additional support/project guidance if needed),
location of the training venue,
any details on course customisation or specific topics you would like the course to address – most importantly, please indicate the desired outcomes of the course if they differ from those presented above,
any other questions you may have.