Big Data Training

Opportunity: Social media, cloud computing, and mobile applications are passé. The next big opportunity is Big Data.

Challenge: Finding professionals who have expertise in Big Data.

Solution: We provide instructor-led classroom training. Our Big Data course can help you develop a talent pool in Big Data.

Duration: The course can be customised to your needs and delivered in 1 to 2 weeks (40 to 80 hours), at 6 to 8 hours per day.

Big Data Training Program

Big data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.
The challenges include capturing, storing, searching, sharing, transferring, analysing and visualising the data. Big data usually includes data sets too large for commonly used software tools to capture, manage, and process within a tolerable elapsed time. What counts as "big" is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many petabytes of data in a single data set.
This program teaches learners how to build distributed, data-intensive applications using the Hadoop framework. A key part of the course is having learners deliver a Big Data project that demonstrates they know how to work with the tools.

Objectives

  • Understand Hadoop architecture (HDFS and MapReduce)
  • Capture and store data using Sqoop, Flume, NoSQL databases, and HBase
  • Use Big Data tools (Pig, Hive, HBase, etc.)

Course Content

  • Foundation: What is Data?, Introduction to Big Data, Characteristics of Big Data, Introduction to Hadoop Ecosystem
  • Refresher: Java Refresher, SQL Refresher, Unix Basics and Commands
  • Getting Started: Getting Started with Hadoop, Installing & Configuring Hadoop Components
  • Data Capture & Storage: Sqoop, Flume, NoSQL Databases, HBase
  • Data Processing: Hadoop MapReduce Framework, Programming in MapReduce
  • Data Analytics: Understanding Analytics, Pig, Pig Latin, Hive, HiveQL
  • Management and Best Practices: Installing & Configuring Zookeeper, Managing HDFS, Routine Administration Procedures
  • Project: Real World Dataset and Analysis, Hadoop Project Environment, Project Discussion
  • Labs: Throughout the course, learners will perform hands-on exercises to practice the concepts taught
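
The Data Processing module covers programming in MapReduce. As a rough flavour of that model, the sketch below runs the three stages (map, shuffle, reduce) of a word count in a single process, in plain Python. It does not use Hadoop or its API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count`, and the sample lines, are illustrative assumptions rather than course material.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel across many machines and the framework would perform the shuffle; here everything runs locally so the data flow is easy to follow.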

Pre-requisites for the Program

  • Programming fundamentals: OOP concepts and the basics of a scripting language (such as Perl or Ruby)
  • Basics of Linux/Unix operating systems
  • Good understanding of the Java programming language (Core Java)
  • Understanding of basic SQL statements

Recommended Readings

  • Hadoop: The Definitive Guide, by Tom White