Hadoop Training Class will start on 10th March at 08:30 - TopicsExpress



          

Hadoop Training Class will start on 10th March at 08:30 A.M. (IST) on weekdays. Interested candidates can contact +91-9845631030 (Prasad). Mail: [email protected]

Course Outline

The Motivation for Hadoop and NoSQL
• Big Data
• Problems with traditional large-scale systems
• Requirements for a new approach

Introduction
• An Overview of Hadoop
• Comparing with SQL Databases
• The Hadoop Distributed File System
• MapReduce Programming Model
• Hadoop Common Utilities
• Hadoop Ecosystem Components
• Hadoop Architecture

Building Blocks of Hadoop
• NameNode (NN)
• DataNode (DN)
• JobTracker (JT)
• TaskTracker (TT)
• Secondary NameNode (SNN)

Hadoop Cluster Setup
• Configuration details
• Local mode
• Pseudo-distributed mode
• Distributed mode

Components of Hadoop
• Hadoop Distributed File System
• MapReduce Programming Model
• Hadoop Common Utilities

The Hadoop Distributed File System (HDFS)
• HDFS Design & Concepts
• Blocks, Replication
• Hadoop dfs and dfsadmin Command-Line Interfaces
• Basic File System Operations
• Reading Data with the HDFS Java Client API
• Distributed Cache
• DistCp: Parallel Data Loading into HDFS

MapReduce Program
• Building Blocks of MapReduce
• The MapReduce Program Flow (MR Skeleton)
• Sample MapReduce Program
• MapReduce API Concepts
• The Mapper
• The Reducer
• The Combiner
• The Partitioner
• The Shuffle
• Hadoop Data Types
• Hadoop Serialization
• Hadoop Streaming API (Any Programming Language)
• Integrating Hadoop with the R Language
• Some MapReduce Program Examples

Common MapReduce Algorithms
• Sorting and Searching
• Indexing
• Crawling
• Log Processing
• Machine Learning
• Data Aggregation
• TF-IDF
• Word Co-Occurrence
• Predictive Analytics

Advanced MapReduce Programming
• Custom Writables and WritableComparables
• Saving Binary Data Using SequenceFiles and Avro Files
• Creating InputFormats and OutputFormats
• Database Input and Output Formats
• Chaining MapReduce Jobs
• Joining Data from Different Sources
• Bloom Filter Concept
Programming Practices
• Developing MapReduce Programs
• Monitoring the Cluster
• Performance Tuning
• Sending Job-Specific Parameters
• Partitioning into Multiple Output Files
• Using the Distributed Cache

Job Scheduling and Monitoring
• Job Submission
• Schedulers (FIFO, Fair and Capacity)
• Web UI
• Adding Third-Party Libraries
• Configuration Tuning

Managing a Hadoop Cluster
• Setting up configuration parameters
• Checking cluster health
• Setting permissions
• Adding nodes
• Removing nodes
• Managing the NameNode and Secondary NameNode
• Recovery

Hadoop Ecosystem
• Pig
• Hive
• HBase
• Sqoop
• ZooKeeper
• Cassandra
• Mahout
• Flume
• Cloudata
• Stratosphere
• Accumulo
• Kafka
• Ambari
• HCatalog
• Oozie
• DataFu
• Many others

We will provide a large number of MapReduce programs and Pig and Hive scripts for unstructured data processing, ETL work, semi-structured data processing, and relational database data processing. We will clearly explain data integration services such as Flume and Sqoop, and show how to move data into and out of Hadoop using these services as well as distributed copying with DistCp.

We will explain how organizations across verticals are adopting Hadoop and the Hadoop ecosystem in their business, and discuss which problems are suited to Hadoop and which are not. We will present clear use cases of Hadoop in ETL, text mining, natural language processing, analytics, and information retrieval across verticals such as telecom, BFSI, retail, e-commerce, digital media, search engines, data warehousing, pharma, oil & gas, and health care.

We will cover the complete plan, design, data ingestion, processing, and reporting of a Web/Log Analytics project.

Duration: 60 hours
Contact: +91-9845631030
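The MapReduce program flow in the outline (mapper, combiner, partitioner, shuffle, reducer) can be illustrated with a word count. The sketch below is a single-process simulation in plain Java under stated assumptions, not the Hadoop API: the class and method names (WordCountSketch, map, combine, partition, reduce, run) are hypothetical, each input line stands in for one map task, and a TreeMap stands in for the sort-by-key that Hadoop's shuffle performs. The partition function mirrors the formula of Hadoop's default HashPartitioner, but it is applied to a Java String rather than a Hadoop Text key, so the partition numbers would differ on a real cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of the MapReduce flow (not the Hadoop API).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Combiner: pre-sum counts on the map side to shrink the data sent through the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            local.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return local;
    }

    // Partitioner: same formula as Hadoop's HashPartitioner, applied here to a String key.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all partial counts received for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drives map -> combine -> partition -> shuffle/sort -> reduce.
    public static Map<String, Integer> run(List<String> lines, int numReducers) {
        // One key-sorted "inbox" per reducer; TreeMap models the shuffle's sort by key.
        List<TreeMap<String, List<Integer>>> inboxes = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            inboxes.add(new TreeMap<>());
        }

        for (String line : lines) { // each line plays the role of one map task
            Map<String, Integer> combined = combine(map(line));
            for (Map.Entry<String, Integer> e : combined.entrySet()) {
                int r = partition(e.getKey(), numReducers);
                inboxes.get(r).computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }

        // Each reducer processes its own keys; results are gathered into one sorted map.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> inbox : inboxes) {
            for (Map.Entry<String, List<Integer>> e : inbox.entrySet()) {
                result.put(e.getKey(), reduce(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "data beats ideas", "big wins");
        System.out.println(run(input, 2)); // prints {beats=1, big=3, data=2, ideas=2, wins=1}
    }
}
```

Because the combiner here performs the same summation as the reducer, it only cuts the volume of shuffled data; the final counts are identical with or without it, which is why Hadoop treats a combiner as an optional optimization that may run zero or more times.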
Posted on: Tue, 04 Mar 2014 06:22:19 +0000
