Apache Storm Interview Questions and Answers

Apache Storm, in simple terms, is a distributed framework for real-time processing of massive data, much as Apache Hadoop is a distributed framework for batch processing. Apache Storm works on the task-parallelism principle, where the same code is executed on multiple nodes with different input data.

Apache Storm adds reliable real-time data processing capabilities to Enterprise Hadoop. Storm on YARN is powerful for scenarios requiring real-time analytics, machine learning, and continuous monitoring of operations. Storm was originally created by Nathan Marz and his team at BackType, and the project was open sourced after BackType was acquired by Twitter. It uses custom-created "spouts" and "bolts" to define information sources and manipulations, allowing batch, distributed processing of streaming data. The initial release was on 17 September 2011.
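To make the spout/bolt idea concrete, here is a minimal sketch of a bolt using Storm's Java API. The input field name "sentence" and the class name SplitSentenceBolt are illustrative assumptions, not part of any particular application; a real bolt would match the fields declared by its upstream spout.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Sketch of a bolt: splits each incoming sentence tuple into one tuple per word.
public class SplitSentenceBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // "sentence" is an assumed field name emitted by the upstream spout.
        String sentence = tuple.getStringByField("sentence");
        for (String word : sentence.split("\\s+")) {
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Downstream bolts will see tuples with a single "word" field.
        declarer.declare(new Fields("word"));
    }
}
```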

A Storm application is designed as a "topology" in the form of a directed acyclic graph (DAG), with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a high level the topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end.
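The following sketch shows how such a DAG is wired together with Storm's TopologyBuilder: spouts and bolts are the vertices, and the grouping calls define the stream edges between them. SentenceSpout and WordCountBolt are hypothetical classes assumed for illustration (alongside the SplitSentenceBolt above); parallelism hints and the run duration are arbitrary.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Vertices of the DAG: one spout (data source) and two bolts (transformations).
        builder.setSpout("sentences", new SentenceSpout(), 2);       // hypothetical spout
        builder.setBolt("split", new SplitSentenceBolt(), 4)
               .shuffleGrouping("sentences");                        // edge: sentences -> split
        builder.setBolt("count", new WordCountBolt(), 4)             // hypothetical bolt
               .fieldsGrouping("split", new Fields("word"));         // edge: split -> count

        // Run in-process for testing; a production deployment would use StormSubmitter instead.
        Config conf = new Config();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(30_000);   // the topology runs until explicitly stopped
        cluster.shutdown();
    }
}
```

Note how the groupings express the edges: shuffleGrouping distributes tuples randomly across the receiving bolt's tasks, while fieldsGrouping routes all tuples with the same "word" value to the same task so counts stay consistent.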
