Abinitio interview questions - Latest interview questions about Ab Initio with their answers. PDF version also available with list of questions.
What is Abinitio?
“Abinitio” is a Latin term meaning “from the beginning.” Abinitio is a tool used to extract, transform, and load data. It is also used for data analysis, data manipulation, batch processing, and graphical user interface based parallel processing.

What is the architecture of Abinitio?
Architecture of Abinitio includes
  1. GDE (Graphical Development Environment)
  2. Co-operating System
  3. Enterprise meta-environment (EME)
  4. Conduct-IT

What is the role of Co-operating system in Abinitio?
The Abinitio co-operating system provides features like
  1. Managing and running Abinitio graphs and controlling the ETL processes
  2. Providing Abinitio extensions to the operating system
  3. Monitoring and debugging ETL processes
  4. Meta-data management and interaction with the EME

What does dependency analysis mean in Abinitio?
In Abinitio, dependency analysis is a process through which the EME examines an entire project and traces how data is transferred and transformed, from component to component and field by field, within and between graphs.

How is the Abinitio EME segregated?
The Abinitio EME is logically divided into two segments
  1. Data Integration Portion
  2. User Interface (Access to the meta-data information)

How can you connect EME to Abinitio Server?
There are several ways to connect to the Abinitio Server:
  • Set AB_AIR_ROOT
  • Log in to the EME web interface: https://serverhost:[serverport]/abinitio
  • Through the GDE, you can connect to the EME data-store
  • Through air commands

List out the file extensions used in Abinitio?
The file extensions used in Abinitio are
  • .mp: It stores Abinitio graph or graph component
  • .mpc: Custom component or program
  • .mdc: Dataset or custom data-set component
  • .dml: Data manipulation language file or record type definition
  • .xfr: Transform function file
  • .dat: Data file (multifile or serial file)

What information does a .dbc file provide to connect to the database?
The .dbc file provides the GDE with the information needed to connect to the database:
  • Name and version number of the data-base to which you want to connect
  • Name of the computer on which the data-base instance or server to which you want to connect runs, or on which the database remote access software is installed
  • Name of the server, database instance or provider to which you want to link

How can you run a graph infinitely in Abinitio?
To execute a graph infinitely, the graph end script should call the .ksh file of the graph. Therefore, if the graph name is abc.mp, the end script of the graph should call abc.ksh. This will run the graph indefinitely.

What is SANDBOX?
A SANDBOX is a collection of graphs and related files that are saved in a single directory tree and behave as a group for the purposes of navigation, version control, and migration.

What is the difference between a “Look-up file” and a “Look-up” in Abinitio?
A lookup file defines one or more serial files (flat files); it is the physical file where the data for the lookup is stored. A lookup, on the other hand, is a component of an Abinitio graph in which we can store data and retrieve it by using a key parameter.

What are the different types of parallelism used in Abinitio?
The different types of parallelism used in Abinitio include
Component parallelism: A graph with multiple processes executing simultaneously on separate data uses component parallelism.
Data parallelism: A graph that works with data divided into segments and operates on each segment separately uses data parallelism.
Pipeline parallelism: A graph with multiple components executing simultaneously on the same data uses pipeline parallelism. Each component in the pipeline continuously reads from the upstream components, processes the data, and writes to the downstream components, so the components can operate in parallel.
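As a rough illustration of data parallelism outside Abinitio, the Python sketch below splits a list of records into partitions and applies the same logic to each partition in its own worker. The function names and the uppercase transform are hypothetical and only stand in for whatever a real graph component would do.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    # Hypothetical per-record transform: uppercase the value.
    return record.upper()


def process_partition(partition):
    # Every worker runs the same logic on its own slice of the data.
    return [transform(r) for r in partition]


def run_data_parallel(records, ways=4):
    # Split the input into `ways` partitions by position (0, ways, 2*ways, ...).
    partitions = [records[i::ways] for i in range(ways)]
    with ThreadPoolExecutor(max_workers=ways) as pool:
        # pool.map returns results in the same order as the partitions.
        return list(pool.map(process_partition, partitions))
```

Calling run_data_parallel(["a", "b", "c", "d", "e"], ways=2) returns one output list per partition.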

What is Sort Component in Abinitio?
The Sort component in Abinitio re-orders the data. It has two parameters, “Key” and “Max-core”.
Key: This parameter determines the collation order
Max-core: This parameter controls how often the sort component dumps data from memory to disk

What do the dedup and replicate components do?
Dedup component: It is used to remove duplicate records
Replicate component: It combines the data records from the inputs into one flow and writes a copy of that flow to each of its output ports
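The behaviour of these two components can be sketched in plain Python. This is only an analogy, not Abinitio code: the function names and the keep argument are assumptions of the sketch, and the dedup function assumes its input is already grouped by key.

```python
def dedup_sorted(records, key, keep="first"):
    # Assumes records with the same key are adjacent (input sorted by key).
    out = []
    for rec in records:
        if out and key(out[-1]) == key(rec):
            # Duplicate key: either ignore it or let it replace the kept record.
            if keep == "last":
                out[-1] = rec
        else:
            out.append(rec)
    return out


def replicate(records, n_outputs):
    # Every output port receives its own full copy of the flow.
    records = list(records)
    return [list(records) for _ in range(n_outputs)]
```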

What is a partition and what are the different types of partition components in Abinitio?
In Abinitio, partitioning is the process of dividing data sets into multiple sets for further processing. The different types of partition components include
Partition by Round-Robin: Distributes data evenly, in block-size chunks, across the output partitions
Partition by Range: Divides data evenly among nodes, based on a set of partitioning ranges and a key
Partition by Percentage: Distributes data so that the output is proportional to fractions of 100
Partition by Load balance: Dynamic load balancing
Partition by Expression: Divides data according to a DML expression
Partition by Key: Groups data by a key
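Two of these strategies, round-robin and key-based partitioning, can be sketched in Python as follows. The function names, the block_size argument, and the use of crc32 as the hash are assumptions made for the illustration; they are not taken from Abinitio.

```python
from zlib import crc32


def partition_round_robin(records, ways, block_size=1):
    # Deal out blocks of `block_size` records to each partition in turn.
    parts = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % ways].append(rec)
    return parts


def partition_by_key(records, ways, key):
    # A stable hash of the key picks the partition, so records that share
    # a key value always land in the same partition.
    parts = [[] for _ in range(ways)]
    for rec in records:
        idx = crc32(str(key(rec)).encode()) % ways
        parts[idx].append(rec)
    return parts
```

Round-robin gives an even spread regardless of content, while key-based partitioning keeps related records together at the cost of possible skew.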

What is de-partition in Abinitio?
De-partitioning is done in order to read data from multiple flows or operations and re-join the data records from those flows into one. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.
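The difference between these components is mainly the order in which records come out. The Python sketch below is an analogy under simplifying assumptions (in-memory lists, hypothetical function names); Gather is omitted because it combines records in arbitrary order.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marks exhausted partitions during interleaving


def concatenate(partitions):
    # All of partition 0, then all of partition 1, and so on.
    return list(chain.from_iterable(partitions))


def interleave(partitions):
    # One record from each partition in turn, skipping exhausted ones.
    out = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(r for r in group if r is not _MISSING)
    return out


def merge(partitions, key):
    # Assumes every partition is already sorted on `key`;
    # the combined output stays sorted.
    return list(heapq.merge(*partitions, key=key))
```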

List out some of the air commands used in Abinitio?
Air commands used in Abinitio include
air object ls: It is used to see the listing of objects in a directory inside the project
air object rm: It is used to remove an object from the repository
air object versions -verbose: It gives the version history of the object.
Other air commands for Abinitio include air object cat, air object modify, air lock show user, etc.

What is Rollup Component?
The Rollup component enables users to group records on certain field values. It is a multiple-stage function that consists of initialize, rollup, and finalize stages.

What is the difference between rollup and scan?
Rollup cannot generate cumulative summary records; for that we use scan.
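The contrast can be shown with a small Python sketch. It is an analogy only, with hypothetical function names, and it assumes the input is already sorted by the grouping key: the rollup function emits one summary record per group, while the scan function emits a running total for every input record.

```python
from itertools import groupby


def rollup(records, key, value):
    # One output record per key group: the group total.
    return [(k, sum(value(r) for r in grp)) for k, grp in groupby(records, key=key)]


def scan(records, key, value):
    # One output record per input record: the running total within its group.
    out = []
    for k, grp in groupby(records, key=key):
        running = 0
        for r in grp:
            running += value(r)
            out.append((k, running))
    return out
```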

What is the syntax for m_dump in Abinitio?
m_dump is used to view the data in a multifile from the Unix prompt. Typical m_dump commands include
m_dump a.dml a.dat: This command prints the data as it appears in the GDE when we view data as formatted text
m_dump a.dml a.dat > b.dat: The output is redirected to b.dat, which acts as a serial file and can be referred to when required.

What is the relation between EME, GDE and the co-operating system?
EME stands for Enterprise Meta Environment, GDE for Graphical Development Environment, and the co-operating system can be described as the Abinitio server. The relation between the three is as follows. The co-operating system is the Abinitio server; it is installed on a particular operating system platform, which is called the native OS. The EME is a repository, comparable to the repository in Informatica; it holds the metadata, transformations, database config files, and source and target information. The GDE is the end-user environment where graphs are developed (a graph is comparable to a mapping in Informatica). A designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE is on the user side, whereas the EME is on the server side.

How can you run a graph infinitely?
To run a graph infinitely, the end script of the graph should call the .ksh file of the graph. Thus, if the name of the graph is abc.mp, the end script should contain a call to abc.ksh. In this way the graph will run infinitely.

How do you add default rules in transformer?
Double-click the transform parameter on the Parameters tab of the component properties; this opens the transform editor. In the transform editor, click the Edit menu and select Add Default Rules from the dropdown.
It will show two options:
  1. Match Names
  2. Wildcard

What is a local lookup?
If your lookup file is a multifile and is partitioned/sorted on a particular key, then the local lookup function can be used in preference to the plain lookup function call. The lookup is then local to a particular partition, depending on the key.
A lookup file consists of data records that can be held in main memory. This lets the transform function retrieve the records much faster than retrieving them from disk, and allows the transform component to process the data records of multiple files quickly.

What is the difference between look-up file and look-up, with a relevant example?
Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory. This allows transform functions to retrieve records much more quickly than they could from disk.
A lookup is a component of an Abinitio graph where we can store data and retrieve it by using a key parameter. A lookup file is the physical file where the data for the lookup is stored.

What is lookup?
A lookup is basically a specific dataset which is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static as well as dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes hash joins can be replaced by a reformat and a lookup, if one of the inputs to the join contains a small number of records with a slim record length. Abinitio has built-in functions to retrieve values using the key for the lookup.
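The idea of replacing a join with a reformat plus an in-memory lookup can be sketched in Python with a dictionary. This is an analogy rather than Abinitio code; the function and field names are hypothetical.

```python
def build_lookup(records, key_field):
    # Load the small keyed dataset into memory once.
    return {rec[key_field]: rec for rec in records}


def reformat_with_lookup(rows, lookup, key_field, value_field, default=None):
    # For each input row, fetch the matching lookup record by key
    # and copy one field from it, instead of running a join.
    out = []
    for row in rows:
        match = lookup.get(row[key_field])
        enriched = dict(row)
        enriched[value_field] = match[value_field] if match else default
        out.append(enriched)
    return out
```

This only pays off when the lookup dataset is small enough to fit in memory, which matches the condition stated in the answer above.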

What is an outer join?
An outer join is used when one wants to select all the records from a port – whether it has satisfied the join criteria or not.
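A minimal Python sketch of a left outer join, assuming records are dictionaries and using hypothetical names, shows how every record from one port survives whether or not it finds a match:

```python
def left_outer_join(left, right, key):
    # Index the right-hand records by key so each probe is a dictionary lookup.
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)

    joined = []
    for rec in left:
        # Left records without a match are kept and paired with None.
        for match in index.get(rec[key], [None]):
            joined.append((rec, match))
    return joined
```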

What are primary keys and foreign keys?
In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The requirement for both tables is that there should be a matching column.

How do you truncate a table?
From Abinitio, use the Run SQL component with the DDL statement “truncate table”, or use the Truncate Table component in Ab Initio.

How to run the graph without GDE?
Choose Run > Deploy > As Script. This creates a .bat file in your host directory, which you can then run from the command prompt.

How do you improve the performance of a graph?
There are many ways the performance of the graph can be improved.
  • Use a limited number of components in a particular phase
  • Use optimum value of max core values for sort and join components
  • Minimize the number of sort components
  • Minimize sorted join component and if possible replace them by in-memory join/hash join
  • Use only required fields in the sort, reformat, join components
  • Use phasing/flow buffers in case of merge, sorted joins
  • If the two inputs are huge then use sorted join, otherwise use hash join with proper driving port
  • For large dataset don’t use broadcast as partitioner
  • Minimize the use of regular expression functions like re_index in the transform functions
  • Avoid repartitioning of data unnecessarily
  • Try to run the graph in MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should also be partitioned.

How do you truncate a table?
There are many ways to do it.
  • Probably the easiest way is to use the Truncate Table component
  • Run SQL or Update Table can be used to do the same thing
  • Run Program


What is the difference between a DB config and a CFG file?
A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.

What is data mapping and data modelling?
Data mapping deals with the transformation of the extracted data at the field level, i.e. the transformation of a source field to a target field is specified by the mapping defined on the target field. The data mapping is specified during the cleansing of the data to be loaded.
For example:
source:
string(35) name = "Siva Krishna ";
target:
string("01") nm = NULL(""); /* maximum length is string(35) */
Then we can have a mapping like:
Straight move. Trim the leading or trailing spaces.
The above mapping specifies the transformation of the field nm.
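The same field-level mapping can be mimicked in Python to make the effect concrete. The function name is hypothetical; the field names name and nm come from the example above.

```python
def map_name(source):
    # Straight move of `name` to `nm`, trimming leading and trailing spaces.
    return {"nm": source["name"].strip()}
```

Applied to the source record above, map_name({"name": "Siva Krishna "}) yields a target record whose nm field has no trailing space.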

What are the graph parameters?
There are 2 types of graph parameters in AbInitio
  • Local parameters
  • Formal parameters (parameters that take their values at runtime)


What kinds of layouts does Ab Initio support?
Basically, AbInitio supports serial and parallel layouts, and a graph can have both at the same time. The parallel layout depends on the degree of data parallelism. If the multifile system is 4-way parallel, then a component in a graph can run 4-way parallel, provided its layout is defined to match that degree of parallelism.

What are Cartesian joins?
Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
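In Python, the same result can be sketched with itertools.product, which pairs every element of one input with every element of the other. The function name is hypothetical.

```python
from itertools import product


def cartesian_join(left, right):
    # Every row of `left` is paired with every row of `right`,
    # so the output has len(left) * len(right) rows.
    return list(product(left, right))
```

Passing the same list as both arguments gives the self-join case described above.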

How can I run the 2 GUI merge files?
Do you mean merging GUI map files in WR? If so, merging GUI map files in the GUI map editor will not create a corresponding test script. Without a test script you cannot run a file, so it is not possible to run a file by merging 2 GUI map files.

What is .abinitiorc and what does it contain?
.abinitiorc is the config file for Ab Initio. It is found in the user's home directory. Generally it contains the Abinitio home path and login information (such as ID, encrypted password, and login method) for the hosts that the graph connects to at execution time.
It may also contain information such as the EME host.

What are local and formal parameters?
Both are graph-level parameters. A local parameter must be given a value at the time of declaration, whereas a formal parameter does not need to be initialized; the graph prompts for its value at run time.