AbInitio Interview Questions and Answers | BI | ETL
Mention what is Abinitio?
“Ab initio” is a Latin phrase meaning “from the beginning.” Abinitio is a tool used to extract, transform, and load (ETL) data. It is also used for data analysis, data manipulation, batch processing, and graphical user interface based parallel processing.
Mention what is the role of the Co-operating system in Abinitio?
The Abinitio Co-operating system provides features such as:
- Managing and running Abinitio graphs and controlling the ETL processes
- Providing Ab initio extensions to the operating system
- Monitoring and debugging ETL processes
- Metadata management and interaction with the EME
Differences Between Ab-Initio and Informatica?
Answer:
- Parallelism: Informatica and Ab-Initio both support parallelism, but Informatica supports only one type, while Ab-Initio supports three: component parallelism, data parallelism, and pipeline parallelism.
- Scheduling: Ab-Initio has no scheduler like Informatica's; you need to schedule through a script or run manually.
- File formats: Ab-Initio supports different types of text files, meaning you can read the same file with different structures, which is not possible in Informatica. Ab-Initio is also more user friendly than Informatica.
- Engine versus code: Informatica is an engine-based ETL tool. Its power is in its transformation engine, and the code it generates after development cannot be seen or modified. Ab-Initio is a code-based ETL tool: it generates ksh, bat, or similar code, which can be modified to achieve goals that cannot be met through the ETL tool itself.
- Ramp-up: Initial ramp-up time with Ab-Initio is quick compared to Informatica. When it comes to standardization and tuning, both probably fall into the same bucket.
- Administration: Ab-Initio does not need a dedicated administrator; a UNIX or NT admin will suffice, whereas Informatica needs a dedicated administrator.
- Delimiters: With Ab-Initio you can read data with multiple delimiters in a given record, whereas Informatica forces all fields to be delimited by one standard delimiter.
- Error handling: In Ab-Initio you can attach error and reject files to each transformation and capture and analyze the messages and data separately. Informatica has one huge log, which is very inefficient when working on a large process with numerous points of failure.
Why do we go for Ab-Initio?
Answer: Ab-Initio is designed to support the largest and most complex business applications. Applications can be developed easily in the GDE to meet business requirements. Data processing is very fast and efficient compared to other ETL tools. It is available on both Windows NT and UNIX.
List out the file extensions used in Abinitio?
The file extensions used in Abinitio are:
- .mp: Stores an Ab initio graph or graph component
- .mpc: Custom component or program
- .mdc: Dataset or custom dataset component
- .dml: Data manipulation language file or record type definition
- .xfr: Transform function file
- .dat: Data file (multifile or serial file)
What is meant by limit and ramp in Ab-Initio? In which situations are they used?
Answer: Limit and ramp are the variables used to set the reject tolerance for a particular graph. They are one of the options of the reject-threshold property, and both values must be supplied when that option is enabled. The graph stops executing when the number of rejected records exceeds:
limit + (ramp * no_of_records_processed)
The default value is 0.0.
- The limit parameter contains an integer that represents a number of reject events.
- The ramp parameter contains a real number that represents a rate of reject events in the number of records processed.
Typical limit and ramp settings:
- Limit = 0, Ramp = 0.0: abort on any error
- Limit = 50, Ramp = 0.0: abort after 50 errors
- Limit = 1, Ramp = 0.01: abort if more than 2 in 100 records cause an error
- Limit = 1, Ramp = 1: never abort
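The reject-tolerance formula can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the answer, not Ab Initio code; the function names reject_threshold and should_abort are made up for illustration.

```python
def reject_threshold(limit: int, ramp: float, records_processed: int) -> float:
    """Tolerance from the formula: limit + (ramp * records_processed)."""
    return limit + ramp * records_processed


def should_abort(rejects: int, limit: int, ramp: float, records_processed: int) -> bool:
    """True when the number of rejected records exceeds the tolerance."""
    return rejects > reject_threshold(limit, ramp, records_processed)


if __name__ == "__main__":
    # Limit = 1, Ramp = 0.01: tolerance after 100 records is 1 + 0.01 * 100
    print(reject_threshold(1, 0.01, 100))
    print(should_abort(3, 1, 0.01, 100))
```

With Limit = 1 and Ramp = 1 the tolerance always stays ahead of the number of records processed, which is why that setting never aborts.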
What is meant by merge join and hash join? Where are they used in Ab Initio?
Answer: The command line syntax for Join Component consists of two commands. The first one calls the component, and is one of two commands:
What are the most commonly used components in an Ab-Initio graph?
Answer:
- input file / output file
- input table / output table
- lookup / lookup_local
- reformat
- gather / concatenate
- join
- run sql
- join with db
- compression components
- filter by expression
- sort (single or multiple keys)
- rollup
- partition by expression / partition by key
What are the most important components of the architecture of Abinitio?
The most important components that the architecture of Abinitio includes are as follows:
- GDE (Graphical Development Environment)
- Co-operating System
- Enterprise meta-environment (EME)
- Conduct-IT
What does dependency analysis mean in Ab Initio?
Answer: Dependency analysis analyses the project for dependencies within and between graphs. The EME examines the project and develops a survey tracing how data is transformed and transferred, field by field, from component to component. Dependency analysis has two basic steps:
Analysis Level: In the check in wizard’s advanced options, the analysis level can be specified as one of the following:
Graph being checked in is translated to data store format but no error checking is done. This is the minimum requirement during check in.
Along with the translation, errors, which will interfere with dependency analysis, are checked for. These include:
Full dependency analysis is done during check in. It is not recommended, as it takes a long time and can in turn delay the check-in process.
What to analyse:
Analyse all files in the Project
Analyse all files that have been changed or which are dependent on or required by files that have changed since the last time they were analysed.
All files checked in by you would be analysed if they have not been before.
Apply analysis to the file specified only.
What is data encoding in Abinitio?
In Abinitio, data encoding is an approach that is used to keep data confidential. In this approach, we ensure that the information remains in a form that cannot be understood by someone else other than the sender and the receiver.
Is it possible to run a graph infinitely in Ab Initio? If yes, how?
Yes, it is possible to run a graph infinitely in Ab Initio. To do so, the graph's end script should call the .ksh file of the graph itself. For example, if the graph is named xyz.mp, its end script should call xyz.ksh. In this way the graph restarts each time it finishes and runs indefinitely.
What is the importance of EME in abinitio?
Answer: The EME is a repository in Ab Initio. It is used to check graphs in and out, and it also maintains graph versions.
What is $mpjret? Where it is used in ab-initio?
Answer: $mpjret is the return value of the shell command “mp run”, which executes an Ab-Initio graph. It is generally treated as the graph's execution status.
What is the latest version that is available in Ab-initio?
Answer: The latest version of the GDE is 1.15 and of the Co>Operating System is 2.14.
What is meant by Co>Operating system and why is it special for Ab-initio?
Answer: The name Co>Operating System itself says a lot: it is not merely an engine or interpreter. As the name suggests, it is an operating system that co-exists with another operating system. In layman's terms, Abinitio, unlike other applications, does not simply sit as a layer on top of the OS. It has quite a lot of operating-system-level capabilities of its own, such as multifiles and memory management, and this lets it integrate completely with the host OS and work jointly on the available hardware resources. This synergy with the OS optimizes the utilization of the available hardware. Unlike other applications (including most other ETL tools), it does not work as a layer that interprets commands. That is the major difference from other ETL tools, and it is the reason why Abinitio is much faster than other ETL tools, and also much costlier.
What are the continuous components in Abinitio?
Answer: Continuous components are used to create graphs that produce useful output while running continuously. Examples: Continuous Rollup, Continuous Update, Batch Subscribe.
What is meant by fancing in abinitio?
Answer: The word Abinitio means “from the beginning.” Did you mean “fanning”, “fan-in”, or “fan-out”?
What is the use of Aggregate when we have Rollup? The Rollup component in Abinitio is used to summarize groups of data records, so where would we use Aggregate?
Answer: Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use. Rollup is also more self-explanatory than Aggregate when you need to understand how a particular summarization is done. In addition, Rollup offers extra functionality, such as input and output filtering of records.
What kinds of layouts does Ab Initio support?
Answer: Ab Initio supports serial and parallel layouts, and a graph can use both at the same time. A parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, a component in a graph can run 4-way parallel, provided its layout is defined with that same degree of parallelism.
What does dependency analysis mean in Ab-Initio?
Answer: Dependency analysis answers questions about data lineage: where the data comes from, which applications produce it, which applications depend on it, and so on.
What is meant by Fencing in Ab-Initio?
Answer: In the software world, fencing means controlling jobs on a priority basis. In Ab-Initio it refers to customized phase breaking. A well-fenced graph means that, whatever the source data volume, the process will not get caught in deadlocks, because fencing limits the number of simultaneous processes. In Ab-Initio you sometimes need to fence a job to stop the schedule. Fencing is nothing but changing the priority of a particular job.
What is $mpjret? Where it is used in ab-Initio?
Answer: $mpjret gives the status of a graph. You can use $mpjret in the end script, for example: if [ "$mpjret" -eq 0 ]; then echo success; else mailx -s failed mail_id; fi
How do you handle a DML that changes dynamically in abinitio?
Answer: If the DML changes dynamically, both the DML and the XFR have to be passed as graph-level parameters at runtime. This can be done by parameterization, by a conditional record format, or through metadata.
What is meant by Co>Operating system and why is it special for Ab-Initio?
Answer: The Co>Operating System is layered on top of the native operating system. It converts Ab-Initio-specific code into a format that UNIX/Windows can understand and feeds it to the native operating system, which carries out the task.
What does layout mean in terms of Ab Initio?
Answer: Before you can run an Ab Initio graph, you must specify layouts to describe the following to the Co>Operating System:
A layout is one of the following:
Every component in a graph — both dataset and program components — has a layout. Some graphs use one layout throughout; others use several layouts and repartition data as needed for processing by a greater or lesser number of processors. During execution, a graph writes various files in the layouts of some or all of the components in it. For example:
How to Create Surrogate Key using Ab Initio?
Answer. A key is a field or set of fields that uniquely identifies a record in a file or table. A natural key is a key that is meaningful in some business or real-world sense. For example, a social security number for a person, or a serial number for a piece of equipment, is a natural key. A surrogate key is a field that is added to a record, either to replace the natural key or in addition to it, and has no business meaning. Surrogate keys are frequently added to records when populating a data warehouse, to help isolate the records in the warehouse from changes to the natural keys by outside processes.
How many types of joins are in Ab-Initio?
Answer: A join is based on a match key for its inputs. The Join component has out, unused, reject, and log ports.
Inner Joins: The most common case is when join-type is Inner Join. In this case, if each input port contains a record with the same value for the key fields, the transform function is called and an output record is produced. If some of the input flows have more than one record with that key value, the transform function is called multiple times, once for each possible combination of records, taken one from each input port. Whenever a particular key value does not have a matching record on every input port and Inner Join is specified, the transform function is not called and all incoming records with that key value are sent to the unused ports.
Full Outer Joins: Another common case is when join-type is Full Outer Join. If each input port has a record with a matching key value, Join does the same thing as it does for an Inner Join. If some input ports do not have records with matching key values, Join applies the transform function anyway, with NULL substituted for the missing records. The missing records are in effect ignored. With an Outer Join, the transform function typically requires additional rules (as compared to an Inner Join) to handle the possibility of NULL inputs.
Explicit Joins: The final case is when join-type is Explicit. This setting allows you to specify True or False for the record-required n parameter for each in n port. The settings you choose determine when Join calls the transform function.
The join-type and record-required n parameters: The two intersecting ovals in the diagrams represent the key values in the records on the two ports (in0 and in1) that are the inputs to Join. For each possible setting of join-type or (if join-type is Explicit) combination of settings for record-required n, the shaded region of each diagram represents the inputs for which Join calls the transform. Join ignores the records that have key values represented by the white regions, and consequently those records go to the unused port.
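The join behaviour described above can be imitated in plain Python to see which records reach the transform and which go to the unused ports. This is a rough sketch for two inputs only, not Ab Initio code: the function name join and the required0/required1 flags (standing in for record-required0 and record-required1) are illustrative assumptions, and None stands in for NULL.

```python
from collections import defaultdict
from itertools import product


def join(in0, in1, key, required0=True, required1=True):
    """Return (output, unused0, unused1) for a two-input keyed join.

    required0 and required1 both True  -> behaves like an inner join.
    required0 and required1 both False -> behaves like a full outer join.
    """
    groups0, groups1 = defaultdict(list), defaultdict(list)
    for rec in in0:
        groups0[rec[key]].append(rec)
    for rec in in1:
        groups1[rec[key]].append(rec)

    output, unused0, unused1 = [], [], []
    for k in sorted(set(groups0) | set(groups1)):
        recs0, recs1 = groups0.get(k, []), groups1.get(k, [])
        if (required0 and not recs0) or (required1 and not recs1):
            # A required input has no record for this key: no transform call.
            unused0.extend(recs0)
            unused1.extend(recs1)
            continue
        # One "transform call" per combination; None replaces a missing side.
        for r0, r1 in product(recs0 or [None], recs1 or [None]):
            output.append((r0, r1))
    return output, unused0, unused1


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    right = [{"id": 2, "b": "p"}, {"id": 2, "b": "q"}, {"id": 3, "b": "r"}]
    out, u0, u1 = join(left, right, "id")  # inner join
    print(len(out), u0, u1)
```

Setting only one of the two flags to True gives the left or right outer variants of the Explicit case.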
How do you execute a graph from its start to its end stages? How do you run a graph on a non-Ab-Initio system?
Answer: There are several ways to do this: 1. You can run the components phase by phase, as you defined them. 2. You can also run the graph by creating ksh or sh scripts.
How to Create Surrogate Key using Ab-Initio?
Answer: A surrogate key is a substitute for the natural primary key. It is just a unique identifier or number for each record, like the ROWID of an Oracle table. Surrogate keys can be created using 1) next_in_sequence 2) this_partition 3) no_of_partitions
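One common way to combine these three values is to let each partition number its own records and then interleave the partitions so that no two partitions produce the same key. The Python sketch below only illustrates that arithmetic. It assumes that the per-partition sequence starts at 1 and that partitions are numbered from 0; the function name surrogate_key and the offset argument are hypothetical.

```python
def surrogate_key(sequence_number: int, partition: int, partitions: int, offset: int = 0) -> int:
    """Unique key from a per-partition sequence number (assumed 1-based),
    a partition index (assumed 0-based), and the number of partitions.

    offset can carry the highest key already present in the target table.
    """
    return offset + (sequence_number - 1) * partitions + partition + 1


if __name__ == "__main__":
    # 3 partitions, 4 records each: every key is distinct.
    for p in range(3):
        print(p, [surrogate_key(s, p, 3) for s in range(1, 5)])
```

Because each partition only needs its own counter, the keys can be generated in parallel without any coordination between partitions.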
How many kinds of parallelism are there in Abinitio? Please give a definition of each.
Answer: There are 3 kinds of parallelism: 1) data parallelism 2) component parallelism 3) pipeline parallelism. When the data is divided into small chunks (partitions) and the same component processes the partitions simultaneously, we call it data parallelism. When different components work on different data sets at the same time, it is called component parallelism. When a graph uses multiple connected components that run simultaneously on the same data flow, each working on records the previous one has already passed along, we call it pipeline parallelism.
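Data parallelism in particular is easy to picture in code: split the records into N partitions and apply the same transform to every partition at once. The sketch below uses Python threads as a stand-in for the parallel processes; it is not Ab Initio code, and the names partition_round_robin and run_data_parallel are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, ways):
    """Deal records out to `ways` partitions in turn."""
    partitions = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        partitions[i % ways].append(rec)
    return partitions


def run_data_parallel(records, transform, ways=4):
    """Apply the same transform to every partition concurrently."""
    partitions = partition_round_robin(records, ways)
    with ThreadPoolExecutor(max_workers=ways) as pool:
        # map() returns results in partition order.
        return list(pool.map(lambda part: [transform(r) for r in part], partitions))


if __name__ == "__main__":
    print(run_data_parallel(list(range(10)), lambda x: x * 10, ways=4))
```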
What is the difference between Rollup and Scan?
Answer: In expanded mode, Rollup can be evaluated without the built-in aggregation functions, using user-defined functions such as the temporary, initialize, finalize, and rollup functions in the transform property. Scan generates a series of cumulative summary records, such as successive year-to-date totals for groups of data records, so it produces intermediate summary records. Rollup is for group-by totals and Scan is for successive (running) totals. Basically, when we need running summaries we use Scan, and Rollup is used to aggregate data.
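The difference between a group-by total and a successive total is easiest to see side by side. The following Python sketch mimics the two behaviours on key-sorted records; it is an illustration only, and the function names rollup and scan here are ordinary Python helpers, not the Ab Initio components themselves.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def rollup(records, key, value):
    """One summary record per key group (input must be sorted by key)."""
    return [(k, sum(r[value] for r in grp))
            for k, grp in groupby(records, key=itemgetter(key))]


def scan(records, key, value):
    """One running-total record per input record, restarting for each key group."""
    out = []
    for k, grp in groupby(records, key=itemgetter(key)):
        for total in accumulate(r[value] for r in grp):
            out.append((k, total))
    return out


if __name__ == "__main__":
    sales = [{"year": 2023, "amt": 10}, {"year": 2023, "amt": 5}, {"year": 2024, "amt": 7}]
    print(rollup(sales, "year", "amt"))  # one row per year
    print(scan(sales, "year", "amt"))    # one row per input record
```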
How to Improve Performance of graphs in Ab initio? Give some examples or tips.
Ans. There are many ways to improve the performance of graphs in Abinitio. Here are a few points:
1. Use a multifile system (MFS) with Partition by Round-robin.
2. For large data, use lookup local rather than lookup where possible.
3. Take out unnecessary components such as filter by expression; apply the filter in reformat, join, or rollup instead.
4. Use gather instead of concatenate.
5. Tune max-core for optimal performance.
6. Try to avoid unnecessary phases.
Further ways the performance of a graph can be improved:
1) Use a limited number of components in a particular phase.
2) Use the optimum max-core values for sort and join components.
3) Minimise the number of sort components.
4) Minimise sorted join components and, if possible, replace them with in-memory join / hash join.
5) Use only the required fields in the sort, reformat, and join components.
6) Use phasing / flow buffers in the case of merge and sorted joins.
7) If the two inputs are huge, use sorted join; otherwise use hash join with the proper driving port.
8) For large datasets, do not use broadcast as the partitioner.
9) Minimise the use of regular expression functions such as re_index in the transform functions.
10) Avoid repartitioning data unnecessarily.
How do you do unit testing in Ab-Initio? How do you execute Ab-Initio graphs? How do you increase the performance of Ab-Initio graphs?
Answer: The Ab-Initio Co>Operating System handles a graph as multiple processes running simultaneously; this is the primary source of performance. Take the following actions:
Can we load multiple files?
Answer: From my perspective, loading multiple files means writing into more than one file at a time. If that is what you mean, Ab initio provides a component called Write Multiple Files (in the Dataset component group) that can write multiple files at a time. However, the files to be written must be local files, i.e., they should reside on your local PC. For more information on this component, see the help file.
Can You Explain What Co-Operating System Does in Ab Initio?
The Ab Initio Co-Operating system plays several essential roles, which explains why it is a major tool component. First, it manages and runs the Ab Initio graph, thus controlling the extraction, transformation, and loading processes. Secondly, it offers all the necessary Ab Initio extensions to the operating system, which is an equally important role. Thirdly, it facilitates metadata management and its interaction with EME. Lastly, a Co-Operating system in Ab Initio manages the monitoring and debugging processes in ETL.
Mention what the syntax for m_dump in Abinitio is?
The m_dump command in Abinitio is used to view the data in a multifile from the UNIX prompt. The command for m_dump includes
Explain what de-partition is in Abinitio?
De-partitioning is done in order to read data from multiple flows or operations, and it is used to re-join data records from different flows. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.
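Three of these components differ only in the order in which they read the partitions, which a short Python sketch can show. This is an illustration under assumptions, not Ab Initio code: the function names are made up, merge assumes every partition is already sorted on the key, and Gather is left out because it combines records in arbitrary (arrival) order.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marks exhausted partitions during interleaving


def concatenate(partitions):
    """All of partition 0, then all of partition 1, and so on."""
    return list(chain.from_iterable(partitions))


def merge(partitions, key=None):
    """Combine key-sorted partitions into one key-sorted flow."""
    return list(heapq.merge(*partitions, key=key))


def interleave(partitions):
    """Take one record from each partition in turn (round-robin)."""
    return [rec
            for group in zip_longest(*partitions, fillvalue=_MISSING)
            for rec in group
            if rec is not _MISSING]


if __name__ == "__main__":
    parts = [[1, 5, 9], [2, 3], [4, 8]]
    print(concatenate(parts))
    print(merge(parts))
    print(interleave(parts))
```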