Microsoft SQL Server Integration Services (SSIS) is an Extract, Transform and Load (ETL) tool. It can be used to extract data from a wide variety of sources such as Excel Files, XML Files, Flat Files, Relational databases, and transform them as per requirements, and finally load the data into the destination. SSIS is a business intelligence tool that provides data transformation solutions for various organizations. It is designed for data integration and workflow for extract, transform and load (ETL) operations used in data warehousing.
SSIS stands for SQL Server Integration Services. It was introduced by Microsoft SQL server 2005.It is mainly used for data transformation , ETL operations and data integration solutions. Some of the more common transformations It allows you to add auditing information as required in auditing world specified by HIPPA and Sarbanes-Oxley (SOX). Auditing options that you can add to transformed data through this transformation are: This component simply copies a column to another new column. Just like ALIAS Column in T-Sql. Derived column creates new column or put manipulation of several columns into new column. You can directly copy existing or create a new column using more than one column also. There are 3 data flow components in SSIS The best and fastest way to do incremental load is by using Timestamp column in the source table and storing the last ETL timestamp. A checkpoint is the property in SSIS which enables the project to restart from the point of failure. When we set the property to true package create the checkpoint file which stores the information about package execution and use to restart package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time the package runs. Manifiest file is the utility which can be used to deploy the package using wizard on file system and SQL Server database Script Task is the latest version for the deprecated feature ActiveX Script. Both are used to implement extended functionality in SSIS. ActiveX script supports VBScript and JScript where as Script Task supports VB.Net and C#.Net. Script Task is preferable as ActiveX Script has been removed in MSSQL 2012. Script Task is supported with integrated help, IntelliSense, debugging and can reference external Dotnet assembles. Both are used to extend the native functionality of SSIS. Script Task is to enhance the functionality for control flow where as Script Component is to enhance the functionality for Data flow. Script Task can handle the execution of parts of the package where as Script Component can handle the data flow and transformations by processing row by row. Oracle: Native OLEDB\Microsoft OLEDB Provider for Oracle Native .Net providers\ or .Net providers for OLEDB\ DB2: Native OLEDB\Microsoft OLEDB Provider for DB2 Native .Net providers\ , .Net providers\ ODBC OR .Net providers for OLEDB\ MySQL: .Net Providers \ MySQL Data Provider Or .Net Providers \ ODBC This transformation sends output to multiple output paths with no conditional as Conditional Split does. Takes ONE Input and makes the COPY of data and passes the same data through many outputs. In simple Give one input and take many outputs of the same data. The Fuzzy grouping task performs the same operations as the Fuzzy Lookup task but instead of evaluating input records against an outside reference table, the input set becomes the reference. Input records are therefore evaluated against other records in the input set and evaluated for similarity and assigned to a group. A task is very much like a method of any programming language which represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks which you can use in your package you can also develop your own custom tasks with your choice of .NET programming language Workflow is a set of instructions on to specify the Program Executor on how to execute tasks and containers within SSIS Packages When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order A transformation simply means bringing in the data in a desired format. For example, you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Another example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There are around 30 transformation tasks available and this can be extended further with custom built tasks if needed. Config file in SSIS is used to provide inputs to connection manager different properties which package use to assign values at runtime dynamically. Using config file user need to make changes to the config file which package will take automatically at runtime because of using it you don’t need to every time make changes in the packages in case you are deploying package on multiple servers or locations. There are multiple ways in which configuration values can be stored. XML configuration file: Store the config file as an XML file.. Environment variable Store the config in on of the environment variables. Registry entry Store the confi in the registry Parent package variable Store the config as a variable in the package that contains the tasks. SQL Server Store the config in t a table in SQL Server It is very common to get all kinds of sources so the more the person worked with the better for you. Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle, MySQL but also Salesforce, web data scrapping. When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values. Errors typically fall into one the following categories: Data conversion errors: occurs if a conversion results in loss of significant digits, the loss of insignificant digits, and the truncation of strings. Data conversion errors also occur if the requested conversion is not supported. Expression evaluation errors: occurs if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values. Lookup errors: occurs if a lookup operation fails to locate a match in the lookup table. Many data flow components support error outputs, which let you control how the component handles row-level errors in both incoming and outgoing data. You specify how the component behaves when truncation or an error occurs by setting options on individual columns in the input or output. The OLE DB Command Transform is a component designed to execute a SQL Statement for each row in an input stream. This task is analogous to an ADO Command Object being created, prepared, and executed for each row of a result set. The input stream provides that data for parameters that can be set into the SQL Statement that is either an In – line statement or a stored procedure call. A precedence constraint is a link between 2 control flow tasks and lays down the condition on which the second task is run. They are used to control the workflow of the package. There are 3 kinds of precedence constraint – success (green arrow), failure (red arrow) or Completion script task (blue arrow). By default, when we add 2 tasks, it links by green arrow. The way the precedence constraint is evaluated can be based on outcome of the initial task. Also, we can add expression to evaluate such outcome. Any expression that can be judged as true or false can be used for such purpose. The precedence constraint is very useful in error handling in SSIS package. Variable allows us to dynamically control the package at runtime. Example: You have some custom code or script that determines the query parameter’s value. Now, we cannot have fixed value for query parameter. In such scenarios, we can use variables and refer the variable to query parameter. We can use variables for like: We can customize SSIS through code by using Script Task. The main purpose of this task is to control the flow of the package. This is very useful in the scenario where the functionality you want to implement is not available in existing control flow item. To add your own code: Grid Histogram Scatter Plot Column Chart DTSEXECUI: When this command line tool is run a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package. DTEXEC: This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of the SSIS package. As the name suggest, this transformation splits the data based on condition and route them to different path. The logic for this transformation is based on CASE statement. The condition for this transformation is an expression. This transformation also provides us with default output, where rows matching no condition are routed. Conditional split is useful in scenarios like Telecom industry data you want to divide the customer data on gender, condition would be: GENDER == ‘F’ Runtime engine: The Integration Services runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. API or object model: The Integration Services object model includes managed application programming interfaces (API) for creating custom components for use in packages, or custom applications that create, load, run, and manage packages. Developer can write custom applications or custom tasks or transformations by using any common language runtime (CLR) compliant language. Integration Services service: It is a Windows service, monitors running SSIS packages and manages the storage of packages. Data flow: It contains a data flow engine that manages the data flow components. There are 3 types of Data Flow components: Source components (which extracts the data from a system), Transformation components (performs transformations, modifications onto the extracted data) and Load components (which simply performs the data loading tasks into the destination systems). Besides the available data flow components, we can write our own custom data flow components to accomplish any custom requirements. A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package. Breakpoints are setup in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the ‘Edit Breakpoints…’ option. Containers can be defined as objects that stores one or more tasks. The primary purpose of container is grouping logically related tasks. Once the task is placed into the containers, we can perform various operations such as looping on container level until the desired criterion is met. Nesting of container is allowed. Container is placed inside the control flow. There are 4 types of Container: It is 3rd party free component used rather frequently to output errors into XML field which saves development time. This transformation converts the datatype of input columns to different datatype and then route the data to output columns. This transformation can be used to: This data conversion transformation is very useful where you want to merge the data from different source into one. This transformation can remove the abnormality of the data. Example à The Company’s offices are located at different part of world. Each office has separate attendance tracking system in place. Some offices stores data in Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merged into one system. Since the datatypes in all these databases vary, it would be difficult to perform merge directly. Using this transformation, we can normalize them into single datatype and perform merge. An error handler allows us to create flows to handle errors in the package in quite an easy way. Through event handler tab, we can name the event on which we want to handle errors and the task that needs to be performed when such an error arises. We can also add sending mail functionality in event of any error through SMTP Task in Event handler. This is quite useful in event of any failure in office non-working hours. In Data flow, we can handle errors for each connection through following failure path or red arrow. Control flow deals with orderly processing of individual, isolated tasks, these tasks are linked through precedence constraints in random order. Also the output for task has finite outcome i.e., Success, Failure, or Completion. A subsequent task does not initiate unless its predecessor has completed. Data flow, on the other hand, streams the data in pipeline manner from its source to a destination and modifying it in between by applying transformations. Another distinction between them is the absence of a mechanism that would allow direct transfer of data between individual control flow tasks. On the other hand, data flow lacks nesting capabilities provided by containers. In the data flow, we will use one OLE DB source, data conversion transformation and one OLE DB destination or SQL server destination. OLE DB source is data source is useful for reading data from Oracle, SQL Server and Access databases. Data Conversion transformation would be needed to remove datatype abnormality since there is difference in datatype between the two databases (Access and SQL Server) mentioned. If our database server is stored on and package is run from same machine, we can use SQL Server destination otherwise we need to use OLE DB destination. The SQL Server destination is the destination that optimizes the SQL Server. What is SSIS?
What are the Data Flow Transformations in SSIS?
What are the different types of files that support SSIS?
Explain Audit Transformation?
What are Advantages of SSIS?
Explain Copy column Transformation?
Explain Derived column Transformation?
What are the different types of Data flow components in SSIS?
How can you do an incremental load?
Can you explain Checkpoint?
What is Manifest file in SSIS?
What is the difference between ActiveX Script and Script Task?
What is the difference between Script Task and Script Component?
Explain how can we connect to Oracle, DB2 and MySQL from SSIS?
Explain Multicast Transformation?
What is the difference between Fuzzy Lookup and Fuzzy Grouping?
What is a Task?
What is a workflow in SSIS?
What is the Control Flow?
What is a Transformation?
Can you explain config file in SSIS?
What are the main SSIS tool box widgets?
How many difference source and destinations have you used?
Can you explain how do you handle errors in SSIS?
Can you explain OLE DB Command Transform?
What is precedence constraint?
Explain why variables called the most powerful component of SSIS
Can we add our custom code in SSIS?
What are the Data viewers in SSIS?
What are the some SSIS components?
What are the command line tools to execute SQL Server Integration Services packages?
What is conditional split?
Explain the SSIS Architecture?
What is a breakpoint in SSIS?
Explain the use of containers in SSIS and also their types
What is XMLify component?
Why is the need for data conversion transformations?
Can you explain Error Handling in SSIS?
What is the difference between control flow and data flow?
What are different component of SSIS will you use?
Related posts:
- CouchDB Interview Questions and Answers
- DBMS Interview Questions and Answers
- MongoDB Interview Questions and Answers
- MSBI Interview Questions and Answers
- SQL Interview Questions and Answers
- SQL Server 2008 Interview Questions and Answers
- SQL Server Interview Questions and Answers
- SSRS Interview Questions and Answers