Teradata Corporation is a provider of database-related products and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech and Citibank's advanced technology group. It was acquired by NCR Corporation in 1991 and was spun off as an independent public company on October 1, 2007.
Teradata began marketing around the term "big data" in 2010. CTO Stephen Brobst attributed the rise of big data to "new media sources, such as social media." The increase in semi-structured and unstructured data gathered from online interactions prompted Teradata to form the "Petabyte Club" in 2011 for its heaviest big data users. (Source: Wikipedia)
Teradata is a fully scalable relational database management system produced by Teradata Corporation. It is widely used to manage large data warehousing operations. The Teradata database is an information repository supported by tools and utilities that make it, as part of the Teradata Warehouse, a complete and active relational database management system. There are plenty of reasons why customers choose Teradata.

The Teradata architecture consists of three components. Parsing Engine: receives the query from the user, parses it and prepares the execution plan. BYNET: receives the execution plan from the Parsing Engine and dispatches it to the appropriate AMP. AMP: responsible for storing and retrieving rows; it stores the data in the virtual disk associated with it and also handles lock management, space management, sorting and aggregation.

The Primary Index is defined when the table is created. Every table must have at least one column as the Primary Index. There are two reasons you might pick a different Primary Index than your Primary Key: (1) performance and (2) known access paths.

A Partitioned Primary Index (PPI) is an indexing mechanism that helps improve the performance of certain queries. In a table with a PPI defined, rows are sorted by their partition number, and within each partition records are arranged by their row hash. Rows are assigned to a partition based on the partition definition. The main pro of a PPI is partition elimination: queries that filter on the partitioning column read only the relevant partitions instead of the whole table, and old data can be removed quickly by dropping entire partitions.

A Unique Primary Index (UPI) is unique and cannot have any duplicates. If you try to insert a row with a Primary Index value that is already in the table, the row is rejected; a UPI enforces uniqueness for a column. A UPI always spreads the rows of the table evenly amongst the AMPs, and UPI access is always a one-AMP operation. For example, we can select EMP_NO to be our Primary Index; because EMP_NO is designated as a Unique Primary Index, there can be no duplicate employee numbers in the table.

A Non-Unique Primary Index (NUPI) means that the values in the selected column can be non-unique, so duplicate values can exist. A Non-Unique Primary Index will almost never spread the table rows evenly, and an all-AMP operation will take longer if the data is unevenly distributed. You might pick a NUPI over a UPI because the NUPI column may be more effective for query access and joins.

A Secondary Index (SI) is an alternate data access path. It allows you to access the data without having to do a full-table scan. You can drop and recreate secondary indexes dynamically as they are needed. Secondary indexes are stored in separate subtables that require additional disk space and maintenance, which is handled automatically by the system.

A Join Index is an optional index that may be created by a user. Join indexes provide additional processing efficiencies, for example by pre-joining frequently joined tables so the optimizer can answer a query from the join index instead of performing the join at run time.

TPump (Teradata Parallel Data Pump), like FastLoad and MultiLoad, loads large volumes of data, but TPump loads data one row at a time, using row-hash locks. Because it locks at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table. TPump performs Inserts, Updates, Deletes and Upserts from flat files into populated Teradata tables at row level.

The Teradata Parallel Transporter (TPT) utility combines the functionality of the BTEQ, FastLoad, MultiLoad, TPump and FastExport utilities, so TPT can perform the work of all of these utilities within a single framework.
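To make the Primary Index, PPI and Secondary Index concepts above concrete, here is a minimal DDL sketch. The employee and sales tables, their columns and the partition range are hypothetical examples, not objects from the original text.

-- A UPI on EMP_NO guarantees uniqueness and even row distribution across AMPs (hypothetical table).
CREATE TABLE employee
( EMP_NO    INTEGER NOT NULL,
  EMP_NAME  VARCHAR(50),
  DEPT_NO   INTEGER,
  HIRE_DATE DATE
)
UNIQUE PRIMARY INDEX (EMP_NO);

-- A PPI table: rows are hashed by SALE_ID but stored in monthly partitions,
-- so queries that filter on SALE_DATE can skip the irrelevant partitions.
CREATE TABLE sales
( SALE_ID   INTEGER NOT NULL,
  SALE_DATE DATE NOT NULL,
  AMOUNT    DECIMAL(10,2)
)
PRIMARY INDEX (SALE_ID)
PARTITION BY RANGE_N (SALE_DATE BETWEEN DATE '2023-01-01'
                                    AND DATE '2023-12-31'
                                    EACH INTERVAL '1' MONTH);

-- A secondary index gives an alternate access path and can be created or dropped at any time.
CREATE INDEX (DEPT_NO) ON employee;

With these definitions, a lookup on employee by EMP_NO is a one-AMP UPI access, while a query on sales restricted to a single month touches only that month's partition.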
BTEQ (Basic Teradata Query) was the first utility and query tool for Teradata. BTEQ can be used as a query tool, to load data a row at a time into Teradata, and to export data off of Teradata a row at a time. (BTEQ is pronounced "Bee-Teek".) Features of BTEQ include interactive and batch query execution, data import and export, and report formatting. Terms commonly used in BTEQ scripts include .LOGON, .LOGOFF, .EXPORT, .IMPORT, .QUIT, ACTIVITYCOUNT and ERRORCODE.

There are mainly four types of BTEQ exports.
Export DATA: exports data to a flat-file format. This is called Record Mode or DATA mode; the data is written to the file in its normal format, with no headers or white space between the column values. (.EXPORT DATA)
Export INDICDATA: exports data with extra indicator bytes that flag NULL columns in a row. (.EXPORT INDICDATA)
Export REPORT: in this mode the BTEQ export output includes the column headers for the fields and white space, and packed or binary data is expanded into readable form. (.EXPORT REPORT)
Export DIF: Data Interchange Format, which allows users to export data from Teradata so it can be used directly by spreadsheet applications such as Excel, FoxPro and Lotus. (.EXPORT DIF)

Transferring large amounts of data can be done using the various Teradata application utilities that reside on the host computer (mainframe or workstation): BTEQ, FastLoad, MultiLoad, TPump and FastExport. BTEQ (Basic Teradata Query) supports all four DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also supports IMPORT/EXPORT protocols. FastLoad, MultiLoad and TPump transfer data from the host to Teradata, while FastExport is used to export data from Teradata to the host.

The steps involved after the disk manager passes the request are as follows:

OLTP (Online Transaction Processing) uses normalized tables to quickly record large numbers of transactions while making sure that these updates occur in as few places as possible. Consequently, OLTP databases are designed for recording the daily operations and transactions of a business, e.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods such as lunch hour, breaks, startup and close of work.

A multi insert means inserting data records into a table using multiple INSERT statements in one request. It is achieved by putting a semicolon in front of the keyword INSERT in the next statement rather than terminating the first statement with a semicolon.

Caching primarily works with sources that do not change frequently. The system keeps frequently used data blocks in a separate memory area in the application, and the cache is shared among several applications.

Permanent journals keep track of data before or after changes are applied. This helps to roll the table back or forward to a particular state. Permanent journals can be enabled at the table level or the database level.

The UPSERT command is used for operations that need update-else-insert behaviour. This is a specific option that is only available in Teradata.

While using MultiLoad or FastLoad, if you do not want to load a particular field from the data file into the target table, use the FILLER command to achieve this.

A data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements. Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion and organization. Dimension tables store records related to a particular dimension; no facts (measures) are stored in these tables.
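Returning to BTEQ, here is a minimal script sketch of an export and of the multi-insert semicolon placement described above; the logon string, output file name and employee table are hypothetical placeholders.

.LOGON tdpid/username,password
.EXPORT REPORT FILE = emp_report.txt
SELECT EMP_NO, EMP_NAME FROM employee;
.EXPORT RESET

/* Multi insert: placing the semicolon at the start of each following INSERT,
   instead of terminating the previous line, makes BTEQ send all three
   statements as a single multi-statement request. */
INSERT INTO employee (EMP_NO, EMP_NAME) VALUES (1001, 'Smith')
;INSERT INTO employee (EMP_NO, EMP_NAME) VALUES (1002, 'Jones')
;INSERT INTO employee (EMP_NO, EMP_NAME) VALUES (1003, 'Brown');

.LOGOFF
.QUIT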
Set operators are used to combine the results of multiple SELECT statements. This is different from joins: joins combine columns from multiple tables, whereas set operators combine rows. The set operators in Teradata are UNION, UNION ALL, INTERSECT and MINUS/EXCEPT.

To avoid unintended product joins, ensure statistics are collected on the join columns; this is especially important if the columns you are joining on are not unique.

A logical data model is the version of a data model that represents the business requirements (in whole or in part) of an organization, and it is developed before the physical data model. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by thinking clearly about current and future business requirements. A logical data model includes all required entities, attributes, key groups and relationships that represent business information and define business rules.

An ODS (Operational Data Store) is primarily a "dump" of relevant information from a very small number of systems (often just one), usually with little or no transformation. The benefit is an ad hoc query database that does not affect the operation of the systems required to run the business. ODSs usually deal with "raw" and "current" data and can therefore answer only a limited set of queries.

A subject area is a fundamental entity that makes up a major component of the business, e.g. customer, product, employee.

Checkpoints are entries posted to a restart log table at regular intervals during a data transfer operation. If processing stops while a job is running, you can restart the job at the most recent checkpoint.

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.

The three-tier architecture differs from the two-tier architecture by strictly enforcing a logical separation of the graphical user interface, the business logic and the data. The three-tier architecture is widely used for data warehousing today and may be more appropriate for organizations that require greater performance and scalability. In this architecture, data extracted from legacy systems is cleansed, transformed and stored in high-speed database servers, which are used as the target database for front-end data access.

A Global Temporary Table (GTT) is not much different from a regular table you would create in any database, with a few exceptions concerning the life of the data. The table definition is global, but the data is not. What does that mean to you? The data in a GTT is private, meaning only the session that inserts the data can access it, while the table itself is visible to anyone who has the proper access to the database and table.

ETL stands for extract, transformation and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into the target database. The first step in the ETL process is mapping the data between the source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in a staging area. The third step is transforming the cleansed source data and then loading it into the target system. Note that ETT (extract, transformation, transportation) and ETM (extraction, transformation, and move) are sometimes used instead of ETL.
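Short sketches, using assumed table names, of a set operator, the statistics collection recommended above for join columns, and a Global Temporary Table:

-- Set operators combine rows from multiple SELECTs (hypothetical tables).
SELECT EMP_NO FROM current_employees
UNION
SELECT EMP_NO FROM retired_employees;

-- Collected statistics on a join column help the optimizer choose a good join plan.
COLLECT STATISTICS ON employee COLUMN (DEPT_NO);

-- A Global Temporary Table: the definition is shared, but each session
-- sees only the rows it inserts itself.
CREATE GLOBAL TEMPORARY TABLE gtt_dept_totals
( DEPT_NO   INTEGER,
  TOTAL_SAL DECIMAL(12,2)
)
ON COMMIT PRESERVE ROWS;

Because ON COMMIT PRESERVE ROWS is specified, rows in the GTT survive until the end of the session; with the default ON COMMIT DELETE ROWS they would be cleared at the end of each transaction.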
The Teradata RDBMS is used by users on client systems, who send their requests to the system.

The data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. The data staging area is considered the "back room" portion of the data warehouse environment; it is where the extract, transform and load (ETL) work takes place, and it is out of bounds for end users. Some of the functions of the data staging area include extracting, cleansing, transforming and loading source data.

Teradata session mode and ANSI session mode differ mainly in transaction and comparison semantics: in Teradata mode each request is treated as an implicit transaction and character comparisons are not case specific, whereas in ANSI mode a transaction must be ended with an explicit COMMIT and character comparisons are case specific. In addition, CREATE TABLE defaults to a SET table in Teradata mode and to a MULTISET table in ANSI mode.

Teradata is a huge data processing system with a shared-nothing design, thereby eliminating resource contention. Teradata is specifically designed to accommodate massive data warehouse implementations and has several data warehousing capabilities embedded into the core of the database. With linear scalability (the software scales linearly with the hardware), unconditional parallelism, multi-faceted parallelism, intelligent data distribution and a parallel-aware optimizer, Teradata is capable of handling massive data volumes and sophisticated queries.

The following rules govern how Primary Indexes in a Teradata database are defined and how they function: a table can have only one Primary Index; the Primary Index may consist of multiple columns (up to 64); it may be unique or non-unique; and the row hash of the Primary Index value determines which AMP stores the row.

Following are the different functions which a DBA can perform:
User Management: creation and management of Users, Databases, Roles, Profiles and Accounts.
Space Allocation: assigning Permanent Space, Spool Space and Temporary Space.
Access of Database Objects: granting and revoking access rights on different database objects.
Security Control: handling logon and logoff rules for Users.
System Maintenance: specification of system defaults, restarts, etc.
System Performance: use of the Performance Monitor (PMON), Priority Scheduler and Job Scheduling.
Resource Monitoring: Database Query Log (DBQL) and Access Logging.
Data Archives, Restores and Recovery: ARC utility and Permanent Journals.

Several types of data protection are available with the Teradata database; some are automatic and some are optional. They include:
RAID: Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level.
Cliques: a clique is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique.
Hot Standby Nodes: a Hot Standby Node (HSN) is a member of a clique that is not (initially) configured to execute any Teradata vprocs.
Fallback: Fallback is a Teradata database feature that protects data in the case of an AMP vproc failure.
Journaling: journals are kept on the system to provide data availability in the event of a component or process failure in the system.
Locks: locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data.
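As an illustration of the Fallback and locking features above, here is a small sketch; the orders table is a hypothetical example.

-- FALLBACK keeps a copy of every row on a different AMP, so the data
-- remains available if an AMP fails.
CREATE TABLE orders, FALLBACK
( ORDER_ID   INTEGER NOT NULL,
  ORDER_DATE DATE
)
UNIQUE PRIMARY INDEX (ORDER_ID);

-- An explicit locking modifier: an ACCESS lock lets the query read
-- without blocking concurrent writers.
LOCKING TABLE orders FOR ACCESS
SELECT COUNT(*) FROM orders;

The ACCESS lock allows the read to proceed even while other sessions hold write locks on the table, at the cost of possibly reading uncommitted data.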
What is Teradata?
Why Teradata?
Can you explain the architecture of Teradata?
Can you define Primary Index?
What is a Partitioned Primary Index (PPI), and what are the pros of using it in a query?
Can you explain Unique Primary Index (UPI)?
Can you explain Non-Unique Primary Index (NUPI)?
Can you define Secondary Index?
Can you define Join Index?
Can you explain TPUMP?
Can you define TPT?
Can you define BTEQ and its features?
What are the BTEQ Scripts?
What are the different types of BTEQ Exports?
How do you transfer large amounts of data in Teradata?
What are the steps involved after the disk manager passes the request?
What is OLTP?
What is Multi Insert?
What is Caching in Teradata?
What is a permanent Journal in Teradata?
What is the UPSERT command?
What is the FILLER command in Teradata?
Can you define Data Modeling?
Can you define Dimensional Modeling?
What are set operators in Teradata?
How to eliminate product joins in a Teradata SQL query?
What are the types of tables in Teradata?
Can you define Logical Data Model?
What is ODS?
What is Subject Area?
What is a Checkpoint?
Can you define Data Warehousing?
What is a three-tier data warehouse?
Can you define GTT?
Can you define ETL?
What can be achieved by using the Teradata RDBMS?
Can you explain Staging Area?
What are differences between Teradata and ANSI session modes in Teradata?
What are the features of Teradata?
What are the basic rules that define how a Primary Index (PI) is defined in Teradata?
What are the functions of a Teradata DBA?
Can you explain Teradata database Data Protection?