Wednesday, December 11, 2013

move space in teradata

In Teradata, permanent space can be moved from one database to another. The trick is to create a temporary "spacer" database from the source, sized at the amount of perm to be moved, GIVE it to the destination, and then DROP it so that its space rolls up into the destination. Below are the commands, in sequence, to perform this task (here moving 300 GB out of DBC to dba_dat):

1. CREATE DATABASE TDAdmin20131211041448 FROM "DBC" AS perm = 322122547200
2. GIVE TDAdmin20131211041448 TO "dba_dat"
3. DROP DATABASE TDAdmin20131211041448
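
The same pattern works between any two databases; here is a minimal hedged sketch (the database names and the 100 GB figure are made up for illustration):

-- carve the space out of the source into a temporary "spacer" database
CREATE DATABASE spacer_db FROM db_source AS PERM = 107374182400;  -- 100 GB in bytes
-- hand the spacer (and its perm) to the destination
GIVE spacer_db TO db_target;
-- drop the spacer; its perm rolls up into db_target
DROP DATABASE spacer_db;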

Monday, December 9, 2013

Starting the Teradata database service in Linux (VM)

For Teradata Database Express Edition installed on SUSE Linux, we can check the database service status with the "pdestate" command.

# pdestate -a
PDE state is START/TVSASTART.  -- indicates the database service is up and running.
or
PDE state is DOWN/HARDSTOP.    -- indicates the database service is down.

We can start or stop the database service from /etc/init.d:

s10-1310:/etc/init.d # service tpa start/stop.

Note: once we start the database service (tpa), it may take a few minutes to start and enable logons.

Troubleshooting

If you are running into problems getting Teradata started, the first place to check for clues is in the log file:

#tail /var/log/messages
And finally, to check your storage, use the verify_pdisks command:



# verify_pdisks
All pdisks on this node verified.


You may see some warning messages with this, but what we're looking for is the final 'verified' message.

Wednesday, November 20, 2013

Release Hut-Lock in Teradata database


A HUT lock is placed on a table or database by Teradata when an archive/restore operation is being performed on that table or database. Ideally, the archive/restore/copy script should include a "release lock" command at the end to release the lock once the operation completes. But if the backup script omits the "release lock" clause, or the backup job terminates abnormally, the lock is not released automatically.
In that case we need to release the lock manually; we can release a HUT lock in two ways: 1. by using arcmain, or 2. by using SQL Assistant/BTEQ.

Using arcmain to release the lock.

From the command prompt or Linux shell, enter the commands below.

      > arcmain
         Logon serverip/arcuser1,arcuserpassword;
         Release lock (databasename.tablename);       --> to release a table-level lock.

  Or

         Release lock (databasename);                 --> to release a database-level lock.
         Logoff;

Note: In both cases Teradata expects the same user (arcuser1) that placed the HUT lock; if that user cannot be used, a super user can release the lock with the "override" option.

              i.e. Release lock (databasename.tablename), override;

Using SQL Assistant/BTEQ to release the lock.

        1. Log on to SQL Assistant using the same user (arcuser1) that acquired the lock.
        2. release lock(databasename.tablename);


Hut-Lock

A Host Utility (HUT) lock is a lock that the Teradata ARC utility places when an ARC command such as ARCHIVE or RESTORE is executed on a database or its objects. The following conditions apply:

     1. HUT locks are associated with the currently logged-on user who entered the statement,
         not with a job or transaction.
     2. HUT locks are placed only on the AMPs that are participating in a Teradata ARC
         operation.
     3. A HUT lock that is placed for a user on an object at one level never conflicts with another
         level of lock on the same object for the same user.


We can check HUT locks in the Teradata database using the ShowLocks console utility.

Friday, November 15, 2013

Teradata constraint types

Teradata has various types of explicit and implicit constraints to support object relationships, along with Teradata-specific data-validation constraints.

Each code below identifies the type of a table-level check constraint, partitioning constraint, or index.

constraint types:
 C = Explicit table-level constraint check
 P = Non-partitioned Primary Index
 Q = Partitioning constraint
 S = Hash-Ordered Secondary Index without ALL
 K = Primary Key
 U = Unique constraint
 R = References constraint
 V = Value-Ordered Secondary Index without ALL
 H = Hash-Ordered Secondary Index with ALL
 O = Value-Ordered Secondary Index with ALL
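
Several of these codes can be inspected in the data dictionary; for instance, the IndexType column of DBC.IndicesV carries the index-related codes. A minimal sketch ('mydb' is a placeholder):

SELECT DISTINCT DatabaseName, TableName, IndexNumber, IndexType
FROM DBC.IndicesV
WHERE DatabaseName = 'mydb'
ORDER BY TableName, IndexNumber;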

Sunday, October 27, 2013

Teradata session mode


Teradata supports both ANSI and Teradata session modes when connecting to the database; the main differences between the two modes are listed below.


TERADATA Mode                                  ANSI Mode
Comparisons are not case specific              Comparisons are case specific
Allows truncation of display data              No truncation of display data allowed
CREATE TABLE defaults to SET tables            CREATE TABLE defaults to MULTISET tables
Transactions are implicit by default;          Transactions are explicit; each must be
BT/ET may be used for explicit control         ended with COMMIT or ROLLBACK

The current mode of a session can be checked with a "help session" query; in the result set, the "Transaction Semantics" column represents the session mode. Alternatively, we can query the dbc.SessionInfoV view as below.

SELECT transaction_mode FROM dbc.SessionInfoV WHERE SessionNo = SESSION;

We can also set the session mode as required. For BTEQ, we simply execute the command below before .logon:

.set session transaction BTET; ----> for Teradata mode
.set session transaction ANSI; ----> for ANSI mode
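
For example, a minimal BTEQ sketch (the tdpid and credentials are placeholders) that switches to ANSI mode and then verifies it:

.set session transaction ANSI
.logon tdpid/dbc,dbc
/* "Transaction Semantics" in the output should now show ANSI */
help session;
.logoff
.quit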

for "sql assistant" if we are connecting using odbc driver we have to set odbc connection as below.






if we are using teradata.net provider to connect , 



Tuesday, October 15, 2013

BTEQ Help

Introduction to Teradata utilities: BTEQ


DEFINITION

BTEQ, short for Basic TEradata Query, is a general-purpose command-driven utility used to access and manipulate data on the Teradata Database, and format reports for both print and screen output. [1]



OVERVIEW

As part of the Teradata Tools and Utilities (TTU), BTEQ is a
Teradata native query tool for DBA and programmers — a real Teradata workhorse,
just like SQLPlus for the Oracle Database. It enables users on a workstation to
easily access one or more Teradata Database systems for ad hoc queries, report
generation, data movement (suitable for small volumes) and database
administration.

All database requests in BTEQ are expressed in Teradata
Structured Query Language (Teradata SQL). You can use Teradata SQL statements
in BTEQ to:

    * Define data — create and modify data structures;
    * Select data — query a database;
    * Manipulate data — insert, delete, and update data;
    * Control data — define databases and users, establish access rights, and secure data;
    * Create Teradata SQL macros — store and execute sequences of Teradata SQL statements as a single operation.


BTEQ supports Teradata-specific SQL functions for doing
complex analytical querying and data mining, such as:

    * RANK - (Rankings);
    * QUANTILE - (Quantiles);
    * CSUM - (Cumulation);
    * MAVG - (Moving Averages);
    * MSUM - (Moving Sums);
    * MDIFF - (Moving Differences);
    * MLINREG - (Moving Linear Regression);
    * ROLLUP - (One Dimension of Group);
    * CUBE - (All Dimensions of Group);
    * GROUPING SETS - (Restrict Group);
    * GROUPING - (Distinguish NULL rows).


Notably, BTEQ supports conditional logic (i.e., "IF..THEN..."), which is useful for batch-mode export/import processing.
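
A minimal batch-mode sketch of that conditional logic (the logon values and file path are placeholders): export a small report, and abort with a non-zero return code if the SELECT fails.

.logon tdpid/dbc,dbc
.export report file = /tmp/tables_report.txt
SELECT DatabaseName, TableName FROM DBC.TablesV SAMPLE 10;
.if errorcode <> 0 then .quit 8
.export reset
.logoff
.quit 0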



OPERATING FEATURES

This section is based on Teradata documentation for the
current release.[1]



BTEQ Sessions

In a BTEQ session, you can access a Teradata Database easily
and do the following:

    * enter Teradata SQL statements to view, add, modify, and delete data;
    * enter BTEQ commands;
    * enter operating system commands;
    * create and use Teradata stored procedures.
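
A short interactive-session sketch mixing all four (the logon values and procedure name are made up):

.logon tdpid/dbc,dbc
/* a Teradata SQL statement */
SELECT DATE, TIME;
/* a BTEQ command */
.show controls
/* an operating system command */
.os ls /tmp
/* a stored procedure call */
CALL mydb.myproc();
.logoff
.quit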

Tuesday, September 24, 2013

Datatype short forms in Teradata

When we run HELP TABLE in Teradata, each datatype is represented by a short form. Below is the list of datatypes and their respective short forms.

AT = TIME
BF = BYTE
BO = BLOB
BV = VARBYTE
CF = CHAR
CO = CLOB
CV = VARCHAR
D = DECIMAL
DA = DATE
DH = INTERVAL DAY TO HOUR
DM = INTERVAL DAY TO MINUTE
DS = INTERVAL DAY TO SECOND
DY = INTERVAL DAY
F = FLOAT
HM = INTERVAL HOUR TO MINUTE
HR = INTERVAL HOUR
HS = INTERVAL HOUR TO SECOND
I1 = BYTEINT
I2 = SMALLINT
I8 = BIGINT
I = INTEGER
MI = INTERVAL MINUTE
MO = INTERVAL MONTH
MS = INTERVAL MINUTE TO SECOND
PD = PERIOD(DATE)
PM = PERIOD(TIMESTAMP(n) WITH TIME ZONE)
PS = PERIOD(TIMESTAMP(n))
PT = PERIOD(TIME(n))
PZ = PERIOD(TIME(n) WITH TIME ZONE)
SC = INTERVAL SECOND
SZ = TIMESTAMP WITH TIME ZONE
TS = TIMESTAMP
TZ = TIME WITH TIME ZONE
YM = INTERVAL YEAR TO MONTH
YR = INTERVAL YEAR

UT = UDT Type
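
These short forms are also stored in the ColumnType column of DBC.ColumnsV, so the list above can be cross-checked with a query like this (the database and table names are placeholders):

SELECT ColumnName, ColumnType
FROM DBC.ColumnsV
WHERE DatabaseName = 'mydb'
  AND TableName = 'mytable'
ORDER BY ColumnId;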

Thursday, September 12, 2013

More on Volatile Tables

Volatile tables are useful for the following:

Creating tables that the system accesses and modifies frequently
Situations where you know you do not need the data for later use
Automatically dropping the table at session end

Using volatile tables enables the Optimizer to do an index scan instead of a table scan.
The definition of a volatile table is stored in memory cache only for the duration of the
current session. Space usage is charged to the spool space of the login user. If your SPOOL
allocation is large enough, you can create up to 1000 volatile tables per session.

Note: Volatile tables do not survive a Teradata Database reset. Restart processing destroys
both the contents and the definition of a volatile table.

The following are not available for volatile tables:
Referential integrity constraints
Check constraints
Permanent journaling
Compressed column values
DEFAULT clause
TITLE clause
Named indexes
Privilege checking (because volatile tables are private to the session in which they are created)
Identity column
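
A minimal volatile-table sketch (the table and column names are made up); note ON COMMIT PRESERVE ROWS, without which the rows vanish at the end of each transaction:

CREATE VOLATILE TABLE sales_tmp AS (
    SELECT store_id, SUM(sale_amt) AS tot_amt
    FROM retail_db.sales
    GROUP BY store_id
) WITH DATA
ON COMMIT PRESERVE ROWS;

-- no DROP needed; the table disappears at session end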

More on Global Temporary Tables

Global temporary tables store a query result in a temporary table for use in subsequent
queries. This improves system performance as follows:
The system needs to re-execute the query fewer times.
The user needs to create and drop fewer tables.
Use the CREATE GLOBAL TEMPORARY TABLE statement to create a global table. Global
temporary tables are stored in the Data Dictionary, but their contents do not persist across
sessions.
Also, their contents are not saved when a transaction completes, unless you explicitly specify
otherwise. To save table rows after a transaction ends, specify ON COMMIT PRESERVE
ROWS as the last keywords in the CREATE statement. However, if you are using a proxy
connection and the connection is lost or switched to another proxy user, all temporary tables
are also lost.
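
A minimal sketch (names are made up): the CREATE below stores only the definition in the dictionary, and each session that later references the table materializes its own private instance.

CREATE GLOBAL TEMPORARY TABLE stage_orders (
    order_id   INTEGER,
    order_dt   DATE
) ON COMMIT PRESERVE ROWS;

-- in any session, this INSERT materializes a local instance
INSERT INTO stage_orders VALUES (1, DATE '2013-09-12');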

The following options are not available for global temporary tables:
Referential constraints
Permanent journaling
Identity column

Space usage is charged to the temporary space of the login user.

Note: Global temporary tables require both permanent and temporary space. A minimum of
512 bytes of permanent space is required for the table header stored on each AMP for the base
global temporary table definition. Temporary space is required for storing the actual rows.
You can materialize up to 2000 instances of global temporary tables in a single session, as long
as your login user has adequate temporary space.

To obtain a list of all global temporary tables in the system, query the CommitOpt column of
the DBC.TablesV view. For example:

SELECT * FROM DBC.TablesV WHERE CommitOpt IN ('D', 'P');

The CommitOpt column shows the value D or P for a global temporary table. (P is for ON
COMMIT PRESERVE ROWS and D is ON COMMIT DELETE ROWS. N indicates the object
is not a temporary table.)
To obtain a list of all global temporary tables you own, query the restricted (DBC.TablesVX)
view.
You can also query the TransLog column to determine if there is transaction logging for a
global temporary table.

You can materialize a global temporary table locally by referencing it in an SQL data
manipulation language (DML) statement. To do this, you must have the appropriate privilege
on the base temporary table or on the containing database, as required by the statement that
materializes the table. Space usage is charged to the temporary space of the login user.

Any number of different sessions can materialize the same table definition, but the contents
change depending on the DML statements applied during the course of a session.
Privileges are not checked on the materialized instances of any global temporary tables

because those tables exist only for the duration of the session in which they are materialized.

Wednesday, September 4, 2013

How to find the number of AMPs in the system?

The number of AMPs depends on the number of nodes in the system and can vary from 8 to 40 per node depending on the type of node used; the latest node type, 6690, can have 40 AMPs on it.

The total number of AMPs in the system can be found with the query

select hashamp() +1;

or 

select Vproc from dbc.Acctg where VprocType='AMP' group by Vproc;

It is recommended to have the same node type across the system to avoid skew while processing requests, but if you have more than one node type in the system, we can use the query below to get the node-wise number of AMPs.

Select nodeid, count(distinct Vproc) from dbc.ResCpuUsageByAmpView group by 1;

We can query to find which node types are being used in the system:

select distinct nodetype from  dbc.resusagespma;

The number of nodes in the system can be found with

Select count(distinct nodeid) from dbc.resusagescpu;


Thursday, August 22, 2013

Index creation in Teradata


There are two types of indexes in Teradata: the Primary Index and Secondary Indexes. The primary index must be defined when the table is created, whereas secondary indexes can be created after the table exists.

Secondary index create syntax:

non-unique(NUSI):

Create Index indexname(col1,col2,...) on databasename.tablename;

unique(USI):

Create Unique Index indexname(col1,col2,...) on databasename.tablename;
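
A concrete hedged example (hr_db.employee and its columns are hypothetical):

CREATE INDEX idx_dept (dept_no) ON hr_db.employee;      -- NUSI
CREATE UNIQUE INDEX idx_ssn (ssn) ON hr_db.employee;    -- USI
DROP INDEX idx_dept ON hr_db.employee;                  -- dropping it again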


Tuesday, August 20, 2013

What is Hot Amp in Teradata?

A hot AMP is a situation in which one or a few AMPs in the system work harder than the other AMPs due to uneven data distribution among the AMPs.

When a SELECT query is executed on a skewed table, the few AMPs holding more data work longer, while the AMPs with less data finish their processing earlier and sit idle waiting for the hot AMPs to finish. In this scenario, overall query performance is effectively bound by the hot AMPs' performance.


Find Skewed tables

We can find the skewed tables of a system using the query below; these tables are candidates for a Primary Index revisit to enhance overall system performance.

Lock Dbc.TableSize For Access
Lock Dbc.diskspace For Access
SELECT b.databasename,b.TableName AS TableName,c.tablekind,SUM(CurrentPerm) /1024/1024 AS "CurrentPerm(MB)",
SUM(PeakPerm)/1024/1024 AS "PeakPerm(MB)",(100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS SkewFactor
FROM Dbc.TableSize b,dbc.tables c, (
Select
  databasename ,zeroifnull((Max(currentperm) - Avg(currentperm))/max(nullifzero(currentperm)) * 100)  As SKW
From dbc.diskspace
Group By 1
Having SKW >=10 ) a
WHERE b.DataBaseName=a.databasename
and c.tablename=b.tablename
and c.tablekind='T'
Group By 1,2,3
Having SkewFactor >=20
Order By "CurrentPerm(MB)" desc;

Hash functions in Teradata

The distribution of a table's data across all AMPs depends on the hash value calculated from the Primary Index value, and Teradata has its own algorithm to calculate that hash value. There are four hash functions in Teradata that expose the hashing algorithm and can be used to examine the data distribution of a table.

1. HASHROW(column1,..)
2. HASHBUCKET(Hashrow)
3. HASHAMP(hashbucket)
4. HASHBAKAMP(hashbucket)

Query to check data distribution of a table:

SELECT HASHAMP (HASHBUCKET (HASHROW (columnName))) AS "AMPNO", COUNT (*) FROM Tablename GROUP BY 1 ORDER BY 2 DESC;

You can use below query to check on which amp number, amp bucket number a particular value will be mapped to. 

SELECT HASHROW ('motivation')   AS "Hash Value"
            , HASHBUCKET (HASHROW ('motivation')) AS "Bucket No"
            , HASHAMP (HASHBUCKET (HASHROW ('motivation'))) AS "AMP No"
            , HASHBAKAMP (HASHBUCKET (HASHROW ('motivation')))  AS "AMP Fallback No" ;

You can use the HASHROW() function to check for hash collisions (different values that hash to the same rowhash).

SELECT HASHROW ('motivation')   AS "Hash Value";
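
To hunt for collisions in an actual table (the table and column names are placeholders), group by the row hash and flag hashes produced by more than one distinct value:

SELECT HASHROW(pi_col) AS row_hash, COUNT(DISTINCT pi_col) AS distinct_vals
FROM mydb.mytable
GROUP BY 1
HAVING COUNT(DISTINCT pi_col) > 1;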

Monday, August 12, 2013

Teradata Viewpoint Services

Teradata Viewpoint is a web application and a very interactive, efficient monitoring tool; it replaces Teradata Manager and its functionality from version TD13 onward. It is installed and run on a dedicated server in the network to capture various monitoring and performance data of the Teradata database system.

Viewpoint captures data using its data collection service (DCS) and saves it to a back-end PostgreSQL database; its portal service, called the viewpoint service, displays the collected information interactively in Viewpoint portlets.

There are mainly three services which manage the Viewpoint operation:
1. viewpoint (portal service).
2. dcs (data collector service)
3. postgresql (back-end database service)

We can check the status of these services, or start/stop/restart them if we face an issue:

$/etc/init.d/dcs start/stop/status
$/etc/init.d/postgresql start/stop/status        

$/etc/init.d/viewpoint start/stop/status         

We have their respective log files to check for any errors or warnings:

/opt/teradata/viewpoint/logs/viewpoint.log
/opt/teradata/dcs/logs/dcs.log

* Apart from these services, there is a supporting service called CAM that underpins all the Java-based services.

Wednesday, July 24, 2013

Teradata error 3541 The request to assign new PERMANENT space is invalid.

While creating a new user, or cloning a user from an existing user, we may come across error 3541.

The known remedy is to either decrease the PERMANENT space requested for the new database/user or increase the PERMANENT space of the parent database; the error means the parent does not have enough unallocated perm left to give away.
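
A hedged illustration (the names and sizes are made up):

-- suppose parent dba_dat has only ~5 GB of unallocated perm
CREATE USER new_user FROM dba_dat AS
    PERM = 10000000000,        -- more than the parent can spare -> error 3541
    PASSWORD = temp_pwd_1;

-- either request less ...
CREATE USER new_user FROM dba_dat AS PERM = 2000000000, PASSWORD = temp_pwd_1;
-- ... or first enlarge the parent (drawing space from its own parent)
MODIFY DATABASE dba_dat AS PERM = 20000000000;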

Monday, July 1, 2013

Failure 3523 after teradata version upgrade


There are scenarios where a user or owner has specific rights on a specific object or database, and a statement or stored procedure may fail with error code 3523 (Failure 3523 COLLECT_STATS:The user does not have STATISTICS access to test_db.user_tbl). This is seen mostly after an upgrade, patch, or e-fix, at which moment you may wonder what has gone wrong.

We could fix the issue by simply recompiling the stored procedure.
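
A minimal sketch of that fix (assuming the procedure from the error message lives in test_db):

ALTER PROCEDURE test_db.COLLECT_STATS COMPILE;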

Thursday, May 23, 2013

Teradata database port

The default port for the Teradata database is 1025.

This can be checked with the netstat command on a database node.

MD3NODE1-9:~ # netstat -aon|grep 1025

tcp        0      0 10.35.48.11:1025       10.51.180.204:2610      ESTABLISHED keepalive (314.05/0/0)
tcp        0      0 10.35.48.11:1025       10.51.183.23:1669       ESTABLISHED keepalive (512.87/0/0)
tcp        0      0 10.35.48.11:1025       10.44.10.68:3209        ESTABLISHED keepalive (6.78/0/0)
tcp        0      0 10.35.48.11:1025       10.44.32.80:2241        ESTABLISHED keepalive (462.68/0/0)
tcp        0      0 10.35.48.11:1025       10.51.181.17:2087       ESTABLISHED keepalive (54.02/0/0)
tcp        0      0 10.35.48.11:1025       10.18.32.237:43922      ESTABLISHED keepalive (21.70/0/0)
tcp        0      0 10.35.48.11:1025       10.18.70.125:55381      ESTABLISHED keepalive (97.30/0/0)
tcp        0      0 10.35.48.11:1025       10.18.32.237:40140      ESTABLISHED keepalive (322.54/0/0)
tcp        0      0 10.35.48.11:1025       10.18.32.237:42456      ESTABLISHED keepalive (187.41/0/0)
tcp        0      0 10.35.48.11:1025       10.18.32.237:53040      ESTABLISHED keepalive (322.93/0/0)

Sunday, May 19, 2013

Space query

Query to get a database-wise space report.


SELECT Databasename AS "Database Name",
Max_perm_gb (DECIMAL (10,1)) AS "Max Perm (GB)",
current_perm_gb (DECIMAL (10,1)) AS "Current Perm (GB)",
EffectiveUsedSpace_Gb (DECIMAL (10,1)) AS "Effective Used Space (GB)",
Current_Perm_Percent AS "Current Perm %",
(Max_perm_gb (DECIMAL (10,1))) -( current_perm_gb (DECIMAL (10,1))) as "unused space in GB" ,
(100) - (Current_Perm_Percent)  as "unused %",
Peak_Perm_Gb (DECIMAL (10,1)) AS "Peak Perm (GB)",
Peak_Perm_Percent AS "Peak Perm %",
D_SKEW AS "Skew %"
FROM
(
SELECT
databasename, SUM(maxperm)/1024/1024/1024 AS Max_Perm_Gb,
SUM(currentperm)/1024/1024/1024 AS Current_Perm_Gb,
MAX(currentperm)*COUNT(*) /1024/1024/1024 AS EffectiveUsedSpace_Gb,
Current_Perm_Gb/Max_Perm_Gb * 100 AS Current_Perm_Percent,
SUM(peakperm)/1024/1024/1024 AS Peak_Perm_Gb,
Peak_Perm_Gb/Max_Perm_Gb * 100 AS Peak_Perm_Percent,
ZEROIFNULL((MAX(currentperm) - AVG(currentperm))/MAX(NULLIFZERO(currentperm)) * 100) AS D_SKEW
FROM dbc.diskspace
GROUP BY 1
HAVING Max_Perm_Gb >= 9.9
) a
--WHERE databasename IN ('database1','database2')
ORDER BY 2 DESC;

Tuesday, April 30, 2013

Teradata Table Kind List

All the objects in Teradata are listed in "dbc.tables"; the different object types are distinguished by the column "TableKind".

List of table kind:

A = AGGREGATE UDF
B = COMBINED AGGREGATE AND ORDERED ANALYTICAL FUNCTION
E = EXTERNAL STORED PROCEDURE
F = SCALAR UDF
G = TRIGGER
H = INSTANCE OR CONSTRUCTOR METHOD
I = JOIN INDEX
J = JOURNAL
M = MACRO
N = HASH INDEX
P = STORED PROCEDURE
Q = QUEUE TABLE
R = TABLE FUNCTION
S = ORDERED ANALYTICAL FUNCTION
T = TABLE
U = USER-DEFINED DATA TYPE
V = VIEW
X = AUTHORIZATION
O = NOPI TABLE
D = JAR
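
For example, to list all views in a given database (the database name is a placeholder):

SELECT TableName, TableKind
FROM DBC.TablesV
WHERE DatabaseName = 'mydb'
  AND TableKind = 'V'
ORDER BY TableName;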

Saturday, February 9, 2013

Failed. 6706: The string contains an untranslatable character


Problem Description: SELECT Failed. 6706: The string contains an untranslatable character

This error usually occurs when a junk (untranslatable) character is encountered while selecting from a column of a table, often through functions like cast(), coalesce(), trim(), etc.

Example :

 Select Prop_Name, cast(coalesce(Prop_DSC,'') as char(400)) from P_PROPERTY.PROPTYPE;

The problem seems to be with the data in column "Prop_DSC" of the P_PROPERTY.PROPTYPE table; the column character set is LATIN.

Problem Solution: Use the TRANSLATE_CHK function to determine the untranslatable column values that are causing this issue.

Use the proper "source_TO_target" value for the translation, e.g. LATIN_TO_UNICODE. Check "show table" to verify whether a character set is specified for the column in the table definition, and choose the character-set translation string accordingly, e.g. LATIN_TO_UNICODE, UNICODE_TO_LATIN, etc.

SELECT Prop_DSC FROM P_PROPERTY.PROPTYPE WHERE TRANSLATE_CHK(Prop_DSC USING LATIN_TO_UNICODE) <> 0;
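
Once the offending rows are found, one hedged cleanup option is TRANSLATE ... WITH ERROR, which substitutes an error character for each untranslatable byte instead of failing the request:

SELECT TRANSLATE(Prop_DSC USING LATIN_TO_UNICODE WITH ERROR)
FROM P_PROPERTY.PROPTYPE
WHERE TRANSLATE_CHK(Prop_DSC USING LATIN_TO_UNICODE) <> 0;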


Monday, January 7, 2013

Coalesce() function in teradata


COALESCE is used to check whether an argument is NULL; it returns the default value if it encounters NULL.

It will check for NOT NULL values sequentially in the list and it will return the first NOT NULL value.

Syntax:

COALESCE(Argument list,['default value'])

Argument list - May be columns of a table or expressions

COALESCE returns NULL if all its arguments evaluate to null. 
Otherwise, it returns the value of the first non-null argument in the given expression list.
Each COALESCE function must have at least two operands. Either two columns or one column with one default value.

It is possible to have multiple columns under coalesce like below:

COALESCE(col1, col2, col3, 0)

The above code says that if col1 is null then it will check col2.
If col2 is null then it will check for col3.
If col3 is also null, then 0 will be the output.

If coalesce encounters any of the column having not null value then the column value will be returned by the query. (Left to right priority)

Example : 1
If the Country column holds a NULL value, 'US' will be supplied by Teradata

SELECT COALESCE(Country,'US') AS Output

Output:

Country    Output
Italy      Italy
NULL       US

Example: 2
The following example returns the home phone number of the named individual (if present), or office phone if HomePhone is null, or CellPhone if present and both home and office phone values are null. Returns NULL if all three values are null.


SELECT
  Name,
  COALESCE (HomePhone, OfficePhone, CellPhone)
FROM PhoneDirectory;

When a default value is provided, the default value will be returned if all the input fields are null:

SELECT
  Name,
  COALESCE (HomePhone, OfficePhone, CellPhone, 'No Phone')
FROM PhoneDirectory;


Example:3

Whenever you combine COALESCE with non-comparison operators, such as || or * or +, you must enclose the entire COALESCE function and its operands in parentheses.


SELECT (COALESCE(100,0)) * 5;   -- returns 500, since COALESCE(100,0) evaluates to 100