Data warehousing with Teradata: Hash function in teradata

Data distribution of a table on all AMP's depends on Hashvalue calculated on Primary Index value, Teradata has is own algorithm to calculate hash value. There are four type of hash functions in Teradata which simulate the hash algorithm, which can be used to find the data distribution of a table.

1. HASHROW(column1,..)
2. HASHBUCKET(Hashrow)
3. HASHAMP(hashbucket)
4. HASHBAKAMP(hashbucket)

Query to check data distribution of a table:

SELECT HASHAMP (HASHBUCKET (HASHROW (columnName))) AS “AMPNO”, COUNT (*) FROM Tablename GROUP BY 1 ORDER BY 2 DESC;

You can use below query to check on which amp number, amp bucket number a particular value will be mapped to.

SELECT HASHROW ('motivation') AS "Hash Value"
, HASHBUCKET (HASHROW ('motivation')) AS "Bucket No"
, HASHAMP (HASHBUCKET (HASHROW ('motivation'))) AS "AMP No"
, HASHBAKAMP (HASHBUCKET (HASHROW ('motivation'))) AS "AMP Fallback No" ;

you can use HASHROW() function to check hash collision.

SELECT HASHROW ('motivation') AS "Hash Value";

Data warehousing with Teradata

Tuesday, August 20, 2013

Hash function in teradata

No comments:

Post a Comment