Teradata Primer for SQL Developers: Character Functions

Introduction

I recently started working with Teradata for a client, and my initial line of learning has been to compare and constrast Teradata functionality with Oracle, the database I know best. This post covers Teradata character (string) functions and their Oracle equivalents.  I’ve included their SQL Server and MySQL equivalents as well.

Teradata Character Functions

Here’s a list of Teradata character functions, though note that one of these is actually an operator (for concatenation) and another is classified as an attribute (for string length).

Concatenation Operator

Teradata concatenates strings using the || operator, just like Oracle. SQL Server and MySQL use the CONCAT function to concatenate multiple character values. All four databases will also handle values that can be implicitly cast to character. Oracle also has a CONCAT function but it takes only two arguments.

Database Expression Result
Teradata 'abc' || 'def' || 'ghi' abcdefghi
'abc' || 123 || 'def' abc 123def
'abc' || -123 || 'def' abc-123def
Oracle 'abc' || 'def' || 'ghi' abcdefghi
SELECT 'abc' || 123 || 'def' FROM DUAL; abc123def
'abc' || -123 || 'def' abc-123def
SQL Server CONCAT('abc', 'def', 'ghi') abcdefghi
CONCAT('abc', 123, 'def') abc123def
CONCAT('abc', -123, 'def') abc-123def
MySQL CONCAT('abc', 'def', 'ghi') abcdefghi
CONCAT('abc', 123, 'def') abc123def
CONCAT('abc', -123, 'def') abc-123def

Note that the Teradata concatenation of abc, 123, and def puts a space before the 123. The space is presumably a placeholder for a sign, a fair assumption since the concatenation of abc, -123, and def doesn’t have a space.

CHAR2HEXINT Function

This Teradata function returns the hexadecimal value of the passed string:

SELECT CHAR2HEXINT('abc'); → 006100620063

I’m not yet familiar enough with Teradata’s character set handling to know if there’s a scenario where CHAR2HEXINT('abc') will return the single-byte value of 616263 instead of the double-byte 006100620063 value. I did try specifying the LATIN character set for the string but still got a double-byte result:

SELECT CHAR2HEXINT(_latin 'abc'); → 006100620063

Oracle
To get the same functionality in Oracle, use the RAWTOHEX or RAWTONHEX functions. I don’t use these functions regularly, but in my tests they appeared to be interchangeable. My “regular” character set is AL32UTF8 and my “national” (NCHAR) character set is AL16UTF16. When I called either function on the string 'abc' I got 616263 as a result:

SELECT RAWTOHEX('abc') FROM DUAL; → 616263
SELECT RAWTONHEX('abc') FROM DUAL; → 616263

When I defined the string as the “national” character set, both functions returned the same 006100620063 value as the Teradata CHAR2HEXINT function:

SELECT RAWTOHEX(N'abc') FROM DUAL; → 006100620063
SELECT RAWTONHEX(N'abc') FROM DUAL; → 006100620063

SQL Server
To convert a value to hex in SQL Server, CONVERT or CAST the string value to VARBINARY. As with Oracle, I got single-byte results for the “regular” character set and double-byte results for the “national” character set. Unlike Oracle (and Teradata), the 00 “filler” byte comes after the significant value (61 or 62 or 63).

SELECT CAST('abc' AS VARBINARY) → 0x616263
SELECT CAST(N'abc' AS VARBINARY) → 0x610062006300

MySQL
In MySQL, use the HEX function. I got single-byte results for ASCII characters, even when I specified them as “national” characters:

SELECT HEX('abc'); → 616263
SELECT HEX(N'abc'); → 616263

CHARACTER_LENGTH Attribute

Teradata uses the ANSI SQL-2003 compliant CHARACTER_LENGTH attribute to return a string’s length. It’s called an attribute but it’s used like a function. Here’s how to get character length in Teradata, Oracle, SQL Server, and MySQL:

Database Function Example
Teradata CHARACTER_LENGTH CHARACTER_LENGTH('abcd€') → 5
Oracle LENGTH LENGTH('abcd€') → 5
SQL Server LEN LEN('abcd€') → 5
MySQL CHAR_LENGTH CHAR_LENGTH('abcd€') → 5

All the functions above return 5, which is number of characters in the string.

The Euro sign () in the string above is one character, but it requires three bytes of storage. To get byte (rather than character) length, use these functions instead:

Database Function Example
Teradata OCTET_LENGTH OCTET_LENGTH('abcd€') → 7
Oracle LENGTHB LENGTHB('abcd€') → 7
SQL Server DATALENGTH DATALENGTH('abcd€') → 7
MySQL LENGTH LENGTH('abcd€') → 7

These functions all return 7 because that’s the number of bytes needed to store the string:

  1 byte  for a
+ 1 byte  for b
+ 1 byte  for c
+ 1 byte  for d
+ 3 bytes for €
---------
  7 bytes

INDEX Function

This function returns the position of a string within another string. It’s a Teradata extension and not ANSI compliant. Teradata also provides an ANSI SQL-2003 compliant version of the function, named POSITION.

The INDEX function takes two parameters: a string and a string to search for. The return value is the first position the “string to search for” was found, or zero if it’s not found:

SELECT INDEX('abcdef', 'd'); → 4 ('d' is at position 4 of 'abcdef')
SELECT INDEX('abcdef', 'g'); → 0 ('g' isn't in 'abcdef')
SELECT INDEX('abcbcb', 'b'); → 2 (first position of 'b' is 2)
SELECT INDEX('abcbdbe', 'bd') → 4 ('bd' is at position 4)

Oracle, SQL Server, and MySQL all have similar functions:

Database Function Example
Teradata INDEX INDEX('abcdef', 'de') → 4
Oracle 1 INSTR INSTR('abcdef', 'de') → 4
SQL Server 2 CHARINDEX CHARINDEX('de', 'abcdef') → 4
MySQL INSTR INSTR('abcdef', 'de') → 4

1 Oracle also lets you specify the starting position of the search and which occurrence of the searched-for value to find. See the Oracle INSTR function documentation for more information.
2 SQL Server also lets you specify the starting position of the search. See the SQL Server CHARINDEX documentation for more information.

LOWER Function

The LOWER function converts all letters in a string to lowercase. It’s ANSI SQL-2003 compliant.

Oracle, SQL Server, and MySQL all support this function, and their syntax is identical to Teradata’s:

LOWER('This is IT!') → this is it!

MySQL also has an LCASE function, which is synonymous with LOWER. The LCASE function is not ANSI compliant.

POSITION Function

This is the ANSI SQL-2003 compliant version of the INDEX function. Its syntax is as follows:

SELECT POSITION('de' IN 'abcdef'); → 4

The POSITION function is not available in Oracle, SQL Server, or MySQL. The equivalent functions for these databases are shown in the section that explains the Teradata INDEX function.

SOUNDEX Function

The SOUNDEX function returns a short string representing the phonetic value of a string. It’s not an ANSI-compliant function.

Oracle, SQL Server, and MySQL also have a SOUNDEX function, but MySQL’s implementation is different. The Oracle SOUNDEX function documentation includes an explanation of how the function works for Oracle (as well as Teradata and SQL Server). The MySQL SOUNDEX documentation describes how its calculation differs from the Teradata, Oracle, and SQL Server implementations.

Database Function Example
Teradata SOUNDEX SOUNDEX('database') → D312
Oracle 1 SOUNDEX SOUNDEX('database') → D312
SQL Server 2 SOUNDEX SOUNDEX('database') → D312
MySQL SOUNDEX SOUNDEX('database') → D120

STRING_CS Function

This has something to do with Kanji characters, which are Chinese pictographs used in Japanese writing. I haven’t tried to learn this function because I don’t forsee using it in the near future..

SUBSTR Function

This function returns a slice of a string based on position and length. It’s not ANSI compliant, but Teradata also provides an ANSI SQL-2003 compliant SUBSTRING function. The Teradata syntax is as follows:

  • SUBSTR(string, start) → return from start to the end of the string.
  • SUBSTR(string, start, length) → return length characters starting at start.

The start position is one-indexed, meaning the first character of the string is position 1.

Oracle, SQL Server, and MySQL all support this function, though note that the SQL Server function is named SUBSTRING. MySQL also supports SUBSTRING as a synonym of the SUBSTR function.

Database Expression Result
Teradata SUBSTR('abcdef', 3, 2) cd
SUBSTR('abcdef', 4) def
Oracle 1 SUBSTR('abcdef', 3, 2) cd
SUBSTR('abcdef', 4) def
SQL Server SUBSTRING('abcdef', 3, 2) cd
SUBSTRING('abcdef', 4) Error: requires 3 arguments
MySQL 1 SUBSTR('abcdef', 3, 2) cd
SUBSTRING('abcdef', 3, 2) cd
SUBSTR('abcdef', 4) def
SUBSTRING('abcdef', 4) def

1 In Oracle and MySQL, a negative start value will calculate the start position from the end of the string. See the Oracle and MySQL documentation for more information.

SUBSTRING Function

The Teradata SUBSTRING function returns a slice of a string based on position and length. Its behavior is the same as the SUBSTR function, but its syntax is ANSI SQL-2003 compliant. The syntax is as follows:

  • SUBSTRING(string FROM start) → return from start to the end of the string.
  • SUBSTRING(string FROM start FOR length) → return length characters starting at start.

MySQL supports the ANSI syntax in the SUBSTRING function and its SUBSTR synonym. Oracle and SQL Server do not support the ANSI syntax.

Database Expression Result
Teradata SUBSTR('abcdef' FROM 3 FOR 2) cd
SUBSTR('abcdef' FROM 4) def
MySQL 1 SUBSTRING('abcdef' FROM 3 FOR 2) cd
SUBSTR('abcdef' FROM 3 FOR 2) cd
SUBSTRING('abcdef' FROM 4) def
SUBSTR('abcdef' FROM 4) def

1 In MySQL, a negative start value will calculate the start position from the end of the string. See the MySQL documentation for more information.

TRANSLATE Function

The Teradata TRANSLATE function converts a string from one character set to another. It’s not related to the Oracle function of the same name, which substitutes characters in a string with other characters.

I didn’t spend any time mastering this function because I don’t forsee using it in the near future. Similar functions in the other databases are:

  • Oracle: CONVERT
  • SQL Server: as far as I know, there is no equivalent function
  • MySQL: CONVERT

TRANSLATE_CHK Function

The TRANSLATE_CHK function checks to see if a proposed character set conversion will be successful. As with the TRANSLATE function, I didn’t spend any time learning this.

TRIM Function

The TRIM function trims leading or trailing (or both leading and trailing) characters from a string. Teradata supports the ANSI form of the function, which is as follows:

  • TRIM(string) → trim leading and trailing whitespace from a string.
  • TRIM(LEADING|TRAILING|BOTH char FROM string) → trim the specified character from the beginning or ending (or from the beginning and ending) of a string.

Oracle and MySQL use the same syntax as Teradata. SQL Server has limited TRIM support. Oracle and MySQL also have LTRIM and RTRIM functions with similar functionality. Teradata has these functions too, though they’re not documented. The following examples are grouped by scenario.

Scenario: Trim leading and trailing spaces from a string

SQL Server doesn’t have a dedicated function to trim from both sides of a string, but it does have an LTRIM function that trims spaces from the left and an RTRIM that trims spaces from the right. These can be combined to trim spaces from both sides: LTRIM(RTRIM(' abc ')).


Expression
Works in
Teradata?
Works in
Oracle?
Works in
MySQL?
TRIM(' abc ') Yes Yes Yes
TRIM(BOTH ' ' FROM ' abc ') Yes Yes Yes
TRIM(BOTH FROM ' abc ') Yes Yes Yes
TRIM(' ' FROM ' abc ') No Yes Yes

Scenario: Trim leading character ‘a’ from a string

SQL Server doesn’t have this functionality at all, though it can be done with a fairly simple user-defined function.


Expression
Works in
Teradata?
Works in
Oracle?
Works in
MySQL?
TRIM(LEADING 'a' FROM 'aaabcd') Yes Yes Yes
LTRIM('aaabcd', 'a') 1 Yes 2 Yes Yes

1 The LTRIM function is not ANSI compliant.
2 The Teradata LTRIM function is undocumented.

UPPER Function

The UPPER function converts all letters in a string to uppercase. It’s ANSI SQL-2003 compliant.

Oracle, SQL Server, and MySQL all support this function, and their syntax is identical to Teradata’s:

UPPER('This is it!') → THIS IS IT!

MySQL also has a UCASE function, which is synonymous with UPPER. The UCASE function is not ANSI compliant.

VARGRAPHIC Function

The Teradata VARGRAPHIC function has something to do with (I think) a type of character set. The documentation states that the VARGRAPHIC type is implemented for DB2 compatibility. I didn’t spend any time mastering this function because I don’t forsee using it in the near future. I know of no similar specialized data type or function in Oracle, SQL Server, or MySQL.

Leave a Reply

Your email address will not be published. Required fields are marked *