GUID Values as Primary Keys

A Globally Unique Identifier (GUID) is an excellent choice for a primary key. It has three of the main attributes I look for in a primary key:

  • It’s compact: just 16 bytes.
  • It’s guaranteed to be unique.
  • It’s not a business value like an SSN or an account number. This is important because once a primary key value is assigned it shouldn’t change, and business values – even SSN’s – can change.
  •  
    This post shows how to use a GUID in Oracle, SQL Server, and MySQL. A GUID is a 16-byte binary value that should always be unique. Oracle, SQL Server, and MySQL all have functions that will generate a GUID. Their functions vary, as do the data types used to represent the GUID.

    GUID Values in Oracle

    The Oracle function to generate a GUID is SYS_GUID. It returns a 16-byte RAW value:

    SELECT SYS_GUID() FROM DUAL;
    
    SYS_GUID()
    --------------------------------
    96A8B17B17FB464B886146A28399E7E8
    

    To store a GUID in a table, use the RAW(16) data type:

    CREATE TABLE myTable (
      myID RAW(16) NOT NULL,
      myVal VARCHAR2(20));
    
    INSERT INTO myTable VALUES (SYS_GUID(), 'First value');
    INSERT INTO myTable VALUES (SYS_GUID(), 'Second value');
    
    SELECT * FROM myTable;
    
    MYID                             MYVAL
    -------------------------------- --------------------
    F772F8FF55BA4CBCB4400DF5565EF39F First value
    8B33954DCEEC4AB58A77418FF1665EF9 Second value
    

    Oracle will implicitly cast the raw GUID value to a string when needed, so to query for a specific GUID just put its value in quotes:

    SELECT * FROM myTable
    WHERE myID = '8B33954DCEEC4AB58A77418FF1665EF9';
    
    MYID                             MYVAL
    -------------------------------- --------------------
    8B33954DCEEC4AB58A77418FF1665EF9 Second value

    The SYS_GUID function can be used for a column’s default value:

    CREATE TABLE myTable (
      myID RAW(16) DEFAULT SYS_GUID() NOT NULL,
      myVal VARCHAR2(20));
    
    INSERT INTO myTable (myVal) VALUES ('Third value');
    
    SELECT * FROM myTable;
    
    MYID                             MYVAL
    -------------------------------- --------------------
    F772F8FF55BA4CBCB4400DF5565EF39F First value
    8B33954DCEEC4AB58A77418FF1665EF9 Second value
    250449AD17E14B1EBD66DFAF7668BD20 Third value
    

    Note that Oracle’s default display mode for the GUID is 32 hex characters, without dashes.

    When to Assign a GUID Automatically

    Assigning the GUID automatically using DEFAULT may seem convenient, but it has one important drawback: there’s no way to read back the generated GUID. If you’re writing rows to “master” and “detail” tables (for example Orders followed by Order Detail), you need to capture the “master” primary key so you can duplicate it in the “detail” rows. You can’t do that with this approach.

    On the other hand, this works really well for tables that don’t have a detail table under them – in other words it works in cases where you don’t care what the assigned GUID’s value is.

    GUID Values in SQL Server

    The SQL Server function to generate a GUID is NEWID. It returns a value of type UNIQUEIDENTIFIER:

    SELECT NEWID()
    
    (No column name)
    ------------------------------------
    4E2D5E47-5B77-4FD9-B7C0-5A88B2C662BA

    To store a GUID in a table, use the UNIQUEIDENTIFIER data type:

    CREATE TABLE myTable (
      myID UNIQUEIDENTIFIER NOT NULL,
      myVal VARCHAR(20))
    
    INSERT INTO myTable VALUES (NEWID(), 'First value')
    INSERT INTO myTable VALUES (NEWID(), 'Second value')
    
    SELECT * FROM myTable
    
    myID                                 myVal
    ------------------------------------ ------------
    6028D29A-4C90-4A2F-9400-0796F1AD3F54 First value
    B906D682-7F54-43DD-80A6-D0E6003DB6D6 Second value

    SQL Server will implicitly cast a formatted GUID to a UNIQUEIDENTIFIER value when needed, so to query for a specific GUID just add standard dash formatting and put the value in quotes:

    SELECT * FROM myTable
    WHERE myID = '6028D29A-4C90-4A2F-9400-0796F1AD3F54'
    
    myID                                 myVal
    ------------------------------------ -----------
    6028D29A-4C90-4A2F-9400-0796F1AD3F54 First value
    

    The NEWID function can be used for a column’s default value:

    CREATE TABLE myTable (
      myID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
      myVal VARCHAR(20))
    
    INSERT INTO myTable (myVal) VALUES ('My value');
    
    SELECT * FROM myTable;
    
    MYID                                 MYVAL
    ------------------------------------ --------------------
    94CD8824-BDAC-40FA-9F53-6A3AFAA2015D My value
    

    Note that SQL Server’s default display mode for the GUID uses standard formatting with dashes after the 8th, 12th, 16th, and 20th digits. The Formatting GUID Values section summarizes how to format a GUID in Oracle, SQL Server, and MySQL.

    Also note that automatically assigning the GUID isn’t always the best approach. See section When to Assign a GUID Automatically for more information.

    GUID Values in MySQL

    The MySQL function to generate a GUID is UUID. It returns a value of type CHAR(36): 32 characters (all valid hex digits) with four dashes for standard formatting.

    SELECT UUID();
    
    a99208ab-636e-11e3-a647-f0def1feb9e8

    The most efficient storage for the GUID is a BINARY(16) column, which means stripping out the dashes and converting the remaining digits to a true hex value.

    CREATE TABLE myTable (
      myID BINARY(16) NOT NULL,
      myVal VARCHAR(20))
    
    INSERT INTO myTable VALUES (UNHEX(REPLACE(UUID(), '-', '')), 'First value');
    INSERT INTO myTable VALUES (UNHEX(REPLACE(UUID(), '-', '')), 'Second value');
    
    SELECT * FROM myTable;
    
    myID   myVal
    ------ ------------
    [BLOB] First value
    [BLOB] Second value
    

    Note that the default display for a BINARY value is [BLOB]. To see the actual value, wrap it with the HEX function:

    SELECT HEX(myID), myVal FROM myTable;
    
    HEX(myID)                        myVal
    -------------------------------- ------------
    CD1BF132638011E3A647F0DEF1FEB9E8 First value
    CF6C70C6638011E3A647F0DEF1FEB9E8 Second value
    

    MySQL won’t let you use the UUID function to define a default value, so if you want to automatically assign a GUID value you’ll need to write a trigger. Note that the trigger isn’t the complete solution, as I’ll explain after this example:

    -- This goes along with the table definition above.
    -- Also note that there's a slight problem here; it's
    -- explained after this example.
    CREATE TRIGGER myTable_Assign_UUID
      BEFORE INSERT ON myTable
      FOR EACH ROW
      SET NEW.myID = UNHEX(REPLACE(UUID(), '-', ''));

    With the trigger in place, if you try an insert the GUID value will be assigned automatically:

    INSERT INTO myTable (myVal) VALUES ('Third value');
    
    SELECT HEX(myID), myVal FROM myTable;
    
    HEX(myID)                        myVal
    -------------------------------- ------------
    CD1BF132638011E3A647F0DEF1FEB9E8 First value
    CF6C70C6638011E3A647F0DEF1FEB9E8 Second value
    ECAA4BF4638011E3A647F0DEF1FEB9E8 Third value
    

    The problem with the trigger is that myID is a NOT NULL column and the INSERT statement doesn’t specify a value for it. Because of this, MySQL will insert the row but it will also issue a warning.

    Warnings are best avoided, so to get an automatically assigned GUID value without warnings, specify a default value for the column definition. Any valid value will do because it will be preempted by the trigger-assigned value, and having the DEFAULT will get rid of the warning. Here are the full table and trigger declarations; the only change from above is the DEFAULT clause for the myID column:

    CREATE TABLE myTable (
      myID BINARY(16) NOT NULL DEFAULT '0',
      myVal VARCHAR(20));
    
    CREATE TRIGGER myTable_Assign_UUID
      BEFORE INSERT ON myTable
      FOR EACH ROW
      SET NEW.myID = UNHEX(REPLACE(UUID(), '-', ''));
    

    MySQL normally performs implicit casts quite freely (some would claim excessively), but not for a BINARY value. To look for a specific GUID you’ll need to convert the string to hex:

    SELECT HEX(myID), myVal FROM myTable
    WHERE myID = UNHEX('CD1BF132638011E3A647F0DEF1FEB9E8');
    
    HEX(myID)                        myVal
    -------------------------------- ------------
    CD1BF132638011E3A647F0DEF1FEB9E8 First value


    Formatting GUID Values

    Primary keys don’t need to be displayed as often as business values, but there will be times when you want to have them in non-binary format. One common scenario is when the primary key is included in a URL to a dynamic web page. Of course the URL should be encrypted in some way, but the encryption should start with a canonical representation of the primary key value.

    Here’s how to pull a GUID stored as a binary into an unformatted (no dashes) 32-character hex string.

    Database Storage type How to convert
    Oracle RAW(16) RAWTOHEX(binary-guid)
    CAST(binary-guid AS CHAR(32))
    SQL Server 1 UNIQUEIDENTIFIER REPLACE(binary-guid, '-', '')
    MySQL BINARY(16) HEX(binary-guid)

    1 The SQL Server UNIQUEIDENTIFIER data type is specifically designed to hold a GUID, so it will implicitly cast a UNIQUEIDENTIFIER value to a formatted GUID when it’s referenced as a string.

    Note that the MySQL UUID function returns hex digits A through F as lowercase, whereas Oracle’s SYS_GUID and SQL Server’s NEWID functions return them as uppercase. If your database is all in MySQL then that’s not a problem. If your database is distributed across MySQL and Oracle and/or SQL Server, you should force MySQL’s GUIDs to uppercase by calling UPPER(UNHEX(REPLACE(UUID(), '-', ''))) when you generate a GUID. That’s a mouthful, but it’s much easier than forcing Oracle or SQL Server to lowercase.

    To convert a binary GUID to a formatted hex string takes a bit more work, at least in Oracle and MySQL. The following are the shortest methods I could come up with:

    Database Storage type How to convert
    Oracle 1 RAW(16) REGEXP_REPLACE(binary-guid, ‘(.{8})(.{4})(.{4})(.{4})(.*)’, ‘\1-\2-\3-\4-\5’)()
    SQL Server UNIQUEIDENTIFIER CONVERT(CHAR(36), binary-guid)
    MySQL 2 BINARY(16) CONCAT_WS('-',
      SUBSTR(HEX(binary-guid), 1, 8),
      SUBSTR(HEX(binary-guid), 9, 4),
      SUBSTR(HEX(binary-guid), 13, 4),
      SUBSTR(HEX(binary-guid), 17, 4),
      SUBSTR(HEX(binary-guid), 21))

    1 This approach is also safe to call on the SYS_GUID function because REGEXP_REPLACE(SYS_GUID(), ...) will call SYS_GUID only once. If you use something like SUBSTR(SYS_GUID(), 1, 8) || ‘-‘ || SUBSTR(SYS_GUID(), 9, 4) || ..., you’d need to call SYS_GUID five times, and each call would return a different GUID.
    2 This approach can’t be used with the UUID function because it would call the function five times, and each call would return a different GUID.

    2 thoughts on “GUID Values as Primary Keys

    1. Clayton Hoyt

      Hi

      I can only speak from a SQL Server perspective but with that in mind here are my thoughts.

      – It’s hard for me to see that 16 bytes is compact when compared to an INT or even a BIGINT.
      – If a GUID is used as your PK and the clustered index (a common convention in SQL Server), you can count on extremely high page splits with associated index fragmentation.
      – The clustered index is a part of every index created on its table. If a GUID used as such, you will have an additional 16 bytes added to each index you create, making it extremely heavy.

      Personally, the only time I ever use a GUID in practice is when there will be some type of data synchronization between databases (e.g. home grown replication).

      Clay

      Reply
    2. ed@lakenine.com Post author

      Thank you Clay, these are all excellent points! I wrote this after working on a project that spanned multiple databases (most of which were external to the customer), and we used GUID values as our PK’s. We didn’t cluster the indexes on our end because we were using Oracle, and its kind-of parallel to SQL Server’s clustered indexes is an index-organized table, which (a) would cause index fragmentation with a GUID just like in SQL Server (except maybe worse) and (b) isn’t quite enough like a clustered index to use lightly, even with an “auto increment” numeric primary key.

      I’ve been super busy on a project (which is why no new posts in the past 6 months) but I’ll make it a point to review this post and make sure I’m not promising the world with GUIDs. At the very least I’ll make sure your caveats are front and center because, well, you’re right. GUIDs were an excellent choice for my last project, but on my current project (SQL Server 2012 only, one extremely large and busy OLTP database) I’m using… BIGINT PK columns with clustered indexes. So the pendulum swings back the other way :)

      Reply

    Leave a Reply

    Your email address will not be published. Required fields are marked *