Database identifiers, quoting and case sensitivity

Almost every month I get a bug report for PEAR::MDB2 about identifiers (table and field names) not being quoted as expected. Or, when they are quoted, the query fails for unknown reasons. Most of the times, the problem is not within MDB2: there's simply a lot of confusion on how quoting the identifiers affects the table/field creation and the subsequent queries that reference them. I think it's time to dispell some myths and shed some light on the topic of identifier quoting and case sensitivity.

The SQL:2008 and SQL-99 and standards

The SQL:2008 and SQL-99 standards define databases to be case insensitive for identifiers unless they are quoted. Lower case characters may be used in identifiers and keywords, but are considered to be their upper case counterparts.
In other words, delimited identifiers are case sensitive ("table_name" != "Table_Name"), while non quoted identifiers are not, and are transformed to upper case (table_name => TABLE_NAME). The SQL-92 standard is even more explicit on how to handle lower/upper case letters in identifiers, while I find it a bit less clear on how delimited identifiers should be treated.

Now, as everyone knows, not all the DBMS vendors adhere to the SQL standards in a strict way. Let's see how each one of them interprets it on this issue.

  1. First, I'm going to create a table without quoting the name, and then to create some other tables with the same name but with slight differences in the characters case, this time quoting the name.
  2. Then I'm going to do the same thing with some table fields.
  3. Finally, I'm going to run some queries to insert some data in those tables, with and without quoting the identifiers.

DB2, Oracle and Interbase/Firebird

DB2, Oracle and Interbase/Firebird are 100% compliant with the aforementioned standard. Their behavior is identical: the identifiers are treated in a case-sensitive way when quoted, and uppercased when not quoted.

-- test table names and case sensitivity
CREATE TABLE test_case (id INTEGER);   --ok, TEST_CASE table created
CREATE TABLE "Test_Case" (id INTEGER); --ok, Test_Case table created
CREATE TABLE "test_case" (id INTEGER); --ok, test_case table created
CREATE TABLE "TEST_CASE" (id INTEGER); --fails: already exists

-- test field names and case sensitivity
ALTER TABLE TEST_CASE ADD TeSt INTEGER;   --ok, TEST field created
ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --ok, TeSt field created
ALTER TABLE TEST_CASE ADD "test" INTEGER; --ok, test field created
ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --fails: already exists

-- test identifiers and case sensitivity
INSERT INTO Test_Case (id) VALUES (1);   --record inserted into TEST_CASE table
INSERT INTO "test_case" (id) VALUES (1); --record inserted into test_case table
INSERT INTO "Test_Case" (id) VALUES (1); --record inserted into Test_Case table
INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into TEST_CASE table
INSERT INTO "TEST_case" (id) VALUES (1); --fails: TEST_case table doesn't exists

One very important thing to note is that if you quote the table/field name when you create it, you MUST use quotes in every query, or at least you must reference them in upper case if you don't delimit them.

PostgreSQL

PostgreSQL has the same behaviour, with the only exception that lowercases every unquoted identifier, instead of uppercasing it.

-- test table names and case sensitivity
CREATE TABLE TEST_CASE (id INTEGER);   --ok, test_case table created
CREATE TABLE "Test_Case" (id INTEGER); --ok, Test_Case table created
CREATE TABLE "test_case" (id INTEGER); --fails: already exists
CREATE TABLE "TEST_CASE" (id INTEGER); --ok, TEST_CASE table created

-- test field names and case sensitivity
ALTER TABLE TEST_CASE ADD TeSt INTEGER;   --ok, test field created
ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --ok, TeSt field created
ALTER TABLE TEST_CASE ADD "test" INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --ok, TEST field created

-- test identifiers and case sensitivity
INSERT INTO Test_Case (id) VALUES (1);   --record inserted into test_case table
INSERT INTO "test_case" (id) VALUES (1); --record inserted into test_case table
INSERT INTO "Test_Case" (id) VALUES (1); --record inserted into Test_Case table
INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into TEST_CASE table
INSERT INTO "TEST_case" (id) VALUES (1); --fails: TEST_case table doesn't exists

MySQL

Since MySQL stores each table into a file with the same name in the file system, we can expect it to behave differently depending on the OS where it's installed.


On Windows:

-- test table names and case sensitivity
CREATE TABLE TEST_CASE (id INTEGER);   --ok, test_case table created
CREATE TABLE `Test_Case` (id INTEGER); --fails: already exists
CREATE TABLE `test_case` (id INTEGER); --fails: already exists
CREATE TABLE `TEST_CASE` (id INTEGER); --fails: already exists
 
ALTER TABLE TEST_CASE ADD TeSt INTEGER;   --ok, TeSt field created
ALTER TABLE TEST_CASE ADD `TeSt` INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD `test` INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD `TEST` INTEGER; --fails: already exists
 
INSERT INTO test_case (id) VALUES (1);   --record inserted into test_case table
INSERT INTO `test_case` (id) VALUES (1); --record inserted into test_case table
INSERT INTO `Test_Case` (id) VALUES (1); --record inserted into test_case table
INSERT INTO `TEST_CASE` (id) VALUES (1); --record inserted into test_case table
INSERT INTO `TEST_case` (id) VALUES (1); --record inserted into test_case table

On Unix/Linux:

-- test table names and case sensitivity
CREATE TABLE Test_Case (id INTEGER);   --ok, Test_Case created
CREATE TABLE `Test_Case` (id INTEGER); --fails: already exists
CREATE TABLE `test_case` (id INTEGER); --ok, test_case created
CREATE TABLE `TEST_CASE` (id INTEGER); --ok, TEST_CASE created
 
ALTER TABLE TEST_CASE ADD TeSt INTEGER;   --ok, TeSt field created
ALTER TABLE TEST_CASE ADD `TeSt` INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD `test` INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD `TEST` INTEGER; --fails: already exists
 
INSERT INTO Test_Case (id) VALUES (1);   --record inserted into Test_Case table
INSERT INTO `test_case` (id) VALUES (1); --record inserted into test_case table
INSERT INTO `Test_Case` (id) VALUES (1); --record inserted into Test_Case table
INSERT INTO `TEST_CASE` (id) VALUES (1); --record inserted into TEST_CASE table
INSERT INTO `TEST_case` (id) VALUES (1); --fails: TEST_case table doesn't exists

Some interesting observations: on Windows, the table name is lowercased if unquoted, while on *nix the case is preserved. That's because of the default value of the lower_case_table_names system variable, which is different on the two systems.
In all the platforms, the case in the field names is preserved, but no two columns are allowed in the same table with the same name (regardless the case of their characters).

SQLite and SQL Server

SQLite and SQL Server have an even more interesting behavior:

-- test table names and case sensitivity
CREATE TABLE TeSt_CaSe (id INTEGER);   --ok, TeSt_CaSe table created
CREATE TABLE "TeSt_CaSe" (id INTEGER); --fails: already exists
CREATE TABLE "TEST_CASE" (id INTEGER); --fails: already exists
CREATE TABLE "test_case" (id INTEGER); --fails: already exists
 
ALTER TABLE TEST_CASE ADD TeSt INTEGER;   --ok, TeSt created
ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD "test" INTEGER; --fails: already exists
ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --fails: already exists
 
INSERT INTO test_case (id) VALUES (1);   --record inserted into Test_CaSe table
INSERT INTO "test_case" (id) VALUES (1); --record inserted into Test_CaSe table
INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into Test_CaSe table
INSERT INTO "TeSt_CaSe" (id) VALUES (1); --record inserted into Test_CaSe table
INSERT INTO "TEST_case" (id) VALUES (1); --record inserted into Test_Case table

The case of the table and field names are preserved on creation, but they are completely ignored afterwards, that is you can't create two tables (or two fields in the same table) whose only difference in their names is the capitalisation of the characters. What I find really curious is that if you quote the identifier, even if you use a different capitalisation than the one used at creation time, the table/field is always referenced in a case-insensitive way.

Conclusion

No wonders that case sensitivity of identifiers causes so much confusion! Since every DMBS seems to implement the SQL standards in a different way, you can't expect an uniform behavior. The safest choice if you care about portability and peace of mind is not to quote the identifiers when you create your tables/fields and when you run your queries.

I hope now it's clear why the quote_identifier option in MDB2 is disabled by default. Now, I can hear the complaints coming... "But I *need* to quote my table/field name, because I'm using a reserved keyword...". Let me answer with another question: why on earth do you use a reserved keyword as identifier? ;-)

Thank you for your attention.

Update

Just after I finished writing this article, I found this other reference: delimited identifiers. Have a look. Also, in case of doubt, check if the name you want to use is a reserved keyword. MDB2_Schema can also check a name against a list of reserved keywords.




5 responses to "Database identifiers, quoting and case sensitivity"

The filesystem dependency of MySQL depends on the storage engine. MyISAM suffers from the case-sensitivity of the filesystem whereas InnoDB only depends on the lower_case_tablenames settings. I\'m not sure if InnoDB when running with one_file_per_table also suffers from the case sensitivity of the file system.

I use SQL Server a lot and find their approach more natural. However, I wanted to point out that its 'delimited' identifiers includes both bracketted, e.g. [x], and quoted, e.g. "x". I believe the rules are the same for both but quoted has to be enabled explicitly.
Their delimited identifiers copes with much more than mere case differences though. It allows special characters to be included and reserved words to be used (usually tables that have been previously named by a reserved word).
The bracketted form is also used extensively in Microsoft Analysis Services, and even appears in their old VB6 syntax.

Where is admin?! Thank for all

Hi.. very logical and helpful article.. and yes.. the questions like .. \\\"Wht if I would like to use a reserved keyword\\\" are hogwash...

While I agree that identifiers should be left unquoted when you are hard-coding them into your application, if you\'re writing an application where the identifiers can be set by the user then it pays to quote your identifiers to avoid things such as SQL injection attacks, or in case the user (who is out of your control) decides they want to use an identifier that just happens to be an SQL keyword.
Thanks for the article, it\'s a very helpful comparison of how various DBs deal with case in identifiers.

Lorenzo Alberton

Lorenzo Alberton Lorenzo PHP5 ZCE - Zend Certified Engineer has been working with large enterprise UK companies for the past years and is now Chief Tech Architect at DataSift. He's an international conference speaker and a long-time contributor to many open source projects. Lorenzo Alberton's profile on LinkedIN View Lorenzo Alberton's Twitter stream

Lorenzo Alberton - Sun Certified MySQL 5 Developer

Tags

AJAX, Apache, Book Review, Charset, Cheat Sheet, Data structures, Database, Firebird SQL, Hadoop, Imagick, INFORMATION_SCHEMA, JavaScript, Kafka, Linux, Message Queues, mod_rewrite, Monitoring, MySQL, NoSQL, Oracle, PDO, PEAR, Performance, PHP, PostgreSQL, Profiling, Scalability, Security, SPL, SQL Server, SQLite, Testing, Tutorial, TYPO3, Windows, Zend Framework

Buy me a book - Seeing Is Forgetting