Exploring database delimited identifiers and case sensitivity: the effect of quoting table and field names in the various DBMS.
Almost every month I get a bug report for PEAR::MDB2 about identifiers (table and field names) not being quoted as expected. Or, when they are quoted, the query fails for unknown reasons. Most of the times, the problem is not within MDB2: there's simply a lot of confusion on how quoting the identifiers affects the table/field creation and the subsequent queries that reference them. I think it's time to dispell some myths and shed some light on the topic of identifier quoting and case sensitivity.
The SQL:2008 and SQL-99 standards define databases to be case insensitive for identifiers unless they are quoted.
Lower case characters may be used in identifiers and keywords, but are considered to be their upper case counterparts.
In other words, delimited identifiers are case sensitive ("table_name" != "Table_Name"),
while non quoted identifiers are not, and are transformed to upper case (table_name => TABLE_NAME).
The SQL-92 standard is even more explicit on how to handle lower/upper case letters in identifiers, while I find it a bit less clear on how delimited identifiers should be treated.
Now, as everyone knows, not all the DBMS vendors adhere to the SQL standards in a strict way. Let's see how each one of them interprets it on this issue.
DB2, Oracle and Interbase/Firebird are 100% compliant with the aforementioned standard. Their behavior is identical: the identifiers are treated in a case-sensitive way when quoted, and uppercased when not quoted.
-- test table names and case sensitivity CREATE TABLE test_case (id INTEGER); --ok, TEST_CASE table created CREATE TABLE "Test_Case" (id INTEGER); --ok, Test_Case table created CREATE TABLE "test_case" (id INTEGER); --ok, test_case table created CREATE TABLE "TEST_CASE" (id INTEGER); --fails: already exists -- test field names and case sensitivity ALTER TABLE TEST_CASE ADD TeSt INTEGER; --ok, TEST field created ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --ok, TeSt field created ALTER TABLE TEST_CASE ADD "test" INTEGER; --ok, test field created ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --fails: already exists -- test identifiers and case sensitivity INSERT INTO Test_Case (id) VALUES (1); --record inserted into TEST_CASE table INSERT INTO "test_case" (id) VALUES (1); --record inserted into test_case table INSERT INTO "Test_Case" (id) VALUES (1); --record inserted into Test_Case table INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into TEST_CASE table INSERT INTO "TEST_case" (id) VALUES (1); --fails: TEST_case table doesn't exists
One very important thing to note is that if you quote the table/field name when you create it, you MUST use quotes in every query, or at least you must reference them in upper case if you don't delimit them.
PostgreSQL has the same behaviour, with the only exception that lowercases every unquoted identifier, instead of uppercasing it.
-- test table names and case sensitivity CREATE TABLE TEST_CASE (id INTEGER); --ok, test_case table created CREATE TABLE "Test_Case" (id INTEGER); --ok, Test_Case table created CREATE TABLE "test_case" (id INTEGER); --fails: already exists CREATE TABLE "TEST_CASE" (id INTEGER); --ok, TEST_CASE table created -- test field names and case sensitivity ALTER TABLE TEST_CASE ADD TeSt INTEGER; --ok, test field created ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --ok, TeSt field created ALTER TABLE TEST_CASE ADD "test" INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --ok, TEST field created -- test identifiers and case sensitivity INSERT INTO Test_Case (id) VALUES (1); --record inserted into test_case table INSERT INTO "test_case" (id) VALUES (1); --record inserted into test_case table INSERT INTO "Test_Case" (id) VALUES (1); --record inserted into Test_Case table INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into TEST_CASE table INSERT INTO "TEST_case" (id) VALUES (1); --fails: TEST_case table doesn't exists
Since MySQL stores each table into a file with the same name in the file system, we can expect it to behave differently depending on the OS where it's installed.
On Windows:
-- test table names and case sensitivity CREATE TABLE TEST_CASE (id INTEGER); --ok, test_case table created CREATE TABLE `Test_Case` (id INTEGER); --fails: already exists CREATE TABLE `test_case` (id INTEGER); --fails: already exists CREATE TABLE `TEST_CASE` (id INTEGER); --fails: already exists ALTER TABLE TEST_CASE ADD TeSt INTEGER; --ok, TeSt field created ALTER TABLE TEST_CASE ADD `TeSt` INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD `test` INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD `TEST` INTEGER; --fails: already exists INSERT INTO test_case (id) VALUES (1); --record inserted into test_case table INSERT INTO `test_case` (id) VALUES (1); --record inserted into test_case table INSERT INTO `Test_Case` (id) VALUES (1); --record inserted into test_case table INSERT INTO `TEST_CASE` (id) VALUES (1); --record inserted into test_case table INSERT INTO `TEST_case` (id) VALUES (1); --record inserted into test_case table
On Unix/Linux:
-- test table names and case sensitivity CREATE TABLE Test_Case (id INTEGER); --ok, Test_Case created CREATE TABLE `Test_Case` (id INTEGER); --fails: already exists CREATE TABLE `test_case` (id INTEGER); --ok, test_case created CREATE TABLE `TEST_CASE` (id INTEGER); --ok, TEST_CASE created ALTER TABLE TEST_CASE ADD TeSt INTEGER; --ok, TeSt field created ALTER TABLE TEST_CASE ADD `TeSt` INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD `test` INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD `TEST` INTEGER; --fails: already exists INSERT INTO Test_Case (id) VALUES (1); --record inserted into Test_Case table INSERT INTO `test_case` (id) VALUES (1); --record inserted into test_case table INSERT INTO `Test_Case` (id) VALUES (1); --record inserted into Test_Case table INSERT INTO `TEST_CASE` (id) VALUES (1); --record inserted into TEST_CASE table INSERT INTO `TEST_case` (id) VALUES (1); --fails: TEST_case table doesn't exists
Some interesting observations: on Windows, the table name is lowercased if unquoted, while on *nix the case is preserved. That's because of the default value of the lower_case_table_names system variable, which is different on the two systems.
In all the platforms, the case in the field names is preserved, but no two columns are allowed in the same table with the same name (regardless the case of their characters).
SQLite and SQL Server have an even more interesting behavior:
-- test table names and case sensitivity CREATE TABLE TeSt_CaSe (id INTEGER); --ok, TeSt_CaSe table created CREATE TABLE "TeSt_CaSe" (id INTEGER); --fails: already exists CREATE TABLE "TEST_CASE" (id INTEGER); --fails: already exists CREATE TABLE "test_case" (id INTEGER); --fails: already exists ALTER TABLE TEST_CASE ADD TeSt INTEGER; --ok, TeSt created ALTER TABLE TEST_CASE ADD "TeSt" INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD "test" INTEGER; --fails: already exists ALTER TABLE TEST_CASE ADD "TEST" INTEGER; --fails: already exists INSERT INTO test_case (id) VALUES (1); --record inserted into Test_CaSe table INSERT INTO "test_case" (id) VALUES (1); --record inserted into Test_CaSe table INSERT INTO "TEST_CASE" (id) VALUES (1); --record inserted into Test_CaSe table INSERT INTO "TeSt_CaSe" (id) VALUES (1); --record inserted into Test_CaSe table INSERT INTO "TEST_case" (id) VALUES (1); --record inserted into Test_Case table
The case of the table and field names are preserved on creation, but they are completely ignored afterwards, that is you can't create two tables (or two fields in the same table) whose only difference in their names is the capitalisation of the characters. What I find really curious is that if you quote the identifier, even if you use a different capitalisation than the one used at creation time, the table/field is always referenced in a case-insensitive way.
No wonders that case sensitivity of identifiers causes so much confusion!
Since every DMBS seems to implement the SQL standards in a different way, you can't expect an uniform behavior.
The safest choice if you care about portability and peace of mind is not to quote the identifiers when you create your tables/fields and when you run your queries.
I hope now it's clear why the quote_identifier option in MDB2 is disabled by default.
Now, I can hear the complaints coming... "But I *need* to quote my table/field name, because I'm using a reserved keyword...".
Let me answer with another question: why on earth do you use a reserved keyword as identifier? ;-)
Thank you for your attention.
Just after I finished writing this article, I found this other reference: delimited identifiers. Have a look. Also, in case of doubt, check if the name you want to use is a reserved keyword. MDB2_Schema can also check a name against a list of reserved keywords.
Lorenzo
Leave a reply