CHAR and VARCHAR

Recently I came across an error in a solution that included the conversion:

CAST(model AS VARCHAR)

Those who have studied the “Computers” schema may consider a conversion to the very same type absurd (the model column is defined as VARCHAR(50)). However, it was exactly this conversion that made the query wrong.

The fact is that if the size is not specified in the conversion, SQL Server falls back to a default, which for VARCHAR is 30. Thus, if the converted string is longer, everything beyond the first 30 characters is cut off. No error occurs; the proposed solution simply gave a wrong result for “long” model numbers. As they say in such cases, read the documentation. Still, it is interesting what the Standard has to say on this point.
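For reference, the fix for the original query is simply to state the length explicitly. A minimal sketch, assuming the model column comes from the PC table of the “Computers” schema mentioned above:

-- CAST(model AS VARCHAR) defaults to VARCHAR(30) and may silently truncate;
-- stating the declared length VARCHAR(50) preserves the full value
SELECT CAST(model AS VARCHAR(50)) AS model
FROM PC;

With the length written out, model numbers longer than 30 characters survive the conversion intact.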

According to the standard, if the size is not specified for the CHAR and VARCHAR types, CHAR(1) and VARCHAR(1) are implied, respectively. Let’s check how the DBMSs available to me conform to the standard: SQL Server, MySQL, and PostgreSQL.

There are two aspects here:

  1. Type conversion

  2. Using the CHAR/VARCHAR types to define the schema (DDL)

Let’s start with type conversion.

SQL Server 2008

SELECT
CAST('11111111112222222222333333333344444444445555555555' AS CHAR) chr,
CAST('11111111112222222222333333333344444444445555555555' AS VARCHAR) vchr

The result:

chr                             vchr
111111111122222222223333333333  111111111122222222223333333333

That is, both character types are truncated to the default size of 30. No error message is raised, which, as a matter of fact, agrees with the standard.

PostgreSQL 8.3

SELECT
CAST('11111111112222222222333333333344444444445555555555' AS CHAR) AS chr,
 CAST('11111111112222222222333333333344444444445555555555' AS VARCHAR) AS vchr
chr  vchr
1    11111111112222222222333333333344444444445555555555

Compliance with the standard is only partial: the CHAR type matches it. As for VARCHAR, according to the documentation, if character varying is used without a length specifier, the type accepts strings of any size, i.e. no truncation occurs.

MySQL 5.0

As the saying goes, the further in, the “curiouser”. It turns out that MySQL does not support conversion to the VARCHAR type at all, so only the conversion to CHAR remains to be checked:

SELECT CAST('11111111112222222222333333333344444444445555555555' AS CHAR) chr
chr
11111111112222222222333333333344444444445555555555

I.e., the string is not truncated, although the documentation says: “When n is not specified when using the CAST and CONVERT functions, the default length is 30”.
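When a length is given explicitly, MySQL does honor it. A sketch, assuming the truncation to 30 characters is reported as a warning rather than an error:

-- explicit length in the cast: only the first 30 characters remain
SELECT CAST('11111111112222222222333333333344444444445555555555' AS CHAR(30)) chr;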

Let us now see what happens with data definition (DDL). Below is a test script.

CREATE TABLE Test_char(
chr CHAR,
vchr VARCHAR
);
DELETE FROM Test_char;
INSERT INTO Test_char
VALUES
('1','11111111112222222222333333333344444444445555555555');
INSERT INTO Test_char
VALUES
('11111111112222222222333333333344444444445555555555', '1');
INSERT INTO Test_char
VALUES
('2',CAST('111111111122222222223333333333' AS VARCHAR));
INSERT INTO Test_char
VALUES
(CAST('111111111122222222223333333333' AS CHAR), '2');
INSERT INTO Test_char
VALUES
('3', '3');
SELECT * FROM Test_char;

SQL Server 2008

chr  vchr
3    3

So, only one row was inserted: the one with a single character in each column. When inserting the other rows, we get the error message:

String or binary data would be truncated. The statement has been terminated.
i.e., we are told to reduce the length of the inserted data.

Although this complies with the standard, it seems contradictory to me that an explicit cast to the type of the table column does not help:

INSERT INTO Test_char
 VALUES
(CAST('111111111122222222223333333333' AS CHAR), '2')

PostgreSQL 8.3

chr  vchr
1    11111111112222222222333333333344444444445555555555
2    111111111122222222223333333333
1    2
3    3

Note the consistency of the behavior: VARCHAR accepts strings of arbitrary size; the second row was not inserted because of a size violation (ERROR: value too long for type character(1)); and an explicit conversion of a value to the column type works, cutting off the extra characters on the right.

MySQL 5.0

MySQL does not support the VARCHAR type without a size specifier in DDL, while CHAR corresponds to CHAR(1), as in the standard. Since the explicit conversion to CHAR leaves the length of the string unchanged, for a table defined as

CREATE TABLE Test_char(
chr CHAR,
vchr VARCHAR(1)
);

only a single row will be added as a result, just as in SQL Server:

chr  vchr
3    3

Conclusions. In my humble opinion, none of these DBMSs follows the standard behavior when the type size is not specified. PostgreSQL seems to me the most consistent in its “particular implementation”. For the sake of code portability, I recommend always specifying the size explicitly.
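Following that recommendation, the test script above can be rewritten with explicit sizes. A sketch that should behave the same way in all three DBMSs:

CREATE TABLE Test_char(
chr CHAR(1),           -- explicit: one character
vchr VARCHAR(50)       -- explicit: up to 50 characters
);
INSERT INTO Test_char
VALUES
('1', '11111111112222222222333333333344444444445555555555');
INSERT INTO Test_char
VALUES
('3', '3');

With the sizes stated, both rows fit the declared columns, and no DBMS has to fall back on its own defaults.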

Let’s consider the following code.

DECLARE @ch AS VARCHAR(2)='1';
DECLARE @t TABLE(ch VARCHAR(2));
insert into @t values('123');
select @ch=CONCAT(@ch,'2','3');
select @ch "var",(select ch from @t) "col";
var  col
12   NULL

Here we define a simple variable of the VARCHAR(2) data type and a table variable with a single column of the same type, VARCHAR(2).

Running this code truncates the variable’s value to the declared size, whereas inserting the row into the table raises an error because the value exceeds the column size:

String or binary data would be truncated.

This explains the result obtained: NULL in the col column, i.e. the data was never inserted into the table.

Speaking of character data, a remark should be made about the available sizes.

SELECT LEN(REPLICATE('a',100000)) "varchar"
,LEN(REPLICATE(CAST('a' AS NVARCHAR),100000)) "nvarchar"
,LEN(REPLICATE(CAST('c' AS VARCHAR(MAX)),100000)) "varchar_max";
varchar  nvarchar  varchar_max
8000     4000      100000

This example shows that plain character types are truncated to the size of a SQL Server data page, 8000 bytes (the NVARCHAR type uses 2 bytes per character, hence 4000 characters). The VARCHAR(MAX) data type allows storing strings of up to 2 GB.

This is particularly important when concatenating strings whose resulting size is not known beforehand, since silent truncation can distort an analysis:

SELECT LEN(CONCAT(REPLICATE('a', 7000), REPLICATE('b', 7000))) ch;
ch
8000

Above, two strings of 7000 characters each are concatenated. As a result we get a string of only 8000 characters: the excess characters were truncated without any error message.
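If the full result is needed, it is enough to cast one operand to VARCHAR(MAX) beforehand: when any argument of a concatenation has the MAX type, the result is VARCHAR(MAX) as well. A sketch:

-- one operand cast to VARCHAR(MAX) lifts the 8000-byte page limit
SELECT LEN(CONCAT(CAST(REPLICATE('a', 7000) AS VARCHAR(MAX)),
                  REPLICATE('b', 7000))) ch;
-- ch = 14000, nothing is truncated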