Referential integrity: FOREIGN KEY
A foreign key is a constraint that enforces the consistency of two tables, thus ensuring their so-called referential integrity. This kind of integrity means it’s always possible to get full information about an object even if this information is distributed among several tables. The reasons for such a distribution lie in the relational model design principles, and will be examined later.
The relationship between tables isn’t equivalent. There’s always a parent and a child table. There are two types of relationship: one-to-one and one-to-many. A one-to-one relationship means that a row in the parent table has no more than one (that is, one or none) corresponding row in the child table. For a one-to-many relationship, a row in the parent table may refer to any number (including zero) of rows in the child table.
The relationship is established by having equal values of certain columns in the parent and child tables. The column (or a set of columns in the case of a compound key) in the child table lined to a column (or a set of columns) in the parent table is called a foreign key.
Since in any relationship, the parent table is on the “one” side, its column involved in the foreign key relationship should have a PRIMARY KEY or UNIQUE constraint applied. And the foreign key is created by means of the FOREIGN KEY specification:
FOREIGN KEY(< list of columns 1 >
REFERENCES < name of the parent table >(< list of columns 2 >)
The number of columns in the lists 1 and 2 must be equal, and their data types must be compatible for each pair of columns.
That’s how a foreign key for the PC table can be created:
ALTER TABLE PC
ADD CONSTRAINT fk_pc_product
FOREIGN KEY(model) REFERENCES Product(model);
Note: for the parent table, the column in parentheses needn’t to be specified if it is a primary key, since there can be only one primary key in a table. That’s exactly our case; therefore, the last line can be rewritten as follows:
FOREIGN KEY(model) REFERENCES Product;
Foreign keys for the Printer and Laptop tables are created in a similar manner.
Now it’s time to gain an understanding how the foreign key constraint works.
Since this constraint ensures the consistency of data in two tables, it prevents the appearance of rows in the child table for which no corresponding rows exist in the parent table. Discordance could arise as a result of one of the following actions:
1. Adding a row that doesn’t have a corresponding record in the parent table, to the child table. In our case, respective foreign keys won’t allow adding a product to any of the descriptive tables (PC, Laptop, or Printer) whose model isn’t present in the Product table. For instance, the attempt to execute the statement
insert into pc values(13, 1126, 500, 64, 10, '24x', 650);
will produce an error, since the model 1126 doesn’t exist in the Product table:
2. Setting the foreign key attribute of an existing record to a value that isn’t present in the corresponding column of the parent table. In our database, the constraint won’t allow the following UPDATE statement, returning a similar error:
update pc set model = 1126 where model = 1121;
3. Deleting a row in the parent table, for which associated records exist in the child table. Here, data consistency can be ensured in different ways, as specified by option in the non-compulsory clause
ON DELETE <опция>
Following option values are possible:
- CASCADE - cascade deletion, that is, when a row is deleted from the parent table all corresponding rows from the child table will also be deleted. For instance, on deletion of model 1121 from the Product table, the rows with codes 2, 4, and 5 will be deleted from the PC table;
- SET NULL - on deletion of a row from the parent table the foreign key attribute of the corresponding rows in the child table becomes indeterminate. Obviously, this setting implies there is no NOT NULL constraint applied to the foreign key. In our example of deleting model 1121 from the Product table, the model column in the PC table will be set to NULL for the rows with codes 2, 4, and 5;
- SET DEFAULT – similar to the previous setting, but instead of NULL, the foreign key is set to the default value;
- NO ACTION (default setting) – the statement won’t be executed if there are corresponding rows in the child table for the record to be deleted. If there are no corresponding rows, the deletion will be carried out normally.
Since we haven’t specified any ON DELETE clause when creating the foreign key for the PC table, NO ACTION will be used, because it’s the default option. To change the database behavior to, say, cascade deletion, we have to revise the foreign key constraint. We can do it as follows:
- delete the existing constraint;
- create a new constraint.
For constraint removal, the ALTER TABLE statement is used, as well:
ALTER TABLE < table name >
DROP CONSTRAINT < constraint name >;
That’s where the constraint name becomes necessary! Let’s remove the foreign key from the PC table.
ALTER TABLE PC
DROP CONSTRAINT fk_pc_product;
Note
When the foreign key is removed, the columns it is applied to are not deleted, just the constraint itself is dropped. This holds true for other constraints, as well.
Now, we create a new constraint making use of cascade delete:
ALTER TABLE PC
ADD CONSTRAINT fk_pc_product
FOREIGN KEY(model) REFERENCES Product ON DELETE CASCADE;
4. Changes of column values in the parent table the foreign key in the child table refers to, that is, of columns listed in the REFERENCE clause of the FOREIGN KEY constraint. Here, the same behavior patterns are possible as when a row is deleted, but option is introduced by the clause
ON UPDATE < option >
By means of the foreign key, as well as of other constraints, we model connections existing in the subject area. Therefore, the choice of the constraint behavior options is stipulated exactly by the subject area. In our case, it seems natural to create a constraint with CASCADE for changing/deleting the model number in the Product table, so that such a change propagates to the descriptive tables, say, deleting articles of the void model. In other words, for the PC table, we should define the foreign key constraint as follows:
ALTER TABLE PC
ADD CONSTRAINT fk_pc_product
FOREIGN KEY(model) REFERENCES Product
ON DELETE CASCADE
ON UPDATE CASCADE;
However, for a different subject area cascade deletion may result in data loss. Say, we have an Employees and a Departments table linked together by the department number, and on deletion (breaking-up) of a department employees working there aren’t dismissed but transferred to other departments. In this case, cascade deletion would lead to an undesired removal of information on employees who have been working in the division broken up. Here, NO ACTION (first assign employees to other divisions, then delete the “emptied” department) or SET NULL (first delete the department, then find new jobs for employees not assigned to any department) seems more appropriate. I’d like to repeat one more time – the choice of the constraint behavior depends not on the programmer’s preferences but on processes taking place in the real world being modeled.
Notes
1. In the example above, we established a one-to-many relationship between the Product and PC tables. A one-to-one relationship is created when a unique column or column combination represents the foreign key for the child table. In a number of cases, such a relationship can be attributed to flawed database design, since an entity is practically divided in two. However, in some situations there are forcible reasons for such a separation – for instance, if a vertical table partitioning needs to be done to improve performance or because of security considerations.
2. To delete a constraint, one needs to know its name. However, as we’ve already seen, a constraint can be created without being explicitly assigned a name. What is to be done in such a situation? Well, if we don’t explicitly specify a name it’s generated by the system. Thus, the constraint always has a name – whether we know it, is another question. This is an appropriate point to say that relational databases store metadata exactly the same way as “ordinary” data – namely, in tables. By standard, metadata is represented by an information schema, and can be retrieved from it by an ordinary SELECT query. Without going into detail, let’s write a query getting us the foreign key constraint name for the PC table:
SELECT CONSTRAINT_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_NAME ='PC' AND CONSTRAINT_TYPE ='FOREIGN KEY';