Functional dependency
In relational database theory, a functional dependency is the following constraint between two attribute sets in a relation: Given a relation R and attribute sets X, Y ⊆ R, X is said to functionally determine Y (written X → Y) if each X value is associated with precisely one Y value. R is then said to satisfy the functional dependency X → Y. Equivalently, the projection Π_XY(R) is a function, that is, Y is a function of X.[1][2] In simpler terms, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if Y is a subset of X.
In other words, a dependency FD: X → Y means that the values of Y are determined by the values of X. Two tuples sharing the same values of X will necessarily have the same values of Y.
The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization.
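A notion of logical implication is defined for functional dependencies: a set of functional dependencies F logically implies a dependency X → Y (written F ⊨ X → Y) if every relation that satisfies all dependencies in F also satisfies X → Y.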
Examples
Cars
Suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have one engine.) On the other hand, EngineCapacity → VIN is incorrect because there could be many vehicles with the same engine capacity.
This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity then that would not result in a normalized relation.
Lectures
This example illustrates the concept of functional dependency. The situation modelled is that of college students attending one or more lectures, in each of which they are assigned a teaching assistant (TA). Let us further assume that every student is in some semester and is identified by a unique integer ID.
Student ID | Semester | Lecture | TA |
---|---|---|---|
1234 | 6 | Numerical Methods | John |
1221 | 4 | Numerical Methods | Smith |
1234 | 6 | Visual Computing | Bob |
1201 | 2 | Numerical Methods | Peter |
1201 | 2 | Physics II | Simon |
We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same Semester values. This basic fact can be expressed by a functional dependency:
- StudentID → Semester.
If a row were added in which the same student had a different Semester value, the functional dependency would no longer hold. In this sense the FD is inferred from the data at hand: it remains possible for new values to invalidate it.
Other nontrivial functional dependencies can be identified, for example:
- {StudentID, Lecture} → TA
- {StudentID, Lecture} → {TA, Semester}
The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.
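The dependencies above can be checked mechanically against the table. The following is a minimal sketch in Python, assuming rows are represented as dictionaries keyed by attribute name; the helper name `satisfies_fd` is hypothetical and introduced only for illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}  # maps each observed lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value associated with two different Y values
    return True


# The Lectures table from the example above.
lectures = [
    {"StudentID": 1234, "Semester": 6, "Lecture": "Numerical Methods", "TA": "John"},
    {"StudentID": 1221, "Semester": 4, "Lecture": "Numerical Methods", "TA": "Smith"},
    {"StudentID": 1234, "Semester": 6, "Lecture": "Visual Computing", "TA": "Bob"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Numerical Methods", "TA": "Peter"},
    {"StudentID": 1201, "Semester": 2, "Lecture": "Physics II", "TA": "Simon"},
]

print(satisfies_fd(lectures, ["StudentID"], ["Semester"]))                   # True
print(satisfies_fd(lectures, ["StudentID", "Lecture"], ["TA", "Semester"]))  # True
print(satisfies_fd(lectures, ["Lecture"], ["TA"]))                           # False
```

Such a check only shows that the current rows are consistent with a dependency; whether the dependency is a genuine constraint is a design decision.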
Employee department
A classic example of functional dependency is the employee department model.
Employee ID | Employee name | Department ID | Department name |
---|---|---|---|
0001 | John Doe | 1 | Human Resources |
0002 | Jane Doe | 2 | Marketing |
0003 | John Smith | 1 | Human Resources |
0004 | Jane Goodall | 3 | Sales |
This case represents an example where multiple functional dependencies are embedded in a single representation of data. Note that because an employee can only be a member of one department, the unique ID of that employee determines the department.
- Employee ID → Employee Name
- Employee ID → Department ID
In addition to this relationship, the table also has a functional dependency through a non-key attribute:
- Department ID → Department Name
This example demonstrates that even though the FD Employee ID → Department ID exists, the employee ID is not a logical key for determining the department name, which depends on the employee only through the department ID. The process of normalization would recognize all of these FDs and allow the designer to construct tables and relationships that follow the data more logically.
Properties and axiomatization of functional dependencies
Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important are the following, usually called Armstrong's axioms:[3]
- Reflexivity: If Y is a subset of X, then X → Y
- Augmentation: If X → Y, then XZ → YZ
- Transitivity: If X → Y and Y → Z, then X → Z
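"Reflexivity" can be weakened to just X → ∅, i.e. it is an actual axiom, whereas the other two are proper inference rules.
These three rules are a sound and complete axiomatization of functional dependencies.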
By applying augmentation and transitivity, one can derive two additional rules:
- Pseudotransitivity: If X → Y and YW → Z, then XW → Z[3]
- Composition: If X → Y and Z → W, then XZ → YW[6]
One can also derive the union and decomposition rules from Armstrong's axioms:[3][7]
- X → Y and X → Z if and only if X → YZ
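As a sketch of how such derivations proceed, pseudotransitivity and the "if" direction of the union rule can be obtained from augmentation and transitivity alone. Attribute sets are written by juxtaposition, so XW stands for X ∪ W; the step labels are added here for illustration.

```latex
\begin{align*}
&\textbf{Pseudotransitivity} \\
&X \to Y   &&\text{(given)} \\
&XW \to YW &&\text{(augmentation with } W\text{)} \\
&YW \to Z  &&\text{(given)} \\
&XW \to Z  &&\text{(transitivity)} \\[6pt]
&\textbf{Union} \\
&X \to Y   &&\text{(given)} \\
&X \to XY  &&\text{(augmentation with } X\text{, since } XX = X\text{)} \\
&X \to Z   &&\text{(given)} \\
&XY \to YZ &&\text{(augmentation with } Y\text{)} \\
&X \to YZ  &&\text{(transitivity)}
\end{align*}
```

The converse (decomposition) follows from reflexivity, which gives YZ → Y and YZ → Z, together with transitivity.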
Closure
Closure of functional dependency
The closure of a set of attributes is the set of attributes that can be determined from it using the functional dependencies of a given relation. Armstrong's axioms (reflexivity, augmentation, transitivity) are used to prove that an attribute belongs to the closure.
Given a set of attributes U and a set of FDs F that holds in U: The closure of F in U (denoted F+) is the set of all FDs that are logically implied by F.[8]
Closure of a set of attributes
The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X using F+.
Example
Consider the following list of FDs, from which the closure of A (written A+) is calculated.
- (1) A → B
- (2) B → C
- (3) AB → D
The closure is derived as follows:
- (a) A → A (by Armstrong's reflexivity)
- (b) A → AB (by (1) and (a))
- (c) A → ABD (by (b), (3), and Armstrong's transitivity)
- (d) A → ABCD (by (c) and (2))
Therefore, A+ = ABCD. Because A+ includes every attribute in the relation, A is a superkey.
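The derivation above can be automated with a simple fixed-point iteration. The following is a minimal sketch, assuming each FD is given as a (left side, right side) pair and that attribute names are single characters, so that a string such as "AB" stands for the set {A, B}; the function name `attribute_closure` is hypothetical.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of `attrs` under `fds`.

    `fds` is an iterable of (lhs, rhs) pairs; each side is an iterable of
    attribute names (here: strings of single-character attributes).
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


fds = [("A", "B"), ("B", "C"), ("AB", "D")]
print(sorted(attribute_closure("A", fds)))  # ['A', 'B', 'C', 'D']
print(sorted(attribute_closure("B", fds)))  # ['B', 'C']
```

The loop terminates because the closure can only grow and is bounded by the set of all attributes mentioned in the FDs.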
Covers and equivalence
Covers
Definition: F covers G if every FD in G can be inferred from F. Equivalently, F covers G if G+ ⊆ F+.
Every set of functional dependencies has a canonical cover.
Equivalence of two sets of FDs
Two sets of FDs F and G over schema R are equivalent, written F ≡ G, if F+ = G+. If F ≡ G, then F is a cover for G and vice versa. In other words, equivalent sets of functional dependencies are called covers of each other.
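Computing F+ explicitly is usually impractical, since it can be exponentially large. A common way to test covering and equivalence instead relies on the standard fact that F implies X → Y exactly when Y is contained in the attribute closure of X under F. The sketch below assumes FDs are (left side, right side) pairs over single-character attribute names; all function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if `fds` logically implies lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)


def covers(f, g):
    """True if every FD in `g` can be inferred from `f`."""
    return all(implies(f, lhs, rhs) for lhs, rhs in g)


def equivalent(f, g):
    """True if `f` and `g` cover each other."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C")]
G = [("A", "BC"), ("B", "C")]
H = [("A", "B")]

print(equivalent(F, G))  # True
print(covers(F, H))      # True
print(covers(H, F))      # False: H does not imply B -> C
```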
Non-redundant covers
A set F of FDs is nonredundant if there is no proper subset F′ of F with F′ ≡ F. If such an F′ exists, F is redundant. F is a nonredundant cover for G if F is a cover for G and F is nonredundant.
An alternative characterization of nonredundancy is that F is nonredundant if there is no FD X → Y in F such that F − {X → Y} ⊨ X → Y. Call an FD X → Y in F redundant in F if F − {X → Y} ⊨ X → Y.
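The alternative characterization suggests a direct procedure: test each FD against the remaining ones and drop it if it is implied. The following is a sketch under the same assumptions as before (FDs as pairs over single-character attribute names, implication tested through attribute closure); the names `closure` and `nonredundant_cover` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonredundant_cover(fds):
    """Return a nonredundant subset of `fds` that is equivalent to `fds`."""
    cover = list(dict.fromkeys(fds))  # drop exact duplicates, keep order
    for fd in list(cover):
        lhs, rhs = fd
        rest = [g for g in cover if g != fd]
        # fd is redundant if the remaining FDs already imply it
        if set(rhs) <= closure(lhs, rest):
            cover = rest
    return cover


F = [("A", "B"), ("B", "C"), ("A", "C")]
print(nonredundant_cover(F))  # [('A', 'B'), ('B', 'C')]
```

The result depends on the order in which FDs are examined, so a set of FDs may have several different nonredundant covers.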
Applications to normalization
Heath's theorem
An important property (yielding an immediate application) of functional dependencies is that if R is a relation with columns named from some set of attributes U and R satisfies some functional dependency X → Y, then R = Π_XY(R) ⋈ Π_XZ(R), where Z = U − XY. Intuitively, if a functional dependency X → Y holds in R, then the relation can be safely split into two relations along the column X (which is a key for Π_XY(R)), ensuring that when the two parts are joined back no data is lost; that is, a functional dependency provides a simple way to construct a lossless-join decomposition of R into two smaller relations. This fact is sometimes called Heath's theorem; it is one of the early results in database theory.[9]
Heath's theorem effectively says we can pull out the values of Y from the big relation R and store them in a separate relation, Π_XY(R), which has no value repetitions in the column for X and is effectively a lookup table for Y keyed by X.
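Functional dependencies, however, should not be confused with inclusion dependencies, which are the formalism for foreign keys: functional dependencies express constraints over a single relation, whereas inclusion dependencies express constraints between relations.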
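Heath's theorem can be illustrated on the Employee department table from the examples above, using the dependency Department ID → Department name. The sketch below assumes rows are dictionaries and models relations as sets of tuples of (attribute, value) pairs; the helpers `project` and `natural_join` are hypothetical and written only for this illustration.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate rows eliminated."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two relations given as sets of (attribute, value) tuples."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # rows combine only if they agree on all shared attributes
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


employees = [
    {"Employee ID": "0001", "Employee name": "John Doe", "Department ID": 1, "Department name": "Human Resources"},
    {"Employee ID": "0002", "Employee name": "Jane Doe", "Department ID": 2, "Department name": "Marketing"},
    {"Employee ID": "0003", "Employee name": "John Smith", "Department ID": 1, "Department name": "Human Resources"},
    {"Employee ID": "0004", "Employee name": "Jane Goodall", "Department ID": 3, "Department name": "Sales"},
]

X = ["Department ID"]
Y = ["Department name"]
Z = ["Employee ID", "Employee name"]  # Z = U - XY

r_xy = project(employees, X + Y)  # lookup table: one row per department
r_xz = project(employees, X + Z)  # remaining attributes, still keyed to X

original = {frozenset(row.items()) for row in employees}
print(natural_join(r_xy, r_xz) == original)  # True: the decomposition is lossless
print(len(r_xy))                             # 3
```

Here the department name "Human Resources" is stored once in the projection onto XY instead of once per employee.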
Normal forms
Normal forms are database normalization levels which determine the "goodness" of a table. Generally, the third normal form is considered to be a "good" standard for a relational database.[citation needed]
Normalization aims to free the database from update, insertion and deletion anomalies. It also ensures that when a new value is introduced into the relation, it has minimal effect on the database, and thus minimal effect on the applications using the database.[citation needed]
Irreducible set of functional dependencies
A set S of functional dependencies is irreducible if the set has the following three properties:
- Each right set of a functional dependency of S contains only one attribute.
- Each left set of a functional dependency of S is irreducible. This means that removing any attribute from the left set changes the content of S (S loses some information).
- Removing any functional dependency from S changes the content of S.
Sets of functional dependencies with these properties are also called canonical or minimal. Finding such a set S of functional dependencies that is equivalent to some input set S' is called finding a minimal cover of S'; this problem can be solved in polynomial time.[10]
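The three properties map onto a three-step procedure that is commonly used to compute a minimal cover: split right sides into single attributes, remove extraneous left-side attributes, then remove redundant FDs. The following is a sketch of that procedure, not necessarily the algorithm of the cited source; it assumes FDs are (left side, right side) pairs over single-character attribute names, and the names `closure`, `minimal_cover`, and `show` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return an irreducible set of FDs equivalent to `fds`."""
    # Step 1: make every right side a single attribute.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    cover = list(dict.fromkeys(cover))

    # Step 2: drop left-side attributes that are not needed.
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))

    # Step 3: drop FDs implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    """Render FDs as readable strings."""
    return ["".join(sorted(l)) + " → " + "".join(sorted(r)) for l, r in cover]


print(show(minimal_cover([("A", "B"), ("B", "C"), ("AB", "D")])))
# ['A → B', 'B → C', 'A → D']
```

In this run the attribute B is removed from AB → D because A alone already determines B. Each step performs a polynomial number of closure computations, which is consistent with the polynomial-time bound stated above.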
See also
- Chase (algorithm)
- Inclusion dependency
- Join dependency
- Multivalued dependency (MVD)
- Database normalization
- First normal form
References
- ISBN 978-0-12-373568-3.
- ISBN 978-1-4493-2801-6.
- ISBN 978-0-07-352332-3.
- M. Y. Vardi. Fundamentals of dependency theory. In E. Borger, editor, Trends in Theoretical Computer Science, pages 171–224. Computer Science Press, Rockville, MD, 1987. ISBN 0881750840
- ISBN 0-201-53771-0
- ISBN 978-81-7758-567-4.
- ISBN 978-0-13-187325-4. This is sometimes called the splitting/combining rule.
- ISSN 0010-4620.
- S2CID 22069259. cited in:
- Ronald Fagin and Moshe Y. Vardi (1986). "The Theory of Data Dependencies - A Survey". In Michael Anshel and William Gewirtz (ed.). Mathematics of Information Processing: [short Course Held in Louisville, Kentucky, January 23-24, 1984]. American Mathematical Soc. p. 23. ISBN 978-0-8218-0086-7.
- C. Date (2005). Database in Depth: Relational Theory for Practitioners. O'Reilly Media, Inc. p. 142. ISBN 978-0-596-10012-4.
- Ronald Fagin and Moshe Y. Vardi (1986). "The Theory of Data Dependencies - A Survey". In Michael Anshel and William Gewirtz (ed.). Mathematics of Information Processing: [short Course Held in Louisville, Kentucky, January 23-24, 1984]. American Mathematical Soc. p. 23.
- S2CID 15789293.
Further reading
- Codd, E. F. (1972). "Further Normalization of the Data Base Relational Model" (PDF). ACM Transactions on Database Systems. San Jose, California: Association for Computing Machinery.
External links
- Gary Burt (Summer 1999). "CS 461 (Database Management Systems) lecture notes". University of Maryland Baltimore County Department of Computer Science and Electrical Engineering.
- Jeffrey D. Ullman. "CS345 Lecture Notes" (PostScript). Stanford University.
- Osmar Zaiane (June 9, 1998). "Chapter 6: Integrity constraints". CMPT 354 (Database Systems I) lecture notes. Simon Fraser University Department of Computing Science.