Database System [1] - Fundamentals
Introduction to Databases
What are Databases and Why are they Important?
4 Major Types of DBMS
Hierarchical
- Parent-child relationship of storing data
- Rarely to use nowadays
- Structure is like a tree with nodes representing recods and branches
- Eg. Window registry in window xp
Network DBMS
- Support many to many relationships
- Complex database structures
- Eg. RdM server
Relational DBMS (Most popular)
- Define database relationships in forms of tables also known as relations
- Does not support many to many relationships
- Have predefined data types
- eg. mysql, oracle, microsoft sql server
Object Oriented Relation DBMS
- Support storage of new Data types
- Data stored in form of objects
- Objects has attributes
- e.g. postgresql
Common Identifiers - Keys
Primary Key
Uniquely identifies a record.
Foreign Key
Identifies a different record, often in a different table.
File-Based System (Old method)
Limitations
- Separation and isolation of data
- Each program maintains its own set of data
- Users of one program may be unaware of potentially useful data held by other programs
- Duplication of data (Data Redundancy)
- Same data is held by different programs.
- Wasted space and potentially different values/format for the same term
- Data dependence
- File structure is defined in the program code
- Incompatible file formats
- Programs are written in different languages and cannot easily access each other’s files
- Fixed Queries/Proliferation of application programs
- Programs are written to satisfy particular functions
- Any new requirement needs a new program
Therefore People stop using File-Based System and move to Database Management System.
DBMS (Database Management System)
A software system that enables users to define, create, maintain, and control access to the database.
(Database) application program: a computer program that interacts with database by issuing an appropriate request (SQL statement) to the DBMS.
- Definition of data was embeeded in application programs, rather than being stored separately and independently
- No control over access and manipulation of data beyond that imposed by application programs
DBMS is not expected to handle data redundancy.
Database
- Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
- System catalog (metadata) provides description of data to enable program–data independence.
- Logically related data comprises entities, attributes, and relationships of an organization’s information.
Languages Manipulate DBMS
Data definition language (DDL)
- Permits specification of data types, structures and any data constraints. All specifications are stored in the database.
Data manipulation language (DML)
- General enquiry facility (query language) of the data.
- E.g. INSERT, UPDATE
Database Characteristics
- Persistence: Data lasts a long time, and is repetitively needed
- Concurrent Use: Can be used by multiple users at the same time
- Structured Data: A database system does not only contain the data but also has the complete definition (data type) of these data.
- Data Integrity: Guarantee data consistency and protection of the database from unauthorized access.
Components of DBMS Environment
Machine
Hardware
- Can range from a PC to a network of computers
Software
- DBMS, operating system and also the application programs
Hardware + Software
- Oracle Exadata; Oracle Big Data Appliance
Bridge
Data
- Used by the organization and a description (schema) of this data.
Human
Procedures
- Instructions and rules that should be applied to the database and DBMS.
People
Roles in the Database Environment
- Data Adminstrator (DA)
- Database Administrator (DBA)
- Database Designers (Logical and Physical)
- Application Programmers
- End Users (naive and sophisticated)
DBMS Marketplace
Enterprise DBMS
- Oracle: Dominates in Unix; strong in Windows
- Microsoft SQL server: Strong in Windows
- IBM DB2: strong in mainframe environment
- MySQL: Popular open source DBMS
Desktop DBMS
- Microsoft Access
Mobile DBMS
- SQLite (come with Android)
Database Architecture
Traditional Two-Tier Client-Server
First Tier - Client
- User Interface
- Main business and data processing logic
Second Tier - Database server
- Server-side validation
- Database access
3-Tier Client/Server Architecture
First Tier - Client
- User Interface
Second Tier - Application server
- Business logic
- Data processing logic
TPM as middle tier of 3-tier client-server
Transaction Processing Monitor
- Controls data transfer between clients and servers in order to provide a consistent environment, particularly for Online Transaction Processing (OLTP).
Third Tier - Database server
- Data validation
- Database access
n-Tier Client-Server (e.g. 4-Tier)
First Tier - Client
- User Interface
Second Tier - Web server
- Handling web request
Third Tier - Application server
- Business logic
- Data processing logic
TPM as middle tier of 3-tier client-server
Transaction Processing Monitor
- Controls data transfer between clients and servers in order to provide a consistent environment, particularly for Online Transaction Processing (OLTP).
Fourth Tier - Database server
- Data validation
- Database access
Relational Models and ER Diagrams
Relational Model Terminology
A relation is a table with columns and rows.
Tuple is a row of a relation.
Attribute is a named column of a relation.
Domain is the set of allowable values for one or more attributes.
Degree is the number of attributes in a relation.
Cardinality is the number of tuples in a relation.
Relational Database is a collection of relations with distinct relation names.
TLDR:
Relation = Table = File
Tuple = Row = Record
Attribute = Column = Field
Database Relations
Relation schema
- Named relation defined by a set of attribute and domain name pairs.
Relational database schema
- Set of relation schemas, each with a distinct name.
- 1 Relational database schema contains many relation schemas.
Properties of Relations
- Relation name is distinct from all other relation names in relational DB schema.
- Each cell of relation contains exactly one atomic (single) value.
- Each attribute has a distinct name.
- Values of an attribute are all from the same domain.
- Each tuple is distinct; there are no duplicate tuples.
- Order of attributes has no significance.
- Order of tuples has no significance, theoretically.
Relational Keys
Superkey
- An attribute, or set of attributes (More than 1), that uniquely identifies a tuple within a relation.
- A superkey may contain additional attributes that are not necessary for unique identification.
- E.g. Names, StaffNo., are superkeys. While Position, Sex(Male or Female) are not superkey.
Candidate Key
- Minimum Superkey (cannot be reducable)
- Superkey (K) such that no proper subset is a superkey within the relation
- Identifying a candidate key requires that we know the “real-world” meaning of the attributes
- E.g. StaffNo. is a Candidate Key. While LastName, FirstName is not a Candidate Key respectively.
Primary Key
- Candidate key selected to identify tuples uniquely within a relation.
Alternate Keys
- Candidate keys that are not selected to be primary key.
Foreign Key
- Attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.
Integrity Constraints
Null
-
Represents value for an attribute that is currently unknown or not applicable for tuple.
-
Deals with incomplete or exceptional data.
-
Represents the absence of a value and is not the same as zero or spaces, which are values.
-
Empty string is not a NULL value.
Entity Integrity (Relation / Table Integrity)
- Each relation has column(s) with unique values
- No two rows have the same value
- No attribute of a primary key can be null
- Can be achieved by having a primary key
Referential Integrity
- Column values in one table must match column values in a related table
- Foreign Key values must match their primary key values
Views
Base Relation
- Actual Table
- Named relation corresponding to an entity in conceptual schema, whose tuples are physically stored in database.
View
- Virtual Table (Not Physically Stored in database)
- Generated upon query
- Dynamic result of one or more relational operations operating on base relations to produce another relation.
- Allows each user to have his or her own view of the database.
View Types Based on Purpose
- Synonym of a table
- Summarization
- Join of multiple tables
Benefits of View
- Reduce complexity
- Provide a level of security
- Provide a mechanism to customize the appearance of the database
- Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed
Updating Views
- Updates are allowed if query involves a single base relation and contains a candidate key of base relation.
- Updates are not allowed involving multiple base relations.
- Updates are not allowed involving aggregation or grouping operations.
ER Diagram Terminologies
Entity
- Thing or object, whether real or imagined, about which information needs to be known/tracked
- E.g. Customer, Order, Book
Attributes
- Properties
- Describe the entity’s characteristics
Identifiers
- Support entity identification. E.g. Primary Key
- One or more of the entity’s attribute that uniquely identify a row in the table
Relationships
- Describes how one or more entities are related to each other
- Can be only 1 entity only (e.g. 1 Person married to another Person in a Entity)
ER Diagrams
Cardinalities
Contains the number of objects that participate in a relationship.
Minimum cardinality
Indicates whether participation in the relationship is:
- Mandatory (1)
- Optional (0)
Maximum cardinality
The maximum number of entities that can occur on one side of the relationship:
- One-to-One
- One-to-many
- Many-to-many
Classification | Cardinality Restrictions |
---|---|
Mandatory | Minimum cardinality >= 1 |
Optional | Minimum cardinality = 0 |
Functional or single-valued | Minimum cardinality = 1 |
1-M | Minimum cardinality = 1 in one direction and Maximum cardinality > 1 in the other direction |
M-N | Minimum cardinality > 1 in both directions |
1-1 | Minimum cardinality = 1 in both directions |
ERD Notation
M:N to 1:M relationship
The example show how a M:N relationship can be changed into Two 1:M relationships.
Note: MySQL Workbench will automatically convert M:N relationship to two 1:M relationship. You need to change the name of the associative entity and the relationship to fit the context.
In MySQL Workbench
Relationship
An identifying relationship is identified by a solid line between tables.
- An identifying relationship is one where the child table cannot be uniquely identified without its parent. Typically this occurs where an intermediary table is created to resolve a many- to-many relationship. In such cases, the primary key is usually a composite key made up of the primary keys from the two original tables.
A non-identifying relationship is identified by a broken (dashed) line between tables.
To edit relationship in MySQL Workbench:
- Double click the relationship between tables, or
- Right click the relationship between tables and select “Edit Relationship”