Introduction to Databases

What are Databases and Why are they Important?

4 Major Types of DBMS

Hierarchical

  • Parent-child relationship of storing data
  • Rarely to use nowadays
  • Structure is like a tree with nodes representing recods and branches
  • Eg. Window registry in window xp

Network DBMS

  • Support many to many relationships
  • Complex database structures
  • Eg. RdM server
  • Define database relationships in forms of tables also known as relations
  • Does not support many to many relationships
  • Have predefined data types
  • eg. mysql, oracle, microsoft sql server

Object Oriented Relation DBMS

  • Support storage of new Data types
  • Data stored in form of objects
    • Objects has attributes
  • e.g. postgresql

Common Identifiers - Keys

Primary Key

Uniquely identifies a record.

Foreign Key

Identifies a different record, often in a different table.

File-Based System (Old method)

Limitations

  • Separation and isolation of data
    • Each program maintains its own set of data
    • Users of one program may be unaware of potentially useful data held by other programs
  • Duplication of data (Data Redundancy)
    • Same data is held by different programs.
    • Wasted space and potentially different values/format for the same term
  • Data dependence
    • File structure is defined in the program code
  • Incompatible file formats
    • Programs are written in different languages and cannot easily access each other’s files
  • Fixed Queries/Proliferation of application programs
    • Programs are written to satisfy particular functions
    • Any new requirement needs a new program

Therefore People stop using File-Based System and move to Database Management System.

DBMS (Database Management System)

A software system that enables users to define, create, maintain, and control access to the database.

(Database) application program: a computer program that interacts with database by issuing an appropriate request (SQL statement) to the DBMS.

  • Definition of data was embeeded in application programs, rather than being stored separately and independently
  • No control over access and manipulation of data beyond that imposed by application programs

DBMS is not expected to handle data redundancy.

Database

  • Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.
  • System catalog (metadata) provides description of data to enable program–data independence.
  • Logically related data comprises entities, attributes, and relationships of an organization’s information.

Languages Manipulate DBMS

Data definition language (DDL)

  • Permits specification of data types, structures and any data constraints.  All specifications are stored in the database.

Data manipulation language (DML)

  • General enquiry facility (query language) of the data.
  • E.g. INSERT, UPDATE

Database Characteristics

  • Persistence: Data lasts a long time, and is repetitively needed
  • Concurrent Use: Can be used by multiple users at the same time
  • Structured Data: A database system does not only contain the data but also has the complete definition (data type) of these data.
  • Data Integrity: Guarantee data consistency and protection of the database from unauthorized access.

Components of DBMS Environment

Machine

Hardware

  • Can range from a PC to a network of computers

Software

  • DBMS, operating system and also the application programs

Hardware + Software

  • Oracle Exadata; Oracle Big Data Appliance

Bridge

Data

  • Used by the organization and a description (schema) of this data.

Human

Procedures

  • Instructions and rules that should be applied to the database and DBMS.

People

Roles in the Database Environment

  • Data Adminstrator (DA)
  • Database Administrator (DBA)
  • Database Designers (Logical and Physical)
  • Application Programmers
  • End Users (naive and sophisticated)

DBMS Marketplace

Enterprise DBMS

  • Oracle: Dominates in Unix; strong in Windows
  • Microsoft SQL server: Strong in Windows
  • IBM DB2: strong in mainframe environment
  • MySQL: Popular open source DBMS

Desktop DBMS

  • Microsoft Access

Mobile DBMS

  • SQLite (come with Android)

Database Architecture

Traditional Two-Tier Client-Server

First Tier - Client

  • User Interface
  • Main business and data processing logic

Second Tier - Database server

  • Server-side validation
  • Database access

3-Tier Client/Server Architecture

First Tier - Client

  • User Interface

Second Tier - Application server

  • Business logic
  • Data processing logic

TPM as middle tier of 3-tier client-server

Transaction Processing Monitor

  • Controls data transfer between clients and servers in order to provide a consistent environment, particularly for Online Transaction Processing (OLTP).

Third Tier - Database server

  • Data validation
  • Database access

n-Tier Client-Server (e.g. 4-Tier)

First Tier - Client

  • User Interface

Second Tier - Web server

  • Handling web request

Third Tier - Application server

  • Business logic
  • Data processing logic

TPM as middle tier of 3-tier client-server

Transaction Processing Monitor

  • Controls data transfer between clients and servers in order to provide a consistent environment, particularly for Online Transaction Processing (OLTP).

Fourth Tier - Database server

  • Data validation
  • Database access

Relational Models and ER Diagrams

Relational Model Terminology

A relation is a table with columns and rows.

Tuple is a row of a relation.

Attribute is a named column of a relation.

Domain is the set of allowable values for one or more attributes.

Degree is the number of attributes in a relation.

Cardinality is the number of tuples in a relation.

Relational Database is a collection of relations with distinct relation names.

TLDR:

Relation = Table = File

Tuple = Row = Record

Attribute = Column = Field

Database Relations

Relation schema

  • Named relation defined by a set of attribute and domain name pairs.

Relational database schema

  • Set of relation schemas, each with a distinct name.
  • 1 Relational database schema contains many relation schemas.

Properties of Relations

  • Relation name is distinct from all other relation names in relational DB schema.
  • Each cell of relation contains exactly one atomic (single) value.
  • Each attribute has a distinct name.
  • Values of an attribute are all from the same domain.
  • Each tuple is distinct; there are no duplicate tuples.
  • Order of attributes has no significance.
  • Order of tuples has no significance, theoretically.

Relational Keys

Superkey

  • An attribute, or set of attributes (More than 1), that uniquely identifies a tuple within a relation.
  • A superkey may contain additional attributes that are not necessary for unique identification.
  • E.g. Names, StaffNo., are superkeys. While Position, Sex(Male or Female) are not superkey.

Candidate Key

  • Minimum Superkey (cannot be reducable)
  • Superkey (K) such that no proper subset is a superkey within the relation
  • Identifying a candidate key requires that we know the “real-world” meaning of the attributes
  • E.g. StaffNo. is a Candidate Key. While LastName, FirstName is not a Candidate Key respectively.

Primary Key

  • Candidate key selected to identify tuples uniquely within a relation.

Alternate Keys

  • Candidate keys that are not selected to be primary key.

Foreign Key

  • Attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.

Integrity Constraints

Null

  • Represents value for an attribute that is currently unknown or not applicable for tuple.

  • Deals with incomplete or exceptional data.

  • Represents the absence of a value and is not the same as zero or spaces, which are values.

  • Empty string is not a NULL value.

Entity Integrity (Relation / Table Integrity)

  • Each relation has column(s) with unique values
  • No two rows have the same value
  • No attribute of a primary key can be null
  • Can be achieved by having a primary key

Referential Integrity

  • Column values in one table must match column values in a related table
  • Foreign Key values must match their primary key values

Views

Base Relation

  • Actual Table
  • Named relation corresponding to an entity in conceptual schema, whose tuples are physically stored in database.

View

  • Virtual Table (Not Physically Stored in database)
  • Generated upon query
  • Dynamic result of one or more relational operations operating on base relations to produce another relation.
  • Allows each user to have his or her own view of the database.

View Types Based on Purpose

  • Synonym of a table
  • Summarization
  • Join of multiple tables

Benefits of View

  • Reduce complexity
  • Provide a level of security
  • Provide a mechanism to customize the appearance of the database
  • Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed

Updating Views

  • Updates are allowed if query involves a single base relation and contains a candidate key of base relation.
  • Updates are not allowed involving multiple base relations.
  • Updates are not allowed involving aggregation or grouping operations.

ER Diagram Terminologies

Entity

  • Thing or object, whether real or imagined, about which information needs to be known/tracked
  • E.g. Customer, Order, Book

Attributes

  • Properties
  • Describe the entity’s characteristics

Identifiers

  • Support entity identification. E.g. Primary Key
  • One or more of the entity’s attribute that uniquely identify a row in the table

Relationships

  • Describes how one or more entities are related to each other
  • Can be only 1 entity only (e.g. 1 Person married to another Person in a Entity)

ER Diagrams

Cardinalities

Contains the number of objects that participate in a relationship.

Minimum cardinality

Indicates whether participation in the relationship is:

  • Mandatory (1)
  • Optional (0)

Maximum cardinality

The maximum number of entities that can occur on one side of the relationship:

  • One-to-One
  • One-to-many
  • Many-to-many
Classification Cardinality Restrictions
Mandatory Minimum cardinality >= 1
Optional Minimum cardinality = 0
Functional or single-valued Minimum cardinality = 1
1-M Minimum cardinality = 1 in one direction and Maximum cardinality > 1 in the other direction
M-N Minimum cardinality > 1 in both directions
1-1 Minimum cardinality = 1 in both directions

ERD Notation

M:N to 1:M relationship

The example show how a M:N relationship can be changed into Two 1:M relationships.

Note: MySQL Workbench will automatically convert M:N relationship to two 1:M relationship. You need to change the name of the associative entity and the relationship to fit the context.

In MySQL Workbench

Relationship

An identifying relationship is identified by a solid line between tables.

  • An identifying relationship is one where the child table cannot be uniquely identified without its parent. Typically this occurs where an intermediary table is created to resolve a many- to-many relationship. In such cases, the primary key is usually a composite key made up of the primary keys from the two original tables.

A non-identifying relationship is identified by a broken (dashed) line between tables.

To edit relationship in MySQL Workbench:

  • Double click the relationship between tables, or
  • Right click the relationship between tables and select “Edit Relationship”