Normalization

 

First Normal Form : No repeating groups; each field holds a single atomic value. Move repeating data to a separate table.

Second Normal Form : No field may depend on only part of a composite key. Move such fields to a separate table keyed on that part.

Third Normal Form : A non-key field cannot depend on another non-key field. Move the dependent fields to a separate table where the determining field is the key.
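As a sketch of a 3NF decomposition, here is a minimal example using Python's built-in sqlite3 as a stand-in for the database server (the customers/orders schema is hypothetical). The city depends only on the customer name, a non-key field, so it moves into its own table where the determining field is the key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: cust_city depends on cust_name, so cust_name becomes
    -- the key of its own table, referenced from orders.
    CREATE TABLE customers (
        cust_name TEXT PRIMARY KEY,
        cust_city TEXT
    );
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        cust_name TEXT REFERENCES customers(cust_name),
        amount    REAL
    );
""")
conn.execute("INSERT INTO customers VALUES ('Acme', 'Boston')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 'Acme', 99.0), (2, 'Acme', 25.0)])

# The city is stored once; a join reassembles the original rows.
rows = conn.execute("""
    SELECT o.order_id, c.cust_name, c.cust_city
    FROM orders o JOIN customers c ON o.cust_name = c.cust_name
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(1, 'Acme', 'Boston'), (2, 'Acme', 'Boston')]
```
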

 

Pros:

  • More rows per page (because rows are shorter, less logical I/O)
  • More rows per I/O (more efficient)
  • More rows fit in cache (less physical I/O)
  • Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
  • You usually have more tables. You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
  • Index searching is often faster, since indexes tend to be narrower and shorter.
  • More tables allow better use of segments to control physical placement of data.
  • You usually have fewer indexes per table, so data modification commands are faster.
  • Fewer null values and less redundant data, making your database more compact.
  • Triggers execute more quickly if you are not maintaining redundant data.
  • Data modification anomalies are reduced.
  • Normalization is conceptually cleaner and easier to maintain and change as your needs change.

 

Cons (denormalization may be warranted when):

  • Nearly all of the most frequent queries require access to the full set of joined data.
  • A majority of applications perform table scans when joining tables.
  • The computational complexity of derived columns requires temporary tables or excessively complex queries.

 

Managing denormalized data

 

Whatever denormalization techniques you use, you need to ensure data integrity by using:

  • Triggers, which can update derived or duplicated data anytime the base data changes
  • Application logic, using transactions in each application that update denormalized data, to ensure that changes are atomic
  • Batch reconciliation, run at appropriate intervals, to bring the denormalized data back into agreement

From an integrity point of view, triggers provide the best solution, because the derived data is updated in the same transaction that changes the base data.
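The trigger approach can be sketched with Python's built-in sqlite3 standing in for the server (sqlite trigger syntax differs from Transact-SQL, and the order_lines/order_totals schema is hypothetical). The trigger keeps a denormalized running total in step with the base rows, inside the same transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, amount REAL);
    -- Denormalized derived data: one total per order for fast lookup.
    CREATE TABLE order_totals (order_id INTEGER PRIMARY KEY, total REAL);

    -- Fires whenever the base data changes, so the derived total
    -- can never drift out of agreement.
    CREATE TRIGGER maintain_total AFTER INSERT ON order_lines
    BEGIN
        INSERT OR IGNORE INTO order_totals VALUES (NEW.order_id, 0);
        UPDATE order_totals SET total = total + NEW.amount
            WHERE order_id = NEW.order_id;
    END;
""")
conn.executemany("INSERT INTO order_lines VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])
total = conn.execute(
    "SELECT total FROM order_totals WHERE order_id = 1").fetchone()[0]
print(total)  # 15.0
```
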

 

Indexes

 

Clustered Index

 

  • Most allpages-locked tables should have clustered indexes or use partitions to reduce contention on the last page of heaps.
  • In a high-transaction environment, the locking on the last page severely limits throughput.
  • If your environment requires a lot of inserts, do not place the clustered index key on a steadily increasing value such as an IDENTITY column.
  • Choose a key that places inserts on random pages to minimize lock contention while remaining useful in many queries. Often, the primary key does not meet this condition.
  • This problem is less severe on data-only-locked tables, but is a major source of lock contention on allpages-locked tables.
  • Clustered indexes provide very good performance when the key matches the search argument in range queries, such as: where colvalue >= 5 and colvalue < 10
  • In allpages-locked tables, rows are maintained in key order and pages are linked in order, providing very fast performance for queries using a clustered index.
  • In data-only-locked tables, rows are in key order after the index is created, but the clustering can decline over time.
  • Other good choices for clustered index keys are columns used in order by clauses and in joins.
  • If possible, do not include frequently updated columns as keys in clustered indexes on allpages-locked tables.
  • When the keys are updated, the rows must be moved from the current location to a new page. Also, if the index is clustered, but not unique, updates are done in deferred mode.
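The range-query benefit can be illustrated with sqlite, used here as a rough stand-in: a WITHOUT ROWID table stores the rows themselves in primary-key order in one B-tree, which is the closest sqlite analog of a Sybase clustered index (table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows are kept in key order in the table's B-tree,
# so a key-range scan is a sequential walk of adjacent pages.
conn.execute("""
    CREATE TABLE readings (
        colvalue INTEGER PRIMARY KEY,
        payload  TEXT
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(i, "row %d" % i) for i in range(20)])

# The range query from the notes above.
rows = conn.execute(
    "SELECT colvalue FROM readings WHERE colvalue >= 5 AND colvalue < 10"
).fetchall()
print([r[0] for r in rows])  # [5, 6, 7, 8, 9]
```
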

 

Composite Index

 

  • If your analysis shows that more than one column is a good candidate for a clustered index key, you may be able to provide clustered-like access with a composite index that covers a particular query or set of queries.
  • Index covering: the index contains every column referenced in a select query, so the query can be answered from the index pages alone, without reading the data pages.
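Index covering can be demonstrated with sqlite as a stand-in (the emp table and emp_dept_sal index are hypothetical). Because the index holds both columns the query touches, the plan reports a covering index and the data pages are never read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER, dept TEXT, salary REAL, bio TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)",
                 [(1, 'eng', 100.0, 'x'), (2, 'eng', 90.0, 'y'),
                  (3, 'ops', 80.0, 'z')])
# Composite index on every column the query below references.
conn.execute("CREATE INDEX emp_dept_sal ON emp (dept, salary)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT dept, salary FROM emp WHERE dept = 'eng'"
).fetchall()
print(plan[0][-1])  # plan detail mentions the covering index

rows = conn.execute(
    "SELECT dept, salary FROM emp WHERE dept = 'eng' ORDER BY salary"
).fetchall()
```
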

 

Update Statistics

  • Update histograms, which are kept on a per-column basis (update statistics tab1 (col1); update statistics tab1 tab1_idx)
  • Delete statistics after dropping column (delete statistics tab1 (col1))
  • Check statistics using command line utility optdiag and system table systabstats
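As a rough local analog, sqlite's ANALYZE plays the role of update statistics, and the sqlite_stat1 table plays the role of inspecting systabstats with optdiag (table and index names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (col1 INTEGER)")
conn.executemany("INSERT INTO tab1 VALUES (?)", [(i,) for i in range(100)])
conn.execute("CREATE INDEX tab1_idx ON tab1 (col1)")

# ANALYZE gathers per-index distribution data for the optimizer,
# the counterpart of update statistics.
conn.execute("ANALYZE")

# Collected statistics land in sqlite_stat1, the rough analog of
# checking systabstats / running optdiag.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```
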

 

Locks (sp_lock)

 

ALLPAGES

  • Exclusive locks are held on all affected data pages and index pages.
  • Rows are stored in strict key order.

 

DATAPAGES

  • Exclusive locks are held on all affected data pages.
  • Rows are in key order after the clustered index is created, but the order is not strictly maintained.

 

DATAROWS

  • Exclusive locks are held on all affected data rows.
  • Rows are not stored in strict key order within a page.

 

Page and Row Locks

 

Shared Lock: no exclusive lock can be held at the same time

Exclusive Lock: no other lock of any kind can be held

Update Lock: shared locks are allowed; promoted to an exclusive lock before the change is made

 

Table Locks

 

Intent Lock: indicates that page or row locks are currently held on the table, blocking conflicting table-level locks

 

Isolation Level (set transaction isolation level 3)

 

  • 0 - read uncommitted

The transaction is allowed to read uncommitted changes to data. A select does not acquire shared locks.

  • 1 - read committed

The transaction is allowed to read only committed changes to data. A select acquires shared locks on the data it reads.

  • 2 - repeatable read

The transaction can repeat the same query, and no rows that it has already read will have been updated or deleted. New (phantom) rows can appear, however.

  • 3 - serializable read

The transaction can repeat the same query, and receive exactly the same results. No rows can be inserted that would appear in the result set.
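Sybase isolation levels cannot be reproduced exactly with a local library, but the level-1 rule of reading only committed changes can be sketched with Python's sqlite3 as a stand-in, using two connections to one temporary database file (all names here are hypothetical):

```python
import os
import sqlite3
import tempfile

# Two connections play the role of two sessions at read committed.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)
reader = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE t (v INTEGER)")

writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (1)")

# The uncommitted insert is invisible to the other session.
before = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]

writer.execute("COMMIT")

# After commit, the change becomes visible.
after = reader.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]
print(before, after)  # 0 1
```
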

 

Cursor

 

  • Row-oriented programming as opposed to set-oriented.
  • Read-only mode is in effect when you specify for read only or when the cursor’s select statement uses distinct, group by, union, or aggregate functions, and in some cases, an order by clause.
  • Update mode uses update page or row locks. It is in effect when:
    • You specify for update.
    • The select statement does not include distinct, group by, union, a subquery, aggregate functions, or the at isolation read uncommitted clause.
    • You specify shared.
  • Halloween problem: if a cursor update changes a column in the index used to scan, the row can move forward in the index and be fetched, and updated, again.
  • Optimizing tips:
    • Optimize cursor selects using the cursor, not an ad hoc query.
    • Use union or union all instead of or clauses or in lists.
    • Declare the cursor’s intent.
    • Specify column names in the for update clause.
    • Fetch more than one row if you are returning rows to the client.
    • Keep cursors open across commits and rollbacks.
    • Open multiple cursors on a single connection.

 

Misc

 

@@sqlstatus = 0 Successful fetch

                      = 1 Error

                      = 2 End of result set

 

 

Transaction ACID

 

Atomicity: a transaction is all or nothing

Consistency: the database remains consistent at all times, regardless of success or failure

Isolation: no changes are visible to other processes until the transaction is committed

Durability: committed transactions survive all kinds of crashes
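Atomicity can be sketched with Python's sqlite3 standing in for the server (the accounts schema and the simulated crash are hypothetical). A transfer must debit and credit together; rolling back after a failure mid-transaction undoes the partial change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 0.0)])
conn.commit()

try:
    # Debit succeeds...
    conn.execute(
        "UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
    # ...then a simulated crash strikes before the matching credit.
    raise RuntimeError("crash between debit and credit")
except RuntimeError:
    conn.rollback()  # all or nothing: the partial debit is undone

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100.0, 'bob': 0.0}
```
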

 

CRUD Interface

 

Create

Retrieve

Update

Delete
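The four CRUD operations map directly onto the four SQL data statements; a minimal round trip, again with sqlite3 as a stand-in and a hypothetical notes table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# Create
conn.execute("INSERT INTO notes (body) VALUES ('draft')")
# Retrieve
body = conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0]
# Update
conn.execute("UPDATE notes SET body = 'final' WHERE id = 1")
# Delete
conn.execute("DELETE FROM notes WHERE id = 1")

remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(body, remaining)  # draft 0
```
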

 

The case against MySQL

 

    • Data corruption in the ISAM-derived MyISAM storage engine
    • Row locks are executed as table locks
    • MyISAM is not ACID compliant
    • No procedural language support until recently
    • Trigger support only per row and not per statement
    • Proprietary authentication only – no LDAP or Kerberos