About a year ago, I had an idea for a SQL Server enhancement that I never wrote about.  I crafted a few sample queries that would utilize the feature, and I shared them with some colleagues, but I never wrote an article about it.  Needless to say, it's been in the back of my head ever since, as an "unfinished task."

One problem that almost all of us face is dealing with inactive database records.  There are dozens of approaches out there, but a few are becoming pretty standard.  Personally, I've ended up using an 'IsActive' column on tables.  I noticed that Subsonic uses the inverse approach, looking for "Deleted" or "IsDeleted" columns, as mentioned here:

If you are using logical deletes - delete flags instead of record removal - SubSonic will support this. Our convention is to use "Deleted" or "IsDeleted" as the flag for this column, and that it's a bit field. If you have these fields in your database SubSonic will mark it appropriately on delete, or it will just delete the record.

You can use a permanent delete along with the logical deletes by using "Destroy()".

I like what SubSonic did there; very clean.  But you still have to consider the active/deleted state with all queries throughout the entire system, as well as in joins to tables and lots of other aspects of everyday life.

Permanently deleting records seems to happen a lot less frequently than deactivating records.  Yet only the permanent delete is built into SQL Server.  My idea is to implement the ability to deactivate records inside SQL, without permanently deleting them.  But in order to be a complete solution, you need to be able to still work with inactive records as needed.

As I did last year, I'm going to pitch my idea by showing SQL statements that would utilize the features.  The Northwind schema is used.

First, let's deactivate a record:

   1: DEACTIVATE    Employees
   2: WHERE         employee_id = @employee_id


Next, let's reactivate that same record: (it happens, you know it does)

   1: REACTIVATE    Employees
   2: WHERE         employee_id = @employee_id


By default (but configurable per database of course), inactive records would NOT be included in queries.  So, the following statement would exclude any records that have been deactivated:

   1: SELECT    *
   2: FROM      Employees


But again, we sometimes need to get inactive records to come back out, so the following convention would allow us to include them in a query:

   1: SELECT    *
   2: FROM      Employees WITH (INACTIVE)


If we needed to get only the inactive records, omitting any active records, we could do that too.  This would be helpful for archiving processes.

   1: DELETE    Employees
   2: FROM      Employees WITH (INACTIVE ONLY)


The 'with' clause would be used on joins just as it is for lock hints.  And when you need to set a transaction level hint to use inactive records, you could do that too, with one of the following three statements:

   1: SET TRANSACTION RECORD STATE INACTIVE
   2: SET TRANSACTION RECORD STATE INACTIVE ONLY
   3: SET TRANSACTION RECORD STATE ACTIVE ONLY

 

Where this feature would get fun is when you start dealing with foreign keys.  Here's a query that would get products from Northwind that are active and belong to an active category.

   1: SELECT        Products.product_id
   2:          ,    Products.product_name
   3:          ,    Categories.category_name
   4:          ,    Supplier.company_name
   5:          ,    Products.quantity_per_unit
   6:          ,    Products.unit_price
   7: FROM          Products
   8: INNER JOIN    Categories ON Categories.category_id = Products.category_id
   9: INNER JOIN    Supplier ON Supplier.supplier_id = Products.supplier_id


Because inner joins were used on Categories and Supplier, any inactive records in those tables would fail the inner join and the products automatically get filtered out.

If you need to include products belonging to inactive categories or suppliers, you'd have a couple of choices.  First, using the record state hint:

   1: SELECT        Products.product_id
   2:          ,    Products.product_name
   3:          ,    Categories.category_name
   4:          ,    Supplier.company_name
   5:          ,    Products.quantity_per_unit
   6:          ,    Products.unit_price
   7: FROM          Products
   8: INNER JOIN    Categories WITH (INACTIVE) ON Categories.category_id = Products.category_id
   9: INNER JOIN    Supplier WITH (INACTIVE) ON Supplier.supplier_id = Products.supplier_id

Notice the subtle change?  This prevents inactive products from being returned, but allows active products that belong to inactive categories or suppliers.

However, if you didn't want to show the inactive category name or supplier name in the results, you could use a different approach.

   1: SELECT       Products.product_id
   2:         ,    Products.product_name
   3:         ,    category_name = IsNull(Categories.category_name, 'Inactive Category')
   4:         ,    company_name = IsNull(Supplier.company_name, 'Inactive Supplier')
   5:         ,    Products.quantity_per_unit
   6:         ,    Products.unit_price
   7: FROM         Products
   8: LEFT JOIN    Categories ON Categories.category_id = Products.category_id
   9: LEFT JOIN    Supplier ON Supplier.supplier_id = Products.supplier_id


For those of us that have stored procedures running their applications, the 'with' hints won't always cut it though.  If the user gets to specify whether or not to include inactive records in a result, then you usually have code like this:

   1: CREATE PROCEDURE Employee_List
   2:     @ActiveOnly bit = 1
   3: AS
   4:   SELECT       *
   5:   FROM         Employees
   6:   WHERE        (IsActive >= @ActiveOnly)

This uses a convenient trick that handles all scenarios very concisely.  Here are the scenarios and how this works:

  1. User wants active records only, so @ActiveOnly = 1
    1. Employees marked as IsActive = 0 fail the (0 >= 1) condition
    2. Employees marked as IsActive = 1 pass the (1 >= 1) condition
  2. User wants both active and inactive records, so @ActiveOnly = 0
    1. Employees marked as IsActive = 0 pass the (0 >= 0) condition
    2. Employees marked as IsActive = 1 pass the (1 >= 0) condition

Scenario 2b. is where we require a greater than or equal check instead of just equality.

Well, we'd need to be able to pull off this same kind of result with SQL-level inactive records.  And this is where my implementation ideas get a little fuzzy.  But something like this could do:

   1: CREATE PROCEDURE Employee_List
   2:     @ActiveOnly bit = 1
   3: AS
   4:   DECLARE      @RecordState varchar(10)
   5:   SELECT       @RecordState =
   6:   CASE @ActiveOnly
   7:   WHEN 1 THEN 'ACTIVE'
   8:   ELSE 'INACTIVE'
   9:   END
  10:   
  11:   SELECT      *
  12:   FROM        Employees WITH (@RecordState)


Of course, it would be even better to be able to just put the 'case' statement inline in the 'with' clause.

Inevitably, we'd end up needing to know whether or not a record was inactive after it came out of a query.  I don't think we'd want to have a fabricated column visible to us, so a bit-returning, system-level function would probably be preferred.

   1: SELECT       employee_id
   2:         ,    last_name
   3:         ,    first_name
   4:         ,    title
   5:         ,    IsActive = RECORD_STATE(Employee)
   6: FROM         Employee WITH (INACTIVE)

 

With referential integrity in place, it's often difficult to delete data.  But with aging systems, users often need to clean up old data, hiding it indefinitely.  We need to be able to hide data without deleting it, so that we can retain historical data and integrity.  As common as this problem is, it seems like there should be a built-in solution.  A solution like what I've illustrated could save many developers a lot of time and potentially prevent a lot of bugs.

If you like this idea, vote for it on Microsoft Connect.