Image of Navigational Map linked to Home / Contents / Search Intro to SQL: Jaws II

Mark Warhol - Prudential Insurance
Image of Line Break

This article has been extracted from Mark Warhol's book The Art of Programming with Visual Basic, John Wiley & Sons, Inc., 1995, Chapter 4, pp 115 - 122.

I've mentioned before that the main reason a corporate application exists is to process data. Nobody wants that program I always see as an example in programming books. It has a text box, a command button, and a label. You type in the temperature in degrees Fahrenheit, press the command button, and the temperature converted to Celsius appears in the label (Figure 4.1). If a program like this did exist, it would probably have to process several thousand temperatures stored in a database table. The program would display the records, possibly filtering them based on certain criteria. The users would probably be allowed to add, modify, and delete these records. On a regular basis, one or more standard reports on the data would have to be generated. Temperature Convereter

Most likely, your applications will use some variation of SQL or Structured Query Language to perform these tasks. Performing this processing using standard flat-file I/O can be wicked and should be reserved for very simple applications. SQL is a lot more fun. SQL is pronounced "sequel" by everyone but me. I still say "S-Q-L" and everyone looks at me like I'm a potato farmer. It was created by IBM, which, to my knowledge, everyone pronounces the same way. SQL is amazingly powerful and easy to learn. As a programmer, you can perform all the data manipulation you need with four statements. I'll base my examples here on this mythical temperature program. There are two tables used: SAMPLES, which contains the temperature readings, and S_INFO, which contains specific information about the samples themselves.

The tables look like this:

SAMPLESTable
SAMPLE_IDInteger
TEMP_FARENFloat

Example data in table:

SAMPLE_IDTEMP_FAREN
1100
275
357

S_INFO Table
SAMPLE_IDInteger
SAMPLE_NAMEChar(50)
COMMENTSChar(100)

Example Data in table:

SAMPLE_IDSAMPLE_NAMECOMMENTS
1Test Run 37Pending
2Sample BNot sure
3Lab CheckDone earlier

Here are the four SQL statements:
SELECT-retrieves a set of records.

	SELECT
		SAMPLE_ID, TEMP_FAREN, (TEMP_FAREN - 32) * 5/9
	FROM
		SAMPLES
	WHERE
		TEMP_FAREN > 0
	ORDER BY
		SAMPLE_ID

Returns:

SAMPLE_IDTEMP_FAREN(TEMP_FAREN - 32) * 5/9
110037.7
27523.8
35713.8

(TEMP_FAREN - 32) * 5/9 is an example of a compound column that doesn't really exist in the table.

Fields from more than one table can be retrieved in a single SQL SELECT statement by joining the tables through common key field(s). Next, I retrieve the TEMP_FAREN field from the SAMPLES table and the SAMPLE_NAME field from the S_INFO table. The two tables are linked by the key field SAMPLE_ID, which exists in both tables.

	SELECT
		S_INFO.SAMPLE_NAME, SAMPLES.TEMP_FAREN
	FROM
		SAMPLES, S_INFO
	WHERE
		SAMPLES.SAMPLE_ID = S_INFO.SAMPLE_ID

Retrieves:

SAMPLE_NAMETEMP_FAREN
Test Run 37100
Sample B75
Lab Check57

INSERT-adds a new record.

	INSERT INTO SAMPLES
		(SAMPLE_ID, TEMP_FAREN)
	VALUES
		(4, 54)

UPDATE-updates fields in one or more existing records.

	UPDATE
		SAMPLES
	SET
		TEMP_FAREN = 55
	WHERE
		SAMPLE_ID = 1

DELETE-deletes one or more existing records.

	DELETE FROM
		SAMPLES
	WHERE
		SAMPLE_ID = 1

That's just a little taste. A good SQL manual will explain the basics better than I do. I will go over two basic SQL errors and several advanced techniques that I have actually needed to use, unlike the advanced techniques in the SQl manual that nobody would ever think to use. I am using standard or semi-standard SQL for my examples. You may be working with a weirder form of SQL but the concepts will be the same. Test out any of the statements I write to make sure they perform in your environment.

Basic SQL Errors: Too Much or Not Enough

Here's a simple SQL SELECT statement:

	SELECT
		EMP_ID, F_NAME, L_NAME, M_INIT, STATUS_CD, COMMENTS
	FROM
		EMPLOYEE
	WHERE
		STATUS_ID = 1
	ORDER BY
		L_NAME, F_NAME, M_INIT

It should be fairly obvious what this statement does. It retrieves the EMP_ID, F_NAME, L_NAME, M_INIT, STATUS_CD, COMMENTS columns from the EMPLOYEE table for all records where the STATUS_CD = 1. The retrieved records will be sorted in ascending order by L_NAME, F_NAME, M_INIT. Simple SQL SELECT statements (basically statements using only one table) probably won't cause much trouble. But there is trouble. It comes when more than one table is used in the SELECT statement.

The two most common SQL SELECT troubles are:

Here's a statement where the first problem is brewing. It's the previous statement joined with the STATUS table.

STATUS
CODEInteger
VALUEChar(20)

Example data in table:

CODEVALUE
1Active
2Terminated
3Maternity Leave
4Disability Leave

	SELECT
		EMP_ID, F_NAME, L_NAME, M_INIT, STATUS.VALUE, COMMENTS
	FROM
		EMPLOYEE, STATUS
	WHERE
		EMPLOYEE.STATUS_CD = STATUS.CODE

In this SQL statement, the EMPLOYEE and STATUS tables are joined by the field STATUS_CD. The purpose of the join is to display the value associated with the STATUS_CD field instead of the STATUS_CD itself that is just a number. This is a very common SQL thing you need to do. Let's say there are 200 records in the table and you're only getting 180 records back. What's the problem? This brings you to the most important questions you need answered about a field involved in a join:

  1. Are the fields being joined guaranteed to match each other?
  2. If there isn't a match, should i retrieve the record anyway?
    So, let's ask these questions in this particular case: Does EMPLOYEE.STATUS_CD always match with a STATUS_CODE?
    If it doesn't match, do I still want the EMPLOYEE record?

IF STATUS_CD can be left blank in EMPLOYEE, then it won't match STATUS and won't return a record. In this case, I don't want that to happen so I must rewrite the statement slightly. When doing a join like this one, where the field joining the two tables can be blank (or just isn't guaranteed to match), you need an outer join.
Here it is:

	SELECT
		EMP_ID, F_NAME, L_NAME, M_INIT, STATUS.VALUE, COMMENTS
	FROM
		EMPLOYEE, STATUS
	WHERE
		EMPLOYEE.STATUS_CD *= STATUS.CODE

The *= says that all records should be retrieved from the table on the left side of the equal sign, here EMPLOYEE. If there is a match, then also retrieve the value from the table on the right side of the equal sign, here STATUS.

Not using an outer join when necessary is the most common cause of retrieving fewer records than expected. I've done it, and had to maintain several programs from other people that had this error. One was a personnel application that used a code/value for the RESPONSIBILITY field. The retrieve did a join but didn't require the RESPONSIBILITY field to be filled in. A user would add a bunch of personnel without entering a RESPONSIBILITY, save, then none of the records would display even though they were in the database.

I've also encountered this problem in programs that were originally set up to require the user to enter an appropriate value for the joining field. The SQL call started out working just fine. The specs for the program then changed, making the field optional, but the original SELECT statement didn't get changed. A very simple bug that can ruin your day. Check if any such situations occurred if you've inherited an old project.

The second problem happens less frequently but when it does, watch out. Here's a statement where trouble #2 is brewing:

	SELECT
		EMP_ID, F_NAME, L_NAME, M_INIT, SATTUS.VALUE, COMMENTS
	FROM
		EMPLOYEE, STATUS, SALARY
	WHERE
		EMPLOYEE.STATUS_CD *= STATUS.CODE

This statement returns a zillion records after a considerable delay. Why?

The problem is insidious because you'd think it wouldn't really be a problem. This statement has an extra table in it's FROM clause that is not used in the WHERE clause. The result is SQL matches every record in this table with every record retrieved in a process called a Cartesian Product. This result isn't quite as elegant as it sounds, it's actually nothing more than a Big Mess. The number of records retrieved will vary considerably depending on the number of tables in the join and the number of records in the tables. If your tables have just a few records, you may not notice the extra records retrieved, but once those tables start to fill up, you'll be languishing in decadent excess.

The person who wrote this statement either:

  1. Forgot to add the connection to it in the WHERE clause.
  2. Shouldn't have included this extra table in the FROM clause.

Often, the second case occurs when someone is optimizing a large SQL statement. They realise, "Hey, I don't need this in the WHERE clause," and remove the join, forgetting that the table is still in the FROM clause. I know this happens because my boss did it. Three people looked at the statement for 20 minutes before the problem was discovered. Save your people and your minutes; look for it first. The best thing to do is to print out the SQL, then go over it making sure all tables in the FROM clause are actually being used.

If you use a SELECT DISTINCT statement, the second error will hide from you. SQL will retrieve the zillion records into it's own temporary table but will then filter out the endless duplicates, displaying just the unique records to you. The poor server must chug away, a helpless accessory to your destructive plan. The DISTINCT clause is very useful in certain situations. Just make sure it isn't covering up some erroneous and terribly inefficient SQL. Do a zillion record retrieve check of the SQL statement without the DISTINCT clause, then put it in.

HYSTERICAL WARNING:
If you want to try out an example of error #2 just to see it in action then make sure you use tables that have about two records in them and include only one unnecessary table in the join. This can really be a killer statement. Please don't torture your server unnecessarily.



Written by: Mark Warhol
November '96


Image of Line Break
[HOME] [TABLE OF CONTENTS] [SEARCH]