Mark Warhol - Prudential Insurance
This article has been extracted from Mark Warhol's book The Art of Programming with Visual Basic, John Wiley & Sons, Inc., 1995, Chapter 4, pp 115 - 122.
|I've mentioned before that the main reason a corporate application exists is to process data. Nobody wants that program I always see as an example in programming books. It has a text box, a command button, and a label. You type in the temperature in degrees Fahrenheit, press the command button, and the temperature converted to Celsius appears in the label (Figure 4.1). If a program like this did exist, it would probably have to process several thousand temperatures stored in a database table. The program would display the records, possibly filtering them based on certain criteria. The users would probably be allowed to add, modify, and delete these records. On a regular basis, one or more standard reports on the data would have to be generated.|
Most likely, your applications will use some variation of SQL or Structured Query Language to perform these tasks. Performing this processing using standard flat-file I/O can be wicked and should be reserved for very simple applications. SQL is a lot more fun. SQL is pronounced "sequel" by everyone but me. I still say "S-Q-L" and everyone looks at me like I'm a potato farmer. It was created by IBM, which, to my knowledge, everyone pronounces the same way. SQL is amazingly powerful and easy to learn. As a programmer, you can perform all the data manipulation you need with four statements. I'll base my examples here on this mythical temperature program. There are two tables used: SAMPLES, which contains the temperature readings, and S_INFO, which contains specific information about the samples themselves.
The tables look like this:
Example data in table:
Example Data in table:
|1||Test Run 37||Pending|
|2||Sample B||Not sure|
|3||Lab Check||Done earlier|
Here are the four SQL statements:
SELECT-retrieves a set of records.
SELECT SAMPLE_ID, TEMP_FAREN, (TEMP_FAREN - 32) * 5/9 FROM SAMPLES WHERE TEMP_FAREN > 0 ORDER BY SAMPLE_ID
|SAMPLE_ID||TEMP_FAREN||(TEMP_FAREN - 32) * 5/9|
(TEMP_FAREN - 32) * 5/9 is an example of a compound column that doesn't really exist in the table.
Fields from more than one table can be retrieved in a single SQL SELECT statement by joining the tables through common key field(s). Next, I retrieve the TEMP_FAREN field from the SAMPLES table and the SAMPLE_NAME field from the S_INFO table. The two tables are linked by the key field SAMPLE_ID, which exists in both tables.
SELECT S_INFO.SAMPLE_NAME, SAMPLES.TEMP_FAREN FROM SAMPLES, S_INFO WHERE SAMPLES.SAMPLE_ID = S_INFO.SAMPLE_ID
|Test Run 37||100|
INSERT-adds a new record.
INSERT INTO SAMPLES (SAMPLE_ID, TEMP_FAREN) VALUES (4, 54)
UPDATE-updates fields in one or more existing records.
UPDATE SAMPLES SET TEMP_FAREN = 55 WHERE SAMPLE_ID = 1
DELETE-deletes one or more existing records.
DELETE FROM SAMPLES WHERE SAMPLE_ID = 1
That's just a little taste. A good SQL manual will explain the basics better than I do. I will go over two basic SQL errors and several advanced techniques that I have actually needed to use, unlike the advanced techniques in the SQl manual that nobody would ever think to use. I am using standard or semi-standard SQL for my examples. You may be working with a weirder form of SQL but the concepts will be the same. Test out any of the statements I write to make sure they perform in your environment.
Basic SQL Errors: Too Much or Not Enough
Here's a simple SQL SELECT statement:
SELECT EMP_ID, F_NAME, L_NAME, M_INIT, STATUS_CD, COMMENTS FROM EMPLOYEE WHERE STATUS_ID = 1 ORDER BY L_NAME, F_NAME, M_INIT
It should be fairly obvious what this statement does. It retrieves the EMP_ID, F_NAME, L_NAME, M_INIT, STATUS_CD, COMMENTS columns from the EMPLOYEE table for all records where the STATUS_CD = 1. The retrieved records will be sorted in ascending order by L_NAME, F_NAME, M_INIT. Simple SQL SELECT statements (basically statements using only one table) probably won't cause much trouble. But there is trouble. It comes when more than one table is used in the SELECT statement.
The two most common SQL SELECT troubles are:
Here's a statement where the first problem is brewing. It's the previous statement joined with the STATUS table.
Example data in table:
SELECT EMP_ID, F_NAME, L_NAME, M_INIT, STATUS.VALUE, COMMENTS FROM EMPLOYEE, STATUS WHERE EMPLOYEE.STATUS_CD = STATUS.CODE
In this SQL statement, the EMPLOYEE and STATUS tables are joined by the field STATUS_CD. The purpose of the join is to display the value associated with the STATUS_CD field instead of the STATUS_CD itself that is just a number. This is a very common SQL thing you need to do. Let's say there are 200 records in the table and you're only getting 180 records back. What's the problem? This brings you to the most important questions you need answered about a field involved in a join:
IF STATUS_CD can be left blank in EMPLOYEE, then it won't match STATUS and won't
return a record. In this case, I don't want that to happen so I must rewrite the
statement slightly. When doing a join like this one, where the field joining the
two tables can be blank (or just isn't guaranteed to match), you need an outer join.
Here it is:
SELECT EMP_ID, F_NAME, L_NAME, M_INIT, STATUS.VALUE, COMMENTS FROM EMPLOYEE, STATUS WHERE EMPLOYEE.STATUS_CD *= STATUS.CODE
The *= says that all records should be retrieved from the table on the left side of the equal sign, here EMPLOYEE. If there is a match, then also retrieve the value from the table on the right side of the equal sign, here STATUS.
Not using an outer join when necessary is the most common cause of retrieving fewer records than expected. I've done it, and had to maintain several programs from other people that had this error. One was a personnel application that used a code/value for the RESPONSIBILITY field. The retrieve did a join but didn't require the RESPONSIBILITY field to be filled in. A user would add a bunch of personnel without entering a RESPONSIBILITY, save, then none of the records would display even though they were in the database.
I've also encountered this problem in programs that were originally set up to require the user to enter an appropriate value for the joining field. The SQL call started out working just fine. The specs for the program then changed, making the field optional, but the original SELECT statement didn't get changed. A very simple bug that can ruin your day. Check if any such situations occurred if you've inherited an old project.
The second problem happens less frequently but when it does, watch out. Here's a statement where trouble #2 is brewing:
SELECT EMP_ID, F_NAME, L_NAME, M_INIT, SATTUS.VALUE, COMMENTS FROM EMPLOYEE, STATUS, SALARY WHERE EMPLOYEE.STATUS_CD *= STATUS.CODE
This statement returns a zillion records after a considerable delay. Why?
The problem is insidious because you'd think it wouldn't really be a problem. This statement has an extra table in it's FROM clause that is not used in the WHERE clause. The result is SQL matches every record in this table with every record retrieved in a process called a Cartesian Product. This result isn't quite as elegant as it sounds, it's actually nothing more than a Big Mess. The number of records retrieved will vary considerably depending on the number of tables in the join and the number of records in the tables. If your tables have just a few records, you may not notice the extra records retrieved, but once those tables start to fill up, you'll be languishing in decadent excess.
The person who wrote this statement either:
Often, the second case occurs when someone is optimizing a large SQL statement. They realise, "Hey, I don't need this in the WHERE clause," and remove the join, forgetting that the table is still in the FROM clause. I know this happens because my boss did it. Three people looked at the statement for 20 minutes before the problem was discovered. Save your people and your minutes; look for it first. The best thing to do is to print out the SQL, then go over it making sure all tables in the FROM clause are actually being used.
If you use a SELECT DISTINCT statement, the second error will hide from you. SQL will retrieve the zillion records into it's own temporary table but will then filter out the endless duplicates, displaying just the unique records to you. The poor server must chug away, a helpless accessory to your destructive plan. The DISTINCT clause is very useful in certain situations. Just make sure it isn't covering up some erroneous and terribly inefficient SQL. Do a zillion record retrieve check of the SQL statement without the DISTINCT clause, then put it in.
If you want to try out an example of error #2 just to see it in action then make sure you use tables that have about two records in them and include only one unnecessary table in the join. This can really be a killer statement. Please don't torture your server unnecessarily.