Toybox

by Peter Wone - Wamoz Pty Ltd

A confession

I was wrong about messaging in Delphi. It's already built into TObject, and in such a way that it meshes neatly with the Windows messaging system. For details see my article Journal of an OOP bigot, also in this issue.

Depression with database gurus

A bunch of quacks

I make a habit of reading the better DBMS journals from time to time, just to stay current with popular belief. This is not to say that the opinion of the masses has any validity or foundation in fact, but it does affect what people will invest in, which is highly relevant to sorting out our ongoing mess.

In a futile attempt to persuade reality not to be so difficult, various fairy tales are concocted and pushed from time to time. These usually have foundations that have been only partly thought through, and most represent personal prejudices or corporate agendas. Invariably, reality resists, but another generation of software add-ons or DBMSs is sold, and the merry-go-round spins ever faster.

Lately things have been especially depressing, with all sorts of pseudo-scientific twaddle being pushed by people who ought to be more responsible.

Let me be blunt: the only branch of mathematics particularly and broadly relevant to DBMSs is set theory. The only data modelling science is relational. No other scheme has a complete foundation in mathematics; anything else is hocus pocus. When other schemes, like "post-relational" or "object-based" DBMSs, work, it's because the models they implement are at root relational: either they're basically relational from the start or they don't work. By the evolution of debugging (and usually at great expense), the dysfunctional non-relational models implemented with these technologies turn into fundamentally relational models.

The data we store can be of only three possible species:

Before a model will work, you must either normalise it or get the logic for abnormal updates right. There are no other possibilities.

One man who knows what he's talking about

Tony Percy, vice president and research centre director for the Gartner Group, outlined these "seven rules of combat" in a conference address:

  1. When data are artificially distributed, consolidate.
  2. When data are dispersed, synchronise.
  3. When a dedicated DSS is required, move and transform.
  4. When DSS access needs optimisation, replicate.
  5. When workloads cannot be scaled, partition.
  6. When availability is critical, duplicate (everything).
  7. When multiple stores must be made consistent, broker.

Apart from correcting his grammar and Australianising his spelling, I have neither added nor changed anything. This is a masterful summary, and it covers just about every possible situation. Learn this by heart, learn how to do these things, and you are ready for big data in a distributed world.

Remember that this is the science, and that all the software is merely the technology. When there isn't canned technology for building what the science says you must build, don't abandon the science. Build better technology. I know it's expensive, but look at it this way: afterwards you can make a fortune selling next generation tools. Or you can hang on to it and make good on marketing lies about being better and faster than your competitors.

Some advice on Java

Positive: if you're new to Java and you're having trouble getting your head around it, you can find some orientation help on Sun's website. There are a number of Java tutorials, thematically described as "trails", ranging from entry level to intermediate.

Negative: do not fall into the J++ trap:

I've already been over this once. J++ is a blatant attempt to wreck the platform independence of Java. It comes down to this:

Why bother? If you're going Wintel-specific, just use C++, because you'd have to be off your rocker to pay the price of platform independence without really achieving it.

Thanks

I learnt, just before submitting this column, that many of you read this column first, that you often read it more than once, and that there are an astonishing number of you. That is immensely gratifying. Every issue I put a lot of care and thought into this column. It may be rambling, but there's a lot of me in it.

At times I wonder whether I'm just some opinionated fool with a soapbox. Mark says that we're all just some opinionated fool with a soapbox. I like to think that I have something to say. Apparently, so do you. For that you have my thanks.

Stop agreeing with me

Maybe I have a few things to say, and maybe it's interesting.

But surely you don't always agree with me? In all the years I've been writing this column, I've only once had someone write in to say "Rubbish. You are wrong, and here are the reasons."

C'mon people, send me some hate-mail. Tell me that I've re-invented the reverse-widget, and refer me to the page where Donald Knuth explained my brilliant idea fifteen years ago.

Déjà vu

There is a highly amusing parallel between the way COBOL uses screens and the way many current web applications dynamically generate interface pages. Call it thin client if you like, and cover it with buttons and scrollbars, but the more things change the more they stay the same.

Another instance of this recurring theme is the ubiquitous Windows dialog box.

All three amount to a form you fill in and submit.

And why not? It works.

Wisdom?

It occurred to me that after learning so much, all that I know about databases could be written on the back of a postcard:

Things have attributes. Things relate to other things. Relationships are things. For each, describe the thing, the whole thing, and nothing but the thing.

That's everything there is to know about data modelling. That paragraph completely describes both object modelling and normalisation. Of course, to interpret it usefully you have to understand it completely. I doubt that is possible from simply reading it.

It's a bit like saying that all there is to know about Mandelbrot fractals is that, for some complex number k, each term of the sequence is given by Z(i) = Z(i-1)² + k, starting from Z(0) = 0 and iterating towards infinity. The beauty of it is that infinitesimal variations in k produce radical fluctuations in term escape values (which produces those incredible patterns that were such a fad in the eighties).

Really it is simple, and perfectly obvious once you understand it, but to make any use of this information you need to know what it means, and its various implications, not to mention how to square a complex number expressed in polar notation.
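
To make that concrete, here's a rough escape-time sketch in Java. The class name and the sample values of k are mine, purely illustrative; the point is that two values of k a ten-thousandth apart give a bounded orbit in one case and escape after a few hundred iterations in the other.

  // Illustrative escape-time sketch for the iteration Z(i) = Z(i-1)^2 + k.
  public class EscapeTime {

    // Returns how many iterations it takes |Z| to exceed 2 (a guaranteed escape),
    // or maxIterations if the orbit stays bounded within that budget.
    static int escapeCount(double kRe, double kIm, int maxIterations) {
      double zRe = 0.0, zIm = 0.0;                  // Z(0) = 0
      for (int i = 0; i < maxIterations; i++) {
        double newRe = zRe * zRe - zIm * zIm + kRe; // square the complex term, add k
        double newIm = 2.0 * zRe * zIm + kIm;
        zRe = newRe;
        zIm = newIm;
        if (zRe * zRe + zIm * zIm > 4.0) {          // |Z| > 2: this term has escaped
          return i;
        }
      }
      return maxIterations;
    }

    public static void main(String[] args) {
      System.out.println(escapeCount(0.2500, 0.0, 2000)); // bounded: prints 2000
      System.out.println(escapeCount(0.2501, 0.0, 2000)); // escapes after a few hundred iterations
    }
  }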

A visit from Borland

I'm a much happier little Vegemite than I was.

Finally I settled down to do some serious learning of Java, and JBuilder in particular. I wrote a "hello world" program and compiled it. No problems there, so I tried compile/run and discovered that Borland's JVM doesn't like spaces in paths. When I re-installed the software on my PC, it announced that the trial was over.

Borland's Graham Porter came to the party with a release copy of JBuilder Pro and a seat at their Intro to JBuilder seminar.

The speaker was Joe Nuxoll, architect of JBCL (JBuilder component library) and devotee of Design Patterns (see below). This book has wandered in and out of my life ever since it was first published in 1995. It's very good in many ways, and certainly most programmers could learn a lot from it. I did learn a lot from it.

JBCL is Borland's latest go at a killer library. It's really orthogonal. It's efficient, it's effective, and it's designed to be extended. It's also about as fast as a great big comprehensive Java library can be.

The Delphi library structure is determined to a great extent by the fact that in a sense it is a wrapper for Win32. JBCL is not constrained by that. The set of controls and control behaviour common to the platforms on which Java runs is so small that wrapping common stuff has been abandoned in favour of an all-Java approach with appearance localisation for platforms. JBCL does this, and does it well.

Algorithm is everything. Let me illustrate with a personal anecdote. During a project some years ago, while hand-tuning a performance-critical piece of SQL, it occurred to me that the domain optimisation pattern I was using could be applied to nested loops in procedural code. I quickly coded a Sieve of Eratosthenes, benched it, added a subtle change to the algorithm and clocked it again. No language tricks or side effects, just an improvement to the algorithm. Everything was in-memory anyway, so caching was not a factor.

Execution time in the same language on the same computer dropped from two minutes to about ten seconds. C++ isn't twelve times faster than VB for that sort of thing, so the improved algorithm in VB was faster than the standard algorithm in C++.
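
The column doesn't record exactly what the subtle change was, so take the sketch below as a stand-in showing the same species of improvement: a naive sieve that crosses off every multiple of every candidate, versus one that stops the outer loop at the square root of n and starts each crossing-off run at p². The class and method names are mine.

  // Two sieves of Eratosthenes; only the algorithm differs, not the language.
  public class SieveBench {

    // Naive version: cross off every multiple of every candidate p.
    static boolean[] sieveNaive(int n) {
      boolean[] composite = new boolean[n + 1];
      for (int p = 2; p <= n; p++) {
        for (int m = 2 * p; m <= n; m += p) {
          composite[m] = true;
        }
      }
      return composite;
    }

    // Improved version: only sieve with primes up to sqrt(n), and start
    // crossing off at p * p, since smaller multiples are already marked.
    static boolean[] sieveImproved(int n) {
      boolean[] composite = new boolean[n + 1];
      for (int p = 2; (long) p * p <= n; p++) {
        if (!composite[p]) {
          for (int m = p * p; m <= n; m += p) {
            composite[m] = true;
          }
        }
      }
      return composite;
    }

    public static void main(String[] args) {
      int n = 10000000;
      long t0 = System.currentTimeMillis();
      sieveNaive(n);
      long t1 = System.currentTimeMillis();
      sieveImproved(n);
      long t2 = System.currentTimeMillis();
      System.out.println("naive:    " + (t1 - t0) + " ms");
      System.out.println("improved: " + (t2 - t1) + " ms");
    }
  }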

There's lots of good science in JBCL. And the chips just keep getting faster. Java is going to work, I think.

As for a full discussion of JBuilder, I'm still learning. Nell has asked for some bits and pieces for her web pages. The learning experience this will provide should put me in a good position to comment on JBuilder.

Contractual obligations and corporate hypocrisy

Not very long ago I scandalised an employer by refusing to work overtime. My then boss said they needed whatever it was. I said they didn't need it, they wanted it to avoid admitting that a salesman had lied to a major corporate customer.

He said I had a bad attitude, to which I replied "You want me to work for free to cover for a liar in order to increase the wealth of somebody highly unlikely to share it with me, and you think I have a bad attitude?"

Later that week they tried to shame me into doing it by arranging to have a representative from the customer visit to look at the prototype of the software.

To my boss's lasting chagrin I told the customer the bald truth. He seemed amused by my attitude and asked me whether I was aware that the availability of the software was likely to be the deciding criterion for about half a million dollars worth of custom. I pointed out to him that in order to secure themselves a fat commission, the salespeople would have offered him the biggest discount they were allowed to offer, and that such a contract would adversely affect our transient debt exposure in return for a comparatively marginal profit. This gave him a belly laugh, and he remarked that I certainly liked to call a spade a spade, and that our closest competitor on the tender had the software already, but the prototypes of ours looked more appropriate to his needs.

I asked him whether they gave him a copy. When he said not, I said "They lied to you. If they had a working program they would have given you a copy straightaway, because playing with it would have branded it in your memory."

He looked at me like a stunned mullet for a few seconds and asked to use my phone. After being put on hold three times and a minute or two of undoubtedly embarrassed umming and ahh-ing on the other end, he hung up and looked at me askance, saying "You were right."

After a minute or so of silence he asked, "When will this be ready?" and I told him "Between six and twelve weeks. Twelve if you want a stable version. Good software takes at least three tries."

Him: "And when do you think you think your competitors will be ready?"   Me: "Don't know, but they're less than half finished or they'd be giving you progress reports."

We got the contract. I prefer truth. So did he.

I left that company, about a month later, to work for someone prepared to pay by the hour for wasting my time and talent. It was necessary for my sanity. We only get one life on Earth. If I live to 85 that's about 31000 days. I've used nearly half of them, and I don't like wasting them: this is it, and there are no refunds or rainchecks. If you want part of my life you can damn well pay for it.

Poor Ross, victim of the paucity of Visual Basic

All that trouble, just to parse a string. (q.v. last issue, Separating Yourself from Comma Separated Values)

In Delphi, all you have to do is load the string into a TStringList. One operation, and then you access the elements by index:

  AStringList.CommaText := AString;
  AnotherString := AStringList[3];

It knows all about ignoring commas inside double-quoted values, and escaping double-quotes by repeating them.

Name value pairs of the form name=value (like an INI file) are also parsed automatically. You can refer to the values directly by name:

  AStringList.LoadFromFile(AFileNameString);
  AnotherString := AStringList.Values['BaseURL'];

although you wouldn't really do this, because if you use a TIniFile object instead, you don't even need to explicitly load the file text. Where you do make heavy use of this facility is in writing ISAPI DLLs, because cookie and URL data arrive as name value pairs of exactly this form, and it's damned handy not needing any explicit parsing code.

In Java it gets even easier. The code:

  StringTokenizer st = new StringTokenizer("this is a test");
  while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
  }

uses the default separators (blocks of whitespace characters) and prints the following output:

  this
  is
  a
  test

There are two other constructors of java.util.StringTokenizer which afford more control. You can supply a delimiter string as a second parameter (eg a comma, if you're handling CSV), and if you supply a third parameter of true, the delimiters themselves are returned as tokens as well.
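
For example, here's a tiny sketch of the CSV case (my own example, not from the JDK docs). With the third parameter set to true, the adjacent commas come back as tokens, which is how you can spot an empty field:

  import java.util.StringTokenizer;

  public class CsvTokens {
    public static void main(String[] args) {
      // Second parameter: the delimiter set. Third parameter: return the
      // delimiters themselves as tokens.
      StringTokenizer st = new StringTokenizer("red,green,,blue", ",", true);
      while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
      }
      // Prints red , green , , blue - one token per line; the two
      // consecutive commas mark the empty field.
    }
  }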

If you want to get even fancier - at the expense of having to understand what you're doing, something that programmers these days don't seem to want - you can use StreamTokenizer instead. Here's an excerpt from the documentation:

The StreamTokenizer class takes an input stream and parses it into "tokens", allowing the tokens to be read one at a time. The parsing process is controlled by a table and a number of flags that can be set to various states. The stream tokenizer can recognize identifiers, numbers, quoted strings, and various comment styles.

It's not much less than a built-in compiler.
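
By way of illustration, here's a rough sketch of my own (not from the docs) that pulls words, numbers and quoted strings out of a scrap of text using the default syntax table:

  import java.io.IOException;
  import java.io.StreamTokenizer;
  import java.io.StringReader;

  public class TokenDump {
    public static void main(String[] args) throws IOException {
      StreamTokenizer st = new StreamTokenizer(
          new StringReader("width = 42 label = \"toybox\""));
      while (st.nextToken() != StreamTokenizer.TT_EOF) {
        switch (st.ttype) {
          case StreamTokenizer.TT_WORD:   System.out.println("word:   " + st.sval); break;
          case StreamTokenizer.TT_NUMBER: System.out.println("number: " + st.nval); break;
          case '"':                       System.out.println("string: " + st.sval); break;
          default:                        System.out.println("char:   " + (char) st.ttype);
        }
      }
    }
  }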

Tell me again how good VB is. And don't give me that twaddle about Microsoft being in a position to provide long term support. When was the last time you called Microsoft without being charged $200 for the privilege of listening to Enya for half an hour? And when you actually spoke to a human being, was the answer any use?

Even if you're as good as me at eliminating the standard evasions, they just promise to investigate. The very most you can hope for is an invitation to Tech-Ed, a couple of groovy T-shirts and your name on a documented bug.

Microsoft documentation is encyclopædic, available on CD, and entirely free of "This page intentionally left blank."  Pity there's so much rubbish to learn that you end up committed to a proprietary platform. On the other hand, they did invite me to the developer launch of MSIE4, where I scored a great skivvy, an eval copy of NT4 that I used for four months before being forced to pay for NT, and a rather good lunch.

And in fairness, MSIE4 is very good, and they are toeing the line when HTML standards are issued. But they try to pre-empt the committee by implementing stuff before it's official, which forces market direction and crystallises syntax before it's been properly thought through.

They should realise from their own legacy problems with the Windows API that designs need much reflection before implementation; otherwise everybody has to live with the consequences and artificial limitations introduced by rash decisions.

Of course they're also causing problems for Netscape - I'm sure that breaks their black little hearts. <g>

For the love of IF is the root of all error

A thought: conditionals in procedural code represent uncollapsed probability waves. A procedurally defined program, chock-full of uncollapsed waves, is Schrödinger's cat: doomed to surf its indeterminate half-life on the fickle wave of ignorant fate, saved or slain by the prying eyes of some unfortunate user peering myopically into its box.

These waves of probability collapse depending on program state; parts of the code are existentially dependent on state. Conditionals, as part of the code, may themselves be iteratively existentially dependent on other conditionals. Consequently, future run-time states of a program are obscured by a miasma of probabilistic indeterminacy.

Test suites are attempts to clear the air, to qualify and quantify sequences of wave collapse. But the number of possible states is an exponential function of the number of governing factors, which makes it difficult and expensive to collapse every wave in every way.

Interestingly, in languages such as Prolog, the operation space of a method is completely and finitely defined by the predicate to which it is applied. It is still possible to create uncollapsed probability waves, but you have to try harder - it requires exceptional stupidity. (People like me do it all the time!)

Living software

I was just reading a science fiction novel in which the baddies were AIs which evolved from a certain very real Mr Ray's 80-byte A-life microorganisms. The explanations of how Mr Ray's greeblies worked provoked some interesting thought.

Can we make software evolve? Evolution, at its simplest, is survival of the fittest. The principal difficulty in simulating evolution lies in defining what constitutes "fittest".

The idea of adaptive software is hardly new, but the environments in which this software lives and grows are simple and are largely unaffected by changes in the software. The software tends to be large and complex and all it can change is a few behavioural parameters - quantitative rather than qualitative change.

With "real" life, the environment is largely defined in terms of pre-existing life-forms. In a sense, software already evolves by natural selection. Fitness is defined by humans voting with their wallets - free market selection. But a piece of software successful on the free market does not increase its potential to diversify. There are not millions of individual versions of Microsoft Word, each with the opportunity to evolve in a different direction.

If there were, it would undoubtedly be a slimmer, more robust application than it is. Most of the thousands of useless features would atrophy. New and unexpected ones might crop up from time to time. This is more or less what happens with Linux, because there are hundreds of versions evolving independently. And Linux is famous for being amazingly robust, small and efficient, and feature rich yet simple. Evolution in action.

People are still involved. It's too slow. It's probably safer that way, because "fitness" is defined entirely in terms of utility to humans, but the bottleneck is people.

More on that another time.



Operator overloading

In C++ and Java you can declare several methods with the same name and let the compiler choose which to call from the parameters used in the call; that much is method overloading, and both languages support it. Operator overloading proper - redefining operators such as + and == for your own types - is a C++ feature that Java deliberately leaves out.
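
A small sketch of the method-overloading side of this, with an illustrative class:

  public class Describe {

    static String describe(int n)    { return "an integer: " + n; }
    static String describe(double d) { return "a floating-point number: " + d; }
    static String describe(String s) { return "a string " + s.length() + " characters long"; }

    public static void main(String[] args) {
      System.out.println(describe(42));       // resolves to describe(int)
      System.out.println(describe(3.14));     // resolves to describe(double)
      System.out.println(describe("toybox")); // resolves to describe(String)
    }
  }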

Design Patterns

Design Patterns is a cookbook of proven ways to handle these issues with current OOP languages. Go buy it. You need it. It tells you all the stuff they should have taught you at university.

Let's have a look at some of the design patterns described. I have compared them to RDBMS functions, and I think you'll find that many of the problems addressed are consequences of the way methods are bound to data in conventional OOP languages, and the fact that they operate on instances rather than logical domains.

Creational patterns

Except for Singleton, all of the creational patterns share the purpose of allowing methods to operate on objects to which they are not explicitly coupled. The methods are logically bound by instance data rather than by declaration as class members.

As for Singleton, it is concerned entirely with ensuring that only a single, shared instance exists. In other words, the function of the Singleton pattern is enforcement of referential integrity.
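
A minimal Singleton sketch in Java, with an illustrative class name - a single shared instance handed out through one access point:

  public class Registry {

    private static Registry instance;     // the one shared instance

    private Registry() { }                // private constructor: nobody else can create one

    // The only way to obtain a Registry; synchronized so two threads
    // can't both create an instance.
    public static synchronized Registry getInstance() {
      if (instance == null) {
        instance = new Registry();
      }
      return instance;
    }
  }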

Structural patterns

Decorator provides run-time binding of methods to data in classes which are separate but logically related to the classes implementing the methods.

Flyweight uses sharing to "support large numbers of fine-grained objects efficiently." The trick here lies in implicitly joining data to methods based on type. This saves having to store the bindings, which for prolific objects like characters would be very expensive. But it's still a join.
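
Here's a rough Java sketch with illustrative names: the char value is the join key to a single shared glyph object, while per-use state (position) is passed in rather than stored.

  import java.util.HashMap;
  import java.util.Map;

  // Flyweight sketch: one shared Glyph per character value, however many
  // times that character appears in the document.
  class Glyph {
    private final char symbol;                    // intrinsic, shared state
    Glyph(char symbol) { this.symbol = symbol; }
    void draw(int x, int y) {                     // extrinsic state passed per call
      System.out.println("draw '" + symbol + "' at (" + x + "," + y + ")");
    }
  }

  class GlyphFactory {
    private final Map<Character, Glyph> pool = new HashMap<Character, Glyph>();
    Glyph get(char c) {                           // the implicit join: char value -> shared object
      Glyph g = pool.get(c);
      if (g == null) {
        g = new Glyph(c);
        pool.put(c, g);
      }
      return g;
    }
  }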

Behavioural patterns

Iterator supports the traversal of sets of objects, and is used to implement methods that operate on sets of objects.
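
Java's collection classes expose the pattern directly; a tiny sketch:

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  public class WalkList {
    public static void main(String[] args) {
      List<String> names = new ArrayList<String>();
      names.add("Smith");
      names.add("Jones");
      names.add("Wone");

      // The caller traverses the set without knowing how it is stored.
      for (Iterator<String> it = names.iterator(); it.hasNext(); ) {
        System.out.println(it.next());
      }
    }
  }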

Like the structural pattern Decorator, the behavioural pattern Visitor provides run-time binding of methods to data in classes which are separate but logically related to the classes implementing the methods. Visitor "lets you define a new operation without changing the classes of the elements on which it operates." This is actually static binding implemented in code created after the class(es) on which it operates, but it illustrates the need to bind methods to data dynamically.

Strategy is a little more complex. It lets you "Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it." To put it another way, Strategy supports data-driven binding of methods and data.
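
java.util.Comparator is a ready-made Strategy in the Java library: each implementation encapsulates one ordering, and the caller picks which to hand to the same sort routine. A small sketch with illustrative orderings:

  import java.util.Arrays;
  import java.util.Comparator;

  public class SortStrategies {

    static final Comparator<String> BY_LENGTH = new Comparator<String>() {
      public int compare(String a, String b) { return a.length() - b.length(); }
    };

    static final Comparator<String> ALPHABETICAL = new Comparator<String>() {
      public int compare(String a, String b) { return a.compareTo(b); }
    };

    public static void main(String[] args) {
      String[] words = { "Delphi", "Java", "VB", "COBOL" };

      Arrays.sort(words, BY_LENGTH);        // same sort routine, one policy...
      System.out.println(Arrays.asList(words));

      Arrays.sort(words, ALPHABETICAL);     // ...or another, chosen at run time
      System.out.println(Arrays.asList(words));
    }
  }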

The function of Memento is to support the implementation of persistence. Nobody says you have to keep the mementos in memory. An increasingly common practice is to keep them in a table of blobs, to allow them to persist between user sessions.

Observer - this is straight out of the book - is used to "Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically." Sounds like a join to me.
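
A minimal Observer sketch with illustrative names - when the one side changes state, the many side is notified automatically:

  import java.util.ArrayList;
  import java.util.List;

  interface PriceListener {
    void priceChanged(String symbol, double price);
  }

  class Ticker {
    private final List<PriceListener> listeners = new ArrayList<PriceListener>();

    void addListener(PriceListener l) { listeners.add(l); }

    // The subject changes; every registered dependent hears about it.
    void setPrice(String symbol, double price) {
      for (PriceListener l : listeners) {
        l.priceChanged(symbol, price);
      }
    }
  }

  public class ObserverDemo {
    public static void main(String[] args) {
      Ticker ticker = new Ticker();
      ticker.addListener(new PriceListener() {
        public void priceChanged(String symbol, double price) {
          System.out.println(symbol + " is now " + price);
        }
      });
      ticker.setPrice("BORL", 7.25);
    }
  }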



Written by: Peter Wone
April '98
