by Mark Trescowthick - GUI Computing
Let me just set the scene... I'd been using Word 97 for a few weeks and, as part of that, I'd generated some HTML. Looked fine to me. So when a client approached us and was looking to move some documents (made up of one or more Word documents, plus some "Help popup" style explanations) from Word to an Intranet site (and back), I had no hesitation in suggesting that that would be no trouble.
The designers at Spider Eye got to work, and came up with a delightful design, featuring frames to hold a button bar and a Table of Contents, which I'd generate from the heading lines of the Word document, using VBA. The client was happy, and so was I. I was going to have to emulate the "popups", and we intended to do this via a small borderless frame in the button bar. We also had to generate a Table of Contents at two levels - for each document, and for the overall "compound document" - but that looked pretty easy indeed.
I'd overlooked the fact that, for some insane reason, Word doesn't know anything about frames at all. You can't set a base target at the document level, you can't even set a target link-by-link!! From a 'gee this will be easy' state of mind, I descended in only a few hours to 'oh dear' (or something along those lines, anyway!).
What I came up with gave Word's VBA a nice little test, certainly got me to grips with the new object model in a heck of a hurry and, best of all, worked. It's a classic kludge, but what the heck!
Really, there was only one way around things... all I could come up with was to generate the HTML, then copy that file to a text file, open it in Word and play ducks and drakes with targets, then save it and rename it to .html.
Yes, folks, there's a kludge solution for every occasion... and this was a ripper!
Oh, and by the way, did I tell you the client wanted to export these files back to Word 6, work on them, then bring them back again with the links intact? I didn't… well, that's another story.
A word about what we decided was the best approach will probably help before I look at the actual code involved…
We were going to be doing this generation process a good deal, so we sat down and specified the directory structures we wanted to use first. This was important because, unless we were going to simply dump all of this HTML into one directory on the server at the end of the day, those paths needed to be inserted into the HTML during generation.
We also needed some way to provide targeting for individual links, so that we could support those pesky popups. The only real way to do that was to enforce some naming standards on the link creation. We chose to have each bookmark to which we would popup end with "GUIPopup". We also mandated that these popups be in a separate document.
Finally, the requirement was that we be able to generate a Table of Contents from either Heading Two lines or from Heading Two and Three lines. All this is included in the demo template, so I've excluded a lot for the sake of clarity (if you can call this that!).
The first thing to do was to ensure a Document Title is there - otherwise the HTML picks up its title from the first line of the document.

    If ActiveDocument.BuiltInDocumentProperties.Item("Title") = "" Then
        MsgBox "Please specify a document Title. Use File | Properties.", vbCritical, sSysname
        Exit Sub
    End If
We then show a small form to allow the user to select which generation option they wish, and on the basis of that branch to the appropriate routine, having first initialised about a million variables, with which I won't bore you here. And, of course, having created a new blank document into which we'd put the Table of Contents.
Creating the Table of Contents is, of course, a two-stage process. First, we needed to find all the, say, Heading Two lines and create bookmarks on them, then we needed to create links to them in the new Table of Contents.
The first step, though, was to delete any of these bookmarks we might previously have created (in case the user had changed their style). This looked to be easy, but caused some pain. The following works, but is quite cumbersome, as it requires us to save the document after each bookmark is deleted. If this wasn't done, the collection got very messy and things didn't work. It turns out that, if you process the collection from the end, this isn't a problem. This code just never quite got re-written.
It's also important to note that, if a bookmark spans two (or more) styles, referencing the Range.Style property generates an error - hence I created a tiny function called CheckRangeStyle, which checks for exactly that and returns False if the range spans more than one style.
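The template's CheckRangeStyle isn't reproduced in this article, so the following is only a guess at its shape. It assumes that reading the style name of a mixed-style range raises a trappable run-time error, as described above, and simply reports whether that happened.

```vba
' Hypothetical reconstruction - not the code from the demo template.
' Returns True when the range has a single, readable style,
' False when reading the style raises an error (e.g. the range spans styles).
Function CheckRangeStyle(ByVal rng As Range) As Boolean
    Dim sStyleName As String

    On Error Resume Next
    Err.Clear
    sStyleName = rng.Style.NameLocal    ' fails if no single style applies
    CheckRangeStyle = (Err.Number = 0)
    On Error GoTo 0
End Function
```

Called as CheckRangeStyle(ActiveDocument.Bookmarks(i).Range), this fits the way the function is used in the listing that follows, where a False result aborts processing.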
While we're processing and deleting all our previous bookmarks (which begin with sbkPrepend), we also took the opportunity to examine each other bookmark we came across. If it was on a Heading Two line, we moved its start and end point down a fraction, so that it wouldn't get in the way… it appears that, at least sometimes, if two bookmarks overlap and one is deleted, the other disappears as well. I haven't been able to verify that 100%, but certainly it appeared that way sometimes.
    bFlag = False
    l = ActiveDocument.Bookmarks.Count
    bFlag = (l = 0)
    Do While Not bFlag
        For i = 1 To l
            ActiveDocument.Save    ' pack the bookmarks
            bFlag = True
            If i <= ActiveDocument.Bookmarks.Count Then
                If Left(ActiveDocument.Bookmarks(i).Name, 2) = sbkPrepend Then
                    ' One of our own generated bookmarks - remove it
                    ActiveDocument.Bookmarks(i).Delete
                    bFlag = False
                Else
                    If Not CheckRangeStyle(ActiveDocument.Bookmarks(i).Range) Then
                        MsgBox "The bookmark " & ActiveDocument.Bookmarks(i).Name _
                            & " containing the text """ & _
                            ActiveDocument.Bookmarks(i).Range.Text & _
                            """ spans two styles. Processing aborted.", _
                            vbCritical, sSysname
                        Exit Sub
                    End If
                    ' The range has a single style at this point, so only the
                    ' heading test is needed (the listing as printed repeated
                    ' "Not CheckRangeStyle(...)" here, which could never be True)
                    If ActiveDocument.Bookmarks(i).Range.Style = "Heading 2" Then
                        ' Move someone else's bookmark off the heading line
                        ActiveDocument.Bookmarks(i).Start = _
                            ActiveDocument.Bookmarks(i).End + 1
                        ActiveDocument.Bookmarks(i).End = _
                            ActiveDocument.Bookmarks(i).Start
                        bFlag = False
                    End If
                End If
            End If
        Next i
    Loop
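For comparison, here is a minimal sketch of the "process the collection from the end" variant mentioned earlier, which should not need the save after every deletion. It is not the code that shipped in the template, it only handles the deletion (not the moving of other bookmarks), and the prefix value "zz" is a stand-in for whatever sbkPrepend really holds.

```vba
' Sketch only: delete generated bookmarks by walking the collection backwards,
' so that removing item i never shifts the items still to be visited.
Sub DeleteGeneratedBookmarks()
    Const sbkPrepend As String = "zz"   ' hypothetical prefix value
    Dim i As Long

    For i = ActiveDocument.Bookmarks.Count To 1 Step -1
        If Left$(ActiveDocument.Bookmarks(i).Name, Len(sbkPrepend)) = sbkPrepend Then
            ActiveDocument.Bookmarks(i).Delete
        End If
    Next i
End Sub
```

Counting down means each deletion only disturbs index positions that have already been examined, which is presumably why the problem goes away.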
Once we'd done that, it was simply a matter of creating our new bookmarks. This is easy - search for the style, copy the text, switch to the Contents document and paste it. One interesting gotcha, though, occurs if a Heading Two style appears on the last line of the document. Word gets stuck and goes round in circles forever. Hence the "And Selection.Range.End <> ActiveDocument.Content.End" in the While test.

    lbkCount = 0
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Style = wdStyleHeading2
        Do While .Execute(FindText:="", Forward:=True, Format:=True) = True _
                And Selection.Range.End <> ActiveDocument.Content.End
            ' Select the whole heading paragraph, then step the search range past it
            With .Parent
                .StartOf unit:=wdParagraph, Extend:=wdMove
                .Paragraphs(1).Range.Select
                .Move unit:=wdParagraph, Count:=1
            End With
            Selection.Copy
            lbkCount = lbkCount + 1
            ActiveDocument.Bookmarks.Add Range:=Selection.Range, _
                Name:=sbkPrepend & Format(lbkCount)
            ' Paste the heading text into the Contents document
            Documents(sIdxDoc).Activate
            Selection.Paste
            Documents(sTgtDoc).Activate
        Loop
    End With

Finally, we stepped through each paragraph in the Contents document and added a hyperlink to the appropriate bookmark. Of course, the target wasn't right yet, but at least the link was created.

    For i = 1 To (ActiveDocument.Paragraphs.Count - 1)
        ' Anchor the link on the paragraph text, excluding the paragraph mark
        ActiveDocument.Hyperlinks.Add Anchor:=ActiveDocument.Range( _
            Start:=ActiveDocument.Paragraphs(i).Range.Start, _
            End:=ActiveDocument.Paragraphs(i).Range.End - 1), _
            Address:="../../../Documents/" & sDocSysName & "/" & sTgtHtml, _
            SubAddress:=sbkPrepend & Format$(i)
    Next i
It got a bit nastier when generating for two heading types, but the essential techniques are the same, so I haven't bothered to include them here.
Having saved the index file, we then copy it to a text file, and open it in Word again, to insert our target information.
This is actually pretty straightforward, as our standard templates include an additional META Tag (inserted using a custom property). For the Table of Contents, the destination frame will always be the same for every link, so we just search and replace on that, as follows:

    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = sMetaTagToReplace
        .Replacement.Text = sBaseTarget
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

There's one other nasty gotcha, and I can't work out whether I think it's a bug or a feature: when you save a document as HTML, any hyperlinks to other .DOC files (or, indeed, to other .Anything files) remain as they were. This sort of makes sense, and sort of doesn't. I would really have thought that this could have been an option. In any event, we step through and replace each occurrence of .DOC with .HTML.
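That extension swap can be sketched with the same Find object as the META tag replacement. This is an illustration rather than the template's code, and it assumes the HTML is already open in Word as plain text. Note that it will change any ".doc" it finds, including one in ordinary body text, so a real version might want a more specific search string.

```vba
' Sketch only: change every ".doc" in the text-mode HTML to ".html".
Sub RetargetDocLinks()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ".doc"
        .Replacement.Text = ".html"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False          ' catches .DOC and .doc alike
        .MatchWholeWord = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```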
All that's left for the Table of Contents is to save the "text" file, delete the old HTML and rename the text file appropriately. A warning, though - if you or any user opens this (or any) HTML file we "adjust" in this manner, Word will 'helpfully' remove all that hard work!
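The save, delete and rename step needs nothing beyond standard VBA file statements. A rough sketch, in which the procedure name and both path parameters are made up for illustration:

```vba
' Sketch only: save the edited text-mode document, then put it in place of
' the HTML file that Word originally generated.
Sub SwapTextForHtml(ByVal sTxtPath As String, ByVal sHtmlPath As String)
    ActiveDocument.SaveAs FileName:=sTxtPath, FileFormat:=wdFormatText
    ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges

    ' Kill fails if the file is missing, so check first
    If Dir(sHtmlPath) <> "" Then Kill sHtmlPath

    ' Name ... As renames the file (it must not be open at this point)
    Name sTxtPath As sHtmlPath
End Sub
```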
We then step through the original document (having saved it as HTML and opened that as text), and replace each instance of "GUIPopup" with "GUIPopup target= "popup"" to set the targets on these appropriately.
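A sketch of that replacement follows. It assumes the bookmark name is immediately followed by the closing quote of the link's HREF attribute, so the match includes that quote and the TARGET attribute lands outside it. The quote-replacement option is switched off for the duration because, where Word exposes it, replaced straight quotes can otherwise be turned into "smart" quotes, which would not do in HTML. The procedure name is hypothetical.

```vba
' Sketch only: add TARGET="popup" to every link whose bookmark name
' ends in GUIPopup, in the text-mode HTML currently open.
Sub TagPopupLinks()
    Dim bSmartQuotes As Boolean

    ' Remember the user's setting and switch it off while we work
    bSmartQuotes = Options.AutoFormatAsYouTypeReplaceQuotes
    Options.AutoFormatAsYouTypeReplaceQuotes = False

    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "GUIPopup"""                            ' GUIPopup"
        .Replacement.Text = "GUIPopup"" TARGET=""popup"""   ' GUIPopup" TARGET="popup"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = True
        .MatchWholeWord = False
        .Execute Replace:=wdReplaceAll
    End With

    Options.AutoFormatAsYouTypeReplaceQuotes = bSmartQuotes
End Sub
```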
Finally, of course, we need to generate framesets and such. The easiest way to do that was to simply create a new text document, insert the appropriate lines, then save it and rename it to .html.
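Something along these lines would do it. The frame names, sizes and file names below are invented for the example (the real ones come from the Spider Eye design and the directory standards described earlier), and the layout is simply a button bar across the top with the Table of Contents beside the document.

```vba
' Sketch only: build a frameset page in a new document, save it as text,
' then rename it to .html. All names and sizes here are hypothetical.
Sub WriteFrameset(ByVal sFolder As String)
    Dim doc As Document
    Dim sTxt As String
    Dim sHtml As String

    sTxt = sFolder & "\index.txt"
    sHtml = sFolder & "\index.html"

    Set doc = Documents.Add
    With doc.Content
        .InsertAfter "<HTML><HEAD><TITLE>Compound document</TITLE></HEAD>" & vbCr
        .InsertAfter "<FRAMESET ROWS=""80,*"">" & vbCr
        .InsertAfter "  <FRAME SRC=""buttons.html"" NAME=""buttons"" SCROLLING=""no"">" & vbCr
        .InsertAfter "  <FRAMESET COLS=""30%,*"">" & vbCr
        .InsertAfter "    <FRAME SRC=""contents.html"" NAME=""contents"">" & vbCr
        .InsertAfter "    <FRAME SRC=""main.html"" NAME=""main"">" & vbCr
        .InsertAfter "  </FRAMESET>" & vbCr
        .InsertAfter "</FRAMESET></HTML>"
    End With

    doc.SaveAs FileName:=sTxt, FileFormat:=wdFormatText
    doc.Close SaveChanges:=wdDoNotSaveChanges

    If Dir(sHtml) <> "" Then Kill sHtml
    Name sTxt As sHtml
End Sub
```

Because the text is inserted through the Range object rather than typed, Word's as-you-type corrections shouldn't touch the quotes in the tags.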
All in all, an interesting experience. Not one I'd want to repeat every five minutes, mind you, but interesting.
I found the Word object model pretty easy to work with, once I got the hang of it, but I do think that the bookmarks and hyperlinks collections, in particular, need to be looked at carefully. Deleting bookmarks and hyperlinks is a reasonably frequent occurrence (if you're doing that sort of thing) and any macro which steps through these will fail most ungracefully unless it saves the document first. Even checking for bookmark = Nothing fails, and there appears no other way to "pack" these collections other than to save… which the user may not want to do.
We've found some other oddities in Word's new VBA (specifically, Jim Karabatsos can show you one instance where any reference to ActiveDocument.anything will inform you that it can't allocate macro storage), but overall I can only breathe a sigh of relief that WordBasic has gone forever.
The source is included.