This file is, loosely speaking, part of the TADS Author’s Manual.
This document is copyright © 1998-2001 by N. K. Guy, tela design.


Converting an existing TADS game into HTML


How to convert a TADS game to HTML TADS

Converting a regular TADS game into HTML is not a difficult process. In fact, this document probably makes the whole business look a lot more complicated than it really is. However, I wanted to be fairly comprehensive in my discussion of the process, so I go into a fair bit of detail.

The new HTML system is essentially just an HTML interpreter built into the TADS runtime, so converting to HTML doesn’t involve any major changes to your game’s code. Most of the work consists of finding and replacing any text that the runtime might wrongly interpret as HTML, and converting any special character sequences to their HTML equivalents. Beyond that you may need to update your status line code, and you must tell the runtime to switch on HTML parsing.

In this document I’m assuming that you want to add a little simple text formatting to your game. I don’t deal with sound or other multimedia features at all.

 


Terminology.


HTML - Acronym for HyperText Markup Language: the markup language used to describe documents viewable with a typical Web browser. HTML TADS understands HTML markup for formatting the textual output of HTML TADS adventure games. HTML files are plain text files, traditionally restricted to 7-bit ASCII. Markup codes are defined using tags.

ASCII - Acronym for American Standard Code for Information Interchange. The method of encoding text used by virtually all computers these days. Traditional ASCII is 7-bit only: it uses 7 bits to encode each character, which means that 2 to the power of 7, or 128, different characters are possible. Those characters include the letters A-Z in both upper and lower case, the numerals 0-9, punctuation and symbol characters such as !, @, # and $, and a number of non-printing control codes.

ASCII thus has a pretty limited set of characters. Sort of like a television soap opera. This is a particular problem because normal 7-bit ASCII is very American indeed: it has no provision for encoding the accented (diacritical) characters used by many non-English languages. There are international character sets which replace certain punctuation characters like \ and { with accented letters, but they aren’t very flexible owing to the limited number of spare characters within 7-bit ASCII. And unfortunately 8-bit ASCII, which permits up to 256 characters, is not standardized. Each operating system (UNIX, DOS, Mac, etc.) supports its own, incompatible set of high-bit characters (the characters above the usual 7-bit range). See entity.

Entity - In the context of HTML, an entity is a way of encoding a given character using plain 7-bit ASCII. The characters encoded this way often do not appear in the standard ASCII set, but they don’t have to. For example, take the less-than symbol (<). It’s part of 7-bit ASCII, but it also marks the start of an HTML tag. So you can’t include it directly in text output, because the HTML parser would think that you’re starting a new tag. To display the < symbol you have to encode it as an entity (&lt;). Entities are more correctly called character references, but that always makes me think of apartment rental application forms. See ISO Latin-1.

ISO Latin-1 - A standard 8-bit character set that extends 7-bit ASCII with the accented (diacritical) and special characters used by most Western European languages. HTML lets you refer to each of these characters using only basic 7-bit ASCII, by means of a unique entity: the & symbol followed by an abbreviated English description of the character and ended by a semicolon. For example, the lower-case letter e with an acute accent is written as “&eacute;”. Alternatively, a numeric code can be used in place of the abbreviated description (e.g. &#233;). Entities are somewhat cumbersome and inefficient, but they do work. ISO stands for the International Organization for Standardization, and the full technical name for the Latin-1 specification is ISO-8859-1.

Unicode - A way of encoding characters using 16 bits of information rather than ASCII’s 7. The result is a much wider range of characters: Unicode covers Western (Roman) alphabets, Cyrillic (Russian, Ukrainian, etc.), Japanese, Chinese, Korean and so on. Unicode is not, as yet, particularly well supported by typical microcomputer operating systems. TADS has some internal provisions for Unicode but doesn’t yet support it. The Universal Character Set defined in ISO standard 10646 (ISO/IEC 10646-1:1993) is basically the same as Unicode 2.0.

Parser - A computer program designed to take in a particular piece of text information and process it. For instance, in a text adventure game the parser is the code that takes the player’s typed commands, breaks them down and interprets them. In the context of HTML, the HTML parser is the code that analyzes a text file and interprets any HTML instructions contained therein.

Tag - A simple HTML markup instruction. Tags are enclosed in <angle brackets>. For example, to mark a piece of text as bold you simply put in a <B> tag. To indicate that boldface should end you put in the closing tag </B>. The obvious consequence of this scheme is that any text enclosed in angle brackets is assumed by an HTML parser to be an HTML tag. Since unrecognized tags are ignored, putting a stray < symbol into a piece of text means that everything from the < symbol up to the next > will be treated as a tag and, most likely, silently swallowed by the HTML parser.
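To see the entity idea in action, here is a small demonstration in Python, used only because its standard html module can decode entities; nothing about it is specific to TADS. It shows that the named entity &eacute; and the numeric reference &#233; stand for the same character, and that 233 is that character’s code in ISO Latin-1.

```python
import html

named = "caf&eacute;"    # named entity
numeric = "caf&#233;"    # numeric character reference for the same letter

# Both decode to the same text.
print(html.unescape(named))                             # café
print(html.unescape(named) == html.unescape(numeric))   # True

# The number is the character's code point in decimal (233 = 0xE9),
# which is also its single-byte value in ISO Latin-1 (ISO-8859-1).
print(ord("é"), "é".encode("latin-1"))                  # 233 b'\xe9'
```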

 


Step 1 of the Conversion: look for <, > and &

HTML uses those three characters to signify the beginning of a tag, the end of a tag and the beginning of an entity respectively. So make sure that those three characters do not appear in your game’s textual output anywhere.

You have to replace each occurrence of these characters with its HTML entity equivalent. An HTML entity is a 7-bit ASCII representation of a character; the HTML parser displays it as that character instead of interpreting it as HTML code. Replace each occurrence of:

< with &lt;
> with &gt;
& with &amp;

Now remember you only replace the symbols where they occur within textual output. You do not replace them if they appear within a code segment. For instance, the > symbol in the example below is a problem and must be replaced, because it appears in a line of text:

  if ( global.suchValue )
     "From this -> that! ";

However the > symbol in the example below is not a problem, because it appears within TADS code and not within a line of text:

   if ( firstValue > secondValue )
   {
     say( firstValue ); " is bigger than "; say( secondValue ); "! ";
   }

So how can you get your computer to tell the difference between the two cases? Well, you can’t, easily. It’d take a moderately sophisticated search and replace routine to check for problematic symbols located within quoted text. So the easiest thing to do is simply to go through your source code manually. Just search for each of the three characters and replace the troublesome ones one by one. This shouldn’t be very time-consuming, as you probably won’t have that many occurrences of the three characters in your game. One exception to watch for: don’t replace the << and >> that surround TADS embedded expressions inside double-quoted strings, as in "You see <<self.count>> coins. ". The compiler turns those into code, so they never reach the HTML parser.
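If you’d rather automate the search, the following is a rough sketch of the kind of routine mentioned above, written in Python as a stand-alone helper. It assumes TADS 2 source syntax (// and /* */ comments, single- and double-quoted strings, backslash escapes and << >> embedded expressions), and the function name escape_tads_strings is made up for this example. It only touches double-quoted strings, so the two examples above come out right, but treat its output as a first draft to review rather than a finished conversion.

```python
ENTITIES = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}

def escape_tads_strings(source: str) -> str:
    """Replace <, > and & with entities inside double-quoted TADS strings.

    Skips // and /* */ comments and single-quoted strings, copies backslash
    escapes such as \\" verbatim, and leaves TADS embedded expressions
    (<< ... >>) alone. A rough sketch: check its output by hand.
    """
    out = []
    state = "code"   # code, line (// comment), block (/* */), sq or dq
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if state == "code":
            if source.startswith("//", i):
                state, step = "line", 2
            elif source.startswith("/*", i):
                state, step = "block", 2
            elif c == "'":
                state, step = "sq", 1
            elif c == '"':
                state, step = "dq", 1
            else:
                step = 1
            out.append(source[i:i + step])
            i += step
        elif state == "line":
            if c == "\n":
                state = "code"
            out.append(c)
            i += 1
        elif state == "block":
            if source.startswith("*/", i):
                out.append("*/")
                state, i = "code", i + 2
            else:
                out.append(c)
                i += 1
        elif c == "\\" and i + 1 < n:
            # Inside either kind of string: copy escapes like \" or \n as-is.
            out.append(source[i:i + 2])
            i += 2
        elif state == "sq":
            if c == "'":
                state = "code"
            out.append(c)
            i += 1
        elif source.startswith("<<", i):
            # Double-quoted string: keep a TADS embedded expression intact.
            end = source.find(">>", i + 2)
            end = n if end == -1 else end + 2
            out.append(source[i:end])
            i = end
        else:
            # Double-quoted string: this text reaches the HTML parser.
            if c == '"':
                state = "code"
            out.append(ENTITIES.get(c, c))
            i += 1
    return "".join(out)

print(escape_tads_strings('"From this -> that! ";'))
# prints: "From this -&gt; that! ";
```

Run it only once over a given file, since a second pass would turn &lt; into &amp;lt;. It deliberately leaves single-quoted strings alone, because TADS uses them for vocabulary words as well as for text printed with say(), so check any single-quoted text you display by hand.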

 


Step 2: Optional - look for quotation marks and TADS codes.

Regular TADS uses the escape sequence \" to produce a neutral (straight) quotation mark inside a double-quoted string. You may want to go through your code with a simple search and replace and change each \" to &quot;, which is the HTML equivalent, but it isn’t required: the HTML TADS interpreter understands \" as well.

HTML TADS also supports curved (typographical) quotation marks - see the section on Fancy Formatting below.

Another optional step is to change your use of other traditional TADS codes. For example, in regular TADS the \t code inserts a tab. If you like you can change that to the HTML TADS equivalent, which is <TAB MULTIPLE=4>. The MULTIPLE attribute is an HTML TADS extension that means “move to the next tab stop that falls on a multiple of the given number of spaces” (four, in this example). Similarly, you can replace the begin and end highlighting codes \( and \) with <B> and </B> respectively, or, if it’s more appropriate, with <I> and </I>. <B> produces boldface text and <I> italic.

Likewise, you can change \b to <P> and \n to <BR HEIGHT=0>. The HEIGHT attribute is another HTML TADS extension; HEIGHT=0 starts a new line without adding any blank vertical space.

Now, you don’t have to go and replace all your regular TADS symbols with HTML equivalents. You can use the old \ codes and they’ll work just fine. However, you do have the option of switching to HTML for increased flexibility if you desire.
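If you do decide to switch, a conversion like the following could do the mechanical part. This Python sketch works on the body of a single string and maps each backslash code to the equivalent listed above; the table name TADS_TO_HTML and the function convert_escapes are hypothetical, and you can edit the table (for example, to use <I> instead of <B>, or to drop the \" entry and keep straight quotes as they are). Codes it doesn’t know, such as \\ or \H+, are passed through untouched.

```python
# Hypothetical mapping from traditional TADS codes (the character after
# the backslash) to the HTML TADS equivalents described above.
TADS_TO_HTML = {
    "t": "<TAB MULTIPLE=4>",
    "(": "<B>",
    ")": "</B>",
    "b": "<P>",
    "n": "<BR HEIGHT=0>",
    '"': "&quot;",
}

def convert_escapes(text: str) -> str:
    """Rewrite backslash codes in the body of one TADS string."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            code = text[i + 1]
            # Unknown codes (including \\ for a literal backslash) are kept.
            out.append(TADS_TO_HTML.get(code, text[i:i + 2]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(convert_escapes("\\(Warning!\\) Exits:\\tnorth\\n"))
# prints: <B>Warning!</B> Exits:<TAB MULTIPLE=4>north<BR HEIGHT=0>
```

Scanning left to right and consuming two characters per code is what keeps \\n (a literal backslash followed by the letter n) from being mistaken for a newline.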

 


Step 3: Replace any 8-bit ASCII

This section won’t affect most people. However, some people had special versions of their TADS games designed for specific operating systems, in order to take advantage of each system’s 8-bit ASCII set. If you don’t know what that means then don’t worry about it, as you probably haven’t used anything above 7-bit ASCII. In fact, I’d be surprised if anyone other than me did this...

If you have implemented some 8-bit ASCII text output in your game you should convert all the high-bit code into HTML entity equivalents. For instance, at one point I had three versions of the game that I’m working on. One version was plain 7-bit ASCII for UNIX systems, one version contained 8-bit ASCII designed for the Macintosh and one contained 8-bit ASCII designed for Windows. Because of HTML TADS’ support for ISO Latin-1 I’ve been able to unify the code into one set of files, written with HTML entities.

All I did was to run my code through the HTML filters built into my text editor of choice - BBEdit for Macintosh. It automatically replaced each 8-bit character with its HTML entity equivalent, saving me a lot of time. You should be able to do the same thing with your code, though not all text editors have conversion tables for HTML entities built in, so you may need to do manual search and replacing.
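If your editor lacks such a filter, a short script can do the same job. This Python sketch assumes you know which 8-bit character set a given file was written in (Python’s codec names include "mac_roman" for the Macintosh, "cp1252" for Windows and "cp437" for DOS); the function name to_entities is hypothetical. It uses the standard library’s table of HTML entity names and falls back to a numeric reference when there is no name.

```python
import html.entities

def to_entities(raw: bytes, codec: str) -> str:
    """Decode 8-bit text written for one platform and replace each
    non-ASCII character with an HTML entity: the named form where the
    standard library knows one, otherwise a numeric &#NNN; reference."""
    text = raw.decode(codec)   # e.g. "mac_roman", "cp1252" or "cp437"
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)     # plain 7-bit ASCII: keep as-is
        elif cp in html.entities.codepoint2name:
            out.append("&" + html.entities.codepoint2name[cp] + ";")
        else:
            out.append("&#" + str(cp) + ";")
    return "".join(out)

print(to_entities(b"caf\x8e", "mac_roman"))   # prints: caf&eacute;
print(to_entities(b"caf\xe9", "cp1252"))      # prints: caf&eacute;
```

Not every entity name Python knows is necessarily recognized by HTML TADS, so compare the output against the list of character references HTML TADS supports. To convert a whole file, pass in its contents read in binary mode.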

 


Step 4: Update your status line code

The status line is the only part of the HTML conversion process that may involve a little work - or it may involve absolutely no work at all. It depends entirely on whether your game uses the default adv.t status line code or whether you’ve rolled your own.

If you simply use the adv.t status line and haven’t customized it in any way, rejoice! Converting your game to HTML involves no status line modifications at all, because the upgraded adv.t file contains new code to emulate the traditional behaviour of the older software using the new <BANNER> functionality. All you need to do is replace your old adv.t file with the latest one. Easy.

However, if you’ve fiddled with the status line to make it do your own thing, then you’ll need to rewrite it. Let’s start with an overview of how the traditional TADS status line works.

The traditional TADS runtime is essentially hardcoded to produce an Infocom-style status line. In other words, the room name is printed on the left side of the bar and the player’s score and number of moves (separated by a / ) are printed on the right side of the bar. The room name is displayed by the statusLine hardcoded method in the room code. Any text that’s output by that method is automatically installed into the left side of the status line. The right side of the line is generated by the setscore() function (which in turn calls the scoreFormat() function, at least in TADS 2.4.0 and higher). Any time that function is called the text passed to it is written to the right side of the line.

The functionality of the traditional TADS status line grew over time, which is why the left side is controlled by a method and the right side by a function. TADS 1 didn’t allow you to change the right side of the status line - that feature was added later. (amusing side note: on some older versions of the TADS interpreter, this text was drawn every time setscore() was called. So you could do sort of simple textual animation by calling setscore() repeatedly and watching the text change. I thought this was pretty cool. Unfortunately this trick doesn’t work with interpreters like MaxTADS and code derived from it - I think they buffer the status line or only redraw it once or something.)

Now HTML TADS handles things very differently. It uses the <BANNER> tag from HTML 3 to create a new non-scrolling portion of the window (sort of like a non-scrolling frame). You can position this banner along any side of the screen - top, bottom, left, right. You can put graphics or text into the banner, and you can have multiple banners on-screen at one time. In other words, HTML TADS is considerably more flexible than traditional TADS when it comes to implementing status lines. And of course that means that making a status line that emulates the behaviour of the Infocom-style bar is quite easy.

Banners are fairly straightforward - you just insert a <BANNER> tag into your game. Each banner has a name associated with it so you can specify multiple banners with ease. There are several attributes for banners, such as banner height (or width), whether there’s a border to the banner (ie: a separator line between the banner and the rest of the window) and so on. Any text displayed between the opening <BANNER> and the closing </BANNER> is put into the specified banner. And of course you can put any HTML you like in there - plain text, <IMG>s, even tables.

Unlike the traditional TADS status line, a banner is not assembled by two separate bits of code: the contents of a given banner tag define the entire banner. So if your status line code is divided into two pieces (one which updates the left side of the status line using statusLine and one which updates the right side using setscore()), you may have to rethink things a bit for banners.

One tack you might want to take is to set up two global variables - one for the left text and one for the right. These variables would be single-quoted strings, updated as necessary by your two separate pieces of code. Then once per turn the <BANNER> tag would be displayed and your text strings pulled out of the global object and printed with the say() function. Of course, you could simply scrap the whole left/right text model altogether and go for something totally new, since banners afford so many more options, but using two global variables might be the easiest way to do a quick and dirty conversion of the existing code.

Note that you shouldn’t have any \n newlines in your strings if you do things this way. The traditional statusLine method normally ends the status text with a newline, but in a banner a newline simply inserts a blank line (thus increasing the height of the banner, if you haven’t forced it to a specific height), which can lead to formatting problems.

A key thing to keep in mind here is that you shouldn’t use the traditional status line mechanisms at all when your game is running in HTML mode on an HTML TADS runtime. Never call setscore(), and never set a double-quoted string or print any text at all in statusLine. This latter point is very important. The old statusLine method should never be called in an HTML game, since statusLine is a hardcoded feature. (That’s one of the reasons why the revised adv.t file uses a method called statusRoot instead.) If you call statusLine in an HTML game, the text it displays will simply be drawn down by the input prompt rather than in the status bar, which is almost certainly not what you want.

Having said all that, there is one important case in which you will need to use the old methods of displaying status line text: when you write a game that’s designed to work under both HTML TADS runtimes and text-only runtimes. Take the following example. Here we’ve assumed that we tested for HTML TADS compatibility early on in the game, probably in the init() function, and put the result of that test into global.htmlEnabled. We’ve also separated out the text for the left and right sides of our status bar and put the bits of text into global variables.

  if ( global.htmlEnabled )
  {
    // we're using an HTML TADS runtime
    "<BANNER ID=StatusLine HEIGHT=PREVIOUS><BODY BGCOLOR=SILVER TEXT=BLACK><B>";
    say( global.leftStatusText );
    "</B><TAB ALIGN=RIGHT>";
    say( global.rightStatusText );
    "</BANNER>";
  }
  else
  {
    // we're using a plain-text TADS runtime
    setscore( global.rightStatusText );
  }

Now this won’t update the left side of the status line in plain-text runtimes. To do that we make sure that the base room class has the following statusLine method:

  statusLine =
  {
    if ( not global.htmlEnabled )
      say( global.leftStatusText );
  }

You now have a game with a status line that’s fully compatible with both HTML and plain-text runtimes.

 


Step 5: Do any fancy formatting

If you wish, you can now add fancy text formatting that you couldn’t use before. For example, HTML TADS supports typographical (curved) quotation marks, which look far better than neutral ones. You can go through your textual descriptions and replace each apostrophe or quotation mark with the appropriate curved one, using these entities:

‘ is &lsquo; (left single quotation mark)
’ is &rsquo; (right single quotation mark, or apostrophe)
“ is &ldquo; (left double quotation mark)
” is &rdquo; (right double quotation mark)
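Doing this by hand is tedious, so here is a heuristic Python sketch that converts the straight quotes in the text of one double-quoted TADS string. It assumes double quotes appear as \" (as they must inside a TADS double-quoted string) and treats a quote as opening when it comes at the start of the string or after whitespace, an opening bracket or another opening quote; everything else, including apostrophes, becomes a closing quote. The name smart_quotes is hypothetical, and cases like 'Tis or a quote directly after \) will come out wrong, so check the results.

```python
# Characters after which a straight quote is taken to be an opening quote.
OPENERS = " \t\n(["

def smart_quotes(text: str) -> str:
    """Convert straight quotes in the body of one double-quoted TADS
    string to curved-quote entities, using a simple context heuristic."""
    out = []
    prev = " "   # treat the start of the string like whitespace
    i = 0
    while i < len(text):
        is_double = text.startswith('\\"', i)   # TADS escaped double quote
        if is_double or text[i] == "'":
            opening = prev in OPENERS
            if is_double:
                out.append("&ldquo;" if opening else "&rdquo;")
            else:
                out.append("&lsquo;" if opening else "&rsquo;")
            # After an opening quote, pretend we saw "(" so a nested quote
            # also opens; after a closing quote, further quotes close.
            prev = "(" if opening else "'"
            i += 2 if is_double else 1
        elif text[i] == "\\" and i + 1 < len(text):
            # Other codes such as \n, \b or \( pass through unchanged and
            # count as a break for the heuristic.
            out.append(text[i:i + 2])
            prev = " "
            i += 2
        else:
            out.append(text[i])
            prev = text[i]
            i += 1
    return "".join(out)

print(smart_quotes('\\"Don\'t go,\\" she said.'))
# prints: &ldquo;Don&rsquo;t go,&rdquo; she said.
```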

HTML TADS also supports a useful smart quote feature: enclosing text in <Q> and </Q> tags automatically generates the correct quotation marks. Thus:

<Q>Hello,</Q> he said.

will appear to the player as:

“Hello,” he said.

Remember that many kinds of advanced HTML formatting don’t survive very well when viewed by TADS interpreters that don’t support HTML. Tables are particularly messy that way. So you might, in certain places, want to put in some check code to test for the presence of HTML parsing. To do this use the systemInfo function. For example:

   if ( systemInfo( __SYSINFO_HTML ) = 1 )
   {
     // print out the HTML table
   }
   else
   {
     // print out the plain text equivalent
   }

In fact, since the runtime’s ability to interpret HTML does not change while the game is running, you could put a single test early on in the game, say in the init() code, and have it set a global variable that you examine later on. Note that the __SYSINFO_HTML test returns 1 if HTML parsing is present and 0 if not, so don’t test for true or nil here.

Another important point is that versions 2.2.4 and higher of the text-only TADS runtime do understand a limited amount of HTML. They print entities correctly and handle a few simple tags, such as <P>, <BR> and <B>; basically, the runtime supports those HTML tags that have traditional TADS \ sequence equivalents. However, a __SYSINFO_HTML test still returns 0 on these runtimes, because they don’t fully support HTML. This makes it quite easy to write a game that uses basic HTML (and prints the \H+ sequence described in Step 6) yet works properly on both plain-text and full HTML runtimes.

 


Step 6: Put in the magical \H+ switch

Finally, you need your game to print out the \H+ sequence at some point early on in the game. This is the sequence that tells the interpreter that it should henceforth be interpreting HTML. If your game doesn’t print this sequence then all HTML will be printed out to the screen as-is rather than interpreted - not a particularly attractive thing.

Since you probably want your game to use HTML right from the outset, be sure to print \H+ early in the game; the init() function is probably the most logical location. Note also that you can turn HTML parsing off using the \H- sequence. This is useful for debugging, as you’ll be able to read the exact output of your game on-screen: if your code produces strange HTML, strange on-screen results may appear, and it’s a lot easier to track down the problem if you can view the actual HTML source.

If your init() function tests the global.restarting variable, be sure to print \H+ before that test. That way \H+ will be sent to the text formatter every single time the game starts up, regardless of whether the game is running from scratch or loading from a saved game file. If your game doesn’t test global.restarting then this isn’t an issue, of course.

 


Other Stuff

Tags supported by HTML TADS.

Tags not supported by HTML TADS.

Colour names supported by HTML TADS.

Character references (entities) supported by HTML TADS.