Encoded files, formatted files, portability, etc...
jrc at ANBG.GOV.AU
Sat Feb 13 11:07:10 CST 1993
The recent exchange on taxacom on posting of uuencoded files touched on
a topic that has been bugging me for some time: the tendency of people
to invoke the arcane rites of wordprocessors where a simple text editor
would be quite sufficient.
Most electronic writing in our organization, which I doubt is much
different from any other government, academic or private bureaucracy, is
concerned with minutes and instructions that will be ignored by
management and staff alike, or with manuscripts that will be butchered
by editors, reviewers and other low-life. Such documents are not really
enhanced by a 16.5 point grotesque italic font in a double-bordered
shaded shadowed box. But people insist on doing this kind of thing,
simply because it can be done.
For those of you who follow such things, there has been some fairly
intense discussion between librarians and information professionals on
net groups such as pacs-l, about the relative importance of 'form' and
'content'. This is an on-going difference of view and there does not
appear to have been a resolution.
The problem we are facing is how to make a piece of scientific
information available to the widest possible audience in a form that
they can do something useful with. At the moment our strategy is to
place anything botanically useful or interesting on a gopher information
server (more on this later), in a form that *any* system can access and
use. Because of the almost total lack of compatibility between systems
we are forced to use the lowest common level of human understandable data
representation: 7-bit ascii. That means no bold, no italics, no
underlining, no fancy paragraphs or page layout, no fonts, no graphics,
no diacritics, etc.; just simple, raw, unembellished, naked information.
The first attempt at achieving this was to produce ascii exports of the
word-processed files. This sort of worked but was not without problems.
People specified paragraphs in different ways, tabs were different
widths, page breaks were handled differently etc. All this meant that
the ascii export files had to be re-edited to a greater or lesser extent
to make them generally presentable on screen and paper.
To try and reduce the amount of re-editing and introduce a degree of
uniformity between documents prepared on different word-processors, a
rough style manual was thrown together. Appended below.
We would really appreciate comments on the concept and the content of
such a manual to make it more flexible and usable.
------------------------ Cut here --------------------------
Australian National Botanic Gardens
Electronic Document Style Manual
Prepared by Jim Croft, January 1993
Most documents of substance these days are prepared electronically
and often there is a desire or need to place these documents on the
network to achieve wide and timely distribution.
Problems of incompatibility arise because documents are prepared on
a variety of word-processors and text editors and are received for
storage and display by a large number of combinations of terminals
and display applications (such as word-processors).
Word-processors allow users to set attributes of characters,
paragraphs, pages, and documents but although there are some
conventions, they are not standard and different word-processing
applications implement the same thing in different ways.
Moreover, text editors and word-processors allow users to create
documents of similar appearance using different techniques. This
creates problems when documents are moved and converted from one
environment to another.
The layout of a document depends very much on the purpose for which
it is intended, the objective being to minimize the amount of
reprocessing required to load, read or incorporate the text.
Helpful tips on layout can be found in manuals covering the
preparation of typewritten reports. A chapter on this occurs in
the earlier editions of the Australian Government Printing
Service 'Style manual for authors, editors and printers of
Australian government publications'. Interestingly, in the latest
edition this chapter has been omitted in favour of more detailed
accounts of electronic desktop publishing.
The following suggestions cover some topics that may make a
document more readable as an ASCII text file, and less likely to
require intensive manual editing when imported into another
Pagination is not a critical matter in electronic text where linear
scrolling through the document is possible. If you choose to
paginate a document, bear in mind there is no international paper
size and documents formatted for the old world A4 paper size may not
print comfortably on the new world 8.5 x 11 inch paper size.
For this reason, if page breaks are essential, it is preferable to
use the ASCII page break code, ^L, rather than the appropriate
number of linefeeds required to place the text on the next page.
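
As an illustration only (it is not part of the original manual), the
following Python sketch joins pages with the ASCII form feed (^L,
character 12) instead of padding with linefeeds. The function name
paginate and the lines_per_page parameter are hypothetical.

```python
FORM_FEED = "\f"  # ASCII 12, the ^L page break character


def paginate(lines, lines_per_page):
    """Join lines into one string, starting each new page with ^L.

    No padding linefeeds are added, so the result does not depend on
    the paper size of whoever prints it.
    """
    pages = [
        "\n".join(lines[i:i + lines_per_page])
        for i in range(0, len(lines), lines_per_page)
    ]
    return ("\n" + FORM_FEED).join(pages)
```

A printer or pager that honours ^L starts a new page at each form
feed, whatever its page length happens to be.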
Running headers and footers may make a printed document look
'nicer', but they are irritating when reading an electronic
document a screenful at a time.
As some terminals and emulations are not resizable and have a
screen width limited to 80 columns, and some printers truncate (or
wrap awkwardly) lines longer than this, the effective workable
line width should be kept within this limit.
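
A minimal Python sketch of a check for this limit follows. It is an
illustration, not part of the original manual; the name
overlong_lines and the default of 80 columns are assumptions taken
from the paragraph above.

```python
MAX_WIDTH = 80  # assumed limit, from the 80-column screens described above


def overlong_lines(text, limit=MAX_WIDTH):
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty list means every line fits within the limit.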
The page length perhaps should not be specified, following the
comments above. If a document is formatted for a particular paper
size it will look awkward on the screen, if formatted for a 24 or
25 line screen it will look awkward when printed.
To make a document more readable on the screen, it should be
indented on the left margin (say 2 - 5 spaces). To incorporate
it into a word-processor the indent spaces on each line will need
to be globally removed. Each line of text should fall short of
the margin by a comparable amount.
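
Adding and globally removing the left margin can be sketched with
the Python standard library textwrap module. This is an illustration
only; the function names add_margin and strip_margin and the default
of 4 spaces are hypothetical choices within the 2 - 5 range
suggested above.

```python
import textwrap


def add_margin(text, spaces=4):
    """Indent every non-blank line by `spaces` blanks for screen reading."""
    return textwrap.indent(text, " " * spaces)


def strip_margin(text):
    """Remove the common leading blanks before word-processor import."""
    return textwrap.dedent(text)
```

Blank lines between paragraphs are left empty rather than padded
with spaces.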
Multiple columns of text may look good in a printed newsletter but
are next to impossible to read on an 80 x 25 screen. The document
should be a single text stream of a single column.
Hyphens to break words should be avoided as there is no standard to
handle optional hyphenation in ASCII files. When imported into a
text editor or word-processor, the hyphens will almost certainly be
in the wrong place and have to be removed.
Fonts, Style and Size
The font style and size of a letter have little meaning in an ASCII
text file. To get an idea of what a file will look like, compose
it in 12 point courier without attributes such as bold, italic,
underline, etc. Courier is recommended because it is fixed width;
proportional fonts such as helvetica and times roman will not line
up in the same manner as fixed width fonts.
Italics, Bold, Underline
There is no standard way of displaying or printing character
attributes such as bold, italic and underline; what works on one
printer or screen may not work on another.
Some conventions have been adopted on the network and may be
appropriate. _Italic_ and _underlined_ text can be indicated by
surrounding the text with the underscore character. *Bold* or
*emphasized* text can be indicated by surrounding the text with
asterisks.
In titles, it is possible to underscore words by placing a row of
hyphens or equal signs (double underline) on a blank line beneath
the title, as in this document. This is not compatible with the
underlining conventions of word-processors.
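
The title underlining described above can be sketched in a few lines
of Python. This is an illustration, not part of the original manual;
the function name underline is hypothetical.

```python
def underline(title, char="-"):
    """Return the title followed by a row of `char` of the same length.

    Use "-" for a single underline and "=" for a double underline.
    """
    return title + "\n" + char * len(title)
```

For example, underline("Tabs", "=") gives the title on one line and
four equal signs on the next.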
Wordspacing and Letter-spaced Words
In general, one blank space should be left between words (see
Justification, below). L e t t e r - s p a c e d words are
often effective in titles and sometimes for emphasis in the body of
the text. If two or more words are involved, they should be
separated by three spaces. With access to different font sizes,
letter spacing is rarely used in word-processors.
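
As a sketch of the letter-spacing rule (one space between letters,
three between words), assuming a hypothetical helper named
letter_space that is not part of the original manual:

```python
def letter_space(phrase):
    """Put one space between letters and three spaces between words."""
    return "   ".join(" ".join(word) for word in phrase.split())
```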
Diacritic Marks and Symbols
There are no universally accepted ways of displaying or printing
letters with diacritic marks and so these should be avoided.
On some printers it is possible to fake some diacritics with the
use of the backspace function, but this does not work on
Ligatures should be represented as two separate letters, ae, oe,
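
Where the source text already contains accented letters or
ligatures, reducing it to plain 7-bit ASCII might be sketched as
follows in Python. This is an assumption-laden illustration, not
part of the original manual: it presumes the input is Unicode text,
the name to_ascii and the LIGATURES table are hypothetical, and any
character with no ASCII equivalent is silently dropped.

```python
import unicodedata

# Ligatures written out as two separate letters (ae, oe).
LIGATURES = {
    "\u00e6": "ae",  # lower-case ae ligature
    "\u00c6": "AE",  # upper-case AE ligature
    "\u0153": "oe",  # lower-case oe ligature
    "\u0152": "OE",  # upper-case OE ligature
}


def to_ascii(text):
    """Spell out ligatures, strip diacritic marks, keep only ASCII."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with "ignore" then discards the mark.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```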
The tab character is very useful in aligning columns of text.
Problems arise because tabs may have different widths depending on
the environment or word-processor or text editor. Common values
are 8 characters, 5 characters or 0.5 inches, 10 characters or 1
inch. Problems arise when tabs laid out on one environment are
displayed on another - the columns often do not line up.
Furthermore, the ASCII tab character (^I) left aligns text; there
is no standard way to right align or decimal align ASCII text.
It is recommended that tabs be avoided and text be laid out using
spaces. This of course works only with fixed width fonts and not
when the document is formatted with a proportional font.
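
Converting existing tabs to spaces can be sketched with Python's
built-in str.expandtabs method. This is an illustration, not part of
the original manual; the function name detab is hypothetical, and
the tab width must be the one the author's own editor used (8 is
only an assumed default).

```python
def detab(text, tab_width=8):
    """Replace each tab with spaces up to the next tab stop.

    str.expandtabs restarts the column count after every newline, so
    the whole document can be converted in one call.
    """
    return text.expandtabs(tab_width)
```

Once converted, the columns line up the same way on any fixed width
display, whatever its own tab setting.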
The left and right paragraph margins should be set following the
document margin conventions listed above.
Paragraphs can be separated in a number of ways: a blank line
between each; indenting (say 5 spaces) the first line of each
paragraph; the presence of a hard carriage return or linefeed; and
Paragraphs separated by blank lines are easier to pick out and
conveniently separate large blocks of text. Lines within a
paragraph should end in a newline character and each paragraph
separated by a blank line (two successive newline characters).
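
A minimal Python sketch of this convention follows: paragraphs are
split on blank lines, each is re-wrapped so every line ends in a
newline, and they are rejoined with one blank line between them. It
is an illustration, not part of the original manual; the name reflow
and the width of 72 columns are assumptions.

```python
import re
import textwrap


def reflow(text, width=72):
    """Re-wrap blank-line-separated paragraphs to at most `width` columns."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = [
        # Collapse internal whitespace, then wrap to the chosen width.
        textwrap.fill(" ".join(paragraph.split()), width=width)
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)
```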
It is not necessary to use both a blank line and an initial indent
to delimit paragraphs.
Generally speaking text should be single spaced, with a double
space between paragraphs. For clarity and emphasis in some lists
it may be appropriate to double space the items. Incremental
spacing is not defined in ASCII text and is not available on many
terminals and printers.
Block text should be left-justified. While it is possible to
pad the space between words to give the appearance of fully
justified text, this is not done evenly and looks contrived;
moreover, the inserted hardspaces upset the spacing when the text is
imported into a word-processor.
Centered text should be done by left padding with an appropriate
number of spaces.
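
Centering by left padding only can be sketched as below. This is an
illustration, not part of the original manual; the name center_line
and the default width of 80 columns are assumptions.

```python
def center_line(text, width=80):
    """Center `text` in `width` columns using leading spaces only.

    No trailing spaces are added, so nothing needs trimming later.
    """
    pad = max(0, (width - len(text)) // 2)
    return " " * pad + text
```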
Titles and headings should break up and draw attention to the
relative importance of various parts of the text. They should be
employed liberally to give structure to a linear stream of text.
A hierarchy of headings can be employed using a combination of
centering, left and right justification, upper and lower case,
letter spacing, underlining with hyphens or equal signs, etc.
Further structure or emphasis can be achieved by indenting the
paragraphs (say, an additional 2 - 5 spaces) at each level of the
Tables are awkward things and do not translate comfortably between
different word-processors and display devices. It is recommended
that they be created using fixed width fonts and appropriate blank
padding. Boxes can be drawn around tables.
Lines and Boxes
Horizontal lines can be drawn with the characters _, -, = and
vertical lines with |, !, I. The plus sign, +, makes a nice
intersection between the vertical and horizontal, | and -.
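
A boxed table built from these characters, with blank padding to
align the columns, might be sketched in Python as follows. This is
an illustration, not part of the original manual; the function name
boxed_table is hypothetical, and it assumes every row has the same
number of cells.

```python
def boxed_table(rows):
    """Draw rows of cells as a table boxed with |, - and +."""
    # Width of each column is the width of its longest cell.
    widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    rule = "+" + "+".join("-" * (width + 2) for width in widths) + "+"
    lines = [rule]
    for row in rows:
        cells = " | ".join(
            str(cell).ljust(width) for cell, width in zip(row, widths)
        )
        lines.append("| " + cells + " |")
        lines.append(rule)
    return "\n".join(lines)
```

The result only lines up when displayed in a fixed width font.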
There is no way to adequately handle graphics using the standard
7-bit ASCII character set. An approximation of graphics can
sometimes be achieved by using characters such as:
with appropriate blank padding to achieve alignment. This will
only work with fixed width fonts.
Jim Croft [Herbarium CBG] internet: jrc at anbg.gov.au
Australian National Botanic Gardens voice: +61-6-2509 490
GPO Box 1777, Canberra, ACT 2601, AUSTRALIA fax: +61-6-2509 599
____Biodiversity Directorate, Australian National Parks & Wildlife Service____