Published in the August 1997 issue of the Monitor, the monthly
magazine of the Capital PC User Group, Inc.
Euphoria v1.5:
Small, Fast, Cheap MS-DOS Programming Language
by Paul Smith
Euphoria is a new (1993) MS-DOS procedural programming language.
Euphoria programs run in 32-bit protected mode using a built-in
memory expander. They happily coexist with Windows, Windows 95
(including long file names), and OS/2. The host and target
machines for a Euphoria program must be 386 or higher with VGA or
SVGA video. Euphoria comes with an interpreter, a compiler, a
debugger, a multifile colorized-syntax editor, eight function
libraries with source, and over 40 demonstration and benchmarking
programs with source. All are freeware placed in the public
domain by Euphoria's creator, Robert Craig. Version 1.5 of
Euphoria is available for download on our MIX with the filename
EUPHOR15.ZIP.
The language is a possible replacement for QBasic or AWK as a
quick one-off file manipulation tool; for QBasic, Fortran, or
extensible matrix packages for coding mathematical or statistical
procedures; for QBasic or Pascal as a beginner's first procedural
language; and (most interestingly) for C or assembler as a
language for fast, all-out action/arcade DOS game programming.
Euphoria is not a visual rapid application development (RAD) tool
for the corporate client-side programmer nor a Web language for
the Web site administrator: Delphi/Power++ and Java/PERL are safe
for the moment, but the rest of the procedural world may be in
play. Nor should the RAD vendors look back, because Robert Craig
and his wife, Junko Miura, have formed Rapid Deployment Software
(a Canadian firm) and are working on a Windows 95 GUI version
right now -- Rapid Deployment Software certainly sounds like RAD
to me.
No one has to learn Euphoria to stay employed. (Count of hits on
www.amazon.com: C++, over 1000; Java, over 400; Euphoria, zip).
On the other hand, for those of us who, through duty or delight,
crank programs out daily, this new language is worth a glance.
Rapid Deployment is an exact description of the capability
Euphoria delivers to the procedural programmer.
"Small, Fast, Cheap" is a mantra not much older than Euphoria
itself, but it is briskly rotating into view as a business maxim.
I'll use it to organize my summary of the features of Euphoria
because the match is exact. However, tradition requires that I
first show the language's "Hello, World." (See figure 1.)
----------
[Figure 1]
import java.applet.Applet;
import java.awt.Graphics;
public class HelloWorldApplet extends Applet {
    public void paint(Graphics g) {
        g.drawString("Hello, World.",50,25);
    }
}
----------
Oops, sorry, my mistake. That was applet Java. See figure 2 for
Euphoria. The first parameter is a file handle, using 1 for
StdOut as in DOS or C. Euphoria takes a few features from C, such
as include files for function libraries and clean parameter and
indexing punctuation -- this(x) is a parameter, that[x] is an
index -- but almost nothing else. Let's see what it does have.
----------
[Figure 2]
puts(1,"Hello, World.\n")
----------
Small
Euphoria is a small language. C and C++ have char, short, int,
long, float, double (with signed and unsigned variants) and
pointers of one or another of these forms as built-in scalar
types. Euphoria has "atoms."
Since Aho, Weinberger, and Kernighan's AWK in 1977, interpreted
languages have handled scalars automatically, in machine
language. Thus a coder need never bother with whether something
could happen more than, say, 65,535 times before the user
reinitializes the Patriot battery. There are real benefits to
languages that work this way -- not all of them limited to
combat.
All scaling in Euphoria is internal. Counts don't overflow. If
the decimal value for an ASCII character is sent to a function
that speaks ASCII (like puts), then that value goes out as an ASCII
character. The use of a single scalar type should make Euphoria
interesting to those who analyze statistical data, track the
national debt, or transform text files.
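To make the point about counts concrete, here is a minimal sketch that is not part of the Euphoria distribution. The variable name count is my own, and the two limits are simply the familiar 16-bit and 32-bit unsigned maximums.

```euphoria
-- Sketch: an atom keeps counting past the usual fixed-size limits.
atom count

count = 65535        -- largest unsigned 16-bit value
count = count + 1    -- becomes 65536, no wraparound to 0
? count

count = 4294967295   -- largest unsigned 32-bit value
count = count + 1    -- becomes 4294967296, held as a floating point atom
? count

puts(1, 72)          -- the value 72 goes out as the character 'H'
puts(1, '\n')
```

The ? statement is Euphoria's shorthand for printing a value followed by a new line.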
There is only one other primary built-in type: the sequence. A
sequence is, simply, a sequence of none or more atoms or
sequences. The concept is a bit startling, like LISP's lists. It
is much more than an array, because it can nest to any depth and
does not require equal "sizes" in its elements, rows, columns,
and so on. There are some subtle conventions in Euphoria's syntax
(see figure 3) that come into play here. (They must be subtle;
they threw me on my face over and over before I got the idea.)
Examples are that single quotes surround only atoms (single
characters or numbers) while double quotes surround sequences
(strings of none or more characters or numbers).
----------
[Figure 3]
atom a,b,c
sequence d,e,f,g,h,i
a = 'a' -- lowercase a, ASCII 97
b = ' ' -- blank, ASCII 32
c = 10 -- scalar value 10, also LF
d = "" -- an empty sequence
e = " " -- a sequence of one blank
f = {10} -- a sequence of one 10
g = "Hello, World.\n" -- a sequence of 14 atoms
h = {a,b,c,d,e,f,g} -- a sequence
i = {{a,b,c},{d,e,f},{g,h}} -- a sequence of sequences
----------
The double dash is Euphoria's only comment designator. The braces
are a sequence forming operator. For example, d = {} is exactly
the same as d = "", and completely different from d = '' which
is, after all, illegal since an atom must exist rather than be
only an empty molecule!
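A short sketch may help fix the idea that a quoted string is nothing more than a sequence of character codes. The names s and t are mine, chosen only for illustration.

```euphoria
-- Sketch: a string literal and a brace-built sequence are the same thing.
sequence s, t
s = "ABC"
t = {65, 66, 67}              -- the ASCII codes for A, B, and C
if compare(s, t) = 0 then     -- compare() returns 0 when its arguments are equal
    puts(1, "\"ABC\" and {65,66,67} are the same sequence\n")
end if
printf(1, "%d\n", length("")) -- the empty string has length 0
printf(1, "%d\n", length({})) -- and so does the empty sequence
```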
The index operator can pick out the comma inside the g that is
inside the h that is inside the i. (See figure 4.)
----------
[Figure 4]
atom j
j = i[3][2][7][6] -- finds the comma
Parsing: i[3] is {g,h}
i[3][2] is h, a sequence that is {a,b,c,d,e,f,g}
i[3][2][7] is g, a sequence that is "Hello, World.\n"
i[3][2][7][6] is an atom = 44, an ASCII comma
----------
Euphoria takes from Pascal and spreadsheets the double-dot
convention in indexing. To set k equal to the phrase "Hello," we
only have to say what is shown in figure 5, and all indexing is
bounds checked at run time. There are no "wild pointer" errors in
an executing Euphoria program. Of course, there are no pointers
either.
----------
[Figure 5]
sequence k
k = g[1..5] -- is "Hello"
----------
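Slices combine naturally with concatenation. The following is only a sketch along the lines of figure 5; the replacement word is my own choice.

```euphoria
-- Sketch: taking slices of g and gluing them back together.
sequence g, k
g = "Hello, World.\n"
k = g[8..12]                          -- the five characters "World"
puts(1, k & '\n')
k = g[1..7] & "Euphoria" & g[13..14]  -- "Hello, " plus "Euphoria" plus ".\n"
puts(1, k)
-- k = g[1..20] would halt the program with a run-time error,
-- because g has only 14 elements
```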
Note the smallness of the syntax: Parentheses only group
arithmetical expressions and function parameters. Braces only
group sequence definitions. Brackets only group sequence indexes.
And there are only two primary built-in types, atoms and
sequences. Yet these are enough to reproduce almost all the
complex types and structures of all other procedural languages.
I'll discuss the exceptions later.
Actually there are two secondary types built in, but they are
idioms rather than independent species: An object is a variable
that can be either an atom or a sequence, and an integer is a
signed atom with 30 bits, about 1 billion. The atom tops out as
a signed double floating point value of about 10 to the 300th
power with 15 or 16 significant decimal places. It's fun to tell
the kids "Well, the biggest numbers I use are somewhere between a
googol and a googolplex." The whole Euphoria math package is
standard IEEE double precision, and it includes the infinities
and not-a-numbers (NaNs) of Intel's floating point processor.
Despite having only two primary built-in types of data
structures, Euphoria is a strongly typed language. Type checking
is automatic even at run time, but can be turned off for speed.
The programmer is free to define new types, as he or she pleases.
The type definition facility is unique -- it tests whether a
variable meets its definition. (See figure 6.)
----------
[Figure 6]
type printable(atom x)
    return x >= 32 and x <= 126 -- printable ASCII
end type
printable l,m
l = ' ' -- a printable blank
m = 'm' -- a printable lowercase m
----------
The return statement in the type function definition is merely
testing that atom x is indeed printable: It returns true if it
is, or false if not. The interpreter/compiler yells if x was not
an atom to begin with, or if the printable function returns
false. Euphoria's type definition facility is similar to modern
database languages with enforced data definitions and business
rules built into the defining mechanism -- all under programmer
control and all alive at run time. Euphoria is completely type
safe.
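A type can also be called like an ordinary function, which is handy for testing a value before committing to an assignment. The sketch below assumes a hypothetical type named percent; it is my illustration, not something shipped with Euphoria.

```euphoria
-- Sketch: a user-defined type used both as a declaration and as a test.
type percent(atom x)
    return x >= 0 and x <= 100
end type

percent p
p = 55                   -- accepted: 55 passes the test
if percent(101) then     -- calling the type directly returns true or false
    puts(1, "101 is a percent\n")
else
    puts(1, "101 is not a percent\n")
end if
-- p = 101               -- uncommenting this line would halt the program
--                       -- with a type check failure
```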
Now, to see the payoff to Euphoria's very simple variable
architecture, we can use the contents of figure 7, and one will
have the character from the upper left corner of the color screen
page just as in QBasic. But all will have the whole screen display
area, characters, and attributes together. Peek is a machine
language function built into the Euphoria language itself and
runs as fast as the silicon can pump. Poke bytesprays in the
other direction. Many other machine language routines are
provided as built-in functions, including memory-to-memory copy
and the usual bit twiddlers.
----------
[Figure 7]
atom address -- address of color screen
address = #B8000 -- in Euphoria hexadecimal
object one, all -- maybe atom, maybe sequence
one = peek(address) -- come one
all = peek({address,4000}) -- come all
----------
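For readers who would rather not poke at video memory on a first try, here is a gentler sketch of the same peek and poke pair working on a scratch buffer. It assumes the allocate and free routines from the machine.e library; the name buf and the four byte values are mine.

```euphoria
-- Sketch: round trip through memory with poke and peek.
include machine.e

atom buf
buf = allocate(4)             -- ask for 4 bytes of memory
poke(buf, {72, 105, 33, 10})  -- spray the bytes for 'H', 'i', '!', new line
puts(1, peek({buf, 4}))       -- read all 4 back as a sequence and print them
free(buf)                     -- hand the memory back
```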
Euphoria provides a complete graphics library for both text and
graphic VGA/SVGA modes with the full panoply of line, polygon,
ellipse, pageflip, color, palette, sound, mouse, and cursor
tools. It reads and writes BMP files with native functions. These
are just what the wild young talent writing the successor to
Doom(TM) needs.
Euphoria's atoms and sequences do it all. They just do it very,
very fast.
The rest of the language is just as clean but a little more
conventional. All the arithmetic, relational, and logical
operators are in Euphoria, and with common precedence. (See
figure 8.) Powers and remainders are functions rather than
special symbols. Sequences can be concatenated with & although
there are also append and prepend functions.
----------
[Figure 8]
arithmetical: + - / *
relational: < > <= >= = !=
logical: and or not
----------
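The difference between & and append is easy to miss, so here is a small sketch of my own. It also shows the power and remainder functions mentioned above.

```euphoria
-- Sketch: concatenation versus append and prepend.
sequence s
s = {1, 2} & {3, 4}          -- joins the two: {1,2,3,4}, length 4
? s
s = append({1, 2}, {3, 4})   -- adds ONE element: {1,2,{3,4}}, length 3
? s
s = prepend({2, 3}, 1)       -- adds one element at the front: {1,2,3}
? s
? power(2, 10)               -- 2 to the 10th power is 1024
? remainder(17, 5)           -- 17 divided by 5 leaves 2
```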
An assignment statement in Euphoria is, indeed, a statement
rather than an expression as it is in C. The double equal sign
(==) in C for the relation of equality is not needed because an
assignment statement and a relational expression can never be
confused in Euphoria's syntax.
The list of operators in Euphoria is small, but in Euphoria small
is powerful. (See figure 9.)
----------
[Figure 9]
sequence counts, totals, details
counts = repeat(0,10) -- repeat(x,n) assigns x n times
totals = repeat(0,10) -- and we zero totals, too
details = {15,123,23,76,34,5,67,23,34,12345} -- 10 of them
counts = counts + 1 -- adds 1 to all 10 counts
totals = totals + details -- adds all 10 details to totals
----------
All Euphoria's operators are vectorial, including the relational
and logical operators. If the operands are both atoms, the
operator works as it usually does in other languages. If the
operands are an atom and a sequence, the operator applies the
atom to every member of the sequence. If the operands are two
sequences of equal length, the operator applies elementwise along
both sequences. Any other combination gets an error message.
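The three cases can be sketched in a few lines. The values are arbitrary and the variable name v is mine.

```euphoria
-- Sketch: the same operators applied to atoms and sequences.
sequence v
v = {1, 5, 10} * 2             -- atom applied to every element: {2,10,20}
? v
v = {1, 2, 3} + {10, 20, 30}   -- elementwise along both: {11,22,33}
? v
v = {1, 5, 10} > 4             -- relational operators too: {0,1,1}
? v
-- {1, 2, 3} + {1, 2} would halt the program: the lengths do not match
```

Because a relational operator applied to sequences yields a sequence rather than a single true or false, an if statement that needs to know whether two whole sequences are equal uses the built-in compare function instead.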
The results arrive at a furious rate. This feature turns a
personal computer into something very like a vector processor.
Deep in the innermost loop of a linear algebra package is an
operator always called "saxpy" for "scalar a times vector x plus
vector y." In Euphoria that is a single statement. (See figure
10.) Of course that's true for Cray Fortran, too. Statisticians,
spreadsheets, and physicists all use code built upon such
vectorial operators. They also come in handy for updating
players' scores in a game and their position vectors.
----------
[Figure 10]
atom a -- scalar
sequence x,y -- vectors
y = a * x + y -- an updating saxpy
----------
Euphoria is still under construction. Two language facilities are
missing. The first is scalar accumulation along a vector
operator: for example, accumulating the sum of the products of a
vectorial multiply. This is the physicists' scalar or dot
product, the statisticians' variance/covariance summation, and
(when the operator is != not equal) the game programmers'
collision detector. (Babel's curse: each discipline renames the
basic math concepts, often several times.) Of course these
functions can be programmed in Euphoria, but nothing can match
the speed of a built-in function, and Euphoria's design cries out
for a scalar inner product symmetrical with the vector outer
product it already provides.
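Programming the dot product by hand might look like the following sketch. The function name dot is hypothetical; the vectorial multiply does the elementwise work and an ordinary loop does the accumulation that the language does not yet supply.

```euphoria
-- Sketch: a hand-coded dot product built on the vectorial multiply.
function dot(sequence x, sequence y)
    atom total
    sequence products
    products = x * y                 -- elementwise products in one statement
    total = 0
    for i = 1 to length(products) do
        total = total + products[i]  -- the accumulation Euphoria lacks
    end for
    return total
end function

? dot({1, 2, 3}, {4, 5, 6})          -- 4 + 10 + 18 = 32
```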
The other missing syntactic feature in Euphoria (as in AWK) is
the run-time function specifier. C, C++, and Java all use a
pointer-to-function scalar type, and these are the only pointers
that Euphoria's index syntax can't replace. More powerful
languages (like LISP, Scheme, Forth, or PERL) have an eval,
apply, or interpret function built-in that will evaluate a string
and apply the function it specifies.
Euphoria will have to gain at least the function pointer to
become an object-oriented programming (OOP) language because the
"member function" is the only missing ingredient for completely
encapsulated "class" definitions. Notice that Euphoria's type
definitions are otherwise completely inheritable. The same
facility is at the heart of simulated annealing, genetic
algorithms, nonlinear function maximization, and other
generalized tools of modern numerical analysis. It is also the
key to "strategy" routines in game programming.
The last small features of Euphoria are the statements
themselves, and there are only a few of them. The three control
flow statements are if, while, and for. (See figure 11.)
----------
[Figure 11]
atom this
sequence that,s,t
if this = that[1] then
    puts(1,s)
else
    puts(1,t)
end if
----------
As a great improvement over Java's switch statement ("Death to
the switch statement!" Peter van der Linden, Just Java, SunSoft
Press, 1996, p. 106), Euphoria adds a simple elsif clause to the
if statement. (See figure 12.) The while statement is also simple
(figure 13), and the for statement is classic. (See figure 14.)
----------
[Figure 12]
if this = 1 then
    that = "One"
elsif this = 2 then
    that = "Two"
else
    that = "Many"
end if
----------
[Figure 13]
while length(s) > 0 do
    puts(1,s[1]) -- forwards
    s = s[2..length(s)]
end while
----------
[Figure 14]
for i = length(t) to 1 by -1 do
    puts(1,t[i]) -- backwards
end for
----------
The indexing variable in the for statement is local to the loop
and disappears outside it. Euphoria also has a modern namespace
structure where variable and function names are local to their
enclosing procedures or include files. A function or procedure
must be marked as global if its name is to be exported beyond its
namespace.
Both the while and the for statements can have an exit statement
within them that exits to the first statement following the
innermost enclosing loop. Euphoria suffers nothing comparable to
Java's labeled continue statement.
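A typical use of exit is to stop searching once the answer is known. This is only a sketch; the names readings and first_bad are mine.

```euphoria
-- Sketch: leave the loop as soon as the first negative value turns up.
sequence readings
integer first_bad
readings = {4, 8, -1, 15}
first_bad = 0                     -- 0 will mean "none found"
for i = 1 to length(readings) do
    if readings[i] < 0 then
        first_bad = i             -- remember where it was
        exit                      -- jump to the statement after "end for"
    end if
end for
printf(1, "%d\n", first_bad)      -- the negative value is at position 3
```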
There are no semicolons at the end of lines -- or, as in Pascal,
at the ends of some lines and not others. Euphoria is a modern
stream language with new lines as white space. We may break up
any statement as we please, or put several on a single line. The
indentation wars should be glorious.
This small language has only two built-in types, full type
definition facilities, run-time type checking, full IEEE math,
vectorial operators, run-time bounds checking, structured syntax,
streaming statements with simple delimiters ("end if"), and
modern namespaces. It is an ideal language for the beginner
because there is only a little to learn now and nothing to
unlearn later.
Fast
Euphoria is an interpreted language, just like AWK or QBasic. The
programmer codes in the swift edit-test-edit cycle without
waiting for compile/link operations. Have you ever used QBasic to
code a pesky one-off job simply because you were too impatient to
put up with the interminable chugging of the sophisticated C++
integrated development environment? Euphoria has an integrated
debugger built into the interpreter and also has a compiler to
turn debugged code into a distributable EXE file -- the best of
both worlds in one small fast language.
Coding is faster because Euphoria supplies 8 function libraries
containing 40 functions in addition to the 48 built into the
language. The function libraries cover graphics, image
processing, mouse reading, file and directory operations,
command-line wildcard specifiers, sorting, keyboard input, and
full machine access to user assembly routines, interrupts, and
memory assignments. The libraries (and the editor) are all in
open source so the programmer can lift and learn rather than
recreate. Add the 40 sample programs (also in open source), and
one can get up to speed very quickly.
Notice that there is no "dimensioning" in statements defining
Euphoria's sequences and no arbitrary length or end markers for
text strings. The programmer doesn't have to work out how to fit
objects into 64-kbyte blocks or the 640 kbytes of lower memory.
Euphoria has a built-in memory manager that gives the whole
32-bit flat address space of the machine's memory to the
programmer, and then automatically pages out to disk if more is
needed. A compiled Euphoria program carries the whole virtualized
memory mechanism right along with it -- and almost always
produces an EXE file smaller than 200 kbytes.
How does Euphoria's memory management help? Well, if you have
Windows 95, just drop to DOS and SORT a text file with lines
longer than 512 characters. Buggy, right? Older DOS versions of
SORT had arbitrary limits on the size of the text file they could
sort. That doesn't happen with Euphoria (FILESORT.EX is one of
the demo programs). The secret is that in addition to the memory
manager and virtualized memory paging to disk, Euphoria has an
exceptional garbage collection algorithm that recaptures and
reallocates unused memory automatically.
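The following sketch is not FILESORT.EX itself, just an illustration of the same idea under my own assumptions: a sequence that grows one line at a time with no dimensioning, handed to the sort function from the sort.e library. The three sample words stand in for lines read from a file.

```euphoria
-- Sketch: collect lines of any length, then sort them.
include sort.e

sequence lines
lines = {}                         -- starts empty, no size declared
lines = append(lines, "pear")      -- each line is itself a sequence
lines = append(lines, "apple")
lines = append(lines, "fig")
lines = sort(lines)                -- ascending order: apple, fig, pear
for i = 1 to length(lines) do
    puts(1, lines[i] & '\n')
end for
```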
Euphoria code executes very rapidly: Creator Robert Craig claims
10 to 20 times the execution speed of QBasic and 8 times the
speed of Java. He provides sample programs so we can test his
claims on our own machines. (True, by the way, on both of mine.)
But no one ever thought of QBasic and Java as fast. What happens
if we put Euphoria up against a program its own size? AWK is
small, fast, and designed for transforming text files. A CPCUG
member properly complained about the pseudojustification in
captured man (manual) files from cpcug.org, which contain extra
spaces that right-justify all the text lines. Figure 15 is a
Euphoria routine to remove exactly one blank from a pair enclosed
by nonblanks, or from a trio following a sentence terminator.
----------
[Figure 15]
function scanline(object linein) -- unjust.ex
    object lineout -- outputline
    integer blanks, chars,
            char, oldchar -- counts and characters
    chars = length(linein) -- built-in function
    lineout = {} -- output line
    oldchar = linein[1] -- keep 1st char
    if oldchar = ' ' then
        blanks = 3 -- keep initial blanks
    else
        blanks = 0 -- nonblank initial
    end if
    for i = 2 to chars do -- all after 1st
        char = linein[i] -- get char
        if char = ' ' then
            blanks = blanks + 1 -- count blanks
            lineout = lineout & oldchar -- passout prior char
        elsif char = '.' or char = '?' or
              char = '!' then -- sentence terminals:
            blanks = -1 -- count 3 (not 2)
            lineout = lineout & oldchar -- blanks, delete 1
        elsif blanks != 2 then
            blanks = 0 -- nonblank not after
            lineout = lineout & oldchar -- 2 (or 3) prior
        else
            blanks = 0 -- nonblank after 2 or 3
                       -- blanks, just skip
        end if
        oldchar = char -- save current char
    end for
    lineout = lineout & oldchar -- last char was new line
    return lineout
end function
----------
I now know how to write a much faster version of this routine,
but fair's fair. The whole program is on our MIX under the
filename UNJUST.ZIP. Figure 16 is the whole AWK program to do
(almost) the same thing.
----------
[Figure 16]
BEGIN { FS = "" }
/^\32+$/ { next }
NF > 0 {
    line = $0
    while(match(line,/([?!.]   [^ ])|([^ ]  [^ ])/) > 0) {
        line = substr(line,1,RSTART) substr(line,RSTART + 2)
    }
    print line
    next
}
{
    print ""
    next
}
END {
}
----------
I don't offer to explain AWK -- but I just counted the number of
AWK programs on my hard drive. There are 165. I write AWK for a
living; take my word that the AWK routine is just as fast as I
could make it. I raced Euphoria against AWK, unjustifying five
copies of nn.man and sending the output to the bitbucket.
Processing 1.2 Mbytes of text file, AWK took 25 seconds. Euphoria
took 18 seconds. Euphoria, the generalized programming language,
was 39 percent faster than specialized AWK working on its home
turf. I am very impressed.
Euphoria is fast to learn, fast to code, and flies when it
executes. Euphoria is fast.
Cheap
Robert Craig has placed the whole Euphoria version 1.5 package in
the public domain. The only limitation is that the debugger cuts
off after the first 300 source language statements (very fair:
only five of the demo programs exceed 300 lines). For $44 the
registered version of Euphoria is available (order blank
included) with an unlimited debugger, a spiral-bound manual, and
a disk. The documentation in the freeware version has everything
in the manual so the extended debugger for large programs is the
only real inducement to register. I found the documentation to be
excellent and complete, but I learned from the open source code
in the freeware version.
But Euphoria itself is only the beginning. Robert Craig's
"Official Euphoria Programming Page" is at
http://members.aol.com/FilesEu/
and contains many additional free source files plus links to
other Euphoria sites around the world.
David Gay has written a Beginners Guide to Euphoria, version
1.01, that is an executable tutorial to the language. It is also
freeware and on the MIX as BEGIN101.ZIP. I found it excellent
and, with open source, illuminating:
http://www.interlog.com/~moggie/Euphoria/
For the game programmer, Lord Generic Productions sells OidZone
for $45. It is both a game and a "crash course in game design"
that uses the game to illustrate the fine points of game
programming. I have not tried the commercial version, but I
worked my way through the free chapters at
http://exo.com/~lgp/euphoria/
and found them clear and clever. Other Euphoria programmers (too
many to list) have home pages providing free code covering
everything from windowing editors through matrix algebra routines
to games, games, games. I found the most value from joining the
Euphoria LISTSERV mailing list. To subscribe, send e-mail to
listserv@miamiu.acs.muohio.edu
with
subscribe euphoria Your Name
as the body of the message. Afterwards you may want to read the
instructions you receive to set your subscription to digest.
There are half a dozen messages a day, usually from programmers
exchanging code and helping others debug.
Euphoria is a small, fast, cheap programming language and a true
gift to young programmers. But the Euphoria community itself is
the most striking part of Robert Craig's creation. If you were
around in the early days of DOS assembler coding when the free
sources from Toad Hall and Snippets were flying about, or at the
start of comp.sources for Unix, you remember the variation on our
"Users Helping Users" that was programmers helping programmers
with glee and spots of genius. Euphoria is a new language that
has brought back something old that I am glad to share once more.
That is the reason I wrote this article.
Paul Smith programmed the IBM 7070 and 1401 in the early 1960s
and has been the research director for a nonprofit, using donated
hardware and obscure languages, for the last two decades.
================================================================
Copyright 1997, by the Capital PC User Group, Inc. All rights
reserved.
Permission for reproduction in whole or in part is hereby
granted to other non-profit and computer user groups for
internal, non-profit use, provided credit is given to the
Capital PC Monitor and to the author(s) of the reproduced
material, and attribution of copyright is included.
Permission is also granted for posting on electronic bulletin
board systems, provided credit is given to the Capital PC
Monitor and to the author(s) of the reproduced material, and the
files are made available in their entirety, without alteration,
including this notice.
All other reproduction, other than for personal use, without the
prior written permission of the Capital PC User Group is
prohibited.
Unless specifically stated, opinions expressed in any article or
column are those of the individual author(s) and do not
necessarily represent an official position or endorsement of the
Capital PC User Group.
Capital PC User Group, Inc.
Plaza East Two
51 Monroe Street
Rockville, Maryland 20850
MIX BBS: (301) 738-9060 (10 MultiTech v.34 modems)
(301) 738-9061 Alternative modem
Office: (301) 762-9372