RULE 11 - Use the Balanced Line algorithm for file "matching"
instead of the key matching algorithm.
Of all the COBOL coding constructs that could have the most
effect on making programs simpler and more maintainable, this
is the foremost. As was shown in chapter 8, this algorithm has
been available since the mid-1970s, and yet most textbook authors
do not appear to be aware of its existence. Instead they generally
offer some variation of the "less than, greater than, equal"
key matching algorithm, in spite of the difficulties it
presents to them.
One author goes into great detail (19 pages of text)
in outlining how to handle the nine cases involved in matching
two files during an update (three key match conditions - less
than, greater than, and equal to, combined with three transaction
types - add, change, and delete).1 If the
balanced line algorithm were being used, there would be only six
cases (two key existence conditions - key exists, and key does
not exist, combined with the three transaction types). In this
latter scenario, whether a key exists is just one more
edit condition to be applied in editing the transaction.
Of course, no mention is made of what happens when an add
transaction and a change transaction happen to occur in the same
batch of transactions, or what would happen if there were more
than two files. The necessary logic to explain those situations
would be far more than the 19 pages the simple situation consumes.
Another author uses this key matching algorithm as the example
in his chapter on "Refinement".2
The chapter contains such statements as "As we start to work backward from the
required output to the required input, we notice (perhaps for
the first time) that nothing guarantees that a record will be
available in each input area.", "This was probably
a wise decision, but it has created a problem with some code already
written. If we invoke these routines from two higher levels, how
shall we place the definitions of the FLAGS?" and "There
are a number of potential solutions to this paradox." The
author seems to regard this file update algorithm as a sufficiently
complex problem to require a detailed discussion of how to code
it.
In addition, the authors of texts that specify the key matching
algorithm for sequential file updating generally also assume
that their programs are processing only valid transactions ("If
T-KEY = N-KEY and the transaction code is a deletion (neither
an insert nor a change), the new master file record, which contains
the item to be deleted, is simply not written onto disk."3).
With the key matching algorithm, adding edits for such things as
invalid transaction codes or changes with invalid values would
unduly complicate the logic. With the balanced line algorithm it
is trivial, since the existence or non-existence of a master record,
which is checked simply by interrogating a flag, is just another
type of edit check.
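To make the flag-based edit check concrete, here is a minimal sketch of the balanced line logic in fixed-format COBOL. It is an illustration only, not a listing from chapter 8 or from any of the texts cited. The program name, data names, and sample records are all hypothetical. Two in-memory tables stand in for the sorted master and transaction files so that the program runs without external files, HIGH-VALUES in a key signals end of file, and the new master is displayed rather than written.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BALLINE.
      * Hypothetical sketch: tables stand in for the sorted master
      * and transaction files; HIGH-VALUES in a key means end of file.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  MASTER-DATA.
           05  FILLER PIC X(12) VALUE "100SMITH".
           05  FILLER PIC X(12) VALUE "200JONES".
           05  FILLER PIC X(12) VALUE "400BROWN".
       01  MASTER-TABLE REDEFINES MASTER-DATA.
           05  M-ENTRY OCCURS 3 TIMES.
               10  M-KEY   PIC X(3).
               10  M-NAME  PIC X(9).
       01  TRANS-DATA.
           05  FILLER PIC X(13) VALUE "100CSMYTHE".
           05  FILLER PIC X(13) VALUE "200D".
           05  FILLER PIC X(13) VALUE "300ADAVIS".
           05  FILLER PIC X(13) VALUE "300CDAVIES".
           05  FILLER PIC X(13) VALUE "500D".
       01  TRANS-TABLE REDEFINES TRANS-DATA.
           05  T-ENTRY OCCURS 5 TIMES.
               10  T-KEY   PIC X(3).
               10  T-CODE  PIC X.
               10  T-NAME  PIC X(9).
       01  M-SUB             PIC 9(2) VALUE 0.
       01  T-SUB             PIC 9(2) VALUE 0.
       01  WS-MASTER-KEY     PIC X(3).
       01  WS-MASTER-NAME    PIC X(9).
       01  WS-TRANS-KEY      PIC X(3).
       01  WS-TRANS-CODE     PIC X.
       01  WS-TRANS-NAME     PIC X(9).
       01  WS-CURRENT-KEY    PIC X(3).
       01  WS-NEW-NAME       PIC X(9).
       01  WS-EXISTS-FLAG    PIC X.
           88  MASTER-EXISTS     VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAINLINE.
           PERFORM 8000-READ-MASTER.
           PERFORM 8100-READ-TRANS.
           PERFORM 1000-PROCESS-KEY
               UNTIL WS-MASTER-KEY = HIGH-VALUES
                 AND WS-TRANS-KEY = HIGH-VALUES.
           STOP RUN.

       1000-PROCESS-KEY.
      * The current key is the lower of the two read-ahead keys.
           IF WS-TRANS-KEY < WS-MASTER-KEY
               MOVE WS-TRANS-KEY TO WS-CURRENT-KEY
           ELSE
               MOVE WS-MASTER-KEY TO WS-CURRENT-KEY.
           IF WS-MASTER-KEY = WS-CURRENT-KEY
               MOVE WS-MASTER-NAME TO WS-NEW-NAME
               MOVE "Y" TO WS-EXISTS-FLAG
               PERFORM 8000-READ-MASTER
           ELSE
               MOVE "N" TO WS-EXISTS-FLAG.
           PERFORM 2000-APPLY-TRANS
               UNTIL WS-TRANS-KEY NOT = WS-CURRENT-KEY.
           IF MASTER-EXISTS
               DISPLAY "NEW MASTER: " WS-CURRENT-KEY " " WS-NEW-NAME.

       2000-APPLY-TRANS.
      * Key existence is just one more edit on the transaction.
           IF WS-TRANS-CODE = "A" AND NOT MASTER-EXISTS
               MOVE WS-TRANS-NAME TO WS-NEW-NAME
               MOVE "Y" TO WS-EXISTS-FLAG
           ELSE IF WS-TRANS-CODE = "C" AND MASTER-EXISTS
               MOVE WS-TRANS-NAME TO WS-NEW-NAME
           ELSE IF WS-TRANS-CODE = "D" AND MASTER-EXISTS
               MOVE "N" TO WS-EXISTS-FLAG
           ELSE
               DISPLAY "REJECTED:   " WS-TRANS-KEY " " WS-TRANS-CODE.
           PERFORM 8100-READ-TRANS.

       8000-READ-MASTER.
           ADD 1 TO M-SUB.
           IF M-SUB > 3
               MOVE HIGH-VALUES TO WS-MASTER-KEY
           ELSE
               MOVE M-KEY (M-SUB) TO WS-MASTER-KEY
               MOVE M-NAME (M-SUB) TO WS-MASTER-NAME.

       8100-READ-TRANS.
           ADD 1 TO T-SUB.
           IF T-SUB > 5
               MOVE HIGH-VALUES TO WS-TRANS-KEY
           ELSE
               MOVE T-KEY (T-SUB) TO WS-TRANS-KEY
               MOVE T-CODE (T-SUB) TO WS-TRANS-CODE
               MOVE T-NAME (T-SUB) TO WS-TRANS-NAME.
```

In this sample data the add for key 300 is followed by a change for the same key in the same batch, and the delete for key 500 has no master. Neither needs special logic: the first is accepted because the add sets the flag, and the second is rejected by the same existence test that edits every other transaction.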
One text went beyond simply using the key matching algorithm
as the assumed way that file matching must be done, and put it
in a standard as if to say, this is the only way it can be done,
so everyone should do it this way.4
Finally, one author used the key matching algorithm as his example
of how to create a structured program design, complete with structure
charts and flowcharts.5
Fortunately, there were a few good examples of the correct
algorithm available. Two out of the eleven texts which presented
a file update algorithm used the balanced line algorithm. Popkin
simply assumes that this is the correct way and presents it with
no explanation.6 Lim gives a more elaborate
explanation that is well worth reading by anyone who may not know
of the algorithm.7 An entire chapter is devoted
to it, although the entire logic flow and explanation only takes
five pages, the rest of the chapter being devoted to an extensive
example. His opening remarks are worth repeating:
"Sequential update programs can be standardized through
what is know [sic] as the Balanced Line algorithm; although this
algorithm was discovered more than a decade ago, only a few of
the more experienced programmers are aware of its existence.
This is unfortunate, since the algorithm makes it possible to
write update programs with ease."
This was written almost ten years ago, yet few other authors
seem to have "discovered" the existence of this algorithm
so they could pass it on to their readers.
RULE 12 - Use the PERFORM UNTIL structure instead of the historical
control break structure.
This control structure is as important as the balanced
line algorithm discussed above, and it is just as misunderstood.
Interestingly, neither of the authors who presented the balanced
line algorithm presented a proper solution to the control break
situation. This reinforces my belief that the authors do not
really understand the algorithm behind the code, but simply pass
on those coding constructs that they have personally encountered,
much as the students who learn from their
courses will do.
Lim, who gave such an excellent treatise on the balanced line
algorithm, simply lists the pseudo code for handling control breaks
without any explanation about why it should be that way.8
Others are more elaborate and may even include structure charts
showing all the detail on how to implement a multi-level control
break.9,10 Those that use this algorithm
also generally have to note some of the problems that using an
incorrect algorithm generates.
"At the end of the file, we need to print the subtotals since
there can be no control break generated without more records."11
"There are two tricky things about testing for control breaks.
At the beginning of program processing, the program logic must
bypass the false control break that will occur when the first
record is read. Then after all the input records have been processed
and end-of-file has been reached, the program logic must force
out the final control total line. Failure to provide for these
requirements will result in the common programming control-break
program bugs shown [above]."12
"Forcing the Last Store Footing ... This kind of situation
usually occurs in control-break programs, and the explicit documentation
is therefore not necessary."13
Only a single author presents the correct algorithm for a
multi-level control break problem.14 It is
no wonder that few programmers ever learn it.
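The two "tricky things" quoted above disappear when each break level is its own PERFORM UNTIL loop. The following is a minimal single-level sketch of that shape, under my own assumptions rather than a listing from the one text that got it right: all names and sample data are hypothetical, a table stands in for a sales file sorted by store, and HIGH-VALUES in the store key marks end of file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CTLBREAK.
      * Hypothetical sketch: a table stands in for a sales file
      * sorted by store; HIGH-VALUES in the key means end of file.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SALES-DATA.
           05  FILLER PIC X(6) VALUE "A01010".
           05  FILLER PIC X(6) VALUE "A01020".
           05  FILLER PIC X(6) VALUE "B02005".
           05  FILLER PIC X(6) VALUE "C03030".
           05  FILLER PIC X(6) VALUE "C03012".
       01  SALES-TABLE REDEFINES SALES-DATA.
           05  S-ENTRY OCCURS 5 TIMES.
               10  S-STORE   PIC X(3).
               10  S-AMOUNT  PIC 9(3).
       01  S-SUB             PIC 9(2) VALUE 0.
       01  WS-STORE          PIC X(3).
       01  WS-AMOUNT         PIC 9(3).
       01  WS-BREAK-STORE    PIC X(3).
       01  WS-STORE-TOTAL    PIC 9(5).
       01  WS-GRAND-TOTAL    PIC 9(5) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAINLINE.
           PERFORM 8000-READ-SALES.
           PERFORM 1000-PROCESS-STORE
               UNTIL WS-STORE = HIGH-VALUES.
           DISPLAY "GRAND TOTAL " WS-GRAND-TOTAL.
           STOP RUN.

       1000-PROCESS-STORE.
      * Start of group: remember the key and clear the subtotal.
           MOVE WS-STORE TO WS-BREAK-STORE.
           MOVE 0 TO WS-STORE-TOTAL.
           PERFORM 2000-PROCESS-SALE
               UNTIL WS-STORE NOT = WS-BREAK-STORE.
      * End of group: reached on a key change or on end of file.
           DISPLAY "STORE " WS-BREAK-STORE " TOTAL " WS-STORE-TOTAL.
           ADD WS-STORE-TOTAL TO WS-GRAND-TOTAL.

       2000-PROCESS-SALE.
           ADD WS-AMOUNT TO WS-STORE-TOTAL.
           PERFORM 8000-READ-SALES.

       8000-READ-SALES.
           ADD 1 TO S-SUB.
           IF S-SUB > 5
               MOVE HIGH-VALUES TO WS-STORE
           ELSE
               MOVE S-STORE (S-SUB) TO WS-STORE
               MOVE S-AMOUNT (S-SUB) TO WS-AMOUNT.
```

There is no first-record test and no forced final footing. The first group starts simply because the outer loop starts, and the last subtotal prints because end of file changes the key like any other break. With the sample data the store totals are 30, 5, and 42, and the grand total is 77. A second break level would be one more paragraph of the same three-part shape wrapped around 1000-PROCESS-STORE.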
RULE 13 - Use the "triform" structure as the main
control structure in your program.
This rule is really a generalization of the preceding one,
which applies it specifically to control breaks. Since only
one author presented the correct algorithm there, it is not surprising
that none of the authors presents a good argument for the triform
structure. It is interesting, however, that all of the authors
who show any program examples use the triform structure for the
"mainline" paragraph of their programs. They
never extrapolate this structure to lower level structuring,
where it would be useful in the control break and file
update programs mentioned earlier in this chapter. Given this
lack of attention to so central a control structure, one cannot fault
the average programmer for being unaware of it.
RULE 14 - Do not indent when coding a "linear" nested
IF to implement a CASE structure. Code the ELSE IF on the same
line as if it were a single verb.
A key part of any COBOL textbook is a discussion of the IF
verb. Equally important is a discussion of how control structures,
especially the IF statement, may be "nested". However,
there is a lot of confusion among the various textbook authors
about the implementation of the CASE structure. Most of those
that discuss it limit their remarks to the GO TO DEPENDING ON
statement in COBOL and do not include the IF ... ELSE IF ... ELSE
IF ... ELSE construct in their remarks. Following are some of
the obvious examples of an n-way IF that is really an instance
of a CASE construct, but which are improperly nested.
    IF PRODUCT-CODE-INPUT = 'H'
        MOVE HARDWARE-CONSTANT TO PRODUCT-TYPE-REPORT
    ELSE
        IF PRODUCT-CODE-INPUT = 'S'
            MOVE SOFTWARE-CONSTANT TO PRODUCT-TYPE-REPORT
        ELSE
            MOVE INVALID-CONSTANT TO PRODUCT-TYPE-REPORT.15

    IF K = 1
        MOVE "FRESHMAN" TO LINE1
    ELSE
        IF K = 2
            MOVE "SOPH" TO LINE1
        ELSE
            IF K = 3
                MOVE "JUNIOR" TO LINE1
            ELSE
                MOVE "ERROR" TO LINE1.16

    IF NEW-YORK
        THEN ADD TRANS-AMOUNT TO NEW-YORK-CTR
        ELSE IF WASHINGTON
            THEN ADD TRANS-AMOUNT TO WASHINGTON-CTR
            ELSE IF BOSTON
                THEN ADD TRANS-AMOUNT TO BOSTON-CTR
                ELSE ADD TRANS-AMOUNT TO OTHER-CITY-CTR.17
Some even make comments on the "difficulty" of understanding
an IF statement if it is nested more than 3 levels deep.
"The reason we recommend avoiding more than three levels
of nesting is that more often than not, code involving complex
nested IF statements is hard to understand."18
Fortunately, some authors seem to realize that a series of
IF statements in a single sentence do not necessarily have to
be "nested". McClure calls this structure an "n-Way
Branch" and gives the following example:
    IF FORMAT-TYPE = 01
        MOVE 01 TO OUTPUT-TYPE
    ELSE IF FORMAT-TYPE = 02
        MOVE 02 TO OUTPUT-TYPE
    ELSE IF FORMAT-TYPE = 03
        MOVE 03 TO OUTPUT-TYPE
    ELSE
        PERFORM ERROR-PROCESSING.19
This example is probably a little too simplistic. Another
states it a little better. "The IF...ELSE combination can
also be used to generate another version of the CASE structure."20
The following example is used:
    IF EDIT-CODE = 1
        PERFORM 4110-EDIT-1
    ELSE IF EDIT-CODE = 2
        PERFORM 4120-EDIT-2
    ELSE IF EDIT-CODE = 3
        PERFORM 4130-EDIT-3
    ELSE IF EDIT-CODE = 4
        PERFORM 4140-EDIT-4
    ELSE
        PERFORM 4150-EDIT-5.
One author calls this type of structure "A Special Kind
of IF Statement" and states "Some programmers feel that
since one of the True paths at most will be executed, it is clearer
to write it this way."21
Finally, one author calls the two kinds of nested IF statements
"Linear Nested IF Statements" and "Nonlinear Nested
IF Statements" and states "Before the development of
structured programming concepts, use of nested IF statements was
usually discouraged because they were considered complicated and
difficult to understand. However, with structured programming,
nested IF statements are often required to provide proper control
of statement selection. The complexity of nested IF statements
is reduced when [1] the programmer thoroughly understands how
the ELSE statement groups are paired with IF conditions, [2] proper
indentation forms are used when coding the nested IF, and [3]
the number of levels of nesting is limited to perhaps three or
four."22
This last author has reached a balanced view of nesting which
many of the other authors have not.
RULE 15 - When coding nested IF statements, always code both
the true and false paths, using the NEXT SENTENCE or ELSE NEXT
SENTENCE construct as necessary.
This topic is not addressed by many authors, except syntactically.
If it is addressed, it is usually expressed as something like "The
clause ELSE NEXT SENTENCE may be omitted if it appears immediately
before the period."23
One author does make a positive statement.
"... the ELSE NEXT SENTENCE could be optionally coded. Beginners
may prefer to actually code this line, however, and he (or she)
should if it helps in reading the program better."24
However, since a later reader of any program may well be a beginner,
I would recommend that it always be coded, to help readers
other than the original programmer.
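A short sketch shows why coding both paths is more than a courtesy when the IF is nested. The names and values are hypothetical and the discount logic is invented purely for illustration. The point is that, in a period-terminated sentence, an ELSE pairs with the nearest unpaired IF, so the explicit ELSE NEXT SENTENCE is what keeps the final ELSE attached to the outer condition.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BOTHPATH.
      * Hypothetical names and values; only the IF shape matters.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ORDER-QTY      PIC 9(3) VALUE 120.
       01  WS-CUST-TYPE      PIC X    VALUE "R".
       01  WS-DISCOUNT-PCT   PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAINLINE.
      * Both paths of the inner IF are coded. Without the inner
      * ELSE NEXT SENTENCE, the last ELSE would pair with the inner
      * IF and the logic would silently change.
           IF WS-ORDER-QTY > 100
               IF WS-CUST-TYPE = "W"
                   MOVE 15 TO WS-DISCOUNT-PCT
               ELSE
                   NEXT SENTENCE
           ELSE
               MOVE 2 TO WS-DISCOUNT-PCT.
           DISPLAY "DISCOUNT PCT " WS-DISCOUNT-PCT.
           STOP RUN.
```

With the values shown (a large order from a customer who is not type "W"), control reaches NEXT SENTENCE and the discount is left at zero. A reader can see that outcome directly in the code instead of having to work out which IF the trailing ELSE belongs to.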
RULE 16 - Do not code an IF statement when the terminating
condition of a PERFORM loop will also include the condition.
This particular construct does not appear to be covered by
any of the textbook authors. However, it appears many times in
actual programs. It apparently stems from a misconception of how
PERFORMs work: because of the "test before", a PERFORM ... UNTIL
statement can execute its paragraph zero times, so an IF that tests
the same condition first adds nothing.
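The construct in question looks something like the first PERFORM in the sketch below. This is a minimal illustration with hypothetical names, not code taken from any particular program. The item count is deliberately zero so that both forms can be seen to behave identically.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOGUARD.
      * Hypothetical names; WS-ITEM-COUNT is zero on purpose.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ITEM-COUNT     PIC 9(2) VALUE 0.
       01  WS-SUB            PIC 9(3).
       01  WS-LINES-PRINTED  PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAINLINE.
      * Redundant guard: the PERFORM already tests its UNTIL
      * condition before the first pass.
           IF WS-ITEM-COUNT > 0
               PERFORM 1000-PRINT-ITEM
                   VARYING WS-SUB FROM 1 BY 1
                   UNTIL WS-SUB > WS-ITEM-COUNT.
      * Same effect without the IF: zero passes when the count is 0.
           PERFORM 1000-PRINT-ITEM
               VARYING WS-SUB FROM 1 BY 1
               UNTIL WS-SUB > WS-ITEM-COUNT.
           DISPLAY "LINES PRINTED " WS-LINES-PRINTED.
           STOP RUN.

       1000-PRINT-ITEM.
           ADD 1 TO WS-LINES-PRINTED.
           DISPLAY "ITEM " WS-SUB.
```

Neither PERFORM executes 1000-PRINT-ITEM here, so the counter stays at zero. The IF in the first form only repeats a test that the PERFORM makes anyway.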
RULE 17 - Do not use the PERFORM ... THRU construct.
There is much discussion over this particular construct and
much disagreement among the various textbook authors. Some recommend
always using it.
"The coding of the EXIT paragraph explicitly defines the
logical end of the procedure."25
"The above code [PERFORM...THRU] reflects a common practice
among many programmers. Paragraphs stand out more vividly as
each terminates with an EXIT paragraph. In essence, each paragraph
performed has a beginning and a clearly marked end (the EXIT paragraph)
to which all sentences converge in the paragraph."26
Others explain why some programmers use it, but counter
these explanations with reasons why it should not be used.
"Programmers often use this form of PERFORM-THRU in order
to facilitate the use of GO TO statements within the procedure;
e.g., to provide for an early exit from the module by passing
control to the EXIT paragraph. ... The same logic could be coded
without the GO TO statement ... Note that with this arrangement
of code, the final EXIT paragraph really isn't necessary. Indeed,
this is generally true. If we organize our logic properly, the
GO TO statement is superfluous."27
"In the late 1960s and the early 1970s, prior to the adoption
of structured coding concepts, use of the PERFORM/THRU statement
was very popular. Indeed, the programming standards for many
installations recommended that the single-paragraph PERFORM never
be used. ... The main disadvantage to multiple-paragraph modules
is that the physical placement of paragraphs within the program
becomes significant to program execution. Hence program bugs
can be introduced. ... With structured code, GO TO statement usage
is restricted and hence there is no reason to use dummy paragraphs
[those with an EXIT in them] or multiple-paragraph modules. Because
of the wide use of dummy paragraphs prior to the structured coding
era, many older programs using multiple-paragraph modules are
still in existence, however."28
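The argument in these two quotations can be sketched side by side. The program below is a hypothetical illustration, with invented names, of the same small piece of logic coded first as a multiple-paragraph PERFORM ... THRU module with a GO TO to its EXIT paragraph, and then as a single paragraph.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOTHRU.
      * Hypothetical names; both forms implement the same logic.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-AMOUNT         PIC 9(5) VALUE 0.
       01  WS-TOTAL          PIC 9(7) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAINLINE.
      * Multiple-paragraph form: works only while 2000-EXIT stays
      * physically right after 2000-ADD-AMOUNT.
           PERFORM 2000-ADD-AMOUNT THRU 2000-EXIT.
      * Single-paragraph form: no GO TO and no EXIT paragraph.
           PERFORM 3000-ADD-AMOUNT.
           DISPLAY "TOTAL " WS-TOTAL.
           STOP RUN.

       2000-ADD-AMOUNT.
           IF WS-AMOUNT = 0
               GO TO 2000-EXIT.
           ADD WS-AMOUNT TO WS-TOTAL.
       2000-EXIT.
           EXIT.

       3000-ADD-AMOUNT.
           IF WS-AMOUNT NOT = 0
               ADD WS-AMOUNT TO WS-TOTAL.
```

In the THRU form, a paragraph inserted between 2000-ADD-AMOUNT and 2000-EXIT during maintenance would silently become part of the performed range. The single-paragraph form has no such dependence on physical placement.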
RULE 18 - Always use the READ ... INTO and WRITE ... FROM forms
of these verbs and define all record definitions in the WORKING-STORAGE
SECTION.
Most of the textbooks make no recommendation on the use of
the READ...INTO construct. Some simply note "The INTO clause
is optional."29 This is probably because
the initial programming examples in most textbooks do not use
the INTO clause so as not to confuse the student. However, those
texts that give lists of recommended coding constructs and/or
programming style ideas tend to recommend that the INTO clause
be used.
"Use READ INTO and WRITE FROM to do all the processing
in the Working-Storage Section. This is suggested for two reasons
..."30
"STANDARD. All work on a file should be done in the
WORKING-STORAGE. This means that in the PROCEDURE DIVISION, the
programmer should READ INTO and WRITE FROM. If a file is worked
on in WORKING-STORAGE alone, then the only fields that should
be defined in the FILE SECTION are the ones referenced in the
program. This will keep the maintenance programmer from referencing
them in the program intentionally or unintentionally. In fact,
the only fields from the file description (FD) entry referenced
in the program will be the record key and the record name."31
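As a rough sketch of what these recommendations look like in practice, the program below keeps the full record layout in WORKING-STORAGE and leaves only a bare record name in the FILE SECTION. The file name, data names, and sample records are hypothetical. The literal in the ASSIGN clause is an assumption that suits compilers such as GnuCOBOL, and other environments assign files differently. The program writes a two-record file in the current directory and reads it back.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READINTO.
      * Hypothetical sketch: writes a small file, then reads it back.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CUST-FILE ASSIGN TO "CUSTDEMO.DAT".
       DATA DIVISION.
       FILE SECTION.
      * Only the record name is defined here, with no field detail.
       FD  CUST-FILE.
       01  CUST-FILE-REC         PIC X(20).
       WORKING-STORAGE SECTION.
      * All work on the record is done in this layout.
       01  WS-CUST-REC.
           05  WS-CUST-ID        PIC X(5).
           05  WS-CUST-NAME      PIC X(15).
       01  WS-EOF                PIC 9 VALUE 0.
       PROCEDURE DIVISION.
       0000-MAINLINE.
           OPEN OUTPUT CUST-FILE.
           MOVE "00001" TO WS-CUST-ID.
           MOVE "SMITH" TO WS-CUST-NAME.
           WRITE CUST-FILE-REC FROM WS-CUST-REC.
           MOVE "00002" TO WS-CUST-ID.
           MOVE "JONES" TO WS-CUST-NAME.
           WRITE CUST-FILE-REC FROM WS-CUST-REC.
           CLOSE CUST-FILE.
           OPEN INPUT CUST-FILE.
           READ CUST-FILE INTO WS-CUST-REC
               AT END MOVE 1 TO WS-EOF.
           PERFORM 1000-SHOW-CUSTOMER UNTIL WS-EOF = 1.
           CLOSE CUST-FILE.
           STOP RUN.

       1000-SHOW-CUSTOMER.
           DISPLAY WS-CUST-ID " " WS-CUST-NAME.
           READ CUST-FILE INTO WS-CUST-REC
               AT END MOVE 1 TO WS-EOF.
```

Because no field of the FD record is named, nothing in the PROCEDURE DIVISION can reference the file buffer by accident, which is the maintenance benefit the quoted standard describes.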
RULE 19 - Consider counting "records processed" instead
of "records read". At least ask yourself, "why
am I counting?" and "what am I counting?".
Only three of the authors mention control counts, and only
one actually gives an example of doing the counting. Weinberg
has a short section on "Counting for Control", but much
of the discussion centers around replacing "flags" with
counters because "Flags carry information only about zero
or nonzero; but control counts give a very precise picture."32
One author states, "Each file should maintain a record
count, and this count should be displayed as part of the end-of-job
procedure."33
This is a helpful suggestion, but still lacks a concrete example.
Interestingly, the only example does not count "records
read", but "records processed".34
        READ INPUT-FILE AT END MOVE 1 TO WS-EOF.
        PERFORM 10-PROCESS-READ UNTIL WS-EOF = 1.
        PERFORM 20-END-OF-FILE-CHORES.
        .
        .
    10-PROCESS-READ.
        ADD 1 TO KOUNT.
        PERFORM 30-PROCESS-RECORD.
        READ INPUT-FILE AT END MOVE 1 TO WS-EOF.
        .
        .
If this program were to count reads, then the following statement
would have to be included after each of the read statements.
        IF WS-EOF NOT = 1 THEN ADD 1 TO KOUNT.
RULE 20 - Consider adding "PARM" overrides to allow
for easy program testing, eliminating the need for "near-clones",
etc.
Because most programming texts are oriented toward getting
students to learn proper COBOL syntax and solve common problems,
this topic is not covered in any of the introductory textbooks
examined. However, one of the texts that included suggestions
for experienced programmers made some reference to it.
"The flexibility of the program can be greatly increased
by removing fixed values from the working storage of the program
and providing for these values to be entered via control cards
or parameter files during program initialization."35
While this is not as strong a statement as the rule above
suggests, it is at least a step in that direction.
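One plausible reading of such an override, sketched under stated assumptions: the program carries its normal value as a default in WORKING-STORAGE and replaces it only when a run-time parameter says so. All names here are hypothetical. ACCEPT ... FROM COMMAND-LINE is a compiler extension (available in GnuCOBOL and Micro Focus, for example) that I use only to keep the sketch self-contained. On a mainframe the same value would normally arrive through the JCL PARM and the LINKAGE SECTION.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PARMDEMO.
      * Hypothetical sketch. ACCEPT ... FROM COMMAND-LINE is a
      * compiler extension, not part of standard COBOL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PARM-AREA.
           05  WS-PARM-MODE  PIC X(4).
           05  FILLER        PIC X(76).
      * Normal production value, used unless a parm overrides it.
       01  WS-RUN-MODE       PIC X(4) VALUE "PROD".
       PROCEDURE DIVISION.
       0000-MAINLINE.
           MOVE SPACES TO WS-PARM-AREA.
           ACCEPT WS-PARM-AREA FROM COMMAND-LINE.
           IF WS-PARM-MODE = "TEST"
               MOVE "TEST" TO WS-RUN-MODE.
           DISPLAY "RUN MODE " WS-RUN-MODE.
           STOP RUN.
```

Run with no parameter, the program behaves as it does in production. Run with TEST as its parameter, the same load module switches to its test behavior, so no separate "near-clone" has to be maintained.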
1 - J. Wayne Spence. COBOL for the 80's. 559-578.
2 - Gerald M. Weinberg, et al. High Level COBOL Programming. 127ff.
3 - Michel Boillot and Mona Boillot. Understanding Structured COBOL. 468.
4 - Computer Partners, Inc. Handbook of COBOL Techniques. 25.
5 - Carl Feingold. Fundamentals of Structured COBOL Programming. 136-137.
6 - Gary S. Popkin. Comprehensive Structured COBOL. 365,414.
7 - Pacifico A. Lim. A Guide to Structured COBOL with Efficiency Techniques and Special Algorithms. 73ff.
8 - Pacifico A. Lim. 55-56.
9 - Computer Partners, Inc. 58-59.
10 - Michel Boillot and Mona Boillot. 306.
11 - Edward J. Coburn. Advanced Structured COBOL. 53.
12 - Tyler Welburn. Advanced Structured COBOL: Batch, On-line, and Data-base Concepts. 446-447.
13 - Gerard A. Paquette. Structured COBOL. 465.
14 - Carl Feingold. 160-161.
15 - Gary B. Shelly, et al. Structured COBOL, Pseudocode Edition. 6.7.
16 - Michel Boillot and Mona Boillot. 177.
17 - Pacifico A. Lim. 28.
18 - Barry K. Nirmal. Programming Standards and Guidelines: COBOL edition. 129.
19 - Carma L. McClure. Reducing COBOL Complexity through Structured Programming. 139.
20 - Computer Partners, Inc. 38-39.
21 - Gary S. Popkin. 135.
22 - Tyler Welburn. Structured COBOL: Fundamentals and Style. 355-356.
23 - Gary S. Popkin. 136.
24 - Pacifico A. Lim. 28.
25 - Barry K. Nirmal. 124.
26 - Michel Boillot and Mona Boillot. 186-187.
27 - Timothy R. Lister and Edward Yourdon. Learning to Program in Structured COBOL, Part 2. 46-47.
28 - Tyler Welburn. Structured COBOL: Fundamentals and Style. 273-274.
29 - Fritz A. McCameron. COBOL Logic and Programming. 80.
30 - Carl Feingold. 745.
31 - Barry K. Nirmal. 110.
32 - Gerald M. Weinberg, et al. 163.
33 - Computer Partners, Inc. 41.
34 - Michel Boillot and Mona Boillot. 198.