JG/BJO/JPA 25th May, 1962.

AUTOMATIC PROGRAMMING LANGUAGES FOR BUSINESS AND SCIENCE

A report is presented of the two-day Conference on Automatic Programming Languages for Business and Science, held at Northampton College of Advanced Technology in co-operation with the British Computer Society on Tuesday and Wednesday, 17th and 18th April, 1962.

J. Caldwell

1. Introduction

The Conference consisted of four sessions, of which the first two were devoted to programming languages for business and the last two to programming languages for science. Dr. Clippinger commented on certain aspects of automatic processing in America and Dr. Dijkstra talked about the use of ALGOL 60 in Holland, but the main emphasis of the Conference was on the state of the art in this country. Although the size and complexity of some business language compilers were something to marvel at, their proponents stressed that they worked, whereas British systems were much less advanced. There seemed to be some danger of outside pressures forcing British manufacturers to adopt COBOL within a year or two.

2. Session 1

The first speaker was Mr. A. d'Agapayeff (Hayes, Akers and Hayes), who discussed current developments in commercial automatic programming. His talk fell into three parts.

The first part was concerned with progress in four areas of commercial automatic programming:-

a) the properties of data and how the data is described;
b) the procedural facilities in a language and how they are combined;
c) communication with the object programme and debugging;
d) operation of the compiler and its availability to the user.

In discussing the properties of data Mr.
d'Agapayeff distinguished between three kinds of data:
- data consisting of fixed-length units. He noted the problem of differing source and destination area sizes when moving such data from one part of the store to another.
- data consisting of variable-length units whose lengths were fixed at the time they were created.
- data consisting of units whose lengths varied dynamically. Space for these could only be allocated at run time.

He observed that in moving data about in the store the editing characteristics of the receiving field were required. There were a number of features where progress was needed. These included facilities for character manipulation, for arbitrary groupings and repetitions of different classes of data units, for input and output descriptions without regard to hardware, for dynamic renaming of data and for internal and printing descriptions.

Turning to procedural facilities, Mr. d'Agapayeff noted the need for, and almost universal lack of, facilities for the definition of new functions. In this area ALGOL has a great advantage over most other languages.

As far as communication with the object programme was concerned, there were three points on which attention could be focussed. The first was that at run time the data description was not available to the object programme, so that it was impossible to write routines in the source programme referring to data which was to be identified at run time. The second was that only the subroutines required by the object programme should be loaded with it. Finally, debugging should be done in the source language only.

Mr. d'Agapayeff then commented on the disillusionment of customers when they finally received their compilers and discovered how much equipment was required and how lengthy the compilation process was.

The second part of his talk consisted of a progress report on the major commercial languages, i.e. COBOL, FACT, I.B.M. Commercial Translator, RAPIDWRITE, NEBULA, CLEO, FILECODE and Language H.
The general attitude to COBOL seems to depend on that of I.B.M., who have announced that they are not implementing their Commercial Translator for any new machine.

COBOL 61 is much tidier and better described than the earlier version. It is also more realistic in its approach to commonality and in its attempt to reduce the number of dialects. However, more precision is still required in specifying the language, for example in the data description section. The language has no means of defining and using new functions.

FACT is untidy and poorly supported by its programming and reference manuals. Nevertheless it is a real tool with powerful input/output and file processing features.

The I.B.M. Commercial Translator is available on some machines. It is simple and elegant and has an excellent record. It contains means for specifying new functions, and there are a number of facilities not in COBOL. The language does not cater for variable-length fields, and its input/output facilities are not up to FACT standard.

RAPIDWRITE is essentially a subset of COBOL (mainly COBOL 60 with anticipations of COBOL 61). It uses card files for language statements, and the compiler produces a full COBOL listing.

NEBULA, originally written for ORION, is to be implemented on ATLAS also. It has good facilities for input/output and for the expansion of formulas.

CLEO is interesting because of its stated intention to allow both the mathematical and the commercial user to express their requirements in a natural manner.

FILECODE (for Sirius) is the only British commercial autocode which works. In contrast to Language H it has a data description section, but it is otherwise rudimentary.

Language H (for the National Elliott 405) appears to be little more than an English-language assembly code.
Some general points were then made concerning extensions to ALGOL to embrace commercial problems. Points were also made about commonality. It was argued that commonality was advantageous for training purposes but disadvantageous because of the greater variety of commercial problems.

In the third part of his talk Mr. d'Agapayeff looked to the future and expressed some hopes and expectations. He hoped for four things:
- a breakaway from the classical computer concept in the design of computers for business applications;
- an end to manufacturers selling solutions to problems written in languages which were not complete, with compilers which did not exist, on machines which did not work;
- research centres reporting their work in a more intelligible form;
- the combining of committees such as those of ECMA, IFIPS and ISO.

He expected that there would be standardisation of languages in a year or two, with COBOL as the commercial standard. He also expected pressure to design computers to accept COBOL, and greater dependence on American machines and programming systems.

In the discussion which followed, the question was asked whether the systems analyst should know the run-time length of data units. Mr. d'Agapayeff thought not. He went on to say that he felt proper machine organisation would contribute greatly to the solution of problems associated with the design of compilers for business autocodes.

Dr. R.D. Clippinger (Honeywell) reviewed the operational usage of commercial compilers in the U.S.A. He was mainly concerned with FACT, as there had not been a great deal of experience of programme writing in commercial programming languages. Whereas 90% of all scientific problems programmed in America are written in FORTRAN, there were only 2 users of FLOWMATIC. At present 12 companies are implementing COBOL on 35 machines, so experience in the use of COBOL is largely in the future.
FACT was conceived in January 1959 and work began on it in May, so FACT just predates COBOL. It was desired to build a language which was complete in the sense that there would be no exit from the source language (for example, to machine code). This desire was not fulfilled. The goals aimed at in designing FACT were to reduce time and effort in programming, to increase the documentation of a problem, and to make it easier to transfer problems from one machine to another.

File handling is the key feature of FACT. Magnetic tape files are created automatically from card input with built-in checking. There are also facilities for sorting and report generation.

The FACT compiler has 220,000 three-address instructions (the equivalent of about 450,000 one-address instructions). It can be run on a machine with a 4096-word memory and 4 magnetic tape decks. The compiler is large because of the complexity of what is attempted (e.g. the handling of many different card formats). The instructions are divided as follows:-

65,000 in organising the eight phases of compilation and the sorting between phases;
3,000 in the input editor;
6,000 in the report generator;
20,000 in the input/output;
20,000 in the sorting;
5,000 in the arithmetic;
11,000 in the field and group moves;
the remainder for miscellaneous purposes.

As an example of the use of FACT, Dr. Clippinger instanced a payroll calculation which resulted in 30,000 object programme instructions and took 3 man-months to write. This did not include systems analysis, as the job had already been programmed in the assembly code ARGUS. To illustrate the speed of compilation, he stated that about 7,000 words of object programme could be produced in 20 minutes. Improvements to the compiler would reduce this figure to 13 minutes from July.
Compilation would be three times as fast on the new 1800 computer.

Dr. Clippinger observed that not all problems should automatically be written in an autocode. Where the job was complex, however, there was a clear case for using a compiler. It should be noted that the use of a compiler entailed the use of more equipment.

Turning to the use of FACT, Dr. Clippinger mentioned four applications. One of these resulted in 200,000 - 300,000 object programme instructions, with ?00 runs and 1500 tape changes each day. In another application IBM 729 tapes were used for input and output. The third application was in banking, and the fourth was a military inventory control job with 150,000 transaction types.

FACT was used within the framework of a computer optimisation system. This system included:
- a programme for checking out programmes written in any source language;
- a scheduling routine for parallel processing;
- an executive monitor for loading, entering and running programmes and for organising reruns and restarts for all or any of a set of programmes.

Mr. Humby (I.C.T.) then spoke in place of Mr. P.V. Ellis on the acceptance of COBOL in Britain. He said that COBOL was being accepted more slowly in this country than in America because:-

(a) capital investment in equipment is on a more modest scale;
(b) the early descriptions of COBOL were tape-oriented;
(c) the first COBOL report was vague;
(d) there was a great distance between the implementors and the maintenance committee, which made it difficult to keep up with the latest position;
(e) there was no equivalent driving force;
(f) the cost of providing readability was high.

I.C.T. were developing a Mark I COBOL compiler for a minimum configuration consisting of 400 words of fast store, a 12,000-word drum, a card reader, a card punch and a line printer. The Mark II version would have tape facilities.

3. Session 2

Mr. M.A. Kingsbury (English Electric) opened the session by describing experience at Kidsgrove in the use of COBOL since October 1961.
The compiler has about 60,000 instructions. Courses within the organisation last 3 days, although 2 weeks would be more suitable. Use of COBOL saves time in programme writing by a factor of 4 to 1. The compiler is in two sections: the first accepts the source programme and checks for errors, and the second produces the object programme. It has been found that compilation is usually successful after 2 or 3 runs. The compiler produces a detailed listing of the source programme, and its use simplifies and standardises use of the computer. Compiled programmes are less efficient than the best hand-coded programmes. They come closest for file processing jobs. The most frequent programme errors are undefined data names and incompatible sizes for data fields.

Mr. Humby then discussed the pros and cons of RAPIDWRITE. RAPIDWRITE helps two groups: programmers who dislike the verbosity of COBOL, and managers and systems analysts who want to see the complete listing. It also crosses the language boundary, because full COBOL listings can be produced in any language. The compiler has two input routines. One accepts COBOL programmes. The other accepts specially prepared cards, together with a synonym table and a format dictionary. It uses these to give the full listing in the required language before entering the compiler proper. RAPIDWRITE would be available in May; the compiler has 30,000 instructions. Debugging is by trace print-out of the fast store. Courses in RAPIDWRITE last two days.

Mr. A.S. Cormack (National Elliott) described early operating experience with Language H. This is an English-language system with a minimum of operational features. The compiler is in two parts: the first is a translator with checking facilities, the second a generator. Amendments to the source programme require recompilation. The first compiler for the 405M was begun in August 1961 and virtually completed by the end of the year (it has 23,000 instructions).
At the beginning of 1962 work began on a 405 compiler to produce coding for the 803. To illustrate compilation speed, an example was given of a payroll calculation consisting of three programmes:-

the first deals with permanent amendments, has 1000 commands (8,000 machine code instructions) and takes 42 minutes to compile;
the second calculates the payslips, has 560 commands (6,500 machine code instructions) and takes 25 minutes to compile;
the third produces summaries, has 365 commands (4,300 machine code instructions) and takes 21 minutes to compile.

Mr. A.R. Russell (Ferranti) gave a progress report on NEBULA. The compiler for ORION would be ready by the end of the year. One of the reasons for adopting NEBULA rather than COBOL was that COBOL was oriented towards a character rather than a binary computer. The writing of the compiler is based on the Brooker-Morris technique of having a compiler to write compilers. NEBULA is described to this programme in about 30,000 instructions. The output is a compiler for NEBULA, which is expected to have at least 100,000 instructions. Keeping the analysis of the original source programmes on a separate reel of tape eases the problem of making amendments.

Mr. T.R. Thompson (LEO) concluded the session by considering the fundamental principles for expressing a procedure for a computer application independently of any compiler or computer considerations. He distinguished two things:
- the setting out of data in files of records, which consist ultimately of items (in certain cases single characters);
- the description of procedures, expressed in commands for operating on the data.

The file layout must be described, and the items in it characterised by relevant properties such as medium, position for printing, number of entries and alternative entries. Commands could be classified as input/output commands, commands for carrying out computation, and commands for controlling the sequence of computation.
A computer language should have a facility for adding to its repertoire of commands. Mr. Thompson pointed out that the language should be independent of any computer, whereas particular job plan expressions would be machine dependent. Finally, Mr. Thompson listed some criteria for the design of an automatic programming language. It should be simple and easy to learn, have only a small number of reserved words, be concise and be completely precise. It should also have a compiler which avoids complication.

4. Session 3

E.W. Dijkstra spoke on his experience in the implementation of Algol. The Amsterdam Algol compiler had been produced for the Mathematical Centre's X1 computer. In designing the compiler a straightforward approach had been adopted, as there had been rumours at the time of trouble with Fortran compilers due to their sophistication. The machine was new to the team, who had consequently been free of preconceived ideas of how it should be used. This, together with the machine limitations, had enabled decisions to be made rapidly.

The X1 had a 4096-word store. Input and output are by paper tape. Unlimited machine time had been available for testing the compiler. Compiling speed was limited by the punch (25 characters/sec.). Approximately 1,000 - 1,500 instructions are obeyed to produce one machine instruction. This corresponds to about 15 instructions produced per second, whereas the punch can output only 5 instructions/sec. Own arrays with dynamic bounds have not been implemented. The language has been extended by a library of procedures which may be called without declaration.

Before ALGOL, autocodes had not been used in Holland. As a result no climate of settled opinion had grown up which required such things as matrix routines. In England the availability of matrix autocodes had produced a feedback, with many problems solved by matrix methods in preference to other techniques.
The recursive features of ALGOL had been greatly appreciated, and had led to considerable enthusiasm for ALGOL 60. Four-day courses were run to teach ALGOL 60, and had been very successful.

Programmes for the X1 were written by several organisations, and these were compiled and run on a service bureau basis. The machine code produced by the compiler was not printed out, which was of no concern to the programmer. All modifications to a programme must be made in ALGOL, and any alteration of the machine code version is discouraged. Post-mortem facilities were provided by inserting print statements in the programme. These could subsequently be removed (by scissors). Only about 35% of programmes are free of syntactical errors, and it had been a mistake not to check the syntax on input. A separate syntax checker had been written but was not used, as their operator was more efficient. A further 4096-word store unit was being fitted to the X1. Had this been foreseen, the compiler could have been designed for load-and-go operation, avoiding paper tape output.

Programmes which could reasonably be written in ALGOL 60 were generally between two and three times as long (measured in number of instructions) as a comparable hand-coded programme. Two cases were known of the translated programme being shorter than the hand-coded version. The library facility provided a means of inserting hand coding, but required considerable know-how. The library routines were called from the library by the object programmes.

F.G. Duncan (English Electric) spoke on the English Electric ALGOL compilers. English Electric are developing two compilers to translate ALGOL to KDF 9 user code. At Whetstone a simple compiler was being written to allow rapid translation. Initially it had been intended to use a restricted set of ALGOL, but these restrictions had been lifted after discussion with Dijkstra. Load-and-go operation was to be used.
The compiler would have about 3000 instructions and would require 3 tape decks, although it could manage with 2. At Kidsgrove a more sophisticated compiler was being written. This would have between 20,000 and 30,000 instructions and require 4 tape decks, although it could manage with 2. It was a multi-pass compiler, as was necessary for optimisation. Considerable care was taken to make the two compilers compatible. Full ALGOL was being implemented, except for own arrays with dynamic bounds and integer labels. Both compilers were expected to be completed by the end of the year. An experimental compiler for Deuce had been written. The problem of communication when using KDF 9 user code in an ALGOL programme had been solved.

C.A.R. Hoare (Elliotts) described the Elliott ALGOL compilers. The Elliott ALGOL compiler was designed for service bureau operation on the 503. This machine has high-speed logic but is otherwise relatively unsophisticated. The interpretive approach of Dijkstra had not been adopted, since on the 503 the inefficiency was expected to be of the order of 30:1. It was intended to retain the compiler semi-permanently in the store and to use load-and-go compilation. Large programmes were allowed to overwrite the compiler. If necessary, very large programmes used two-pass operation. The limitations on size had meant that only a restricted set of the language was implemented. No attempt was made to optimise, or to recover after the discovery of a syntactical error. Errors could be traced from a print-out of the push-down list. A compiler of 8,000 instructions had been under test since February. It was estimated that 1.5 man-years had been spent coding the compiler. Since the 503 was not yet available, the compiler was tested on an 803 for 7 hours a week.

A.E. Glennie (A.W.R.E.) discussed his experience in using Fortran compilers. Early experience (3 years ago) with Fortran for the I.B.M.
704 had not been satisfactory. The quality of the coding was bad, and the compiler variable. Programmes were in general 10-15% longer in terms of instructions, and took 5% longer to obey, than hand-coded programmes. Fortran II had been more successful. Its attractive feature was the ability to assemble programmes from pre-written and pre-tested subroutines, using a Fortran skeleton. It had proved possible to write programmes in Fortran which had previously not been considered possible to write at all. The Fortran Monitor System, which enabled subroutines in hand or machine code to be assembled into programmes, was a particularly powerful feature.

Fortran syntax was very bad, and difficult, if not impossible, to describe. As a result programmers were forced to adopt a simple-minded attitude. Input and output were too clever, and their syntax using brackets caused trouble. The Mercury Autocode syntax was more satisfactory. DO statements are annoying, since the variable must be positive and increasing. Conditions have a 3-way test which gives rise to redundant labels (this is changed in Fortran 4). Commercial programmes could be written in Fortran using hand-coded subroutines.

Fortran compilers were slow due to optimisation. Loops were examined for modification in simple steps, to avoid recalculating addresses of items in arrays. The sophistication of the compilers had led to unreliability. Their typical size was 40,000 to 50,000 instructions. It was estimated that it cost 1d to compile a machine code instruction, against 1/3d for symbolic assembly languages. Good programmers saved money by using Fortran, but bad programmers did not.

Discussion

Dr. M. V. Wilkes: The standardisation committee had turned its attention to deciding what is meant by standardisation. It was too early to claim that they had yet agreed on this.

E. W. Dijkstra (in answer to a question): Programmes are full of clerical errors, which decrease as the programmer gets more familiar with his problem.
Fortran compilers may be slow because of their ability to translate parts separately. Compilers must not be unreliable. How can a workman love a poor tool?

A.E. Glennie: The unreliability of Fortran was justified by its sophistication. These compilers were improving, and errors only occurred in unused portions of the compiler.

R. D. Clippinger: Compiler checking can be compared with machine checking. It was possible to have so much checking that little real work was done. There was a desirable minimum of syntactical checking, probably a little less than the checking in second generation Fortran compilers.

F. G. Duncan: The use of KDF 9 user code in Algol programmes was in the spirit of the Algol 60 report.

E. W. Dijkstra: The use of machine code should be limited to library routines.

C. A. R. Hoare: It was dangerous to allow programmers to insert machine code, as they might do something silly. In particular, they might overwrite the compiler.

P. Wegner: Algol is not as advanced as Fortran because it has no input and output. Data description is lacking, and in the procedure description there is no facility for assembling subroutines. Recursion is not of any practical use and is used only as a toy. Would Mr. Dijkstra have preferred 2 tape units instead of additional store?

E. W. Dijkstra: Recursive routines made a language more enjoyable. This did not mean it was only a plaything. Good tools were lovable. It was easy to add an additional store to the X1, but not possible to fit tapes. Algol was quite flexible enough with the block structure, and recompilation presented no difficulties. Subroutines were a wasteful complication.

5. Session 4

Dr. M. V. Wilkes reported on the Rome Conference. An Algol extension committee had been formed to deal particularly with data description. The committee consisted of 50 members, including such members of the original committee as were still interested.
Algol had become a satisfactory communication language for algorithms. Neliac is being developed as a language suitable for different machines with the minimum of reprogramming, which was achieved by bootstrapping techniques. It was still very experimental.

R. D. Clippinger gave a progress report on CODASYL:

1. COBOL: COBOL 61 was a cleaner version of COBOL 60, and by December 1961 was being implemented by 12 manufacturers. COBOL had been extended, and the extensions would soon be available. They included:-

(i) additional rounding facilities in arithmetic;
(ii) a SORT verb;
(iii) report writing. This required 52 pages of new material and 57 new reserved words (e.g. FOOTING).

There will be no COBOL 62. COBOL 63 will include the above extensions, and will also be more polished. A limited version of COBOL would be published. The Users Committee were asking for all elective features to be made required. Anyone implementing COBOL could seek membership of the Manufacturers Committee.

2. Tabular Languages: These are now being considered. Considerable work is being done by I.B.M.

3. Language Structure Group: This group were working on an Information Algebra for Problem Stating. (NOTE: Dr. Clippinger reviewed this work. It would appear to be an abstract algebra concerned with Entities, Properties, Values, Lines, Areas, Bundles, Functions of Bundles etc.)

Discussion

(G.P.O.): Compiler languages were getting out of hand. There was no compatibility. Extensions were making languages difficult to learn. More work should be done to make machines and assembly languages compatible.

R. D. Clippinger: Extensions made programming easier. Salesmen could demonstrate to customers how easy programming was. People had learned Algol, and they would learn Cobol. Machine compatibility meant stagnation, but there could be improvements in this direction.

(Burroughs): Was Cobol satisfactory for small machines (e.g. a 4-tape I.B.M. 1401)? Could commonality be achieved for such machines?
Was commonality any use? Could software keep up with the improvements in a machine?

R.D. Clippinger: Cobol was not satisfactory for small machines. It was not possible to make efficient compilers for such a complicated language for small machines. Commonality could be achieved, but was difficult. The U.S.A.F. had a requirement for commonality because of its many machines, but commonality was overrated. In practice it was easy to hand translate from one machine to another. It was difficult for languages to keep up with machine improvements. In Honeywell's case, the segmentation problem made it impossible to take advantage of disc storage automatically. They were working on this, but it was not yet solved.

R. A. Brooker (Manchester University) described his procedure for specifying compilation processes for phrase structure languages.

C. Strachey (Unattached): Commercial languages are imprecisely defined, and their compilation is too slow, although the compilers produce efficient object programmes. Mathematical languages are precisely specified and highly symbolised. Their compilation is fast, but they produce inefficient programmes. The generalisation of concepts in the languages led to difficulties. A compiler for a mathematical language has about 4000 instructions, compared with 20,000 - 200,000 instructions for a commercial language. In the future the boundary between mathematical and commercial languages and compilers would become blurred. Mathematical languages should have data description and sophisticated input and output. Commercial languages should cease pretending to use English and become more concise and symbolic. It should be remembered that languages do not make programming easier, only coding. The formulation of the problem remained. Manuals for commercial languages paid little or no attention to the operating procedures required for the programmes compiled.
Mathematical languages were machine oriented. They failed to distinguish between the formulation of the problem and the actual machine processes (e.g. matrix multiplication may involve partitioning because of the size of the matrices and the storage limitation of the machine). Compiler writing would get easier, and machine independent compilers such as Neliac were being experimented with. More attention must be given to specifying languages. Standardisation experts could not specify languages, and too much effort is being spent on standardisation by committees. Compilers were affecting the design of computers. Concepts discovered by compiler writers were now incorporated into machines (e.g. the stack). Programming was becoming more realistic and less of a game like crossword puzzles. Experiments were being carried out at MIT on the shared use of large computers, with users submitting their problems to the computer independently. The MIT computer had a 1,000,000-word store. Printer character sets are inadequate.

R. A. Brooker (in answer to a question): The phrase structure compiler had been completed for Atlas but not tested. A Fortran translator had been written, but probably would not be implemented. Atlas Autocode was being written. Algol could be written using the phrase structure compiler.

D'Agapayeff: Standardisation would come, and it was not a waste of time. Learning Cobol was like learning Algol, and could be achieved easily in small doses.

C. Strachey: More programming languages were required, orientated for special purposes. Cobol had got out of hand with its complications, trying to please everyone.

O'Brien (English Electric): The use of English in Cobol was good, because it enabled full documentation of the programme in an easily understood form. The decision tables of Tabsel are a step in the right direction for dealing with compound conditions.

Fraser (Ferranti): Compilers would like a lot more information for efficient compilation.
Too often the worst code had to be assumed. The more sophisticated a language, the more information is required, and the less likely the compiler is to get it.

M. Wegener (C.S.G.): Algol should have an environment statement. It could then be a machine independent language.

Gibson (L.C.D.): Problem oriented languages are not practical. Procedure orientated languages could be disguised to look like them.

Harwall (Honeywell): Was a standard language for compilers possible?

C. Strachey: No.

F. G. Duncan: We want lots of good languages. The trouble is that there is not one now.