Prepared for Warpstock 2003 by Lynn Maxson (lmaxson@pacbell.net)

Opening Open Source

Foreword

The following describes a current project of the SCOUG Programming SIG. SCOUG, the Southern California OS/2 User Group, invites the reader to visit its website at www.scoug.com. Readers who want to follow the progress of this Programming SIG project more actively can also subscribe to the scoug-programming mailing list from the website.

An Open Source Project

The SCOUG Programming SIG discussed how best to participate in open source development for OS/2. Obviously it could have chosen to join any of the currently ongoing open source projects for OS/2. Instead it has decided to focus on the more general problem of OS/2 open source development: increasing the number of OS/2 programmers contributing to open source. This means overcoming the perceived barriers, or inhibitors, currently facing those in the OS/2 community interested in participating.

SCOUG membership reflects that of the OS/2 community as a whole, and the Programming SIG, as a subset of that membership, reflects it as well. We have a range of highly skilled programmers down through several levels to those who rate their programming skills at or near zero. So how do we raise the average skill level of most, if not all, to a personal comfort zone for engaging actively in open source programming? How do we use our own people resources to achieve the necessary skill level?

The reader will note that the focus relies on raising individual skill levels, a "smarting up", instead of a "dumbing down" to the level of the individual. In effect we propose to lower the bar by raising the people.

The Inhibitors: Mass versus Inertia

Not surprisingly, that which many highly skilled programmers, the so-called techies, enjoy with such relish, the abundance and variety of tools, has just the opposite effect on those considering increasing their skill levels. It seems such an overwhelming mass in terms of sheer numbers that it dampens the desire to proceed: inertia.

At the core of that number lies multiplicity: the multiplicity of programming languages, within a programming language the multiplicity of implementations, and within an implementation the multiplicity of utilities. For example, the GCC package comes with nearly four dozen utilities, each with its own language and each with its own source to maintain. So how do we present this mass in a way that encourages those who need to gain some mastery, some increase in their skill levels, to build the momentum to overcome their initial inertia?

The Strategies: Short Term, Long Term, and Merging

We cannot avoid the current situation with its multiplicities as the starting point, point A. We need to define a situation without such multiplicities, point B. We need a way that starts with point A and eventually leads into a seamless fit with point B. This means executing two concurrent strategies, one short term and one long term, along with one that at some point joins the short seamlessly with the long. The SCOUG Programming SIG (SPS) has taken this three-strategy approach.

Short Term Strategy

Multiplicity of Programming Languages

In tackling this one head-on the SPS takes advantage of members' expertise in various programming languages to support the languages in parallel. For those without either a language or programming experience we have to offer the necessary tutorials. We take a literate programming approach.

Literate programming involves two languages, the informal descriptive language of the user and the formal language of the source code. In this instance we intend to use the same informal description as far as possible in all sample code. In effect we will provide a basic form of comparative linguistics, allowing the user to see the differences and similarities among the languages as well as nuances within various implementations of the same language. The sample code will range from single statements to different control structures to complete algorithms to entire programs. In that manner we expect the user to more rapidly gain a sense of "construction" for assembling code sequences in any language.
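As a small illustration of this style, the sketch below pairs an informal description with formal C source for a greatest-common-divisor routine; the routine, its name, and the wording of the description are illustrative assumptions rather than actual tutorial material. The same informal description would then accompany the equivalent sample in each other supported language.

#include <stdio.h>

/* Informal description: given two whole numbers, repeatedly replace the
   larger by the remainder of dividing it by the smaller; when the
   remainder reaches zero, the smaller number is the greatest common
   divisor (Euclid's algorithm). */
static unsigned int gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {            /* repeat until nothing remains           */
        unsigned int r = a % b; /* remainder of the larger by the smaller */
        a = b;
        b = r;
    }
    return a;                   /* last non-zero value is the answer      */
}

int main(void)
{
    printf("gcd(54, 24) = %u\n", gcd(54, 24)); /* prints 6 */
    return 0;
}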
Multiplicity of Implementations

For any given programming language we frequently find open source code written for one implementation, e.g. Watcom C/C++, giving errors when compiled under another, e.g. GCC. The same occurs among the several different assembly language implementations. The SPS expects to take several actions to reduce the impact of these errors. First, we will deal with compiler options, listing equivalent forms from each implementation. Second, we will report such errors, their causes, and their means of correction. This will occur as part of the tutorial support on the website. In addition we will use the scoug-programming mailing list to function in part as a help desk.

Multiplicity of Versions

Within implementations we have different versions that are not fully backward compatible from a newer version to an older one. Where this occurs the SPS will either modify the source to compile error-free under the newer version or describe the source changes, their cause and correction, again as part of the tutorial material on the website. We will also support the error detection and correction through the scoug-programming mailing list.

Changes to the C/C++ Compiler(s)

The SPS believes that we can modify the C/C++ compilers, specifically the GCC compiler, to enhance programmer productivity. Moreover we can do this without affecting existing C/C++ source, ensuring backward compatibility in this manner.

Change 1: Change the compiler from single- to multi-pass

This change eliminates the need for the "void" (forward declaration) statement to provide for forward references (a small sketch of this restriction as it stands today appears after Change 4 below).

Change 2: Allow the same naming convention for "main" procedures as for subroutines

This requirement, a holdover from the use of OCL in UNIX, does not apply on any other platform. Technically it exists as a UNIX-specific implementation restriction, not a language one.

Change 3: Eliminate the need for nested, i.e. internal, procedures

This leaves only the use of external procedures. This change will allow an unlimited number of unordered procedures (source) on input. In effect it eliminates the need for "make" and "build" while simplifying the "link" process.

Change 4: Allow unlimited "main" procedures on input

This comes as an extension of Change 3. Effectively it allows global source changes across program boundaries. It allows multiple object modules, i.e. "main" procedures, to result from a single compile. The process moves from a single "external" procedure per compile, the current restriction, to compilation of an unlimited number. This offers a single synchronization point for effecting global source changes.
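To make Change 1 concrete, here is a minimal sketch of the restriction as it stands in standard C today: a single-pass compiler must see a declaration of the called routine before the call in main(), so the programmer supplies the forward declaration by hand. The function name and body are illustrative assumptions only; a multi-pass compiler could resolve the reference itself.

#include <stdio.h>

/* Forward declaration required today: without it, a single-pass C       */
/* compiler reaches the call in main() before it has seen average().     */
/* A multi-pass compiler (Change 1) could resolve the reference itself.  */
double average(double a, double b);

int main(void)
{
    printf("average = %f\n", average(3.0, 5.0));
    return 0;
}

double average(double a, double b)
{
    return (a + b) / 2.0;
}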
Change 5: Allow for either interpretive or compiled execution

This takes advantage of the fact that both interpreters and compilers have the same four stages: (1) syntax analysis, (2) semantic analysis, (3) proof theory, and (4) meta theory. Interpreters and compilers engage in the first two stages, syntax and semantic analysis, identically. They differ only in their executable results, interpretive or compiled, in proof theory. Making this change means incorporating both interpretive and compile functions as part of data entry, i.e. an editor function. The editor then becomes the single, necessary user interface whose menu options make it a complete IDE. As we currently have "smart" editors, e.g. LPEX, which do syntax analysis, this simply means making them even "smarter" all the way through code generation (proof theory).

Change 6: Allow for automatic test data generation and testing of interpretive output

This testing occurs at three levels: (1) the individual source statement, (2) a control structure (sequence, decision, iteration), and (3) the procedure. It takes advantage of the fact that we can consider a sequence as one or more statements or control structures. Each statement, control structure, and procedure has definite boundaries. Basically these boundaries have a single input and a single output. As such we can consider them as "pluggable" or "reusable" components. Within the different boundaries we have data variables whose values determine results in an assignment statement or within an "if" or "do" clause. In interpretive mode the software can present us with a list of variables, assumptions about their default value ranges, and the possibility of substituting for those default values. Once we have set the values the software, as part of an exhaustive true/false proof, the same as occurs in logic programming, can enumerate all possible combinations of values for all variables tested (a small sketch of such enumeration appears at the end of this short term strategy).

The reader needs to understand the implication of the exhaustive true/false proof using enumerated sets of values for variables. When "properly" understood and implemented it eliminates the need for "alpha" and "beta" testing along with the need for "alpha" and "beta" testers. The reader also needs to understand that this automatic software testing process occurs millions of times faster, millions of times cheaper, and millions of times more accurately than current "accepted" practice.

These changes represent a form of "paradigm busting", of attempting to look at "old", i.e. current, things in a new way. Each of them illustrates how implementations and their associated methodologies operate to set an upper limit on individual productivity. Somewhere in the process of implementing this short term strategy the SPS will reach a consensus on when to leave it to focus more fully on the long term strategy. The SPS will then merge the two strategies.
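As promised under Change 6, here is a minimal sketch, in C for familiarity, of exhaustive enumeration over a single boundary. The routine under test, the value ranges chosen for its variables, and the correctness rule are illustrative assumptions; the point is only that once ranges are fixed, the software can walk every combination mechanically and report any violation.

#include <stdio.h>

/* The boundary under test: one input combination in, one result out.    */
static int clamp(int value, int low, int high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

int main(void)
{
    int tested = 0, failures = 0;

    /* Enumerate every combination of the three variables over the       */
    /* assumed value ranges and check each result against the rule       */
    /* that defines correct behaviour.                                   */
    for (int value = -5; value <= 5; value++)
        for (int low = -3; low <= 3; low++)
            for (int high = low; high <= 3; high++) {
                int result = clamp(value, low, high);
                int ok = (result >= low && result <= high) &&
                         (value < low || value > high || result == value);
                tested++;
                if (!ok) {
                    printf("mismatch: value=%d low=%d high=%d result=%d\n",
                           value, low, high, result);
                    failures++;
                }
            }

    printf("%d combinations checked, %d failures\n", tested, failures);
    return failures != 0;
}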
Long Term Strategy

Interpreters have always offered an IDE based upon an editor interface. Because they differ from compilers only in the form of the executables produced, a meta theory option during proof theory, it makes little sense to continue pursuing the historical edit-compile-test process of multiple, separate steps. We need only one tool, one interface. Moreover we need only one language. One language, one tool, one interface can lower the skill bar to participating in open source programming.

Three pre-1970 programming languages (LISP, APL, and PL/I) contain all of the data elements, aggregates, and operators found in all other third generation (imperative) programming languages combined. Three. Equal to thousands. Moreover all programming languages are specification languages. On the other hand not all specification languages are programming languages: they have neither an interpreter nor a compiler. We make this distinction in order to make clear...or clearer...the relationship between imperative programming languages (1st, 2nd, and 3rd generation) and the Software Development Process (SDP).

Software Development Process (SDP)

The SDP consists of five stages: (1) specification, (2) analysis, (3) design, (4) construction, and (5) testing. Now if every programming language is a specification language, why does coding first appear in construction and not in specification? The answer lies in the use of imperative languages like C, C++, and Java (also Python, PHP, Perl, PL/I, etc.). In imperative languages each of these stages has its own form, some textual, some graphical, of manually prepared source. In short, five stages, five separate manual sources to keep in sync. Declarative languages (AI, neural nets, Prolog, SQL, etc.) on the other hand allow the software to accept specifications as input, automatically performing analysis, design, and construction upon them. In some instances, as suggested in Change 6 (automated test data generation and testing) of the short term strategy, the software will also perform automatic testing. Thus the comparison of imperative versus declarative languages comes down to five manual SDP stages in the former and one in the latter.

                  Imperative   Declarative
  Specification   Manual       Manual
  Analysis        Manual       Software
  Design          Manual       Software
  Construction    Manual       Software
  Testing         Manual       Software

Obviously our insistence on staying with imperative languages in the production of open source code places another limit on individual productivity.

A Big Challenge: Matching Solution Set to Problem Set

Sometimes in the search for better solutions we venture far from the KISS principle. We get so wrapped up in some elegant or esoteric form that we forget why we came here in the first place. We have programming languages to describe, i.e. communicate, real world events. We use programming languages basically for the same reason we use our native languages: to provide a linguistic map of the territory. This linguistic map represents our solution set to a real world situation, our problem set. The degree to which our solution set matches the problem set determines its rationality, how closely our map fits the territory. In this instance we can only map the logical processes, the data and the operators describing them. We engage in an irrational act when we attempt to make the territory fit the map.

Perhaps no better example of this exists than in the post-1970 use of 'int' (binary integer only) and 'float' (real arithmetic). This disallows the broader occurrences in reality, in fact in real computers, of fixed decimal integers (as well as binary) and fixed decimal reals (as well as binary reals). It disallows the variable precision available for binary and decimal variables as well as the choice of either binary or decimal for floating point. Fortunately at least one pre-1970 programming language, PL/I, has native data types for all arithmetic and string variables and constants. It supports native fixed- and variable-length bit and character strings as well as native support for string operators. In short PL/I provides a closer match, i.e. a better map, to machine architecture or assembly language than that erroneously claimed for C.
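A small C sketch of the 'int'/'float' limitation described above, assuming nothing beyond standard C: common decimal fractions have no exact binary floating-point representation, which is exactly the gap a native fixed decimal type closes.

#include <stdio.h>

int main(void)
{
    /* Neither 0.10 nor 0.20 has an exact binary floating-point form,   */
    /* so their sum differs from 0.30 by a small representation error.  */
    double a = 0.10, b = 0.20;

    if (a + b == 0.30)
        printf("exact\n");
    else
        printf("inexact: 0.10 + 0.20 = %.17f\n", a + b);

    /* A language with native fixed decimal (e.g. PL/I's FIXED DECIMAL) */
    /* carries such values exactly; in C the programmer must scale to   */
    /* integers, e.g. work in cents rather than dollars, by hand.       */
    long cents = 10 + 20;   /* exact, but the scaling is now manual     */
    printf("scaled by hand: %ld cents\n", cents);
    return 0;
}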
In addition PL/I and APL support aggregate operands, e.g. the ability to add, subtract, multiply, divide, and, or, and compare (equal to, less than, greater than, greater than or equal to, less than or equal to, not equal) arrays and structures. If we add the operator richness of APL to PL/I, and then add to that mix the list aggregate and operators of LISP, we can in this synthesis do anything possible in any other imperative language with equal ease in terms of writing effort and expression. This synthesis allows our solution set, our map, to correspond directly to events, data and operations, in the problem set, the territory. No other combination of languages can provide a better mapping solution, a better match of the solution set to the problem set, of the map to the territory, than this single one. We build here on capabilities that have now been operating for over 50 years. Moving them to a single syntax, a single form, a single language occurs without any loss in translation. If we use a simple syntax like PL/I's, where every program element is a statement and every statement ends in a semicolon, we reduce the learning curve significantly.

A Bigger Challenge: Matching Dynamics of Solution Set to Problem Set

Assuming that we have now provided for the best possible fit of our solution set to our problem set, a challenge we have met, we now face an even bigger challenge: having our solution set match the dynamics of our problem set. Changes occur in the problem set. We need to implement them in the solution set. Moreover we need to implement them at a rate at least equal to their occurrence in the problem set. Failure to do so means the creation of a backlog.

Historically the persistent presence of an increasing backlog has brought more than one promise of a "silver bullet" to the forefront. In fact the explicit assertion by its advocates that it would resolve the backlog situation brought about the current emphasis on object-oriented technology. That it has failed to live up to its promise, and in fact has even worsened the situation, has led to recent efforts in extreme programming and agile modeling as well as a consideration of aspect-oriented programming.

At its core, resolving the backlog situation, the ability to sustain a change rate in the solution set, the software, at least equal to that of the problem set, lies in improving individual productivity. As people make up the primary cost as well as the delay in software development and maintenance, the resolution lies in doing two things concurrently: (1) reducing the number of people necessary and (2) minimizing the remaining people effort. We can offer the following guideline for achieving this: "Let people do what software cannot and software what people need not." This means minimizing the amount of clerical work by shifting it to, by automating it in, software. We have already seen an example of this in the earlier SDP comparison between the use of imperative and declarative programming languages.

Impediments to Increased Productivity

We have already discussed one impediment: the continued reliance on imperative programming languages. We need to shift to a greater use of declarative languages based on logic programming.
However, we need to recognize that we have imperative languages because machine architectures, through their instruction sets, have an imperative basis. Thus any declarative language needs also to include the imperative within its scope. Otherwise, as a specification language, it cannot specify itself down to the machine, i.e. instruction set, level.

A second impediment lies in our reliance on file systems, on files and directories, for source code storage and maintenance. We should shift to the use of a data repository/directory based on a database manager. This allows a manufacturing approach to source maintenance where we separately store our raw materials, source statements and source data. It also allows the software to maintain assemblies of statements, and of assemblies, as ordered lists of names.

A third impediment lies in using multiple libraries instead of a single source library, a single specification pool. The use of a single source library does not eliminate incompatibilities, but in conjunction with logic programming it identifies them. That identification allows the user the full range of choices as well as the possibility of modifying the source to eliminate them.

The Data Repository/Directory

The Data Repository/Directory uses a database management approach to automate the creation, retrieval, and maintenance of source code, source text, and source data. While the user explicitly names source data elements and aggregates, the software can provide a content-based name for every source statement. This means that it stores each source statement separately. It also means that every statement assembly exists as an ordered list of the names of other source statements or assemblies. Thus we never replicate the source itself, only its name. This allows a "pure" manufacturing approach supporting a bill-of-material explosion of any assembly into all lower level assemblies and source statements. It also supports a "where-used" capability for source statements and assemblies. This simplifies source change management when a change to source has a global, cross-program, cross-application effect. In conjunction with the suggested change to the interpreter/compiler to allow production of multiple object modules, we can synchronize all global effects of any change in a single unit of work, i.e. a single compile. This makes reuse available from the statement level on up through all higher level assemblies. It does the same for data elements up through all higher level aggregates. It then allows examination of any use of any statement or data item throughout all applications.
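A minimal sketch of the repository records just described, again in C for continuity; the struct layout, the fixed-size arrays, and the use of a simple checksum as the content-based name are illustrative assumptions rather than a design. The essential point the sketch shows is that an assembly holds only names, never copies of source.

#include <stdio.h>
#include <string.h>

/* One source statement, stored exactly once; its name derives from its  */
/* content, so identical statements share a single repository entry.     */
struct statement {
    char name[17];     /* content-based name, e.g. a hash of the text    */
    char text[256];    /* the source statement itself                    */
};

/* An assembly never copies source; it is only an ordered list of the    */
/* names of the statements or lower-level assemblies it contains.        */
struct assembly {
    char name[64];
    int  member_count;
    char members[32][17];   /* names of statements or other assemblies   */
};

/* A toy content-based name: a simple checksum rendered in hex. A real   */
/* repository would use a stronger digest.                               */
static void content_name(const char *text, char out[17])
{
    unsigned long h = 5381;
    for (const char *p = text; *p; p++)
        h = h * 33 + (unsigned char)*p;
    snprintf(out, 17, "%016lx", h);
}

int main(void)
{
    struct statement s = { "", "total = price * quantity;" };
    content_name(s.text, s.name);

    struct assembly a = { "compute_total", 1, { "" } };
    strcpy(a.members[0], s.name);   /* the assembly holds a name, not text */

    printf("statement %s stored once\n", s.name);
    printf("assembly %s references %d member(s)\n", a.name, a.member_count);
    return 0;
}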
In Summary

We need a way to have more people in the OS/2 community comfortable and competent in contributing open source. We have ways of assisting this in the current environment as well as ways of working toward a more ideal, more productive one. We do not take the approach of dumbing down, but rather one of reducing what needs mastering. This reduction comes down to a single specification/programming language covering the entire range of imperative and declarative capabilities. It comes down to a single software tool written in that language, based on a data repository/directory also written in that language. That means simplifying the current software environment and its multiplicities to oneness: one language, one tool, one source. This offers more comprehensive support than is available in the current software environment, along with increased productivity, as the user must learn less and do less yet achieve as much or more.

The SCOUG Programming SIG has started on this path. Obviously it has a long way to go. Just as obviously it welcomes anyone interested in bringing this to fruition as early as possible.