The next “secret weapon” in programming

Writing in 2001, Paul Graham attributed his success with Viaweb (now Yahoo Store) to a “secret weapon” that enabled the rapid development and deployment of code: Lisp. Designed with functional programming in mind, the language offers abstractions that make code dense and therefore quick to write and maintain. Although Lisp was impractical for desktop applications at the time, Graham’s use of it for web-based software allowed a small team to kick ass.

Viaweb launched in 1995. It’s now 2010, and the Lisp family of languages is alive and well, thanks to Clojure, a Lisp that runs on the Java Virtual Machine. Ideas from Lisp and functional programming have also filtered into languages like Ruby and Python, which are far more powerful than C++, the lingua franca of the ’90s. Functional programming is, of course, still as powerful and excellent as it was in 1995, but languages as a whole have improved, making a powerful language like Clojure less of a relative advantage than Lisp was for Graham. Using a functional language is still a good business decision, especially for a startup, but it wouldn’t qualify as a “secret weapon” in an era when every developer worth his salt has heard of Ruby on Rails, which may not be a “functional” language but comes close enough for many purposes. So what is the next secret weapon? No one knows, but I’ve got a good guess: strong static typing.

Plenty of programmers love to hate static typing, and not for bad reasons: most of them have only been exposed to shitty static typing, such as Java’s. If one compares the static typing of Java to the dynamic typing of Python or Ruby, most programmers will prefer to program in the dynamic languages. It’s hard to blame them, since Java, as it is often used, requires an explicit type annotation on every variable. This becomes painful rapidly, because complex programs use “inner” variables and functions within API-level functions all the time, and having to explicitly declare a type for each slows down development substantially. This leads us to:

Misconception #1: Java is a representative of “static typing”.

If we want to compare static and dynamic typing, we need to make best-in-class comparisons: what is the best that can be done in each paradigm? Comparing Ruby or Lisp to Java or C++ is not fair; a better comparison would use Haskell or OCaml, both of which have type inference as well as parameterized and algebraic data types (these concepts are not as “advanced” as they sound) to represent static typing.

Java’s type system simply isn’t very powerful or useful. Java actually contains two type systems, meshed together into an unsettling chimera. One of its type systems is bottom-up and vaguely algebraic, consisting of a few primitive types (integers of varying sizes, and single- and double-precision floating point numbers) and arrays of existing types. That’s it. It has int and double[] and char[][] but cannot represent anything more interesting (tuples, strings, variants, parameterized types). To represent those more structured types one has to use Java’s other type system, which is top-down: everything is a subtype of Object, and each variable stands not for the object itself but for a reference that may point to an object of that type, or to nothing (null). The nastiness of this cannot be overstated. Java’s notorious NullPointerException, a runtime failure associated with dereferencing a null pointer, is both a common and (from a debugging perspective) a usually-quite-useless error. Java’s type system lacks the power to verify, at compile time, that a String variable will actually hold a string and not null, a value that must be handled specially by any method that takes a String (ditto for any other object type) as input. The string type is simply not available; you must make do with what is, in fact, a (string option) ref: a mutable reference that may hold a String.
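To make the contrast concrete, here is a minimal sketch in Rust, a language with the ML-style features under discussion, whose Option type plays the role of ML’s option. A plain string can never be null; absence is a separate type, and the compiler refuses to let you forget about it. (The functions are invented for illustration.)

```rust
// A &str argument is guaranteed to be an actual string; no null check needed.
fn greet(name: &str) -> String {
    format!("Hello, {}", name)
}

// "Possibly absent" is a distinct type, Option<&str>, and the match must
// cover both cases or the program does not compile.
fn greet_maybe(name: Option<&str>) -> String {
    match name {
        Some(n) => format!("Hello, {}", n),
        None => String::from("Hello, stranger"),
    }
}

fn main() {
    println!("{}", greet("Ada"));
    println!("{}", greet_maybe(None));
}
```

This is exactly the (string option) distinction described above, except that here the ordinary case is the plain string, and the nullable case is the one you must ask for explicitly.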

Java’s type system catches some errors at compile time, but not enough of them, most programmers feel, to justify the pain of using its type system. No language can eliminate runtime failures, but statically-typed languages, used properly, can make them very, very rare. Java’s type system doesn’t have enough power to achieve this; it merely makes them somewhat less common, but not enough to justify the required, explicit use of an ugly, underpowered type system.

The type systems of languages like OCaml and Haskell are far more powerful, allowing user-specified algebraic types. Explicit typing is usually not required, either; the compiler infers most types using the Hindley-Milner algorithm. Although OCaml and Haskell programmers usually explicitly type their API-level functions (this is just good practice), they do not suffer the overhead of explicitly typing inner functions and variables, as one does in Java. Code development is almost as fast in Haskell or OCaml as in a language like Ruby: no worse than 10 or 20 percent slower, a difference easily made up for by the reduced debugging time.
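A small illustration of that division of labor, sketched in Rust (which, unlike full Hindley-Milner languages, requires signatures on functions by design, but infers everything inside their bodies, which is where most of Java’s annotation burden lives):

```rust
// The API-level function is explicitly typed, as good practice suggests.
fn mean(xs: &[f64]) -> f64 {
    // The inner bindings carry no annotations; the compiler infers their
    // types (f64 here) from how they are used.
    let total = xs.iter().fold(0.0, |acc, x| acc + x);
    let count = xs.len() as f64;
    total / count
}

fn main() {
    println!("{}", mean(&[1.0, 2.0, 3.0]));
}
```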

Misconception #2: Static typing’s main justification is the faster performance of executables.

Executables generated by statically-typed languages generally have better performance than programs, even compiled ones, in dynamic languages, but this is one of the weaker arguments for static typing, maybe fourth or fifth down the list for most programmers. On most applications, human performance matters far more: human time is valuable, and computer time is cheap. We want to write good programs, with minimal maintenance overhead, fast.

On large projects, one of the greatest benefits of static typing is interface control. Although many programmers in dynamic languages are disciplined, one “rock star” with no respect for interfaces can spoil a project, and the errors he produces can go undetected until they occur in runtime testing or (worse yet) in production. He may, for example, change the return type of an interface-level function and fail to inform anyone. In a statically-typed language, this breaks the build, and he’s expected to fix it. In a dynamic language, it can produce a difficult-to-detect runtime error. Worse yet, the failure this change produces can occur far from the function that is in error, after it has finished and is no longer on the call stack.
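The scenario is easy to sketch in a statically-typed language; here it is in Rust, with an invented function and numbers for illustration:

```rust
// Suppose our "rock star" changes this interface-level function, which used
// to return a bare f64 price, so that it now returns a (price, tax) pair:
fn quote(subtotal: f64) -> (f64, f64) {
    let tax = subtotal * 0.08; // hypothetical flat tax rate, for illustration
    (subtotal, tax)
}

fn main() {
    // Every caller still written against the old interface now breaks the
    // build at the call site:
    //     let total: f64 = quote(100.0); // error: expected `f64`, found `(f64, f64)`
    // In a dynamic language, those stale callers run anyway, and the failure
    // surfaces later, possibly long after quote() has left the call stack.
    let (subtotal, tax) = quote(100.0);
    println!("total: {}", subtotal + tax);
}
```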

As for smaller projects with one developer, interface control may not be so important, but ease of debugging is, simply because of the enormous amount of time programmers spend debugging and testing. At any scale, compile-time bugs are less painful than runtime bugs, and do-the-wrong-thing errors are even worse than program-terminating runtime bugs. I would argue that, on average, one runtime bug equals 15 to 50 compile-time bugs in terms of costliness. This is not only because runtime bugs take more time and effort to find and fix. It’s also because of the cognitive state called flow, on which programmers rely in order to be productive. Fixing an error caught by the compiler, with a known line number, does not break flow much more than a quick trip to the bathroom (most bugs are trivial and, once caught, can be quickly fixed). A 30-minute forensic caper to determine the source of runtime misbehavior will break flow, because the programmer has to drop what he’s doing and solve a different problem.

It’s often stated that 50% of a programmer’s time is spent debugging. In dynamically-typed languages and languages with weak type systems, I’d bump that percentage to 80, including unit testing, development and study of debugging tools, and defensive measures that must be taken to prevent possibly unknown bugs (“unknown unknowns”) from entering production. In statically-typed languages, this percentage is appreciably lower. It’s probably 30 to 40 percent, not because programmers in statically-typed languages produce fewer bugs, but because so many of those bugs are confronted immediately and quick to fix.

This is a simple economic argument based on human time. The fact that statically-typed languages produce faster executables is merely an added bonus.

Misconception #3: Static typing only catches trivial bugs.

First, it’s surprising how many bugs are trivial. Occasionally they are the result of deep, intrinsic errors that follow from faulty reasoning about the system one is building, and those take serious time and energy to fix no matter what language one is using, but most of the time, they are the result of mistakes like creating records with a field named “public” and, later in the code, reading a field named “pubic”. In a statically-typed language, properly used, this error will be caught by the compiler, noting that the record type of the data does not have such a field. In dynamically-typed languages, where records are usually represented using map (dictionary) types, the range of possible behaviors is greater. Lisps, for example, tend to return a special value nil when a nonexistent key is queried from a map, meaning that the error will not occur until another function, possibly much later, tries to do something with this null value.
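Both behaviors can be sketched side by side in Rust, which offers record-style structs and dictionary-style maps in the same language (the Post type here is invented for illustration):

```rust
use std::collections::HashMap;

// With a record (struct) type, a misspelled field name cannot compile:
struct Post {
    public: bool,
}

fn is_public(post: &Post) -> bool {
    post.public
    // post.pubic  <- would be rejected: no field `pubic` on type `Post`
}

// With a map standing in for a record, as is common in dynamic languages,
// the misspelled key just comes back empty, and the bug travels onward:
fn is_public_map(post: &HashMap<String, bool>) -> Option<bool> {
    post.get("pubic").copied() // typo: silently yields None at runtime
}

fn main() {
    let post = Post { public: true };
    println!("{}", is_public(&post));

    let mut map = HashMap::new();
    map.insert(String::from("public"), true);
    println!("{:?}", is_public_map(&map)); // None: the error waits downstream
}
```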

The painfulness of a bug is not a function of whether it is “trivial” in origin, but how long it takes to detect and fix the bug. “Trivial” bugs, by definition, are fairly easy to fix once found; this does not mean they are always easy to find. In dynamically-typed languages, certain classes of trivial bugs take minutes to find at best and hours at worst. That time adds up very quickly.

Second, the usefulness of static typing is a function of the programmer’s knowledge of how to use it. Types provide a language through which programmers can specify certain constraints, but they don’t require that the programmer use it. An undisciplined programmer could represent dates as, say, integer arrays or tuples: a bad idea, due to ambiguity in date formats. By contrast, a good programmer would create a record type with fields labeled “day”, “month”, and “year”, thereby eliminating certain classes of ambiguity.
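The disciplined version is a few lines in Rust (the Date type here is illustrative, not a real library type):

```rust
// Dates as bare tuples are ambiguous: is (3, 4, 2010) April 3rd or March 4th?
// A record type with named fields settles the question at the type level.
#[derive(Debug, PartialEq)]
struct Date {
    day: u32,
    month: u32,
    year: u32,
}

fn format_iso(d: &Date) -> String {
    format!("{:04}-{:02}-{:02}", d.year, d.month, d.day)
}

fn main() {
    let d = Date { day: 3, month: 4, year: 2010 };
    println!("{}", format_iso(&d)); // unambiguously April 3rd, 2010
}
```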

Strong, static typing, properly used, can catch the vast majority of bugs in compilation. Using the type system to do so is an art more than it is a science, but most programmers can learn enough to get started within a couple of weeks.

Conclusion

I’ve only scratched the surface of the benefits of static typing, and there’s much I’ve left out. In sum, I believe the strongest benefit of static typing is that it offers a set of tools through which programmers can dramatically reduce the incidence of costly runtime bugs. Since type inference is automatic in languages like OCaml and Haskell, it provides, essentially for free, a large suite of unit tests that never have to be written, and automatic, error-free documentation. It’s no silver bullet, obviously: no tool could entirely eliminate the need for unit testing and documentation. But in my experience, it’s still damn useful. If I’m right (and I may not be; these are estimates based on anecdotal experience) that debugging overhead in large projects reaches 80% in dynamically-typed languages, as opposed to 40% for statically-typed ones, that’s the difference between 20% and 60% of programmer time spent moving a project forward: a threefold increase, and the potential for a dramatic improvement in real productivity.


13 thoughts on “The next “secret weapon” in programming”

  1. The trouble with type inference is that it only guarantees that the compiler can find some type that is consistent with all your assertions, not that it finds the right type. GHC is constantly spitting out errors like “Found (int, int, int), expected (int, int, int).”

    • Interesting. I haven’t seen this before. Could you show me how to generate it?

      Existing type systems have a lot of annoyances and problems: no question about that. And then there is the functor/type-class debate: type classes are a lot nicer for simple cases, while functors are more general and powerful (to get full functor power out of Haskell, you need the type system extensions).

      I have seen errors where, for example, one might have two identical (under the hood) types, such as “float” and “length”, and get a compilation error because of the type mismatch, but that’s generally held to be a desired behavior. I don’t think it’s what you’re talking about.

      • Not offhand, alas. It’s been a while since my informal Haskell class (and I gave up on Haskell). I will point you to Typed Racket, however, which is statically typed Scheme with far fewer of the inconveniences of H-M type languages, and far more expressiveness in the type system (without being outright Turing-complete).

  2. Have you looked at the Rust language at all? It’s very much a work in progress, and not yet stable in terms of syntax; but it can be used in a Lispy/functional style, and it supports dynamic typing, static typing (with type inference), ADTs, and ML-style pattern matching.

  3. I’ve just read the discussion of success typing at the Erlang Dialyzer page. Success types look like a very nice layer over dynamically typed languages such as Erlang: they report a type error at compile time only if the run-time must (as opposed to might) report a type error. For example, given this ruleset (not in Erlang syntax):

    foo true true = true;
    foo false _ = false;
    foo _ false = false;

    H-M typing will assign a type of bool -> bool -> bool. But this is only a small subset of what foo can actually accept: for example, (foo 32 false) is false and so is (foo false “yack”), whereas (foo 32 “yack”) is an error. So the typing is pessimistic; it prevents errors at run time at the expense of being overly restrictive.

    Success typing gives this function the type any -> any -> bool, which is optimistic and reflects the way dynamically typed languages actually work. Dialyzer has found lots of errors of this type in large, well-aged code bases: it has the advantages that it can be run offline rather than intertwingled with the compiler, and it never generates a false positive. What’s more, you can add Javadoc-style comments specifying the intended type of a function (Dialyzer already knows about Erlang’s built-in functions), and if any uses are inconsistent with these, Dialyzer will tell you so.

  4. > Since type inference is automatic in languages like Ocaml and Haskell, it provides, essentially for free, a large suite of unit tests that never have to be written, and automatic, error-free documentation.

    You nailed it. The type system acts as a language-generated basic unit testing suite, with the added benefit of being faster (it is done by the compiler rather than a testing farm and supports incremental testing which further reduces the time between creating and detecting the bug).

    And it’s fairly easy to teach yourself how to leverage type annotations for further testing, by adding type constraints that match domain constraints (“an overdrawn account cannot be charged”) that are not immediately obvious from the code itself. I’d say 25% of an average unit test suite can be replaced by type inference, and an additional 50% can be replaced by appropriate design of types and surgical annotation strikes.
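    A hypothetical sketch of that idea in Rust (the account model and all names are invented for illustration): the “an overdrawn account cannot be charged” constraint lives in the types, so the unit test that would otherwise check it never has to be written.

```rust
// An ordinary account, which may be overdrawn.
#[derive(Debug)]
struct Account {
    balance_cents: i64,
}

// A ChargeableAccount can only be obtained from an account in good standing,
// so `charge` cannot even be called on an overdrawn one.
struct ChargeableAccount(Account);

fn chargeable(acct: Account) -> Result<ChargeableAccount, Account> {
    if acct.balance_cents >= 0 {
        Ok(ChargeableAccount(acct))
    } else {
        Err(acct) // overdrawn: hand the account back, uncharged
    }
}

fn charge(acct: ChargeableAccount, amount_cents: i64) -> Account {
    let ChargeableAccount(mut a) = acct;
    a.balance_cents -= amount_cents;
    a
}

fn main() {
    let ok = Account { balance_cents: 500 };
    if let Ok(c) = chargeable(ok) {
        println!("balance: {}", charge(c, 200).balance_cents);
    }

    // charge() on a raw Account does not type-check; the domain constraint
    // is enforced by the compiler rather than by a test.
    let overdrawn = Account { balance_cents: -100 };
    assert!(chargeable(overdrawn).is_err());
}
```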

  5. You are incorrect with regards to “Lisps” returning only nil on a map miss.

    Common Lisp’s GETHASH returns two values, with the second value being T if the value was found.

  6. Michael, do you still feel this way about static typing? I’m curious, because on your list of 5 recommended programming languages to learn, 2 have advanced static type systems but 2 are dynamically typed (C is somewhere in between).

    • Static typing is very strong and I still prefer it, but I’m less ideological about it than I used to be.

      What matters more than the language is the kind of work you’re doing day-to-day, which depends on libraries, internal code, and the problem space. You can do things properly in dynamic languages. It just takes a little bit more discipline.

  7. Pingback: Why Clojure will win | Michael O. Church

  8. Interesting reading this in 2014, now that Apple has presented Swift as the next generation language for their platforms.
