Why you can’t hire good Java developers.

Before I begin, the title of my essay deserves explanation. I am not saying, “There are no good Java developers.” That would be inflammatory and false. Nor am I saying it’s impossible to hire one, or even three, strong Java developers for an especially compelling project. What I will say is this: in the long or even medium term, if the house language is Java, it becomes nearly impossible to establish a hiring process that reliably pulls in strong developers with the very-low false-positive rates (ideally, below 5%) that technology companies require.

What I won’t discuss (at least, not at length) are the difficulties in attracting good developers to Java positions. Those are significant, because most skilled software developers have been exposed to a number of programming languages and Java rarely emerges as the favorite, but that’s an issue for another post. In spite of that problem, Google and the leading investment banks have the resources to bring top talent to work with uninspiring tools, and a company willing to compete with them on compensation will find this difficulty surmountable. Nor will I discuss why top developers find Java uninspiring and tedious; that also deserves its own post (or five). So I’ll assume, for simplicity, that attracting top developers is not a problem for the reader, and focus on the difficulties Java creates in selecting them.

In building a technology team, false positives (in hiring) are considered almost intolerable. If 1 unit represents the contribution of a median engineer, the productivity of the best engineers is 5 to 20 units, and that of the worst can be -10 to -50 (in part, because the frankly incompetent absorb the time and morale of the best developers). In computer programming, making a bad hire (and I mean a merely incompetent one, not even a malicious or unethical person) isn’t a minor mistake, as it is in most fields. Rather, a bad hire can derail a project and, for small businesses, sink a company. For this reason, technical interviews at leading companies tend to be very intensive. A typical technology company will use a phone screen as a filter (a surprising percentage of people with impressive CVs can’t think mathematically or solve problems in code, and phone screens shut them out), followed by a code sample, and, after this, an all-day in-office interview involving design questions, assessment of “fit” and personality, and quick problem-solving questions. “Whiteboard” coding questions may be used, but those are generally less intensive (due to time constraints) than even the smallest “real world” coding tasks; they fall closer to the general-intelligence, “on-your-feet” problem-solving questions than to true coding challenges.

For this reason, a code sample is essential in a software company’s hiring process. It can come from an open-source effort, a personal “side project”, or even a (contrived) engineering challenge. It will generally be between 100 and 500 lines of code (most people can’t read more than 500 in one sitting). The code’s greater purpose is irrelevant, but the scope of the sample must be sufficient to determine whether the person writes quality code “in the large” as well as for small projects. Does the person have architectural sense, or does he use brute-force, inelegant solutions that will be impossible for others to maintain? Without the code sample, a non-negligible false-positive rate (somewhere around 5 to 10%, in my experience) is inevitable.

This is where Java fails: the code sample. With 200 lines of Python or Scala code, it’s generally quite easy to tell how skilled a developer is and to get a general sense of his architectural ability, because 200 lines of code in these languages can express substantial functionality. With Java, that’s not the case: a 200-line code sample (barely enough to solve a “toy” problem) provides absolutely no information about whether a job candidate will solve problems in an infrastructurally sound way, or will instead create the next generation’s legacy horrors. The reasons for this are as follows. First, Java is tediously verbose, which means that 200 lines of code in it contain as much information as 20-50 lines of code in a more expressive language. There just isn’t much there there. Second, in Java, bad and good code look pretty much the same: one actually has to read an implementation of “the Visitor pattern” in detail to know whether it was used correctly and soundly. Third, Java’s “everything is a class” ideology means that people don’t write programs but classes, and that even mid-sized Java programs are, in fact, domain-specific languages (DSLs)– usually promiscuously strewn about the file system because of Java’s congruence requirements between class and file names (and between packages and directories). Most Java developers solve larger problems by creating utterly terrible DSLs, but this breakdown behavior simply doesn’t show up on the scale of a typical code sample (at most, 500 lines of code).

The result of all this is that it’s economically infeasible to separate good and bad Java developers based on their code. White-board problems? Code samples? Not enough signal, if the language is Java. CVs? Even less signal there. The upshot is that any Java shop is going to have to filter on something other than coding ability (usually, the learned skill of passing interviews). In finance, that filter is general intelligence as measured by “brainteaser” interviews. The problem is that general intelligence, although important, does not guarantee that someone can write decent software. That approach works for financial employers, because they have uses (trading and “quant” research) for high-IQ people who can’t code, but it fails for typical technology companies, which rely on uniformly high quality in the software they create.

Java’s verbosity makes the most critical aspect of software hiring– reading candidates’ code not only for correctness (which can be checked automatically) but also for architectural quality– impossible, unless one is willing to dedicate immense and precious resources (the time of the best engineers) to the problem, and to request very large code samples (1000+ lines). So for Java positions, this just isn’t done– it can’t be done. This is to the advantage of incompetent Java developers, who with practice at “white-boarding” can sneak into elite software companies, but to the deep disadvantage of companies that use the language.

Of course, strong Java engineers exist, and it’s possible to hire a few. One might even get lucky and hire seven or eight great Java engineers before bringing on the first dud. Stranger things have happened. But establishing a robust and reliable hiring process requires that candidate code be read for quality before a decision is made. In a verbose language like Java, it’s not economical (few companies can afford to dedicate 25+ percent of engineering time to reading job candidates’ code samples) and therefore, it rarely happens. This makes an uncomfortably high false-positive rate, in the long term, inevitable when hiring for Java positions.

REPL or Fail

I wrote a post last week on the idealized trajectory of a software engineer’s general professional ability, and I find the 3-point scale (with one decimal place) that I developed to be quite useful. It describes the transition from additive to multiplicative contributions to a team, with 1.0 representing the baseline competence (a net adder, rather than a subtracter) of a professional programmer, 1.2 being about average for the industry, 1.5 being seriously good (“senior” in most contexts), and 2.0 representing a consistent multiplier and technical leader. Sadly, one of the more common roadblocks occurs early (around 1.0 to 1.2), affects most developers, and is often associated with well-established programming languages like Java, C++, and Visual Basic. What’s going on? Why do so many programmers reach a hard ceiling, with some persisting there for decades, while others pass through this barrier with ease? What is it about certain languages or technologies that holds programmers back?

Programming is a two-class society. We have the mere “coders” who use only one language, hate the command line, and don’t program outside of work. They typically work on bland, “enterprise” projects and solve annoyingly detailed, but not difficult, problems. They generally find programming to be a boring task, but “it pays the bills”, and the other smart-person route to management (actuarial science) involves hard exams. If they remain “in programming”, they’re lucky to break six figures when they’re 40, and they are likely to face age discrimination and layoffs in the decade after that. On my scale, they plateau around 1.2 because, in enterprise programming, those who reach 1.3-1.4 are usually brought into management. The other class, by contrast, comprises elite “hackers” who prefer languages like Python, Scala and Erlang (although some might have to use Java), who program outside of work, who are hotly desired by huge companies and startups alike, and who continue growing even into old age. These usually reach 1.3 in their first few years as professional programmers, and they reliably break 1.5 by mid-career. What separates the two? What is it about a programming language that makes it highly indicative of a software engineer’s future progress (or lack thereof)?

It’s evident that this problem lies outside the languages themselves, because a 1.5 programmer can, after some adjustment, program at a comparable level in Java. So it’s actually not the case (aside from an opportunity-cost argument) that Java and C++ make people worse programmers. Rather, something happens in other, more modern languages that makes people improve faster and helps them smash through that 1.2 barrier. So, what is it? I spent a long time trying to figure this out, and when I came upon the answer, it was so simple that it shocked me: The Mighty REPL.

Modern programming languages supply an interactive mode, also known as a “read-eval-print loop” or REPL, as part of the programming environment. The REPL allows programmers to try out code and get immediate results, and to explore existing software modules interactively by calling their functions and seeing what they do. Although typically associated with interpreted languages (notably, Lisp), REPLs are also supplied for compiled languages such as Scala, Ocaml, and Haskell. (They do not provide the speed of compiled code, but code is not run for speed in the interactive mode.) This tight feedback loop facilitates a style of development and code exploration that is far more engaging and effective than the processes of writing and reading code would be without a REPL, and far superior to anything available from an IDE. (A REPL is, in some sense, a simple but highly effective IDE. Properly built, it makes IDEs unnecessary.)

Programmer productivity is binary, in that a programmer is either in a productive, engaged state of “flow” or in a disorganized, unpleasant, and unproductive state out of flow, in which 10% as much (if that) is accomplished. (A significant amount of work stress, in my estimation, is caused by a self-inflicted sense of pressure to be productive while one is out of flow.) It takes programmers about 15 to 30 minutes to enter flow, but once in it, they are immensely productive and, moreover, quite happy. Developers usually write the most code, and their best code, when in flow, and they are best at reading code in this state as well. There’s a problem, though: reading other people’s code, if the code presents a lot of accidental complexity obscuring the question the reader is trying to answer, often shatters flow (which is why bad code is hated with a passion that only programmers understand).

Programmers understand flow and its importance from experience, and flow becomes more important as one’s programming skill increases (to the point that 2.0+ programmers, rather than negotiating for higher salaries, tend to negotiate for perks oriented toward flow and engagement, such as a quiet working space and an unconditional right to turn down meetings). An experienced programmer can work at a 1.0 level (cranking out code of nominal additive value) without inspiration, but 1.5+ and especially 2.0+ level contributions require creativity and focus. So experienced, elite programmers know that a REPL-less language is generally a dead end. In a “green field” environment where the programmer controls the entire context in which his work exists, engaged writing of code in languages like Java is still possible– but engaged reading of code is out of the question. The engaging way to read code is to get a big-picture sense of it through interaction, and then to examine the code for implementation details strictly after this has been achieved.

This, in my mind, singularly explains the ceiling that Java developers hit around 1.2. By some unsatisfying definition akin to Turing-completeness for programming languages, a 1.2 programmer has the knowledge and resources to “solve any programming problem” (ignoring performance and feasibility concerns). The solution may be inelegant, slow, unmaintainable, and even bug-prone, and it might take a long time to deliver, but there aren’t programming problems that a 1.2 (or even 1.0) engineer “can’t solve”. Far more interesting, though, is how people solve problems, and some of the things programmers must do if they want to become elite (1.5+) programmers are (a) figure out which problems are worth solving, and (b) learn how to read the solutions that other people have created. To become a great programmer, one must be able to read code (in an engaged state of flow). Moreover, one must read good code, a commodity that is depressingly rare in this industry.

Reading code can be an immense joy that produces “Aha!” moments, or it can be hellishly tedious and unfruitful. Sadly, most real-world code (especially in languages like Java and C++) is closer to the latter extreme– it’s probably over 90 percent. Code rots for a variety of reasons. One is the wine/sewage problem (“a teaspoon of sewage in a barrel of wine makes a barrel of sewage; a teaspoon of wine in a barrel of sewage makes a barrel of sewage”): if a system is corrupted and the nastiness isn’t aggressively refactored, the kludges will beget counterkludges in “maintenance” and destroy the whole system. A related issue is the “broken windows” effect: tolerance of ugliness leads to a sense of abandon, and this is more common than most programmers will admit. Modifying code in a reasonable way (i.e. one that doesn’t, despite solving an immediate bug or adding a specific feature, make the general quality of the code worse) requires understanding it, and that usually involves reading it, and most code is so terrible that programmers who have to use, extend or maintain it just give up on comprehension and “hack” it as far as they can. Programming in this style is akin to the “Jenga” game, in which players must remove planks from a tower and place them on top of it, making the structure taller and less sturdy as they go (until it collapses, and the player whose turn it is loses).

There’s no silver bullet for code comprehension, but the REPL is the closest thing. The worst code may remain impenetrable, but few real projects begin their existence as incomprehensible legacy nightmares; they usually start out as average code for a project of that size. REPLs make it possible to explore average-case code and comprehend it without dedicating massive amounts of time to the process, and this makes a huge difference. Aging modules can be refactored in the earliest stages of decay, long before they get anywhere near the “legacy horror” state. By enabling interactive peeking and poking of functions, REPLs allow programmers to explore libraries and get a sense of their interfaces. For example, in Ocaml, it’s possible to get the full type signature of any module:

# module L = List;;
module L :
  sig
    val length : 'a list -> int
    val hd : 'a list -> 'a
    val tl : 'a list -> 'a list
    val nth : 'a list -> int -> 'a
    val rev : 'a list -> 'a list
    val append : 'a list -> 'a list -> 'a list
    val rev_append : 'a list -> 'a list -> 'a list
    val concat : 'a list list -> 'a list
    val flatten : 'a list list -> 'a list
    val iter : ('a -> unit) -> 'a list -> unit
    val map : ('a -> 'b) -> 'a list -> 'b list
    [...]
  end

From these type signatures, it’s relatively easy to get a sense of what these functions do and to test those intuitions:

# List.length [1; 1; 2; 3; 5; 8];;
- : int = 6
# List.map (fun x -> x*x) [1; 1; 2; 3; 5; 8];;
- : int list = [1; 1; 4; 9; 25; 64]

With Ocaml’s powerful REPL, a person can explore code and get a sense of the big picture before starting to read it. That makes a huge difference: reading code is an order of magnitude easier and more engaging when one understands what one is looking at. Moreover, many Lisps such as Clojure and Common Lisp provide documentation functions at the REPL that allow the user to read a function’s documentation without having to leave the command-line. This provides all of the benefits of an IDE, without the flow-breaking drawbacks.
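
For what it’s worth, recent versions of the Ocaml toplevel offer something in the same spirit. The #show directive prints the signature of whatever name it is given, without leaving the prompt; it yields a signature rather than prose documentation, but in a language this terse the signature carries most of the story:

# #show List.map;;
val map : ('a -> 'b) -> 'a list -> 'b list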

As an aside, there’s something that elite programmers (1.5+) call “keyboard snobbery”: IDEs are scorned, while the key-combos of emacs and vim are venerated, even with anachronistic names like “meta” for the escape key. The command-line interface is highly valued. This doesn’t apply to all computer use (when web surfing, keyboard snobs use the mouse like anyone else) but it does apply to writing and reading code. Why? Because the mouse is physical, continuous and imprecise, while the keyboard is cerebral, discrete and exact (and therefore a better tool for programming). When we use web pages, we trust the developer to handle the imprecision (in determining whether a button was clicked, and in interpreting the mouse event). That’s fine for that purpose, but when we’re writing code, we want exactitude and total control of our interaction with the machine. We want exactly the result we expect at all times. So that’s why we prefer the keyboard when coding, but there’s something else going on as well. Switching from keyboard to mouse doesn’t only involve a move of the hand. It reframes the interaction between the human and the machine, and that’s a context switch. Seemingly benign context switches inflict major drag on programmer productivity. Switching to the mouse because one’s IDE requires it? That’s 3-5 minutes. Pinging about the filesystem because of some stupid requirement that each class live in its own file? That’s about ten minutes. Managerial interruption? An hour, and half a day if the meeting is unexpected and intense. Programmers hate being nickel-and-dimed by context switches, and they hate being out of flow. This is why seriously good programmers prefer “archaic” tools like the command-line interface, vim and emacs, while considering typical mouse-driven modern IDEs (which are necessary if one is developing in a verbose basket case of a language like Java, but unnecessary in better languages) to be useless.

The REPL, served at the command line, allows a person to interact with code as if it were live, and to see what the pieces do. At least half of what we must do as programmers is comprehension of assets that other people have created, and the REPL allows us to do this without the painful context switch associated with having to read code cold. It enables “flowful” (that is, engaging) exploration and, later, “lazy reading” of code. (Lazy, in this sense, is a non-pejorative computer science term for doing only the work needed to solve a problem.) That’s something REPL-less languages can’t provide, because in them, code is a dead, static thing that might be run against some dead, static tests, not something a developer can interact with as he works.

Reading code is part of the job description of any programmer, and yet it’s rarely done well because enterprise languages like Java make the process so dismal that most people just give up, falling into abominable development practices. When 90 percent of the code is tedious boilerplate (accidental complexity) that isn’t worth the eye strain, it’s easy to miss crucial details. The accumulation of missed details leads to frank incomprehension quickly, and then development practices akin to “throwing mud at the wall and seeing what sticks” become the norm.

This, I believe, is what holds back most Java developers’ progress. Not only does code in such languages become horrible quickly, but the environment makes it unpleasant to read even “good” (by which I mean “above average for the language”) code. In fact, most IDEs tacitly assume that no one is going to bother to read code after it is first written, and they adjust accordingly.

By contrast, this is something Ocaml got right in a major way. Ocaml is an obscure “niche” language, but it has the highest average quality of programmers I’ve ever seen (even higher than Haskell and Lisp, although those are close). I don’t believe the reason for this is that only good developers can use Ocaml. Instead, what Ocaml achieves is that it makes it a joy to read average-case code– no small feat. Pattern matching, a core feature of the language, is explicitly designed to make what would otherwise be complex control flows human-readable. Haskell is an excellent language as well, and extremely terse, but I believe it’s optimized (more than Ocaml) for writers of code (although it is still far better, from a reader’s perspective, than Java). I would guess that the ML family of languages (which are elegantly simple) are the only languages on earth that go this far toward making almost all code readable, even in large systems. Of course, it’s still absolutely possible to write horrible, illegible Ocaml code– the language puts up more of a fight against bad practices than most, but it can be done. The difference, relevant for economic rather than purist discussions, is that average-case Ocaml code is attractive, whereas even very good Java code looks only 20 percent less ugly than typical “bad” Java code.
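
To illustrate with a contrived sketch of my own (not code drawn from any real project), here is the sort of thing pattern matching does for control flow; each case is one visible line, and the compiler warns when a case has been forgotten:

(* A toy arithmetic-expression evaluator: the whole control flow is visible at a glance. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

let rec eval = function
  | Num n      -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg a      -> - (eval a)

At the REPL, eval (Add (Num 1, Mul (Num 2, Num 3)));; comes back immediately as - : int = 7.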

In a language like Ocaml, there’s so little boilerplate and accidental complexity that one can look at the code and actually see the problem being solved. For larger systems, one can test one’s intuitions at the REPL. No one needs to rifle through 300-page design documents to understand what a well-written Ocaml program does. The consequence of this is that Ocaml has libraries of generally very clean code that people can read as they learn the language. Since it’s not an unpleasant process to read code, they do so, and they grow as programmers at a rate that would be unheard-of in Java or C++: rising from 0.8 to 1.5 in about two to three years is typical. Ocaml isn’t some “hard” language that only 1.5+ programmers have a chance of understanding. It’s a language that turns ordinary programmers into 1.5ers rapidly.
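
As a rough sketch of what I mean (a made-up toy of mine, not code from any real library, and using String.split_on_char from recent versions of the standard library), here is a complete Ocaml program, a miniature “wc” that counts the lines and words on standard input, with essentially nothing standing between the reader and the problem:

(* A miniature "wc": count lines and space-separated words on standard input. *)
let () =
  let lines = ref 0 and words = ref 0 in
  (try
     while true do
       let line = input_line stdin in
       incr lines;
       words := !words
                + List.length
                    (List.filter (fun w -> w <> "")
                       (String.split_on_char ' ' line))
     done
   with End_of_file -> ());
  Printf.printf "%d lines, %d words\n" !lines !words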

There’s one language that can be cited as a counterexample to the “REPL or Fail” rule, and that’s C. C, invented in the 1970s, doesn’t have a REPL. Why was this acceptable for C? First, the language grew up in a different time, when small programs (which would today be replaced by “scripts” unless performance were an issue) were the norm. Programmers could “grow up” on C in 1985 because the programs they’d be reading were small and had well-defined semantics. Second, to say that “C lacks a REPL” is a bit strict. It doesn’t have a language-native REPL, but the Unix/C environment does have a (rudimentary, but sufficient) REPL: the command-line console. This was the environment in which C programs were run and explored: first you ran wc and cat to see what they did, and then you could look at their C code and discover how they did it. C was designed with a “small-program” model of development in mind (large, megalithic programs were simply untenable in 1975). If complex behaviors were desired, they could be established by composing independent C programs and having them communicate through pipes and sockets. In this world, one could read “a whole C program” (a small, independent module, usually in one file) in one sitting. One only needed a REPL (the command-line console) to understand the bigger environment: Unix.

How’d we end up with these disengaging, REPL-less languages? As I said, speaking superficially and strictly, C has no REPL. This was not a problem for C, because large programs were so rarely written in it, and enough small, well-written C programs were distributed in every Linux environment that a programmer could learn the language from those. Where C++ differs is that large, complex, and monolithic programs are written in it, because the language has just enough in the way of high-level support to let people attempt them. The result is that C++ supports beasts of complexity (such as 200-line functions, 1000-line class definitions, and 1-million-line whole programs spanning several directories) that would be unconscionable in C, and yet it fails to provide the one tool that might enable a programmer to make sense of such things. Although writing a C++ REPL is possible, it wouldn’t be easy: the language is so deeply imperative and crystalline that overcoming the mismatch between the two models of programming would be a monumental task. Java, as a descendant of C++ in syntax and culture, inherited most of these illnesses while becoming the default language for enterprise programming, and it too was launched without a REPL. The result is that millions of people are stuck in a REPL-less language and don’t know why, while hacking on monolithic projects of intractable complexity that are doomed to get worse over time.

The REPL isn’t just a tool. It’s an engaging classroom in which one learns how to be a programmer. It’s absolutely necessary for a person assigned a task that involves comprehending a complex piece of code. And unlike the training wheels of an IDE, it doesn’t attempt to hide “difficult” details from the developer; it allows her to explore them to arbitrary depth when she is ready.

For these reasons, the interactive mode can’t be considered a luxury of those who are privileged enough to work in “elite” languages. There’s no reason programming should be that way. If we want to democratize programming (and there’s no reason we can’t have at least ten times as many 1.5+ programmers as are alive now; 10x is a conservative goal, considering the world population), we need to begin orienting ourselves toward modern languages. And there is one rule that seems more fundamental than any argument about static vs. dynamic typing or imperative vs. functional programming: REPL or fail.