Gervais / MacLeod 9: Convexity

Originally, I had intended the 9th part to be the one in which I solve this damn thing for good. Unfortunately, as that “9th post” crossed into 12 kiloword territory, I realized that I needed to break it up a bit. Even for me, that’s long. So I had to tear some stuff out and split this “final post” yet again. So here’s the 9th chapter in this ongoing lurid saga. (See: Part 1Part 2Part 3Part 4Part 5Part 6Part 7, Part 8). I am really trying to wrap this fucker up so I can get my life back, but I’m not going to solve it just yet… there are a few more core concepts I must address. Today’s topic, however, is convexity.

What is convexity?

Convexity pertains to the question: which is more important, the difference between excellent work and mediocre work, or that between mediocre work and noncompliance (zero)? For concave work, the latter is more important: just getting the job done is what matters, and qualitative concerns are minimal. For convex work, the difference between excellence and mediocrity is substantial and that between mediocrity and failure is small.

The theoretical basis for this is the logistic function, or “S-curve”, which is convex at its left side (looking exponential at y << 0.5) and concave on the right, as it approaches a horizontal asymptote (saturation point). Model input as a numerical variable i pertaining to resources, skill, talent, or effort. Then model task output as having a maximal yield Y, and the function Y * p(i) where p is a logistic function with range (0, 1) representing the proportion of maximum possible yield that is captured. Now, the inflection point (switch-over from convexity to concavity) is exactly where p(i) = 0.5. Taken in full, this logistic function is neither concave or convex. Yet, for most economic problems, the relevant band of the input range is narrow and is mostly on one side of the inflection point or the other. We can classify tasks as convex or concave based on what we know about the performance of the average.

To get concrete about it, consider exams in most schools. A failing student might be able to answer 60% of the questions right; an average one gets 80%, and the good ones get 90%. That’s a concave world. The questions are easy, and one needs to get almost all of them right to distinguish oneself. On the other hand, a math researcher would be thrilled to solve 50% of the (much harder) problems she confronts. With concave work, the success or completion rates tend to be high because the tasks are easy. With convex work, they’re low because the tasks are hard. What makes convex work worth doing is that, often, the potential yield is much higher. If the task is concave, it’s been commoditized, and it’ll be hard to make a profit by doing it. Even 100% completion will yield only marginal profits. If the work is convex, the most successful players can generate outsized returns. It may not be clear even what the upper limit (the definition of “100%”) is.

Convexity and management

Consider the following payoff structures for two tasks, A and B. A is concave; B is convex, and 0 to 4 represent degrees of success.

| Performance | A Payoff | B Payoff | Q dist | R dist |
-------------------------------------------------------
| 4 (Superb)  |      125 |      500 |  20.0% |   0.0% |
| 3 (Good)    |      120 |      250 |  20.0% |  20.0% |
| 2 (Fair)    |      100 |      100 |  20.0% |  60.0% |
| 1 (Poor)    |       60 |       25 |  20.0% |  20.0% |
| 0 (Awful)   |        0 |        0 |  20.0% |   0.0% |
-------------------------------------------------------

Further, let’s assume there are two management strategies, Q and R. Under Q, the workforce will be uniformly distributed among the five tiers of performance: 20% in each. Under R, 20% each of the workforce will fall into Good and Poor, 60% into the Fair tier, and none into the Superb or Awful tiers. R is variance-reducing managerial strategy. It brings people in toward the middle. The goal, here is to maximize bulk productivity, and we assume we have enough workers that we can use the expected payoff as a proxy for that.

For Job A, which is concave, management strategy Q produces an output-per-worker of 81, while R yields 96. The variance-reducing strategy, R, is the right one, yielding 15 points more. For example, bringing up the worst slackers (from 0 to 60) delivers more benefit than pulling down the top players (from 125 to 120).

For Job B, which is convex, strategy Q gives us an average yield of 165, while R delivers only 105– 60 points less. The variance-reducing strategy fails. We see more of a drop in pulling down the best people (from 500 to 250) than we gain in hauling the laggards (o to 25).

In short, when the work is concave, variance is your enemy and reducing it increases expected yield. When the work is convex, variance is your friend; more risk means more yield. 

The above may seem disconnected from the problems of the MacLeod organization, but it’s not. MacLeod organizations are based on variance-reduction management strategies, which have worked overwhelmingly well over the past 200 years of concave, industrial-era labor. MacLeod Losers naturally desire familiarity, uniformity, and stability. They want variance to be reduced and will give up autonomy to have that. MacLeod Clueless (middle managers) take on the job of reducing variance in conditions for the Losers below them, and reducing volatility in performance for the Sociopaths above. Their job is to homogenize and control, and they do it well. It doesn’t require vision or strategy. MacLeod Sociopaths start out as the “heroic” risk-takers (entrepreneurs) but that caste often evolves (as especially as transplant executives come in) into a cushy rent-seeking class as the organization matures (necessitating the obfuscation enabled by the Clueless and the Effort Thermocline). The Sociopath category itself becomes risk-averse, out of each established individual’s desire to protect organizational position. The result is an organization that is very good at reducing variance and stifling individuality, but incapable of innovation. How do we come back from that?

In the concave world, the failures of the MacLeod organization were tolerable. Businesses didn’t need to generate new ideas. They needed to turn existing, semi-fresh ones into repeatable processes, and motivate large groups of people to carry out difficult but mostly simple functions. Variance-reduction was desired and encouraged. Only in the past few decades, with the industrial era fading and the technological one coming on, has there been a need for business to have in-house creative capacity.

Old-style organizations: the optimization model

The convex/concave discussion above assumes one dimension of input (pertaining to how good an individual is at a job) and one of output (observed productivity). In truth, a more accurate model of an organization’s performance would have a interconnected network of such “S-curve” functions for the relationships between various variables, many of which are hidden. There’d be a few input variables (“business variables”) and the things the company cares about (profit, reputation, organizational health) would be outputs, but most of the cause-and-effect relationships are hidden. Wages affect morale, which affects performance, which affects productivity, which affects the firm’s profits, which is its performance function. With all of the dimensions that could be considered, this function might be very convoluted, and while it is held to exist “platonically” it is not known in its entirety. The actual function relating controllable business variables to performance is illegible (due to hidden variables) and certainly not perfectly concave or convex.

So how does the firm find an optimal solution for a problem it faces?

This gets into an area of math called optimization, and I’m not going to be able to do it justice, so I’ll just address it in a hand-wavy way. First, imagine a two-dimensional space (if only because it’s hard to visualize more) where each point has an associated value, creating a 3-dimensional graph surface. We want to find the “highest point”. If that surface is globally concave, like an inverted bowl, that’s very easy, because there can only be one maximum. We can start from any point and “hill climb”: assess the local gradient, step in the most favorable dimension. We’ll end up at the highest point. However, the more convoluted our surface is, the harder the optimization problem. If we pick a bad starting point on a convoluted surface, we might end up somewhere sub-optimal. Thinking of it in topographical terms, a “hill climb” from most places won’t lead to the top of Mount Everest, but to the neighborhood’s highest hill. In other words, the “starting point” matters if the surface is convoluted.

Real optimization problems usually involve more than two dimensions. It is obviously not the case that organizations perform optimizations over all possible values of all possible business variables (of which there are an infinite number). Additionally, the performance function changes over time. As a metaphor, however, for the profit-maximizing industrial corporation, it’s surprisingly useful. One part of what it must do is pick a good “starting point” for the state of the business, a question of “What should this company be?” That requires non-local insight. Another part is the iterative process of refinement and hill-climbing once that initial point is selected.

This leads to the three-tier organization. People who are needed for commodity labor but not trusted, at all, to affect business variables are mere workers. Managers perform the iterative hill-climb and find the highest point in the neighborhood. In startup terms, they “iterate”. Executives, whose job is to choose starting points, also have the right to make non-local “jumps” in business state if needed. In startup terminology, they “pivot”.

The MacLeod organization gets along well with this computational model of the organization. MacLeod Losers are mediocre in dedication, but that’s fine. That aspect of them is treated as a hidden variable that can be modulated via compensation (carrots) or managerial attention (sticks). In the optimization model, they’re just infrastructure– human resources in the true sense of the word. The fault of MacLeod Clueless is that they aren’t strategic, but they don’t need to be. Since their job is just to climb a hill, they don’t need to worry about non-local “vision” concerns such as whether they’re climbing the right hill. That’s for someone else to worry about. They just assess the local gradient and move in the steepest upward direction. Finally, there are the MacLeod Sociopaths, whose goal is to be strategic and have non-local insight. Being successful at that usually requires a high quality of information, and people don’t get that stuff by following the rules. The source could be illegal (industrial espionage) or chaotic (experimental approaches to social interaction) or merely insubordinate (a programmer learning new technologies on “company time”) but it’s almost always transgressive. The MacLeod Sociopath’s ability to get information confers more benefit, in an executive position, than the negatives associated with that category.

Why the optimization model breaks down

In the model above, there is some finite and well-specified set of business variables. The real world is much more unruly. In truth, there are an infinite number of dimensions. Two things make this more tractable. The first is sparsity. Most dimensions don’t matter. For example, model “product concern” as a vector representing the products that a company might make (1.0 meaning “it’s the only thing we care about”, 0 meaning “not interested at all”). Assume there are 387 trillion possible conceivable products that a firm could create. That’s 387 trillion business variables; 386.99999…+ trillion of those entries are always going to be zero (excepting a major pivot) and can be thrown out of the analysis. Second is aggregation. For personnel, one could have a variable for each of the world’s 7.1 billion people (again, most being zero for ‘not working here’) but most companies just care about a few things, like how many people they employ and how much they cost. Headcount and budget are the important business variables. Whether John A. Smith, 35, of Flint, Michigan is employed at the company (i.e. one of those 7.1 billion personal variables) isn’t that relevant for most values of John A. Smith, so executives need not concern themselves with it.

Even still, modern companies have thousands to millions of business variables that matter to them. That’s more information than a single person can process. Then there is the matter of what variables might matter (unknown unknowns). If the optimization problem were simple, the company would only need one executive to call out starting points, but these information issues mandate a larger team. The computation that the organization exists to perform must be distributed. It can’t fit on one “node” (i.e. one person’s head). That also mandates that this massively high-dimensional optimization problem be broken down as well. (I’m ignoring the reality, which is that most people in business don’t “compute” at all, and that many decisions are made on hunches rather than data.)

As far as many dimensions are separable (that is, they aren’t expected to interact, so the best values for each can be found in separation) the problem can be decomposed by splitting it into subproblems and solving each in isolation. Executives take the most important business variables where it is most likely that non-local jumps will be needed, such as whether to lay off 15% of the workforce. The less important ones (like whether to fire John A. Smith) are tackled by managers. Workers don’t participate in the problem-solving; they’re just machines.

This evolves from an optimization model where business variables and performance functions are presumed to exist platonically, to a distributed agent-based model operated by local problem-solving agents. This is a more accurate model of what actually happens in the corporation, made further amusing by the fact that the agents often have diverging personal objective functions. Centralized computation is no longer the most important force in the company; it’s communication between the nodes (people).

Here’s where MacLeod comes in to the agent-based model. MacLeod Losers consume information and turn it into work. That’s all they’re expected to do, and ideally the only thing that should be coming back is one word: done. MacLeod Clueless furnish information up and down the food chain, non-editorially because they aren’t strategic enough to turn it into power. They tell the Losers what to do, and the Sociopaths what was done, and they aren’t much of a filter. The only information they take in, in general, is what information others want from them. The MacLeod Sociopaths are strategic givers and takers of information, and (having their own agendas) they are selective in what they transmit. Organizations actually need such people in order to protect the top decision-makers from “information overload”. It’s largely the bottom-tier Sociopaths who participate in dimensionality reduction and aggregation, so they’re absolutely vital; however, they make sure to use whatever editorial sway they have toward their own benefit.

Optimization and convexity

The actual performance function of a company, in terms of its business variables, is quite convoluted. It’s generally concave in a neighborhood (enabling managers to find the “local hill”) but its global structure is not, necessitating the non-local jumping afforded to executives. The underlying structure, as I said earlier, is driven by an inordinate number of hidden variables. It might be best thought of as a neural network of S-curve functions (“perceptrons”) wherein there are elements of concavity and convexity, often interacting in strange ways. It’s not possible for anyone to ascertain what a specific organization’s underlying network looks like exactly. The overall relationship between business variables (inputs) and performance (output) is not going to be purely concave or convex. The best one can hope for is a well-chosen initial point in which the neighborhood is concave.

For typical organizations and most people in them, concavity has a lot of nice properties. For one thing, it tends toward fairness. Allocation of resources to varying parties, if the input/output relationships are concave, is likely to favor an “interior” solution– everyone gets some– because marginal returns diminish as the resource is allocated to richer people. If the input/output relationship is convex, resource allocation can favor an “edge” solution where one party gets everything and the rest get none, which tends to exacerbate the “power law” distributions associated with social inequalities: a few (who seem the most capable of turning those resources into value) get much, most get little or nothing. Another benefit of concavity is that performance relative to a standard can be measured. At concave work, the maximum sustainable output is typically well-studied and known, and acceptable error rates can be defined. With convex work, no one knows what’s possible. Once a maximum is established and can be reliably attained, the task is likely to become concave (as people develop the skills to perform it successfully more than 50% of the time). Research is inherently convex: most things explored don’t pan out, but those that do deliver major benefit. When those explorations lead to repeatable processes that can be carried out by people of average motivation and talent, that’s concave, commodity work.

MacLeod organizations exist as a risk trade between those who want to be protected from the vagaries of the market, so creating islands of concavity and easiness is kind of what they do. The Big World Out There is a place with many pockets of convexity, plenty of bad local maxima, and a difficult and mostly unexplored landscape. The MacLeod organization provides its lower-level workers with access to an already-explored and safe concave hill neighborhood. Executives read maps and find start points. Managers just follow the steepest gradient up to the top, and workers just have to follow and carry the manager’s stuff.

Technology and change

There’s a problem with concave work. It tends to be commoditized, making it hard to get substantial profits on it. If “100% performance” can truly be defined and specified, then it can also be achieved mechanically. Not only are the margins low, but machines are just better at it than humans: they’re cheaper, don’t need breaks, and don’t make as many mistakes. Robots are taking over the concave work, leaving us with convexity.

We cannot compete with robots on concave work. We’ll need to let them have it.

Software engineering is notoriously convex. First of all, excellent software engineers are 5 to 100 times more productive than average ones, a phenomenon known as the “10X engineer”. As is typical of convex projects, most software projects fail– probably over 80 percent deliver negative net value– but the payoffs of the successes are massive. This is even more the case in the VC-funded startup ecosystem, where companies that seem to show potential for runaway success are sped along, the laggards being shot and killed. In a convex world, that’s how you allocate resources and attention: focus on the winners, starve the losers.

Convexity actually makes for a very frustrating ecosystem. While convex work is a lot more “fun” because of the upside potential and the challenge, it’s not a great way to make a living. Most software engineers get to age 30 with no successful projects (most of this being not their fault) under their belt, no credibility on account of that series of ill-fated projects, and mediocre financial results (even if they had successes). Managing convex work, and compensating it fairly, are not things that we have a society have figured out how to do. For the past 200 years– the industrial era, as opposed to the technological one that is coming on– we haven’t needed to do it. Almost all human labor was concave. What little convex work existed was generally oriented toward standardizing processes so as to make 99.9% of the organization’s labor pool concave. We are now moving toward an economy where enormous amounts of work are done by machines, practically for free, leaving only convex, creatively taxing work.

The fate of the MacLeod organization

MacLeod organizations, over the past 200 years, could perform well. They weren’t great at innovation; they didn’t need it. They got the job done. One of the virtues of the corporation was its ability to function as a machine made out of people. It would render services and products not only at more scale, but much more reliably, than individual people could do. The industrial corporation co-evolved with the failings of each tier of the MacLeod organization, hence converging on the “optimization model” above that uses the traits of each to its benefit. Of course, I do not mean to suggest that these “computations” are performed in reality, but the metaphor works quite well. 

The modern technological economy has created problems for that style of organization, however. Microeconomic models tend to focus on a small number of business variables– price points, quantity produced, wages. Current-day challenges require thousands, often ill-defined. What product is one going to build? What kind of people should be hired? What kind of culture should the company strive toward, and how will it enforce those ideals? Those things matter a lot more in the technological economy. Hidden variables that could once be ignored are now crucial, and industrial-era management is falling flat. Combine this with the convexity of input/output relationships regarding individual talent, effort, and motivation, and we have a dramatically different world.

The islands of concavity that MacLeod organizations can create for their Losers and Clueless are getting smaller by the year. The ability to protect the risk-averse from the Big World Out There is diminishing. MacLeod Sociopaths were never especially scrupulous about keeping that implicit promise, but now they can’t.

Even individuals now have to make non-local (formerly executive-level) decisions. For a concrete example, consider education. The generalist education implicitly assumes that most people will have a concave relationship between their amount of knowledge in an area and utility they get from it. It’s vitally important, as most educational institutions see it, for one to first get a mediocre amount of knowledge about a lot of subjects. (I agree, for non-economic reasons. How can a person know what is interesting without having a wide survey of knowledge? A mediocre knowledge gives you enough to determine if you want to know more; with no knowledge, you have no clue.) However, there’s no such thing as a Generalist job description. The market doesn’t reward a breadth of mediocre knowledge. People need to specialize. In 1950, having a college degree bought a person credibility as someone capable of learning quickly, thus entry into the managerial, professional, or academic ranks. (Specialization could begin on the job.) By 1985, one needed a marketable major: preferably, math, CS, or a physical science. In 2013, what classes a person took (compilers? machine learning?) is highly relevant. The convex valuation of a knowledge base makes deep knowledge in one area more valuable than a broader, shallower knowledge. Choosing and changing specialties is also a non-local process. A well-rounded generalist can move about the interior by gradually shifting attention. The changing specialist must jump from one “pointy” position to another– and hope it’s a good place to be.

In technology especially, we’re seeing an explosion of dimensionality. General competence doesn’t cut it anymore. Firms aren’t willing to hire “overall good” people who might take 6 months to learn their technology stacks, and the most credible job candidates don’t want to pin their careers on companies that don’t strongly correspond with their (sometimes idiosyncratic) preferences. When there’s a bilateral matching problem (e.g. dating) it usually has something to do with dimensionality. Both sides of the market are “purple squirrel hunting”.

This proliferation of dimensionality isn’t sustainable, of course. One thing I’ve come to believe is that it has an onerous effect on real estate prices. That might seem bizarre, but the “star cities” are the only places that tolerate purple squirrel hunting. If you’re a startup that wants a Python/Scala/C++ expert with production experience in 4 NoSQL products and two PhDs, you can find her in the Bay Area. For some price, she’s out there. That’s not because the people in the Bay Area are better; it’s that, with more of them, you get a continuous (it’s available at some price) rather than discrete (you might wait intolerably long and not find it) market for talent– and also for jobs, from a candidate’s perspective– even if you’re trying to fill some ridiculous purple squirrel specification. That’s what makes “tech hubs” (e.g. Bay Area, New York, Boston) so attractive– to candidates and companies both– and a major part of what keeps them so expensive. The continuous markets make high-risk business and job-hopping careers– that aren’t viable in smaller cities unless one wants to move or tolerate remote work– possible. Since real estate in these areas is reaching the point of being unaffordable for technology workers, I think it’s a fair call to say that this dimensionality explosion in technology won’t continue forever. However, convexity and high dimensionality in general are here to stay, and about to become the norm for the greater economy. The convexity introduced by an economic arrangement where an increasing bulk of commodity labor is dumped directly on machines has incredible upsides, and is very attractive. Now, in the late-industrial era, global economic growth is about 4-5 percent per year. In the thick of the technological era– a few decades from now– it could be over 10% per year.

If MacLeod rank cultures are going to become obsolete, what will replace them? That I do not know for sure, but I have some thoughts. The “optimization model” paints a world where the relevant business variables are known. Executives call out initial values (based on non-local knowledge) for a gradient ascent performed by managers. As the business world becomes high-dimensional– too many dimensions for any one person to handle them all– it begins to break down the problem and distribute the “computation” (again, solely in metaphor). High-ranking executives handle important dimensions (sub-problems) where tricky non-local jumps might be in order. Managers handle less-important ones where continuous modulation will do. Getting the communication topology right is tricky. Often the conceptual hierarchy that is created will look suspiciously like the organizational hierarchy (Conway’s Law?) This leads to an interesting question: is this hierarchy of people– which will limit the firm’s capacity to form proper conceptual hierarchies and solve its own problems– even necessary? Or is it better to have all eyes open on non-local, “visionary” questions? Is that a good idea? Organizations claim to want their employees to “act like owners”. Is that really true? With the immense complexity of the technological economy, and the increasing inability of centralized management to tackle convexity (one cannot force creative excellence or innovation by managerial fiat) it might have to be true.

Enter the self-executive. Self-executive people don’t think of themselves as subordinate employees, but as free agents. They don’t want to be told what to do. They want to excel. A manager who will guide one (mentorship) gets loyalty. However, typical exploitative managers get ignored, sabotaged, or humiliated. Self-executive employees are the ones who can handle convexity, and enjoy the risk and challenge of hard problems. They strongly toward chaos on the civil alignment spectrum. These are the people one will need in order to navigate a convex technological economy, and the self-executive culture is the one that will unleash their capabilities.

That said, the guild culture has a lot to add as well and should not be ignored. There’s a lot of lost work in exploration that can be eliminated by advice from a wise mentor (although if things change, as they do more rapidly these days, that “don’t go there” advice might sometimes be best discarded). The valuation of knowledge and skill are so strongly convex that there’s immense value generation in teaching. Not only should that not be ignored, but it’s going to become a critical component of the working culture. Companies that want loyalty are going to have to start teaching people again. Self-executives don’t work hard unless they believe they’re learning more on the work given to them than they would on their own– and these people tend to be fiercely autodidactic.

This brings us to the old quip. A VP tells his CEO that the company should invest more in its people, and he says, “What if we spend all that money training them and they leave?” The VP’s response: “what happens if we don’t and they stay?” That ends up looking like MacLeod rank culture over time. There’s a lot to be learned from guild culture, and when I finally Solve This Fucking Thing (Part 11? 12? 5764+23i?) I won’t be able to afford to overlook it.

IDE Culture vs. Unix philosophy

Even more of a hot topic than programming languages is the interactive development environment, or IDE. Personally, I’m not a huge fan of IDEs. As tools, standing alone, I have no problem with them. I’m a software libertarian: do whatever you want, as long as you don’t interfere with my work. However, here are some of the negatives that I’ve observed when IDEs become commonplace or required in a development environment:

  • the “four-wheel drive problem”. This refers to the fact that an unskilled off-road driver, with four-wheel drive, will still get stuck. The more dexterous vehicle will simply have him fail in a more inaccessible place. IDEs pay off when you have to maintain an otherwise unmanageable ball of other people’s terrible code. They make unusable code merely miserable. I don’t think there’s any controversy about this. The problem is that, by providing this power, then enable an activity of dubious value: continual development despite abysmal code quality, when improving or killing the bad code should be a code-red priority. IDEs can delay code-quality problems and defer macroscopic business effects, which is good for manageosaurs who like tight deadlines, but only makes the problem worse at the end stage. 
  • IDE-dependence. Coding practices that require developers to depend on a specific environment are unforgivable. This is true whether the environment is emacs, vi, or Eclipse. The problem with IDEs is that they’re more likely to push people toward doing things in a way that makes use of a different environment impossible. One pernicious example of this is in Java culture’s mutilation of the command-line way of doing things with singleton directories called “src” and “com”, but there are many that are deeper than that. Worse yet, IDEs enable the employment of programmers who don’t even know what build systems or even version control are. Those are things “some smart guy” worries about so the commodity programmer can crank out classes at his boss’s request.
  • spaghettification. I am a major supporter of the read-only IDE, preferably served over the web. I think that code navigation is necessary for anyone who needs to read code, whether it’s crappy corporate code or the best-in-class stuff we actually enjoy reading. When you see a name, you should be able to click on it and see where that name is defined. However, I’m pretty sure that, on balance, automated refactorings are a bad thing. Over time, the abstractions which can easily be “injected” into code using an IDE turn it into “everything is everywhere” spaghetti code. Without an IDE, the only way to do such work is to write a script to do it. There are two effects this has on the development process. One is that it takes time to make the change: maybe 30 minutes. That’s fine, because the conversation that should happen before a change that will affect everyone’s work should take longer than that. The second is that only adept programmers (who understand concepts like scripts and the command line) will be able to do it. That’s a good thing.
  • time spent keeping up the environment. Once a company decides on “One Environment” for development, usually an IDE with various in-house customizations, that IDE begins to accumulate plugins of varying quality. That environment usually has to be kept up, and that generates a lot of crappy work that nobody wants to do.

This is just a start on what’s wrong with IDE culture, but the core point is that it creates some bad code. So, I think I should make it clear that I don’t dislike IDEs. They’re tools that are sometimes useful. If you use an IDE but write good code, I have no problem with you. I can’t stand IDE culture, though, because I hate hate hate hate hate hate hate hate the bad code that it generates.

In my experience, software environments that rely heavily on IDEs tend to be those that produce terrible spaghetti code, “everything is everywhere” object-oriented messes, and other monstrosities that simply could not be written by a sole idiot. He had help. Automated refactorings that injected pointless abstractions? Despondency infarction frameworks? Despise patterns? Those are likely culprits.

In other news, I’m taking some time to learn C at a deeper level, because as I get more into machine learning, I’m realizing the importance of being able to reason about performance, which requires a full-stack knowledge of computing. Basic fluency in C, at a minimum, is requisite. I’m working through Zed Shaw’s Learn C the Hard Way, and he’s got some brilliant insights not only about C (on which I can’t evaluate whether his insights are brilliant) but about programming itself. In his preamble chapter, he makes a valid insight in his warning not to use an IDE for the learning process:

An IDE, or “Integrated Development Environment” will turn you stupid. They are the worst tools if you want to be a good programmer because they hide what’s going on from you, and your job is to know what’s going on. They are useful if you’re trying to get something done and the platform is designed around a particular IDE, but for learning to code C (and many other languages) they are pointless. [...]
Sure, you can code pretty quickly, but you can only code in that one language on that one platform. This is why companies love selling them to you. They know you’re lazy, and since it only works on their platform they’ve got you locked in because you are lazy. The way you break the cycle is you suck it up and finally learn to code without an IDE. A plain editor, or a programmer’s editor like Vim or Emacs, makes you work with the code. It’s a little harder, but the end result is you can work with any code, on any computer, in any language, and you know what’s going on. (Emphasis mine.)

I disagree with him that IDEs will “turn you stupid”. Reliance on one prevents a programmer from ever turning smart, but I don’t see how such a tool would cause a degradation of a software engineer’s ability. Corporate coding (lots of maintenance work, low productivity, half the day lost to meetings, difficulty getting permission to do anything interesting, bad source code) does erode a person’s skills over time, but that can’t be blamed on the IDE itself. However, I think he makes a strong point. Most of the ardent IDE users are the one-language, one-environment commodity programmers who never improve, because they never learn what’s actually going on. Such people are terrible for software, and they should all either improve, or be fired.

The problem with IDEs is that each corporate development culture customizes the environment, to the point that the cushy, easy coding environment can’t be replicated at home. For someone like me, who doesn’t even like that type of environment, that’s no problem because I don’t need that shit in order to program. But someone steeped in cargo cult programming because he started in the wrong place is going to falsely assume that programming requires an IDE, having seen little else, and such novice programmers generally lack the skills necessary to set one up to look like the familiar corporate environment. Instead, he needs to start where every great programmer must learn some basic skills: at the command-line. Otherwise, you get a “programmer” who can’t program outside of a specific corporate context– in other words, a “5:01 developer” not by choice, but by a false understanding of what programming really is.

The worst thing about these superficially enriched corporate environments is their lack of documentation. With Unix and the command-line tools, there are man pages and how-to guides all over the Internet. This creates a culture of solving one’s own problems. Given enough time, you can answer your own questions. That’s where most of the growth happens: you don’t know how something works, you Google an error message, and you get a result. Most of the information coming back in indecipherable to a novice programmer, but with enough searching, the problem is solved, and a few things are learned, including answers to some questions that the novice didn’t yet have the insight (“unknown unknowns”) yet to ask. That knowledge isn’t built in a day, but it’s deep. That process doesn’t exist in an over-complex corporate environment, where the only way to move forward is to go and bug someone, and the time cost of any real learning process is at a level that most managers would consider unacceptable.

On this, I’ll crib from Zed Shaw yet again, in Chapter 3 of Learn C the Hard Way:

In the Extra Credit section of each exercise I may have you go find information on your own and figure things out. This is an important part of being a self-sufficient programmer. If you constantly run to ask someone a question before trying to figure it out first then you never learn to solve problems independently. This leads to you never building confidence in your skills and always needing someone else around to do your work. The way you break this habit is to force yourself to try to answer your own questions first, and to confirm that your answer is right. You do this by trying to break things, experimenting with your possible answer, and doing your own research. (Emphasis mine.)

What Zed is describing here is the learning process that never occurs in the corporate environment, and the lack of it is one of the main reasons why corporate software engineers never improve. In the corporate world, you never find out why the build system is set up in the way that it is. You just go bug the person responsible for it. “My shit depends on your shit, so fix your shit so I can run my shit and my boss doesn’t give me shit over my shit not working for shit.” Corporate development often has to be this way, because learning a typical company’s incoherent in-house systems doesn’t provide a general education. When you’re studying the guts of Linux, you’re learning how a best-in-class product was built. There’s real learning in mucking about in small details. For a typically mediocre corporate environment that was built by engineers trying to appease their managers, one day at a time, the quality of the pieces is often so shoddy that not much is learned in truly comprehending them. It’s just a waste of time to deeply learn such systems. Instead, it’s best to get in, answer your question, and get out. Bugging someone is the most efficient and best way to solve the problem.

It should be clear that what I’m railing against is the commodity developer phenomenon. I wrote about “Java Shop Politics” last April, which covers a similar topic. I’m proud of that essay, but I was wrong to single out Java as opposed to, e.g. C#, VB, or even C++. Actually, I think any company that calls itself an “<X> Shop” for any language X is missing the point. The real evil isn’t Java the language, as limited as it may be, but Big Software and the culture thereof. The true enemy is the commodity developer culture, empowered by the modern bastardization of “object-oriented programming” that looks nothing like Alan Kay’s original vision.

In well-run software companies, programs are build to solve problems, and once the problem is finished, it’s Done. The program might be adapted in the future, and may require maintenance, but that’s not an assumption. There aren’t discussions about how much “headcount” to dedicate to ongoing maintenance after completion, because that would make no sense. If people need to modify or fix the program, they’ll do it. Programs solve well-defined problems, and then their authors move on to other things– no God Programs that accumulate requirements, but simple programs designed to do one thing and do it well. The programmer-to-program relationship must be one-to-many. Programmers write programs that do well-defined, comprehensible things well. They solve problems. Then they move on. This is a great way to build software “as needed”, and the only problem with this style of development is that the importance of small programs is hard to micromanage, so managerial dinosaurs who want to track efforts and “headcount” don’t like it much, because they can never figure out who to scream at when things don’t go their way. It’s hard to commoditize programmers when their individual contributions can only be tracked by their direct clients, and when people can silently be doing work of high importance (such as making small improvements to the efficiencies of core algorithms that reduce server costs). The alternative is to invert the programmer-to-program relationship: make it many-to-one. Then you have multiple programmers (now a commodity) working on Giant Programs that Do Everything. This is a terrible way to build software, but it’s also the one historically favored by IDE culture, because the sheer work of setting up a corporate development environment is enough that it can’t be done too often, and this leads managers to desire Giant Projects and a uniformity (such as a one-language policy, see again why “<X> Shops” suck) that managers like but that often makes no sense.

The right way of doing things– one programmer works on many small, self-contained programs– is the core of the so-called “Unix philosophy“. Big Programs, by contrast, invariably have undocumented communication protocols and consistency requirements whose violation leads not only to bugs, but to pernicious misunderstandings that muddle the original conceptual integrity of the system, resulting in spaghetti code and “mudballs”. The antidote is for single programs themselves to be small, for large problems to be solved with systems that are given the respect (such as attention to fault tolerance) that, as such, they deserve.

Are there successful exceptions to the Unix philosophy? Yes, there are, but they’re rare. One notable example is the database, because these systems often have very strong requirements (transactions, performance,  concurrency, durability, fault-tolerance) that cannot be as easily solved with small programs and organic growth alone. Some degree of top-down orchestration is required if you’re going to have a viable database, because databases have a lot of requirements that aren’t typical business cruft, but are actually critically important. Postgres, probably the best SQL out there, is not a simple beast. Indeed, databases violate one of the core tenets of the Unix philosophy– store data in plain text– and they do so for good reasons (storage usage). Databases also mandate that people be able to use them without having to keep up with the evolution of such a system’s opaque and highly-optimized internal details, which makes the separation of implementation from interface (something that object-oriented programming got right) a necessary virtue. Database connections, like file handles, should be objects (where “object” means “something that can be used with incomplete knowledge of its internals”.) So databases, in some ways, violate the Unix philosophy, and yet are still used by staunch adherents. (We admit that we’re wrong sometimes.) I will also remark that it has taken decades for some extremely intelligent (and very well-compensated) people to get databases right. Big Projects win when no small project or loose federation thereof will do the job.

My personal belief is that almost every software manager thinks he’s overseeing one of the exceptions: a Big System that (like Postgres) will grow to such importance that people will just swallow the complexity and use the thing, because it’s something that will one day be more important than Postgres or the Linux kernel. In almost all cases, they are wrong. Corporate software is an elephant graveyard of such over-ambitious systems. Exceptions to the Unix philosophy are extremely rare. Your ambitious corporate system is almost certainly not one of them. Furthermore, if most of your developers– or even a solid quarter of them– are commodity developers who can’t code outside of an IDE, you haven’t a chance.

Functional programs rarely rot

The way that most of us sell functional programming is all wrong. The first thing we say about it is that the style lacks mutable state (although most functional languages allow it). So we’re selling it by talking about what it doesn’t have. Unfortunately, it’s not until late in a programmer’s career (and some never get there) that he realizes that less power in a language is often better, because “freedom-from” can be more important than “freedom-to” in systems that will require maintenance. This makes for an ineffective sales strategy, because the people we need to reach are going to see the statelessness of the functional style as a deficit. Besides, we need to be honest here: sometimes (but not often) mutable state is the right tool for the job.

Often, functional programming is sold with side-by-side code snippets, the imperative example being about 30 lines long and relatively inelegant, while the functional example requires only six. The intent is to show that functional programming is superior, but the problem is that people are likely to prefer whatever is most familiar to them. A person who is used to the imperative style is more likely to prefer the imperative code, because they are more used to it. What good is a reduction to six lines if they seem impenetrable? Besides, in the real world, we use mutable state all the time: for performance and because the world actually is stateful. It’s just that we’re mindful enough to manage it well.

So what is it that makes functional programming superior? Or what is it that makes imperative code so terrible? The issue isn’t that state is inherently evil, because it’s not. Small programs are often easier to read in an imperative style than a functional one, just as goto improves the control flow of many small procedures. The problem, rather, is that stateful programs evolve in bad ways when programs get large.

A referentially transparent function is one that returns the same output per input. This, and immutable data, are the atoms of functional programming. Such functions actually have precise (and usually, obviously intended) semantics which means that it’s clear what is and what is not a bug, and unit testing is relatively straightforward. Contrast this with typical object-oriented software where an object’s semantics are the code, and it’s easy to appreciate why the functional approach is better. There’s something else that’s nice about referentially transparent functions, which is that they can’t be changed (in a meaningful way) without altering their interfaces.

Object-oriented software development is contrived to make sense to non-technical businessmen, and the vague squishiness associated with “objectness” appeals to that crowd. Functional programming is how mathematicians would prefer to program, and programming is math. When you actually care about correctness, the mathematician’s insistence on integrity even in the smallest details is preferable.

A functional program computes something, and intermediate computations are returned from one function and passed directly into another as a parameter. Behaviors that aren’t reflected in a returned value might matter for performance but not semantics. An imperative or object-oriented program does things, but the intermediate results are thrown away. What this means is that an unbounded amount of intermediate stuff can be shoved into an imperative program, with no change to its interface. Or, to put it another way, with a referentially transparent function, the interface-level activity is all one needs to know about its behavior. On the other hand, this means that for a referentially transparent function to become complex will require an equally complex interface. Sometimes the simplicity provided by an imperative function’s interface is desirable.

When is imperative programming superior? First, when a piece of code is small and guaranteed to stay small, the additional plumbing involved in deciding what-goes-where that functional programming can make the program harder to write.  A common functional pattern when state is needed is to “thread” it through referentially transparent functions by documenting state effects (that may be executed later) in the interface, which can complicate interfaces. It’s more intuitive to do a series of actions than build up a list of actions (the functional style) that is then executed. What if that list is very large? How will this affect performance? That said, my experience is that the crossover point at which functional programming becomes strictly preferable is low: about 100 lines of code at most, and often below 50. The second case of imperative programming’s superiority is when the imperative style is needed for performance reasons, or when an imperative language with manual memory management (such as C or C++) is strictly required for the problem being solved. Sometimes all of those intermediate results must be thrown away because there isn’t space to hold them. Mutable state is an important tool, but it should almost always be construed as a performance-oriented optimization.

The truth is that good programmers mix the styles quite a bit. We program imperatively when needed, and functionally when possible.

Imperative and object-oriented programming are different styles, and the latter is, in my view, more dangerous. Few programmers write imperative code anymore. C is imperative, Java is full-blown object-oriented, and C++ is between the two depending on who wrote the style guide. The problem with imperative code, in most business environments, is that it’s slow to write and not very dense. Extremely high-quality imperative code can be written, but it takes a very long time to do so. Companies writing mission-critical systems can afford it, but most IT managers prefer the fast-and-loose, sloppy vagueness of modern OOP. Functional and object-oriented programming both improve on the imperative style in terms of the ability to write code fast (thanks to abstraction) but object-oriented code is more manager-friendly, favoring factories over monads

Object-oriented programming’s claiming virtue is the ability to encapsulate complexity behind a simpler interface, usually with configurability of the internal complexity as an advanced feature. Some systems reach a state where that is necessary. One example is the SQL database, where the typical user specifies what data she wants but not how to get it. In fact, although relational databases are often thought of in opposition to “object-oriented” style, this is a prime example of an “object-oriented” win. Alan Kay’s original vision with object-oriented programming was not at all a bad one: when you require complexity, encapsulate it behind a simpler interface so that (a) the product is easier to use, and (b) internals can be improved without disruption at the user level. He was not saying, “go out and write a bunch of highly complex objects.” Enterprise Java is not what he had in mind.

So what about code rot? Why is the functional style more robust against software entropy than object-oriented or imperative code? The answer is inherent in functional programming’s visible (and sometimes irritating) limitation: you can’t add direct state effects, but have to change interfaces. What this means is that adding complexity to a function expands its interface, and it quickly reaches a point where it’s visibly ugly. What happens then? A nice thing about functional programming is that programs are build up using function composition, which means that large functions can easily be broken up into smaller ones. It’s rare, in a language like Ocaml or Clojure when the code is written by a competent practitioner, to see functions longer than 25 lines long. People break functions up (encouraging code reuse) when they get to that point. That’s a really great thing! Complexity sprawl still happens, because that’s how business environments are, but it’s horizontal. As functional programs grow in size, there are simply more functions. This can be ugly (often, in a finished product, half of the functions are unused and can be discarded) but it’s better than the alternative, which is the vertical complexity sprawl that can happen within “God methods”, and in which it’s unclear what is essential and what can be discarded. 

With imperative code, the additional complexity is shoved into a series of steps. For-loops get bigger, and branching gets deeper. At some point, procedures reach several hundred lines in length, often with the components being written by different authors and conceptual integrity being lost. Much worse is object-oriented code, with its non-local inheritance behaviors (note: inheritance is the 21st-century goto) and native confusion of data types and namespaces. Here, the total complexity of an object can reach tens of thousands of lines, at which points spaghettification has occurred.

Functional programming is like the sonnet. There’s nothing inherently superior about iambic pentameter and rhyming requirements, but the form tends to elevate one’s use of language. People find a poetic capability that would be much harder (for a non-poet) to find in free verse. Why? Because constraint– the right kind of constraint– breeds creativity. Functional programming is the same way. No, there’s nothing innately superior about it, but the style forces people to deal with the complexity they generate by forcing it to live at the interface level. You can’t, as easily, throw a dead rat in the code to satisfy some dipshit requirement and then forget about it. You have to think about what you’re doing, because it will effect interfaces and likely be visible to other people. The result of this is that the long-term destruction of code integrity that happens in most business environments is a lot less likely to occur.

Programmer autonomy is a $1 trillion issue.

Sometimes, I think it’s useful to focus on trillion-dollar problems. While it’s extremely difficult to capture $1 trillion worth of value, as made evident by the nonexistence of trillion-dollar companies, innovations that create that much are not uncommon, and these trillion-dollar problems often have not one good idea, but a multitude of them to explore. So a thought experiment that I sometimes run is, “What is the easiest way to generate $1 trillion in value?” Where is the lowest-hanging trillion dollars of fruit? What’s the simplest, most obvious thing that can be done to add $1 trillion of value to the economy?

I’ve written at length about the obsolescence of traditional management, due to the convex input-output relationship in modern, creative work, making variance-reductive management counter-productive. I’ve also shared a number of thoughts about Valve’s policy of open allocation, and the need for software engineers to demand the respect accorded to traditional professions: sovereignty, autonomy, and the right to work directly for the profession without risk of the indirection imposed by traditional hierarchical management. So, for this exercise, I’m going to focus on the autonomy of software engineers. How much improvement will it take to add $1 trillion in value to the economy?

This is an open-ended question, so I make it more specific: how many software engineers would we have to give a high degree (open allocation) of autonomy in order to add $1 trillion? Obviously, the answer is going to depend on the assumptions that are made, so I’ll have to answer some basic questions. Because there’s uncertainty in all of these numbers, the conclusion should be treated as an estimate. However, when possible, I’ve attempted to make my assumptions as conservative as possible. Therefore, it’s quite plausible that the required number of engineers is actually less than half the number that I’ll compute here.

Question 1: How many software engineers are there? I’m going to restrict this analysis to the developed world, because it’s hard to enforce cultural changes globally. Data varies, but most surveys indicate that there are about 1.5 million software engineers in the U.S. If we further assume the U.S. to be one-fifth of “the developed world”, we get 7.5 million full-time software engineers in the world. This number could probably be increased by a factor of two by loosening the definition of “software engineer” or “developed world”, but I think it’s a good working number. The total number of software engineers is probably two to three times that.

Question 2: What is the distribution of talent among software engineers? First, here’s the scale that I developed for assessing the skill and technical maturity of a software engineer. Defining specific talent percentiles requires specifying a population, but I think the software engineer community can be decomposed into the following clusters, differing according to the managerial and cultural influences on their ability and incentive to gain competence.

Cluster A: Managed full-time (75%). These are full-time software engineers who, at least nominally, spend their working time either coding or on software-related problems (such as site reliability). If they were asked to define their jobs, they’d call themselves programmers or system administrators: not meteorologists or actuaries who happen to program software. Engineers in this cluster typically work on large corporate codebases and are placed in a typical hierarchical corporate structure. They develop an expertise within their defined job scope, but often learn very little outside of it. Excellence and creativity generally aren’t rewarded in their world, and they tend to evolve into the stereotypical “5:01 developers”. They tend to plateau around 1.2-1.3, because the more talented ones are usually tapped for management before they would reach 1.5. Multiplier-level contributions for engineers tend to be impossible to achieve in their world, due to bureaucracy, limited environments, project requirements developed by non-engineers, and an assumption that anyone decent will become a manager. I’ve assigned Cluster A a mean competence of 1.10 and a standard deviation of 0.25, meaning that 95 percent of them are between 0.6 and 1.6.

Cluster B: Novices and part-timers (15%). These are the non-engineers who write scripts occasionally, software engineers in their first few months, interns and students. They program sometimes, but they generally aren’t defined as programmers. This cluster I’ve given a mean competence of 0.80 and a standard deviation of 0.25. I assign them to a separate cluster because (a) they generally aren’t evaluated or managed as programmers, and (b) they rarely spend enough time with software to become highly competent. They’re good at other things.

Cluster B isn’t especially relevant to my analysis, and it’s also the least well-defined. Like the Earth’s atmosphere, its outer perimeter has no well-defined boundary. Uncertainty about its size is also the main reason why it’s hard to answer questions about the “number of programmers” in the world. The percentage would increase (and so would the number of programmers) if the definition of “programmer” were loosened.

Cluster C: Self-managing engineers (10%). These are the engineers who either work in conditions of unusual autonomy (being successful freelancers, company owners, or employees of open-allocation companies) or who exert unusual efforts to control their careers, education and progress. This cluster has a larger mean competence and variance than the others. I’ve assigned it a mean of 1.5 and a variance of 0.35. Consequently, almost all of the engineers who get above 2.0 are in this cluster, and this is far from surprising: 2.0-level (multiplier) work is very rare, and impossible to get under typical management.

If we mix these three distributions together, we get the following profile for the software engineering world:

Skill Percentile
1.0     38.37
1.2     65.38
1.4     85.24
1.6     94.47
1.8     97.87
2.0     99.23
2.2     99.78
2.4     99.95
2.6     99.99

How accurate is this distribution? Looking at it, I think it probably underestimates the percentage of engineers in the tail. I’m around 1.7-1.8 and I don’t think of myself as a 97th-percentile programmer (probably 94-95th). It also says that only 0.77 percent of engineers are 2.0 or higher (I’d estimate it at 1 to 2 percent). I would be inclined to give Cluster C an exponential tail on the right, rather than the quadratic-exponential decay of a Gaussian, but for the purpose of this analysis, I think the Gaussian model is good enough.

Question 3: How much value does a software engineer produce? Marginal contribution isn’t a good measure of “business value”, because it’s too widely variable based on specifics (such company size) but I wouldn’t say that employment market value is a good estimate either, because there’s a surplus of good people who (a) don’t know what they’re actually worth, and (b) are therefore willing to work for low compensation, especially because a steady salary removes variance from compensation. Companies know that they make more (a lot more) on the most talented engineers than on average ones– if they have high-level work for them. The better engineer might be worth ten times as much, but only cost twice as much, as the average. So I think of it in terms of abstract business value: given appropriate work, how much expected value (ignoring variance) can that person deliver?

This quantity I’ve assumed to be exponential with regard to engineer skill, as described above: an increment of 1.0 is a 10-fold increase in abstract business value (ABV). Is this the right multiplier? It indicates that an increment of 0.3 is a doubling of ABV, or that a 0.1-increment is a 26 percent increase in ABV. In my experience, this is about right. For some skill-intensive projects, such as technology-intensive startups, the gradient might be steeper (a 20-fold difference between 2.0 and 1.0, rather than 10) but I am ignoring that, for simplicity’s sake.

The ABV of a 1.0 software engineer I’ve estimated at $125,000 per year in a typical business environment, meaning that it would be break-even (considering payroll costs, office space, and communication overhead) to hire one at a salary around $75,000. A 95th-percentile engineer (1.63) produces an ABV of $533,000 and a 99th-percentile engineer (1.96) produces an ABV of $1.14 million. These numbers are not unreasonable at all. (In fact, if they’re wrong, I’d bet on them being low.)

This doesn’t, I should say, mean that a 99th-percentile engineer will reliably produce $1.14 million per year, every year. She has to be assigned at-level, appropriate, work. Additionally, that’s an expected return (and the variance is quite high at the upper end). She might produce $4 million in one year under one set of conditions, and zero in another arena. Since I’m interested in the macroeconomic effect of increasing engineer autonomy, I can ignore variance and focus on mean expectancy alone. This sort of variance is meaningful to the individual (it’s better to have a good year than a bad one) and small companies, but the noise cancels itself out on the macroeconomic scale.

Putting it all together: with these estimates of the distribution of engineer competence, and the ABV estimates above, it’s possible to compute an expected value for a randomly-chosen engineer in each cluster:

Cluster A: $185,469 per year.

Cluster B: $89,645 per year.

Cluster C: $546,879 per year.

All software engineers: $207,236 per year.

Software engineers can evolve in all sorts of ways, and improved education or more longevity might change the distributions of the clusters themselves. That I’m not going to model, because there are so many possibilities. Nor am I going to speculate on macroeconomic business changes that would change the ABV figures. I’m going to focus on only one aspect of economic change, although there are others with additional positive effects, and that change is the evolution of an engineer from the micromanaged world of Cluster A to the autonomous, highly-productive one of Cluster C. (Upward movement within clusters is also relevant, and that strengthens my case, but I’m excluding it for now.) I’m going to assume that it takes 5 years of education and training for that evolution to occur, and assume an engineer’s career (with some underestimation here, justified by time-value of money) lasts 20 years, meaning there are 15 years to reap the benefits in full, plus two years to account for partial improvement during that five-year evolution.

What this means is that, for each engineer who evolves in this way, it generates $361,410 per year in value for 17 years, or $6.14 million per engineer. That is the objective benefit that accrues to society in engineer skill growth alone when a software engineer moves out of a typical subordinate context and into one like Valve’s open allocation regime. Generating $1 trillion in this way requires liberating 163,000 engineers, or a mere 2.2% of the total pool. That happens even if (a) the number of software engineers remains the same (instead of increasing due to improvements to the industry) and (b) other forms of technological growth, that would increase the ABV of a good engineer, stop, although it’s extremely unlikely that they will. Also, there are the ripple effects (in terms of advancing the state of the art in software engineering) of a world with more autonomous and, thus, more skilled engineers. All of these are substantial, and they improve my case even further, but I’m removing those concerns in the interest of pursuing a lower bound for value creation. What I can say, with ironclad confidence, is that the movement of 163,000 engineers into an open-allocation regime will, by improving their skills and, over the years, their output, generate $1 trillion at minimum. That it might produce $5 or $20 trillion (or much more, in terms of long-term effects on economic growth) in eventual value, through multiplicative “ripple” effects, is more speculatory. My job here was just to get to $1 trillion.

In simpler terms, the technological economy into which our society needs to graduate is one that requires software (not to mention mathematical and scientific) talent at a level that would currently be considered elite, and the only conditions that allow such levels of talent to exist are those of extremely high autonomy, as observed at Valve and pre-apocalyptic Google. Letting programmers direct their own work, and therefore take responsibility for their own skill growth is, multiplied over the vast number of people in the world who have to work with software, easily a trillion-dollar problem. It’s worth addressing now.

What Programmers Want

Most people who have been assigned the unfortunate task of managing programmers have no idea how to motivate them. They believe that the small perks (such as foosball tables) and bonuses that work in more relaxed settings will compensate for more severe hindrances like distracting work environments, low autonomy, poor tools, unreasonable deadlines, and pointless projects. They’re wrong. However, this is one of the most important things to get right, for two reasons. The first is that programmer output is multiplicative of a number of factors– fit with tools and project, skill and experience, talent, group cohesion, and motivation. Each of these can have a major effect (plus or minus a factor of 2 at least) on impact, and engineer motivation is one that a manager can actually influence. The second is that measuring individual performance among software engineers is very hard. I would say that it’s almost impossible and in practical terms, economically infeasible. Why do I call infeasible rather than merely difficult? That’s because the only people who can reliably measure individual performance in software are so good that it’s almost never worth their time to have them doing that kind of work. If the best engineers have time to spend with their juniors, it’s more worthwhile to have them mentoring the others (which means their interests will align with the employees rather than the company trying to perform such measurement) than measuring them, the latter being a task they will resent having assigned to them.

Seasoned programmers can tell very quickly which ones are smart, capable, and skilled– the all-day, technical interviews characteristic of the most selective companies achieve that– but individual performance on-site is almost impossible to assess. Software is too complex for management to reliably separate bad environmental factors and projects from bad employees, much less willful underperformance from no-fault lack-of-fit. So measurement-based HR policies add noise and piss people off but achieve very little, because the measurements on which they rely are impossible to make with any accuracy. This means that the only effective strategy is to motivate engineers, because attempting to measure and control performance after-the-fact won’t work.

Traditional, intimidation-based management backfires in technology. To see why, consider the difference between 19th-century commodity labor and 21st-century technological work. For commodity labor, there’s a level of output one can expect from a good-faith, hard-working employee of average capability: the standard. Index that to the number 100. There are some who might run at 150-200, but often they are cutting corners or working in unsafe ways, so often the best overall performers might produce 125. (If the standard were so low that people could risklessly achieve 150, the company would raise the standard.) The slackers will go all the way down to zero if they can get away with it. In this world, one slacker cancels out four good employees, and intimidation-based management– which annoys the high performers and reduces their effectiveness, but brings the slackers in line, having a performance-middling effect across the board– can often work. Intimidation can pay off, because more is gained by intimidating the slacker into mediocrity brings more benefit than is lost. Technology is different. The best software engineers are not at 125 or even 200, but at 500, 1000, and in some cases, 5000+. Also, their response to a negative environment isn’t mere performance middling. They leave. Engineers don’t object to the firing of genuine problem employees (we end up having to clean up their messes) but typical HR junk science (stack ranking, enforced firing percentages, transfer blocks against no-fault lack-of-fit employees) disgusts us. It’s mean-spirited and it’s not how we like to do things. Intimidation doesn’t work, because we’ll quit. Intrinsic motivation is the only option.

Bonuses rarely motivate engineers either, because the bonuses given to entice engineers to put up with undesirable circumstances are often, quite frankly, two or three orders of magnitude too low. We value interesting work more than a few thousand dollars, and there are economic reasons for us doing so. First, we understand that bad projects entail a wide variety of risks. Even when our work isn’t intellectually stimulating, it’s still often difficult, and unrewarding but difficult work can lead to burnout. Undesirable projects often have a 20%-per-year burnout rate between firings, health-based attrition, project failure leading to loss of status, and just plain losing all motivation to continue. A $5,000 bonus doesn’t come close to compensating for a 20% chance of losing one’s job in a year. Additionally, there are the career-related issues associated with taking low-quality work. Engineers who don’t keep current lose ground, and this becomes even more of a problem with age. Software engineers are acutely aware of the need to establish themselves as demonstrably excellent before the age of 40, at which point mediocre engineers (and a great engineer becomes mediocre after too much mediocre experience) start to see prospects fade.

The truth is that typical HR mechanisms don’t work at all in motivating software engineers. Small bonuses won’t convince them to work differently, and firing middling performers (as opposed to the few who are actively toxic) to instill fear will drive out the best, who will flee the cultural fallout of the firings. There is no way around it: the only thing that will bring peak performance out of programmers is to actually make them happy to go to work. So what do software engineers need?

The approach I’m going to take is based on timeframes. Consider, for an aside, peoples’ needs for rest and time off. People need breaks at  work– say, 10 minutes every two hours. They also need 2 to 4 hours of leisure time each day. They need 2 to 3 days per week off entirely. They need (but, sadly, don’t often get) 4 to 6 weeks of vacation per year. And ideally, they’d have sabbaticals– a year off every 7 or so to focus on something different from the assigned work. There’s a fractal, self-similar nature to peoples’ need for rest and refreshment, and these needs for breaks tap into Maslovian needs: biological ones for the short-timeframe breaks and higher, holistic needs pertaining to the longer timeframes. I’m going to assert that something similar exists with regard to motivation, and examine six timeframes: minutes, hours, days, weeks, months, and years.

1. O(minutes): Flow

This may be the most important. Flow is a state of consciousness characterized by intense focus on a challenging problem. It’s a large part of what makes, for example, games enjoyable. It impels us toward productive activities, such as writing, cooking, exercise, and programming. It’s something we need for deep-seated psychological reasons, and when people don’t get it, they tend to become bored, bitter, and neurotic. For a word of warning, while flow can be productive, it can also be destructive if directed toward the wrong purposes. Gambling and video game addictions are probably reinforced, in part, by the anxiolytic properties of the flow state. In general, however, flow is a good thing, and the only thing that keeps people able to stand the office existence is the ability to get into flow and work.

Programming is all about flow. That’s a major part of why we do it. Software engineers get their best work done in a state of flow, but unfortunately, flow isn’t common for most. I would say that the median software engineer spends 15 to 120 minutes per week in a state of flow, and some never get into it. The rest of the time is lost to meetings, interruptions, breakages caused by inappropriate tools or bad legacy code, and context switches. Even short interruptions can easily cause a 30-minute loss. I’ve seen managers harass engineers with frequent status pings (2 to 4 per day) resulting in a collapse of productivity (which leads to more aggressive management, creating a death spiral). The typical office environment, in truth, is quite hostile to flow.

To achieve flow, engineers have to trust their environment. They have to believe that (barring an issue of very high priority and emergency) they won’t be interrupted by managers or co-workers. They need to have faith that their tools won’t break and that their environment hasn’t changed in a catastrophic way due to a mistake or an ill-advised change by programmers responsible for another component. If they’re “on alert”, they won’t get into flow and won’t be very productive. Since most office cultures grew up in the early 20th century when the going philosophy was that workers had to be intimidated or they would slack off, the result is not much flow and low productivity.

What tasks encourage or break flow is a complex question. Debugging can break flow, or it can be flowful. For me, I enjoy it (and can maintain flow) when the debugging is teaching me something new about a system I care about (especially if it’s my own). It’s rare, though, that an engineer can achieve flow while maintaining badly written code, which is a major reason why they tend to prefer new development over maintenance. Trying to understand bad software (and most in-house corporate software is terrible) creates a lot of “pinging” for the unfortunate person who has to absorb several disparate contexts in order to make sense of what’s going on. Reading good code is like reading a well-written academic paper: an opportunity to see how a problem was solved, with some obvious effort put into presentation and aesthetics of the solution. It’s actually quite enjoyable. On the other hand, reading bad code is like reading 100 paragraphs, all clipped from different sources. There’s no coherency or aesthetics, and the flow-inducing “click” (or “aha” experience) when a person makes a connection between two concepts almost never occurs. The problem with reading code is that, although good code is educational, there’s very little learned in reading bad code aside from the parochial idiosyncracies of a badly-designed system, and there’s a hell of a lot of bad code out there.

Perhaps surprisingly, whether a programmer can achieve “flow”, which will influence her minute-by-minute happiness, has almost nothing to do with the macroscopic interestingness of the project or company’s mission. Programmers, left alone, can achieve flow and be happy writing the sorts of enterprise business apps that they’re “supposed” to hate. And if their environment is so broken that flow is impossible, the most interesting, challenging, or “sexy” work won’t change that. Once, I saw someone leave an otherwise ideal machine learning quant job because of “boredom”, and I’m pretty sure his boredom had nothing to do with the quality of the work (which was enviable) but with the extremely noisy environment of a trading desk.

This also explains why “snobby” elite programmers tend to hate IDEs, the Windows operating system, and anything that forces them to use the mouse when key-combinations would suffice. Using the mouse and fiddling with windows can break flow. Keyboarding doesn’t. Of course, there are times when the mouse and GUI are superior. Web surfing is one example, and writing blog posts (WordPress instead of emacs) is another. Programming, on the other hand, is done using the keyboard, not drag-and-drop menus. The latter are a major distraction.

2. O(hours): Feedback

Flow is the essence here, but what keeps the “flow” going? The environmental needs are discussed above, but some sorts of work are more conducive to flow than others. People need a quick feedback cycle. One of the reasons that “data science” and machine learning projects are so highly desired is that there’s a lot of feedback, in comparison to enterprise projects which are developed in one world over months (with no real-world feedback) and released in another. It’s objective, and it comes on a daily basis. You can run your algorithms against real data and watch your search strategies unfold in front of you while your residual sum of squares (error) decreases.

Feedback needs to be objective or positive in order to keep people enthusiastic about their work. Positive feedback is always pleasant, so long as it’s meaningful. Objective, negative feedback can be useful as well. For example, debugging can be fun, because it points out one’s own mistakes and enables a person to improve the program. The same holds for problems that turn out to be more difficult than originally expected: it’s painful, but something is learned. What never works well is subjective, negative feedback (such as bad performance reviews, or aggressive micromanagement that shows a lack of trust). That pisses people off.

I think it should go without saying that this style of “feedback” can’t be explicitly provided on an hourly basis, because it’s unreasonable to expect managers to do that much work (or an employee do put up with such a high frequency of managerial interruption). So, the feedback has to come organically from the work itself. This means there need to be genuine challenges and achievements involved. So most of this feedback is “passive”, by which I mean there is nothing the company or manager does to inject the feedback into the process. The engineer’s experience completing the work provides the feedback itself.

One source of frustration and negative feedback that I consider subjective (and therefore place in that “negative, subjective feedback that pisses people off” category) is the jarring experience of working with badly-designed software. Good software is easy to use and makes the user feel more intelligent. Bad software is hard to use, often impossible to read at the source level, and makes the user or reader feel absolutely stupid. When you have this experience, it’s hard to tell if you are rejecting the ugly code (because it is distasteful) or if it is rejecting you (because you’re not smart enough to understand it). Well, I would say that it doesn’t matter. If the code “rejects” competent developers, it’s shitty code. Fix it.

The “feedback rate” is at the heart of many language debates. High-productivity languages like Python, Scala, and Clojure, allow programmers to implement significant functionality in mere hours. On my best projects, I’ve written 500 lines of good code in a day (by the corporate standard, that’s about two months of an engineer’s time). That provides a lot of feedback very quickly and establishes a virtuous cycle: feedback leads to engagement, which leads to flow, which leads to productivity, which leads to more feedback. With lower-level languages like C and Java– which are sometimes the absolute right tools to use for one’s problem, especially when tight control of performance is needed– macroscopic progress is usually a lot slower. This isn’t an issue, if the performance metric the programmer cares about lives at a lower level (e.g. speed of execution, limited memory use) and the tools available to her are giving good indications of her success. Then there is enough feedback. There’s nothing innate that makes Clojure more “flow-ful” than C; it’s just more rewarding to use Clojure if one is focused on macroscopic development, while C is more rewarding (in fact, probably the only language that’s appropriate) when one is focused on a certain class of performance concerns that require attention to low-level details. The problem is that when people use inappropriate tools (e.g. C++ for complex, but not latency-sensitive, web applications) they are unable to get useful feedback, in a timely manner, about the performance of their solutions.

Feedback is at the heart of the “gameification” obsession that has grown up of late, but in my opinion, it should be absolutely unnecessary. “Gameifcation” feels, to me, like an after-the-fact patch if not an apology, when fundamental changes are necessary. The problem, in the workplace, is that these “game” mechanisms often evolve into high-stakes performance measurements. Then there is too much anxiety for the “gameified” workplace to be fun.

In Java culture, the feedback issue is a severe problem, because development is often slow and the tools and culture tend to sterilize the development process by eliminating that “cosmic horror” (which elite programmers prefer) known as the command line. While IDEs do a great job of reducing flow-breakage that occurs for those unfortunate enough to be maintaining others’ code, they also create a world in which the engineers are alienated from computation and problem-solving. They don’t compile, build, or run code; they tweak pieces of giant systems that run far away in production and are supported by whoever drew the short straw and became “the 3:00 am guy”.

IDEs have some major benefits but some severe drawbacks. They’re good to the extent that they allow people to read code without breaking flow; they’re bad to the extent that they tend to require use patterns that break flow. The best solution, in my opinion, to the IDE problem is to have a read-only IDE served on the web. Engineers write code using a real editor, work at the command line so they are actually using a computer instead of an app, and do almost all of their work in a keyboard-driven environment. However, when they need to navigate others’ code in volume, the surfing (and possibly debugging) capabilities offered by IDEs should be available to them.

3. O(days): Progress

Flow and feedback are nice, but in the long term, programmers need to feel like they’re accomplishing something, or they’ll get bored. The feedback should show continual improvement and mastery. The day scale is the level at which programmers want to see genuine improvements. The same task shouldn’t be repeated more than a couple times: if it’s dull, automate it away. If the work environment is so constrained and slow that a programmer can’t log, on average, one meaningful accomplishment (feature added, bug fixed) per day, something is seriously wrong.  (Of course, most corporate programmers would be thrilled to fix one bug per week.)

The day-by-day level and the need for a sense of progress is where managers and engineers start to understand each other. They both want to see progress on a daily basis. So there’s a meeting point there. Unfortunately, managers have a tendency to pursue this in a counterproductive way, often inadvertently creating a Heisenberg problem (observation corrupts the observed) in their insistence on visibility into progress. I think that the increasing prevalence of Jira, for example, is dangerous, because increasing managerial oversight at a fine-grained level creates anxiety and makes flow impossible. I also think that most “agile” practices do more harm than good, and that much of the “scrum” movement is flat-out stupid. I don’t think it’s good for managers to expect detailed progress reports (a daily standup focused on blockers is probably ok) on a daily basis– that’s too much overhead and flow breakage– but this is the cycle at which engineers tend to audit themselves, and they won’t be happy in an environment where they end the day not feeling that they worked a day.

4. O(weeks): Support.

Progress is good, but as programmers, we tend toward a trait that the rest of the world sees only in the moody and blue: “depressive realism”. It’s as strong a tendency in the mentally healthy among us as the larger-than-baseline percentage who have mental health issues. For us, it’s not depressive. Managers are told every day how awesome they are by their subordinates, regardless of the fact that more than half of the managers in the world are useless. We, on the other hand, have subordinates (computers) that frequently tell us that we fucked up by giving them nonsensical instructions. “Fix this shit because I can’t compile it.” We tend to have an uncanny (by business standards) sense of our own limitations. We also know (on the topic of progress) that we’ll have good days and bad days. We’ll have weeks where we don’t accomplish anything measurable because (a) we were “blocked”, needing someone else to complete work before we could continue, (b) we had to deal with horrendous legacy code or maintenance work– massive productivity destroyers– or (c) the problem we’re trying to solve is extremely hard, or even impossible, but it took us a long time to fairly evaluate the problem and reach that conclusion.

Programmers want an environment that removes work-stopping issues, or “blockers”, and that gives them the benefit of the doubt. Engineers want the counterproductive among them to be mentored or terminated– the really bad ones just have to be let go– but they won’t show any loyalty to a manager if they perceive that he’d give them grief over a slow month. This is why so-called “performance improvement plans” (PIPs)– a bad idea in any case– are disastrous failures with engineers. Even the language is offensive, because it suggests with certainty that an observed productivity problem (and most corporate engineers have productivity problems because most corporate software environments are utterly broken and hostile to productivity) is a performance problem, and not something else. An engineer will not give one mote of loyalty to a manager that doesn’t give her the benefit of the doubt.

I choose “weeks” as the timeframe order of magnitude for this need because that’s the approximate frequency with which an engineer can be expected to encounter blockers, and removal of these is one thing that engineers often need from their managers: resolution of work-stopping issues that may require additional resources or (in rare cases) managerial intervention. However, that frequency can vary dramatically.

5. O(months): Career Development.

This is one that gets a bit sensitive, and it becomes crucial on the order of months, which is much sooner than most employers would like to see their subordinates insisting on career advancement, but, as programmers, we know we’re worth an order of magnitude more than we’re paid, and we expect to be compensated by our employers investing in our long-term career interests. This is probably the most important of the 6 items listed here.

Programmers face a job market that’s unusually meritocratic when changing jobs. Within companies, the promotion process is just as political and bizarre as it is for any other profession, but when looking for a new job, programmers are evaluated not on their past job titles and corporate associations, but on what they actually know. This is quite a good thing overall, because it means we can get promotions and raises (often having to change corporate allegiance in order to do so, but that’s a minor cost) just by learning things, but it also makes for an environment that doesn’t allow for intellectual stagnation. Yet most of the work that software engineers have to do is not very educational and, if done for too long, that sort of work leads in the wrong direction.

When programmers say about their jobs, “I’m not learning”, what they often mean is, “The work I am getting hurts my career.” Most employees in most jobs are trained to start asking for career advancement at 18 months, and to speak softly over the first 36. Most people can afford one to three years of dues paying. Programmers can’t. Programmers, if they see a project that can help their career and that is useful to the firm, expect the right to work on it right away. That rubs a lot of managers the wrong way, but it shouldn’t, because it’s a natural reaction to a career environment that requires actual skill and knowledge. In most companies, there really isn’t a required competence for leadership positions, so seniority is the deciding factor. Engineering couldn’t be more different, and the lifetime cost of two years’ dues-paying can be several hundred thousand dollars.

In software, good projects tend to beget good projects, and bad projects beget more crap work. People are quickly typecast to a level of competence based on what they’ve done, and they have a hard time upgrading, even if their level of ability is above what they’ve been assigned. People who do well on grunt work get more of it, people who do poorly get flushed out, and those who manage their performance precisely to the median can get ahead, but only if managers don’t figure out what they’re doing. As engineers, we understand the career dynamic very well, and quickly become resentful of management that isn’t taking this to heart. We’ll do an unpleasant project now and then– we understand that grungy jobs need to be done sometimes– but we expect to be compensated (promotion, visible recognition, better projects) for doing it. Most managers think they can get an undesirable project done just by threatening to fire someone if the work isn’t done, and that results in adverse selection. Good engineers leave, while bad engineers stay, suffer, and do it– but poorly.

Career-wise, the audit frequency for the best engineers is about 2 months. In most careers, people progress by putting in time, being seen, and gradually winning others’ trust, and actual skill growth is tertiary. That’s not true for us, or at least, not in the same way. We can’t afford to spend years paying dues while not learning anything. That will put us one or two full technology stacks behind the curve with respect to the outside world.

There’s a tension employees face between internal (within a company) and external (within their industry) career optimization. Paying dues is an internal optimization– it makes the people near you like you more, and therefore more likely to offer favors in the future– but confers almost no external-oriented benefit. It was worthwhile in the era of the paternalistic corporation, lifelong employment, and a huge stigma attached to changing jobs (much less getting fired) more than two or three times in one career. It makes much less sense now, so most people focus on the external game. Engineers who focus on the external objective are said to be “optimizing for learning” (or, sometimes, “stealing an education” from the boss). There are several superiorities to a focus on the external career game. First, external career advancement is not zero-sum–while jockeying internally for scarce leadership positions is– and what we do is innately cooperative. It works better with the type of people we are. Second, our average job tenure is about 2 to 3 years. Third, people who suffer and pay dues are usually passed over anyway in favor of more skilled candidates from outside. Our industry has figured out that it needs skilled people more than it needs reliable dues-payers (and it’s probably right). So this explains, in my view, why software engineers are so aggressive and insistent when it comes to the tendency to optimize for learning.

There is a solution for this, and although it seems radical, I’m convinced that it’s the only thing that actually works: open allocation. If programmers are allowed to choose the projects best suited to their skills and aspirations, the deep conflict of interest that otherwise exists among their work, their careers, and their educational aspirations will disappear.

6. O(years): Macroscopic goals. On the timescale of years, macroscopic goals become important. Money and networking opportunities are major concerns here. So are artistic and business visions. Some engineers want to create the world’s best video game, solve hard mathematical problems, or improve the technological ecosystem. Others want to retire early or build a network that will set them up for life.

Many startups focus on “change the world” macroscopic pitches about how their product will connect people, “disrupt” a hated industry or democratize a utility, or achieve some world-changing ambition. This makes great marketing copy for recruiters, but it doesn’t motivate people on the day-to-day basis. On the year-by-year basis, none of that marketing matters, because people will actually know the character of the organization after that much time. That said, the actual macroscopic character, and the meaning of the work, of a business matters a great deal. Over years and decades, it determines whether people will stick around once they develop the credibility, connections, and resources that would give them the ability to move on to something else more lucrative, more interesting, or of higher status.

How to win

It’s conventional wisdom in software that hiring the best engineers is an arbitrage, because they’re 10 times as effective but only twice as costly. This is only true if they’re motivated, and also if they’re put to work that unlocks their talent. If you assign a great engineer to mediocre work, you’re going to lose money. Software companies put an enormous amount of effort into “collecting” talent, but do a shoddy job of using or keeping it. Often, this is justified in the context of a “tough culture”; turnover is reflective of failing employees, not a bad culture. In the long term, this is ruinous. The payoff in retaining top talent is often exponential as a function of the time and effort put into attracting, retaining, and improving it.

Now that I’ve discussed what engineers need from their work environments in order to remain motivated, the next question is what a company should do. There isn’t a one-size-fits-all managerial solution to this. In most cases, the general best move is to reduce managerial control and to empower engineers: to set up an open-allocation work environment in which technical choices and project direction are set by engineers, and to direct from leadership rather than mere authority. This may seem “radical” in contrast to the typical corporate environment, but it’s the only thing that works.

Tech companies: open allocation is your only real option.

I wrote, about a month ago, about Valve’s policy of allowing employees to transfer freely within the company, symbolized by placing wheels under the desk (thereby creating a physical marker of their superior corporate culture that makes traditional tech perks look like toys) and expecting employees to self-organize. I’ve taken to calling this seemingly radical notion open allocation– employees have free rein to work on projects as they choose, without asking for permission or formal allocation– and I’m convinced that, despite seeming radical, open allocation is the only thing that actually works in software. There’s one exception. Some degree of closed allocation is probably necessary in the financial industry because of information barriers (mandated by regulators) and this might be why getting the best people to stay in finance is so expensive. It costs that much to keep good people in a company where open allocation isn’t the norm, and where the workflow is so explicitly directed and constrained by the “P&L” and by justifiable risk aversion. If you can afford to give engineers 20 to 40 percent raises every year and thereby compete with high-frequency-trading (HFT) hedge funds, you might be able to retain talent under closed allocation. If not, read on.

Closed allocation doesn’t work. What do I mean by “doesn’t work”? I mean that, as things currently go in the software industry, most projects fail. Either they don’t deliver any business value, or they deliver too little, or they deliver some value but exert long-term costs as legacy vampires. Most people also dislike their assigned projects and put minimal or even negative productivity into them. Good software is exceedingly rare, and not because software engineers are incompetent, but because when they’re micromanaged, they stop caring. Closed allocation and micromanagement provide an excuse for failure: I was on a shitty project with no upside. I was set up to fail. Open allocation blows that away: a person who has a low impact because he works on bad projects is making bad choices and has only himself to blame.

Closed allocation is the norm in software, and doesn’t necessarily entail micromanagement, but it creates the possibility for it, because of the extreme advantage it gives managers over engineers. An engineer’s power under closed allocation is minimal: his one bit of leverage is to change jobs, and that almost always entails changing companies. In a closed-allocation shop, project importance is determined prima facie by executives long before the first line of code is written, and formalized in magic numbers called “headcount” (even the word is medieval, so I wonder if people piss at the table, at these meetings, in order to show rank) that represent the hiring authority (read: political strength) of various internal factions. The intention of headcount numbers is supposed to be to prevent reckless hiring by the company on the whole, and that’s an important purpose, but their actual effect is to make internal mobility difficult, because most teams would rather save their headcount for possible “dream hires” who might apply from outside in the future, rather than risk a spot on an engineer with an average performance review history (which is what most engineers will have). Headcount bullshit makes it nearly impossible to transfer unless (a) someone likes you on a personal basis, or (b) you have a 90th-percentile performance review history (in which case you don’t need a transfer). Macroscopic hiring policies (limits, and sometimes freezes) are necessary to prevent the company from over-hiring, but internal headcount limits are one of the worst ideas ever. If people want to move, and the leads of those projects deem them qualified, there’s no reason not to allow this. It’s good for the engineers and for the projects that have more motivated people working on them.

When open allocation is in play, projects compete for engineers, and the result is better projects. When closed allocation is in force, engineers compete for projects, and the result is worse engineers. 

When you manage people like children, that’s what they become. Traditional, 20th-century management (so-called “Theory X”) is based on the principle that people are lazy and need to be intimidated into working hard, and that they’re unethical and need to be terrified of the consequences of stealing from the company, with a definition of “stealing” that includes “poaching” clients and talent, education on company time, and putting their career goals over the company’s objectives. In this mentality, the only way to get something decent out of a worker is to scare him by threatening to turn off his income– suddenly and without appeal. Micromanagement and Theory X are what I call the Aztec Syndrome: the belief in many companies that if there isn’t a continual indulgence in sacrifice and suffering, the sun will stop rising.

Psychologists have spent decades trying to answer the question, “Why does work suck?” The answer might be surprising. People aren’t lazy, and they like to work. Most people do not dislike the activity of working, but dislike the subordinate context (and closed allocation is all about subordination). For example, peoples’ minute-by-minute self-reported happiness tends to drop precipitously when they arrive at the office, and rise when they leave it, but it improves once they start actually working. They’re happier not to be at an office, but if they’re in an office, they’re much happier when working than when idle. (That’s why workplace “goofing off” is such a terrible idea; it does nothing for office stress and it lengthens the day.) People like work. It’s part of who we are. What they don’t like, and what enervates them, is the subordinate context and the culturally ingrained intimidation. This suggests the so-called “Theory Y” school of management, which is that people are intrinsically motivated to work hard and do good things, and that management’s role is to remove obstacles.

Closed allocation is all about intimidation: if you don’t have this project, you don’t have a job. Tight headcount policies and lockout periods make internal mobility extraordinarily difficult– much harder than getting hired at another company. The problem is that intimidation doesn’t produce creativity and it erodes peoples’ sense of ethics (when people are under duress, they feel less responsible for what they are doing). It also provides the wrong motivation: the goal becomes to avoid getting fired, rather than to produce excellent work.

Also, if the only way a company can motivate people to do a project is to threaten to turn off a person’s income, that company should really question whether that project’s worth doing at all.

Open allocation is not the same thing as “20% time”, and it isn’t a “free-for-all”. Open allocation does not mean “everyone gets to do what they want”. A better way to represent it is: “Lead, follow, or get out of the way” (and “get out of the way” means “leave the company”). To lead, you have to demonstrate that your product is of value to the business, and convince enough of your colleagues to join your project that it has enough effort behind it to succeed. If your project isn’t interesting and doesn’t have business value, you won’t be able to convince colleagues to bet their careers on it and the project won’t happen. This requires strong interpersonal skills and creativity. Your colleagues decide, voting with their feet, if you’re a leader, not “management”. If you aren’t able to lead, then you follow, until you have the skill and credibility to lead your own project. There should be no shame in following; that’s what most people will have to do, especially when starting out.

“20% time” (or hack days) should be exist as well, but that’s not what I’m talking about. Under open allocation, people are still expected to show that they’ve served the needs of the business during their “80% time”. Productivity standards are still set by the projects, but employees choose which projects (and sets of standards) they want to pursue. Employees unable to meet the standards of one project must find another one. 20% time is more open, because it entails permission to fail. If you want to do a small project with potentially high impact, or to prove that you have the ability to lead by starting a skunk-works project, or volunteer, take courses, or attend conferences on company time, that’s what it’s for. During their “80% time”, people are still expected to lead or follow on a project with some degree of sanction. They can’t just “do whatever they want”.

Four types of projects. The obvious question that open allocation raises is, “Who does the scut work?” The answer is simple: people do it if they will get promoted, formally or informally, for doing it, or if their project directly relies on it. In other words, the important but unpleasant work gets done, by people who volunteer to do it. I want to emphasize “gets done”. Under closed allocation, a lot of the unpleasant stuff never really gets done well, especially if unsexy projects don’t lead to promotions, because people are investing most of their energy into figuring out how to get to better projects. The roaches are swept under the carpet, and people plan their blame strategies months in advance.

If we classify projects into four categories by important vs. unimportant, and interesting vs. unpleasant, we can assess what happens under open allocation. Important and interesting projects are never hard to staff. Unimportant but interesting projects are for 20% time; they might succeed, and become important later, but they aren’t seen as critical until they’re proven to have real business value, so people are allowed to work on them but are strongly encouraged to also find and concentrate on work that’s important to the business. Important but unpleasant projects are rewarded with bonuses, promotions, and the increased credibility accorded to those who do undesirable but critical work. These bonuses should be substantial (six and occasionally even seven figures for critical legacy rescues); if the project is actually important, it’s worth it to actually pay. If it’s not, then don’t spend the money. Unimportant and unpleasant projects, under open allocation, don’t get done. That’s how it should be. This is the class of undesirable, “death march” projects that closed-allocation nurtures (they never go away, because to suggest they aren’t worth doing is an affront to the manager that sponsors them and a career-ending move) but that open allocation eliminates. Under open allocation, people who transfer away from these death marches aren’t “deserters”. It’s management’s fault if, out of a whole company, no one wants to work on the project. Either the project’s not important, or they didn’t provide enough enticement.

Closed allocation is irreducibly political. Compare two meanings of the three-word phrase, “I’m on it”. In an open-allocation shop, “I’m on it” is a promise to complete a task, or at least to try to do it. It means, “I’ve got this.” In a closed-allocation shop, “I’m on it” means “political forces outside of my control require me to work only on this project.”

People complain about the politics at their closed-allocation jobs, but they shouldn’t, because it’s inevitable that politics will eclipse the matter of actually getting work done. It happens every time, like clockwork. The metagame becomes a million times more important than actually sharpening pencils or writing code. If you have closed allocation, you’ll have a political rat’s nest. There’s no way to avoid it. In closed allocation, the stakes of project allocation are so high that people are going to calculate every move based on future mobility. Hence, politics. What tends to happen is that a four-class system emerges, resulting from the four categories of work that I developed above. The most established engineers, who have the autonomy and leverage to demand the best projects, end up in the “interesting and important” category. They get good projects the old-fashioned way: proving that they’re valuable to the company, then threatening to leave if they aren’t reassigned. Engineers who are looking for promotions into managerial roles tend to take on the unpleasant but important work, and attempt to coerce new and captive employees into doing the legwork. The upper-middle class of engineers can take the interesting but unimportant work, but it tends to slow their careers if they intend to stay at the same company (they learn a lot, but they don’t build internal credibility). The majority and the rest, who have no significant authority over what they work on, get a mix, but a lot of them get stuck with the uninteresting, unimportant work (and closed-allocation shops generate tons of that stuff) that exists for reasons rooted in managerial politics.

What are the problems with open allocation? The main issue with open allocation is that it seems harder to manage, because it requires managers to actively motivate people to do the important but unpleasant work. In closed allocation, people are told to do work “because I said so”. Either they do it, or they quit, or they get fired. It’s binary, which seems simple. There’s no appeal process when people fail projects or projects fail people– and no one ever knows which happened– and extra-hierarchical collaboration is “trimmed”, and efforts can be tracked by people who think a single spreadsheet can capture everything important about what is happening in the company. Closed-allocation shops have hierarchy and clear chains of command, and single-points-of-failure (because a person can be fired from a whole company for disagreeing with one manager) out the proverbial wazoo. They’re Soviet-style command economies that somehow ended up being implemented within supposedly “capitalist” companies, but they “appear” simple to manage, and that’s why they’re popular. The problem with closed allocation policies is that they lead to enormous project failure rates, inefficient allocation of time, talent bleeds, and unnecessary terminations. In the long term, all of this unplanned and surprising garbage work makes the manager’s job harder, more complex, and worse. When assessing the problems associated with open allocation (such as increased managerial complexity) it’s important to consider that the alternative is much worse.

How do you do it? The challenging part of open allocation is enticing people to do unpleasant projects. There needs to be a reward. Make the bounty too high, and people come in with the wrong motivations (capturing the outsized reward, rather than getting a fair reward while helping the company) and the perverse incentives can even lead to “rat farming” (creating messes in the hopes of being asked to repair them at a premium). Make it too low, and no one will do it, because no one wise likes a company well enough to risk her own career on a loser project (and part of what makes a bad project bad is that, absent recognition, it’s career-negative to do undesirable work). Make the reward too monetary and it looks bad on the balance sheet, and gossip is a risk: people will talk if they find out a 27-year-old was paid $800,00o in stock options (note: there had better be vesting applied) even if it’s justified in light of the legacy dragon being slain. Make it too career-focused and you have people getting promotions they might not deserve, because doing unpleasant work doesn’t necessarily give a person technical authority in all areas. It’s hard to get the carrot right. The appeal of closed allocation is that the stick is a much simpler tool: do this shit or I’ll fire you.

The project has to be “packaged”. It can’t be all unpleasant and menial work, and it needs to be structured to involve some of the leadership and architectural tasks necessary for the person completing it to actually deserve the promised promotion. It’s not “we’ll promote you because you did something grungy” but “if you can get together a team to do this, you’ll all get big bonuses, and you’ll get a promotion for leading it.” Management also needs to have technical insight on hand in order to do this: rather than doing grunt work as a recurring cost, kill it forever with automation.

An important notion in all this is that of a committed project. Effectively, this is what the executives should create if they spot a quantum of work that the business needs but that is difficult and does not seem to be enjoyable in the estimation of the engineers. These shouldn’t be created lightly. Substantial cash and stock bonuses (vested, over the expected duration of the project) and promotions are associated with completing these projects, and if more than 25% of the workload is committed projects, something’s being done wrong. A committed project offers high visibility (it’s damn important; we need this thing) and graduation into a leadership role. No one is “assigned” to a committed project. People “step up” and work on them because of the rewards. If you agree to work on a committed project, you’re expected to make a good-faith effort to see it through for an agreed-upon period of time (typically, a year). You do it no matter how bad it gets (unless you’re incapable) because that’s what leadership is. You should not “flake out” because you get bored. Your reputation is on the line.

Companies often delegate the important but undesirable work in an awkward way. The manager gets a certain credibility for taking on a grungy project, because he’s usually at a level where he has basic autonomy over his work and what kinds of projects he manages. If he can motivate a team to accomplish it, he gets a lot of credit for taking on the gnarly task. The workers, under closed allocation, get zilch. They were just doing their jobs. The consequence of this is that a lot of bodies end up buried by people who are showing just enough presence to remain in good standing, but putting the bulk of their effort into moving to something better. Usually, it’s new hires without leverage who get staffed on these bad projects.

I’d take a different approach to committed projects. Working on one requires (as the name implies) commitment. You shouldn’t flake out because something more attractive comes along. So only people who’ve proven themselves solid and reliable should be working on (much less leading) them. To work on one (beyond a 20%-time basis) you have to have been at the company for at least a year, senior enough for the leadership to believe that you have the ability to deliver, and in strong standing at the company. Unless hired at senior roles, I’d never let a junior hire take on a committed project unless it was absolutely required– too much risk.

How do you fire people? When I was in school, I enjoyed designing and playing with role-playing systems. Modeling a fantasy world is a lot of fun. Once I developed an elaborate health mechanic that differentiated fatigue, injury, pain, blood loss, and “magic fatigue” (which affected magic users) and aggregated them (determining attribute reductions and eventual incapacitation) in what I considered to be a novel way. One small detail I didn’t include was death, so the first question I got was, “How do you die?” Of course, blood loss and injuries could do it. In a no-magic, medieval world, loss of the head is an incapacitating and irreversible injury, and exsanguination is likewise. However, in a high-magic world, “death” is reversible. Getting roasted, eaten and digested by a dragon might be reversible. But there has to be a possibility (though it doesn’t require a dedicated game mechanic) for a character to actually die in the permanent, create-a-new-character sense of the word. Otherwise there’s no sense of risk in the game: it’s just rolling dice to see how fast you level up. My answer was to leave that decision to the GM. In horror campaigns, senseless death (and better yet, senseless insanity) is part of the environment. It’s a world in which everything is trying to kill you and random shit can end your quest. But in high-fantasy campaigns with magic and cinematic storylines, I’m averse to characters being “killed by the dice”. If the character is at the end of his story arc, or does something inane like putting his head in a dragon’s mouth because he’s level 27 and “can’t be killed”, then he dies for real. Not “0 hit points”, but the end of his earthly existence. But he shouldn’t die because the player is hapless enough to roll 4 “1″s in a row on a d10. Shit happens.

The major problem with “rank and yank” (stack-ranking with enforced culling rates) and especially closed allocation is that a lot of potentially great employees are killed by the dice. It becomes part of the rhythm of the company for good people to get inappropriate projects or unfair reviews, blow up mailing lists or otherwise damage morale when it pisses them off, then get fired or quit in a huff. Yawn… another one did that this week. As I alluded in my Valve essay, this is the Welch Effect: the ones who get fired under rank-and-yank policies are rarely low performers, but junior members of macroscopically underperforming teams (who rarely have anything to do with this underperformance). The only way to enforce closed allocation is to fire people who fail to conform to it, but this also means culling the unlucky whose low impact (for which they may not be at fault) appears like malicious noncompliance.

Make no mistake: closed allocation is as much about firing people as guns are about killing people. If people aren’t getting fired, many will work on what they want to anyway (ignoring their main projects) and closed allocation has no teeth. In closed allocation shops, firings become a way for the company to clean up its messes. “We screwed this guy over by putting him on the wrong project; let’s get rid of him before he pisses all over morale.” Firings and pseudo-firings (“performance improvement plans” and transfer blocks and intentional dead-end allocations) become common enough that they’re hard to ignore. People see them, and that they sometimes happen to good people. And they scare people, especially because the default in non-financial tech companies is to fire quickly (“fail fast”) and without severance. It’s a really bad arrangement.

Do open-allocation shops have to fire people? The answer is an obvious “yes”, but it should be damn rare. The general rule of good firing is: mentor subtracters, fire dividers. Subtracters are good-faith employees who aren’t pulling their weight. They try, but they’re not focused or skilled enough to produce work that would justify keeping them on the payroll. Yet. Most employees start as subtractors, and the amount of time it takes to become an adder varies. Most companies try to set guidelines for how long an employee is allowed to take to become an adder (usually about 6 months). I’d advise against setting a firm timeframe, because what’s important is now how fast a person has learned (she might have had a rocky start) but how fast, and more importantly how well, she can learn.

Subtracters are, except in an acute cash crisis when they must be laid off for business reasons, harmless. They contribute microscopically to the burn rate, but they’re usually producing some useful work, and getting better. They’ll be adders and multipliers soon. Dividers are the people who make whole teams (or possibly the whole company) less productive. Unethical people are dividers, but so are people whose work is of so low quality that messes are created for others, and people whose outsized egos produce conflicts. Long-term (18+ months) subtractors become “passive” dividers because of their morale effects, and have to be fired for the same reason. Dividers smash morale, and they’re severe culture threats. No matter how rich your company is and how badly you may want not to fire people, you have to get rid of dividers if they don’t reform immediately. Dividers ratchet up their toxicity until they are capable of taking down an entire company. Firing can be difficult because many dividers shine as individual contributors (“rock stars”) but taketh away in their effects on morale, but there’s no other option.

My philosophy of firing is that the decision should be made rarely, swiftly, for objective reasons, and with a severance package sufficient to cover the job search (unless the person did something illegal or formally unethical) that includes non-disclosure, non-litigation and non-disparagement. This isn’t about “rewarding failure”. It’s about limiting risk. When you draft “performance improvement plans” to justify termination without severance, you’re externalizing the cost to people who have to work with a divider who’s only going to get worse post-PIP. Companies escort fired employees out of the building, which is a harsh but necessary risk-limiting measure; but it’s insane to leave a PIP’d employee in access for two months. Moreover, when you cold-fire someone, you’re inviting disparagement, gossip, and lawsuits. Just pay the guy to go away. It’s the cheapest and lowest-variance option. Three months of severance and you never see the guy again. Good. Six months and you he speaks highly of you and your company: he had a rocky time, you took care of him, and he’s (probably) better-off now. (If you’re tight on money, which most startups are, stay closer to the 3-month mark. You need to keep expenses low more than you need fired employees to be your evangelists. If you’re really tight, replace the severance with a “gardening leave” package that continues his pay only until he starts his next job.)

If you don’t fire dividers, you end up with something that looks a lot like closed allocation. Dividers can be managers (a manager can only be a multiplier or divider, and in my experience, at least half are dividers) or subordinates, but dividers tend to intimidate. Subordinate passive dividers intimidate through non-compliance (they won’t get anything done) while active dividers either use interpersonal aggression or sabotage to threaten or upset people (often for no personal gain). Managerial (or proto-managerial) dividers tend to threaten career adversity (including bad reviews, negative gossip, and termination) in order to force people to put the manager’s career goals above their own. They can’t motivate through leadership, so they do it using intimidation and (if available) authority, and they draw people into captivity to get done the work they want, without paying for it on a fair market (i.e. providing an incentive to do the otherwise undesirable work). At this point, what you have is a closed-allocation company. What this means is that open allocation has to be protected: you do it by firing the threats.

If I were running a company, I think I’d have a 70% first-year “firing” (by which I mean removal from management; I’d allow lateral moves into IC roles for those who desired to do so) rate for titled managers. By “titled manager”, I mean someone with the authority and obligation to participate in dispute resolution, terminations and promotions, and packaging committed projects. Technical leadership opportunities would be available to anyone who could convince people to follow them, but to be a titled people manager you’d have to pass a high bar. (You’d have to be as good at it as I would be, and for 70 to 80 percent of the managers I’ve observed, I’d do a better job.) This high attrition rate would be offset by a few cultural factors and benefits. First, “failing” in the management course wouldn’t be stigmatized because it would be well-understood that most people either end it voluntarily, or aren’t asked to continue. People would be congratulated for trying out, and they’d still be just as eligible to lead projects– if they could convince others to follow. Second, those who aspired specifically to people-management and weren’t selected would be entitled (unless fully terminated for doing something unethical or damaging) to a six-month leave period in which they’d be permitted to represent themselves as employed. That’s what B+ and A- managers would get– the right to remain as individual contributors (at the same rank and pay) and, if they didn’t want that, a severance offer along with a strong reference if they wished to pursue people management in other companies– but not at this one.

Are there benefits to closed allocation? I can answer this with strong confidence. No, not in typical technology companies. None exist. The work that people are “forced” to do is of such low quality that, on balance, I’d say it provides zero expectancy. In commodity labor, poorly motivated employees are about half as productive as average ones, and the best are about twice as productive. Intimidating the degenerate slackers into bringing themselves up to 0.5x from zero makes sense. In white-collar work and especially in technology, those numbers seem to be closer to -5 and +20, not 0.5 and 2.

You need closed (or at least controlled) allocation over engineers if there is material proprietary information where even superficial details would represent, if divulged, an unacceptable breach: millions of dollars lost, company under existential threat, classified information leaked. You impose a “need-to-know” system over everything sensitive. However, this most often requires keeping untrusted, or just too many people, out of certain projects (which would be designated as committed projects under open allocation). It doesn’t require keeping people stuck on specific work. Full-on closed allocation is only necessary when there are regulatory requirements that demand it (in some financial cases) or extremely sensitive proprietary secrets involved in most of the work– and comments in public-domain algorithms don’t count (statistical arbitrage strategies do).

What does this mean? Fundamentally, this issue comes down to a simple rule: treat employees like adults, and that’s what they’ll be. Investment banks and hedge funds can’t implement total open allocation, so they make up the difference through high compensation (often at unambiguously adult levels) and prestige (which enables lateral promotions for those who don’t move up quickly). On the other hand, if you’re a tiny startup with 30-year-old executives, you can’t afford banking bonuses, and you don’t have the revolving door into $400k private equity and hedge fund positions that the top banks do, so employee autonomy (open allocation) is the only way for you to do it. If you want adults to work for you, you have to offer autonomy at a level currently considered (even in startups) to be extreme.

If you’re an engineer, you should keep an eye out for open-allocation companies, which will become more numerous as the Valve model proves itself repeatedly and all over the place (it will, because the alternative is a ridiculous and proven failure). Getting good work will improve your skills and, in the long run, your career. So work for open-allocation shops if you can. Or, you can work in a traditional closed-allocation company and hope you get (and continue to get) handed good projects. That means you work for (effectively, if not actually) a bank or a hedge fund, and that’s fine, but you should expect to be compensated accordingly for the reduction in autonomy. If you work for a closed-allocation ad exchange, you’re a hedge-fund trader and you deserve to be paid like one.

If you’re a technology executive, you need to seriously consider open allocation. You owe it to your employees to treat them like adults, and you’ll be pleasantly surprised to find that that’s what they become. You also owe it to your managers to free them from the administrative shit-work (headcount fights, PIPs and terminations) that closed allocation generates. Finally, you owe it to yourself; treat yourself to a company whose culture is actually worth caring about.

XWP vs. JAP

The software industry is a fascinating place. As programmers, we have the best and worst job in the world. What we do is so rewarding and challenging that many of us have been doing it for free since we were eight. We’re paid to think, and to put pure logic into systems that effectively do our bidding. And yet, we have (as a group) extremely low job satisfaction. We don’t last long: our half-life is about six years. By 30, most of us have decided that we want to do something else: management, quantitative finance, or startup entrepreneurship. It’s not programming itself that drives us out, but the ways in which the “programmer” job has been restructured out of our favor. It’s been shaved down, mashed, melted and molded into commodity grunt work, except for the top 2 percent or so of our field (for whom as much time is spent establishing that one is, in fact, in the top 2 percent, as is spent working). Most of us have to sit in endless meetings, follow orders that make no sense, and maintain legacy code with profanity such as “VisitorFactoryFactory” littered about, until we move “into management”, often landing in a role that is just as tedious but carries (slightly) more respect.

I’m reaching a conclusion, and it’s not a pleasant one, about our industry and what one has to do to survive it. My definition of “survive” entails progress, because while it’s relatively easy to coast, engineers who plateau are just waiting to get laid off, and will usually find that demand for their (increasingly out of date) skills has declined. Plainly put, there’s a decision that programmers have to make if they want to get better. Why? Because you only get better if you get good projects, and you only get good projects if you know how to play the game. Last winter, I examined the trajectory of software engineers, and why it seems to flat-line so early. The conclusion I’ve come to is that there are several ceilings, three of which seem visible and obvious, and each requires a certain knack to get past it. Around 0.7 to 0.8 there’s the “weed out” effect that’s  rooted in intellectual limitations: inability to grasp pointers, recursion, data in sets, or other intellectual concepts people need to understand if they’re going to be adequate programmers. Most people who hit this ceiling do so in school, and one hopes they don’t become programmers. The next ceiling, which is where the archetypical “5:01″ mediocrities live, is around 1.2. This is where you finish up if you just follow orders, don’t care about “functional programming” because you can’t see how it could possibly apply to your job, and generally avoid programming outside of an office context.

The next ceiling is around 1.8, and it’s what I intend to discuss. The 0.7 ceiling is a problem of inability, and at 1.2 it’s an issue of desire and willingness. There are a lot of programmers who don’t have a strong desire to get any better than average. Average is employable, middle-class, comfortable. That keeps a lot of people complacent around 1.2. The ceiling at 1.8, on the other hand, comes from the fact that it’s genuinely hard to get allocated 1.8+ level work, which usually involves technical and architectural leadership. In most companies, there are political battles that the projects’ originators must fight to get them on the map, and others that engineers must involve themselves in if they want to get on to the best projects. It’s messy and hard and ugly and it’s the kind of process that most engineers hate.

Many engineers at the 1.7 to 1.8 level give up on engineering progress and take this ceiling as a call to move into management. It’s a lot harder to ensure a stream of genuinely interesting work than it is to take a middle management position. The dream is that the managerial position will allow the engineer to allocate the best technical work to himself and delegate the crap. The reality is that he’s lucky if he gets 10 hours per week of coding time in, and that managers who cherry-pick the good work and leave the rest to their subordinates are often despised and therefore ineffective.

This said, there’s an idea here, and it deserves attention. The sudden desire to move into management occurs when engineers realize that they won’t progress by just doing their assigned work, and that they need to hack the project allocation process if they want to keep getting better. Managerial authority seems like the most direct route to this because, after all, it’s managers who assign the projects. The problem with that approach is that managerial work requires an entirely different skill set, and that while this is a valid career, it’s probably not what one should pursue if one wants to get better as a software engineer.

How does one hack project allocation? I’m going to introduce a couple terms. The first is J.A.P.: “Just A Programmer”. There are a lot of people in business who see programming as commodity work: that’s why most of our jobs suck. This is a self-perpetuating cycle: because of such peoples’ attitudes toward programmers, good engineers leave them, leaving them with the bad, and reinforcing their perception that programming is order-following grunt work that needs to be micromanaged or it won’t be done right at all. Their attitude toward the software engineer is that she’s “just a programmer”. Hence the term. There’s a related cliche in the startup world involving MBA-toting “big-picture guys” who “just need a programmer” to do all the technical work in exchange for a tiny sliver of the equity. What they get, in return, is rarely quality.

Worse yet for the business side, commodity programmers aren’t 90 or 70 or 50 percent as valuable as good engineers, but 5 to 10 percent as useful, if that. The major reason for this is that software projects scale horribly in terms of the number of people involved with them. A mediocre engineer might be 20 percent as effective, measured individually, as a good one, but four mediocre engineers will only be about 35 percent (not 100) as effective as a single good engineer.

Good programmers dread the “Just A Programmer” role in which they’re assessed on the quantity of code they crank out rather than the problems they solve and the quality of their solutions, and they avoid such positions especially because commodity-programmer roles tend to attract ineffective programmers, and effective people who have to work with ineffective programmers become, themselves, ineffective.

This said, a 1.8 engineer is not a “commodity programmer”. At this level, we’re talking about people who are probably in the top 2 or 3 percent of the software industry. We’re talking about people who, in a functioning environment, will deliver high-quality and far-reaching software solutions reliably. They can start from scratch and deliver an excellent “full-stack” solution. (In a dysfunctional environment, they’ll probably fail if they don’t quit first.)  The political difficulty, and it can be extreme, lies with the fact that it’s very difficult for a good engineer to reliably establish (especially to non-technical managers, and to those managers’ favorites who may not be technically strong) that she is good. It turns out that, even if it’s true, you can’t say to your boss, “I’m a top-2% engineer and deserve more autonomy and the best projects” and expect good results. You have to show it, but you can’t show it unless you get good projects.

What this means, in fewer words, is that it’s very difficult for a software engineer to prove he’s not a commodity programmer without hacking the politics. Perversely, many software environments can get into a state where engineering skill becomes negatively correlated with political success. For example, if the coding practices are “best practices”, “design pattern”-ridden Java culture, with FactoryVisitorSingletonSelection patterns all over the place, bad engineers have an advantage on account of being more familiar with damaged software environments, and because singleton directories called “com” don’t piss them off as much (since they never venture outside of an IDE anyway).

Software wants to be a meritocracy, but the sad reality is that effectiveness of an individual programmer depends on the environment. Drop a 1.8+ engineer into a Visitor-infested Java codebase and he turns into a bumbling idiot, in the same way that an incompetent player at a poker table can fluster experts (who may not be familiar with that particular flavor of incompetence). The result of this is that detecting who the good programmers are, especially for a non-programmer or an inept one, is extremely difficult, if not impossible. The 1.7 to 1.8 level is where software engineers realize that, in spite of their skill, they won’t be recognized as having it unless they can ensure a favorable environment and project allocation, and that it’s next to impossible to guarantee these benefits in the very long run without some kind of political advantage. Credibility as a software engineer alone won’t cut it, because you can’t establish that creativity unless you get good projects.

Enter the “X.W.P.” distinction, which is the alternative to being a “J.A.P.” It means an “X Who Programs”, where X might be an entrepreneur, a researcher, a data scientist, a security specialist, a quant, or possibly even a manager. If you’re an XWP, you can program, and quite possibly full-time, but you have an additional credibility that is rooted in something other than software engineering. Your work clearly isn’t commodity work; you might have a boss, but he doesn’t believe he could do your job better than you can. XWP is the way out. But you also get to code, so it’s the best of both worlds.

This might seem perverse and unusual. At 1.8, the best way to continue improving as a software engineer is not to study software engineering. You might feel like there’s still a lot to learn in that department, and you’re right, but loading up on unrecognized skill is not going to get you anywhere. It leads to bitterness and slow decline. You need something else.

One might think that an XWP is likely to grow as an X but not as a software engineer, but I don’t think that’s necessarily true. There certainly are quants and data scientists and entrepreneurs and game designers who remain mediocre programmers, but they don’t have to. If they want to become good engineers, they have an advantage over vanilla software engineers on account of the enhanced respect accorded their role. If a Chief Data Scientist decides that building a distributed system is the best way to solve a machine learning problem, and he’s willing to roll his sleeves up and write the code, the respect that this gives him will allow him to take the most interesting engineering work. This is how you get 1.8 and 2.1 and 2.4-level engineering work. You start to bill yourself as something other than a software engineer and get the respect that entitles you to projects that will make you better. You find an X and become an X, but you also know your way around a computer. You’re an X, and you know how to code, and your “secret weapon” (secret because management in most companies won’t recognize it) is that you’re really good at it, too.

This, perhaps, is the biggest surprise I’ve encountered in the bizarre world that is the software engineering career. I am so much without words that I’ll use someone else’s, from A Song of Ice and Fire: To go west, you must first go east.

What is spaghetti code?

One of the easiest ways for an epithet to lose its value is for it to become over-broad, which causes it to mean little more than “I don’t like this”. Case in point is the term, “spaghetti code”, which people often use interchangeably with “bad code”. The problem is that not all bad code is spaghetti code. Spaghetti code is an especially virulent but specific kind of bad code, and its particular badness is instructive in how we develop software. Why? Because individual people rarely write spaghetti code on their own. Rather, certain styles of development process make it increasingly common as time passes. In order to assess this, it’s important first to address the original context in which “spaghetti code” was defined: the dreaded (and mostly archaic) goto statement.

The goto statement is a simple and powerful control flow mechanism: jump to another point in the code. It’s what a compiled assembly program actually does in order to transfer control, even if the source code is written using more modern structures like loops and functions. Using goto, one can implement whatever control flows one needs. We also generally agree, in 2012, that goto is flat-out inappropriate for source code in most modern programs. Exceptions to this policy exist, but they’re extremely rare. Most modern languages don’t even have it.

Goto statements can make it difficult to reason about code, because if control can bounce about a program, one cannot make guarantees about what state a program is in when it executes a specific piece of code. Goto-based programs can’t easily be broken down into component pieces, because any point in the code can be wormholed to any other. Instead, they devolve into an “everything is everywhere” mess where to understand a piece of the program requires understanding all of it, and the latter becomes flat-out impossible for large programs. Hence the comparison to spaghetti, where following one thread (or noodle) often involves navigating through a large tangle of pasta. You can’t look at a bowl of noodles and see which end connects to which. You’d have to laboriously untangle it.

Spaghetti code is code where “everything is everywhere”, and in which answering simple questions such as (a) where a certain piece of functionality is implemented, (b) determining where an object is instantiated and how to create it, and (c) assessing a critical section for correctness, just to name a few examples of questions one might want to ask about code, require understanding the whole program, because of the relentless pinging about the source code that answer simple questions requires. It’s code that is incomprehensible unless one has the discipline to follow each noodle through from one side to the other. That is spaghetti code.

What makes spaghetti code dangerous is that it, unlike other species of bad code, seems to be a common byproduct of software entropy. If code is properly modular but some modules are of low quality, people will fix the bad components if those are important to them. Bad or failed or buggy or slow implementations can be replaced with correct ones while using the same interface. It’s also, frankly, just much easier to define correctness (which one must do in order to have a firm sense of what “a bug” is) over small, independent functions than over a giant codeball designed to do too much stuff. Spaghetti code is evil because (a) it’s a very common subcase of bad code, (b) it’s almost impossible to fix without causing changes in functionality, which will be treated as breakage if people depend on the old behavior (potentially by abusing “sleep” methods, thus letting a performance improvement cause seemingly unrelated bugs!) and (c) it seems, for reasons I’ll get to later, not to be preventable through typical review processes.

The reason I consider it important to differentiate spaghetti code from the superset, “bad code”, is that I think a lot of what makes “bad code” is subjective. A lot of the conflict and flat-out incivility in software collaboration (or the lack thereof) seems to result from the predominantly male tendency to lash out in the face of unskilled creativity (or a perception of such, and in code this is often an extremely biased perception): to beat the pretender to alpha status so badly that he stops pestering us with his incompetent displays. The problem with this behavior pattern is that, well, it’s not useful and it rarely makes people better at what they’re trying to do. It’s just being a prick. There are also a lot of anal-retentive wankbaskets out there who define good and bad programmers based on cosmetic traits so that their definition of “good code” is “code that looks like I wrote it”. I feel like the spaghetti code problem is better-defined in scope than the larger but more subjective problem of “bad code”. We’ll never agree on tabs-versus-spaces, but we all know that spaghetti code is incomprehensible and useless. Moreover, as spaghetti code is an especially common and damaging case of bad code, assessing causes and preventions for this subtype may be generalizable to other categories of bad code.

People usually use “bad code” to mean “ugly code”, but if it’s possible to determine why a piece of code is bad and ugly, and to figure out a plausible fix, it’s already better than most spaghetti code. Spaghetti code is incomprehensible and often unfixable. If you know why you hate a piece of code, it’s already above spaghetti code in quality, since the latter is just featureless gibberish.

What causes spaghetti code? Goto statements were the leading cause of spaghetti code at one time, but goto has fallen so far out of favor that it’s a non-concern. Now the culprit is something else entirely: the modern bastardization of object-oriented programming. Inheritance is an especially bad culprit, and so is premature abstraction: using a parameterized generic with only one use case in mind, or adding unnecessary parameters. I recognize that this claim– that OOP as practiced is spaghetti code– is not a viewpoint without controversy. Nor was it without controversy, at one time, that goto was considered harmful.

One of the biggest problems in comparative software (that is, the art of comparing approaches, techniques, languages, or platforms) is that most comparisons focus on simple examples. At 20 lines of code, almost nothing shows its evilness, unless it’s contrived to be dastardly. A 20-line program written with goto will usually be quite comprehensible, and might even be easier to reason about than the same program written without goto. At 20 lines, a step-by-step instruction list with some explicit control transfer is a very natural way to envision a program. For a static program (i.e. a platonic form that need never be changed and incurs no maintenance) that can be read in one sitting, that might be a fine way to structure it. At 20,000 lines, the goto-driven program becomes incomprehensible. At 20,000 lines, the goto-driven program has been hacked and expanded and tweaked so many times that the original vision holding the thing together has vanished, and the fact that a program can be in a piece of code “from anywhere” means that to safely modify the code requires confidence quantified by “from everywhere”. Everything is everywhere. Not only does this make the code difficult to comprehend, but it means that every modification to the code is likely to make it worse, due to unforeseeable chained consequences. Over time, the software becomes “biological”, by which I mean that it develops behaviors that no one intended but that other software components may depend on in hidden ways.

Goto failed, as a programming language construct, because of these problems imposed by the unrestricted pinging about a program that it created. Less powerful, but therefore more specifically targeted, structures such as procedures, functions, and well-defined data structures came into favor. For the one case where people needed global control flow transfer (error handling) exceptions were developed. This was a progress from the extreme universality and abstraction of a goto-driven program to the concretion and specificity of pieces (such as procedures) solving specific problems. In unstructured programming, you can write a Big Program that does all kinds of stuff, add features on a whim, and alter the flow of the thing as you wish. It doesn’t have to solve “a problem” (so pedestrian…) but it can be a meta-framework with an embedded interpreter! Structured programming encouraged people to factor their programs into specific pieces that solved single problems, and to make those solutions reusable when possible. It was a precursor of the Unix philosophy (do one thing and do it well) and functional programming (make it easy to define precise, mathematical semantics by eschewing global state).

Another thing I’ll say about goto is that it’s rarely needed as a language-level primitive. One could achieve the same effect using a while-loop, a “program counter” variable defined outside that loop that the loop either increments (step) or resets (goto) and a switch-case statement using it. This could, if one wished, be expanded into a giant program that runs as one such loop, but code like this is never written. What the fact that this is almost never done seems to indicate is that goto is rarely needed. Structured programming thereby points out the insanity of what one is doing when attempting severely non-local control flows.

Still, there was a time when abandoning goto was extremely controversial, and this structured programming idea seemed like faddish nonsense. The objection was: why use functions and procedures when goto is strictly more powerful?

Analogously, why use referentially transparent functions and immutable records when objects are strictly more powerful? An object, after all, can have a method called run or call or apply so it can be a function. It can also have static, constant fields only and be a record. But it can also do a lot more: it can have initializers and finalizers and open recursion and fifty methods if one so chooses. So what’s the fuss about this functional programming nonsense that expects people to build their programs out of things that are much less powerful, like records whose fields never change and whose classes contain no initialization magic?

The answer is that power is not always good. Power, in programming, often advantages the “writer” of code and not the reader, but maintenance (i.e. the need to read code) begins subjectively around 2000 lines or 6 weeks, and objectively once there is more than one developer on a project. On real systems, no one gets to be just a “writer” of code. We’re readers, of our own code and of that written by others. Unreadable code is just not acceptable, and only accepted because there is so much of it and because “best practices” object-oriented programming, as deployed at many software companies, seem to produce it. A more “powerful” abstraction is more general, and therefore less specific, and this means that it’s harder to determine exactly what it’s used for when one has to read the code using it. This is bad enough, but single-writer code usually remains fairly disciplined: the powerful abstraction might have 18 plausible uses, but only one of those is actually used. There’s a singular vision (although usually an undocumented one) that prevents the confusion. The danger sets in when others who are not aware of that vision have to modify the code. Often, their modifications are hacks that implicitly assume one of the other 17 use cases. This, naturally, leads to inconsistencies and those usually result in bugs. Unfortunately, people brought in to fix these bugs have even less clarity about the original vision behind the code, and their modifications are often equally hackish. Spot fixes may occur, but the overall quality of the code declines. This is the spaghettification process. No one ever sits down to write himself a bowl of spaghetti code. It happens through a gradual “stretching” process and there are almost always multiple developers responsible. In software, “slippery slopes” are real and the slippage can occur rapidly.

Object-oriented programming, originally designed to prevent spaghetti code, has become (through a “design pattern” ridden misunderstanding of it) one of the worst sources of it. An “object” can mix code and data freely and conform to any number of interfaces, while a class can be subclassed freely about the program. There’s a lot of power in object-oriented programming, and when used with discipline, it can be very effective. But most programmers don’t handle it well, and it seems to turn to spaghetti over time.

One of the problems with spaghetti code is that it forms incrementally, which makes it hard to catch in code review, because each change that leads to “spaghettification” seems, on balance, to be a net positive. The plus is that a change that a manager or customer “needs yesterday” gets in, and the drawback is what looks like a moderate amount of added complexity. Even in the Dark Ages of goto, no one ever sat down and said, “I’m going to write an incomprehensible program with 40 goto statements flowing into the same point.”  The clutter accumulated gradually, while the program’s ownership transferred from one person to another. The same is true of object-oriented spaghetti. There’s no specific point of transition from an original clean design to incomprehensible spaghetti. It happens over time as people abuse the power of object-oriented programming to push through hacks that would make no sense to them if they understood the program they were modifying and if more specific (again, less powerful) abstractions were used. Of course, this also means that fault for spaghettification is everywhere and nowhere at the same time: any individual developer can make a convincing case that his changes weren’t the ones that caused the source code to go to hell. This is part of why large-program software shops (as opposed to small-program Unix philosophy environments) tend to have such vicious politics: no one knows who’s actually at fault for anything.

Incremental code review is great at catching the obvious bad practices, like mixing tabs and spaces, bad variable naming practices, and lines that are too long. That’s why the more cosmetic aspects of “bad code” are less interesting (using a definition of “interesting” synonymous with “worrisome”) than spaghetti code. We already know how to solve them in incremental code review. We can even configure our continuous-integration servers to reject such code. As for spaghetti code, where there is no clear definition, this is difficult if not impossible to do. Whole-program review is necessary to catch that, but I’ve seen very few companies willing to invest the time and political will necessary to have actionable whole-program reviews. Over the long term (10+ years) I think it’s next to impossible, except among teams writing life- or mission-critical software, to ensure this high level of discipline in perpetuity.

The answer, I think, is that Big Code just doesn’t work. Dynamic typing falls down in large programs, but static typing fails in a different way. The same is true of object-oriented programming, imperative programming, and to a lesser but still noticeable degree (manifest in the increasing number of threaded state parameters) in functional programming. The problem with “goto” wasn’t that goto was inherently evil, so much as that it allowed code to become Big Code very quickly (i.e. the threshold of incomprehensible “bigness” grew smaller). On the other hand, the frigid-earth reality of Big Code is that there’s “no silver bullet”. Large programs just become incomprehensible. Complexity and bigness aren’t “sometimes undesirable”. They’re always dangerous. Steve Yegge got this one right.

This is why I believe the Unix philosophy is inherently right: programs shouldn’t be vague, squishy things that grow in scope over time and are never really finished. A program should do one thing and do it well. If it becomes large and unwieldy, it’s refactored into pieces: libraries and scripts and compiled executables and data. Ambitious software projects shouldn’t be structured as all-or-nothing single programs, because every programming paradigm and toolset breaks down horribly on those. Instead, such projects should be structured as systems and given the respect typically given to such. This means that attention is paid to fault-tolerance, interchangeability of parts, and communication protocols. It requires more discipline than the haphazard sprawl of big-program development, but it’s worth it. In addition to the obvious advantages inherent in cleaner, more usable code, another benefit is that people actually read code, rather than hacking it as-needed and without understanding what they’re doing. This means that they get better as developers over time, and code quality gets better in the long run.

Ironically, object-oriented programming was originally intended to encourage something looking like small-program development. The original vision behind object-oriented programming was not that people should go and write enormous, complex objects, but that they should use object-oriented discipline when complexity is inevitable. An example of success in this arena is in databases. People demand so much of relational databases in terms of transactional integrity, durability, availability, concurrency and performance that complexity is outright necessary. Databases are complex beasts, and I’ll comment that it has taken the computing world literally decades to get them decent, even with enormous financial incentives to do so. But while a database can be (by necessity) complex, the interface to one (SQL) is much simpler. You don’t usually tell a database what search strategy to use; you write a declarative SELECT statement (describing what the user wants, not how to get it) and let the query optimizer take care of it. 

Databases, I’ll note, are somewhat of an exception to my dislike of Big Code. Their complexity is well-understood as necessary, and there are people willing to devote their careers entirely to mastering it. But people should not have to devote their careers to understanding a typical business application. And they won’t. They’ll leave, accelerating the slide into spaghettification as the code changes hands.

Why Big Code? Why does it exist, in spite of its pitfalls? And why do programmers so quickly break out the object-oriented toolset without asking first if the power and complexity are needed? I think there are several reasons. One is laziness: people would rather learn one set of general-purpose abstractions than study the specific ones and when they are appropriate. Why should anyone learn about linked lists and arrays and all those weird tree structures when we already have ArrayList? Why learn how to program using referentially transparent functions when objects can do the trick (and so much more)? Why learn how to use the command line when modern IDEs can protect you from ever seeing the damn thing? Why learn more than one language when Java is already Turing-complete? Big Code comes from a similar attitude: why break a program down into small modules when modern compilers can easily handle hundreds of thousands of lines of code? Computers don’t care if they’re forced to contend with Big Code, so why should we?

However, more to the point of this, I think, is hubris with a smattering of greed. Big Code comes from a belief that a programming project will be so important and successful that people will just swallow the complexity– the idea that one’s own DSL is going to be as monumental as C or SQL. It also comes from a lack of willingness to declare a problem solved and a program finished even when the meaningful work is complete. It also comes from a misconception about what programming is. Rather than existing to solve well-defined problems and then get out of the way, as small-program methodology programs do, they become more than that. Big Code projects often have an overarching and usually impractical “vision” that involves generating software for software’s sake. This becomes a mess, because “vision” in a corporate environment is usually bike-shedding that quickly becomes political. Big Code programs always reflect the political environment that generated them (Conway’s Law) and this means that they invariably look more like collections of parochialisms and inside humor than the more universal languages of mathematics and computer science.

There is another problem in play. Managers love Big Code, because when the programmer-to-program relationship is many-to-one instead of one-to-many, efforts can be tracked and “headcount” can be allocated. Small-program methodology is superior, but it requires trusting the programmers to allocate their time appropriately to more than one problem, and most executive tyrannosaurs aren’t comfortable doing that. Big Code doesn’t actually work, but it gives managers a sense of control over the allocation of technical effort. It also plays into the conflation of bigness and success that managers often make (cf. the interview question for executives, “How many direct reports did you have?”) The long-term spaghettification that results from Big Code is rarely an issue for such managers. They can’t see it happen, and they’re typically promoted away from the project before this becomes an issue.

In sum, spaghetti code is bad code, but not all bad code is spaghetti. Spaghetti is a byproduct of industrial programming that is usually, but not always, an entropic result of too many hands passing over code, and an inevitable outcome of large-program methodologies and the bastardization of “object-oriented programming” that has emerged out of these defective, executive-friendly processes. The antidote to spaghetti is an aggressive and proactive refactoring effort focused on keeping programs small, effective, clean in source code, and most of all, coherent.

Don’t waste your time in crappy startup jobs.

What I’m about to say is true now, as of July 2012. It wasn’t necessarily true 15 years ago, and it may not be true next year. Right now, for most people, it’s utterly correct– enough that I feel compelled to say it. The current VC-funded startup scene, which I’ve affectionately started calling “VC-istan”, is– not to be soft with it– a total waste of time for most of the people involved.

Startups. For all the glamour and “sexiness” associated with the concept, the truth is that startups are no more and no less than what they sound like: new, growing businesses. There are a variety of good and bad reasons to join or start businesses, but for most of human history, it wasn’t viewed as a “sexy” process. Getting incorporated, setting up a payroll system, and hiring accountants are just not inspiring duties for most people. They’re mundane tasks that people are more than willing to do in pursuit of an important goal, but starting a business has not typically been considered to be  inherently “sexy”. What changed, after about 1996, is that people started seeing ”startups” as an end in themselves. Rather than an awkward growth phase for an emerging, risky business, “startup” became a lifestyle. This was all fine because, for decades, positions at established businesses were systemically overvalued by young talent, and those at growing small companies were undervalued. It made economic sense for ambitious young people to brave the risk of a startup company. Thus, the savviest talent gravitated toward the startups, where they had access to responsibilities and career options that they’d have to wait for years to get in a more traditional setting.

Now, the reverse seems to be true. In 1995, a lot of talented young people went into large corporations because they saw no other option in the private sector– when, in fact, there were credible alternatives, startups being a great option. In 2012, a lot of young talent is going into startups for the same reason: a belief that it’s the only legitimate work opportunity for top talent, and that their careers are likely to stagnate if they work in more established businesses. They’re wrong, I think, and this mistaken belief allows them to be taken advantage of. The typical equity offer for a software engineer is dismally short of what he’s giving up in terms of reduced salary, and the career path offered by startups is not always what it’s made out to be.

For all this, I don’t intend to argue that people shouldn’t join startups. If the offer’s good, and the job looks interesting, it’s worth trying out. I just don’t think that the current, unconditional “startups are awesome!” mentality serves us well. It’s not good for any of us, because there’s no tyrant worse than a peer selling himself short, and right now there are a lot of great people selling themselves very short for a shot at the “startup experience”– whatever that is.

Here are 7 misconceptions about startups that I’d like to dispel.

1. A startup will make you rich. True, for founders, whose equity shares are measured in points. Not true for most employees, who are offered dimes or pennies.

Most equity offerings for engineers are, quite frankly, tiny. A “nickel” (0.05 percent) of an 80-person business is nothing to write home about. It’s not partnership or ownership. Most engineers have the mistaken belief that the initial offering is only a teaser, and that it will be improved once they “prove themselves”, but it’s pretty rare that this actually happens.

Moreover, raises and bonuses are very uncommon in startups. It’s typical for high performers to be making the same salary after 3 years as they earned when they started. (What happens to low performers, and to high performers who fail politically? They get fired, often with no warning or severance.) Substantial equity improvements are even rarer. When things are going well in a startup, the valuation of the equity package is increasing and that is the raise. When things are going badly, that’s the wrong time to be asking for anything.

There are exceptions. One is that, if the company finds itself in very tough straits and can’t afford to pay salaries at all, it will usually grant more equity to employees in order to make up for the direct economic hardship it’s causing them by not being able to pay a salary. This isn’t a good situation, because the equity is usually offered at-valuation (more specifically, at the valuation of the last funding round, when the company was probably in better shape) and typically employees would be better off with the cash. Another is that it’s not atypical for a company to “refresh” or lengthen a vesting period with a proportionate increase. A 0.1% grant, vesting over four years, can be viewed as compensation at 0.025% per year. It’s not atypical for a company to continue that same rate in the years after that. That means that a person spending six years might get up to 0.15%. What is atypical is for an employee brought in with 0.1% to be raised to 1% because of good performance. The only time that happens is when there’s a promotion involved, and internal promotions (more on this, later) are surprisingly rare in startups.

2. The “actual” valuation is several times the official one. This is a common line, repeated both by companies in recruiting and by engineers justifying their decision to work for a startup. (“My total comp. is actually $250,000 because the startup really should be worth $5 billion.) People love to think they’re smarter than markets. Usually, they aren’t. Moreover, the few who are capable of being smarter than markets are not taking (or trying to convince others to take) junior-level positions where the equity allotment is 0.05% of an unproven business. People who’ve legitimately developed that skill (of reliably outguessing markets) deal at a much higher level than that.

So, when someone says, “the actual valuation should be… “, it’s reasonable to conclude with high probability that this person doesn’t know what the fuck he or she is talking about.

In fact, an engineer’s individual valuation should, by rights, be substantially lower than the valuation at which the round of funding is made. When a VC offers $10 million for 20% of a business, the firm is stating that it believes the company (pre-money) is worth $40 million to them. Now, startup equity is always worth strictly more (and by a substantial amount) to a VC than it is worth to an engineer. So the fair economic value (for an engineer) of a 0.1% slice is probably not $40,000. It might be $10-20,000.

There are several reasons for this disparity of value. First, the VC’s stake gives them control. It gives them board seats, influence over senior management, and the opportunity to hand out a few executive positions to their children or to people whom they owe favors. An engineer’s 0.1% slice, vesting over four years, doesn’t give him any control, respect, or prestige. It’s a lottery ticket, not a vote. Second, startup equity is a high-risk asset, and VCs have a different risk profile from average people. An average person would rather have a guarantee of $2 million than a 50% chance of earning $5 million, even though the expected value of the latter offer is higher. VCs, in general, wouldn’t, because they’re diversified enough to take the higher-expectancy, riskier choices. Third, the engineer has no protection against dilution, and will be on the losing side of any preference structure that the investors have set up (and startups rarely volunteer information pertaining to what preferences exist against common stock, which is what the engineers will have). Fourth, venture capitalists who invest in highly successful businesses get prestige and huge returns on investment, whereas mere employees might get a moderate-sized windfall, but little prestige unless they achieved an executive position. Otherwise, they just worked there.

In truth, startup employees should value equity and options at about one-fourth the valuation that VCs will give it. If they’re giving up $25,000 per year in salary, they should only do so in exchange for $100,000 per year (at current valuation) in equity. Out of a $40-million company with a four-year vesting cycle, that means they should ask for 1%.

3. If you join a startup early, you’re a shoe-in for executive positions. Nope.

Points #1-2 aren’t going to surprise many people. Most software engineers know enough math to know that they won’t get filthy rich on their equity grants, but join startups under the belief that coming into the company early will guarantee a VP-level position at the company (at which point compensation will improve) once it’s big. Not so. In fact, one of the best ways not to get a leadership position in a startup is to be there early.

Startups often involve, for engineers, very long hours, rapidly changing requirements, and tight deadlines, which means the quality of the code they write is generally very poor in comparison to what they’d be able to produce in saner conditions. It’s not that they’re bad at their jobs, but that it’s almost impossible to produce quality software under those kinds of deadlines. So code rots quickly in a typical startup environment, especially if requirements and deadlines are being set by a non-technical manager. Three years and 50 employees later, what they’ve built is now a horrific, ad-hoc, legacy system hacked by at least ten people and built under intense deadline pressure, and even the original architects don’t understand it. It may have been a heroic effort to build such a powerful system in so little time, but from an outside perspective, it becomes an embarrassment. It doesn’t make the case for a high-level position.

Those engineers should, by rights, get credit and respect for having built the system in the first place. For all its flaws, if the system works, then the company owes no small part of its success to them. Sadly, though, the “What have you done for me lately?” impulse is strong, and these engineers are typically associated with how their namesake projects end (as deadline-built legacy monstrosities) rather than what it took to produce them.

Moreover, the truth about most VC-funded startups is that they aren’t technically deep, so it seems to most people that it’s marketing rather than technical strength that determines which companies get off the ground and which don’t. The result of this is that the engineer’s job isn’t to build great infrastructure that will last 10 years… because if the company fails on the marketing front, there will be no “in 10 years”. The engineer’s job is to crank out features quickly, and keep the house of cards from falling down long enough to make the next milestone. If this means that he loads up on “technical debt”, that’s what he does.

If the company succeeds, it’s the marketers, executives, and biz-dev people who get most of the glory. The engineers? Well, they did their jobs, but they built that disliked legacy system that “just barely works” and “can’t scale”. Once the company is rich and the social-climbing mentality (of always wanting “better” people) sets in, the programmers will be replaced with more experienced engineers brought in to “scale our infrastructure”. Those new hires will do a better job, not because they’re superior, but because the requirements are better defined and they aren’t working under tight deadline pressure. When they take what the old-timers did and do it properly, with the benefit of learning from history, it looks like they’re simply superior, and managerial blessing shifts to “the new crowd”. The old engineers probably won’t be fired, but they’ll be sidelined, and more and more people will be hired above them.

Furthermore, startups are always short on cash and they rarely have the money to pay for the people they really want, so when they’re negotiating with these people in trying to hire them, they usually offer leadership roles instead. When they go into the scaling phase, they’re typically offering $100,000 to $150,000 per year for an engineer– but trying to hire people who would earn $150,000 to $200,000 at Google or on Wall Street. In order to make their deals palatable, they offer leadership roles, important titles and “freedom from legacy” (which means the political pull to scorched-earth existing infrastructure if they dislike it or it gets in their way) to make up for the difference. If new hires are being offered leadership positions, this leaves few for the old-timers. The end result of this is that the leadership positions that early engineers expect to receive are actually going to be offered away to future hires.

Frankly put, being a J.A.P. (“Just A Programmer”) in a startup is usually a shitty deal. Unless the company makes unusual cultural efforts to respect engineering talent (as Google and Facebook have) it will devolve into the sort of place where people doing hard things (i.e. software engineers) get the blame and the people who are good at marketing themselves advance.

4. In startups, there’s no boss. This one’s patently absurd, but often repeated. Those who champion startups often say that one who goes and “works for a company” ends up slaving away for “a boss” or “working for The Man”, whereas startups are a path to autonomy and financial freedom.

The truth is that almost everyone has a boss, even in startups. CEOs have the board, the VPs and C*Os have the CEO, and the rest have actual, you know, managers. That’s not always a bad thing. A competent manager can do a lot for a person’s career that he wouldn’t realistically be able to do on his own. Still, the idea that joining a startup means not having a boss is just nonsense.

Actually, I think founders often have the worst kind of “boss” in venture capitalists. To explain this, it’s important to note that the U.S. actually has a fairly low “power distance” in professional workplaces– this is not true in all cultures– by which I mean bosses aren’t typically treated as intrinsic social superiors to their direct reports. Yes, they have more power and higher salaries, but they’re also older and typically have been there for longer. A boss who openly treats his reports with contempt, as if he were innately superior, isn’t going to last for very long. Also, difficult bosses can be escaped: take another job. And the most adverse thing they can (legally) do is fire someone, which has the same effect. Beyond that, bosses can’t legally have a long-term negative effect on someone’s career.

With VCs, the power distance is much greater and the sense of social superiority is much stronger. For example, when a company receives funding it is expected to pay both parties’ legal fees. This is only a minor expenditure in most cases, but it exists to send a strong social message: you’re not our kind, dear, and this is what you’ll deal with in order to have the privilege of speaking with us at all. 

This is made worse by the incestuous nature of venture capital, which leads to the worst case of groupthink ever observed in a supposedly progressive, intelligent community. VCs like a startup if other VCs like it. The most well-regarded VCs all know each other, they all talk to each other, and rather than competing for the best deals, they collude. This leaves the venture capitalists holding all the cards. A person who turns down a term sheet with multiple liquidation preferences and participating preferred (disgusting terms that I won’t get into because they border on violence, and I’d prefer this post to be work-safe) is unlikely to get another one.

A manager who presents a prospective employee with a lowball offer and says, “If you don’t take this, I’ll make a phone call and no one in the industry will hire you” is breaking the law. That’s extortion. In venture capital? They don’t have to say this. It’s unspoken that if you turn down a terrible term sheet with a 5x liquidation preference, you’re taking a serious risk that a phone call will be made and that supposedly unrelated interest will dry up as well. That’s why VCs can get away with multiple liquidation preferences and participating preferred.

People who really don’t want to have “a boss” should not be looking into VC-funded startups. There are great, ethical venture capitalists who wouldn’t go within a million miles of the extortive shenanigans I’ve described above. It’s probably true that most are. Even still, the power relationship between a founder and investor is far more lopsided than that between a typical employee and manager. No manager can legally disrupt an employee’s career outside of one firm; but venture capitalists can (and sometimes do) block people from being fundable.

Instead, those who really want not to have a boss should be thinking about smaller “lifestyle” businesses in which they’ll maintain a controlling interest. VC has absolutely no interest in funding these sorts of companies, so this is going to require angel investment or personal savings, but for those who really want that autonomy, I think this is the best way to go.

For all this, what I’ve said here about the relationship between founders and VCs isn’t applicable to typical engineers. An engineer joining a startup of larger than about 20 people will have a manager, in practice if not in reality. That’s not a bad thing. It’s no worse or better than it would be in any other company. It does make the “no boss” vs. “working for The Man” selling point of startups a bit absurd, though.

5. Engineers at startups will be “changing the world”. With some exceptions, startups are generally not vehicles for world-changing visions. Startups need to think about earning revenue within the existing world, not “changing humanity as we know it”.

“The vision thing” is an aspect of the pitch that is used to convince 22-year-old engineers to work for 65 percent of what they’d earn at a more established company, plus some laughable token equity offering. It’s not real.

The problem with changing the world is that the world doesn’t really want to change, and to the extent that it it’s willing to do so, few people who have the resources necessary to push for improvements. What fundamental change does occur is usually gradual– not revolutionary– and requires too much cooperation to be forced through by a single agent.

Scientific research changes the world. Large-scale infrastructure projects change the world. Most businesses, on the other hand, are incremental projects, and there’s nothing wrong with that. Startups are not a good vehicle for “changing the world”. What they are excellent at is finding ways to profit from inexorable, pre-existing trends by doing things that (a) have recently become possible, but that (b) no one had thought of doing (or been able to do) before. By doing so, they often improve the world incrementally: they wouldn’t survive if they didn’t provide value to someone. In other words, most of them are application-level concepts that fill out an existing world-changing trend (like the Internet) but not primary drivers. That’s fine, but people should understand that their chances of individually effecting global change, even at a startup, are very small.

6. If you work at a startup, you can be a founder next time around. What I’ve said so far is that it’s usually a shitty deal to be an employee at a startup: you’re taking high risk and low compensation for a job that (probably) won’t make you rich, lead to an executive position, bring great autonomy, or change the world. So what about being a founder? It’s a much better deal. Founders can get rich, and they will make important connections that will set up their careers. So why aren’t more people becoming founders of VC-funded startups? Well, they can’t. Venture capital acceptance rates are well below 1 percent.

The deferred dream is probably the oldest pitch in the book, so this one deserves address. A common pitch delivered to prospective employees in VC-istan is that “this position will set you up to be a founder (or executive) at your next startup”. Frankly, that’s just not true. The only thing that a job can offer that will set a person up with the access necessary to be a founder in the future is investor contact, and a software engineer who insists on investor contact when joining an already-funded startup is going to be laughed out the door as a “prima donna”.

A non-executive position without investor contact at a startup provides no more of the access that a founder will need than any other office job. People who really want to become startup founders are better off working in finance (with an aim at venture capital) or pursuing MBA programs than taking subordinate positions at startups.

7. You’ll learn more in a startup. This last one can be true; I disagree with the contention that it’s always true. Companies tend to regress to the mean as they get bigger, so the outliers on both sides are startups. And there are things that can be learned in the best small companies when they are small that can’t be learned anywhere else. In other words, there are learning opportunities that are very hard to come by outside of a startup.

What’s wrong here is the idea that startup jobs inherently more educational simply because they exist at startups. There’s genuinely interesting work going on at startups, but there’s also a hell of a lot of grunt work, just like anywhere else. On the whole, I think startups invest less in career development than more established companies. Established companies have had great people leave after 5 years, so they’ve had more than enough time to “get it” on the matter of their best people wanting more challenges. Startups are generally too busy fighting fires, marketing themselves, and expanding to have time to worry about whether their employees are learning.

So… where to go from here?

I am not trying to impart the message that people should not work for startups. Some startups are great companies. Some pay well and offer career advancement opportunities that are unparalleled. Some have really great ideas and, if they can execute, actually will make early employees rich or change the world. People should take jobs at startups, if they’re getting good deals.

Experience has led me to conclude that there isn’t much of a difference in mean quality between large and small companies, but there is a lot more variation in the small ones, for rather obvious reasons. The best and worst companies tend to be startups. The worst ones don’t usually live long enough to become big companies, so there’s a survivorship bias that leads us to think of startups as innately superior. It’s not the case.

As I said, the worst tyrant in a marketplace is a peer selling himself short. Those who take terrible deals aren’t just doing themselves a disservice to themselves, but to all the rest of us as well. The reason young engineers are being offered subordinate J.A.P. jobs with 0.03% equity and poorly-defined career tracks is because there are others who are unwise enough to take them.

In 2012, is there “a bubble” in internet startups? Yes and no. In terms of valuations, I don’t think there’s a bubble. Or, at least, it’s not obvious to me that one exists. I think it’s rare that a person who’s relatively uninformed (such as myself, when it comes to pricing technology companies) can outguess a market, and I see no evidence that the valuations assigned to these companies are unreasonable. Where there is undeniably a bubble is in the extremely high value that young talent is ascribing to subordinate positions at mediocre startups.

So what is a fair deal, and how does a person get one? I’ll give some very basic guidelines.

1. If you’re taking substantial financial risk to work at the company, you’re a Founder. Expect to be treated like one. By “substantial financial risk”, I mean earning less  than (a) the baseline cost-of-living in one’s area or (b) 75% of one’s market compensation.

If you’re taking that kind of risk, you’re an investor and you better be seen as a partner. It means you should demand the autonomy and respect given to a founder. It means not to take the job unless there’s investor contact. It means you have a right to know the entire capitalization structure (an inappropriate question for an employee, but a reasonable one for a founder) and determine if it’s fair, in the context of a four-year vesting period. (If the first technical hire gets 1% for doing all the work and the CEO gets 99% because he has the connections, that’s not fair. If the first technical hire gets 1% while the CEO gets 5% and the other 94% has been set aside for employees and investors, and the CEO has been going without salary for a year already, well, that’s much more fair.) It means you should have the right to represent yourself to the public as a Founder.

2. If you have at least 5 years of programming experience and the company isn’t thoroughly “de-risked”, get a VP-level title. An early technical hire is going to be spending most of his time programming– not managing or sitting in meetings or talking with the press as an “executive” would. Most of us (myself included) would consider that arrangement, of getting to program full-time at high productivity, quite desirable. This might make it seem like “official” job titles (except for CEO) don’t matter and that they aren’t worth negotiating for. Wrong.

Titles don’t mean much when 4 people at the company. Not in the least. So get that VP-level title locked-in now, before it’s valuable and much harder to get. Once there are more than about 25 people, titles start to have real value and for a programmer to ask for a VP title might seem like an unreasonable demand.

People may claim that titles are old-fashioned and useless and elitist, and they often have strong points behind their claims. Still, people in organizations place a high value on institutional consistency (meaning that there’s additional cognitive load for them to contradict the company’s “official” statements, through titles, about the status of its people) and the high status, however superficial and meaningless, conferred by an impressive title can easily become self-perpetuating. As the company becomes larger and more opaque, the benefit conferred by the title increases.

Another benefit of having a VP-level title is the implicit value inherent of being VP of something. It means that one will be interpreted as representing some critical component of the company. It also makes it embarrassing to the top executives and the company if this person isn’t well treated. For an example, let’s take “VP of Culture”. Doesn’t it sound like a total bullshit title? In a small company, it probably is. So meaningless, in fact, that most CEOs would be happy to give it away. “You want to be ‘VP of Culture’, but you’ll be doing the same work for the same salary? By all means.”  Yet what does it mean if a CEO berates the VP of Culture? That culture isn’t very important at this company. What about if the VP of Culture is pressured to resign or fired? From a public view, the company just “lost” its VP of Culture. That’s far more indicative than if a “J.A.P.” engineer leaves.

More relevantly, a VP title puts an implicit limit on the number of people who can be hired above a person, because most companies don’t want the image of having 50 of their 70 people being “VP” or “SVP”. It dilutes the title, and makes the company look bloated (except in finance, where “VP” is understood to represent a middling and usually not executive level.) If you’re J.A.P., the company is free to hire scads of people above you. If you’re a VP, anyone hired above you has to be at least a VP, if not an SVP, and companies tend to be conservative with those titles once they start to actually matter.

The short story to this is that, yes, titles are important and you should get one if the company’s young and not yet de-risked. People will say that titles don’t mean anything, and that “leadership is action, not position”, and there’s some truth in that, but you want the title nonetheless. Get it early when it doesn’t matter, because someday it will. And if you’re a competent mid-career (5+ years) software engineer and the company’s still establishing itself, then having some VP-level title is a perfectly reasonable term to negotiate.

3. Value your equity or options at one-fourth of the at-valuation level.  This has been discussed above. Because this very risky asset is worth much more to diversified, rich investors than it is to an employee, it should be discounted by a factor of 3-4. This means that it’s only worth it to take a job at $25,000 below market in exchange for $100,000 per year in equity or options (at valuation).

Also worth keeping in mind is that raises and bonuses are uncommon in startups, and that working at a startup can have an affect on one’s salary trajectory. Realistically, a person should assess a startup offer in the light of what he expects to earn over the next 3 to 5 years, not what he can command now.

4. If there’s deferred cash involved, get terms nailed down. This one doesn’t apply to most startups, because it’s an uncommon arrangement after a company is funded for it to be paying deferred cash. Usually, established startups pay a mix of salary and equity.

If deferred cash is involved in the package, it’s important to get a precise agreement on when this payment becomes due. Deferred cash is, in truth, zero-interest debt of the company to the employee. Left to its own devices, no rationally acting company would ever repay a zero-interest loan. So this is important to get figured out. What events make deferred cash due? (Startups never have “enough” money, so “when we have enough” is not valid.) What percentage of a VC-funding round is dedicated to pay off this debt? What about customer revenue? It’s important to get a real contract to figure this out; otherwise, the deferred payment is just a promise, and sadly those aren’t always worth much.

The most important matter to address when it comes to deferred cash is termination, because being owed money by a company one has left (or been fired from) is a mess. No one ever expects to be fired, but good people get fired all the time. In fact, there’s more risk of this in a small company, where transfers tend to be impossible on account of the firm’s small size, and where politics and personality cults can be a lot more unruly than they are in established companies.

Moreover, severance payments are extremely uncommon in startups. Startups don’t fear termination lawsuits, because those take years and startups assume they will either be (a) dead, or (b) very rich by the time any such suit would end– and either way, it doesn’t much matter to them. Being fired in established companies usually involves a notice (“improvement plan”) period (in which anyone intelligent will line up another job) or severance, or both, because established companies really don’t want to deal with termination lawsuits. In startups, people who are fired usually get neither notice nor severance.

People tend to think that the risk of startups is limited to the threat of them going out of business, but the truth is that they also tend to fire a lot more people, and often with less justification for doing so. This isn’t always a bad thing (firing too few people can be just as corrosive as firing too many) but it is a risk people need to be aware of.

I wouldn’t suggest asking for a contractual severance arrangement in negotiation with a startup; that request will almost certainly be denied (and might be taken as cause to rescind the offer). However, if there’s deferred cash involved, I would ask for a contractual agreement, if there is deferred cash, that it becomes due immediately on event of involuntary termination. Day-of, full amount, with the last paycheck.

5. Until the company’s well established (e.g. IPO) don’t accept a “cliff” without a deferred-cash arrangement in event of involuntary termination. The “cliff” is a standard arrangement in VC-funded startups whereby no vesting occurs if the employee leaves or is fired in the first year. The problem with the cliff is that it creates a perverse incentive for the company to fire people before they can collect any equity.

Address the cliff as follows. If employee is involuntarily terminated, and the cliff is enforced, whatever equity would have vested is converted (at most recent valuation) to cash and due upon date of termination.

This is a non-conventional term, and many startups will flat-out refuse it. Fine. Don’t work for them. This is important; the last thing you want is for the company to have an incentive to fire you because of a badly-structured compensation package.

6. Keep moving your career forward. Just being “at a startup” is not enough. The most credible appeal of working at a startup is the opportunity to learn a lot, and one can, it’s not a guarantee. Startups tend to be more “self-serve” in terms of career development. People who go out of their way to explore and use new technologies and approaches to problems will learn a lot. People who let themselves get stuck with the bulk of the junior-level grunt work won’t.

I think it’s useful to explicitly negotiate project allocation after the first year– once the “cliff” period is over. Raises being rare at startups, the gap between an employee’s market value and actual compensation is only growing as time goes by. When the request for a raise is denied is a good time to bring up the fact that you really would like to be working on that neat machine learning project or that you’re really interested in trying out a new approach to a problem the company faces.

7. If blocked on the above, then leave. The above are reasonable demands, but they’re going to meet some refusal because there’s no shortage of young talent that is right now willing to take very unreasonable terms for the chance to work “at a startup”. So expect some percentage of these negotiations to end in denial, even to the point of rescinded job offers. For example, some startup CEOs will balk at the idea that a “mere” programmer, even if he’s the first technical hire, wants investor contact. Well, that’s a sign that he sees you as “J.A.P.” Run, don’t walk, away from him.

People tend to find negotiation to be unpleasant or even dishonorable, but everyone in business negotiates. It’s important. Negotiations are indicative, because in business politeness means little, and so only when you are negotiating with someone do you have a firm sense of how he really sees you. The CEO may pay you a million compliments and make a thousand promises about your bright future in the company, but if he’s not willing to negotiate a good deal, then he really doesn’t see you as amounting to much. So leave, instead of spending a year or two in a go-nowhere startup job.

In the light of this post’s alarmingly high word count, I think I’ll call it here. If the number of special cases and exceptions indicates a lack of a clear message, it’s because there are some startup jobs worth taking, and the last thing I want to do is categorically state that they’re all a waste of time. Don’t get me wrong, because I think most of VC-istan (especially in the so-called “social media” space) is a pointless waste of talent and energy, but there are gems out there waiting to be discovered. Probably. And if no one worked at startups, no one could found startups and there’d be no new companies, and that would suck for everyone. I guess the real message is: take good offers and work good jobs (which seems obvious to the point of uselessness) and the difficulty (as observed in the obscene length of this post) is in determining what’s “good”. That is what I, with my experience and observations, have attempted to do.

Functional programming is a ghetto

Functional programming is a ghetto.

Before any flamewars can start, let me explain exactly what I mean. I don’t mean “functional programming sucks”. Far from it. The opposite, actually. Not all ghettos are poor, crime-ridden, and miserable. Jewish ghettos existed for centuries in Europe, from the Renaissance to World War II, and many were intellectual centers of the world. Some were quite prosperous. Harlem was, at one time, an upper-middle-class African-American community at the center of some of America’s most important artistic contributions. The same is true of functional programming. It’s the underappreciated intellectual capital of the programming world, in that its ideas (eventually) trickle down into the rest of the industry, but it’s still a ghetto. A ghetto is the urban analog of a geographic enclave: it’s included in the metropolis, but culturally isolated and usually a lot smaller than the surrounding city. It often harbors those who’ve struggled outside of it. Those who become too used to its comforts view the outside world with suspicion, while those on the outside have a similar attitude of distrust toward those within. Ghettos usually imply that there’s something involuntary about being there, but that’s often not the case. Chinatowns are voluntary ghettos, in the non-pejorative sense, as are some religious communities like monasteries. “Functional programming” is, likewise, a voluntary ghetto. We’ve carved out an elite niche in the software industry, and many of us refuse to work outside of it, but we’re all here by choice.

What is functional programming? Oddly enough, what I’m about to talk about is not functional programming in the purist sense, because most “functional programmers” are not averse to using side effects. The cultural issues surrounding functional programming are not about some abstract dislike of computational effects, but rather an understanding of the necessity of managing the complexity they create, and using tools (especially languages) that make sane development possible. Common Lisp, Scala, and Ocaml are not purely functional languages, but they provide native support for the abstractions that make functional programming possible. What real functional programmers do is “multi-paradigm”– mostly functional, but with imperative techniques used when appropriate. What the debate comes down to is the question of what should be the primary, default “building block” of a program. To a functional programmer, it’s a referentially-transparent (i.e. returning the same output every time per input, like a mathematical function) function. In imperative programming, it’s a stateful action. In object-oriented programming, it’s an object, a more general construct that might be a referentially-transparent function, might represent an action in a hand-rolled domain-specific language (DSL) or might be something else entirely. Of course, most “object-oriented programming” becomes a sloppy mix of multiple styles and ad-hoc DSLs, especially as more than one developer comes to work on an object-oriented project. That’s a rant for later.

In general, functional programming is right. The functional approach is not right for every problem, but there is a right answer regarding the default abstractions for building most high-level programs: immutable data, and referentially transparent functions should be the default, except in special cases where something else is clearly more appropriate. Why? A computational action is, without more knowledge, impossible to test or reason about, because one cannot control the environment in which it exists. One needs to be able to know (and usually, to control) the environment in which the action occurs in order to know if it’s being done right. Usually, the tester wants to be able to cover all special cases of this action, in which case knowing the environment isn’t enough; controlling it is also necessary. But at this point, the state of the environment in which the test happens is an implicit parameter to the action, and making it explicit would make it a referentially-transparent function. In many cases, it’s better to do so– when possible. It might not be. For example, the environment (e.g. the state in a computer cluster database) might be too large, complex, or volatile to explicitly thread into a function. Much of real-world functional programming is not about eliminating all state, but about managing what state is intrinsic, necessary or appropriate.

For a concrete example, let’s say I have a blackboard, face down, with a number (state) on it, and I ask someone to read that (call it n) and erase the number, then write n+1 on the blackboard. I’m assuming he won’t lie to me, and that he’s physically capable of lifting the board; I want to determine if he can carry out this operation. If I don’t know n, and only see what is written after he is done, I have no hope of knowing whether the person carried out my order correctly. Of course, I could control the testing enviroment and write 5 on the blackboard before asking him to do this. If he writes 6, then I know he did what I asked him to do. At that point, though, the blackboard isn’t necessary. It’s more lightweight to just ask him, “what is 5 + 1?” I’ve moved from an imperative style of testing to a functional one: I’m determining whether his model of the addition function gives the right answer, rather than putting him through an exercise (action) and checking the state after it is done. The functional alternative is a unit test. I’m not trying to assess whether he knows how to turn over a blackboard, read it, erase it, and write a new number on it, because I only care about whether he can add. If I want to assess all of those as well, then I need to make an integration test of it. Both types of test are necessary in real-world software engineering, but the advantage of unit tests is that they make it easy to determine exactly what went wrong, facilitating faster debugging.

Testing, debugging, and maintenance are a major component of real-world software engineering, and functional programming gives us the tools to tackle these problems in a tractable way. Functions should be referentially transparent and, ideally, small (under 20 lines when reasonable). Large functions should be broken up, hierarchically, into smaller ones, noting that often these small components can be used in other systems. Why is this desirable? Because modularity makes code reuse easier, it makes debugging and refactoring much simpler (fixes only need to be made in one place) and, in the long term, it makes code more comprehensible to those who will have to modify and maintain it. People simply can’t hold a 500-line object method in their heads at one time, so why write these if we can avoid doing so?

The reality, for those of us who call ourselves functional programmers, is that we don’t always write stateless programs, but we aim for referential transparency or for obvious state effects in interfaces that other programmers (including ourselves, months later) will have to use. When we write C programs, for example, we write imperative code because it’s an imperative language, but we aim to make the behavior of that program as predictable and reasonable as we possibly can.

Functional programming, in the real world, doesn’t eschew mutable state outright. It requires mindfulness about it. So why is functional programming, despite its virtues, a ghetto? The answer is that we tend to insist on good design, to such a degree that we avoid taking jobs where we’re at risk of having to deal with bad designs. This isn’t a vice on our part; it’s a learned necessity not to waste one’s time or risk one’s career trying to “fix” hopeless systems or collapsing companies. Generally, we’ve come to know the signs of necrosis. We like the JVM languages Clojure and Scala, and we might use Java-the-language when needed, but we hate “Java shops” (i.e. Java-the-culture) with a passion, because we know that they generate intractable legacy messes in spite of their best efforts. Say “POJO” or “Visitor pattern”, and you’ve lost us. This seems like arrogance, but for a person with a long-term view, it’s necessary. There are a million “new new things” being developed at any given time, and 995,000 of them are reincarnations of failed old things that are going to crash and burn. If I could characterize the mindset of a functional programmer, it’s that we’re conservative. We don’t trust methodologies or “design patterns” or all-purpose frameworks promising to save the world, we don’t believe in “silver bullets” because we know that software is intrinsically difficult, and we generally believe that the shiniest IDE provides us enough to compensate for its shortfalls and false comforts. For example, an IDE is useless for resolving a production crisis occurring on a server 3,000 miles away. We’d rather use vim or emacs, and the command line, because we know they work pretty much everywhere, and because they give us enough power in editing to be productive.

From a functional programmer’s perspective, it’s easy to mistake the rest of the software industry for “the ghetto” (especially considering the pejorative association, which I am trying to disavow, with that word). Our constructions are stable and attractive, and we do such a good job of cleaning up after ourselves that there’s not much horseshit on our streets. Outside our walls are slums with rickety, fifteen-story tenements that are already starting to lean. The city without is sloppy and disease-ridden and everything built in out there will be burned down, to kill the plague rats, in ten years. We don’t like to go there, but sometimes there are advantages of doing so– for one thing, it’s fifty times larger. If we lose awareness of size and scale and what this means, we can forget that we are in the ghetto. That’s not to say we shouldn’t live in one, for it’s a prosperous and intellectually rich ghetto we inhabit, but a ghetto it is.

I think most functional programmers only get a full awareness of this when we’re job searching, and thanks to most of us being in the top 5% of programmers, our job searches tend to be short. Still, I’ve lost count of the number of times over the past five years that I’ve found a job listing that looked interesting, except for its choice of language. “5 years of experience in Java, including knowledge of design-pattern best practices.” Nope. It might be a good company writing bad copy, but its technical choices look exactly the same as those of the bad firms, so how can I be sure? The process quickly becomes depressing. It’s not that Java or C++ are “dirty” languages that I would never use. It’s that any job that involves using these languages full-time is so likely to suck that it’s hardly worth investigating. Occasionally, C++ and Java are the right tools for the job, but no one should try to build a company on these languages. Not in 2012. Java isn’t a language that people choose to use, not for primary development. Not if they’ve used three or four languages in their career. It’s a language that people make other people use. Usually, it’s risk-averse and non-technical managers making that call. A Java Shop is almost always a company in which non-engineers call the shots.

What we call functional programming is somewhat of a shibboleth for good-taste programming. We prefer the best programming languages, like Ocaml and Clojure, but we don’t actually restrict ourselves to writing functional programs. Do we use C when it’s the right tool for the job? Hell yeah. Do we put mutable state into a program when it makes it simpler (as is sometimes the case)? Hell yeah. On the other hand, we trust the aesthetic and architectural decisions made by brilliant, experienced, gray-bearded engineers far more than we trust business fads. We have a conservative faith in simplicity and ease-of-use over the shifting tastes of mainstream managerial types and the superficial attractiveness of silver bullets and “methodologies”. We roll our eyes when some fresh-faced MBA tells us that structuring our calendar around two-week “iterations” will solve every software problem known to humankind. Unfortunately, this insistence (often in the face of managerial authority) on good taste makes us somewhat unusual. It stands out, it can be unpopular, and it’s not always good for one’s career. Few stand with us. Most leave our camp, either to become managers (in which case, even a Java Shop is a plausible employer) or to accept defeat and let bad taste win. It’s hard to live in a ghetto.

Now, I have little faith in the stereotypical average programmer, the one who never thinks a technical thought after 5:01 pm, and who doesn’t mind using Java full-time because the inevitable slop is the problem of some “maintenance guy”. That person probably shouldn’t be programming. On the other hand, we’re about 2 percent of the software industry, if that, right now. We can reach out. We can do better. We’re not so brilliant that the other 98% of programmers have no hope of joining us. Not even close. There are many IDE-using, Java-hacking, semi-bored developers who are just as smart as we are but haven’t seen the light yet. It’s our job to show it to them, and if we fail to convince them that they could become 2 to 10 times more productive than they ever dreamed of being, and that programming can become fun again, then we’re the ones to blame. We must reach out, and we can probably bring 10, 20, maybe even 30 percent of programmers over to the light side, bringing about dramatic changes in the software industry.

I think the integrity of our industry depends on our ability and willingness to figure out how to do this.