Computing From the Middle Out, Part 1: Why Turing Machines Matter

While you’re here: my novel, Farisa’s Crossing, will come out on April 26, 2019.

Computers have an undeserved reputation for being unpredictable, complicated beasts. I’m going to argue that, to the contrary, they’re quite simple at their core. In order to establish this, I’ll work through some models of computation, as well as some programming models that correspond well to real-world computation (with indications of where they don’t).

There’s a lot of complexity in real-world computing. Some of it’s desirable and some of it’s not. For example, today’s cell phones, laptops, and servers use electronic circuitry far more complex than, say, a Turing machine. That isn’t a problem because the payoff is immense and the cost to the user is minimal. If the complicated adder or multiplier is a thousand times faster, most people are happy to have it that way. So, even though real-world integrated circuits are complicated in ways we won’t even begin to discuss here, it’s not a problem. Doing simple things, better, is a worthy expense of complexity.

On the other hand, bloated buggy software ruins lives– this problem is largely preventable, but unlikely to improve because of conditions in the software industry (e.g., a culture that encourages piss-poor management) that are beyond the scope of the analysis here. If ever there were a machine for producing unusable crapware, it would be the American corporation. But again, that’s a topic for another time.

I’d prefer to motivate the claim that computers can be simple. They can be.

What Is Computation?

Computability theory is quite deep, but there’s a relatively simple, rule-based definition of what it means for a (partial) function to be mathematically computable. Our domain here is functions N^n → N; that is, from lists of natural numbers to natural numbers.

  • The n-ary zero functions z_1(x) = 0, z_2(x, y) = 0, … , are computable for all n.
  • The successor function s(x) = x + 1 is computable.
  • For any k with 1 ≤ k ≤ n, the projection function p_{n,k}(x_1, … , x_n) = x_k is computable.
    • p_{1,1}(x) = x, the identity function, and p_{2,1}(x, y) = x, p_{2,2}(x, y) = y are the most used examples.
  • Composition: compositions of computable functions are computable.
    • For example h(x, y) = f(g_1(x, y), g_2(x, y), g_3(x, y), g_4(x, y)) is computable if f and all the g_i are.
    • This means that a computable function can use as many computable functions as it wants as subroutines.
  • Primitive Recursion: if g and h are computable, then so is f, defined like so:
    • f(0, x_1, … , x_n) = g(x_1, … , x_n), and
    • f(n + 1, x_1, … , x_n) = h(n, f(n, x_1, …), x_1, …);
    • this is the recursive analogue of a for-loop; the number of calls is bounded.
  • Search (a.k.a. General Recursion): if f is computable, then so is μf, defined as:
    • μf(x_1, … , x_n) = k where k is the least natural number such that f(k, x_1, … , x_n) = 0.
    • We say μf(x_1, … , x_n) ↑ (pronounced “diverges”) if there is no such k. The function is not defined at that point.
    • this is analogous to a while loop. If the function diverges, an implementation would not terminate– unless the programmer could predict divergence in advance, but this is not always possible.
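Here is a minimal Python sketch of the rules above, offered as an illustration rather than a formal definition. The names zero, succ, proj, compose, prim_rec, and mu are my own labels, and Python integers stand in for natural numbers.

```python
def zero(*args):
    """n-ary zero function: ignores its arguments and returns 0."""
    return 0

def succ(x):
    """Successor function s(x) = x + 1."""
    return x + 1

def proj(k):
    """Projection p_{n,k}: returns the k-th argument (1-indexed)."""
    return lambda *xs: xs[k - 1]

def compose(f, *gs):
    """h(xs) = f(g_1(xs), ..., g_m(xs))."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(g, h):
    """f(0, xs) = g(xs); f(n + 1, xs) = h(n, f(n, xs), xs)."""
    def f(n, *xs):
        acc = g(*xs)
        for i in range(n):          # bounded loop: exactly n iterations
            acc = h(i, acc, *xs)
        return acc
    return f

def mu(f):
    """Unbounded search: least k with f(k, xs) == 0. Never returns if none exists."""
    def search(*xs):
        k = 0
        while f(k, *xs) != 0:
            k += 1
        return k
    return search

# Addition, built only from the pieces above: g(x) = x, h(n, a, x) = s(a).
add = prim_rec(proj(1), compose(succ, proj(2)))
```

With these pieces, add(3, 4) unrolls into three applications of succ starting from 4. A call such as mu(zero) returns immediately, while mu(succ) would spin forever, which is the divergence described above.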

Functions that don’t use search are called primitive recursive. Those are total– they have values for all inputs, and more importantly, these values can be computed in a finite number of steps. If one uses general recursion, though, all bets are off. The function may not be defined for some inputs.

For example, addition is primitive recursive. It’s defined like so:

add(0, x) = x

add(n + 1, x) = s(add(n, x))

In the language above, g(x) = x and h(n, a, x) = s(a).

Multiplication is a primitive recursion using addition rather than the successor function. One can also show that limited subtraction, sub(x, y) = max(x – y, 0), is primitive recursive.
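These definitions transliterate almost directly into Python. This is a sketch under assumptions: pred (predecessor) and the recursion equations for mul and sub are the standard textbook ones rather than anything spelled out here, and Python's n - 1 is used only to peel off the n + 1 case of each equation.

```python
def s(x):
    """Successor."""
    return x + 1

def add(n, x):
    # add(0, x) = x; add(n + 1, x) = s(add(n, x))
    return x if n == 0 else s(add(n - 1, x))

def mul(n, x):
    # mul(0, x) = 0; mul(n + 1, x) = add(x, mul(n, x))
    return 0 if n == 0 else add(x, mul(n - 1, x))

def pred(n):
    # pred(0) = 0; pred(n + 1) = n
    return 0 if n == 0 else n - 1

def sub(x, y):
    # sub(x, 0) = x; sub(x, y + 1) = pred(sub(x, y))
    return x if y == 0 else pred(sub(x, y - 1))
```

Every call bottoms out after a number of steps fixed by its first (or, for sub, second) argument, which is the sense in which these functions are total. Python's recursion limit means this is only good for small inputs.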

Furthermore, any bounded search problem is primitive recursive. If you have an upper bound on how far you’re willing to search, you can use a primitive recursive function.

Sometimes, it’s a judgment call whether to implement a function with bounded or unbounded search.

For example, the division function can be represented as:

div(n, d) is the first q such that q * d ≤ n < (q + 1) * d.

Perform an unbounded search for such a q and, when d = 0, this diverges. However, in this case we know when the function’s badly behaved and can rectify it:

idiv(n, d) is 1 + div(n, d) if d > 0, and 0 if d = 0.

It returns a positive integer on success– a successful return of 0 becomes a 1– and a 0 on failure. The enclosing routine can decide how to handle the error case.
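A short Python sketch of both versions follows. The function names are mine, and the bounded variant relies on an assumption not stated above: when d is at least 1, the quotient can never exceed n, so searching q from 0 to n is enough.

```python
def div_unbounded(n, d):
    """Least q with q*d <= n < (q+1)*d. Never returns when d == 0."""
    q = 0
    while not (q * d <= n < (q + 1) * d):
        q += 1
    return q

def idiv(n, d):
    """Returns 0 on division by zero, otherwise 1 + quotient."""
    if d == 0:
        return 0
    # Bounded search: with d >= 1 the quotient is at most n.
    for q in range(n + 1):
        if q * d <= n < (q + 1) * d:
            return 1 + q
```

The for-loop in idiv always finds its q, so the function is total; the while-loop in div_unbounded is the version that hangs on a zero divisor.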

Divisibility checks (nothing but 0 is divisible by 0) and primality are primitive recursive and therefore total, computable within finite time. Most importantly, prime factorization is primitive recursive. This is something we’ll come back to.

Turing Machines

Most people have heard of Turing machines, but unless they have taken a course in graduate-level logic or the theory of computation, they’ve probably never worked with one– and may not know what it is.

They have the reputation of being complicated beasts. They’re brain-dead simple, actually. Doing anything with them, that’s the part that can be painful. The ones that we inspect and analyze as computers tend to have massive state spaces– which may or may not be a problem– while the most aggressively minimalistic ones– I won’t prove it, but there are machines with under 20 states and two symbols that can compute any computable function– tend to be inscrutable in practice.

Formally, an (n, s) Turing machine is a device that:

  • recognizes a pre-programmed alphabet of n ≥ 2 symbols. That set could be {0, 1}, or {A, B, C}, or the 100,000 most common English language words. One of these symbols is blank.
  • is in one of s distinct internal states, including one called Start and one called Halt. This set must be finite and is pre-programmed into the machine.
  • has n * (s – 1) pre-programmed rules, written as (s_old, a_in, s_new, a_out, ±1), one for each (s_old, a_in) pair except for those where s_old = Halt.
  • reads and writes to a tape– each cell holding exactly one symbol– that never runs out in either direction.

And here is how it works:

  • Input: a finite number of cells may be set to any non-blank values. (The rest of the tape is all blank, in both directions.)
  • Initialization: the machine is put in state Start.
  • Runtime: Over and over, the machine does the same thing:
    • read the symbol (a_in) at the cell where the machine is, and consult its internal state (s_old);
    • fetch the matching rule (s_old, a_in, s_new, a_out, ±1);
    • write a_out to the tape, and transition to state s_new;
    • move right if the matching rule’s last column had a +1; left, if -1;
    • repeat this cycle unless s_new is Halt, in which case the machine terminates. Whatever is on the tape is the program’s output.
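The whole cycle fits in a few lines of Python. What follows is a sketch, not a canonical implementation: the function name, the dictionary layout for rules, the "_" blank symbol, and the max_steps budget (a practical guard against non-halting rulesets) are all my own choices. The tape is a dictionary that reports blank for any cell never written, which stands in for a tape that never runs out.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, blank="_", start="Start", halt="Halt",
                       max_steps=10_000):
    """rules maps (s_old, a_in) -> (s_new, a_out, move), with move in {+1, -1}."""
    cells = defaultdict(lambda: blank, enumerate(tape))
    state, pos, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; the machine may never halt")
        s_new, a_out, move = rules[(state, cells[pos])]  # read and fetch
        cells[pos] = a_out                               # write
        state = s_new                                    # transition
        pos += move                                      # move: +1 right, -1 left
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# A (2, 2) machine: n * (s - 1) = 2 rules. It appends a 1 to a unary number.
INCREMENT = {
    ("Start", "1"): ("Start", "1", +1),   # skip over existing 1s
    ("Start", "_"): ("Halt", "1", +1),    # first blank: write a 1 and halt
}
```

Calling run_turing_machine(INCREMENT, "111") walks right across the three 1s, writes a fourth on the first blank, and halts with "1111" on the tape.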

What happens if the Turing machine never goes into the Halt state? It runs forever. This is generally considered undesirable. The computation doesn’t complete.

This is probably the biggest disconnect between Turing machines and the computers we actually use. Turing machines are supposed to halt. If one doesn’t, that’s considered pathological; its work isn’t done and as far as we’re concerned, it hasn’t computed anything. Meanwhile, the cell phones and laptops we use on a daily basis run in an infinite loop and that’s what we expect them to do. We expect them to be available (and I’ll formalize that much later, but not in this installment) but they never halt.

A Turing machine is all-or-nothing. Its job is to compute one function and then indicate that it’s done by going into the Halt state. By contrast, a real-world computer, at a minimum, has to respond to real-world inputs like the user’s keystrokes, its own temperature sensors (so it doesn’t run too hot), and power supply disruptions. Later on, I’ll show how to close this gap.

What’s neat about Turing machines is that, in principle, one could have been built in the late 19th century. (My work on Farisa has had me on a steampunk kick.) We were close: we had programmable looms, player pianos, and electricity. We had record players and magnetic storage. Today, a Turing machine good enough to emulate a 1980s video game console could be built with about $100 of commodity electronics. Rather than get into the details– it’s not my expertise– I’ll point the reader to Ben Eater’s excellent series of videos on the 8-bit computer he built on a breadboard. As he’s building an actual circuit, his model gives a much better representation of what computers actually do, in the physical world, than do Turing machines.

Anyway, an automaton is only as good as its ruleset. Most rulesets will have the machine pinging about at random– sound and fury, signifying nothing. A few, though, do useful things. A Turing machine can add two numbers supplied on the tape, whether they are specified in binary or decimal. These machines can multiply, or check regular expressions, or… well, literally anything computable. In fact, that’s one definition of what it means for something to be computable– they are legion, and they’re all equivalent.

It’s counterintuitive to most people, but the slowest computers from the 1960s can do anything a modern machine can– they would merely take longer. In terms of what computers can do, nothing has changed. If we allow computers to generate probabilistic bits, then even quantum computing does not add capabilities– quantum computers are merely faster.

From a practical perspective, computers and programming languages are not remotely equivalent. In theory, they are.

Now, Turing machines would be nearly useless as a real-world concept, say, if they required 2210,000 states in order to do useful computation. It would be annoying if there were computations that couldn’t be done with fewer states, because we have no way to store that much information. In fact, one can find fairly small n and s, and specific rulesets, that can emulate any Turing machine (any size, any ruleset) on any input at all. These are called universal Turing machines. I’m not going to go through the details of building one and proving it universal, but I’ll walk through the basic concepts, along two different paths.

We are not concerned with how efficiently the machines run– as long as they terminate, except on problems where no machine terminates. Real world computers are sufficiently different from Turing machines that the (heavy) performance implications here are irrelevant.

  • First, a Turing machine’s read-fetch-write-transition-move cycle is mechanical. We can implement it over all (n, s) Turing machines with a machine using sf(s) states, where f is a slow-growing function. We include the ruleset we want as an input– a lookup table– and our machine implements the read-fetch-write-transition-move cycle against that table instead.
  • Operating on k-grams of symbols allows us to use an n-symbol Turing machine to emulate an n^k-symbol machine. We can in practice do any of this work with a 2-symbol machine.
  • An (n, s) Turing machine can emulate a Turing machine with a larger state space (say, s^2 states) by writing state information to the tape. The details of this are ugly, and the machine may take much longer, but it will emulate the more powerful machine– by which, I mean that it will come to the same conclusions and that it will halt if the emulated machine does.

This approach isn’t the most attractive, and it has a lot of technical details that I’m handwaving away, but using those techniques, we can emulate, say, all the (n^2, s^2) Turing machines using an (nf(n, s), kg(n, s)) machine, where f and g are asymptotically sub-linear (I believe, logarithmic) in their inputs. The result is that, for sufficiently large n and s, machines can be built that emulate all machines at some larger size– and, of course, a machine at that size can emulate an even larger one. The cost in efficiency may be extreme– one could be emulating the emulation of another emulator emulating another emulator… ad nauseam– but we don’t care about speed.

If that approach is unappealing, here’s a different one. It uses the symbols: {0, 1, Z, R, E,+, <, _, ~, [, ], and ?}– in two colors: black and red; 1, Z, E, and R will never be red. This gives us 20 symbols. The blank symbol is the black 0.

Here’s a series of steps that, if one goes into enough detail (I’ll confess that I haven’t, and the machines involved are likely wholly impractical) can be used to construct a universal Turing machine.

Step 1: establish that copying and equality checking on strings of arbitrary length can be done by a specific, small Turing machine.

Step 2: use a symbol Z and put it between two regions of tape at (without loss of generality) tape position 0. Use it nowhere else. Use a symbol R to separate the right side of the tape into registers. These will hold numbers, e.g. R 1 0 1 R 1 0 0 0 1 R 0 R means that 5, 17, and 0 are in the registers. Resizing the registers is tedious (everything to the right must be resized, too) but it’s relatively straightforward for a Turing machine to do. There will be an E at the rightward edge of the data.

Step 3: The right side of the Z stores a stack of nonnegative integers: 1s and 0s (representing binary numbers) separated by register symbol R. The left side stores code, which consists of the symbols {0, +, <, _, ~, [, ], ?}. Only code symbols can be red.

  • A possible tape state is: E0+++++0+0+?0+++Z 101 R 1 R 0 R 1 E. (Spaces added for convenience.) The left region is code in a language (to be defined); the red zero indicates where in execution the program is; on the stack we have [5, 1, 0, 1] with TOS being the righthand 1.

Step 4: A Turing machine with a finite number of states can be an interpreter for StackMan, which is the following programming language:

  • At initialization, the stack is empty. The stack will only ever consist of nonnegative integers. We’ll write stack left-to-right with the top-of-stack (TOS) at the right.
  • 0 (“zero”) is an instruction (not a value!) that puts a 0 on top of the stack, e.g. ... X -> ... X 0.
  • + (“plus”) increments TOS, e.g. ... X 5 -> ... X 6.
  • _ (“drop”) pops TOS, e.g. ... X Y -> ... X.
  • ~ (“dupe”) duplicates TOS, e.g. ... X -> ... X X.
  • < (“rotate”) pops TOS, calls it n, and then rotates the top n elements left. This may be the most tedious to implement. Examples:
    • ... X Y 2 -> ... Y X
    • ... X Y Z 3 -> ... Y Z X
    • ... X Y Z W 4 -> ... Y Z W X
  • ? (“test”) checks TOS: if it is nonzero, it decrements TOS and then pushes a 1 on the stack; otherwise, it pushes a zero, e.g.:
    • ... 6 -> ... 5 1.
    • ... 0 -> ... 0 0.
  • This is a concatenative language, so instructions are executed in sequence one after the other. For example, +++ adds 3 to TOS, 0+++0+++ pushes two threes on it, _0 drops TOS and replaces it with a zero (constant function), and ?_?_?_ subtracts 3 from TOS (leaving a 0 if TOS < 3).
  • Code inside [] brackets is executed repeatedly while TOS is nonzero and skipped over once TOS is zero or if the stack is empty.
    • For example, 0+[] will loop forever because TOS is always 1.
    • The code [?_0++<+0++<]_ has behavior ... x y -> ... x + y. It’s an adder. For example, if the stack’s state is ... 6 2, it does the following:
      • The code in the brackets is executed. ? tests the 2, so we have 6 1 1, and we immediately drop the 1. The 0++< (“fish”) is a swap, so we have 1 6, and the + gives us 1 7. We do another 0++< and are back at 7 1.
      • The next cycle, we end up at 8 0; after that, TOS is zero so we exit our loop. With a _, we are left with ... 8.
  • Any instruction demanding more elements than are on the stack does nothing.
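To make the language concrete, here is a small StackMan interpreter in ordinary Python (not on a Turing machine). Treat it as a sketch resting on my own readings of the spec: whitespace is ignored, a "<" whose count exceeds the remaining stack leaves the stack untouched (the count is not popped), and max_steps is a guard I added so that programs like 0+[] raise an error rather than hang.

```python
def run_stackman(code, stack=None, max_steps=100_000):
    stack = list(stack or [])
    # Pre-match brackets so loop jumps are a dictionary lookup.
    match, opens = {}, []
    for i, c in enumerate(code):
        if c == "[":
            opens.append(i)
        elif c == "]":
            if not opens:
                raise ValueError("unbalanced brackets")
            j = opens.pop()
            match[i], match[j] = j, i
    if opens:
        raise ValueError("unbalanced brackets")

    pc = steps = 0
    while pc < len(code):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step budget exhausted; program may not terminate")
        c = code[pc]
        nonzero = bool(stack) and stack[-1] != 0
        if c == "0":                        # zero: push 0
            stack.append(0)
        elif c == "+" and stack:            # plus: increment TOS
            stack[-1] += 1
        elif c == "_" and stack:            # drop: pop TOS
            stack.pop()
        elif c == "~" and stack:            # dupe: duplicate TOS
            stack.append(stack[-1])
        elif c == "?" and stack:            # test
            if stack[-1] > 0:
                stack[-1] -= 1
                stack.append(1)
            else:
                stack.append(0)
        elif c == "<" and stack and stack[-1] <= len(stack) - 1:
            n = stack.pop()                 # rotate top n elements left
            if n > 1:
                stack.append(stack.pop(-n))
        elif c == "[" and not nonzero:      # skip the block
            pc = match[pc]
        elif c == "]" and nonzero:          # repeat the block
            pc = match[pc]
        pc += 1
    return stack
```

Running run_stackman("[?_0++<+0++<]_", [6, 2]) reproduces the adder trace above and leaves [8] on the stack.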

The interpreter for this language can be built on a Turing machine using a finite number of states. To keep track of the code pointer (i.e., one’s place in the stored program) while operating on the stack, color a symbol red. Make sure to color it black when you have moved on.

Step 5: show that any primitive recursive function N^n → N can be computed as a fragment of StackMan, taking the arguments from the stack; e.g.,

  • f(x, y, z) = x + y * z could be implemented as a fragment with behavior ... x y z -> ... (x + y * z).

This isn’t hard. The zero functions and successor come for free (0, +) and the projection functions (data movement) can be built using _, ~, and <. Composition is merely concatenation– we get that for free by nature of the language. We can get primitive recursion from ? and principled use of [] blocks, and general recursion from arbitrary [] blocks.

Thus, a StackMan interpreter is a Turing machine that can compute any primitive recursive function.

Next, show that any computable function N^n → N can be computed as a fragment of StackMan that will terminate if the function is defined. (It may loop indefinitely where it is not.)

Step 6: since prime factorization is primitive recursive, we can go from lists of nonnegative integers to a single nonnegative integer, using multiplication (one way) and prime factorization the other way: e.g. (1, 2, 0, 1) ↔ 2^1 * 3^2 * 5^0 * 7^1 = 126. This means that we can coalesce
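The encoding in Step 6 can be sketched in Python as follows. The function names are mine, and decode takes the list length as an explicit argument because trailing zeros do not change the product: (1, 2, 0, 1) and (1, 2, 0, 1, 0) both encode to 126.

```python
import math

def primes():
    """Yield 2, 3, 5, 7, ... by trial division (slow, but easy to check)."""
    n = 2
    while True:
        if all(n % p for p in range(2, math.isqrt(n) + 1)):
            yield n
        n += 1

def encode(xs):
    """Pack a list of nonnegative integers into one integer: product of p_i ** x_i."""
    code = 1
    for p, x in zip(primes(), xs):
        code *= p ** x
    return code

def decode(code, length):
    """Recover the first `length` exponents from a positive integer code."""
    if code < 1:
        raise ValueError("code must be a positive integer")
    xs = []
    for p, _ in zip(primes(), range(length)):
        exponent = 0
        while code % p == 0:    # bounded: each division shrinks code
            code //= p
            exponent += 1
        xs.append(exponent)
    return xs
```

Both directions use only bounded loops on any given input, which is the practical face of the claim that factorization is primitive recursive.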

Step 7: show that all (ruleset, state, tape) configurations can be encoded as a single integer. Then show that the Turing step (read-fetch-transition-write-move) and the halting check are both primitive recursive. These capabilities can be encoded as StackMan routines. (They’ll be obnoxiously inefficient but, again, we don’t care about speed here.)

Step 8: then, a Turing machine can be built with a finite number of states that:

  • takes a Turing machine ruleset, tape, and state configuration and translates it into a StackMan program that repeatedly checks whether the machine has halted and, if not, computes the next step. The read-fetch-transition-write-move cycle will be performed in bounded time. The only source of unbounded looping is that the emulated machine may not halt.
  • and, therefore, can write and run a StackMan program that will halt if and only if the emulated configuration also halts.

Neither of these approaches leads to a practical universal Turing machine. We don’t actually want to be doing number theory one increment (+, in StackMan) at a time. Though StackMan can perform sufficient number theory to emulate any machine or run any program– it is, after all, Turing complete– it is unlikely that the requisite programs would complete in a human life. But, in principle, this shows one way to construct a Turing machine that is provably universal.

Human Computation

This installment is part of what was a larger work. I’ve decided to put it out in pieces. I titled it, “Why Turing Machines Matter”, but I had to start with a bunch of stuff that most people would think doesn’t matter– a stack-based esoteric language, some number theory review, et cetera. I haven’t yet motivated that this concept actually does matter. So, let me get on that, just briefly.

Mathematicians and logicians like Turing machines because they’re one of the simplest representations of all computers, and the state space and alphabet size don’t need to be unusually large to get a machine that can compute anything– although it might be slow. Alan Turing’s establishment of the first universal Turing machine led to John von Neumann’s architecture for the first actual computers.

Is it reasonable to assume that Turing machines perform all computations? Well, that’s one way that computability is defined, but it’s a bit cheap to fall back on a definition. It’s more accurate to look at the shortcomings of Turing machines and decide whether it’s reasonable to believe a computer can be built that overcomes them.

For example, some electronic devices are analog, and Turing machines don’t allow real-numbered inputs. Everything they do is in a finite world. But, in practice, machines can only differentiate a finite number of different states. There’s no such thing as a zero error bar. Not only that, but quantum mechanics suggests that this will always be the case. For example, there are an infinite number of colors in theory, but humans can only differentiate a few million under best-case circumstances, and we can only reliably name about a hundred. It’s the same for machines: measurements have error. Of course, an infinite state space isn’t allowable either: that would be analogous to infinite RAM.

So, those shortcomings of Turing machines apply to all computers that we know– including (in a different way) the quantum computers humans know how to build.

Turing machines, as theoretical objects, can’t do I/O. The input exists all at once on the tape, and output is produced– and until that output occurs, no computation has been completed. One alteration to account for this is to allow the Turing Machine an input register that other agents (e.g., keyboards, temperature sensors, the camera) can write to. When the computer is in a Ready state, it scans for input and reacts appropriately. If the machine reaches Ready within a finite time interval, that is analogous to successfully halting– the software itself may be broken, but the machine is doing its job.

In truth, modern computers are more accurately modeled as systems of interacting Turing-like machines than single machines– especially with all the multitasking they have to do to support users’ demands.

There is one thing Turing machines don’t do that we take for granted, although it’s a bit of a philosophical mess: random number generation. Turing machines don’t model it: everything they do is deterministic, and “random” is not a computable function (or a function at all). Real computers most often use pseudorandom number generators (PRNGs)– which are predictably (but ideally without pattern) “random”– and Turing machines can implement any of those. Truly random? Well, we don’t fully know what that is. We can get “random enough” with a PRNG or from some input that we expect to be uncorrelated to anything we care about (e.g. atmospheric noise, radioactive decay).
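As a concrete example of a PRNG, here is a linear congruential generator, one of the simplest kinds. This is only an illustration of "deterministic but random-looking": the multiplier and increment are the widely published Numerical Recipes constants, the function name is mine, and nothing here is suitable for cryptography.

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Yield an endless pseudorandom sequence: x -> (a * x + c) mod m."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x

def take(gen, count):
    """Collect the first `count` values from a generator."""
    return [next(gen) for _ in range(count)]
```

The same seed always yields the same sequence, so take(lcg(42), 5) gives identical results on every run. That predictability is exactly why a Turing machine can implement it.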

Turing machines give a poor model of performance as described here. To access data at cell 5,305, from cell 0, the machine has to go through every cell in between. That’s O(N) memory access, which is terrible. Luckily, real computers have O(1) memory access, right? That’s why it’s called random access memory, eh? Well, not quite. Caching is too much of a beast for me to take on here, but I would argue this far: a Turing machine with a 3-dimensional tape– I haven’t gotten into this, but a Turing machine can have any dimensionality and be computationally equivalent– is a more faithful model for performance. Why? Well, our best case for random access is O(N^(1/3)). We can call random access into a finite machine O(1), but that’s moving the goalposts. Asymptotic behavior is only about the infinite, and the real world is constrained by the speed of light. If we have a robot moving around a 3-dimensional cubic lattice where each cell is 1 micron on a side (no diagonal movement) and we want each round trip to complete in one nanosecond (30 cm) then we are limited to 125 trillion cells. Going up to 1 quadrillion would double our latency. Of course, we’re ignoring the absurdity of a robot zipping around at relativistic speeds.
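The arithmetic can be checked with a few lines of Python. This is a back-of-envelope sketch under assumptions of my own: the robot starts at a corner of a cube, the farthest cell is about three side-lengths away because diagonal moves are not allowed, light covers 30 cm (300,000 microns) per nanosecond, and the cell pitch is taken as 1 micron, which is the value under which the 125 trillion figure works out.

```python
def max_cells(round_trip_um, pitch_um):
    """Largest cubic lattice whose farthest corner fits in the round-trip budget."""
    one_way_hops = round_trip_um // (2 * pitch_um)   # out and back
    side = one_way_hops // 3                         # x, y, z legs, no diagonals
    return side ** 3

def cube_side(cells):
    """Smallest integer side whose cube holds at least `cells` cells."""
    side = round(cells ** (1 / 3))
    while side ** 3 < cells:            # correct any floating-point undershoot
        side += 1
    while (side - 1) ** 3 >= cells:     # correct any overshoot
        side -= 1
    return side

def round_trip_um(cells, pitch_um):
    """Worst-case round-trip distance, in microns, for a cube of `cells` cells."""
    return 2 * 3 * cube_side(cells) * pitch_um
```

Under these assumptions, max_cells(300_000, 1) is 125 trillion, and a quadrillion cells needs a 600,000 micron round trip: eight times the cells for twice the latency, which is the cube-root scaling.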

Happily, most computers don’t have the moving part of a robotic tape head (although a traditional hard drive may be analogous). Rather than the computation going to the data (in the model of a classical Turing machine) they, instead, bring the data to the chip. Electrical signals travel faster than the mechanical tape head of a literal Turing machine could (without catastrophic heat dissipation). So, in this way, modern computers and Turing machines are quite different.

If anything, I’d make a different claim altogether. Turing machines aren’t a perfect model of what computers do– although they’re good enough to explain what computers can (and can’t) do. They are, perhaps surprisingly, a great representation of what we do when we compute.

Before “a computer” was a machine, it was a person whose job was to perform rote operations– addition, subtraction, multiplication, division, elementary functions, and moving data around– which is, as it were, all today’s computers really do as well. And how does a human compute, say, 157,393 * 648,203? Most of us would have to reach for paper– a two-dimensional Turing tape– and start going through rote operations. To transliterate schoolbook multiplication to be done by a Turing machine is tedious but not hard– there are a couple thousand states.
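For reference, here is schoolbook multiplication written as the rote, digit-at-a-time procedure a human computer would follow. It is a Python sketch rather than a Turing machine ruleset, and the function name and the list-of-cells layout are my own; the point is that every step is a single-digit multiply, an add, and a carry.

```python
def schoolbook_multiply(a, b):
    """Multiply two nonnegative decimal strings one digit at a time."""
    xs = [int(ch) for ch in reversed(a)]    # least significant digit first
    ys = [int(ch) for ch in reversed(b)]
    cells = [0] * (len(xs) + len(ys))       # the "paper": one digit per cell
    for i, y in enumerate(ys):
        carry = 0
        for j, x in enumerate(xs):
            total = cells[i + j] + x * y + carry
            cells[i + j] = total % 10       # write one digit
            carry = total // 10             # carry the rest leftward
        cells[i + len(xs)] += carry         # final carry of this row
    digits = "".join(str(d) for d in reversed(cells)).lstrip("0")
    return digits or "0"
```

No step looks at more than a couple of digits at once, which is why the procedure needs only a finite (if tedious) set of states.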

The plodding Turing machine isn’t “about” computers. It’s about us, moving around a sheet of paper with a pencil and eraser, as we do– at least, when we know we’re computing. Most of what we do, we don’t think of as computation at all. We’re not even aware of computation happening.

It’s an open question whether there’s a non-computational element to human experience. I tend to be unusual– by the standards of, say, Silicon Valley, I’m downright mystical– and I think that there is. I can’t prove it, though. No one can.

The difference between intuition and computation is that the latter happens by rote, from a precisely-understood, finitely-describable state, following a series of rules that require no judgment. Intuition can’t be checked; computation can.

Most mathematicians use informal proofs– verbal arguments that convince intelligent, skeptical people that a conclusion is valid. This is a social rather than algorithmic process, and it is not devoid of error. Informal proofs can be unrolled into formal proofs from ZFC, it is generally believed, but it would typically be impractical to check. An informal proof is an argument (using other informal proofs) that a formal proof exists, and although the informal proof is imperfect– of course, 100-percent perfection in computation is not physically possible, either– it usually gives more insight into the mathematical structure than a formal one would.

Do humans have non-computational capabilities or elements to our existence? I believe so. But, in terms of what we can communicate to each other with proof– that is, checkable computation– we are limited to finite strings of finite symbols, an agreed-upon initial state, and a finite set of rules. At least in this life, that’s the best we can prove.

Next Up

In the next installment, I’m going to show how to build a Turing machine that’s practical.

Aggressively minimal universal Turing machines– with, say, only 10 states and 5 symbols– tend to be next-to-impossible to understand. I’m going to work with a large-ish state space and alphabet: 512 symbols and 2^48 possible states (even though we’ll only use about a million). Those numbers sound beastly, and to implement the Turing machine as a lookup table would require 1,884,160 terabytes. At such a size, storing the entire ruleset is cost-prohibitive. Most rulesets for those parameters are patternless and unmanageable, but a ruleset that we’d actually want to use is likely to be highly patterned– allowing rules to be computed on the fly. In fact, that’s what we’ll have to do.
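The terabyte figure can be reproduced with a short calculation. The assumptions here are mine: the state count is read as 2^48, each rule is stored as the full five-part tuple (old state, input symbol, new state, output symbol, direction), one rule is counted for every (state, symbol) pair including Halt, and a terabyte means 2^40 bytes.

```python
SYMBOL_BITS = 9     # 512 = 2**9 symbols
STATE_BITS = 48     # 2**48 states
MOVE_BITS = 1       # left or right

rule_count = 2 ** SYMBOL_BITS * 2 ** STATE_BITS
# (s_old, a_in, s_new, a_out, move): two states, two symbols, one direction bit.
bits_per_rule = 2 * (STATE_BITS + SYMBOL_BITS) + MOVE_BITS
total_bytes = rule_count * bits_per_rule // 8
terabytes = total_bytes // 2 ** 40

print(bits_per_rule)   # 115
print(terabytes)       # 1884160
```

Storing only the right-hand side of each rule would roughly halve this, but the table would still be far too large to keep around, hence computing rules on the fly.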

In the second installment, we’ll build a Turing machine about as capable as a 1980s video game console (e.g. Atari, Nintendo) that’ll be much easier to program against. That’s up next.

Don’t Be Like Ajay

There’s a lot of bad career advice out there, but the worst of it comes from people who’ve been successful at private-sector social climbing. Blind to their own privilege, and invested in the perverse mythology of corporate meritocracy, they are least equipped to perceive the truth– not to mention their lack of incentive to share it, on the off chance of discovering it. At the same time, these people can say anything and get it into print, so desperate are the rest of us, the proles, to hear the inside corporate secrets they purport to have.

There are no secrets. The corporate system is corrupt; it is not a conspiracy. It is exactly what it looks like; the powerful abuse the powerless, the rich get richer, and people who speak the truth about it are punished.

This pestilent article, “What College Grads Could Learn From My Former Intern“, comes from Zillow CEO, Spencer Rascoff. Now, I have no personal knowledge of the author, and I know even less about “Ajay”– that may or may not be his real name; it doesn’t matter– so I’m going to stick to the merits of the article itself.

This I will say: venture-funded startup CEOs are the worst when it comes to self-deception and the profligate evangelization of nonsense.

Venture capital, at least in the technology industry, has become a mechanism for the replication of privilege. Well-connected families create the appearance of their progeny having built businesses from scratch when, in fact, they had all sorts of hidden advantages: tighter sales advantages, fawning press coverage, and most importantly, the privilege not to worry about personal financial nonsense. (If their businesses tanked, they’d fail up into cushy executive jobs, often as venture capitalists.) It’s money laundering, plain and simple, and it’s not even well hidden since it’s technically not illegal.

The corporate system is a resource extraction culture, not unlike the ones in culturally impoverished, oil-rich societies that never needed to grow or innovate, because they could pump wealth out of the ground. In this case, though, the depleting resource is the good faith of the American middle class– an earnest belief in hard work, an affinity for technology, an acceptance of authority. The purpose of the ruse is to make it look like “this time it’s different” and that today’s elite, unlike the warlords and viscounts of the past, actually earned it.


Ajay, the protagonist of this second-rate Horatio Alger story, was a hard worker, eager to please, by the author’s description (emphasis mine):

Ajay did [difficult, unpleasant work] eagerly and with a smile; he worked incredibly hard and because of that, built a reputation for himself as someone who would pitch in to help with anything you asked and give it his best effort. People liked that.

I almost retched when I came upon “and with a smile”. Gross.

My thoughts, for the rising generation? Yes, work hard when it’s worth it to work hard. In fact, I would not try to give advice to the young about “work-life balance” or tell them that they should backpack around Australia for two years. It’s hard enough to achieve something significant during peace time; it’s much harder in 2018, when the rich have made it so much harder for anyone to get a chance. One cannot produce significant work in any field and also have the Instagram party life.

This said, there is difficult, unpleasant work worth doing; there are other tasks that are waste. If one has to do the job with a goddamn smile to get credit for it, then it’s almost certainly in the latter category.


Bosses might like, on a personal level, those who do unpleasant work with a smile. That doesn’t mean that it leads to career success. It’s never good to be disliked by a manager, but bosses don’t get to promote everyone they like. If one is well-liked only because of having made it a path of least resistance to give one unpleasant, career-incoherent work, then one is in a state sustained only by suffering, that one can almost never turn into career advancement.

I’d also like to point out the author’s corporate weasel terminology. He says, “People liked that.” He liked it. There’s nothing sinister or surprising about a boss liking someone who’s preternaturally “easy to manage”. What’s galling is that, like most corporate bosses, he felt entitled to superpose his opinion over the entire company. It’s like when managers fire people but want to avoid taking responsibility, so they say “the team decided”.

I would guess that many people disliked Ajay. They saw what he was doing, and they cringed.

Of course, if Ajay succeeded, then their opinions didn’t matter; those people didn’t win. Still, it’s generally not useful to be disliked by one’s colleagues, and no one likes ass-kissers.

Ajay was also a serial networker, even all the way up to me, the CEO.

It’s funny how blind CEOs are to the politics that exist all around them. Since they get everything they want, there’s “no politics” in the organization. I suppose that’s true. The ultimate solution for someone who wishes to abolish politics is despotism– the degenerate but nominally apolitical arrangement. Most of us don’t want that, of course.

At any rate, if Ajay’s colleagues and managers tolerated “a serial networker”, it’s because they never saw him as a threat until he was fully ensconced in the managerial sun. Perhaps they were wrong and got blindsided. Like I said, I don’t know these people.


In general, though, the idea that a 22-year-old can try to rub elbows with a CEO, in a competitive environment like a startup or investment bank, and not get shanked by someone at or above his own level, is laughable. The people with the training to pull this off are those with inherited wealth and social resources, who have the least need for “internal networking” because of the extensive external networks their Daddies gave them.

When Ajay left to finish school and go on to various startups, he continued to build upon his brand and kept in touch—essentially marketing himself through his networks.

Emphases mine. There’s nothing incorrect about “essentially”; I just wanted to highlight an unnecessary adverb that really, totally, very badly, irritatingly weakened the prose.

I want to focus more on “build upon his brand”. (The author could have taken out “upon” and nothing would have been lost, but there’s no actual incorrectness here, so I shan’t dwell on it.) See, what got me to write this response is not that the author’s giving misguided career advice. To be honest, I couldn’t give better advice that Forbes readers (if my estimation of its demographic is correct) would want to hear. I’d offer the truth– the game is rigged and most people will lose no matter what they do– and that’s not a charismatic message. No, I’m writing this response because the notion of “personal brand” is, to me, sickening.

I am not a brand. There are not five hundred of me stacked on a shelf in a grocery store, all in neat order like the rectangular boxes they put toothpaste tubes in. You, dear reader, are not a brand either. If you don’t cringe when you hear the words “personal brand”, then wake up.

People who use the term “personal brand” without dripping contempt are a special breed of douchebag. What’s amusing is that, while they identify “personal brand” with their desperate claims of uniqueness, these people are pretty much all the same.

It is bad advice. The truth is that people who focus on “building their brand” are assumed by their colleagues not to be doing the work, and they’re the first ones to get shanked when things get difficult. Perhaps Ajay succeeded. Perhaps he’s in a corporate jet, still smiling. Or perhaps he used his bonus on plastic surgery to fix that frozen-face smile after getting kicked out of a funeral for the goddamn last time.

You want to be remembered, whether you’re joining a company of five or 500, because remembered people get opportunities; anonymous ones don’t.

Remembered people get denied opportunities.

I’ve been involved with the antifascist cause since 2011. I’ve been turned down for jobs because of a somewhat public (and, in cases, adversarially publicized) track record of having the backbone to stand up for what’s right.

When it comes to social media, employment references, and personal uniqueness, we live in a 500-mile world. As in, follow any driver for 500 miles, and you’ll find a reason to write him up. It used to be difficult (literally, and in metaphor) and time-consuming to follow one person so far; technology and surveillance have made it easier.

I’ve been a hiring manager. I was always sympathetic to people with controversial online histories, for obvious reasons, but such a history is the most common reason for denying a job to someone good enough to make it to the final round. No, these people aren’t alt-right psychopaths or proud, public drug users. Usually, they’re normal people who just happen to hold opinions. It’s assumed that they’ll get bored, or that they’ll react badly to mistakes made by authority. I did, on one occasion, cringe when a startup executive commented on a black woman’s natural hair being “political”.

The people who rise in the corporate system are boring. The best odds, in the corporate game, come from becoming the most bland, inoffensive, socially useless person one can. The problem with this truth– the reason it lacks business-magazine charisma– is that its odds are still poor. There are a lot of perfunctory losers out there, and they don’t all get executive jobs. Most of them get the same shitty treatment and outcomes as everyone else.

Not being boring, though, means that someone only has to follow you for 25 miles to find a reason to screw you over, damage your reputation, or deny you a job.

The optimal strategy is to be boring, to ingratiate oneself to powerful people over time, and to become intertwined enough with an organization’s powerful people that one is perceived to have undocumented leverage, and therefore gets what one wants out of the organization. Does this strategy work for everyone, all the time? No. The odds are depressing– most social climbers fail. But the odds are even worse for all the other strategies.


“How do you effectively brand yourself without being a peacock or a sycophant?” There are two ways: intentionally constructing it and being patient.

There are several ways to brand yourself. The classic approach is to apply pressure with an iron heated in a fire. At high enough temperatures, permanent scars can be achieved in two or three seconds. Electric arcs are sometimes used for this process. An alternative to thermal burns is “cold branding”, often using liquid nitrogen. There seems to be no risk-free option, since branding literally is skin damage.


The same should be true for you: “Work with Sophia—she has a great attitude, big ideas, and is really hard-working.”

This guy must be getting paid per word. The Hemingway editor yells at me; I use adverbs. They’re not always unnecessary and replacing one with a clunky adverb-free adverbial phrase isn’t my way. Still, not only is the “really” unnecessary, but the author could have said “works hard”.

Whatever you decide to pursue as your personal brand, make sure it has a strong purpose behind it. If you do that, the rest is just packaging.

“Just packaging.” A product’s brand is literally that: packaging. Brand is the use of identical-looking boxes to convince buyers that a minimum standard of quality has been met. A Hershey Bar isn’t going to blow me away, but it’s perfectly adequate. I know that when I buy one, I’m unlikely to find a severed housefly wing in it.

If you want “perfectly adequate” on your tombstone, then consider being like Ajay– a brand. That said, you might want to pull that smile down. Do your job and do it well, of course, but if you smile so much, you’ll make everyone hate you. No one wants to compete for attention with an ass-kisser.

The Truth

As I said, I found the article harmless till I got to the “personal brand” bit.

There’s a lot of bad career advice out there from successful people (most of whom lucked into, or were born into, what they have). There’s also a lot of bad career advice from unsuccessful people who’ve found success selling the “inside secrets” of a corporate game they never actually won– now that is personal brand. The well-meaning self-deception will never go away, nor will the intentionally deceptive sleaze. There are many gamblers who “have a system” for beating roulette wheels and slot machines. Many books have been written on their systems. They do not work. The house wins in the long term. That’s why it’s the house.

The house is smart enough to keep people coming in. So it offers intermittent small wins, and a few big ones that generate publicity. It’s very hard for lottery winners to keep their windfalls private; lotteries discourage it. In these corrupt career lotteries, though, the system doesn’t have to make it hard for game winners to stay private. They shout in open air; they never shut up.

Is “be like Ajay” good advice? I don’t know, because I don’t know who Ajay is. Perhaps he was a ruthless political operator, fully aware of the resentments his supplicating smiles generated, and he used them for some sort of eleven-dimensional manifold socio-economic judo so brilliant it’s beyond my comprehension. Perhaps Ajay’s reading this blog post on Trump’s golden toilet, laughing at me. For the average schmuck, though, it’s not good advice. Of course, don’t be incompetent. Don’t be too grumpy. Be the “go to” guy or girl for work you genuinely enjoy and are good at. But, as a favor to yourself, don’t become a dumpster for career-incoherent work. Also, don’t smile all the time; it’s creepy.

I would love to advise authenticity, but that is also not a good approach for someone who needs to squeeze money out of the corporate system– and most people have no other choice.


There’s no path I can sell for the individual. The situation, in truth, is quite dire. In Boomer times, the corporate system seduced people with greed: $500 executive lunches, business-class travel all over the world, and seven-figure bonuses just for showing up. Today, it runs on fear. Fear’s cheap. Most Ajays won’t succeed; I can say that with confidence. I can also say that most anti-Ajays won’t succeed. Most people won’t succeed. The corporate game is rigged and anyone who says otherwise is trying to sell something toxic. I have no elixir of socioeconomic invulnerability; I’ll admit that. There’s a massive market for false hope. I will not sell into it. I am better than that.

For the world– if, sadly, not always the individual– it would be better if we woke up, tore down the corporate system brick-by-brick like the Bastille, and replaced it with a fairer, more sensible, pro-intellectual style of society worth caring about. If enough of us had the courage to live in truth, consequences be damned, the whole corporate edifice would crumble and we’d all be better off for it.

It’s not easy to live in truth. It’s downright hard to change a world whose most powerful people loathe any change at all. A first step, though, might be for us, unhindered by mercy, to mock anyone and everyone who says “personal brand” without vehement contempt for the concept. If we work together, we can make such people shut up. That would be a start.