technical blog

20 December 2016

Code is Data

Homoiconicity, AKA “code is data”, is a frequently cited benefit of using a Lisp. When people reference this, the first thing they talk about is usually macros, and how powerful and transformative they are.

In Paul Graham’s Beating the Averages, for example, he points to macros as the tool that elevates Lisp above other languages. He goes on to provide an interesting statistic:

The source code of the Viaweb editor was probably about 20-25% macros.

As a Clojure programmer, this makes me want to run away screaming.

Or to put it less dramatically, in idiomatic Clojure, writing macros is an exceptionally rare occurrence, so the idea of having so much of a code base made up of macros is both astonishing and mildly terrifying to me.

However, as Paul Graham has literally written the book on ANSI Common Lisp, it’s probably charitable to assume that what’s idiomatic Clojure is not necessarily idiomatic Common Lisp. But that still leaves two questions:

  1. Why are macros uncommon in Clojure in particular?
  2. Why is Clojure homoiconic if macros are used so rarely?

Why are macros uncommon?

Much of Clojure's design revolves around the idea of simplicity, which in Clojure parlance means reducing the ways things interact with one another. The more things interact, the more permutations we have to deal with, and the harder it is to make predictions about the outcome. When there are many interactions, we say that a system is complex.

This is why we prefer pure functions over side-effectful ones in Clojure, and why we refer to Clojure as a functional programming language. Functional programs are simpler, in the sense that their interactions are more constrained than programs that make use of mutable state.
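As a minimal sketch of that difference (the names add-item, add-item! and shared-cart are hypothetical, chosen only for illustration), compare a pure function with one that touches shared state:

```clojure
;; Pure: the result depends only on the arguments, and the input is untouched.
(defn add-item [cart item]
  (conj cart item))

;; Side-effectful: reads and updates state that any other code can also see.
;; shared-cart is a hypothetical piece of global state for this example.
(def shared-cart (atom []))

(defn add-item! [item]
  (swap! shared-cart conj item))

(def cart ["tea"])

(add-item cart "milk") ;; => ["tea" "milk"]
cart                   ;; => ["tea"], the original vector is unchanged
```

The pure version interacts with nothing beyond its arguments, while the result of add-item! depends on every earlier call that touched shared-cart.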

It’s also the reason we avoid macros in Clojure. Macros are powerful, but complex. A macro is passed both unevaluated forms and the local environment, giving it a far wider array of things it can interact with. If we want to reduce interactions, macros are the last tool we should be using.
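To illustrate what a macro can see that a function cannot, here is a small sketch. The names describe-fn and describe-macro are hypothetical; defmacro and the implicit &env argument are core Clojure.

```clojure
;; A function receives the *value* of its argument.
(defn describe-fn [x]
  {:received x})

;; A macro receives the unevaluated form, plus the implicit &env argument:
;; a map whose keys are the symbols of the local bindings at the call site.
(defmacro describe-macro [form]
  `{:form   '~form
    :locals '~(vec (keys &env))})

(let [a 1]
  [(describe-fn (+ a 2))
   (describe-macro (+ a 2))])
;; => [{:received 3} {:form (+ a 2), :locals [a]}]
```

The function only ever sees the number 3. The macro sees the expression that produced it and the names in scope around it, which is exactly the wider surface for interaction described above.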

This is summed up in a 2010 talk from Stuart Sierra:

The First Rule of Macro Club: You do not write macros.

It’s hyperbole, but not by much. A good rule of thumb I’ve found is that in any project, you should be able to count the unique, non-core macros you use on one hand.

Why is Clojure homoiconic?

If Clojure rarely uses macros, why is Clojure a Lisp? Why build its syntax around a tool that’s used only occasionally? Why not instead adopt a more familiar syntax and expose a specialised AST for language extensions?

I’m not Rich Hickey, so I can’t give you an authoritative answer. But I can give you my best guess, and that is:

Clojure is homoiconic to make data a first class citizen.

Immutable data is the simplest possible element of a codebase, because it generates no new interactions. The integer 1, for example, can be found in thousands of different software projects. It’s the same number, but it doesn’t connect the projects it’s used in. If one of the goals of Clojure is simplicity, then it should help us maximise our use of immutable data.

In most programming languages there’s a clear segregation between code and data. Syntax is generally designed to favour writing code over writing literal data structures. The focus is on how to write better code, not how to write better data. I’d categorise them as code-first languages.

Clojure takes a different approach. Because it’s homoiconic, there’s no distinction between code and data. This is true for all Lisps, but Clojure goes beyond traditional S-expressions by providing an extremely expressive and extensible syntax for describing literal data.

Clojure’s literal syntax is more than just sugar; it’s a deliberate decision to orientate the language around writing data. Clojure pushes the notion that you should be thinking about data before code, to a degree I haven’t seen anywhere else. Clojure is a data-first language.
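As a rough sketch of what that literal syntax looks like in practice (the order map and its keys are made up for illustration; clojure.edn ships with Clojure), the same notation describes both plain data and code:

```clojure
(require '[clojure.edn :as edn])

;; Literal data: a map holding a number, a vector, a set of keywords,
;; and a tagged literal (#inst) that reads as a date.
(def order
  {:id     42
   :items  ["tea" "milk"]
   :tags   #{:urgent :gift}
   :placed #inst "2016-12-20T00:00:00.000-00:00"})

;; The printed form is the literal syntax, so it round-trips through text.
(= order (edn/read-string (pr-str order)))
;; => true

;; Code is written in the same structures: here, a list of a symbol and two numbers.
(def form (edn/read-string "(+ 1 2)"))

(list? form)  ;; => true
(first form)  ;; => +
(eval form)   ;; => 3
```

Nothing here involves a macro. The point is only that maps, vectors, sets and tagged values can be written, printed and read back as directly as code can.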

A common misconception

There’s a common misconception that because Clojure is a Lisp, code transformation is encouraged, even idiomatic. Homoiconicity is seen as an excuse to write more code, rather than to write more data.

For instance, a recent post had this to say on the subject:

You also often hear from Lisp proponents that Lisps are great because code and data are described in the same way. The implication being that it’s simpler because it’s one thing to learn instead of two and that when you learn to transform data you learn to transform code, enabling powerful tooling.

This might be the implication for a language like Common Lisp, and certainly Paul Graham believes so, but Clojure has very different motivations. Misunderstanding this leads to a very warped view of the language:

The features you have in your language and the features you choose not to have are difficult decisions that subtly and not so subtly shape the community, libraries and other code written in the language. Lisps punt on these hard questions in favor of allowing users to write their own languages on top of Lisp. This creates a Wild West where Domain Specific Languages (DSLs) abound.

In eight years of writing Clojure, I’ve used fewer than a dozen DSLs, and only half that number with any regularity. If the author hadn’t previously mentioned Clojure, I’d have assumed he was talking about an entirely different Lisp.

The author isn’t unfamiliar with Clojure; he’s even written a couple of libraries for it. Despite this, his post is built upon a fundamental misunderstanding about Clojure’s design.

This misunderstanding stems from how Lisp is traditionally presented. There’s a lot of literature on the benefits of Lisp in general, and it’s natural to assume that Clojure is treading the same ground. When we introduce Clojure, I don’t think we do enough to explain how it differs, and how this affects the way we program in it.

When the words “code is data” come up, it’s tempting to point people toward macros as a demonstration of what that means. But in Clojure, macros are a specialised tool. If we use the usual Lisp idioms to explain Clojure, we fail to communicate what makes Clojure a unique language.

Perhaps instead we should put explanations that involve macros and code transformation to one side, and focus on telling people how Clojure approaches data differently to other languages. If we do that, maybe fewer people will end up disillusioned.