Amplifying C (2010) (voodoo-slide.blogspot.com)
140 points by jhack on Feb 7, 2016 | 101 comments


> The system introduces an "amplification" phase where s-expressions are transformed to C code before a traditional build system runs.

In other words, this is a compiler.

At this point I have to say what I always say: Don't compile to C. You will spend forever eliminating undefined behavior, the debugging experience will be bad because you don't have direct control over the DWARF DIEs, your compilation will be slower for no reason, you will not be able to add custom metadata to be consumed by optimization passes (for example, aliasing passes), you won't have access to special instructions like LLVM add nuw/nsw, etc.

Virtually all languages I know of that started out compiling to C stopped doing it at some point in their development, for one or more of the above reasons. Skip the major technical overhaul you're going to inevitably have to do and just build LLVM IR from the beginning. The LLVM API is excellent, so it ends up being less work to begin with.


> You will spend forever eliminating undefined behavior.

Wrong. You can build an abstraction layer above C code-generator which makes sure undefined C code doesn't get generated.

> the debugging experience will be bad because you don't have direct control over the DWARF DIEs.

Nope. Once you have generated C code, you debug your C code with gdb or whatever is available. No problems there.

> your compilation will be slower for no reason

Nope. C compilation is about an order of magnitude faster than C++ compilation.

> you will not be able to add custom metadata to be consumed by optimization passes

Does that mean the clang project asks C/C++ programmers to insert custom metadata into their C/C++ source files to give hints to clang?

> you won't have access to special instructions like LLVM add nuw/nsw

You're looking at it the wrong way. This article is for C++ programmers who are sick and tired of C++'s arbitrary man-made complexity and its limitations on abstraction management. They can handle generated C code quite well. They're not interested in looking at or working with assembly-like languages such as LLVM IR, especially one they'd have to learn before they could resume their work.

LLVM IR creates an additional abstraction barrier between a game programmer's code and hardware. Getting rid of C++ and moving to LLVM IR (not to mention that the LLVM codebase itself is a giant C++ blob, and might itself want to switch to a saner system like, e.g., amplify-C) is not a very good strategy for a game project.


> Wrong. You can build an abstraction layer above C code-generator which makes sure undefined C code doesn't get generated.

Yes, you could. But now you're talking way more work, and yet another IR. If you're going to build another IR for this, and you really really want to compile to C, why not just fix up LLVM's C backend and save yourself a lot of work?

> Nope. Once you have generated C code, you debug your C code with gdb or whatever is available. No problems there.

Debugging the generated C code is not a good experience. A developer wants to develop the code they wrote. For that to work you need fine-grained control over the contents of the DWARF DIEs.

> Nope. C compilation is about an order of magnitude faster than C++ compilation.

I'm not comparing C++ compilation to C compilation. I'm comparing the time it takes to compile C to LLVM IR, plus the overhead of in-memory or disk serialization, against zero (which is what you have if you go straight to LLVM IR).

> LLVM IR creates an additional abstraction barrier between a game programmer's code and hardware.

A C compiler IR, like LLVM IR, is not an additional abstraction barrier. What do you think C compilers compile their code into?


> But now you're talking way more work, and yet another IR.

What makes you think every abstraction layer has to be an IR? There's a reason we moved away from "assembly language" looking things: so we can think at a higher layer of abstraction and not worry about how a machine implements it.

> A developer wants to develop the code they wrote.

Again you're looking at it the wrong way. The generated C code _is_ the developer's code. An amplify-C developer would generate C code with the very purpose of looking at it. And the reason it has to be C and not an "assembly language" looking thing is that it has to be 'human readable'.

This is an entirely different approach, quite orthogonal to the previous two:

- shoehorning hand-picked fixed abstractions (like OO, templates) on top of C using cryptic syntax (the C++ approach)

- working at a higher layer and never having to deal with C, instead generating IR for a virtual machine (the Java, Perl, Python, LLVM-client approach).

Both approaches have turned out to be quite unsatisfactory for game programming. If you don't like this new approach, so be it, but unless you have evidence of game programming thriving on top of a VM model, you have to give them a break.


C is more portable than LLVM IR, though. So your compiler's output is a program that is inferior in some ways, but more portable. And if you want to debug in a debugger that uses a proprietary format rather than DWARF, C will work better than LLVM IR, etc.

C is an OK target for a small new language; it's probably too crummy for a big, successful one (C++ certainly is one data point here!) But IMO for 90% of new "small" languages that can choose between C and LLVM IR, C isn't obviously the worse choice.


Is it more portable in a way that matters though? LLVM targets x86, x86-64, ARM32, ARM64, PowerPC, MIPS, SPARC, Hexagon, System z, TI MSP430, and XCore, as well as NVIDIA and AMD GPUs, and has some unofficial forks that target other niche architectures like AVR. If you aren't targeting one of those architectures, you're in a really niche space.

If you are in that niche space (i.e. you know that you absolutely have to target some architecture), and you don't have source to your compiler, then by all means target C--though I really wonder if it isn't worth just fixing up the LLVM C backend. (LLVM hasn't been able to find a maintainer for that backend because so few people are on architectures that aren't supported natively by LLVM, which says something to me about how few people need support for those architectures.)


> Is it more portable in a way that matters though?

A few months ago when I was trying to compile swiftc's IR with emscripten, I would have said "definitely yes". It didn't work both because of an LLVM version mismatch and because, well, the NaCl backend passes were apparently only tested on IR generated by Clang - random differences in what swiftc generated, like using 'add' with a constant right hand side instead of get-element-ptr, not simplifying struct returns to return-via-pointer-argument, etc. made it variously abort or outright crash. As you know, Rust has never really worked with it either due to the version mismatch.

Admittedly, though, that's mainly emscripten's fault, and I expect the new WebAssembly backend in trunk to be much more robust. Other than that, a few reasons I can think of to want to compile to C:

- Performance. Other compiler backends often produce more efficient code than LLVM; not always, but it's nice to be able to test several independent production-quality C compilers and pick the fastest, while with LLVM IR you're stuck with what you have. The same goes for compilation speed (though LLVM usually ranks well there), and for working around optimizer bugs.

- Windows support for LLVM is still not up to par yet. Supposedly it will be in the near future.

- (edit) Easier to integrate with unusual platform-specific compilation modes, like C++/CLI or Apple's LLVM bitcode distribution (well, that is LLVM, but imagine in the future someone comes up with a similar system based on a different compiler).

- You can distribute the C output from your compiler, and others can use it without having to work your compiler into their build process. Like distributing object files, but those only work on one platform, while C works anywhere. (Caveat: compilers often need to know data layout, which in practice means you might need to have separate C outputs for 64-bit and 32-bit pointer sizes, and not support any more exotic data representations. But that's still way more portable than an object file.) Notably, even if you only need to distribute to people using LLVM, LLVM IR is not designed to be cross-platform.

Incidentally, for this reason, I'd love if someone made a Rust to C compiler, preferably working on the AST level to avoid the unreadable spaghetti that the LLVM C backend used to generate. It's easier to say "swap out this insecure C library with a secure Rust library" (to a larger C/C++ program) if doing so is a matter of swapping C files - distributing 'binaries' is suboptimal from a maintainability perspective, but not much worse than, e.g., SQLite's "amalgamation" distribution.


> - Performance. Other compiler backends often produce more efficient code than LLVM

I wouldn't say "often" here. LLVM's compiler backend is top-notch and is hard to beat. Often it comes down to random register allocation or scheduling differences.

> - Windows support for LLVM is still not up to par yet. Supposedly it will be in the near future.

What are you referring to in particular? The only thing I can really think of is exception handling, and that got overhauled recently for MSVC compatibility. PDB was an issue too, but that's gotten fixed and now LLVM can output PDB. I can't think of much that's left...

> - You can distribute the C output from your compiler, and others can use it without having to work your compiler into their build process.

That is a legitimate advantage. But on the whole I think that it's outweighed by the downsides of compiling to C.


> PDB was an issue too, but that's gotten fixed and now LLVM can output PDB. I can't think of much that's left...

This is what I was referring to. The RFC from a Microsoft employee on full CodeView (what gets compiled to PDB) support was posted three months ago, but AFAIK the implementation is still in progress. The preexisting support for CodeView is described by the documentation as "minimal" as it contains only line tables.


But when compiling to C you can't do much better than #line. So it doesn't strike me as much of an argument in favor of compiling to C.


Oh, and regarding the request for a Rust AST to C compiler, I hope that doesn't happen for a number of reasons. First, it undoes the work we're doing on MIR, which is very important (you couldn't run MIR-level optimizations, the borrow check and codegen would be at high risk of divergence, we'd have to take a big backwards step to serializing ASTs for generics, etc.) Second, it'd be very hard to dodge C's undefined behavior: think of what we'd have to do to make signed integer overflow crash cleanly instead of leading to UB, just to name one particularly egregious problem...


I don't see how signed overflow is hard: whenever you see a signed arithmetic operation you just emit, e.g., `add_i32(a, b)` instead of `a + b`, and then include definitions of those functions for each signed integer type (all 4 of them) which are 1-4 lines long.

Strict aliasing and pointer rules, on the other hand, definitely are hard if you want to produce truly standard C code: you have to completely bypass C's type system, using `char *` or `uintptr_t` for everything, and while this can be done, the resulting code is likely to look pretty ugly. However, a reasonable alternative is to depend on nonstandard annotations to disable TBAA: `__attribute__((may_alias))` for every popular compiler other than MSVC, which I believe doesn't do TBAA at all. (If MSVC ever adds it, the generated C code would need updating, but that's not the end of the world.)

As for MIR - while I said AST, I suspect it would be fine to do it on MIR, at the cost of having to reconstruct some control flow. The biggest problem with doing it on LLVM IR is the simplified type info, while the full info is still there for MIR, right? Not an expert. Anyway, it's just an idea.


I believe that is one of the reasons ats compiles to C - you can write a library that makes use of the advanced compile-time type features, but which can then be distributed as C code


LLVM works fine on Windows. Clang is not feature complete on Windows however.


The author is a game developer. I don't know what platforms he worked with, but the console world isn't known for great compiler support. You are pretty much limited to whatever the official (NDA-ridden) SDK provides.


LLVM can even target the web (emscripten).


> In other words, this is a compiler.

It's not a compiler; it's a pre-processor.

People look at C in isolation but don't realise it was designed to be part of the full Unix system that included Sed, Awk, M4, Lex, Yacc, Sh and all the other tools.

Writing custom C abstractions with Awk is fairly trivial, and you end up with efficient C code that can be further processed or tuned. That's what tools like Awk are there for.

The compiler itself is built on the same principles doing successive transformations on source, IR, assembly etc. There is really no rigid boundary.

If you want C with custom abstractions a custom dialect that compiles to C makes perfect sense. That use case was taken into account in its design.


"Write programs that produce and generate text, because that is a universal interface" is not a good principle for a compiler. (In fact, I think it's not really a good principle all around, and Unix is worse for it, but especially for a compiler.)


The author of the article is not writing a compiler.

I fail to see how that quote has any contextual connection with the current article or with compiler design. A discussion about the merits of Unix text streams, or of text in general, is another discussion entirely from the current one.


You yourself said "there is really no rigid boundary" between a compiler and a preprocessor. By most practical definitions of a compiler, the "C amplifier" discussed in the OP is a compiler, and is definitely similar enough to suffer many of the same issues as a compiler. In particular, I think what pcwalton is getting at is that in such a "pre-processor", it's likely that you'll eventually want more expressive data structures than text can comfortably offer you.


Can you given an example of how to write custom C abstractions with Awk?



[edit] keep in mind this is just a simple example. Awk is just one tool in the toolbox.

Take a look at 'The AWK Programming Language' by Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger.

http://www.amazon.com/The-AWK-Programming-Language-Alfred/dp...

Yes, it's that Aho and Kernighan...

Operator-to-C-function mapping. You can run it like so:

$ awk -f fixed.awk < fixst.in > fixed.c

--- fixed.awk ---

  BEGIN {
  	printf("#include <stdio.h>\n");
  	printf("typedef long fixed;\n");
  	printf("void *send(fixed *this, char *message, ...);\n");
  }
  /^{/ { 
  	Ndecl = 0;
  	printf("{\n");
  }
  /^}/ { 
  	while(--Ndecl >= 0)
  		printf("\t%s\n", Destructor[Ndecl]);
  	printf("}\n");
  }
  /^:/ {
  	if($2~/fixed/) {
  		sub(/;/, "", $3);
  		printf("\tfixed *%s = send(NULL, \"fixed\");\n", $3);
  		Destructor[Ndecl++] = "send(" $3 ", \"~fixed\");"
  	}
  	else if ($4 ~ /\(double\)/) {
  		sub(/;/, "", $4);
  		sub(/\(double\)/, "", $4);
  		printf("\tsend(%s, \"double\", &%s);\n", $4, $2);
  	}
    	else if ($3 ~ /^=/) {
     		sub(/;/, "", $4);
    		printf("\tsend(%s, \"=\", %s);\n", $2, $4);
    	}
    	else if ($3 ~ /^\+/) {
    		sub(/;/, "", $4);
  		printf("\tsend(%s, \"+=\", %s);\n", $2, $4);
  	}
  	else if ($3 ~ /^\*/) {
  		sub(/;/, "", $4);
  		printf("\tsend(%s, \"*=\", %s);\n", $2, $4);
  	}
  }
  /^[^{}:]/ { print }
--- fixst.in ---

  int main()
  {
  :	fixed a;
  :	fixed b;
  :	fixed t0;
  :	fixed t1;
  	double f;
  :	a  = 1.00;
  :	b  = 9.99;
  :	t0 = 0.00;
  :	t1 = 2.00;
  :	t0 += t1;
  :	t0 *= b;
  :	t0 += a;
  :	f  = (double)t0;
  }


Sorry to repeat myself, but this is what Clasp [1] does (I am not affiliated with it, I promise).

I am in favor of integrating tightly with a compiler whenever possible, but the selling point of c-amplify is to be independent of any particular compiler or your existing toolchain. This means that (1) you can work with an in-house proprietary compiler for specific hardware like a game console (the author used to develop games), and (2) you code against some stable C standard and not against a possibly evolving compiler. I suppose the LLVM API is quite stable, but things might break over time.

[1] https://github.com/drmeister/clasp


Game consoles are all running architectures that are supported by LLVM. Many of their toolchains are LLVM [1].

The LLVM API breakage issue is legitimate, but not enough to outweigh the downsides of compiling to C. The definition of LLVM IR doesn't change that much for the C features.

[1]: http://llvm.org/devmtg/2013-11/slides/Robinson-PS4Toolchain....


LLVM makes a lot of things easier, but it is also a huge dependency for a small language. Using C as an intermediate language is a conservative way to generate code that works well for most languages. Also, if you don't mind the machine dependency, generating assembly directly may also be a good choice for a compiled language.


Nim seems to be doing pretty well and it compiles to C. I've not really used it but it does look pretty good.


Nim has lots of problems that come from compiling to C. It shouldn't compile to C.


Oh, I've been writing software in Nim pretty happily for a months now. What problems am I blind to?


I'm not saying you can't write software happily in Nim. I'm saying that Nim would be a better implementation if it didn't compile to C. There are lots of things that languages I like do that they could do better.


Oh sure. I'm asking specifically, given that I'm writing software happily in Nim, what potential pitfalls am I blind to? My software is compiling to C and it shouldn't be so I'm wondering what can I expect to actually experience using Nim, outside of Hacker News posts about Nim.


...care to explain why?


Not really. Compiling to C is much easier than compiling to LLVM. If 90% of languages fail, then there's only a 10% chance you'll have to do that rewrite :)


No, it's not. The LLVM API is really easy to use, with an excellent tutorial available (Kaleidoscope). You get to work with the IR as a tree instead of as a quirky serialized output format that was never designed to be used as an IR.


I've actually done both.

The LLVM API is good, yes, but from a project management point of view it's pretty painful to work with --- the libraries are vast, don't have stable binary interfaces, and don't validate parameters, which means that if you get anything wrong it tends to just segfault deep inside somewhere. I've had to single step through the LLVM source code way too many times.

Plus, distribution support has always been pretty poor. e.g. Debian's 3.3 package's llvm-config tries to link your program against the static libraries rather than the dynamic ones, which leads to painfully large link times.

This all adds up to a non-trivial cost.

By contrast, emitting C is a lot less powerful, but suddenly you don't have to care about any of this stuff. You have a standardised intermediate format which you can throw at any compiler, which is trivially verifiable, doesn't need special libraries to write, and easily integrates into third-party tool chains. Simply having rigorously separated front and back ends can be a huge win. But biggest of all, you don't have to keep knowledge of the LLVM API in your brain while you're trying to get work done.

Of course, you don't get proper debugging information, or tail calls, or any of the other things you can do with the LLVM API which you can't do through C; but depending on what you want to do, it can totally be worth it.


> the libraries are vast, don't have stable binary interfaces, and don't validate parameters, which means that if you get anything wrong it tends to just segfault deep inside somewhere.

Did you compile LLVM in Debug+Asserts mode? I rarely ever get segfaults from LLVM in this mode: I just get assertions when constructing invalid IR nodes.

> Plus, distribution support has always been pretty poor. e.g. Debian's 3.3 package's llvm-config tries to link your program against the static libraries rather than the dynamic ones, which leads to painfully large link times.

The link times aren't so bad if you use gold.

> You have a standardised intermediate format which you can throw at any compiler, which is trivially verifiable

I wouldn't say it's trivially verifiable. The undefined behavior rules of C are vast, and compiler authors have to know them incredibly well.

> But biggest of all, you don't have to keep knowledge of the LLVM API in your brain while you're trying to get work done.

But you have to keep C's undefined behavior rules in the back of your brain (such as signed overflow == UB!), which is worse.


Well you can generate C with Ruby, JavaScript, Bash, or whatever language takes your fancy.


Not any different from generating LLVM IR in a text form.


I would not call it "much easier". Complexity is nearly equivalent, with LLVM being somewhat simpler (if used via llvm-c wrapper).


I'm not sure compiling through C is really slower than using LLVM. For example, I haven't used Rust or Nim very recently, but a few months ago when I tried them on some toy programs, Nim compiled significantly faster than Rust. This of course might just be a matter of the implementations of the two compilers.


I thought the principal benefit of compiling to C was the ease of bring-up on a new system.


LLVM supports every system you could possibly care about, unless you're in a very niche market. If you're coding for a processor that's so niche that it supports GCC but not LLVM, then you should compile to GCC GIMPLE instead. If you're coding for a system that has neither LLVM nor GCC support, then compiling to C may be an option, but I honestly wonder if it wouldn't be easier to just repair the LLVM C backend in that case. (Note that this is so uncommon of a use case that the LLVM C backend has been unable to find a maintainer willing to keep it working for years.)


Crystal and Nim compile to C.


also Vala.


and chicken scheme and ats


This is an infrastructure for performing parse tree transformations on C code, basically bringing Lisp macros to C.

It has a 1-to-1 mapping to C (variables, literals, and all) so it seems to be doing less on the "compilation" end than what an assembler performs.


This road is well traveled. Objective-C, C#, and Java are all on that road - start with C, and add only the stuff you need. A few years down the road, and the new language has feature bloat.

The three big problems in C are "how big is it", "who released it", and "who locks it". The language provides no help with any of these issues. C++ allows papering over the problem, but the abstractions always leak and the mold always comes through the wallpaper. There are times when you need a raw pointer (for system calls, for example) and the availability of raw pointers breaks the size and ownership protection.

Rust deals effectively with all three of those issues. That was a major breakthrough, one of the few fundamental advances in language design in years. Rust, unfortunately, seems to have taken a turn for the worse in the last year. Rust has been infected with the Boost disease - overly clever templates. Rust is a procedural language, but template enthusiasts have been making it pseudo-functional with constructs like ".and_then()" and ".or_else()". Then there's "try!()", with its invisible return statement. The result is painful to read and hard to maintain.

The author of the parent article has a point about "with-" type constructs. Python has a general "with" statement, which can be used on any object that implements "__enter__" and "__exit__". That's a very useful and safe language feature. Both Go and Rust lack it, and their workarounds ("defer" and destructors) are worse.

Python's "with" plays well with exceptions. Exceptions in Python work well. Exceptions in C++, not so much, mainly because ownership and exceptions mix badly. Go could fix that with GC, and Rust could fix that with the borrow checker. But neither has exceptions. As a result, error handling in Go is wordy, and in Rust, both complicated and wordy.

I had high hopes for Rust, but they may have jumped the shark by getting too clever. We don't need a new C, but we may need a new Rust. At this point, any new low level language that doesn't have a borrow checker is flawed from the start.


> Objective-C, C#, and Java are all on that road - start with C, and add only the stuff you need.

If »starting with C« means »use a mostly similar syntax for basic things«, then yes. Otherwise only ObjC has been on that road, given that C supports things that C# and Java don't. And then it's a »start with C, remove stuff you don't like, and add stuff you need«, which is probably more akin to »start wherever and pick things you want in the language«. E.g. syntax aside, C# probably has much more in common with Delphi in its inception and design than with C.


Obj-C, C#, Java, etc each define a new fixed language with new features. This doesn't.

This defines a toolkit for defining a C-derived language per-project. This on its own adds no new features to C; it only gives you the means to add them.


There is a project by Fernando Borretti for Lisp-style macros in C, with macro expander written in Common Lisp: https://github.com/eudoxia0/cmacro, and a collection of basic macros: https://github.com/eudoxia0/magma

This way he has already added lambdas, lazy evaluation, types, anaphoric macros, with- macros.


The difference is that cmacro builds on C syntax (and allows for syntax extensions, by having a very flexible parser), while this is more like the "defmacro for C" system described here[0] and here[1], because it uses a Lisp syntax.

I prefer the former approach because it's less likely to alienate C programmers. At the same time, being able to define new syntax can have the same effect, since every new syntax you add carves out a smaller and smaller set of the community that is willing to use it. The same is true for Common Lisp reader macros, which is why few of the many syntax extensions that are available become commonly used.

[0]: http://www.european-lisp-symposium.org/editions/2014/ELS2014...

[1]: http://www.european-lisp-symposium.org/editions/2014/selgrad...


Always fun to see projects like this, but I don't get some of the motivations. Regarding the code examples:

  // c++
  AutoFile file("c:/temp/foo.txt", "w");
  fprintf(file, "Hello, world");

  // lisp
  (with-open-file (f "c:/temp/foo.txt" "w")
    (fprintf f "Hello, world"))
...the author thinks the former is cluttered and the latter uncluttered (simply because the fprintf is within a sub-scope of with-open-file)? That's a fragile rationale, but you can certainly write C++ that way if you like.


If you actually use RAII, and write braces in a bit of an unorthodox style, the distinction disappears even further:

    { AutoFile file("c:/temp/foo.txt", "w");
         fprintf(file, "Hello, world"); }

    (with-open-file (f "c:/temp/foo.txt" "w")
        (fprintf f "Hello, world"))


I honestly think explicit scopes, even though unorthodox, should become more of a thing. I personally use them in my C play projects to emphasize particular parts of a function as independent "subordinate clauses", with variables that only have a limited scope, while not being independent enough to be factored into a function--for example when the clause needs to be within reach of a label.

In C++ with RAII, the "emphasis" makes even more sense, to the extent that it doesn't make things ugly.


To be fair, it was written in 2010 back when c++11 was still c++0x and what the final feature set would be was still murky.


Ahh, I didn't check the publication date. That explains a bit, c++ has had some big changes in the last 6 years.


The motivation is to reach the ultimate state of DRY: skip filling the RAII "template" again and again and use a macro. And yes, you might end up writing a compiler as a side-effect ;-)


Wouldn't it make more sense to use a Scheme that compiles down to C (like Chicken Scheme, or something like that)?

I guess the argument might be readability of the output, but I'm guessing that as his lisp grows more complex the output is going to diverge from the input quite a bit anyway (I'm trying to imagine what a closure would look like... I'm guessing it wouldn't be pretty).


Amplify is mainly used to generate C code that does not need a Lisp runtime. It is used effectively as a smarter preprocessor.

Embeddable Common Lisp (ECL) [1] is an efficient implementation of Lisp that compiles to C. It can be easily used to interface C and Lisp.

Regarding C++, nowadays there is Clasp [2] that is under active development and looks very promising.

[1] https://common-lisp.net/project/ecl/ [2] https://github.com/drmeister/clasp


I don't know a lot about C or about Lisp, but I do know that this isn't the first compile-to-C Lisp. So why not use one of the ones that already exist instead of starting from scratch? And the blog post seemed to treat this like the first attempt to compile a high-level language to C (MLton, JHC, various Schemes, etc.).


It's because the author doesn't want a whole new language, but rather a better way to build abstractions over C.

What he wants is C's conceptual model with sexpr syntax and (therefore) the ability to build custom abstractions on top of it. That's different from working in an entirely other programming language with its own conceptual model that happens to compile to C.

You would see this difference vividly in the C that the two systems generate: the first would read like a more verbose version of the same application in which the skeleton of the program is recognizable; the second would look like machine gobbledygook. The former would be debuggable in a way that let you step through the logic of the surface program, while the other probably wouldn't. And so on.

Instead of thinking of this as a different language, think of it as 'magic C with DSLs', or 'C, with a macro system that lets you do what you want'. That distinction isn't absolute, because if you push your abstractions far enough you will end up with a different language, but the style of programming the author is talking about gives you incentives not to do that.


Even if all you want is C semantics with a different syntax, though, I think it's better to emit LLVM IR. LLVM IR is close to C semantics (though aliasing information has to be supplied explicitly), but you have direct control over debug info, which is important to avoid a huge regression in the debugging experience. As an added bonus, you eliminate the necessity of serializing and deserializing your IR during your compilation pipeline for no reason (which is effectively what compiling and reparsing C is doing).


You may be right, but learning curve is an issue. If you're already familiar with C, writing a sexpr C generator is super easy. I can see why someone would balk at learning a new conceptual model and toolset, and just take the path of least resistance, especially if they already know the kind of C program they'd like to write. This would have been even more true in 2010 of course.

So what's the best way to tackle the learning curve of what you're suggesting? If I know zero about LLVM and I want to make something like the OP, what should I do?


Another good option is to use Joe Armstrong's approach: look at the LLVM IR that clang generates for small examples, and then emit IR of the same shape.

My first attempt at using LLVM was using the C++ API. It was...a struggle. Using this approach (IR snippets), I made more progress in a day than I had in months using the API.


Also worth mentioning an invaluable learning tool: the 'cpp' backend in LLVM. It emits idiomatic C++ code that reconstructs any given IR module using the LLVM API.


Yup, that's what I used in my first attempt. Didn't work for me. (Actually, that was a later part of my first attempt; I think I initially tried the straight API. Good luck with that.)


What exactly did not work? Have you filed a bug?


That's clever.


LLVM has a great tutorial (Kaleidoscope): http://llvm.org/docs/tutorial/

It walks you through basic expression generation, control flow, memory, etc. for a simple language. The learning curve isn't zero, to be sure, but I think the time saved by being able to work with IR as a tree instead of as a flat series of bytes makes it easily worth it.


"you eliminate the necessity of serializing and deserializing your IR during your compilation pipeline for no reason (which is effectively what compiling and reparsing C is doing)."

I did debugging before it got to C and just ensured C generation would do exactly what I wanted. I could read and debug the C itself as a check against problems in that. Yet serializing and deserializing my IR just didn't happen: it was just LISP or BASIC expressions, depending on which version we're talking about. Just trees.

Your debugging-regression claim is correct, as I addressed in my main comment. Fortunately, my development style and choice of libraries compensated for that nicely. It would've been quite painful if I had to deal with arbitrary FOSS or proprietary stuff out of necessity. I'd be working at both abstraction levels for sure, or coding miracles into my tooling, haha.


I don't doubt your position has merit, and there are many options for generating code other than LLVM, but is anything really quicker to implement than fprintf? I have implemented compile (note I prefer to phrase it as 'source-to-source translation' rather than 'compile') to X myself, and for a certain class of project (personal or rarely used by others, where compilation speed isn't an issue), fprintf (or whatever) gets you a lot of bang for the development-hour buck.


In order to actually generate valid C, you have to do a lot of work to figure out what you're supposed to fprintf; you have to get the operator precedence right, you have to do scoping right, debugging the serialization code is annoying, etc. A high-level API like LLVM IR, by contrast, lets you interact with the IR as a tree instead of an output stream, which is usually easier because your AST is already a tree.


There is a plain-text form of LLVM IR. It is a bit easier to printf than fully featured C.


I think the point is that this is more C with an S-expression syntax than a compiler that generates C. It sounds like you're supposed to think of the translation process as, at heart, more akin to transliteration than actual compilation.

Article dates from 2010... I wonder whether this ever got used.

(There's a paper that suggests doing something similar for C++, though not quite so adventurously. It makes it look more like Pascal: http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProp...)


IMO Terra (http://terralang.org/) is a more interesting approach to this problem, since it not only enables macros but also staged compilation, polymorphism-as-a-library, runtime specialization etc.

> You can use Terra and Lua as…

> A scripting-language with high-performance extensions. While the performance of Lua and other dynamic languages is always getting better, a low-level of abstraction gives you predictable control of performance when you need it. Terra programs use the same LLVM backend that Apple uses for its C compilers. This means that Terra code performs similarly to equivalent C code. For instance, our translations of the nbody and fannhakunen programs from the programming language shootout perform within 5% of the speed of their C equivalents when compiled with Clang, LLVM’s C frontend. Terra also includes built-in support for SIMD operations, and other low-level features like non-temporal writes and prefetches. You can use Lua to organize and configure your application, and then call into Terra code when you need controllable performance.

> An embedded JIT-compiler for building languages. We use techniques from multi-stage programming to make it possible to meta-program Terra using Lua. Terra expressions, types, and functions are all first-class Lua values, making it possible to generate arbitrary programs at runtime. This allows you to compile domain-specific languages (DSLs) written in Lua into high-performance Terra code. Furthermore, since Terra is built on the Lua ecosystem, it is easy to embed Terra-Lua programs in other software as a library. This design allows you to add a JIT-compiler into your existing software. You can use it to add a JIT-compiled DSL to your application, or to auto-tune high-performance code dynamically.

> A stand-alone low-level language. Terra was designed so that it can run independently from Lua. In fact, if your final program doesn’t need Lua, you can save Terra code into a .o file or executable. In addition to ensuring a clean separation between high- and low-level code, this design lets you use Terra as a stand-alone low-level language. In this use-case, Lua serves as a powerful meta-programming language. You can think of it as a replacement for C++ template metaprogramming or C preprocessor X-Macros with better syntax and nicer properties such as hygiene. Since Terra exists only as code embedded in a Lua meta-program, features that are normally built into low-level languages can be implemented as Lua libraries. This design keeps the core of Terra simple, while enabling powerful behavior such as conditional compilation, namespaces, templating, and even class systems implemented as libraries.


+1. I can't say enough good things about Terra.

I self identify as a Lisper (Clojure), but have done a fair amount of professional work in C/C++/Go too. I was productive in Lua/Terra practically overnight. I ported a bunch of Rust (that I had originally ported from C and was considering porting back to C + custom string-templating) to Lua in a few days and the code got shorter, simpler, faster, and with build times of a tiny fraction of those in Rust. Terra just matches how I think about programming low-level code: Types are about shifting work to compile time. Yes, some error-detection work, sure, but that's totally secondary for me as compared to shifting work to compile time for performance, memory representation, etc.



It seems like it hasn't been touched since the year of publication of this article :)


This quote from the article really echoes my experience:

"Every C++ shop has its own 'accepted' subset."

And it's frustrating. This quote from the article too:

"these are clear signs that C++ isn’t providing any real cost benefit for us, and that we should be writing code in other ways."


Except this happens to all languages when they grow big.

If the language is a rich one, then every shop uses part of it.

If the language has a rich library ecosystem, then each shop has its own little garden of libraries.


This problem is much more serious when you are talking about opaque DSLs. Even having to retreat to dramatically different corners of a huge language is not too hot. By comparison, a garden of libraries that are all readable by anyone in command of a smaller core language is a smaller problem.


Depends how big the libraries are and if they are commercial or open.

But I agree with the DSLs, that is the curse of many Common Lisp shops, yet Lisp is a very simple language.


Is debugging C++ worse than generated C code?


No, it's not. Debugging generated C code is always worse. One of the major reasons why C++ compilers stopped compiling to C is that it is basically impossible to emit proper debug information that way.


In the same vein: RAII is verbose and complicated but layering a DIY lisp-based macro system on top of C isn't?


"compiles super fast": compared to C++, sure. But not really, no.


Compiling stb_image, an incredibly useful image loader:

  $ wc -l stb_image.h
      6433 stb_image.h

  $ more stb_image.c 
  #define STBI_ONLY_JPEG
  #define STBI_ONLY_PNG
  #define STBI_FAILURE_USERMSG
  #define STB_IMAGE_IMPLEMENTATION
  #include "stb_image.h"

  $ time clang -c stb_image.c -o stb_image.o

  real	0m0.091s
  user	0m0.075s
  sys	0m0.014s
Seems pretty fast to me!


Have you ever written Pascal, Go, or D? That's what fast compilation looks like. The D standard library compiles in 3.5s from scratch.


Yes, I've used Go. The compiler is fast and that's really nice; fast compilation was one of their explicit goals.

Pascal and C are fast because they're old and relatively simple, and their implementations are mature.

C++ becomes super slow when you start doing anything interesting, when I tried Rust it seemed very slow, even Java is slow although it seems like it shouldn't have to be. Those languages are all more widely used than D, I think.

[edit: typo]


C compilation is generally faster than C++ compilation for obvious reasons. However, the true speed depends on the level of optimization you want and the number of header files you're including. Non-optimized C code can be generated very quickly, and if you manage your header files conservatively you can have very fast compilation speeds.


> from the article: "In my prototype system, c-amplify, I’m doing exactly this. The system introduces an "amplification" phase where s-expressions are transformed to C code before a traditional build system runs."

Haven't Gambit/Chicken already tried something similar for Scheme? I don't know what the latest state of affairs there is, though.


We've experimented extensively with code generation resulting in C, and there's a single most important reason why it does not work: nobody except the authors can read and understand, let alone modify, the source code.


> Answer: none, they are equivalent as far as semantics go.

The C function does not have any defined semantics for certain values of a and b. The Common Lisp function merely adds two integers (or runs out of memory trying).


But this is not a Common Lisp function, it is the AST of the previous C fragment. In fact the code makes no sense interpreted as Common Lisp.


I built something like this that compiled BASIC to C or C++ to avoid dealing with the latter. Later I re-implemented it in LISP. Doing a one-to-one mapping from LISP to 3GL code makes it easier than some here think. Using a correct-by-construction coding method eliminates a lot of the problems. Your code generator can also avoid C-related errors and undefined behavior wherever possible, with tools available to check for most of both. Mine was a simplistic tool with very few targets in terms of compilers. Great safety and productivity, though.

Here's main benefits a LISP-like C can give you:

1. Automate boilerplate

2. Automate safety

3. 4GL-style doing very much with very little

4. Context-specific optimizations

5. My favorite: interactive testing & incremental compilation for sub-1s code-to-execute and maximizing mental flow. Safe C alternatives will probably never be able to do this.

6. Update while the system is running if still in prototyping mode. Fun stuff.

7. Save exact execution state of mock C application for debugging if you design that feature.

I don't recall most of the details of the implementation due to a broken memory. I worked in industrial BASIC more than C because C has too many issues. However, as "dang" and "jhack" are into this, I'll add that I did find it useful as a non-compiler-expert to set up at least two versions or states of the development environment: prototyping and production.

The prototyping system was designed for interactive development. It had all the libraries I might use loaded up, pre-wrapped, and with interface checks. The 3GL commands, or emulation of them, ran instantly as LISP expressions. You develop in this mode with no connection to the 3GL except logical and [mostly] semantic equivalence. Your code is straight up code at the moment.

When done with that, you switch to production where your code is data that's interpreted and turned into the 3GL. The generator should take care to make portable code or at least cover your targets. Throw in out-of-bounds checks and such by default. Can have a declaration to decide that. Anyway, you get your BASIC, C, C++, whatever code here. I can even imagine one for Rust that works faster than the actual compiler.

So, two versions of the system: interactive interpretation, incremental compilation, and live updating of the LISP version for quick development with optional checks inside (eg typing); and a version that generates code for external compilers. Both operate on the same input but ignore what they don't need. You can avoid most debugging issues with a robust coding style and plenty of interface checks. That's all I did with that.

Hope lessons from that old project helps someone doing the next. I expect if I recreate it that the Amplifying C stuff will be helpful in figuring out how to do that. So might be tools like Racket and me learning LISP for real. ;)


I seem to recall something about C programs becoming bug-ridden implementations of half of Common Lisp.


This is Greenspun's tenth rule (https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule):

"Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."


The Morris corollary:

"Including Common Lisp." http://paulgraham.com/quotes.html


Can also be read on the wikipedia link that I posted. :-)


I've asked for examples of this in the past and nobody's been able to provide any.


I tried "amplifying" C with Lisp-like syntax macros. Looks like there is a lot of potential along this route:

https://github.com/combinatorylogic/clike



