Rust Proc Macros: A Beginner's Journey
Who am I and what is this post about
Recently I started working professionally with Rust and it wasn't long before I stumbled upon proc macros. I had to make some changes to a relatively complex (for me) macro which generates traits and implementations for them. Reading about macros (and especially proc macros) in Rust created some associations with C++ metaprogramming, which is not exactly my cup of tea. As a result I was determined not to enjoy the task. However, it turned out that proc macros aren't that bad. They are very different from the regular Rust code one sees every day, but they are not black magic. And you can do useful stuff with them.
So why this post? There are a lot of resources for learning proc macros out there. I've read some of them but despite that there were things which just didn't click for me in the beginning. I needed something more to reach the 'now this makes sense' moment. So after gaining some knowledge I decided to write this post and share my experience. And more specifically - the things I didn't initially understand.
Should you read it? The short answer is NO :). If you aren't a complete proc macro newbie - there is nothing for you here. I'll be very grateful if you want to read it and share your feedback/advice. But don't be disappointed if you expect to learn something new and only see obvious stuff. If you are just starting to learn proc macros this post might help. But people are different - things that are hard for me may be easy for you, and the other way around.
Learning about proc macros
There are a lot of good sources for learning about Rust's proc-macros. I can recommend two freely available ones:
- The Rust Programming Language - the official Rust book which is freely available online. Ch. 19 is dedicated to macros. Contains a detailed walk-through on how to write a derive macro. Explains syn and quote a little.
- The Rust Reference - explains the different types of macros and has got code samples for each of them. I find this very useful as a general reference.
I've also found this post on LogRocket's blog very helpful.
You can read all day long about proc macros but you won't get anywhere until you start writing code. Fork Rust Latam: procedural macros workshop by David Tolnay, read the instructions and do the exercises. You won't regret it.
What I wish I knew about proc-macros in advance
As I already mentioned, proc-macros are different and require a bit of a mind shift. It's hard to write an exhaustive list of shifts so I'll share some which helped me.
Debugging macros
Macros are expanded during compilation so most of the regular debugging techniques don't work with them. Using a debugger is definitely not an option (please leave a comment if I am wrong). I followed the advice from the workshop and it was enough to figure out what's going on with my macros. It suggests two approaches - `cargo expand` and printing traces.
`cargo expand` is another project by David Tolnay. It is a binary crate which you can install on your system. When invoked it will replace all macro invocations in a given source with the actual code which the macro produces and dump it on stdout. The command supports the regular target selection used in `cargo build` and `cargo test` so you can specify single modules/tests/etc.
The other approach mentioned is just to print the tokens which your macro generates on stderr. This is especially useful when you've messed something up and your macro generates invalid code. The example from the repo:
```rust
eprintln!("TOKENS: {}", tokens);
```
Note that the code will be printed during compilation, not execution. Look for it in the output from `cargo check` or `cargo build`.
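To show where such a trace fits, here is a minimal sketch based on the sample crate described later in this post - the generated tokens are dumped just before the entry point returns them:

```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    let tokens = my_proc::my_proc_impl(input);
    // Printed during compilation - look for it in `cargo check`/`cargo build` output.
    eprintln!("TOKENS: {}", tokens);
    tokens.into()
}
```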
Another useful compiler option for fixing problems is `-Zproc-macro-backtrace`. If your macro panics during expansion you can use this option to see a backtrace which helps to figure out what's wrong. A convenient way to run it via cargo is:

```
RUSTFLAGS="-Z proc-macro-backtrace" cargo +nightly <cargo cmd>
```
proc-macro and proc-macro2
This was very confusing for me. Why two versions? Why is the first version still alive if 2 is superior? There are good answers to these questions but they are scattered around the internet. I'll try to summarise them. First - why are there two versions? In a nutshell, because `proc-macro` types can't exist outside proc macro code. For a better explanation read this excerpt from the proc-macro2 crate documentation. There is no point in copy-pasting it here.
Why do the two versions coexist? The input for each proc macro is the `TokenStream` type from the `proc-macro` crate. You can't escape from this - it has to be in your outer API. But inside your implementation you should use `proc_macro2`. It's more convenient and more testable.
Another thing that caused a lot of confusion in the beginning was how these two versions work together. `syn` and `quote` work with `proc-macro2` while the entry point function requires `proc-macro`. The result for me was a bunch of errors like these:
```
expected struct `TokenStream2`, found struct `proc_macro::TokenStream`
```
or its reversed twin:
```
expected struct `proc_macro::TokenStream`, found struct `TokenStream2`
```
and also:
```
the trait `ToTokens` is not implemented for `proc_macro::TokenStream`
```
This drove me crazy. The solution is very simple: use `proc-macro` at the API level and `proc-macro2` everywhere else. Here is a sample to see what this means. You have got a simple Rust crate with the following structure:
- Cargo.toml
- src
  - lib.rs
  - my_proc.rs
You can see the code on GitHub. Make sure you have checked out the `master` branch. I'll also copy-paste the code in the post for convenience.
Cargo.toml looks like this:
```toml
[package]
name = "my-proc-macro"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
syn = {version = "1.0", features = ["full"]}
quote = "1.0"
proc-macro2 = "1.0"
```
This is a regular `Cargo.toml` file. In `[lib]` there is `proc-macro = true` indicating the crate contains a proc-macro. Note the dependencies - we have got `proc-macro2` there. `proc_macro` is included by default.
Now `src/lib.rs`:
```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input).into()
}
```
This is the main entry point I wrote about before. Note that we use `TokenStream` from `proc_macro`, because this is the 'API level'. Also note that the `my_proc` module is included here. The `into()` call on the last line does the conversion between `proc-macro` and `proc-macro2` types. We'll get to it soon.
And finally `src/my_proc.rs`:
```rust
use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #input))
}
```
This is the implementation of the macro which uses `proc_macro2`. The `my_proc_impl` function returns `proc_macro2::TokenStream` and the `into()` call in the previous file converts it to `proc_macro::TokenStream`.
To recap:

- `lib.rs` declares the API of the macro and uses `proc-macro`. It calls an impl function from another module.
- `my_proc.rs` contains the impl function and works with `proc-macro2`.
- `into()` is used to convert from `proc_macro2::TokenStream` to `proc_macro::TokenStream`.
Organising your code
When I was writing my first macros I didn't structure my code very well. I used a single, very long function doing a lot of work. This is a very bad idea - proc-macros are just code, and like any other code they need to be easy to read and test. A better way is to encapsulate the logic in functions which return syn structs and at some point stitch them together with `quote!`.

Let's have a look at a macro which prints a message (hardcoded in our case) and then the answer from our initial sample (passed to the macro as an integer). We'll split the code in two functions: the first one will print the message and the second one - the result itself. You can see the code in the organising-your-code branch of the sample project. The only modification is in `my_proc.rs`:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse_quote, ExprMacro};

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let progress = progress_message("Thinking about the answer".to_string());
    let answer = answer(input);
    quote!(
        #progress;
        #answer;
    )
}

fn progress_message(msg: String) -> ExprMacro {
    parse_quote!(println!(#msg))
}

fn answer(result: TokenStream) -> ExprMacro {
    parse_quote!(println!("Answer: {}", #result))
}
```
I use `parse_quote!` in this example but the result can be generated in many ways - modifying the input, extracting parts of it, etc.
Another pattern I have seen is for all the functions to return `TokenStream` and again combine them with `quote!`. The benefit is that you are not limited to a single type, plus you can handle an unknown number of elements. For example (branch organising-your-code-tokenstream):
```rust
use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let mut result = Vec::new();
    result.push(progress_message("Thinking about the answer".to_string()));
    result.push(answer(input));
    quote!(
        #(#result);*
    )
}

fn progress_message(msg: String) -> TokenStream {
    quote!(println!(#msg))
}

fn answer(result: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #result))
}
```
`#(#result);*` is an interpolation feature of quote. It expands all elements from the vector and puts a `;` between them. This is explained here.
The examples above are of course not universal but they were a good start for me. Do whatever works for you, but do it in a timely manner - before you reach the point where you've got one big unmaintainable function.
A few words about syn and quote
These are mandatory libraries for working with proc-macros.
Syn parses the input Rust code (a `TokenStream`) into structures. With them you can generate new code, modify the existing one or remove code. Have a look at the list of structs in syn. There is one for every piece of Rust syntax you can think of. For example ExprClosure, which represents a closure expression like `|a, b| a + b`. All relevant parts of this expression are extracted as struct fields - for example `output` is its return type. Each field is another structure representing a part of the syntax. You can see how you start from a single struct representing something, and it has got other structs chained together to represent the whole code fragment. This is the AST (abstract syntax tree) pattern mentioned in the documentation.
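As a quick illustration (a minimal sketch, not part of the sample repo), this is what that closure looks like once parsed with syn; it relies on the `full` feature from the Cargo.toml shown earlier:

```rust
use syn::{parse_quote, ExprClosure, ReturnType};

#[test]
fn inspect_a_closure() {
    // Parse a closure expression into its syn AST struct.
    let closure: ExprClosure = parse_quote!(|a, b| a + b);
    // `inputs` holds the parameters, `output` the (absent here) return type.
    assert_eq!(closure.inputs.len(), 2);
    assert!(matches!(closure.output, ReturnType::Default));
}
```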
The quote crate provides the `quote!` macro which takes Rust source code as input and converts it to a `TokenStream`. The result can be returned from the macro you are writing or processed further. It also does 'quasi-quoting'. This means you can use variables from the scope where `quote!` is executed and the macro will embed them in the resulting `TokenStream`. Let's have a look at a quick example:
```rust
fn generate_getter() -> TokenStream {
    const ANSWER: u32 = 42;
    quote! {
        fn get_the_answer() -> u32 {
            #ANSWER
        }
    }
}
```
Note the `#ANSWER` syntax inside the code block of `quote!`. It refers to `const ANSWER` defined in the beginning of the function. This is called variable interpolation and can be done with any type implementing the `ToTokens` trait. There are implementations for all base types and all `syn` structs.
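You can also implement `ToTokens` for your own types. Here is a minimal sketch (the `Answer` type and `generate_const` function are invented for illustration) showing how a custom value then interpolates like any syn struct:

```rust
use proc_macro2::TokenStream;
use quote::{quote, ToTokens};

// Hypothetical wrapper type, used only to demonstrate a manual ToTokens impl.
struct Answer(u32);

impl ToTokens for Answer {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        let value = self.0;
        // Render the wrapped number as an integer literal.
        tokens.extend(quote!(#value));
    }
}

fn generate_const(answer: Answer) -> TokenStream {
    // `#answer` works because Answer implements ToTokens.
    quote!(const ANSWER: u32 = #answer;)
}
```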
Some useful functions from syn and quote
Now let's see some functions from both crates which I believe are worth knowing. To avoid dead links, all documentation references link to a specific version.
Spans and quote_spanned
Syn uses spans to represent the location (line and column number) of an expression in the source where it was originally located. This is used mainly for error reporting. All structs (AST elements) implement Spanned. The trait contains a single function - `span()` - which returns a Span. You can then pass the span around and attach it to errors so that the compiler renders them on the offending lines.
Spans are often used with `quote_spanned` from the quote crate. It generates a `TokenStream` and attaches a span to it. Let's see a small example which generates a compilation error. It's in the quote-spanned branch of the sample code:
```rust
use proc_macro2::TokenStream;
use quote::quote_spanned;
use syn::spanned::Spanned;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote_spanned!(input.span() => compile_error!("I don't like this...");)
}
```
The generated compilation error looks like:
```
   Compiling proc-macro-post v0.1.0 (/home/ceco/projects/proc-macro-post)
error: I don't like this...
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!(42);
  |                    ^^

error: could not compile `proc-macro-post` due to previous error
```
If you need to generate a more complex error message (which dumps variable values, etc.) you can use `format!` from the standard library.
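For example, a rough sketch (the `reject` helper and its message are made up for illustration) combining `format!` with `quote_spanned`:

```rust
use proc_macro2::TokenStream;
use quote::quote_spanned;
use syn::spanned::Spanned;

// Hypothetical helper: emits a compile_error! whose message is
// built at expansion time with format!.
fn reject(input: &TokenStream, reason: &str) -> TokenStream {
    let msg = format!("I don't like this: {}", reason);
    quote_spanned!(input.span() => compile_error!(#msg);)
}
```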
Error reporting with syn
I barely touched error reporting in the previous section by mentioning spans and `compile_error!`. Let's have a look at a more complete example. Syn has got its own `Error` type and a `Result` alias that uses it. Together they make error handling very elegant. `Error` has got a method named `to_compile_error()` which generates a compilation error from the error object. You can create an `Error` instance somewhere within your code and propagate it with the `?` operator up to a place where you can handle it by converting it to an actual compilation error.
Let's see an example. We want to write a proc macro which accepts an integer and prints 'Answer: INTEGER'. However, the only accepted value will be 42. For everything else an error will be generated.
Let's modify the example we used so far. The sample code is in the syn-error branch of the sample project. `my_proc.rs` contains:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse2, spanned::Spanned, Error, LitInt, Result};

pub fn my_proc_impl(input: TokenStream) -> Result<TokenStream> {
    let span = input.span();
    let ans = parse2::<LitInt>(input)?.base10_parse::<i32>()?;
    if ans != 42 {
        return Err(Error::new(span, "Answer should be 42"));
    }
    Ok(quote!(println!("Answer: {}", #ans);))
}
```
We import `Error` and `Result` from syn. The `my_proc_impl` function parses the input to a `LitInt` (integer literal) and extracts its value. `base10_parse` returns a `syn::Error` on failure (just as most of the functions in syn do) so we can use `?` to unwrap the value or propagate the error. Then, if the input is not 42, we return another instance of `syn::Error`. Its constructor accepts two parameters - a span and an error message.
Now let's see how this function is used in `lib.rs`:
```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input)
        .unwrap_or_else(|e| e.to_compile_error())
        .into()
}
```
Here we call `my_proc_impl` and convert any `syn::Error` to a compilation error. If we pass a bad value to the proc macro we get a nice compilation error:
```
error: expected integer literal
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!("test");
  |                    ^^^^^^
error: could not compile `proc-macro-post` due to previous error
```
Converting code snippets to syn structs
Sometimes you want to convert a piece of Rust code to a syn AST struct and use it later, pass it around, etc. You can do this with `parse_quote!`. Its input is Rust code (just as with `quote!`) but instead of generating a `TokenStream` it returns a syn struct. The exact type is determined by type inference, so the type of the result should always be specified.
Let's see a small (and not at all practical) example:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse_quote, ExprMacro};

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let msg: ExprMacro = parse_quote!(println!("Thinking about the answer..."));
    quote!(
        #msg;
        println!("Answer: {}", #input);
    )
}
```
`parse_quote!` is used here to convert the `println!` statement to an `ExprMacro`. In practice this is pointless because you can put the `println!` statement inside `quote!` directly, but this code is for demonstration purposes only. The type of `msg` is explicitly specified (`ExprMacro`). You can use `msg` not only in `quote!` - it can be returned as a value, passed to another function and so on.
Generating identifiers
`Ident` from syn represents an identifier (the name of a function, variable, struct, etc). As explained in the quote documentation here, you can use token concatenation to generate identifiers. Quote provides a dedicated macro for this - `format_ident!`. Use it anytime you want to generate an identifier.
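Here is a minimal sketch of what that can look like (the getter-generating helper and the `u32` field type are invented for the example):

```rust
use proc_macro2::TokenStream;
use quote::{format_ident, quote};
use syn::Ident;

// Hypothetical helper: builds a getter named after the given field.
fn make_getter(field: &Ident) -> TokenStream {
    // `get_answer` for a field named `answer`, and so on.
    let getter = format_ident!("get_{}", field);
    quote! {
        fn #getter(&self) -> u32 {
            self.#field
        }
    }
}
```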
Testing your code
If you use `proc-macro2` (which you should) you can write all kinds of unit and integration tests for your proc macro and this should be pretty standard. There is one aspect of testing specific to macros though - UI tests. This was confusing for me because when I hear UI I usually think about graphical user interfaces or web frontends. For proc macros it means the interface of your macro, or more specifically the compilation errors it generates. In this context UI tests make sense: you create a new piece of code which interacts with another piece of code, and you have no way to raise error codes or exceptions - the compiler is the one which should generate the errors for your macro. We already covered how this can be done in the 'Error reporting with syn' section.
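Before getting to UI tests, here is a rough sketch of such a 'regular' unit test (assuming the simple version of `my_proc_impl` shown earlier, with the test living in the same crate, e.g. at the bottom of lib.rs):

```rust
#[cfg(test)]
mod tests {
    use quote::quote;

    #[test]
    fn expands_to_println() {
        // my_proc_impl works purely with proc_macro2 types, so it can be
        // called directly from an ordinary unit test.
        let output = crate::my_proc::my_proc_impl(quote!(42));
        let expected = quote!(println!("Answer: {}", 42));
        assert_eq!(output.to_string(), expected.to_string());
    }
}
```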
trybuild is a crate which helps you create UI tests for macros. You write test code which is supposed to compile, or to fail compilation. The library checks the desired outcome and, for the failing cases, whether the expected compilation error is generated. This might sound a bit complicated but it's actually very simple. Let's add a test to the example from the previous section (syn-error branch). The complete code is in a branch named try-build.
Let's add a test in `lib.rs`:
```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```
The code initialises trybuild and adds all .rs files in `tests/ui` as tests which are supposed to fail. Now let's add a new test in `tests/ui/wrong_answer.rs`. It will call the proc macro with `43`, which should generate an error:
```rust
use proc_macro_post::my_proc_macro;

fn main() {
    my_proc_macro!(43);
}
```
And now we run `cargo test`:
```
$ cargo test
<skipped>
    Finished test [unoptimized + debuginfo] target(s) in 6.44s
     Running unittests src/lib.rs (target/debug/deps/proc_macro_post-faaca3c42c745804)

running 1 test
<skipped>
    Finished dev [unoptimized + debuginfo] target(s) in 10.38s

test tests/ui/wrong_answer.rs ... wip

NOTE: writing the following output to `wip/wrong_answer.stderr`.
Move this file to `tests/ui/wrong_answer.stderr` to accept it as correct.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error: Answer should be 42
 --> tests/ui/wrong_answer.rs:4:20
  |
4 |     my_proc_macro!(43);
  |                    ^^
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

test ui ... FAILED

failures:
<skipped>
```
What `trybuild` does is compile the test and compare the compilation error with the one in `wrong_answer.stderr`. If they don't match (or the test compiles) - the test fails. If the stderr file doesn't exist, the output will be saved in `wip/wrong_answer.stderr`. You can either move the file by hand (as suggested in the test output) or run `cargo test` with the `TRYBUILD=overwrite` environment variable set. It will create (or overwrite) `wrong_answer.stderr` directly. This is convenient in combination with version control - you can review the changes and commit directly. The output will be:
```
$ TRYBUILD=overwrite cargo test
<skipped>
test tests/ui/wrong_answer.rs ... wip

NOTE: writing the following output to `tests/ui/wrong_answer.stderr`.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error: Answer should be 42
 --> tests/ui/wrong_answer.rs:4:20
  |
4 |     my_proc_macro!(43);
  |                    ^^
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

test ui ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.19s
<skipped>
```
At this point the test is added and should pass just fine. Try running `cargo test` again. You can also modify the stderr file and see how the test fails.
Conclusion
Macros in Rust don't look that scary after you spend some time with them. Quite the opposite - they enable you to do pretty interesting and useful things. You have to keep things under control though. If your macro becomes too big and complicated it can easily become a nightmare to maintain.
Before wrapping up I want to mention one more syn feature which looks quite nice. Unfortunately I didn't spend enough time with it, so writing about it would be pointless. It's the `Visit` trait, which is used to traverse the AST and do some processing/transformations on each node. I barely saw it in action and I don't feel comfortable writing about it. Maybe it can be a topic for another post :)
One more project worth exploring is expander. It expands a proc-macro in a file and uses the `include!` directive in its place (copy-pasted from the project README). Sounds like a life saver for the cases when you want to see what code your proc macro is producing. For better or worse, I haven't had to use it so far.