Rust Proc Macros: A Beginner's Journey
Who am I and what is this post about
Recently I started working professionally with Rust and it wasn't long before I stumbled upon proc macros. I had to make some changes to a relatively complex (for me) macro which generates traits and implementations for them. Reading about macros (and especially proc macros) in Rust created some associations with C++ metaprogramming, which is not exactly my cup of tea. As a result I was determined not to enjoy the task. However, it turned out that proc macros aren't that bad. They are very different from the regular Rust code one sees every day, but they are not black magic. And you can do useful stuff with them.
So why this post? There are a lot of resources for learning proc macros out there. I've read some of them but despite that there were things which just didn't click for me in the beginning. I needed something more to reach the 'now this makes sense' moment. So after gaining some knowledge I decided to write this post and share my experience. And more specifically - the things I didn't initially understand.
Should you read it? The short answer is NO :). If you aren't a complete proc macro newbie - there is nothing for you here. I'll be very grateful if you want to read it and share your feedback/advice. But don't be disappointed if you expect to learn something new and only see obvious stuff. If you are just starting to learn proc macros this post might help. But people are different - things that are hard for me may be easy for you, and the other way around.
Learning about proc macros
There are a lot of good sources for learning about Rust's proc-macros. I can recommend two freely available ones:
- The Rust Programming Language - the official Rust book which is freely available online. Ch. 19 is dedicated to macros. Contains a detailed walk-through on how to write a derive macro. Explains syn and quote a little.
- The Rust Reference - explains the different types of macros and has got code samples for each of them. I find this very useful as a general reference.
I've also found this post on LogRocket's blog very helpful.
You can read all day long about proc macros but you won't get anywhere until you start writing code. Fork Rust Latam: procedural macros workshop by David Tolnay, read the instructions and do the exercises. You won't regret it.
What I wish I knew about proc-macros in advance
As I already mentioned, proc-macros are different and require a bit of a mind shift. It's hard to write an exhaustive list of shifts so I'll share some which helped me.
Debugging macros
Macros are expanded during compilation so most of the regular debugging techniques don't work with them. Using a debugger is definitely not an option (please leave a comment if I am wrong). I followed the advice from the workshop and it was enough to figure out what's going on with my macros. It suggests two approaches - `cargo expand` and printing traces.
`cargo expand` is another project by David Tolnay. It is a binary crate which you can install on your system. When invoked it will replace all macro invocations in a given source with the actual code which the macro produces and dump it on stdout. The command supports the regular target selection used in `cargo build` and `cargo test` so you can specify single modules/tests/etc.
The other approach mentioned is just to print the tokens which your macro generates on stderr. This is especially useful when you've messed something up and your macro generates invalid code. The example from the repo:
```rust
eprintln!("TOKENS: {}", tokens);
```
Note that the code will be printed during compilation, not execution. Look for it in the output from `cargo check` or `cargo build`.
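To show where such a trace fits, here is a minimal sketch based on the sample crate described later in this post - the generated tokens are dumped just before the entry point returns them:

```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    let tokens = my_proc::my_proc_impl(input);
    // Printed during compilation - look for it in `cargo check`/`cargo build` output.
    eprintln!("TOKENS: {}", tokens);
    tokens.into()
}
```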
Another useful compiler option for fixing problems is `-Zproc-macro-backtrace`. If your macro panics during expansion you can use this option to see a backtrace which helps to figure out what's wrong. A convenient way to run it via cargo is:

```
RUSTFLAGS="-Z proc-macro-backtrace" cargo +nightly <cargo cmd>
```
proc-macro and proc-macro2
This was very confusing for me. Why two versions? Why is the first version still alive if 2 is superior? There are good answers to these questions but they are scattered around the internet. I'll try to summarise them. First - why are there two versions? In a nutshell, because `proc-macro` types can't exist outside proc macro code. For a better explanation read this excerpt from the proc-macro2 crate documentation. There is no point in copy-pasting it here.
Why do the two versions coexist? The input for each proc macro is the `TokenStream` type from the `proc-macro` crate. You can't escape from this - it has to be in your outer API. But inside your implementation you should use `proc_macro2`. It's more convenient and more testable.
Another thing that caused a lot of confusion in the beginning was how these two versions work together. `syn` and `quote` work with `proc-macro2` while the entry point function requires `proc-macro`. The result for me was a bunch of errors like these:
```
expected struct `TokenStream2`, found struct `proc_macro::TokenStream`
```
or its reversed twin:
```
expected struct `proc_macro::TokenStream`, found struct `TokenStream2`
```
and also:
```
the trait `ToTokens` is not implemented for `proc_macro::TokenStream`
```
This drove me crazy. The solution is very simple: use `proc-macro` at the API level and `proc-macro2` everywhere else. Here is a sample to see what this means. You have got a simple Rust crate with the following structure:
- Cargo.toml
- src
  - lib.rs
  - my_proc.rs
You can see the code on GitHub. Make sure you have checked out the `master` branch. I'll also copy-paste the code in the post for convenience.
Cargo.toml looks like this:
```toml
[package]
name = "my-proc-macro"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
syn = {version = "1.0", features = ["full"]}
quote = "1.0"
proc-macro2 = "1.0"
```
This is a regular `Cargo.toml` file. In `[lib]` there is `proc-macro = true` indicating the crate contains a proc-macro. Note the dependencies - we have got `proc-macro2` there. `proc_macro` is included by default.
Now `src/lib.rs`:
```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input).into()
}
```
This is the main entry point I wrote about before. Note that we use `TokenStream` from `proc_macro`, because this is the 'API level'. Also note that the `my_proc` module is included here. The `into()` call on the last line does the conversion between `proc-macro` and `proc-macro2` types. We'll get to it soon.
And finally `src/my_proc.rs`:
```rust
use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #input))
}
```
This is the implementation of the macro which uses `proc_macro2`. The `my_proc_impl` function returns `proc_macro2::TokenStream` and the `into()` call in the previous file converts it to `proc_macro::TokenStream`.
To recap:

- `lib.rs` declares the API of the macro and uses `proc-macro`. It calls an impl function from another module.
- `my_proc.rs` contains the impl function and works with `proc-macro2`.
- `into()` is used to convert from `proc_macro2::TokenStream` to `proc_macro::TokenStream`.
Organising your code
When I was writing my first macros I didn't structure my code very well. I used a single, very long function doing a lot of work. This is a very bad idea - proc-macros are just code, and like any other code they need to be easy to read and test. A better way is to encapsulate the logic in functions which return syn structs and at some point stitch them together with `quote!`.

Let's have a look at a macro which prints a message (hardcoded in our case) and then the answer from our initial sample (passed to the macro as an integer). We'll split the code in two functions: the first one will print the message and the second one - the result itself. You can see the code in the organising-your-code branch of the sample project. The only modification is in `my_proc.rs`:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse_quote, ExprMacro};

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let progress = progress_message("Thinking about the answer".to_string());
    let answer = answer(input);
    quote!(
        #progress;
        #answer;
    )
}

fn progress_message(msg: String) -> ExprMacro {
    parse_quote!(println!(#msg))
}

fn answer(result: TokenStream) -> ExprMacro {
    parse_quote!(println!("Answer: {}", #result))
}
```
I use `parse_quote!` in this example but the result can be generated in many ways - modifying the input, extracting parts of it, etc.
Another pattern I have seen is for all the functions to return `TokenStream` and again combine them with `quote!`. The benefit is that you are not limited to a single type, plus you can handle an unknown number of elements. For example (branch organising-your-code-tokenstream):
```rust
use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let mut result = Vec::new();
    result.push(progress_message("Thinking about the answer".to_string()));
    result.push(answer(input));
    quote!(
        #(#result);*
    )
}

fn progress_message(msg: String) -> TokenStream {
    quote!(println!(#msg))
}

fn answer(result: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #result))
}
```
`#(#result);*` is an interpolation feature of quote. It expands all elements from the vector and puts a `;` between them. This is explained here.
The examples above are of course not universal but they were a good start for me. Do whatever works for you, but do it in a timely manner - before you reach the point where you've got one big unmaintainable function.
A few words about syn and quote
These are mandatory libraries for working with proc-macros.
Syn parses the input Rust code (a `TokenStream`) into structures. With them you can generate new code, modify the existing one or remove code. Have a look at the list of structs in syn. There is one for every piece of Rust syntax you can think of. For example ExprClosure, which represents a closure expression like `|a, b| a + b`. All relevant parts of this expression are extracted as struct fields - for example `output` is its return type. Each field is another structure representing a part of the syntax. You can see how you start from a single struct representing something, and it has got other structs chained together to represent the whole code fragment. This is the AST (abstract syntax tree) pattern mentioned in the documentation.
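As a quick illustration (a minimal sketch, not part of the sample repo), this is what that closure looks like once parsed with syn; it relies on the `full` feature from the Cargo.toml shown earlier:

```rust
use syn::{parse_quote, ExprClosure, ReturnType};

#[test]
fn inspect_a_closure() {
    // Parse a closure expression into its syn AST struct.
    let closure: ExprClosure = parse_quote!(|a, b| a + b);
    // `inputs` holds the parameters, `output` the (absent here) return type.
    assert_eq!(closure.inputs.len(), 2);
    assert!(matches!(closure.output, ReturnType::Default));
}
```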
The quote crate provides the `quote!` macro which takes Rust source code as input and converts it to a `TokenStream`. The result can be returned from the macro you are writing or processed further. It also does 'quasi-quoting'. This means you can use variables from the scope where `quote!` is executed and the macro will embed them in the resulting `TokenStream`. Let's have a look at a quick example:
```rust
fn generate_getter() -> TokenStream {
    const ANSWER: u32 = 42;
    quote! {
        fn get_the_answer() -> u32 {
            #ANSWER
        }
    }
}
```
Note the `#ANSWER` syntax inside the code block of `quote!`. It refers to `const ANSWER` defined in the beginning of the function. This is called variable interpolation and can be done with any type implementing the `ToTokens` trait. There are implementations for all base types and all `syn` structs.
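You can also implement `ToTokens` for your own types. Here is a minimal sketch (the `Answer` type and `generate_const` function are invented for illustration) showing how a custom value then interpolates like any syn struct:

```rust
use proc_macro2::TokenStream;
use quote::{quote, ToTokens};

// Hypothetical wrapper type, used only to demonstrate a manual ToTokens impl.
struct Answer(u32);

impl ToTokens for Answer {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        let value = self.0;
        // Render the wrapped number as an integer literal.
        tokens.extend(quote!(#value));
    }
}

fn generate_const(answer: Answer) -> TokenStream {
    // `#answer` works because Answer implements ToTokens.
    quote!(const ANSWER: u32 = #answer;)
}
```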
Some useful functions from syn and quote
Now let's see some functions from both crates which I believe are worth knowing. To avoid dead links, all documentation references link to a specific version.
Spans and quote_spanned
Syn uses spans to represent the location (line and column number) of an expression in the source where it was originally located. This is used mainly for error reporting. All structs (AST elements) implement Spanned. The trait contains a single function - `span()` - which returns a Span. You can then pass the span around and attach it to errors so that the compiler renders them on the offending lines.
Spans are often used with `quote_spanned` from the quote crate. It generates a `TokenStream` and attaches a span to it. Let's see a small example which generates a compilation error. It's in the quote-spanned branch of the sample code:
```rust
use proc_macro2::TokenStream;
use quote::quote_spanned;
use syn::spanned::Spanned;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote_spanned!(input.span() => compile_error!("I don't like this...");)
}
```
The generated compilation error looks like:
```
   Compiling proc-macro-post v0.1.0 (/home/ceco/projects/proc-macro-post)
error: I don't like this...
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!(42);
  |                    ^^

error: could not compile `proc-macro-post` due to previous error
```
If you need to generate a more complex error message (which dumps variable values, etc.) you can use `format!` from the standard library.
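For example, a rough sketch (the `reject` helper and its message are made up for illustration) combining `format!` with `quote_spanned`:

```rust
use proc_macro2::TokenStream;
use quote::quote_spanned;
use syn::spanned::Spanned;

// Hypothetical helper: emits a compile_error! whose message is
// built at expansion time with format!.
fn reject(input: &TokenStream, reason: &str) -> TokenStream {
    let msg = format!("I don't like this: {}", reason);
    quote_spanned!(input.span() => compile_error!(#msg);)
}
```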
Error reporting with syn
I barely touched error reporting in the previous section by mentioning spans and `compile_error!`. Let's have a look at a more complete example. Syn has got its own `Error` type and a `Result` alias that uses it. Together they make error handling very elegant. `Error` has got a method named `to_compile_error()` which generates a compilation error from the error object. You can create an `Error` instance somewhere within your code and propagate it with the `?` operator up to a place where you can handle it by converting it to an actual compilation error.
Let's see an example. We want to write a proc macro which accepts an integer and prints 'Answer: INTEGER'. However, the only accepted value will be 42. For everything else an error will be generated.
Let's modify the example we used so far. The sample code is in the syn-error branch of the sample project. `my_proc.rs` contains:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse2, spanned::Spanned, Error, LitInt, Result};

pub fn my_proc_impl(input: TokenStream) -> Result<TokenStream> {
    let span = input.span();
    let ans = parse2::<LitInt>(input)?.base10_parse::<i32>()?;
    if ans != 42 {
        return Err(Error::new(span, "Answer should be 42"));
    }
    Ok(quote!(println!("Answer: {}", #ans);))
}
```
We import `Error` and `Result` from syn. The `my_proc_impl` function parses the input to a `LitInt` (integer literal) and extracts its value. `base10_parse` returns a `syn::Error` on failure (just as most of the functions in syn do) so we can use `?` to unwrap the value or propagate the error. Then, if the input is not 42, we return another instance of `syn::Error`. Its constructor accepts two parameters - a span and an error message.
Now let's see how this function is used in `lib.rs`:
```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input)
        .unwrap_or_else(|e| e.to_compile_error())
        .into()
}
```
Here we call `my_proc_impl` and convert any `syn::Error` to a compilation error. If we pass a bad value to the proc macro we get a nice compilation error:
```
error: expected integer literal
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!("test");
  |                    ^^^^^^
error: could not compile `proc-macro-post` due to previous error
```
Converting code snippets to syn structs
Sometimes you want to convert a piece of Rust code to a syn AST struct and use it later, pass it around, etc. You can do this with `parse_quote!`. Its input is Rust code (just as with `quote!`) but instead of generating a `TokenStream` it returns a syn struct. The exact type is determined by type inference, so the type of the result should always be specified.
Let's see a small (and not at all practical) example:
```rust
use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse_quote, ExprMacro};

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let msg: ExprMacro = parse_quote!(println!("Thinking about the answer..."));
    quote!(
        #msg;
        println!("Answer: {}", #input);
    )
}
```
`parse_quote!` is used here to convert the `println!` statement to an `ExprMacro`. In practice this is pointless because you can put the `println!` statement inside `quote!` directly, but this code is for demonstration purposes only. The type of `msg` is explicitly specified (`ExprMacro`). You can use `msg` not only in `quote!` - it can be returned as a value, passed to another function and so on.
Generating identifiers
`Ident` from syn represents an identifier (the name of a function, variable, struct, etc). As explained in the quote documentation here, you can use token concatenation to generate identifiers. Quote provides a dedicated macro for this - `format_ident!`. Use it anytime you want to generate an identifier.
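Here is a minimal sketch of what that can look like (the getter-generating helper and the `u32` field type are invented for the example):

```rust
use proc_macro2::TokenStream;
use quote::{format_ident, quote};
use syn::Ident;

// Hypothetical helper: builds a getter named after the given field.
fn make_getter(field: &Ident) -> TokenStream {
    // `get_answer` for a field named `answer`, and so on.
    let getter = format_ident!("get_{}", field);
    quote! {
        fn #getter(&self) -> u32 {
            self.#field
        }
    }
}
```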
Testing your code
If you use `proc-macro2` (which you should) you can write all kinds of unit and integration tests for your proc macro and this should be pretty standard. There is one aspect of testing specific to macros though - UI tests. This was confusing for me because when I hear UI I usually think about graphical user interfaces or web frontends. For proc macros it means the interface of your macro, or more specifically the compilation errors it generates. In this context UI tests make sense: you create a new piece of code which interacts with another piece of code, and you have no way to raise error codes or exceptions - the compiler is the one which should generate the errors for your macro. We already covered how this can be done in the 'Error reporting with syn' section.
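Before getting to UI tests, here is a rough sketch of such a 'regular' unit test (assuming the simple version of `my_proc_impl` shown earlier, with the test living in the same crate, e.g. at the bottom of lib.rs):

```rust
#[cfg(test)]
mod tests {
    use quote::quote;

    #[test]
    fn expands_to_println() {
        // my_proc_impl works purely with proc_macro2 types, so it can be
        // called directly from an ordinary unit test.
        let output = crate::my_proc::my_proc_impl(quote!(42));
        let expected = quote!(println!("Answer: {}", 42));
        assert_eq!(output.to_string(), expected.to_string());
    }
}
```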
trybuild is a crate which helps you create UI tests for macros. You write test code which is supposed to compile, or to fail compilation. The library checks the desired outcome and, for the failing cases, whether the expected compilation error is generated. This might sound a bit complicated but it's actually very simple. Let's add a test to the example from the previous section (syn-error branch). The complete code is in a branch named try-build.
Let's add a test in `lib.rs`:
```rust
#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
```
The code initialises trybuild and adds all .rs files in `tests/ui` as tests which are supposed to fail. Now let's add a new test in `tests/ui/wrong_answer.rs`. It will call the proc macro with `43`, which should generate an error:
```rust
use proc_macro_post::my_proc_macro;

fn main() {
    my_proc_macro!(43);
}
```
And now we run `cargo test`:
```
$ cargo test
<skipped>
    Finished test [unoptimized + debuginfo] target(s) in 6.44s
     Running unittests src/lib.rs (target/debug/deps/proc_macro_post-faaca3c42c745804)

running 1 test
<skipped>
    Finished dev [unoptimized + debuginfo] target(s) in 10.38s

test tests/ui/wrong_answer.rs ... wip

NOTE: writing the following output to `wip/wrong_answer.stderr`.
Move this file to `tests/ui/wrong_answer.stderr` to accept it as correct.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error: Answer should be 42
 --> tests/ui/wrong_answer.rs:4:20
  |
4 |     my_proc_macro!(43);
  |                    ^^
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

test ui ... FAILED

failures:
<skipped>
```
What `trybuild` does is compile the test and compare the compilation error with the one in `wrong_answer.stderr`. If they don't match (or the test compiles) - the test fails. If the stderr file doesn't exist, the output will be saved in `wip/wrong_answer.stderr`. You can either move the file by hand (as suggested in the test output) or run `cargo test` with the `TRYBUILD=overwrite` environment variable set. It will create (or overwrite) `wrong_answer.stderr` directly. This is convenient in combination with version control - you can review the changes and commit directly. The output will be:
```
$ TRYBUILD=overwrite cargo test
<skipped>
test tests/ui/wrong_answer.rs ... wip

NOTE: writing the following output to `tests/ui/wrong_answer.stderr`.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error: Answer should be 42
 --> tests/ui/wrong_answer.rs:4:20
  |
4 |     my_proc_macro!(43);
  |                    ^^
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

test ui ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.19s
<skipped>
```
At this point the test is added and should pass just fine. Try running `cargo test` again. You can also modify the stderr file and see how the test fails.
Conclusion
Macros in Rust don't look that scary after you spend some time with them. Quite the opposite - they enable you to do pretty interesting and useful things. You have to keep things under control though. If your macro becomes too big and complicated it can easily become a nightmare to maintain.
Before wrapping up I want to mention one more syn feature which looks quite nice. Unfortunately I didn't spend enough time with it, so writing about it would be pointless. It's the `Visit` trait, which is used to traverse the AST and do some processing/transformations on each node. I barely saw it in action and I don't feel comfortable writing about it. Maybe it can be a topic for another post :)
One more project worth exploring is expander. It expands a proc-macro in a file and uses the `include!` directive in its place (copy-pasted from the project README). Sounds like a life saver for the cases when you want to see what code your proc macro is producing. For better or worse, I haven't had to use it so far.