Rust for C++ developers part 2: References, Structs and Traits

Tsvetomir Dimitrov

2023-12-17 14:37

This is the second post in my Rust for C++ developers series. It focuses on references, structs, traits and some basic data structures built into the language. Like part 1, it continues exploring Rust's syntax and drawing parallels with C++. For each topic I'll provide links to the Rust book. You don't need a development environment for this post either: Rust Playground should be enough to run the examples and experiment with them. Let's get started.

References

References in Rust are similar to pointers in C/C++. They can be reassigned and are explicitly dereferenced (with a few exceptions). A reference can be declared first and initialized later, but unlike a C pointer it can never be null, and the compiler rejects any use before initialization. Creating a reference to another variable is called borrowing. Conceptually, references in Rust play the same role as pointers/references in C/C++. There are a few syntactic differences which we will see in a moment.

A big difference between references in Rust and C++ is the way the compiler handles them. Rust's compiler has got a borrow checker which tries to catch potentially dangerous usages like trying to dereference an uninitialized reference or dropping a variable which will be read via a reference later. In this post we will cover only the basics and ignore the borrow checker. It's a big topic which requires at least a post on its own.

References can be shared (like const pointer/reference in C/C++) or mutable (as regular pointer/reference in C/C++). The former allow read-only access to the source while the latter can also modify it. Let's see how a shared reference is created and used:

fn main() {
    let a: u32 = 5;
    let r = &a;

    println!("{}", *r);
}

r is a reference to a. It needs to be explicitly dereferenced if we want to get its value. If we try to modify a via r we'll get a compilation error. For example this code:

fn main() {
    let a: u32 = 5;
    let r = &a;
    *r = 2;
    println!("{}", *r);
}

Yields this error:

error[E0594]: cannot assign to `*r`, which is behind a `&` reference
 --> src/main.rs:4:5
  |
4 |     *r = 2;
  |     ^^^^^^ `r` is a `&` reference, so the data it refers to cannot be written
  |
help: consider changing this to be a mutable reference
  |
3 |     let r = &mut a;
  |              +++

For more information about this error, try `rustc --explain E0594`.

To fix the code above we need to make a mutable and change r to a mutable reference:

fn main() {
    let mut a: u32 = 5;
    let r = &mut a;
    *r = 2;
    println!("{}", *r);
}

Note that mut is attached to the & operator, not to the declaration of r, and that we dereference r to modify the value. This should feel natural if you think about pointers but might be confusing if you relate it to C++ references.

You probably know that in C int const * and int * const are different things. The story is similar in Rust too:

fn main() {
    let mut a: u32 = 5;
    let mut b: u32 = 6;

    // mutable reference to `a`
    let r = &mut a;
    // `a` can be modified via `r`
    *r = 2;
    // `r` is mutable reference but `r` itself is not mut.
    // We can't point `r` to another variable.
    //r = &mut b; //won't compile;
    // However we can shadow `r` and achieve the same effect
    let r = &b;

    // `q` is a shared reference but `q` itself is mut
    let mut q = &a;
    // So we can point `q` to another variable
    q = &b;
    // But we can't change `b` via `q`
    //*q = 2; // won't compile

    println!("{}", *q);
}

r is a mutable reference, meaning that we can modify a via r. We can't reassign r to b because r itself is not mutable. We can however 'shadow' r with let, bind it to something else and even make it a shared reference. This is a common practice in Rust.

q on the other hand is a shared reference but q itself is mutable. So we can reassign q to b and the compiler won't complain. However, we can't modify the value q points to because the reference is not mut.

References are also used as function parameters to avoid moving (consuming) the initial variable. The syntax is identical. For example:

fn main() {
    let mut a = 5;
    increment_something(&mut a);
    print_something(&a);
}

fn increment_something(a: &mut u32) {
    // `a` is a mutable reference - we can increment
    *a += 1
}

fn print_something(a: &u32) {
    // `a` is shared - we can only read
    println!("{}", *a)
}

I already mentioned that references in Rust should be explicitly dereferenced. There are exceptions, however. Accessing struct fields or calling methods via a reference doesn't require dereferencing: * can be omitted in that case. The second notable case is 'reference to reference ... to reference' (&&&u32 for example). Field access and method calls look through all the levels automatically, and println! can print a reference directly, so you rarely need to write ***a by hand. When you need the value itself (for arithmetic, for example), one * per level is still required.
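A minimal sketch of these dereferencing cases. The Point struct and the function names are made up for illustration. Field access and method calls look through the references on their own, while reading the plain value out of a &&&u32 needs one * per level:

```rust
// Hypothetical struct used only for this illustration.
struct Point {
    x: u32,
    y: u32,
}

// Field access through a reference: no explicit `*` is needed.
fn sum_fields(p: &Point) -> u32 {
    p.x + p.y // same as (*p).x + (*p).y
}

// A reference to a reference to a reference.
fn unwrap_nested(a: &&&u32) -> u32 {
    // To get the plain `u32` out we dereference once per level.
    ***a
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{}", sum_fields(&p));

    let v: u32 = 5;
    let rrr = &&&v;
    // Method calls dereference through all the levels automatically.
    println!("{}", rrr.pow(2));
    // `println!` accepts the reference itself and prints the value behind it.
    println!("{}", rrr);
    println!("{}", unwrap_nested(rrr));
}
```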

References are described in Chapter 4 from the Rust book.

Struct

Structs are quite similar to their C++ counterparts. First let's see how a struct is defined in Rust:

struct MyData {
    pub count: usize,
}

This struct is named MyData and has got just one field named count of type usize. Struct fields are private by default. The pub keyword is used to declare a field as public. In our example count is public. Public/private behave similarly to C++: pub fields can be accessed from anywhere, while private ones are accessible only inside the module where the struct is defined (which includes its methods). There is no struct inheritance in Rust so there is no equivalent of C++'s protected access modifier.

Methods of a struct are declared in an impl block:
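Here is a small sketch of field visibility, assuming a hypothetical module named storage with a made-up Record struct. One detail that differs from C++: privacy in Rust is scoped to the module, so even a free function in the same module can touch a private field.

```rust
mod storage {
    // Hypothetical struct with one public and one private field.
    pub struct Record {
        pub id: u32,
        secret: u32,
    }

    // Code inside the same module can set the private field.
    pub fn make_record(id: u32) -> Record {
        Record { id: id, secret: id * 2 }
    }

    // ... and read it, even though this is not a method of `Record`.
    pub fn reveal(r: &Record) -> u32 {
        r.secret
    }
}

fn main() {
    let mut r = storage::make_record(21);
    r.id += 1; // fine: `id` is pub
    // r.secret = 0; // won't compile: `secret` is private to `storage`
    println!("{} {}", r.id, storage::reveal(&r));
}
```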

struct MyData {
    count: usize,
}

impl MyData {
    pub fn new() -> Self {
        MyData {count: 0}
    }

    pub fn get_count(&self) -> usize {
        self.count
    }

    pub fn inc(&mut self) {
        self.count += 1
    }
}

fn main() {
    let mut data = MyData::new();
    println!("Count is {}", data.get_count());
    data.inc();
    println!("Count is {}", data.get_count());
}

First note that count is private in this example. impl MyData is the impl block for the struct - it contains all its methods. Methods are private unless they are marked with pub. As in C++, public methods are part of the public API of the struct, while private ones can only be called from other methods (or, more precisely, from code in the same module).

Most of the methods here have got self as their first parameter, which is the equivalent of this in C++. In Rust self is mandatory for methods which are called on a struct instance. To be precise it's the same in C++ - each non-static method has got this as its first parameter, but this happens under the hood. In Rust it's explicit and you will see why in a second. Also if you want to access any of the struct's fields you need to do it via self.

self can be a shared reference (&self), a mutable reference (&mut self), taken by value (self) or absent altogether. A few words about each:

&self - this is the equivalent of a const method in C++. You can access fields via self but you can't modify them.
&mut self - this is the equivalent of a non-const C++ method. self is a mutable reference so you can access and modify fields via it.
no self - this is called an 'associated function' and is the equivalent of a static method in C++. Such functions are not called on a specific instance so they can't access any fields. A common pattern in Rust is to have a pub fn new() -> Self (which may or may not have any arguments) which initialises an instance of the struct.
self - self is not a reference, so the instance of the struct is moved into the method. In Rust terminology the instance is 'consumed'. After calling such a method the instance of the struct can't be used anymore. It hasn't got a direct equivalent in C++ but if you think about a method just as a regular function - it makes perfect sense. A common pattern for such methods is to return another type (and perform type conversion) or to return the same type (and perform some sort of mutation).
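To see all four flavours side by side, here is a minimal sketch. The Celsius and Fahrenheit structs and their methods are hypothetical and only serve the illustration; the conversion uses integer arithmetic to keep things simple.

```rust
struct Celsius {
    degrees: i32,
}

struct Fahrenheit {
    degrees: i32,
}

impl Celsius {
    // No `self`: an associated function, called as `Celsius::new(..)`.
    fn new(degrees: i32) -> Self {
        Celsius { degrees: degrees }
    }

    // `&self`: read-only access to the instance.
    fn degrees(&self) -> i32 {
        self.degrees
    }

    // `&mut self`: the instance can be modified.
    fn warm_up(&mut self, by: i32) {
        self.degrees += by;
    }

    // `self`: the instance is consumed and converted to another type.
    fn into_fahrenheit(self) -> Fahrenheit {
        Fahrenheit { degrees: self.degrees * 9 / 5 + 32 }
    }
}

fn main() {
    let mut c = Celsius::new(90);
    c.warm_up(10);
    println!("{}", c.degrees());
    let f = c.into_fahrenheit();
    // println!("{}", c.degrees()); // won't compile: `c` was moved
    println!("{}", f.degrees);
}
```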

Let's get back to our example and see what methods we have got in the impl block for MyData. First there is a pub fn new() -> Self. Self is an alias for the type the impl block belongs to. We can also write pub fn new() -> MyData but it's less convenient. The function body of new demonstrates how structs are initialised. We list each field of the struct and its corresponding value (separated by :) in curly braces. As we already mentioned, such functions are not called on a specific instance. Have a look at how new is used in main - MyData::new().

get_count is a getter for count (which is private). It doesn't need to modify self so it takes a shared reference to the instance (&self). Again note that to access any fields of the object you need to use self. inc on the other hand increments count - it modifies the object so it needs a mutable reference to self (&mut self).

Structs are covered in Chapter 5 from the Rust book. The chapter is worth reading.

Traits

Traits are similar to C++ abstract classes (or interfaces in other languages). I assume you are already familiar with this concept. If not, have a look at Chapter 10.2 from the Rust book.

Let's see how to define and implement a trait:

trait Printer {
    fn print_something(&self);

    fn greet_and_print(&self) {
        println!("Hello!");
        self.print_something();
    }
}

struct MyStruct {
    answer: u32,
}

impl Printer for MyStruct {
    fn print_something(&self) {
        println!("{}", self.answer);
    }
}

fn main() {
    let m = MyStruct { answer: 42 };
    m.print_something();
    m.greet_and_print();
}

A trait is defined with the trait keyword followed by a name. The trait body can contain either function signatures (like print_something) or functions with definitions (like greet_and_print). The former are functions which the implementer must define (just like pure virtual functions in C++), the latter are functions with a default implementation provided and implementing them is optional (unlike the default pure virtual function implementation in C++).

The trait in our example has got two functions - print_something which the implementer needs to implement and greet_and_print which has got a default implementation. If the implementer doesn't implement it - the default implementation will be used. Unlike the abstract classes in C++ the functions in a trait haven't got an access modifier. They are as visible as the trait itself and can't be hidden. The reason is that a trait is supposed to define a public API. Private functions don't make sense there.

struct MyStruct has got just one u32 field and it will implement our Printer trait. The implementation is similar to the impl block for a struct but it also specifies a trait name. In our case - impl Printer for MyStruct. A struct can implement more than one trait. In that case there will be a separate impl block for each trait. On top of them the struct can have its own impl block(s).

When we implement a trait method for a struct we can use all the fields of the struct and everything implemented for the struct in question. This means that in impl Printer for MyStruct we can call functions from other traits (if they are implemented for MyStruct) or functions from its own impl block. In our example we use the answer field from the struct.
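A short sketch of a struct implementing more than one trait. The Named and Greeter traits and the Robot struct are made up for this example. One trait implementation calls a method from the struct's own impl block, the other calls a method from a different trait:

```rust
trait Named {
    fn name(&self) -> String;
}

trait Greeter {
    fn greeting(&self) -> String;
}

struct Robot {
    id: u32,
}

// The struct's own impl block.
impl Robot {
    fn serial(&self) -> String {
        format!("R-{}", self.id)
    }
}

impl Named for Robot {
    fn name(&self) -> String {
        // Uses a method from the struct's own impl block.
        self.serial()
    }
}

impl Greeter for Robot {
    fn greeting(&self) -> String {
        // Uses a method from another trait implemented for `Robot`.
        format!("Hello, I am {}", self.name())
    }
}

fn main() {
    let r = Robot { id: 7 };
    println!("{}", r.greeting());
}
```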

Traits can also contain associated functions. For example:

trait Printer {
    fn print_42();
}

struct MyStruct {}

impl Printer for MyStruct {
    fn print_42() {
        println!("42");
    }
}

fn main() {
    MyStruct::print_42();
}

Here trait Printer declares only a single associated function print_42 which the implementer must define. Then we have got MyStruct without any fields and right after an implementation of Printer for MyStruct. print_42 is an associated function so no instance is needed for calling it.

To avoid ambiguity (and to prevent conflicting implementations) there are two strict rules about where a trait can be implemented. The unit that matters here is the crate (a library or a binary), not the individual module:

Any trait can be implemented for a struct X defined in crate C as long as the implementation is in crate C.
Trait X defined in crate C can be implemented for any struct as long as the implementation is in crate C.

Or in other words we can implement a trait for a struct either in the crate where the struct is defined or in the crate where the trait is defined. This way we are protected from surprises. Imagine a third party crate adding an implementation of a standard library trait for a type from the standard library, and you spending hours wondering what's going on. The compiler just won't allow it.
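Both allowed cases can be sketched in a single file. The Doubled trait and the Meters struct are made up for illustration; u32 and Display come from the standard library.

```rust
use std::fmt;

// Our own trait: we may implement it for a type we don't own, such as u32.
trait Doubled {
    fn doubled(&self) -> u32;
}

impl Doubled for u32 {
    fn doubled(&self) -> u32 {
        *self * 2
    }
}

// Our own struct: we may implement a trait we don't own for it.
struct Meters {
    value: u32,
}

impl fmt::Display for Meters {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} m", self.value)
    }
}

// Neither the trait nor the type is ours, so the compiler rejects this:
// impl fmt::Display for u32 { ... }

fn main() {
    println!("{}", 21u32.doubled());
    println!("{}", Meters { value: 5 });
}
```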

After implementing a trait the implementer can be used in any place where the trait in question is required. In C++ this is usually done via a pointer to a base class. Rust supports this too but the preferred way is via generics and trait bounds. This is also a big topic which requires a separate post so I will not cover it here.

Tuple

Tuples are similar to std::tuple in C++ but in Rust they are a built-in data type. They are declared like this:

fn main() {
    let powers = (1, 2, 4, 8, 16);
    println!("2^0={}, 2^2={}", powers.0, powers.2);
}

In this example powers is a tuple of integers (i32 by default). The elements of the tuple are accessed with the dot operator followed by the element's index. In the example above the indexes match the corresponding exponents of 2. Tuples can't be extended after creation but their elements can be modified if the tuple is mut. For example:

fn main() {
    let mut data = (15, 16, 18);
    data.0 += 1;
    println!("{}", data.0);
}

Here the tuple is mutable and we are incrementing the first element by 1. If data is not mut we will get a compilation error.

You can also set the types of the tuples explicitly if necessary. Note that the tuple can contain different types:

fn main() {
    let person: (&str, u32) = ("John", 16);
    println!("Name: {}, age: {}", person.0, person.1);
}

Tuples are described in Section 3.2 (Data Types) of the Rust book.

Array

Arrays in Rust are very similar to those in C. They are allocated on the stack, have got a constant size and all their elements are of the same type. Here is the syntax used to declare an array:

let a = [0, 1, 2, 3, 4, 5];

Rust arrays are indexed just like the C ones - e.g. println!("{}", a[0]); for the array in the example above. Usually the exact type of the array and its size are deduced by the compiler but they can be set explicitly:

fn main() {
    let a: [u32; 6] = [0, 1, 2, 3, 4, 5];
    println!("{}", a[1]);
}

u32 is the type of the elements, 6 is the size of the array. If the array is mutable, its values can be modified.
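A minimal sketch of modifying a mutable array, both directly and through a mutable reference passed to a function. The helper zero_out_last is hypothetical.

```rust
// Takes a mutable reference to a 3-element array and resets the last element.
fn zero_out_last(a: &mut [u32; 3]) {
    a[2] = 0;
}

fn main() {
    // The array must be declared `mut` to be modified.
    let mut a: [u32; 3] = [1, 2, 3];
    a[0] = 10;
    zero_out_last(&mut a);
    println!("{:?}", a); // prints [10, 2, 0]
}
```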

As the size of the array is known during compilation, the compiler can check constant indexes for us (indexes computed at runtime are checked when the program runs). E.g. this code:

fn main() {
    let a = [0, 1, 2];
    println!("{:?}", a[3]);
}

will yield a compilation error:

error: this operation will panic at runtime
 --> src/main.rs:3:22
  |
3 |     println!("{:?}", a[3]);
  |                      ^^^^ index out of bounds: the length is 3 but the index is 3
  |
  = note: `#[deny(unconditional_panic)]` on by default
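The compile-time check only fires when the compiler can see the index. As a hedged sketch (the function names are made up), an index that arrives as a function parameter is checked at run time instead, and the slice method get returns an Option rather than panicking:

```rust
// Compiles fine, but panics at run time when `idx` is 3 or more.
fn element_at(a: &[u32; 3], idx: usize) -> u32 {
    a[idx]
}

// `get` returns `None` for an out-of-range index instead of panicking.
fn checked_element_at(a: &[u32; 3], idx: usize) -> Option<u32> {
    a.get(idx).copied()
}

fn main() {
    let a = [0, 1, 2];
    println!("{}", element_at(&a, 1));
    // element_at(&a, 3); // compiles, but panics at run time
    println!("{:?}", checked_element_at(&a, 2));
    println!("{:?}", checked_element_at(&a, 3));
}
```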

Arrays are described in Section 3.2 from the Rust book.

Slices

Slices are 'fat pointers' (pointer + length) to various objects like vectors, strings, arrays. For now we'll focus only on array slices. Let's see how to create a slice:

fn main() {
    let a = [0, 1, 2, 3, 4, 5];
    let sa = &a[0..2];  // create a slice

    println!("Length: {}", sa.len());
    println!("0->{} 1->{}", sa[0], sa[1]);
    // println!("3->{}", sa[3]); // will compile but will panic
}

First we create an array a and then a slice sa with the first two elements of the array (0 and 1). 0..2 is called a 'range' in Rust. It is used to generate numbers (e.g. for i in 0..5) and also to select a range of elements in collections. Ranges support the following syntax:

0..2 - 0 to 2, excluding 2.
0..=2 - 0 to 2, including 2.
0.. - from 0 to the end. If the range is used to index a collection this means from 0 to the end of the collection (including the last element).
..2 - same as 0..2
.. - same as 0..
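The range forms listed above can be tried out with a short sketch. The helper functions tail, head_inclusive and sum_below are hypothetical names used only here.

```rust
// Everything from index `from` to the end of the slice.
fn tail(a: &[i32], from: usize) -> &[i32] {
    &a[from..]
}

// Elements 0 to `to`, including `to`.
fn head_inclusive(a: &[i32], to: usize) -> &[i32] {
    &a[0..=to]
}

// A range used to generate numbers: sums 0, 1, ..., n - 1.
fn sum_below(n: u32) -> u32 {
    let mut sum = 0;
    for i in 0..n {
        sum += i;
    }
    sum
}

fn main() {
    let a = [10, 20, 30, 40, 50];
    println!("{:?}", &a[0..2]); // [10, 20]
    println!("{:?}", &a[..2]); // [10, 20]
    println!("{:?}", head_inclusive(&a, 2)); // [10, 20, 30]
    println!("{:?}", tail(&a, 3)); // [40, 50]
    println!("{:?}", &a[..]); // the whole array as a slice
    println!("{}", sum_below(5)); // 10
}
```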

We can get the length of a slice with its len() method and we can index the slice like a regular array. Unlike arrays, a slice's length is generally not known during compilation so the compiler can't protect us from out-of-bounds access. Indexing past the end of a slice results in a runtime error (a panic). Try uncommenting the last line of the example and rerun the code. It will generate an error:

thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 3', src/main.rs:7:23

Like everything else in Rust, slices are immutable by default. We can't modify a via sa in the example above. To achieve this we need to create a mutable slice. For example:

fn main() {
    let mut b = [0, 1, 2, 3, 4, 5];
    let sb = &mut b[0..2];
    sb[0] = 10;
    println!("{}", sb[0]);
}

Here we create a mutable slice sb (&mut b[0..2]). Note that the array b itself has to be mut for this to work. Then we can modify the array via the slice. In this example we set the first element to 10.

You can find more information about slices in Section 4.3 from the Rust book.

Iterating over arrays and slices

For loops in Rust have got functionality similar to the range-based for loops in C++. This syntax works for any container including arrays and slices:

fn main() {
    let data = [1, 2, 3, 4, 5];
    for d in data {
        println!("{}", d);
    }
}

It's very important to remember that for x in CONTAINER moves (consumes) the container together with its elements. This isn't a problem here because the array contains i32s, which are scalar types and are copied by default, so the array itself is copied too. If you write the same code for a non-scalar type you will end up with a consumed array:

fn main() {
    let data = ["a".to_string(), "b".to_string(), "c".to_string()];
    for d in data {
        println!("{}", d);
    }
    // println!("len: {}", data.len());
}

The code seems to do the same but you'll see the problem when you uncomment the last line:

error[E0382]: borrow of moved value: `data`
 --> src/main.rs:6:25
  |
2 |     let data = ["a".to_string(), "b".to_string(), "c".to_string()];
  |         ---- move occurs because `data` has type `[String; 3]`, which does not implement the `Copy` trait
3 |     for d in data {
  |              ---- `data` moved due to this implicit call to `.into_iter()`
...
6 |     println!("len: {}", data.len());
  |                         ^^^^^^^^^^ value borrowed here after move
  |
note: `into_iter` takes ownership of the receiver `self`, which moves `data`
 --> /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/collect.rs:271:18
help: consider iterating over a slice of the `[String; 3]`'s content to avoid moving into the `for` loop
  |
3 |     for d in &data {
  |              +

For more information about this error, try `rustc --explain E0382`.

For non-scalar types Rust follows its 'move by default' policy and the array is consumed after the iteration. This is valid for all non-scalar data types unless they implement the Copy trait. To overcome this you need to borrow the array before iterating over it (as suggested by the compiler):

fn main() {
    let data = ["a".to_string(), "b".to_string(), "c".to_string()];
    for d in &data {
        println!("{}", d);
    }
    println!("len: {}", data.len());
}

data is borrowed and the type of d is &String instead of String. After the iteration the array remains untouched.

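The same syntax works on slices. Since a slice is already a borrow, iterating over it yields references to the elements and the underlying data is not consumed.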
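As a small sketch of iterating over slices (the functions total and double_all are made up for this example): a shared slice yields shared references to its elements, and a mutable slice yields mutable references which can be used to change the underlying array.

```rust
// Iterating over a shared slice yields `&u32` items.
fn total(values: &[u32]) -> u32 {
    let mut sum = 0;
    for v in values {
        sum += *v;
    }
    sum
}

// Iterating over a mutable slice yields `&mut u32` items.
fn double_all(values: &mut [u32]) {
    for v in values {
        *v *= 2;
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    // Only the first two elements are doubled.
    double_all(&mut data[0..2]);
    println!("{:?}", data); // [2, 4, 3, 4, 5]
    println!("{}", total(&data[..])); // 18
}
```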

Conclusion

At this point, especially if you haven't skipped the Rust book chapters, you should have a good basic understanding of the language. My advice is to continue by reading and writing some Rust code. Try to set up a local development environment. Try to write some code and see what goes wrong. The compiler is quite pedantic but at the same time helpful. Fetching some actual Rust project from GitHub and browsing the code is also a good idea. It will help you see what you don't know and continue learning.