Rust for C++ developers part 2: References, Structs and Traits
This is the second post in my Rust for C++ developers series. It focuses on references, structs, traits and some basic data structures built into the language. Like part 1, this post continues exploring Rust's syntax and drawing parallels with C++. For each topic I'll provide links to the Rust book. You don't need a development environment for this post either. Rust Playground should be enough to run the examples and experiment with them. Let's get started.
References
References in Rust are similar to pointers in C/C++. They can be reassigned and are explicitly dereferenced (with a few exceptions). Unlike pointers, a reference can never be null: it can be declared without a value, but the compiler insists that it points to a valid variable before it is used. Creating a reference to another variable is called borrowing. Conceptually, references in Rust behave much like the pointers/references in C/C++. There are a few syntactic differences which we will see in a moment.
A big difference between references in Rust and C++ is the way the compiler handles them. Rust's compiler has a borrow checker which rejects potentially dangerous usages like reading through a reference which was never initialized or dropping a variable which will be read via a reference later. In this post we will cover only the basics and ignore the borrow checker. It's a big topic which requires at least a post of its own.
References can be shared (like a pointer/reference to const in C/C++) or mutable (like a regular pointer/reference in C/C++). The former allow read-only access to the source while the latter can also modify it. Let's see how a shared reference is created and used:
    fn main() {
        let a: u32 = 5;
        let r = &a;
        println!("{}", *r);
    }
`r` is a reference to `a`. It needs to be explicitly dereferenced if we want to get its value. If we try to modify `a` via `r` we'll get a compilation error. For example this code:
    fn main() {
        let a: u32 = 5;
        let r = &a;
        *r = 2;
        println!("{}", *r);
    }
Yields this error:
    error[E0594]: cannot assign to `*r`, which is behind a `&` reference
     --> src/main.rs:4:5
      |
    4 |     *r = 2;
      |     ^^^^^^ `r` is a `&` reference, so the data it refers to cannot be written
      |
    help: consider changing this to be a mutable reference
      |
    3 |     let r = &mut a;
      |              +++

    For more information about this error, try `rustc --explain E0594`.
To fix the code above we need to make `a` mutable and change `r` to a mutable reference:
    fn main() {
        let mut a: u32 = 5;
        let r = &mut a;
        *r = 2;
        println!("{}", *r);
    }
Note that `mut` is attached to the `&` operator, not to the declaration of `r`, and that we dereference `r` to modify the value. This should feel natural if you think about pointers but might be confusing if you relate it to C++ references.
You probably know that in C `int const *` and `int * const` are different things. The story is similar in Rust too:
    fn main() {
        let mut a: u32 = 5;
        let mut b: u32 = 6;

        // mutable reference to `a`
        let r = &mut a;
        // `a` can be modified via `r`
        *r = 2;

        // `r` is a mutable reference but `r` itself is not mut.
        // We can't point `r` to another variable.
        //r = &mut b; // won't compile

        // However we can shadow `r` and achieve the same effect
        let r = &b;

        // `q` is a shared reference but `q` itself is mut
        let mut q = &a;
        // So we can point `q` at another variable
        q = &b;

        // But we can't change `b` via the new `r`
        //*r = 2; // won't compile

        println!("{}", *q);
    }
`r` is a mutable reference, meaning that we can modify `a` via `r`. We can't reassign `r` to `b` because `r` itself is not mutable. We can however 'shadow' `r` with `let`, assign it to something else and even make it immutable. This is a common practice in Rust.
`q` on the other hand is a shared reference but `q` itself is mutable. So we can reassign `q` to `b` and the compiler won't complain. However, we can't modify the original value because the reference is not `mut`.
References are also used as function parameters to avoid moving (consuming) the initial variable. The syntax is identical. For example:
    fn main() {
        let mut a = 5;
        increment_something(&mut a);
        print_something(&a);
    }

    fn increment_something(a: &mut u32) {
        // `a` is a mutable reference - we can increment
        *a += 1
    }

    fn print_something(a: &u32) {
        // `a` is shared - we can only read
        println!("{}", *a)
    }
I already mentioned that references in Rust need to be explicitly dereferenced. There are exceptions, however. Accessing struct fields or calling methods via a reference doesn't require dereferencing, so `*` can be omitted in that case. The same holds for a 'reference to reference ... to reference' (`&&&u32` for example): field access and method calls follow the whole chain automatically. Printing works too, because the formatting traits are implemented for references. Note that `*a` on its own removes only one level of indirection, so using the value directly (in arithmetic, for instance) still requires `***a`.
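To make the dereferencing rules concrete, here is a minimal sketch. `Point`, `read_x`, `square` and `plus_one` are hypothetical names made up for this illustration, they are not part of any library:

```rust
// Hypothetical struct used only for this illustration.
struct Point {
    x: u32,
}

// Field access through a reference: no explicit `*` is needed.
fn read_x(p: &Point) -> u32 {
    p.x
}

// Method calls follow the whole reference chain automatically.
fn square(n: &&&u32) -> u32 {
    n.pow(2)
}

// Plain arithmetic does not: every level has to be dereferenced.
fn plus_one(n: &&&u32) -> u32 {
    ***n + 1
}

fn main() {
    let p = Point { x: 7 };
    println!("{}", read_x(&p));

    let n: u32 = 5;
    println!("{} {}", square(&&&n), plus_one(&&&n));
}
```

Replacing `***n` with `*n` in `plus_one` should be rejected by the compiler, which is a quick way to see that a single `*` removes only one level.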
References are described in Chapter 4 from the Rust book.
Struct
Structs are quite similar to their C++ counterparts. First let's see how a struct is defined in Rust:
    struct MyData {
        pub count: usize,
    }
This struct is named `MyData` and has just one field named `count` with type `usize`. Struct fields are private by default. The `pub` keyword is used to declare a field as public. In our example `count` is public. Unlike C++, privacy works at the module level rather than the type level: `pub` fields can be accessed from anywhere the struct is visible, while private ones are accessible only to code in the same module (which includes the struct's methods). There is no struct inheritance in Rust so there is no equivalent of the C++ `protected` access modifier.
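A small sketch of how field visibility plays out. The module `counter`, the struct `Counter` and the function `bump` are hypothetical names used only for this illustration:

```rust
mod counter {
    pub struct Counter {
        pub label: &'static str, // visible outside the module
        count: usize,            // private to the module `counter`
    }

    impl Counter {
        pub fn new(label: &'static str) -> Self {
            Counter { label, count: 0 }
        }

        pub fn count(&self) -> usize {
            self.count
        }
    }

    // A free function in the same module can also touch the private field.
    pub fn bump(c: &mut Counter) {
        c.count += 1;
    }
}

fn main() {
    let mut c = counter::Counter::new("clicks");
    counter::bump(&mut c);
    println!("{}: {}", c.label, c.count());
    // println!("{}", c.count); // won't compile: field `count` is private
}
```

Note that `bump` is not a method of `Counter`, yet it can modify `count` because it lives in the same module.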
Methods of a struct are declared in an `impl` block:
    struct MyData {
        count: usize,
    }

    impl MyData {
        pub fn new() -> Self {
            MyData { count: 0 }
        }

        pub fn get_count(&self) -> usize {
            self.count
        }

        pub fn inc(&mut self) {
            self.count += 1
        }
    }

    fn main() {
        let mut data = MyData::new();
        println!("Count is {}", data.get_count());
        data.inc();
        println!("Count is {}", data.get_count());
    }
First note that `count` is private in this example. `impl MyData` is the impl block for the struct - it contains all its methods. Each method can have an access modifier: methods are private by default and `pub` makes them public. As in C++, public methods are part of the public API of the struct, while the private ones can only be called from code in the same module (typically from other methods).
Methods take `self` as their first parameter, which is the equivalent of `this` in C++. In Rust `self` is mandatory for methods which are called on a struct instance. To be precise it's the same in C++ - each method has `this` as its first parameter, but this happens under the hood. In Rust it's explicit and you will see why in a second. Also if you want to access any of the struct's fields you need to do it via `self.`
The first parameter can be a shared reference (`&self`), a mutable reference (`&mut self`), `self` taken by value, or there can be no `self` at all. A few words about each:
- `&self` - this is the equivalent of a const method in C++. You can access fields via `self` but you can't modify them.
- `&mut self` - this is the equivalent of a non-const C++ method. `self` is a mutable reference so you can access and modify fields via it.
- no `self` - this is called an 'associated function' and is the equivalent of a static method in C++. Associated functions are not called on a specific instance so they can't access any fields. A common pattern in Rust is to have a `pub fn new() -> Self` (which may or may not have any arguments) which initialises an instance of the struct.
- `self` - `self` is not a reference, so the instance of the struct is moved into the method. In Rust terminology the instance is 'consumed'. After calling such a method the instance of the struct can't be used anymore. It has no direct equivalent in C++ (rvalue-ref-qualified member functions come closest), but if you think about a method just as a regular function it makes perfect sense. A common pattern for such methods is to return another type (performing a type conversion) or to return the same type (performing some sort of mutation).
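The four flavours listed above can be put side by side in one small sketch. `Meters`, `Centimeters` and their methods are hypothetical names chosen for this illustration:

```rust
// Hypothetical types used only for this illustration.
struct Meters {
    value: u32,
}

struct Centimeters {
    value: u32,
}

impl Meters {
    // No `self`: an associated function, called as `Meters::new(..)`.
    fn new(value: u32) -> Self {
        Meters { value }
    }

    // `&self`: read-only access to the instance.
    fn value(&self) -> u32 {
        self.value
    }

    // `&mut self`: the instance can be modified.
    fn add(&mut self, extra: u32) {
        self.value += extra;
    }

    // `self`: the instance is consumed and converted into another type.
    fn into_centimeters(self) -> Centimeters {
        Centimeters { value: self.value * 100 }
    }
}

fn main() {
    let mut m = Meters::new(2);
    m.add(1);
    println!("{} m", m.value());

    let cm = m.into_centimeters();
    println!("{} cm", cm.value);
    // println!("{}", m.value()); // won't compile: `m` was moved
}
```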
Let's get back to our example and see what methods we have got in the `impl` block for `MyData`.
First there is a `pub fn new() -> Self`. `Self` is an alias for the type the `impl` block is for. We can also write `pub fn new() -> MyData` but it's less convenient. The function body of `new` demonstrates how structs are initialised. We list each field of the struct and its corresponding value (separated by `:`) in curly braces. As we already mentioned, these functions are not called on a specific instance. Have a look at how `new` is used in `main` - `MyData::new()`.
`get_count` is a getter for `count` (which is private). It doesn't need to modify `self` so it takes an immutable reference to the instance (`&self`). Again note that to access any fields of the object you need to use `self.` `inc` on the other hand increments `count` - it modifies the object so it needs a mutable reference to `self` (`&mut self`).
Structs are covered in Chapter 5 from the Rust book. The chapter is worth reading.
Traits
Traits are similar to C++ abstract classes (or interfaces in other languages). I assume you are already familiar with this concept. If not, have a look at Chapter 10.2 from the Rust book.
Let's see how to define and implement a trait:
    trait Printer {
        fn print_something(&self);

        fn greet_and_print(&self) {
            println!("Hello!");
            self.print_something();
        }
    }

    struct MyStruct {
        answer: u32,
    }

    impl Printer for MyStruct {
        fn print_something(&self) {
            println!("{}", self.answer);
        }
    }

    fn main() {
        let m = MyStruct { answer: 42 };
        m.print_something();
        m.greet_and_print();
    }
A trait is defined with the `trait` keyword followed by a name. The trait body can contain either function signatures (like `print_something`) or functions with definitions (like `greet_and_print`). The former are functions which the implementer must define (just like pure virtual functions in C++). The latter are functions with a default implementation, so implementing them is optional (unlike a pure virtual function with a definition in C++, which still has to be overridden).
The trait in our example has two functions - `print_something` which the implementer needs to implement and `greet_and_print` which has a default implementation. If the implementer doesn't implement it, the default implementation will be used. Unlike the abstract classes in C++, the functions in a trait don't have access modifiers. They are as visible as the trait itself and can't be hidden individually. The reason is that a trait is supposed to define a public API. Private functions don't make sense there.
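The example above never overrides the default implementation, so here is a minimal sketch of both cases. The trait `Greeter` and the structs `Polite` and `Terse` are hypothetical, and the functions return strings instead of printing only to keep the result easy to inspect:

```rust
// Hypothetical trait used only for this illustration.
trait Greeter {
    // Must be provided by every implementer.
    fn name(&self) -> String;

    // Default implementation: used unless the implementer overrides it.
    fn greeting(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

struct Polite;
struct Terse;

impl Greeter for Polite {
    fn name(&self) -> String {
        "Ann".to_string()
    }
    // `greeting` is not defined here, so the default one is used.
}

impl Greeter for Terse {
    fn name(&self) -> String {
        "Bob".to_string()
    }

    // Overrides the default implementation.
    fn greeting(&self) -> String {
        self.name()
    }
}

fn main() {
    println!("{}", Polite.greeting());
    println!("{}", Terse.greeting());
}
```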
`struct MyStruct` has just one `u32` field and it implements our `Printer` trait. The implementation is similar to an `impl` block for a struct but it also specifies a trait name. In our case - `impl Printer for MyStruct`. A struct can implement more than one trait. In that case there will be a separate `impl` block for each trait. On top of them the struct can have its own `impl` block(s).
When we implement a trait method for a struct we can use all the fields of the struct and everything implemented for the struct in question. This means that in `impl Printer for MyStruct` we can call functions from other traits (if they are implemented for `MyStruct`) or functions from its own `impl` block. In our example we use the `answer` field from the struct.
Traits can also contain associated functions. For example:
    trait Printer {
        fn print_42();
    }

    struct MyStruct {}

    impl Printer for MyStruct {
        fn print_42() {
            println!("42");
        }
    }

    fn main() {
        MyStruct::print_42();
    }
Here trait `Printer` declares only a single associated function `print_42` which the implementer must define. Then we have got `MyStruct` without any fields and right after it an implementation of `Printer` for `MyStruct`. `print_42` is an associated function so no instance is needed for calling it.
To avoid ambiguity there are strict rules about where a trait can be implemented. They apply at the level of crates (Rust's compilation units), not individual modules:
- Any trait can be implemented for a struct X defined in crate C as long as the implementation is in crate C.
- Trait X defined in crate C can be implemented for any struct as long as the implementation is in crate C.
Or in other words we can implement a trait for a struct either in the crate where the struct is defined or in the crate where the trait is defined. This way we are protected from surprises. Imagine a third party crate adding an implementation of a standard library trait for a type in the standard library and you spending hours wondering what's going on. The compiler just won't allow it.
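A short sketch of what this rule allows and rejects. The trait `Describe` is a hypothetical name defined in our own code, which is why it can be implemented for a standard library type:

```rust
// Hypothetical trait defined in our own crate.
trait Describe {
    fn describe(&self) -> String;
}

// Allowed: the trait is ours, even though `u32` is not.
impl Describe for u32 {
    fn describe(&self) -> String {
        format!("the number {}", self)
    }
}

// Not allowed: neither the trait nor the type is defined in our crate.
// impl std::fmt::Display for Vec<u32> { /* ... */ } // won't compile

fn main() {
    println!("{}", 42u32.describe());
}
```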
After implementing a trait the implementer can be used in any place where the trait in question is required. In C++ this is usually done via a pointer to a base class. Rust supports this too but the preferred way is via generics and trait bounds. This is also a big topic which requires a separate post so I will not cover it here.
Tuple
Tuples are similar to `std::tuple` in C++ but in Rust they are a built-in data type. They are declared like this:
    fn main() {
        let powers = (1, 2, 4, 8, 16);
        println!("2^0={}, 2^2={}", powers.0, powers.2);
    }
In this example `powers` is a tuple of integers (`i32` by default). The elements of the tuple are accessed with the dot operator followed by the element's index. In the example above the indexes match the corresponding exponents of 2. Tuples can't be extended after creation but their elements can be modified if the tuple is `mut`. For example:
    fn main() {
        let mut data = (15, 16, 18);
        data.0 += 1;
        println!("{}", data.0);
    }
Here the tuple is mutable and we are incrementing the first element by 1. If `data` is not `mut` we will get a compilation error.
You can also set the types of the tuples explicitly if necessary. Note that the tuple can contain different types:
    fn main() {
        let person: (&str, u32) = ("John", 16);
        println!("Name: {}, age: {}", person.0, person.1);
    }
Tuples are described here in the Rust book.
Array
Arrays in Rust are very similar to those in C. They are allocated on the stack, have a constant size and all their elements are of the same type. Here is the syntax used to declare an array:
    let a = [0, 1, 2, 3, 4, 5];
Rust arrays are indexed just like the C ones - e.g. `println!("{}", a[0]);` for the array in the example above. Usually the exact type of the array and its size are deduced by the compiler but they can be set explicitly:
    fn main() {
        let a: [u32; 6] = [0, 1, 2, 3, 4, 5];
        println!("{}", a[1]);
    }
`u32` is the type of the elements, `6` is the size of the array. If the array is mutable, its values can be modified.
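As a quick illustration of the last point, here is a minimal sketch of modifying array elements. The helper `replace_first` is a hypothetical function written for this example:

```rust
// Hypothetical helper: takes an array by value, changes it and returns it.
fn replace_first(mut a: [u32; 3], value: u32) -> [u32; 3] {
    a[0] = value; // allowed because the parameter is declared `mut`
    a
}

fn main() {
    let mut a: [u32; 3] = [1, 2, 3];
    a[2] = 30; // allowed because `a` is `mut`
    println!("{:?}", a);
    println!("{:?}", replace_first(a, 10));
}
```

Without `mut` in either place the assignment is rejected at compile time.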
As the size of the array is known during compilation, the compiler can catch out-of-bounds accesses when the index is a constant. (An index computed at runtime is still checked, but the check happens at runtime and a failure causes a panic.) E.g. this code:
    fn main() {
        let a = [0, 1, 2];
        println!("{:?}", a[3]);
    }
will yield a compilation error:
    error: this operation will panic at runtime
     --> src/main.rs:3:22
      |
    3 |     println!("{:?}", a[3]);
      |                      ^^^^ index out of bounds: the length is 3 but the index is 3
      |
      = note: `#[deny(unconditional_panic)]` on by default
Arrays are described in Section 3.2 from the Rust book.
Slices
Slices are 'fat pointers' (pointer + length) to various objects like vectors, strings, arrays. For now we'll focus only on array slices. Let's see how to create a slice:
    fn main() {
        let a = [0, 1, 2, 3, 4, 5];
        let sa = &a[0..2]; // create a slice

        println!("Length: {}", sa.len());
        println!("0->{} 1->{}", sa[0], sa[1]);
        // println!("3->{}", sa[3]); // will compile but will panic
    }
First we create an array `a` and then a slice `sa` with the first two elements of the array (0 and 1). `0..2` is called a 'range' in Rust. It is used to generate numbers (e.g. `for i in 0..5`) and also to select a range of elements in collections. Ranges support the following syntax:
- `0..2` - 0 to 2, excluding 2.
- `0..=2` - 0 to 2, including 2.
- `0..` - from 0 to the end. If the range is used to index a collection this means from 0 to the end of the collection (including the last element).
- `..2` - same as `0..2`
- `..` - same as `0..`
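Here is a small sketch of these range forms applied to an array. The helper `sum` is a hypothetical function added only to make the selected elements visible:

```rust
// Hypothetical helper: adds up the elements of whatever slice it is given.
fn sum(s: &[u32]) -> u32 {
    s.iter().sum()
}

fn main() {
    let a: [u32; 6] = [0, 1, 2, 3, 4, 5];

    println!("{}", sum(&a[0..2]));  // elements 0 and 1
    println!("{}", sum(&a[0..=2])); // elements 0, 1 and 2
    println!("{}", sum(&a[3..]));   // elements 3, 4 and 5
    println!("{}", sum(&a[..2]));   // same as 0..2
    println!("{}", sum(&a[..]));    // the whole array
}
```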
We can get the length of a slice with its `len()` method and we can index the slice like a regular array. Unlike arrays, the length of a slice is not known during compilation so the compiler can't protect us from out-of-bounds accesses. Indexing past the end of a slice will result in a runtime panic. Try uncommenting the last line of the example and rerun the code. It will generate an error:
    thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 3', src/main.rs:7:23
Like everything else in Rust, slices are immutable by default. We can't modify `a` via `sa` in the example above. To achieve this we need to create a mutable slice. For example:
    fn main() {
        let mut b = [0, 1, 2, 3, 4, 5];
        let sb = &mut b[0..2];
        sb[0] = 10;
        println!("{}", sb[0]);
    }
Here we create a mutable slice `sb` by borrowing with `&mut` (which requires the array `b` itself to be `mut`). Then we can modify the array via the slice. In this example we set the first element to 10.
You can find more information about slices in Section 4.3 from the Rust book.
Iterating over arrays and slices
For loops in Rust work similarly to the range-based for loops in C++. This syntax works for any type which can be turned into an iterator (i.e. implements the `IntoIterator` trait), including arrays and slices:
    fn main() {
        let data = [1, 2, 3, 4, 5];
        for d in data {
            println!("{}", d);
        }
    }
It's very important to remember that `for x in CONTAINER` moves (consumes) the container together with its elements. This isn't a problem here because the array contains `i32`s, which implement the `Copy` trait, so the array is copied instead of moved. If you write the same code for a type which is not `Copy` you will end up with a consumed array:
    fn main() {
        let data = ["a".to_string(), "b".to_string(), "c".to_string()];
        for d in data {
            println!("{}", d);
        }
        // println!("len: {}", data.len());
    }
The code seems to do the same but you'll see the problem when you uncomment the last line:
    error[E0382]: borrow of moved value: `data`
       --> src/main.rs:6:25
        |
    2   |     let data = ["a".to_string(), "b".to_string(), "c".to_string()];
        |         ---- move occurs because `data` has type `[String; 3]`, which does not implement the `Copy` trait
    3   |     for d in data {
        |              ---- `data` moved due to this implicit call to `.into_iter()`
    ...
    6   |     println!("len: {}", data.len());
        |                         ^^^^^^^^^^ value borrowed here after move
        |
    note: `into_iter` takes ownership of the receiver `self`, which moves `data`
       --> /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/iter/traits/collect.rs:271:18
    help: consider iterating over a slice of the `[String; 3]`'s content to avoid moving into the `for` loop
        |
    3   |     for d in &data {
        |              +

    For more information about this error, try `rustc --explain E0382`.
For types which don't implement the `Copy` trait Rust follows its 'move by default' policy and the array is consumed by the iteration. To overcome this you need to borrow the array when iterating over it (as suggested by the compiler):
    fn main() {
        let data = ["a".to_string(), "b".to_string(), "c".to_string()];
        for d in &data {
            println!("{}", d);
        }
        println!("len: {}", data.len());
    }
`data` is borrowed and the type of `d` is `&String` instead of `String`. After the iteration the array remains untouched.
The same syntax works on slices (with the same gotchas).
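A minimal sketch of iterating over slices, both shared and mutable. The functions `total_len` and `shout` are hypothetical helpers written for this illustration:

```rust
// Iterating over a shared slice: each `w` is a `&String`.
fn total_len(words: &[String]) -> usize {
    let mut total = 0;
    for w in words {
        total += w.len();
    }
    total
}

// Iterating over a mutable slice: each `w` is a `&mut String`,
// so the elements can be modified in place.
fn shout(words: &mut [String]) {
    for w in words {
        w.push('!');
    }
}

fn main() {
    let mut data = ["a".to_string(), "bc".to_string()];
    shout(&mut data[..]);
    println!("{:?}, total length {}", data, total_len(&data));
}
```

In both functions the loop only borrows the elements, so `data` is still usable in `main` after the calls.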
Conclusion
At this point, especially if you haven't skipped the Rust book chapters, you should have a good basic understanding of the language. My advice is to continue by reading and writing some Rust code. Try to set up a local development environment. Try to write some code and see what goes wrong. The compiler is quite pedantic but at the same time helpful. Fetching some actual Rust project from GitHub and browsing the code is also a good idea. It will help you see what you don't know and continue learning.