# Data manipulation in Rust (Part 1: nalgebra)

## Introduction

I am writing this blog for two main reasons. First, I am using Rust and I would love to see it used more in the scientific community. However, the lack of documentation and guidance toward scientific crates is, I think, holding back wider adoption. So, let’s be the change I want to see! Second, by writing small pieces of documentation about scientific crates and how they interact, I hope to discover new crates and to familiarize myself with writing scientific code in Rust. I usually use Python to prototype everything, but it may be good to be able to do it sometimes in Rust too!

In this first post I will describe the most basic need we have when beginning to do scientific computation: manipulating data. In Python I usually use numpy, which is the standard way to do it. Most Python modules made for science also use numpy, which makes it really convenient to glue everything together. In Rust we do not have (yet!) a standard numerical crate. The reason is that the language is still young and many people want to experiment with their own implementations. This is for me rather unfortunate, but it is the current reality. The upside is that we have specialized crates tailored to our needs! I found two main crates that I think are of high quality and answer different use cases: **ndarray** and **nalgebra**. ndarray is the equivalent of numpy, while nalgebra aims to provide tools to manipulate matrices and vectors. I wanted to present both crates in one post, but it would have resulted in a very long one. Instead, this first post is about nalgebra and I will post a second one on ndarray later.

Please note that as I obviously could not describe all the different crates in this post, I focused only on nalgebra and ndarray. It does not mean that I think the other crates are not worthwhile; I simply needed to make a choice, and I presented what I thought were the most popular options.

## nalgebra: When you only want matrices and vectors

Having arrays with multiple dimensions is common, but often manipulating vectors and matrices is the only thing you need. In these situations the nalgebra crate is perfect for you. This crate was created with the underlying goal of supporting general linear algebra operations and computer graphics. As such, it is really well suited to manipulating matrices and vectors representing geometrical transformations. The crate has beautiful documentation, and reading it is necessary to discover all the features it provides. However, I feel that when you are discovering the crate coming from the Python world (or other programming languages less strict than Rust) it is a bit confusing at first. How do you create and manipulate matrices? How do you use the factorizations explained in the documentation? This post aims to give an easy start to beginners wanting to try nalgebra.

### Vector/Matrix types

Creating a matrix or a vector in nalgebra is not difficult, but we need to be clear about what we want to create before doing it. The most significant choice is whether the matrix or vector we want to create has a column/row size determined at compile time. The more information you give the compiler, the more your code can be optimized. In Rust, a lot of things are done during compilation and, as such, the code we write has to be correct up front (which is a sticking point for a lot of Rust newcomers). The remaining information needed is the type of the elements inside the matrix.

Because there are many possible element types for your matrix, you have to specify manually which one you want to use. For the smallest dimensions, nalgebra has you covered. Indeed, the crate provides many aliases so you can avoid spelling out the types of matrices with dimensions up to 6. For example, here is how to create two matrices of size 2x4 and 5x6.

And here is how to create two vectors of size 2 and 4.

Note that these matrices and vectors have their dimensions known at compile time. There is no alias for, say, a matrix with 4 rows and a number of columns unknown at compile time.

What about bigger vectors and matrices? Well, you will have to define their types. And remember that a vector is just a matrix with one row or one column. Creating a new matrix type is not difficult: we need to specify the type of the data we want to store (*i32*, *f64*…), the number of rows and columns, and finally the buffer that will contain all the matrix components.

However, there is one catch. As of today (August 4th, 2018) we cannot parametrize a type over integer values, which means that we cannot use numbers to differentiate two matrix types (even numbers hard-coded at compile time). This is clearly not ergonomic, but people are working on it! To circumvent the problem, the nalgebra crate defines 127 types for you, named *U1* to *U127*.

So, for example, here is what we could write to create a new matrix type representing matrices of integers with 4 rows and 27 columns.

As you can see, the buffer we used is a *MatrixArray*. It is an array, which means its size is determined at compile time. So what if you only know the size of your matrix when running your program (for example, a matrix built from the result of another program)?

If you do not know anything about the size of your vector/matrix, then you can use the aliases *DVector* / *DMatrix*.

Note that in this case we cannot simply use the *new* method, as the compiler would have no idea of the matrix's dimensions. We instead need to initialize the matrix with different methods that require extra parameters specifying the dimensions.

What if you know the row dimension at compile time but not the column dimension (or the opposite)? In this case you can create your matrix type by using the *Dynamic* type.

We created a new matrix type of 64-bit floats with 49 rows and a dynamic number of columns. Because the matrix is dynamically allocated, we cannot use the *MatrixArray* buffer; instead we use the *MatrixVec* one.

### The different ways to initialize

There are many ways to initialize matrices and we will not cover all of them here. Instead we will only see the most common ones.

#### Zeros

The most common initialization is a matrix filled with zeros. It is of course simple to do in nalgebra, but there is a small catch. Because Rust needs to know the type of the matrix elements at compile time, it will try to infer it from your code. However, if you program incrementally you may first just initialize your matrix and only later add the computation that uses it. In this case you will need to give a type annotation to tell the Rust compiler which type it is dealing with.

#### Identical value

When you want to initialize your matrix with the same element everywhere, you can simply use the *from_element* method.

#### Identity

A lot of the time we need an identity matrix, and it would be bothersome to write all the values ourselves. You can instead just use the *identity* method, which will handle it for you. You can even use it for non-square matrices, but it will set only the largest square submatrix to the identity; the rest will be filled with 0.

#### From an iterator

In Rust, iterators are really convenient and easy to use. It is quite possible that you will get your matrix values from one, so nalgebra gives you the *from_iterator* method. Here is a simple example where two matrices are initialized from iterators.

Note that the matrix is filled in column-major order.

#### From a function

Sometimes you do not have an iterator, but you directly have a function that generates the values of your matrix. In this case it can be easier to just use the *from_fn* method.

### Accessing and modifying elements

Accessing and modifying the values of matrices are fundamental operations in scientific computing. However, it can be a little tricky at first because of the Rust compiler. Indeed, the infamous borrow checker will not let you do anything that is not safe.

It is important to be clear about whether we only want to access an element to read or copy it, or whether we want to modify it. If we just want to read some values of our matrix, it is simple.

You have to remember that the matrix has its elements ordered in column-major mode. But now, suppose that you want access to the values of a submatrix. It is a pretty common task, so nalgebra gives you many functions to do it. Here we will just see how to read a row, a column, and a simple submatrix (you can have a lot of fun defining complex submatrices). Basically, what we do is take a slice of the matrix. The slice is also a *Matrix* type (with a special buffer type), so you can manipulate it as a normal matrix, except that you cannot mutate its values.

Note that as always, because the slices we are taking are in fact matrices we also need to know if their sizes are determined at compile time or not.

So what about modifying the values of our matrix? You simply need to tell the compiler that you want your matrix to be mutable. However, remember that a slice cannot be mutated, as it is just a view into a part of another matrix.

Last but not least, what if you want to change the size of your matrix? Again, nalgebra distinguishes between compile time and runtime. If you know at compile time how many rows or columns you want to add or remove, then your matrix will stay statically sized (unless it is already dynamically sized, of course). Conversely, if you do not know at compile time how you want to resize your matrix, then the output will be a dynamically sized matrix. Because there are many different methods, we will just show a subset of them. You can read about them all in the documentation.

Note that during the resizing you need to use the *let* keyword to rebind your variable. Indeed, because the size of your matrix is changing, its type is also changing.

### Matrix operations

Having to specify the types of the matrices we are using can be a pain, but it starts paying off when we manipulate them. It is simple: if you make a mistake by combining two matrices with incompatible dimensions, your code will *not* compile! Unlike, for example, numpy in Python, where matrices are all dynamically sized and you discover your mistake by running your program (which can be time-consuming), in Rust you will know immediately! Of course, there is no magic: if your matrices are dynamically sized you will still have to debug at runtime. Therefore, the more information you give the compiler, the better it can protect you.

#### Addition and multiplication

Now that we have seen all of the basics, we can begin to use our matrices for real! Adding and multiplying matrices are really easy operations in nalgebra: just treat them as simple scalar values and it will work out of the box!

Besides, it is worth insisting on the protection against matrix shape mismatches. The Rust compiler will prevent you from performing incompatible operations at compile time, but only if you give it enough information! It is always advantageous to use methods requiring compile-time information: first, it will protect you against silly mistakes, and second, it will be faster (the compiler can use the stack instead of the heap)!

#### Linear system

Most of the interest of using nalgebra is to do… algebra! And a very common operation in algebra is solving a linear system. nalgebra gives you many ways to do it. Indeed, if your linear system is of the form **Ax=b**, then computing **A^-1** may not be the fastest solution! It is not in the scope of this article to explain all the different algebraic tricks, but the core idea is to exploit a matrix's structure to reduce computation time.

If you just want the inverse of a matrix, you can call the *try_invert_to* function. The function takes two arguments: an input matrix and an output matrix. By performing an LU decomposition on the input matrix, the function tries to find an inverse and stores the result in the output matrix. Be careful: if no inverse is found, your output matrix may contain invalid data (so check the returned result).

Now, what can we do if we want to solve the classical linear system **Ax=b**? Depending on the properties of your matrix, you can use different decompositions to speed up the resolution. I will let you take care of the choice, but here I will just show some examples. The following code solves a linear system in three different ways: once by inverting the matrix **A** and multiplying it by **b** (very inefficient), once by computing the LU decomposition and asking nalgebra to solve it, and finally once by computing the Cholesky decomposition and using nalgebra to solve it (faster than the LU decomposition!).

Please note that in the case of the LU decomposition, the result of the solve function is an *Option*. It means we have to check whether a solution has been found. In the case of the Cholesky decomposition, it is the decomposition itself that is an *Option* (because the matrix may not be symmetric positive-definite), but its solve function outputs a *Matrix*.

#### Other

I cannot write about all the possibilities that nalgebra offers (that is the documentation's duty) because this post is already too long. Just so you know, nalgebra allows you to perform many factorizations of your matrices (QR, SVD…), find the determinant and the eigenvalues, and do a lot of computer graphics operations (rotations, projections…). As always, you can check them in the documentation.