Text Processing in Rust

Create handy command-line utilities in Rust.

This article is about text processing in Rust, but it also contains a
quick introduction to pattern matching, which can be very handy when
working with text.

Strings are a huge subject in Rust, as you can tell from the fact
that Rust has two data types for representing strings as well as
macros for formatting strings. However, all of this also shows how
powerful Rust is at string and text processing.

Apart from covering some theoretical topics, this article shows how to develop
some handy yet easy-to-implement command-line utilities that let you
work with plain-text files. If you have the time, it’d be great to
experiment with the Rust code presented here, and maybe develop your own
utilities.

Rust and Text

Rust supports two data types for working with strings: String
and str.
The String type is for working with mutable strings that
belong to you, and it has both a length and a capacity. On the other
hand, the str type is for working with immutable strings that you want
to pass around. You will most likely see str used through a reference,
as &str. Put simply, an &str is a reference to some UTF-8 data stored
elsewhere. A str is usually called a “string slice” or, even
simpler, a “slice”. Due to its nature, you can’t add or remove any
data from an existing str. Moreover, if you try to call the
capacity() method on an &str variable, you’ll get an error message
similar to the following:

error[E0599]: no method named `capacity` found for type `&str` in the current scope
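To see the difference in practice, here is a minimal sketch that builds a String, prints its length and capacity, borrows part of it as a &str, and shows that the slice only offers a length. The helper name build_owned() is hypothetical and is used only for this illustration. The capacity that actually gets printed depends on the standard library's allocation strategy; String::with_capacity() only guarantees at least the requested amount.

```rust
/// Hypothetical helper: builds an owned, growable String.
fn build_owned() -> String {
    // Reserve room for at least 16 bytes up front.
    let mut owned = String::with_capacity(16);
    owned.push_str("Hello"); // appending is allowed because we own it
    owned.push('!');
    owned
}

fn main() {
    let owned = build_owned();
    // A String reports both its length (in bytes) and its allocated capacity.
    println!("String: len = {}, capacity = {}", owned.len(), owned.capacity());

    // Borrowing part of the String gives a read-only &str view of the same bytes.
    let view: &str = &owned[0..5];
    println!("view = {}", view);

    // A string literal is also a &str: it has a length but no capacity.
    let slice: &str = "Hello!";
    println!("&str: len = {}", slice.len());
    // let _ = slice.capacity(); // uncommenting this line triggers error[E0599]
}
```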

Generally speaking, you’ll want to use an &str when you want to pass a
string as a function parameter or when you need a read-only view of a
string, and use a String when you want a mutable string that you own.
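The rule of thumb above can be sketched as follows: a read-only helper only needs to borrow a &str, while a helper that produces new text returns an owned String that the caller is then free to modify. Both first_word() and with_suffix() are hypothetical names chosen for this illustration.

```rust
/// Read-only access: borrowing a &str is enough, and the result
/// borrows from the same input text.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// New, owned text: copy the borrowed data into a String and extend it.
fn with_suffix(text: &str, suffix: &str) -> String {
    let mut result = text.to_string(); // allocate an owned copy
    result.push_str(suffix);
    result
}

fn main() {
    let line = "text processing in Rust";
    println!("{}", first_word(line));

    // The caller owns the returned String, so it can keep mutating it.
    let mut owned = with_suffix(line, "!");
    owned.make_ascii_uppercase();
    println!("{}", owned);
}
```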

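The good thing is that a function that accepts &str parameters can
also be called with a reference to a String (&String), because Rust
automatically converts &String into &str through deref coercion.
(You’ll see such an example in the basicOps.rs program presented
later in this article.)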
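Here is a small sketch of that conversion. It assumes a hypothetical count_lines() helper, which is not the basicOps.rs code: the same &str parameter accepts a string literal, a &String that is coerced automatically, and the explicit result of as_str().

```rust
/// Hypothetical helper: counts the lines in a piece of text.
fn count_lines(text: &str) -> usize {
    text.lines().count()
}

fn main() {
    let literal = "one\ntwo\nthree";          // a &'static str
    let owned = String::from("alpha\nbeta"); // an owned String

    println!("{}", count_lines(literal));
    // &String is coerced to &str automatically (deref coercion).
    println!("{}", count_lines(&owned));
    // An explicit conversion works as well.
    println!("{}", count_lines(owned.as_str()));
}
```

Taking &str instead of &String keeps the helper usable with both literals and owned strings, which is why it is the usual choice for read-only parameters.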

Additionally, Rust supports the char type, which represents a single
Unicode character (strictly speaking, a Unicode scalar value), as well
as string literals, which are strings enclosed in double quotes and
have the type &'static str.
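The following sketch shows how the two relate. A char literal uses single quotes and a string literal uses double quotes. Because strings are stored as UTF-8, a string's length in bytes can differ from the number of chars it contains. The byte_and_char_counts() helper is hypothetical.

```rust
/// Hypothetical helper: returns (bytes, chars) for a string slice.
/// len() counts UTF-8 bytes; chars().count() counts Unicode scalar values.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let c: char = 'λ';                  // a char literal: single quotes
    let literal: &'static str = "café"; // a string literal: double quotes

    // A char can need more than one byte once it is encoded as UTF-8.
    println!("'{}' needs {} bytes in UTF-8", c, c.len_utf8());

    let (bytes, chars) = byte_and_char_counts(literal);
    println!("{:?}: {} bytes, {} chars", literal, bytes, chars);

    // Iterating with chars() yields whole characters, not individual bytes.
    for ch in literal.chars() {
        print!("[{}]", ch);
    }
    println!();
}
```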

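Finally, Rust supports what is called a byte string, which you define by
putting a b prefix in front of a string literal. Its type is a reference to
a fixed-size array of bytes (&[u8; N]) rather than &str.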
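A minimal sketch of working with a byte string follows. The literal's contents and the as_text() helper are illustrative assumptions. The sketch shows that indexing a byte string yields raw u8 values, and that turning it into a &str requires an explicit, fallible UTF-8 check with std::str::from_utf8().

```rust
/// Hypothetical helper: views a byte slice as text if it is valid UTF-8.
fn as_text(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // The b prefix creates a byte string literal of type &'static [u8; N].
    // Byte string literals may only contain ASCII characters and escapes.
    let a_byte_string: &[u8; 12] = b"Byte string!";

    // Printing with {:?} shows the individual byte values.
    println!("{:?}", a_byte_string);
    // Indexing returns a u8, not a char.
    println!("{} bytes, first = {}", a_byte_string.len(), a_byte_string[0]);

    // A byte string is not a &str; convert explicitly when the data is UTF-8.
    match as_text(a_byte_string) {
        Some(text) => println!("as text: {}", text),
        None => println!("not valid UTF-8"),
    }
}
```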
