- Rust Standard Library Cookbook
- Jan Nils Ferner, Daniel Durante
- 2021-08-27 19:45:11
How it works...
Because a string is essentially a kind of vector, it can be created the same way, by combining new and push. Since that is inconvenient, a String, which is an owned chunk of memory, can instead be created from a string slice (&str), which is either a borrowed string or a literal. The two ways of doing this shown in this recipe are equivalent:
let s = "Hello".to_string();
println!("s: {}", s);
let s = String::from("Hello");
println!("s: {}", s);
Out of pure personal preference, we will use the first variant.
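To make the equivalence concrete, here is a minimal std-only sketch that builds the same text three ways: the vector-like way with String::new and push, then with to_string and String::from. Feeding push from "Hello".chars() is just an illustrative choice, not something taken from the recipe.

```rust
fn main() {
    // Build a String the "vector way": start empty and push one char at a time.
    let mut pushed = String::new();
    for c in "Hello".chars() {
        pushed.push(c);
    }

    // The two conversions from a string slice (&str) shown in the recipe.
    let via_to_string = "Hello".to_string();
    let via_from = String::from("Hello");

    // All three are equal, independently owned Strings.
    assert_eq!(pushed, via_to_string);
    assert_eq!(via_to_string, via_from);
    println!("{} / {} / {}", pushed, via_to_string, via_from);

    // Underneath, a String owns a buffer of UTF-8 bytes, much like a Vec<u8>.
    let bytes: Vec<u8> = via_from.into_bytes();
    assert_eq!(bytes, vec![b'H', b'e', b'l', b'l', b'o']);
}
```

Pushing characters one by one only illustrates the vector-like nature of String; for literals, to_string or String::from is the idiomatic choice.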
All strings in Rust are valid Unicode in UTF-8 encoding. This can lead to some surprises, as a character, as we know it, is an inherently Latin invention. For instance, look at languages that have a modifier for a letter: is ä its own character, or is it merely a variation of a? What about languages that take such combinations to the extreme? What would that keyboard even look like? For this reason, Unicode lets you compose your characters from different Unicode scalar values. With .chars(), you can create an iterator that goes through these scalars [28]. If you work with non-Latin characters, you might be surprised by this when accessing composed characters: y̆ is not one, but two scalars, y and the combining breve (U+0306) [36]. You can get around this by using the unicode-segmentation crate, which supports iteration over graphemes: https://crates.io/crates/unicode-segmentation.
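The following std-only sketch shows what .chars() actually yields for a composed character. The strings are written with \u{...} escapes so the invisible combining marks are explicit; the ä comparison is an added illustration, not part of the recipe. Iterating over graphemes would need the unicode-segmentation crate, which this sketch deliberately avoids.

```rust
fn main() {
    // "y̆" written with an explicit escape: 'y' followed by U+0306 COMBINING BREVE.
    let s = "y\u{0306}";

    // .chars() iterates over Unicode scalar values, not user-perceived characters.
    let scalars: Vec<char> = s.chars().collect();
    assert_eq!(scalars, vec!['y', '\u{0306}']);
    println!("{} is made of {} scalars", s, scalars.len());

    // The byte length differs again: 'y' is 1 byte, U+0306 is 2 bytes in UTF-8.
    assert_eq!(s.len(), 3);

    // A precomposed 'ä' (U+00E4) is a single scalar...
    let precomposed = "\u{00E4}";
    // ...while 'a' + U+0308 COMBINING DIAERESIS looks the same but is two scalars.
    let decomposed = "a\u{0308}";
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);

    // String equality compares the underlying bytes, so these are not equal.
    assert_ne!(precomposed, decomposed);
}
```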
When you split a string on a pattern that appears at the beginning or the end of the string, or several times in a row, each such occurrence produces an empty string "" [107]. This is especially nasty when splitting on spaces (' '). In that case, you should use split_whitespace instead [110]. For other patterns, split_terminator can help: it skips the final empty string if there is one, although empty strings at the start or in the middle are kept [68].
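A minimal std-only sketch of these edge cases, using made-up input strings: split produces empty strings for leading, doubled, and trailing separators; split_whitespace avoids them entirely; and split_terminator drops only a single trailing empty string.

```rust
fn main() {
    // A leading, a doubled, and a trailing separator each produce an empty string.
    let parts: Vec<&str> = ",a,,b,".split(',').collect();
    assert_eq!(parts, vec!["", "a", "", "b", ""]);

    // Splitting on single spaces goes wrong as soon as spaces repeat.
    let words: Vec<&str> = " the  quick fox ".split(' ').collect();
    assert_eq!(words, vec!["", "the", "", "quick", "fox", ""]);

    // split_whitespace treats any run of whitespace as one separator
    // and never yields empty strings.
    let words: Vec<&str> = " the  quick fox ".split_whitespace().collect();
    assert_eq!(words, vec!["the", "quick", "fox"]);

    // split_terminator only skips the final empty string;
    // the leading and inner empty strings are kept.
    let parts: Vec<&str> = ",a,,b,".split_terminator(',').collect();
    assert_eq!(parts, vec!["", "a", "", "b"]);

    // With two trailing separators, one empty string still remains at the end.
    let parts: Vec<&str> = "a,,".split_terminator(',').collect();
    assert_eq!(parts, vec!["a", ""]);

    println!("all split assertions passed");
}
```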
The pattern you split on can be any of the following:
- A character
- A string
- A predicate that takes one char and returns a bool
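Here is a short sketch of those three pattern kinds, all passed to the same split method on an illustrative input string. The standard library accepts a few other pattern types too, such as a slice of chars (&[char]), which are not shown here.

```rust
fn main() {
    let text = "one, two;three";

    // 1. A char pattern: split on every ','.
    let by_char: Vec<&str> = text.split(',').collect();
    assert_eq!(by_char, vec!["one", " two;three"]);

    // 2. A string pattern: split on the two-character sequence ", ".
    let by_str: Vec<&str> = text.split(", ").collect();
    assert_eq!(by_str, vec!["one", "two;three"]);

    // 3. A predicate taking one char: split on ',', ';' or any whitespace.
    let by_predicate: Vec<&str> = text
        .split(|c: char| c == ',' || c == ';' || c.is_whitespace())
        // ',' and ' ' match separately, so ", " leaves an empty piece between them.
        .filter(|part| !part.is_empty())
        .collect();
    assert_eq!(by_predicate, vec!["one", "two", "three"]);
}
```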