← Lambdica / mathematical-foundations

String

Also known as: word

A string over an alphabet is a finite sequence of symbols drawn from that alphabet.

Formal definition

Given an alphabet Σ\Sigma, a string (or word) of length n0n \geq 0 over Σ\Sigma is a function w:{1,2,,n}Σw : \{1, 2, \ldots, n\} \to \Sigma. The length is denoted w=n|w| = n. The unique string of length zero is the empty string, written ε\varepsilon.

The set of all strings of length exactly nn over Σ\Sigma is Σn\Sigma^n. The set of all finite strings is the Kleene closure

Σ=n0Σn.\Sigma^* = \bigcup_{n \geq 0} \Sigma^n.

Concatenation is the operation :Σ×ΣΣ\cdot : \Sigma^* \times \Sigma^* \to \Sigma^* that appends one string to another. Under concatenation with identity ε\varepsilon, Σ\Sigma^* forms a monoid — the free monoid generated by Σ\Sigma.

Intuition

A string is what you get when you take finitely many symbols from an alphabet and put them in a specific order. The empty string ε\varepsilon is a frequent source of bugs in proofs and implementations because it is simultaneously the unique length-zero object and the identity of concatenation. Treat it as a first-class citizen from the start; do not default to assuming strings have at least one character.

The finiteness is load-bearing. Infinite sequences over Σ\Sigma exist — they are called ω\omega-words and live in Σω\Sigma^\omega — but they require a different theory altogether (Büchi automata and ω\omega-regular languages). For the automata theory and computability covered on sibling pages, every input is a finite string.

Notes

A language over Σ\Sigma is any subset LΣL \subseteq \Sigma^*. Formal language theory studies which subsets can be specified by which kinds of machines.

The name Kleene closure honors Stephen Kleene, who introduced the operation in his 1951 RAND memorandum on regular expressions.

Connections

depends_on

References

  • sipser-2013-introduction