Substring matching in Erlang

January 20, 2010

I can’t remember if I’ve ever seen it in the official documentation and I can’t really be bothered to search for it, so I’ll just put it here regardless.

It is commonly believed that string processing is one of the things Erlang doesn’t do well at all. Keep in mind, though, that strings in Erlang are actually linked lists, so you have to write your algorithms accordingly: the only O(1) operations you have are “take first element” and “add first element”, while concatenation and array-like access are O(n), so if your code does a lot of random access, it’ll be really sluggish. In practice that means relying mostly on recursion over the head and tail of the list. So I was sort of meditating on this and then I got a small idea.
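
To make the head-and-tail style concrete, here is a minimal sketch. The module name str_walk and the function count/2 are made up for illustration; the point is only that each step takes the first element off the list, which is the cheap operation.

```erlang
-module(str_walk).
-export([count/2]).

%% count(Char, String) -> number of times Char occurs in String.
%% Hypothetical helper: walks the list once, head first.
count(Char, String) -> count(Char, String, 0).

%% End of the list: return what has been counted so far.
count(_Char, [], Acc) -> Acc;
%% The head equals Char: count it and continue with the tail.
count(Char, [Char | Tail], Acc) -> count(Char, Tail, Acc + 1);
%% Any other head: skip it and continue with the tail.
count(Char, [_Other | Tail], Acc) -> count(Char, Tail, Acc).
```

Something like str_walk:count($a, "banana") touches every character exactly once and never indexes into the string.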

When you match a list, it is possible to specify an arbitrary (but fixed) number of elements at the front:

[a, b, c | Rest] = [a, b, c, d, e].

So I thought, perhaps there is a similarly laconic way to match the beginning of a string, et voilà:

"abc" ++ Rest = "abcde".

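As far as I understand it, a string literal followed by “++” in a pattern is just shorthand for the explicit list pattern with the characters spelled out. A small sketch, with made-up module and function names, that puts the two spellings side by side:

```erlang
-module(prefix_sugar).
-export([sugar/1, cons/1]).

%% Prefix written with the "++" shorthand.
sugar("abc" ++ Rest) -> {ok, Rest};
sugar(_Other) -> nomatch.

%% The same prefix written as an explicit cons pattern.
cons([$a, $b, $c | Rest]) -> {ok, Rest};
cons(_Other) -> nomatch.
```

Both functions should bind Rest to whatever follows the three leading characters, and both fall through to nomatch when the input starts with something else.
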
I imagine it could be used in Yaws servlets that work with deep URLs, something like Rails’ /controller/action/id.
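
A rough sketch of what such dispatching might look like. Note that route/1, the module name and the paths are all hypothetical; nothing here is part of the Yaws API, it only assumes that the request path arrives as a plain string.

```erlang
-module(route_sketch).
-export([route/1]).

%% route(Path) -> {Controller, Action, Id} | not_found
%% Hypothetical dispatcher: each clause matches a literal path prefix
%% and binds the remainder of the path to Id.
route("/users/" ++ Id) -> {users, show, Id};
route("/posts/" ++ Id) -> {posts, show, Id};
route(_Path) -> not_found.
```

The clauses are tried top to bottom, so a catch-all clause at the end handles every path that has none of the known prefixes.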

There are limitations, though. For instance, for some reason I can’t quite figure out at this hour, you cannot use the “++” syntax to match a list of other terms, such as atoms:

[a, b, c] ++ Rest = [a, b, c, d, e].
* 1: illegal pattern

although a string is just a list of character codes, so there shouldn’t be any difference in internal representation.

Also, the head string must be a literal; you cannot pass it in through a variable bound at run time, because the head must be known at compile time for the compiler to build the pattern. (Otherwise it could be rather handy for building finite automata or something trie-like, I imagine.) But at least it’s there and it could be useful in some cases.
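
When the prefix is only known at run time, one possible workaround is to walk both lists by hand. The following is a minimal sketch under that assumption; strip_prefix/2 and the module name are invented for this example and are not a standard library function.

```erlang
-module(prefix_rt).
-export([strip_prefix/2]).

%% strip_prefix(Prefix, String) -> {ok, Rest} | nomatch
%% Hypothetical helper for a prefix held in a variable.
%% Prefix exhausted: whatever is left of String is the remainder.
strip_prefix([], Rest) -> {ok, Rest};
%% Both heads are the same term: drop them and keep going.
strip_prefix([C | PrefixTail], [C | StringTail]) ->
    strip_prefix(PrefixTail, StringTail);
%% Heads differ, or String ran out first.
strip_prefix(_Prefix, _String) -> nomatch.
```

Because it compares elements rather than characters, the same function should also work on lists of atoms, at the cost of losing the compact pattern syntax.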