Saturday, February 4, 2012

Perl bug: splitting a UTF-8 encoded Chinese string

I found a bug in Perl when I used the regular expression /\s+/ to split the Chinese string "我想去你家,可以吗?我还想去月球,你想去吗?", which was encoded in UTF-8.
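The post does not say exactly how the split went wrong, so the following is only a sketch of one way to avoid this kind of problem. It assumes the trouble comes from matching raw UTF-8 bytes instead of decoded characters. For example, 你 (U+4F60) is encoded as the bytes E4 BD A0, and byte A0 read as a character is NO-BREAK SPACE, which \s matches under Unicode rules. The variable names ($text, $bytes, $chars, @parts) are made up for illustration, and the file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                           # this source file is UTF-8; literals are character strings
use Encode qw(encode decode);

# The string from the post, as characters.
my $text  = "我想去你家,可以吗?我还想去月球,你想去吗?";

# Simulate input that arrives as raw UTF-8 bytes (e.g. read from a file or socket).
my $bytes = encode('UTF-8', $text);

# Decode the bytes into characters BEFORE applying any regular expression.
my $chars = decode('UTF-8', $bytes);

# The string contains no whitespace, so split should return it as a single field.
my @parts = split /\s+/, $chars;

binmode(STDOUT, ':encoding(UTF-8)');   # encode characters back to UTF-8 on output
print scalar(@parts), " field(s)\n";   # prints "1 field(s)"
print "$_\n" for @parts;
```

Decoding at the input boundary and encoding again at the output boundary keeps the regex engine working on whole characters, so a byte inside a multi-byte sequence is never matched on its own.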
