SailAlign and the error “ReadString: String too long”

If you have used SailAlign (or HTK) to do forced alignment on a large corpus, you may have already encountered the error: ReadString: String too long. The error is actually thrown by HTK, and a quick search on the Internet turns up the web page below.

http://www.ling.ohio-state.edu/~bromberg/htk_problems.html

The solution according to the page is:

Make changes to the pronunciation dictionary:
Replace all multiple spaces with a single space;
Replace all tabs with a single space;
Put a \ before every double quote (");
Put a \ before any dictionary entry beginning with a single quote (').
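
The four changes above can be sketched as a small Python helper. This is only a rough illustration, assuming a plain-text dictionary with one entry per line and assuming the escape character is a backslash; the function name normalize_line is made up for this example and is not part of SailAlign or HTK.

```python
import re


def normalize_line(line: str) -> str:
    """Apply the four dictionary fixes to one line (hypothetical helper)."""
    # Drop the trailing newline, then turn every tab into a single space.
    line = line.rstrip("\n").replace("\t", " ")
    # Collapse runs of two or more spaces into one space.
    line = re.sub(r" {2,}", " ", line)
    # Put a backslash before every double quote that is not already escaped.
    line = re.sub(r'(?<!\\)"', r'\\"', line)
    # Put a backslash before an entry that begins with a single quote.
    if line.startswith("'"):
        line = "\\" + line
    return line


if __name__ == "__main__":
    # Example entry with tabs, double spaces, and a leading single quote.
    print(normalize_line("'em\t\tah  m"))
```

Running each line of the dictionary through such a function and writing the result to a new file keeps the original untouched in case something else goes wrong.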

And this actually solves the problem, which is quite annoying, since the error message “String too long” gives no hint of this solution. Moreover, you also have to make the same changes to the transcript given to SailAlign, or you will see the same error from HDecode.

I spent a lot of time checking the dictionary and reducing the length of the input data to get rid of the error, only to find out that neither was the culprit. Fortunately, I eventually found the problem right in the transcript, and SailAlign now runs without a hitch.