SailAlign and the error “ReadString: String too long”

Posted March 11, 2014 by nguyenquyhy

If you have used SailAlign (or HTK) to do forced alignment on a large corpus, you may have already encountered the error: ReadString: String too long. This error is actually thrown by HTK, and a quick search on the Internet returns the web page below.

http://www.ling.ohio-state.edu/~bromberg/htk_problems.html

The solution according to the page is:

Make the following changes to the pronunciation dictionary:
Replace all multiple spaces with a single space;
Replace all tabs with a single space;
Put a backslash (\) before every double quote (");
Put a backslash (\) before any dictionary entry beginning with a single quote (').
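The four changes above can be sketched as a small Python helper. This is only a minimal sketch, not part of SailAlign or HTK: the function names normalize_dict_line and normalize_dict are hypothetical, and it assumes the escape character is a backslash and that each dictionary entry sits on its own line.

```python
import re


def normalize_dict_line(line: str) -> str:
    """Apply the four dictionary fixes to a single line (hypothetical helper)."""
    # Replace tabs with single spaces, then collapse runs of spaces into one.
    line = line.rstrip("\n").replace("\t", " ")
    line = re.sub(r" {2,}", " ", line).strip()
    # Put a backslash before every double quote that is not already escaped.
    line = re.sub(r'(?<!\\)"', r'\\"', line)
    # Put a backslash before an entry that begins with a single quote.
    if line.startswith("'"):
        line = "\\" + line
    return line


def normalize_dict(lines):
    """Normalize every non-empty line of a dictionary (assumed one entry per line)."""
    return [normalize_dict_line(line) for line in lines if line.strip()]
```

Because quotes that are already escaped are left alone, running the helper twice on the same file should not add extra backslashes. The same function could be applied to each line of the transcript as well.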

And this actually solves the problem, which is quite annoying since the error message “String too long” gives no clue about this solution. Moreover, you will also have to make the same changes to the transcript given to SailAlign to avoid seeing the same problem with HDecode.

I spent a lot of time checking the dictionary and reducing the length of the input data to get rid of the error, only to find out that those suspects were irrelevant. Fortunately, I found the problem right in the transcript, and SailAlign now runs without a hitch.

How to create full-context labels for your HTS system (update: did not really work)

Posted January 9, 2014, updated April 8, 2016, by nguyenquyhy

Update: I later found out that the method described below did not work as expected. Tricking Festival by simply providing it with a custom monophone transcript generates invalid .utt files. Creating the full-context labels from those .utt files then gives you only quinphones without any other linguistic context.

However, you can still use the script in the first part as the front end of the TTS system (label/.utt generation using Festival). To create .utt files for training data, I have noted down a better way here: A better way to create the full-context labels for HTS training data.

Introduction

If you are familiar with the HTS demos, you probably know about their full-context label format. One full-context label looks like this:

ao^th-er+ah=v@1_1/A:1_1_2/B:0-0-1@2-1&2-6#1-4$1-3!1-1;1-3|er/C:1+0+2/D:0_0/E:content+2@1+5&1+2#0+3/F:in_1/G:0_0/H:7=5^1=2|L-L%/I:7=3/J:14+8-2

The above line contains the phone identity and much of its linguistic context, including the 2 previous and 2 following phones, the position of the current phone in the current syllable, the position of the current syllable in the current word, stress, accent and many other things. The detailed description of all these contexts is in lab_format.pdf inside the data folder of any HTS demo.
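To make the phone part of the label concrete, here is a minimal Python sketch that pulls the five phones out of the start of a label. It assumes the p1^p2-p3+p4=p5@ layout seen in the example above and a label without leading time stamps. The names QUINPHONE and parse_quinphone and the field names are hypothetical and chosen only for illustration.

```python
import re

# Phone context at the start of a label: prev2^prev1-current+next1=next2@...
QUINPHONE = re.compile(
    r"^(?P<prev2>[^^]+)\^(?P<prev1>[^-]+)-(?P<current>[^+]+)"
    r"\+(?P<next1>[^=]+)=(?P<next2>[^@]+)@"
)


def parse_quinphone(label: str) -> dict:
    """Return the two previous, the current and the two following phones."""
    match = QUINPHONE.match(label)
    if match is None:
        raise ValueError("label does not start with a quinphone context")
    return match.groupdict()


label = (
    "ao^th-er+ah=v@1_1/A:1_1_2/B:0-0-1@2-1&2-6#1-4$1-3!1-1;1-3|er"
    "/C:1+0+2/D:0_0/E:content+2@1+5&1+2#0+3/F:in_1/G:0_0"
    "/H:7=5^1=2|L-L%/I:7=3/J:14+8-2"
)
phones = parse_quinphone(label)
```

For the example label, the current phone is er, preceded by ao and th and followed by ah and v. Everything after the first @ carries the remaining context described in lab_format.pdf and is left untouched by this sketch.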

However, if you are building your own system, you may have problems obtaining all of that context to create such long labels. In fact, HTS can still work with much shorter full-context labels containing much less information, but you should expect some degradation in the quality of the synthesized speech due to the smaller decision trees. Fortunately, all the text analysis can be done automatically by Festival. I will show all the steps in the sections below.


How to configure HTS for in-training synthesis with state-level alignment labels

Posted December 14, 2013, updated February 11, 2015, by nguyenquyhy

Purpose

Using state-level alignment labels allows us to copy the prosody from one speaker and apply it to another speaker’s acoustic model. This can improve the synthesized results by combining prosody from natural speech with phone features from an HMM-based acoustic model. Moreover, since this technique can create phone-aligned parallel sentences from different acoustic models, we can also use it to generate comparable sentences in which the quality of the vocoders or of the acoustic features in the training data can be compared separately from the duration models.


How to configure HTS demo with STRAIGHT features for 16kHz training data

Posted December 12, 2013 by nguyenquyhy

I have been using HTS for a while for my research on speech synthesis. Recently, I had some problems when I tried to configure the HTS demo with STRAIGHT features to use 16 kHz data instead of 48 kHz. I finally figured out how to do that properly, and it is really not as easy as changing one or two settings as in the other demos without STRAIGHT, so I decided to note all the steps down here.

Passing Async Task functions as a parameter

Posted November 29, 2013 by nguyenquyhy

Some Motivations

If you have programmed Windows Phone or Windows Store apps, you may know about Dispatcher in Windows Phone or CoreDispatcher in Windows Store. In this article I will focus on the Windows Phone implementation, but this should be very similar in Windows Store.

In general, those dispatchers are ways to marshal UI interactions from a background context (any context that is not the UI context) to the UI. Trying to directly access (read or write) any UI control, or any view model property that is bound to a UI control, from a background context will trigger an UnauthorizedAccessException. Instead, you have to call the Dispatcher.BeginInvoke(Action action) method (or Deployment.Current.Dispatcher.BeginInvoke(Action action) if you are not in a page) and put all the UI-accessing code in the action parameter.

It worked perfectly fine until I tried to introduce some async logic inside.

You could do it simply this way:

Dispatcher.BeginInvoke(async () =>
{
	await myTask();
});
