Similar to what was done for SAPI / MSSP, make the current implementation for
executable-based TTS engines a base class and create a derived class for each
supported TTS engine. This removes the need for the base implementation to
know about the individual TTS engines.

Add support for speaking directly, i.e. without going through a temporary wave
file. This is currently only used by espeak.

Change-Id: I59bbbd6ee4c2c009b2a8d8e0ab4a9b39ea723d6e