Lua-eSpeak 1.36r1

DRAFT

Introduction

eSpeak is a compact open source software speech synthesizer for English and other languages. It produces good quality English speech using a different synthesis method from other open source TTS engines.

Lua-eSpeak is a "binding": a library that exports functions from eSpeak to the Lua Programming Language, allowing you to use eSpeak from Lua. The API was not exported literally; it was adapted so that it feels familiar to Lua users.

Lua-eSpeak is a programming library, not a synthesis program. If you are looking for a synthesis program, or are not familiar with the Lua Programming Language, you are in the wrong place.

A NOTE ON VERSION NUMBERS: Lua-eSpeak version numbers are in the format "X.YrZ", where X.Y indicates the eSpeak version and Z the version of the binding. So, the version 1.36r1 is the first version of the binding for eSpeak 1.36 and 1.36r2 has some improvements/bug fixes/etc. but uses the same eSpeak version.
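As an illustration of the "X.YrZ" scheme, the following sketch splits a bare version string into the eSpeak version and the binding revision. The function name parse_version is hypothetical and not part of Lua-eSpeak; it also assumes the string contains only the version number, which may not be true of espeak.VERSION (that constant also carries the library name).

```lua
-- Split a version string of the form "X.YrZ" into its two parts.
-- Returns nil if the string does not follow that format.
local function parse_version(version)
    local major, minor, rev = version:match("^(%d+)%.(%d+)r(%d+)$")
    if not major then
        return nil
    end
    return {
        espeak = major .. "." .. minor,  -- eSpeak version, as a string
        binding = tonumber(rev),         -- binding revision, as a number
    }
end

local v = parse_version("1.36r2")
print(v.espeak, v.binding)
```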

Licensing information

eSpeak and Lua-eSpeak are copyrighted free software, distributed under the GNU General Public License version 3 or, at your option, any later version. They can be used for both academic and commercial purposes. However, if you redistribute the library, you must also make all source code that links against this library and the Lua interpreter available under the GPL. This requirement does not include your Lua code. More precisely:

(c) 2007-08 Alexandre Erwin Ittner <aittner X gmail.com>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA

Download and installation

Lua-eSpeak source code comes in a tar.gz package that can be downloaded from the Lua-eSpeak project page on LuaForge. There are also some pre-compiled binary packages available for Linux and some other Unix systems.

After downloading, you need to compile the library. This process requires GNU Make, Lua, eSpeak and its dependencies (portaudio, etc.) installed on your system. To compile on Unix, just unpack the distribution package, enter the directory and type make in your favourite shell.

To install the library with Lua 5.1 and above just type make install, as root.

WARNING: This procedure will work only in systems with pkg-config (which includes the major Linux distributions). In other systems, you will need to edit the Makefile and manually install the library, by copying the file 'espeak.so' to your Lua binary modules directory.
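If you need to install manually, one way to find candidate locations for the binary modules directory is to ask Lua itself. The sketch below only prints the standard package.cpath search patterns; which of the listed directories is the right one for your system is something you still have to decide.

```lua
-- Print each pattern Lua tries when loading a binary (C) module.
-- The directories in these patterns are candidates for 'espeak.so'.
for pattern in package.cpath:gmatch("[^;]+") do
    print(pattern)
end
```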

Compiling Lua-eSpeak on Windows systems is possible, but I have not tried it yet. You will need to edit the Makefile yourself.

Library loading and initialization

Lua-eSpeak uses the Lua 5.1 package system that allows you to simply do a

require "espeak"

call to load up the library. After that, you must call the espeak.Initialize function before using the library.

Constants

General information

espeak.VERSION

Holds information about library name and version. 

Events

espeak.EVENT_LIST_TERMINATED

 Retrieval mode: terminates the event list.

espeak.EVENT_WORD

 Start of word.

espeak.EVENT_SENTENCE

 Start of sentence.

espeak.EVENT_PHONEME

 Phoneme, if enabled in espeak.Initialize()

espeak.EVENT_MARK

 Mark.

espeak.EVENT_PLAY

 Audio element.

espeak.EVENT_END

 End of sentence or clause.

espeak.EVENT_MSG_TERMINATED

 End of message.

Positions

espeak.POS_CHARACTER

espeak.POS_WORD

espeak.POS_SENTENCE

Audio output

espeak.AUDIO_OUTPUT_PLAYBACK

 PLAYBACK mode: plays the audio data, supplies events to the
 calling program.

espeak.AUDIO_OUTPUT_RETRIEVAL

 RETRIEVAL mode: supplies audio data and events to the calling program.

espeak.AUDIO_OUTPUT_SYNCHRONOUS

 SYNCHRONOUS mode: as RETRIEVAL but doesn't return until synthesis is    
 completed.

espeak.AUDIO_OUTPUT_SYNCH_PLAYBACK

 Synchronous playback mode: plays the audio data, supplies events to the
 calling program.

Errors and status

espeak.EE_OK

espeak.EE_INTERNAL_ERROR

espeak.EE_BUFFER_FULL

espeak.EE_NOT_FOUND

Synthesis

espeak.CHARS_AUTO

espeak.CHARS_UTF8

espeak.CHARS_8BIT

espeak.CHARS_WCHAR

espeak.SSML

espeak.PHONEMES

espeak.ENDPAUSE

espeak.KEEP_NAMEDATA

Parameters

espeak.RATE

espeak.VOLUME

espeak.PITCH

espeak.RANGE

espeak.PUNCTUATION

espeak.CAPITALS

espeak.WORDGAP

Punctuation

espeak.PUNCT_NONE

espeak.PUNCT_ALL

espeak.PUNCT_SOME

Functions

Initialization

espeak.Initialize(audio_output, buflength, [ path, [ options ]])

Must be called before any synthesis functions are called. This function
raises an error if called more than once.

'audio_output' selects whether the audio data is played by eSpeak or passed
back through the SynthCallback function. It must be one of the
espeak.AUDIO_OUTPUT_* constants.

'buflength' is the length (in milliseconds) of sound buffers passed to the
SynthCallback function.

'path' is the directory which contains the espeak-data directory, or nil
for the default location.

'options' is an integer bit vector. The following values are valid:
     Bit 0: Set to allow espeak.EVENT_PHONEME events.
For compatibility with previous versions of Lua-eSpeak, passing 'nil' or
not passing this parameter is interpreted as zero.

This function returns the sample rate in Hz or 'nil' (internal error).
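A minimal sketch of checking the result of espeak.Initialize() follows. The helper name init_playback and its 'want_phonemes' argument are hypothetical; the module table is passed in as a parameter only so that the helper can be tried with a stub when the library is not installed.

```lua
-- Initialize eSpeak for direct playback and report failures.
-- 'espeak' is the module table returned by require "espeak".
-- 'want_phonemes' sets bit 0 of 'options' (value 1) when true.
local function init_playback(espeak, buflength, want_phonemes)
    local options = want_phonemes and 1 or 0
    -- 'path' is nil, so the default espeak-data location is used.
    local rate = espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK, buflength,
        nil, options)
    if rate == nil then
        return nil, "espeak.Initialize failed (internal error)"
    end
    return rate  -- sample rate in Hz
end

-- Usage, with the real library installed:
--   local espeak = require "espeak"
--   local rate, err = init_playback(espeak, 500, false)
--   if not rate then print(err) end
```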

espeak.Info()

Gives the version of the eSpeak library, as a string. The version of
the Lua binding is given in espeak.VERSION, instead.

espeak.SetSynthCallback(callback_function)

Must be called before any synthesis functions are called. This specifies
a function in the calling program which is called when a buffer of speech
sound data has been produced. 

The callback function is of the form:
 
     function callback(wave, events)
         ...
         ...
         return false     -- or true.
     end


Where 'wave' is a string with the speech sound data which has been
produced. 'nil' indicates that the synthesis has been completed. An empty
string does NOT indicate end of synthesis. 'events' is a table with items
which indicate word and sentence events, and also the occurrence of <mark>
and <audio> elements within the text. Valid elements are:

     type: The event type, that must be espeak.EVENT_LIST_TERMINATED,
           EVENT_WORD, EVENT_SENTENCE, EVENT_PHONEME (if enabled in
           espeak.Initialize()), EVENT_MARK, EVENT_PLAY, EVENT_END
           or EVENT_MSG_TERMINATED.
     
     unique_identifier: The integer id passed from Synth function.

     text_position: The number of characters from the start of the text.
 
     length: For espeak.EVENT_WORD, the word length, in characters.
     
     audio_position: The time in ms within the generated output data.

     id: a number for WORD, SENTENCE or PHONEME events or a UTF8 string
         for MARK and PLAY events.

Callback functions must return 'false' to continue synthesis or 'true' to
abort.
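The callback contract above can be sketched as a small collector that stores audio chunks and counts word events. The name make_collector is hypothetical, and the code assumes that 'events' is an array of event tables, each with the fields listed above; the text does not state the exact layout, so check it against your version of the binding.

```lua
-- Build a synthesis callback that stores audio chunks and counts words.
-- 'EVENT_WORD' is the value of espeak.EVENT_WORD.
local function make_collector(EVENT_WORD)
    local state = { chunks = {}, words = 0, finished = false }

    local function callback(wave, events)
        if wave == nil then
            state.finished = true      -- nil marks the end of synthesis
        elseif #wave > 0 then
            state.chunks[#state.chunks + 1] = wave
        end
        -- Assumption: 'events' is an array of event tables.
        for _, ev in ipairs(events or {}) do
            if ev.type == EVENT_WORD then
                state.words = state.words + 1
            end
        end
        return false                   -- false = continue synthesis
    end

    return callback, state
end

-- Usage, with the real library installed:
--   local callback, state = make_collector(espeak.EVENT_WORD)
--   espeak.SetSynthCallback(callback)
--   ... after synthesis, table.concat(state.chunks) holds the audio data.
```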

espeak.SetUriCallback(callback_function)

This function must be called before synthesis functions are used, in
order to deal with <audio> tags. It specifies a callback function which
is called when an <audio> element is encountered and allows the calling
program to indicate whether the sound file which is specified in the
<audio> element is available and is to be played.

The callback function is of the form:

     function callback(type, uri, base)
         ...
         ...
         return false     -- or true.
     end

Where:

'type' is type of callback event. Currently only 1 = <audio> element.
'uri' is the "src" attribute from the <audio> element, a string.
'base' is the "xml:base" attribute (if any) from the <speak> element

The callback function must return 'true' to skip the sound and speak the
text alternative instead, or 'false' to place a PLAY event in the event
list at the point where the <audio> element occurs. The calling program
can then play the sound at that point.
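As a sketch of these return values, the callback below accepts only sounds from a fixed table. The table available_sounds and the file name in it are invented for illustration; the first parameter is named event_type here to avoid shadowing Lua's built-in type function.

```lua
-- Hypothetical set of sound files the calling program is able to play.
local available_sounds = {
    ["sounds/chime.wav"] = true,
}

-- URI callback: return false to get a PLAY event for sounds we can play,
-- true to make eSpeak speak the text alternative instead.
local function uri_callback(event_type, uri, base)
    if available_sounds[uri] then
        return false
    end
    return true
end

-- Usage, with the real library installed:
--   espeak.SetUriCallback(uri_callback)
```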

Synthesis

espeak.Synth(text, position, position_type, [ end_position, [ flags ]])

Synthesize speech for the specified text.  The speech sound data is passed
to the calling program in buffers by means of the callback function
specified by espeak.SetSynthCallback(). The command is asynchronous: it
is internally buffered and returns as soon as possible. If
espeak.Initialize() was previously called with espeak.AUDIO_OUTPUT_PLAYBACK
as argument, the sound data are played by eSpeak.

'text' is a string with the text to be spoken.  It may be either 8-bit
characters,  wide characters, or UTF8 encoding. Which of these is
determined by the 'flags' parameter.

'position' is the position in the text where speaking starts. Zero or nil
indicates speak from the start of the text.

'position_type' determines whether "position" is a number of characters,
words, or sentences. Valid values are espeak.POS_CHARACTER,
espeak.POS_WORD or espeak.POS_SENTENCE.

'end_position', if set, gives the character position at which speaking
will stop.  A value of zero or nil indicates no end position.

'flags': These may be added together:
    Type of character codes, one of: espeak.CHARS_UTF8, espeak.CHARS_8BIT,
         espeak.CHARS_AUTO (default) or espeak.CHARS_WCHAR.

    espeak.SSML   Elements within < > are treated as SSML elements, or if
         not recognised are ignored.

    espeak.PHONEMES  Text within [[ ]] is treated as phoneme codes (in
         espeak's Kirshenbaum encoding).

    espeak.ENDPAUSE  If set then a sentence pause is added at the end of
         the text.  If not set then this pause is suppressed.


This function returns two values: the status of the operation (espeak.EE_OK,
espeak.EE_BUFFER_FULL or espeak.EE_INTERNAL_ERROR) and a unique integer
that will also be passed to the callback function (if any).
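The following sketch shows one way to combine flags by addition and to check both return values. The helper name speak_ssml is hypothetical, and the module table is passed in as a parameter so the helper can be exercised with a stub.

```lua
-- Speak UTF8 text containing SSML markup, with a pause at the end.
-- Returns the unique identifier on success, or nil plus the status code.
local function speak_ssml(espeak, text)
    -- Flags are plain integers, so they are combined by addition.
    local flags = espeak.CHARS_UTF8 + espeak.SSML + espeak.ENDPAUSE
    local status, id = espeak.Synth(text, 0, espeak.POS_CHARACTER, 0, flags)
    if status ~= espeak.EE_OK then
        return nil, status
    end
    return id
end

-- Usage, with the real library installed and initialized:
--   local id, status = speak_ssml(espeak, "<speak>Hello</speak>")
```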

espeak.Synth_Mark(text, index_mark, [ end_position, [ flags ]])

Synthesize speech for the specified text. Similar to espeak.Synth() but
the start position is specified by the name of a <mark> element in the
text.

'index_mark' is the "name" attribute of a <mark> element within the text
which specifies the point at which synthesis starts. It must be a UTF8
string.

For the other parameters, see espeak.Synth().

This function returns two values: the status of the operation (espeak.EE_OK,
espeak.EE_BUFFER_FULL or espeak.EE_INTERNAL_ERROR) and a unique integer
that will also be passed to the callback function (if any).

espeak.Key(key_name)

Speak the name of a keyboard key. If key_name is a single character, it
speaks the name of the character. Otherwise, it speaks key_name as a text
string.

 Return: espeak.EE_OK: operation achieved
         espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
             try to call the function again after a while.
         espeak.EE_INTERNAL_ERROR.
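Since espeak.EE_BUFFER_FULL means the call may succeed later, a retry loop is one possible way to handle it. The helper retry_while_full below is hypothetical and not part of the binding; because Lua has no portable sleep function, the caller may supply its own 'wait' function.

```lua
-- Call 'command' until it stops returning espeak.EE_BUFFER_FULL, giving up
-- after 'max_tries' attempts (which must be at least 1). 'wait' is an
-- optional function called between attempts.
local function retry_while_full(espeak, command, max_tries, wait)
    local status
    for attempt = 1, max_tries do
        status = command()
        if status ~= espeak.EE_BUFFER_FULL then
            return status
        end
        if wait and attempt < max_tries then
            wait()
        end
    end
    return status  -- still espeak.EE_BUFFER_FULL after the last attempt
end

-- Usage, with the real library installed and initialized:
--   local status = retry_while_full(espeak, function()
--       return espeak.Key("a")
--   end, 5)
```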

espeak.Char(character_code)

Speak the name of the character, given as a 16 bit integer.

 Return: espeak.EE_OK: operation achieved
         espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
             try to call the function again after a while.
         espeak.EE_INTERNAL_ERROR.

Speech parameters

espeak.SetParameter(parameter, value, relative)

Sets the value of the specified parameter. 'relative' is a boolean that
marks the value as relative to the current value.

The following parameters are valid:

     espeak.RATE:    speaking speed in words per minute.
     espeak.VOLUME:  volume in range 0-100, 0 = silence
     espeak.PITCH:   base pitch, range 0-100.  50=normal
     espeak.RANGE:   pitch range, range 0-100. 0=monotone, 50=normal

     espeak.PUNCTUATION:  which punctuation characters to announce:
        one of espeak.PUNCT_NONE, espeak.PUNCT_ALL or espeak.PUNCT_SOME;
        see espeak.SetPunctuationList() to specify which characters are
        announced.

     espeak.CAPITALS: announce capital letters by:
        0=none,
        1=sound icon,
        2=spelling,
        3 or higher, by raising pitch.  This value gives the amount
             in Hz by which the pitch of a word is raised to indicate it
             has a capital letter.

     espeak.WORDGAP: pause between words, units of 10ms (at the default speed)

 Return: espeak.EE_OK: operation achieved
         espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
             try to call the function again after a while.
         espeak.EE_INTERNAL_ERROR.

espeak.GetParameter(parameter, current)

Returns synthesis parameters. 'current' is a boolean that tells the
function to return the current value, instead of the default one.
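Used together, espeak.GetParameter() and espeak.SetParameter() allow a setting to be changed temporarily and then restored. The sketch below assumes that espeak.GetParameter() returns the parameter value as a number, which the text above does not state explicitly; the helper name with_rate is hypothetical.

```lua
-- Run 'action' with a temporary speaking rate, then restore the old one.
local function with_rate(espeak, rate, action)
    -- true = read the current value rather than the default.
    local previous = espeak.GetParameter(espeak.RATE, true)
    espeak.SetParameter(espeak.RATE, rate, false)  -- absolute value
    local ok, err = pcall(action)
    espeak.SetParameter(espeak.RATE, previous, false)
    if not ok then
        error(err, 0)  -- re-raise after restoring the rate
    end
end

-- Usage, with the real library installed and initialized. Synthesis is
-- asynchronous, so 'action' should wait for the speech to finish:
--   with_rate(espeak, 120, function()
--       espeak.Synth("Slowly now.", 0, espeak.POS_CHARACTER)
--       espeak.Synchronize()
--   end)
```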

espeak.SetPunctuationList(punctlist)

Specifies a list of punctuation characters whose names are to be spoken
when the value of the Punctuation parameter is set to "some". 'punctlist'
is an array of character codes (as integers).
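Building the array of character codes by hand is tedious, so a small helper can derive it from a string. This sketch uses a hypothetical helper named punct_codes and only handles single-byte (ASCII) characters, since string.byte works on bytes rather than on UTF8 code points.

```lua
-- Convert a string of single-byte (ASCII) punctuation characters into an
-- array of integer character codes.
local function punct_codes(chars)
    local codes = {}
    for i = 1, #chars do
        codes[i] = chars:byte(i)
    end
    return codes
end

-- Usage, with the real library installed and initialized:
--   espeak.SetPunctuationList(punct_codes(".,;?!"))
--   espeak.SetParameter(espeak.PUNCTUATION, espeak.PUNCT_SOME, false)
```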

espeak.SetPhonemeTrace(value, filehandle)

Controls the output of phoneme symbols for the text.

 value=0  No phoneme output (default)
 value=1  Output the translated phoneme symbols for the text
 value=2  as (1), but also output a trace of how the translation was done
          (matching rules and list entries)

'filehandle' is the output stream for the phoneme symbols (and trace). If
nil then it uses io.stdout.

This function returns no values.

espeak.CompileDictionary(path, filehandle, [ flags ])

Compile pronunciation dictionary for a language which corresponds to the
currently selected voice. The required voice should be selected before
calling this function.

'path' is the directory which contains the language's "_rules" and
 "_list" files. 'path' should end with a path separator character ('/').

'filehandle' is the output stream for error reports and statistics
information. If nil, then io.stderr will be used.

'flags' is an integer bit vector that accepts the following values:
    Bit 0: include source line information for debug purposes (as is
           displayed with the -X command line option in 'speak' command).
For compatibility with previous versions of Lua-eSpeak, passing 'nil' or
not passing this parameter is interpreted as zero.
     

This function returns no values.

Voice Selection

espeak.ListVoices(voice_spec)

Reads the voice files from espeak-data/voices and creates an array of
voice tables. If 'voice_spec' is given, then only the voices which are
compatible with the 'voice_spec' are listed, and they are listed in
preference order.

espeak.SetVoiceByName(name)

Searches for a voice with a matching "name" field.  Language is not
considered. "name" is a UTF8 string.

 Return: espeak.EE_OK: operation achieved
         espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
             try to call the function again after a while.
         espeak.EE_INTERNAL_ERROR.

espeak.SetVoiceByProperties(voice_spec)

A voice table is used to pass criteria to select a voice. Any of the
following fields may be set:

 name        nil or a voice name

 languages   nil or a single language string (with optional dialect), eg.
             "en-uk", or "en"

 gender      0 or nil = not specified, 1 = male, 2 = female

 age         0 or nil = not specified, or an age in years

 variant     After a list of candidates is produced, scored and sorted,
             "variant" is used to index that list and choose a voice.
             variant=0 takes the top voice (i.e. best match), variant=1
             takes the next voice, etc

 Return: espeak.EE_OK: operation achieved
         espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
             try to call the function again after a while.
         espeak.EE_INTERNAL_ERROR.
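One possible use of the status code is to try several languages in order of preference and keep the first one for which a voice is found. The helper select_first_voice is hypothetical; it sets only the 'languages' field and leaves the other criteria unspecified.

```lua
-- Try each language in turn and select the first one eSpeak accepts.
-- Returns the chosen language string, or nil if none could be selected.
local function select_first_voice(espeak, languages)
    for _, lang in ipairs(languages) do
        local status = espeak.SetVoiceByProperties({ languages = lang })
        if status == espeak.EE_OK then
            return lang
        end
    end
    return nil
end

-- Usage, with the real library installed and initialized:
--   local lang = select_first_voice(espeak, { "pt-br", "pt", "en" })
```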

espeak.GetCurrentVoice()

Returns a voice table for the currently selected voice. This is not
affected by temporary voice changes caused by SSML elements such as
<voice> and <s>.

Flow control

espeak.Cancel()

Immediately stops synthesis and audio output of the current text. When this
function returns, the audio output is fully stopped and the synthesizer is
ready to synthesize a new message. This function returns espeak.EE_OK if
the operation was achieved or espeak.EE_INTERNAL_ERROR.

espeak.IsPlaying()

Returns 'true' if audio is playing or 'false' otherwise.

espeak.Synchronize()

This function returns when all data have been spoken. Returns
espeak.EE_OK if the operation was achieved or espeak.EE_INTERNAL_ERROR.

espeak.Terminate()

Last function to be called. Returns espeak.EE_OK if the operation was
achieved, espeak.EE_INTERNAL_ERROR on eSpeak error. This function raises
an error if called before initialization or more than once.

Examples

A simple speech

This program utters a phrase and quits.


require "espeak"

local text = "One Ring to rule them all."

espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK, 500)

if espeak.SetVoiceByName("english") ~= espeak.EE_OK then
    print("Failed to set default voice.")
    return
end

espeak.Synth(text, 0, espeak.POS_WORD, 0, nil)

espeak.Synchronize()
espeak.Terminate()

Changing the voice

This program speaks with some voices available in eSpeak.


require "espeak"

espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK, 0)

local langs = {
    { "pt-br", "Português brasileiro" },
    { "pt", "Português Europeu" },
    { "en-gb", "British English" },
    { "en-us", "American English" },
    { "de", "Deutsch"  },
    { "eo", "Esperanto" },
    { "it", "Italiano" },
    { "es", "Español" },
    { "fi", "Suomi" }
}

for _, l in ipairs(langs) do
    if espeak.SetVoiceByProperties({ languages = l[1] }) == espeak.EE_OK then
        espeak.Synth(l[2])
    end
    espeak.Synchronize()
end

espeak.Terminate()

Telling the time

This program speaks the current time in the "spoken" (informal) Brazilian Portuguese.

-- -*- coding: utf-8 -*-
--
-- Functions to "say" the time of the day.
-- (c) 2007-08 Alexandre Erwin Ittner <aittner@gmail.com>
--
-- This file is part of Lua-eSpeak and is distributed under the GNU GPL v2
-- or, at your option, any later version.
--
--

module("saytime", package.seeall)

-- Spoken Portuguese

local pt_hs = { "uma", "duas", "três", "quatro", "cinco", "seis", "sete",
    "oito", "nove", "dez", "onze" }

function pt_spoken(h, m)
    local mt
    if m > 30 then
        h = h + 1
        if h == 24 then
            return (60 - m) .. " para a meia-noite"
        elseif h == 1 then
            return (60 - m) .. " para a uma da manhã"
        elseif h > 1 and h < 12 then
            return (60 - m) ..  " para as " .. pt_hs[h] .. " da manhã"
        elseif h == 12 then
            return (60 - m) .. " para o meio-dia"
        elseif h == 13 then
            return (60 - m) .. " para a uma da tarde"
        elseif h > 13 and h < 19 then
            return (60 - m) ..  " para as " .. pt_hs[h - 12] .. " da tarde"
        else
            return (60 - m) ..  " para as " .. pt_hs[h - 12] .. " da noite"
        end
    else
        if m == 30 then
            if h == 0 then
                return "meia-noite e meia"
            elseif h == 12 then
                return "meio-dia e meio"
            elseif h > 12 then
                mt = pt_hs[h - 12] .. " e meia"
            else
                mt = pt_hs[h] .. " e meia"
            end
        else
            if h == 0 then
                mt = "meia-noite"
            elseif h == 12 then
                mt = "meio-dia"
            elseif h > 12 then
                mt = pt_hs[h - 12]
            else
                mt = pt_hs[h]
            end
            if m ~= 0 then
                mt = mt .. " e " .. m
            end
            if h == 0 or h == 12 then
                return mt
            end
        end
        if h < 12 then
            return mt .. " da manhã"
        elseif h > 12 and h < 19 then
            return mt .. " da tarde"
        else
            return mt .. " da noite"
        end
    end
end


-- Formal portuguese.

function pt_formal(h, m)
    local hs = ""
    if h == 0 then
        hs = "zero hora"
    elseif h == 1 then
        hs = "uma hora"
    else
        hs = h .. " horas"
    end
    if m == 1 then
        hs = hs .. " e um minuto"
    elseif m > 1 then
        hs = hs .. " e " .. m .. " minutos"
    end
    return hs
end

#!/usr/bin/env lua
-- -*- coding: utf-8 -*-

-- Says the time of the day as in the spoken Portuguese.
-- (c) 2007-08 Alexandre Erwin Ittner <aittner@gmail.com>
-- Distributed under the GPL v2 or later.

require "espeak"
require "saytime"

espeak.Initialize(espeak.AUDIO_OUTPUT_SYNCH_PLAYBACK, 500)

if espeak.SetVoiceByName("brazil") ~= espeak.EE_OK then
    print("Impossível localizar a voz correta.")
    return
end

local dt = os.date("*t")
espeak.Synth(saytime.pt_spoken(dt.hour, dt.min))

espeak.Terminate()

Other examples

There are some useful examples in the demos directory within the distribution package.

Contact information

Author: Alexandre Erwin Ittner
E-mail: aittner#gmail.com (e-mail obfuscated to avoid spam-bots. Please replace the "#" with an "@").
GnuPG/PGP Key: 0x0041A1FB (key fingerprint: 9B49 FCE2 E6B9 D1AD 6101 29AD 4F6D F114 0041 A1FB).
Homepage: http://users.netuno.com.br/aittner/.
Location: Jaraguá do Sul, Santa Catarina, Brazil.