eSpeak is a compact open source software speech synthesizer for English and other languages. It produces good quality English speech using a different synthesis method from other open source TTS engines.
Lua-eSpeak is a "binding": a library that exports functions from eSpeak to the Lua programming language, allowing you to use eSpeak from Lua. The API is not a literal export of the C interface; it was adapted to feel familiar to Lua users.
Lua-eSpeak is a programming library, not a speech synthesis program. If you are looking for a ready-to-use synthesizer, or are not familiar with the Lua programming language, you are in the wrong place.
A NOTE ON VERSION NUMBERS: Lua-eSpeak version numbers are in the format "X.YrZ", where X.Y is the eSpeak version and Z is the revision of the binding. For example, version 1.36r1 is the first release of the binding for eSpeak 1.36, while 1.36r2 brings improvements and bug fixes but uses the same eSpeak version.
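Because the version string packs two numbers together, a script that needs, say, a minimum binding revision has to split it. The sketch below does that with a Lua pattern. `parse_version` is a hypothetical helper, not part of Lua-eSpeak, and it assumes the string follows the "X.YrZ" format exactly as described above. It could, for instance, be applied to espeak.VERSION, assuming that field uses the same format (this section does not say so).

```lua
-- Split a Lua-eSpeak version string such as "1.36r2" into the eSpeak
-- version ("1.36") and the binding revision (2, as a number).
-- Hypothetical helper; assumes the "X.YrZ" format described above.
local function parse_version(v)
  local espeak_version, revision = string.match(v, "^(%d+%.%d+)r(%d+)$")
  if not espeak_version then
    return nil, "not in X.YrZ format: " .. tostring(v)
  end
  return espeak_version, tonumber(revision)
end

print(parse_version("1.36r2"))  -- prints 1.36 and 2, tab-separated
```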
eSpeak and Lua-eSpeak are copyrighted free software, distributed under the GNU General Public License version 3 or, at your option, any later version. They can be used for both academic and commercial purposes. BUT, if you redistribute the library, you must make all source code that links against this library and the Lua interpreter available under the GPL too; this requirement does not include your Lua code. More precisely:
(c) 2007-08 Alexandre Erwin Ittner <aittner X gmail.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Lua-eSpeak source code comes in a tar.gz package that can be downloaded from the Lua-eSpeak project page on LuaForge. Pre-compiled binary packages are also available for Linux and some other Unix systems.
After downloading, you need to compile the library. This process requires
GNU Make, Lua, eSpeak and its dependencies (portaudio, etc.) installed on your
system. To compile on Unix, just unpack the distribution package, enter the
directory and type make in your favourite shell.
To install the library for Lua 5.1 or later, just type
make install as root.
WARNING: This procedure works only on systems with pkg-config (which includes the major Linux distributions). On other systems, you will need to edit the Makefile and install the library manually by copying the file 'espeak.so' to your Lua binary modules directory.
Compiling Lua-eSpeak on Windows is possible, but I have not tried it yet. You will need to edit the Makefile yourself.
Lua-eSpeak uses the Lua 5.1 package system that allows you to simply do a
require "espeak"
call to load up the library. After that, you must call the
espeak.Initialize function before using the library.
Holds information about library name and version.
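espeak.EVENT_LIST_TERMINATED: Retrieval mode only; terminates the event list.
espeak.EVENT_WORD: Start of word.
espeak.EVENT_SENTENCE: Start of sentence.
espeak.EVENT_PHONEME: Phoneme, if enabled in espeak.Initialize().
espeak.EVENT_MARK: Mark.
espeak.EVENT_PLAY: Audio element.
espeak.EVENT_END: End of sentence or clause.
espeak.EVENT_MSG_TERMINATED: End of message.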
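When debugging a callback, it helps to print event names instead of raw numbers. The sketch below builds a reverse lookup from the event constants listed above. `event_names` is a hypothetical helper; it takes the module table as a parameter so it can be tried without the library loaded, and it assumes the constants are plain values stored in the `espeak` table.

```lua
-- Map each event-type value back to its constant name, e.g. for logging
-- inside a synthesis callback. Constants missing from the table are skipped.
local EVENT_CONSTANTS = {
  "EVENT_LIST_TERMINATED", "EVENT_WORD", "EVENT_SENTENCE", "EVENT_PHONEME",
  "EVENT_MARK", "EVENT_PLAY", "EVENT_END", "EVENT_MSG_TERMINATED",
}

local function event_names(espeak)
  local names = {}
  for _, name in ipairs(EVENT_CONSTANTS) do
    local value = espeak[name]
    if value ~= nil then
      names[value] = name
    end
  end
  return names
end

-- Usage with the real module:
--   require "espeak"
--   local names = event_names(espeak)
--   print(names[espeak.EVENT_WORD])   -- "EVENT_WORD"
```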
espeak.AUDIO_OUTPUT_PLAYBACK: plays the audio data, supplies events to the calling program.
espeak.AUDIO_OUTPUT_RETRIEVAL: supplies audio data and events to the calling program.
espeak.AUDIO_OUTPUT_SYNCHRONOUS: as RETRIEVAL, but doesn't return until synthesis is completed.
espeak.AUDIO_OUTPUT_SYNCH_PLAYBACK: synchronous playback mode; plays the audio data, supplies events to the calling program.
espeak.Initialize() must be called before any synthesis functions are
called. It raises an error if called more than once.
'audio_output' selects whether the audio data is played by eSpeak or
passed back to the function registered with espeak.SetSynthCallback(); it
is one of the espeak.AUDIO_OUTPUT_* modes.
'buflength' is the length (in milliseconds) of the sound buffers passed to
the synthesis callback.
'path' is the directory which contains the espeak-data directory, or nil
for the default location.
'options' is an integer bit vector. The following values are valid:
Bit 0: Set to allow espeak.EVENT_PHONEME events.
For compatibility with previous versions of Lua-eSpeak, passing 'nil' or
not passing this parameter is interpreted as zero.
This function returns the sample rate in Hz, or 'nil' on an internal error.
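The call below is a minimal sketch, assuming the arguments are taken in the order documented above (audio_output, buflength, path, options) and that, since only bit 0 of 'options' is defined, enabling phoneme events means passing 1. `init_with_phonemes` is a hypothetical wrapper that turns the 'nil' result into a Lua error; the module is passed in as a parameter so the wrapper can also be exercised with a stub.

```lua
-- Initialize eSpeak for playback with espeak.EVENT_PHONEME events enabled
-- and raise a Lua error if the library reports an internal error (nil).
-- Assumes the argument order (audio_output, buflength, path, options).
local PHONEME_EVENTS = 1  -- bit 0 of 'options'

local function init_with_phonemes(espeak, buflength)
  local rate = espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK,
                                 buflength or 500, nil, PHONEME_EVENTS)
  if not rate then
    error("espeak.Initialize failed (internal error)")
  end
  return rate  -- sample rate in Hz
end

-- Usage:
--   require "espeak"
--   local rate = init_with_phonemes(espeak, 500)
```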
Returns the version of the eSpeak library as a string. The version of the Lua binding is available in espeak.VERSION instead.
espeak.SetSynthCallback() must be called before any synthesis functions
are called. It specifies a function in the calling program which is called
when a buffer of speech sound data has been produced.
The callback function is of the form:
function callback(wave, events)
  -- handle 'wave' and 'events' here
  return false -- or true to abort synthesis
end
Where 'wave' is a string with the speech sound data which has been
produced; 'nil' indicates that the synthesis has been completed. An empty
string does NOT indicate the end of synthesis. 'events' is a table of
items which indicate word and sentence events, and also the occurrence of
<mark> and <audio> elements within the text. Each item has the following
fields:
type: The event type, one of espeak.EVENT_LIST_TERMINATED,
espeak.EVENT_WORD, espeak.EVENT_SENTENCE, espeak.EVENT_PHONEME (if
enabled in espeak.Initialize()), espeak.EVENT_MARK, espeak.EVENT_PLAY,
espeak.EVENT_END or espeak.EVENT_MSG_TERMINATED.
unique_identifier: The integer identifier returned by the synthesis call
(e.g. espeak.Synth()).
text_position: The number of characters from the start of the text.
length: For espeak.EVENT_WORD, the word length, in characters.
audio_position: The time in ms within the generated output data.
id: A number for WORD, SENTENCE or PHONEME events, or a UTF-8 string
for MARK and PLAY events.
Callback functions must return 'false' to continue synthesis or 'true' to
abort.
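A typical callback accumulates the audio and inspects the events. The sketch below assumes, as the field list suggests, that 'events' is an array of tables carrying those fields. `make_collector` is a hypothetical helper, and the module is passed in only to read the event-type constants, so the callback can also be exercised offline.

```lua
-- Build a synthesis callback that gathers audio chunks and the character
-- positions of word events. Assumes 'events' is an array of event tables.
local function make_collector(espeak)
  local state = { chunks = {}, words = {}, done = false }

  local function callback(wave, events)
    if wave == nil then
      state.done = true                  -- nil wave: synthesis completed
    elseif #wave > 0 then
      state.chunks[#state.chunks + 1] = wave
    end
    for _, ev in ipairs(events or {}) do
      if ev.type == espeak.EVENT_WORD then
        state.words[#state.words + 1] = { pos = ev.text_position, len = ev.length }
      end
    end
    return false                         -- false = continue synthesis
  end

  return callback, state
end
```

With the real library, one would register `cb` through espeak.SetSynthCallback(cb) (assuming it takes the function as its only argument); after synthesis and espeak.Synchronize(), `table.concat(state.chunks)` would hold the audio data in whatever format the binding delivers.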
This function must be called before synthesis functions are used, in
order to deal with <audio> tags. It specifies a callback function which
is called when an <audio> element is encountered and allows the calling
program to indicate whether the sound file which is specified in the
<audio> element is available and is to be played.
The callback function is of the form:
function callback(type, uri, base)
  -- decide whether the <audio> element should be played
  return false -- or true to speak the text alternative
end
Where:
'type' is the type of callback event. Currently only 1 = <audio> element.
'uri' is the "src" attribute from the <audio> element, a string.
'base' is the "xml:base" attribute (if any) from the <speak> element.
The callback function must return 'true' to skip the sound and speak the
text alternative instead, or 'false' to place a PLAY event
(espeak.EVENT_PLAY) in the event list at the point where the <audio>
element occurs. The calling program can then play the sound at that point.
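One simple policy is to keep the PLAY event only when the referenced file actually exists. The sketch below treats 'uri' as a plain local file path and ignores 'base', which is a simplification; `uri_callback` is a hypothetical name. The registration function's name is not given in this section; its C counterpart is espeak_SetUriCallback, so espeak.SetUriCallback is a plausible but unconfirmed guess.

```lua
-- Play the referenced sound only when 'uri' names a readable local file;
-- otherwise return true so eSpeak speaks the element's text alternative.
-- Simplification: 'uri' is treated as a file path and 'base' is ignored.
local function uri_callback(cb_type, uri, base)
  if cb_type ~= 1 or type(uri) ~= "string" then
    return true                 -- 1 = <audio> element, the only documented type
  end
  local f = io.open(uri, "rb")
  if f then
    f:close()
    return false                -- keep a PLAY event; the program plays it
  end
  return true                   -- file missing: speak the text instead
end
```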
espeak.Synth() synthesizes speech for the specified text. The speech sound
data is passed to the calling program in buffers by means of the callback
function specified by espeak.SetSynthCallback(). The command is
asynchronous: it is internally buffered and returns as soon as possible.
If espeak.Initialize() was previously called with
espeak.AUDIO_OUTPUT_PLAYBACK as argument, the sound data is played by eSpeak.
'text' is a string with the text to be spoken. It may use 8-bit
characters, wide characters, or UTF-8 encoding; which one is determined by
the 'flags' parameter.
'position' is the position in the text where speaking starts. Zero or nil
indicates speak from the start of the text.
'position_type' determines whether 'position' is a number of characters,
words, or sentences. Valid values are espeak.POS_CHARACTER,
espeak.POS_WORD or espeak.POS_SENTENCE.
'end_position', if set, gives a character position at which speaking
will stop. A value of zero or nil indicates no end position.
'flags': These may be added together:
Type of character codes, one of: espeak.CHARS_UTF8, espeak.CHARS_8BIT,
espeak.CHARS_AUTO (default) or espeak.CHARS_WCHAR.
espeak.SSML: Elements within < > are treated as SSML elements, or ignored
if not recognised.
espeak.PHONEMES: Text within [[ ]] is treated as phoneme codes (in
eSpeak's Kirshenbaum encoding).
espeak.ENDPAUSE: If set, a sentence pause is added at the end of the
text. If not set, this pause is suppressed.
This function returns two values: the status of the operation (espeak.EE_OK,
espeak.EE_BUFFER_FULL or espeak.EE_INTERNAL_ERROR) and a unique integer
that will also be passed to the callback function (if any).
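Putting the parameters together, the sketch below speaks a UTF-8 string that may contain SSML, from the start of the text and with the final sentence pause. It follows the argument order used in the examples of this document (text, position, position_type, end_position, flags). `speak_ssml` is a hypothetical helper; flags are combined with `+`, as the text above allows, which is safe because each flag is added only once.

```lua
-- Speak UTF-8 text with SSML markup from the beginning of the text,
-- adding the end-of-text sentence pause. Returns the unique identifier
-- on success, or nil plus the status code on failure.
local function speak_ssml(espeak, text)
  local flags = espeak.CHARS_UTF8 + espeak.SSML + espeak.ENDPAUSE
  local status, id = espeak.Synth(text, 0, espeak.POS_CHARACTER, 0, flags)
  if status ~= espeak.EE_OK then
    return nil, status
  end
  return id
end

-- Usage:
--   require "espeak"
--   local id, err = speak_ssml(espeak, '<speak>Hello <mark name="m1"/> world</speak>')
```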
Synthesizes speech for the specified text. Similar to espeak.Synth(), but the start position is specified by the name of a <mark> element in the text. 'index_mark' is the "name" attribute of a <mark> element within the text which specifies the point at which synthesis starts; it must be a UTF-8 string. For the other parameters, see espeak.Synth(). This function returns two values: the status of the operation (espeak.EE_OK, espeak.EE_BUFFER_FULL or espeak.EE_INTERNAL_ERROR) and a unique integer that will also be passed to the callback function (if any).
Speak the name of a keyboard key. If key_name is a single character, it
speaks the name of the character. Otherwise, it speaks key_name as a text
string.
Return: espeak.EE_OK: operation achieved
espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
try to call the function again after a while.
espeak.EE_INTERNAL_ERROR.
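Several functions in this section report espeak.EE_BUFFER_FULL and suggest trying again after a while. The helper below sketches one way to do that. `with_retry` is hypothetical, and using espeak.Synchronize() as the "wait a while" step is an assumption of this sketch: it blocks until everything already queued has been spoken, which should leave room in the buffer.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

-- Pack all return values, keeping trailing nils countable.
local function pack(...)
  return { n = select("#", ...), ... }
end

-- Call fn(...) until it returns something other than espeak.EE_BUFFER_FULL
-- as its first value, or until max_tries attempts have been made.
local function with_retry(espeak, max_tries, fn, ...)
  max_tries = math.max(1, max_tries or 3)
  local results
  for attempt = 1, max_tries do
    results = pack(fn(...))
    if results[1] ~= espeak.EE_BUFFER_FULL then
      break
    end
    if attempt < max_tries then
      espeak.Synchronize()  -- wait until queued speech has been spoken
    end
  end
  return unpack(results, 1, results.n)
end

-- Usage:
--   local status, id = with_retry(espeak, 3, espeak.Synth, "Hello")
```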
Speak the name of the character, given as a 16-bit integer.
Return: espeak.EE_OK: operation achieved
espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
try to call the function again after a while.
espeak.EE_INTERNAL_ERROR.
Sets the value of the specified parameter. 'relative' is a boolean that
marks the value as relative to the current value.
The following parameters are valid:
espeak.RATE: speaking speed in words per minute.
espeak.VOLUME: volume in range 0-100, 0 = silence.
espeak.PITCH: base pitch, range 0-100, 50 = normal.
espeak.RANGE: pitch range, range 0-100, 0 = monotone, 50 = normal.
espeak.PUNCTUATION: which punctuation characters to announce: none,
all or some (see espeak.SetPunctuationList() to specify which
characters are announced when set to "some").
espeak.CAPITALS: announce capital letters by:
0 = none,
1 = sound icon,
2 = spelling,
3 or higher: by raising pitch. This value gives the amount in Hz
by which the pitch of a word is raised to indicate that it has a
capital letter.
espeak.WORDGAP: pause between words, in units of 10 ms (at the default speed).
Return: espeak.EE_OK: operation achieved
espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
try to call the function again after a while.
espeak.EE_INTERNAL_ERROR.
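The function name and argument order are not spelled out above; the sketch assumes espeak.SetParameter(parameter, value, relative), mirroring the C function espeak_SetParameter. `set_percent` is a hypothetical helper that clamps a value into the 0-100 range used by espeak.VOLUME, espeak.PITCH and espeak.RANGE before setting it as an absolute value.

```lua
-- Clamp a 0-100 parameter (VOLUME, PITCH, RANGE), round it to an integer
-- and set it as an absolute value. Returns the eSpeak status and the value
-- actually used. Assumes the order (parameter, value, relative).
local function set_percent(espeak, param, value)
  value = math.max(0, math.min(100, math.floor(value + 0.5)))
  return espeak.SetParameter(param, value, false), value
end

-- Usage:
--   local status, used = set_percent(espeak, espeak.VOLUME, 140)  -- uses 100
```

Under the same assumption, passing `true` as the third argument would make the value relative to the current one, e.g. espeak.SetParameter(espeak.RATE, 20, true) to speak somewhat faster.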
Returns synthesis parameters. 'current' is a boolean that tells the function to return the current value, instead of the default one.
Specifies a list of punctuation characters whose names are to be spoken when the value of the Punctuation parameter is set to "some". 'punctlist' is an array of character codes (as integers).
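Writing character codes by hand is error-prone, so the sketch below converts a string of ASCII punctuation into the integer array described above. `punct_codes` is a hypothetical helper; non-ASCII characters would need UTF-8 decoding first, so this sketch rejects them. The resulting list could then be passed to this function (presumably exposed as espeak.SetPunctuationList, after the C function espeak_SetPunctuationList), with the espeak.PUNCTUATION parameter set to "some".

```lua
-- Convert a string of ASCII punctuation characters into the array of
-- integer character codes expected for 'punctlist'.
local function punct_codes(chars)
  local codes = {}
  for i = 1, #chars do
    local c = string.byte(chars, i)
    if c > 127 then
      error("non-ASCII byte at position " .. i)
    end
    codes[#codes + 1] = c
  end
  return codes
end

local list = punct_codes(",.?")  -- { 44, 46, 63 }
```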
Controls the output of phoneme symbols for the text.
value=0 No phoneme output (default)
value=1 Output the translated phoneme symbols for the text
value=2 as (1), but also output a trace of how the translation was done
(matching rules and list entries)
'filehandle' is the output stream for the phoneme symbols (and trace). If
nil then it uses io.stdout.
This function returns no values.
Compile pronunciation dictionary for a language which corresponds to the
currently selected voice. The required voice should be selected before
calling this function.
'path' is the directory which contains the language's "_rules" and
"_list" files. 'path' should end with a path separator character ('/').
'filehandle' is the output stream for error reports and statistics
information. If nil, then io.stderr will be used.
'flags' is an integer bit vector that accepts the following values:
Bit 0: include source line information for debug purposes (as is
displayed with the -X command line option of the 'speak' command).
For compatibility with previous versions of Lua-eSpeak, passing 'nil' or
not passing this parameter is interpreted as zero.
This function returns no values.
Reads the voice files from espeak-data/voices and creates an array of voice tables. If 'voice_spec' is given, then only the voices which are compatible with the 'voice_spec' are listed, and they are listed in preference order.
espeak.SetVoiceByName() searches for a voice with a matching "name" field.
Language is not considered. "name" is a UTF-8 string.
Return: espeak.EE_OK: operation achieved
espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
try to call the function again after a while.
espeak.EE_INTERNAL_ERROR.
A voice table is used to pass criteria to select a voice (for example, to
espeak.SetVoiceByProperties()). Any of the following fields may be set:
name nil or a voice name
languages nil or a single language string (with optional dialect), e.g.
"en-uk", or "en"
gender 0 or nil = not specified, 1 = male, 2 = female
age 0 or nil = not specified, or an age in years
variant After a list of candidates is produced, scored and sorted,
"variant" is used to index that list and choose a voice.
variant=0 takes the top voice (i.e. best match), variant=1
takes the next voice, etc.
Return: espeak.EE_OK: operation achieved
espeak.EE_BUFFER_FULL: the command cannot be buffered; you may
try to call the function again after a while.
espeak.EE_INTERNAL_ERROR.
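The sketch below uses the voice table fields above to ask for a language and, optionally, a gender, falling back to the language alone if the first request is not accepted. Since eSpeak scores and sorts candidates, the first call will often succeed anyway; the fallback is defensive. `select_voice` is a hypothetical helper that takes the module as a parameter.

```lua
-- Try to select a voice for 'language' (and 'gender', if given: 1 = male,
-- 2 = female); fall back to any voice for the language. Returns true if
-- espeak.SetVoiceByProperties reported espeak.EE_OK.
local function select_voice(espeak, language, gender)
  if gender then
    local spec = { languages = language, gender = gender }
    if espeak.SetVoiceByProperties(spec) == espeak.EE_OK then
      return true
    end
  end
  -- Fall back to any voice for the language.
  return espeak.SetVoiceByProperties({ languages = language }) == espeak.EE_OK
end

-- Usage:
--   require "espeak"
--   if not select_voice(espeak, "en", 2) then print("no English voice") end
```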
Returns a voice table for the currently selected voice. This is not affected by temporary voice changes caused by SSML elements such as <voice> and <s>.
Immediately stops synthesis and audio output of the current text. When this function returns, the audio output is fully stopped and the synthesizer is ready to synthesize a new message. This function returns espeak.EE_OK if the operation was achieved or espeak.EE_INTERNAL_ERROR.
Returns 'true' if audio is playing or 'false' otherwise.
This function returns when all data have been spoken. Returns espeak.EE_OK if the operation was achieved or espeak.EE_INTERNAL_ERROR.
espeak.Terminate() is the last function to be called. It returns espeak.EE_OK if the operation was achieved, or espeak.EE_INTERNAL_ERROR on an eSpeak error. This function raises an error if called before initialization or more than once.
This program utters a phrase and quits.
require "espeak"
local text = "One Ring to rule them all."
espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK, 500)
if espeak.SetVoiceByName("english") ~= espeak.EE_OK then
  print("Failed to set default voice.")
  return
end
espeak.Synth(text, 0, espeak.POS_WORD, 0, nil)
espeak.Synchronize()
espeak.Terminate()
This program speaks with some voices available in eSpeak.
require "espeak"
espeak.Initialize(espeak.AUDIO_OUTPUT_PLAYBACK, 0)
local langs = {
  { "pt-br", "Português brasileiro" },
  { "pt", "Português Europeu" },
  { "en-gb", "British English" },
  { "en-us", "American English" },
  { "de", "Deutsch" },
  { "eo", "Esperanto" },
  { "it", "Italiano" },
  { "es", "Español" },
  { "fi", "Suomi" }
}
for _, l in ipairs(langs) do
  if espeak.SetVoiceByProperties({ languages = l[1] }) == espeak.EE_OK then
    espeak.Synth(l[2])
  end
  espeak.Synchronize()
end
espeak.Terminate()
This program speaks the current time in the "spoken" (informal) Brazilian Portuguese.
-- -*- coding: utf-8 -*-
--
-- Functions to "say" the time of the day.
-- (c) 2007-08 Alexandre Erwin Ittner <aittner@gmail.com>
--
-- This file is part of Lua-eSpeak and is distributed under the GNU GPL v2
-- or, at your option, any later version.
--
--
module("saytime", package.seeall)
-- Spoken Portuguese
local pt_hs = { "uma", "duas", "três", "quatro", "cinco", "seis", "sete",
                "oito", "nove", "dez", "onze" }
function pt_spoken(h, m)
  local mt
  if m > 30 then
    h = h + 1
    if h == 24 then
      return (60 - m) .. " para a meia-noite"
    elseif h == 1 then
      return (60 - m) .. " para a uma da manhã"
    elseif h > 1 and h < 12 then
      return (60 - m) .. " para as " .. pt_hs[h] .. " da manhã"
    elseif h == 12 then
      return (60 - m) .. " para o meio-dia"
    elseif h == 13 then
      return (60 - m) .. " para a uma da tarde"
    elseif h > 13 and h < 19 then
      return (60 - m) .. " para as " .. pt_hs[h - 12] .. " da tarde"
    else
      return (60 - m) .. " para as " .. pt_hs[h - 12] .. " da noite"
    end
  else
    if m == 30 then
      if h == 0 then
        return "meia-noite e meia"
      elseif h == 12 then
        return "meio-dia e meio"
      elseif h > 12 then
        mt = pt_hs[h - 12] .. " e meia"
      else
        mt = pt_hs[h] .. " e meia"
      end
    else
      if h == 0 then
        mt = "meia-noite"
      elseif h == 12 then
        mt = "meio-dia"
      elseif h > 12 then
        mt = pt_hs[h - 12]
      else
        mt = pt_hs[h]
      end
      if m ~= 0 then
        mt = mt .. " e " .. m
      end
      if h == 0 or h == 12 then
        return mt
      end
    end
    if h < 12 then
      return mt .. " da manhã"
    elseif h > 12 and h < 19 then
      return mt .. " da tarde"
    else
      return mt .. " da noite"
    end
  end
end
-- Formal Portuguese.
function pt_formal(h, m)
  local hs = ""
  if h == 0 then
    hs = "zero hora"
  elseif h == 1 then
    hs = "uma hora"
  else
    hs = h .. " horas"
  end
  if m == 1 then
    hs = hs .. " e um minuto"
  elseif m > 1 then
    -- keep a space between the number and "minutos"
    hs = hs .. " e " .. m .. " minutos"
  end
  return hs
end
#!/usr/bin/env lua
-- -*- coding: utf-8 -*-
-- Says the time of the day as in the spoken Portuguese.
-- (c) 2007-08 Alexandre Erwin Ittner <aittner@gmail.com>
-- Distributed under the GPL v2 or later.
require "espeak"
require "saytime"
espeak.Initialize(espeak.AUDIO_OUTPUT_SYNCH_PLAYBACK, 500)
if espeak.SetVoiceByName("brazil") ~= espeak.EE_OK then
  -- "Could not find the right voice."
  print("Impossível localizar a voz correta.")
  return
end
local dt = os.date("*t")
espeak.Synth(saytime.pt_spoken(dt.hour, dt.min))
espeak.Terminate()
There are some useful examples in the demos directory
within the distribution package.
Author: Alexandre Erwin Ittner
E-mail: aittner#gmail.com
(e-mail obfuscated to avoid spam-bots. Please replace the "#" with an "@").
GnuPG/PGP Key: 0x0041A1FB
(key fingerprint: 9B49 FCE2 E6B9 D1AD 6101 29AD 4F6D F114 0041 A1FB).
Homepage: http://users.netuno.com.br/aittner/.
Location: Jaraguá do Sul, Santa Catarina, Brazil.