The Talking PSoC

This is the HTML version of a project I submitted to the Cypress PSoC Design Contest for 2004. Contact me if you have an interest in using this technology. Homepage: www.sensicomm.com. Contact info: about_contact.shtml.

Summary

A combination of analog and digital blocks is used to form a speech synthesizer. Parameter tables in ROM drive the synthesizer to generate selected words or phrases. It can be integrated into other PSoC-based products to add a low-cost speech output capability. Talking clocks or voltmeters are examples of possible uses.

Figure 1 Speech synthesis using a model of the vocal tract.

Introduction

Voice output is a useful feature for situations where an operator is unable to use a graphic display. An example would be a talking voltmeter, where the user can focus on the device under test instead of glancing back and forth to the meter's visual display.

Conventional implementations use a digital signal processor (DSP) or dedicated synthesis chip to implement voice synthesis or playback of stored audio. For systems that already include a PSoC this added complexity can be avoided: some analog and digital blocks can be configured to synthesize intelligible speech with zero added external components (except for a few wires).

The design uses 6 of the 8 available switched-capacitor analog blocks and 3 digital blocks. The remaining analog and digital blocks are available for other functions. Alternatively, all of the analog and digital blocks could be reconfigured to perform other functions when the chip is not talking.

This note describes a reference implementation that speaks the phrases "Hello, I am a PSoC; I do amazing things" and "the temperature is seventy-three degrees." Additional words and phrases can be added as required by a specific application.

Voice Synthesis

The speech generation approach used here is known as vocoding, short for "voice coding." A vocoder uses analog or digital circuitry to mimic the human speech production mechanism (Figure 1). Speech sounds come in 2 basic flavors: (1) Voiced sounds are generated by the vibration of the vocal folds in the larynx, and can be approximated by a sequence of pulses at the pitch frequency. (2) Unvoiced sounds (like "s") are generated by turbulence in the vocal tract and can be approximated by a noise source.

The other parts of the vocal tract (mouth, tongue, teeth, etc.) form a time-varying resonant structure that shapes the spectrum of the speech to produce the desired sounds. A cascade of second-order bandpass filter sections provides a good model of the vocal tract functions.

So, to make speech we need a pulse source, a noise source, a pulse/noise switch, and some bandpass filters. Standard PSoC blocks can provide all these functions.
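The source-filter idea above can be tried out on a PC before committing anything to hardware. The following Python sketch is only a digital stand-in for the analog design: the two-pole, two-zero section, the 8 kHz sample rate, and the center frequency and bandwidth values in the usage note are illustrative assumptions, not parameters taken from the PSoC implementation.

```python
import math
import random

def bandpass(signal, f0, bw, fs):
    """One second-order digital bandpass section (zeros at DC and fs/2)."""
    r = math.exp(-math.pi * bw / fs)      # pole radius, set by the bandwidth
    theta = 2.0 * math.pi * f0 / fs       # pole angle, set by the center frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    g = (1.0 - r * r) / 2.0               # rough normalization of the peak gain
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = g * (x - x2) + a1 * y1 + a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def excitation(n, voiced, pitch_hz, fs, rng):
    """Pulse train for voiced sounds, two-level noise for unvoiced sounds."""
    if voiced:
        period = max(1, round(fs / pitch_hz))
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def synthesize(n, voiced, sections, fs=8000, pitch_hz=100.0, seed=0):
    """Pass the selected excitation through a cascade of bandpass sections."""
    rng = random.Random(seed)
    y = excitation(n, voiced, pitch_hz, fs, rng)
    for f0, bw in sections:
        y = bandpass(y, f0, bw, fs)
    return y
```

For example, `synthesize(800, True, [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)])` produces 100 ms of a vowel-like buzz; passing `False` instead gives a hiss shaped by the same filters.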

Figure 2 PSoC blocks arranged to form a speech synthesizer.

Block Structure

The PSoC implementation uses the following blocks, arranged as shown in Figure 2:

8-bit timer: generates a periodic pulse train to represent voiced excitation.
PRS random sequence generator: generates a noise-like sequence to model unvoiced excitation.
VGA: variable-gain amplifier to control volume level.
BPF2_1-3: 3 cascaded bandpass filter sections, to form the vocal tract filter.
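To illustrate how a PRS-style block can produce a noise-like bit stream, here is a small Python model of an 8-bit linear feedback shift register. The register width, the feedback mask 0xB8, and the seed are assumptions chosen for the sketch; the polynomial and seed actually configured in the PSoC PRS block are not given in this note.

```python
def lfsr_step(state, mask=0xB8):
    """Advance an 8-bit Galois LFSR by one clock; return (new_state, output_bit)."""
    bit = state & 1
    state >>= 1
    if bit:
        state ^= mask       # apply the feedback taps when a 1 is shifted out
    return state, bit

def prs_bits(count, seed=0x01, mask=0xB8):
    """Return `count` pseudo-random bits starting from a nonzero seed."""
    if seed & 0xFF == 0:
        raise ValueError("seed must be nonzero, or the register stays stuck at 0")
    state = seed & 0xFF
    bits = []
    for _ in range(count):
        state, bit = lfsr_step(state, mask)
        bits.append(bit)
    return bits
```

With this mask the register cycles through all 255 nonzero states before repeating, so the bit stream sounds noise-like when clocked well above the audio band.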

A mux is also required, to connect either the PRS output or the timer output to the input of the VGA. The standard MUX block could be used to provide this capability, but a simpler approach is to do it within the digital infrastructure. The PRS output is directed to the "Row 0 Out 0" internal bus, and the Timer output is directed to the "Row 0 Out 1" bus. The Row Digital Interconnect (RDI) contains a lookup table (LUT) that can combine two inputs into a single output. In the configuration used here, the PRS output is the A input to LUT 0 and the Timer output is the B input to it. The desired output is selected by writing 3 (for A) or 5 (for B) to the 4 lsb's of the RDI0LT0 register.
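The LUT-as-mux trick can be modeled in a few lines of Python. The sketch treats the 4-bit code as a truth table over inputs A and B; that bit ordering is an assumption, chosen because it is consistent with the two codes quoted above (3 passes A, 5 passes B). The helper names are hypothetical, and the register is represented by a plain integer rather than real hardware access.

```python
LUT_SELECT_A = 0x3   # from the text: passes input A (the PRS output)
LUT_SELECT_B = 0x5   # from the text: passes input B (the Timer output)

def lut_output(code, a, b):
    """Evaluate a 2-input LUT; bit 3 of `code` is the A=0,B=0 entry (assumed order)."""
    index = (a << 1) | b
    return (code >> (3 - index)) & 1

def set_lut0(reg_value, code):
    """Replace only the 4 lsb's of an 8-bit register value, keeping the upper nibble."""
    return (reg_value & 0xF0) | (code & 0x0F)
```

Changing only the low nibble matters because the text specifies the 4 lsb's of RDI0LT0; a read-modify-write of this kind leaves the other bits of the register as they were.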

Most of the interconnections between blocks are made using the internal connection paths. The connection between the multiplexed output of the digital blocks and the VGA requires the use of external pins, as does the connection from BPF2_2's output to BPF2_3's input.

The PSoC BPF2 blocks each require a pair of switched-capacitor blocks that are horizontally or vertically adjacent. This design uses the horizontal configuration for the first two, but a vertical configuration for BPF2_3. The use of a vertical configuration leaves both unused analog blocks in the same column, so they are not required to use the same analog clock that the filters use. This provides more flexibility in assigning additional functions to these unused blocks.

Hardware Configuration

The sample code is written for the PSoC Invention Board. A photo of the prototype configuration used for software development and testing is shown in Figure 3. Essential input and output connections for the board are listed in Table 1.

The synthesized output is available at the output of BPF2_3 on pin 25. A Realistic SA-10 audio amplifier was used to monitor the audio during development. The output level is typically 1-2 volts peak-to-peak, which is suitable for direct connection to the line input of most audio amplifiers (a DC blocking capacitor may be required with some). An LM386 or a more modern class D amplifier could be used to drive a speaker directly, if desired.

Software

The demonstration software implements a loop that periodically updates the parameters of the PSoC blocks. Code in main.asm performs the initialization and configuration of the PSoC blocks, sets the parameters to reasonable default values, and then enters a loop that plays out a test message. File vtable.asm contains the code to step through a table of parameters and update the PSoC blocks. Various include files contain the parameter sequences for the message to be spoken. These files are generated externally.

The following fragment of code plays a message:


	call _vtableset0
	call _sayphrase

Routine _vtableset0 sets a pointer to the beginning of the message to be spoken:


	_vtableset0:
		mov [vtablea],>_vtable0 ; A is the msb's.
		mov [vtablex],<_vtable0 ; X is the lsb's.
		ret

Other routines (_vtableset1, etc.) are similarly used to select other messages to be spoken.

Routine _sayphrase implements a loop that waits for the start of the next update interval and then updates the block parameters. The loop repeats for as long as _updatefilt returns 0.


	_sayphrase:
		call delay10ms
		call _updatefilt
		cmp A,0  ; If updatefilt()=0, continue.
		jz _sayphrase
		ret

Routine _updatefilt reads parameters from the ROM table and updates the settings of the analog and digital blocks using the standard PSoC routines. It steps through the ROM table using routine _vtablenext:


	_vtablenext: ; Return the next byte from a ROM table in A. X destroyed.
		mov a,[vtablea]  ; Get the address.
		mov x,[vtablex]  ;
		romx ; Load the byte at rom[(a<<8)+x]
		add [vtablex],1  ; Increment.
		adc [vtablea],0  ;
		ret
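The pointer handling in _vtablenext is a 16-bit increment split across two 8-bit variables. The Python sketch below models that behavior on a host machine; the class name and the use of a bytes object as the "ROM" are illustrative assumptions, while the two fields mirror vtablea and vtablex from the listing.

```python
class RomTableReader:
    """Host-side model of the vtablea/vtablex pointer and the _vtablenext routine."""

    def __init__(self, rom, start):
        self.rom = rom
        self.hi = (start >> 8) & 0xFF    # plays the role of vtablea (msb's)
        self.lo = start & 0xFF           # plays the role of vtablex (lsb's)

    def next_byte(self):
        value = self.rom[(self.hi << 8) | self.lo]   # romx: read rom[(a<<8)+x]
        total = self.lo + 1                          # add [vtablex],1
        self.lo = total & 0xFF
        carry = total >> 8
        self.hi = (self.hi + carry) & 0xFF           # adc [vtablea],0
        return value
```

The carry from the low byte into the high byte is what lets a table cross a 256-byte boundary; without the `adc` step the pointer would wrap back to the start of the same page.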

The parameters stored in the ROM tables are the pitch counter repeat value, VGA gain setting, and the capacitor values for the filter blocks. The MUX selection value is derived from the pitch value.
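As a sketch of how one table entry might be unpacked, the Python below decodes a byte stream into frames. The frame layout (one pitch byte, one gain byte, then six capacitor bytes) and the rule that a pitch value of 0 means "unvoiced" are hypothetical; the note does not specify the actual table format or how the MUX value is derived from the pitch, so treat every constant except the two LUT codes as an assumption.

```python
from dataclasses import dataclass
from typing import List

LUT_SELECT_PRS = 0x3     # from the text: code 3 routes the PRS output
LUT_SELECT_TIMER = 0x5   # from the text: code 5 routes the Timer output
CAPS_PER_FRAME = 6       # hypothetical: two capacitor values per filter section
FRAME_SIZE = 2 + CAPS_PER_FRAME

@dataclass
class Frame:
    pitch: int           # pitch counter repeat value
    gain: int            # VGA gain setting
    caps: List[int]      # capacitor values for the filter blocks

    @property
    def mux_code(self):
        # Assumed rule: pitch 0 marks an unvoiced frame, so select the noise source.
        return LUT_SELECT_PRS if self.pitch == 0 else LUT_SELECT_TIMER

def decode_frames(table):
    """Split a parameter table into frames; reject a trailing partial frame."""
    if len(table) % FRAME_SIZE != 0:
        raise ValueError("table length is not a whole number of frames")
    frames = []
    for i in range(0, len(table), FRAME_SIZE):
        chunk = table[i:i + FRAME_SIZE]
        frames.append(Frame(chunk[0], chunk[1], list(chunk[2:])))
    return frames
```

Deriving the excitation source from the pitch field, rather than storing it separately, saves one byte per frame, which adds up when each update interval needs its own table entry.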

The gain and capacitor values are determined by analysis of a recorded sample of the desired phrase. Bandpass center frequency and Q values can be determined using speech analysis techniques described in texts such as Speech Communication - Human and Machine, by D. O'Shaughnessy. These can then be converted to capacitor values using the equations from the bandpass filter documentation.

Summary

This PSoC configuration and its software provide a simple and low-cost way to add voice output to various devices. As with all vocoders, the output is highly intelligible, with a somewhat "robot-like" quality. Enhancements that can be added in the future include control of the playback rate or on-the-fly modification of the pitch and volume for emphasis or other special effects.

Figure 3 Prototype on a solderless breadboard.

Table 1 Connections to the prototyping board.

Output pin	Input pin	Description
8	24	Pitch pulses or Pseudorandom noise to VGA input
2	20	BPF2_2 output to BPF2_3 input
25	-	Output speech

PSoC(TM) (Programmable System-on-Chip(TM)) is a trademark of Cypress Microsystems, Inc. All other trademarks or registered trademarks referenced herein are the property of the respective corporations.

$Id: psoc.shtml,v 1.2 2016/02/16 12:29:00 jrothwei Exp jrothwei $

Sensicomm LLC - DSP design services.
