4.2 PCM Objects

The acronym PCM is short for Pulse Code Modulation and is the method used in ALSA and many other places to handle playback and capture of sampled sound data.

PCM objects in alsaaudio do exactly that: they either play sample-based sound or capture sound from an input source (perhaps a microphone). The PCM object constructor takes the following arguments:

class PCM( [type], [mode], [cardname])

type - can be either PCM_CAPTURE or PCM_PLAYBACK (default).

mode - can be either PCM_NONBLOCK, PCM_ASYNC, or PCM_NORMAL (the default). In PCM_NONBLOCK mode, calls to read return immediately, whether or not there is any data to read. Similarly, calls to write return immediately, without writing anything to the playback buffer, if the buffer is full.

In the current version of alsaaudio, PCM_ASYNC is useless, since it relies on a callback procedure that cannot be specified from Python.

cardname - specifies which card should be used (this is only relevant if you have more than one sound card). Omit it to use the default sound card.

The constructor creates a PCM object with the following default settings:

Sample format: PCM_FORMAT_S16_LE
Rate: 8000 Hz
Channels: 2
Period size: 32 frames
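The defaults above imply a few derived quantities that are handy when sizing reads and writes. The sketch below computes them in plain Python; the helper name pcm_geometry is hypothetical (it is not part of alsaaudio), and it relies only on PCM_FORMAT_S16_LE using 2 bytes per sample.

```python
def pcm_geometry(rate, channels, bytes_per_sample, periodsize):
    """Derive frame size, period size in bytes, period duration and byte rate.

    Hypothetical helper, not part of alsaaudio.
    """
    framesize = channels * bytes_per_sample   # bytes in one frame (one sample per channel)
    period_bytes = periodsize * framesize     # bytes in one period
    period_seconds = periodsize / rate        # playback time covered by one period
    bytes_per_second = rate * framesize       # data rate the device consumes or produces
    return framesize, period_bytes, period_seconds, bytes_per_second


# Default settings: S16_LE (2 bytes per sample), 8000 Hz, 2 channels, 32 frames per period.
framesize, period_bytes, period_seconds, bytes_per_second = pcm_geometry(8000, 2, 2, 32)
```

With the defaults, a frame is 4 bytes, a period is 128 bytes lasting 4 ms, and the device handles 32000 bytes per second.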

PCM objects have the following methods:

pcmtype( )
Returns the type of PCM object. Either PCM_CAPTURE or PCM_PLAYBACK.

pcmmode( )
Returns the mode of the PCM object. One of PCM_NONBLOCK, PCM_ASYNC, or PCM_NORMAL.

cardname( )
Returns the name of the sound card used by this PCM object.

setchannels( nchannels)
Sets the number of capture or playback channels. Common values are 1 = mono, 2 = stereo, and 6 = full 6-channel audio. Few sound cards support more than 2 channels.

setrate( rate)
Sets the sample rate of the device in Hz. Typical values are 8000 (poor sound), 16000, 44100 (CD quality), and 96000.
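To make the relationship between rate, channels and data concrete, here is a minimal sketch that synthesizes a sine tone as raw PCM_FORMAT_S16_LE bytes using only the standard library. It assumes interleaved frames (all channels of one frame stored next to each other); the function name sine_s16le is hypothetical.

```python
import math
import struct


def sine_s16le(freq, rate, nframes, channels=2, amplitude=0.5):
    """Return `nframes` frames of a sine tone as signed 16-bit little-endian bytes.

    Hypothetical helper; assumes interleaved frames, i.e. each frame holds one
    sample per channel, stored consecutively.
    """
    peak = int(amplitude * 32767)
    samples = []
    for n in range(nframes):
        value = int(round(peak * math.sin(2 * math.pi * freq * n / rate)))
        samples.extend([value] * channels)  # same sample on every channel
    # '<' selects little-endian byte order, 'h' a signed 16-bit integer
    return struct.pack('<%dh' % len(samples), *samples)


tone = sine_s16le(440, 8000, 32)  # one default-sized period at 8000 Hz
```

The number of frames, not the number of bytes, is what the rate governs: 8000 frames always last one second, whatever the channel count or sample format.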

setformat( format)
Sets the sound format of the device. The sound format controls how the PCM device interprets data for playback, and how data is encoded in captures.

The following formats are provided by ALSA:

Format                  Description
PCM_FORMAT_S8           Signed 8-bit samples for each channel
PCM_FORMAT_U8           Unsigned 8-bit samples for each channel
PCM_FORMAT_S16_LE       Signed 16-bit samples for each channel (little-endian byte order)
PCM_FORMAT_S16_BE       Signed 16-bit samples for each channel (big-endian byte order)
PCM_FORMAT_U16_LE       Unsigned 16-bit samples for each channel (little-endian byte order)
PCM_FORMAT_U16_BE       Unsigned 16-bit samples for each channel (big-endian byte order)
PCM_FORMAT_S24_LE       Signed 24-bit samples for each channel (little-endian byte order)
PCM_FORMAT_S24_BE       Signed 24-bit samples for each channel (big-endian byte order)
PCM_FORMAT_U24_LE       Unsigned 24-bit samples for each channel (little-endian byte order)
PCM_FORMAT_U24_BE       Unsigned 24-bit samples for each channel (big-endian byte order)
PCM_FORMAT_S32_LE       Signed 32-bit samples for each channel (little-endian byte order)
PCM_FORMAT_S32_BE       Signed 32-bit samples for each channel (big-endian byte order)
PCM_FORMAT_U32_LE       Unsigned 32-bit samples for each channel (little-endian byte order)
PCM_FORMAT_U32_BE       Unsigned 32-bit samples for each channel (big-endian byte order)
PCM_FORMAT_FLOAT_LE     32-bit samples encoded as float (little-endian byte order)
PCM_FORMAT_FLOAT_BE     32-bit samples encoded as float (big-endian byte order)
PCM_FORMAT_FLOAT64_LE   64-bit samples encoded as float (little-endian byte order)
PCM_FORMAT_FLOAT64_BE   64-bit samples encoded as float (big-endian byte order)
PCM_FORMAT_MU_LAW       A logarithmic encoding (used by Sun .au files)
PCM_FORMAT_A_LAW        Another logarithmic encoding
PCM_FORMAT_IMA_ADPCM    A 4:1 compressed format defined by the Interactive Multimedia Association
PCM_FORMAT_MPEG         MPEG-encoded audio
PCM_FORMAT_GSM          A 9600 constant-rate encoding, well suited for speech
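Because the format decides how bytes are interpreted, data captured as PCM_FORMAT_S16_LE can be turned back into integers with the standard struct module. A minimal sketch, assuming interleaved frames; decode_s16le is a hypothetical name, not an alsaaudio function.

```python
import struct


def decode_s16le(data, channels):
    """Split signed 16-bit little-endian bytes into one list of samples per channel.

    Hypothetical helper; assumes interleaved frames and a length that is a
    whole number of frames.
    """
    framesize = 2 * channels
    if len(data) % framesize:
        raise ValueError("data is not a whole number of frames")
    samples = struct.unpack('<%dh' % (len(data) // 2), data)
    # Channel c holds every `channels`-th sample, starting at offset c.
    return [list(samples[c::channels]) for c in range(channels)]


stereo = struct.pack('<4h', 1, -1, 300, -300)  # two stereo frames
left, right = decode_s16le(stereo, 2)
```

For the big-endian variant, the '<' prefix would become '>'; for 32-bit signed samples, 'h' would become 'i' and the frame size would double.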

setperiodsize( period)
Sets the period size in frames. Each write should consist of exactly this number of frames, and each read returns this number of frames (unless the device is in PCM_NONBLOCK mode, in which case it may return nothing at all).
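Since each write should be exactly one period, a longer buffer has to be cut into period-sized pieces first. A minimal sketch with a hypothetical helper, split_periods; it pads the last piece with zero bytes, which is silence only for the signed linear formats such as PCM_FORMAT_S16_LE (unsigned or logarithmic formats would need a different fill value).

```python
def split_periods(data, periodsize, framesize):
    """Yield period-sized chunks of `data`, zero-padding the final one.

    Hypothetical helper. Zero bytes are silence for signed linear formats
    (e.g. S16_LE); unsigned or logarithmic formats need a different fill value.
    """
    period_bytes = periodsize * framesize
    for start in range(0, len(data), period_bytes):
        chunk = data[start:start + period_bytes]
        if len(chunk) < period_bytes:
            chunk += b'\x00' * (period_bytes - len(chunk))
        yield chunk


# 300 bytes with the default geometry (32 frames of 4 bytes) gives three periods.
chunks = list(split_periods(b'\x01' * 300, periodsize=32, framesize=4))
```

Each resulting chunk can then be handed to write on a playback PCM object in turn.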

read( )
In PCM_NORMAL mode, this function blocks until a full period is available, and then returns a tuple (length, data), where length is the size in bytes of the captured data and data is the captured sound frames as a string. The length of the returned data will be periodsize*framesize bytes.

In PCM_NONBLOCK mode, the call will not block, but will return (0,'') if no new period has become available since the last call to read.
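In PCM_NONBLOCK mode a program typically polls read and ignores the empty results. The sketch below works with any object whose read() returns a (length, data) tuple as described above, so it can be exercised without a sound card; drain_capture is a hypothetical helper, FakeCapture stands in for a PCM object opened with PCM_CAPTURE and PCM_NONBLOCK, and the data is assumed to arrive as bytes.

```python
def drain_capture(pcm):
    """Collect every period currently available from a non-blocking capture PCM.

    Hypothetical helper: calls pcm.read() until it reports no data, i.e. no new
    period has become available since the previous call.
    """
    chunks = []
    while True:
        length, data = pcm.read()
        if length <= 0:  # nothing new (treated defensively as "stop")
            break
        chunks.append(data)
    return b''.join(chunks)


class FakeCapture:
    """Stand-in for a non-blocking capture PCM, for illustration only."""

    def __init__(self, periods):
        self.periods = list(periods)

    def read(self):
        if self.periods:
            data = self.periods.pop(0)
            return len(data), data
        return 0, b''


captured = drain_capture(FakeCapture([b'ab', b'cd']))
```

A real program would call drain_capture periodically (for example from a timer) rather than spinning on read.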

write( data)
Writes (plays) the sound in data. The length of data must be a multiple of the frame size, and should be exactly the size of a period. If fewer than period size frames are provided, the actual playout will not happen until more data is written.

If the device is not in PCM_NONBLOCK mode, this call blocks when the kernel buffer is full, until enough sound has been played for the new data to be buffered. The call always returns the size of the data provided.

In PCM_NONBLOCK mode, the call will return immediately, with a return value of zero, if the buffer is full. In this case, the data should be written at a later time.
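When a non-blocking write returns zero, the rejected period has to be kept and retried later. A minimal sketch of that bookkeeping, written against the write contract above so it can be tested with a stand-in object; PendingWriter and FakePlayback are hypothetical names.

```python
from collections import deque


class PendingWriter:
    """Queue periods for a non-blocking playback PCM and retry when it is full.

    Hypothetical helper. pcm.write(data) is assumed to return 0 when the
    buffer is full and a non-zero value once the data has been accepted.
    """

    def __init__(self, pcm):
        self.pcm = pcm
        self.pending = deque()

    def submit(self, period):
        self.pending.append(period)
        self.flush()

    def flush(self):
        """Write queued periods until the device reports a full buffer."""
        while self.pending:
            if self.pcm.write(self.pending[0]) == 0:
                return False  # buffer full: keep the period, try again later
            self.pending.popleft()
        return True


class FakePlayback:
    """Stand-in that accepts `room` writes before reporting a full buffer."""

    def __init__(self, room):
        self.room = room
        self.played = []

    def write(self, data):
        if self.room == 0:
            return 0
        self.room -= 1
        self.played.append(data)
        return len(data)


dev = FakePlayback(room=2)
writer = PendingWriter(dev)
for p in (b'p1', b'p2', b'p3'):
    writer.submit(p)
```

Calling flush again later (for instance on the next timer tick) retries the periods still queued.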

A few hints on using PCM devices for playback

The most common reason for problems with playback of PCM audio is that people don't properly understand that writes to PCM devices must exactly match the data rate of the device.

If too little data is written to the device, it will underrun, and ugly clicking sounds will occur. Conversely, if too much data is written to the device, the write function will either block (PCM_NORMAL mode) or return zero (PCM_NONBLOCK mode).

If your program does nothing but play sound, the easiest approach is to put the device in PCM_NORMAL mode and write as much data to the device as possible. The same strategy works in a larger program if a separate thread is given the sole task of playing out sound.
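The separate-thread approach can be sketched with the standard threading and queue modules: the thread blocks in write (PCM_NORMAL mode) while the rest of the program only enqueues periods. PlaybackThread and the None shutdown sentinel are assumptions of this sketch, not part of alsaaudio; RecordingPCM merely stands in for a playback PCM object.

```python
import queue
import threading


class PlaybackThread(threading.Thread):
    """Feed periods from a queue to a PCM opened in PCM_NORMAL mode.

    Hypothetical helper. The blocking write paces the thread at the device's
    data rate; putting None on the queue stops the thread.
    """

    def __init__(self, pcm):
        super().__init__(daemon=True)
        self.pcm = pcm
        self.periods = queue.Queue()

    def run(self):
        while True:
            period = self.periods.get()  # waits until the program supplies data
            if period is None:
                break
            self.pcm.write(period)  # blocks while the kernel buffer is full


class RecordingPCM:
    """Stand-in for a blocking playback PCM, for illustration only."""

    def __init__(self):
        self.played = []

    def write(self, data):
        self.played.append(data)
        return len(data)


player = PlaybackThread(RecordingPCM())
player.start()
for period in (b'one', b'two'):
    player.periods.put(period)
player.periods.put(None)  # ask the thread to finish
player.join()
```

If the producer falls behind, the thread simply waits on the queue, and the device underruns exactly as described above; keeping a few periods queued ahead avoids that.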

In GUI programs, however, a better strategy may be to set up the device, preload the buffer with a few periods by calling write a couple of times, and then use a timer to write one period of data to the device every period. The preloading avoids underrun clicks if the timer does not expire exactly on time.
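The preload-then-tick pattern might look like the following sketch. TimedPlayer is hypothetical; a GUI toolkit's timer would be set to call tick once per period (every 4 ms for the default 32 frames at 8000 Hz), and periods is any iterable of period-sized byte strings.

```python
import itertools


class TimedPlayer:
    """Preload a few periods, then write one period per timer tick.

    Hypothetical helper; `pcm` is any object with a write(data) method,
    such as a playback PCM object.
    """

    def __init__(self, pcm, periods, preload=3):
        self.pcm = pcm
        self.periods = iter(periods)
        # Preloading gives the buffer slack in case a timer tick arrives late.
        for period in itertools.islice(self.periods, preload):
            self.pcm.write(period)

    def tick(self):
        """Timer callback: write the next period; return False when the source is exhausted."""
        period = next(self.periods, None)
        if period is None:
            return False
        self.pcm.write(period)
        return True


class RecordingPCM:
    """Stand-in playback device, for illustration only."""

    def __init__(self):
        self.played = []

    def write(self, data):
        self.played.append(data)
        return len(data)


dev = RecordingPCM()
player = TimedPlayer(dev, [b'a', b'b', b'c', b'd'], preload=3)
```

This version writes exactly one period per tick, so it inherits any drift in the timer itself.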

Also note that most timer APIs available for Python accumulate delays: if you set a timer to expire after 1/10th of a second, the actual timeout happens slightly later, and these small delays add up to quite a lot after a few seconds. Hint: use time.time() to check how much time has really passed, and add extra writes as necessary.
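Following the time.time() hint, a timer callback can derive from elapsed wall-clock time how many periods should have been written and catch up with extra writes, so late callbacks do not pile up into a growing delay. A sketch under stated assumptions: DriftCorrectedPlayer is hypothetical, pcm is any object with write(data), and the clock parameter exists only so the logic can be tested with a fake clock.

```python
import time


class DriftCorrectedPlayer:
    """Keep a playback PCM fed according to elapsed wall-clock time.

    Hypothetical helper. `periods` is an iterable of period-sized byte strings;
    `clock` defaults to time.time and can be replaced for testing.
    """

    def __init__(self, pcm, periods, rate, periodsize, preload=3, clock=time.time):
        self.pcm = pcm
        self.periods = iter(periods)
        self.rate = rate
        self.periodsize = periodsize
        self.preload = preload
        self.clock = clock
        self.written = 0
        self.start = clock()
        self.tick()  # nothing has elapsed yet, so this writes just the preload

    def tick(self):
        """Timer callback: write every period that is due; False once the source runs dry."""
        elapsed = self.clock() - self.start
        frames_elapsed = int(elapsed * self.rate)
        # Periods that should have been played by now, plus the preload margin.
        due = frames_elapsed // self.periodsize + self.preload
        while self.written < due:
            period = next(self.periods, None)
            if period is None:
                return False
            self.pcm.write(period)
            self.written += 1
        return True


class RecordingPCM:
    """Stand-in playback device, for illustration only."""

    def __init__(self):
        self.played = []

    def write(self, data):
        self.played.append(data)
        return len(data)


fake_times = iter([0.0, 0.0, 0.5])  # start, first tick, second tick
dev = RecordingPCM()
player = DriftCorrectedPlayer(dev, [b'p%d' % i for i in range(10)],
                              rate=1000, periodsize=250, preload=2,
                              clock=lambda: next(fake_times))
player.tick()  # 0.5 s elapsed = 2 periods of 0.25 s, plus 2 preloaded
```

Because the target is recomputed from the start time on every tick, a tick that fires late simply writes two periods instead of one.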