Getting raw audio from the Nicla Voice

Sunday, 8 September, 2024 // software

If you’re working with a Nicla Voice and got stuck trying to grab its audio data, you’ve come to the right place. This is a fairly short article to document some of my learnings while attempting to collect audio data for training a ML model. While I managed to successfully transmit raw PCM data over serial, the resulting audio files are almost unusable due to limitations of Arduino’s serial transmission implementation.

As usual, I’ve uploaded working code onto my GitHub repository for your reference. Enjoy!

Working with NDP

Contrary to what the Arduino documentation suggests¹, the PDM library cannot actually be used on the Nicla Voice. After digging through tons of documentation, source code and datasheets, I’ve come to the conclusion that the only way to grab audio data from the microphone is through the NDP class.

This was tricky to figure out and work with because the NDP class will only be initialised if all 3 required synpkg files on your flash are successfully loaded. It also means that your Nicla Voice will always boot with a bunch of debug information written to UART, regardless of whether you need to use the NN model.

// without these, you can't use your Nicla Voice at all:
NDP.begin("mcu_fw_120_v91.synpkg");
NDP.load("dsp_firmware_v91.synpkg");
NDP.load("ei_model.synpkg");
// in loop(), NN model runs immediately and starts printing into serial if there are match events

The solution is pretty simple. Send a b character from your Python script and use it to:

stop inferencing, and
control your device loop

// audio-serial.ino, loop()
while (1) {
  int thisChar = Serial.read();
  if (thisChar == 98) { // 'b' character
    break;
  }
}

# mic_serial.py
ser.write('b'.encode('utf-8'))
while True: 
  data = ser.readline() # clear the buffer
  if data: break

Extracting and transmitting

Unlike the Nicla Vision, the only way you can access audio samples is through the NDP.extractData method:

uint8_t _local_audio_buffer[_samples] = { 0 };

while (_local_current_samples < required_samples) {
  NDP.extractData(&_local_audio_buffer[read], &len);

  if (len != 0) {
    read += len;
    if (read >= read_size * 2) {
      Serial.write(_local_audio_buffer, read);
      _local_current_samples += (read / 2);
      read = 0;
    }

  } else {
    delay(1);
  }
}

Although the IM69D130 microphone documentation² states that it produces PDM output, the nRF52832 SoC³ has decimation filters that automatically converts PDM data into PCM samples.

Module layout of nRF52832's PDM module — Decimation filters convert PDM data into PCM samples

Take note that NDP.extractData writes into a uint8_t array, which was strange at first because PCM data is usually in 16-bit signed integer format. I assumed that this was for ease of writing into a block device or transmitting over serial (one byte at a time).

Next, create and manage buffers sized according to the device’s audio chunk size and your desired sample length. Then, clear the extracted audio buffer whenever it reaches a certain size and just write it to the serial port.

Receiving audio samples

If you’re using my Python script, you’ll need to install the pyserial module to proceed. By default, the Nicla Voice’s microphone produces mono-channel 16-bit audio samples at a rate of 16000Hz, so just replicate the settings accordingly:

with wave.open('output.wav', 'wb') as wav_file:
  wav_file.setframerate(16000)
  wav_file.setnchannels(1)
  wav_file.setsampwidth(2)

  try:
    while True:
      # Read data from serial port, 1024 bytes at a time
      data = ser.read(1024)
      if not data: break
      # received odd number of bytes
      if (len(data) % 2) != 0:
        data = data[:-1]
      # 16-bit signed PCM samples (big endian)
      samples = struct.unpack(f'>{len(data)//2}h', data)
      # convert to little endian
      wav_file.writeframes(struct.pack(f'<{len(samples)}h', *samples))

All that’s left to do is to unpack binary data from the serial port and write them into a .wav file. Because we’re receiving 16-bit data, getting the byte order (endianness) correct is the difference between an accurate audio file and static noise that will make your ears bleed.

Conclusion

As mentioned in the beginning of this article, I wasn’t able to use the audio samples for training in the end as the sound was choppy at best, and sometimes missing large chunks of audio. I’m not too sure what the reason is at this point, but I’m guessing this is primarily due to the Arduino platform’s implementation of its Serial.write method, which blocks the device if the serial buffer is full. It doesn’t help that the buffer is only 64 bytes large! It seems as if the microphone stops sampling whenever the device is blocked.

A viable solution is to replicate Edge Impulse’s way of sampling audio, which is to write audio data directly to flash memory first before transmitting it later. This would give us much higher quality audio, but sample length would be limited to available Flash memory.