Ray-Ban Meta Glasses Development: Day 3 (Hey James)
Replacing "Hey Meta" with "Hey James" to fully control my glasses
Summary
March 18, 2026: Focused on automating voice recording via new wake word "Hey James".
Reason: The default "Hey Meta" wake word routes all audio to Meta AI (no processing control). "Hey James" keeps audio in-app for full control.
Progression: It took 9 builds (~3 hours) to get a clean implementation. I cut the video down to ~30 minutes. The video is a bit longer than my last two because I want to start showing how development progresses from build to build. Future videos will most likely be ~20-30 minutes to show build progress.
James Origin: The name James holds a close meaning to me, which is why I'm going with Hey James. However, I'm still unsure of the spelling I want to use. I might go with James, Jaemes, or Jaems. I initially went with Jay because of the "J" in James, but that's too close to Tony Stark calling Jarvis "J" for short. I don't want to bite that.
Objectives
- Create Picovoice account
- Create a wake word "Hey James" which will initiate audio processing
- Integrate with Picovoice/Porcupine
- Test wake word "Hey James"
- Play back audio to ensure voice was recorded with wake word
Issues Encountered
Default Wake Word
The default wake word for the Ray-Ban Meta glasses is "Hey Meta," which doesn't work with the app I'm building: "Hey Meta" initiates a process in the Meta AI app (where I have no control), not in my app (where I have full control). To route processing to my app, I need a new wake word that tells my app when to take over.
To solve this, I integrated with Porcupine and created a custom wake word: Hey James. My app is configured to listen for "Hey James" so it knows when to take over, and all audio is routed to my app instead of Meta AI.
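Setting up the custom wake word looks roughly like this. This is a minimal sketch against Porcupine's iOS SDK as I understand it; the access key is a placeholder, and the keyword file name (`hey_james.ppn`, trained in the Picovoice Console) is an assumption, not the exact file from the project.

```swift
import Foundation
import Porcupine  // Picovoice's Porcupine iOS SDK

// Sketch: create a Porcupine engine bound to the custom "Hey James" keyword.
// "hey_james.ppn" is an assumed bundled file name; the access key is a placeholder.
func makeWakeWordEngine() throws -> Porcupine {
    guard let keywordPath = Bundle.main.path(forResource: "hey_james", ofType: "ppn") else {
        throw NSError(domain: "WakeWord", code: 1)  // keyword file missing from bundle
    }
    return try Porcupine(
        accessKey: "YOUR_PICOVOICE_ACCESS_KEY",  // placeholder, from the Picovoice Console
        keywordPath: keywordPath
    )
}
```

Once created, the engine's `process(pcm:)` is fed audio frames and returns a keyword index (covered in the audio format section below).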
False Trigger Recording
Tapping "Start Listening" for the wake word feature automatically triggered audio recording. It didn't wait for me to say the wake word.
To solve this, I moved some code around so that recording only starts on "Hey James" instead of on a button tap. In my last video, audio recording was triggered by a button tap; that's no longer the case because I don't need a manual way to start processing audio. Moving forward, "Hey James" will be the trigger for processing audio.
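The fix boils down to gating recording on a detection event rather than a UI event. A minimal sketch of that state transition (the names are mine; `keywordIndex` is what Porcupine's `process(pcm:)` returns, where -1 means nothing was heard and >= 0 means the wake word was detected):

```swift
// Two states: streaming frames to Porcupine while waiting, or actively recording.
enum CaptureState: Equatable {
    case listening   // waiting for "Hey James"; taps do NOT start recording
    case recording   // wake word heard; the app has taken over the audio
}

// Recording only begins when Porcupine reports a detection (keywordIndex >= 0).
func nextState(_ state: CaptureState, keywordIndex: Int32) -> CaptureState {
    if state == .listening && keywordIndex >= 0 {
        return .recording  // wake word detected: start recording
    }
    return state           // otherwise, stay in the current state
}
```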
Mismatched Audio Sample Format
The wake word wasn't being picked up in the app because the audio recorded from my glasses and sent from my app to Porcupine was in the wrong format.
To solve this, I had to ensure the audio sample of the wake word was sent to Porcupine in the format it expects:
- Exactly 16kHz sample rate
- Mono (single channel)
- Int16 PCM samples
- Exactly 512 samples per frame

The flow for converting the audio sample is as follows:
"Hey James" wake word picked up from glasses mic (sampled at 48kHz Stereo, Float32)
|
↓
AVAudioConverter → Convert sample to 16kHz, Mono, Float32
|
↓
Cast sample from Float32 → Int16 (same 16kHz mono data, just re-expressed)
|
↓
Create buffer to hold sample
|
↓
Loop through buffer, chunking it into 512-sample frames
|
↓
Send sample to Porcupine.process(pcm: frame)
|
├── keywordIndex < 0 → discard frame, wait for next
|
└── keywordIndex >= 0 → wake word detected
|
↓
stopStreaming() → startRecording()

Wake Word Only Works Once
The wake word only works once per recording. When the recording stops, the wake word feature needs to be completely restarted to work again. This will be fixed in my next video/iteration.
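The cast-and-chunk steps from the conversion flow above can be sketched in Swift. This is a minimal sketch, assuming AVAudioConverter has already produced 16kHz mono Float32 samples upstream; the function name is mine, and in the real app any leftover samples (fewer than 512) would be buffered for the next pass rather than dropped.

```swift
// Sketch: re-express 16kHz mono Float32 samples ([-1.0, 1.0]) as Int16 PCM,
// then chunk into 512-sample frames for Porcupine.process(pcm:).
func int16Frames(from samples: [Float], frameLength: Int = 512) -> [[Int16]] {
    // Cast Float32 → Int16: clamp to [-1, 1], then scale to the Int16 range.
    let pcm = samples.map { s -> Int16 in
        let clamped = max(-1.0, min(1.0, s))
        return Int16(clamped * Float(Int16.max))
    }

    // Chunk into exactly 512-sample frames; partial tail frames are left over.
    var frames: [[Int16]] = []
    var start = 0
    while start + frameLength <= pcm.count {
        frames.append(Array(pcm[start..<(start + frameLength)]))
        start += frameLength
    }
    return frames
}
```

Each returned frame is then fed to `Porcupine.process(pcm: frame)`, and a `keywordIndex >= 0` means "Hey James" was detected.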
