A few months ago, I was in the market for a high-definition IP-based baby monitor, but they were all expensive and unreliable. Why IP-based? I wanted the convenience of monitoring my daughter on our computers, smartphones and tablets. After reading about the security vulnerabilities and general unreliability of off-the-shelf low-cost IP-based baby monitors, I started thinking… How hard can it be to build my own baby monitor?
The Raspberry Pi has great community support, and all the required software should be readily available and open source too. The Pi NoIR camera board had just been announced. I should be able to throw something together and satisfy my “hardware-hacking” itch at the same time. This should be easy…
What do I want on my baby monitor?
The basic things are:
- Low latency high definition video
- True day and night vision
- Audio
- IP based so it is viewable on my smartphones, tablets and PCs.
On the hardware side, that translates to:
- Raspberry Pi
- Pi NoIR camera
- Switchable IR-filter
- IR LEDs
- Pi compatible sound card
- and a microphone
On the software side, it translates to:
- Raspbian
- Raspivid
- Sound card driver
- Audio video encoder
- and a web server
The tasks can be split into 3 categories: Optics/video, Audio, and Audio Video live streaming.
Optics / video
The basic concept of night vision is to shine infrared (IR) light on a small area so that it can be picked up by an infrared-sensitive camera. Since infrared is invisible to the human eye, it appears as though the device can “see in the dark”, aka night vision. The Pi NoIR camera was selected because, unlike most cameras, it does not have a built-in IR filter. However, without the IR filter, the camera is susceptible to colour distortion caused by the IR in sunlight, so day-time images would no longer be colour accurate. To fix that, I added a switchable IR filter. To get any useful night vision, I also needed an array of IR LEDs to illuminate at least 3 metres in front of the camera.
Day vs Night through the lens of the Raspberry Pi baby monitor
To achieve HD video quality, the video stream needs to be encoded so it can be transmitted over low-bandwidth links like WiFi, but the Raspberry Pi CPU is too weak to perform real-time video encoding. Luckily, the Raspberry Pi has a powerful GPU that is capable of encoding the video stream to H264 format, and it is supported by the “raspivid” camera software. The raspivid command line interface is straightforward. One thing worth mentioning is the GOP option (“-g”), which specifies the I-frame (also called keyframe) interval. For simplicity's sake, think of H264 video as being made up of I-frames and P-frames. I-frames contain the full picture information, while P-frames contain only the difference from the previous frames. A low I-frame rate generates a small amount of video data, but video players take longer to recover from bit errors. Conversely, a high I-frame rate generates a large amount of video data, but gives a faster recovery time. Having one I-frame per second is a good compromise.
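As a rough sketch of the once-per-second rule, the value passed to “-g” is the frame rate multiplied by the desired keyframe interval in seconds. The resolution, frame rate and the helper names `gop_size` and `raspivid_args` below are assumptions for illustration, not values taken from the bbPiCam source.

```python
def gop_size(fps, keyframe_interval_s=1.0):
    """Number of frames between I-frames for the given interval in seconds."""
    return max(1, round(fps * keyframe_interval_s))


def raspivid_args(width=1280, height=720, fps=25, keyframe_interval_s=1.0):
    """Build a raspivid argument list (the defaults here are hypothetical)."""
    return [
        "raspivid",
        "-t", "0",                 # run until stopped
        "-w", str(width),          # frame width in pixels
        "-h", str(height),         # frame height in pixels
        "-fps", str(fps),          # frames per second
        "-g", str(gop_size(fps, keyframe_interval_s)),  # I-frame interval in frames
        "-o", "-",                 # write the H264 stream to stdout
    ]


print(" ".join(raspivid_args()))
```

At 25 frames per second, one I-frame per second means “-g 25”; doubling the interval to two seconds would halve the number of I-frames at the cost of a slower recovery after bit errors.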
Audio
The Raspberry Pi has no audio input of its own, but it has an I2S port (Inter-IC Sound) for digital audio. The sound card that I chose is based on the Texas Instruments TLV320AIC23B audio codec IC and can be configured via an I2C (Inter-IC) control interface. I also attached a microphone with a pre-amp circuit to it. After modifying my Linux kernel and compiling the open-source drivers, I managed to get the sound card to appear as an ALSA device in Linux. It was relatively easy to get the drivers to work, but unfortunately, I was getting a clicking noise in the right channel. After a lot of swearing and hair pulling, I eventually found that it was caused by the audio stream crossing clock domains. The sound card has its own 12 MHz clock while the Pi has its 500 MHz clock. The two independent clocks drifted relative to each other, so the Pi over-sampled the audio and produced a clicking noise. I removed the 12 MHz crystal and got the Pi to supply the clock, hoping that would solve the problem, but it was more complicated than that. The Pi’s 500 MHz clock does not divide evenly into the standard audio sample rates of 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, 48 kHz and 96 kHz. To make things worse, the clock signal had to run over a long piece of wire. I could not remove the clicking noise completely, so I ended up with a software workaround. The TLV320AIC23B chip duplicates the microphone input on both channels, and only the right channel was noisy, so I used the left channel as the mono output.
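To illustrate the divisibility problem, here is a small sketch that computes the divider needed to derive an I2S bit clock from a 500 MHz source. It assumes 64 bit clocks per audio frame (two channels with 32-bit slots), which is a common I2S layout but is my assumption here, not something confirmed for this sound card.

```python
from fractions import Fraction

PI_CLOCK_HZ = 500_000_000
BITS_PER_FRAME = 64  # assumption: 2 channels x 32-bit slots per sample
SAMPLE_RATES_HZ = (8000, 16000, 32000, 44100, 48000, 96000)


def bit_clock_divider(sample_rate_hz, bits_per_frame=BITS_PER_FRAME,
                      source_hz=PI_CLOCK_HZ):
    """Exact ratio between the source clock and the required bit clock."""
    return Fraction(source_hz, sample_rate_hz * bits_per_frame)


for rate in SAMPLE_RATES_HZ:
    divider = bit_clock_divider(rate)
    exact = divider.denominator == 1  # True only for a whole-number divider
    print(f"{rate:>6} Hz: divider = {float(divider):.4f}, integer: {exact}")
```

Under that assumption none of the listed sample rates ends up with a whole-number divider, so the generated clock can only approximate the ideal rate.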
Preparing audio and video streams for live streaming
For live streaming, I used RTSP to support Android and iOS mobile devices, and RTMP to support PCs via web browsers. That means the audio and video streams have to be merged on the Raspberry Pi and transmitted as a single RTSP stream. Luckily, there is an open-source tool for handling multimedia data, called “ffmpeg”. It is very powerful and complex, and as a consequence has a complex command line interface. The command looks like this:
ffmpeg -fflags +nobuffer -re -i $VIDEO_FIFO -fflags +nobuffer -re -f alsa -ar 16000 -ac 2 -i hw:1,0 -map 0:0 -map 1:0 -c:v copy -strict -2 -c:a aac -b:a 16k -ac 1 -af "pan=1c|c0=c1" -f rtsp -metadata title=bbPiCam rtsp://0.0.0.0:554
Basically, that command instructs ffmpeg to:
- Read audio and video streams at their respective native frame rate with no buffering to achieve minimal latency,
- Copy the video stream and AAC encode the audio stream to produce a single output stream,
- Treat the left audio channel as mono channel,
- Name the output stream bbPiCam and send it to localhost at port 554 in RTSP format.
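The same command can be laid out as an argument list with one comment per group of options, which makes it easier to match the flags to the steps listed above. This is only a re-arrangement of the command shown in this post; the function name `build_ffmpeg_args` and the FIFO path in the usage line are hypothetical placeholders.

```python
def build_ffmpeg_args(video_fifo):
    """Return the ffmpeg command from this post as an argument list."""
    return [
        "ffmpeg",
        # Video input: read the H264 stream from the FIFO at its native
        # rate, with input buffering disabled.
        "-fflags", "+nobuffer", "-re", "-i", video_fifo,
        # Audio input: 16 kHz, 2-channel capture from ALSA device hw:1,0.
        "-fflags", "+nobuffer", "-re",
        "-f", "alsa", "-ar", "16000", "-ac", "2", "-i", "hw:1,0",
        # Take the first stream of each input.
        "-map", "0:0", "-map", "1:0",
        # Pass the video through untouched; encode the audio to AAC.
        "-c:v", "copy",
        "-strict", "-2", "-c:a", "aac", "-b:a", "16k",
        # Reduce the audio to a single mono channel with the pan filter.
        "-ac", "1", "-af", "pan=1c|c0=c1",
        # Output: one RTSP stream titled bbPiCam on port 554.
        "-f", "rtsp", "-metadata", "title=bbPiCam", "rtsp://0.0.0.0:554",
    ]


# Hypothetical FIFO path, used here only to show the resulting command.
print(" ".join(build_ffmpeg_args("/tmp/video.fifo")))
```

Passing the list to `subprocess.run` would start ffmpeg without going through a shell, which avoids having to quote the pan filter string.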
The RTSP stream is then handed over to the “crtmpserver” software, which converts the stream to RTMP format. Crtmpserver is also capable of relaying the RTSP stream to interested clients (Android and iOS devices in this case). As for PCs, I used the “nginx” web server to serve a web page with an embedded Flash player (JWPlayer) for playing RTMP in a web browser.
It was all well and good at this point, except that the audio and video were slowly drifting out of sync, i.e. the audio was lagging the video further and further over time. The lag grew to more than 20 seconds, which was unacceptable. Upon further investigation, I found that ffmpeg has a built-in buffer called the “interleave buffer” that deals with input jitter in order to synchronize audio and video.
Unfortunately, jitter was not the problem here. It was the video crossing clock domains, from the camera’s 25 MHz to the Pi’s 500 MHz. Reducing ffmpeg’s “interleave buffer” was the obvious solution, but a bug in ffmpeg prevented it from accepting my new interleave buffer configuration, so I ended up hacking ffmpeg’s source code. The outcome was a beautiful HD quality video with less than 2 seconds of latency!
Before I landed on RTSP and RTMP, I looked into alternative ways of streaming live video. The combination of MPEG-TS and HLS (HTTP Live Streaming, developed by Apple) was a strong contender, but HLS was not designed for low-latency live streaming: it is basically a playlist of short MPEG-TS videos and therefore introduced a huge latency. It is also not well supported by web browsers other than Safari.
Another alternative was the combination of MPEG-TS and multicast. Multicast is an efficient way of distributing content in a network, but it requires IGMP snooping on the network switch. Network switches without IGMP snooping flood the multicast traffic to every port, which may severely affect the performance of the network.
As spit polish and ribbon on top, I added an RGB LED and a temperature sensor to show the room temperature. The LED colour starts at blue (18°C) and progressively turns green (22°C), then red (26°C). I also added a push button to shut down the Pi properly to prevent filesystem corruption.
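One simple way to get that kind of progressive colour change is to blend linearly between the three anchor colours. The sketch below assumes linear blending and clamping outside the 18°C to 26°C range; the function name `temperature_to_rgb` is hypothetical and the actual bbPiCam code may do this differently.

```python
COLD_C, COMFY_C, HOT_C = 18.0, 22.0, 26.0  # blue, green and red anchor points


def temperature_to_rgb(temp_c):
    """Map a temperature in °C to an (r, g, b) tuple with 0-255 channels."""
    if temp_c <= COLD_C:
        return (0, 0, 255)            # clamp to pure blue
    if temp_c >= HOT_C:
        return (255, 0, 0)            # clamp to pure red
    if temp_c <= COMFY_C:
        # Fade from blue to green between 18°C and 22°C.
        t = (temp_c - COLD_C) / (COMFY_C - COLD_C)
        return (0, round(255 * t), round(255 * (1 - t)))
    # Fade from green to red between 22°C and 26°C.
    t = (temp_c - COMFY_C) / (HOT_C - COMFY_C)
    return (round(255 * t), round(255 * (1 - t)), 0)


for temp in (16, 18, 20, 22, 24, 26, 30):
    print(temp, temperature_to_rgb(temp))
```

The resulting channel values could then be used as PWM duty cycles for the three LED pins.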
Finally, my bbPiCam was born – pardon the pun
This project turned out to be more difficult than I expected. If I had enough time, I would replace the sound card with a better one. One really useful feature that I missed out is the ability to stream and play music on the baby monitor. I found that white noise and soothing background music can help babies sleep better. All that is needed is to add the “Mopidy” software, which is capable of playing music from internet radio, local storage and networked storage. Mopidy can be controlled from mobile devices via MPD and a web interface, so it fits nicely into the user experience of the baby monitor. I hope my experience here can help anyone who wants to venture into IP-based camera projects.
Project instructions and source code are available here: https://github.com/jasaw/bbPiCam