I’ve always been fascinated by technologies embedded in products – especially toys which use tech to provide interaction with the child playing with them. Indeed, I think toys really ‘push’ technology, as they need to be easy to understand and use yet robust enough to survive boisterous play.
From the earliest clockwork rabbits, to dolls containing mini built-in record players that randomly select a phrase to play from an embedded disc, through to TV interaction toys, they have all been designed to give a child access to their imagination through interactive creative play.
TV Teddy was a teddy bear sold in the USA in the 1990s, manufactured by Tomy. The toy worked by playing back VHS tapes with an embedded audio track hidden in the video signal. This audio track was received by a box sold with TV Teddy which the video signal from the VHS player had to be routed through. The VHS player was connected to the box and the box connected to the TV composite video input socket.
During video playback, the box detected the embedded audio track and transmitted it to a receiver inside TV Teddy itself, which then played the audio through a hidden speaker while moving its eyes and mouth. In this manner, TV Teddy would appear to interact with the TV show on the tape – usually conversing with the on-screen presenter or providing audience-style reactions such as laughing. Indeed, without TV Teddy, these video shows play as a weird one-sided monologue as you only hear one side of the conversation.
To get you acquainted with TV Teddy, please watch this video from Databits, who demonstrates it and explains its history. This video is the one that got me intrigued about how TV Teddy works:
So what’s going on? The clues from this video are:
- Audio embedded in video signal: it is the video feed that is providing the extra audio, somehow. Databits shows that the sound is not coming from the VHS tape’s audio track at all.
- Signal only on specially recorded VHS tapes: Databits also tells us that this interaction only works on ‘TV Teddy’ branded VHS tapes. On all other VHS tapes he has tried, the interaction is silent.
- TV Teddy box must detect something in the video signal: The box must be placed inline with the video signal, and Databits demonstrates that it turns on a green light to show it has detected the extra audio track and that it is transmitting a signal to TV Teddy. So what is it looking for in the video signal passing through it?
- Embedded audio must survive the low-resolution VHS picture: If you know VHS, you’ll have experienced its much softer playback picture compared to broadcast TV. This effect was the result of reducing the amount of information in the picture so it could be recorded onto domestic-grade videotape. As a result, trying to be ‘clever’ by hiding the audio signals in amongst the body of the picture itself would not survive the reduction in bandwidth that takes place. So this must be a simple audio embedding that survives low-bandwidth VHS.
Thank you, Databits! (Please go subscribe to his excellent retro-tech YouTube channel)
Intrigued by Databits’s video, I searched YouTube and found several complete TV Teddy video shows, choosing this one for closer analysis:
First of all, let me point out that this video has been recrafted for widescreen by YouTuber Brandon Hargus. Brandon has added the widescreen blue background to make it easier on the eye playing back on today’s widescreen devices. The original 4:3 ratio VHS tape recording is centred in the middle of the frame.
Playing back this video, what immediately jumped out at me was a strange wavering grey vertical bar to the left of the video picture. As you play the video, you see that, quite often, this grey bar will start showing lighter and darker ‘wave’ patterns in it. Whenever the on-screen actor pauses, the waves become visible. He must be pausing to allow TV Teddy to speak in their conversation. Those waves are apparently from an audio signal embedded to the left of the video picture.
Wouldn’t those waves be visibly annoying to viewers? No. CRT TVs of the time were adjusted to ‘overscan’ the video picture so that it always filled the whole of the picture tube. This means that the audio track would fall beyond the left-hand edge of the TV screen and not be visible.
However, the TV Teddy box would be looking out for this signal and, if found, turn on its interaction light.
Let’s take a closer look at some screenshots of the 4:3 part of the picture:
In this first example, we know that nothing is being spoken by TV Teddy thanks to seeing the same start-up logo in Databits’s video. The result is a uniformly grey audio soundtrack:
Now the soundtrack is carrying a signal of some sort. Again, from Databits’s video we know TV Teddy is talking (well, singing) at this point. Note how the grey has changed to more of a black and white signal. This tells me that the audio is being carried as a range of brightness values between black and white. That’s what we see here:
Here’s a diagram illustrating this effect. Sound waves are converted to electrical signals by a microphone, creating an electrical waveform changing between positive and negative values, centred on zero. TV Teddy’s designers have taken this waveform and mapped it to a visible greyscale representation of the signal. The more positive this waveform becomes, the whiter its representation in this VHS audio track. The more negative, the blacker. At the point of no signal, it’s grey. If you track your eye from top to bottom of the image above, you can see it changing from black through greys to white and back again over time, representing positive and negative waveform values.
The diagram below shows my representation of that mapping: Each different position in the audio waveform has a lighter or darker grey assigned to it; lighter for positive waveform values, darker for negative waveform values, and mid-grey for 0 values.
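To make that mapping concrete, here is a minimal Python sketch of how a waveform value could be turned into a grey level and back again. It assumes a simple linear mapping onto 8-bit grey levels (0 for black, 255 for white), which is my guess from the screenshots rather than anything confirmed about the real hardware. The function names `sample_to_grey` and `grey_to_sample` are purely illustrative.

```python
def sample_to_grey(sample: float) -> int:
    """Map a waveform value in [-1.0, 1.0] to a grey level in [0, 255].

    Assumed linear mapping: -1.0 is black, 0.0 is mid-grey, +1.0 is white.
    """
    clamped = max(-1.0, min(1.0, sample))
    # Shift to the range 0..2, scale to 0..255, then round to nearest.
    return int((clamped + 1.0) * 127.5 + 0.5)


def grey_to_sample(grey: int) -> float:
    """Inverse of the assumed mapping: grey level back to a waveform value."""
    return grey / 127.5 - 1.0


if __name__ == "__main__":
    for value in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(value, "->", sample_to_grey(value))
```

Silence (a waveform value of 0) lands on mid-grey, which matches the uniformly grey bar seen when TV Teddy is not speaking.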
In the still frame below is a more complicated signal representing a frame of rapid conversation by TV Teddy. Here I can see why the designers chose to store the signal vertically rather than horizontally. VHS tapes have a vertical resolution as good as the original content – it’s the horizontal resolution that requires more bandwidth and thus needs filtering down. If the designers had stored the audio horizontally, the same filtering would have softened the whites and blacks, costing the audio some of its ‘treble’ and volume. It would also have been harder for the box to decode, as the audio would arrive at a very high speed (within one line rather than spread over the duration of one frame):
Let’s take a look at these pictures using my video editor’s luma (brightness) scope in this animated GIF I’ve created from screenshots of Final Cut Pro X:
The scope shows the audio soundtrack at the start of each video line. At the start, it hovers just above the 50% mark with silent audio, then jumps up and down with audio present. All the TV Teddy box has to do is watch for the ‘line blanking’ signal telling the TV to start scanning the next line, then extract the luma value for the first part of each line. Looking at the 640 x 480 picture (the digital equivalent resolution of a US NTSC TV picture) the audio signal is occupying the first 20 pixels of each line (3% of each line).
A quick calculation: USA VHS pictures are 640 x 480 format and each field is half that height, so 480 / 2 = 240 lines per field. At 60 fields per second, that is 60 x 240 = 14,400 lines per second to sample. Thanks to the Nyquist frequency rule, the audio bandwidth can be no more than half that value, so 14,400 / 2 = 7,200. That’s a maximum audio frequency of about 7 kHz: not great for music, but fine for voice.
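The same arithmetic can be written out as a few lines of Python, which makes it easy to try other assumptions. The constants are the figures used in this post (60 fields per second, 480 picture lines, a 20-pixel strip in a 640-pixel line). The variable names are my own.

```python
FIELDS_PER_SECOND = 60       # figure used in this post for US video
PICTURE_LINES = 480          # lines in the digitised picture
PICTURE_WIDTH_PX = 640       # pixels per line in the digitised picture
STRIP_WIDTH_PX = 20          # estimated width of the audio strip

lines_per_field = PICTURE_LINES // 2                      # each field is half a frame
samples_per_second = FIELDS_PER_SECOND * lines_per_field  # one audio sample per line
nyquist_hz = samples_per_second / 2                       # highest recoverable frequency
strip_fraction = STRIP_WIDTH_PX / PICTURE_WIDTH_PX        # share of each line used

print(f"{lines_per_field} lines per field")
print(f"{samples_per_second} samples per second")
print(f"{nyquist_hz / 1000:.1f} kHz maximum audio frequency")
print(f"{strip_fraction:.1%} of each line carries audio")
```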
So what have I guessed so far?
- The audio is carried as a form of greyscale representation of the audio waveform at the start of each TV line for a width of 20 equivalent pixels.
- The audio bandwidth has a maximum frequency of 7 kHz, perfect for voice audio.
How might I decode this signal?
- Read the downloaded video file frame by frame.
- Loop through each of the 480 rows in each frame, sampling the pixel 10 pixels in from the left.
- Read the RGB value for each pixel and take the ‘G’ value as the soundtrack is greyscale (I could also take the R or the B values instead as they should be the same values).
- Subtract 127 (half of 255) from that value (a single 8-bit number between 0 and 255) so that the waveform is between +127 and -127, centred on 0; otherwise the waveform would be all positive values with no negative ones.
- Store the result in a WAV file set to play at 14,400 samples per second.
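The steps above can be sketched in Python using only the standard library. This is an untested-against-real-footage sketch of my guess, not a confirmed decoder. It assumes each frame has already been decoded by some video library (not shown) into a list of rows, where each row is a list of `(R, G, B)` tuples. The names `frame_to_samples`, `write_wav`, `SAMPLE_RATE` and `COLUMN_X` are my own. Because 8-bit WAV files store unsigned samples, the sketch scales the centred values up and writes 16-bit signed samples instead. It also ignores interlacing, treating the rows of each frame as one continuous run of samples.

```python
import struct
import wave

SAMPLE_RATE = 14_400  # estimated above: 60 fields x 240 lines
COLUMN_X = 10         # sample 10 pixels in from the left edge


def frame_to_samples(frame, column_x=COLUMN_X):
    """Return one sample per picture row, centred on 0 (range -127..+127)."""
    samples = []
    for row in frame:
        _red, green, _blue = row[column_x]  # greyscale, so any channel will do
        centred = green - 127               # mid-grey becomes zero
        samples.append(max(-127, min(127, centred)))
    return samples


def write_wav(target, frames, sample_rate=SAMPLE_RATE):
    """Write the samples from every frame to a mono 16-bit WAV file.

    `target` may be a file name or a writable, seekable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit signed PCM
        wav.setframerate(sample_rate)
        for frame in frames:
            samples = frame_to_samples(frame)
            # Scale -127..127 up to roughly the full 16-bit range.
            scaled = [s * 256 for s in samples]
            wav.writeframes(struct.pack(f"<{len(scaled)}h", *scaled))
```

With a hypothetical `frames` iterable in hand, `write_wav("teddy.wav", frames)` would produce a file to listen to. If the result sounds garbled, the field order and the choice of sample column are the first assumptions I would revisit.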
This is a great example of retro-toy technologies spearheading great innovation.
But can I decode it and recover the audio? Let’s find out in part two!