Decoding TV Teddy – Part Two: Programming and Audio Output

As a purely fun and academic exercise, I’m going to attempt to decode the TV Teddy audio track embedded in a TV Teddy video programme and output the audio as a separate file. I’ll then try to play back both the audio file and the YouTube video in sync, so I can enjoy this particular TV Teddy episode with full dialogue for the first time.

I’ll be using the Python 2.7 programming language for this project, so I’ve started a new project in my favourite Python development environment, JetBrains PyCharm, which has a free Community Edition as well as a commercial edition.

I’ll also need to install the OpenCV computer vision library (which can read MP4 video files) using:

pip install opencv-python

Next, I must download the video file of an entire TV Teddy tape from this YouTube page:

…using Parallels Toolbox and its ‘Download Video’ app (other ways of downloading from YouTube are possible – make sure you don’t violate YouTube terms and conditions!).

I’ve renamed the file to make it easier to access, and created a folder called ‘media’ in the project, saving the file there as ‘media/TVTeddyShiningTimeStationSweetandSourEpisode26.mp4’.

To get started I’ve used the working Python OpenCV coding example at: https://stackoverflow.com/questions/33311153/python-extracting-and-saving-video-frames and adapted it thus:

import cv2
print 'TV Teddy Audio Extractor - CV2 version = ', cv2.__version__
vidcap = cv2.VideoCapture('media/TVTeddyShiningTimeStationSweetandSourEpisode26.mp4')
count = 0
while True:
    success, image = vidcap.read()
    if not success:
        break  # no more frames (or the file could not be read)
    count += 1
    if count % 1000 == 0:
        print 'count of frames so far =', count
        cv2.imwrite("media/frame%d.jpg" % count, image)  # save frame as JPEG file

vidcap.release()
print 'total count of frames =', count

In the above code, I am simply reading through the video file frame by frame. Every 1,000 frames, it prints the running frame count and saves that frame as a JPEG image.

I ran the code for a few seconds and made it save a few frames to make sure it was working. Frame 1000 looks like this:

From this frame I note:

  • This video is in 1280 x 720 format.
  • Looking at the embedded audio track in an image editor, I found that its centre is 170 pixels in from the left.
  • The top of the grey soundtrack starts at vertical pixel 16, and the bottom ends at vertical pixel 700. Beyond those extremes there appear to be distortions at the start and end of the frame, probably caused by the VHS player switching between its rotary heads as it reads the tape.
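To make those measurements concrete, here is a minimal sketch of reading one frame's column of audio samples. The function `extract_column_samples` is a hypothetical helper, not part of OpenCV. It assumes the frame can be indexed as `frame[y][x][channel]`. A NumPy image returned by OpenCV's `vidcap.read()` also supports this chained indexing. The defaults of x = 170 and lines 16 to 700 are the values measured above for this particular 1280 x 720 video.

```python
def extract_column_samples(frame, x=170, first_line=16, last_line=700):
    """Return one raw sample per video line from pixel column x.

    Each sample is the sum of the pixel's three colour channels (0 to 765).
    Lines first_line to last_line are read inclusively.
    Assumes frame is indexed frame[y][x][channel], which also works for an
    OpenCV/NumPy image.
    """
    samples = []
    for y in range(first_line, last_line + 1):
        pixel = frame[y][x]
        # int() avoids 8-bit wrap-around when the values are NumPy uint8s
        samples.append(int(pixel[0]) + int(pixel[1]) + int(pixel[2]))
    return samples


# Usage with a synthetic 720-line frame whose grey level rises line by line
frame = [[(y % 256,) * 3 for _ in range(171)] for y in range(720)]
samples = extract_column_samples(frame)
print(len(samples))  # 685
```

Reading both end lines inclusively gives 685 samples per frame. That is one more than the 700 - 16 = 684 lines used in the sample-rate arithmetic of the plan.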

So my plan is to:

  1. Read each frame, and do this:
    1. Set a loop running from value 16 to value 700 and, at each position from (170, 16) through to (170, 700), do this:
      1. Read the RGB values of the pixel, subtract 127 from each so that the waveform centres on 0 rather than on 127, and add the three results together. Each value is sampled between 0 and 255, so subtracting 127 shifts it into the range -127 to 128. ‘Silence’ (denoted by mid-grey rather than black or white) is then recorded at 0 and not at 127. Summing the three R, G and B values helps preserve any subtle changes in greyness from sample to sample that survive digitisation. I’ll save this value as a 16-bit signed integer.
      2. Append the value to the end of an array, which will later be saved to the WAV file.
  2. Normalise the array of audio samples; that is, ‘amplify’ the sample values and re-centre them around the ‘0’ centre line.
  3. Save the array of samples in a WAV file with a 16-bit sample size. Its header requests playback at 30 frames per second x (700 - 16) = 684 lines per frame, which gives 20,520 samples per second (a 20.52 kHz sample rate).
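Steps 1.1, 2 and 3 of the plan can be sketched as three small functions. These are illustrative helpers with hypothetical names, not the final program:

* `centre_sample` applies the per-channel subtraction of 127.
* `normalise` removes any remaining offset and scales to a chosen peak. The default of 24,000 matches the program.
* `sample_rate` reproduces the arithmetic above.

Subtracting a constant and then removing the mean gives the same result as removing the mean alone. A program that normalises can therefore skip the subtraction of 127.

```python
def centre_sample(b, g, r):
    """Plan step 1.1: shift each 0-255 channel so mid-grey (127) becomes 0, then sum.

    The result ranges from -381 (black) to +384 (white).
    """
    return (b - 127) + (g - 127) + (r - 127)


def normalise(samples, target_peak=24000):
    """Plan step 2: re-centre on the mean and scale the largest swing to target_peak."""
    mean = sum(samples) / float(len(samples))
    centred = [s - mean for s in samples]
    peak = max(abs(s) for s in centred) or 1.0  # avoid dividing by zero on pure silence
    return [int(round(s * target_peak / peak)) for s in centred]


def sample_rate(fps, first_line, last_line, inclusive=False):
    """Plan step 3: one audio sample per video line per frame.

    inclusive=False reproduces the (700 - 16) arithmetic in the plan;
    inclusive=True counts both end lines, as range(16, 700 + 1) does.
    """
    lines_per_frame = last_line - first_line + (1 if inclusive else 0)
    return fps * lines_per_frame


print(centre_sample(127, 127, 127))              # 0
print(normalise([0, 10, 20]))                    # [-24000, 0, 24000]
print(sample_rate(30, 16, 700))                  # 20520
print(sample_rate(30, 16, 700, inclusive=True))  # 20550
```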

Now be warned about a patent: USA patent 5,808,869 “Method and apparatus for nesting secondary signals within a television signal” (and its international equivalents of the same name) owned by Shoot The Moon, who I have just discovered invented the TV Teddy technology in the first place. You may not be able to use this code – and certainly not in any commercial context – without their permission. Shoot The Moon have every right to earn from this patented idea until it expires. If you think you would enjoy creating new content compatible with TV Teddy, or decoding TV Teddy videos to use with other equipment, great! But contact Shoot The Moon via their website and agree some sort of licensing first. The code below is provided purely for your academic interest and self-education in Python programming.

import cv2
import array
import wave
import numpy as np

# A flag used to find out if the next video frame was read successfully
success = True

# Audio samples are counted into this variable:
currentSampleCount = 0

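# This is the output WAV file's sample rate that will be written into its header information.
# It is close to (slightly below) the nominal 30 fps x 684 lines = 20,520 Hz worked out in the plan.
wavSampleRate = 20483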
# Each WAV file sample is 2 bytes (16-bit)
wavSampleSize = 2

# The Wav file will have a single mono audio channel
wavChannels = 1

# An array that will store all samples in a 16-bit signed integer format
sampleArray = array.array('h')

# The top and bottom video lines in the video frame where we will measure the greyscale to get samples
audioLineModulationStartLine = 16
audioLineModulationEndLine = 700

# The horizontal position in the video frame where we will take the sample - ideally set to be in the centre
# of the greyscale audio line
audioLineCentrePixelToRead = 170

print 'TV Teddy Audio Extractor from YouTube 720p Source - CV2 version = ', cv2.__version__

# Open the video file
vidcap = cv2.VideoCapture('media/TVTeddyShiningTimeStationSweetandSourEpisode26.mp4')
if vidcap.isOpened():
    # Get some info on the file
    width = vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frameCount = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
    print 'Incoming Video: Width=', width, ', height=', height, ', fps=', fps, ', framecount=', frameCount

    # Process the file frame by frame
    for currentFrame in range(0, frameCount):
        success, image = vidcap.read()

        if success:
            # For the current frame, read the grey line and extract a sample from each pixel
            for scanLine in range(audioLineModulationStartLine, audioLineModulationEndLine + 1):
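                sampleValue = 0
                # OpenCV stores pixels in B, G, R order; the order doesn't matter as we sum all three
                for channel in range(0, 3):
                    sampleValue += int(image[scanLine, audioLineCentrePixelToRead, channel])
                sampleArray.append(sampleValue)
                currentSampleCount += 1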

        else:
            print 'Failed to read frame', currentFrame

        if currentFrame % 1000 == 0:
            print 'count of frames so far =', currentFrame, ' - ', int(currentFrame * 100 / frameCount), "%"

    # Close the incoming video file
    vidcap.release()
else:
    raise SystemExit('Could not open the video file')

print 'Total count of frames =', frameCount
print 'Total count of samples =', currentSampleCount

print 'Analysing extracted audio...'
# Find the sum of the sample values and the minimum & maximum sample values
sumSampleSize = 0
minSampleValue = sampleArray[0]
maxSampleValue = sampleArray[0]

for sampleIndex in range(0, len(sampleArray)):
    sumSampleSize += sampleArray[sampleIndex]
    if maxSampleValue < sampleArray[sampleIndex]:
        maxSampleValue = sampleArray[sampleIndex]
    if minSampleValue > sampleArray[sampleIndex]:
        minSampleValue = sampleArray[sampleIndex]

# Calculate mean average sample size
meanSampleSize = int(sumSampleSize / len(sampleArray))

# Now alter the sample values to become rebalanced around zero based
# on the mean sample size, and amplified by multiplying the samples
# based on the amplifyValue - a process called 'normalisation'
print 'Normalising....'
# Scale by the larger of the positive and negative peaks so no sample exceeds +/-24000,
# keeping every value inside the signed 16-bit range (-32768 to 32767)
peakDeviation = max(maxSampleValue - meanSampleSize, meanSampleSize - minSampleValue, 1)
amplifyValue = 24000.0 / peakDeviation
for sampleIndex in range(0, len(sampleArray)):
    sampleArray[sampleIndex] = int((sampleArray[sampleIndex] - meanSampleSize) * amplifyValue)

# Write the output WAV file
print 'Writing WAV file...'
f = wave.open('media/output.wav', 'w')
f.setparams((wavChannels, wavSampleSize, wavSampleRate, len(sampleArray), "NONE", "Uncompressed"))
f.writeframes(sampleArray.tostring())  # Important to convert to string or only half the audio will be written out
f.close()

print 'Completed!'
print 'Now use an audio application such as Audacity (free) to read the output WAV file and' \
      ' increase the pitch by 100% (i.e. double it)'
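The program above targets Python 2.7, where `array.tostring()` is available. That method was removed in Python 3.9 in favour of `tobytes()`. For anyone following along in Python 3, here is a minimal sketch of the WAV-writing step. The function name `write_mono_wav` is hypothetical. It opens the output with `wave.open` in a `with` block, and it byte-swaps on big-endian machines. WAV stores 16-bit samples little-endian, while `array` uses the machine's native byte order.

```python
import array
import sys
import wave


def write_mono_wav(path, samples, sample_rate):
    """Write signed 16-bit mono samples to an uncompressed WAV file (Python 3)."""
    data = array.array('h', samples)  # 'h' = signed 16-bit integers, as in the program
    if sys.byteorder == 'big':
        data.byteswap()  # WAV sample data is little-endian
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())  # the header's frame count is filled in automatically
```

For example, `write_mono_wav('media/output.wav', sampleArray, wavSampleRate)` would replace the four `wave` lines at the end of the program.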

Let’s look and listen to the output:

The view from Audacity, the free audio editing application:

The sound is a little faint so I’ll use Audacity to boost the audio and save the opening minute as an MP3 so you can take a listen:

The sample rate seems just about spot on as the audio is keeping time when played back in sync with the video. But, if you watch the demonstration in Databits’s video, the audio pitch is a lot higher than here.
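One way to check the timing numerically is to compare how long the extracted audio lasts with how long the video lasts. In this minimal sketch, `sync_drift_seconds` and `matching_wav_rate` are hypothetical helpers. In practice, `frame_count` and `fps` would come from the values the program prints (`CAP_PROP_FRAME_COUNT` and `CAP_PROP_FPS`). `samples_per_frame` is the number of lines read per frame: 685 in the program, since it reads lines 16 to 700 inclusive.

```python
def sync_drift_seconds(frame_count, fps, samples_per_frame, wav_rate):
    """Audio duration minus video duration, in seconds.

    Positive means the audio lasts longer than the video, so the video gradually
    gets ahead; negative means the audio gets ahead.
    """
    video_seconds = frame_count / float(fps)
    audio_seconds = frame_count * samples_per_frame / float(wav_rate)
    return audio_seconds - video_seconds


def matching_wav_rate(fps, samples_per_frame):
    """The WAV sample rate at which audio and video durations would match exactly."""
    return fps * samples_per_frame


# Hypothetical example: 1,800 frames (one minute at 30 fps), 685 samples per frame
print(sync_drift_seconds(1800, 30, 685, 20483))
print(matching_wav_rate(30, 685))  # 20550
```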

This makes me wonder whether that’s something the tech in TV Teddy is doing. So, using Audacity, I’ve doubled the pitch while keeping the same speed. How does it sound now?

Spot on! So what’s happened is that, in order to stay within the limited bandwidth of this audio soundtrack, the designers have halved the pitch (but not speed) of the audio during recording, and either the TV Teddy box or TV Teddy receiver inside the bear takes the audio and doubles the pitch on playback to restore the child-like voice without needing extra audio bandwidth. All clever stuff – and remember this is all done with analogue circuits in the 1990s.
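Audacity's Change Pitch effect uses a considerably more sophisticated algorithm. However, the basic idea of raising pitch without changing duration can be shown with a crude granular approach:

1. Cut the audio into short grains.
2. Play each grain at double speed, which doubles its pitch but halves its length.
3. Play it twice to fill the original time.

The function `double_pitch_keep_length` is a hypothetical illustration of that technique. It is not how Audacity or the TV Teddy hardware actually does it, and the hard joins between grains will add some clicks of their own.

```python
import math


def double_pitch_keep_length(samples, grain=512):
    """Crude granular pitch doubler: output has the same length as the input.

    Each grain is decimated by 2 (every other sample), which doubles its pitch
    and halves its length, then repeated to restore the grain's length.
    """
    out = []
    for start in range(0, len(samples), grain):
        chunk = list(samples[start:start + grain])
        squeezed = chunk[::2]                            # 2x speed: 2x pitch, half length
        out.extend((squeezed + squeezed)[:len(chunk)])   # play twice, trim to fit
    return out


def zero_crossings(samples):
    """Count sign changes, a rough proxy for pitch in a pure tone."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)


# Usage: a 100 Hz tone at 8,000 samples per second, one second long
rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / rate + 0.1) for n in range(rate)]
shifted = double_pitch_keep_length(tone)
print(len(shifted) == len(tone))                               # True
print(zero_crossings(shifted) / float(zero_crossings(tone)))   # roughly 2
```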

The distortions in the audio most likely come from the digitisation of the VHS tape, where subtle differences between continuous analogue grey levels are lost when each video frame is sampled. Digital compression (including YouTube’s) will also have played its part in eroding this subtlety. The ‘buzzing’ effect on the voice is caused by the jump from one frame to the next. I’m guessing that the not-exactly-high-fidelity speaker inside TV Teddy’s body makes this effect less noticeable to its intended listeners. Alternatively, there may be a high-pass filter in the circuit blocking low frequencies before the pitch doubling takes place.
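The high-pass filter idea above is only a guess about TV Teddy's circuitry, but it is easy to try in software. Below is a minimal sketch of a first-order (RC-style) digital high-pass filter. The function `high_pass` is a hypothetical helper, and the 100 Hz cutoff is an arbitrary illustrative choice. A first-order filter only gently attenuates low frequencies, so it would soften the frame-rate buzz rather than remove it. A jump once per frame repeats about 30 times a second, and, like any sharp discontinuity, it also produces energy at many higher frequencies.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]).

    a = RC / (RC + dt), where RC = 1 / (2 * pi * cutoff_hz) and dt = 1 / sample_rate.
    Frequencies well below cutoff_hz are attenuated; those well above pass through.
    """
    if not samples:
        return []
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = []
    prev_x = samples[0]
    prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


# Usage: at the program's sample rate, a constant (DC) offset is removed entirely
filtered = high_pass([1000] * 50, 20483, 100)
print(max(abs(v) for v in filtered))  # 0.0
```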

OK, the program has completed execution and your audio file is thusly served.
Sync it with the video at the top of this blog post and enjoy! If necessary, briefly pause and resume the video if it gets slightly ahead, and likewise this audio track if it gets ahead:

Footnote: Now of course you’re going to ask: Would it be possible to create a video with a TV Teddy compatible soundtrack?

The answer is Yes! On to Part Three…