
Decoding TV Teddy – Part Three: Encoding a new TV Teddy Video

In part two I successfully decoded the TV Teddy audio embedded in a TV Teddy video, and I’ve watched the sync’d playback with the original video.

In performing the decoding I learned that:

  • The TV Teddy audio track is created by mapping the audio waveform to shades of grey
  • The audio sample rate (which I’ve loosely been calling the bit rate) can be only as high as (number of lines in the frame) x (frames per second), giving an audio bandwidth of about 7 kHz
  • The TV Teddy recording has to have its pitch halved, as the TV Teddy hardware appears to double it on playback, recovering the high frequencies of a child’s voice that could not easily fit into the original audio bandwidth.

So.. surely it would be possible to actually use this knowledge to create a new TV Teddy video?

First I needed a source video that would be ideal for TV Teddy to dialogue with. Given that Databits had inspired me in the first place, TV Teddy would be made to converse with Databits! I returned to his YouTube channel and chose this video:

I decided to work with the first 2m40s of this video – just enough to prove the point and see if it worked out.

I downloaded the video onto my computer and imported it into Final Cut Pro X, then recorded a voiceover track pretending to be TV Teddy reacting to Databits. As the original video is in widescreen HD, I needed to reduce its size to create a 4:3 ‘letterbox’ version. I then edited my voiceover into individual elements and shifted them around to sync at the correct points in the video. In the screenshot below you can see the original Databits video and audio track, and below it my TV Teddy voiceover elements:

The next step was to silence the original audio, leaving the TV Teddy voiceover at full volume, then get Final Cut Pro X to export just the audio as a 48 kHz 16-bit WAV file, its default audio-only export setting. Here’s the result, compressed to MP3 for your own bandwidth convenience. At the risk that this is going to be a bit creepy sounding to you (it is to me!), this is me channelling my inner five-year-old child voice, which you can sync with the beginning of the Databits video at the top of this article (after any pre-roll commercial):

Next, it’s time to ‘up-the-creepy’ by changing my voice timbre to that of a child, accomplished by adjusting the voice changer setting in Apple’s Garageband music editing application. This is the finished ‘TV Teddy’ sound which you can sync with the Databits video as before. Brace yourself..!

In part two I discovered that the TV Teddy audio track has its pitch halved to fit into the audio bandwidth available. Given that, anecdotally, the TV Teddy playback hardware is doubling the pitch, I need to halve the pitch on my TV Teddy recording, so I imported the audio file into Audacity and applied a ‘Change Pitch’ of -50%, resulting in this audio:

Nearly there, but now I need to do two things – change the sample rate to fit into the TV Teddy audio track, then export the audio as an ‘Unsigned 8-bit’ PCM WAV file – both accomplished by Audacity. That export will give me a WAV file with sample values between 0 and 255 – perfect for simple encoding of the appropriate level of grey from black (0) to white (255) for each sample.
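To make the ‘Unsigned 8-bit’ step concrete, here is a minimal Python 3 sketch of the kind of conversion involved, assuming a signed 16-bit source sample. The function name is my own, and Audacity’s exporter may well apply dither or rounding that this ignores:

```python
def to_unsigned_8bit(sample_16bit):
    """Map a signed 16-bit sample (-32768..32767) to an unsigned 8-bit value (0..255)."""
    if not -32768 <= sample_16bit <= 32767:
        raise ValueError("sample out of 16-bit range")
    # Shift the range up so it starts at zero, then keep only the top 8 bits.
    return (sample_16bit + 32768) >> 8


# The quietest possible sample becomes black, silence becomes roughly mid grey,
# and the loudest possible sample becomes white.
for s in (-32768, 0, 32767):
    print(s, "->", to_unsigned_8bit(s))
```

Each resulting value can be used directly as a grey level for one video line.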

I need to calculate the sample rate. I’m going to be encoding a 29.97 frames-per-second NTSC video with 480 lines. In order to stay away from the top and bottom of the frame where the VHS switching-heads might distort the picture, I’m going to start at line 40 and stop just before line 440, using 400 lines in all. So the sample rate I need to change the audio to is (440 – 40) * 29.97 = 11988 samples per second (11988 Hz). Using the Nyquist rule I mentioned in part 2, the maximum audio bandwidth this time is reduced to less than 6 kHz, so now we are at ‘AM radio’ quality.
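The arithmetic can be checked with a few lines of Python 3. This is only a sketch of the calculation described above; the constant and function names are my own:

```python
FRAMES_PER_SECOND = 29.97   # NTSC frame rate quoted above
FIRST_AUDIO_LINE = 40       # first line carrying a sample
END_AUDIO_LINE = 440        # the line just after the last one carrying a sample


def samples_per_second(first_line, end_line, fps):
    """One sample per video line, so the rate is lines-per-frame times frames-per-second."""
    return (end_line - first_line) * fps


def nyquist_limit(sample_rate):
    """Highest audio frequency that a given sample rate can represent."""
    return sample_rate / 2.0


rate = samples_per_second(FIRST_AUDIO_LINE, END_AUDIO_LINE, FRAMES_PER_SECOND)
print(round(rate), "samples per second")
print(round(nyquist_limit(rate)), "Hz maximum audio frequency")
```

Widening the range of lines used would raise the rate, at the cost of straying into the parts of the frame I’m trying to avoid.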

In Audacity I change the sample rate to 11988 Hz and save as ‘8-bit unsigned PCM’ in WAV format. Here’s how it sounds now – this is the actual WAV file as it’s not that large any more:

So I now run some Python code to embed the audio track into the video frames. The code will:

  1. Open the TV Teddy 11988 Hz WAV file.
  2. Read all the samples from the TV Teddy 11988 Hz WAV file into an array.
  3. Open the video file for input.
  4. Open a new output video file which will have the TV Teddy track embedded.
  5. Load the next frame into an array which will be of size 640 pixels x 480 lines x 3 colours (R, G and B) and do this:
    1. Loop over lines 0 to 39, turning pixels 5 to 30 of each line grey with value 127 using the method in the next step.
    2. Loop over lines 40 to 439, using one audio sample per line, and for each line do this:
      1. Loop between pixel 5 and 30 within each line, and do this:
        1. Loop between the R, G and B colour elements and do this:
          1. Take the current WAV audio sample value and apply it to this colour value in the array element[line, pixel, colour]
      2. Move on to the ‘next’ WAV audio sample for the following line.
    3. Loop over lines 440 to 479, turning pixels 5 to 30 of each line grey with value 127 using the method in the previous step.
    4. Save the frame to the output video.
  6. Close input and output files.
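Before looking at the full listing, it may help to see where any given audio sample ends up. The Python 3 sketch below assumes the layout described in the steps above (400 samples per frame, starting at line 40); the function names are hypothetical and are not part of the program that follows:

```python
SAMPLES_PER_FRAME = 400   # lines 40 to 439 inclusive, one sample per line
FIRST_AUDIO_LINE = 40
SAMPLE_RATE = 11988       # samples per second


def sample_position(sample_index):
    """Return (frame number, line number) where a given audio sample is drawn."""
    frame, offset = divmod(sample_index, SAMPLES_PER_FRAME)
    return frame, FIRST_AUDIO_LINE + offset


def frames_needed(total_samples):
    """Number of video frames needed to hold every sample (the last may be part-filled)."""
    return -(-total_samples // SAMPLES_PER_FRAME)  # ceiling division


print(sample_position(0))      # first sample: frame 0, line 40
print(sample_position(399))    # last sample of the first frame
print(sample_position(400))    # first sample of the second frame
print(frames_needed(160 * SAMPLE_RATE), "frames for 2m40s of audio")
```

Because the sample index is the only state carried from frame to frame, the encoder can simply keep one running counter.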

For patent reasons explained in part two, the code below is provided purely for your academic interest and self-education in Python programming.

import cv2
import array
import wave
from binascii import hexlify

print 'TV Teddy Audio Combiner - CV2 version = ', cv2.__version__

incomingTVTeddyAudio = 'media/TVTeddyVoice_11988_8bitUnsignedPCM.wav'
incomingVideoFile = 'media/Databits TV Teddy.mp4'
outgoingTVTeddyVideo = 'media/Databits TV Teddy Output.mp4'

frameCount = 0
success = True
currentSample = 0
wavSampleArray = array.array('i')

audioLineHorizLeftmostPixel = 5
audioLineHorizrightmostPixel = 30
audioLineModulationStartLine = 40
audioLineModulationEndLine = 439
audioLineCentrePixelToRead = 170

# Read in the WAV file to an array
print 'Reading TV Teddy audio WAV file', incomingTVTeddyAudio
tvTeddyAudioInputFile = wave.open(incomingTVTeddyAudio, 'r')
wavLength = tvTeddyAudioInputFile.getnframes()
for sample in range (0, wavLength):
    wavSampleArray.append(int(hexlify(tvTeddyAudioInputFile.readframes(1)), 16))
tvTeddyAudioInputFile.close()
lengthWavSampleArray = len(wavSampleArray)

# Open the incoming video file
print 'Reading incoming Video File', incomingVideoFile
vidcap = cv2.VideoCapture(incomingVideoFile)
if vidcap.isOpened():
    width = vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    fc = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
    print 'Width=', width, ', height=', height, ', fps=', fps, ', framecount=', fc
    # Open the outgoing video file:
    print 'Writing TV Teddy Compatible Video File', outgoingTVTeddyVideo
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    # Write at the same frame rate as the incoming video so the timing stays in step
    out = cv2.VideoWriter(outgoingTVTeddyVideo, fourcc, fps, (640, 480), True)

    # A reminder that range(x, y) starts at value x but ends at the integer value 1 lower than y
    while success and wavLength > currentSample:
        # Read the next video image frame from the video file
        success, image = vidcap.read()
        if success:
            # Write the first 40 lines as blank medium grey
            # (this could be where a VHS rotary-heads switching moment may occur)
            for lineScan in range(0, audioLineModulationStartLine):
                for lineWidth in range(audioLineHorizLeftmostPixel, audioLineHorizrightmostPixel + 1):
                    for rgb in range(0, 3):
                        image[lineScan, lineWidth, rgb] = 127

            # Now write the sample content down to almost the bottom of the frame
            # (audioLineModulationEndLine is inclusive, hence the + 1, giving 400 lines per frame)
            for lineScan in range(audioLineModulationStartLine, audioLineModulationEndLine + 1):
                for lineWidth in range(audioLineHorizLeftmostPixel, audioLineHorizrightmostPixel + 1):
                    for rgb in range(0, 3):
                        # write the value between 0 and 255 from the wav sample into the video picture
                        # for all three RGB values, across pixels 5 to 30 of every line from 40 to 439:
                        if currentSample < lengthWavSampleArray:
                            image[lineScan, lineWidth, rgb] = wavSampleArray[currentSample]
                        else:
                            # We've reached the end of the TV Teddy audio before the end of the video
                            # so we'll just fill out the remaining video time with mid value 127
                            image[lineScan, lineWidth, rgb] = 127
                # Increment the sample indexing variable so the next line must get the next sample value
                currentSample += 1

            # Write the last 40 lines as blank medium grey
            # (this could be where a VHS rotary-heads switching moment may occur, too)
            for lineScan in range(audioLineModulationEndLine + 1, 480):
                for lineWidth in range(audioLineHorizLeftmostPixel, audioLineHorizrightmostPixel + 1):
                    for rgb in range(0, 3):
                        image[lineScan, lineWidth, rgb] = 127

            out.write(image)

            frameCount += 1
            if frameCount % 1000 == 0:
                print '  Samples written so far =', currentSample, ', Video frames written so far =', frameCount

        else:
            continue

    print 'Closing files'
    out.release()
    vidcap.release()
    cv2.destroyAllWindows()
else:
    print 'Failed to open input video file'

print 'Total samples written =', currentSample, ', Total video frames written =', frameCount

print 'Completed'

The resulting video now has the TV Teddy audio track embedded as you can see from this frame:

I’m not far from finishing the video, but I now need to restore the Databits main audio track. I import this video into Final Cut Pro X and copy/paste the audio from the original video. I also add the TV Teddy track to ensure that it looks and sounds synchronised before I silence it and export the finished video. Job done! Of course I could have researched how to use the CV2 python library to ‘pass through’ the original audio track. I will look into this.
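As far as I know, OpenCV’s VideoWriter only writes pictures, so passing the audio through would need another tool. One option is to let ffmpeg combine the two files. The Python 3 sketch below only builds the command; it assumes ffmpeg is installed, and the function name and file names are hypothetical:

```python
import subprocess


def build_mux_command(teddy_video, original_video, output_video):
    """Build an ffmpeg command that keeps the encoded picture and borrows the original audio."""
    return [
        "ffmpeg",
        "-i", teddy_video,       # input 0: video carrying the grey audio strip
        "-i", original_video,    # input 1: file holding the original soundtrack
        "-map", "0:v:0",         # take the picture from input 0
        "-map", "1:a:0",         # take the audio from input 1
        "-c:v", "copy",          # copy the picture as-is so the strip is not re-encoded
        "-c:a", "aac",           # encode the audio as AAC for the MP4 container
        "-shortest",             # stop at the end of the shorter stream
        output_video,
    ]


command = build_mux_command("teddy_output.mp4", "original.mp4", "teddy_with_audio.mp4")
print(" ".join(command))
# Uncomment to actually run it (requires ffmpeg on the PATH):
# subprocess.run(command, check=True)
```

Copying the video stream rather than re-encoding it matters here, because any extra compression pass would further degrade the grey levels that carry the TV Teddy audio.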

Now it’s a case of testing the encoding by using a modified version of my original decoding program designed to work with an NTSC 640×480 video with TV Teddy audio between lines 40 and 440.
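The idea of the test can be shown in miniature without any video files. This Python 3 sketch paints samples into a stand-in frame made of nested lists and reads them back, assuming the same line and pixel positions as the programs in this article. The helper names are my own, and a real video codec would not return the values this exactly:

```python
FIRST_AUDIO_LINE = 40
LAST_AUDIO_LINE = 439          # inclusive
LEFT_PIXEL, RIGHT_PIXEL = 5, 30
READ_PIXEL = 20                # column the decoder samples
WIDTH, HEIGHT = 64, 480        # narrow stand-in frame to keep the sketch fast


def blank_frame():
    """A HEIGHT x WIDTH frame of black [R, G, B] pixels as nested lists."""
    return [[[0, 0, 0] for _ in range(WIDTH)] for _ in range(HEIGHT)]


def embed(frame, samples):
    """Paint one 0..255 sample per line as a grey strip."""
    for offset, value in enumerate(samples):
        line = FIRST_AUDIO_LINE + offset
        for pixel in range(LEFT_PIXEL, RIGHT_PIXEL + 1):
            frame[line][pixel] = [value, value, value]


def extract(frame):
    """Read the strip back by averaging the three colour values at one pixel per line."""
    return [sum(frame[line][READ_PIXEL]) // 3
            for line in range(FIRST_AUDIO_LINE, LAST_AUDIO_LINE + 1)]


samples = [(i * 7) % 256 for i in range(LAST_AUDIO_LINE - FIRST_AUDIO_LINE + 1)]
frame = blank_frame()
embed(frame, samples)
print("round trip ok:", extract(frame) == samples)
```

The listing below sums the three colour values instead of averaging them and sorts out the scale later when it normalises, but the principle is the same: encoder and decoder must agree on exactly which lines carry samples.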

For patent reasons explained in part two, the code below is provided purely for your academic interest and self-education in Python programming.

import cv2
import array
import wave

print 'TV Teddy Audio Extractor - CV2 version = ', cv2.__version__

incomingTVTeddyVideo = 'media/Databits TV Teddy Output.mp4'
outgoingTVTeddyAudio = 'media/testresultsout.wav'

frameCount = 0
success = True
sampleArray = array.array('h')
pixelValue = 0
audioLineModulationStartLine = 40
audioLineModulationEndLine = 439
audioLineCentrePixelToRead = 20

# Open incoming video file with TV Teddy soundtrack embedded
vidcap = cv2.VideoCapture(incomingTVTeddyVideo)

if vidcap.isOpened():
    width = vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)  # float
    height = vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # float
    fps = vidcap.get(cv2.CAP_PROP_FPS)
    frameCount = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))

    print 'Incoming Video: Width=', width, ', height=', height, ', fps=', fps, ', framecount=', frameCount, ' name =',incomingTVTeddyVideo

    # A reminder that range(x, y) starts at value x but ends at the integer value 1 lower than y
    for currentFrame in range(0, frameCount + 1):
        success, image = vidcap.read()

        if success:
            for scanLine in range(audioLineModulationStartLine, audioLineModulationEndLine + 1):
                sampleValue = 0
                for rgb in range(0, 3):
                    sampleValue += int(image[scanLine, audioLineCentrePixelToRead, rgb])
                sampleArray.append(sampleValue)

        else:
            print 'Failed to read frame', currentFrame

        if currentFrame % 1000 == 0:
            print 'count of frames so far =', currentFrame
            print 'count of samples so far =', len(sampleArray)


print 'Total count of frames =', frameCount
print 'Total count of samples =', len(sampleArray)

print 'Normalising extracted audio...'
# Find the sum of the sample values and the minimum & maximum sample value
sumSampleSize = 0
maxSampleValue = sampleArray[0]
minSampleValue = sampleArray[0]
for sampleIndex in range(0, len(sampleArray)):
    sumSampleSize += sampleArray[sampleIndex]
    if maxSampleValue < sampleArray[sampleIndex]:
        maxSampleValue = sampleArray[sampleIndex]
    if minSampleValue > sampleArray[sampleIndex]:
        minSampleValue = sampleArray[sampleIndex]

# Calculate mean average sample size
meanSampleSize = int(sumSampleSize / len(sampleArray))

# Now alter the sample values to become rebalanced around zero based
# on the mean sample size, and amplified by multiplying the samples
# based on the amplifyValue. The largest swing above or below the mean
# is scaled to 24000 so nothing overflows a signed 16-bit sample:
peakDeviation = max(maxSampleValue - meanSampleSize, meanSampleSize - minSampleValue, 1)
amplifyValue = 24000.0 / peakDeviation
for sampleIndex in range(0, len(sampleArray)):
    sampleArray[sampleIndex] = int((sampleArray[sampleIndex] - meanSampleSize) * amplifyValue)

print 'Writing WAV file with', len(sampleArray), "samples to", outgoingTVTeddyAudio

wavSampleRate = 11988
wavSampleSize = 2
wavChannels = 1
f = wave.open(outgoingTVTeddyAudio, 'w')
f.setparams((wavChannels, wavSampleSize, wavSampleRate, len(sampleArray), "NONE", "Uncompressed"))
f.writeframes(sampleArray.tostring())
f.close()


print 'Completed!'
print 'Now use an audio application such as Audacity (free) to read the output WAV file and' \
      ' increase the pitch by 100% (i.e. double it)'

Here is how the extract sounds – I’ve used Audacity’s ‘Change Pitch’ to alter it by 100% to double it, so this is how it should sound through TV Teddy hardware, although the playback medium (VHS, DVD for example) may cause some further loss of fidelity:

Here you can compare the quality (above) with how it would have sounded as a ‘pure’ 8-bit PCM file, without being pitch-shifted down, recorded onto video as a greyscale track, then extracted from the greyscale on playback and pitch-shifted back up again. So this ‘pure track’ below represents the technically best sound that could ever have been achieved:

I’ve sent the final video to Databits who will test it through his real TV Teddy bear and, if it works, make a video of the results. If it doesn’t work we’ll work together to find out what’s not quite right and fix it. Any such changes I’ll document here.

Footnote: The low bit rate nature of encoding audio onto video means there are limitations to the sound quality because of the limited bandwidth. But could we do better using the precision of software today? Next, I’m going to create my own 2018 version of this technology and see if I can encode / decode a Hifi music track, and let you know in Part Four…

 

 
