What Happens When You Develop on Production Data

Warning: Perl programming ahead.

Here’s the story about how I got to do episode 3 of the be the story podcast all over again from scratch.

I had completed all the editing, and there is generally a lot of it. I spend at least as much time in post as I do recording. On the vocal tracks, if I flub a line, I just repeat it and then edit out the bad take in post. I also get rid of uh’s, double-clutches, and so forth, though there aren’t many anymore. Still, I frequently end up with awkward pauses while I piece together the words I want to say next. Occasionally I’ll forget to mention a point, record it at the end, and then cut and paste it into place. During editing, I also find lines that didn’t come out as well as I’d like, for whatever reason, so I rerecord them and paste them in.

I’m getting better. Episode 6 I knocked out in about 4 hours from concept to complete posting, including QA. Only about 2.5 hours were in recording and post. The “spotlight” episode #2 took a little over 2.5 hours total, not including the time it took me to watch every single episode of Firefly and the Serenity movie. But I took notes while I was watching, which made it easier to do the actual review. The “spotlight” episodes are only about 5 minutes long, but they are also faster paced, and there’s never enough time to say all that wants to be said. So it takes a disproportionate amount of time to produce them.

But back at episode 3, I had done hours of preproduction, recording, and editing, distributed over a week. I had lined up all the spots and music segments in order. All I had to do was run make_podcast to assemble all the pieces into the final podcast.

Now, make_podcast is a small Perl program I wrote that uses ecasound and other Unix tools to do the actual audio manipulation. It finds the wav files for the segments, or “parts,” and the spots, and the bumper and theme music. Each episode has its own directory, and all the episode wav files are in that directory, named according to a convention. Actually, the bumper music, theme music, and spots are physically in a shared directory, and those files are symbolically linked into the individual episode directories. Anyhow, make_podcast finds all these files and assembles them into the final podcast episode.

But ecasound (or at least the version I have installed) has a bug. I run make_podcast from the episode directory, so make_podcast searches the current directory for files. It used to pass just the file names to ecasound, which also runs in the episode directory. But ecasound sometimes doesn’t pick up the wav files that way: given relative paths, it seems to find them only some of the time. Given absolute paths, it works fine. That is exactly what happened when I ran make_podcast on episode 3, so I decided to fix make_podcast so it wouldn’t trigger the ecasound bug.
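For anyone tempted to make the same kind of fix, here is a minimal sketch of turning bare file names into absolute paths before building an ecasound command line. It is not the actual make_podcast code: the helper name absolute_wavs and the assumption that every *.wav entry in the directory is wanted are mine, for illustration. It uses Cwd::getcwd and File::Spec->rel2abs from Perl’s core library.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: list the *.wav entries in $dir (including symlinks
# to shared files) and return them as absolute paths, sorted by name.
sub absolute_wavs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @wavs = sort { $a cmp $b } grep { /\.wav\z/ } readdir($dh);
    closedir($dh);
    # rel2abs resolves each name against $dir instead of relying on
    # whatever directory the child process happens to run in.
    return map { File::Spec->rel2abs($_, $dir) } @wavs;
}

# Example: print the absolute paths for the current directory.
my @inputs = absolute_wavs(getcwd());
print "$_\n" for @inputs;
```

The resulting paths can then be interpolated into the ecasound arguments in place of the bare names.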

There’s one other part to this story. If the output file already exists, ecasound writes into the existing file but doesn’t truncate it. I never want to record over only the first part of an existing wav output, so make_podcast deletes the file first. Actually, make_podcast uses system("rm -rf $file") to do this, so that even if $file is a directory or something, the command still does what you want.
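Clearing the output path doesn’t strictly need a shell. Here’s a sketch of the same job in pure Perl, assuming a hypothetical helper called clear_output (not something make_podcast contains). It unlinks a plain file or symlink and uses File::Path’s remove_tree for a real directory; a symlink that points at a directory is removed as a link, leaving its target alone.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Hypothetical helper: make sure nothing exists at $path before writing
# fresh output there. Returns true if the path is gone afterwards.
sub clear_output {
    my ($path) = @_;
    if (-d $path && !-l $path) {
        # A real directory: delete it and everything under it.
        remove_tree($path);
    }
    elsif (-e $path || -l $path) {
        # A plain file, or a symlink (even a dangling one): unlink it.
        unlink($path) or die "Cannot remove $path: $!";
    }
    return !(-e $path || -l $path);
}
```

Usage would be along the lines of clear_output("podcast.wav") or die, just before running the ecasound command.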

So make_podcast did something like the following (simplified for illustration):

system("rm -rf podcast.wav");
system($ecasound_command); # which outputs to podcast.wav

To sidestep the ecasound bug, I changed to absolute paths instead of relative paths:

my $srcdir = `pwd`;
system("rm -rf $srcdir/podcast.wav");
system($ecasound_command); # which outputs to $srcdir/podcast.wav

Do you see the bug in the above code?

Of course, I have all of my episode files in the episode directory. None of them are under revision control; revision control wouldn’t handle the wav files anyhow. And I don’t have backups of any of these files. And as a bona fide software professional, I’m running untested code against live data. (That was sarcasm, by the way.) I have no automated tests for this; they’d be hard to pull off here, and hard to justify for a program that almost never changes. But I’m not even running the development code in a temporary test directory. That was the truly stupid part.

I run make_podcast again, and I see that all the file names are wrong. They just refer to the directory; they’re missing the filename portion of the path. Ugh. I forgot to chomp($srcdir). I’m always forgetting to chomp. Okay. Now that should work. I run it again. Now I see all these messages about it not being able to find the various wav files. So I look at my shell window. Yup, I’m in the correct directory. I try ls, and it says something about the inode being invalid. I’ve seen messages like this over NFS, when the server and client get out of sync, usually because the server rebooted. But I’m not using NFS. Nonetheless, I cd to the parent directory and then try to cd back into the episode directory. It says the directory doesn’t exist. I ls the parent directory. Yup. It’s not there. What happened?
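A minimal illustration of the chomp problem: backticks hand back pwd’s output exactly as printed, trailing newline included, while Cwd::getcwd from Perl’s core library returns the directory without one. (The two can still differ in another way if the current directory was reached through a symlink, since a shell’s pwd usually reports the logical path and getcwd the physical one.)

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Backticks capture the command's output verbatim, including the
# newline that pwd prints at the end of its line.
my $raw = `pwd`;
print "raw output ends with a newline\n" if $raw =~ /\n\z/;

# chomp removes that trailing newline (strictly, a trailing $/).
my $srcdir = $raw;
chomp($srcdir);
print "chomped: [$srcdir]\n";

# getcwd asks for the directory directly: no shell output, so no
# newline to forget about.
my $cwd = getcwd();
print "getcwd:  [$cwd]\n";
```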

All at once, my mind puts the pieces together. What happens when you do system("rm -rf $srcdir/podcast.wav") and there’s a linefeed at the end of $srcdir? Uh. The data has to be there somewhere, right? I must’ve had a copy, even one hours old, right? All the audio? All that work? I had the thing practically done. I even had the blog entry written. The only part I hadn’t done was to copy the blog entry over to the web site. I don’t even have the blog notes, do I? I don’t have any of my notes!
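To see why that linefeed is so destructive, this sketch rebuilds the command string with a made-up episode path (/home/me/podcast/episode3 is hypothetical) and splits it the way a POSIX shell treats an unquoted newline: as a command separator, like a semicolon. It only prints the pieces; nothing is executed.

```perl
use strict;
use warnings;

# Hypothetical episode directory, with the newline that `pwd` leaves behind.
my $srcdir = "/home/me/podcast/episode3\n";
my $cmd    = "rm -rf $srcdir/podcast.wav";

# The shell would run each line as a separate command. This only prints them.
my @commands = split /\n/, $cmd;
print "shell would run: $_\n" for @commands;
# shell would run: rm -rf /home/me/podcast/episode3
# shell would run: /podcast.wav
```

The first command removes the whole episode directory; the second just fails. Passing the arguments as a list, as in system('rm', '-rf', '--', "$srcdir/podcast.wav"), makes Perl run rm directly without a shell, so a stray newline would end up inside a single (nonexistent) file name instead of splitting the command in two.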

I need to take a walk, take a break. Then, panic. It took me all week to put this together. It’s Saturday night at 10 PM, and I need to have this done by tomorrow. Well, maybe it won’t be so bad. If I do it now, I bet I can remember all the points and the order in which I made them; I just spent endless hours getting that right. I can still hardly breathe, and my heart is thumping in my chest.

Obviously, I did get it done, and on time. But now I make a backup copy of my working directory at each milestone. And I always develop code on temporary data, or at least on a backup copy of the data.
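For the milestone backups, something this small would do. It’s a sketch rather than a script from my actual setup: snapshot is a hypothetical helper that copies an episode directory to a timestamped sibling (for example episode3.bak-20240101-221500) using cp -a, which in both GNU and BSD cp copies symlinks as symlinks, so the links to the shared music and spots stay links.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: copy $dir to "$dir.bak-YYYYMMDD-HHMMSS" and
# return the new path. Dies rather than overwrite an existing copy.
sub snapshot {
    my ($dir) = @_;
    $dir =~ s{/+\z}{};    # drop any trailing slashes
    my $stamp = strftime('%Y%m%d-%H%M%S', localtime);
    my $dest  = "$dir.bak-$stamp";
    die "$dest already exists\n" if -e $dest;
    # List form: cp runs without a shell, so odd characters in $dir
    # (a stray newline, say) can't turn into extra commands.
    system('cp', '-a', '--', $dir, $dest) == 0
        or die "cp failed for $dir: exit status $?\n";
    return $dest;
}
```

The same copy doubles as scratch data: point the development version of a script at the snapshot, and a destructive bug only eats a copy.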
