Duplo

Building Blocks & Learning Experiences

Update: due to wide-spread changes in the MPD git repository, this patch no longer applies and builds correctly.  You can still play with the shout MP3 plugin by cloning my shiny-new git repository at: http://git.musicpd.org/encoded/mpd.git See the comments section below for information about configuration changes.  The plugin is currently on the roadmap for the 0.14.0 release, so keep your fingers crossed.

I’ve updated my patch so that it can be applied against the current MPD git repository.  Hopefully it will be rolled into MPD for the 0.14.0 release.  The patch has been improved to do better configuration-time checking for the lame library.  I also found a bug where I wasn’t setting the bit rate properly if constant bit rate was requeted.

I’m hoping to roll this out to Mildred soon.

Here’s the link:  0001-Creation-of-shout_mp3-audio-output-plugin.-Basicall.patch

MPD is a great little project that collects, organizes, and plays your music files for you.  You can connect to MPD and listen to your music in a great number of ways.  You can even stream your music out as your own radio station (Mildred makes use of this functionality).

I’ve always thought it odd that MPD would stream in Ogg, but not MP3.  Don’t get me wrong, I’m a big fan of open standards and all, but MP3 is just so dominant, that I was surprised MP3 streaming hadn’t been implemented.  Add to that the stupid inability of Apple’s iTunes to stream Ogg and well, it motivated me.  I sat down and hacked together a MPD plugin that allows you to stream MP3 to your shoutcast server from MPD.

Source patches for the plugin are available here:  http://kill-0.com/patches/shout_mp3.patch

It’s rough around the edges, but it does work.  You can even play it in iTunes.

Enjoy!

Voyeur is a very small bit of ruby code that I’ve written to display updates to your MPD stream via Growl.

You can find it here:  http://kill-0.com/voyeur/

You start and stop it as you would a daemon, ie $ ./voyeur start and ./voyeur stop. As an added bonus, you can send it a HUP signal, and it will immediately display the current track information or status. Here’s a screen shot:

Voyeur: Screenshot

I was having all sorts of problems upgrading Mildred to Rails 2.1.  A lot of the errors I was seeing were like the following:

ArgumentError in 'TracksController downloading a track by admin should not be a redirection'
wrong number of arguments (0 for 1)
/Users/ewollesen/src/mildred/app/models/track.rb:52:in `title'
/Users/ewollesen/src/mildred/app/models/track.rb:52:in `filename'
/Users/ewollesen/src/mildred/app/controllers/tracks_controller.rb:49:in `download'
./spec/controllers/tracks_controller_spec.rb:82:

It took me a good while to figure out what was going on. The error was strange because as far as I knew, title was a database attribute in the Album model, which was a descendant of ActiveRecord. There shouldn’t have been any arguments required. Indeed, firing up my debugger and running track.album.read_attribute(:title) returned the expected result of “Test Album 1″. I was very puzzled.

I realized that my title method was being overridden, but by what? I started feeding the title method random arguments in the hopes of learning something new. It wasn’t long before I got lucky:

(rdb:1) e track.album.title("x")
NoMethodError Exception: undefined method `content_tag' for #<Album:0x3d5bbfc>

That tipped me off as to what was overriding my title method. The content_for is part of a pattern I use to set a view’s title in the layout via a method called title in my ApplicationHelper. This made me think of how in Rspec 1.1.4, a Helper module is no longer included by default. I popped over to my application_helper_spec.rb and found something similar to this:

require File.dirname(__FILE__) + '/../spec_helper'

include ApplicationHelper

describe "ApplicationHelper" do

This needed to be changed to:

require File.dirname(__FILE__) + '/../spec_helper'

describe "ApplicationHelper" do

  include ApplicationHelper

Then all was well. Whew. Just to be sure I didn’t run into this sort of thing again, I went ahead and made sure that all of my other Helper specs followed this paradigm.

This is the second time I’ve run into an issue where RSpec had overridden some seemingly random method in some seemingly random object. I guess there’s a lot of magic in there that I don’t have my head wrapped around yet. I just wish it were easier to spot!

I’m going to discuss some basic probabilities about D&D dice rolling. Yes, I’m that big of a geek. At the same time, I am not a statistician, so it’s entirely possible I’ve screwed something up.

Rolling a single six-sided die (d6), we know that the probability of rolling any given value is 1 in 6. This should not a be a surprise. If it is, you might want to brush up on your dice knowledge. In the older days of D&D we would typically roll 3 six-sided dice (3d6), and take their sum to use as one of our ability scores (eg strength, intelligence, charisma, etc). In more recent editions of the game, a new method has emerged, generally known as 4d6 drop lowest. It’s just what it sounds like, you roll 4d6, remove the lowest of the rolls, and sum the remaining three to use for your ability score. This can make it a little trickier to figure out what the average and standard deviation are. Fortunately, it doesn’t make it so different that a little scripting can’t find the answers for us.

So how do we calculate these means and standard deviations? Simple, we create a two dimensional array, and in that array, we place all the possible ways we could roll 4d6. It basically looks like this:

[ [ 1, 1, 1, 1 ],
[ 1, 1, 1, 2 ],
[ 1, 1, 1, 3 ],
…
[ 6, 6, 6, 4 ],
[ 6, 6, 6, 5 ],
[ 6, 6, 6, 6 ], ]

Now, we sum each row, and find the mean of those sums, we’ll have found the average roll when rolling 4d6. That’s close to what we want. To simulate dropping the lowest, I simply find the lowest die roll in each row, and set its value to 0. Then I repeat the process above, summing each row, and the average of those sums is now the average for 4d6 drop lowest. If I take the standard deviation of those rolls, I’ll have the standard deviation of this method. All this leads to the following:

Max 18
Min 3
Mean 12.2446
Std Dev 2.8468

But what does it mean? Simply put, if you roll 4d6 drop lowest, you are most likely to roll a 12. Additionally, there is a ? 68% chance that you’ll roll within one standard deviation of the mean, or [9,15]. The chances that you’ll get any given roll are in the table below:

Roll Probability Frequency ? Odds
3 0.000772 1 1 in 1,295
4 0.003086 4 1 in 324
5 0.007716 10 1 in 130
6 0.016204 21 1 in 62
7 0.029321 38 1 in 34
8 0.047840 62 1 in 21
9 0.070216 91 1 in 14
10 0.094136 122 1 in 11
11 0.114198 148 1 in 9
12 0.128858 167 1 in 8
13 0.132716 172 1 in 8
14 0.123457 160 1 in 8
15 0.101080 131 1 in 10
16 0.072531 94 1 in 14
17 0.041667 54 1 in 24
18 0.016204 21 1 in 62

Here’s a pretty picture for you:

4d6 drop lowest

You might notice that the mean is marked below the peak of the distribution. I’m not entirely sure why that is, but I suspect it has to do with the facts that we’re dropping the lowest roll, combined with how the curve gets cut off at 18. Dropping the lowest roll shifts the mean to a higher number, but because we can’t possibly go higher than 18, the right-side tail of the distribution gets cut off, causing the mean to be less than the peak of the distribution.

Lastly, here’s a little python script that can calculate these things for you. It will also generate two gnuplot data files, one of which was used to generate the pretty picture above. Yes, this code is quick and dirty, and certainly not production quality, but it gets the job done.

#!/usr/bin/env python
# -*- coding: iso-8859-15 -*-

from numpy import *
import sys
import os
import random
import pprint

outcomes = []

# roll 4d6
for d1 in range(1, 7):
    for d2 in range(1, 7):
        for d3 in range(1, 7):
            for d4 in range(1, 7):
outcome = array([d1, d2, d3, d4])
outcomes.append(outcome)

outcomes = array(outcomes)
hist = {}

# drop the lowest, count the number of times a given sum is found
for x in range(len(outcomes)):
    outcomes[x][outcomes[x].argmin()] = 0
    if outcomes[x].sum() in hist:
        hist[outcomes[x].sum()] += 1
    else:
        hist[outcomes[x].sum()] = 1

rsum = outcomes.sum()
sums = outcomes.sum(axis=1)
mean = sums.mean()
sigma = sums.std()

counts = array(hist.values())
hist2 = hist.copy()

# determine the probability of each sum occurring
for key,value in hist2.items():
    hist2[key] = counts[key-3] / float(counts.sum())

f = open("dice.data", "w")
for key in hist2.keys():
print >>f, "%d\t%0.12f" % (key, hist2[key])
f.close()

f = open("dice2.data", "w")
for key,value in hist2.items():
    print >>f, "%d\t%f" % (key, value)
f.close()

pprint.pprint(hist.values())

args = {
    "mean": mean,
    "sigma": sigma,
    "-sigma": mean - sigma,
    "+sigma": mean + sigma,
    "++sigma": mean + (2 * sigma),
    "--sigma": mean - (2 * sigma),
    "+++sigma": mean + (3 * sigma),
    "---sigma": mean - (3 * sigma),
}
labels = {
    "+sigma_label": mean + sigma - 0.1,
    "-sigma_label": mean - sigma + 0.1,
    "++sigma_label": mean + (2*sigma) - 0.1,
    "--sigma_label": mean - (2*sigma) + 0.1,
    "+++sigma_label": mean + (3*sigma) - 0.1,
    "---sigma_label": mean - (3*sigma) + 0.1,
    "mean_label": mean - 0.1,
}
args.update(labels)
gnuplot = """
set terminal png enhanced size 800,600 font '/Library/Fonts/Arial.ttf' 8
set output "dice.png"
set arrow from %(mean)f,0 to %(mean)f,0.14 nohead lc 1
set label "? (%(mean)0.2f)" at %(mean_label)f,0.05 right tc ls 1
set arrow from %(-sigma)f,0 to %(-sigma)f,0.14 nohead lc 2
set label "?? (%(-sigma)0.2f)" at %(-sigma_label)f,0.05 tc ls 2
set arrow from %(+sigma)f,0 to %(+sigma)f,0.14 nohead lc 2
set label "+? (%(+sigma)0.2f)" at %(+sigma_label)f,0.05 right tc ls 2

set arrow from %(--sigma)f,0 to %(--sigma)f,0.14 nohead lc 2
set label "?2? (%(--sigma)0.2f)" at %(--sigma_label)f,0.05 tc ls 2
set arrow from %(++sigma)f,0 to %(++sigma)f,0.14 nohead lc 2
set label "+2? (%(++sigma)0.2f)" at %(++sigma_label)f,0.05 right tc ls 2

set arrow from %(---sigma)f,0 to %(---sigma)f,0.14 nohead lc 3
set label "?3? (%(---sigma)0.2f)" at %(---sigma_label)f,0.05 tc ls 3
set arrow from %(+++sigma)f,0 to %(+++sigma)f,0.14 nohead lc 3
set label "+3? (%(+++sigma)0.2f)" at %(+++sigma_label)f,0.05 right tc ls 3

plot 'dice.data' with lines smooth csplines title '4d6 drop lowest'
""" % args

f = open("dice.gnuplot", "w")
f.write(gnuplot)
f.close()

gnuplot = """
set terminal png enhanced size 800,600 font '/Library/Fonts/Arial.ttf' 8
set output "dice2.png"
set boxwidth 0.9 relative
set style data histograms
set style fill solid 1.0 border -1
set style data histogram
plot 'dice2.data' using 2:xticlabels(1) title '4d6 drop lowest'
"""

f = open("dice2.gnuplot", "w")
f.write(gnuplot)
f.close()

print "mean: %f" % mean
print "sigma: %f" % sigma
print "1 sigma: %0.4f - %0.4f" % (mean - sigma, mean + sigma)
print "2 sigma: %0.4f - %0.4f" % (mean - 2*sigma, mean + 2*sigma)
print "3 sigma: %0.4f - %0.4f" % (mean - 3*sigma, mean + 3*sigma)

os.system("gnuplot 'dice.gnuplot'")
os.system("gnuplot 'dice2.gnuplot'")

Download the 4d6 Drop Lowest Python Script.

Rails’s JavaScript generator provides some great functionality. But unfortunately, one of the things it can’t do for you, is generate JavaScript conditionally, based on the page’s content. The reason is simply that the page’s content doesn’t exist at the time the conditional in your code was executed. I wanted to do something like the following:

if page["album_1_rating"]
  page["album_1_rating"].visual_effect(:highlight)
end

As I said, at the time the if statement above is executed, there is no HTML file yet, so the generator can’t possibly know if the element referenced will exist in the file or not. In fact, page["album_1_rating"] is a ActionView::Helpers::JavaScriptElementProxy object, which is not false, so the if statement will always execute. This wasn’t what I wanted. So I came up with something that would work. I added the following method to lib/prototype_helper_hacks.rb, then required it in config/environment.rb.

def if_element(id, &block)
  self << "if ($(\"#{id}\")) {"
  block.call(self[id])
  self << "}"
end

This lets me do things like this in my views:

page.if_element("album_1_rating") do |element|
  element.visual_effect(:highlight)
end

That made me happy.

In thinking more about the situation, I’m not sure this is really such a great method. It works, but I can’t help but feel there’s some better way of going about this. I could simply use the page append method, to do something like:

page << "if ($(id)) {"
  page["album_1_rating"].visual_effect(:highlight)
page << "}"

But at least if nothing else, my if_element method makes this a lot cleaner looking. So really this is just some syntactic sugar, and I wish at least I could make it more of a general purpose if statement, but since I don’t need that sort of funcationality at this point, YAGNI tells me to leave that hurdle for another day.

The short of it, as you can find with a quick Google search1, is that << is superior to + (or +=) in Ruby. This is because << appends to an existing object, while += creates a new object. Creating a new object in Ruby is not nearly as fast as appending to an existing object. This should not be surprising to anyone that has written the two operations in a lower level language like C or C++. Now, on with the story.

At work, I’m writing a parser for less-than-well-known file format. In the lexical analysis stage, I was adding arrays together. Something along the lines of:

@tokens += token

Where @tokens is an array, and token is an array of the form [:symbol, character]2. At any rate, when I tried to parse a very large file, it took approximately 180 seconds. My smaller tests had all been very fast. Looking at the code, the += was the first thing that came to mind. I checked it with the benchmarker and found that a particular loop full of +=’s was particularly offensive. A quick refactor from += to << later, and parsing the same file now took just under 9 seconds. That’s down from 180 seconds. Nice.

So what have I learned here? Well, I’m not sorry that I used +=. It worked just fine, and anyone that knows me knows I recite “premature optimization is the devil” at least 10 times a day. Now that I’m in the final test phases of the parser, I can take the time to optimize things like this. However this also reinforced the adage of “everything is fast when n is small.” A good reminder to test with larger datasets before claiming any algorithm is complete.

  1. Like this one. []
  2. If you’re paying attention you’ll recognize this format as that used by racc. []

No, I’m not going to say that using MySQL with Ruby on Rails is a bad thing. It works fine. But I do think that using the mysql command line tool to modify your Rails databases is a no-no. The reason is simple, Rails does a lot of work to abstract database details away, so that programmers don’t have to know anything about them. We should respect and embrace that abstraction, not step around it.

So if you’re not supposed to use mysql from the command line, how do you make changes to your models from outside of your application? The answer is script/console. Using console is even better than using mysql, because it a) uses Ruby, a language we already know and love, and b) exercises the same bug-free, well tested code that your Rails application does. Of special use here are ActiveRecord’s validations, associations, and before_* and after_* callbacks.

As a brief example, let’s say you’re writing the ubiquitous blog. You need to delete a user on the blog, but you haven’t implemeted the web framework yet. You open up your trusty mysql command line client and do something like this:

mysql> DELETE FROM users WHERE username = 'foobar';
Query OK, 1 row affected (0.08 sec)

And you think to yourself, “Job well done” as you wipe your hands and pat yourself on the back. But many of you are probably asking, “What happened to all of foobar’s posts?” They’re still there. I guess you have to delete them too:

mysql> DELETE FROM posts WHERE user_id = 666;
Query OK, 1 row affected (0.11 sec)

It’s a good thing you happened to remember foobar’s user_id huh? So that should do it… right? But what about foobar’s user image? Now you have to go into the filesystem and delete his user image.

$ rm RAILS_ROOT/public/images/users/foobar.jpg

Whew. There, that’s all of it. Finally. Or is it? What if your colleague added the concept of “friendships” between users, and now you have a bunch of orphaned table rows? If this doesn’t seem error prone (and tedious!) already, then you’re a lot braver than I am. Especially since if you hadn’t tried so hard to work around Rails’s database abstraction, all you would have had to do is:

>> User.find_by_username("foobar").destroy

That’s it. One simple line. Your well-designed models will take care of the rest. In case you’re new to Rails, and wondering just what this magical User model might look like in Rails, here it is:

class User < ActiveRecord::Base
  after_destroy :delete_user_image
  has_many :posts, :dependent => :destroy

  private

  def delete_user_image
    FileUtils.rm(RAILS_ROOT + "/public/images/users/#{username}.jpg")
  end
end

Yes, if you’re a true wiz, you could write all of that into your database, but you’re trying to write a Ruby on Rails application, remember? Besides, what happens when your CTO decides that your application will now use DatabaseX, since it’s the hottest thing since the sun? If that frightening occurance isn’t enough, consider that DHH, the creator of Rails, frowns on such database trickery.

I guess the real point of what I’m trying to say is, Rails uses the database so you don’t have to. Try to pretend there is no database, and use script/console instead. You’ll save yourself a lot of grief in the end.

How QBeat Works

No comments

QBeat is one piece of the Mildred project. Specifically, QBeat handles the selection of tracks to play. There are a number of subsytems within QBeat that allow tracks to be selected and played. I’ll start with a list of all the pertinent subsystems, then I’ll describe the main responsibilities of each subsystem.

Subsystems

  • QBeat
  • MPD
  • MpdMonitor
  • Selector
    • WeightedSelector
      • Stats
      • Weight
  • Schedule
  • TimeSlot
  • ProgrammingBlock
  • Fetcher

QBeat

QBeat is the glue that holds all the other pieces together. It gets things rolling, daemonizes the process, opens log files, handles signals, etc. Once the housekeeping is done, QBeat enters a loop, passing control to the various subsystems as appropriate.

MPD

MPD, the Music Player Daemon, is a wholly separate program, one that QBeat uses to handle the nitty gritty of playing audio files. QBeat simply tells MPD which files to play, and MPD takes care of opening the files, and sending them to the appropriate audio devices.

MpdMonitor

The MpdMonitor monitors the MPD playlist (imagine that!) It checks the MPD playlist’s length, and indicates to the Selector if a track should be added to the playlist. When additional tracks are to be queued, the Selector picks the tracks, and MpdMonitor feeds this information to MPD. MpdMonitor also monitors the currently playing track, and reports new tracks as they are played. Lastly, MpdMonitor trims the MPD playlist whenever it grows too long.

Selector

Selector is an abstract class, meaning it is meant to be extended, not used as is. It provides the basic interface expected of a Selector. A Selector’s job is to actually pick the next track (or tracks) to queue.

The Selector class also provides a number of utility functions to derivative Selectors. These functions help with such tasks as loading tracks from ProgrammingBlocks, filtering tracks by user-specified criteria (such as minimum time, or minimum rating), and determining (with help from the ProgrammingBlock) if a single track, whole album, or power block should be queued.

QBeat was designed to allow Selectors to be changed easily. This allows for the easy development of new Selectors, with totally different selection behavior. For example, in addition to WeightedSelector, I have a few test selectors which do such things as selecting tracks 100% at random, or selecting tracks in alphabetical order by title.

WeightedSelector

WeightedSelector is a derivative Selector, and the one generally in use by QBeat. It evaluates each track, and assigns it a weight. Weights are based on when the track, its album, and its artist were last queued, as well as their ratings. Last queued and rating data are normalized, and their sum becomes the track’s weight. Special modifications are made to reduce the effect of high ratings, or to weight new tracks more heavily. The final track to be selected is picked at random from the top weighted tracks.

WeightedSelector is where most of QBeat’s time is spent. It is also where I spend the most of my time tweaking, debugging, and dreaming.

Stats

This is a helper class that maintains minimum, maximum, and range stats for the last queued and rating data of tracks. These stats are very useful in calculating normalized values.

Weight

This is a helper class that allows me to easily calculate, compare and debug the weights of different tracks, artists, or albums.

Schedule

The schedule class determines which ProgrammingBlock should be fed to the Selector. It also provides the Mildred website with support in generating the Schedule page, which displays the Schedule for a given date.

The Schedule loads itself from a YAML file. The file describes which ProgrammingBlocks should be played at what times. Any time a specific ProgrammingBlock is not specified, a default random block is used. ProgrammingBlocks can be scheduled by start and end times, days of the week, days of the month, and months of the year.

TimeSlot

The TimeSlot helps the Schedule determine which ProgrammingBlock to choose by allowing the current time to be compared to each of the times specified in the Schedule. For example, the Schedule will load all of the ProgrammingBlocks. It will then filter out those that aren’t eligible to be scheduled today. Then, with TimeSlot’s help, it orders the remaining ProgrammingBlocks by start and end times. Then it can iterate through the ProgrammingBlocks, and determine which one (if any) should be currently active.

ProgrammingBlock

The ProgrammingBlock determines which artists, albums, or moods should be played. Like Schedule, it is read from a YAML file. ProgrammingBlocks also specify the probability of queueing a single track, an entire album, or a power block. Many ProgrammingBlocks exist, but only one is active at a time, as determined by the Schedule.

Fetcher

The Fetcher is a simple utility class with one purpose, to load the actual Track objects specified in the ProgrammingBlock. This means taking the artists, albums, and moods specified in the ProgrammingBlock, and returning a list of Tracks.

The QBeat Loop

The actual loop looks something like this:

  1. MpdMonitor checks to see if the currently playing track has changed, reporting the new track if it has
  2. MpdMonitor checks the size of the MPD playlist
    1. Selector picks a new track if the MPD playlist is too short
    2. MpdMonitor adds the tracks specified by Selector
  3. MpdMonitor checks the size of the MPD playlist, and trims it if it is too long
  4. QBeat sleeps for some amount of time (currently 10 seconds)

For mildred, I have a weighting system that determines which track to queue next. I call this system QBeat. I often try various tweaks to the algorithm for track weighting; trying and get that “perfect mix.” Debugging this system has proven to be a challenge, largely due simply to the amount of time it takes to collect enough data to be able to determine if the system has any quirks. With my current setup, it takes about 4 minutes to calculate the weight of every track in my collection (more about that in another post.) So even queuing things as fast as possible isn’t always a feasible solution for data generation.

So to help me visualize is going with QBeat, I inserted a number of “tracers”, that is, tracks that report their weight every time a new track is added to the queue. This should allow me to do some empirical testing of the system. I really should be working further to develop better unit tests, but what the hell, this is fun. :)

So without further ado, I present the output from my viz_qbeat script:

http://black.xmtp.net/~ericw/viz_qbeat_snapshot-2007-08-10-17-25.html

The link above is a static version of the page. The live page is updated every five minutes by a cron job. The X axis increments every time a new song is added to the queue. The Y axis is the track’s weight. The list at the bottom is each track, as it was enqueued. A track’s weight rises proportionally with its likeliness of being enqueued.

So what can I learn from this static snapshot? Check out the medium blue line (“Cold Heart” by Project Pitchfork), which starts at a weight of approximately 2.6. At around 38 on the X axis, its weight drops to 1.2. Why did that happen? Looking down through the list of queued tracks, we reach 38, and find that “Inquisition” by Skinny Puppy was queued at that time. While “Inquisition” is not a track by Project Pitchfork, we can see that it is on an album titled Industrial Beatdown, and I happen to know that “Cold Heart” is also on that album. So QBeat worked like it should. Joy.

Here’s a link to the live site: http://black.xmtp.net/~ericw/viz_qbeat.html

—————-
Now playing: Tool – The Patient
via FoxyTunes