Duplo

Building Blocks & Learning Experiences

Browsing Posts published by ericw

I’m going to discuss some basic probabilities about D&D dice rolling. Yes, I’m that big of a geek. At the same time, I am not a statistician, so it’s entirely possible I’ve screwed something up.

Rolling a single six-sided die (d6), we know that the probability of rolling any given value is 1 in 6. This should not a be a surprise. If it is, you might want to brush up on your dice knowledge. In the older days of D&D we would typically roll 3 six-sided dice (3d6), and take their sum to use as one of our ability scores (eg strength, intelligence, charisma, etc). In more recent editions of the game, a new method has emerged, generally known as 4d6 drop lowest. It’s just what it sounds like, you roll 4d6, remove the lowest of the rolls, and sum the remaining three to use for your ability score. This can make it a little trickier to figure out what the average and standard deviation are. Fortunately, it doesn’t make it so different that a little scripting can’t find the answers for us.

So how do we calculate these means and standard deviations? Simple, we create a two dimensional array, and in that array, we place all the possible ways we could roll 4d6. It basically looks like this:

[ [ 1, 1, 1, 1 ],
[ 1, 1, 1, 2 ],
[ 1, 1, 1, 3 ],
…
[ 6, 6, 6, 4 ],
[ 6, 6, 6, 5 ],
[ 6, 6, 6, 6 ], ]

Now, we sum each row, and find the mean of those sums, we’ll have found the average roll when rolling 4d6. That’s close to what we want. To simulate dropping the lowest, I simply find the lowest die roll in each row, and set its value to 0. Then I repeat the process above, summing each row, and the average of those sums is now the average for 4d6 drop lowest. If I take the standard deviation of those rolls, I’ll have the standard deviation of this method. All this leads to the following:

Max 18
Min 3
Mean 12.2446
Std Dev 2.8468

But what does it mean? Simply put, if you roll 4d6 drop lowest, you are most likely to roll a 12. Additionally, there is a ? 68% chance that you’ll roll within one standard deviation of the mean, or [9,15]. The chances that you’ll get any given roll are in the table below:

Roll Probability Frequency ? Odds
3 0.000772 1 1 in 1,295
4 0.003086 4 1 in 324
5 0.007716 10 1 in 130
6 0.016204 21 1 in 62
7 0.029321 38 1 in 34
8 0.047840 62 1 in 21
9 0.070216 91 1 in 14
10 0.094136 122 1 in 11
11 0.114198 148 1 in 9
12 0.128858 167 1 in 8
13 0.132716 172 1 in 8
14 0.123457 160 1 in 8
15 0.101080 131 1 in 10
16 0.072531 94 1 in 14
17 0.041667 54 1 in 24
18 0.016204 21 1 in 62

Here’s a pretty picture for you:

4d6 drop lowest

You might notice that the mean is marked below the peak of the distribution. I’m not entirely sure why that is, but I suspect it has to do with the facts that we’re dropping the lowest roll, combined with how the curve gets cut off at 18. Dropping the lowest roll shifts the mean to a higher number, but because we can’t possibly go higher than 18, the right-side tail of the distribution gets cut off, causing the mean to be less than the peak of the distribution.

Lastly, here’s a little python script that can calculate these things for you. It will also generate two gnuplot data files, one of which was used to generate the pretty picture above. Yes, this code is quick and dirty, and certainly not production quality, but it gets the job done.

#!/usr/bin/env python
# -*- coding: iso-8859-15 -*-

from numpy import *
import sys
import os
import random
import pprint

outcomes = []

# roll 4d6
for d1 in range(1, 7):
    for d2 in range(1, 7):
        for d3 in range(1, 7):
            for d4 in range(1, 7):
outcome = array([d1, d2, d3, d4])
outcomes.append(outcome)

outcomes = array(outcomes)
hist = {}

# drop the lowest, count the number of times a given sum is found
for x in range(len(outcomes)):
    outcomes[x][outcomes[x].argmin()] = 0
    if outcomes[x].sum() in hist:
        hist[outcomes[x].sum()] += 1
    else:
        hist[outcomes[x].sum()] = 1

rsum = outcomes.sum()
sums = outcomes.sum(axis=1)
mean = sums.mean()
sigma = sums.std()

counts = array(hist.values())
hist2 = hist.copy()

# determine the probability of each sum occurring
for key,value in hist2.items():
    hist2[key] = counts[key-3] / float(counts.sum())

f = open("dice.data", "w")
for key in hist2.keys():
print >>f, "%d\t%0.12f" % (key, hist2[key])
f.close()

f = open("dice2.data", "w")
for key,value in hist2.items():
    print >>f, "%d\t%f" % (key, value)
f.close()

pprint.pprint(hist.values())

args = {
    "mean": mean,
    "sigma": sigma,
    "-sigma": mean - sigma,
    "+sigma": mean + sigma,
    "++sigma": mean + (2 * sigma),
    "--sigma": mean - (2 * sigma),
    "+++sigma": mean + (3 * sigma),
    "---sigma": mean - (3 * sigma),
}
labels = {
    "+sigma_label": mean + sigma - 0.1,
    "-sigma_label": mean - sigma + 0.1,
    "++sigma_label": mean + (2*sigma) - 0.1,
    "--sigma_label": mean - (2*sigma) + 0.1,
    "+++sigma_label": mean + (3*sigma) - 0.1,
    "---sigma_label": mean - (3*sigma) + 0.1,
    "mean_label": mean - 0.1,
}
args.update(labels)
gnuplot = """
set terminal png enhanced size 800,600 font '/Library/Fonts/Arial.ttf' 8
set output "dice.png"
set arrow from %(mean)f,0 to %(mean)f,0.14 nohead lc 1
set label "? (%(mean)0.2f)" at %(mean_label)f,0.05 right tc ls 1
set arrow from %(-sigma)f,0 to %(-sigma)f,0.14 nohead lc 2
set label "?? (%(-sigma)0.2f)" at %(-sigma_label)f,0.05 tc ls 2
set arrow from %(+sigma)f,0 to %(+sigma)f,0.14 nohead lc 2
set label "+? (%(+sigma)0.2f)" at %(+sigma_label)f,0.05 right tc ls 2

set arrow from %(--sigma)f,0 to %(--sigma)f,0.14 nohead lc 2
set label "?2? (%(--sigma)0.2f)" at %(--sigma_label)f,0.05 tc ls 2
set arrow from %(++sigma)f,0 to %(++sigma)f,0.14 nohead lc 2
set label "+2? (%(++sigma)0.2f)" at %(++sigma_label)f,0.05 right tc ls 2

set arrow from %(---sigma)f,0 to %(---sigma)f,0.14 nohead lc 3
set label "?3? (%(---sigma)0.2f)" at %(---sigma_label)f,0.05 tc ls 3
set arrow from %(+++sigma)f,0 to %(+++sigma)f,0.14 nohead lc 3
set label "+3? (%(+++sigma)0.2f)" at %(+++sigma_label)f,0.05 right tc ls 3

plot 'dice.data' with lines smooth csplines title '4d6 drop lowest'
""" % args

f = open("dice.gnuplot", "w")
f.write(gnuplot)
f.close()

gnuplot = """
set terminal png enhanced size 800,600 font '/Library/Fonts/Arial.ttf' 8
set output "dice2.png"
set boxwidth 0.9 relative
set style data histograms
set style fill solid 1.0 border -1
set style data histogram
plot 'dice2.data' using 2:xticlabels(1) title '4d6 drop lowest'
"""

f = open("dice2.gnuplot", "w")
f.write(gnuplot)
f.close()

print "mean: %f" % mean
print "sigma: %f" % sigma
print "1 sigma: %0.4f - %0.4f" % (mean - sigma, mean + sigma)
print "2 sigma: %0.4f - %0.4f" % (mean - 2*sigma, mean + 2*sigma)
print "3 sigma: %0.4f - %0.4f" % (mean - 3*sigma, mean + 3*sigma)

os.system("gnuplot 'dice.gnuplot'")
os.system("gnuplot 'dice2.gnuplot'")

Download the 4d6 Drop Lowest Python Script.

Rails’s JavaScript generator provides some great functionality. But unfortunately, one of the things it can’t do for you, is generate JavaScript conditionally, based on the page’s content. The reason is simply that the page’s content doesn’t exist at the time the conditional in your code was executed. I wanted to do something like the following:

if page["album_1_rating"]
  page["album_1_rating"].visual_effect(:highlight)
end

As I said, at the time the if statement above is executed, there is no HTML file yet, so the generator can’t possibly know if the element referenced will exist in the file or not. In fact, page["album_1_rating"] is a ActionView::Helpers::JavaScriptElementProxy object, which is not false, so the if statement will always execute. This wasn’t what I wanted. So I came up with something that would work. I added the following method to lib/prototype_helper_hacks.rb, then required it in config/environment.rb.

def if_element(id, &block)
  self << "if ($(\"#{id}\")) {"
  block.call(self[id])
  self << "}"
end

This lets me do things like this in my views:

page.if_element("album_1_rating") do |element|
  element.visual_effect(:highlight)
end

That made me happy.

In thinking more about the situation, I’m not sure this is really such a great method. It works, but I can’t help but feel there’s some better way of going about this. I could simply use the page append method, to do something like:

page << "if ($(id)) {"
  page["album_1_rating"].visual_effect(:highlight)
page << "}"

But at least if nothing else, my if_element method makes this a lot cleaner looking. So really this is just some syntactic sugar, and I wish at least I could make it more of a general purpose if statement, but since I don’t need that sort of funcationality at this point, YAGNI tells me to leave that hurdle for another day.

The short of it, as you can find with a quick Google search1, is that << is superior to + (or +=) in Ruby. This is because << appends to an existing object, while += creates a new object. Creating a new object in Ruby is not nearly as fast as appending to an existing object. This should not be surprising to anyone that has written the two operations in a lower level language like C or C++. Now, on with the story.

At work, I’m writing a parser for less-than-well-known file format. In the lexical analysis stage, I was adding arrays together. Something along the lines of:

@tokens += token

Where @tokens is an array, and token is an array of the form [:symbol, character]2. At any rate, when I tried to parse a very large file, it took approximately 180 seconds. My smaller tests had all been very fast. Looking at the code, the += was the first thing that came to mind. I checked it with the benchmarker and found that a particular loop full of +=‘s was particularly offensive. A quick refactor from += to << later, and parsing the same file now took just under 9 seconds. That’s down from 180 seconds. Nice.

So what have I learned here? Well, I’m not sorry that I used +=. It worked just fine, and anyone that knows me knows I recite “premature optimization is the devil” at least 10 times a day. Now that I’m in the final test phases of the parser, I can take the time to optimize things like this. However this also reinforced the adage of “everything is fast when n is small.” A good reminder to test with larger datasets before claiming any algorithm is complete.

  1. Like this one. []
  2. If you’re paying attention you’ll recognize this format as that used by racc. []

No, I’m not going to say that using MySQL with Ruby on Rails is a bad thing. It works fine. But I do think that using the mysql command line tool to modify your Rails databases is a no-no. The reason is simple, Rails does a lot of work to abstract database details away, so that programmers don’t have to know anything about them. We should respect and embrace that abstraction, not step around it.

So if you’re not supposed to use mysql from the command line, how do you make changes to your models from outside of your application? The answer is script/console. Using console is even better than using mysql, because it a) uses Ruby, a language we already know and love, and b) exercises the same bug-free, well tested code that your Rails application does. Of special use here are ActiveRecord’s validations, associations, and before_* and after_* callbacks.

As a brief example, let’s say you’re writing the ubiquitous blog. You need to delete a user on the blog, but you haven’t implemeted the web framework yet. You open up your trusty mysql command line client and do something like this:

mysql> DELETE FROM users WHERE username = 'foobar';
Query OK, 1 row affected (0.08 sec)

And you think to yourself, “Job well done” as you wipe your hands and pat yourself on the back. But many of you are probably asking, “What happened to all of foobar’s posts?” They’re still there. I guess you have to delete them too:

mysql> DELETE FROM posts WHERE user_id = 666;
Query OK, 1 row affected (0.11 sec)

It’s a good thing you happened to remember foobar’s user_id huh? So that should do it… right? But what about foobar’s user image? Now you have to go into the filesystem and delete his user image.

$ rm RAILS_ROOT/public/images/users/foobar.jpg

Whew. There, that’s all of it. Finally. Or is it? What if your colleague added the concept of “friendships” between users, and now you have a bunch of orphaned table rows? If this doesn’t seem error prone (and tedious!) already, then you’re a lot braver than I am. Especially since if you hadn’t tried so hard to work around Rails’s database abstraction, all you would have had to do is:

>> User.find_by_username("foobar").destroy

That’s it. One simple line. Your well-designed models will take care of the rest. In case you’re new to Rails, and wondering just what this magical User model might look like in Rails, here it is:

class User < ActiveRecord::Base
  after_destroy :delete_user_image
  has_many :posts, :dependent => :destroy

  private

  def delete_user_image
    FileUtils.rm(RAILS_ROOT + "/public/images/users/#{username}.jpg")
  end
end

Yes, if you’re a true wiz, you could write all of that into your database, but you’re trying to write a Ruby on Rails application, remember? Besides, what happens when your CTO decides that your application will now use DatabaseX, since it’s the hottest thing since the sun? If that frightening occurance isn’t enough, consider that DHH, the creator of Rails, frowns on such database trickery.

I guess the real point of what I’m trying to say is, Rails uses the database so you don’t have to. Try to pretend there is no database, and use script/console instead. You’ll save yourself a lot of grief in the end.

How QBeat Works

No comments

QBeat is one piece of the Mildred project. Specifically, QBeat handles the selection of tracks to play. There are a number of subsytems within QBeat that allow tracks to be selected and played. I’ll start with a list of all the pertinent subsystems, then I’ll describe the main responsibilities of each subsystem.

Subsystems

  • QBeat
  • MPD
  • MpdMonitor
  • Selector
    • WeightedSelector
      • Stats
      • Weight
  • Schedule
  • TimeSlot
  • ProgrammingBlock
  • Fetcher

QBeat

QBeat is the glue that holds all the other pieces together. It gets things rolling, daemonizes the process, opens log files, handles signals, etc. Once the housekeeping is done, QBeat enters a loop, passing control to the various subsystems as appropriate.

MPD

MPD, the Music Player Daemon, is a wholly separate program, one that QBeat uses to handle the nitty gritty of playing audio files. QBeat simply tells MPD which files to play, and MPD takes care of opening the files, and sending them to the appropriate audio devices.

MpdMonitor

The MpdMonitor monitors the MPD playlist (imagine that!) It checks the MPD playlist’s length, and indicates to the Selector if a track should be added to the playlist. When additional tracks are to be queued, the Selector picks the tracks, and MpdMonitor feeds this information to MPD. MpdMonitor also monitors the currently playing track, and reports new tracks as they are played. Lastly, MpdMonitor trims the MPD playlist whenever it grows too long.

Selector

Selector is an abstract class, meaning it is meant to be extended, not used as is. It provides the basic interface expected of a Selector. A Selector’s job is to actually pick the next track (or tracks) to queue.

The Selector class also provides a number of utility functions to derivative Selectors. These functions help with such tasks as loading tracks from ProgrammingBlocks, filtering tracks by user-specified criteria (such as minimum time, or minimum rating), and determining (with help from the ProgrammingBlock) if a single track, whole album, or power block should be queued.

QBeat was designed to allow Selectors to be changed easily. This allows for the easy development of new Selectors, with totally different selection behavior. For example, in addition to WeightedSelector, I have a few test selectors which do such things as selecting tracks 100% at random, or selecting tracks in alphabetical order by title.

WeightedSelector

WeightedSelector is a derivative Selector, and the one generally in use by QBeat. It evaluates each track, and assigns it a weight. Weights are based on when the track, its album, and its artist were last queued, as well as their ratings. Last queued and rating data are normalized, and their sum becomes the track’s weight. Special modifications are made to reduce the effect of high ratings, or to weight new tracks more heavily. The final track to be selected is picked at random from the top weighted tracks.

WeightedSelector is where most of QBeat’s time is spent. It is also where I spend the most of my time tweaking, debugging, and dreaming.

Stats

This is a helper class that maintains minimum, maximum, and range stats for the last queued and rating data of tracks. These stats are very useful in calculating normalized values.

Weight

This is a helper class that allows me to easily calculate, compare and debug the weights of different tracks, artists, or albums.

Schedule

The schedule class determines which ProgrammingBlock should be fed to the Selector. It also provides the Mildred website with support in generating the Schedule page, which displays the Schedule for a given date.

The Schedule loads itself from a YAML file. The file describes which ProgrammingBlocks should be played at what times. Any time a specific ProgrammingBlock is not specified, a default random block is used. ProgrammingBlocks can be scheduled by start and end times, days of the week, days of the month, and months of the year.

TimeSlot

The TimeSlot helps the Schedule determine which ProgrammingBlock to choose by allowing the current time to be compared to each of the times specified in the Schedule. For example, the Schedule will load all of the ProgrammingBlocks. It will then filter out those that aren’t eligible to be scheduled today. Then, with TimeSlot’s help, it orders the remaining ProgrammingBlocks by start and end times. Then it can iterate through the ProgrammingBlocks, and determine which one (if any) should be currently active.

ProgrammingBlock

The ProgrammingBlock determines which artists, albums, or moods should be played. Like Schedule, it is read from a YAML file. ProgrammingBlocks also specify the probability of queueing a single track, an entire album, or a power block. Many ProgrammingBlocks exist, but only one is active at a time, as determined by the Schedule.

Fetcher

The Fetcher is a simple utility class with one purpose, to load the actual Track objects specified in the ProgrammingBlock. This means taking the artists, albums, and moods specified in the ProgrammingBlock, and returning a list of Tracks.

The QBeat Loop

The actual loop looks something like this:

  1. MpdMonitor checks to see if the currently playing track has changed, reporting the new track if it has
  2. MpdMonitor checks the size of the MPD playlist
    1. Selector picks a new track if the MPD playlist is too short
    2. MpdMonitor adds the tracks specified by Selector
  3. MpdMonitor checks the size of the MPD playlist, and trims it if it is too long
  4. QBeat sleeps for some amount of time (currently 10 seconds)

For mildred, I have a weighting system that determines which track to queue next. I call this system QBeat. I often try various tweaks to the algorithm for track weighting; trying and get that “perfect mix.” Debugging this system has proven to be a challenge, largely due simply to the amount of time it takes to collect enough data to be able to determine if the system has any quirks. With my current setup, it takes about 4 minutes to calculate the weight of every track in my collection (more about that in another post.) So even queuing things as fast as possible isn’t always a feasible solution for data generation.

So to help me visualize is going with QBeat, I inserted a number of “tracers”, that is, tracks that report their weight every time a new track is added to the queue. This should allow me to do some empirical testing of the system. I really should be working further to develop better unit tests, but what the hell, this is fun. :)

So without further ado, I present the output from my viz_qbeat script:

http://black.xmtp.net/~ericw/viz_qbeat_snapshot-2007-08-10-17-25.html

The link above is a static version of the page. The live page is updated every five minutes by a cron job. The X axis increments every time a new song is added to the queue. The Y axis is the track’s weight. The list at the bottom is each track, as it was enqueued. A track’s weight rises proportionally with its likeliness of being enqueued.

So what can I learn from this static snapshot? Check out the medium blue line (“Cold Heart” by Project Pitchfork), which starts at a weight of approximately 2.6. At around 38 on the X axis, its weight drops to 1.2. Why did that happen? Looking down through the list of queued tracks, we reach 38, and find that “Inquisition” by Skinny Puppy was queued at that time. While “Inquisition” is not a track by Project Pitchfork, we can see that it is on an album titled Industrial Beatdown, and I happen to know that “Cold Heart” is also on that album. So QBeat worked like it should. Joy.

Here’s a link to the live site: http://black.xmtp.net/~ericw/viz_qbeat.html

—————-
Now playing: Tool – The Patient
via FoxyTunes

Ruby on rails relies on cookies. Perhaps ironically, it doesn’t actually check that a user has cookies enabled. This, as you might imagine, can lead to problems.

I was charged with implementing some sort of cookies test for a project at work. Letting google be my co-pilot, I ran across a blog post that mentioned a way to do it. The method is simple: given a request, check the session for the _session_id variable. If it isn’t there, set it, and redirect the user to the same URL, but with a GET parameter appended to it. The _session_id variable should be set automatically by rails. The GET parameter will indicate that cookie detection is in progress. When the browser follows the redirect, the GET parameter is detected, and if the _session_id still doesn’t exist, then the user has cookies disabled.

Why is all of this required you ask? Because on the first visit to a site, the user can’t possibly have any cookies nor session variables (which are accessed via cookies). Hence the need to set a session variable and reload the page.

I implemented the methodology laid out in the blog post, and while effective, it proved awkward for my company’s use. Because it involved redirecting to a page with a GET argument, e.g. blah.com/my_page?cookies_enabled=testing.

This in and of itself was fine, except that if one restarted the server or deleted the sessions, then reloaded that particular page via its URL, the cookie test would see the GET argument, but not the _session_id session variable that was supposed to be set, and fail. As URLs get cut and pasted around a lot at my work—especially by QA—this didn’t go over well. I needed something different. Here’s what I came up with:

in application.rb:

  before_filter :cookies_required, :except => ["cookies_test"]

  def cookies_test
    if request.cookies["_session_id"].to_s.blank?
      logger.warn("=== cookies are disabled")
      render :template => "shared/cookies_required"
    else
      redirect_back_or_default(:controller => "foo")
    end
  end

  def cookies_required
    return unless request.cookies["_session_id"].to_s.blank?
    session[:return_to] = request.request_uri
    redirect_to(:controller => "foo",
                :action => "cookies_test")
    return false
  end 

Don’t forget to write up some scathing admonishment for the shared/cookies_required view, and you’re half-way there.

I say half-way, because no changes can be made to a rails installation without two things; passing your current tests, and writing some new ones. When I ran my tests, all my functional tests failed. Why? Say it with me class, “the cookie test was redirecting all the functional tests to the page with the scathing admonishment for disabling cookies.” That’s right. Fortunately, this is easily worked around, by adding the following cookie line to your functional tests:

in foo_controller_test.rb:


   def setup
      @controller = FooController.new
      @request    = ActionController::TestRequest.new
      @response   = ActionController::TestResponse.new
      @request.cookies["_session_id"] = "fake cookie bypasses filter"
   end

This way, all your functional tests’ requests have a fake session id, which fakes out the cookie detection test. Voile! All the tests pass. Now, don’t forget to take the time to write a few that don’t set the session id, and make sure that they do get redirected like they’re supposed to.

The advantage of this method, is that there are never any GET parameters passed around, so it’s safe to load any page at any time. Both methods make some assumptions about how rails handles sessions, but I think this is a limited risk, as the session information is integral to rails, so they can’t go changing it willy nilly without upsetting a lot of people anyway.

As an added bonus, the redirects are all done behind the scenes, so the user is ne’er the wiser. A slight delay on their first page load is all they could ever notice.

About Duplo

No comments

This blog is for notes, and perhaps even discussions about software development. The majority of the articles will revolve around problems and solutions I’ve encountered during my adventures in software development.