Software Freedom Law Center

When your apt-mirror is always downloading

When I started building our apt-mirror, I ran into a problem: ubuntu.com's servers were throttling the machine, yet I had already completed much of the download (fetching multiple distributions took weeks). I wanted to roll out the mirror quickly, particularly because service from the remote servers was worse than ever, thanks to the throttling that the mirroring itself had triggered. But with the mirror incomplete, I couldn't simply publish half-finished repositories.

The solution was to let Apache send users on to the real servers whenever the mirror doesn't have the file. The first order of business is to rewrite and redirect URLs when files aren't found. This is a straightforward Apache configuration:

   RewriteEngine on
   RewriteLogLevel 0
   RewriteCond %{REQUEST_FILENAME} !^/cgi/
   RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
   RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
   RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
   RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
   RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
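
The conditions above amount to a single yes/no decision per request. As a rough sketch only (not something Apache runs), the same decision can be written in Perl. The helper name should_proxy and the example package path are made up for illustration, and a plain file test stands in for Apache's -F check, which really performs a subrequest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when a request path would be handed to the
# upstream server under the RewriteCond lines above, and 0 otherwise.
# $mirror_root stands in for /var/spool/apt-mirror/mirror/archive.ubuntu.com.
sub should_proxy {
    my ($path, $mirror_root) = @_;
    my $local = $mirror_root . $path;

    return 0 if $path =~ m{^/cgi/};                    # leave CGI scripts alone
    return 0 if -f $local;                             # rough stand-in for !-F
    return 0 if -d $local;                             # !-d
    return 0 if $path =~ m{(Packages|Sources)\.bz2$};  # never fetched upstream
    return 0 if $path =~ m{/index\.[^/]*$}i;           # [NC] is case-insensitive
    return 1;
}

# Example call; the package path is invented for illustration.
my $root = '/var/spool/apt-mirror/mirror/archive.ubuntu.com';
print should_proxy('/ubuntu/pool/main/e/example/example_1.0_all.deb', $root)
    ? "proxy upstream\n" : "serve from mirror\n";
```

What the example prints depends on whether that file exists under the mirror directory on the machine where it runs.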
 

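Note a few things about that configuration: requests under /cgi/ are never rewritten, so the mirror's own scripts stay local. A request is passed upstream only when the mirror has neither a file (!-F) nor a directory (!-d) at the matching path. Packages.bz2, Sources.bz2, and index files are never passed upstream, so they are only ever served from the mirror. Finally, the [P] flag makes Apache proxy the request to the upstream server (hard-coded here as an IP address) instead of sending the client a redirect.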

Once I do a rewrite like this for each of the hosts I'm replacing with a mirror, I'm almost done. The remaining problem is that if my site would return a 403 to a client for any reason, I'd like to check first whether the URL happens to work at the server I'm mirroring from.

My hope was that I could write a RewriteRule based on what the HTTP return code would be when the request completed. That seemed really hard to do, and perhaps impossible. The quickest solution I found was to write a CGI script to do the redirect. So, in the Apache config I have:

ErrorDocument 403 /cgi/redirect-forbidden.cgi

The CGI script looks like this:

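#!/usr/bin/perl

use strict;
use warnings;
use CGI qw(:standard);

# URL of the request that triggered the 403.
my $val = $ENV{REDIRECT_SCRIPT_URI} || '';

# Strip the mirror's host name, keeping only the path.
if ($val =~ s%^http://(\S+)\.sflc\.info(/.*)$%$2%) {
   # Pick the upstream server that matches the mirrored host.
   if ($1 eq "ubuntu-security") {
      $val = "http://91.189.88.37$val";
   } else {
      $val = "http://91.189.88.45$val";
   }
   print redirect($val);
} else {
   # Not one of the mirror's URLs: report the 403 rather than guess a target.
   print header(-status => '403 Forbidden', -type => 'text/plain'), "Forbidden\n";
}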
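
If more mirrored hosts are added later, the if/else could grow into a lookup table. The following is only a sketch of that idea: the helper name upstream_url, the %UPSTREAM table, and the example host ubuntu.sflc.info are hypothetical, while the two IP addresses and the ubuntu-security host name are the ones used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Upstream address per mirrored host (value taken from the script above).
my %UPSTREAM = (
    'ubuntu-security' => '91.189.88.37',
);

# Fallback address used for every other mirrored host, as in the script above.
my $DEFAULT_UPSTREAM = '91.189.88.45';

# Hypothetical helper: map a mirror URL to the matching upstream URL.
# Returns undef when the URL is not one of the mirror's own.
sub upstream_url {
    my ($uri) = @_;
    return unless defined $uri;
    my ($host, $path) = $uri =~ m{^http://([^/.]+)\.sflc\.info(/.*)$}
        or return;
    my $ip = $UPSTREAM{$host} // $DEFAULT_UPSTREAM;
    return "http://$ip$path";
}

# Example call with an invented host name and path.
my $target = upstream_url('http://ubuntu.sflc.info/ubuntu/pool/main/example.deb');
print defined $target ? "$target\n" : "not a mirror URL\n";
```

Keeping the mapping in a function like this also makes it possible to check the host-to-server logic from the command line, without going through Apache.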

With these changes, the user will be redirected to the original server when a file isn't available on the mirror, and as the mirror becomes more complete, they'll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or Sources file from the original site before the mirror is synchronized, but this rarely happens since apt-mirror is pretty careful. The only time it might happen is if the user did an apt-get update when not connected to our VPN and only a short time later did one while connected.

Posted by Bradley M. Kuhn on January 24, 2008. Please email any comments on this entry to <bkuhn@softwarefreedom.org>.
