Unix

building vlc for i386

Building vlc-2.0.5 with poudriere in FreeBSD kept failing for our i386 build.
I had numerous errors such as:

[...]
misc/cpu.c:129:1: error: unknown type name 'VLC_MMX_is_not_implemented_on_this_compiler'
VLC_MMX static void ThreeD_Now_test (void)
^
../include/vlc_cpu.h:46:19: note: expanded from macro 'VLC_MMX'
#  define VLC_MMX VLC_MMX_is_not_implemented_on_this_compiler
                  ^
misc/cpu.c:129:9: error: expected identifier or '('
VLC_MMX static void ThreeD_Now_test (void)
        ^
misc/cpu.c:230:35: error: use of undeclared identifier 'SSE_test'
        if (vlc_CPU_check ("SSE", SSE_test))
                                  ^
misc/cpu.c:239:56: error: use of undeclared identifier 'SSE2_test'
    if ((i_edx & 0x04000000) && vlc_CPU_check ("SSE2", SSE2_test))
                                                       ^
misc/cpu.c:246:56: error: use of undeclared identifier 'SSE3_test'
    if ((i_ecx & 0x00000001) && vlc_CPU_check ("SSE3", SSE3_test))
[...]

in the build log.

Looking at the port's Makefile I noticed this:

# prefer clang on 9.1+
.if (${OSVERSION} >= 901000) && exists(${DESTDIR}/usr/bin/clang)
CC=     clang
CXX=    clang++
CPP=    clang-cpp
.else
.if ${ARCH} == "i386"
USE_GCC?=       4.6+ # sse/3dnow detection on i386 needs newer gcc
.endif
.endif

I’m only building packages for 9.1-RELEASE, so according to the above the build would use clang instead of gcc. clang is a pretty new compiler, and it worked fine for the amd64 build, so I just hacked the Makefile so that any i386 build uses gcc 4.6+ instead:

.if ${ARCH} == "i386"
USE_GCC?=       4.6+ # sse/3dnow detection on i386 needs newer gcc
.else
.if (${OSVERSION} >= 901000) && exists(${DESTDIR}/usr/bin/clang)
CC=     clang
CXX=    clang++
CPP=    clang-cpp
.endif
.endif

And the build went through.

I suspect the ports maintainer has fixed this in a more recent ports tree, but since I’ve frozen our ports tree at ‘sometime in January this year’ and I can’t / won’t update it yet, building with gcc is fine by me anyway.
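
If you want to double-check which compiler a port will actually pick up after a hack like this, you can ask the ports framework directly with make -V (the port path here is an assumption; adjust to your tree):

# cd /usr/ports/multimedia/vlc
# make -V CC -V CXX

On i386 this should now name gcc rather than clang, if the hack took effect.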

vim syntax control

I’ve googled this one countless times, so I’ll store it here.
When vim doesn’t recognize which kind of syntax highlighting to use, you can set it manually with

:set syn=sh

(in vim command mode)

If you wonder what syntax highlighting modes are available: in Fedora (and I see Ubuntu and Debian keep them in the same place) they’re here:

[joar@saturn syntax]$ pwd
/usr/share/vim/vim73/syntax
[joar@saturn syntax]$ ls | sed 's/\.vim//' | head # showing only the first 10
2html
a2ps
a65
aap
abap
abaqus
abc
abel
acedb
ada
[joar@saturn syntax]$ 

FreeBSD keeps those in /usr/local/share/vim/vim73/syntax/
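
And a trivial way to check whether a particular language is covered (just plain ls and grep against the vim73 path above):

[joar@saturn syntax]$ ls | grep -i python
python.vim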

zfs to the rescue

So I’ve had some zfs raidz/mirror problems before, and again I noticed another troublesome disk in my zfs setup.
This time I caught a minor sector error on a disk pretty early, and didn’t want to take any chances, so I decided to replace it at once. The disk is one of those super crappy WD Green disks anyway, which I’ve found REALLY shouldn’t be used in any raid or server setup (anything other than a desktop you don’t care about).
A bit wiser after last time, this time my nagios nrpe script picked it up:

(da134:ciss1:2:11:0): READ(6). CDB: 8 a 9c d3 1 0 
(da134:ciss1:2:11:0): CAM status: CCB request completed with an error
(da134:ciss1:2:11:0): Retrying command

I’m not 100% sure that message really is a sector error, but I’m not taking any chances; I had a lot of trouble with this server the last time a disk died.
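
The nrpe check itself essentially boils down to grepping the kernel log for those CAM errors. My actual script isn’t shown here, but a minimal sketch of such a check could look like this (message pattern taken from the log above, exit codes per nagios convention):

#!/bin/sh
# warn if the kernel has logged CAM errors since boot
errors=`dmesg | grep -c 'CAM status: CCB request completed with an error'`
if [ "$errors" -gt 0 ]; then
        echo "WARNING: $errors CAM error(s) in dmesg"
        exit 1
fi
echo "OK: no CAM errors"
exit 0
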
So I did:

root:~# zpool offline tank da134
root:~# zpool status
[...]
	  mirror-5                DEGRADED     0     0     0
	    da122                 ONLINE       0     0     0
	    11771992511548113470  OFFLINE      0     0     0  was /dev/da134
[...]
root:~# zpool detach tank da134
root:~# zpool status
[...]
mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  da122     ONLINE       0     0     0
[...]
root:~# halt -p
(replaced the drive and turned the server back on)
root:~# zpool attach tank da122 da134
root:~# zpool status
mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)

I never blogged about what really happened with my previous raidz rebuild, which went south (to put it mildly). The problem then was that I was running raidz, which survives 1 disk failure, but it turned out I had several block read errors, so after 4 attempts to resilver / rebuild, zfs still wasn’t able to rebuild onto the fresh drive, simply because there were at least 2 other pretty rotten disks in the raid that kept throwing new sector errors …

So I had to scrap the whole setup and set up a fresh zfs pool. I got my boss to buy some new disks, but it turned out that 4 of those (some samsung disks) weren’t recognized by the hardware raid controller (?), sooooo I put some WD Green disks in there …

BUT, since the pool is less than 25% full, I changed the whole setup from raidz to a mirrored setup, that is 6 mirror pairs with a total of 12 disks, and on top of that I set copies=2 on the backup filesystem. copies=2 doubles the space usage because every block is written to 2 places on disk. As long as I have plenty of space I should be better protected against corrupted sectors/blocks, bit rot and what not. 🙂
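
For reference, setting it is a single zfs command (note that copies=2 only applies to blocks written after the property is set):

root:~# zfs set copies=2 tank/backup

And verifying: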

root:~# zfs get copies tank/backup
NAME         PROPERTY  VALUE   SOURCE
tank/backup  copies    2       local
root:~# 

And the mirrored zfs setup looks like this:

root:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 25 14:20:24 2013
        68.2G scanned out of 4.61T at 61.2M/s, 21h37m to go
        13.8G resilvered, 1.44% done
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da26    ONLINE       0     0     0
	    da38    ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da50    ONLINE       0     0     0
	    da62    ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    da74    ONLINE       0     0     0
	    da86    ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)
	logs
	  da1       ONLINE       0     0     0

errors: No known data errors

zfs raidz – replacing a drive

So I had a faulty drive in a zfs raidz configuration. I replaced the drive without setting the faulty drive offline first, or detaching it from the raidz configuration … I probably did that part wrong. After a reboot I had something like this:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	tank                     DEGRADED     0     0     0
	  raidz1-0               ONLINE       0     0     0
	    da1                  ONLINE       0     0     0
	    da13                 ONLINE       0     0     0
	    da25                 ONLINE       0     0     0
	    da37                 ONLINE       0     0     0
	    da49                 ONLINE       0     0     0
	  raidz1-1               DEGRADED     0     0     0
	    da73                 ONLINE       0     0     0
	    da85                 ONLINE       0     0     0
	    da97                 ONLINE       0     0     0
	    da109                ONLINE       0     0     0
	    9120273794345838000  UNAVAIL      0     0     0  was /dev/da121
	spares
	  da61                   AVAIL   
	  da133                  AVAIL   

errors: No known data errors
root@backupmh:~# ls -l /dev/da121
crw-r-----  1 root  operator    1, 100 14 mar 11:59 /dev/da121

It’s /dev/da121 that I replaced. At first I simply wanted to hot-swap the drive, but since this is a really old HP MSA-20 or something, and I’m using some super-crappy cheap SATA drives, I had to reboot and do some other stuff before the raid controller wanted to recognize the drive …
Anyway, to get zfs to rebuild (or resilver, as zfs calls it) I ended up with:

root@backupmh:~# zpool offline tank /dev/da121
root@backupmh:~# zpool online tank /dev/da121
warning: device '/dev/da121' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
root@backupmh:~# zpool replace tank /dev/da121

After that I got:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Mar 14 12:14:51 2013
    6.46G scanned out of 1.36T at 4.60M/s, 85h38m to go
    685M resilvered, 0.46% done
config:

	NAME                       STATE     READ WRITE CKSUM
	tank                       DEGRADED     0     0     0
	  raidz1-0                 ONLINE       0     0     0
	    da1                    ONLINE       0     0     0
	    da13                   ONLINE       0     0     0
	    da25                   ONLINE       0     0     0
	    da37                   ONLINE       0     0     0
	    da49                   ONLINE       0     0     0
	  raidz1-1                 DEGRADED     0     0     0
	    da73                   ONLINE       0     0     0
	    da85                   ONLINE       0     0     0
	    da97                   ONLINE       0     0     0
	    da109                  ONLINE       0     0     0
	    replacing-4            UNAVAIL      0     0     0
	      9120273794345838000  UNAVAIL      0     0     0  was /dev/da121/old
	      da121                ONLINE       0     0     0  (resilvering)
	spares
	  da61                     AVAIL   
	  da133                    AVAIL   

errors: No known data errors

The resilver progress looks slow … but it gets faster every time I check it (it started at an estimate of 600+ hours).
I initially set up the 2 raidz configurations with hot spares; I wonder why it didn’t start resilvering to the corresponding hot spare after the reboot (?).
I should check that out, some day …
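
As far as I understand, a spare can at least be pulled in by hand with zpool replace. An untested sketch using the device names above, replacing the dead da121 with the spare da61:

root@backupmh:~# zpool replace tank da121 da61

zfs would then resilver onto the spare, and once that’s done the old device can be removed with zpool detach.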

Minor munin-node hack

We’ve got a PostgreSQL server running 9.1beta3, and I’m currently setting up some Munin graphs for that server. I ran into an annoying error when I wanted to test one of the postgres_* plugins:

# munin-run postgres_bgwriter
Unable to detect PostgreSQL version

It kind of put me off, since I’d already set up those plugins on 4 other servers running postgres …
Google gave me a url for the source code of the munin Pgsql plugin, which is written in Perl. The corresponding plugin on this FreeBSD installation was in
/usr/local/lib/perl5/site_perl/5.12.4/Munin/Plugin/Pgsql.pm, so I had a look at the code there. And it turned out to be this function that spat out my error:

sub get_version {
    my ($self) = @_;

    return if (defined $self->{detected_version});

    my $r = $self->runquery("SELECT version()");
    my $v = $r->[0]->[0];
    die "Unable to detect PostgreSQL version\n"
        unless ($v =~ /^PostgreSQL (\d+)\.(\d+).(\d+) on/);
    $self->{detected_version} = "$1.$2";
}

So the code dies with “Unable to detect PostgreSQL version\n” unless the query “SELECT version()” returns a string matching the regular expression /^PostgreSQL (\d+)\.(\d+).(\d+) on/ …
I ran the query on the servers where these plugins worked ok, and compared with the troublesome server.
Working server:

=# select version();
                                                     version                                                      
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.15 on amd64-portbld-freebsd

Non-working server:

select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1beta3 on amd64-portbld-freebsd

Looking at the regular expression /^PostgreSQL (\d+)\.(\d+).(\d+) on/, translated it means something like: a string starting with “PostgreSQL”, whitespace, one or more digits, a period, one or more digits, a period (strictly speaking an unescaped dot, so any single character), one or more digits, whitespace, “on”. This matches the working server, which has the string “PostgreSQL 8.4.15 on”, but “9.1beta3” doesn’t match (\d+)\.(\d+).(\d+)!
Upgrading the Postgres server to 9.1.[something not beta] is out of the question for the moment, so I made the regular expression match by changing /^PostgreSQL (\d+)\.(\d+).(\d+) on/ to /^PostgreSQL (\d+)\.(\d+)beta(\d+) on/. The whole function ended up like this:

sub get_version {
    my ($self) = @_;

    return if (defined $self->{detected_version});

    my $r = $self->runquery("SELECT version()");
    my $v = $r->[0]->[0];
    die "Unable to detect PostgreSQL version\n"
        unless ($v =~ /^PostgreSQL (\d+)\.(\d+)beta(\d+) on/);
    $self->{detected_version} = "$1.$2";
}
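
A quick way to sanity-check a regexp like this against the version string, without touching munin at all, is a perl one-liner (not part of the plugin, just a test):

# perl -e 'print "match\n" if "PostgreSQL 9.1beta3 on amd64-portbld-freebsd" =~ /^PostgreSQL (\d+)\.(\d+)beta(\d+) on/'
match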

And it seems to have been enough to get those munin-node postgres_* plugins working again:

# munin-run postgres_bgwriter
buffers_checkpoint.value 4514399
buffers_clean.value 10922461
buffers_backend.value 63381760
buffers_alloc.value 72530920829

Did I forget about some web server in my dmz (?)

Ok, so you’ve been working like h#%”# for a couple of months (maybe a year), implementing new application servers, moving services around, upgrading other servers and pretty much staying busy. Of course you’re updating documentation all the way! :p … But for that 1 in a million incident where you just happened to forget updating the documentation of your servers, say you wonder: “do I have control of all my web servers now? Is it possible I might have forgotten to stop Apache on some random server?”
nmap comes to the rescue. Say I want to find every server in the 192.168.0.0/24 subnet listening on port 80:

nmap -p 80 192.168.0.0/24
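
To cut the output down to just the hits, nmap’s --open flag and grepable output format work well (standard nmap options):

nmap -p 80 --open -oG - 192.168.0.0/24 | grep '80/open'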

Puppet, FreeBSD and custom $PACKAGESITE

I recently noticed a puppet-managed server had installed a package that wasn’t supposed to be available …
The thing is, we’re running Tinderbox to build our own packages from ports, which is pretty normal when using FreeBSD and its rolling-release, source-based software package system (ports).
Anyway, we’ve set up all servers with a global PACKAGESITE variable pointing to our local repo, so that pkg_add and portmaster will pull packages from there.
We distribute config files and so on via puppet, and what happened was that on its initial run puppet pushes out the global /etc/csh.cshrc config, which contains the magic PACKAGESITE setup, but at the same time it starts installing a bunch of packages without actually sourcing /etc/csh.cshrc … So PACKAGESITE is empty the first time puppet runs, and that means pkg_add will default to the official FreeBSD package site …
This is not how I want it, because for instance a FreeBSD 8.3-RELEASE with an empty PACKAGESITE will pull packages from a specific ports ‘freeze’ around the 8.2-RELEASE, and that’s a long time ago today … The next time I install any extra packages on that system, it’ll use our internal PACKAGESITE, and now the system has a mix of our latest ports and the original RELEASE ports, and stuff starts to get complicated …
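
One possible workaround (just an idea, not what we’ve deployed) is to make sure the very first puppet run itself has PACKAGESITE in its environment, instead of relying on csh.cshrc being sourced, e.g. when bootstrapping a new box (the repo URL here is a made-up placeholder):

# env PACKAGESITE="http://pkgrepo.example.com/packages/8.3-RELEASE/Latest/" puppet agent --test

That way the packages puppet installs on the first run come from the same repo as everything afterwards.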

FreeBSD upgrading packages with portmaster

So we’re running our own local package repository for our FreeBSD farm. When using FreeBSD you can choose to stick to binary packages for your current RELEASE (which never get patched for security holes), or you can use binary packages from the stable branch of your RELEASE (they’ll be patched for security holes, but they’ll also keep rolling into new versions; if you’ve got a 100 or so servers you’re kinda fucked if you want consistent package versions across them), or you can, like us (and many other FreeBSD admins), run your own local repo.
With a local repo you at least get control over package versions. As a bonus you’ll also get all kinds of freakin’ dependency problems, tinderbox doesn’t always cooperate, and so on.
So we configure portmaster on each server to use binary packages only, and point it to our package repo server. Then you ‘only’ have to issue portmaster -a to upgrade the packages on a server … I’d say you’ve got about a 30% chance of that working out for you on the first run.
Here’s what I usually have to do (kinda pseudo’ish cli):

portmaster -a
(portmaster breaks on package-a ... so manually remove that package so that portmaster -a can finish, then reinstall the troublesome package)
pkg_info | grep package-a
pkg_delete package-a
ERROR: package-b depends on package-a
pkg_delete package-b
pkg_delete package-a
portmaster -a
(portmaster breaks on package-c ... so manually remove that one too, same dance)
pkg_info | grep package-c
pkg_delete package-c
ERROR: package-d depends on package-c
pkg_delete package-d
ERROR: package-e depends on package-d
pkg_delete package-e
pkg_delete package-d
pkg_delete package-c
portmaster -a
(let's say this time portmaster manages to complete)
pkg_add -r [all the packages you manually removed]
(now I suddenly get a)
pkg_version: corrupted record (pkgdep line without argument), ignoring
(google it, maybe you end up at http://www.cyberciti.biz/faq/pkg_version-corrupted-record-pkgdep-line-without-argument-ignoring/ so you try)
portmaster --check-depends
(maybe reinstall any troublesome packages, and if you're lucky you're done)
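
Most of that remove-and-remember dance can be scripted. A rough, untested sketch using the old pkg_* tools (the script name is made up; pkg_delete -r removes a package and everything that requires it):

#!/bin/sh
# nuke.sh: delete a package plus its reverse dependencies,
# logging the names so they can be pkg_add'ed back later
pkg="$1"
pkg_info -qR "$pkg" >> removed.txt    # packages that require $pkg
echo "$pkg" >> removed.txt            # the package itself
pkg_delete -r "$pkg"

Afterwards, something like for p in `cat removed.txt`; do pkg_add -r ${p%-*}; done should pull them all back in from your PACKAGESITE.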

It’s probably because I’ve got more than 10 years of experience with Debian, and only about 6 months with FreeBSD, that I keep getting myself into these kinds of problems, but I’m still getting grumpy about it!

semget errors in FreeBSD jails

We got some semget errors when trying to start uwsgi inside a FreeBSD jail. The solution is to set some /boot/loader.conf variables:

kern.ipc.shmmni="512"
kern.ipc.semmni="512"
kern.ipc.semmnu="512"
kern.ipc.semmns="1024"
kern.ipc.semume="512"
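
These are boot-time tunables, so they only take effect after a reboot; you can check what the running kernel actually uses with sysctl:

# sysctl kern.ipc.semmni kern.ipc.semmns
kern.ipc.semmni: 512
kern.ipc.semmns: 1024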

But we still had some problems in the jail after a reboot. It turns out the FreeBSD sysctl variable security.jail.sysvipc_allowed defaults to 0, that is, SysV ipc is disabled for jails by default. Note that issuing:

# sysctl security.jail.sysvipc_allowed=1

and restarting the jail isn’t enough, ’cause /etc/rc.d/jail will reset that variable upon jail restart; you’d also have to have this in /etc/rc.conf:

jail_sysvipc_allow="YES"

and then you’re able to start jails with ipc support.

This is explained in more detail at www.freebsddiary.org.
Wondering what all these cryptic sysctl variable names mean? Try:

# sysctl -d kern.ipc.shmmni
kern.ipc.shmmni: Number of shared memory identifiers
# 

Also note: I ran into another problem where a lot of ipc semaphores were taken but never released, so I wrote a small script to fix that:

#!/bin/sh
# remove all SysV semaphores owned by a given user
user="$1"
# ipcs -s lists only semaphores; column 2 is the semaphore id
for id in `ipcs -s | grep "$user" | awk '{print $2}'`; do
        ipcrm -s "$id"
done

Pass the username whose semaphores you want to delete as the first argument, e.g. ./clear_sems.sh www (the script name is just an example).