Author: Joar Jegleim

building vlc for i386

Building vlc-2.0.5 with poudriere in FreeBSD kept failing for our i386 build.
I had numerous errors such as:

[...]
misc/cpu.c:129:1: error: unknown type name 'VLC_MMX_is_not_implemented_on_this_compiler'
VLC_MMX static void ThreeD_Now_test (void)
^
../include/vlc_cpu.h:46:19: note: expanded from macro 'VLC_MMX'
#  define VLC_MMX VLC_MMX_is_not_implemented_on_this_compiler
                  ^
misc/cpu.c:129:9: error: expected identifier or '('
VLC_MMX static void ThreeD_Now_test (void)
        ^
misc/cpu.c:230:35: error: use of undeclared identifier 'SSE_test'
        if (vlc_CPU_check ("SSE", SSE_test))
                                  ^
misc/cpu.c:239:56: error: use of undeclared identifier 'SSE2_test'
    if ((i_edx & 0x04000000) && vlc_CPU_check ("SSE2", SSE2_test))
                                                       ^
misc/cpu.c:246:56: error: use of undeclared identifier 'SSE3_test'
    if ((i_ecx & 0x00000001) && vlc_CPU_check ("SSE3", SSE3_test))
[...]

in the build log.

Looking at the ports Makefile I noticed that:

# prefer clang on 9.1+
.if (${OSVERSION} >= 901000) && exists(${DESTDIR}/usr/bin/clang)
CC=     clang
CXX=    clang++
CPP=    clang-cpp
.else
.if ${ARCH} == "i386"
USE_GCC?=       4.6+ # sse/3dnow detection on i386 needs newer gcc
.endif
.endif

I’m only building packages for 9.1-RELEASE, so according to the above the build would use clang instead of gcc. clang is a pretty new compiler. It worked fine for the amd64 build, so I just hacked the Makefile so that any i386 build uses gcc 4.6+ instead:

.if ${ARCH} == "i386"
USE_GCC?=       4.6+ # sse/3dnow detection on i386 needs newer gcc
.else
.if (${OSVERSION} >= 901000) && exists(${DESTDIR}/usr/bin/clang)
CC=     clang
CXX=    clang++
CPP=    clang-cpp
.endif
.endif

And the build went through.

I suspect the ports maintainer has fixed this in a more recent ports tree, but I’ve frozen our ports tree at ‘sometime in January this year’ and I can’t / won’t update it yet. Building with gcc is fine by me anyway.

vim syntax control

I’ve googled this one countless times, so I’ll store it here.
When vim doesn’t recognize what kind of syntax highlighting to use, you can set it manually with

: set syn=sh

(in vim command mode)

If you wonder what syntax highlighting modes are available, in Fedora (and I see Ubuntu and Debian have them in the same place) they’re here:

[joar@saturn syntax]$ pwd
/usr/share/vim/vim73/syntax
[joar@saturn syntax]$ ls | sed 's/\.vim$//' | head # showing only the first 10 ones
2html
a2ps
a65
aap
abap
abaqus
abc
abel
acedb
ada
[joar@saturn syntax]$ 

FreeBSD has those in /usr/local/share/vim/vim73/syntax/

zfs to the rescue

So I’ve had some zfs raidz/mirror problems, and again I noticed another troublesome disk in my zfs setup.
This time I noticed a minor sector error on a disk pretty early, didn’t want to take any chances and decided to replace it at once. The disk is one of those super crappy WD green disks anyway, which I’ve found REALLY shouldn’t be used in any raid setup / server setup (anything other than a desktop you don’t care about).
A bit wiser from last time, this time my nagios nrpe script picked up:

(da134:ciss1:2:11:0): READ(6). CDB: 8 a 9c d3 1 0 
(da134:ciss1:2:11:0): CAM status: CCB request completed with an error
(da134:ciss1:2:11:0): Retrying command

I’m not 100% sure that message really is a sector error, but I’m not taking any chances. I had a lot of trouble with this server last time a disk died.
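A check like that doesn’t need to be fancy. Here is a rough sketch of the idea (not the actual nrpe script; the function name check_cam_errors and the choice to warn on any “CAM status” line are assumptions made up for illustration). It reads kernel messages on stdin and returns a Nagios-style exit code:

```shell
#!/bin/sh
# Hypothetical NRPE-style check: read kernel messages on stdin and report
# every device that logged a "CAM status" error line.
check_cam_errors() {
    # The device name is the first field inside the leading "(da134:ciss1:...)".
    devices=$(sed -n 's/^(\([a-z0-9]*\):.*CAM status:.*/\1/p' | sort -u | tr '\n' ' ')
    if [ -n "$devices" ]; then
        echo "WARNING: CAM errors on ${devices% }"
        return 1    # Nagios WARNING
    fi
    echo "OK: no CAM errors"
    return 0        # Nagios OK
}
```

It would be fed with something like `dmesg | check_cam_errors`. Whether a single CAM error should be a warning or a critical is a judgment call.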
To swap the disk I did:

root:~# zpool offline tank da134
root:~# zpool status
[...]
	  mirror-5                DEGRADED     0     0     0
	    da122                 ONLINE       0     0     0
	    11771992511548113470  OFFLINE      0     0     0  was /dev/da134
[...]
root:~# zpool detach tank da134
root:~# zpool status
[...]
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  da122     ONLINE       0     0     0
[...]
root:~# halt -p
( replaced the drive and turned the server back on )
root:~# zpool attach tank da122 da134
root:~# zpool status
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)

I never blogged about what really happened with my previous raidz rebuild, which went south (to put it mildly). The problem then was that I was running raidz, which tolerates 1 disk failure, but it turned out I had several block read errors. After 4 attempts to resilver / rebuild, zfs still wasn’t able to rebuild the fresh drive, simply because there were at least 2 other pretty rotten disks in the raid that kept throwing new sector errors …

So I had to scrap the whole setup and set up a fresh zfs pool. I got my boss to buy some new disks, but it turned out that 4 of those disks (some samsung disks) weren’t recognized by the raid hardware controller (?), sooooo I put in some WD green disks there …

BUT since the pool is less than 25% full, I figured I could change the whole setup from raidz to mirrors: 6 mirror pairs with a total of 12 disks. On top of that I set copies=2 for the backup dataset (tank/backup). copies=2 doubles the space usage, because every block is stored twice on disk. As long as I have plenty of space I should be better protected against corrupted sectors/blocks, bit rot and what not. 🙂

root:~# zfs get copies tank/backup
NAME         PROPERTY  VALUE   SOURCE
tank/backup  copies    2       local
root:~# 

And the mirrored zfs setup looks like this:

root:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Apr 25 14:20:24 2013
        68.2G scanned out of 4.61T at 61.2M/s, 21h37m to go
        13.8G resilvered, 1.44% done
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da14    ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da26    ONLINE       0     0     0
	    da38    ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da50    ONLINE       0     0     0
	    da62    ONLINE       0     0     0
	  mirror-3  ONLINE       0     0     0
	    da74    ONLINE       0     0     0
	    da86    ONLINE       0     0     0
	  mirror-4  ONLINE       0     0     0
	    da98    ONLINE       0     0     0
	    da110   ONLINE       0     0     0
	  mirror-5  ONLINE       0     0     0
	    da122   ONLINE       0     0     0
	    da134   ONLINE       0     0     0  (resilvering)
	logs
	  da1       ONLINE       0     0     0

errors: No known data errors
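With this many disks it’s easy to miss one bad line in that output. `zpool status -x` only reports pools that have problems. As a sketch for when you want the individual devices, this filter (zpool_unhealthy is a made-up name, not part of zfs) reads `zpool status` output on stdin and prints every entry in the config table whose state is something other than ONLINE or AVAIL:

```shell
#!/bin/sh
# Sketch: print "name state" for every pool, vdev or disk in the config
# table of `zpool status` whose STATE column is not ONLINE (or AVAIL, as
# used for idle spares). Section labels like "logs" have no state column
# and are skipped.
zpool_unhealthy() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        /^errors:/                    { in_config = 0 }        # end of table
        in_config && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" { print $1, $2 }
    '
}
```

Usage would be `zpool status | zpool_unhealthy`. Empty output means nothing in the table is degraded, offline or unavailable. A disk that is ONLINE but resilvering is not reported.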

Wrong keyboard layout in lightdm for (k)ubuntu

I’ve had this annoying thing where the keyboard layout is US in my login window in Kubuntu 12.10. This happened after I got a new wireless keyboard with this unifying usb thing at work. I recently upgraded to 13.04 and the error followed, so I dug up the workaround / fix at bugs.launchpad.net.
My fix was adding this to /etc/lightdm/lightdm.conf (no being the Norwegian layout):

[...]
display-setup-script=setxkbmap no
[...]

Kubuntu update manager has been complaining about failed package download / update

If you, like me, get that really annoying message from update manager about a failed download of additional data (usually flashplugin-installer in my case) every time you log in: I just found the answer on the Ubuntu forums. Removing the stale marker files makes it go away:

root@saturn:/var/lib/update-notifier/user.d# pwd
/var/lib/update-notifier/user.d
root@saturn:/var/lib/update-notifier/user.d# ls -l | grep failed
-rw-r--r-- 1 root root 24155 Dec 17 10:33 data-downloads-failed
-rw-r--r-- 1 root root 27963 Feb 15 08:16 data-downloads-failed-permanently
root@saturn:/var/lib/update-notifier/user.d# rm data-downloads-failed*
root@saturn:/var/lib/update-notifier/user.d# 

zfs raidz – replacing a drive

So I had a faulty drive in a zfs raidz configuration. I replaced the drive without setting the faulty drive to offline, or detaching it from the current raidz configuration … I probably did that part wrong. After reboot I had something like this:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	tank                     DEGRADED     0     0     0
	  raidz1-0               ONLINE       0     0     0
	    da1                  ONLINE       0     0     0
	    da13                 ONLINE       0     0     0
	    da25                 ONLINE       0     0     0
	    da37                 ONLINE       0     0     0
	    da49                 ONLINE       0     0     0
	  raidz1-1               DEGRADED     0     0     0
	    da73                 ONLINE       0     0     0
	    da85                 ONLINE       0     0     0
	    da97                 ONLINE       0     0     0
	    da109                ONLINE       0     0     0
	    9120273794345838000  UNAVAIL      0     0     0  was /dev/da121
	spares
	  da61                   AVAIL   
	  da133                  AVAIL   

errors: No known data errors
root@backupmh:~# ls -l /dev/da121
crw-r-----  1 root  operator    1, 100 14 mar 11:59 /dev/da121

It’s /dev/da121 that I replaced. First I simply wanted to hot-swap the drive, but since this is a really old HP MSA-20 or something, and I’m using some super crappy cheap sata drives, I had to reboot and do some other stuff before the raid controller would recognize the drive …
Anyway, to get zfs to rebuild (or resilver, as zfs calls it) I ended up with:

root@backupmh:~# zpool offline tank /dev/da121
root@backupmh:~# zpool online tank /dev/da121
warning: device '/dev/da121' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
root@backupmh:~# zpool replace tank /dev/da121

After that I got:

root@backupmh:~# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Mar 14 12:14:51 2013
    6.46G scanned out of 1.36T at 4.60M/s, 85h38m to go
    685M resilvered, 0.46% done
config:

	NAME                       STATE     READ WRITE CKSUM
	tank                       DEGRADED     0     0     0
	  raidz1-0                 ONLINE       0     0     0
	    da1                    ONLINE       0     0     0
	    da13                   ONLINE       0     0     0
	    da25                   ONLINE       0     0     0
	    da37                   ONLINE       0     0     0
	    da49                   ONLINE       0     0     0
	  raidz1-1                 DEGRADED     0     0     0
	    da73                   ONLINE       0     0     0
	    da85                   ONLINE       0     0     0
	    da97                   ONLINE       0     0     0
	    da109                  ONLINE       0     0     0
	    replacing-4            UNAVAIL      0     0     0
	      9120273794345838000  UNAVAIL      0     0     0  was /dev/da121/old
	      da121                ONLINE       0     0     0  (resilvering)
	spares
	  da61                     AVAIL   
	  da133                    AVAIL   

errors: No known data errors

The resilver progress looks slow … but it’s getting faster and faster every time I check it (it started at an estimate of +600 hours).
I initially set up the 2 raidz configurations with a hot spare, so I wonder why it didn’t start resilvering to the corresponding hot spare after reboot (?).
I should check that out, some day …
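The “to go” estimate in the status output above is basically remaining data divided by the current scan rate, which is why it swings so much while the rate is still ramping up. A small sketch of that arithmetic (resilver_eta is a made-up helper, and zfs may well compute its own number slightly differently):

```shell
#!/bin/sh
# Sketch: estimate remaining resilver time as (total - scanned) / rate.
# Arguments: total size, amount scanned, rate per second, all in the same
# unit (e.g. MiB, MiB and MiB/s).
resilver_eta() {
    awk -v total="$1" -v scanned="$2" -v rate="$3" 'BEGIN {
        secs = (total - scanned) / rate
        printf "%dh%02dm\n", int(secs / 3600), int((secs % 3600) / 60)
    }'
}
```

Feeding it the numbers from above converted to MiB (1.36T is about 1426063, 6.46G is about 6615, at 4.60 per second), `resilver_eta 1426063 6615 4.60` lands within a few minutes of the 85h38m that zpool reported. The difference is expected, since the figures in the status output are rounded.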

Minor munin-node hack

We’ve got a PostgreSQL server running 9.1beta3, and I’m currently setting up some Munin graphs for that server. I ran into an annoying error when I wanted to test one of the postgres_* plugins:

# munin-run postgres_bgwriter
Unable to detect PostgreSQL version

It kind of put me off, since I’d already set up those plugins on 4 other servers running postgres …
Google gave me a URL for the source code of the munin Pgsql plugin, which is written in Perl. The corresponding plugin on this FreeBSD installation was in:
/usr/local/lib/perl5/site_perl/5.12.4/Munin/Plugin/Pgsql.pm , so I had a look at the code there. And it turned out to be this function that spat out my error:

sub get_version {
    my ($self) = @_;

    return if (defined $self->{detected_version});

    my $r = $self->runquery("SELECT version()");
    my $v = $r->[0]->[0];
    die "Unable to detect PostgreSQL version\n"
        unless ($v =~ /^PostgreSQL (\d+)\.(\d+).(\d+) on/);
    $self->{detected_version} = "$1.$2";
}

So the code says: die with “Unable to detect PostgreSQL version\n” unless the query “SELECT version()” returns a string matching the regular expression /^PostgreSQL (\d+)\.(\d+).(\d+) on/ …
I ran the query on the servers where these plugins worked ok, and compared with the troublesome server.
Working server:

=# select version();
                                                     version                                                      
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.15 on amd64-portbld-freebsd

Non-working server:

select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1beta3 on amd64-portbld-freebsd

Looking at the regular expression /^PostgreSQL (\d+)\.(\d+).(\d+) on/, translated it means something like: a string starting with “PostgreSQL”, [a space], [one or more digits], [a period], [one or more digits], [any single character, since the second dot isn’t escaped], [one or more digits], [a space], “on”. This matches the working server, which has the string PostgreSQL 8.4.15 on, but 9.1beta3 doesn’t match (\d+)\.(\d+).(\d+) ! After “9.1” the unescaped dot eats the “b”, and what follows (“eta3”) doesn’t start with a digit.
Upgrading the Postgres server to 9.1.[something not beta] is out of the question for the moment, so I made the regular expression match by changing /^PostgreSQL (\d+)\.(\d+).(\d+) on/ to /^PostgreSQL (\d+)\.(\d+)beta(\d+) on/ . Note that this version only matches beta version strings, which is fine here since the file is local to this server. The whole function ended up like this:

sub get_version {
    my ($self) = @_;

    return if (defined $self->{detected_version});

    my $r = $self->runquery("SELECT version()");
    my $v = $r->[0]->[0];
    die "Unable to detect PostgreSQL version\n"
        unless ($v =~ /^PostgreSQL (\d+)\.(\d+)beta(\d+) on/);
    $self->{detected_version} = "$1.$2";
}

And it seems to have been enough to get those munin-node postgres_* plugins working again:

# munin-run postgres_bgwriter
buffers_checkpoint.value 4514399
buffers_clean.value 10922461
buffers_backend.value 63381760
buffers_alloc.value 72530920829
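The hack above trades one narrow pattern for another. A more tolerant pattern would accept both release and beta strings. As a sketch of that idea in shell rather than Perl (pg_major_minor is a made-up name, the “rc” alternative is my own assumption about what pre-release strings look like, and this is not what upstream Munin does):

```shell
#!/bin/sh
# Sketch: print "major.minor" for a PostgreSQL version() string, accepting
# both "8.4.15" style releases and "9.1beta3" / "9.1rc1" style pre-releases.
# Prints nothing if the string does not look like a PostgreSQL version.
pg_major_minor() {
    printf '%s\n' "$1" |
        sed -nE 's/^PostgreSQL ([0-9]+)\.([0-9]+)(\.[0-9]+|(beta|rc)[0-9]+)? on.*$/\1.\2/p'
}
```

Running it on the two strings from the servers above gives 8.4 for “PostgreSQL 8.4.15 on amd64-portbld-freebsd” and 9.1 for “PostgreSQL 9.1beta3 on amd64-portbld-freebsd”. The same alternation should carry over to the Perl regex more or less as-is.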

Did I forget about some web server in my dmz (?)

Ok, so you’ve been working like h#%”# for a couple of months (maybe a year), implementing new application servers, moving services around, upgrading other servers and generally being busy. Of course you’ve been updating the documentation all the way! :p … But for that 1-in-a-million incident where you just happened to forget to update the documentation of your servers, say you wonder: “Do I have control of all my web servers now? Is it possible I forgot to stop Apache on some random server?”
nmap comes to the rescue. Say I want to find every server in the 192.168.0.0/24 subnet listening on port 80:

nmap -p 80 192.168.0.0/24
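On a full /24 that prints a block for every host that is up, whatever state port 80 is in. nmap’s `--open` flag limits the report to hosts with an open port, and `-oG -` writes the “grepable” format to stdout, which is easy to filter. A sketch (open_http_hosts is a made-up helper name):

```shell
#!/bin/sh
# Sketch: read nmap grepable output (-oG -) on stdin and print the address
# of every host whose port 80/tcp is reported open.
open_http_hosts() {
    awk '/Ports: 80\/open\/tcp/ { print $2 }'
}

# Usage (needs nmap installed and a network you are allowed to scan):
#   nmap -p 80 --open -oG - 192.168.0.0/24 | open_http_hosts
```

That gives a plain list of IP addresses, one per line, to compare against the documentation.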

Have puppet replace a folder with a symlink

I’ve got some perl webapps that tend to have the ‘wrong’ directory setup when I check out the code from github. We use puppet to take care of these things for us (I will always forget these settings on at least 1 node for every upgrade).
I noticed puppet may complain with:

[...]ensure: change from directory to link failed: Could not remove existing file

You solve this easily by adding:

force => true

to the file resource in your puppet manifest.

The puppet documentation is extensive, though I find it a bit cumbersome every time I need a fast lookup for some minor detail.
Thanks to groups.google.com I found this solution quickly!
(But puppet had already tried to tell me that with: “: Not removing directory; use ‘force’ to override”, but I’m slow :p)