Setting Up A Highly Available NFS Server (2) [Linux-HA]

Now that I know that this works, at least apparently, I have to really consider whether or not I need a clustered NFS system at all. I could simply create an appropriately sized logical volume for each service requiring HA storage. This would likely work out fine; however, I will want to serve shares to physical machines. The next evolution of this network will be HA email servers, and I will want to use Maildirs on NFS. I will also likely play with LDAP and want to move the home directories to NFS. I may find out later that it was better to do it another way, but for now I will stick with the 1.2TB logical volume served out by the NFS server.

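The reason a High Availability NFS (Network File System) is needed is this: when data storage is shared between two physical servers, no matter how highly available the web servers are made by a load balancer, complete web content cannot be served once the shared data storage breaks. So NFS is also set up on each of the two physical servers, and the data is made redundant with DRBD so that it can still be served. (Incidentally, the open source version of DRBD can reportedly mirror between up to two machines. Being able to do something like RAID1 over the network is amazing.)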


Now it is time to set up the same for the second NFS server. As it currently resides on asobimaster, I will want to move it to asobihost before adding the logical volume as a disk. This is a good time to test out live migration.

Live Migration.png

Click on the arrow beside our 2nd NFS server and select migrate. On the next screen select online migration. During the migration I was pinging the host and had an open ssh connection to it. The ping seemed to pause for about 2 seconds, but the ping results didn't show a two second pause. The secure shell stayed connected throughout.
<Figure 2> Server migration progress screen in Proxmox
Live Migration2.png
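
To check afterwards whether any pings were really dropped during the migration, rather than trusting my eyes, the sequence numbers in saved ping output can be scanned for gaps. This is only a minimal sketch, assuming the usual Linux ping line format with an icmp_seq= field. The function name find_ping_gaps and the sample address are made up for illustration.

```shell
#!/bin/sh
# find_ping_gaps (hypothetical helper): reads ping output on stdin and
# reports every jump in the icmp_seq counter, i.e. replies that never came.
find_ping_gaps() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the rest is the number
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (prev != "" && seq > prev + 1)
                printf "lost %d packet(s) after seq %d\n", seq - prev - 1, prev
            prev = seq
        }
    '
}

# Sample lines (made-up address) with replies 3 and 4 missing.
printf '%s\n' \
    '64 bytes from 192.168.0.114: icmp_seq=1 ttl=64 time=0.41 ms' \
    '64 bytes from 192.168.0.114: icmp_seq=2 ttl=64 time=0.39 ms' \
    '64 bytes from 192.168.0.114: icmp_seq=5 ttl=64 time=0.44 ms' |
    find_ping_gaps
# prints: lost 2 packet(s) after seq 2
```

In practice you would save the real output first (for example ping host > ping.log) and then run find_ping_gaps < ping.log once the migration is done. No output means no sequence numbers were skipped.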

Now it is time to shut down our NFS2 (asobihost's NFS) server and make the same change we did before, adding the logical volume as a drive. Make sure to do this from the Proxmox server that is hosting the VM, in this case asobihost.
qm set 114 -virtio1 /dev/drbdvg/storage
Then start the NFS2 (asobihost's NFS) server again.

As it turns out, things went exactly as before; however, now I have some trouble. When I mount the /dev/vdb volume in NFS2, I do not see the test directory that was created on NFS1. A couple of tests show that the data is not replicating properly. I know for a fact that DRBD is working, so the issue must be with the way I have mounted the logical volumes as partitions.

It appears that the way LVM handles data is quite different from regular partitions. So let's install LVM2 on the NFS2 VM and see if it makes any difference.

apt-get install lvm2

The following extra packages will be installed:
The following NEW packages will be installed:
  dmsetup lvm2
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 405kB of archives.
After this operation, 1044kB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:1 lenny/main dmsetup 2:1.02.27-4 [39.7kB]
Get:2 lenny/main lvm2 2.02.39-8 [366kB]
Fetched 405kB in 6s (62.1kB/s)
Selecting previously deselected package dmsetup.
(Reading database ... 19118 files and directories currently installed.)
Unpacking dmsetup (from .../dmsetup_2%3a1.02.27-4_amd64.deb) ...
Selecting previously deselected package lvm2.
Unpacking lvm2 (from .../lvm2_2.02.39-8_amd64.deb) ...
Processing triggers for man-db ...
Setting up dmsetup (2:1.02.27-4) ...
Setting up lvm2 (2.02.39-8) ...
Setting up LVM Volume Groups
  Reading all physical volumes.  This may take a while...
  /dev/cdrom: open failed: Read-only file system
  Attempt to close device '/dev/cdrom' which is not open.

This last line shows us that LVM doesn't see any usable block devices. But what if, instead of attaching the logical volume like we did, we attach the /dev/drbd0 device itself as a drive?
I did the following on asobihost for NFS2:

qm set 114 -virtio1 /dev/drbd0

Then, on NFS2, I ran pvscan:
NFS2:~# pvscan
 /dev/cdrom: open failed: Read-only file system
 Attempt to close device '/dev/cdrom' which is not open.
 PV /dev/vdb   VG drbdvg   lvm2 [1.82 TB / 544.96 GB free]
 Total: 1 [1.82 TB] / in use: 1 [1.82 TB] / in no VG: 0 [0   ]

Wow, it sees my volume group! Okay, let's try listing the logical volumes.

 ACTIVE            '/dev/drbdvg/vm-110-disk-1' [32.00 GB] inherit
 ACTIVE            '/dev/drbdvg/vm-102-disk-1' [6.00 GB] inherit
 ACTIVE            '/dev/drbdvg/vm-106-disk-1' [32.00 GB] inherit
 ACTIVE            '/dev/drbdvg/vm-111-disk-1' [32.00 GB] inherit
 ACTIVE            '/dev/drbdvg/storage' [1.17 TB] inherit
 ACTIVE            '/dev/drbdvg/vm-113-disk-1' [8.00 GB] inherit
 ACTIVE            '/dev/drbdvg/vm-114-disk-1' [8.00 GB] inherit

 --- Logical volume ---
 LV Name                /dev/drbdvg/vm-110-disk-1
 VG Name                drbdvg
 LV UUID                eNeAwn-CMAT-n5Bu-1Omc-Wg8i-VWVT-QAFUiJ
 LV Write Access        read/write
 LV Status              available
 # open                 0
 LV Size                32.00 GB
 Current LE             8192
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           253:0

 --- Logical volume ---
 LV Name                /dev/drbdvg/vm-102-disk-1
 VG Name                drbdvg
 LV UUID                hce0KH-xP0p-HMhj-tXuJ-rVz5-a9LX-l3VdML
 LV Write Access        read/write
 LV Status              available
 # open                 0
 LV Size                6.00 GB
 Current LE             1536
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           253:1

 --- Logical volume ---
 LV Name                /dev/drbdvg/storage
 VG Name                drbdvg
 LV UUID                VdS6PZ-pWfa-f6Sc-CNMd-zPSd-SU6V-24PorN
 LV Write Access        read/write
 LV Status              available
 # open                 0
 LV Size                1.17 TB
 Current LE             307200
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           253:4

 --- Logical volume ---
 LV Name                /dev/drbdvg/vm-113-disk-1
 VG Name                drbdvg
 LV UUID                WBHb9s-wLZy-hu5z-nGJa-kt2T-EFpE-1Alv8C
 LV Write Access        read/write
 LV Status              available
 # open                 0
 LV Size                8.00 GB
 Current LE             2048
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           253:5

 --- Logical volume ---
 LV Name                /dev/drbdvg/vm-114-disk-1
 VG Name                drbdvg
 LV UUID                1YLQ17-ZkJg-VhdG-HVaB-hvdu-eT66-0GktyI
 LV Write Access        read/write
 LV Status              available
 # open                 0
 LV Size                8.00 GB
 Current LE             2048
 Segments               1
 Allocation             inherit
 Read ahead sectors     auto
 - currently set to     256
 Block device           253:6

Okay it can see everything! It looks like I have the keys to the kingdom here.
I was unable to mount it though. In NFS2 I decided to try


and a number of errors were repaired. I had been mucking about in the file system, so I wasn't too surprised at this result.

After this I was able to mount /dev/drbdvg/storage on both machines...

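and had the exact same result as before. A long dive into the workings of DRBD and LVM showed me that I need a cluster file system for what I am attempting. ext3 assumes it is the only writer, so each machine that mounts the volume keeps its own cached view of the file system and never sees the other's changes, and writing from both sides will corrupt data. There may still be a way to do this with ext3, though. If we don't load balance the NFS server, and just make it a high availability (hot standby) system where only one node has the volume mounted at a time, we might be able to make this work.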


Setting Up A Highly Available NFS Server (1) [Linux-HA]

Today I am building a Highly Available NFS Server at home.

At home I use Proxmox ver 1.6.

LVM (Logical Volume Manager) running over DRBD.


This makes it possible to mirror data between separate servers, a configuration that is essential for High Availability (HA).

As it stands now, this is not meant to be a guide. It will eventually become one (hopefully). It is currently more a record of what we have done, what is working and what isn't...far more the latter at this point. Now back to the article.

LVM is not truly necessary for high availability (HA); however, for our virtualized environment it has the advantage of letting us take snapshot backups of our VMs. This means that we do not need any downtime in order to perform backups. Proxmox is also designed to work with LVM. When we create a new VM assigned to our storage volume, Proxmox creates a new logical volume (LV) for each virtual disk. This setup also allows us to live migrate VMs between the two servers in our cluster.

Our environment is a Proxmox cluster running on servers configured as described above.
Our goal for now is to build a load-balanced, high-availability web server, but before that we will prepare a Highly Available NFS server.


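The wiki describes it as follows. OpenVZ is OS-level server virtualization software developed on top of the Linux kernel. It can create multiple independent OS instances on a single physical server. Compared with hypervisor-type (hardware-level) virtualization software such as VMware and Xen, it wins on the number of environments it can run, but because swap cannot be used, processes are forcibly killed when the allotted memory is exceeded, and for reasons like this its reputation among VPS service users is not very good. It also cannot run non-Linux environments (such as Windows).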
Official site is

We are going to build our HA NFS as per the diagram above and have it provide common storage for our web servers. As we are limited to two physical machines, this presents a couple of challenges. With Proxmox we have the option of two types of VMs. The first is the "container" type, based on OpenVZ. The second is the fully virtualized variety using KVM. There are advantages and disadvantages to both. OpenVZ-based machines are basically a virtualized OS and have no hypervisor layer. This means that there is very little overhead, and VMs created in this fashion operate at near "bare metal" speeds. The fully virtualized systems are slower, especially for I/O (disk access and networking). You would think that the container method would be better for an NFS server for these reasons; however, OpenVZ containers cannot access raw disks outside of their containers. This means there is no way to mount our storage volumes, so we are unable to use this option. We can partially mitigate the I/O issue by using paravirtualized drivers (virtio) for our KVM builds. We'll use containers for our load balancers.

So let's get started!

First, let's grab the latest version of Debian for our platform. If you are in Japan, I find this mirror is fast (I can get the CD in about 5 minutes).

Just grab the first CD for your architecture and let's begin. If you are new to Proxmox, we aren't going to get into setting it up here, but the basic install on a new machine is a breeze.

Let's login to the Proxmox web management interface.

proxmox home screen.png

Here is a brief view of our setup. We have the one master and the one node. It is called asobihost for historical reasons...and I just couldn't bring myself to call it asobislave. I kind of wanted to...but cooler heads prevailed. Anyhow, in the time it took to write this the ISO image came down, and we first need to upload it to Proxmox by using the Iso Images link on the left. Once that is done, go to the Virtual Machines link, hit create and select Fully Virtualized (we are going to build our NFS VMs first).

NFS Creation.png

Here we have chosen our installation media. Our disk storage is our LVM group, which I have named VMcluster, and we have chosen 512MB of RAM and just 8GB of space, as we will be adding logical volumes from the host later. Another nice thing about Proxmox is that it is possible to adjust the RAM and the CPU limits on the fly. If you are planning on using more than one core, though, it is best to set that up during the build.

The other main items to note are the disk type and the network card. In both cases I have chosen the virtio option, as these drivers offer the best performance. The e1000 option is also a good choice for the network card. Click create and you will get a message saying the machine has been created. Go back to the Virtual Machines link and start up your new NFS-to-be. Click on the VM and select Open VNC Console, and you are ready to do the initial install of Debian.

Debian Install.png

I am not going to walk you through the Debian install, except to say that this write-up is based on the "standard system" option. Get the first install started, then create the 2nd machine and start installing it as well. Once the core is installed and it is downloading packages, it is a good time for a coffee.

While you drink your coffee and wait, I can tell you about the other advantage of OpenVZ: it is based on templates. Once the template is created, you just upload it and you are done. No waiting (coffee optional). Templates are available from OpenVZ as well as Proxmox, or you can roll your own.

Okay, now the install on the first VM is finished. The first thing to do is set the static IPs and install openssh so we can get out of this clunky VNC window and into a secure shell.

Edit your interfaces file to set the address using your editor of choice. I use vi...have been using it for years...still don't know how to use it properly.

vi /etc/network/interfaces

Your final file should look something like this.
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0
#iface eth0 inet dhcp
auto eth0

iface eth0 inet static
address 192.168.your.address
# the netmask below assumes a /24 network; adjust it to match yours
netmask 255.255.255.0
gateway 192.168.your.gateway


/etc/init.d/networking restart

You can confirm your new settings by looking at the output of ifconfig.
I had to edit my sources.list to remove the CD reference, so

vi /etc/apt/sources.list

and comment out the CD reference if it is in there.
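
If you would rather not open an editor for this, a one-line sed can do the commenting out. This is only a sketch: the function name comment_out_cdrom is mine, it assumes GNU sed (for the -i option), and the demonstration runs on a scratch file with made-up entries rather than on the real /etc/apt/sources.list.

```shell
#!/bin/sh
# comment_out_cdrom FILE (hypothetical helper): prefix every active
# "deb cdrom:" line with "# ". Lines that are already commented out
# do not match, so running it twice is harmless.
comment_out_cdrom() {
    sed -i 's/^deb cdrom:/# &/' "$1"
}

# Demonstration on a scratch copy. Both entries are made-up samples.
demo=$(mktemp)
cat > "$demo" <<'EOF'
deb cdrom:[Debian GNU/Linux 5.0 _Lenny_ - Official amd64 CD]/ lenny main
deb http://mirror.example/debian/ lenny main
EOF
comment_out_cdrom "$demo"
cat "$demo"
rm -f "$demo"
```

On the real system the call would be comment_out_cdrom /etc/apt/sources.list, run as root, ideally after making a backup copy of the file.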
Time to install ssh server and upgrade the system.

apt-get update
apt-get install openssh-server
apt-get upgrade

Then try to ssh in. If it works, go ahead and configure the second machine in the same way.
Now onto the scary part. We currently have a VM running on a virtual disk created in a logical volume, in a volume group on top of DRBD. I am still a newbie when it comes to LVM and DRBD, but there are a few things to remember. The DRBD part should just be like a hard disk, and the volume groups and logical volumes should be similar to partitions. There are commands to view what is going on, so on the host machine (asobimaster) let's look at the following.

pvdisplay shows us the physical device information
--- Physical volume ---
PV Name /dev/drbd0
VG Name drbdvg
PV Size 1.82 TB / not usable 312.00 KB
Allocatable yes
PE Size (KByte) 4096
Total PE 476917
Free PE 139509
Allocated PE 337408
PV UUID iS7MpA-Cnbs-MhlE-2XsM-4AY8-50Xo-Npabpa

The /dev/drbd0 device can be viewed just the same as a /dev/sda device. It is the block device I created when setting up DRBD.

vgdisplay yields
--- Volume group ---
VG Name drbdvg
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 26
VG Access read/write
VG Status resizable
Cur LV 7
Open LV 6
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.82 TB
PE Size 4.00 MB
Total PE 476917
Alloc PE / Size 337408 / 1.29 TB
Free PE / Size 139509 / 544.96 GB
VG UUID X3CjKm-WQy1-nAyw-bdeU-LBJG-cb5p-jxNeuN

and lvdisplay shows us

--- Logical volume ---
LV Name /dev/drbdvg/vm-113-disk-1
VG Name drbdvg
LV UUID WBHb9s-wLZy-hu5z-nGJa-kt2T-EFpE-1Alv8C
LV Write Access read/write
LV Status available
# open 1
LV Size 8.00 GB
Current LE 2048
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:8

--- Logical volume ---
LV Name /dev/drbdvg/vm-114-disk-1
VG Name drbdvg
LV UUID 1YLQ17-ZkJg-VhdG-HVaB-hvdu-eT66-0GktyI
LV Write Access read/write
LV Status available
# open 1
LV Size 8.00 GB
Current LE 2048
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:9

This last output has been abbreviated, but these are the two logical volumes for our two recently created VMs.

When I decided I wanted to try NFS over DRBD, I thought I had already allocated my entire space to the virtual machines in the volume group drbdvg. I spent the longest time searching for ways of resizing my volume group or logical volumes, but then I finally realized I was looking at it the wrong way. The volume group still had free extents, so I should be able to just create a logical volume for my needs inside the already existing volume group. Being uncertain, I just went for it with the command

lvcreate -L 1200G -n storage drbdvg

and it seemed to work
--- Logical volume ---
LV Name /dev/drbdvg/storage
VG Name drbdvg
LV UUID VdS6PZ-pWfa-f6Sc-CNMd-zPSd-SU6V-24PorN
LV Write Access read/write
LV Status available
# open 1
LV Size 1.17 TB
Current LE 307200
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:7
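
The numbers in this output can be cross-checked against the 4 MB extent size that vgdisplay reported. A small sketch in shell arithmetic follows. The two function names are mine and are not LVM commands, and the sizes are treated as binary units (1 GB = 1024 MB), which is what matches the figures shown.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LVM): convert between sizes and
# physical extents, assuming the 4 MB PE size shown by vgdisplay.

# extents_for_gib SIZE_GB PE_MB -> number of extents needed
extents_for_gib() {
    echo $(( $1 * 1024 / $2 ))
}

# gib_for_extents EXTENTS PE_MB -> size in GB, two decimals
gib_for_extents() {
    awk -v n="$1" -v pe="$2" 'BEGIN { printf "%.2f\n", n * pe / 1024 }'
}

extents_for_gib 1200 4      # prints 307200, the "Current LE" of storage
gib_for_extents 139509 4    # prints 544.96, the free space left in drbdvg
```

So the 1200G request maps exactly onto the 307200 extents listed for the new volume, and the 139509 free extents account for the 544.96 GB that vgdisplay reports as free.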

So I "formatted" it with the command

mkfs -t ext3 -m 1 -v /dev/drbdvg/storage

and just like formatting a partition, it seemed to work.
So now the question became: how can I access this volume from the virtual machine that resides on this host? Without that, this whole endeavour has been for naught. Again thinking of it simply in terms of physical devices, I thought I would try to add the volume to the virtual machine as a hard disk. So after shutting down the guest, on the host (asobimaster; do this on the node you created the VM on) I typed

(Note: the following initially seems to work, but testing the next day showed that it does not work properly. If you are interested then feel free to continue, but I don't recommend trying this. The problem gets resolved in part 2.)

qm set 113 -virtio1 /dev/drbdvg/storage

This is the same way I would add a physical drive such as /dev/sdb. Here 113 is our virtual machine id (vmid), and -virtio1 gives the bus type (it could also be ide or scsi) followed by the disk number (our boot disk is virtio0).
And it seemed to work! (After failing the first time with the error "unable to read parameters", because I tried it on the wrong node.)
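
To make the pieces of that command explicit, here is a small dry-run sketch that only prints the qm line instead of running it. The helper name build_qm_set is made up, and the list of accepted bus types is just the three mentioned here (virtio, ide, scsi), not a statement about everything qm supports.

```shell
#!/bin/sh
# build_qm_set VMID BUS INDEX DEVICE (hypothetical dry-run helper):
# prints the "qm set" command line without executing anything.
build_qm_set() {
    vmid=$1 bus=$2 index=$3 device=$4
    case "$bus" in
        ide|scsi|virtio) ;;   # the bus types mentioned in this post
        *) echo "unknown bus type: $bus" >&2; return 1 ;;
    esac
    echo "qm set $vmid -$bus$index $device"
}

build_qm_set 113 virtio 1 /dev/drbdvg/storage
# prints: qm set 113 -virtio1 /dev/drbdvg/storage
```

Printing the line first makes it easy to double-check the vmid and the node you are on before pasting it into the right host's shell.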
Now I am getting excited, so I fired up the guest and checked the /dev directory

ls /dev/vd*

and sure enough I saw
/dev/vdb in there along with all of the /dev/vda items.

Now this seems a little odd as partitions are associated with a number (i.e. /dev/vda1) and I had already created a file system, but I figured I would try to mount it and see.

mkdir /test
mount -t ext3 /dev/vdb /test
ls /test

and lo and behold, in there I find "lost+found"

After creating a directory, the lost+found entry disappeared.

mkdir /test/testdir

but it seems that this is working as I can see my testdir in there.
As I said, I am still a newbie with LVM and DRBD so I have no idea if this setup is right, or if in a week I will find that my entire infrastructure comes crashing down around my shoulders, but for now it is enough.
And with that, I am off to bed. Will pick this up tomorrow.