
(Today’s blog post comes from our intern Liu Lei.)
Sun HPC Software, Linux Edition (“Sun HPC Software”) is an integrated open-source software solution for Linux-based HPC clusters running on Sun hardware. It provides a framework of software components to simplify the process of deploying and managing large-scale Linux HPC clusters.
For those who are interested in trying out the stack but do not have access to much hardware, Sun xVM VirtualBox can turn your laptop into an HPC development platform. VirtualBox 2.2.2 supports 64-bit virtualization on top of both 32-bit and 64-bit host operating systems.
Here are the hardware specs of my Lenovo T61 laptop and a list of software I used.
| Hardware Specification | Software Configuration |
|---|---|
| Intel(R) Core(TM)2 Duo CPU T7300 | Ubuntu 9.04 |
Please Note:
If you want to go through the entire tutorial and configure the Lustre file system as instructed, you need at least 60 GB of free disk space. This covers the downloads (CentOS 5.3, VirtualBox 2.2.2 and the Sun HPC Software) and the six virtual machines. In addition, 2 GB of RAM is the minimum for running six virtual machines smoothly at the same time.
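Before starting, it may be worth checking the host against these two thresholds. The sketch below is only an illustration: the function name `check_host_resources` is made up for this example and is not part of VirtualBox or the Sun HPC Software. It takes the free disk space in GB and the RAM in MB as arguments.

```shell
#!/bin/bash
# Hypothetical helper: compare host resources with the tutorial's minimums
# (60 GB of free disk space, 2 GB of RAM).
check_host_resources() {
    local free_gb=$1 ram_mb=$2 failed=0
    if [ "$free_gb" -lt 60 ]; then
        echo "disk: ${free_gb} GB free, need at least 60 GB"
        failed=1
    fi
    if [ "$ram_mb" -lt 2048 ]; then
        echo "ram: ${ram_mb} MB, need at least 2048 MB"
        failed=1
    fi
    if [ "$failed" -eq 0 ]; then
        echo "host meets the minimum requirements"
    fi
    return "$failed"
}

# Example call with made-up numbers: 80 GB free, 2048 MB of RAM.
check_host_resources 80 2048
```

The two numbers can be read off `df -h` and `free -m` on the host.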
You will need access to the following software or software repositories.
The installation procedure described in this guide installs the Sun HPC Software on a cluster configured similarly to that shown in Figure 1. This example cluster contains:
The Cobbler provisioning tool facilitates provisioning (via DHCP/PXE) of diskful or diskless node configurations. For diskless nodes, Cobbler uses the oneSIS system administration tool to provide NFS-mounted root filesystems for each node class, such as a Lustre server, compute node, or login node. The following figure shows an example cluster configuration using an Ethernet or InfiniBand network as the compute network.



[root@mgmt1 ~]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:EB:B8:42
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:feeb:b842/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1189 errors:0 dropped:0 overruns:0 frame:0
TX packets:1294 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1358518 (1.2 MiB) TX bytes:85153 (83.1 KiB)
Interrupt:177 Base address:0xc020
eth1 Link encap:Ethernet HWaddr 08:00:27:9B:A4:22
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:193 Base address:0xc240
We need to assign a static IP address to eth1 in order to activate it. We can use the Network Configuration tool in GNOME to do this.

Select “eth1” and click “Edit”. Then type in the static IP address as shown below and tick the option “Activate device when computer starts”.
Depending on the screen resolution, the “OK” button may end up off-screen. If so, hold down the ALT key and drag the window to a new position.

Click “Activate” to activate eth1.

Now we should have Internet access through eth0 and an internal network on eth1.
[root@RHEL-mgmt1 ~]# wget www.sun.com
--12:40:24-- http://www.sun.com/
Resolving www.sun.com... 72.5.124.61
Connecting to www.sun.com|72.5.124.61|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.1'
[ ] 30,441 10.2K/s in 2.9s
12:40:34 (10.2 KB/s) - `index.html.1' saved [30441]
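The same static address can also be set without the GUI. As a rough sketch, the function below prints an interface file in the ifcfg format that CentOS 5 reads from /etc/sysconfig/network-scripts/. The address 10.1.80.1 is the one mgmt1 uses on the cluster network later in this tutorial; the 255.255.255.0 netmask and the function name `print_ifcfg` are assumptions made for this example.

```shell
#!/bin/bash
# Print a static interface definition in the ifcfg format used by CentOS 5.
# Arguments: device, IP address, netmask. The netmask below is an assumed value.
print_ifcfg() {
    local device=$1 ipaddr=$2 netmask=$3
    cat <<EOF
DEVICE=$device
BOOTPROTO=static
IPADDR=$ipaddr
NETMASK=$netmask
ONBOOT=yes
EOF
}

# Preview the file for eth1. To apply it, redirect the output (as root) to
# /etc/sysconfig/network-scripts/ifcfg-eth1.
print_ifcfg eth1 10.1.80.1 255.255.255.0
```

Once the file is in place, `ifup eth1` should bring the interface up.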
I personally find it very convenient to be able to connect to the virtual machine through SSH. In particular, I can copy and paste long commands into the ssh terminal before Guest Additions is installed.
We have a Guest Machine (mgmt1) with a running ssh server that accepts connections on TCP port 22. Our goal is to have any packet arriving at a given TCP port of the Host Machine (e.g. 2222) forwarded to TCP port 22 of the Guest Machine. Fortunately, there is a VirtualBox command that lets us do this almost instantly.
Open a terminal on the host machine and enter the following commands:
larry@LiuLei:~$ VBoxManage setextradata "mgmt1" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/Protocol" TCP
VirtualBox Command Line Management Interface Version 2.2.2
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
larry@LiuLei:~$ VBoxManage setextradata "mgmt1" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/GuestPort" 22
VirtualBox Command Line Management Interface Version 2.2.2
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
larry@LiuLei:~$ VBoxManage setextradata "mgmt1" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh/HostPort" 2222
VirtualBox Command Line Management Interface Version 2.2.2
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
The HostPort must be greater than or equal to 1024, since listening on ports 0-1023 requires root permissions (which VirtualBox usually does not have). The GuestPort, on the other hand, must match the port on which the guest's ssh server is listening.
Once you have typed the above commands, you need to shut down the Guest Machine (a reboot won’t be sufficient), start it again, and then connect via ssh with the following commands:
larry@LiuLei:~$ ssh -p 2222 root@localhost
root@localhost's password:
Last login: Wed May 8 08:59:05 2009
[root@mgmt1 ~]#
larry@LiuLei:~$ sftp -o port=2222 root@localhost
Connecting to localhost...
root@localhost's password:
sftp>
The virtual machine “mgmt1” is now ready to be the head node of the virtual cluster.
Additional Information:
If you want ssh connections to more than one Guest Machine, assign a different HostPort to each one (for example 2222 and 2301). The GuestPort can remain the same (22). Then shut down and start all the Guest Machines (e.g. mgmt1 and mgmt2); again, a reboot won’t be sufficient.
larry@LiuLei:~$ ssh -p 2222 root@localhost
root@localhost's password:
Last login: Fri May 8 14:01:37 2009 from 10.0.2.2
[root@mgmt1 ~]#
larry@LiuLei:~$ ssh -p 2301 root@localhost
root@localhost's password:
Last login: Fri May 8 14:05:20 2009
[root@mgmt2 ~]#
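With several guests, typing the three `setextradata` commands per machine gets repetitive. The sketch below just prints the commands shown earlier for a given VM name and host port, so they can be reviewed before being run. The function name `forward_ssh_cmds` is invented for this example; the key paths are copied from the commands above.

```shell
#!/bin/bash
# Print the three setextradata commands for one guest.
# Arguments: VM name, host port. The guest port stays at 22.
# Nothing is executed; the commands are only echoed.
forward_ssh_cmds() {
    local vm=$1 host_port=$2
    local base="VBoxInternal/Devices/pcnet/0/LUN#0/Config/ssh"
    if [ "$host_port" -lt 1024 ]; then
        echo "host port must be 1024 or higher" >&2
        return 1
    fi
    echo "VBoxManage setextradata \"$vm\" \"$base/Protocol\" TCP"
    echo "VBoxManage setextradata \"$vm\" \"$base/GuestPort\" 22"
    echo "VBoxManage setextradata \"$vm\" \"$base/HostPort\" $host_port"
}

forward_ssh_cmds mgmt1 2222
forward_ssh_cmds mgmt2 2301
```

Each guest gets its own host port, while the guest port is the same for all of them.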
If you encounter the following problem, remove ~/.ssh/known_hosts on the host machine and use ssh to log in again.
larry@LiuLei:~$ ssh -p 2222 root@localhost
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
83:86:f1:ac:b9:ca:f5:53:03:49:92:97:8a:49:5a:f3.
Please contact your system administrator.
Add correct host key in /home/larry/.ssh/known_hosts to get rid of this message.
Offending key in /home/larry/.ssh/known_hosts:1
RSA host key for [localhost]:2222 has changed and you have requested strict checking.
Host key verification failed.
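Deleting the whole known_hosts file also discards the keys of every other host you have connected to. A narrower alternative, sketched here, is to remove only the stale entry with `ssh-keygen -R`. For a non-default port, OpenSSH records the host as `[host]:port`, as the "[localhost]:2222" in the message above shows. The helper name `known_hosts_name` is made up for this example.

```shell
#!/bin/bash
# Build the name under which OpenSSH records a host in known_hosts:
# plain "host" for port 22, "[host]:port" for any other port.
known_hosts_name() {
    local host=$1 port=${2:-22}
    if [ "$port" -eq 22 ]; then
        echo "$host"
    else
        echo "[$host]:$port"
    fi
}

# Print the removal command for the forwarded mgmt1 port. Run it by hand
# once you are sure the key change is expected (e.g. a rebuilt guest).
echo "ssh-keygen -R \"$(known_hosts_name localhost 2222)\""
```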
/root/centos-5.3-x86_64-bin-DVD.iso /mnt/centos5.3 iso9660 ro,loop 0 0
[root@mgmt1 ~]# mkdir -p /mnt/centos5.3
[root@mgmt1 ~]# mount -a
# cat /etc/yum.repos.d/centos.repo
[centos]
name=CentOS DVD
baseurl=file:///mnt/centos5.3
enabled=1
gpgcheck=0
[root@mgmt1 ~]# rpm -qa | grep dialog
[root@mgmt1 ~]# yum install dialog
/root/sun-hpc-linux-rhel-2.0.iso /media/sun_hpc_linux iso9660 ro,loop 0 0
[root@mgmt1 ~]# mkdir -p /media/sun_hpc_linux
[root@mgmt1 ~]# mount -a
[root@mgmt1 ~]# rpm -ivh /media/sun_hpc_linux/SunHPC/sunhpc-release.rpm
Preparing... ########################################### [100%]
1:sunhpc-release ########################################### [100%]
[root@mgmt1 ~]# sunhpc_install
The following screens indicate that the installation was successful.

To set up a Cobbler profile on a head node running CentOS 5.3, follow the procedure below. The examples assume that the head node has two network interfaces: eth0 connects to the Internet or public network; eth1 connects to the rest of the HPC cluster nodes and serves as a DHCP interface. The Cobbler profile is used to provision the compute cluster.
[root@mgmt1 ~]# mount
--snip--
/root/CentOS-5.3-x86_64-bin-DVD.iso on /mnt/centos5.3 type iso9660 (rw,loop=/dev/loop0)
/root/sun-hpc-linux-rhel-2.0.iso on /media/sun_hpc_linux type iso9660 (rw,loop=/dev/loop1)
--snip--
[root@mgmt1 ~]# sunhpc_setup --profile=centos5.3 --distro-image=/mnt/centos5.3 \
--sunhpc-repo=/media/sun_hpc_linux --netif=eth1 --bootdisk=hda
Initializing Cobbler configuration... Done
Disabling the iptables... Done
Restarting dhcpd/cobblerd/httpd... Done
Generating SSH key in /root/.sshDone
Copying /mnt/centos5.3 to /var/www/cobbler/ks_mirror/centos5.3... Done
Created 'sunhpc_base_centos5.3' repo and copying... Done
Created 'sunhpc_extras_centos5.3' repo and copying... Done
Created 'sunhpc_lustre_centos5.3' repo and copying... Done
Created 'sunhpc_updates_centos5.3' repo and copying... Done
Creating distro 'centos5.3' in cobbler... Done
Creating profile 'centos5.3' in cobbler... Done
Creating profile 'centos5.3-lustre' in cobbler... Done
[root@mgmt1 ~]# cobbler profile list
centos5.3
centos5.3-lustre
[root@mgmt1 ~]# sunhpc_setup --profile=centos5.3-onesis --diskless \
--netif=eth1 --onesis-exclude=/root --bootdisk=hda
Initializing Cobbler configuration... Done
Disabling the iptables... Done
Restarting dhcpd/cobblerd/httpd... Done
Copying / to /var/lib/oneSIS/image/centos5.3-onesis... Done
Creating initrd... Done
Applying OneSIS configuration... Done
Updated /etc/exports and restarting NFS... Done
Copying /var/lib/oneSIS/image/centos5.3-onesis to /var/lib/oneSIS/image/centos5.3-onesis-lustre ... Done.
Un-specializing centos5.3-onesis-lustre ... Done.
Removing SunHPC Lustre Client group from centos5.3-onesis-lustre ... Done.
Installing perl-TimeDate from distro... Done.
Installing compat-libcom_err from distro... Done.
Installing uuidd from distro... Done.
Installing libnet from distro... Done.
Installing python-xml from distro... Done.
Upgrading e2fsprogs for ldiskfs support... Done.
Removing base kernel from centos5.3-onesis-lustre ... Done.
Installing SunHPC Lustre Server group to centos5.3-onesis-lustre ... Done.
Creating oneSIS initrd for centos5.3-onesis-lustre ... Done.
Converting centos5.3-onesis-lustre to oneSIS rootfs image ... Done.
Adding /var/lib/oneSIS/image/centos5.3-onesis-lustre to /etc/exports ... Done.
Now (re)starting NFS... Done.
Creating distro 'centos5.3-onesis' in cobbler... Done
Creating distro 'centos5.3-onesis-lustre' in cobbler... Done
Creating profile 'centos5.3-onesis' in cobbler... Done
Creating profile 'centos5.3-onesis-lustre' in cobbler... Done
This command creates two oneSIS system images, one for diskless Lustre client nodes and one for diskless Lustre server nodes, in the directory /var/lib/oneSIS/image on the head node.
[root@mgmt1 ~]# ls /var/lib/oneSIS/image
centos5.3-onesis centos5.3-onesis-lustre
[root@mgmt1 ~]# cobbler profile list
centos5.3-onesis
centos5.3-onesis-lustre
[root@mgmt1 ~]# setup_cfengine
parsing setting
fix cfengine settings in gtdb
update configuration files
set up cfengine
setup cfengine done
Note:
For more information about provisioning the Head Node, please refer to pages 20-22 of the Sun HPC Software, Linux Edition manual.
The Sun HPC Software manages the client node provisioning process using the Sun HPC Software, Linux Management Database (gtdb) provided with the Sun HPC Software. To provision the client nodes in the compute cluster, you will first populate gtdb using the Sun HPC Software, Linux Management Tools (gtt). You will then generate configuration files for provisioning tools such as Cobbler, which will be used to provision each node in the cluster from the head node.
The Sun HPC Software provides support for provisioning four types of client nodes using a Cobbler service on the head node:
The table below shows the clients we provision to build a Lustre file system.
| Node ID | Configuration | Role | Provisioning Interface MAC Address | Provisioning Interface IP Address |
|---|---|---|---|---|
| mgmt1 | Diskful | Management | | 10.1.80.1 |
| dfmds01 | Diskful | Lustre MDS | 08:00:27:5E:C2:77 | 10.1.80.4 |
| dflcn001 | Diskful | Lustre client/compute node | 08:00:27:60:12:CB | 10.1.80.8 |
| dloss01 | Diskless | Lustre OSS | 08:00:27:AF:AB:A8 | 10.1.80.12 |
| dloss02 | Diskless | Lustre OSS | 08:00:27:C9:6C:C7 | 10.1.80.13 |
| dllcn001 | Diskless | Lustre client/compute node | 08:00:27:74:65:51 | 10.1.80.15 |
Notes:
For more information about provisioning client nodes, please refer to pages 28-31 of the Sun HPC Software, Linux Edition manual.
The Sun HPC Software, Management Database (gtdb) is provided to relieve the HPC cluster administrator of repetitive and error-prone service configuration management. After populating the database with information about the HPC cluster (e.g. hostnames, network addresses, etc.), the administrator can then generate configuration files for supported services (e.g. ConMan, PowerMan, SLURM) and system databases (e.g. /etc/hosts, /etc/genders, etc.).
Follow the procedure below to populate the Sun HPC Software Management Database (gtdb) and generate configuration files for provisioning.

[root@mgmt1 ~]# gtt host --add --name dflcn001 --network "hwaddr=08:00:27:60:12:CB,ipaddr=10.1.80.8,device=eth0,bootnet=true" --attribute "profile=centos5.3" --attribute static
Host added successfully: dflcn001
Network added successfully: eth0
Attribute added successfully to compute1: profile
Attribute added successfully to compute1: static
[root@mgmt1 ~]# gtt host --add --name dllcn001 --network "hwaddr=08:00:27:74:65:51,ipaddr=10.1.80.15,device=eth0,bootnet=true" --attribute "profile=centos5.3-onesis" --attribute static
Host added successfully: dllcn001
Network added successfully: eth0
Attribute added successfully to compute1: profile
Attribute added successfully to compute1: static
[root@mgmt1 ~]# gtt host --add --name dfmds01 --network "hwaddr=08:00:27:5E:C2:77,ipaddr=10.1.80.4,device=eth0,bootnet=true" --attribute "mds" --attribute "profile=centos5.3-lustre" --attribute static
Host added successfully: dfmds01
Network added successfully: eth0
Attribute added successfully to compute1: profile
Attribute added successfully to compute1: static
[root@mgmt1 ~]# gtt host --add --name dloss01 --network "hwaddr=08:00:27:C9:6C:C7,ipaddr=10.1.80.12,device=eth0,bootnet=true" --attribute "oss" --attribute "profile=centos5.3-onesis-lustre" --attribute static
Host added successfully: dloss01
Network added successfully: eth0
Attribute added successfully to compute1: profile
Attribute added successfully to compute1: static
[root@mgmt1 ~]# gtt host --add --name dloss02 --network "hwaddr=08:00:27:AF:AB:A8,ipaddr=10.1.80.13,device=eth0,bootnet=true" --attribute "oss" --attribute "profile=centos5.3-onesis-lustre" --attribute static
Host added successfully: dloss02
Network added successfully: eth0
Attribute added successfully to compute1: profile
Attribute added successfully to compute1: static
The option --attribute static gives a client a static IP address after provisioning. Without this attribute, the client receives a dynamic IP address allocated by the DHCP server running on the head node.
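Since the `gtt host --add` commands differ only in a few fields, they can be generated from a small inventory instead of being typed one by one. The following is a sketch under that assumption: the function name `gtt_add_cmds` and the whitespace-separated inventory format are invented for this example, and the commands are printed for review rather than executed.

```shell
#!/bin/bash
# Print one "gtt host --add" command per inventory line read from stdin.
# Input columns: name, MAC address, IP address, Cobbler profile, optional role.
gtt_add_cmds() {
    local name mac ip profile role
    while read -r name mac ip profile role; do
        if [ -z "$name" ]; then
            continue
        fi
        printf 'gtt host --add --name %s' "$name"
        printf ' --network "hwaddr=%s,ipaddr=%s,device=eth0,bootnet=true"' "$mac" "$ip"
        if [ -n "$role" ]; then
            printf ' --attribute "%s"' "$role"
        fi
        printf ' --attribute "profile=%s" --attribute static\n' "$profile"
    done
}

# Two of the nodes from this tutorial as sample input.
gtt_add_cmds <<'EOF'
dflcn001 08:00:27:60:12:CB 10.1.80.8 centos5.3
dfmds01  08:00:27:5E:C2:77 10.1.80.4 centos5.3-lustre mds
EOF
```

Keeping the inventory in one place also makes it easier to spot a duplicated IP or MAC address before anything is written to gtdb.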
[root@mgmt1 ~]# gtt config --update all
Updating config: cfservd
/var/lib/sunhpc/cfengine/var/cfengine/inputs/cfservd.conf: Wrote 35 lines
Updating config: cfupdate
/var/lib/sunhpc/cfengine/var/cfengine/inputs/update.conf: Wrote 70 lines
Updating config: cobbler
/var/lib/sunhpc/cfengine/tmp/cobbler.csv: Wrote 2 lines
Updating config: conman
/var/lib/sunhpc/cfengine/etc/conman.conf: Wrote 179 lines
Updating config: genders
/var/lib/sunhpc/cfengine/etc/genders: Wrote 3 lines
Updating config: hosts
/var/lib/sunhpc/cfengine/etc/hosts: Wrote 5 lines
Updating config: ntp
/var/lib/sunhpc/cfengine/etc/ntp.conf: Wrote 24 lines
Updating config: powerman
/var/lib/sunhpc/cfengine/etc/powerman/powerman.conf: Wrote 5 lines
Updating config: slurm
/var/lib/sunhpc/cfengine/etc/slurm/slurm.conf: Wrote 37 lines
[root@mgmt1 ~]# cfagent -q
[root@mgmt1 ~]# populate_cobbler_system /var/lib/sunhpc/cfengine/tmp/cobbler.csv
Internet Systems Consortium DHCP Server V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Shutting down dhcpd: [ OK ]
Starting dhcpd: [ OK ]
[root@mgmt1 ~]# cobbler list
distro centos5.3
   profile centos5.3
      system dflcn001
   profile centos5.3-lustre
      system dfmds01
distro centos5.3-onesis
   profile centos5.3-onesis
      system dllcn001
distro centos5.3-onesis-lustre
   profile centos5.3-onesis-lustre
      system dloss01
      system dloss02
repo sunhpc_base_centos5.3
repo sunhpc_extras_centos5.3
repo sunhpc_lustre_centos5.3
repo sunhpc_updates_centos5.3
[root@mgmt1 ~]# cobbler profile edit --name=centos5.3 --kopts="selinux=0"
[root@mgmt1 ~]# cobbler profile edit --name=centos5.3-lustre --kopts="selinux=0"
[root@mgmt1 ~]# cobbler profile edit --name=centos5.3-onesis --kopts="selinux=0"
[root@mgmt1 ~]# cobbler profile edit --name=centos5.3-onesis-lustre --kopts="selinux=0"
The client nodes are now ready to boot.
Notes:
If you reboot the head node during the provisioning process, you will need to start httpd and nfs manually before you continue provisioning.
[root@mgmt1 ~]# /etc/init.d/httpd status
httpd is stopped
[root@mgmt1 ~]# /etc/init.d/httpd start
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
[ OK ]
[root@mgmt1 ~]# /etc/init.d/httpd status
httpd (pid 4319 4318 4317 4316 4315 4314 4313 4312 4310) is running...
[root@mgmt1 ~]# /etc/init.d/nfs status
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped
[root@mgmt1 ~]# /etc/init.d/nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
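The status-then-start sequence above can be wrapped in a small function. This is only a sketch: the function name `ensure_service` and the `INIT_DIR` variable are assumptions made for this example, and it relies on the init script's `status` action returning a non-zero exit code when the service is stopped.

```shell
#!/bin/bash
# Start a SysV service only if its "status" action reports it is not running.
# INIT_DIR defaults to /etc/init.d; overriding it is mainly useful for testing.
INIT_DIR=${INIT_DIR:-/etc/init.d}

ensure_service() {
    local script="$INIT_DIR/$1"
    if [ ! -x "$script" ]; then
        echo "$1: no init script at $script" >&2
        return 1
    fi
    if "$script" status >/dev/null 2>&1; then
        echo "$1: already running"
    else
        echo "$1: starting"
        "$script" start
    fi
}

# On the head node, as root:
#   ensure_service httpd
#   ensure_service nfs
```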
Boot dflcn001 from the network, and the installation starts.


Notes:
Keep an eye on the installation. When it finishes and dflcn001 starts to reboot, you must shut the machine down manually and boot it from the local disk; otherwise the installation will start all over again.
Once the client provisioning completes, run the following commands to test password-less ssh access to the provisioned clients and add them to .ssh/known_hosts:
[root@mgmt1 ~]# PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -g profile hostname
Warning messages similar to the following are displayed to indicate the clients have been added to the known_hosts list:
dflcn001: Warning: Permanently added 'dflcn001,10.1.80.8' (RSA) to the list of known hosts.
[root@mgmt1 ~]# ssh dflcn001
[root@dflcn001 ~]#
Boot dllcn001 from the network. After it boots up, just press “Enter” if the following screen pops up.


[root@mgmt1 ~]# PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -g profile hostname
--snip--
dllcn001: Warning: Permanently added 'dllcn001,10.1.80.15' (RSA) to the list of known hosts.
--snip--
[root@mgmt1 ~]# ssh dllcn001
[root@dllcn001 ~]#
Notes:
While provisioning diskless nodes (dllcn001, dloss01 and dloss02), you may encounter the following problem. Wait for a while, since the problem may go away by itself (nfs: server xx OK). However, if the “not responding” problem persists, you have no choice but to reboot the node until the problem is gone. It is awkward and sometimes time-consuming, but there seems to be no better solution.

Boot dfmds01 from the network. After the installation completes, manually reboot it from the local disk as you did when provisioning dflcn001.
After dfmds01 boots up, you should be able to ssh to it from the head node.
[root@mgmt1 ~]# PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -g profile hostname
--snip--
dfmds01: Warning: Permanently added 'dfmds01,10.1.80.4' (RSA) to the list of known hosts.
--snip--
[root@mgmt1 ~]# ssh dfmds01
[root@dfmds01 ~]#
Boot dloss01 from the network. The following screen may or may not pop up shortly after you start dloss01. If it does, you need to choose centos5.3-onesis-lustre before the menu times out.

After dloss01 boots up, you can ssh to it from the head node.
[root@mgmt1 ~]# PDSH_SSH_ARGS_APPEND="-o StrictHostKeyChecking=no" pdsh -g profile hostname
--snip--
dloss01: Warning: Permanently added 'dloss01,10.1.80.12' (RSA) to the list of known hosts.
--snip--
[root@mgmt1 ~]# ssh dloss01
bash-3.2# PS1=['\u@\H:\w]# '
[root@dloss01:~]#
Then use the same method to provision dloss02.
After provisioning the client nodes, run a simple pdsh command to check if all the provisioned clients are accessible. A typical result is:
[root@mgmt1 ~]# pdsh -g profile uptime
dllcn001: 14:25:36 up 2:45, 0 users, load average: 0.02, 0.02, 0.00
dloss01: 13:25:52 up 2:44, 0 users, load average: 0.09, 0.03, 0.00
dflcn001: 14:25:59 up 2:45, 0 users, load average: 0.01, 0.01, 0.00
dfmds01: 19:25:39 up 1:49, 2 users, load average: 0.00, 0.00, 0.08
dloss02: 18:25:49 up 1:55, 0 users, load average: 0.00, 0.00, 0.04
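With more nodes it becomes easy to overlook one that did not answer. As a sketch, the function below reads pdsh output on standard input and prints every expected node that is missing from it. It relies on pdsh prefixing each output line with the node name and a colon, as in the output above; the function name `missing_nodes` is made up for this example.

```shell
#!/bin/bash
# Report which expected nodes are missing from pdsh output.
# The node name is the text before the first colon on each line.
# Arguments: the expected node names. Input: pdsh output on stdin.
missing_nodes() {
    local seen node
    seen=$(cut -d: -f1 | sort -u)
    for node in "$@"; do
        if ! echo "$seen" | grep -qxF "$node"; then
            echo "$node"
        fi
    done
}

# Example with two of the answers from above; dloss02 is deliberately absent.
missing_nodes dllcn001 dloss01 dloss02 <<'EOF'
dllcn001: 14:25:36 up 2:45, 0 users, load average: 0.02, 0.02, 0.00
dloss01: 13:25:52 up 2:44, 0 users, load average: 0.09, 0.03, 0.00
EOF
```

The example prints dloss02. On the head node, the input would come from a pipe instead, for instance `pdsh -g profile uptime | missing_nodes dflcn001 dllcn001 dfmds01 dloss01 dloss02`.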
Once the client nodes have been provisioned, they can serve as Lustre server nodes or Lustre client nodes, regardless of whether they are diskful or diskless.
[root@dfmds01 ~]# vi /etc/modprobe.conf
--snip--
alias eth0 pcnet32
alias eth1 pcnet32
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
options snd-intel8x0 index=0
options lnet networks=tcp
--snip--
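The only line that matters in the modprobe.conf edit above is `options lnet networks=tcp`, and the same line is needed on every Lustre node. Instead of opening vi each time, the line can be appended only when it is missing. This is a sketch; the function name `add_lnet_option` is invented for this example.

```shell
#!/bin/bash
# Append "options lnet networks=tcp" to a modprobe.conf file unless the exact
# line is already present, so running the function twice does no harm.
add_lnet_option() {
    local conf=$1 line="options lnet networks=tcp"
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "$conf: already configured"
    else
        echo "$line" >> "$conf"
        echo "$conf: option added"
    fi
}

# On a diskful node:
#   add_lnet_option /etc/modprobe.conf
# For a diskless node, run it on the head node against the oneSIS image, e.g.
#   add_lnet_option /var/lib/oneSIS/image/centos5.3-onesis/etc/modprobe.conf
```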
[root@dfmds01 ~]# mkfs.lustre --mdt --mgs --device-size=100000 /tmp/mdt
Permanent disk data:
Target: lustre-MDTffff
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x75
(MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: mdt.group_upcall=/usr/sbin/l_getgroups
checking for existing Lustre data: not found
2 6 18
formatting backing filesystem ldiskfs on /dev/loop0
target name lustre-MDTffff
4k blocks 25000
options -i 4096 -I 512 -q -O dir_index,uninit_groups -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L lustre-MDTffff -i 4096 -I 512 -q -O dir_index,uninit_groups -F /dev/loop0 25000
Writing CONFIGS/mountdata
[root@dfmds01 ~]# mkdir /mnt/mdt
[root@dfmds01 ~]# mount -t lustre -o loop /tmp/mdt /mnt/mdt
[root@dfmds01 ~]# lctl list_nids
10.1.80.4@tcp
[root@dfmds01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda3 19G 1.7G 16G 10% /
/dev/hda1 122M 12M 104M 11% /boot
tmpfs 250M 0 250M 0% /dev/shm
/dev/loop0 86M 4.2M 77M 6% /mnt/mdt
[root@mgmt1 ~]# vi /var/lib/oneSIS/image/centos5.3-onesis-lustre/etc/modprobe.conf
--snip--
alias eth0 pcnet32
alias eth1 pcnet32
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
options snd-intel8x0 index=0
options lnet networks=tcp
--snip--
[root@dloss01:~]# fdisk -l
Disk /dev/hda: 21.4 GB, 21474836480 bytes
16 heads, 63 sectors/track, 41610 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk /dev/hda doesn't contain a valid partition table
[root@dloss01:~]# mkfs.lustre --ost --mgsnode=10.1.80.4@tcp /dev/hda
Permanent disk data:
Target: lustre-OSTffff
Index: unassigned
Lustre FS: lustre
Mount type: ldiskfs
Flags: 0x72
(OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=10.1.80.4@tcp
checking for existing Lustre data: not found
device size = 20480MB
2 6 18
formatting backing filesystem ldiskfs on /dev/hda
target name lustre-OSTffff
4k blocks 0
options -J size=400 -i 16384 -I 256 -q -O dir_index,uninit_groups -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L lustre-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index,uninit_groups -F /dev/hda
Writing CONFIGS/mountdata
[root@mgmt1 ~]# mkdir -p /var/lib/oneSIS/image/centos5.3-onesis-lustre/mnt/ost0
[root@dloss01:~]# mount -t lustre /dev/hda /mnt/ost0
Notes:
Make sure that the MDS (dfmds01) is running when you create an OST. If you ever reboot dfmds01, you have to remount the MDT device before the MDS can function again.
[root@dloss01:~]# mount -t lustre /dev/hda /mnt/ost0
mount.lustre: mount /dev/hda at /mnt/ost0 failed: Input/output error
Is the MGS running?
[root@dfmds01 ~]# mount -t lustre -o loop /tmp/mdt /mnt/mdt
[root@dfmds01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda3 19G 1.7G 16G 10% /
/dev/hda1 122M 12M 104M 11% /boot
tmpfs 124M 0 124M 0% /dev/shm
/dev/loop0 86M 4.2M 77M 6% /mnt/mdt
[root@dloss01:~]# mount -t lustre /dev/hda /mnt/ost0
[root@dloss01:~]#
[root@mgmt1 ~]# vi /var/lib/oneSIS/image/centos5.3-onesis/etc/modprobe.conf
--snip--
alias eth0 pcnet32
alias eth1 pcnet32
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-intel8x0
options snd-card-0 index=0
options snd-intel8x0 index=0
options lnet networks=tcp
--snip--
[root@mgmt1 ~]# mkdir /var/lib/oneSIS/image/centos5.3-onesis/mnt/lustre
[root@dllcn001 ~]# mount -t lustre 10.1.80.4@tcp:/lustre /mnt/lustre
[root@dllcn001 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 38G 19G 17G 53% /
10.1.80.1:/var/lib/oneSIS/image/centos5.3-onesis
38G 19G 17G 53% /
/dev/ram 100M 1.5M 99M 2% /ram
none 61M 88K 61M 1% /dev
tmpfs 61M 0 61M 0% /dev/shm
10.1.80.4@tcp:/lustre
40G 890M 37G 3% /mnt/lustre
You can see that the size of the Lustre file system is 40G, which equals the combined size of the hard disks (/dev/hda) on dloss01 and dloss02.
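The arithmetic behind that figure can be checked with a few lines of shell. The sketch below adds up the raw device sizes in bytes and prints the total in GiB. The byte count is the one `fdisk -l` reported for /dev/hda on dloss01; that dloss02 has a disk of the same size is an assumption, and the function name `total_gib` is made up for this example.

```shell
#!/bin/bash
# Sum OST device sizes given in bytes and print the total in whole GiB.
total_gib() {
    local total=0 bytes
    for bytes in "$@"; do
        total=$((total + bytes))
    done
    echo $((total / 1024 / 1024 / 1024))
}

# One 21474836480-byte disk per OSS (dloss01 and, by assumption, dloss02).
total_gib 21474836480 21474836480
```

This is the raw capacity of the two OSTs. The usable space reported by df can be somewhat lower once file system overhead is taken into account.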
[root@dflcn001 ~]# uname -a
Linux dflcn001 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@dflcn001 ~]# rpm -qa|grep kernel-ib
kernel-ib-1.3.1-2.6.18_128.el5
[root@dflcn001 ~]# vi /etc/modprobe.conf
+options lnet networks=tcp
[root@dflcn001 ~]# mkdir /mnt/lustre
[root@dflcn001 ~]# mount -t lustre 10.1.80.4@tcp:/lustre /mnt/lustre
[root@dflcn001 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda3 19G 1.6G 16G 10% /
/dev/hda1 122M 12M 104M 10% /boot
tmpfs 61M 0 61M 0% /dev/shm
10.1.80.4@tcp:/lustre
20G 445M 19G 3% /mnt/lustre
Now we should be able to copy files into /mnt/lustre and access them from both Lustre clients.
[root@dllcn001:~]# cp ~/sun-hpc-linux-rhel-2.0.iso /mnt/lustre/
[root@dflcn001 ~]# ls /mnt/lustre/
sun-hpc-linux-rhel-2.0.iso
[root@dflcn001 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hda3 19G 1.6G 16G 10% /
/dev/hda1 122M 12M 104M 10% /boot
tmpfs 61M 0 61M 0% /dev/shm
10.1.80.4@tcp:/lustre
40G 1.5G 36G 4% /mnt/lustre
This tutorial illustrated how to use the Sun HPC Software, Linux Edition 2.0 and Sun xVM VirtualBox to set up a High Performance Computing development platform on a modest laptop. We demonstrated a complete open-source solution for HPC software developers who have no access to computer clusters or proprietary software. The software components in the Sun HPC Software, Linux Edition are exactly the same as those running on world-leading supercomputers. We hope more and more HPC software developers will take advantage of such a “mobile” development platform in their projects. We are also eager to receive feedback and wish lists for future Sun HPC Software releases. Download the Sun HPC Software and join the community today!
Source: http://blogs.sun.com/giraffe/entry/building_a_sun_hpc_virtual