Installing the 1U AMD nodes

System configuration as assembled on July 22nd, 2002

Commissioning

As a basic functionality test, we used the testup floppy (NIK-RESCUE) to boot a Linux 2.4.18 system with an NFS root. This disk worked correctly on the "old-style" pizza boxes and on the D0 nodes. On this test system, it takes about 5-10 minutes to read the kernel from the floppy disk (sic!). Since a fault in the diskette drive was suspected, we tried two other drives and a different cable. This did not resolve the problem, so a motherboard fault is likely.

The first test was to hook the machine up to the "default" farm network (net-18) and use LCFG to install the system. Since reading the floppy disk is extremely slow, we tried netbooting using MBA or PXE.

MBA net booting

MBA booting is not a success. We downloaded and installed mknbi in triode:/export/data/stage/davidg/grid/farm/etherb/ and took the kernel from the NIK-RESCUE disk to wrap it into an MBA file:

    ./mknbi \
        --target=linux --format=elf \
        --ip=both \
        --append="root=/dev/nfs ip=both init=/bin/sh" \
        --output=shellb.nb \
        ../../nfsboot/linux/arch/i386/boot/bzImage

We then downloaded imggen (for MBA) from uni-koeln and ran it. The resulting image will not load, because it is larger than 510 kByte (namely approx. 1200 kByte) and thus will not fit in main memory.

PXE booting

With PXE booting, you install a ready-made boot file available from the SYSLINUX distribution in /tftpboot, enable tftp in the inet configuration, and install a new tftpd that supports the TSIZE option. On booder, the sources are in ~davidg/src/pxelinux/tftp-hpa-0.29 alongside the tar file that originated at kernel.org. The new in.tftpd is installed in /usr/local/sbin/.

From the DICE boot disk we take the bzImage kernel image and put it in booder:/tftpboot/vmlinuz-lcfg-6.2. In the pxelinux.cfg directory, a template is set up to which the various per-host configuration files can be hard-linked.

    [root@booder /root]# less /tftpboot/pxelinux.cfg/template
    DEFAULT lcfg
    LABEL lcfg
      KERNEL vmlinuz-lcfg-6.2
      APPEND root=/dev/nfs init=/etc/dcsrc
    LABEL shell
      KERNEL bzkernel-2.4.18-nfsroot
      APPEND root=/dev/nfs ip=both init=/bin/sh

Note that you cannot correctly boot the shell target using this NFS root, because the 2.4.18 kernel uses the new device filesystem (devfs), which is incompatible with the RedHat-6.2 style root filesystem. But you can use the non-devfs NFSroot kernel as described on the XEON test pages. The kernel bzImage is available from this directory.

For this test we registered the node as "node18-21.farmnet" and restarted dhcpd. Using the "filename" directive in dhcpd.conf we get:

    host node18-21 {
      hardware ethernet 0:E0:81:21:A8:82;
      fixed-address node18-21.farmnet.nikhef.nl;
      option root-path "/ir62";
      option option-151 "http://booder.nikhef.nl/";
      filename "/tftpboot/pxelinux.0"; #hostip C0A81215
    }

This seems great, and the system will indeed load the kernel. But after initialising a NIC, the system will wait for a few seconds, then try BOOTP and subsequently barf:

    IP-Config: can't set default route Error -101

and it will not be able to reach the portmapper or, for that matter, do NFS. Restarting this cycle with the "plain" floppy boot exhibits the same problem, so it is an intrinsic difference between this node and all our existing nodes (I tried re-installing on an "old" node, which just went OK).

Disabling the second NIC

After disabling NIC2 we can again boot the system using PXE, and this time the install will actually work. One caveat: do not remove the "standard" (non-SMP) kernel from the LCFG RPM list, or the install of kernel-cfg will fail.

Doing proper PXE booting

In the new setup the client will boot the PXE image forever, so we need a mechanism to "unmark" the PXE image on the boot server. The simplest way to do this is to send a signal from the /etc/dcsrc script back to the server indicating that the install has started. And the simplest signal is an HTTP GET request sent to the server. The standard /ir62 root filesystem contains the "GET" command from the w2c distribution, so in the dcsrc script, just before the interactive question to remove the floppy, I added:

    /usr/bin/GET http://$SERVER:8087/cgi/reset?IP=${IPADDR}

On booder, I installed "thttpd" in /opt/local/farming/sbin,
and created the thttpd.conf file to allow
CGI-bin execution of a simple script: "reset".
This script is a suid-root Perl script that will
comment out those lines in dhcpd.conf that
match "/filename.*#hostip XXXXXXXX/", where the X's are
to be replaced with the IP address of the client in hex (upper-case),
as shown in the example above.
After updating the dhcpd.conf file the "reset" script will
restart the dhcpd service using the init scripts.
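
The text-rewriting step of such a "reset" script can be sketched as follows. This is a minimal illustration in Python, not the original Perl script: the function names ip_to_hex and disable_pxe are hypothetical, the address 192.168.18.21 is inferred from the "#hostip C0A81215" token in the example above, and the restart of dhcpd is left out.

```python
import re


def ip_to_hex(ip):
    """Convert a dotted-quad IPv4 address to 8 upper-case hex digits."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return "".join("%02X" % o for o in octets)


def disable_pxe(conf_text, ip):
    """Comment out 'filename' lines tagged '#hostip <hex>' for this client."""
    tag = re.compile(r"filename.*#hostip " + ip_to_hex(ip) + r"\b")
    out = []
    for line in conf_text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Only touch matching lines that are not already commented out.
        if tag.search(line) and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = indent + "#" + stripped
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = '  filename "/tftpboot/pxelinux.0"; #hostip C0A81215\n'
    print(disable_pxe(sample, "192.168.18.21"), end="")
```

Running the function a second time on its own output leaves the text unchanged, so a repeated GET from the same client would be harmless in this sketch.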
The "mkdhcpdconf" script was modified to support the generation of dhcpd.conf files using this convention. If a "$BFIL" directive appears in a client definition, a commented-out "filename" directive is written to the dhcpd.conf file with the "#hostip" token appended. Before installation starts, the leading "#" sign should be removed for those nodes and the dhcpd server restarted. In the pxelinux.cfg directory, the template as shown above is linked to "C0A812", i.e., it applies to all nodes in net-18.
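
Why a link named "C0A812" covers the whole of net-18 follows from how PXELINUX looks up its configuration file: it tries the client IP address in upper-case hex and then drops one hex digit at a time before falling back to "default". The sketch below lists those names in Python. The function name pxelinux_candidates is hypothetical, the address 192.168.18.21 is inferred from the "#hostip C0A81215" token above, and later PXELINUX versions also try UUID- and MAC-based names first, which are omitted here.

```python
def pxelinux_candidates(ip):
    """List hex-IP config file names in the order PXELINUX would try them."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    hex_ip = "".join("%02X" % o for o in octets)
    # Full 8-digit name first, then successively shorter prefixes.
    names = [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_candidates("192.168.18.21"):
        print(name)
```

For any address in 192.168.18.x the prefix "C0A812" appears in this list, so a single hard link with that name serves every node in the subnet unless a more specific file exists.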

Final touches

In the BIOS:

Update 2002.08.20

This setup has a fundamental error: the board requires Registered ECC DRAM, but non-ECC memory was used in this test. Furthermore, the Linux 2.2 kernel is not suitable for this motherboard. See the upgrade as part of the XEON tests for more details on a new 2.4 kernel.

Metainfo
Comments to David Groep