
Installing the 1U AMD nodes

System configuration as assembled on July 22nd, 2002

System board: Tyan Thunder K7X
CPU: Dual AMD Athlon 2.0 GHz
Disk: 60 GByte, UDMA5 capable
FDD: 1.44 MByte 3.5"
NIC: 2x 3COM 3C920 (3c50x compatible), LanWorks MBA & PXE capable

Commissioning

As a basic functionality test, we used the testup floppy (NIK-RESCUE) to boot a Linux 2.4.18 system with an NFS root. This disk worked correctly on the "old-style" pizza boxes and on the D0 nodes. On this test system, however, it takes about 5-10 minutes to read the kernel from the floppy disk (sic!). Since a faulty diskette drive was suspected, we tried two other drives and a different cable. This did not resolve the problem, so it is likely a motherboard fault. The first real test was to hook the machine up to the "default" farm network (net-18) and use LCFG to install the system. Since reading the floppy disk is extremely slow, we tried netbooting using MBA or PXE instead.

MBA net booting

MBA booting was not a success. We downloaded and installed mknbi in triode:/export/data/stage/davidg/grid/farm/etherb/ and took the kernel from the NIK-RESCUE disk to wrap it into an MBA file:
./mknbi \
  --target=linux --format=elf \
  --ip=both \
  --append="root=/dev/nfs ip=both init=/bin/sh" \
  --output=shellb.nb \
  ../../nfsboot/linux/arch/i386/boot/bzImage
We then downloaded imggen (for MBA) from uni-koeln and ran it.
The resulting image does not load, because it is larger than 510 kByte (approx. 1200 kByte) and thus does not fit in main memory.

PXE booting

For PXE booting, you install a ready-made boot file from the SYSLINUX distribution (pxelinux.0) in /tftpboot, enable tftp in the inet configuration, and install a new tftpd that supports the TSIZE option. On booder, the sources are in ~davidg/src/pxelinux/tftp-hpa-0.29, alongside the tar file obtained from kernel.org. The new in.tftpd is installed in /usr/local/sbin/.

From the DICE boot disk we take the bzImage kernel image and put it in booder:/tftpboot/vmlinuz-lcfg-6.2. In the pxelinux.cfg directory we set up a template to which the various per-host configuration files can be hard linked.

[root@booder /root]# less /tftpboot/pxelinux.cfg/template 
DEFAULT lcfg
LABEL lcfg
  KERNEL vmlinuz-lcfg-6.2
  APPEND root=/dev/nfs init=/etc/dcsrc 
LABEL shell
  KERNEL bzkernel-2.4.18-nfsroot
  APPEND root=/dev/nfs ip=both init=/bin/sh
Note that you cannot correctly boot the shell target using this NFS root, because the 2.4.18 kernel uses the new device filesystem (devfs), which is incompatible with the RedHat-6.2 style root filesystem.
You can, however, use the non-devfs NFS-root kernel described on the XEON test pages; its bzImage is available from this directory.

For this test we registered the node as "node18-21.farmnet" and restarted dhcpd. Using the "filename" directive in dhcpd.conf, the host entry reads:

	host node18-21 {
		hardware ethernet 0:E0:81:21:A8:82;
		fixed-address node18-21.farmnet.nikhef.nl;
		option root-path "/ir62";
		option option-151 "http://booder.nikhef.nl/";
		filename "/tftpboot/pxelinux.0";	#hostip C0A81215
	}
This looks great, and the system does indeed load the kernel. But after initialising a NIC, it waits for a few seconds, tries BOOTP and subsequently barfs:
IP-Config: can't set default route  Error -101
after which it can reach neither the portmapper nor, for that matter, the NFS server. Restarting this cycle with the "plain" floppy boot exhibits the same problem, so there must be an intrinsic difference between this node and all our existing nodes (I tried re-installing an "old" node, which went fine).

Disabling the second NIC

[a photograph of the motherboard showing the J88 jumper position]
The Linux kernel (v2.2.14) used on the DICE boot disk, and thus also for PXE booting, cannot handle more than one 3COM NIC at boot time. For the LCFG install to succeed, you must disable the second NIC on the motherboard. This is done by jumpering J88 (i.e. closing the connection). This jumper is located next to the connector on the board (see figure).

After disabling NIC2 we can again boot the system using PXE, and this time the install actually works. One caveat: do not remove the "standard" (non-SMP) kernel from the LCFG RPM list, or the install of kernel-cfg will fail.

Doing proper PXE booting

In the new setup the client would keep booting the PXE image forever, so we need a mechanism to "unmark" the PXE image on the boot server. The simplest way to do this is to send a signal from the /etc/dcsrc script back to the server indicating that the install has started, and the simplest signal is an HTTP GET request to the server. The standard /ir62 root filesystem contains the "GET" command from the w2c distribution, so in the dcsrc script, just before the interactive question to remove the floppy, I added:
/usr/bin/GET http://$SERVER:8087/cgi/reset?IP=${IPADDR}

On booder, I installed "thttpd" in /opt/local/farming/sbin and created a thttpd.conf file that allows CGI execution of a simple script, "reset". This is a suid-root Perl script that comments out those lines in dhcpd.conf that match "/filename.*#hostip XXXXXXXX/", where the X's are replaced with the IP address of the client in upper-case hex, as in the example above. After updating dhcpd.conf, the "reset" script restarts the dhcpd service using the init scripts.
The script has some simple safeguards:

  • You can only reset the host you are connecting from
  • You must present the IP address on the command line
I realise this security is rather weak, but then the script itself is rather harmless.
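The logic of the "reset" script can be sketched as follows. The original is a suid-root Perl script that is not reproduced here, so this is a hypothetical re-expression in Python: the function names are made up, and it assumes the client passes its dotted-quad address as the IP query parameter (as in the GET call above) and that the CGI environment supplies the caller's address as REMOTE_ADDR.

```python
import re
from typing import Tuple
from urllib.parse import parse_qs


def ip_to_hex(ip: str) -> str:
    """Convert a dotted-quad IPv4 address to 8 upper-case hex digits."""
    parts = ip.split(".")
    if len(parts) != 4 or not all(p.isdigit() and int(p) <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return "".join(f"{int(p):02X}" for p in parts)


def checked_client_ip(query_string: str, remote_addr: str) -> str:
    """Enforce the two safeguards: an IP must be given, and it must be the caller's."""
    values = parse_qs(query_string).get("IP")
    if not values:
        raise PermissionError("no IP address given")
    if values[0] != remote_addr:
        raise PermissionError("you can only reset the host you connect from")
    return values[0]


def comment_out_filename(conf_text: str, hex_ip: str) -> Tuple[str, int]:
    """Comment out active lines matching /filename.*#hostip HEX/; return (text, count)."""
    pattern = re.compile(r"filename.*#hostip " + re.escape(hex_ip) + r"\b")
    out, changed = [], 0
    for line in conf_text.splitlines(keepends=True):
        body = line.lstrip()
        if pattern.search(line) and not body.startswith("#"):
            indent = line[: len(line) - len(body)]
            line = indent + "#" + body  # keep indentation, disable the directive
            changed += 1
        out.append(line)
    return "".join(out), changed
```

A CGI wrapper would call checked_client_ip(os.environ["QUERY_STRING"], os.environ["REMOTE_ADDR"]), pass ip_to_hex() of the result to comment_out_filename(), write dhcpd.conf back only if the count is non-zero, and then restart dhcpd through the init scripts as the original does. Skipping lines that are already commented out makes a repeated GET harmless.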

The "mkdhcpdconf" script was modified to generate dhcpd.conf files following this convention. If a "$BFIL" directive appears in a client definition, a commented-out "filename" directive with the "#hostip" token appended is written to the dhcpd.conf file. Before installation starts, the leading "#" must be removed for those nodes and the dhcpd server restarted.
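The convention can be illustrated with a hypothetical sketch. mkdhcpdconf itself and the exact "$BFIL" syntax are not shown here, so the function names, parameters and stanza layout below are assumptions modeled on the node18-21 entry above (the option-151 line is omitted). render_host() emits the "filename" line commented out when a boot file is given, and arm_pxe() performs the manual step of removing the "#" for one host.

```python
import re
from typing import Optional, Tuple


def ip_to_hex(ip: str) -> str:
    """Dotted-quad IPv4 address -> 8 upper-case hex digits, as used for #hostip."""
    return "".join(f"{int(octet):02X}" for octet in ip.split("."))


def render_host(name: str, mac: str, fqdn: str, ip: str, root_path: str,
                bootfile: Optional[str] = None) -> str:
    """Render one dhcpd.conf host stanza; the filename line starts out disabled."""
    lines = [
        f"host {name} {{",
        f"\thardware ethernet {mac};",
        f"\tfixed-address {fqdn};",
        f'\toption root-path "{root_path}";',
    ]
    if bootfile is not None:  # plays the role of the "$BFIL" directive
        lines.append(f'\t#filename "{bootfile}";\t#hostip {ip_to_hex(ip)}')
    lines.append("}")
    return "\n".join(lines) + "\n"


def arm_pxe(conf_text: str, hex_ip: str) -> Tuple[str, int]:
    """Strip the leading '#' from the commented filename line of one host."""
    pattern = re.compile(
        r"^([ \t]*)#(filename.*#hostip " + re.escape(hex_ip) + r")\b",
        re.MULTILINE,
    )
    return pattern.subn(r"\1\2", conf_text)
```

After arming, dhcpd still has to be restarted; the "reset" CGI script later undoes exactly this change once the install has started.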

In the pxelinux.cfg directory, the template shown above is hard linked as "C0A812" (192.168.18 in hex), i.e., it serves all nodes in net-18.
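One link covers the whole subnet because of how PXELINUX looks up its configuration: it takes the client's IP address in upper-case hex and, if no file of that name exists in pxelinux.cfg, drops one trailing hex digit at a time before falling back to "default". The sketch below models only that IP-based part of the search (later PXELINUX releases also try UUID- and MAC-based names first); the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Optional


def hex_ip(ip: str) -> str:
    """192.168.18.21 -> 'C0A81215' (upper-case, zero-padded to 8 digits)."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


def pxelinux_candidates(ip: str) -> List[str]:
    """Config file names tried in pxelinux.cfg, most specific first."""
    h = hex_ip(ip)
    return [h[:n] for n in range(len(h), 0, -1)] + ["default"]


def resolve_config(ip: str, available: Iterable[str]) -> Optional[str]:
    """Return the first candidate present in pxelinux.cfg, or None."""
    present = set(available)
    for name in pxelinux_candidates(ip):
        if name in present:
            return name
    return None
```

With only C0A812 linked to the template, every 192.168.18.x node picks it up; a single node can still be given its own configuration by adding a more specific file such as C0A81215.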

Final touches

In the BIOS:
  • Set fast boot to Enabled (saves approx. 1 minute)
  • Set the boot order to Removable, PXE, Fixed disks, CDROM
  • Set "switch-on after power failure"
  • In the MBA setup (use Ctrl+Alt+B), set the error behaviour to "timeout" instead of "wait for key" :-)


Update 2002.08.20
This system has a fundamental error: the memory to be used is Registered ECC DRAM, but in this test non-ECC memory was used. Furthermore, Linux kernel 2.2 is not suitable for this motherboard. See the upgrade described as part of the XEON tests for details on a new 2.4 kernel.

Metainfo

Author: David Groep
Date: 2002.07.30

Comments to David Groep