
4. Configuration by Machine Type

4.1 Introduction

The various computers involved in building a site of the DataGrid testbed fall into just a few functional categories. A typical site sets up a Gatekeeper node, which acts as the portal for the site and as the front end of the local batch system controlling a set of Worker Nodes. Together these comprise a Computing Element (CE). Most sites operating a CE also provide persistent storage through a Storage Element (SE), which is the interface to the storage devices. To access the testbed for job submission, a User Interface (UI) machine is required.

Some larger sites offer additional services such as the Job Submission Service, which requires installing a Resource Broker (RB). If an RB is set up, the site also has to run a BDII information index server, an LDAP server that acts as an information index.

To allow users to run jobs whose lifetime exceeds the lifetime of the proxy certificate, a MyProxy machine providing the proxy renewal service has to be set up.

Again, this is not necessary for the typical site.

Within the testbed, a replica catalog (RC) and a VO server have to be set up and maintained for each VO.

Table 4.1 lists the different machine types and the services that must run on each.


Table 4.1: Machine Types and Necessary Services

Daemon                    UI   IS   CE   WN   SE   RC   RB   MProxy   BDII
Globus or EDG Gatekeeper  -    -    XX   -    XX   -    -    -        -
Globus Rep. Cat.          -    -    -    -    -    XX   -    -        -
GSI-enabled FTPD          -    -    XX   -    XX   -    XX   -        -
Globus MDS                -    XX   XX   -    XX   -    -    -        -
Info-MDS                  -    XX   XX   -    XX   -    -    -        -
Broker                    -    -    -    -    -    -    XX   -        -
Job submission serv.      -    -    -    -    -    -    XX   -        -
Info. Index               -    -    -    -    -    -    -    -        XX
Logging & Bookkeeping     -    -    -    -    -    -    XX   -        -
Local Logger              -    -    XX   -    XX   -    XX   -        -
CRL Update                -    -    XX   -    XX   -    XX   -        -
Grid mapfile Update       -    -    XX   -    XX   -    XX   -        -
RFIO                      -    -    -    -    XX   -    -    -        -
GDMP                      -    -    -    -    XX   -    -    -        -
MyProxy                   -    -    -    -    -    -    -    XX       -

Before a site is set up, a few things common to all nodes should be mentioned. Before you start, please read the section about time synchronization (8.1). If you install your site using LCFG, this service is handled by the tool. If you opt for manual configuration, you have to install, configure and start the service on every node. If you are using AFS, make sure that your local AFS time server is in sync with the rest of the grid.
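As a minimal sketch of a manual setup (assuming the NTP package shipped with Red Hat 6.2, where the init script is called xntpd; on other releases the service name may be ntpd), enabling the time service could look like this:

# /etc/ntp.conf should point to your site NTP server (hypothetical example):
#   server ip-time-1.cern.ch
/sbin/chkconfig xntpd on
/etc/rc.d/init.d/xntpd start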

To get a better understanding of the security model used by EDG and Globus, read section 8.2.

If you need more information about a given service, have a look at the sections in the Appendix.

4.2 Using LCFG

Before the steps needed to configure individual nodes are described, a walkthrough of the main configuration file (site-cfg.h) is given. For details about using LCFG, follow the references given in the introduction (1).

In several places in this guide, verbatim text from configuration files is used. In some cases the original lines were too long to be reproduced here and have been split; wherever this was done, a \ character marks the end of the split line. If you use the sample code given here, please join these lines again.
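For example, a define printed here as

#define SITE_CE_HOSTS_          CE_HOSTNAME:2119/jobmanager-pbs-short,\
CE_HOSTNAME:2119/jobmanager-pbs-infinite

has to be entered as a single line (without the trailing \) in the actual file.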

The example site-cfg.h file has been taken from CERN. To keep the information compact we assume that only two VOs are supported.

 /*
  site-cfg.h
  ==================================================
  SITE SPECIFIC CONFIGURATION
*/

/* COMMON LCFG DEFINITIONS ------------------------------------------------ */

#define LCFGSRV                 lxshare0371.cern.ch
#define URL_SERVER_CONFIG       http://lxshare0371.cern.ch
You have to set this to the full name of your LCFG server.
/*SOURCE TREE LOCATIONS -------------------------------------------------- */

/* Define the root locations of the Globus tree and the EDG tree.   These 
   are used in many configuration files and for setting the ld.so.conf 
   libraries. NOTE: the underscore at the end of the define.  Used to avoid
   confusion with the GLOBUS_LOCATION and EDG_LOCATION tags in configuration
   files. */
#define GLOBUS_LOCATION_        /opt/globus
#define EDG_LOCATION_           /opt/edg

/* COMMON GRID DEFINITIONS ------------------------------------------------ */

/* CE AND SE HOST NAMES. These are defined here because they are used in
   some of the site definitions. */

/* ComputingElement hostname */
#define CE_HOSTNAME             lxshare0227.cern.ch
/* StorageElement hostname */
#define SE_HOSTNAME             lxshare0393.cern.ch

Handling multiple CEs and SEs is possible but requires some modifications to the configuration files; this is beyond the scope of this guide.

/* COMMON SITE DEFINITIONS ------------------------------------------------ */

#define LOCALDOMAIN             cern.ch
#define SITE_MAILROOT           SITE_MANAGERS_MAIL_ADDRESS@YOURSITE.ch
#define SITE_GATEWAYS           137.138.1.1
/* Allowed networks (useful for tcpwrappers) */
#define SITE_ALLOWED_NETWORKS   127.0.0.1, 137.138., 128.141.
#define SITE_NAMESERVERS        137.138.16.5 137.138.17.5
Please note that some lists are comma-separated while others, like SITE_NAMESERVERS, are separated by a single white space!
/* The netmask */
#define SITE_NETMASK            255.255.0.0
/* NTP server (domain and hostname) */
#define SITE_NTP_DOMAIN         cern.ch
#define SITE_NTP_HOSTNAME       ip-time-1
/* The time zone */
#define SITE_TIMEZONE           Europe/Paris
/* Site name */
#define SITE_NAME_              CERN-PRO-1-4
This name must be unique inside the whole grid! Make sure you coordinate your choice with the other site administrators.
/* Site EDG version */
#define SITE_EDG_VERSION        v1_4_3
/* Site installation date       year month day time */
#define SITE_INSTALLATION_DATE_ 20021118120000Z
/* Site distinguished name. */
#define SITE_DN_                "dc=cern, dc=ch, o=Grid"
You can find this information in the host certificate of your CE node.
/* All the WN (used by /etc/export configuration of /home NFS Mount
   e.g. testbed*.lnl.infn.it. Needed by ComputingElement.h) */
#define SITE_WN_HOSTS           lxshare*.cern.ch,tbed0*.cern.ch,adc*.cern.ch
/* All the SE hosts (comma separated list) */
#define SITE_SE_HOSTS_          SE_HOSTNAME
/* List (comma separated) of the Computing Element(s) of your site */
#define SITE_CE_HOSTS_          CE_HOSTNAME:2119/jobmanager-pbs-short,\
CE_HOSTNAME:2119/jobmanager-pbs-infinite
This is the list of CEs and their local resource managers; 2119 is the port of the gatekeeper.
/* The default configuration of MDS is that there is a GRIS running on
   each functional node (CE, SE).  There is a single site-level GIIS 
   running by default on the CE.  This site-level GIIS then registers to 
   the top-level GIIS for the production or development testbed.  The
   details are handled via the globuscfg configuration object. */
/* Usually use a name like nikhefpro or nikhefdev for the production
   or development testbeds. */
#define SITE_GIIS               cern
#define SITE_GIIS_HOSTNAME      CE_HOSTNAME
/* These point to the next highest level in the MDS hierarchy.  Ask to
   find out the parameters for this.  At time of tagging these were:
   edgdev on lxshare0372.cern.ch for DEVELOPMENT Testbed
   edgpro on lxshare0373.cern.ch for PRODUCTION (Application) Testbed 
   but DO ask to be sure.*/
#define TOP_GIIS                edgpro
#define TOP_GIIS_HOSTNAME       lxshare0373.cern.ch
For this information you should contact the integration team. Contact information is provided on the WP6 web page.
/* COMMON DEFAULT VALUES -------------------------------------------------- */
/* This defines the default location for the host certificates.  If
this is different for your site define the new value here.  If you
need to change it for the CE or SE separately, see below. */
#define SITE_DEF_HOST_CERT  /etc/grid-security-local/hostcert.pem
#define SITE_DEF_HOST_KEY   /etc/grid-security-local/hostkey.pem
#define SITE_DEF_GRIDMAP    /etc/grid-security/grid-mapfile
#define SITE_DEF_GRIDMAPDIR /etc/grid-security/gridmapdir/

/* DATA MGT PARAMETERS FOR SEVERAL NODE TYPES  ---------------------------- */

/* These variables define which VOs your site supports.  At least one must
   be defined.  For each line the RC and GDMP configurations will be done and
   on the SE a GDMP server will be configured.  It also will create 50 
   accounts for each defined VO. 

   You must define the associated password for each of the supported VOs. 
   Contact the site administrators to obtain the passwords. 
*/
#define SE_VO_ALICE
#define SE_GDMP_REP_CAT_ALICE_PWD  ALICE_PASSWORD

#define SE_VO_ATLAS
#define SE_GDMP_REP_CAT_ATLAS_PWD  ATLAS_PASSWORD
For this information contact the integration team or the VO managers.
/* COMPUTING ELEMENT DEFINITIONS ------------------------------------------ */

/* Subject of the certificate */
#define CE_CERT_SBJ             "/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0227.cern.ch"
/* Some site and host information (it goes in globus.conf)*/
#define CE_HOST_DN              "hn=lxshare0227.cern.ch, dc=cern, dc=ch, o=Grid"
/* Full path of the certificate */
#define CE_CERT_PATH            SITE_DEF_HOST_CERT
/* Full path of the secret key */
#define CE_SECKEY_PATH          SITE_DEF_HOST_KEY
/* System administrator e-mail */
#define CE_SYSADMIN             SITE_MAILROOT
/* Space separated job manager list (e.g. fork, pbs, lsf), part of globus.conf.
   NOTE: To support the standard globus commands (in particular the 
         globus-job-get-output command) the fork job manager must be 
         listed first! I.e. the fork job manager must be the default. */
#define CE_JOBMANAGERS          "fork pbs"
Note that fork not only has to be first in the list, it always has to be in the list!
/* Batch system adopted by CE (this info goes in info-mds.conf */
#define CE_BATCHSYSTEM_         pbs
/* Binaries path of the batch system */
#define CE_BATCHSYSTEM_BIN_PATH /usr/pbs/bin
/* Local queue names */
#define CE_QUEUE_               short,infinite
/* List (comma separated no spaces) of StorageElement(s) close to this CE */
#define CE_CLOSE_SE_ID_         SE_HOSTNAME
/* Mount point(s) of the SE(s) close to this CE */
#define CE_CLOSE_SE_MOUNTPOINT  /flatfiles/SE00
More information on mount points is given in 4.3.1, where the layout of the shared file system is explained in more detail.
/* Disk description */
#define CE_DISK_DESC            15GB-EIDE
/* CPU description */
#define CE_CPU_DESC             DUAL-PIII-800
/* CE InformationProviders: MinPhysMemory */
#define CE_IP_MINPHYSMEM        512
/* CE InformationProviders: MinLocalDiskSpace */
#define CE_IP_MINLOCDISK        2048
/* CE InformationProviders: NumSMPs */
#define CE_IP_NUMSMPS           26
/* CE InformationProviders: MinSPUProcessors */
#define CE_IP_MINSPUPROC        2
/* CE InformationProviders: MaxSPUProcessors */
#define CE_IP_MAXSPUPROC        2
/* CE InformationProviders: MaxSI00.
   See some examples of SpecInt at
   http://www.specbench.org/osg/cpu2000/results/cint2000.html */
#define CE_IP_MAXSI00           380
/* CE InformationProviders: MinSI00 */
#define CE_IP_MINSI00           380
/* CE InformationProviders: AverageSI00 */
#define CE_IP_AVRSI00           380
/* CE InformationProviders: AFSAvailable: */
#define CE_IP_AFS_AFSAVAILABLE  FALSE
/* CE InformationProviders: OutboundIP */
#define CE_IP_OUTBOUNDIP        TRUE
/* CE InformationProviders: InboundIP */
#define CE_IP_INBOUNDIP         FALSE
/* CE InformationProviders: RunTimeEnvironment (1) */
#define CE_IP_RUNTIMEENV1       ATLAS-3.2.1
/* CE InformationProviders: RunTimeEnvironment (2) */
#define CE_IP_RUNTIMEENV2       ALICE-3.07.01
/* CE InformationProviders: RunTimeEnvironment (10) */
/*#define CE_IP_RUNTIMEENV10      ! define it if you need! */
/*
   This must be defined for your CE; it indicates that your site is running
   but hasn't yet been certified.  Change this to EDG-CERTIFIED once your
   site has been tested by the ITeam. */ 
#define CE_IP_RUNTIMEENV15        EDG-TEST
/*#define CE_IP_RUNTIMEENV15      EDG-CERTIFIED */
By default 15 runtime environment variables can be defined. It is possible to add more by modifying the CE-specific configuration file. It is important that you first set EDG-TEST. For details about getting certified for the EDG testbed, contact the integration team.
/* The mountpoint on the CE of the SE exported area via NFS */
#define CE_MOUNTPOINT_SE_AREA   CE_CLOSE_SE_MOUNTPOINT
/* Uncomment this below if you want to collect and publish
   data from a network monitor */
/* #define NETMON_HOST_ gppnm06.gridpp.rl.ac.uk */

/* STORAGE ELEMENT DEFINITIONS -------------------------------------------- */

/* Full path of the certificate */
#define SE_CERT_PATH            SITE_DEF_HOST_CERT
/* Full path of the secret key */
#define SE_SECKEY_PATH          SITE_DEF_HOST_KEY
/* Subject of the SE certificate */
#define SE_CERT_SBJ            "/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0393.cern.ch"
/* Some site and host information (it goes in globus.conf) */
#define SE_HOST_DN             "hn=lxshare0393.cern.ch, dc=cern, dc=ch, o=Grid"
/* System administrator e-mail */
#define SE_SYSADMIN            SITE_MAILROOT
/* List (comma separated without spaces) of ComputingElement(s) close to the SE. */
#define SE_CLOSE_CE_           SITE_CE_HOSTS_
/* The value of SE_SIZE in info-mds.conf */
#define SE_DISKSIZE             15
SE_DISKSIZE and SE_FILESYSTEMS_ should be set in the node configuration file to reflect the actually available space and configuration.
/* comma separated list without spaces, values used in df to obtain freespace */
#define SE_FILESYSTEMS_         /dev/hda2
/* Disk description */
#define SE_DISK_DESC            15GB-EIDE
/* CPU description */
#define SE_CPU_DESC             DUAL-PIII-800
/* SE protocols */
#define SE_PROTOCOLS_           gridftp,rfio,file
Note that the file protocol is only available if you use a shared file system between the SE and the WNs.
/* SE protocols ports */ 
/* Note that although the IANA port for rfio is 3147, the software by 
   default runs on 5001. */
#define SE_PROTOCOL_PORTS_      2811,5001,
/* GDMP area */
#define SE_GDMP_AREA            /flatfiles/SE00
/* List of the supported VO. Add/remove the VO name for each VO that you
   support/do not support in both of the following defines. */
#define SE_GDMP_VOS             alice,atlas
#define SE_VO_                  alice:SE_GDMP_AREA/alice,atlas:SE_GDMP_AREA/atlas

/* WORKER NODE DEFINITIONS ------------------------------------------------ */

/* The mountpoint on the WN of the SE exported area via NFS. It should be
   the same used for the SE */
#define WN_MOUNTPOINT_SE_AREA   CE_MOUNTPOINT_SE_AREA
/* USER INTERFACE DEFINITIONS --------------------------------------------- */

/* Resource broker */
#define UI_RESBROKER            lxshare0380.cern.ch
/* Logging and Bookkeeping URL */
#define UI_LOGBOOK              https://lxshare0380.cern.ch:7846
If you only want to install and configure a UI, then UI_RESBROKER and UI_LOGBOOK are the only two defines you need to change.

4.3 Shared Filesystem Layout

As mentioned earlier, several parts of the file system have to be shared between the nodes of an EDG site. To help site administrators find a workable layout, the CERN system is described here. This should not be seen as a general blueprint, but as a working example. We also include some comments on managing users.

This section should be read before moving on to the following sections describing the installation of individual nodes. Many things may be unclear on a first reading, but they will become clearer later.

The current setup is based on 3 RAID disk servers using EIDE disks. Each server is configured as an NFS server exporting five 100 GB partitions.

All partitions are inserted into a common file system naming scheme that follows the pattern /shift/<server>/<disk>, e.g. /shift/lxshare072d/data02.

All client nodes (UI,CE,SE,WN) mount the needed partitions using the standard name as a mount point. Ad hoc links are then created on the nodes to point to these paths (examples of this later).

As we are managing several different testbeds, we added an extra path layer specifying which testbed uses a particular section of the file system, e.g. /shift/lxshare072d/data02/site_pro-1.3 for the EDG 1.3 production site.

For a given testbed, the following disk areas are located on the disk server (in parenthesis the nodes mounting the area):

  1. User home directories, /home (UI)
  2. GRID security directory, /etc/grid-security (CE,SE,WN)
  3. VO users home directories, /home (CE,SE,WN)
  4. SE storage area(s), /flatfiles (CE,SE,WN)

1. User home directories: this area is mounted on all UI nodes, independently of the testbed they belong to. In this way users have a unique working space. This unique home directory structure is associated with a unique user account system based on a NIS server (more on this later). On all hosts the link for this area is (note the absence of the "testbed" path component):

/home -> /shift/lxshare072d/data01/UIhome

2. GRID security directory: the main reason to share this directory tree is that the grid-mapfile and the CA CRL files are regularly updated. Having each host do this independently increases the strain on the servers providing the update information and is prone to misalignments; e.g. a user can start a job on a CE, but the job then cannot access the local SE because the user's certificate is not yet accepted by that node. Sharing this directory requires that the update jobs run on only one of the nodes sharing it:

For the application testbed, updates take place on the SE node lxshare0393.

On all CE,SE,WN nodes belonging to the application testbed we have the link:

/etc/grid-security -> /shift/lxshare072d/data02/site_pro-1.3/grid-security
On some occasions this directory cannot be shared. At CERN this happens on the RBs and on a special SE node, lxshare0384. In both cases this is because the grid-mapfile has to differ from the standard one: on the RB all authorized certificates must be mapped to the dguser account (the gridmapdir is also not needed there), while on the special SE we wanted to limit access to a predefined set of users. On these special nodes the grid-security directory is local, all ca_<site> RPMs are installed, and the update cron jobs are executed.

3. VO users home directories: sharing these directories between the CE and the WNs is mandatory (see XX). There is no real need to share this directory with the SE, but since a gatekeeper also runs on that node, doing so reduces the number of directories to keep under control. The corresponding link on the application testbed is:

/home -> /shift/lxshare072d/data02/site_pro-1.3/CEhome

4. SE storage area: sharing this area between the SE and the CE/WNs allows the activation of the "file" access protocol on the SE. Due to the limitation of the partition size on the CERN disk servers, this area spans more than one mounted partition. On each of the nodes sharing this area we created a local /flatfiles directory tree. Within this tree we created links to the actual disk partitions. The structure of the directory tree must of course be rigorously identical on all nodes sharing it.

As an example, this is the content of the /flatfiles on the application testbed at CERN:

[root@lxshare0393]# ls -l /flatfiles/SE00
alice -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/alice
atlas -> /shift/lxshare072d/data03/site_pro-1.3/flatfiles/SE00/atlas
biome -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/biome
cms -> /shift/lxshare072d/data05/site_pro-1.3/flatfiles/SE00/cms
dzero -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/dzero
eo -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/eo
flatfiles -> ..
iteam -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/iteam
lhcb -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/lhcb
tutor -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/tutor
wpsix -> /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/wpsix

From this we see that all VOs share the same disk partition, /shift/lxshare072d/data04, with the exception of atlas and cms, which use /shift/lxshare072d/data03 and /shift/lxshare072d/data05 respectively (note: this layout was chosen because atlas and cms were planning some production tests). Also note the flatfiles -> .. link: this must always be present to allow correct LFN-to-PFN mapping on all nodes.


4.3.1 LCFG based NFS setup

To configure LCFG to mount the correct disk partitions on each node, we commented out all NFS-export/-mount sections from the default configuration files and included a testbed-specific nfsmount configuration file in the configuration of each of the nodes. For the application testbed this file is:
nfsmount-cfg.h
/*
 * Common nfsmount configuration file
 */
EXTRA(nfsmount.nfsmount) l072d01 l072d02 l072d03 l072d04 l072d05
nfsmount.nfsdetails_l072d01 /shift/lxshare072d/data01 edg-nfs00.cern.ch:\
/shift/lxshare072d/data01 rw,bg,intr,hard
nfsmount.nfsdetails_l072d02 /shift/lxshare072d/data02 edg-nfs00.cern.ch:\
/shift/lxshare072d/data02 rw,bg,intr,hard
nfsmount.nfsdetails_l072d03 /shift/lxshare072d/data03 edg-nfs00.cern.ch:\
/shift/lxshare072d/data03 rw,bg,intr,hard
nfsmount.nfsdetails_l072d04 /shift/lxshare072d/data04 edg-nfs00.cern.ch:\
/shift/lxshare072d/data04 rw,bg,intr,hard
nfsmount.nfsdetails_l072d05 /shift/lxshare072d/data05 edg-nfs00.cern.ch:\
/shift/lxshare072d/data05 rw,bg,intr,hard

Creation of the node-specific links cannot, to our knowledge, be handled by LCFG, so we created node-type-specific scripts to be executed on each node after installation; a sketch of such a script is given below.
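The following is only a sketch, assuming the NFS partitions are already mounted under /shift/... as described in 4.3.1 and that the original local directories can be moved out of the way; the exact set of links depends on the node type and on the paths shown above.

#!/bin/bash
# Sketch: create the ad hoc links on a CE/SE/WN of the application testbed.
TB=/shift/lxshare072d/data02/site_pro-1.3

# Shared grid security directory (CE, SE, WN)
mv /etc/grid-security /etc/grid-security.orig
ln -s $TB/grid-security /etc/grid-security

# VO users' home directories (CE, SE, WN)
mv /home /home.orig
ln -s $TB/CEhome /home

# SE storage area: local /flatfiles tree with per-VO links (see the listing in 4.3)
mkdir -p /flatfiles/SE00
ln -s .. /flatfiles/SE00/flatfiles
ln -s /shift/lxshare072d/data04/site_pro-1.3/flatfiles/SE00/alice /flatfiles/SE00/alice
ln -s /shift/lxshare072d/data03/site_pro-1.3/flatfiles/SE00/atlas /flatfiles/SE00/atlas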

4.3.2 NIS Server

As not all users participating in the EDG project are CERN users, we set up a user account system independent from the standard CERN one, which is based on AFS. To keep it simple, we used a standard NIS server installed on the disk server that also hosts the user home directory disk: this allows us to create users and home directories with a single command.

We then configured all UIs as NIS clients for the "edg-tb" NIS domain. This is only partially handled by LCFG: a full configuration requires the inclusion of a NIS-client-specific configuration file in the LCFG setup and then the execution of a script on each node.

The LCFG part is:

nisclient-cfg.h:

+auth.nsswitch ignore
+update.ypserver              edg-nfs00.cern.ch
EXTRA(boot.services)          nsswitch
EXTRA(boot.run)               nsswitch
EXTRA(profile.components)     nsswitch
+nsswitch.mods_passwd   compat
+nsswitch.mods_shadow   files nis
+nsswitch.mods_group    files nis
+nsswitch.mods_hosts    files dns [NOTFOUND=return]
+nsswitch.mods_netgroup files nis
This only creates the /etc/nsswitch.conf and /etc/ypserv.conf files.

The script to be executed on each node takes care of enabling and starting the ypbind daemon and also configures the system DOMAINNAME variable:

/scripts/setup_enableNIS.sh:

#!/bin/bash
# Set the NISDOMAIN once and for all
domainname edg-tb
sed -e "s/^NISDOMAIN=.*$/NISDOMAIN=edg-tb/" /etc/sysconfig/network > /etc/sysconfig/network.new
mv -f /etc/sysconfig/network /etc/sysconfig/network.old
mv -f /etc/sysconfig/network.new /etc/sysconfig/network

# Add the NIS entries to passwd and group
#echo "ypserver edg-nfs00.cern.ch" >> /etc/yp.conf
echo "+::::::" >> /etc/passwd
echo "+:::" >> /etc/group

# all is ready: start the ypbind daemon
/etc/rc.d/init.d/ypbind start
/sbin/chkconfig ypbind on

# Once NIS client is up, groups exist and the auth object can be executed
cd /etc/obj
bin/runobj auth start

4.4 User Interface Machine

The User Interface Machine contains the client software necessary to communicate with the Resource Broker as described in the Users Guide document.

4.4.1 Manual Installation

The list of RPM packages to be installed for each version of EDG software can be viewed and downloaded at the following address: http://datagrid.in2p3.fr/autobuild/rh6.2/rpmlist

The RPMs for each EDG component (CE, SE, ...) are divided into several categories (CA, Globus, EDG, ...); this allows you to install only the required components.

If you are upgrading a machine where a previous version of EDG is already installed, it is strongly recommended to uninstall the EDG software first.

To install the EDG and Globus software you need super-user privileges. All the commands listed below are assumed to be executed as 'root'.

  1. Download and install the certification authority packages: /bin/rpm -ivh ca_*. You will end up with a set of files in the directory /etc/grid-security/certificates with the extensions .0, .signing_policy and .crl_url.
  2. Download and install the packages under the Globus category. The files are installed under /opt/globus and /opt/globus-24.
  3. Download and install the packages under the EDG category. The files are installed under /opt/edg and some of them also under /etc/rc.d/init.d and /etc/profile.d.
  4. Some of the packages in the External category are needed. You may want to select the ones needed or just install all of them.
  5. Add the following lines to /etc/ld.so.conf
    /opt/globus/lib
    /opt/edg/lib
    /opt/globus-24/lib
    
    Run /bin/ldconfig
  6. Copy the template file /opt/edg/etc/UI_ConfigEnv.cfg.template to /opt/edg/etc/UI_ConfigEnv.cfg and specify which Resource Broker you will use as your default one. You can visit the CERN Site Status page on http://marianne.in2p3.fr to find out which RB is currently being used for the applications testbed.
  7. Configure GDMP for each VO your site plans to support. Currently EDG as a whole supports atlas, alice, cms, lhcb, eo, biome, iteam, wpsix, dzero and tutor. Use the command /opt/edg/sbin/configure_gdmp_client with the appropriate options.

The first five steps are common to most nodes; steps 6 and 7 are specific to a UI node. A sketch of the common steps is given below.
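The common steps 1-5 can be scripted roughly as follows. This is a sketch only; it assumes the RPMs from the list page given above have already been downloaded into per-category directories named CA, Globus, EDG and External.

#!/bin/bash
# Sketch of the common manual installation steps (1-5), run as root.
# Assumes the RPMs from http://datagrid.in2p3.fr/autobuild/rh6.2/rpmlist
# have been downloaded into ./CA ./Globus ./EDG ./External
/bin/rpm -ivh CA/ca_*.rpm          # step 1: certification authorities
/bin/rpm -ivh Globus/*.rpm         # step 2: Globus packages
/bin/rpm -ivh EDG/*.rpm            # step 3: EDG packages
/bin/rpm -ivh External/*.rpm       # step 4: external packages (or a selection)

# step 5: make the Globus/EDG libraries known to the dynamic linker
cat >> /etc/ld.so.conf <<EOF
/opt/globus/lib
/opt/edg/lib
/opt/globus-24/lib
EOF
/bin/ldconfig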

4.4.2 LCFG Based Installation

Modify the site-cfg.h file in the source directory of your LCFG server. Apart from general settings, you have to change the defines that represent the Resource Broker and the L&B server:

/* Resource broker */
#define UI_RESBROKER            lxshare0380.cern.ch
/* Logging and Bookkeeping URL */
#define UI_LOGBOOK              https://lxshare0380.cern.ch:7846

Start the update or installation of your UI node as described in the LCFG guide. No additional manual intervention is needed.

4.4.3 Initial Test

A user may wish to customize the UI configuration file, for example to use a different resource broker or to change the sandbox location. This can be done by copying the standard UI configuration file, editing it appropriately, and then setting the environment variable EDG_WL_UI_CONFIG_PATH to point to the new configuration file.
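For example (a sketch; the copied file name is arbitrary):

cp /opt/edg/etc/UI_ConfigEnv.cfg $HOME/my_UI_ConfigEnv.cfg
# edit $HOME/my_UI_ConfigEnv.cfg to point at another RB or sandbox location
export EDG_WL_UI_CONFIG_PATH=$HOME/my_UI_ConfigEnv.cfg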

A rudimentary test of the user interface is to submit a "Hello World" example. Put the following into a file called hello.jdl:

Executable    = "/bin/echo";
Arguments     = "Hello";
StdOutput     = "hello.out";
StdError      = "hello.err";
OutputSandbox = {"hello.out","hello.err"};
Rank          = other.MaxCpuTime;
and submit this job with
dg-job-submit hello.jdl
The status of the job can be obtained with dg-job-status using the job identifier returned from the submit command. The output can be retrieved with dg-job-get-output again with the job identifier. The hello.out file should contain the word "Hello".
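A typical session therefore looks like this; the job identifier and output directory are placeholders for whatever the commands actually print:

dg-job-submit hello.jdl                # prints a job identifier (https://...)
dg-job-status <job identifier>         # follow the job through its states
dg-job-get-output <job identifier>     # retrieves hello.out and hello.err
cat <output directory>/hello.out       # should contain the word "Hello"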

For a better introduction to using edg consult the EDG User Guide which can be found on WP6's web page.

4.5 Computing Element Configuration

A computing element consists of a gatekeeper and optionally a set of worker nodes joined by a local resource management system (batch system). If the computing element contains worker nodes, then the home areas of all of the accounts must be on a common shared file system with the gatekeeper node.

4.5.1 Manual Installation

Follow the steps 1-5 described for the UI machine, then:

  1. Install the host certificate and private key in the following places: /etc/grid-security/hostcert.pem with permissions set to 0644 and /etc/grid-security/hostkey.pem with permissions set to 0400 (a sketch is given after this list).
  2. Create the file /etc/sysconfig/globus containing the following 2 lines:
    GLOBUS_LOCATION=/opt/globus-24
    GLOBUS_CONFIG=/etc/globus2.conf
    

  3. Run the Globus initialization script:
    setenv GLOBUS_LOCATION /opt/globus-24
    $GLOBUS_LOCATION/sbin/globus-initialization.sh
    
  4. Create and customize the following configuration files:
    /etc/globus.conf
    /etc/edg/info-mds.conf
    
    Examples for these files will be given later. The file /etc/globus2.conf is used for configuring the EDG information system on your site. Below you will find the file from the CE host ccgridli03.in2p3.fr, which is a GRIS and a GIIS for the site cc-in2p3 and registers to the EDG applications testbed information system (edgpro on host lxshare0373.cern.ch). The user edginfo is used to run the LDAP daemons needed by the information system.
    [common]
    X509_USER_CERT=/etc/grid-security/hostcert.pem
    X509_USER_KEY=/etc/grid-security/hostkey.pem
    GRIDMAP=/etc/grid-security/grid-mapfile
    
    [mds]
    user=edginfo
    [mds/gris/provider/gg]
    provider=globus-gris
    
    [mds/gris/provider/ggr]
    provider=globus-gram-reporter
    
    [mds/gris/provider/edg]
    
    [mds/gris/registration/cc-in2p3]
    regname=cc-in2p3
    reghn=ccgridli03.in2p3.fr
    
    [mds/giis/cc-in2p3]
    name=cc-in2p3
    
    [mds/giis/cc-in2p3/registration/edgpro]
    regname=edgpro
    reghn=lxshare0373.cern.ch
    
    [gridftp]
    
  5. Start the Globus/EDG services on the CE:
    /sbin/chkconfig globus-gatekeeper on
    /etc/rc.d/init.d/globus-gatekeeper start
            
    /sbin/chkconfig globus-mds on 
    /etc/rc.d/init.d/globus-mds start
    
    /sbin/chkconfig globus-gsincftp on 
    /etc/rc.d/init.d/globus-gsincftp start
    
    /sbin/chkconfig locallogger on
    /etc/rc.d/init.d/locallogger start
    

  6. Increase some system parameters to improve EDG CE scalability and add these commands to rc.local so that they survive reboots. The following script will do the trick.
    echo 480000 > /proc/sys/fs/inode-max 
    echo 120000 > /proc/sys/fs/file-max
    cp -f /etc/rc.d/rc.local /etc/rc.d/rc.local.orig
    cat >> /etc/rc.d/rc.local <<EOD
    
    # Increase some system parameters to improve EDG CE scalability
    if [ -f /proc/sys/fs/inode-max ]; then
        echo 480000 > /proc/sys/fs/inode-max 
    fi
    if [ -f /proc/sys/fs/file-max ]; then
        echo 120000 > /proc/sys/fs/file-max
    fi
    EOD
    

  7. Create, if not present, the /etc/grid-security-local directory and copy the host key/certificate pair there, named hostkey.pem and hostcert.pem.
  8. If you are not using a pre-existing /etc/grid-security area mounted from the NFS server, you must create the /etc/grid-security/gridmapdir directory and create there one zero-length file for each of the users you created. This can be done from within that directory using the command line:
    touch `egrep "[a-z]+[0-9][0-9][0-9]"  /etc/passwd | cut -d ":" -f 1`
    
  9. In case you want to use PBS, follow the configuration steps given; lxshare0227.cern.ch is the CE and all other nodes mentioned are worker nodes.
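As referenced in step 1, installing the host credentials could look roughly like the following sketch. It assumes the certificate and key obtained from your CA are available as hostcert.pem and hostkey.pem in the current directory.

# Sketch: install the host certificate and key with the required permissions (step 1)
install -d -m 0755 /etc/grid-security
install -m 0644 hostcert.pem /etc/grid-security/hostcert.pem
install -m 0400 hostkey.pem  /etc/grid-security/hostkey.pem

# Keep a local copy in /etc/grid-security-local (step 7)
install -d -m 0755 /etc/grid-security-local
install -m 0644 hostcert.pem /etc/grid-security-local/hostcert.pem
install -m 0400 hostkey.pem  /etc/grid-security-local/hostkey.pem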

4.5.1.1 Local Centre Authorization Service (LCAS)

The Local Centre Authorization Service (LCAS) handles authorization requests to the local computing fabric.

In this release the LCAS is a shared library, which is loaded dynamically by the globus gatekeeper. The gatekeeper has been slightly modified for this purpose and will from now on be referred to as edg-gatekeeper.

The authorization decision of the LCAS is based upon the user's certificate and the job specification in RSL (JDL) format. The certificate and RSL are passed to (plug-in) authorization modules, which grant or deny access to the fabric. Three standard authorization modules are provided by default: one checking a list of allowed users, one checking a list of banned users, and one checking the available time slots.

All three modules get their information from simple configuration files: allowed_users.db, ban_users.db and timeslots.db, respectively.

For installation and configuration instructions for the edg-gatekeeper and the LCAS modules, see the LCAS website.

4.5.1.2 Security

The Gatekeeper must have a valid host certificate and key installed in the /etc/grid-security directory. These are usually links to files in the /etc/grid-security-local directory.

The Gatekeeper must have all of the security RPMs installed. In addition, the daemon which updates the certificate revocation lists (see 8.2.2) and the one which updates the grid mapfile (see section 23) must also be running; these updates are done by cron jobs. An example mkgridmap configuration file can be found on the Testbed website; it maps users into pooled accounts based on membership in a virtual organization.
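As a sketch, the corresponding /etc/crontab entries could look like the following (the times are arbitrary; the same cron scripts appear again in the Resource Broker instructions later in this chapter):

53 1,7,13,19 * * * root /opt/edg/etc/cron/mkgridmap-cron
53 1,7,13,19 * * * root /opt/edg/etc/cron/edg-fetch-crl-cron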

4.5.1.3 Configuring the GRAM and GRIS

Here is an example for a /etc/globus.conf file as it is used on the CERN application testbed. The node is lxshare0227.cern.ch.

GLOBUS_LOCATION=/opt/globus
GLOBUS_GATEKEEPER_SUBJECT="/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0227.cern.ch"
GLOBUS_HOST_DN="hn=lxshare0227.cern.ch, dc=cern, dc=ch, o=Grid"
GLOBUS_ORG_DN="dc=cern, dc=ch, o=Grid"
GLOBUS_GATEKEEPER_HOST="lxshare0227.cern.ch"
GATEKEEPER_PORT=2119
GATEKEEPER_LOG=/var/log/globus-gatekeeper.log
X509_CERT_DIR=/etc/grid-security/certificates
X509_GATEKEEPER_CERT=/etc/grid-security-local/hostcert.pem
X509_GATEKEEPER_KEY=/etc/grid-security-local/hostkey.pem
GRIDMAP=/etc/grid-security/grid-mapfile
GLOBUS_JOBMANAGERS="fork pbs"
GSIWUFTPPORT=2811
GSIWUFTPDLOG=/var/log/gsiwuftpd.log
GLOBUS_FLAVOR_NAME=gcc32dbg
GRID_INFO_EDG=yes
GRIDMAPDIR=/etc/grid-security/gridmapdir/
GRID_INFO_GRIS_REG_GIIS=cern
GRID_INFO_GRIS_REG_HOST=lxshare0227.cern.ch
GLOBUS_GATEKEEPER=/opt/edg/sbin/edg-gatekeeper
GLOBUS_GATEKEEPER_OPTIONS="-lcas_dir /opt/edg/etc/lcas -lcasmod_dir /opt/edg/lib/lcas/"
GLOBUS_GSIWUFTPD_UMASK=002
GRID_INFO_GRIS=yes
GRID_INFO_USER=edginfo
X509_GSIWUFTPD_CERT=/etc/grid-security-local/hostcert.pem
X509_GSIWUFTPD_KEY=/etc/grid-security-local/hostkey.pem
GLOBUS_GRAM_JOB_MANAGER_QSUB=/usr/pbs/bin/qsub
GLOBUS_GRAM_JOB_MANAGER_QDEL=/usr/pbs/bin/qdel
GLOBUS_GRAM_JOB_MANAGER_QSTAT=/usr/pbs/bin/qstat
GLOBUS_GRAM_JOB_MANAGER_MPIRUN=/usr/pbs/bin/qrun
GLOBUS_GRAM_JOB_MANAGER_QSELECT=/usr/pbs/bin/qselect

To configure /etc/edg/info-mds.conf, start from what is present in /etc/edg/info-mds.conf.in.

The example given is taken from the CERN CE lxshare0227.cern.ch. Read the section about the site-cfg.h file and the part about installing via LCFG for more information about the parameters.

WP3_DEPLOY=/opt/edg/info/mds
FTREE_INFO_PORT=2171
FTREE_DEBUG_LEVEL=0
SITE_DN=Mds-Vo-Name=local,o=Grid
SITE_INFO=yes
SITE_NAME=CERN-PRO-1-4
SITE_INSTALLATION_DATE=20021118120000Z
SITE_CPU_RESOURCE_DESCRIPTION=DUAL-PIII-800
SITE_DISK_RESOURCE_DESCRIPTION=15GB-EIDE
SITE_SYSADMIN_CONTACT=hep-project-grid-cern-testbed-managers@cern.ch
SITE_USER_SUPPORT_CONTACT=hep-project-grid-cern-testbed-managers@cern.ch
SITE_SECURITY_CONTACT=hep-project-grid-cern-testbed-managers@cern.ch
SITE_DATAGRID_VERSION=v1_4_3
SITE_SE_HOSTS=lxshare0393.cern.ch
SITE_CE_HOSTS=lxshare0227.cern.ch:2119/jobmanager-pbs-short,\
lxshare0227.cern.ch:2119/jobmanager-pbs-infinite
NETMON_PRESENT=no
NETMON_PINGER_HOST=lxshare0227.cern.ch
CE_PRESENT=yes
CE_HOST=lxshare0227.cern.ch
CE_BATCHSYSTEM=pbs
CE_CLUSTER_BATCH_SYSTEM_BIN_PATH=/usr/pbs/bin
CE_STATIC_LDIF=/opt/edg/info/mds/etc/ldif/ce-static.ldif
CE_QUEUE=medium,long,short,infinite
CE_CLOSE_SE_ID=lxshare0393.cern.ch
CE_CLOSE_SE_MOUNT_POINT=/flatfiles/SE00
GRID_INFO_USER=edginfo
SITE_NETMON_HOST=no
SITE_NETMON_HOSTS=none

Next copy the file /opt/edg/info/mds/etc/ldif/ce-static.ldif.in
to /opt/edg/info/mds/etc/ldif/ce-static.ldif and modify it to reflect the local environment. SpecInt2000 benchmarks can be found on the SPEC website. (For valid tags for RunTimeEnvironment see 23.) The file below has been taken from the CE lxshare0227.cern.ch; the nodes are dual 800 MHz PIII machines. The text on the right-hand side is not part of the configuration; it has been added to describe the parameters.

 
Architecture:        intel   The architecture of the hosts composing the CE
OpSys:               RH 6.2  The operating system of the hosts composing the CE
MinPhysMemory:       512     Minimum value of the physical memory of any WN
MinLocalDiskSpace:   2048    The minimum local disk footprint
NumSMPs:             26      Number of SMP hosts
MinSPUProcessors:    2       The minimum number of SPU processors (for SMP hosts)
MaxSPUProcessors:    2       The Maximum number of SPU processors (for SMP hosts
AverageSI00:         380     Average of the SpecInt2000 benchmark of the WNs 
MinSI00:             380     Minimum value of the SpecInt2000 benchmark of the WNs  
MaxSI00:             380     Maximum value of the SpecInt2000 benchmark of the WNs
AFSAvailable:        FALSE   Defines if AFS is installed 
OutboundIP:          TRUE    Defines if outbound connectivity is allowed
InboundIP:           FALSE   Defines if inbound connectivity is allowed
RunTimeEnvironment:  CMS-1.1.0
RunTimeEnvironment:  ATLAS-3.2.1
RunTimeEnvironment:  ALICE-3.07.01
RunTimeEnvironment:  LHCb-1.1.1
RunTimeEnvironment:  IDL-5.4
RunTimeEnvironment:  CERN-MSS
RunTimeEnvironment:  CMSIM-125
RunTimeEnvironment:  CERN-PRO-1-4
RunTimeEnvironment:  CMS-STRESSTEST-1.0.0
RunTimeEnvironment:  EDG-TEST

Once the globus-mds service is started, you can see which information is published with:

ldapsearch -LLL -x -H \
ldap://<CE-hostname>:2135 -b "mds-vo-name=local,o=grid" "(objectClass=*)"

4.5.2 LCFG Based Installation

After making the appropriate configuration changes to site-cfg.h, the file ComputingElement-cfg.h might need minor changes, for example in the LCAS object configuration and in the parts that refer to the NFS configuration used. Check the users.h file and add or remove the virtual users appropriate for the VOs that you support.

Another configuration file that should be checked for up-to-date information is rc-cfg.h. Make sure that the information about the replica catalogs given for the different supported VOs is correct. This information can be obtained from the VO managers or the integration team.

If you plan to use PBS, include pbs-cfg.h in the node-specific configuration file after ComputingElement-cfg.h; this ordering is needed due to dependencies. In case you plan to use your CE additionally as a WN, replace pbs-cfg.h with pbsexechost-cfg.h and configure it as described in the section about setting up a WN.

Install the CE using LCFG. A few minor manual changes are then required: follow the steps described in the Manual Installation section from item 6 to the end. Reboot the machine.

4.5.3 First Test

Check that the following services are running: pbs, globus-gsi_wuftpd, globus-gatekeeper, globus-mds and locallogger.

Check the information published by the node:

ldapsearch -LLL -x -H ldap://<CE-hostname>:2135 -b "mds-vo-name=local,o=grid" "(objectClass=*)"

Follow the instructions given in the Users Guide and run a globus-job-run command from a UI node.

4.6 Worker Node Configuration

In a typical configuration all of the authorization for the worker node is handled by the associated gatekeeper. FTP transfers to and from the worker node are usually allowed, but they must be initiated by the worker node as it does not run an FTP daemon.

As a consequence, the machine does not need to have a host certificate/key, a grid-mapfile, or the security RPMs installed. On the other hand, the /etc/grid-security/certificates directory must exist to allow the WN to verify user and host certificates.
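In practice this means either mounting the shared /etc/grid-security area described in 4.3 or installing the ca_* RPMs locally, as in step 1 of the UI installation (a sketch):

# Sketch: provide the CA certificates on a WN that does not mount /etc/grid-security
/bin/rpm -ivh ca_*.rpm
ls /etc/grid-security/certificates/*.0   # should list the CA certificates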

Note: A Worker Node cannot be a Storage Element as well and cannot host a Resource Broker or the Logging and Bookkeeping services.

4.6.1 Manual Installation

Perform steps 1-5 of the UI installation. Then configure the GDMP client in the same way as on the UI.

4.6.2 LCFG Based Installation

Assuming that the site-cfg.h file has by now been configured, there is very little left for you to do. If you use PBS, include the pbsexechost-cfg.h file and make sure that you set the parameters for the mount points for /home correctly.

Install the node(s). Check that the rc object has been run and that the access privileges for the /opt/edg/etc/<VO> directories are set correctly: they have to be rwxr-xr-x and the owner must be set to root:<VOgrp>.

4.7 Storage Element Configuration

4.7.1 General Comments

Access control to files is managed in the following way. On the SE a group ID per VO has been created. All users belonging to a given VO have their certificates mapped in the grid-mapfile of the SE to a local user with the VO's group ID. The directory used by GDMP to replicate files is group-writable by the VO group ID only and has the group sticky bit set (see the details further on in these instructions). This prevents users belonging to other VOs from writing to or using the directory or the files it contains.
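As an illustration (a sketch only, for the VO 'alice' with a local group also called 'alice'; the actual group names and modes are set up by the installation tools), the per-VO GDMP storage directory would look roughly like this:

# Sketch: per-VO storage directory, group writable by the VO group only,
# with the set-group-ID ("group sticky") bit set
chgrp alice /flatfiles/SE00/alice
chmod 2770  /flatfiles/SE00/alice
ls -ld /flatfiles/SE00/alice    # drwxrws---  ...  root  alice  ...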

The storage element must also run a gatekeeper and a GSI-enabled FTP daemon. See the appropriate sections for the proper configuration of these daemons.

To run a storage element a host certificate is required.

If you want to provide the ``file'' access protocol you have to use a shared file system and make the storage area accessible from the WNs. On most systems this area is located at /flatfiles and contains directories for the various VOs.

In case you want to provide RFIO access, install the castor-rfio and castor-rfio-devel RPMs. To use these, make sure that the $PATH variable includes the path to the RFIO commands.

There is a script "rfiod.scripts" in the rfio sub-directory of the castor distribution that can be used to start/stop/restart/check the presence of rfiod (i.e. to start it, run rfiod.scripts start). This works on all Unix platforms.

It is not advisable to use inetd to start rfiod, as the requests can come in very quickly and cause inetd to think the process is looping and refuse to start a new rfiod.

4.7.2 Manual Installation

  1. Follow steps 1-5 given in the UI section. Install the certificates as described in the CE section and follow the CE steps up to and including running the Globus initialization script.

  2. The same configuration files that are needed for the CE are needed for an SE; however, the contents of /etc/globus2.conf differ. Here, as an example, is the file used on CERN's SE (lxshare0393).
    [common]
    GLOBUS_LOCATION=/opt/globus-24
    globus_flavor_name=gcc32dbg
    X509_USER_CERT=/etc/grid-security-local/hostcert.pem
    X509_USER_KEY=/etc/grid-security-local/hostkey.pem
    GRIDMAP=/etc/grid-security/grid-mapfile
    GRIDMAPDIR=/etc/grid-security/gridmapdir/
    
    [mds]
    user=edginfo
    
    [mds/gris/provider/gg]
    provider=globus-gris
    
    [mds/gris/provider/ggr]
    provider=globus-gram-reporter
    
    [mds/gris/provider/edg]
    
    [mds/gris/registration/site]
    regname=cern
    reghn=lxshare0227.cern.ch
    
    [gridftp]
    
  3. Configure GDMP for each VO your site supports. Use the command /opt/edg/sbin/configure_gdmp
    with the appropriate options; consult the information provided on the WP2 website. This will create a directory for each VO under /opt/edg/etc, and you must modify the configuration files for each VO. For example, for the VO 'alice', you will have to modify /opt/edg/etc/alice/gdmp.conf and /opt/edg/etc/alice/gdmp.private.conf. Here is an example for /opt/edg/etc/alice/gdmp.conf:
    GDMP_SHARED_CONF=/opt/edg/etc/gdmp.shared.conf
    GDMP_SERVICE_NAME=host/lxshare0393.cern.ch
    GDMP_VIRTUAL_ORG=alice
    GDMP_CONFIG_DIR=/opt/edg/etc/alice
    GDMP_VAR_DIR=/opt/edg/var/alice
    GDMP_TMP_DIR=/opt/edg/tmp/alice
    GDMP_GRID_MAPFILE=/opt/edg/etc/alice/grid-mapfile
    GDMP_SERVER_PROXY=/opt/edg/etc/gdmp_server.proxy
    GDMP_PRIVATE_CONF=/opt/edg/etc/alice/gdmp.private.conf
    GDMP_STORAGE_DIR=/flatfiles/SE00/alice
    GDMP_STAGE_FROM_MSS=/opt/edg/alice/bin/stage_from_mss.sh
    GDMP_STAGE_TO_MSS=/opt/edg/alice/bin/stage_to_mss.sh
    
    and for /opt/edg/etc/alice/gdmp.private.conf:
    GDMP_REP_CAT_HOST=ldap://grid-vo.nikhef.nl:10489
    GDMP_REP_CAT_NAME=AliceReplicaCatalog
    GDMP_REP_CAT_MANAGER_CN=Manager
    GDMP_REP_CAT_MANAGER_PWD=THE PASSWORD
    GDMP_REP_CAT_CN=dc=eu-datagrid,dc=org
    GDMP_REP_CAT_FILE_COLL_NAME=Alice WP1 Repcat
    GDMP_REP_CAT_MANAGER_DN=cn=${GDMP_REP_CAT_MANAGER_CN},rc=${GDMP_REP_CAT_NAME},\
    ${GDMP_REP_CAT_CN}
    GDMP_REP_CAT_URL=${GDMP_REP_CAT_HOST}/rc=${GDMP_REP_CAT_NAME},${GDMP_REP_CAT_CN}
    GDMP_REP_CAT_FILE_COLL_URL=${GDMP_REP_CAT_HOST}/lc=${GDMP_REP_CAT_FILE_COLL_NAME},\
    rc=${GDMP_REP_CAT_NAME},${GDMP_REP_CAT_CN}
    GDMP_REP_CAT_OBJECTIVITY_COLL_URL=${GDMP_REP_CAT_HOST}/lc=${GDMP_REP_CAT_OBJYFILE_COLL_NAME},\
    rc=${GDMP_REP_CAT_NAME},${GDMP_REP_CAT_CN}
    
    Note that this file contains the password for the VO-specific replica catalog, which you can get from the VO manager or the integration team.
  4. Start the Globus/EDG services on the SE:
    /sbin/chkconfig globus-gatekeeper on
    /etc/rc.d/init.d/globus-gatekeeper start
        
    /sbin/chkconfig globus-mds on
    /etc/rc.d/init.d/globus-mds start
    
    /sbin/chkconfig globus-gsincftp on
    /etc/rc.d/init.d/globus-gsincftp start
    
    Note: the GDMP server is started by 'inetd'.

There are now a few steps that the manual and LCFG based installation have in common.

4.7.3 LCFG Based Installation

4.7.4 mkgridmap.conf

After the initial installation and configuration make sure that a correct /opt/edg/etc/mkgridmap.conf file has been created.

Apart from the VO-specific lines giving the LDAP addresses of the VOs, for the SE this file has to contain the special storage element VO.

For orientation, have a look at the file used at CERN.

#### GROUP: group URI [lcluser]
#
# EDG Standard Virtual Organizations
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=alice,dc=eu-datagrid,dc=org .alice
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=atlas,dc=eu-datagrid,dc=org .atlas
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=cms,dc=eu-datagrid,dc=org .cms
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=lhcb,dc=eu-datagrid,dc=org .lhcb
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=biomedical,dc=eu-datagrid,dc=org .biome
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=earthob,dc=eu-datagrid,dc=org .eo
group ldap://marianne.in2p3.fr/ou=ITeam,o=testbed,dc=eu-datagrid,dc=org .iteam
group ldap://marianne.in2p3.fr/ou=wp6,o=testbed,dc=eu-datagrid,dc=org .wpsix
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=dzero,dc=eu-datagrid,dc=org .dzero
group ldap://marianne.in2p3.fr/ou=EDGtutorial,o=testbed,dc=eu-datagrid,dc=org .tutor
#
# Other Virtual Organizations
#group ldap://grid-vo.cnaf.infn.it/ou=testbed1,o=infn,c=it .infngrid
#group ldap://vo.gridpp.ac.uk/ou=testbed,dc=gridpp,dc=ac,dc=uk .gridpp
#group ldap://babar-vo.gridpp.ac.uk/ou=babar,dc=gridpp,dc=ac,dc=uk .babar
#

# Following group is to get SE (GDMP) host certs ...
#group ldap://grid-vo.nikhef.nl/ou=devtb,o=gdmpservers,dc=eu-datagrid,dc=org gdmp
group ldap://grid-vo.nikhef.nl/ou=apptb,o=gdmpservers,dc=eu-datagrid,dc=org gdmp

#### Optional - DEFAULT LOCAL USER: default_lcluser lcluser
#default_lcluser .

#### Optional - AUTHORIZED VO: auth URI
auth ldap://grid-vo.nikhef.nl/ou=People,o=gdmpservers,dc=eu-datagrid,dc=org
auth ldap://marianne.in2p3.fr/ou=People,o=testbed,dc=eu-datagrid,dc=org

#### Optional - ACL: deny|allow pattern_to_match
#allow *INFN*

#### Optional - GRID-MAPFILE-LOCAL
#gmf_local /opt/edg/etc/grid-mapfile-local

The grid-mapfile-local file contains a list of certificates that will be included in addition to the entries added during the periodic update of the file.
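Entries in grid-mapfile-local use the normal grid-mapfile format, i.e. a quoted certificate subject followed by the local account (or a pooled account such as .alice). For example, with a hypothetical subject:

"/O=Grid/O=CERN/OU=cern.ch/CN=Some User" .alice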

4.8 Replica Catalog

Here the differences between manual and LCFG based installation are marginal.

4.9 Resource Broker

For Testbed 1, the resource broker machine hosts the resource broker itself, the job submission service and a logging and bookkeeping server. The information index has been moved to a different node and replaced by the BDII, which is based on a standard LDAP server using the schemas previously used by the II. Each of these must be configured, as well as some external software on which they depend.

The resource broker machine must also be running a grid-ftp daemon.

For full functionality, sendmail must be available on the resource broker machine and must be in the path of the user running the various daemons.

Due to some limitations of the RB, multiple RBs have to be deployed in the application testbed. This means that if your site runs many UIs with active users, you should consider setting up additional RBs.

4.9.1 Security

The resource broker must have a valid host certificate and key installed in the /etc/grid-security directory. In addition, copies of these files must be in the .hostcert subdirectory of the home directory of the account running the resource broker daemons, usually dguser.

The resource broker must have all of the security RPMs installed. In addition, the daemon which updates the certificate revocation lists (see 8.2.2) and that which updates the grid mapfile (see 23) must also be running. An example mkgridmap configuration file can be found on the EDG documentation web page.

4.9.2 External Packages

4.9.2.1 CondorG

The resource broker relies on CondorG and the ClassAds from the Condor team. The RPMs for these packages must be installed and can be obtained from the EDG package repository.

CondorG runs several daemons under an unprivileged account. You must create this account before installing CondorG. The recommended name is "dguser".

4.9.3 Installation and Configuration

First the procedures unique to the manual installation are given. After this the LCFG based procedure is described, and then the common additional manual configuration steps are given.

4.9.3.1 Manual Installation

As with the other nodes, download and install the RPMs from the RPM list. Follow the first five steps of the UI installation.

Then configure /etc/globus.conf following the example given here:

GLOBUS_LOCATION=/opt/globus
GLOBUS_HOST_DN="hn=lxshare0227.cern.ch, dc=cern, dc=ch, o=Grid"
GLOBUS_ORG_DN="dc=cern, dc=ch, o=Grid"
GRIDMAP=/etc/grid-security/grid-mapfile
GRIDMAPDIR=/etc/grid-security/gridmapdir/
GSIWUFTPPORT=2811
GSIWUFTPDLOG=/var/log/gsiwuftpd.log
GLOBUS_FLAVOR_NAME=gcc32dbg
X509_GATEKEEPER_CERT=/etc/grid-security-local/hostcert.pem
X509_GATEKEEPER_KEY=/etc/grid-security-local/hostkey.pem
X509_GSIWUFTPD_CERT=/etc/grid-security-local/hostcert.pem
X509_GSIWUFTPD_KEY=/etc/grid-security-local/hostkey.pem

Create the required local users: mysql, postgres and dguser.
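A minimal sketch for creating these accounts follows; the mysql and postgres accounts are normally created automatically by the corresponding RPMs, and the numeric IDs are only examples (2002 matches the CONDOR_IDS value used in the common interaction steps below).

# Sketch: create the local accounts needed on the RB
/usr/sbin/groupadd -g 2002 dguser
/usr/sbin/useradd  -u 2002 -g dguser -m dguser
/usr/sbin/useradd  -m mysql      # usually created automatically by the MySQL RPM
/usr/sbin/useradd  -m postgres   # usually created automatically by the PostgreSQL RPM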

On the RB there are several services that require access to a valid proxy. The proxies are generated by the SysV startup scripts with a default lifetime of 24 hours. Add

57 2,8,14,20 * * * root service broker proxy
57 2,8,14,20 * * * root service jobsubmission proxy
57 2,8,14,20 * * * root service lbserver proxy
57 2,8,14,20 * * * root service locallogger proxy
to /etc/crontab. In addition, the grid-mapfile and the CRL files have to be updated on a regular basis, so also add:
53 1,7,13,19 * * * root /opt/edg/etc/cron/mkgridmap-cron
53 1,7,13,19 * * * root /opt/edg/etc/cron/edg-fetch-crl-cron

4.9.3.2 LCFG based Installation

As always, make sure the site-cfg.h file reflects your site. Have a look at ResourceBroker.h. There shouldn't be any need to modify this file. Install the node.

4.9.3.3 Common Interactions

  1. As user root increase some of the system limits to make the RB more robust. Execute the following lines:
    # Increase some default system parameters for our greedy RB
    echo 480000 > /proc/sys/fs/inode-max 
    echo 120000 > /proc/sys/fs/file-max
    echo 1024 7999 > /proc/sys/net/ipv4/ip_local_port_range 
    
    # To make these modifications permanent, we add them to rc.local
    cp -f /etc/rc.d/rc.local /etc/rc.d/rc.local.orig
    cat >> /etc/rc.d/rc.local <<EOD
    
    # Increase some system parameters to improve EDG RB scalability
    if [ -f /proc/sys/fs/inode-max ]; then
        echo 480000 > /proc/sys/fs/inode-max 
    fi
    if [ -f /proc/sys/fs/file-max ]; then
        echo 120000 > /proc/sys/fs/file-max
    fi
    if [ -f /proc/sys/net/ipv4/ip_local_port_range ]; then
        echo 1024 7999 > /proc/sys/net/ipv4/ip_local_port_range 
    fi
    EOD
    
  2. Create the directories for the host key and the host certificate and move them there. According to the given globus.conf, the paths should be /etc/grid-security-local/hostkey.pem and /etc/grid-security-local/hostcert.pem
  3. Create links in /etc/grid-security:
    ln -s /etc/grid-security-local/hostkey.pem /etc/grid-security/hostkey.pem 
    ln -s /etc/grid-security-local/hostcert.pem /etc/grid-security/hostcert.pem
    
  4. Copy the credentials to the dguser directory.
    mkdir /home/dguser/.hostcert
    cp /etc/grid-security-local/* /home/dguser/.hostcert/
    chown -R dguser:dguser /home/dguser/.hostcert
    
  5. Map all users to the user dguser. To do this, the configuration file for mkgridmap has to be changed:
    mv /opt/edg/etc/mkgridmap.conf /opt/edg/etc/mkgridmap.conf.orig
    cat > /opt/edg/etc/mkgridmap.conf.rb <<EOD
    group ldap://marianne.in2p3.fr/ou=guidelines,o=testbed,dc=eu-datagrid,dc=org dguser
    auth ldap://marianne.in2p3.fr/ou=People,o=testbed,dc=eu-datagrid,dc=org
    gmf_local /opt/edg/etc/grid-mapfile-local
    EOD
    cp /opt/edg/etc/mkgridmap.conf.rb /opt/edg/etc/mkgridmap.conf
    
  6. The mkgridmap script is run every 6 hours. To get an initial update start the process from the command line: /opt/edg/etc/cron/mkgridmap-cron
  7. Configure CondorG. You have to be user dguser for some of the operations. The setup requires input from the user; all the defaults are correct and you just have to accept them.
    su dguser
    /opt/CondorG/setup.sh
    
  8. Setup /home/dguser/.bashrc and /home/dguser/workload_setup.sh
    # .bashrc
    # User specific aliases and functions
    if [ -f ~/workload_setup.sh ]; then
      . ~/workload_setup.sh
    fi
    
    # Source global definitions
    if [ -f /etc/bashrc ]; then
            . /etc/bashrc
    fi
    
    and workload_setup.sh:
    # Point to the CondorG installation path and configuration file.
    CONDORG_INSTALL_PATH=/home/dguser/CondorG
    export CONDORG_INSTALL_PATH
    CONDOR_CONFIG=$CONDORG_INSTALL_PATH/etc/condor_config
    export CONDOR_CONFIG
    
    # Replica catalog API is needed by resource broker.  
    GDMP_INSTALL_PATH=/opt/edg
    export GDMP_INSTALL_PATH
    
    # Setup the user and database area for the postgresql database. 
    # This is used by the resource broker.
    PGSQL_USER=postgres
    export PGSQL_USER
    PGDATA=/opt/data
    export PGDATA
    PGSQL_INSTALL_PATH=/usr/bin/psql
    export PGSQL_INSTALL_PATH
    
    # Add paths to the shared library path.
    for p in \
        "${CONDORG_INSTALL_PATH}/lib" \
        "${GDMP_INSTALL_PATH}/lib"
    do
        if ! printenv LD_LIBRARY_PATH | grep -q "${p}"; then
            if [ -n "${LD_LIBRARY_PATH}" ]; then
                LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${p}"
            else
                LD_LIBRARY_PATH="${p}"
            fi
        fi
    done
    export LD_LIBRARY_PATH
    
    # Add condor binaries to the path.
    for p in \
        "$CONDORG_INSTALL_PATH/sbin" \
        "$CONDORG_INSTALL_PATH/bin" \
        "/usr/sbin"
    do
        if ! printenv PATH | grep -q "${p}"; then
            PATH="${p}:${PATH}"
        fi
    done
    export PATH
    
    # MUST add the libraries for the 2.95.2 run time libraries.
    for p in \
        "/usr/local/lib"
    do
        if ! printenv LD_LIBRARY_PATH | grep -q "${p}"; then
            if [ -n "${LD_LIBRARY_PATH}" ]; then
                LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${p}"
            else
                LD_LIBRARY_PATH="${p}"
            fi
        fi
    done
    export LD_LIBRARY_PATH
    
  9. Configure /home/dguser/CondorG/etc/condor_config. At the top of the file:
    SKIP_AUTHENTICATION = YES
    AUTHENTICATION_METHODS = CLAIMTOBE
    DISABLE_AUTH_NEGOTIATION = TRUE
    GRIDMANAGER_CHECKPROXY_INTERVAL = 600
    GRIDMANAGER_MINIMUM_PROXY_TIME = 180
    
    (changing the hostname to the host of your resource broker) and modify the following parameters to have the given values:
    CRED_MIN_TIME_LEFT = 0
    GLOBUSRUN = $(GLOBUS_LOCATION)/bin/globusrun
    
    You may also wish to modify the CONDOR_ADMIN parameter to set the recipient of email to something other than the dguser account.
  10. Return to the root account.
  11. CondorG currently requires the Globus1 style signing policy file in which all of the policies appear in a single file. Until this is fixed by Condor, you must do the following:
    cat /etc/grid-security/certificates/*.signing_policy \
      > /etc/grid-security/certificates/ca-signing-policy.conf
    
  12. Configure Postgres. First initialize the default DB area. Use the following lines:
    mkdir /opt/data
    chown postgres:postgres /opt/data
    su postgres
    initdb -D /opt/data
    exit
    
  13. Activate PostgreSQL. To simplify operation, add the following lines to /etc/rc.d/init.d/postgresql:
    # Use EDG data location
    export PGDATA=/opt/data
    
    in the start() section just before:
    # Check for the PGDATA structure
    
    Then, in the line:
    su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl  -D \
     $PGDATA -p /usr/bin/postmaster start  > /dev/null 2>&1" < /dev/null
    
    change the output redirection from /dev/null to:
     /var/tmp/postgres.log 2>&1
    
    As a last change, add the following in the stop() section, just before the su -l postgres line:
    # Use EDG data location
    export PGDATA=/opt/data
    
    Now to start it:
    /sbin/chkconfig postgresql on
    /etc/rc.d/init.d/postgresql start
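    
    To verify that the server came up using the EDG data area, list the databases as the postgres user; the template database(s) created by initdb should be reported:
    su postgres -c "psql -l"   # read-only check via the local socket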
    
  14. Grant dguser the privilege to create databases.
    su postgres
    createuser <<EOD
    dguser
    y
    n
    EOD
    exit
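    
    The two answers in the here-document correspond to the createuser prompts: dguser may create databases (y) but may not create new users (n). To confirm the result, inspect the user table (template1 is the default database created by initdb):
    su postgres -c "psql template1 -c 'select usename, usecreatedb from pg_user;'"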
    
  15. Since the Information Index no longer runs on this node, no configuration instructions for it are given here.
  16. Configure the Resource Broker services. In the sample code lxshare0380 is the RB, lxshare0375 is the proxy server node and lxshare0225 is the BDII node.
    Copy /opt/edg/etc/rb.conf.template to /opt/edg/etc/rb.conf and edit it:
    [
      MDS_contact = "lxshare0225.cern.ch";
      MDS_port = 2170;
      MDS_timeout = 60;
      MDS_gris_port = 2135;
      MDS_basedn = "mds-vo-name=local,o=grid";
     
      MDS_multi_attributes = {
    	
    	"AuthorizedUser",
    	"RunTimeEnvironment",
    	"CloseCE"
      };
    
      LB_contact = "lxshare0380.cern.ch";
      LB_port = 7846;
     
      JSS_contact = "lxshare0380.cern.ch";
      JSS_client_port = 8881;
      JSS_server_port = 9991;
      
      JSS_backlog = 5;
      UI_backlog  = 5;
      
      UI_server_port   = 7771;
      
      RB_pool_size = 512;
      RB_notification_queue_size = 32;
      RB_purge_threshold = 600000;	
      RB_cleanup_threshold = 3600;
      RB_sandbox_path = "/tmp";
      RB_logfile="/var/tmp/RBserver.log";
      RB_logfile_size=512000000;
      RB_logfile_level=7;
      RB_submission_retries=3;
      MyProxyServer="lxshare0375.cern.ch";
      SkipJobSubmission = false;
    ]
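    
    Before starting the broker it can be useful to check that the BDII named in MDS_contact answers LDAP queries on MDS_port. A hedged example using ldapsearch (with OpenLDAP 2 clients the -x option selects simple authentication; older clients do not need it):
    ldapsearch -x -h lxshare0225.cern.ch -p 2170 \
      -b "mds-vo-name=local,o=grid" -s base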
    
  17. Now start the Resource Broker service:
    /sbin/chkconfig broker on
    /etc/rc.d/init.d/broker start
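    
    A simple check that the broker is up is to look for a listener on the UI_server_port configured above (7771):
    netstat -tlnp | grep 7771   # run as root so the owning process is shown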
    
  18. To configure the Job Submission service, edit /opt/edg/etc/jss.conf. If you use LCFG, you just have to copy the template file.
    [
    	Condor_submit_file_prefix   = "/var/tmp/CondorG.sub";
    	Condor_log_file      	    = "/var/tmp/CondorG.log";
    	Condor_stdoe_dir            = "/var/tmp";
    	Job_wrapper_file_prefix     = "/var/tmp/Job_wrapper.sh";
    	Database_name               = "template1";
    	Database_table_name         = "condor_submit";	
    	JSS_server_port             = 8881;
    	RB_client_port		    = 9991;
    	Condor_log_file_size        = 64000;
    ]
    
  19. Configure /opt/edg/etc/wl-jss_rb-env.sh
    mv /opt/edg/etc/wl-jss_rb-env.sh /opt/edg/etc/wl-jss_rb-env.sh.orig
    cat /opt/edg/etc/wl-jss_rb-env.sh.orig | \
      sed -e "s/CONDOR_IDS=/CONDOR_IDS=\${CONDOR_IDS\:\-2002\.2002}/" \
      > /opt/edg/etc/wl-jss_rb-env.sh.rb
    cp /opt/edg/etc/wl-jss_rb-env.sh.rb /opt/edg/etc/wl-jss_rb-env.sh
    
  20. /var/tmp/CondorG.log must exist and be owned by the dguser account before starting the job submission service.
    touch /var/tmp/CondorG.log
    chown dguser:dguser /var/tmp/CondorG.log
    
  21. Start the job submission service:
    /sbin/chkconfig jobsubmission on
    /etc/rc.d/init.d/jobsubmission start
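    
    As a quick sanity check (ports as configured above), verify that the service is listening and that the Condor log is being written:
    netstat -tlnp | grep 8881
    ls -l /var/tmp/CondorG.log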
    
  22. The logging and bookkeeping services keep track of the state of submitted jobs and record data useful for debugging problems. These services use a MySQL database to store this information; the database resides on the Resource Broker machine. The locallogger daemons must run on all gatekeeper and resource broker nodes. The lbserver daemons only need to run on the resource broker.

    On a Resource Broker, a MySQL server has to run under a non-privileged account. The following steps walk you through the required configuration. You have to choose a password for the server; in this example the password is globus_admin.

    mkdir /var/lib/mysql
    mkdir /var/lib/mysql/test
    mkdir /var/lib/mysql/mysql
    chown -R mysql /var/lib/mysql
    /usr/bin/mysql_install_db
    chown -R mysql /var/lib/mysql
    chmod -R og-rw /var/lib/mysql/mysql
    /sbin/chkconfig mysql on
    /etc/rc.d/init.d/mysql start
    
  23. Now configure the running MySQL server. First set the root password:
    /usr/bin/mysqladmin -u root password 'globus_admin'
    
    Then use this password for the next commands to set up the default tables for logging and bookkeeping:
    /usr/bin/mysqladmin -u root  -p create lbserver
    /usr/bin/mysql -u root -p -e \
    'grant create,drop,select,insert,update,delete on lbserver.* to lbserver@localhost'
    /usr/bin/mysql -u lbserver lbserver < /opt/edg/etc/server.sql
    
    To ensure that MySQL is started before the logging and bookkeeping servers, rename its entry in /etc/rc.d/rc3.d:
    mv /etc/rc.d/rc3.d/S90mysql /etc/rc.d/rc3.d/S85mysql
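    
    To verify the database setup, the lbserver tables created from server.sql can be listed (this relies on the grants given above):
    /usr/bin/mysql -u lbserver lbserver -e 'show tables;'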
    
  24. Now start the logging and bookkeeping services
    /sbin/chkconfig lbserver on
    /sbin/chkconfig locallogger on
    /etc/rc.d/init.d/lbserver start
    /etc/rc.d/init.d/locallogger start
    
  25. As a final step reboot the machine.

4.9.3.4 Preliminary Tests

The tests here are very limited, but still useful.

To quickly check that the Postgres installation worked, you can create a dummy database as the user running the resource broker daemons:

su - dguser
createdb test
psql test
The createdb and psql commands should complete without errors; leave the psql prompt with \q.
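
The dummy database is not needed afterwards and can be dropped again (still as dguser):

dropdb test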

A quick check to see if the server is responding is the following:

 
openssl s_client -connect \
lxshare0380.cern.ch:7846 -state -debug
This should respond verbosely with information about the SSL connection. Any error indicates a problem with the certificates. You will have to interrupt this command to get back to the command line.

4.10 MyProxy Server

For long-lived jobs there is the possibility that the job will outlive the validity of its proxy, causing the job to fail. To avoid this, the workload management tools allow a proxy to be automatically renewed via a MyProxy server. The MyProxy server manages a long-lived proxy generated by a user and gives updated proxies to properly authenticated processes acting on behalf of the user.

The usual configuration is one MyProxy server per resource broker; the MyProxy server should run on a separate, well-secured machine.

4.10.1 Security

The MyProxy server must have a valid host certificate and key installed in the /etc/grid-security directory.

The MyProxy server must have all of the security RPMs installed. In addition, the daemon which updates the certificate revocation lists (see 8.2.2) must also be running.

4.10.2 Configuration

There is a single configuration file /opt/edg/etc/edg-myproxy.conf which should be filled with the subject names of associated resource brokers.

The SysV initialization script rebuilds the daemon's configuration from the information in edg-myproxy.conf and from the "signing policy" files in /etc/grid-security/certificates. This is done every time the daemon is started, so all changes take effect when it is restarted.

4.10.3 Manual Configuration

Download and install the RPMs as described for the previous nodes. Go through the initial configuration steps and do not forget the crontab entries for the CRL update daemon. Add the subjects of the associated resource brokers to the /opt/edg/etc/edg-myproxy.conf file. At CERN, for example, quite a large number of nodes is supported:
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0380.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0383.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0382.cern.ch
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0381.cern.ch
/C=IT/O=INFN/OU=host/L=CNAF/CN=grid010g.cnaf.infn.it/Email=sitemanager@cnaf.infn.it
/C=IT/O=INFN/OU=host/L=CNAF/CN=grid004f.cnaf.infn.it/Email=sitemanager@cnaf.infn.it
/C=IT/O=INFN/OU=www server/L=Catania/CN=genius.ct.infn.it/\
Email=falzone@ct.infn.it,roberto.barbera@ct.infn.it
/C=IT/O=INFN/OU=User Interface/L=Catania/CN=grid008.ct.infn.it/\
Email=patrizia.belluomo@ct.infn.it
/C=IT/O=INFN/OU=www server/L=Catania/CN=grid009.ct.infn.it/\
Email=falzone@ct.infn.it
/C=IT/O=INFN/OU=gatekeeper/L=PD/CN=grid012.pd.infn.it/\
Email=Marco.Verlato@padova.infn.it
/C=IT/O=INFN/OU=datagrid-genius/L=Pisa/CN=genius.pi.infn.it/\
Email=livio.salconi@pi.infn.it
/C=IT/O=INFN/OU=GRID UI/L=CNAF Bologna/CN=genius.cnaf.infn.it/\
Email=stefano.zani@cnaf.infn.it
/C=IT/O=INFN/OU=gatekeeper/L=CA/CN=grid004.ca.infn.it/\
Email=daniele.mura@ca.infn.it

To start the server:

/sbin/chkconfig myproxy on
/etc/rc.d/init.d/myproxy start
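
A basic functional test, run from a User Interface machine with a valid user certificate, is to store a credential on the new server and query it back. This is only a sketch; lxshare0375.cern.ch stands for your MyProxy host and the exact options required for proxy renewal may differ:

myproxy-init -s lxshare0375.cern.ch -d -n   # -d: use the certificate subject as username
myproxy-info -s lxshare0375.cern.ch -d      # show the stored credential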

4.10.4 LCFG based Installation

Configure the server by editing the part of the Proxy-cfg.h file that deals with the configuration of the myproxy object. Edit the line so that it contains the subjects of all trusted RBs:
+myproxy.trusted /O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0380.cern.ch \
/O=Grid/O=CERN/OU=cern.ch/CN=host/lxshare0383.cern.ch
Copy the host certificate to the locations given in /etc/globus.conf. Start the services as described in the previous section.

4.11 BDII Node

The BDII node runs a standard OpenLDAP server that provides an index of the available resources. This information is updated at regular intervals by queries to the MDS system.

4.11.1 Manual Installation

Download the RPMs as before and install them on the target node. Follow the first 5 steps of the UI installation. Then add the following line to the /etc/crontab file:
3,13,23,33,43,53 * * * * EDG_LOCATION_/etc/cron/bdii-cron 1>/dev/null 2>&1

Follow the steps that are common with the LCFG based installation.
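
Once the node is running, a hedged check is to query the index directly with ldapsearch, run on the BDII node itself. Port 2170 and the base DN follow the Resource Broker configuration shown earlier; with OpenLDAP 2 clients the -x option selects simple authentication:

ldapsearch -x -h localhost -p 2170 -b "mds-vo-name=local,o=grid"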

4.11.2 LCFG based Installation

Install a node using the BDII-cfg.h file. No changes are needed.

4.11.3 Common Configuration

4.12 MDS Node

WP3 delivered LDAP-based Information Servers, which have been deployed in the EDG testbed.

4.12.1 LCFG based Installation

After configuring site-cfg.h, edit MDS-cfg.h to set the names of the top-level GIIS and GRIS:
globuscfg.gris       alledg
globuscfg.giis       edgpro
No host certificate is needed for this host. Install the node.
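
After the node is installed, a hedged check is to query the information server with ldapsearch on the standard MDS port 2135; the base DN below is only an assumption derived from the GIIS name set above:

ldapsearch -x -h localhost -p 2135 -b "mds-vo-name=edgpro,o=grid"   # base DN assumed from the giis name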

Since there is only a single top-level MDS node in EDG and this node has been set up using LCFG, no information on manual installation is given.

System administrators should register their site with the appropriate GIIS at the next highest level.

4.13 Network Monitoring Node

Empty

