
8. Appendix

In these sections we collect information that may deepen understanding or help in configuring certain components. In general this information is not needed to set up an EDG testbed.

8.1 Time Synchronization

Time plays a vital role when checking the validity of certificates. Consequently, the DataGrid machines must be clients of a reliable time server.

If the machines at your site are not already synchronized, then you may use the xntp3 package distributed with the other external packages used by EDG (see 1.4). This package implements the network time protocol (ntp) and allows a machine to be a time client (as well as a time server).

If you use the xntp3 package, then configuring your machine as a time client is rather trivial. You must add at least one time server reference to the ntp configuration file /etc/ntp.conf and configure the machine to run the ntp daemon. The detailed steps are:
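In outline (a minimal sketch; replace ntp.example.org with your site's time server, and note that the init script name, here xntpd, may differ):

   # add at least one time server reference to the configuration file
   echo "server ntp.example.org" >> /etc/ntp.conf
   # set the clock once by hand before starting the daemon
   ntpdate ntp.example.org
   # start the daemon and enable it at boot
   /etc/rc.d/init.d/xntpd start
   chkconfig xntpd on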

The xntp3 package tries to be rather gentle with the system when readjusting the time. One consequence is that if the time is too far off (more than 1000 seconds by default), the daemon will simply refuse to reset the clock and will die. This is a common problem if you forget to run ntpdate as described above.

It is extremely important that the hardware clock be synchronized to the system clock. If it is not, an unsynchronized time will be reloaded at the next boot and you risk having the time synchronization daemon stop.
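On Linux the system time can be copied to the hardware clock with the standard hwclock command:

   hwclock --systohc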

If you have a large number of machines, you may wish to create a local time server. Refer to the local documentation of the xntp3 package for instructions.

8.2 Authentication, Authorization, and Security

8.2.1 Certificates

Cryptographic certificates are used to attest to the identity of a user or machine, to the extent specified in the issuing certification authority's (CA) policy documents. Users accessing DataGrid resources must have a valid certificate; similarly, hosts offering services within the testbed must have one.

The EDG-approved CAs have service areas which cover most of Europe and the United States. (Consult the current list on the web.) If a user or site is not covered by an existing CA's service area, then one must either negotiate with a CA to extend its service area or start a new CA.

It has been agreed that the CA operated in Lyon is responsible for users without access to a CA.

8.2.1.1 Installing User Certificates

To use the Globus security infrastructure you must have your certificate in PEM format. Follow the instructions below (8.2.2.1) if you need to convert a P12-formatted certificate into a PEM-formatted one. You should then place the two files "usercert.pem" and "userkey.pem" into a ".globus" directory in your home area. The file permissions for the userkey file should be 0700; for the other, 755 is appropriate.

Optionally, you may place your certificate and key in a non-standard location. In this case you must define the two environment variables X509_USER_CERT and X509_USER_KEY to point to your certificate and key, respectively.
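For example (the paths shown are illustrative):

   export X509_USER_CERT=$HOME/security/usercert.pem
   export X509_USER_KEY=$HOME/security/userkey.pem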

8.2.2 Installing Host Certificates

Host certificate/key pairs should be installed into the directory /etc/grid-security/. The host key must be readable only by root (chmod 0400 hostkey.pem); the host certificate can be world readable (chmod 0444 hostcert.pem).
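For example, assuming the certificate and key files are in the current directory:

   cp hostcert.pem hostkey.pem /etc/grid-security/
   chmod 0444 /etc/grid-security/hostcert.pem
   chmod 0400 /etc/grid-security/hostkey.pem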

These certificates may be installed in non-standard locations by setting the environment variables X509_GATEKEEPER_CERT and X509_GATEKEEPER_KEY to the fully-qualified locations of the host certificate and key, respectively.

8.2.2.1 Changing Certificate Formats

Many certificate authorities deliver certificates through a web browser. To use these certificates with Globus, they must be exported from the browser and then reformatted. Exporting is browser-specific, so you will need to follow the help provided with your browser. Once you have extracted the certificate you should have a file with a .p12 extension. This is in PKCS12 format; you will need to convert it to PEM format. If the edg-utils package is installed on your machine, simply executing

/opt/edg/bin/pkcs12-extract
will create appropriate certificate and key files and place them in the standard location. This is a convenience method for the following:
openssl pkcs12 -nocerts \
        -in cert.p12 \
        -out ~user/.globus/userkey.pem
openssl pkcs12 -clcerts -nokeys \
        -in cert.p12 \
        -out ~user/.globus/usercert.pem
The first command gives you your private key; this file must be readable only by you. The second command gives your public certificate. The "~user" should be replaced by the path to your home area. The ".globus" directory is the standard place to put your certificates.

Popular browsers typically use certificates in PKCS12 format. Consequently, to use the PEM certificates used for Globus within a browser, you will need to convert them back. To change a certificate from PEM format into PKCS12 format (on a machine with edg-utils installed), just issue the following command:

/opt/edg/bin/grid-mk-pkcs12
Again, this is a convenience method for the following:
 openssl pkcs12 -export \
               -out file_name.p12 \
               -name "My certificate" \
               -inkey ~user/.globus/userkey.pem \
               -in ~user/.globus/usercert.pem
where file_name.p12 is the name of the PKCS12 certificate, and the "~user" in the last two lines should be replaced by the path to your home area. You must then import the certificate into your browser.

8.2.2.2 Updating Certificate Revocation Lists

Having current certificate revocation lists (CRLs) is an extremely important aspect of the security framework. These lists identify certificates which have been revoked, either because the owner no longer uses them or because they have been compromised. The CRLs can be updated with the command edg-fetch-crl. There is an associated daemon (edg-crl-upgraded) which can be started automatically to retrieve the CRLs periodically; it can be manipulated like any SysV daemon script.
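For example, to fetch the CRLs once by hand and then enable the daemon at boot (standard SysV commands, see 3.2):

   edg-fetch-crl
   /etc/rc.d/init.d/edg-crl-upgraded start
   chkconfig edg-crl-upgraded on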

Note: if the CRLs are out-of-date, certificates from the associated CA will not be accepted.

8.2.2.3 Virtual Organizations

The current list of virtual organizations can be found on the web. If you did not register with a virtual organization when you signed the EDG Usage Guidelines (or wish to change your VO membership), then you must contact the VO manager directly.

Note: With the Testbed 1 software, membership in more than one virtual organization is not supported. When grid mapfiles are generated the actual organization you will be associated with depends on the order the virtual organizations are listed in a site's mkgridmap configuration file. There is no mechanism by which the user can indicate which virtual organization should be used.

If you really need different roles in the Testbed 1 context, you should request multiple certificates (with slightly different subject names) and register the different subject names with different virtual organizations.

8.2.2.4 VO-specific Software

Most of the virtual organizations currently require that some VO-specific software be preinstalled at sites supporting that virtual organization. The list of VO-specific software is published into the information systems from the /opt/edg/info/mds/etc/ldif/ce-static.ldif file by setting one or more RunTimeEnvironment attributes.
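An illustrative attribute line in ce-static.ldif (the tag value shown is hypothetical; use the tags published by the VO):

   RunTimeEnvironment: ATLAS-3.2.1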

The list of RPMs can be obtained from the EDG repository.

8.2.2.5 Mapping Users to Local Accounts (grid-mapfile)

Grid users are given access to a site's resources based on a local unix account. The Globus system uses a grid mapfile to map a user's certificate subject into a local account. The grid mapfile is generated from information contained in various virtual organization (VO) membership lists and a local configuration file.
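For reference, each line of a grid-mapfile maps a quoted certificate subject to a local account (the subject shown is fictitious; a leading dot denotes a pooled account, described below):

   "/O=Grid/O=CERN/OU=cern.ch/CN=John Doe" .atlas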

8.2.2.5.1 Individual Accounts

The configuration file allows for three different strategies for creating the local user accounts, each with advantages and disadvantages. The first option is to create a unique local account for every grid user. This allows the environment to be tailored specifically for each user and allows detailed accounting of resource usage through standard mechanisms. The disadvantage is that this involves a lot of maintenance by the system administrator and may involve creating a large number of accounts.

8.2.2.5.2 Shared Account

The second option is for all members of a particular virtual organization to be mapped into a shared account. Administratively this is the easiest solution as it usually involves only setting up one account per virtual organization. However, all detailed accounting information is lost, detailed access control is more difficult, and there are possible resource conflicts between multiple users at the same site.

8.2.2.5.3 Pooled Accounts

The third option is creating pooled accounts. This is similar to the previous option, but instead pools of identical accounts are created, and at any given time each account is used by at most one user (identified by subject name). For example, for the Atlas VO a site may create a pool of accounts atlas001, atlas002, etc. This has the advantages that the accounts are easier to maintain and detailed accounting is still possible. However, a policy for local resources must be specified for when a given user stops using a pooled account (e.g. how long local files are kept, whether the user gets the same account upon returning, etc.).

8.2.2.5.3.1 Configuration
In addition to the configuration of the mkgridmap script (described below), the accounts and a gridmapdir must be set up.

To create a pool of accounts, you must set up individual unix user accounts whose names have a common prefix and a numeric suffix, for example "atlas001", "atlas002", etc. To map users into this pool, the prefix must be specified preceded by a dot, i.e. ".atlas".

In addition, a gridmapdir must be created; its default location is /etc/grid-security/gridmapdir, but it may be set to a different location in the globus.conf file. An empty file must exist in the gridmapdir for each pooled account; the name of the file must match the account name exactly, including both prefix and numeric suffix.
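A minimal sketch of creating the gridmapdir entries for a pool atlas001-atlas050 (the pool size is illustrative; the corresponding unix accounts must also exist):

   mkdir -p /etc/grid-security/gridmapdir
   for i in $(seq -f "%03g" 1 50); do
       touch /etc/grid-security/gridmapdir/atlas$i
   done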

The mapping between a subject name and an individual account is based on the time stamps of the account entries in the gridmapdir and additional files named according to the URL-encoded subject names of the users.

Note: this mapping is fixed until the subject name entry is deleted. Currently this is not done automatically, and if the account pool is exhausted, users will get the same error as if they were not authorized to use the resource.

One important aspect of using pooled accounts is that the grid-mapfile and the /etc/grid-security/gridmapdir directory must be shared between all of the nodes in a site. If this is not done, then the mapping may be done inconsistently depending on how a given machine is accessed.

8.2.2.5.4 Generating Mapfile with mkgridmap

The mkgridmap script generates a gridmap file based on user information in the LDAP servers of various virtual organizations.

The behaviour of the script can be highly customized via a configuration file located at /opt/edg/etc/mkgridmap.conf. In its simplest form, it lists the appropriate virtual organizations, the accounts to map these users to, and an auth directive to check that the users have signed the EDG Usage Guidelines.

The following example file (appropriate for a computing element) maps users from the specified virtual organizations to pooled accounts with the given prefix.

group ldap://grid-vo.nikhef.nl/ou=testbed1,o=alice,dc=eu-datagrid,dc=org .alice
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=atlas,dc=eu-datagrid,dc=org .atlas
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=cms,dc=eu-datagrid,dc=org .cms
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=lhcb,dc=eu-datagrid,dc=org .lhcb
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=biomedical,dc=eu-datagrid,dc=org .biome
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=earthob,dc=eu-datagrid,dc=org .eo

group ldap://marianne.in2p3.fr/ou=ITeam,o=testbed,dc=eu-datagrid,dc=org .iteam
group ldap://marianne.in2p3.fr/ou=wp6,o=testbed,dc=eu-datagrid,dc=org .wpsix

auth ldap://marianne.in2p3.fr/ou=People,o=testbed,dc=eu-datagrid,dc=org
This also checks the generated list of users against those who have signed the EDG Usage Guidelines. An example appropriate for a resource broker
group ldap://marianne.in2p3.fr/ou=guidelines,o=testbed,dc=eu-datagrid,dc=org dguser
auth ldap://marianne.in2p3.fr/ou=People,o=testbed,dc=eu-datagrid,dc=org
checks only the group of users who have signed the EDG Usage Guidelines and maps them into the user which runs the broker daemons.

8.2.2.6 Firewalls, Ports, and Site Security

Table 8.1 lists those ports used by various parts of the testbed software. Temporary ports used by Globus can be restricted to a particular range. Nearly all services can be configured to run on non-standard ports, if necessary.


Table 8.1: Ports Used by Various Services
Port   Service
80     HTTP server for Network Monitoring
123    Network Time Protocol
2119   Globus Gatekeeper
2135   MDS info port
2169   FTree info port
2170   Information Index
2171   FTree info port
2811   GSI FTP server
3147   RFIO
7771   Resource Broker
7846   Logging & Bookkeeping
8080   Tomcat Server (R-GMA, SpitFire)
8881   Job Sub. Service (client)
9991   Job Sub. Service (server)

There is at least one additional port needed for a two-phase commit job submission. This port has not yet been identified; in the meantime, opening all ports above 1024 will work.
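For example, the range of temporary ports can be fixed in /etc/globus.conf with the parameters listed in Table 8.3:

   GLOBUS_TCP_PORT_RANGE="30000,31000"
   GLOBUS_UDP_PORT_RANGE="30000,31000"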

8.2.2.7 GSI and Kerberos (AFS)

The client and server programs gsiklog and gsiklogd allow you to obtain an AFS token by presenting a Grid proxy rather than a Kerberos password.

This software has been produced by Doug Engert of Argonne, with some testing and bug fixes by Helmut Heller and Andrea Parrini.

The source code is available from Argonne National Laboratory, and we have produced Linux RPMs (built with the Testbed 1 Globus 2.0 distribution) available from the EDG software repository.

For the client, installation from RPM is very straightforward, with no post-install configuration if the machine is already running as an AFS client. (gsiklog uses the existing AFS configuration files of the afsd cache daemon.)

Once configured, AFS tokens can be acquired in a gsiklogd-enabled cell by simply using the grid-proxy-init and then gsiklog commands. (gsiklog -help lists additional options, including specifying the remote AFS username and remote cell.)
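That is, simply:

   grid-proxy-init
   gsiklog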

The gsiklogd daemon can most easily be installed on an existing AFS authentication server, as it needs access to the Kerberos key /usr/afs/etc/KeyFile for its cell.

It must also be provided with a Grid key and certificate pair in /etc/grid-security, called afskey.pem and afscert.pem, and the distinguished name must end in CN=afs/CELL, where CELL is the AFS cell name.

Finally, a file /etc/grid-security/afsgrid-mapfile must exist, with the same format as a gatekeeper grid-mapfile, but specifying local AFS usernames rather than unix usernames.

The daemon supports a SysV interface and can be started, stopped, and set to autostart in the customary way (see 3.2).

8.3 Monitoring

8.3.1 Network Monitoring

8.3.2 Application Monitoring with GRM/PROVE

For more information on GRM see GRM - Grid Application Monitor Users Manual. For more information on PROVE, see PROVE-Visualisation tool for Grid Applications.

For Linux, installing the RPM does all necessary configuration.

For other operating systems, the following must be done. Replace the terms 'linux' and 'LINUX' in the "grm.spec" and "prove.spec" files with the appropriate terms from Table 8.2. The 'linux' term signifies the architecture; 'LINUX' is the name of the (sub)directory that will contain the binary files for that platform.


Table 8.2: Spec File Changes
System            Architecture     Directory Name
Linux             linux            LINUX
Solaris 2.6, gcc  solaris-2.6-gcc  SUN4SOL2
SGI Irix 6.x, cc  irix-6-cc        SGI64

A configuration file with the corresponding name (e.g. 'linux.def') should be present in the conf/ directory of the source. The conf/ directory also contains configuration files for the Irix and Solaris operating systems as examples (irix-6-cc.def and solaris-2.6-gcc.def).

8.4 Installing Application Software

The software specific to various applications is available from the EDG package repository. You should install all of the application software necessary to support the users authorized to use your site.

When installing application software, be sure to update the RunTimeEnvironment flags in /opt/edg/info/mds/etc/ldif/ce-static.ldif and restart the information systems. This publishes, via the information systems, the fact that you have installed the given set of software.

8.5 Ftree and Globus MDS Information Services and Information Providers

8.5.1 Introduction

In order for the job broker to find resources on which to run a job, and storage elements on which to store data, an information provider needs to be set up. The GIIS, or Grid Index Information Service, is the type of information provider used to locate resources in Testbed 1. The GIIS is based on LDAP.

MDS is the LDAP-based information provider which is part of Globus. WP3 has written its own LDAP-based information provider, in which the backend is cached in memory; this was written because performance tests indicated that Globus MDS was not adequate. The WP3 information provider is called ftree, and it is integrated with OpenLDAP2, not with MDS.

WP3 has also delivered schema files, which define the information to be published by both ftree and MDS. The same schema files and information providers are used by both MDS and ftree; this allows comparative tests between ftree and MDS while providing the same information.

For further information on LDAP and MDS deployment, along with a description of the schema files see `MDS Deployment - Testbed 1.'

These documents are available in the documentation area of the WP6 website for Testbed 1.

The configuration instructions for the LDAP-based information providers which follow apply regardless of whether Globus MDS or ftree is used. In a few places the two differ slightly, and this is indicated.

8.5.2 Installation and Configuration

The following instructions explain how to configure the information providers. The installation procedure is largely carried out by installing the appropriate RPMs for the type of machine: site cache/GIIS, SE (storage element), CE (computing element), or netmon (network monitor). In addition to the Globus RPMs, an RPM needs to be installed for the ftree information service. An RPM, edg-info-main-*.rpm, is provided to help configure the information providers, and three RPMs are provided to install the information provider scripts; which of these to install depends on the machine type. Following the installation of the RPMs, copy /etc/edg/info-mds.conf.in to /etc/edg/info-mds.conf and edit it; the variables prefixed with a hash (#) must be edited and the hash removed.
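The copy step is, for example:

   cp /etc/edg/info-mds.conf.in /etc/edg/info-mds.conf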

8.6 Base Installation - all machines (site cache/GIIS, SE and CE)

Install

openldap-ftree-*.rpm 
edginfo-main-*.rpm

Common settings for all configurations:

For all configurations, set the values in /etc/globus.conf to

GRIDMAP=/etc/grid-security/grid-mapfile 
GATE_KEEPER_PORT=2119
GLOBUS_LOCATION=/opt/globus/
#GRID_INFO_USER= - This should NOT be root; set it to a non-privileged user
GRID_INFO_GRIS=yes
GRID_INFO_EDG=yes
and in /etc/edg/info-mds.conf to
WP3_DEPLOY=/opt/edg/info/mds - The directory in which the WP3 
                               software is installed. If it is installed using 
                               the RPMs this does not need to be changed
FTREE_INFO_PORT=2171 - The port number for the ftree information server
FTREE_DEBUG_LEVEL=0 - The debug level for ftree; 
                      useful settings are 255 and 256
SITE_DN=Mds-Vo-name=local,o=grid - This should not contain 
                    any spaces and should end in o=grid. If 
                    left blank it will default to the host's domain
                    components (dc=...,dc=...). For use with MDS2 use 
                    Mds-Vo-name=local,o=grid

8.6.0.1 Site cache/GIIS Installation

Set the following variables within /etc/globus.conf

#GRID_INFO_GIIS_1=ral - The site name 
#GRID_INFO_REG_GIIS=uk - The country 
#GRID_INFO_REG_HOST=gppmds.gridpp.rl.ac.uk - The country host

Set the following variables within /etc/edg/info-mds.conf

SITE_INFO=yes
NETMON_PRESENT=no
CE_PRESENT=no
SE_PRESENT=no 
#SITE_NAME=RAL - The site name
#SITE_INSTALLATION_DATE=20011115123410Z - This is in the format yyyymmddhhmmssZ
SITE_SYSADMIN_CONTACT=grid.sysadmin@hostname
SITE_USER_SUPPORT_CONTACT=grid.support@hostname
SITE_SECURITY_CONTACT=grid.security@hostname
SITE_DATAGRID_VERSION=1
#SITE_SE_HOSTS=gppse01.gridpp.rl.ac.uk,gppse02.gridpp.rl.ac.uk - 
               This is a comma separated list with no 
               spaces of the host names of the SEs
#SITE_CE_HOSTS=gppa.gridpp.rl.ac.uk - This is a comma separated 
               list with no spaces of the host names of 
               the CEs
#SITE_NETMON_HOST=gppnet.gridpp.rl.ac.uk - This is host name 
               of the network monitor information provider

8.6.0.2 Network Monitor Information Provider Installation

Install

edg-info-netmon-*.i386.rpm

Set the following variables within /etc/globus.conf

#GRID_INFO_GIIS_1=netmon - The GIIS name
#GRID_INFO_REG_GIIS=ral - The site name
#GRID_INFO_REG_HOST=gppmds.gridpp.rl.ac.uk - The site host

Set the following variables within /etc/edg/info-mds.conf

SITE_INFO=no
NETMON_PRESENT=yes
CE_PRESENT=no
SE_PRESENT=no 
#NETMON_PINGER_HOST=network.rl.ac.uk - This is the machine on 
                    which the edg-pinger-*.i386.rpm is installed

8.6.0.3 Storage Element Information Provider Installation

Install

edg-info-se-*.i386.rpm
perl-Filesys-DiskFree-*.rpm

Set the following variables within /etc/globus.conf

#GRID_INFO_GIIS_1=se - The GIIS name
#GRID_INFO_REG_GIIS=ral - The site name
#GRID_INFO_REG_HOST=gppmds.gridpp.rl.ac.uk - The site host

Set the following variables within /etc/edg/info-mds.conf

SITE_INFO=no
NETMON_PRESENT=no
CE_PRESENT=no
SE_PRESENT=yes
#SE_ID=gppse01.gridpp.rl.ac.uk - This may be set manually, if left 
       blank it will default to the local hostname
#SE_SIZE=500 - The size of the storage element in MB
SE_CONTACT=grid.support@hostname
SE_TYPE=disk
#SE_FILESYSTEMS=/dev/hda2,/dev/hda4 - This is a comma separated 
                list with no spaces, these values are 
                used with df to obtain the free space of the SE
#SE_CLOSE_CE=gppa.gridpp.rl.ac.uk - This is a comma separated 
             list with no spaces, the values are the host 
             names of the close computing elements
SE_PROTOCOLS=gridftp,rfio,file - This is a comma separated 
             list with no spaces, the values are the protocols 
             supported by the storage element
SE_PROTOCOL_PORTS=2811,3147, - This is a comma separated list 
                  with no spaces, these values must relate 
                  to the corresponding SE_PROTOCOLS

8.6.0.4 Computing Element Information Provider Installation

Install

CEInformationProviders-*.i386.rpm

Set the following variables within /etc/globus.conf

GRID_INFO_GIIS_1=ce - The GIIS name
GRID_INFO_REG_GIIS=ral - The site name
GRID_INFO_REG_HOST=gppmds.gridpp.rl.ac.uk - The site host

Set the following variables within /etc/edg/info-mds.conf

SITE_INFO=no
NETMON_PRESENT=no
SE_PRESENT=no 
CE_PRESENT=yes
#CE_HOST=gppa.gridpp.rl.ac.uk - This may be set manually, 
         if left blank it will default to the local hostname
#CE_BATCHSYSTEM=pbs - Supported systems are pbs and lsf, 
                bqs will be added shortly
#CE_CLUSTER_BATCH_SYSTEM_BIN_PATH=/usr/pbs/bin - 
              This is the path to the directory containing the 
              queue management commands
#CE_QUEUE=short,long - This is a comma separated list 
          with no spaces of the queue names of the computing 
          element
#CE_CLOSE_SE_ID=gppse01.gridpp.rl.ac.uk,gppse02.gridpp.rl.ac.uk,gppse03.gridpp.rl.ac.uk 
               - This is a comma separated list with no spaces of the 
                 names of close storage elements
#CE_CLOSE_SE_MOUNT_POINT=/usr/atlas,,/usr/cms - 
               This is a comma separated list with no spaces of the 
               mount points of Close Storage Elements, 
               these values must relate to the corresponding CLOSE_SE_ID's

The CEInformationProviders-*.rpm also installs the file /opt/edg/info/mds/etc/ldif/ce-static.ldif.in. This file has to be copied to /opt/edg/info/mds/etc/ldif/ce-static.ldif, and its contents need to be changed to reflect the computing element environment. If required, each queue can be customized using an individual static ldif file: if ce-static-queuename.ldif exists, it will be used in place of ce-static.ldif.

8.6.0.5 Starting the server

The servers can be started and stopped via SysV scripts named edginfo-mds and globus-mds, and can be set to autostart with the chkconfig command (see 3.2).
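For example:

   /etc/rc.d/init.d/edginfo-mds start
   /etc/rc.d/init.d/globus-mds start
   chkconfig edginfo-mds on
   chkconfig globus-mds on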

8.6.0.6 Setting up a virtual organisation or country information service

Install the edg-info-main RPM. There are three files of interest:

etc/info-vo.conf
etc/rc.d/init.d/edginfo-vo
opt/edg/info/mds/etc/testbed1-vo.ldif

The only file that should need editing is testbed1-vo.ldif; it must contain entries for the sites within the VO/country. An example entry is given for RAL.

The ftree VO/country server is then ready to start: /etc/rc.d/init.d/edginfo-vo start

8.6.0.7 Setup for multiple site servers

There is a requirement for some sites to have more than one server running. Where a site has two or more sets of resources that are used by different VOs, a server will have to be run for each VO; if one or more VOs share the resources, then only one server is required. Hence the need for a .conf file separate from globus.conf. A copy of the info.conf file will be required for each server, as will copies of the edginfo script and of the contents of the /opt/edg/info/mds directory.

To set up another server, copies of the wp3-testbed-mds directory (e.g. wp3-testbed-mds-atlas), the edginfo script (e.g. edginfo-atlas), and the info.conf file (e.g. info-atlas.conf) have to be made.

The value for the WP3_DEPLOY in info-atlas.conf will have to be set to wp3-testbed-mds-atlas; the value for INFO_CONFIG in edginfo will have to be set to info-atlas.conf.

Finally, any other site- or VO-specific values will also have to be set in info-atlas.conf.
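Putting the pieces together, a sketch using the example names from the text (the locations of the edginfo script, the info.conf file, and the wp3-testbed-mds directory are installation-dependent and left relative here):

   cp -r wp3-testbed-mds wp3-testbed-mds-atlas
   cp edginfo edginfo-atlas
   cp info.conf info-atlas.conf
   # then edit:
   #   in info-atlas.conf:  WP3_DEPLOY=<path to>/wp3-testbed-mds-atlas
   #   in edginfo-atlas:    INFO_CONFIG=info-atlas.conf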

8.7 R-GMA

A relational-database-based information provider has also been written by WP3. It is known as R-GMA, short for Relational Grid Monitoring Architecture. It is again possible to publish information using the same schema as that published via LDAP and MDS, and again the performance is being tested for comparison with the LDAP/MDS approach.

For further information on R-GMA see `R-GMA Relational Information Monitoring and Management System User Guide'.

8.7.1 Installation and configuration of R-GMA - Initial Setup

The R-GMA package consists of seven RPMs and depends on a number of external packages. These components can be obtained from the package repository. Each RPM is described below.

The external packages on which R-GMA depends are listed below. They can also be obtained from the above-mentioned repository.

8.7.2 Configuring R-GMA for a Virtual Organization

To set up R-GMA for a virtual organization one has to run one RegistryServlet and one SchemaServlet. These make use of a database to store information about producers (RegistryServlet) and tables (SchemaServlet). These servlets will initially be run at RAL, and their URLs will be made available. In the future each virtual organisation will run its own RegistryServlet and SchemaServlet.

To be able to produce information one has to run at least one ProducerServlet; many producers can use the same ProducerServlet to publish data for them. In the same way, one has to run at least one ConsumerServlet to be able to consume data that has been published by a producer.

For every servlet the web.xml file describing the web application has to be configured, and for Consumer, Producer, DataBaseProducer, and Archiver a properties file has to be configured as well. Each API class needs to know the location of the respective servlet which services it. The properties files are located in $RGMA_HOME and currently have to be copied into the home directory of the user running the application code that uses the API class in order to be found. The scripts that run the demos and the sensors do this automatically for you. If you are running Tomcat on the same machine as the application code that uses the producer/consumer/archiver APIs, the default values for the ServletLocations suffice.

The idea behind this setup is that one can run a sensor that publishes information on each node of a cluster and have one ProducerServlet running on a head node to handle all the requests from consumers.

Each servlet has a number of init parameters that are set at the beginning of the servlet life cycle; these are discussed in turn below. The sections about the SchemaServlet, the RegistryServlet, and Tools are only relevant if you need to set up your own VO. The web.xml files have to be configured before Tomcat is started, since they are read at startup time only.

8.7.2.1 ProducerServlet

registryServletLocation is the URL of the RegistryServlet. The ProducerServlet has to be able to contact the Registry to register Producers.

8.7.2.2 DBProducerServlet

registryServletLocation is the URL of the RegistryServlet. The DBProducerServlet has to be able to contact the Registry to register DataBaseProducers.

8.7.2.3 ConsumerServlet

registryServletLocation is the URL of the RegistryServlet. The ConsumerServlet has to be able to contact the Registry to find out about Producers.

8.7.2.4 SchemaServlet

schemaDatabaseLocation is a JDBC URL for the location of the Schema database; see the documentation of your database for more information. The default setting is for a MySQL database running on localhost. It probably makes sense to run the database on the same host as the SchemaServlet, but it is not mandatory.

schemaDatabaseUserName is the database user name of the schema database. The default is schema.

schemaDatabasePassword is the clear text password for the above user. The default is info.
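A sketch of how these init parameters might appear in the SchemaServlet's web.xml (the JDBC URL and database name are assumptions for a local MySQL setup; the user name and password are the defaults stated above):

   <init-param>
       <param-name>schemaDatabaseLocation</param-name>
       <param-value>jdbc:mysql://localhost/schema</param-value>
   </init-param>
   <init-param>
       <param-name>schemaDatabaseUserName</param-name>
       <param-value>schema</param-value>
   </init-param>
   <init-param>
       <param-name>schemaDatabasePassword</param-name>
       <param-value>info</param-value>
   </init-param>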

8.7.2.5 RegistryServlet

registryDatabaseLocation is a JDBC URL for the location of the Registry database; see the documentation of your database for more information. The default setting is for a MySQL database running on localhost. It probably makes sense to run the database on the same host as the RegistryServlet, but it is not mandatory.

registryDatabaseUserName is the database user name of the registry database. The default is registry.

registryDatabasePassword is the clear text password for the above user. The default is info.

schemaServletLocation is the URL of the SchemaServlet. The default is to run the SchemaServlet on the same host as the RegistryServlet, in which case the same database can hold both the registry and schema database.

8.7.2.6 Tools

To populate the Schema database with a set of known tables and to bring the Registry database into a clean state with no registered producers, the build file in $RGMA_HOME/tools/dbases has to be run. Since soft-state registration is not yet implemented, the Registry can get into an inconsistent state. The administration of the Registry database will be moved into the RegistryServlet in a future release.
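If the build file is an Ant build.xml (an assumption; consult the R-GMA documentation for the exact procedure), the invocation would be:

   cd $RGMA_HOME/tools/dbases
   ant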

8.7.2.7 Demos

The /opt/edg/info/demo directory contains two demos illustrating the use of R-GMA. To run the demos, just install the RPMs and do not configure the servlets or properties files; also include the RegistryServlet and SchemaServlet in the server.xml file. Run a MySQL server with no password for the root user. Starting up Tomcat then makes all these services available. In order to run the demos you need to make the files named run in SimpleDemo and SimpleDemo/etc executable, e.g.

 find . -name run -exec chmod 0755 {} \;
Each subdirectory of the demo directory contains a script called run that takes the name of the respective demo (SimpleDemo or ClusterLoad) as an argument. The README in each subdirectory explains briefly what is happening.

8.7.3 Installation and configuration of R-GMA - Available Sensors

This section discusses the available sensors that have been implemented using the R-GMA approach, with the emphasis on how the sensors are used.

8.7.3.1 MDS Producer Sensor

The purpose of the MDS Producer sensor is to publish all the information available from a Globus GRIS server (or, in fact, from any LDAP server) into R-GMA and to permit a consumer to access this information via the normal R-GMA approach. Each site that runs a GRIS/LDAP server should run an MDSProducer.

The Globus GRIS server publishes information about the status of the Grid and its components, such as available CPU nodes, available service types and the status of batch queues. The server is implemented using the LDAP protocol, with the Grid information stored in a hierarchical LDAP directory structure. Each piece of information is associated with an attribute, with the permitted attributes being defined and grouped by an LDAP schema or 'object class'. The context of the information is given by its position within the directory structure.

There are currently six schemas defined in the Globus2 release:

globusBenchmarkInformation
globusNetworkInterface
globusQueue
globusServiceJobManager
globusSoftware
grADSoftware

Furthermore, EDG publishes information according to a number of objectclasses, which are republished into the following set of R-GMA tables:
NetMonHostLoss
NetMonHostRTT
NetMonHostThroughput
NetMonLossPacketSize
NetMonRTTPacketSize
NetMonThroughputBufferSize
NetMonTooliperfER
NetMonToolpingER
SiteInfo
StorageElement
StorageElementProtocol
StorageElementStatus

For the Globus GRIS there is exactly one table in R-GMA for each of the objectclasses. Since each schema consists of a number of attributes, these attributes form the column names of the relational table. An additional column is added to each table, giving the LDAP distinguished name (DN), i.e. the context, of the entry. The way the EDG LDAP schemas are used is more complicated, especially for the networking information, but there is a correspondence between a table and a certain combination of objectclasses. R-GMA cannot currently republish information about the FileElement objectclass because this information is not permanently held in the LDAP server but is created dynamically, requiring knowledge of a local filename. The MDSProducer is completely generic and only assumes knowledge of the names of the objectclasses.

The MDS Producer is implemented in Java, in the class MDSProducer. The class is supplied with a properties file which points it to a particular LDAP server and contains a list of table names and corresponding search filters to publish. The properties file, MDSProducer.props, consists of five properties in the standard Java key=value format, with each property on its own line.

The MDS Producer will then request all entries of each of the specified search filters, starting the search at the given base DN. The information from each of the entries found is then published, along with the DN of the entry, in the appropriate table.

The MDS Producer is likely to run near the LDAP server that it is polling, possibly on the same machine, although this does not have to be the case since the LDAP servers are polled using the standard LDAP wire protocol. Currently there will be one MDS Producer for every GRIS server. It would be easy to implement a system to provide aggregate information about one or more sites: this would involve a simple Consumer-Producer model where the Consumer side subscribes to all the site MDSProducers and then publishes the aggregate information in some suitable format.

The MDSProducer class includes a main() method that runs the pollGRIS method of the MDSProducer class in an infinite loop. The time between subsequent polls is given as a command-line argument in milliseconds. There is currently a bug, which we don't understand, when accessing the ComputingElement objectclass; it prevents us from republishing this information.

8.7.3.2 Running the MDSProducer

We assume that a ProducerServlet is deployed and properly configured (registryServletLocation points to your VO's RegistryServlet), Tomcat is up and running, the MDSProducer.props file points to an LDAP server that runs the EDG information provider scripts, the property schemaServletLocation points to the VO's SchemaServlet, and the file "run" in /opt/edg/info/sensors is executable. Now run the command "run MDSProducer 10000", which starts the MDSProducer and polls the LDAP server every 10000 milliseconds.

8.8 GSI FTP Daemon

A GSI-enabled daemon must run on any node which needs to serve its local file system to remote users via GridFTP (i.e. via the client globus_url_copy, which uses gridftp as the transport protocol). This includes the gatekeeper, resource broker, and storage element nodes.

8.8.1 Security

Incoming requests are authorized via the grid-mapfile mechanism. Consequently, machines running the FTP daemon must have a full security installation. That is, the machine must have a host certificate and key installed, a grid-mapfile, and all of the security RPMs which contain the Certificate Authority certificates and Certificate Revocation List URLs. The daemons which update the grid-mapfile (8.2.2.5) and the CRLs (8.2.2.2) should also be running.

8.8.2 Configuration

The FTP daemon is configured via the /etc/globus.conf file. Table 8.3 lists the relevant parameters, their default values, and their descriptions.


Table 8.3: FTP Daemon Parameters
Parameter Default Description
GLOBUS_LOCATION /opt/globus Installation root of Globus software.
GLOBUS_GSIWUFTPD_PORT 2811 Port to use for GSI-enabled FTP.
GLOBUS_GSIWUFTPD_LOG /var/log/globus-gsi_wuftpd.log Location of log file.
X509_GSIWUFTPD_CERT /etc/grid-security/hostcert.pem Location of host certificate.
X509_GSIWUFTPD_KEY /etc/grid-security/hostkey.pem Location of host key.
GRID_GSIWUFTPD_USER root User to run FTP daemon.
GLOBUS_GSIWUFTPD_OPTIONS unspecified Additional FTP daemon options.
GLOBUS_TCP_PORT_RANGE unspecified Range of TCP ports (e.g. "30000,31000")
GLOBUS_UDP_PORT_RANGE unspecified Range of UDP ports (as above)

8.8.3 Control

This daemon is controlled via a standard init.d-style script which supports the start, stop, restart, and status directives. (See 3.2 for more details.)

