7. Storage Element

The Storage Element (SE) acts as a Grid interface to mass storage systems (MSS) or to disk. The SE can be used to make existing files in an MSS available to the Grid, or to write files from the Grid and store them in the MSS. The SE itself does not store the files; it only keeps cached copies of them on its own disk (except for disk-only SEs, of course), plus some metadata for the files, such as access control lists.

When you need to access a file from the Grid, you ask the SE to make a copy of the file available in its disk cache. Conversely, when writing a file into the SE, you ask the SE for a location in the disk cache to which you can upload the file. In either case you must also tell the SE when you have finished with the file, so that the SE can reclaim the space in its disk cache. This step is necessary because a client may make several independent accesses to any given file, so the SE cannot tell on its own when the file is no longer needed.

Normally Grid clients don't access an SE directly, but leave it to the replica management system (see Chapter 6) to manage the files for them. Nevertheless, there are cases where a client will wish to access an SE directly, for example when writing output from a job to a specific SE shared by a cluster, or when creating temporary files or other files that need not be replicated.

7.1 TURLs

7.1.1 What is a TURL?

A TURL is a Transfer URL. In the Storage Resource Manager (SRM) model, you tell the SE/SRM that you wish to access an existing file and the SRM provides a URL where you can fetch it. You must then tell the SRM when you have finished accessing the file, so the disk cache space can be reclaimed.

Creating a file is entirely analogous: first you tell the SE/SRM that you wish to create a file with a given size and a given name, and then the SE/SRM sends you a TURL to which you can upload the file. Finally you must tell the SE/SRM that you have finished uploading the file.

7.1.2 An example using the SE command line interface

In this example we create and upload a file to the SE using the WP2 command line interface to the SE web service.

>> edg-se-webservice \
  -i create gppse01.gridpp.rl.ac.uk/nikuufop \
  --endpoint http://gppse01.gridpp.rl.ac.uk:8080/edg-se-webservice/services/edg-se-webservice

gsiftp://gppse01.gridpp.rl.ac.uk//flatfiles/01/data/d0/d0738b9c936f8790a3cd58ea60625642
This returns a request id, marked as ``<request id>'' below. (For SEs prior to the 2.1 release, the request id also happens to be a gsiftp TURL.) Then we use the getTURL subcommand, specifying the desired protocol:
>> edg-se-webservice -i getTURL \
                     --endpoint <...> \
                     <request id>

gsiftp://gppse01.gridpp.rl.ac.uk//flatfiles/01/data/d0/d0738b9c936f8790a3cd58ea60625642
This returns the TURL for the protocol (``<TURL>'' below). Then we upload the file to the TURL:
>> globus-url-copy file://`pwd`/testfile <TURL>

Finally we call commit to tell the SE that we have finished uploading the file, passing again the request id to the SE.

>> edg-se-webservice -i commit \
                     --endpoint <...> \
                     <request id>

The SE interface to the SE (as opposed to the SRM interface to the SE) can be confusing because, prior to release TB2.1, it returns a gsiftp TURL as the request id. As a result, people forget to call the getTURL subcommand (which is harmless if you only wanted a gsiftp TURL anyway, because in that case getTURL does nothing).

7.1.3 Simplified SRM Example

We give an example of using the SRM version 1 API to fetch and download the file we just uploaded above. The example is simplified in two ways: we consider only requests for a single file, and we show only the most essential parts of the response.

The client calls

get( "srm://gppse01.gridpp.rl.ac.uk/nikuufop", "gsiftp" )
This command returns a structure which contains:
Request ID
For example, 16726. In SRM v1, request ids are always integers.

File Index
An index for the file in the request. This is so each file in the request can be addressed individually. Say, 0 in this case.

State
Typically ``Pending'', ``Ready'', or ``Failed'' at this stage.

TURL
A transfer URL which is valid only when the state is Ready.

If the state above is not ``Ready'', the client polls for a ``Ready'' status with

getRequestStatus(16726)
The number 16726 is the request id returned by the get method. This command returns the same structures as the get command, but perhaps with an updated ``state'' entry.

Once the state becomes ``Ready'', the TURL entry is valid:

gsiftp://gppse01.gridpp.rl.ac.uk//flatfiles/01/data/d0/d0738b9c936f8790a3cd58ea60625642
Then you fetch the file using, e.g., globus-url-copy. Finally, when you're done you must inform the SRM that you have finished with the file.
setFileStatus( 16726, 0, "Done" )
The 0 here is the index of the file in the request, as returned by the initial get method and by getRequestStatus.
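
The get, poll, and release sequence above can be sketched in C as follows. This is only an illustration under stated assumptions: the functions srm_get, srm_get_request_status and srm_set_file_status are hypothetical local stubs that mimic the SRM v1 calls (they are not a real client library), and the stub simply reports ``Pending'' twice before becoming ``Ready''.

```c
#include <stddef.h>
#include <string.h>

enum srm_state { SRM_PENDING, SRM_READY, SRM_FAILED };

/* The essential parts of the response described in the text. */
struct srm_status {
        int request_id;
        int file_index;
        enum srm_state state;
        char turl[256];          /* valid only when state is SRM_READY */
};

/* ---- Hypothetical stubs standing in for the real SRM v1 calls ---- */
int polls_left = 2;              /* stub answers Pending twice, then Ready */
int done_called = 0;             /* set when the request is released */

struct srm_status
stub_status( void )
{
        struct srm_status s = { 16726, 0, SRM_PENDING, "" };
        if( polls_left > 0 ) {
                polls_left--;
        } else {
                s.state = SRM_READY;
                strcpy( s.turl, "gsiftp://gppse01.gridpp.rl.ac.uk//flatfiles"
                        "/01/data/d0/d0738b9c936f8790a3cd58ea60625642" );
        }
        return s;
}

struct srm_status
srm_get( char const *surl, char const *protocol )
{
        (void)surl; (void)protocol;
        return stub_status();
}

struct srm_status
srm_get_request_status( int request_id )
{
        (void)request_id;
        return stub_status();
}

void
srm_set_file_status( int request_id, int file_index, char const *state )
{
        (void)request_id; (void)file_index;
        if( strcmp( state, "Done" ) == 0 )
                done_called = 1;
}

/* ---- Client logic: get, poll until Ready, transfer, release ---- */
int
fetch_file( char const *surl, char *turl_out, size_t outlen )
{
        int rc = -1;
        struct srm_status s = srm_get( surl, "gsiftp" );

        while( s.state == SRM_PENDING ) {
                /* A real client would sleep between polls. */
                s = srm_get_request_status( s.request_id );
        }
        if( s.state != SRM_READY )
                return -1;
        if( strlen( s.turl ) < outlen ) {
                strcpy( turl_out, s.turl );
                /* ... transfer the file here, e.g. with globus-url-copy ... */
                rc = 0;
        }
        /* Always release the file so the SRM can reclaim its cache space. */
        srm_set_file_status( s.request_id, s.file_index, "Done" );
        return rc;
}
```

A caller would invoke fetch_file( "srm://gppse01.gridpp.rl.ac.uk/nikuufop", buf, sizeof buf ) and then use the TURL left in buf. Note that the sketch releases the file even when the TURL does not fit into the caller's buffer, since the disk cache space is held until ``Done'' is set.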

7.2 Special TURLs

The TURL is always a URL. This is because the TURL is parsed by replica manager software and this software expects well-formed URLs.

However, sometimes files are not accessed using URLs, and the two canonical examples of this are RFIO and POSIX access (e.g. an NFS mounted SE).

7.2.1 RFIO

For RFIO, the returned TURL may look like this:

rfio://adc0027.cern.ch///flatfiles/SE02/data/5a/5a155b26b642157b7cef1f407206f825
However, the RFIO API expects an RFIO name of the form
adc0027.cern.ch:/flatfiles/SE02/data/5a/5a155b26b642157b7cef1f407206f825
This means that the client has to strip away the ``rfio://'' part, and insert a colon at the end of the hostname. The extra slashes don't matter.
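
The conversion just described can be sketched in C as follows. The helper name turl_to_rfio_name is hypothetical, and the sketch assumes that the TURL carries no port number after the hostname. It also collapses the extra slashes, which is optional since they do not matter to RFIO.

```c
#include <stddef.h>
#include <string.h>

/*
 * Convert "rfio://host//path" into the RFIO name "host:/path".
 * Returns 0 on success, or -1 if the TURL is not an rfio URL, has no
 * hostname or path, or the result does not fit into out (outlen bytes).
 */
int
turl_to_rfio_name( char const *turl, char *out, size_t outlen )
{
        char const proto[] = "rfio://";
        size_t const plen = sizeof proto - 1;
        char const *host, *path;
        size_t hostlen;

        if( turl == NULL || out == NULL || strncmp( proto, turl, plen ) != 0 )
                return -1;
        host = turl + plen;
        path = strchr( host, '/' );     /* the first slash ends the hostname */
        if( path == NULL || path == host )
                return -1;
        hostlen = (size_t)( path - host );
        while( path[0] == '/' && path[1] == '/' )
                path++;                 /* collapse the extra slashes */
        if( hostlen + 1 + strlen( path ) + 1 > outlen )
                return -1;
        memcpy( out, host, hostlen );
        out[hostlen] = ':';             /* colon after the hostname */
        strcpy( out + hostlen + 1, path );
        return 0;
}
```

Applied to the TURL above, this yields the RFIO name shown, which can then be passed to the RFIO client calls.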

NOTE: The RFIO port number used to access CASTOR at CERN is 5001, but the IANA-defined standard is 3147. The SEs at CERN generally use 3147. You may need to set the rfio_port environment variable to the value 3147 before using RFIO clients to access the SE.

7.2.2 POSIX

If you request a file TURL with getTURL, it will return something like the following:

file:////flatfiles/SE02/data/5a/5a155b26b642157b7cef1f407206f825
In this case the hostname has already been stripped out by the SE. Java can open a TURL of this format. The standard C library (POSIX) cannot cope with it, so the client has to strip away the initial ``file://'':
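        #include <errno.h>
        #include <stdio.h>
        #include <string.h>

        /* Open a "file://" TURL by stripping the scheme and calling fopen. */
        FILE *
        turl_open( char const *turl, char const *access )
        {
                char const proto[] = "file://";
                /* Reject anything that is not a file:// TURL. */
                if( strncmp( proto, turl, strlen(proto) ) != 0 ) {
                        errno = EFAULT;
                        return (FILE *)NULL;
                }
                /* Skip the scheme; the remainder is the local path. */
                turl += strlen( proto );
                return fopen( turl, access );
        }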
This example assumes that the SE is mounted on a directory that makes the file path identical on the SE and on the client, which is generally the case. Otherwise the client will have to translate the path as well (or the SE will have to be configured to do this, but this will only work if the translation is the same for all clients).
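
Where the paths do differ, the client-side translation might look like the following sketch. The helper name turl_to_local_path is hypothetical, and so are the two prefixes it takes: se_root is the path prefix as seen on the SE and local_root is the directory where the client has the SE mounted. Both are site-specific values and are assumed to be given without a trailing slash.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn a "file://" TURL into a local path, replacing the SE-side
 * prefix se_root with the client-side mount point local_root.
 * Returns 0 on success, or -1 if the TURL is not a file URL, does not
 * lie under se_root, or the result does not fit into out.
 */
int
turl_to_local_path( char const *turl, char const *se_root,
                    char const *local_root, char *out, size_t outlen )
{
        char const proto[] = "file://";
        size_t const plen = sizeof proto - 1;
        size_t rootlen;
        char const *path;
        int n;

        if( strncmp( proto, turl, plen ) != 0 )
                return -1;
        path = turl + plen;
        while( path[0] == '/' && path[1] == '/' )
                path++;                 /* collapse the extra slashes */
        rootlen = strlen( se_root );
        if( strncmp( path, se_root, rootlen ) != 0 )
                return -1;
        /* The prefix must end on a path component boundary. */
        if( path[rootlen] != '/' && path[rootlen] != '\0' )
                return -1;
        n = snprintf( out, outlen, "%s%s", local_root, path + rootlen );
        return ( n < 0 || (size_t)n >= outlen ) ? -1 : 0;
}
```

For instance, with se_root set to "/flatfiles" and local_root set to "/mnt/se" (both made up for illustration), the TURL above maps to /mnt/se/SE02/data/5a/5a155b26b642157b7cef1f407206f825, which can then be passed to fopen.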

