
  The Dynamically-Updated Request Online Coallocator
  DUROC v0.8

  This is a document to specify the existing DUROC v0.8 implementation and
  interfaces, as they are provided in the Globus v1.0 release. This document
  serves as a reference, and examples can be found in the Examples directory
  of the Globus distribution.

  This document is maintained and best viewed in HTML. The ASCII version
  suppresses typefaces and the hyperlinks which ease navigation.


 Introduction to DUROC

  DUROC provides a simple distributed-job coallocator with dynamic
  job-reconfiguration capabilities. It is meant to provide the common
  infrastructure needed by most distributed Globus applications, to expedite
  further exploration of resource brokering.

  This document is divided into the following sections:

       General information

          * Coallocator requirements and motivation
            What is a coallocator for?
          * Refinement-based management architecture
            Which management tasks does DUROC handle?
            Which management tasks doesn't DUROC handle?
          * Atomic requests
            What are the basic DUROC job-start semantics?
          * Coallocated-resource specification language
            How are resource requests described?
          * DUROC request processing (coallocation)
            How are DUROC requests processed, and job status values
            defined?
          * Generic coallocation API
            What are the programmatic interfaces to DUROC?
               o Control (requestor-side) API
                    + globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE)
                    + globus_module_deactivate (GLOBUS_DUROC_CONTROL_MODULE)
                    + globus_duroc_control_init ( )
                    + globus_duroc_control_job_request ( )
                    + globus_duroc_control_subjob_add ( )
                    + globus_duroc_control_subjob_delete ( )
                    + globus_duroc_control_barrier_release ( )
                    + globus_duroc_control_job_cancel ( )
                    + globus_duroc_control_subjob_states ( )
               o Runtime (application-side) API
                    + globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE)
                    + globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE)
                    + globus_duroc_runtime_barrier ( )
                    + globus_duroc_runtime_inter_subjob_structure ( )
                    + globus_duroc_runtime_inter_subjob_send ( )
                    + globus_duroc_runtime_inter_subjob_receive ( )
                    + globus_duroc_runtime_intra_subjob_rank ( )
                    + globus_duroc_runtime_intra_subjob_size ( )
                    + globus_duroc_runtime_intra_subjob_send ( )
                    + globus_duroc_runtime_intra_subjob_receive ( )
               o Bootstrap (application-side utility) Library
                    + globus_module_activate (GLOBUS_DUROC_BOOTSTRAP_MODULE)
                    + globus_module_deactivate (GLOBUS_DUROC_BOOTSTRAP_MODULE)
                    + globus_duroc_bootstrap_subjob_exchange ( )
                    + globus_duroc_bootstrap_master_sp_vector ( )
                    + globus_duroc_bootstrap_ordered_master_sp_vector ( )

       Installation information

          * DUROC source manifest
            What files are included in the distribution?
          * Building DUROC
            How do I compile the sources for my platform?
          * Installing DUROC
            How do I install on my machine?

       Library usage instructions

          * Using the DUROC libraries
            How do I compile my code with DUROC?
          * Using the DUROC tools
            What tools are provided with DUROC?
          * Known bugs and limitations
            What known problems are there with the implementation and/or
            documentation?


 Coallocator requirements and motivation

  The Globus environment includes resource managers to provide access to a
  range of system-dependent schedulers. Each resource manager (RM) provides an
  interface to submit jobs on a particular set of physical resources.

  In order to execute jobs which need to be distributed over resources
  accessed through independent RMs, a coallocator is used to coordinate
  transactions with each of the RMs and bring up the distributed pieces of the
  job. The coallocator must provide a convenient interface to obtain resources
  and execute jobs across multiple management pools.


 Refinement-based management architecture

  The task an intelligent coallocation agent performs has two abstractly
  distinct parts. First, the agent must process resource specifications to
  determine how a job might be distributed across the resources of which it is
  aware--the agent lowers an abstract specification such that portions of the
  specification are allocated to the individual RMs that control access to
  those required resources. Second, the agent must process the lowered
  resource specification as part of a job request to actually attempt resource
  allocation--the agent issues job requests to each of the pertinent RMs to
  schedule the job.

  The process of lowering a resource specification in a job request in essence
  refines the request based on information available to the lowering agent. By
  separating the tasks of refinement and allocation in the architecture, we
  can allow user intervention to adjust the refinement based on information or
  constraints beyond the heuristics used internally by a particular automated
  agent. A GUI specification-editor has been suggested as a meaningful mode of
  user (job requester) intervention.

       spec1 : resource specification
       spec2 : resource specification
       lower (spec1)   -->   spec2

       spec : resource specification
       job : job contact information (or error status)
       request (spec)   -->   job

       lowering example:
       lower ( (count=5) )   -->
        (+ (& (count=3) (resourceManagerContact=RM1))
           (& (count=2) (resourceManagerContact=RM2)))

  DUROC implements the allocation operation across multiple RMs in the Globus
  test-bed and leaves lowering decisions to higher-level tools.


 Atomic requests

  Once a resource specification has been refined the agent must attempt to
  allocate resources. In general the resources might be managed by different
  RMs,
  and the coallocator must atomically schedule the user's single abstract job
  or fail to schedule the job. Because the GRAM interface does not provide
  support for inter-manager atomicity, the user code must be augmented to
  implement a job-start barrier; as distributed components of the job become
  active, they must rendezvous with the allocating agent to be sure all
  components were successfully started prior to performing any non-restartable
  user operations.

       main :
          job_start_barrier ( )
          . . .
          user_operations ( )

  Atomicity of job creation can only be guaranteed after the barrier, so the
  user should not perform operations which cannot be reversed, e.g. certain
  persistent effects or input/output operations, until after the barrier.


 Coallocated-resource specification language

  DUROC shares its Resource Specification Language (RSL) with GRAM.

  DUROC can perform allocations described by a 'lowered' resource
  specification.

  The task of the lowering agent is to take a resource request of some form,
  be it a generalized GRAM request or user inputs to a GUI interface, and
  produce a lowered request so that DUROC can directly acquire the resources
  for the user.

  The allocation semantics for DUROC requests are that each component of the
  top-level multi-request represents one GRAM request that DUROC should make
  as part of the distributed job DUROC is allocating. In order to make the
  request, DUROC must be able to determine what RM to contact.

  Typically there will be additional terms in the conjunctions of the lowered
  request, and those terms will be passed on verbatim in each GRAM request.
  DUROC will extract each component of the lowered multi-request, remove the
  DUROC-specific components of the subrequest, and then forward that
  subrequest to the specified GRAM. Therefore any other attributes supported
  by GRAM are implicitly supported by DUROC. For example:

       +(&(resourceManagerContact=RM1)(count=3)(executable=myprog.sparc))
        (&(resourceManagerContact=RM2)(count=2)(executable=myprog.rs6000))

  In this request the executables and node counts are specified for each
  resource pool. While GRAM may in fact require fields such as these, DUROC
  treats them as it would any other fields not needed to do its job--it
  forwards them in the subrequests and it is up to the RMs to either
  successfully handle the request or return a failure code back to DUROC
  (which will then return an appropriate code to the user).


 DUROC request processing (coallocation)

  Requests submitted to the DUROC API are decomposed into the individual GRAM
  requests and each request is submitted through the GRAM API. A DUROC request
  proceeds with those GRAM requests in the job that succeed. Runtime features
  available to the job processes include a start barrier and inter-process
  communications to help coordinate the job processes.

  The start barrier allows the processes to synchronize before performing any
  non-restartable operations. In the absence of a start barrier, there is no
  way to guarantee that all job components are successfully created prior to
  executing user code. The communications library provides two simple
  mechanisms to send start-up and bootstrapping information between processes:
  an inter-subjob mechanism to communicate between ``node 0'' of each subjob,
  and an intra-subjob mechanism to communicate between all the nodes of a
  single subjob. A library of common bootstrapping operations is provided,
  using the public inter-subjob and intra-subjob communication interfaces.

  It is important to note that the bootstrapping interfaces are designed to be
  reliable and portable. They do not necessarily provide high-performance or
  asynchronous, concurrent messaging. The user should bootstrap their own
  communications environment and completely switch over to it; failure to do
  so may result in confusing deadlock situations where the bootstrapping
  interfaces prevent forward progress in the user communications or vice
  versa.

  For each GRAM subjob in the DUROC job, there are two optional RSL fields
  which affect the subjob behavior.

  The `subjobStartType' field allows the user to configure each subjob to
  either participate in the start barrier with strict subjob-state monitoring
  (value `strict-barrier'), participate in the start barrier without strict
  subjob-state monitoring (value `loose-barrier'), or not participate in the
  barrier at all (value `no-barrier'). Subjobs that don't perform the barrier
  run forward independently of the other subjobs. Strict state monitoring
  means that the job will be automatically killed if the subjob terminates
  prior to completing the barrier.

  The `subjobCommsType' field allows the user to configure each subjob to
  either join the inter-subjob communications group as a blocking operation
  (value `blocking-join') or not join the inter-subjob communications group at
  all (value `independent'). When joining the group as a blocking operation,
  all participating subjobs will join together, i.e. the communications
  startup function will act as a group barrier.
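
  For example, a lowered request using both fields might run a monitoring
  subjob outside the barrier and communications group. The RM contacts and
  executables in this sketch are placeholders:

```
+(&(resourceManagerContact=RM1)(count=3)(executable=myprog.sparc)
   (subjobStartType=strict-barrier)(subjobCommsType=blocking-join))
 (&(resourceManagerContact=RM2)(count=1)(executable=monitor.sparc)
   (subjobStartType=no-barrier)(subjobCommsType=independent))
```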


 Generic resource coallocation API

  The resource coallocation API provides functions for submitting a job
  request to a broker, for editing a submitted request, for cancelling a
  request, and for requesting job state information.

  The Dynamically-Updated Request Online Coallocator API (DUROC) is similar to
  that of the Resource Management API (GRAM), with the addition of the
  subjob-add, subjob-delete, and barrier-release operations for managing
  resources, the runtime-barrier operation which must be performed during the
  startup of each node, and the job-structure and inter-subjob communication
  interface operations, which at runtime provide a mechanism for job
  self-organization.

  The following sections document the DUROC v0.8 API, including the runtime
  operations necessary to use DUROC v0.8.

  DUROC control-library API

  globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE)

  Activate the DUROC control-library API implementation prior to using any of
  the API functions.

  int
  globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_module_deactivate (GLOBUS_DUROC_CONTROL_MODULE)

  Deactivate the DUROC control-library API implementation when finished using
  any of the API functions.

  int
  globus_module_deactivate (GLOBUS_DUROC_CONTROL_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_duroc_control_init ( )

  Initialize a globus_duroc_control_t object for subsequent coallocated-job
  submission and control.

  int
  globus_duroc_control_init (globus_duroc_control_t * controlp)

          * controlp is the globus_duroc_control_t object to initialize.
          * Returns GLOBUS_DUROC_SUCCESS if successful, otherwise one of:
            [no errors currently defined]

  A single globus_duroc_control_t object can be used to concurrently submit
  and control multiple DUROC jobs.

  globus_duroc_control_job_request ( )

  Request coallocation of interactive resources at the current time.

  int
  globus_duroc_control_job_request
                                (globus_duroc_control_t  * controlp,
                                 const char              * description,
                                 int                       job_state_mask,
                                 const char              * callback_contact,
                                 char                   ** job_contactp,
                                 int                     * subreq_countp,
                                 int                    ** subreq_resultsp)

          * controlp points to a globus_duroc_control_t object previously
            initialized with globus_duroc_control_init().
          * description is a description of the requested job.
          * job_state_mask is 0 or a bitwise OR of the
            GLOBUS_DUROC_JOB_STATE_* states listed above or
            GLOBUS_DUROC_JOB_STATE_ALL [currently ignored].
          * callback_contact is the URL to which events about the job
            should be reported [currently ignored].
          * job_contactp is a pointer to a character string storage
            pointer.
          * subreq_countp is a pointer to integer storage.
          * subreq_resultsp is a pointer to an integer storage array
            pointer.
          * If successful, *job_contactp is set to a unique identifier
            for the job and can be used as a handle in other DUROC API
            functions on this globus_duroc_control_t object,
            *subreq_countp is set to the number of subrequests found in
            description, and *subreq_resultsp is set to point at a
            freshly-allocated array of integers holding result codes for
            each subrequest. Subjob result codes are either
            GLOBUS_SUCCESS, or an error code that can be interrogated
            with globus_duroc_error_string() to obtain a human-readable
            message.
          * Return GLOBUS_DUROC_SUCCESS on success, or one of the
            following error codes: [no errors currently defined]
          * The array returned in *subreq_resultsp and the string
            returned in *job_contactp should be freed with globus_free()
            when the values are no longer needed.

  A job submitted through this interface can subsequently be controlled with
  the other DUROC API functions by providing the submitted job's contact
  string to those calls.
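
  As a sketch (not a complete program), a control-side client might submit a
  two-subjob request and then release the start barrier as follows. The
  header name globus_duroc_control.h, the RM contacts, and the executables
  are assumptions, not part of this specification:

```c
#include <stdio.h>
#include "globus_duroc_control.h"   /* assumed control-library header */

int
submit_example (void)
{
    globus_duroc_control_t   control;
    char                   * job_contact;
    int                      subreq_count;
    int                    * subreq_results;
    int                      i, rc;

    globus_module_activate (GLOBUS_DUROC_CONTROL_MODULE);
    globus_duroc_control_init (&control);

    /* two-subjob lowered multi-request; contacts are placeholders */
    rc = globus_duroc_control_job_request
             (&control,
              "+(&(resourceManagerContact=RM1)(count=3)(executable=a.out))"
              " (&(resourceManagerContact=RM2)(count=2)(executable=a.out))",
              0,                /* job_state_mask [currently ignored] */
              NULL,             /* callback_contact [currently ignored] */
              &job_contact,
              &subreq_count,
              &subreq_results);
    if (rc != GLOBUS_DUROC_SUCCESS)
        return rc;

    /* inspect per-subrequest result codes */
    for (i = 0; i < subreq_count; i++)
        if (subreq_results[i] != GLOBUS_SUCCESS)
            fprintf (stderr, "subjob %d failed: %s\n", i,
                     globus_duroc_error_string (subreq_results[i]));

    /* let the subjobs run forward once they all reach the barrier */
    globus_duroc_control_barrier_release (&control, job_contact,
                                          GLOBUS_TRUE);

    globus_free (subreq_results);   /* caller-owned storage */
    globus_free (job_contact);
    return GLOBUS_SUCCESS;
}
```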

  globus_duroc_control_subjob_add ( )

  Augment a coallocation with an additional interactive resource at the
  current time.

  int
  globus_duroc_control_subjob_add
                                (globus_duroc_control_t  * controlp,
                                 const char              * job_contact,
                                 const char              * subjob_description)

          * controlp points to a globus_duroc_control_t object previously
            initialized with globus_duroc_control_init().
          * job_contact is as returned by
            globus_duroc_control_job_request.
          * subjob_description is a description of the subjob to be
            added.
          * Return GLOBUS_DUROC_SUCCESS on success, or one of the
            following error codes: [no errors currently defined]

  A job modified through this interface can subsequently be controlled with
  the other DUROC API functions by providing the job's contact string to those
  calls.

  globus_duroc_control_subjob_delete ( )

  Modify a coallocation by removing an interactive resource at the current
  time.

  int
  globus_duroc_control_subjob_delete
                                (globus_duroc_control_t  * controlp,
                                 const char              * job_contact,
                                 const char              * subjob_label)

          * controlp points to a globus_duroc_control_t object previously
            initialized with globus_duroc_control_init().
          * job_contact is as returned by
            globus_duroc_control_job_request.
          * subjob_label is the label of a subjob previously created via
            globus_duroc_control_job_request or
            globus_duroc_control_subjob_add.
          * Return GLOBUS_DUROC_SUCCESS on success, or one of the
            following error codes: [no errors currently defined]

  A job modified through this interface can subsequently be controlled with
  the other DUROC API functions by providing the job's contact string to those
  calls.

  globus_duroc_control_barrier_release ( )

  Allow a requested coallocation to run forward when all subjobs have entered
  the barrier.

  int
  globus_duroc_control_barrier_release
                                (globus_duroc_control_t * controlp,
                                 const char             * job_contact,
                                 globus_bool_t            wait_for_subjobs)

          * controlp is the same globus_duroc_control_t object to which
            the job was submitted.
          * job_contact is as returned by
            globus_duroc_control_job_request.
          * wait_for_subjobs is one of the values:
               o GLOBUS_TRUE: release subjobs once they all enter barrier
               o GLOBUS_FALSE: release subjobs if they have already
                 entered the barrier, cancel the job if any subjob has
                 not yet entered the barrier
          * Returns GLOBUS_DUROC_SUCCESS if successful, otherwise one of:
            [no errors currently defined]

  This routine allows subjobs to run forward past the runtime barrier, and
  currently delimits a point after which subjobs cannot be added or deleted.

  globus_duroc_control_job_cancel ( )

  Remove a Pending job request or kill processes associated with an Active
  request, releasing any associated resources, if such action is supported by
  the associated resource managers.

  int
  globus_duroc_control_job_cancel
                                (globus_duroc_control_t * controlp,
                                 const char             * job_contact)

          * controlp is the same globus_duroc_control_t object to which
            the job was submitted.
          * job_contact is as returned by
            globus_duroc_control_job_request.
          * Returns GLOBUS_DUROC_SUCCESS if successful, otherwise one of:
            [no errors currently defined]

  This routine ``succeeds'' if the job is known. A successful return code does
  not guarantee that all job resources were successfully released.

  globus_duroc_control_subjob_states ( )

  Obtain a snapshot of the status of each subjob in a submitted DUROC job.

  int
  globus_duroc_control_subjob_states
                                (globus_duroc_control_t   * controlp,
                                 const char               * job_contact,
                                 int                      * subjob_countp,
                                 int                     ** subjob_statesp,
                                 char                   *** subjob_labelsp)

          * controlp is the same globus_duroc_control_t object to which
            the job was submitted.
          * job_contact is as returned by
            globus_duroc_control_job_request.
          * subjob_countp is a pointer to integer storage.
          * subjob_statesp is a pointer to an integer storage array
            pointer.
          * subjob_labelsp is a pointer to a string array pointer.
          * If successful, *subjob_countp is set to the number of subjobs
            for which state information is known, *subjob_statesp is set
            to point to a freshly-allocated array of integer subjob
            states, and *subjob_labelsp is set to point to a
            freshly-allocated array of freshly-allocated string labels
            (or NULL values for subjobs which weren't given labels by the
            user).
          * The individual subjob states are defined as:
               o GLOBUS_DUROC_SUBJOB_STATE_PENDING: the subjob's GRAM
                 request succeeded
               o GLOBUS_DUROC_SUBJOB_STATE_ACTIVE: the subjob's GRAM job
                 is active (but not checked in)
               o GLOBUS_DUROC_SUBJOB_STATE_CHECKED_IN: the subjob runtime
                 system has checked in
               o GLOBUS_DUROC_SUBJOB_STATE_RELEASED: the subjob runtime
                 system has checked in and been released
               o GLOBUS_DUROC_SUBJOB_STATE_DONE: the subjob's GRAM job is
                 done and the subjob runtime system has been released
               o GLOBUS_DUROC_SUBJOB_STATE_FAILED: the subjob's GRAM job
                 has terminated and the subjob runtime system was not
                 released
          * Returns GLOBUS_DUROC_SUCCESS if successful, otherwise one of:
            [no errors currently defined]
          * The arrays and strings returned in *subjob_statesp and
            *subjob_labelsp should be freed with globus_free() when the
            values are no longer needed.

  This routine can effectively be used in a polling loop to monitor the status
  of a job, for example in the display loop of a GUI agent.
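
  As a sketch, such a polling loop might look like the following; control
  and job_contact are assumed to come from a successful
  globus_duroc_control_job_request() call, and the header name is an
  assumption:

```c
#include <unistd.h>
#include "globus_duroc_control.h"   /* assumed control-library header */

/* Poll once per second until every subjob is done or failed. */
static void
poll_until_done (globus_duroc_control_t * control,
                 const char             * job_contact)
{
    int     subjob_count;
    int   * subjob_states;
    char ** subjob_labels;
    int     i, busy;

    do {
        globus_duroc_control_subjob_states (control, job_contact,
                                            &subjob_count,
                                            &subjob_states,
                                            &subjob_labels);
        busy = 0;
        for (i = 0; i < subjob_count; i++) {
            if (subjob_states[i] != GLOBUS_DUROC_SUBJOB_STATE_DONE
                && subjob_states[i] != GLOBUS_DUROC_SUBJOB_STATE_FAILED)
                busy = 1;
            if (subjob_labels[i] != NULL)
                globus_free (subjob_labels[i]);   /* caller-owned */
        }
        globus_free (subjob_labels);
        globus_free (subjob_states);
        if (busy)
            sleep (1);
    } while (busy);
}
```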

  DUROC runtime-library API

  globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE)

  Activate the DUROC runtime-library API implementation prior to using any of
  the API functions.

  int
  globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE)

  Deactivate the DUROC runtime-library API implementation when finished using
  any of the API functions.

  int
  globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_duroc_runtime_barrier ( )

  Rendezvous with the coallocator to implement job-start atomicity and
  coordinate the distributed processes.

  void
  globus_duroc_runtime_barrier ()

          * Returns only when all barrier participants have successfully
            entered the barrier.

  This routine is called by the job processes at startup to implement
  job-start atomicity. It is not really part of the coallocation API in that
  it is called by the job, rather than by the process requesting a job.
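
  As a sketch, a job process using the barrier might begin as follows; the
  header name globus_duroc_runtime.h is an assumption:

```c
#include <stdio.h>
#include "globus_duroc_runtime.h"   /* assumed runtime-library header */

int
main (int argc, char ** argv)
{
    globus_module_activate (GLOBUS_DUROC_RUNTIME_MODULE);

    /* returns only when all participants have entered the barrier
     * and the control side has released it */
    globus_duroc_runtime_barrier ();

    /* non-restartable user operations are safe from here on */
    fprintf (stdout, "job started atomically\n");

    globus_module_deactivate (GLOBUS_DUROC_RUNTIME_MODULE);
    return 0;
}
```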

  globus_duroc_runtime_inter_subjob_structure ( )

  Get the layout of the DUROC job.

  The DUROC inter-subjob communication routines can only be called on the
  subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank
  as zero (0)!

  int
  globus_duroc_runtime_inter_subjob_structure
                                (int  * local_addressp,
                                 int  * remote_countp,
                                 int ** remote_addressesp)

          * local_addressp is a pointer to integer storage.
          * remote_countp is a pointer to integer storage.
          * remote_addressesp is a pointer to an integer storage array
            pointer.
          * Return GLOBUS_DUROC_SUCCESS and initialize *local_addressp
            with the local subjob's communication address, *remote_countp
            with the number of remote subjobs, and *remote_addressesp
            with a freshly-allocated array containing the remote subjobs'
            communication addresses, or return one of the following error
            codes: [no errors currently defined]
          * The array returned in *remote_addressesp should be freed with
            globus_free() when the values are no longer needed.

  This routine is called by the job processes after the inter-subjob
  initialization operation to find the layout of the job. It is not really
  part of the coallocation API in that it is called by the job, rather than by
  the process requesting a job.

  globus_duroc_runtime_inter_subjob_send ( )

  Send a byte-vector to another subjob in the DUROC job.

  The DUROC inter-subjob communication routines can only be called on the
  subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank
  as zero (0)!

  int
  globus_duroc_runtime_inter_subjob_send
                                (int             dst_addr,
                                 const char    * tag,
                                 int             msg_size,
                                 globus_byte_t * msg)

          * dst_addr is the address of the destination subjob.
          * tag is a nul-terminated string which must match that provided
            to the receive call on the destination subjob.
          * msg_size is the number of bytes of payload to send.
          * msg is a pointer to the payload of msg_size values of type
            globus_byte_t.
          * Return GLOBUS_DUROC_SUCCESS or one of the following error
            codes: [no errors currently defined]

  This routine is called by the job processes after the inter-subjob
  initialization operation to transmit messages between subjobs. The data is
  received by a corresponding call to
  globus_duroc_runtime_inter_subjob_receive at the destination subjob. It is
  not really part of the coallocation API in that it is called by the job,
  rather than by the process requesting a job.

  globus_duroc_runtime_inter_subjob_receive ( )

  Receive a byte-vector sent by another subjob in the DUROC job.

  int
  globus_duroc_runtime_inter_subjob_receive
                                (const char     * tag,
                                 int            * msg_sizep,
                                 globus_byte_t ** msgp)

  The DUROC inter-subjob communication routines can only be called on the
  subjob node where globus_duroc_runtime_intra_subjob_rank() reports the rank
  as zero (0)!

          * tag is a nul-terminated string which must match that provided
            to the send call on the originating subjob.
          * msg_sizep is a pointer to integer storage.
          * msgp is a pointer to a globus_byte_t storage array pointer.
          * Return GLOBUS_DUROC_SUCCESS and initialize *msg_sizep with
            the length of the incoming message payload and *msgp with a
            freshly allocated array of globus_byte_t values containing
            the message payload, or return one of the following error
            codes: [no errors currently defined]
          * The array returned in *msgp should be freed with
            globus_free() when the values are no longer needed.

  This routine is called by the job processes after the inter-subjob
  initialization operation to receive messages from other subjobs. The data is
  transmitted by a corresponding call to
  globus_duroc_runtime_inter_subjob_send at the originating subjob with a
  matching message tag, and messages are queued and reordered if the subjob
  receives messages with a different tag than the one requested by the
  receiving subjob process. It is not really part of the coallocation API in
  that it is called by the job, rather than by the process requesting a job.
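
  As a sketch, the structure, send, and receive operations might be used
  together to broadcast a greeting from the lowest-addressed subjob to all
  others. The tag and payload are illustrative only, and the header name is
  an assumption:

```c
#include <string.h>
#include "globus_duroc_runtime.h"   /* assumed runtime-library header */

/* Call only where globus_duroc_runtime_intra_subjob_rank() reports 0. */
static void
inter_subjob_hello (void)
{
    int             my_addr, remote_count, lowest, i;
    int           * remote_addrs;
    int             msg_size;
    globus_byte_t * msg;

    globus_duroc_runtime_inter_subjob_structure (&my_addr,
                                                 &remote_count,
                                                 &remote_addrs);
    lowest = my_addr;
    for (i = 0; i < remote_count; i++)
        if (remote_addrs[i] < lowest)
            lowest = remote_addrs[i];

    if (lowest == my_addr) {
        /* lowest-addressed subjob sends to every other subjob */
        for (i = 0; i < remote_count; i++)
            globus_duroc_runtime_inter_subjob_send
                (remote_addrs[i], "hello-tag",
                 (int) strlen ("hello") + 1,
                 (globus_byte_t *) "hello");
    } else {
        globus_duroc_runtime_inter_subjob_receive ("hello-tag",
                                                   &msg_size, &msg);
        globus_free (msg);   /* receiver owns the payload array */
    }
    globus_free (remote_addrs);
}
```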

  globus_duroc_runtime_intra_subjob_rank ( )

  Obtain the rank of the local subjob process.

  int
  globus_duroc_runtime_intra_subjob_rank (int * rankp)

          * rankp is a pointer to integer storage.

  This routine is called by the job processes after the intra-subjob
  initialization operation to obtain the rank of the local subjob process. It
  is not really part of the coallocation API in that it is called by the job,
  rather than by the process requesting a job.

  globus_duroc_runtime_intra_subjob_size ( )

  Obtain the number of processes in the local subjob.

  int
  globus_duroc_runtime_intra_subjob_size (int * sizep)

          * sizep is a pointer to integer storage.

  This routine is called by the job processes after the intra-subjob
  initialization operation to obtain the number of local subjob processes. It
  is not really part of the coallocation API in that it is called by the job,
  rather than by the process requesting a job.

  globus_duroc_runtime_intra_subjob_send ( )

  Send a byte-vector to another process in the DUROC subjob.

  void
  globus_duroc_runtime_intra_subjob_send
                                (int               dst_rank,
                                 const char      * tag,
                                 int               msg_size,
                                 globus_byte_t   * msg)

          * dst_rank is the rank of the destination process.
          * tag is a nul-terminated string which must match that provided
            to the receive call on the destination process.
          * msg_size is the number of bytes of payload to send.
          * msg is a pointer to the payload of msg_size values of type
            globus_byte_t.

  This routine is called by the job processes after the intra-subjob
  initialization operation to transmit messages between subjob processes. The
  data is received by a corresponding call to
  globus_duroc_runtime_intra_subjob_receive at the destination subjob. It is
  not really part of the coallocation API in that it is called by the job,
  rather than by the process requesting a job.

  globus_duroc_runtime_intra_subjob_receive ( )

  Receive a byte-vector sent by another process in the DUROC subjob.

  void
  globus_duroc_runtime_intra_subjob_receive
                                (const char      * tag,
                                 int             * msg_sizep,
                                 globus_byte_t   * msg)

          * tag is a nul-terminated string which must match that provided
            to the send call on the originating process.
          * msg_sizep is a pointer to integer storage.
          * msg is a pointer to a globus_byte_t storage array of at
            least GRAM_MYJOB_MAX_BUFFER_LENGTH bytes.
          * Returns after initializing *msg_sizep with the length of the
            incoming message payload and msg[0] through
            msg[(*msg_sizep)-1] with the message payload.

  This routine is called by the job processes after the intra-subjob
  initialization operation to receive messages from other subjob processes.
  The data is transmitted by a corresponding call to
  globus_duroc_runtime_intra_subjob_send at the originating process with a
  matching message tag. If the process receives messages whose tag differs
  from the one requested by the receiving call, those messages are queued and
  delivered to later receives with matching tags. It is not really part of the
  coallocation API in that it is called by the job, rather than by the process
  requesting a job.
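
  As a concrete sketch of a matched send/receive pair (the tag string, the
  payload, and the way my_rank is obtained are all invented for this example,
  which assumes the intra-subjob initialization operation has already
  completed):

  ```c
  #include <string.h>

  #include "globus_duroc_runtime.h"

  /* Illustrative only: the process of rank 0 sends a short greeting to
   * the process of rank 1 in the same subjob. The tag "greeting", the
   * payload, and the my_rank argument are hypothetical. */
  void
  exchange_greeting (int my_rank)
  {
      if (my_rank == 0) {
          const char * payload = "hello from rank 0";

          globus_duroc_runtime_intra_subjob_send (
                  1,                               /* dst_rank            */
                  "greeting",                      /* tag                 */
                  (int) strlen (payload) + 1,      /* msg_len, incl. NUL  */
                  (globus_byte_t *) payload);      /* msg                 */
      }
      else if (my_rank == 1) {
          globus_byte_t buf[GRAM_MYJOB_MAX_BUFFER_LENGTH];
          int           msg_size;

          globus_duroc_runtime_intra_subjob_receive (
                  "greeting",                      /* tag must match send */
                  &msg_size,                       /* msg_sizep           */
                  buf);                            /* msg                 */

          /* buf[0] .. buf[msg_size-1] now hold the payload */
      }
  }
  ```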

  DUROC bootstrap-library

  globus_module_activate (GLOBUS_DUROC_BOOTSTRAP_MODULE)

  Activate the DUROC bootstrap-library implementation prior to using any of
  the API functions.

  int
  globus_module_activate (GLOBUS_DUROC_BOOTSTRAP_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_module_deactivate (GLOBUS_DUROC_BOOTSTRAP_MODULE)

  Deactivate the DUROC bootstrap-library implementation when finished using
  any of the API functions.

  int
  globus_module_deactivate (GLOBUS_DUROC_BOOTSTRAP_MODULE)

          * Returns GLOBUS_SUCCESS if successful, otherwise one of: [no
            errors currently defined]

  globus_duroc_bootstrap_subjob_exchange ( )

  Perform an exchange of information between subjobs.

  void
  globus_duroc_bootstrap_subjob_exchange
                                 (const char   * local_info,
                                  int          * subjob_countp,
                                  int          * local_indexp,
                                  char       *** subjob_info_arrayp)

          * local_info is a nul-terminated string of information to be
            broadcast to other subjobs.
          * subjob_countp and local_indexp are pointers to integer
            storage.
          * *subjob_info_arrayp is a pointer to string pointers.
          * Returns after the exchange is complete, initializing
            *subjob_countp with the total number of subjobs,
            *local_indexp with the local subjob's index (0 <=
            *local_indexp < *subjob_countp), and *subjob_info_arrayp with
            an array of subjob information strings. The string in
            (*subjob_info_arrayp)[i] is the local information string
            broadcast by the ith subjob.
          * The array and strings returned in *subjob_info_arrayp should
            be freed with globus_free() when the values are no longer
            needed by the caller.

  This routine is called by the job processes after the bootstrap activation
  operation to exchange string information between subjobs. It is not really
  part of the coallocation API in that it is called by the job, rather than by
  the process requesting a job.
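
  As an illustrative sketch (the local information string passed in is
  hypothetical, and the call is assumed to follow the bootstrap activation
  operation described above), a subjob might publish its own information and
  examine what every subjob contributed like this:

  ```c
  #include <stdio.h>

  #include "globus_duroc_bootstrap.h"

  /* Illustrative only: broadcast this subjob's (invented) contact
   * string and print the string contributed by each subjob. */
  void
  show_subjob_info (const char * my_contact_string)
  {
      int     subjob_count;
      int     local_index;
      char ** subjob_info;
      int     i;

      globus_duroc_bootstrap_subjob_exchange (my_contact_string,
                                              &subjob_count,
                                              &local_index,
                                              &subjob_info);

      for (i = 0; i < subjob_count; i++)
          printf ("subjob %d%s: %s\n",
                  i,
                  (i == local_index) ? " (local)" : "",
                  subjob_info[i]);

      /* free the strings and the array once no longer needed */
      for (i = 0; i < subjob_count; i++)
          globus_free (subjob_info[i]);
      globus_free (subjob_info);
  }
  ```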

  globus_duroc_bootstrap_master_sp_vector ( )

  Construct a vector of Nexus startpoints on the master node.

  void
  globus_duroc_bootstrap_master_sp_vector
                                (nexus_startpoint_t  * local_sp,
                                 int                 * job_sizep,
                                 nexus_startpoint_t ** sp_vectorp)

          * *local_sp is the startpoint to send to the master.
          * job_sizep is a pointer to integer storage.
          * *sp_vectorp is a pointer to Nexus startpoints.
          * Returns after the construction is complete, initializing
            *sp_vectorp on one master node with an array of Nexus
            startpoints, and *job_sizep with the total number of
            processes; on all other nodes *sp_vectorp is set to NULL and
            *job_sizep is undefined. The startpoint (*sp_vectorp)[i] is
            the local startpoint provided by the ith node. The master is
            always node 0 (zero).
          * The array returned in *sp_vectorp should be freed with
            globus_free() after the values are no longer needed by the
            caller and after nexus_startpoint_destroy() has been called
            on each startpoint.

  This routine is called by the job processes after the bootstrap activation
  operation to construct a startpoint vector on the master node. It is not
  really part of the coallocation API in that it is called by the job, rather
  than by the process requesting a job.
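
  A minimal sketch of using this routine follows. How local_sp was created
  and bound is assumed to have happened elsewhere, and the Nexus header name
  used here is an assumption:

  ```c
  #include "globus_duroc_bootstrap.h"
  #include "nexus.h"   /* assumed header for nexus_startpoint_t */

  /* Illustrative only: gather one startpoint per process on the
   * master node (node 0). */
  void
  gather_startpoints (nexus_startpoint_t * local_sp)
  {
      int                  job_size;
      nexus_startpoint_t * sp_vector;

      globus_duroc_bootstrap_master_sp_vector (local_sp,
                                               &job_size,
                                               &sp_vector);

      if (sp_vector != NULL) {
          /* We are the master: sp_vector[0..job_size-1] holds the
           * startpoint provided by each node. Use them here, e.g. to
           * send initial messages, then clean up. */
          int i;

          for (i = 0; i < job_size; i++)
              nexus_startpoint_destroy (&sp_vector[i]);
          globus_free (sp_vector);
      }
      /* On all other nodes sp_vector is NULL and job_size is undefined. */
  }
  ```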

  globus_duroc_bootstrap_ordered_master_sp_vector ( )

  Construct a vector of Nexus startpoints on the master node.

  void
  globus_duroc_bootstrap_ordered_master_sp_vector
                                (nexus_startpoint_t  * local_sp,
                                 int                   subjob_index,
                                 int                 * job_sizep,
                                 nexus_startpoint_t ** sp_vectorp)

          * *local_sp is the startpoint to send to the master.
          * subjob_index is the user-assigned index to position this
            subjob with respect to other subjobs (indices must be unique
            and in the range [0,N-1] for N subjobs).
          * job_sizep is a pointer to integer storage.
          * *sp_vectorp is a pointer to Nexus startpoints.
          * Returns after the construction is complete, initializing
            *sp_vectorp on one master node with an array of Nexus
            startpoints, and *job_sizep with the total number of
            processes; on all other nodes *sp_vectorp is set to NULL and
            *job_sizep is undefined. The startpoint (*sp_vectorp)[i] is
            the local startpoint provided by the ith node. The master is
            always node 0 (zero) and belongs to the subjob with
            subjob_index of 0 (zero).
          * The array returned in *sp_vectorp should be freed with
            globus_free() after the values are no longer needed by the
            caller and after nexus_startpoint_destroy() has been called
            on each startpoint.

  This routine is called by the job processes after the bootstrap activation
  operation to construct a startpoint vector on the master node. It differs
  from the simpler globus_duroc_bootstrap_master_sp_vector() routine in that
  it allows some extra control over the selection of a master node for expert
  users with special considerations. It is not really part of the coallocation
  API in that it is called by the job, rather than by the process requesting a
  job.


 DUROC source manifest

  The ResourceManagement/duroc directory in your Globus source tree should
  contain the following directories and files:

       README              this document in plain ASCII format
       doc/                all DUROC documentation
       doc/duroc.html      this document in HTML format
       src/bootstrap/      bootstrapping (communication utility)
                           library sources
       src/control/        control (coallocation API) library
                           sources
       src/misc/           miscellaneous shared sources
       src/runtime/        runtime (barrier) library sources
       src/test/*          test-app sources
       src/tools/*         command-line tool sources [deprecated]
       Makefile.in         build script
       configure.in        build script
       config.status-r.in  build script
       configure           build script
       aclocal.m4          build script
       makefile.vars.in    build script
       bootstrap/          build directory for bootstrap library
       control/            build directory for control library
       misc/               build directory for miscellaneous shared
                           code
       runtime/            build directory for runtime library
       test-app/           build directory for test application
                           code
       tools/              build directory for command-line tools

  For each build-directory listed above, the following files exist:

       ./Makefile.in       build script
       ./configure.in      build script
       ./configure         build script
       ./aclocal.m4        build script


 Building DUROC

  Globus uses the GNU autoconf system to configure and build on any supported
  platform. To build DUROC you can run the following commands in your Globus
  build directory (see the Globus docs for more general information):

       % ./configure --enable-duroc   (plus any other desired options)
       % make

  The optional DUROC configuration flags are:

       --enable-duroc-debug

       This build option enables copious debugging messages from DUROC,
       and is primarily of interest when debugging the DUROC
       implementation. Once enabled, these messages can be conditionally
       suppressed at runtime.

       --disable-duroc-warnings

       This build option disables messages that report when various
       fault-handling mechanisms are triggered in DUROC. These messages
       can be left enabled in the build and conditionally suppressed at
       runtime.


 Installing DUROC

  After building DUROC as described above, you can install it on your system
  by running the following command in the directory where Globus was built
  (see the Globus docs for more general information):

       % make install

  The following libraries and header files are installed
  (globus_duroc_common.h is referenced from the other header files):

                                 globus_duroc_common.h
       libglobus_duroc_bootstrap globus_duroc_bootstrap.h
       libglobus_duroc_control   globus_duroc_control.h
       libglobus_duroc_runtime   globus_duroc_runtime.h


 Using the DUROC libraries

  The DUROC control and runtime libraries can be linked and used individually
  or in any combination within the same program. The control library provides
  the DUROC request API and the runtime library is used by every process
  initiated via DUROC.

  The following programs in the DUROC source tree serve as examples of how to
  use the DUROC libraries:

     * duroc-request.c uses the control DUROC library to initiate distributed
       jobs.
     * duroc-stub-app.c uses the runtime DUROC library and performs a DUROC
       check-in and some other simple runtime operations before exiting.
     * duroc-test-app.c uses the bootstrap and runtime DUROC libraries and
       performs several diagnostic bootstrapping operations to test the system
       before exiting.

  The suggested method of compiling DUROC-aware applications is to use a
  Makefile and insert the file "makefile_header" (by hand or via autoconf if
  you want to avoid non-portable makefile features) which is installed as part
  of Globus. This file can be found in the ${sysconfdir} of your installation
  (which defaults to ${prefix}/etc). This file contains environment variable
  definitions with values discovered during the Globus configuration process.

  Assuming you have done that, and your application includes
  "globus_duroc_runtime.h", your makefile will have the following two flavors
  of rules (one for compiling and one for linking):

       myapp.$(OFILE): myapp.c
               $(CC) $(CFLAGS) $(GLOBUS_DUROC_RUNTIME_CFLAGS) \
               -I$(includedir) -c myapp.c

       myapp: myapp.$(OFILE)
               $(CC) $(CFLAGS) myapp.$(OFILE) -o myapp \
               -L$(libdir) $(LDFLAGS) $(GLOBUS_DUROC_RUNTIME_LDFLAGS) \
               $(GLOBUS_DUROC_RUNTIME_LIBS) $(LIBS)

  If you are constructing a makefile to build your app as a Globus component,
  simply replace "$(libdir)" and "$(includedir)" with "$(BUILD_DIR_LIB)" and
  "$(BUILD_DIR_INC)", respectively. In this case you should also use the
  standard Globus method of inserting makefile_header into your Makefile
  during the configuration process.

  The complete set of DUROC-related variables defined in the "makefile_header"
  is as follows:

     * For applications using the DUROC bootstrap library:
       GLOBUS_DUROC_BOOTSTRAP_CFLAGS, GLOBUS_DUROC_BOOTSTRAP_LDFLAGS,
       GLOBUS_DUROC_BOOTSTRAP_LIBS

     * For applications using the DUROC control library:
       GLOBUS_DUROC_CONTROL_CFLAGS, GLOBUS_DUROC_CONTROL_LDFLAGS,
       GLOBUS_DUROC_CONTROL_LIBS

     * For applications using the DUROC runtime library:
       GLOBUS_DUROC_RUNTIME_CFLAGS, GLOBUS_DUROC_RUNTIME_LDFLAGS,
       GLOBUS_DUROC_RUNTIME_LIBS


 Using the DUROC tools

  Below is a summary of the tools provided with DUROC. Each tool is a
  minimalist wrapper around DUROC library functions.

       globus-duroc-request     a command-line utility to initiate
                                jobs.
            synopsis: globus-duroc-request  [ -i ] [ -q ] [ -s ]
                       spec
                      globus-duroc-request  [ -i ] [ -q ] [ -s ]
                       -f spec-file

            spec is a DUROC resource specification string to
            be passed to the DUROC client API, or spec-file is
            the name of a file containing the specification
             string. The -i option enables interactive mode;
             otherwise automatic mode is used. The -q option
             enables quiet mode, suppressing advisory messages
             generated by the tool. The -s option enables
             synchronous mode (as for other schedulers), which
             means that stdout/stderr of the application will
             be directed to stdout/stderr of the request tool
             unless they were explicitly sent to a file; this
             mode implies quiet mode. Without -s, GRAM directs
             the application's outputs to /dev/null by
             default.

            In all modes, globus-duroc-request runs an
            instance of the control lib and makes the DUROC
            request. Unless the tool is in quiet mode, it then
            prints the result of the request operation,
            including the result codes for each subjob if the
            overall request is successful.

             In automatic mode, the tool continues after
             submission by releasing the barrier, waits for
             job termination while the control library
             performs its processing, and then exits.

            In interactive mode, the tool continues after
            submission by issuing a subjob-state summary and a
            prompt for user commands. The command language is
            very simple, consisting of the following
            operations:

                 Dlabel  Delete the subjob with the given label.
                 K       Kill the entire job.
                 C       Commit the job, releasing runtime
                         barriers.
                 Q       Quit the tool immediately.

            All commands begin with a single command character
            and the label parameter begins with the next
            character after this command character and
            continues to the first new-line. All unrecognized
            command characters or other extraneous characters
            are discarded but cause the prompt to be reissued.

            While waiting for user input, the tool continues
            job processing in the background. The user must
            explicitly terminate the tool. In interactive mode
            this can be accomplished via a signal or the `Q'
            quit command.

            You must have the appropriate ~/.globus* GSSAPI
            configuration files installed in your home
            directory, and depending on the GSSAPI library
            used with the GRAM client, the tool may prompt for
            passwords. See the GSSAPI documentation for more
            information.


 Known bugs and limitations

  The error codes documented in the API section of this file are a subset of
  the actual codes returned.

  The globus_duroc_runtime_inter_subjob_* and
  globus_duroc_runtime_intra_subjob_* interfaces are not yet reentrant. The
  user must refrain from calling any of the routines concurrently.


  ------------------------------------------------------------------------

             Last modified 10/23/98. Comments? webmaster@globus.org


