Understanding RPC

Jaideep More
7 min readJul 23, 2023

--

This article summarises the paper Implementing Remote Procedure Calls and contains parts from the Distributed-Systems Concepts and Design — George Coulouris.

Introduction

Remote Procedure Call (RPC) is based on the observation that procedure calls are a well-known and well-understood mechanism for transferring control within a program running on a single computer.

It is proposed that this same mechanism be extended to transfer control and data across a communication network. RPC Makes the programming of distributed systems look similar, if not identical, to conventional programming — achieving a high level of distribution transparency.

RPC Call Semantics

Request-reply protocols can be implemented in different ways to provide different delivery guarantees. The main choices are:

  1. Retry Request message: Retransmit the request until a reply is received or the server is assumed to have failed.
  2. Duplicate Filtering: Controls whether to filter out duplicate requests at the server.
  3. Retransmission of Results: keeps a local history of result messages to enable lost results to be retransmitted without re-executing the operations at the server.

Combinations of these choices lead to various possible semantics for the reliability of remote invocations.

RPC Semantics

Maybe Semantics

  • The remote procedure call may be executed once or not at all.
  • Can suffer from the following types of failure:
    1. Omission failures if the request or reply got lost. Either
    — the operation was not performed. (the request was lost).
    — the operation was performed (the reply was lost).
    2. Crash failures when the server fails.

At-least-once Semantics

  • Semantics are achieved by retransmission of request messages. As there is no duplicate filtering, a duplicate request leads to the re-execution of the procedure.
  • Can suffer from the following failures:
    1. Crash failures when the server fails.
    2. Arbitrary failures: when the procedures are executed multiple times. One has to be careful, as calls can modify the state multiple times.

At-most-once Semantics

  • Semantics can be achieved by using all the fault-tolerance measures.
  • The use of retries masks any omission failures.
  • It prevents arbitrary failures by ensuring none of the operations are performed more than once.

RPC Structure

RPC Structure

When making remote calls, five pieces of the program are involved: the user, user-stub, RPCRuntime, server-stub and the server

The user, the user-stub and one instance of RPCRuntime execute in the caller machine;

The server, server-stub and another instance of RPCRuntime execution in the callee machine.

user-stub

is responsible for placing the specification of the target procedure and their arguments into one or more packets and asking the RPCRuntime to transmit these reliably to the callee machine.

server-stub:

is responsible for unpacking the information sent by the user-stub and calling the appropriate procedure in the server. When the call in the server completes, it returns to the server-stub and the results are passed back to the suspended process in the caller machine.

RPCRuntime:

is responsible for retransmissions, acknowledgements, packet routing and encryption

A programmer does not need to build detailed communication-related code. After designing the interface, he need only write the user and server code.

  • An interface module is mainly a list of procedure names and the types of their arguments and results.
  • A program module implementing procedures in an interface is said to export that interface.
  • A program module calling procedures from an interface is said to import that interface.

Binding

How does a client specify what he wants to be bound to?
How does a caller determine the machine address of the callee?
How to specify to the callee the procedure to be invoked?

The binding operation binds an importer of an interface to an exporter of an interface. After binding, calls made by the importer invoke the procedures implemented by the exporter.

We now discuss the binding discussed in the paper, which uses Grapevine for name resolution. There are alternatives to using such a database

Naming Interfaces:

There are two parts to the name of an interface:

  1. type: which interface the caller expects the callee to implement. Ex: mail server
  2. instance: which particular implementor of an abstract interface is desired: Ex: some particular mail server selected from many

Locating an Exporter:

We use the Grapevine distributed database for our RPC binding. Grapevine's database consists of a set of entries, each having a key value known as Grapevine RName.

There are two types of entries: individual and groups

Grapevine keeps several pieces of information for each entry, but we are concerned with only two:

  1. connect-site for each individual entry: network address
  2. member-list for each group: list of RName

RPC package maintains two entries in the Grapevine database for each interface name:

  1. type: value is a member-list whose elements are the RName of the instance of that type
  2. instance: value is a connect-site and is the network address of the machine on which that instance was last exported

Exporting an Interface

When an exporter wishes to make his interface available to remote clients, the server code calls the server-stub which in turn calls a procedure ExportInterface in RPCRuntime.

ExportInterface is given the interface name, a procedure (dispatcher) implemented in server-stub which will handle incoming calls for the interface.

It calls Grapevine and ensures that appropriate entries exist for the instance and type of the interface.

RPCRuntime then records information about this export in a table maintained on the exporting machine. For each currently exported interface, this table contains the interface name, the dispatcher and a 32-bit value that serves as a permanently unique_identifier of the export.

Importing an Interface

When an importer wants to bind to an exporter, the user code calls its user-stub which calls a procedure, ImportInterface with the type and instance information in the RPCRuntime. RPCRuntime queries Grapevine and gets the network address of the exporter to make RPC for the binding information.

Depending on whether the machine is currently exporting the interface, appropriate binding information (the unique_identifier ) is sent back to the importing machine and the binding succeeds. The exporter network address, identifier and table index are remembered by the user-stub for use in remote calls.

Transport Protocol

The particular nature of RPC communication means substantial performance gains are possible if one designs and implements a transport protocol specifically for RPC.

One aim we emphasised in our protocol design was minimising

  • the elapsed time between initiating a call and getting results
  • from the load imposed on a server by many users.

With protocols for bulk data transfers, this is unimportant: most of the time is spent setting up and taking down connections and requires maintenance of state information.

We want our machines to be able to serve a substantial number of clients, and it would be unacceptable to require either a large amount of state information or expensive connection handshaking.

We guarantee that if the call returns to the user, then the procedure in the server has been invoked precisely once (at-most-once semantics?); otherwise, an exception is reported to the user.

If an exception is sent, the user has no idea whether the server has crashed or if there is a problem in the network.

Simple Calls

The machine that transmits a packet is responsible for retransmitting it until an acknowledgement is received (Retransmit Request Message).

The result of a call is a sufficient acknowledgement that the call was received, and a call packet is a sufficient acknowledgement of the previous call made. (No explicit ACKs)

To make a call, the caller sends a call packet containing call_identifier, data specifying the desired procedure and the arguments. When a procedure returns, a result packet containing the same call identifier and the results are returned to the caller.

The call_identifier consists of:

  1. machine_identifier: permanent and globally unique (IP Address)
  2. machine relative process_identifier for the calling process (Port#)
  3. sequence number

We define an activity as a tuple (machine_identifier, process_identifier). the important property of activity is that each activity has at most one outstanding packet.

When a call packet is received, it can be discarded as a duplicate by examining its caller_identifier (Duplicate Filtering)

This scheme guarantees traditional connection-oriented protocols without the costs. We assume that an activity's call sequence number does not repeat even if the calling machine is restarted. We generate a conversation identifier based on a 32-bit clock maintained by every machine (initialised from network time servers when a machine restarts).

Complicated Calls

To handle lost packets, long-duration calls and long gaps between calls, the packet is modified to request an explicit acknowledgement.

When the caller gets the acknowledgement, it waits for the result packet. While waiting, the caller periodically sends a probe packet to the callee, which is expected to receive an ACK.

This helps to detect if the callee has crashed or if there is a serious issue with the communication network.

Provided the caller receives ACK for the probes, it continues waiting. Notice this does not detect if the callee has deadlocked. This is in keeping with the semantics of local procedure calls.

If the arguments or results are too large to fit in a single packet, they are sent in multiple packets, with each but the last packet requiring an explicit ACK. This allows the implementation to use only one packet buffer at each end for the call and avoids the necessity of including the buffering and flow control strategies found in normal bulk data-transfer protocols.

Thus, if the call requires more than one packet for its arguments or results, our protocol sends more packets than are logically required.

Originally published at http://github.com.

--

--

No responses yet