Technical Manual

Abstract

NDMJOB is an NDMP-compatible backup/restore package, reference implementation, and conformance test. It is furnished in source code form to the public by Traakan, Inc. and other contributors free of charge or restriction. NDMP (Network Data Management Protocol) is an open protocol for network-based backup enabling multi-vendor backup solutions. NDMJOB may be used as a ready-to-run backup package, in conjunction with other NDMP products, as the basis for new products, as a verification of NDMP products, and as a reference for the NDMP protocol standards and conventions.

This NDMJOB/NDMJOBLIB Technical Manual explains the software and discusses NDMP in general. This software is more library than command (the command interface is about 5% of the source code), and is readily adapted to other NDMP products. The audiences for this manual include:

  • developers interested in this software or NDMP in general;
  • developers of other NDMP products using NDMJOB as a testing and diagnostic tool, and thus needing technical insight into NDMJOB;
  • developers using NDMJOBLIB as a foundation for other applications;
  • contributors to NDMJOB who enhance and extend its capabilities; and
  • retargetters who implement NDMJOB on additional operating systems.
The topics addressed include
  • the elements, interactions, and detailed operations of the NDMP architecture;
  • the construction of NDMJOB/NDMJOBLIB software;
  • how NDMJOBLIB addresses vagaries in the NDMP specification; and
  • guidance for porting (retargetting) NDMJOB to other operating systems.
http://www.traakan.com/ndmjob NDMJOB latest source and docs
http://www.ndmp.org NDMP protocol status and info
ndmp-tech@ndmp.org Questions, comments and bug reports.

Table of Contents

Introduction
  What are NDMJOB and NDMJOBLIB
  Adaptability to Applications
  Background, Purpose, and Motivation
  Scope and Intended Audience
  Terminology - "Agents" vs "Client/Server"
  Other Documentation

NDMP Architectural Overview
  NDMP Agents and Elements
    CONTROL Agent (Client)
    Disk Files
    DATA Agent (Server)
    Image Stream
    TAPE Agent (Server)
    Tape Drive and Robot
    ROBOT Agent (Server)
  NDMP Data Flow and Media Model
  NDMP Sequence for Connect and Authentication
  NDMP Sequence for Backups
  NDMP Sequence for Recovery

Run time hierarchy
  Job
  Session
  Agent
  Activity

Implementation Overview
  NDMJOB Command
    ndmjob.h - header for NDMJOB command
    ndmjob-args - example argument macros
    ndmjob_args.c - process command arguments
    ndmjob_job.c - construct ndm_job_param from args
    ndmjob_main.c - main() routine
    ndmjob_rules.c - apply RULES to a job
  Rules
    ndmjr_none.h/.c - no additional rules
  Agents
    Agents Implementation Method
      ndmxx_initialize()
      ndmxx_commission()
      ndmxx_quantum()
      ndmxx_decomission()
    Agents General Files
      ndmagents.h - header file for Agents
      ndma_dispatch.c - dispatch NDMP requests
      ndma_job.c - audit and perform ndm_job_param
      ndma_image_stream.c - glue between DATA/TAPE
      ndma_session.c - session orchestration
      ndma_subr.c - Agents subroutines
    Control Agent - General and Support
      ndma_control.c - Dispatch job
      ndma_ctrl_calls.c - issue NDMP requests to appropriate agent
      ndma_ctrl_conn.c - XDR/TCP/IP setup and management
      ndma_ctrl_media.c - media positioning and results
      ndma_ctrl_robot.c - robotic support for session
    Control Agent - Normal Operations
      ndma_cops_backreco.c - backup/recover
      ndma_cops_labels.c - initialize/query tape labels
      ndma_cops_query.c - query agents capabilities
      ndma_cops_robot.c - robot fixer and subroutines
    Control Agent - Test/Diagnostic Operations
      ndma_ctst_mover.c - test NDMP MOVER behaviour
      ndma_ctst_subr.c - test subroutines
      ndma_ctst_tape.c - test NDMP TAPE behaviour
    Data Agent
      ndma_data.c
      ndma_data_fh.c
      ndma_data_gtar.c
      ndma_data_pfe.c
    Tape Agent
      ndma_tape.c
      ndma_tape_simulator.c
    Robot Agent
  Utility Routines
    ndmlib.h
    ndml_agent.c - Agent spec: host,logon,pw
    ndml_chan.c - Channel, non-blocking I/O
    ndml_conn.c - XDR Connection
    ndml_cstr.c - Canonical strings ala HTTP
    ndml_log.c - Logging helper functions
    ndml_media.c - Media spec LABEL/SIZE+FM@ADDR
    ndml_nmb.c - NDMP Message Buf helpers
    ndml_scsi.c
    ndml_util.c
  NDMP Protocol Support
    ndmprotocol.h
    ndmprotocol.c - Protocol layer accessor routines
    ndmp_ammend.h - Definitions needed in spec
    ndmp_translate.h/.c - vX/v9 translators
    ndmp[0239].x - Protocol specification files
    ndmp[0239].h - rpcgen(1) generated header files
    ndmp[0239]_xdr.c - rpcgen(1) generated XDR routines
    ndmp[023]_xmt.c - XDR message dispatch tables
    ndmp[0230]_enum_strs.h/.c - enum/str for pp
    ndmp[023]_pp.c - Pretty Printers, used for snoop
  SCSI Media Changer (smc)
    scsiconst.h
    smc.h
    smc_api.c
    smc_parse.c
    smc_pp.c
    smc_priv.h
    smc_raw.h
  TAR Format Support Routines
    tarhdr_gnu.h
    tarsnoop.c and tarsnoop.h
  Operating System Specific
    ndmos.h
    ndmos.c
    ndmos_freebsd.h and ndmos_freebsd.c
    ndmos_solaris.h and ndmos_solaris.c
    ndmos_xxx.h and ndmos_xxx.c templates

About Recovery
  Basic recovery steps
    Acquisition
    Disposition
      Disposition PASS
      Disposition DISCARD
  
  Direct and sequential access
    Environment variables controlling access
      DIRECT
      RECOVER_DIRECT
  NDMP ndmp_name structure
    fh_info field
      Known invalid fh_info all 1s
      RECOVER_FH_INFO_VALID environment variable
      RECOVER_PREFIX environment variable
  Direct/sequential vs. valid/invalid fh_info
  Recovery modes
    Sequential mode
    Semi-direct mode
    Direct mode
    Mixed mode

Porting Tips



Introduction

What are NDMJOB and NDMJOBLIB

NDMJOB and NDMJOBLIB comprise a software package which manages tape backup and recover operations using NDMP (Network Data Management Protocol).

NDMJOB/NDMJOBLIB is more library than command, and this library may serve as the foundation for more elaborate CONTROL packages and/or better user interfaces. For the purposes of this manual, NDMJOB refers precisely to the thin layer of code -- about 5% -- which provides the command-line style user interface to NDMJOBLIB. NDMJOBLIB refers to the rest of the software and does all the real work.

NDMJOBLIB and NDMP in general are frameworks for managing backups and not a backup method. There is no NDMJOB tape format, for example. Rather, NDMJOBLIB is a "wrapper" around programs which do the real work, and provides an interoperable way to orchestrate backup/recover operations.

Adaptability to Applications

The NDMP behaviour and conventions are isolated in operating system independent (O/S generic) modules. Operating system specific portions, such as device access and manipulation, are isolated in per-system modules. The O/S specific portions for any system represent about 4% of the source code.

Application sensitive aspects of NDMJOBLIB are isolated using C preprocessor (cpp(1)) #ifdef's. By default, NDMJOB/NDMJOBLIB builds with all abilities and all NDMP versions supported. Specific portions may be omitted by defining the right cpp(1) symbol.

For example, server-only applications may want to shed the NDMJOBLIB control capabilities, while client-only applications may want to only keep the control capabilities.

NDMJOBLIB supports both NDMPv2 and NDMPv3 simultaneously, and can coordinate the activities of two agents with different versions. Again, the right cpp(1) symbol can omit support for specific versions.
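
As a rough illustration of omitting a protocol version at compile time, here is a minimal sketch. It uses the NDMOS_OPTION_NO_NDMP2 and NDMOS_OPTION_NO_NDMP3 symbols named in the protocol support part of this manual; the table and the function are hypothetical and are not taken from the NDMJOBLIB source.

```c
#include <stddef.h>

/*
 * Sketch only. NDMOS_OPTION_NO_NDMP2 and NDMOS_OPTION_NO_NDMP3 are the
 * symbols named in this manual; supported_versions[] and
 * version_compiled_in() are hypothetical.
 * Build with -DNDMOS_OPTION_NO_NDMP2 to drop NDMPv2 from the table.
 */
static const int supported_versions[] = {
#ifndef NDMOS_OPTION_NO_NDMP2
    2,
#endif
#ifndef NDMOS_OPTION_NO_NDMP3
    3,
#endif
    0   /* terminator; also keeps the array non-empty */
};

/* Return 1 if the given NDMP version was compiled in, else 0. */
int version_compiled_in(int version)
{
    size_t i;

    for (i = 0; supported_versions[i] != 0; i++) {
        if (supported_versions[i] == version)
            return 1;
    }
    return 0;
}
```

With no symbols defined, both versions are present, which matches the default build described above.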

Background, Purpose, and Motivation

NDMJOB was originally developed by Traakan, Inc. as a testing tool. It gradually became more sophisticated in order to test tape robotics and direct access recovery. Further, it became a vehicle for identifying and resolving ambiguities in the NDMP specification. As NDMJOB evolved, it became a generally useful utility as well as a testing tool. Traakan released NDMJOB into the public domain in the spirit of fostering interoperability and the deployment of NDMP-based backup systems.

Please e-mail comments and contributions to ndmjob@traakan.com and/or ndmp-tech@ndmp.org.

Scope and Intended Audience

The audiences for this manual include:
  • developers interested in this software or NDMP in general;
  • developers of other NDMP products using NDMJOB as a testing and diagnostic tool, and thus needing technical insight into NDMJOB;
  • developers using NDMJOBLIB as a foundation for other applications;
  • contributors to NDMJOB who enhance and extend its capabilities; and
  • retargetters who implement NDMJOB on additional operating systems.

Terminology - "Agents" vs "Client/Server"

The NDMJOB/NDMJOBLIB documentation and code refer to "Agents", while the NDMP specification refers to "Clients" and "Servers". The client/server model -- or duality -- is the most commonly used for networking protocols. Peer-to-peer and agents are other terms frequently used.

If you say "NDMP Client" to a mixed audience of backup folks, half of them are thinking the same thing you are and the other half aren't. The conversation invariably comes to a halt while everybody syncs up on terminology. If you say "NDMP Control agent" to an audience, everybody knows what you mean. As NDMJOB and NDMP are introduced into more elaborate packages -- and the client/server duality becomes difficult to apply -- the "agent" terminology will continue to work.

The NDMP specification refers to client/server. This is heavily influenced by the direction of connection. The client side initiates activity, connects to one or more servers, then uses NDMP to coordinate activities. To people accustomed to working with networking protocols, this terminology is crystal clear and easily recognized: the elements which "connect" are clients, and the elements which "listen" are the servers. Yet, ignore how connections are established for a moment, and for NDMP the recognition gets more difficult.

The promotional materials and personnel of commercial backup packages also refer to client/server, and the meaning is reversed from that of the NDMP specification. This has an interesting history which we won't explore here. To these folks the servers are the elements closest to the objective, and the clients are elements which interact with and support the server. The objective, again to these folks, is the management of backup schedules, file indexes, recovery request queues, media aging and reassignment, etc. A tape drive, or anything that helps talk to it, plays a supportive role to the server.

The client/server duality breaks down for backup packages precisely because backup packages have more than two elements. Although NDMJOB is a stand-alone CONTROL agent, the CONTROL agent for commercial backup packages is encapsulated within standing processes ("servers") which await connections from administrative and user interfaces, and peer processes running the same package on other machines. At appropriate times, these packages initiate and orchestrate backup activity. It's easy to imagine backup packages with three, four, or even five key elements. Architectures and systems with five elements defy traditional client/server duality.

NDMJOB introduces its own wrinkle with its "resident" agents. A resident agent does not involve connecting and listening, nor any network connection at all. Resident agents are accessed as a subroutine package.

The term "agent" is used for SNMP (Simple Network Management Protocol) because large SNMP systems also have more than two key elements. The issues surrounding NDMP and backup packages are similar to those of SNMP, so we adopted the "agents" terminology rather than inventing a new one. At times, client/server is a handy shorthand for what's connecting and what's listening.

Other Documentation

The NDMPv2 and NDMPv3 specifications, and the Workflow documents which describe NDMP sequences, may be found on the NDMP web site, http://www.ndmp.org/.

The NDMJOB User Manual explains the capabilities, command usage and options, and provides examples and guidance.





NDMP Architectural Overview

NDMP Agents and Elements

CONTROL Agent (Client)

The CONTROL agent orchestrates the activity of the DATA, TAPE, and ROBOT agents. Among other things, the CONTROL agent:
  • Implements all user interfaces
  • Maintains a file index of all backups
  • Initiates backup activity
  • Initiates recovery (some files) and restore (all files) activities
  • Implements tape robot control via SCSI pass-through
  • Maintains a media (tape) index

Disk Files

DATA Agent (Server)

The DATA agent constructs and parses a backup image stream from or to a collection of disk files, respectively. Think of the data agent as your favorite archive program, like tar or zip, but rather than save the data to a file (or to tape), it delivers the data to a data connection.

Image Stream

TAPE Agent (Server)

The TAPE agent stores or retrieves a backup image stream to or from one or more tape files, respectively.

Tape Drive and Robot

ROBOT Agent (Server)

NDMP Data Flow and Media Model

NDMP Sequence for Connect and Authentication

Item Description
1 A standing NDMP daemon awaits connection requests on TCP port number 10000.
2 When a connection is received, the NDMP daemon launches an NDMP session. More than one NDMP session may be active at a time, though the NDMP sessions may not share tape devices.
3 An NDMP session control block is constructed by the NDMP session process. This contains all state information for the various components.
4 The NDMP session sends an NDMP_NOTIFY_CONNECTED message to the NDMP CONTROL agent (client), thus indicating that initialization is complete and the session may proceed.
5 The CONTROL agent negotiates a protocol version using NDMP_CONNECT_OPEN.
6 The CONTROL agent requests a list of acceptable authentication methods.
7 The client presents the password to authenticate itself as having authority to perform the (privileged) backup/recover operations.
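
The version negotiation in step 5 might be approached as in the sketch below. It assumes that the agent has advertised the highest version it supports and that the CONTROL agent simply requests the highest version both sides share. The function name and parameters are hypothetical; this is not the NDMJOBLIB negotiation code.

```c
#include <stddef.h>

/*
 * Hypothetical helper: pick the protocol version to request in
 * NDMP_CONNECT_OPEN. Assumes agent_max is the highest version the agent
 * advertised and control_versions[] lists what the CONTROL agent speaks.
 * Returns -1 when there is no common version.
 */
int choose_ndmp_version(int agent_max, const int *control_versions, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        int v = control_versions[i];

        if (v <= agent_max && v > best)
            best = v;
    }
    return best;
}
```

A CONTROL agent built with NDMPv2 and NDMPv3 talking to an agent that advertises version 2 would request version 2 under this scheme.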

NDMP Sequence for Backups

Item Description
1 The CONTROL agent, using the SCSI pass-through, initiates tape robot activity. If a tape robot is not being used, the CONTROL agent requests manual intervention.
2 The CONTROL agent requests that the TAPE agent open the appropriate tape drive.
3 Using the NDMP_TAPE interfaces, the CONTROL agent checks and positions the tape drive. Some CONTROL agents will write an identifying label, necessarily as a separate tape file, onto the tape.
4 The CONTROL agent requests that the TAPE agent start the NDMP_MOVER functions using NDMP_MOVER_LISTEN. The NDMP_MOVER is an interface between the image stream and the tape media.
5 The CONTROL agent requests the DATA agent start using NDMP_DATA_START_BACKUP. The file set, NDMP_MOVER address (local or remote), and other parameters to govern the backup are included in the request.
6 The DATA agent connects to the TAPE agent, and commences delivery of the backup image stream.
7 The MOVER receives the connection, and commences blocking and writing the image stream to the tape.
8 As the DATA agent processes files, it delivers NDMP_FH (file history) information to the CONTROL agent. This is used by the CONTROL agent to construct a file index of the backup.
9 Periodically, the CONTROL agent will request status information for the NDMP_DATA, NDMP_TAPE, and NDMP_MOVER.
10 When the TAPE agent reaches the end of the tape, it notifies the CONTROL agent using an NDMP_NOTIFY_MOVER_PAUSED message. The NDMP_MOVER then suspends until directed to continue (step 14).
11 The CONTROL agent, using the NDMP_TAPE interfaces, directs the TAPE agent to write file marks, rewind the tape drive, and unload the tape.
12 The CONTROL agent moves tapes around using the SCSI pass-through to operate the tape robot. If there is no tape robot, the CONTROL agent requests manual intervention.
13 As with step 3, the CONTROL agent positions the tape and writes identifying data.
14 The CONTROL agent directs the NDMP_MOVER to resume using the NDMP_MOVER_CONTINUE request.
15 The TAPE agent continues to write the image stream to the tape drive.
16 When the backup is complete, the DATA agent enters the HALT state with the SUCCESSFUL reason, and notifies the CONTROL agent with an NDMP_NOTIFY_DATA_HALTED message. The image stream connection is closed.
17 The CONTROL agent issues an NDMP_DATA_STOP request. The DATA agent terminates.
18 The TAPE agent detects that the image stream is closed.
19 The TAPE agent flushes all pending writes to the tape drive.
20 The TAPE agent notifies the CONTROL agent that it is done with an NDMP_NOTIFY_MOVER_HALTED message.
21 The CONTROL agent issues an NDMP_MOVER_STOP request, which causes the NDMP_MOVER functions to terminate.
22 Using the NDMP_TAPE interfaces, the CONTROL agent writes file marks to the tape drive, rewinds, and unloads the tape.
23 The CONTROL agent, using the SCSI pass-through, operates the tape robot to store the tape.
24 The CONTROL agent issues NDMP_CONNECT_CLOSE request(s). Both the CONTROL agent and the session(s) close the TCP connection.
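
The NDMP_MOVER behaviour in the backup steps above can be read as a small state machine. The sketch below is one such reading. The state and event names are hypothetical stand-ins (the real enums come from the protocol .x files), and the transition from halted back to idle on NDMP_MOVER_STOP is an assumption.

```c
/* Hypothetical names; the real NDMP state and message enums come from the .x files. */
enum mover_state {
    MOVER_IDLE, MOVER_LISTEN, MOVER_ACTIVE, MOVER_PAUSED, MOVER_HALTED
};

enum mover_event {
    EV_LISTEN,         /* step 4: NDMP_MOVER_LISTEN request */
    EV_CONNECTED,      /* step 7: DATA agent connects */
    EV_END_OF_TAPE,    /* step 10: reported via NDMP_NOTIFY_MOVER_PAUSED */
    EV_CONTINUE,       /* step 14: NDMP_MOVER_CONTINUE request */
    EV_STREAM_CLOSED,  /* steps 18-20: reported via NDMP_NOTIFY_MOVER_HALTED */
    EV_STOP            /* step 21: NDMP_MOVER_STOP request */
};

/*
 * Return the next state. An event that is not legal in the current
 * state leaves the state unchanged.
 */
enum mover_state mover_next(enum mover_state s, enum mover_event e)
{
    switch (s) {
    case MOVER_IDLE:
        if (e == EV_LISTEN) return MOVER_LISTEN;
        break;
    case MOVER_LISTEN:
        if (e == EV_CONNECTED) return MOVER_ACTIVE;
        break;
    case MOVER_ACTIVE:
        if (e == EV_END_OF_TAPE) return MOVER_PAUSED;
        if (e == EV_STREAM_CLOSED) return MOVER_HALTED;
        break;
    case MOVER_PAUSED:
        if (e == EV_CONTINUE) return MOVER_ACTIVE;
        break;
    case MOVER_HALTED:
        if (e == EV_STOP) return MOVER_IDLE;  /* assumption: STOP returns the mover to idle */
        break;
    }
    return s;
}
```

The pause/continue pair is what lets the CONTROL agent change tapes (steps 11 through 13) without disturbing the DATA agent.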

NDMP Sequence for Recovery

Item Description
1 The CLIENT, using the SCSI pass-through, initiates tape robot activity. If a tape robot is not being used, the CLIENT requests manual intervention.
2 The CLIENT requests opening of the appropriate tape drive via NDMP_TAPE_OPEN.
3 Using the NDMP_TAPE interfaces, the CLIENT checks and positions the tape drive.
4 The CLIENT requests the MOVER thread start using NDMP_MOVER_LISTEN.
5 The CLIENT requests the DATA thread start using NDMP_DATA_START_RESTORE.
6 The DATA thread analyzes the parameters, and issues a tape seek and read using NDMP_NOTIFY_DATA_READ. The CLIENT may unload and reload the tape drive.
7 The CLIENT sends an NDMP_MOVER_SET_WINDOW request, which establishes the relative position of the tape file to the original data stream. Then, the CLIENT sends an NDMP_MOVER_READ request with the same parameters as the NDMP_NOTIFY_DATA_READ. The MOVER thread is responsible for resolving the window with the data-stream relative offsets.
8 The MOVER thread begins reading the tape drive.
9 The MOVER writes the data stream, and the DATA thread reads it.
10 As files are restored, the DATA thread sends NDMP_LOG_FILE messages identifying what was recovered. Only names that match the original request (step #5) are reported. Files and subdirectories are not individually reported.
11 When the current tape file is exhausted, the MOVER enters the paused state and informs the CLIENT of such with NDMP_NOTIFY_MOVER_PAUSED.
12 The CLIENT, using the NDMP_TAPE interfaces, rewinds the tape drive (or possibly positions to another tape file).
13 The CLIENT requests the tape drive be closed using NDMP_TAPE_CLOSE.
14 The CLIENT initiates tape robot activity to change tapes.
15 The CLIENT re-opens the tape drive.
16 The CLIENT repositions the tape.
17 The CLIENT sends an NDMP_MOVER_CONTINUE request, and the MOVER reenters the active state. It continues to read data from the tape and write it to the data connection.
18 When the DATA thread has recovered all files (or exhausted the backup), it enters the halted state and informs the CLIENT of such using NDMP_NOTIFY_DATA_HALTED.
19 The CLIENT issues an NDMP_DATA_STOP, which causes the DATA thread to terminate. The data connection is closed.
20 The MOVER notices the closed connection, enters the halt state, and informs the CLIENT of such with NDMP_NOTIFY_MOVER_HALTED.
21 The CLIENT issues an NDMP_MOVER_STOP request, and the MOVER thread terminates.
22 Using the NDMP_TAPE interfaces, the CLIENT rewinds the tape drive.
23 The CLIENT uses NDMP_TAPE_CLOSE to close the tape drive.
24 The CLIENT uses the SCSI pass-through to store the tape.
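
Step 7 says the MOVER resolves the window against data-stream relative offsets. The arithmetic can be sketched as follows, assuming a window is described by the image-stream offset of the tape file's first byte and the number of stream bytes it holds. The structure and function are hypothetical, not the NDMJOBLIB implementation.

```c
#include <stdint.h>

/* Hypothetical structure: the part of the image stream held by the current tape file. */
struct mover_window {
    uint64_t offset;  /* image-stream offset of the first byte in the tape file */
    uint64_t length;  /* number of image-stream bytes in the tape file */
};

/*
 * Map an image-stream offset to a byte position within the current tape
 * file. Returns 0 and stores the position on success. Returns -1 when
 * the offset lies outside the window, which is the case where a MOVER
 * would pause so that another tape file can be positioned.
 */
int window_resolve(const struct mover_window *w, uint64_t stream_offset,
                   uint64_t *tape_pos)
{
    if (stream_offset < w->offset)
        return -1;
    if (stream_offset - w->offset >= w->length)
        return -1;
    *tape_pos = stream_offset - w->offset;
    return 0;
}
```

For a window that starts at stream offset 1000 and holds 500 bytes, stream offset 1200 resolves to position 200 in the tape file, while offset 1500 falls outside the window.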




Run time hierarchy

(from ndmagents.h)

    "job" -> ndma_client_session()    ndma_server_session()
                   |                           |
     /-------------/                           Q
     |                                         v
     |       +------------------------------------+      +-----------+
     |    /->|           SESSION QUANTUM          |----->| disp conn |
     |    |  +------------------------------------+ \    +-----------+
     |    Q   Q       Q         Q       Q        Q  |         v    |
     |    |   |       |         |       |        |  | +----------+ |
     |  /-----|----+--|----+----|----+--|----+---|----| dispatch | |
     |  | |   |    |  |    |    |    |  |    |   |  | | request  | |
     |  v |   v    v  v    v    v    v  v    v   v  | +----------+ |
     | +-------+  +----+  +------+  +----+  +-----+ |      ^       |
     +>|CONTROL|  |DATA|  |IMAGE |  |TAPE|  |ROBOT| |      |       |
       |       |  |    |->|STREAM|<-|    |  |     | |      |       |
       |       |  |   *====*    *====*   |  |     | |      |  ndmconn_recv()
       | ndmca |  ndmda|  |ndmis |  ndmta|  |ndmra| |      |       |
       +-------+  +----+  +------+  +----+  +-----+ |      |resi   |
              |     | |    |    |       |           |   +------+   |
              \-----|-+----|----+-------+-------------->| call |   |
                    |      |                        |   +------+   |
           formatter|      |image_stream            |      |remo   |
                    v      v                        |      v       v
                   +---------+<---ndmchan_poll()----/     +---------+
                   | ndmchan |<---------------------------| ndmconn |
                   +---------+                            +---------+
                 non-blocking I/O                         XDR wrapper

   -----> caller/callee
   --Q--> quantum (CPU scheduling)
   ====== image stream shared data structures
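
The "quantum" arrows in the diagram suggest cooperative scheduling: the session hands each agent a slice of CPU in turn. The following is a minimal sketch of that idea, assuming each agent exposes a quantum function that reports whether it made progress. All names here are hypothetical and the real session loop in ndma_session.c may differ.

```c
#include <stddef.h>

/* Hypothetical agent record: a quantum function returns nonzero if it did work. */
struct agent {
    const char *name;
    int (*quantum)(void *state);
    void *state;
};

/*
 * Give each agent one quantum per pass and keep passing until a whole
 * pass does no work. Returns the number of passes made. A real session
 * would then wait on its channels for I/O rather than return.
 */
int session_quantum(struct agent *agents, size_t n)
{
    int passes = 0;
    int did_work;

    do {
        size_t i;

        did_work = 0;
        for (i = 0; i < n; i++) {
            if (agents[i].quantum(agents[i].state))
                did_work = 1;
        }
        passes++;
    } while (did_work);
    return passes;
}

/* Example quantum for illustration: consume one unit of pending work per call. */
int countdown_quantum(void *state)
{
    int *pending = state;

    if (*pending <= 0)
        return 0;
    (*pending)--;
    return 1;
}
```

Because no agent blocks inside its quantum, one process can host CONTROL, DATA, TAPE, and ROBOT agents without threads.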

Job

Session

Agent

Activity





Implementation Overview

(from Makefile)

	This illustrates the strata (layers) of the
	NDMJOB/NDMJOBLIB software, the scope of key
	header (.h) files, and the source files
	constituting each layer.

  -  -  -  -  -		+---------------------------------------+
  ^  ^  ^  ^  ndmjob.h	| NDMJOB Command	ndmjob_*.c	|
  |  |  |  |  -		+---------------------------------------+
  |  |  |  |		          NDMJOBLIB API "job"
  |  |  |  |		+---------------------------------------+  \
  |  |  |  |		| Rules       (NDMJLR)  ndmjr_*.[ch]	|   \
  |  |  |  ndmagents.h	| Agents      (NDMJLA)  ndma_*.c	|    |
  |  |  |  -		+---------------------------------------+    |
  |  |  ndmlib.h	| Library     (NDMJLL)  ndml_*.c	|    |
  |  |  -		+---------------------------------------+    |
  |  ndmprotocol.h	| Protocol    (NDMJLP)  ndmp*.[chx]	| NDMJOBLIB
  |  -			+---------------------------------------+    |
  |			| SMC         (NDMJLS)  smc*.[ch]	|    |
  |			| Formats     (NDMJLF)  tar*.[ch]	|    |
  |			+---------------------------------------+    |
  ndmos.h		| OS intf     (NDMJLO)  ndmos*.[ch]	|   /
  -			+---------------------------------------+  /

NDMJOB Command

ndmjob.h - header for NDMJOB command

The primary header file for the NDMJOB command line interface. All global variables for the command are defined here. Most of the global variables directly correspond to command line options.

ndmjob-args - example argument macros

Sample arg macros file for NDMJOB.

ndmjob_args.c - process command arguments

NDMJOB command line argument processing

ndmjob_job.c - construct ndm_job_param from args

NDMJOB synthesis of structure used to enter the NDMAGENTS library.

ndmjob_main.c - main() routine

NDMJOB main() routines. Handles log files, debug levels, etc.

ndmjob_rules.c - apply RULES to a job

Rules

ndmjr_none.h/.c - no additional rules

These contain the rule set for -o rules=none. They are here primarily as templates.

Agents

Agents Implementation Method

ndmxx_initialize()

ndmxx_commission()

ndmxx_quantum()

ndmxx_decomission()

Agents General Files

ndmagents.h - header file for Agents

ALL AGENTS header file for the NDMAGENTS library.

ndma_dispatch.c - dispatch NDMP requests

ALL AGENTS dispatch routines for NDMP messages. Most audits (error/status checking) are done here.

ndma_job.c - audit and perform ndm_job_param

CONTROL AGENT API into the NDMAGENTS library

ndma_image_stream.c - glue between DATA/TAPE

DATA/TAPE image stream subroutines. The image stream is the data stream between the DATA and TAPE Agents.

ndma_session.c - session orchestration

ALL AGENTS session management. Primarily processes connections and dispatches quantums.

ndma_subr.c - Agents subroutines

ALL AGENTS various subroutines.

Control Agent - General and Support

ndma_control.c - Dispatch job

ndma_ctrl_calls.c - issue NDMP requests to appropriate agent

CONTROL AGENT call interfaces

ndma_ctrl_conn.c - XDR/TCP/IP setup and management

CONTROL AGENT connection management for connections to the other (DATA/TAPE/ROBOT) Agents.

ndma_ctrl_media.c - media positioning and results

CONTROL AGENT media management. Tape labels, robotics, tape positioning, window sizes and window capture.

ndma_ctrl_robot.c - robotic support for session

CONTROL AGENT tape robotics subroutines

Control Agent - Normal Operations

ndma_cops_backreco.c - backup/recover

CONTROL AGENT OPERATIONS for backup/recover. This is complete.

ndma_cops_labels.c - initialize/query tape labels

CONTROL AGENT OPERATIONS for tape labeling and label queries

ndma_cops_query.c - query agents capabilities

CONTROL AGENT OPERATIONS for NDMP server queries

ndma_cops_robot.c - robot fixer and subroutines

CONTROL AGENT OPERATIONS for tape robotics operations, primarily subroutines for the other ndma_cops_xxx.c files.

Control Agent - Test/Diagnostic Operations

(from ndma_ctst_mover.c)

NDMP Elements of a test-mover session

                   +-----+     ###########
                   | Job |----># CONTROL #
                   +-----+     #  Agent  #
                               #         #
                               ###########
                                #   |  |
                  #=============#   |  +---------------------+
                  #                 |                        |
   CONTROL        #         control | connections            |
   impersonates   #                 V                        V
   DATA side of   #            ############  +-------+   #########
   image stream   #            #  TAPE    #  |       |   # ROBOT #
                  #            #  Agent   #  | ROBOT |<-># Agent #
                  #     image  # +------+ #  |+-----+|   #       #
                  #==============|mover |=====|DRIVE||   #       #
                        stream # +------+ #  |+-----+|   #       #
                               ############  +-------+   #########

ndma_ctst_mover.c - test NDMP MOVER behaviour

CONTROL AGENT TEST OPERATIONS for testing the NDMP MOVER component of a TAPE Agent.

ndma_ctst_subr.c - test subroutines

CONTROL AGENT TEST OPERATIONS subroutines for the other ndma_ctst_xxx.c files.

ndma_ctst_tape.c - test NDMP TAPE behaviour

CONTROL AGENT TEST OPERATIONs for testing the NDMP TAPE component of a TAPE Agent. This does a lot of tape positioning and verification of status/error codes for certain conditions.

Data Agent

ndma_data.c

DATA AGENT primary implementation. Constructors, destructors, semantic actions, and quantum processing.

ndma_data_fh.c

DATA AGENT file history (FH) support routines.

ndma_data_gtar.c

DATA AGENT Gnu tar(1) interface.

ndma_data_pfe.c

DATA AGENT pipe/fork/exec subroutines.
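
For readers unfamiliar with the pipe/fork/exec technique, here is a generic POSIX sketch that runs a program and captures its standard output through a pipe. It is not taken from ndma_data_pfe.c; the function name and its interface are hypothetical, and a DATA agent would read the formatter's output incrementally rather than into one buffer.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its stdout connected to a pipe and copy up to
 * cap-1 bytes of its output into buf (NUL-terminated).
 * Returns the byte count, or -1 on failure.
 */
long run_and_capture(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    int status;

    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: stdout becomes the pipe */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }

    close(fds[1]);                  /* parent keeps the read end only */
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (long)used;
}
```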

Tape Agent

ndma_tape.c

TAPE AGENT primary implementation. Constructors, destructors, semantic actions, and quantum processing.

ndma_tape_simulator.c

TAPE AGENT simulator of a properly functioning tape drive and driver. This uses a disk file. It can be used as a reference for implementing other tape drive/driver interfaces, which are necessarily OS DEPENDENT.

Robot Agent

Utility Routines

ndmlib.h

NDMLIB header file

ndml_agent.c - Agent spec: host,logon,pw

NDMLIB library routine for processing Agent identification and authentication (host name, password, etc). This is used by NDMJOB to assist command line processing.

ndml_chan.c - Channel, non-blocking I/O

NDMLIB channel functions. A channel is simply a recurring I/O operation.
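
To make "recurring, non-blocking I/O" concrete, the sketch below shows a read-side channel built on fcntl(2), poll(2), and read(2). The structure and function names are hypothetical and much simpler than what ndml_chan.c would need. It illustrates the technique only.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical channel: a descriptor plus a buffer filled a little at a time. */
struct chan {
    int fd;
    char buf[256];
    size_t used;
    int eof;
};

/* Put the descriptor in non-blocking mode and reset the buffer. */
int chan_init(struct chan *ch, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    ch->fd = fd;
    ch->used = 0;
    ch->eof = 0;
    return 0;
}

/*
 * One read attempt: wait up to timeout_ms for data, then read whatever
 * fits. Returns bytes added, 0 if nothing was ready (or at end of
 * file), or -1 on error.
 */
int chan_poll_read(struct chan *ch, int timeout_ms)
{
    struct pollfd pfd;
    ssize_t n;
    int rc;

    if (ch->eof || ch->used == sizeof ch->buf)
        return 0;

    pfd.fd = ch->fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;

    n = read(ch->fd, ch->buf + ch->used, sizeof ch->buf - ch->used);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    if (n == 0) {
        ch->eof = 1;
        return 0;
    }
    ch->used += (size_t)n;
    return (int)n;
}
```

Calling chan_poll_read() repeatedly, once per quantum, is the recurring operation. The caller never blocks longer than the timeout it chooses.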

ndml_conn.c - XDR Connection

NDMLIB connection functions. Connections are channels (network connections) between NDMP Agents.

ndml_cstr.c - Canonical strings ala HTTP

NDMLIB canonical string (cstr) functions, along the lines of HTTP.

ndml_log.c - Logging helper functions

NDMLIB log helper functions.

ndml_media.c - Media spec LABEL/SIZE+FM@ADDR

NDMLIB media helper functions. Primarily command line argument processing helpers.

ndml_nmb.c - NDMP Message Buf helpers

NDMLIB NDMP Message Buffer (NMB) support routines.

ndml_scsi.c

NDMLIB SCSI interfaces. This is complete, but there is some cleanup work to do. That's why it hasn't been renamed to ndml_scsi.c

ndml_util.c

NDMLIB utility functions

NDMP Protocol Support

There are multiple versions of NDMP. This layer gathers them together. Under control of #ifdef NDMOS_OPTION_NO_NDMPx, specific versions may be omitted. At this time, NDMPv2 and NDMPv3 are deployed. NDMPv1 was defined but not widely deployed, and is deemed irrelevant. NDMPv4 is under consideration.

NDMP is defined using RPC protocol specification files (.x files). NDMP does not really use the RPC layer, but it does use the RPC XDR (External Data Representation) layer.
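
For a feel of what the XDR layer does, the sketch below hand-encodes two XDR primitives: a 32-bit unsigned integer (four bytes, big-endian) and a string (length word, bytes, zero padding to a four-byte boundary). The real routines are generated by rpcgen(1); the function names here are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit unsigned integer in XDR form: four bytes, big-endian. */
size_t wire_put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
    return 4;
}

/*
 * Write a string in XDR form: a 32-bit length, the bytes, then zero
 * padding up to a multiple of four. Returns the bytes written, or 0 if
 * the output buffer (cap bytes) is too small.
 */
size_t wire_put_string(unsigned char *out, size_t cap, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;

    if (cap < 4 + padded)
        return 0;
    wire_put_u32(out, (uint32_t)len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, padded - len);
    return 4 + padded;
}
```

Encoding the three-character string "abc" this way yields eight bytes: the length word 00 00 00 03, the three characters, and one zero pad byte.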

The original NDMP .x files are cosmetically transformed for NDMJOBLIB. The original NDMPv2 and NDMPv3 .x files use names like ndmp_name and ndmp_config_get_host_info_reply. These changed between versions even though they have the same name. Data structures which didn't change, like ndmp_pval, caused compile-time agony. For example, xdr_ndmp_pval() would be multiply defined at ld(1)-time. The first approach considered and rejected to resolve this was to make a unified, all versions .x file. It was rejected because it becomes difficult, even impractical, to integrate new versions and to omit old ones. The approach taken was to transform the names to reflect protocol version. This same approach was adopted by NFS for NFSv3 and NFSv4. Now there is an ndmp2_pval and an ndmp3_pval, and the compiler is happy. When it's defined, there will be an ndmp4_pval.

There are two pseudo-versions of the protocol here: NDMPv0 and NDMPv9. These are used for internal convenience. These are also defined using .x files because it's easy to cut-n-paste from the official .x files. Neither NDMPv0 nor NDMPv9 may be omitted.

NDMPv0 is the NDMP protocol subset used before the protocol version negotiation is complete. This subset of the protocol must necessarily remain immutable and constant for all time. NDMPv0 is the over-the-wire protocol until the version is negotiated.

NDMPv9 is an internal representation of the protocol and isolates higher layers of NDMJOB from most variations between protocol versions. NDMPv9 makes it a little easier to add new versions and omit older ones. NDMPv9 is never used over-the-wire, and therefore there are no XDR routines.
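
The combination of version-qualified names and an internal v9 form can be pictured with the name/value pair structure mentioned above. In this sketch the three structs are hand-written stand-ins for the rpcgen(1)-generated types, and the translator and lookup functions are hypothetical. The real translators live in ndmp_translate.c and may look quite different.

```c
#include <stddef.h>
#include <string.h>

/* Hand-written stand-ins for the generated types; distinct names avoid link-time clashes. */
struct ndmp2_pval { char *name; char *value; };
struct ndmp3_pval { char *name; char *value; };
struct ndmp9_pval { char *name; char *value; };   /* internal form */

/* Hypothetical translators: copy a versioned pair into the internal form. */
void pval_2to9(const struct ndmp2_pval *in, struct ndmp9_pval *out)
{
    out->name = in->name;
    out->value = in->value;
}

void pval_3to9(const struct ndmp3_pval *in, struct ndmp9_pval *out)
{
    out->name = in->name;
    out->value = in->value;
}

/* Higher layers see only the internal form, whatever the wire version was. */
const char *pval9_lookup(const struct ndmp9_pval *tab, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].value;
    }
    return NULL;
}
```

Code written against the internal form, like pval9_lookup() here, does not need to change when a protocol version is added or omitted. Only a translator pair does.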

There are three primary elements of this layer:

  1. Header files which define each version of the protocol. These are generated from files (.x files) by rpcgen(1).

  2. XDR routines which convert to/from the over-the-wire protocol and internal data structures. These are also generated by rpcgen(1). There are also tables of XDR routines.

  3. Support for pretty-printing the protocol data structures. Maybe someday rpcgen(1) will generate these, too.

ndmprotocol.h

This is the key #include file for the NDMP protocol layer of NDMJOBLIB. It #include's the other header files for this layer based on protocol version configuration preprocessor symbols (NDMOS_OPTION_NO_NDMP2 and NDMOS_OPTION_NO_NDMP3).

ndmprotocol.c - Protocol layer accessor routines

ndmp_ammend.h - Definitions needed in spec

NDMP PROTOCOL amendments. Makes certain names follow logical rules. These rules are used in a great many macros.

ndmp_translate.h/.c - vX/v9 translators

ndmp[0239].x - Protocol specification files

ndmp[0239].h - rpcgen(1) generated header files

ndmp[0239]_xdr.c - rpcgen(1) generated XDR routines

ndmp[023]_xmt.c - XDR message dispatch tables

ndmp[0239]_enum_strs.h/.c - enum/str for pp

ndmp[023]_pp.c - Pretty Printers, used for snoop

SCSI Media Changer (smc)

scsiconst.h

SCSI constants used for the tape robotics

smc.h

SCSI MEDIA CHANGER header file

smc_api.c

SCSI MEDIA CHANGER library API

smc_parse.c

SCSI MEDIA CHANGER parser for the data returned by certain queries.

smc_pp.c

SCSI MEDIA CHANGER pretty-printer (pp)

smc_priv.h

SCSI MEDIA CHANGER private header file.

smc_raw.h

SCSI MEDIA CHANGER raw format of query result data

TAR Format Support Routines

tarhdr_gnu.h

GNU Tar record structure header file

tarsnoop.c and tarsnoop.h

General tar(1) data stream snooper.

Operating System Specific

By Operating System (O/S) specific we mean the programming environment, including compilers and header files, as well as the host O/S APIs. "O/S specific" is clear and concise, so that's how we refer to the hosting environment.

The ndmos.h file essentially #include's the right ndmos_xxx.h for the hosting environment. The companion source C files, ndmos_xxx*.c, are similarly selected by ndmos.c.

The strategy for separating the O/S specific and O/S generic portions of NDMJOBLIB has four key points:

  1. Isolate O/S specific portions in separate files which can be developed, contributed, and maintained independently of the overall source base.

  2. NEVER NEVER #ifdef based on O/S or programming environment in the O/S generic portions. These make collective maintenance and integration too difficult.

  3. Use O/S specific #define macros (NDMOS_...) and C functions (ndmos_...) as wrappers around the portions that vary between environments and applications.

  4. Use generic, objective-oriented #ifdef's to isolate and omit functionality which may not be wanted in all applications.

There are templates in ndmos_xxx.h and ndmos_xxx.c to get started on a new O/S specific portion. Send contributions to the current keeper of NDMJOB. Contact ndmp-tech@ndmp.org for details.

DO NOT MODIFY ANY GENERIC PORTION OF NDMJOBLIB FOR THE SAKE OF A HOSTING ENVIRONMENT OR APPLICATION.
If you discover additional isolation requirements, raise the issue on ndmp-tech@ndmp.org. Propose new #define NDMOS_ macros to address them. Then, submit the proposal with required changes to the current keeper of NDMJOB. Changes to the generic portion which use #ifdef's based on anything other than NDMOS_ macros will be summarily rejected.

ndmos.h

There are four sections of this file:
  1. Establish identities for various O/S platforms
  2. Try to auto-recognize the environment
  3. #include the right O/S specific ndmos_xxx.h
  4. Establish default #define-itions for macros left undefined
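
The four sections can be sketched in miniature as below. This is an illustration under stated assumptions, not the contents of ndmos.h: every EXAMPLE_ name is hypothetical (the real macros use the NDMOS_ prefix), and step 3 inlines a definition where the real file would #include the matching ndmos_xxx.h, so that the sketch stays self-contained.

```c
#include <string.h>

/* 1. Establish identities for O/S platforms (values are arbitrary). */
#define EXAMPLE_ID_UNKNOWN 0
#define EXAMPLE_ID_FREEBSD 1
#define EXAMPLE_ID_SOLARIS 2

/* 2. Try to auto-recognize the environment unless the build set it. */
#ifndef EXAMPLE_ID
# if defined(__FreeBSD__)
#  define EXAMPLE_ID EXAMPLE_ID_FREEBSD
# elif defined(__sun) && defined(__SVR4)
#  define EXAMPLE_ID EXAMPLE_ID_SOLARIS
# else
#  define EXAMPLE_ID EXAMPLE_ID_UNKNOWN
# endif
#endif

/* 3. The real file would #include the right ndmos_xxx.h here. This
 *    sketch inlines what such a header might define instead. */
#if EXAMPLE_ID == EXAMPLE_ID_FREEBSD
# define EXAMPLE_OS_NAME "freebsd"
#elif EXAMPLE_ID == EXAMPLE_ID_SOLARIS
# define EXAMPLE_OS_NAME "solaris"
#endif

/* 4. Establish defaults for macros left undefined. */
#ifndef EXAMPLE_OS_NAME
# define EXAMPLE_OS_NAME "generic"
#endif
#ifndef EXAMPLE_API_BCOPY
# define EXAMPLE_API_BCOPY(src, dst, n) memmove((dst), (src), (n))
#endif

/* Generic code uses only the wrapper macros, never an O/S #ifdef. */
static void example_copy_label(char *dst, size_t dstsz)
{
    size_t n = strlen(EXAMPLE_OS_NAME);

    if (dstsz == 0)
        return;
    if (n >= dstsz)
        n = dstsz - 1;          /* truncate to fit, leaving room for NUL */
    EXAMPLE_API_BCOPY(EXAMPLE_OS_NAME, dst, n);
    dst[n] = '\0';
}
```

The generic function at the end never tests for a platform; it only calls the wrapper macro, which is the discipline the strategy above asks for.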

ndmos.c

This merely #include's the right O/S specific C file based on the NDMOS_ID preprocessor symbol. The O/S specific source file is contained in a file ndmos_xxx.c, where xxx is the name for the programming environment.

ndmos_freebsd.h and ndmos_freebsd.c

The O/S specific files for FreeBSD. As of this writing (NDMJOBLIB 1.1), it uses the tape simulator (ndma_tape_simulator.c) and does not implement the SCSI pass-thru.

ndmos_solaris.h and ndmos_solaris.c

The O/S specific files for Solaris. As of this writing (NDMJOBLIB 1.1), it uses the tape simulator (ndma_tape_simulator.c) and does not implement the SCSI pass-thru.

ndmos_xxx.h and ndmos_xxx.c templates

These are templates for creating additional O/S specific portions. Copy to new files with xxx replaced with a short name for the hosting programming environment, then follow the directions in the comments. Please contribute your working module to ndmjob@traakan.com.




About Recovery

A great deal of NDMP's merit, of its implementation difficulty, and of the disparity between NDMP implementations centers on the recovery features. This section discusses NDMJOB recovery operations.

The NDMP recovery process is about selecting objects from the image stream in as efficient a manner as possible. Selected objects are passed to the formatter program for processing (which usually means storing).

Basic recovery steps

Recovery can be viewed as two steps for each object:

Acquisition

Pre-read enough of the object in order to fully identify it and subsequently determine its disposition. During the pre-read, the image stream is not passed to the formatter. Pre-read data is held in the plumb.image channel buffer.

Disposition

Once enough of the object has been pre-read, its disposition is decided. An object is either PASSED or DISCARDED.

Disposition PASS

The pre-read portion plus the rest of the object are passed to the formatter program via the formatter_image channel. This simply requires copying the required amount of data from the image channel buffer to the formatter_image buffer.

Disposition DISCARD

The pre-read portion plus the rest of the object are simply consumed out of the image channel buffer.

The backup image is a sequence of objects. Some objects are selected (PASSED), some are not (DISCARDED). We expect objects to appear in consecutive groups of either selected or not selected, thus creating a detectable "edge". This edge can trigger certain optimizations. Detecting the edge is as simple as recognizing when the current disposition is not the same as the previous disposition.
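
Edge detection as described above can be sketched in a few lines. The type and function names are hypothetical, and the sketch assumes that the very first object, which has no predecessor, does not count as an edge.

```c
/* Hypothetical names; NONE marks "no object seen yet". */
enum example_disposition {
    EXAMPLE_DISP_NONE,
    EXAMPLE_DISP_PASS,
    EXAMPLE_DISP_DISCARD
};

struct example_edge {
    enum example_disposition previous;
};

/* Record the disposition of the current object. Returns 1 when it
 * differs from the previous object's disposition (an edge), else 0. */
static int example_edge_update(struct example_edge *ed,
                               enum example_disposition current)
{
    int edge = (ed->previous != EXAMPLE_DISP_NONE &&
                ed->previous != current);

    ed->previous = current;
    return edge;
}
```

For a stream whose objects are disposed PASS, PASS, DISCARD, DISCARD, PASS, the function reports an edge on the third and fifth objects only.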

Direct and sequential access

The NDMP architecture allows for two methods of access to the backup image during recovery: direct and sequential.

Direct access allows for the DATA agent to cue the TAPE agent for portions of the backup image. This is done with the NDMP_NOTIFY_DATA_READ and NDMP_MOVER_READ interfaces. The TAPE agent uses these cues to rapidly position the tape to the required portion. CONTROL agent intervention is required for tape changes and such.

There are times when direct access is impossible. Two examples spring to mind. First, the NDMPCOPY scenario, where one DATA agent is constructing a backup image, the image is delivered to a second DATA agent, and it recovers the image to disk. Second, when the CONTROL or TAPE agent does not support (implement) the direct access features of NDMP.

Hence, there are times when the entire backup image must be conveyed over the image stream and processed. This is the sequential access method.

Environment variables controlling access

DIRECT

The NDMPv3 spec mentions an environment variable "DIRECT", which is either "yes" or "no". The spec does not clearly state the semantics of this variable. Here, it is defined. If "no", discrete NDMP_NOTIFY_DATA_READ requests may not be issued. The DATA agent is expected to issue a single NDMP_NOTIFY_DATA_READ with an offset of 0 and a length of infinity (all 1s). Such a request is the strict definition of SEQUENTIAL access. If "DIRECT" is "yes", the DATA agent MAY, though is not required to, use discrete NDMP_NOTIFY_DATA_READ requests. The DIRECT variable has no implication for the fh_info fields (see below).
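
The strict definition of a SEQUENTIAL request can be written down directly. The struct below is a hypothetical stand-in holding only the offset and length of an NDMP_NOTIFY_DATA_READ; it is not the rpcgen(1) generated type, and the sketch assumes both quantities fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for the offset/length pair of a read request. */
struct example_read_request {
    uint64_t offset;
    uint64_t length;
};

/* "Infinity" is a length of all 1s. */
#define EXAMPLE_LENGTH_INFINITY UINT64_MAX

/* Build the single request that starts SEQUENTIAL access. */
static struct example_read_request example_sequential_request(void)
{
    struct example_read_request r = { 0, EXAMPLE_LENGTH_INFINITY };
    return r;
}

/* Returns 1 if the request is the strict SEQUENTIAL form, else 0. */
static int example_is_sequential(const struct example_read_request *r)
{
    return r->offset == 0 && r->length == EXAMPLE_LENGTH_INFINITY;
}
```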

RECOVER_DIRECT

Per postings to the ndmp-tech e-mail list, the environment variable favored is "RECOVER_DIRECT", rather than just "DIRECT". If RECOVER_DIRECT is not given, "DIRECT" is checked. If neither is given, SEQUENTIAL access is used, as per the NDMPv3 spec which says the default value of DIRECT is "n".
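
The lookup order just described can be sketched as follows. The env structure and function names are hypothetical, and the sketch assumes that any value beginning with 'y' or 'Y' means "yes" (so that both "yes" and "y" are accepted) and that anything else means "no".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair from the recovery environment. */
struct example_env {
    const char *name;
    const char *value;
};

/* Return the value of the named variable, or NULL if not given. */
static const char *example_env_lookup(const struct example_env *env,
                                      size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(env[i].name, name) == 0)
            return env[i].value;
    }
    return NULL;
}

/* RECOVER_DIRECT is favored, DIRECT is the fallback, and the default
 * when neither is given is SEQUENTIAL access (returns 0). */
static int example_direct_allowed(const struct example_env *env, size_t n)
{
    const char *v = example_env_lookup(env, n, "RECOVER_DIRECT");

    if (v == NULL)
        v = example_env_lookup(env, n, "DIRECT");
    if (v == NULL)
        return 0;
    return v[0] == 'y' || v[0] == 'Y';
}
```

Note that when both variables are given, RECOVER_DIRECT wins even if it says "no" and DIRECT says "yes".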

NDMP ndmp_name structure

The ndmp_name structure has three important fields. The "name" field is the name of the file/object as it occurs in the backup image. The "dest" field is the name under which the object should be stored. The "fh_info" field is a 64-bit cookie, generated at the time the backup_image was constructed, which identifies the position of the object in the backup_image.

The NDMPv2 specification says that the name field should be the path name of the file/object, relative to the backup root, as it occurs in the backup. The name field participates in the DISPOSITION phase of object processing (see Basic recovery steps above). There is neither requirement nor guidance in the NDMP specifications (v2 or v3) as to whether a name implies selection of a single object or perhaps a collection of objects. For example, if the named object is a directory, should just the directory be recovered, or should all objects at or below the directory be recovered? Conventional practice is that the recovering DATA agent will answer the question in a manner it deems natural. For "tar" format backups, as implemented here, a named directory implies the directory and its contents. The name field is used as a prefix match for objects in the stream. Objects which match are deemed selected, and their DISPOSITION is PASS.

The dest field specifies where the recovered object(s) are to be placed. For "tar" format backups, as implemented here, selected objects have the matching prefix substituted by the dest field.
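
The prefix match and substitution can be sketched as below. The function names are hypothetical. One assumption goes beyond the text: the match is required to end on a path-component boundary, so that a name of "home/user" selects "home/user/a.txt" but not "home/username". A plain string-prefix comparison would select both.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if obj is the named object or lies below it, else 0. */
static int example_name_matches(const char *name, const char *obj)
{
    size_t n = strlen(name);

    if (strncmp(name, obj, n) != 0)
        return 0;
    /* Assumed refinement: the prefix must end at a '/' boundary. */
    return obj[n] == '\0' || obj[n] == '/' ||
           (n > 0 && name[n - 1] == '/');
}

/* For a selected object, replace the matching prefix with dest.
 * Returns 0 on success, -1 if not selected or out is too small. */
static int example_map_to_dest(const char *name, const char *dest,
                               const char *obj, char *out, size_t outsz)
{
    int len;

    if (!example_name_matches(name, obj))
        return -1;                      /* DISPOSITION is DISCARD */
    len = snprintf(out, outsz, "%s%s", dest, obj + strlen(name));
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

For example, with a name of "home/user" and a dest of "/restore/u", the object "home/user/a.txt" would be stored as "/restore/u/a.txt".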

fh_info field

The fh_info field is said to be an opaque object, with its contents known only to the relevant DATA agents. In practice, it is a byte offset in the backup image. Some NDMP implementations have problems if the fh_info field is otherwise. Neither the NDMP specifications (v2 and v3) nor the protocol make any provision for identifying or recognizing the validity of the fh_info field. The assumption is that if the DATA agent expects the fh_info field to be valid, it MUST be valid. If the DATA agent does not support (implement) the fh_info field, it is disregarded. This leads to a severe problem. It means that for DATA agents which implement the fh_info field, a recovery request simply cannot be processed without valid fh_info fields. There is no way to cue the DATA agent to perform the recovery solely based on the name and dest fields. The common answer is inadequate: use an unspecified environment variable to indicate the status of the fh_info field. This answer would lead to disparate practice for a fundamental issue. Here, we establish a practice as proposed on the ndmp-tech e-mail list.

Known invalid fh_info all 1s

An fh_info field value of all 1s indicates a known invalid fh_info.

RECOVER_FH_INFO_VALID environment variable

The RECOVER_FH_INFO_VALID environment variable is defined here. It has either "yes" or "no" as its value. If "yes", the fh_info fields are considered valid, with deference to the known invalid value (all 1s). If "no", all fh_info fields are deemed invalid. If not given, the default is "yes".
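
Combining the known invalid value with RECOVER_FH_INFO_VALID gives a simple validity test. The names below are hypothetical, and the sketch assumes that a value beginning with 'y' or 'Y' means "yes" and that any other value means "no".

```c
#include <stddef.h>
#include <stdint.h>

/* An fh_info of all 1s is the known invalid value. */
#define EXAMPLE_FH_INFO_INVALID UINT64_MAX

/* env_value is the RECOVER_FH_INFO_VALID setting, or NULL if the
 * variable was not given (the default is then "yes").
 * Returns 1 if fh_info may be used for positioning, else 0. */
static int example_fh_info_valid(uint64_t fh_info, const char *env_value)
{
    if (env_value != NULL && env_value[0] != 'y' && env_value[0] != 'Y')
        return 0;               /* "no": all fh_info deemed invalid */
    return fh_info != EXAMPLE_FH_INFO_INVALID;
}
```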

RECOVER_PREFIX environment variable

The RECOVER_PREFIX environment variable is defined here. If given, the path names in the dest fields are prepended with this value. The PREFIX environment variable plays no role in recovery, and is considered merely informational about when the backup image was created. If a dest field is not given (null or empty), the name field is used as the dest, subject to the RECOVER_PREFIX value.
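
The rule for the effective destination can be sketched as follows. The function name is hypothetical, and the sketch assumes that "prepended" means plain concatenation, with no path separator inserted between the prefix and the dest.

```c
#include <stdio.h>

/* Compute the effective destination for one ndmp_name entry.
 * prefix is the RECOVER_PREFIX value, or NULL if not given.
 * Returns 0 on success, -1 if out is too small. */
static int example_effective_dest(const char *prefix, const char *name,
                                  const char *dest, char *out,
                                  size_t outsz)
{
    /* A null or empty dest falls back to the name field. */
    const char *base = (dest != NULL && dest[0] != '\0') ? dest : name;
    int len;

    if (prefix == NULL)
        prefix = "";
    len = snprintf(out, outsz, "%s%s", prefix, base);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

With RECOVER_PREFIX set to "/restore/", a name of "etc/hosts" and an empty dest yield "/restore/etc/hosts".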

Direct/sequential vs. valid/invalid fh_info

Here comes the tricky part. Does invalid fh_info preclude direct access? No. Does valid fh_info preclude sequential access? No.

This table shows the mode used depending on the DIRECT environment variable and the status of the fh_info field.


                            DIRECT=yes     DIRECT=no
========================================================
  all fh_info valid         direct         sequential
  all fh_info invalid       semi-direct    sequential
  fh_info mixed             (see text)     sequential
========================================================
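
The table can be expressed as a small selection function. The enum and function names are hypothetical, and the "(see text)" cell is read here as selecting the mixed mode described under Recovery modes.

```c
/* Hypothetical classification of the fh_info fields in a request. */
enum example_fh_status {
    EXAMPLE_FH_ALL_VALID,
    EXAMPLE_FH_ALL_INVALID,
    EXAMPLE_FH_MIXED
};

enum example_mode {
    EXAMPLE_MODE_SEQUENTIAL,
    EXAMPLE_MODE_SEMI_DIRECT,
    EXAMPLE_MODE_DIRECT,
    EXAMPLE_MODE_MIXED
};

/* direct is nonzero when the DIRECT setting resolves to "yes". */
static enum example_mode example_select_mode(int direct,
                                             enum example_fh_status st)
{
    if (!direct)
        return EXAMPLE_MODE_SEQUENTIAL;     /* right-hand column */

    switch (st) {
    case EXAMPLE_FH_ALL_VALID:
        return EXAMPLE_MODE_DIRECT;
    case EXAMPLE_FH_ALL_INVALID:
        return EXAMPLE_MODE_SEMI_DIRECT;
    case EXAMPLE_FH_MIXED:
        return EXAMPLE_MODE_MIXED;
    }
    return EXAMPLE_MODE_SEQUENTIAL;         /* defensive default */
}
```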

Recovery modes

Sequential mode

All objects are ACQUIRED, and extracted from the image stream. SEQUENTIAL access is initiated (NOTIFY_DATA_READ 0/inf). All DISPOSITION is determined by name field match. PASSED data is copied to the formatter program. DISCARDED data is consumed and not passed.

Semi-direct mode

All objects are ACQUIRED, and selectively extracted from the image stream. Discrete NOTIFY_DATA_READ requests are used for ACQUISITION. Once acquired, DISPOSITION is determined based on name field match. PASSED data is requested by NOTIFY_DATA_READ, and passed to the formatter program. DISCARDED data is simply skipped by omitting a corresponding NDMP_NOTIFY_DATA_READ request.

Direct mode

Objects are selectively ACQUIRED, and selectively extracted from the image stream. The ndmp_name entries in the recovery request are processed in fh_info order. Objects are initially ACQUIRED by direct access based on the fh_info field, then the semi-direct method is employed until an object with disposition DISCARD is encountered. This is the PASS->DISCARD edge. If the named object is a directory, then all directory contents are PASSED until a non-matching name is encountered. Then, the next ndmp_name entry is processed.

Mixed mode

The mixed mode is used when some or all of the fh_info fields are known invalid. The semi-direct mode is used until all ndmp_name entries with invalid fh_info values are satisfied. The remaining ndmp_name entries are processed with the direct mode.
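
One way to arrange the ndmp_name entries for direct and mixed processing is to sort them so that entries with known invalid fh_info come first and the rest follow in fh_info order. This is a sketch of that ordering only, under the assumption that fh_info is a byte offset. The struct is a hypothetical stand-in for ndmp_name, and NDMJOB itself may organize the work differently.

```c
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_FH_INFO_INVALID UINT64_MAX

/* Hypothetical stand-in for the fields of ndmp_name used here. */
struct example_name {
    const char *name;
    uint64_t fh_info;
};

/* Invalid fh_info sorts first; valid entries sort by ascending offset. */
static int example_name_cmp(const void *pa, const void *pb)
{
    const struct example_name *a = pa;
    const struct example_name *b = pb;
    int a_inv = (a->fh_info == EXAMPLE_FH_INFO_INVALID);
    int b_inv = (b->fh_info == EXAMPLE_FH_INFO_INVALID);

    if (a_inv != b_inv)
        return b_inv - a_inv;
    /* Compare without subtraction to avoid 64-bit truncation. */
    return (a->fh_info > b->fh_info) - (a->fh_info < b->fh_info);
}

static void example_order_names(struct example_name *v, size_t n)
{
    qsort(v, n, sizeof v[0], example_name_cmp);
}
```

After sorting, the leading run of invalid entries would be satisfied by the semi-direct method, and the remaining entries would be visited in the order the direct mode expects.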




Porting Tips

 

