Memo:   Concept of the VCAT -- new VLBI Catalogue system for Calc/Solve
Author: L. Petrov
Date:   2003.08.14


	I. Introduction.
        ----------------

  VCAT is software for handling a set of VLBI experiments for their 
processing by the Mark-5 Calc/Solve VLBI analysis software system. It is 
designed for manipulating database files in the GVF format and is 
intended to overcome numerous obstacles imposed by the old VLBI Catalogue 
system developed in the 1970s. Databases in the GVF format differ from 
databases in the Mark-3 DBH format as follows:

a) A database in the GVF format consists of one or more files. Although the
   GVF format is flexible and allows putting all lcodes in one section and
   one file, or concatenating different sections in one file, in practice
   a database will be split into 6-10 files.

b) VCAT assumes that all databases, in all versions, are available online on
   a local disk, while the old system was designed to support the case when
   only a fraction of the databases is available online.

c) A GVF file has information about both bands, whereas Mark-3 DBH database
   files are band specific, with some reservations.

d) The GVF format supports an unlimited number of encodings: ASCII dump,
   binary, big_endian, little_endian etc., although in practice the binary
   format will be used.

e) The GVF format supports compressed databases.

f) The GVF format supports network names for database files, i.e. the
   database files may be located at a remote server.


       II. Data structure.
       -------------------

  2.1) Each database consists of one or more datafiles and one envelop file.
       The envelop file holds the list of the datafiles and the area name
       (areas are explained in 2.2).

  Example of the envelop file 2003-07-03A1_v001.vdb :

2003-07-03A1_fr1_v001.gvf SYS 
2003-07-03A1_fr2_v001.gvf SYS 

   How it is related to the old VLBI catalogue system:
       The envelop file was hidden inside the catalogue file in the old
       catalogue system.
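
       Illustration: a minimal Python sketch of reading an envelop file.
       It assumes that each line holds a datafile name followed by an area
       name, separated by blanks, which is inferred from the example above
       and is not a specification of the format. The name parse_envelop is
       hypothetical.

```python
def parse_envelop(text):
    """Return a list of (datafile_name, area_name) pairs from envelop text.

    Assumed layout (inferred from the memo's example, not a format spec):
    one datafile per line, followed by the area name.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"malformed envelop line: {line!r}")
        entries.append((fields[0], fields[1]))
    return entries


example = """\
2003-07-03A1_fr1_v001.gvf SYS
2003-07-03A1_fr2_v001.gvf SYS
"""

for datafile, area in parse_envelop(example):
    print(area, datafile)
```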

  2.2) A database system consists of one or more areas. An area consists
       of one or more directories. An area has a name and a set of
       attributes. An area with the name SYS is mandatory; other areas are
       optional. A database directory of an area contains only
       subdirectories, one subdirectory per experiment. Database files and
       envelop files are located in these subdirectories, so all files and
       all versions related to an experiment are gathered in one directory.

                example:
       AREA-1 (SYS)     AREA-2 (TEST)    AREA-3 (PET)
       /data1 /data2    /data3           ~/pet/databases
 
       2003-07-03A1--|
                     |-2003-07-03A1_v001.vdb 
                     |-2003-07-03A1_fr1_v001.gvf 
                     |-2003-07-03A1_fr2_v001.gvf 

       2003-07-03A2--|
                     |-2003-07-03A2_v001.vdb 
                     |-2003-07-03A2_v002.vdb 
                     |-2003-07-03A2_v003.vdb 
                     |-2003-07-03A2_v004.vdb 
                     |-2003-07-03A2_v005.vdb 
                     |-2003-07-03A2_fr1_v001.gvf 
                     |-2003-07-03A2_fr2_v001.gvf 
                     |-2003-07-03A2_th1_v001.gvf 
                     |-2003-07-03A2_th2_v001.gvf 
                     |-2003-07-03A2_th1_v002.gvf 
                     |-2003-07-03A2_th2_v002.gvf 
                     |-2003-07-03A2_cl1_v001.gvf 
                     |-2003-07-03A2_cl2_v001.gvf 
                     |-2003-07-03A2_sl1_v001.gvf 
                     |-2003-07-03A2_sl1_v002.gvf 

       2003-07-05U1--|
                     |-2003-07-05U1_fr1_v001.gvf 
                     |-2003-07-05U1_fr2_v001.gvf 
                     |-2003-07-05U1_th1_v001.gvf 
                     |-2003-07-05U1_th2_v001.gvf 
                     |-2003-07-05U1_cl1_v001.gvf 
                     |-2003-07-05U1_cl2_v001.gvf 
                     |-2003-07-05U1_sl1_v001.gvf 

   How it is related to the old VLBI catalogue system:
       The old VLBI system has only one area. Files were scattered chaotically
       over different directories of that area.
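
       Illustration: a Python sketch of how a file name from the listing
       above could be split into experiment name, section tag and version,
       and how the per-experiment directory follows from it. The pattern is
       an assumption derived only from the example names (an optional
       section tag such as fr1, then _vNNN, then .vdb or .gvf); the names
       parse_name and full_path are hypothetical.

```python
import posixpath
import re

# Assumed naming pattern, inferred from the example listing only.
NAME_RE = re.compile(
    r"^(?P<exp>[^_]+)"                 # experiment, e.g. 2003-07-03A2
    r"(?:_(?P<section>[a-z0-9]+))?"    # section tag, absent for envelop files
    r"_v(?P<version>\d+)"              # version counter
    r"\.(?P<ext>vdb|gvf)$"             # vdb = envelop, gvf = datafile
)


def parse_name(filename):
    """Return (experiment, section or None, version, extension)."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return m["exp"], m["section"], int(m["version"]), m["ext"]


def full_path(area_dir, filename):
    """Place a file in the subdirectory of its experiment."""
    experiment = parse_name(filename)[0]
    return posixpath.join(area_dir, experiment, filename)


print(parse_name("2003-07-03A2_th1_v002.gvf"))
print(full_path("/data1", "2003-07-03A2_v005.vdb"))
```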


  2.3) Version control.

       VCAT allows finer version granularity than the old catalogue 
       system. Each database file has its own version and each envelop file
       has its own version. When a database (which, as a reminder, consists
       of an envelop and one or more datafiles) is updated, one or more 
       datafiles are updated (not necessarily all of them) and the version 
       counter of each updated datafile is incremented. In addition, 
       a new envelop file is created with its version counter incremented. 
       VCAT also allows overwriting an existing file without incrementing
       the version number, provided the area has an attribute which permits
       such an operation. This is useful when a temporary database is
       created which is not intended to be kept permanently, e.g. during
       test runs.

   Example:

2003-07-03A2_v001.vdb 
   2003-07-03A2_fr1_v001.gvf SYS
   2003-07-03A2_fr2_v001.gvf SYS

2003-07-03A2_v002.vdb 
   2003-07-03A2_fr1_v001.gvf SYS
   2003-07-03A2_fr2_v001.gvf SYS
   2003-07-03A2_th1_v001.gvf SYS
   2003-07-03A2_th2_v001.gvf SYS

2003-07-03A2_v003.vdb 
   2003-07-03A2_fr1_v001.gvf SYS
   2003-07-03A2_fr2_v001.gvf SYS
   2003-07-03A2_th1_v001.gvf SYS
   2003-07-03A2_th2_v001.gvf SYS
   2003-07-03A2_cl1_v001.gvf SYS
   2003-07-03A2_cl2_v001.gvf SYS

2003-07-03A2_v004.vdb 
   2003-07-03A2_fr1_v001.gvf SYS
   2003-07-03A2_fr2_v001.gvf SYS
   2003-07-03A2_th1_v001.gvf SYS
   2003-07-03A2_th2_v001.gvf SYS
   2003-07-03A2_cl1_v001.gvf SYS
   2003-07-03A2_cl2_v001.gvf SYS
   2003-07-03A2_sl1_v001.gvf SYS

2003-07-03A2_v005.vdb 
   2003-07-03A2_fr1_v001.gvf SYS
   2003-07-03A2_fr2_v001.gvf SYS
   2003-07-03A2_th1_v001.gvf SYS
   2003-07-03A2_th2_v001.gvf SYS
   2003-07-03A2_cl1_v001.gvf SYS
   2003-07-03A2_cl2_v001.gvf SYS
   2003-07-03A2_sl1_v002.gvf SYS

   How it is related to the old VLBI catalogue system:
       When a database was updated, the old VLBI system copied all 
       information from the old database file to the new database file. This 
       created gigantic redundancy and eventually swamped users with data.
       VCAT allows updating every datafile but does not require it. 
       Normally, only the datafiles whose information has actually changed
       are updated. Some datafiles must never be updated, e.g. the fr1-file
       (Fourfit-supplied information), and can be protected. Thus, the risk
       that lcodes which are not supposed to be updated get altered is
       eliminated.
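
       Illustration: a Python sketch of the version bookkeeping described
       in 2.3. A database version is modelled as a table "section tag ->
       datafile version"; an update increments only the touched datafiles
       and always produces a new envelop version. The data model and the
       names update_database and envelop_lines are assumptions made for
       this sketch, not the VCAT implementation.

```python
def update_database(envelop_version, datafiles, updated=(), added=()):
    """Return (new envelop version, new section -> version table)."""
    new_datafiles = dict(datafiles)        # the old envelop stays untouched
    for section in updated:
        new_datafiles[section] += 1        # KeyError if the section is unknown
    for section in added:
        if section in new_datafiles:
            raise ValueError(f"section {section} already exists")
        new_datafiles[section] = 1
    return envelop_version + 1, new_datafiles


def envelop_lines(experiment, datafiles, area="SYS"):
    """Render the contents of an envelop file, one datafile per line."""
    return [f"{experiment}_{section}_v{version:03d}.gvf {area}"
            for section, version in datafiles.items()]


# Contents of 2003-07-03A2_v004.vdb from the example above.
v004 = {"fr1": 1, "fr2": 1, "th1": 1, "th2": 1, "cl1": 1, "cl2": 1, "sl1": 1}

# Only the sl1 datafile changes; this gives envelop version 5.
version, v005 = update_database(4, v004, updated=["sl1"])
print(version)
print("\n".join(envelop_lines("2003-07-03A2", v005)))
```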


       III. VCAT: client-server interaction.
       -------------------------------------


  3.1) VCAT consists of a server part, a standalone program which runs
constantly, and a client part which requests services from the server. The
VCAT server processes the following requests:

  a) Resolve database file names.
     Input:  area name, database name, database version
     Output: database version, the number of database files, full path name
             of each file

  b) Get list of all database names.
     Input:  area name
     Output: total number of database names, list of database names

  c) Create new database.
     Input:  area name, database name, number of database files, list of 
             database files (without paths).
     Output: lock_id, list of full path names of the database files

  d) Update database file.
     Input:  area name, database name, number of database files to be updated, 
             list of the database files to be updated (without paths).
     Output: lock_id, list of full path names of the database files.


  e) Lift write lock.
     Input:  area name, lock_id
     Output: none
     This operation informs VCAT that the database has been successfully
     written. If a "lift write lock" request is not received within
     a specified amount of time after a database creation or update
     operation was initiated, VCAT lifts the write lock automatically.
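
     Illustration: a Python sketch of request a), "resolve database file
     names", working on in-memory tables. The table layout, the function
     name resolve, and the rule that an omitted version means the latest
     one are assumptions of this sketch; the memo does not specify them.

```python
def resolve(envelops, file_index, area, database, version=None):
    """Return (version, number of files, list of full path names).

    envelops:   (area, database) -> {version: [datafile names]}
    file_index: datafile name -> full path name
    """
    versions = envelops[(area, database)]
    if version is None:
        version = max(versions)   # assumption: no version means the latest
    names = versions[version]
    paths = [file_index[name] for name in names]
    return version, len(paths), paths


# Hypothetical tables for the experiment shown in section 2.1.
envelops = {
    ("SYS", "2003-07-03A1"): {
        1: ["2003-07-03A1_fr1_v001.gvf", "2003-07-03A1_fr2_v001.gvf"],
    },
}
file_index = {
    "2003-07-03A1_fr1_v001.gvf": "/data1/2003-07-03A1/2003-07-03A1_fr1_v001.gvf",
    "2003-07-03A1_fr2_v001.gvf": "/data1/2003-07-03A1/2003-07-03A1_fr2_v001.gvf",
}

print(resolve(envelops, file_index, "SYS", "2003-07-03A1"))
```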


 3.2) If the server does not respond within a specified amount of time, 
      the client tries to restart the server. If the server does not restart
      within a specified amount of time, the client falls back to a local
      mode: it spawns a subprocess with the server part and then
      communicates with that subprocess. This may be useful when the host
      where the server is supposed to run is down, or when there is no
      connection to that host. The drawback of this mode is that it
      potentially allows two processes running on different hosts to update
      the same database simultaneously.
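
      Illustration: a Python sketch of the fallback sequence in 3.2. The
      three callables ping, restart_server and spawn_local_server are
      hypothetical stand-ins; each is assumed to apply its own timeout.

```python
def connect(ping, restart_server, spawn_local_server):
    """Return "server" or "local" following the fallback order of 3.2.

    ping() is assumed to return True if the server answered in time.
    """
    if ping():
        return "server"
    restart_server()              # first fallback: try to restart the server
    if ping():
        return "server"
    spawn_local_server()          # last resort: private server subprocess
    return "local"


# Simulated run: the server never answers, so the client goes local.
log = []
mode = connect(
    ping=lambda: False,
    restart_server=lambda: log.append("restart"),
    spawn_local_server=lambda: log.append("spawn"),
)
print(mode, log)
```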

 3.3) VCAT supports a cluster of hosts at an analysis center. Hosts form
      a cluster if each node of the cluster "sees" (i.e. has read/write
      access to) the areas at every other host via NFS. One of the hosts is
      declared primary, the others are declared secondary. The VCAT server
      runs on the primary host. One of the functions of VCAT is to ensure
      that all hosts have the same collection of databases. An area on the
      primary host can be declared mirrored. When VCAT starts, it gets
      listings of all directories of the mirrored area at all hosts and
      compares them against the listing of the mirrored directory at the
      primary host. If there are databases in the mirrored area at the
      primary host which are absent at the secondary hosts, VCAT copies
      these databases. When VCAT receives a request for database creation
      or update, it returns full path names in the directories at the
      primary host. Thus database writing is performed in a directory which
      resides on a local disk of the primary host. When the VCAT server
      receives a "lift write lock" request, it copies the database from the
      primary host to all secondary hosts. Thus all hosts of the cluster
      have a copy of all database files of the mirrored area on a local
      disk. When the VCAT server receives a request to "resolve database
      file names", it returns the full path names in the directory on the
      local disk. Thus, when Solve runs a solution, it is always fed by
      files located on the local disk.
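
      Illustration: a Python sketch of the start-up comparison for
      a mirrored area. Directory listings are modelled as sets of database
      names; the host names and the function name missing_databases are
      hypothetical.

```python
def missing_databases(primary, secondaries):
    """Return {host: sorted databases present on the primary but not there}.

    primary:     set of database names in the mirrored area, primary host
    secondaries: host name -> set of database names on that host
    """
    plan = {}
    for host, listing in secondaries.items():
        missing = sorted(set(primary) - set(listing))
        if missing:
            plan[host] = missing
    return plan


primary = {"2003-07-03A1", "2003-07-03A2", "2003-07-05U1"}
secondaries = {
    "host_b": {"2003-07-03A1", "2003-07-03A2", "2003-07-05U1"},
    "host_c": {"2003-07-03A1"},
}

# host_c lacks two databases, which VCAT would copy from the primary host.
print(missing_databases(primary, secondaries))
```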


       IV. How does it work.
       ---------------------


   4.1 Contrary to the old VLBI catalogue system, vcat does not keep 
a gigantic file with datafile names: keeping track of file names is a function
of the operating system. On start, vcat reads its configuration file and then
reads all directories and subdirectories of the known areas. It keeps the names
of known files and cache tables in memory. When it receives a request to create
or update a database, it returns the names of temporary files and sets a read
lock (creates small lock files). When it receives a request to lift the lock,
which means that the user process has completed writing the database files, it
copies the temporary files to the permanent directories and, if necessary,
copies them to the mirrored directories. It creates the new envelop and updates
its internal data structure and cache tables. Then it removes the lock files.
In addition, at a certain interval it checks the lock files and removes stale
lock files and stale temporary files; stale files mean that the user process
terminated before it completed writing the database files. After a specified
time it re-reads the directories and updates its internal data structure.

   How it is related to the old VLBI catalogue system:
       The old catalogue system performed a function of the operating system:
       it kept its own directories, which conflicted with UNIX. VCAT does not
       keep the names of all datafiles in a special file: it relies on the
       directory files maintained by the operating system.
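
       Illustration: a Python sketch of the temporary-file and lock-file
       cycle described in 4.1. The ".lock" suffix, the use of file
       modification time to detect stale locks, and the names begin_write,
       commit_write and sweep_stale are assumptions of this sketch; the
       memo only says that small lock files are created and stale ones are
       removed.

```python
import os
import shutil
import tempfile
import time


def begin_write(tmp_dir, filename):
    """Create a lock file and return (lock path, temporary file path)."""
    temp_path = os.path.join(tmp_dir, filename)
    lock_path = temp_path + ".lock"
    with open(lock_path, "x") as lock:     # "x" fails if the lock exists
        lock.write(str(os.getpid()))
    return lock_path, temp_path


def commit_write(lock_path, temp_path, permanent_dir):
    """Copy the finished temporary file to its permanent place, then unlock."""
    os.makedirs(permanent_dir, exist_ok=True)
    final_path = os.path.join(permanent_dir, os.path.basename(temp_path))
    shutil.copy2(temp_path, final_path)
    os.remove(temp_path)
    os.remove(lock_path)
    return final_path


def sweep_stale(tmp_dir, max_age_s, now=None):
    """Remove lock files older than max_age_s and their temporary files."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(tmp_dir)):
        if not name.endswith(".lock"):
            continue
        lock_path = os.path.join(tmp_dir, name)
        if now - os.path.getmtime(lock_path) > max_age_s:
            temp_path = lock_path[: -len(".lock")]
            if os.path.exists(temp_path):
                os.remove(temp_path)       # writer died before finishing
            os.remove(lock_path)
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as root:
    tmp_dir = os.path.join(root, "tmp")
    os.makedirs(tmp_dir)
    lock, temp = begin_write(tmp_dir, "2003-07-03A2_sl1_v002.gvf")
    with open(temp, "w") as f:
        f.write("datafile contents")
    final = commit_write(lock, temp, os.path.join(root, "2003-07-03A2"))
    print(os.path.basename(final), os.listdir(tmp_dir))
```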


        V. Interface of higher level.
        -----------------------------

   5.1) Section 3.1 describes only the primitives of the client-server 
interaction. Client software is built on these primitives.

   5.2) Database read client.
        The low-level interface of GVH is 

        SUBROUTINE GVH_READ_BGV ( GVH, DATAFILE_NAME, IUER )

        where DATAFILE_NAME is the name of a datafile.

        The higher-level interface, which will actually be used in Solve,
        will be 

        SUBROUTINE GVH_DATABASE_LOAD ( GVH, AREA_NAME, DATABASE_NAME, IUER )

        where DATABASE_NAME is the name of a database.

        Routine GVH_DATABASE_LOAD sends a request to VCAT for resolving 
        a database name. VCAT returns the list of full path names of the
        datafiles. Then GVH_DATABASE_LOAD calls GVH_READ_BGV for each
        datafile in turn.

        During a transition period an emulation of KAI (the routine for 
        database reading in the old catalogue system) will be provided.
        The emulated KAI calls GVH_DATABASE_LOAD.

   5.3) Database import. A user process puts the database files, an envelop
        and datafiles, in an import directory which is outside the
        directories of any area. Then the user process calls the program
        "vcat_import <envelop file>". vcat_import calls GVH_READ_BGV and 
        reads all datafiles into memory. Then it calls GVH_WRITE_BGV.
        GVH_WRITE_BGV requests VCAT to create a new database (or to update
        a database) and then writes the datafiles and the envelop to the
        file names provided by VCAT. vcat_import always writes databases in
        the "native" format. This means that files imported in any format
        recognized by GVH (big_endian or little_endian, compressed, even
        ASCII) are automatically re-written in the native format.

   5.4) vcat_info -- a program with features similar to catlg: it reports
        the list of available databases, their versions, dates of update,
        history etc. However, vcat_info is in a position to provide much
        more. Since reading a GVF file is cheap, vcat_info can quickly
        read all databases and collect information about sources, stations,
        baselines etc. For example, vcat_info will be in a position to give
        the list of all databases in which station TIGOCONC participated.
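
        Illustration: a Python sketch of the kind of index which would
        answer such a query. It assumes the station list of every database
        has already been read from the GVF files; the station lists below
        are invented for the example (only the name TIGOCONC comes from the
        text) and the function name build_station_index is hypothetical.

```python
from collections import defaultdict


def build_station_index(stations_by_database):
    """Invert "database -> stations" into "station -> set of databases"."""
    index = defaultdict(set)
    for database, stations in stations_by_database.items():
        for station in stations:
            index[station].add(database)
    return index


# Invented participation lists, for illustration only.
stations_by_database = {
    "2003-07-03A1": ["TIGOCONC", "STATION_A"],
    "2003-07-03A2": ["STATION_A", "STATION_B"],
    "2003-07-05U1": ["TIGOCONC", "STATION_B"],
}

index = build_station_index(stations_by_database)
print(sorted(index["TIGOCONC"]))
```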

   5.5) Archiving. Since GVF files are much more compact, all datafiles in
        all versions are kept on disk. So data archiving is reduced to
        archiving the disk: daily incremental archive, weekly incremental
        archive, monthly full backup.