Memo: Concept of VCAT -- the new VLBI Catalogue system for Calc/Solve

Author: L. Petrov
Date:   2003.08.14

I. Introduction.
----------------

VCAT is software for handling a set of VLBI experiments so that they can
be processed by the Mark-5 Calc/Solve VLBI analysis software system. It
is designed for manipulating database files in the GVF format and is
meant to overcome numerous obstacles imposed by the old VLBI Catalogue
system developed in the 1970s.

Databases in the GVF format differ from databases in the Mark-3 DBH
format as follows:

a) A database in the GVF format consists of one or more files. Although
   the GVF format is flexible and allows all lcodes to be put in one
   section and one file, or different sections to be concatenated in one
   file, in practice a database will be split into 6-10 files.

b) VCAT assumes that all databases and all versions are available
   on-line on a local disk, while the old system was designed to support
   the case when only a fraction of the databases is available on-line.

c) A GVF file has information about both bands, whereas Mark-3 DBH
   database files are band-specific, with some reservations.

d) The GVF format supports an unlimited number of encodings (ASCII dump,
   binary, big_endian, little_endian, etc.), although in practice the
   binary format will be used.

e) The GVF format supports compressed databases.

f) The GVF format supports network names for the database files, i.e.
   the database files may be located on a remote server.

II. Data structure.
-------------------

2.1) Each database consists of one or more datafiles and one envelope
     file. The envelope file holds the list of the datafiles and the
     area name (explained later).

     Example of the envelope file 2003-07-03A1_v001.vdb:

       2003-07-03A1_fr1_v001.gvf  SYS
       2003-07-03A1_fr2_v001.gvf  SYS

     How it is related to the old VLBI catalogue system:
     In the old catalogue system the envelope file was hidden inside
     the catalogue file.

2.2) A database system consists of one or more areas. An area consists
     of one or more directories.
     An area has a name and a set of attributes. An area with the name
     SYS is mandatory; other areas are optional. A database directory
     of an area contains only subdirectories, one subdirectory per
     experiment. Datafiles and envelope files are located in these
     subdirectories, so all files and all versions related to an
     experiment are gathered in one directory.

     Example:

       AREA-1 (SYS)    AREA-2 (TEST)    AREA-3 (PET)

       /data1   /data2   /data3   ~/pet/databases

       2003-07-03A1--|
                     |-2003-07-03A1_v001.vdb
                     |-2003-07-03A1_fr1_v001.gvf
                     |-2003-07-03A1_fr2_v001.gvf

       2003-07-03A2--|
                     |-2003-07-03A2_v001.vdb
                     |-2003-07-03A2_v002.vdb
                     |-2003-07-03A2_v003.vdb
                     |-2003-07-03A2_v004.vdb
                     |-2003-07-03A2_v005.vdb
                     |-2003-07-03A2_fr1_v001.gvf
                     |-2003-07-03A2_fr2_v001.gvf
                     |-2003-07-03A2_th1_v001.gvf
                     |-2003-07-03A2_th2_v001.gvf
                     |-2003-07-03A2_th1_v002.gvf
                     |-2003-07-03A2_th2_v002.gvf
                     |-2003-07-03A2_cl1_v001.gvf
                     |-2003-07-03A2_cl2_v001.gvf
                     |-2003-07-03A2_sl1_v001.gvf
                     |-2003-07-03A2_sl1_v002.gvf

       2003-07-05U1--|
                     |-2003-07-05U1_fr1_v001.vdb
                     |-2003-07-05U1_fr2_v001.vdb
                     |-2003-07-05U1_th1_v001.gvf
                     |-2003-07-05U1_th2_v001.gvf
                     |-2003-07-05U1_cl1_v001.gvf
                     |-2003-07-05U1_cl2_v001.gvf
                     |-2003-07-05U1_sl1_v001.gvf

     How it is related to the old VLBI catalogue system:
     The old VLBI system had only one area. Files were scattered
     chaotically over the different directories of that area.

2.3) Version control. VCAT allows finer version granularity than the
     old catalogue system. Each datafile has its own version and each
     envelope file has its own version. When a database (which, as a
     reminder, consists of an envelope and one or more datafiles) is
     updated, one or more datafiles are updated (not necessarily all of
     them) and the version counter of each updated datafile is
     incremented. In addition, a new envelope file is created with its
     version counter incremented.

     VCAT also allows an existing file to be overwritten without
     incrementing the version number. An area must have an attribute
     that permits such an operation.
     This is useful when a temporary database is created that is not
     intended to be kept permanently, e.g. during test runs.

     Example:

       2003-07-03A2_v001.vdb
           2003-07-03A2_fr1_v001.gvf  SYS
           2003-07-03A2_fr2_v001.gvf  SYS

       2003-07-03A2_v002.vdb
           2003-07-03A2_fr1_v001.gvf  SYS
           2003-07-03A2_fr2_v001.gvf  SYS
           2003-07-03A2_th1_v001.gvf  SYS
           2003-07-03A2_th2_v001.gvf  SYS

       2003-07-03A2_v003.vdb
           2003-07-03A2_fr1_v001.gvf  SYS
           2003-07-03A2_fr2_v001.gvf  SYS
           2003-07-03A2_th1_v001.gvf  SYS
           2003-07-03A2_th2_v001.gvf  SYS
           2003-07-03A2_cl1_v001.gvf  SYS
           2003-07-03A2_cl2_v001.gvf  SYS

       2003-07-03A2_v004.vdb
           2003-07-03A2_fr1_v001.gvf  SYS
           2003-07-03A2_fr2_v001.gvf  SYS
           2003-07-03A2_th1_v001.gvf  SYS
           2003-07-03A2_th2_v001.gvf  SYS
           2003-07-03A2_cl1_v001.gvf  SYS
           2003-07-03A2_cl2_v001.gvf  SYS
           2003-07-03A2_sl1_v001.gvf  SYS

       2003-07-03A2_v005.vdb
           2003-07-03A2_fr1_v001.gvf  SYS
           2003-07-03A2_fr2_v001.gvf  SYS
           2003-07-03A2_th1_v001.gvf  SYS
           2003-07-03A2_th2_v001.gvf  SYS
           2003-07-03A2_cl1_v001.gvf  SYS
           2003-07-03A2_cl2_v001.gvf  SYS
           2003-07-03A2_sl1_v002.gvf  SYS

     How it is related to the old VLBI catalogue system:
     When a database was updated, the old VLBI system copied all
     information from the old database file to the new database file.
     This created gigantic redundancy and eventually suffocated users
     with data. VCAT allows each datafile to be updated but does not
     require it. Normally, only the datafile whose information has
     actually changed is updated. Some datafiles must never be updated,
     e.g. the fr1 file (Fourfit-supplied information), and can be
     protected. Thus, the possibility that lcodes which are not
     supposed to be updated will be altered is eliminated.

III. VCAT: client-server interaction.
-------------------------------------

3.1) VCAT consists of a server part, a standalone program which is
     constantly running, and a client part which requests services
     from the server. The VCAT server processes the following requests:

     a) Resolve database file names.
        Input:  area name, database name, database version
        Output: database version, the number of database files, full
                path name of each file

     b) Get list of all database names.
        Input:  area name
        Output: total number of database names, list of database names

     c) Create new database.
        Input:  area name, database name, number of database files,
                list of database files (without paths)
        Output: lock_id, list of full path names of the database files

     d) Update database file.
        Input:  area name, database name, number of database files to
                be updated, list of the database files to be updated
                (without paths)
        Output: lock_id, list of full path names of the database files

     e) Lift write lock.
        Input:  area name, lock_id
        Output: none

        This operation informs VCAT that the database has been written
        successfully. If a "lift write lock" request is not received
        within a specified amount of time after a database creation or
        update operation is initiated, VCAT lifts the write lock
        automatically.

3.2) If the server does not respond within a specified amount of time,
     the client tries to restart the server. If the server does not
     restart within a specified amount of time, the client falls back
     to local mode: it spawns a subprocess with the server part and
     then communicates with that subprocess. This may be useful when
     the host where the server is supposed to run is down, or when
     there is no connection to that host. The drawback of this mode is
     that it potentially allows two processes running on different
     hosts to update the same database simultaneously.

3.3) VCAT supports a cluster of hosts at an analysis center. Hosts form
     a cluster if each node of the cluster "sees" (i.e. has read/write
     access to) the areas on every other host via NFS. One of the hosts
     is declared primary; the others are declared secondary. The VCAT
     server runs on the primary host. One of the functions of VCAT is
     to ensure that all hosts have the same collection of databases.
     An area on the primary host can be declared as mirrored.
     When VCAT starts, it gets listings of all directories of the
     mirrored area on all hosts and compares them against the listing
     of the mirrored directory on the primary host. If there are
     databases in the mirrored area on the primary host which are
     absent on the secondary hosts, VCAT copies these databases.

     When VCAT receives a request for database creation or update, it
     returns full path names in the directories on the primary host.
     Thus database writing is performed in a directory that resides on
     a local disk of the primary host. When the VCAT server receives a
     "lift write lock" request, it copies the database from the primary
     host to all secondary hosts. Thus every host of the cluster has a
     copy of all database files of the mirrored area on a local disk.
     When the VCAT server receives a request to resolve database file
     names, it returns full path names in the directory on the local
     disk. Thus, when Solve runs a solution, it is always fed with
     files located on the local disk.

IV. How does it work.
---------------------

4.1) Contrary to the old VLBI catalogue system, VCAT does not have a
     gigantic file with datafile names. Keeping track of file names is
     a function of the operating system. On start VCAT reads its
     configuration file and then reads all directories and
     subdirectories of the known areas. It keeps the names of known
     files and cache tables in memory. When it receives a request to
     write or create a database, it returns the names of temporary
     files and sets a read lock (creates small lock files). When it
     receives a request to lift the lock, which means that the user
     process has finished writing the database files, it copies the
     temporary files to the permanent directories and, if necessary,
     copies them to the mirrored directories. It creates the new
     envelope and updates its internal data structures and cache
     tables. Then it removes the lock files. In addition, at a certain
     interval it checks the lock files.
     It removes stale lock files and stale temporary files; these
     indicate that the user process terminated before it finished
     writing the database files. After a specified time it re-reads the
     directories and updates its internal data structures.

     How it is related to the old VLBI catalogue system:
     The old catalogue system performed a function of the operating
     system: it kept its own directories, which conflicted with UNIX.
     VCAT does not keep the names of all datafiles in a special file:
     it relies on the directory files maintained by the operating
     system.

V. Interface of higher level.
-----------------------------

5.1) Section 3.1 described only the primitives of the client-server
     interaction. Client software is built on these primitives.

5.2) Database read client. The low-level interface of GVH is

       SUBROUTINE GVH_READ_BGV ( GVH, DATAFILE_NAME, IUER )

     where DATAFILE_NAME is the name of a datafile. The higher-level
     interface, which will actually be used in Solve, will be

       SUBROUTINE GVH_DATABASE_LOAD ( GVH, AREA_NAME, DATABASE_NAME, IUER )

     where DATABASE_NAME is the name of a database. Routine
     GVH_DATABASE_LOAD sends a request to VCAT to resolve the database
     name. VCAT returns the list of full path names of the datafiles.
     Then GVH_DATABASE_LOAD calls GVH_READ_BGV for each datafile in
     turn.

     During a transition period an emulation of KAI (the routine for
     database reading in the old catalogue system) will be provided.
     The emulated KAI calls GVH_DATABASE_LOAD.

5.3) Database import. A user process puts the database files, an
     envelope and datafiles, in an import directory which is outside
     the directories of any area. Then the user process calls the
     program "vcat_import". vcat_import calls GVH_READ_BGV and reads
     all datafiles into memory. Then it calls GVH_WRITE_BGV.
     GVH_WRITE_BGV requests VCAT to create a new database (or to update
     a database) and then writes the datafiles and the envelope to the
     file names provided by VCAT. vcat_import always writes databases
     in "native" format.
     This means that files imported as big_endian or little_endian,
     compressed, or even ASCII -- in any format recognized by GVH --
     will be rewritten automatically in the native format.

5.4) vcat_info -- a program with features similar to catlg: it reports
     the list of available databases, their versions, dates of update,
     history, etc. However, vcat_info is in a position to provide much
     more. Since reading a GVF file is cheap, vcat_info can quickly
     read all databases and collect information about sources,
     stations, baselines, etc. For example, vcat_info will be able to
     give the list of all database files in which station TIGOCONC
     participated.

5.5) Archiving. Since GVF files are much more condensed, all
     datafiles and all versions are kept on disk. So data archiving is
     reduced to archiving the disk: a daily incremental archive, a
     weekly incremental archive, and a monthly full backup.
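
     Illustration. Because each file name carries the database name,
     the file kind and the version (see the examples in section II),
     the version bookkeeping that 5.4 expects from vcat_info can in
     principle be recovered from a directory listing alone. The Python
     sketch below is only an illustration under stated assumptions: the
     two name patterns are inferred from the examples in this memo and
     are not a specification, and the function name summarize is
     hypothetical, not part of VCAT or GVH.

```python
import re
from collections import defaultdict

# Patterns inferred from the examples in section II (an assumption):
#   <database>_v<NNN>.vdb          envelope file
#   <database>_<kind>_v<NNN>.gvf   datafile, kind such as fr1, th2, sl1
ENVELOPE_RE = re.compile(r"^(?P<db>.+)_v(?P<ver>\d{3})\.vdb$")
DATAFILE_RE = re.compile(r"^(?P<db>.+)_(?P<kind>[a-z]+\d)_v(?P<ver>\d{3})\.gvf$")


def summarize(filenames):
    """Return the latest envelope version and latest datafile versions.

    Result: (envelopes, datafiles) where
      envelopes = {database: latest envelope version}
      datafiles = {database: {kind: latest datafile version}}
    Names that match neither pattern are ignored.
    """
    envelopes = {}
    datafiles = defaultdict(dict)
    for name in filenames:
        m = DATAFILE_RE.match(name)
        if m:
            kinds = datafiles[m["db"]]
            kinds[m["kind"]] = max(kinds.get(m["kind"], 0), int(m["ver"]))
            continue
        m = ENVELOPE_RE.match(name)
        if m:
            envelopes[m["db"]] = max(envelopes.get(m["db"], 0), int(m["ver"]))
    return envelopes, dict(datafiles)


if __name__ == "__main__":
    listing = [
        "2003-07-03A2_v001.vdb",
        "2003-07-03A2_v005.vdb",
        "2003-07-03A2_fr1_v001.gvf",
        "2003-07-03A2_sl1_v001.gvf",
        "2003-07-03A2_sl1_v002.gvf",
    ]
    print(summarize(listing))
```

     In practice the listing would come from the experiment
     subdirectory of an area; the sketch takes a plain list of names so
     that it does not depend on any particular directory layout.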