                           User manual to geo_import
 
 
                                Leonid Petrov
 
 
                                2000.10.23
 
 
                                Abstract
 
   This document describes the program geo_import, which automatically
downloads new VLBI database files submitted to the IVS Data Center. It
describes the format of the configuration file and gives hints on how to
customize the program for the needs of an Analysis Center.
 
 
Questions and comments about this guide should be sent to:
 
Leonid Petrov ( pet@leo.gsfc.nasa.gov )
 
 
 
                                Table of contents:
 
 
1 ................ Overview
 
 
2 ................ Usage
 
   2.1 ........... geo_import
   2.2 ........... db_import
 
3 ................ Configuration file
 
   3.1 ........... Format of configuration file
   3.2 ........... Descriptions of fields of configuration file
   3.3 ........... Example of configuration file
 
4 ................ Customization
 
 
5 ................ Hints
 
 
6 ................ History
 
 
________________________________________________________________________________
 
        1 Overview
        ==========
 
   Program geo_import automatically downloads new databases submitted to
the IVS Data Center. It is assumed that geo_import is scheduled by cron and
runs automatically. geo_import calls program db_import unless db_import is
already running. db_import uses the popular free program wget
( http://www.gnu.org/software/wget/wget.html ) for querying the IVS Data
Center. It retrieves the list of VLBI database files which are in the IVS Data
Center, gets the list of databases which are in the current VLBI catalogue,
checks the list of database files which are in the incoming directory, checks
the database filename filters specified in the configuration file and builds
the list of the databases which are to be downloaded. If the list is not empty
it launches program wget for retrieving the database files to the incoming
directory. Finally, db_import uncompresses the database files and sends an
e-mail message to a set of users notifying them that new data have arrived.
 
        2 Usage
        =======
 
   There are two programs: geo_import and db_import
 
                2.1 geo_import
                ~~~~~~~~~~~~~~
 
   geo_import has no parameters. It looks for the configuration file
$MK4_ROOT/local/{center_label}.dbi , sources it and checks whether program
db_import is running. If it is running, geo_import does nothing more and
terminates quietly with status code 0. If db_import is not running, geo_import
launches db_import, which does the actual work of downloading database files.
In the case of errors (which is unlikely) geo_import sends a message to the
Solve administrator and creates a stop file which prohibits launching db_import
until a user removes this stop file manually.
 
   NB: geo_import is a C-shell script and cannot be launched by cron directly.
A proxy sh-shell script should be used for launching geo_import by cron.
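
   A minimal sketch of such a proxy script is given below. It is not the
template shipped with Solve (see section 4 for geo_import_cron.sh.templ),
only an illustration of the idea. The default value of MK4_ROOT, the location
of geo_import under $MK4_ROOT/bin and the name launch_geo_import are
assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical proxy script for launching geo_import from cron.
# Assumed, not taken from the Solve distribution: the default value of
# MK4_ROOT and the location of geo_import under $MK4_ROOT/bin.
MK4_ROOT=${MK4_ROOT:-/data27/mk4}
export MK4_ROOT
GEO_IMPORT=${GEO_IMPORT:-$MK4_ROOT/bin/geo_import}

# launch_geo_import: run geo_import if it is executable, else complain.
launch_geo_import() {
    if [ -x "$GEO_IMPORT" ]; then
        # geo_import is a C-shell script; it is assumed to start with
        # a "#!" line naming csh, so it can be executed directly.
        "$GEO_IMPORT"
    else
        echo "geo_import_cron: cannot execute $GEO_IMPORT" >&2
        return 1
    fi
}

# Launch only when this file is run under its own name.
case $0 in
    *geo_import_cron*) launch_geo_import ;;
esac
```

   MK4_ROOT is exported explicitly because cron starts jobs with a very small
environment, and geo_import needs this variable to find its configuration
file.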
 
                2.2 db_import
                ~~~~~~~~~~~~~
 
  Normally the user does not launch db_import directly, although he/she, of
course, can. db_import returns error code 0 if it successfully reaches the end
of work: it either downloads the files and sends an e-mail message, or finds
that no files should be downloaded. Otherwise, it returns a non-zero error code
and prints an error message to the screen.
 
   db_import performs the following steps:
     1) reads the configuration file and checks its parameters.
     2) retrieves the list of the database files in the IVS Data Center
        for the specified period.
     3) gets the list of database files which are in the current
        catalogue system.
     4) gets the list of database files which are in the incoming
        directory and may not necessarily be in the catalogue system.
     5) reads get_list and noget_list which specify which
        databases should be downloaded and which databases should not
        be downloaded. Both lists may contain (and usually do contain)
        wild-card symbols.
     6) generates the list of the databases which are to be downloaded,
        i.e. those which are:
        -- in the IVS Data Center database list  AND
        -- not in the current catalogue system   AND
        -- not in the incoming directory         AND
        -- in the get_list                       AND
        -- not in the noget_list.
     7) if this list is empty then end of work. If not, then the URL list
        of the databases to be downloaded is built and written to a
        temporary file. These files are downloaded by using the wget
        program. If database files are compressed by gzip they are
        uncompressed.
     8) an e-mail message about successful downloading of database
        files is sent to the users specified in the configuration file.
        End of work.
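
   The selection rule of step 6 can be illustrated with a small sh sketch.
This is not the code of db_import: the helper names matches_any and in_list
and all database names are hypothetical. The sketch compares complete names
including the version, which is a simplification (see section 5 on how
db_import treats different versions of the same database).

```shell
#!/bin/sh
# matches_any NAME PATTERN...: succeed if NAME matches at least one
# shell wild-card pattern (* and ?).
matches_any() {
    name=$1
    shift
    for pat in "$@"; do
        # $pat is deliberately unquoted so that * and ? act as wild-cards.
        case $name in
            $pat) return 0 ;;
        esac
    done
    return 1
}

# in_list NAME ITEM...: succeed if NAME is literally equal to one ITEM.
in_list() {
    name=$1
    shift
    for item in "$@"; do
        [ "$name" = "$item" ] && return 0
    done
    return 1
}

# Hypothetical inputs. "set -f" keeps the shell from expanding the
# wild-cards against files in the current directory.
set -f
ivs_list="00OCT02XA_V001.gz 00OCT02XA_V003.gz 00OCT09XE_V004.gz 00OCT10XH_V004.gz"
catalogue_list="00OCT10XH_V004"
incoming_list=""
get_list="*"
noget_list="*_V001 *_V002 *_V003"

to_download=""
for file in $ivs_list; do
    db=${file%.gz}                              # suffix .gz is removed first
    in_list "$db" $catalogue_list && continue   # already in the catalogue
    in_list "$db" $incoming_list  && continue   # already in incoming
    matches_any "$db" $get_list   || continue   # not requested
    matches_any "$db" $noget_list && continue   # explicitly excluded
    to_download="$to_download $db"
done
set +f
echo "to download:$to_download"
```

   With these inputs the sketch prints "to download: 00OCT09XE_V004": the
first two files are excluded by the noget patterns and the last one is
already in the catalogue.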
 
        Usage: db_import [-h] -c <config_file> [-v <verbosity>]
 
  Options:
 
  -h -- if specified then a short help file is printed to the screen.
  -c -- file name of the configuration file. Refer to
        geo_import_01.txt for specifications of the configuration file.
  -v -- (optional) specifies verbosity level. If 1 then
        informational messages about progress appear on the screen.
        0 turns on silent mode -- only error messages will be printed.
 
  NB: the user should make sure that nobody else is running db_import at that
moment on this machine or on other machines connected to the local cluster!
 
        3 Configuration file
        ====================
 
  The configuration file controls the work of geo_import. geo_import looks for
the file $MK4_ROOT/local/{center_label}.dbi . The name of the configuration
file is passed as an argument to db_import.
 
                3.1 Format of configuration file
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
  The configuration file contains records of three types:
  1) comments: any line which begins with ##
  2) directives for db_import: a line which starts with # and which is
     not a comment line.
     A directive consists of three or more words separated by one or more
     blanks.

     Word1 -- symbol # -- directive attribute
     Word2 -- keyword
     Word3,4...  value(s) of the keyword

  3) settings of variables for geo_import: a line which starts with "set".
     The format is consistent with the format of the command set in C-shell:
     "set variable = value". If the value contains blank(s) then it should be
     enclosed in double quotes ("). Refer to HP-UX Reference Section 1,
     Volume 1, description of csh.

  The configuration file should contain definitions of all keywords and all
  variables listed in the next subsection.
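
  The three record types can be told apart mechanically. The sh sketch below
  is an illustration only, not the parser used by db_import. The function
  names classify_line and directive_value are hypothetical, and the trailing
  colon of the keyword is assumed from the example in section 3.3.

```shell
#!/bin/sh
# classify_line LINE: print the record type of one configuration line.
classify_line() {
    case $1 in
        '##'*)   echo comment ;;
        '#'*)    echo directive ;;
        'set '*) echo setting ;;
        *)       echo unknown ;;
    esac
}

# directive_value FILE KEYWORD: print the value(s) of a directive.
# Word 1 is "#", word 2 is the keyword followed by a colon (assumed from
# the example configuration file), words 3,4... are the value(s).
directive_value() {
    awk -v key="$2:" '$1 == "#" && $2 == key {
        sub(/^#[ \t]+[^ \t]+[ \t]+/, ""); print; exit }' "$1"
}
```

  For example, applied to the configuration file of section 3.3,
  "directive_value GSFC.dbi MAIL_COMMAND" would print mailx.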
 
                3.2 Descriptions of fields of configuration file
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
Directives of db_import.
 
IVS_DB_URL: URL of the root directory in the IVS Data Center which
            contains database files. The value should end with "db/". Since
            there are several IVS Data Centers, the user can select the Data
            Center to which the connection is the best.
 
WGET_EXE:   Filename with path of program wget.
 
GZIP_EXE:   Filename with path of program gzip.
 
TMP_DIR:    Name of temporary directory.
 
GET_FILE:   Filename with path of the file which lists database filenames to be
            downloaded. Database filenames may contain wild-card symbols * and ?
            which are interpreted in the same way as by the Unix shell.
            NB: normally database files are compressed by gzip in the IVS
                Data Center and have a suffix .gz . Suffix .gz is removed
                from the IVS database filenames before comparison and therefore
                filenames in GET_FILE should not contain it.
            Format of GET_FILE: records of variable length. Each record contains
            one filename which the user desires to download. Trailing blanks
            are ignored. Lines which start with ## are considered as comments.
 
NOGET_FILE: Filename with path of the file which lists database filenames NOT
            to be downloaded. Database filenames may contain wild-card symbols
            * and ? which are interpreted in the same way as by the Unix shell.
            NB: normally database files are compressed by gzip in the IVS
                Data Center and have a suffix .gz . Suffix .gz is removed
                from the IVS database filenames before comparison and therefore
                filenames in NOGET_FILE should not contain it.
            Format of NOGET_FILE is the same as GET_FILE: records of variable
            length. Each record contains one filename which the user does not
            want to download. Trailing blanks are ignored. Lines which start
            with ## are considered as comments.
 
INCOMING_DIR: Name of the directory where geo_import will put new databases.
              It is desirable to keep this directory on the local machine.
 
LOG_FILE:     Filename with path of the log file. geo_import records names
              of the files retrieved from the IVS Data Center.
 
DATE_START:   Start date. geo_import retrieves files for experiments
              conducted on DATE_START and later. The experiment date
              is derived from the database name. Format of DATE_START:
              yyyy.mm.dd where yyyy -- integer year number, mm -- integer
              month number, dd -- integer day number. Leading blanks are
              replaced with zeroes. For example, 2000.12.02 (December 2, 2000)
 
DATE_END:     End date. geo_import retrieves files for experiments
              conducted on DATE_END and before. The experiment date
              is derived from the database name. Format of DATE_END:
              yyyy.mm.dd where yyyy -- integer year number, mm -- integer
              month number, dd -- integer day number. Leading blanks are
              replaced with zeroes. For example, 2000.12.02 (December 2, 2000)
              NB: geo_import replaces a DATE_END in the future with
              tomorrow's date, which excludes erroneous databases that
              correspond to an experiment which has not yet been observed.
 
MAIL_COMMAND: Name of the command (with possible switches) for sending mail
              in non-interactive mode. mailx is recommended (but check,
              whether your system has it)
 
EMAIL_IMPORT: db_import may send a message about successful downloading of new
              data. If the value of EMAIL_IMPORT is NO then no e-mail message
              is sent. Any other value of EMAIL_IMPORT is considered as a
              blank-separated list of e-mail addresses of the recipients who
              receive the notification.
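
   The date window defined by DATE_START and DATE_END can be illustrated
with the sh sketch below. It is an illustration under assumptions, not the
code of db_import: the function names are hypothetical, the way the
experiment date is derived from the database name is not shown, and
tomorrow's date is passed in as an argument because plain sh has no portable
way to compute it.

```shell
#!/bin/sh
# date_to_int yyyy.mm.dd: print the date as the integer yyyymmdd.
date_to_int() {
    echo "$1" | tr -d '.'
}

# in_date_range DATE START END: succeed if START <= DATE <= END.
in_date_range() {
    d=$(date_to_int "$1")
    s=$(date_to_int "$2")
    e=$(date_to_int "$3")
    [ "$d" -ge "$s" ] && [ "$d" -le "$e" ]
}

# effective_end END TOMORROW: print END, or TOMORROW if END is later
# than TOMORROW (an end date in the future is cut at tomorrow).
effective_end() {
    if [ "$(date_to_int "$1")" -gt "$(date_to_int "$2")" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

   The comparison works only because the yyyy.mm.dd format has fixed width
with leading zeroes, so that removing the dots gives an integer which
orders in the same way as the dates.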
 
Settings for geo_import
 
dbi_exec      Full name of the db_import executable. solve_install puts
              db_import executable in $MK4_ROOT/bin/db_import . NB: check
              whether variable MK4_ROOT is defined in the context of the
              process which executes geo_import.
 
dbi_stopfile  Full name of the so-called stop file. Normally this file should
              not exist. If geo_import encounters an error then it sends an
              error message and creates the stop file. If the stop file exists
              then geo_import does not try to execute db_import and
              terminates quietly with status code 0 without an attempt to
              retrieve databases. This mechanism prevents an incessant flux of
              e-mail with error messages from geo_import scheduled by cron
              (the Solve administrator receives only one error message), but
              requires a "recovery action" after receiving the error message:
              the stop file should be manually removed in order to re-activate
              geo_import.
 
host_name     Host name where geo_import is running.
 
email_failure Blank-separated list of e-mail addresses of the persons who will
              receive error messages from geo_import. Failure of geo_import
              is unlikely, but if it occurs, geo_import is suspended: the stop
              file is created and geo_import will block other attempts to
              retrieve database files until the stop file is manually removed.
              This list may be empty. NB: if the list contains blanks it should
              be enclosed in ".
 
mail_command  Name of the command (with possible switches) for sending mail
              in non-interactive mode. mailx is recommended (but check,
              whether your system has it)
 
verbosity     Verbosity level. 0 should be set for normal operation. Value
              1 forces the appearance of informational messages which are
              of interest to a developer.
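
   The stop-file mechanism controlled by dbi_stopfile can be sketched in sh
as follows. This is an illustration, not the code of geo_import (which is a
C-shell script): the function name guarded_run is hypothetical, the error
message is written to standard error instead of being mailed, and the status
returned after a failure is an assumption.

```shell
#!/bin/sh
# guarded_run STOPFILE COMMAND [ARG...]: run COMMAND unless STOPFILE
# exists. If COMMAND fails, create STOPFILE so that later calls do
# nothing until the file is removed by hand.
guarded_run() {
    stopfile=$1
    shift
    if [ -f "$stopfile" ]; then
        return 0            # suspended: terminate quietly with status 0
    fi
    status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        date > "$stopfile"  # record when the failure happened
        echo "guarded_run: $1 failed with status $status;" \
             "remove $stopfile to re-activate" >&2
    fi
    return "$status"
}
```

   Since the stop file is checked before anything else is done, a failing
job scheduled by cron produces one error message and then stays silent,
which is the behaviour described for dbi_stopfile above.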
 
                3.3 Example of configuration file
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
#############################################################################
##                                                                         ##
##  DB_IMPORT configuration file for Goddard Space Flight Center (GSF)     ##
##                                                                         ##
##                                                    24-OCT-2000 09:40:35 ##
##                                                                         ##
#############################################################################
# IVS_DB_URL:      ftp://cddisa.gsfc.nasa.gov/vlbi/ivsdata/db/
# WGET_EXE:        /users/pet/bin/wget
# GZIP_EXE:        /usr/local/bin/gzip
# TMP_DIR:         /home/vlbiagent/temp/
# GET_FILE:        /data27/mk4/local/GSFC.dbi_get
# NOGET_FILE:      /data27/mk4/local/GSFC.dbi_noget
# INCOMING_DIR:    /box3/incoming/
# LOG_FILE:        /data1/save_files/dbi_GSFC.log
# DATE_START:      2000.01.01
# DATE_END:        2010.01.01
# MAIL_COMMAND:    mailx
# EMAIL_IMPORT:    dgg cal
#!
set dbi_exec      = "/data27/mk4/bin/db_import"
set dbi_stopfile  = "/data27/mk4/local/GSFC.dbi_stop"
set email_failure = "pet fgg"
set host_name     = "bootes"
set mail_command  = "mailx"
set verbosity     = 0
 
        4 Customization
        ===============
 
   geo_import and db_import are created during Solve installation. Source
code resides in the directory $MK4_ROOT/utils/db_import .
 
   In order to use geo_import you have to perform several steps:

1) create the db_import configuration file in directory $MK4_ROOT/local under
   the name <Center_label>.dbi
   You have to decide from which IVS Data Center you are going to get databases
   and in which date range. You also have to assign e-mail addresses of the
   persons who will receive e-mail messages about retrieving new database files
   and about geo_import failures.
 
2) create files GET_FILE and NOGET_FILE
 
   If you would like to get all files then you can specify * in GET_FILE and
   keep NOGET_FILE empty.
 
   If you do not want to receive preliminary versions of databases you can
   specify
 
   *_V001
   *_V002
   *_V003
 
   in NOGET_FILE.
 
3) decide which "user" will run geo_import regularly. It may be an account of
   a normal user or a "pseudo-user" for batch jobs. That account should have
   normal user privileges plus the ability to use cron. Use "crontab -l" to
   check whether the account has this privilege. NB: don't run geo_import from
   the root account!
 
4) copy $MK4_ROOT/utils/db_import/geo_import_cron.sh.templ to
   geo_import_cron.sh and $MK4_ROOT/utils/db_import/geo_import.crn.templ to
   geo_import.crn
 
   geo_import_cron.sh calls db_import under sh-shell. Path to db_import should
   be changed there.
 
   geo_import.crn calls geo_import_cron.sh every hour. You should change
   the path to geo_import_cron.sh and you can change the schedule of launching
   geo_import_cron.sh . Refer to the description of the Unix crontab command.
 
5) Test geo_import_cron.sh . If it works correctly and doesn't send error
   messages then add geo_import.crn to the cron table of the account which
   is running geo_import:

   crontab geo_import.crn

   NB: this command replaces the whole existing cron table of that account.
 
    That is all.
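
   For orientation, an hourly cron table entry of the kind described in
step 4 can be generated as in the sketch below. The template
geo_import.crn.templ shipped with Solve remains the reference. The path to
geo_import_cron.sh, the choice of minute 5 and the variable names CRN_FILE
and WRAPPER are assumptions made for this example.

```shell
#!/bin/sh
# Write a one-line cron table which runs the wrapper once per hour.
# CRN_FILE and WRAPPER are hypothetical names and values.
CRN_FILE=${CRN_FILE:-geo_import.crn}
WRAPPER=${WRAPPER:-/data27/mk4/local/geo_import_cron.sh}

# Fields: minute hour day-of-month month day-of-week command
printf '5 * * * * %s\n' "$WRAPPER" > "$CRN_FILE"
cat "$CRN_FILE"
```

   The sketch only writes and prints the file. Installing it is left to the
crontab command of step 5.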
 
        5 Hints
        =======
 
  If you have scheduled geo_import by cron you can change the configuration
file and the get and noget files at any time. The changes will take effect the
next time db_import is called.
 
  geo_import puts files in the incoming directory. It is wise to define this
directory as a "VLBI Catalogue area", or in other words to make it visible
to program catlg. If you removed databases from the incoming directory but did
not import them into your VLBI catalogue system, db_import will stubbornly
retrieve them again. If db_import retrieved a database file which you don't
want to keep in the incoming directory and are not going to import into your
VLBI catalogue, the only way to ensure that geo_import will not retrieve it
again is to add the database file name to the noget-list.
 
  If geo_import retrieved the database file version x, it will not
automatically retrieve the database version y when it appears in the IVS Data
Center, provided that version x is already either in the incoming directory
or in your VLBI catalogue system. If you don't need version x of the database
and are interested only in version y, then you first have to add the
database filename with version x to the noget-file and then remove the database
file from the catalogue system and from the incoming directory.
 
  You can use geo_import for retrieving a specific list of databases: it is
enough to specify their names (without suffix .gz) in the get_file.
 
  You can use geo_import without cron by launching it manually.
 
  In order to stop retrieving you have to
     1) kill db_import process
     2) remove geo_import.crn from cron-table
 
  If you got an error message from geo_import you have to re-activate it by
removing the stop file. geo_import remains totally disabled until you remove
the stop file.
 
        6 History
        =========
 
 2000.10.17  L. Petrov  Beginning of development.
 2000.10.19  L. Petrov  Release of version 1.00
 2000.10.26  L. Petrov  Development of documentation is completed.
