User manual for geo_import
Leonid Petrov
Abstract:
This document describes the program geo_import, which automatically downloads
new VLBI database files submitted to the IVS Data Center. It specifies the
format of the configuration file and gives hints on how to customize the
program for the needs of an Analysis Center.
Table of contents:
- 1 Overview
- 2 Usage
  - 2.1 geo_import
  - 2.2 db_import
- 3 Configuration file
  - 3.1 Format of configuration file
  - 3.2 Descriptions of fields of configuration file
  - 3.3 Example of configuration file
- 4 Customization
- 5 Hints
- 6 History
1 Overview
Program geo_import automatically downloads new databases submitted to the IVS
Data Center. It is assumed that geo_import is scheduled by cron and runs
automatically. geo_import calls program db_import unless db_import is already
running. db_import uses the popular free program wget
( http://www.gnu.org/software/wget/wget.html ) to query the IVS Data Center.
It retrieves the list of VLBI database files available at the IVS Data Center,
gets the list of databases in the current VLBI catalogue, gets the list of
database files in the incoming directory, applies the database filename
filters specified in the configuration files, and builds the list of databases
to be downloaded. If this list is not empty, it runs wget to retrieve the
database files into the incoming directory. Finally, db_import uncompresses the
database files and sends an e-mail message to a set of users notifying them
that new data have arrived.
2 Usage
There are two programs: geo_import and db_import.
2.1 geo_import
geo_import has no parameters. It looks for the configuration file
$MK4_ROOT/local/{center_label}.dbi, sources it, and checks whether program
db_import is running. If it is running, geo_import does nothing more and
terminates quietly with status code 0. If db_import is not running, geo_import
launches db_import, which does the actual work of downloading database files.
In case of errors (which are unlikely), geo_import sends a message to the Solve
administrator and creates a stop file, which prevents db_import from being
launched until the user removes this stop file manually.
NB: geo_import is a C-shell script and cannot be launched by cron directly. A
proxy .sh shell script should be used for launching geo_import by cron.
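The decision geo_import makes on every cron run (stop file, running db_import, failure handling) can be sketched as follows. This is an illustrative Python model, not the C-shell source: the function geo_import_once and its callback parameters are hypothetical, and treating a non-zero exit status of db_import as "an error" is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Callable


def geo_import_once(stopfile: Path,
                    db_import_running: bool,
                    launch_db_import: Callable[[], int],
                    notify_admin: Callable[[str], None]) -> str:
    """One cron-triggered run of geo_import (hypothetical sketch)."""
    # An existing stop file disables geo_import until someone removes it.
    if stopfile.exists():
        return "stopped"
    # Never start a second copy of db_import.
    if db_import_running:
        return "busy"
    # Otherwise db_import does the actual downloading.
    status = launch_db_import()
    if status != 0:
        # Assumption: a non-zero exit of db_import counts as an error.
        # Send one message, then create the stop file so that later cron
        # runs stay quiet instead of repeating the same e-mail.
        notify_admin(f"db_import terminated with status {status}")
        stopfile.touch()
        return "failed"
    return "done"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stop = Path(tmp) / "GSFC.dbi_stop"
        print(geo_import_once(stop, False, lambda: 1, print))  # failed
        print(geo_import_once(stop, False, lambda: 0, print))  # stopped
```

The second call returns "stopped" even though the simulated db_import would now succeed: this is the single-message behaviour that requires the manual recovery action described for dbi_stopfile.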
2.2 db_import
Normally the user doesn't launch db_import directly, although he/she, of
course, can. db_import returns exit code 0 if it successfully reaches the end of
its work: it either downloads files and sends an e-mail message, or finds that
no files should be downloaded. Otherwise, it returns a non-zero exit code and
prints an error message to the screen.
db_import performs the following steps:
1) reads the configuration file and checks its parameters.
2) retrieves the list of database files in the IVS Data Center
for the specified period.
3) gets the list of database files in the current
catalogue system.
4) gets the list of database files in the incoming
directory, which are not necessarily in the catalogue system.
5) reads get_list and noget_list, which specify which
databases should be downloaded and which databases should not
be downloaded. Both lists may contain (and usually do contain)
wild-card symbols.
6) generates the list of databases to be downloaded, i.e. those which are:
-- in the IVS Data Center database list AND
-- not in the current catalogue system AND
-- not in the incoming directory AND
-- in the get_list AND
-- not in the noget_list.
7) if this list is empty, the work ends. If not, a list of URLs
of the databases to be downloaded is built and written to a
temporary file. These files are downloaded using the wget
program. If the database files are compressed by gzip, they are
uncompressed.
8) an e-mail message about the successful download of database
files is sent to the users specified in the configuration file. End of
work.
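Steps 5 and 6 can be sketched in Python as below. This is not the db_import source: all function names are hypothetical, shell wild cards are emulated with fnmatch.fnmatchcase, and the example database names are made up. Comparison against the catalogue and the incoming directory ignores the version suffix _Vnnn; this is an assumption inferred from the behaviour described in section 5 (Hints), where having version x of a database blocks automatic retrieval of version y.

```python
import fnmatch
import re
from typing import Iterable, List

# Assumption: a database filename ends with a version suffix such as "_V004",
# optionally followed by ".gz" on the Data Center side.
_VERSION_SUFFIX = re.compile(r"_V\d+$")


def read_name_list(lines: Iterable[str]) -> List[str]:
    """Parse GET_FILE or NOGET_FILE contents into a list of name patterns."""
    patterns = []
    for line in lines:
        line = line.rstrip()  # trailing blanks (and the newline) are ignored
        if not line or line.startswith("##"):  # skip empty lines and comments
            continue
        patterns.append(line)
    return patterns


def strip_gz(name: str) -> str:
    """Remove the .gz suffix before any comparison."""
    return name[:-3] if name.endswith(".gz") else name


def without_version(name: str) -> str:
    """'00OCT17XA_V004' -> '00OCT17XA' (hypothetical naming scheme)."""
    return _VERSION_SUFFIX.sub("", name)


def select_downloads(ivs_files: Iterable[str], catalogue: Iterable[str],
                     incoming: Iterable[str], get_list: List[str],
                     noget_list: List[str]) -> List[str]:
    """Return the IVS files that pass all five conditions of step 6."""
    # Databases we already have, in any version (see section 5, Hints).
    have = {without_version(strip_gz(n)) for n in [*catalogue, *incoming]}
    selected = []
    for remote in ivs_files:
        name = strip_gz(remote)
        if without_version(name) in have:
            continue  # already in the catalogue or in the incoming directory
        if not any(fnmatch.fnmatchcase(name, p) for p in get_list):
            continue  # not requested in GET_FILE
        if any(fnmatch.fnmatchcase(name, p) for p in noget_list):
            continue  # explicitly excluded by NOGET_FILE
        selected.append(remote)
    return selected
```

Removing a file from the incoming directory without cataloguing it removes it from the "have" set, so the next run selects it again; only a NOGET_FILE entry prevents that, as section 5 explains.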
Usage: db_import [-h] -c <config_file> [-v <verbosity>]
Options:
-h -- if specified, a short help text is printed to the screen.
-c -- file name of the configuration file. Refer to
geo_import_01.txt for specifications of the configuration file.
-v -- (optional) specifies the verbosity level. 1: informational
messages about progress appear on the screen.
0: silent mode, only error messages are printed.
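For readers who wrap db_import in their own scripts, the option semantics of the usage line can be mirrored with Python's argparse. This is a sketch, not the real option parser: the function name parse_db_import_args is hypothetical, and the default verbosity of 0 when -v is omitted is an assumption, since the manual only says that -v is optional.

```python
import argparse


def parse_db_import_args(argv=None):
    """Mirror of: db_import [-h] -c <config_file> [-v <verbosity>] (sketch)."""
    parser = argparse.ArgumentParser(
        prog="db_import",
        description="Download new VLBI database files from an IVS Data Center.")
    # argparse adds -h/--help automatically; it prints a short help text.
    parser.add_argument("-c", dest="config_file", required=True,
                        help="name of the configuration file")
    # Assumption: verbosity defaults to 0 (silent) when -v is omitted.
    parser.add_argument("-v", dest="verbosity", type=int, default=0,
                        help="1: progress messages; 0: error messages only")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_db_import_args(["-c", "/data27/mk4/local/GSFC.dbi", "-v", "1"])
    print(args.config_file, args.verbosity)
```

Marking -c as required makes argparse reject a call without a configuration file, which matches the usage line, where -c is the only option not shown in brackets.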
NB: the user should make sure that nobody else is running db_import at this
moment on this machine or on other machines in the local cluster!
3 Configuration file
The configuration file controls the work of geo_import. geo_import looks for
the file $MK4_ROOT/local/{center_label}.dbi. The name of the configuration file
is passed as an argument to db_import.
3.1 Format of configuration file
The configuration file contains records of three types:
1) comments: any line which begins with ##
2) directives for db_import: any line which begins with # and which is
not a comment line.
A directive consists of three or more words separated by one or more
blanks:
Word1 -- symbol # -- directive attribute
Word2 -- keyword
Word3,4... -- value(s) of the keyword
3) variable settings for geo_import: lines which begin with "set".
The format is consistent with the format of the set command in C-shell:
"set variable = value". If the value contains blank(s), it should be
enclosed in double quotes ("). Refer to HP-UX Reference Section 1,
Volume 1, description of csh.
The configuration file should contain definitions of all keywords and all
variables listed in the next subsection.
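The three record types can be told apart with a small parser, sketched here in Python. This is not how db_import reads the file; parse_dbi_config is a hypothetical name. Two behaviours are assumptions: the trailing colon of keywords such as "IVS_DB_URL:" is dropped, and lines starting with # that have fewer than three words (such as the "#!" line in the example file of section 3.3) are skipped. Since csh treats lines beginning with # as comments, this layout presumably lets geo_import source the same file that db_import parses.

```python
import shlex
from typing import Dict, List, Tuple


def parse_dbi_config(text: str) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split a .dbi file into db_import directives and geo_import settings."""
    directives: Dict[str, List[str]] = {}
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # blank line or comment
        if line.startswith("#"):
            words = line.split()  # words are separated by one or more blanks
            # Directive: "#", keyword, value(s). Shorter "#" lines are skipped.
            if len(words) >= 3 and words[0] == "#":
                directives[words[1].rstrip(":")] = words[2:]
            continue
        if line.startswith("set "):
            # csh syntax: set variable = value; the value may be "quoted".
            name, sep, value = line[4:].partition("=")
            if sep:
                settings[name.strip()] = " ".join(shlex.split(value))
    return directives, settings
```

shlex.split removes the double quotes the way a shell would, so set email_failure = "pet fgg" yields the single value pet fgg.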
3.2 Descriptions of fields of configuration file
Directives of db_import.
IVS_DB_URL: URL of the root directory at the IVS Data Center which
contains database files. The value should end with "db/". Since
there are several IVS Data Centers, the user can select the Data
Center with the best connection.
WGET_EXE: Filename with path of program wget.
GZIP_EXE: Filename with path of program gzip.
TMP_DIR: Name of temporary directory.
GET_FILE: Filename with path of the file which lists database filenames to be
downloaded. Database filenames may contain wild-card symbols * and ?,
which are interpreted the same way as in the Unix shell.
NB: normally database files are compressed by gzip in the IVS
Data Center and have the suffix .gz. The suffix .gz is removed
from the IVS database filenames before comparison, therefore
filenames in GET_FILE should not contain it.
Format of GET_FILE: records of variable length. Each record contains
one filename which the user wishes to download. Trailing blanks are
ignored. Lines which start with ## are considered comments.
NOGET_FILE: Filename with path of the file which lists database filenames NOT
to be downloaded. Database filenames may contain wild-card symbols
* and ?, which are interpreted the same way as in the Unix shell.
NB: normally database files are compressed by gzip in the IVS
Data Center and have the suffix .gz. The suffix .gz is removed
from the IVS database filenames before comparison, therefore
filenames in NOGET_FILE should not contain it.
Format of NOGET_FILE is the same as GET_FILE: records of variable
length. Each record contains one filename which the user does NOT wish
to download. Trailing blanks are ignored. Lines which start with ##
are considered comments.
INCOMING_DIR: Directory where geo_import will put new databases. It is
desirable to keep this directory on the local machine.
LOG_FILE: Filename with path of the log file, in which geo_import records
the names of the files retrieved from the IVS Data Center.
DATE_START: Start date. geo_import retrieves files for experiments
conducted on DATE_START or later. The experiment date
is derived from the database name. Format of DATE_START:
yyyy.mm.dd, where yyyy is the integer year number, mm the integer
month number, dd the integer day number. Leading blanks are
replaced with zeroes. For example, 2000.12.02 (December 2, 2000).
DATE_END: End date. geo_import retrieves files for experiments
conducted on DATE_END or earlier. The experiment date
is derived from the database name. Format of DATE_END:
yyyy.mm.dd, where yyyy is the integer year number, mm the integer
month number, dd the integer day number. Leading blanks are
replaced with zeroes. For example, 2000.12.02 (December 2, 2000).
NB: geo_import replaces dates in the future with tomorrow's date.
This prevents retrieving bogus databases which correspond to
experiments which have not yet been observed.
MAIL_COMMAND: Name of the command (with possible switches) for sending mail
in non-interactive mode. mailx is recommended (but check
whether your system has it).
EMAIL_IMPORT: db_import may send a message about successful downloading of
new data. If the value of EMAIL_IMPORT is NO, no e-mail messages are
sent. Any other value of EMAIL_IMPORT is considered a blank-separated
list of e-mail addresses of the recipients who receive the
notification.
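The date rules and the EMAIL_IMPORT convention can be sketched in Python as follows. The function names are hypothetical, and the experiment date is passed in directly because the exact database naming scheme it is derived from is not specified here. Applying the "future dates become tomorrow" rule to both DATE_START and DATE_END is an assumption.

```python
import datetime as dt
from typing import List, Optional, Tuple


def parse_dbi_date(text: str) -> dt.date:
    """Parse a DATE_START/DATE_END value written as yyyy.mm.dd."""
    year, month, day = (int(part) for part in text.strip().split("."))
    return dt.date(year, month, day)


def effective_date_range(start_text: str, end_text: str,
                         today: Optional[dt.date] = None) -> Tuple[dt.date, dt.date]:
    """Return (start, end) with dates in the future replaced by tomorrow."""
    if today is None:
        today = dt.date.today()
    tomorrow = today + dt.timedelta(days=1)
    start, end = parse_dbi_date(start_text), parse_dbi_date(end_text)
    return min(start, tomorrow), min(end, tomorrow)


def in_date_range(experiment_date: dt.date, start: dt.date, end: dt.date) -> bool:
    # Inclusive on both ends: "DATE_START and later", "DATE_END and before".
    return start <= experiment_date <= end


def notification_recipients(email_import: str) -> List[str]:
    # "NO" disables notification; anything else is a blank-separated list.
    return [] if email_import.strip() == "NO" else email_import.split()
```

With DATE_END set to 2010.01.01 and the program run on 2000.10.24, the effective end date becomes 2000.10.25.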
Settings for geo_import
dbi_exec Full name of the db_import executable. solve_install puts the
db_import executable in $MK4_ROOT/bin/db_import. NB: check
whether the variable MK4_ROOT is defined in the context of the
process which executes geo_import.
dbi_stopfile Full name of the so-called stop file. Normally this file should
not exist. If geo_import encounters an error, it sends an
error message and creates the stop file. If the stop file exists,
geo_import does not try to execute db_import and
terminates quietly with status code 0 without attempting to retrieve
databases. This mechanism prevents an incessant flood of e-mail
error messages from geo_import scheduled by cron (the Solve
administrator receives only one error message), but requires a
"recovery action" after receiving the error message: the stop file
should be removed manually in order to re-activate geo_import.
host_name Host name where geo_import is running.
email_failure Blank-separated list of e-mail addresses of the persons who will
receive error messages from geo_import. Failure of geo_import
is unlikely, but if it occurs, geo_import is suspended: the stop
file is created and geo_import will block further attempts to
retrieve database files until the stop file is manually removed.
This list may be empty. NB: if the list contains blanks, it should
be enclosed in double quotes (").
mail_command Name of the command (with possible switches) for sending mail
in non-interactive mode. mailx is recommended (but check
whether your system has it).
verbosity Verbosity level. 0 should be set for normal operation. Value
1 enables informational messages which are of interest to
developers.
3.3 Example of configuration file
#############################################################################
## ##
## DB_IMPORT configuration file for Goddard Space Flight Center (GSF) ##
## ##
## 24-OCT-2000 09:40:35 ##
## ##
#############################################################################
# IVS_DB_URL: ftp://cddisa.gsfc.nasa.gov/vlbi/ivsdata/db/
# WGET_EXE: /users/pet/bin/wget
# GZIP_EXE: /usr/local/bin/gzip
# TMP_DIR: /home/vlbiagent/temp/
# GET_FILE: /data27/mk4/local/GSFC.dbi_get
# NOGET_FILE: /data27/mk4/local/GSFC.dbi_noget
# INCOMING_DIR: /box3/incoming/
# LOG_FILE: /data1/save_files/dbi_GSFC.log
# DATE_START: 2000.01.01
# DATE_END: 2010.01.01
# MAIL_COMMAND: mailx
# EMAIL_IMPORT: dgg cal
#!
set dbi_exec = "/data27/mk4/bin/db_import"
set dbi_stopfile = "/data27/mk4/local/GSFC.dbi_stop"
set email_failure = "pet fgg"
set host_name = "bootes"
set mail_command = "mailx"
set verbosity = 0
4 Customization
geo_import and db_import are created during Solve installation. The source
code resides in the directory $MK4_ROOT/utils/db_import.
In order to use geo_import you have to take several steps:
1) create a db_import configuration file in the directory $MK4_ROOT/local
under the name <Center_label>.dbi.
You have to decide from which IVS Data Center you are going to get databases
and for which date range. You also have to assign e-mail addresses of the
persons who will receive e-mail messages about retrieved database files and
about geo_import failures.
2) create the files GET_FILE and NOGET_FILE.
If you would like to get all files, you can specify * in GET_FILE and
keep NOGET_FILE empty.
If you don't want to receive preliminary versions of databases, you can
specify
*_V001
*_V002
*_V003
in NOGET_FILE.
3) decide which "user" will run geo_import regularly. It may be a normal user
account or a "pseudo-user" account for batch jobs. That account should have
normal user privileges plus the ability to use cron. Run "crontab -l" to check
whether the account has this privilege. NB: don't run geo_import from the root
account!
4) copy $MK4_ROOT/utils/db_import/geo_import_cron.sh.templ to
geo_import_cron.sh and $MK4_ROOT/utils/db_import/geo_import.crn.templ to
geo_import.crn.
geo_import_cron.sh calls db_import under the sh shell. The path to db_import
should be changed there.
geo_import.crn calls geo_import_cron.sh every hour. You should change the
path to geo_import_cron.sh there, and you can change the schedule of
launching geo_import_cron.sh. Refer to the description of the Unix crontab
command.
5) Test geo_import_cron.sh. If it works correctly and doesn't send error
messages, add it to the cron table of the account which
runs geo_import:
crontab geo_import.crn
That is all.
5 Hints
If you have scheduled geo_import by cron, you can change the configuration
file and the get and noget files at any time. The changes will take effect the
next time db_import is called.
geo_import puts files in an incoming directory. It is wise to make this
directory a "VLBI Catalogue area", in other words, to make it visible to the
program catlg. If you remove databases from the incoming directory but don't
import them into your VLBI catalogue system, db_import will stubbornly retrieve
them again. If db_import retrieved a database file which you don't want to
keep in the incoming directory and are not going to import into your VLBI
catalogue, the only way to ensure that geo_import will not retrieve it again
is to add the database filename to the noget-list.
If geo_import retrieved version x of a database file, it will not
automatically retrieve version y of that database when it appears in the IVS
Data Center, provided that version x is already either in the incoming
directory or in your VLBI catalogue system. If you don't need version x of the
database but are interested only in version y, you first have to add the
database filename with version x to the noget-file, and then remove the
database file from the catalogue system and from the incoming directory.
You can use geo_import for retrieving a specific list of databases: it is
enough to specify their names (without the suffix .gz) in the get_file.
You can use geo_import without cron by launching it manually.
In order to stop retrieving you have to
1) kill the db_import process
2) remove geo_import.crn from the cron table
If you got an error message from geo_import, you have to re-activate it by
removing the stop file. geo_import will be completely disabled until you remove
the stop file.
6 History
2000.10.17 L. Petrov Beginning of development.
2000.10.19 L. Petrov Release of version 1.00
2000.10.26 L. Petrov Development of documentation is completed.
Questions and comments about this guide should be sent to:
Leonid Petrov ( pet@leo.gsfc.nasa.gov )
Last update: 2000.10.23