User manual for geo_import

Leonid Petrov

Abstract:

Table of contents:

1   Overview

2   Usage

2.1   geo_import
2.2   db_import

3   Configuration file

3.1   Format of configuration file
3.2   Descriptions of fields of configuration file
3.3   Example of configuration file

4   Customization

5   Hints

6   History


1   Overview

Program geo_import automatically downloads new databases submitted to the IVS Data Center. It is assumed that geo_import is scheduled by cron and runs automatically. geo_import calls program db_import unless db_import is already running. db_import uses the popular freeware program wget ( http://www.gnu.org/software/wget/wget.html ) for querying the IVS Data Center. It retrieves the list of VLBI database files in the IVS Data Center, gets the list of databases in the current VLBI catalogue, checks the list of database files in the incoming directory, applies the database file name filters specified in the configuration file, and builds the list of databases which are to be downloaded. If this list is not empty, it runs wget to retrieve the database files into the incoming directory. Finally, db_import uncompresses the database files and sends an e-mail message to a set of users notifying them that new data have just arrived.

2   Usage

There are two programs: geo_import and db_import.

2.1   geo_import

geo_import has no parameters. It looks for the configuration file $MK4_ROOT/local/{center_label}.dbi, sources it, and checks whether program db_import is running. If db_import is running, geo_import does nothing more and terminates quietly with status code 0. If db_import is not running, geo_import launches it, and db_import does the actual work of downloading database files. In case of errors (which are unlikely), geo_import sends a message to the Solve administrator and creates a stop file, which prevents launching db_import until a user removes this stop file manually. NB: geo_import is a C-shell script and cannot be launched by cron directly. A proxy sh-shell script (.sh) should be used for launching geo_import by cron.
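The decision logic described above can be sketched in Python as follows. This is a hypothetical illustration, not the actual C-shell code of geo_import: the function name geo_import_step and the launch callable are invented for the example, checking the stop file before checking for a running db_import follows the description of dbi_stopfile in section 3.2, and treating a non-zero exit status of db_import as "an error" is an assumption. E-mail notification is left out.

```python
import os


def geo_import_step(stopfile, dbi_running, launch):
    """Return which branch of the geo_import logic is taken (hypothetical sketch).

    stopfile    -- path from the dbi_stopfile setting
    dbi_running -- True if a db_import process is already running
    launch      -- callable that runs db_import and returns its exit status
    """
    # An existing stop file blocks all work until it is removed by hand.
    if os.path.exists(stopfile):
        return "stopped"
    # db_import is still busy with a previous run: terminate quietly.
    if dbi_running:
        return "busy"
    status = launch()
    if status != 0:
        # Assumed error case: create the stop file so that cron does not keep
        # re-launching db_import (the real script also mails email_failure).
        with open(stopfile, "w") as f:
            f.write(f"db_import exited with status {status}\n")
        return "failed"
    return "done"
```

Once the stop file exists, every later call returns "stopped" until the file is deleted, which is the "one error message only" behaviour the manual describes.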

2.2   db_import

Normally a user doesn't launch db_import directly, although he or she, of course, can. db_import returns error code 0 if it successfully reaches the end of its work: either it downloads the files and sends an e-mail message, or it finds that no files should be downloaded. Otherwise, it returns a non-zero error code and prints an error message to the screen.

db_import performs the following steps:

1) reads the configuration file and checks its parameters;
2) retrieves the list of database files in the IVS Data Center for the specified period;
3) gets the list of database files which are in the current catalogue system;
4) gets the list of database files which are in the incoming directory and are not necessarily in the catalogue system;
5) reads the get list and the noget list, which specify which databases should and which should not be downloaded. Both lists may contain (and usually do contain) wild-card symbols;
6) generates the list of databases which are to be downloaded, i.e. the databases which are
   -- in the IVS Data Center database list AND
   -- not in the current catalogue system AND
   -- not in the incoming directory AND
   -- in the get list AND
   -- not in the noget list;
7) if this list is empty, the work ends. Otherwise, the list of URLs of the databases to be downloaded is built and written to a temporary file. These files are downloaded with the wget program. Database files compressed by gzip are uncompressed;
8) an e-mail message about the successful download of database files is sent to the users specified in the configuration file. End of work.

Usage: db_import [-h] -c <config_file> [-v <verbosity>]

Options:
-h -- if specified, a short help message is printed to the screen.
-c -- file name of the configuration file. Refer to geo_import_01.txt for the specification of the configuration file.
-v -- (optional) verbosity level. If 1, informational messages about progress appear on the screen. 0 turns on silent mode: only error messages are printed.
NB: before launching db_import manually, make sure that nobody else is running db_import at this moment on this machine or on other machines of your local cluster!
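Step 6 amounts to a set filter. The Python sketch below is a hypothetical reconstruction, not db_import's actual code. It strips the .gz suffix before comparison and matches patterns with Python's fnmatch.fnmatchcase, which handles * and ? the way the Unix shell does. Following the Hints section, which says that a newer version of a database is not retrieved when an older version is already present locally, it compares remote and local names with the version suffix (assumed to look like _V001) removed. That comparison rule is an inference, not documented behaviour, and the file names in the test are invented.

```python
import fnmatch
import re

# Assumed version suffix of a database name, e.g. "_V001" (hypothetical).
VERSION_RE = re.compile(r"_V\d{3}$")


def strip_gz(name):
    """Remove the .gz suffix, as db_import does before comparing names."""
    return name[:-3] if name.endswith(".gz") else name


def base_name(name):
    """Database name without its version suffix."""
    return VERSION_RE.sub("", name)


def select_downloads(remote, catalogue, incoming, get_list, noget_list):
    """Return the remote file names that pass all five conditions of step 6."""
    # Databases already present locally, in any version.
    have = {base_name(strip_gz(n)) for n in list(catalogue) + list(incoming)}
    selected = []
    for fname in remote:
        name = strip_gz(fname)
        if base_name(name) in have:
            continue  # already in the catalogue or in the incoming directory
        if not any(fnmatch.fnmatchcase(name, p) for p in get_list):
            continue  # not requested in the get list
        if any(fnmatch.fnmatchcase(name, p) for p in noget_list):
            continue  # explicitly excluded by the noget list
        selected.append(fname)  # keep .gz: needed to build the download URL
    return selected
```

The returned names keep their .gz suffix because step 7 builds the download URLs from them.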

3   Configuration file

The configuration file controls the work of geo_import. geo_import looks for the file $MK4_ROOT/local/{center_label}.dbi . The name of the configuration file is passed as an argument to db_import.

3.1   Format of configuration file

The configuration file contains records of three types:

1) comments: any line which begins with ## ;
2) directives for db_import: a line which begins with # and which is not a comment line. A directive consists of three or more words separated by one or more blanks:
   Word1 -- symbol # (directive attribute);
   Word2 -- keyword;
   Word3, Word4, ... -- value(s) of the keyword;
3) settings of variables for geo_import: a line which begins with "set". The format is the same as that of the C-shell command set: "set variable = value". If the value contains blank(s), it should be enclosed in double quotes ("). Refer to HP-UX Reference Section 1, Volume 1, description of csh.

The configuration file should contain definitions of all keywords and all variables listed in the next subsection.
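A minimal Python sketch of a reader for the three record types, assuming the layout shown in the example of section 3.3. The function parse_config is hypothetical; geo_import itself simply sources the file in C-shell, and db_import reads it with its own code. shlex.split is used here to remove the double quotes around values which contain blanks.

```python
import shlex


def parse_config(text):
    """Split a configuration file into db_import directives and csh settings.

    Returns (directives, settings): directives maps a keyword such as
    "IVS_DB_URL" to its list of value words; settings maps a "set"
    variable such as "dbi_exec" to its value string.
    """
    directives, settings = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("##"):
            continue  # blank line or comment
        words = line.split()
        if words[0] == "#" and len(words) >= 3:
            # Word1 = "#", Word2 = keyword (trailing colon dropped), rest = values
            directives[words[1].rstrip(":")] = words[2:]
        elif words[0] == "set":
            # csh syntax "set variable = value"; shlex removes the quotes
            # around values that contain blanks, e.g. "pet fgg".
            name, _, value = line[3:].partition("=")
            settings[name.strip()] = " ".join(shlex.split(value))
        # any other line (e.g. "#!" in the example) is ignored by this sketch
    return directives, settings
```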

3.2   Descriptions of fields of configuration file

Directives of db_import:

IVS_DB_URL: URL of the root directory in the IVS Data Center which contains database files. The value should end with the letters "db/". Since there are several IVS Data Centers, the user can select the Data Center with the best connection.

WGET_EXE: File name with path of program wget.

GZIP_EXE: File name with path of program gzip.

TMP_DIR: Name of the temporary directory.

GET_FILE: File name with path of the file which lists database file names to be downloaded. Database file names may contain wild-card symbols * and ?, which are interpreted the same way as by the Unix shell. NB: normally database files in the IVS Data Center are compressed by gzip and have the suffix .gz . The suffix .gz is removed from the IVS database file names before comparison, and therefore file names in GET_FILE should not contain it. Format of GET_FILE: records of variable length. Each record contains one file name which the user wants to download. Trailing blanks are ignored. Lines which start with ## are considered comments.

NOGET_FILE: File name with path of the file which lists database file names NOT to be downloaded. Database file names may contain wild-card symbols * and ?, which are interpreted the same way as by the Unix shell. As with GET_FILE, the suffix .gz is removed from the IVS database file names before comparison, and therefore file names in NOGET_FILE should not contain it. The format of NOGET_FILE is the same as that of GET_FILE: records of variable length, each containing one file name which the user does NOT want to download. Trailing blanks are ignored. Lines which start with ## are considered comments.

INCOMING_DIR: Name of the directory where geo_import puts new databases. It is desirable to keep this directory on the local machine.

LOG_FILE: File name with path of the log file. geo_import records in it the names of the files retrieved from the IVS Data Center.

DATE_START: Start date. geo_import retrieves files for experiments conducted on DATE_START or later. The experiment date is derived from the database name. Format of DATE_START: yyyy.mm.dd, where yyyy is the integer year number, mm the integer month number, and dd the integer day number. Leading blanks are replaced with blanks. For example, 2000.12.02 (December 2, 2000).

DATE_END: End date. geo_import retrieves files for experiments conducted on DATE_END or earlier. The experiment date is derived from the database name. The format of DATE_END is the same as that of DATE_START, for example, 2000.12.02 (December 2, 2000). NB: geo_import replaces dates in the future with tomorrow's date. This prevents retrieving bogus databases which correspond to experiments that have not yet been observed.

MAIL_COMMAND: Name of the command (with possible switches) for sending mail in non-interactive mode. mailx is recommended (but check whether your system has it).

EMAIL_IMPORT: db_import may send a message about the successful download of new data. If the value of EMAIL_IMPORT is NO, no e-mail message is sent. Any other value of EMAIL_IMPORT is considered a blank-separated list of e-mail addresses of the recipients of the notification.

Settings for geo_import:

dbi_exec -- Full name of the db_import executable. solve_install puts the db_import executable in $MK4_ROOT/bin/db_import . NB: check whether variable MK4_ROOT is defined in the context of the process which executes geo_import.

dbi_stopfile -- Full name of the so-called stop file. Normally this file should not exist. If geo_import encounters an error, it sends an error message and creates the stop file. If the stop file exists, geo_import doesn't try to execute db_import and terminates quietly with status code 0 without attempting to retrieve databases. This mechanism prevents an incessant flood of e-mail with error messages from geo_import scheduled by cron (the Solve administrator receives only one error message), but requires a "recovery action" after the error message is received: the stop file should be removed manually in order to re-activate geo_import.

host_name -- Host name where geo_import is running.

email_failure -- Blank-separated list of e-mail addresses of the persons who will receive error messages from geo_import. Failure of geo_import is unlikely, but if it occurs, geo_import is suspended: the stop file is created and geo_import blocks further attempts to retrieve database files until the stop file is manually removed. This list may be empty. NB: if the list contains blanks, it should be enclosed in double quotes (").

mail_command -- Name of the command (with possible switches) for sending mail in non-interactive mode. mailx is recommended (but check whether your system has it).

verbosity -- Verbosity level. 0 should be set for normal operation. Value 1 turns on informational messages which are of interest to developers.
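The date selection can be illustrated with a short Python sketch. The manual does not spell out how the experiment date is derived from the database name; the sketch assumes names of the form YYMMMDD followed by a session code, e.g. 00OCT19XA_V001, and a hypothetical century pivot at 70 for the two-digit year. Replacing a future date with tomorrow's date follows the NB in the description of DATE_END; applying it to DATE_END only is an assumption. All function names are invented for the example.

```python
import datetime as dt

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_date(text):
    """Parse a DATE_START/DATE_END value in yyyy.mm.dd format."""
    return dt.datetime.strptime(text.strip(), "%Y.%m.%d").date()


def experiment_date(db_name):
    """Experiment date from a name like 00OCT19XA_V001 (assumed YYMMMDD prefix)."""
    yy, mon, dd = int(db_name[0:2]), db_name[2:5].upper(), int(db_name[5:7])
    year = 1900 + yy if yy >= 70 else 2000 + yy  # hypothetical century pivot
    return dt.date(year, MONTHS[mon], dd)


def in_date_range(db_name, date_start, date_end, today=None):
    """True if the experiment date lies within [DATE_START, DATE_END]."""
    if today is None:
        today = dt.date.today()
    # A DATE_END in the future is replaced by tomorrow's date.
    end = min(parse_date(date_end), today + dt.timedelta(days=1))
    return parse_date(date_start) <= experiment_date(db_name) <= end
```

With the example settings DATE_START 2000.01.01 and DATE_END 2010.01.01, a run on 2000.10.24 would accept experiments up to 2000.10.25 only.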

3.3   Example of configuration file

#############################################################################
##                                                                         ##
##   DB_IMPORT configuration file for Goddard Space Flight Center (GSF)    ##
##                                                                         ##
##   24-OCT-2000 09:40:35                                                  ##
##                                                                         ##
#############################################################################
# IVS_DB_URL: ftp://cddisa.gsfc.nasa.gov/vlbi/ivsdata/db/
# WGET_EXE: /users/pet/bin/wget
# GZIP_EXE: /usr/local/bin/gzip
# TMP_DIR: /home/vlbiagent/temp/
# GET_FILE: /data27/mk4/local/GSFC.dbi_get
# NOGET_FILE: /data27/mk4/local/GSFC.dbi_noget
# INCOMING_DIR: /box3/incoming/
# LOG_FILE: /data1/save_files/dbi_GSFC.log
# DATE_START: 2000.01.01
# DATE_END: 2010.01.01
# MAIL_COMMAND: mailx
# EMAIL_IMPORT: dgg cal
#!
set dbi_exec = "/data27/mk4/bin/db_import"
set dbi_stopfile = "/data27/mk4/local/GSFC.dbi_stop"
set email_failure = "pet fgg"
set host_name = "bootes"
set mail_command = "mailx"
set verbosity = 0

4   Customization

geo_import and db_import are created during Solve installation. The source code resides in the directory $MK4_ROOT/utils/db_import . In order to use geo_import, you have to do several steps:

1) Create the db_import configuration file in the directory $MK4_ROOT/local under the name <Center_label>.dbi . You have to decide from which IVS Data Center you are going to get databases and for which date range. You also have to assign the e-mail addresses of the persons who will receive e-mail messages about retrieved database files and about geo_import failures.

2) Create the files GET_FILE and NOGET_FILE. If you would like to get all files, you can specify * in GET_FILE and keep NOGET_FILE empty. If you don't want to receive preliminary versions of databases, you can specify *_V001, *_V002 and *_V003 in NOGET_FILE, one per record.

3) Decide which "user" will run geo_import regularly. It may be the account of a normal user or a "pseudo-user" for batch jobs. That account should have normal user privileges plus the ability to use cron. Check with "crontab -l" whether the account has this privilege. NB: don't run geo_import from the root account!

4) Copy $MK4_ROOT/utils/db_import/geo_import_cron.sh.templ to geo_import_cron.sh and $MK4_ROOT/utils/db_import/geo_import.crn.templ to geo_import.crn . geo_import_cron.sh calls geo_import under the sh shell; the path to geo_import should be changed there. geo_import.crn calls geo_import_cron.sh every hour. You should change the path to geo_import_cron.sh there, and you can change the schedule of launching geo_import_cron.sh . Refer to the description of the Unix crontab command.

5) Test geo_import_cron.sh . If it works correctly and doesn't send error messages, add it to the cron table of the account which is running geo_import:

   crontab geo_import.crn

   NB: crontab with a file argument replaces the whole cron table of the account. If the account already has other cron entries, merge them into geo_import.crn first.

That is all.
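Step 2 can be illustrated with a small Python sketch of how the records of GET_FILE and NOGET_FILE might be read and applied. read_name_list and wanted are hypothetical helpers, not part of the Solve distribution. Dropping trailing blanks and ## comment lines follows the format given in section 3.2; skipping empty records is an assumption. Patterns are matched with fnmatch.fnmatchcase, which treats * and ? the way the Unix shell does.

```python
import fnmatch


def read_name_list(text):
    """Return the file names (or wild-card patterns) listed in GET_FILE/NOGET_FILE text."""
    names = []
    for line in text.splitlines():
        name = line.rstrip()  # trailing blanks are ignored
        if not name or name.startswith("##"):
            continue  # empty record (assumed skipped) or comment
        names.append(name)
    return names


def wanted(db_name, get_patterns, noget_patterns):
    """True if db_name (without .gz) matches GET_FILE and no NOGET_FILE pattern."""
    return (any(fnmatch.fnmatchcase(db_name, p) for p in get_patterns)
            and not any(fnmatch.fnmatchcase(db_name, p) for p in noget_patterns))
```

With * in GET_FILE and the three preliminary-version patterns in NOGET_FILE, a fourth version such as 00OCT19XA_V004 is accepted while 00OCT19XA_V001 is rejected.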

5   Hints

If you have scheduled geo_import by cron, you can change the configuration file and the get and noget files at any time. The changes take effect the next time db_import is called.

geo_import puts files in an incoming directory. It is wise to make this directory a "VLBI Catalogue area", in other words, to make it visible to program catlg.

If you removed databases from the incoming directory but didn't import them into your VLBI catalogue system, db_import will stubbornly retrieve them again. If db_import retrieved a database file which you don't want to keep in the incoming directory and are not going to import into your VLBI catalogue, the only way to ensure that geo_import will not retrieve it again is to add the database file name to the noget list.

If geo_import retrieved version x of a database file, it will not automatically retrieve version y of that database when it appears in the IVS Data Center, provided that version x is already either in the incoming directory or in your VLBI catalogue system. If you don't need version x of the database and are interested only in version y, you first have to add the database file name with version x to the noget file, and then remove the database file from the catalogue system and from the incoming directory.

You can use geo_import for retrieving a certain list of databases: it is enough to specify their names (without the suffix .gz) in the get file.

You can use geo_import without cron by launching it manually.

In order to stop retrieving, you have to
1) kill the db_import process;
2) remove geo_import.crn from the cron table.

If you got an error message from geo_import, you have to re-activate it by removing the stop file. geo_import remains completely disabled until you remove the stop file.

6   History

2000.10.17  L. Petrov  Beginning of development.
2000.10.19  L. Petrov  Release of version 1.00.
2000.10.26  L. Petrov  Development of documentation is completed.



Questions and comments about this guide should be sent to:

Leonid Petrov ( pet@leo.gsfc.nasa.gov )


Last update: 2000.10.23