PIMA User Guide

Date of last modification: 2017.03.15_22:35:09

This document describes how to use software PIMA for processing VLBI visibility data. PIMA performs data calibration, fringe fitting, and exporting results of fringe fitting in the form that can be digested by VTD/Post-Solve and Difmap software for astrometry/geodesy analysis and for imaging.


Contents:

Introduction

PIMA is software for processing the visibilities data from VLBI experiments. It performs data inspection, data calibration, and fringe fitting. PIMA is designed to process multi-source experiments that are common for astronomical surveys and geodesy observations. PIMA has output interface with AIPS, DIFMAP, and VTD/Post-Solve software.

Principles of PIMA

PIMA processes visibility data in FITS-IDI format. PIMA does not transform and does modify original data. At the first step PIMA "loads" the data, i.e. examines the specified set of visibility data in FITS-IDI format and creates numerous internal indexing tables that are written in disk. For performing all other operations PIMA uses these tables for getting access to specific fields of input FITS-IDI files.

PIMA has a flexible command-line interface and it is designed for a non-interactive use. PIMA is ideal for being incorporated into scripts for shell, python or similar interpreters.

All control parameters that are needed for processing a given experiment are gathered in a control file. PIMA does not support any defaults: all parameters, even those that are not used for a specific operation, are to be explicitly defined in that file.

PIMA supports the following general syntax:

pima control_file task [qualifier value...] [keyword: value...] where PIMA supports a number of optional kludge parameters that alter normal processing in a form of keyword: value. The have a prefix PIMAVAR_. They can be either defined in the control file or put in the command line, or defined as environment variables.

Frontend wrappers

PIMA provides several frontend wrappers. They accept arguments, perform some operations and finally call PIMA. A user does not have to use wrappers. They are provided just for convenience. Though, using wrappers sets some restrictions on how to name PIMA related files. If you are going to use wrappers than the file name are supposed to obey the following convention:
  1. Files related to a certain experiment reside in subdirectory VVVVV, where VVVVV is the root directory of vlbi experiments specified by --pima-exp-dir during configuration.
  2. Control file has name VVVVV/EEE/EEE_B_pima.cnt where B is band name in lower case. For instance, the control file for band C (4.3 GHz) for experiment bp192c0 is sought in /vlbi/bp192c0/bp192c0_c_pima.cnt, provided PIMA was configured with --pima-exp-dir=/vlbi.
  3. Wrappers will create the following files
    • VVVVV/EEE/EEE_load.log – log of task load the database.

    • VVVVV/EEE/EEE_B_nobps.fri – results of fringe fitting in the coarse mode.

    • VVVVV/EEE/EEE_B_coarse.log – log of fringe fitting in the coarse mode.

    • VVVVV/EEE/EEE_B_fine.log – log of fringe fitting in the fine mode.

    • VVVVV/EEE/EEE_mkdb.log – log of generation a database in GVF format.

    • VVVVV/EEE/EEE_splt.log – log of task splt.

    • VVVVV/EEE/EEE_B_gain.log – listing with used antenna gains.

    • VVVVV/EEE/EEE_B_map.log – log of automatic data imaging.

Several wrappers are provided. Among them are

Graphic interface

PIMA uses graphics interface DiaGI based on PGPLOT library. DiaGI displays plot into X-window. It allows to resize plot, change plot appearance, inquire a point, make a hardcopy, etc. Refer to DiaGI documentation for details. It is assumed in this manual a reader is already familiar with DiaGI interface. DiaGI documentation can be found here:
DiaGI doc-1, Diagi doc-2, and Diagi doc-3.

Minimalistic workflow

The workflow of the minimalistic, simplified analysis:

Creation of a configuration file

Supported keywords are described in
pima_keywords.html document. Control file contains lines with pairs keyword: value. Lines that start with # are considered as comments. The first and the last line of a control file is its label # PIMA_CONTROL file. Format Version of 2016.02.29

Keyword names are in upper case and are terminated by column. Values are case sensitive. If the same keyword is defined more than once, the last definition takes preference. The pair keyword: value defined in the control file are processed as if they appended to the end of the control file. There are two exceptions: UV_FITS and INTMOD_FILE. More than these keywords are allowed for a case when several input files should be defined.

Some values in PIMA control file are file names. Although you can define relative file names, defining absolute file names is encouraged.

A user rarely creates a configuration file from scratches. Usually, a control file from a similar experiment is copied to a new name and a user edits it. A user should check carefully every keyword. The following keywords are the most commonly need to change:

Loading the data

Task data loading is the first task. PIMA parses the control file, finds the data with visibilities in FITS-IDI format, and reads them. It gathers information about station names, sources names, frequencies, a priori models, system temperature, gain, weather information, phase calibration, cable calibration, etc. It reads all cross-correlations and auto-correlations, associates them, and checks for their consistencies. Then PIMA splits the data into scans and observations. It creates scan tables, observation tables and associates observations with indices of visibilities. All tables are written into a binary file SSSSS/EEE.pim, where SSSSS is the scratch directory specified in the keyword EXPER_DIR and EEE is the experiment name specified in the keyword SESS_CODE of the PIMA control file. Data loading takes from 20 seconds for small experiments to 2–4 hours for experiments with visibility files of terabyte size. All other PIMA operations will read file SSSSS/EEE.pim and use indexing tables. Unlike to AIPS or CASA, PIMA does not rewrite input visibility data in its own format. Instead of it, it creates indexing tables and uses this tables when it needs to collect visibilities for processing a given observation. The advantage of this approach is that no intermediate files is created. The disadvantage is that reading of visibility data may become inefficient when if the input data file that reside on magnetic hard-drive are larger than than the amount of available operative memory due to limitations related to a design of FITS-IDI data. PIMA does not need input information about learning how the data are split into scans: it does it itself. The advantage of this approach is that PIMA will process the data even if any auxiliary information is lost. The disadvantage of this approach is that PIMA can split the data into scans not the same way as an observer designed the experiment.

The first operation of task load is parsing control file. VTD control file specified in PIMA control file is also parsed. Finally, the experiment description file specified in the keyword MKDB.DESC_FILE is parsed. Any errors, such as syntax errors or files that do not exist are reported. PIMA will stop and issue an error message in a case of errors.

In the next step PIMA will check every source name first in the file specified in keyword SOU_NAMES, then in catalogue files specified in VTD control file. If it finds at least one source not in the catalogue, PIMA will issue the error message and print the list of missing source names and their coordinates extracted from the FITS file.

Then PIMA will check every station name first in the file specified in keyword STA_NAMES, then in catalogue files specified in VTD control file. If it finds at least one station not in the catalogue, PIMA will issue the error message and print the list of missing station names and their coordinates extracted from the FITS file.

Then PIMA check frequencies in each files and creates the global frequency table for the entire experiment. It converts low side band intermediate frequencies tables (LSB IF) into upper side band IFs by re-ordering frequencies of the channels within each IF for them to following in the ascending order. It merges or combines frequency groups if requested. Finally, it tables of cross indices from the original frequency tables frequency groups to the global frequency table, frequency groups and vice versus.

Next step is to read all visibility data. Visibility data are sorted, cross-correlation data are linked to autocorrelation data, and tables of time indices, cross-correlation indices and auto-correlation indices are created. PIMA checks for organ visibilities: cross-correlation visibilities without autocorrelation and auto-correlation data within matching cross-correlation data. These visibilities are added to the list of "bad data".

Then PIMA splits the data into scans. By that time the data are chronologically sorted. There are three parameters in the PIMA control file that controls the process of data splitting: MIN_SCAN_LEN, MAX_SCAN_LEN, and MAX_SCAN_GAP. PIMA sets a preliminary scan boundary when a source is changed. If it does not find valid visibilities for MAX_SCAN_GAP seconds after the last valid visibility of the previous source, it sets the end of scan of the previous source. If duration of the time from the first valid visibility of a given scan is longer than MAX_SCAN_LEN, a border of a scan is set, and a new scan starts. That means that a scan cannot be longer than MAX_SCAN_LEN seconds and it cannot have a gap longer than MAX_SCAN_GAP. At the same time scans of different sources may overlap, i.e as scan B may have start and stop time within the interval of start and stop time of the scan A. Scans shorter than MIN_SCAN_LEN seconds are eliminated and the visibilities within such short scans are marked as bad.

The choice of MIN_SCAN_LEN, MAX_SCAN_LEN, and MAX_SCAN_GAP is determined by scheduling goals and the correlator setup. Usually MIN_SCAN_LEN is set to have at least three accumulation periods, otherwise fringe fitting process may fail. For non-phase referencing experiment MAX_SCAN_LEN can be set to the scan length set by the schedule. Experiments at 22 GHz and higher MAX_SCAN_LEN can be set shorter to be close to the coherence time. MAX_SCAN_GAP can be set to 1/2 of the scan length to prevent scan split in a case of data loss within a scan. For scan-referencing observations MAX_SCAN_LEN is set to the cycle duration and MAX_SCAN_GAP is set to 90% of MAX_SCAN_LEN. It should be noted that PIMA allows to use a portion of a scan in data analysis, but it cannot unite two scans. Parameter SCAN_LEN_USED and SCAN_LEN_SKIP allows to set up continuous portion of a scan for fringe fitting and split after load task. But PIMA cannot increase scan length after task load is done. If a user needs to change scan allocation or increase scan length, task load should be re-run. NB: if a new run of task load changes the total number of observations, fringe fitting should be re-run, since the stale fringe results have different scan and observation indices.

After PIMA split the data into scans, it checks all cross- and auto- visibility data whether they are claimed by scans. All visibilities not claimed by scans are marked as bad.

If PIMA finds at least one bad visibility, PIMA stops withe an error message. Since getting bad visibilities is rather a common situation, PIMA has a mechanism to accommodate them. PIMA control file supports keyword UV_EXCLUDE_FILE that defines a file with indices of visibilities, either cross or auto, that are to be excluded at the very beginning. These visibilities are excluded from analysis, and PIMA cannot mark them bad because it does not see them. PIMA supports a special value of parameter UV_EXCLUDE_FILE: AUTO. If value AUTO is specified, than when PIMA finds bad points, it writes visibility indices in the so-called bad visibility file at SSSSS/EEE_uv.exc file, where SSSSSS is EXPER_DIR and EEE is SESS_CODE. If that file already exists, PIMA appends new visibilities to that file. When UV_EXCLUDE_FILE: AUTO, tasks load is executed several times. The first time PIMA finds bad points, puts them in the SSSSS/EEE_uv.exc file and stops with the exit code 23. The second time the bad points in SSSSS/EEE_uv.exc are read and excluded from the subsequent analysis. Usually two runs are sufficient. Sometimes the 3rd and 4th is required. Wrapper pf.py executes the 2nd, 3rd and 4th run automatically. NB: pf.pypurges SSSSS/EEE_uv.exc file if it exists.

After splitting the data into scans, PIMA reads and parses phase calibration, system temperature, weather information, interferometric model and interferometric model components. Any these parameters may be missing in the FITS-IDI file. In such cases PIMA issues a warning, but proceeds.

If PCAL: NO is specified in the control file, PIMA will skip phase calibration information present in the FITS-IDI file(s). Keep in mind, if PCAL: NO was specified during loading, PIMA cannot re-enable phase calibration later within running task load again. If phase calibration was loaded and cane be disabled for entire experiment or for the specified station(s) and re-enabled again. If phase calibration is not available for some scans at some stations, such observations are flagged as bad and are skipped for fringe fitting and other operations, unless phase ca libation is disabled for the entire experiment by specifying PCAL: NO or by disabling pcal at both stations of the baseline of that observation. It should be noted that bandpass and fringe results will be different whether phase calibration was used or not. Therefore, if phase calibration status was changed, bandpass should be re-generated and fringe fitting re-done.

By 2016.01.01 only VLBA put model and all calibration information into the FITS-IDI data. Lack of calibration information does not prevent PIMA to run fringe fitting by my prevent further tasks. For instance, task splt cannot run of no Tsys and/or antenna gains is available. Task mkdb cannot run if not interferometric model is available. VLBA hardware correlator and DiFX version 2.0 and newer puts phase calibration information into FITS-IDI. Other correlators do not to do it. Missing weather information, Tsys can be loaded by PIMA task gean using results of parsing log-files. Missing antenna gains can be loaded by PIMA task gean from external gain files. Missing interferometric for VERA, SFXC, and KJCC correlators can be loaded by task moim from external model files in native format that were used by the correlator.

Parsing log files

This is the most frustrating part of data analysis. If you have data from VLBA, you do not need to run parsing log files. Parsing log files from the KVN and VERA is very straightforward. Unfortunately, parsing log files generated by the Field System developed in the Goddard Space Flight Center often fails, because the format of field system log file is changed without notice, and the developer who maintains the field system refuses to cooperate.

PIMA can directly import log file in VLBA format or in the PIMA ANTAB format. Non-VLBA logs are parsed by program log_antab and transformed to PIMA ANTAB format. Program log_antab extracts system temperature, if present, nominal on-off time tags, cable calibration and meteorological information. Modern VLBI analysis does not use in-situ meteorological information, and uses instead of that the output of numerical weather model. The current version of PIMA does not use nominal on-off time tags from log files, since this information is already used by the correlator. PIMA can compute on-off from data actual time tags when the antenna was on source. The use of cable calibration in analysis is discretion and rarely improves the fit, and sometimes significantly degrade it. But the use of system temperature is critical for imaging. When PIMA produces the calibrated averaged visibilities, it discards observations without system temperature.

Syntax of the program for parsing log-files:

     Usage: log_to_antab {mode} {log_file} {antab_file} [year]
where The main difficulty is in extraction system temperature from field system logs. The parsing software needs to identify Tsys record, extract the array of Tsys and match that array with sky frequencies. It needs to determine intermediate frequency with respect to the frequency of the local oscillator, to determine the frequency of the local oscillator, match them, find tsys record, determine to which BBC a field in tsys record belong and match the field.

NB: An analyst should always examine the output of log_antab program. Typical failure: log_to_antab fails to determine sky frequencies. Possible reasons: a log file may have a portion at the beginning or at the end that is related to another experiment, a new change of log format. In the first case, editing a log file solves the problem. In the latter case you need to patch log_to_antab. Please try not to break its ability to parse other log files. If everything else fails, you can either develop your own parser or to parse a log file by hand. Keep in mind that some station do not record Tsys at all.

Wrapper pf.py supports log parsing. The following command does this:

usage: pf.py EEE B logs
where EEE is the experiment name and B is the band. It creates output file EEE_AA.ant, where AA is a two character long low case antenna name.

Calibrating the data

FITS-IDI visibility file is supposed to have all calibration information inside. However, only Socorro correlator inserts all calibration information into FITS-IDI. Visibility files from all other correlator missing some or all calibration information.

Although the visibility data from the old hardware VLBA correlator has all calibration information, it is recommended to re-load it since in some cases the calibration information is not correct and re-loading fixes the problem. There is no need to reload calibration information to FITS-IDI files generated in Socorro by DiFX 2.0 and newer.

Field System log files contain a) on-off start/stop scan time; ' b) meteorological information; c) cable calibration; d) frequency table; e) system temperature. VLBA log files contain phase calibration phases and amplitudes in addition to that.

PIMA task gean inserted the calibration tables into PIMA internal data structures. Task gean requires a qualifier that is followed by the value. The following qualifiers are supported:


NB: Antenna gain, system temperature, meteorological information and cable calibration does not change result of fringe fitting. Therefore, these calibration can be applied after fringe fitting. Phase calibration affects result of fringe fitting. Therefore, it is supposed to perform this kind of calibration before fringe fitting. If you changed phase calibration, phase calibration status (pcal_on, pcal_off), you have to redo bandpass calibration and fringe fitting. Otherwise, you will get wrong results.

Examine raw data and calibration information

An analyst must always examine the data and calibration information before running fringe fitting as carefully as possible. If an error will not be noticed at the initial examination, then the analysis will be have be redone. Attentiveness during early examination saves time and reduces the probability that the error will not be noticed and will lead to an erroneous result.

Running coarse fringe fitting

The goal for coarse fringe fitting is preparation for bandpass computation and for initial data examination. Coarse fringe fitting uses single polarization data (RR for dual-polarization data), does not use bandpass, because usually bandpass is not known at that time, and uses no oversampling in order to speed up computation, and performs the parabolic fine fringe search. Therefore, the following parameters are always set:
BANDPASS_USE:          NO
BANDPASS_FILE:         NO
POLARCAL_FILE:         NO
FRIB.OVERSAMPLE_MD:    1
FRIB.OVERSAMPLE_RT:    1
FRIB.FINE_SEARCH:      PAR
MKDB.FRINGE_ALGORITHM: DRF
If the bandpass mask file is not available, then BANDPASS_MASK_FILE: NO is set. POLAR: RR is set for dual-polarization or RR data and POLAR: LL is set to LL-polarization data. Usually PIMA runs in both coarse and fine fringe fitting mode. It is desirable to store results of coarse and fine fringe fitting in separate files. Therefore, when we run coarse fringe fitting, we set FRINGE_FILE and FRIRES_FILE into different files than those specified in the PIMA control file.

To run coarse fringe fitting, task frib is used. PIMA wrapper pf.py simplifies running coarse fringe fitting. Syntax:

 Usage: pf.py EEE B coarse 
where EEE is the experiment name and B is band in lower case. Wrapper will write fringe result in VVVVV/EEE/EEE_B_nobps.fri and fringe fitting residuals into VVVVV/EEE/EEE_B_nobps.frr files.

Computation of complex bandpass

Computation of the complex bandpass is the second major task that requires human intervention. The bandpass of the ideal system is rectangular shape for the amplitude and zero for phase. That means that cross-correlation spectrum of a signal from a radio sources with continuum flat spectrum is also flat with some constant phase offset and the multiplicative factor that is proportional to the square root of the products of Tsys at both stations. Unfortunately, up to date perfect VLBI hardware is not yet developed. The cross-correlation spectrum diverts from the ideal (flat phases and flat amplitudes). The use of phase-calibration may alleviate the deviation from the ideal spectrum, may have no visible effect or may even degrade it, but never fixes phases. Therefore, normally a complex function of frequency is computed that being multiplied by the cross-spectrum makes it flat. This function can be computed reliably using sources with SNR > 200, but preferably with SNR > 1000. A good principle investigator does not hesitate to spend a sizable amount of allotted time for observing bright sources that are used as calibrators. Bandpass calibration can still be performed using source with SNR 40–200, but less reliable. Quality of bandpass derived using observations with SNR in a range 10–40 is questionable. Bandpass calibrator sources are supposed to be continuum with the spectrum flat within an IF. Any active galaxy nuclea satisfies this condition.

In a case when phase calibration is applied, the bandpass is computed with respect to the phase calibration. If you use all tones of phase calibration, you should first clean them and mask out the tones affected by internal radio interference (spurious signals).

It is assumed the residual bandpass is stable with time. An experiment may have jumps in the bandpass due to power-off power-on of the VLBI hardware at one or more stations. Unfortunately, as of 2016.02.01 PIMA does not provide a convenient way for processing such data. The workaround is to effectively split the dataset into two subsets before and after the jump and compute two bandpasses. Splitting the dataset can use made using keywords OBS, INCLUDE_OBS_FILE, and EXCLUDE_OBS_FILE. Fortunately, jumps in bandpass occur in less than 5% VLBI experiments.

An analyst usually sets the mask for cross-spectrum data during this step. The mask is an array of 1 and 0 that depends on spectral channel index. PIMA multiplies the mask by visibilities when it processes the data. Zeroes in the mask effectively replaces the spectral channels with zeroes. Usually unwanted potion of the spectrum is masked out: either affected by RFI or affected by hardware bandpasses. If a signal is narrow-band, for instance from an stellar maser, then masking allows to discard spectral channels that have no signal, but only noise.

Cleaning phase calibration

If you use four or more tones of phase calibration per IF, you should first clean them and mask out the tones affected by internal radio interference. Old VLBA hardware extracted two phase calibration tones per IF. PIMA treats this case as a single tone per IF. A user can select which tone to use. If less four tones per IF was extracted in your experiment or you do not apply multiples phase cal tones per IF, just skip this section.

Usually, phase calibration signal is a rail of very narrow-band signals with frequency separation 1 MHz which less than the spectral resolution of visibility data. PIMA interpolates spectrum of phase calibration within each IF. In a case if only one tone per IF is used PIMA considers phase spectrum of the calibration signal is flat, i.e. it assigns the phase calibration phase to all spectral channels. In a case if all phase calibration tones are used, PIMA unwraps phases and performs linear interpolation or extrapolation. The presence off spurious signals distorts calibration phases. If they affect a small fraction of all tones, the tones affected by spurious signals can be masked out. Then PIMA will automatically interpolated between tones that are not affected.

Task mppl shows phase and amplitudes of phase calibration signal with multiple tones per IF. It may be useful to use keyword OBS to control which observations to use for generating plots.

Task mppl shows several types of plots. Raw phase usually is not informative, since the calibration signal may have many phase turns per IF. Let PIMA to unwrap phase for you. Plot of unwrapped phases (U) shows the spectrum: unwrapped phases (green) and modeled phases (blue). Bandpass is supposed to be smooth. Jumps in phases is due to spurious signals. The sum of the phase calibration tone and the spurious signal depends on the phase of the phase calibration tone itself. Therefore, if at a given plot of a given observation you see phase that does not strongly deviate from the smoothed curve that does not necessarily means that other observation will not be affected even if the source of spurious signals does not depend on time. Mode (C) displays both unwrapped phase and amplitude at the same plot. Since spurious signal affect both phase and amplitude of the calibration signal, this plot helps to identify frequencies affected by spurious signals.

OK, you found a peak at the plot that you believe is due to the spurious signal. What further? PIMA supports so-called phase calibration mask file specified by the keyword PCAL_MASK_FILE. This file defines the value, 0 or 1, for each phase calibration tone. If the value is zero, then that calibration tone is masked out and not used for computation of the smoothed curve that interpolates the phase calibration signal across an IF. One may edit this file manually, but in general, it is too boring. PIMA supports so-called mask definition files that allows to write which calibration tones to suppress in a concise way. It allows to define ranges of the tones that are to be mask out. An analyst edits the calibration mask definition file, converts it to the phase calibration mask, visualizes the phase calibration phases and/or amplitudes and repeat this procedure till a satisfactory result is produced.

Phase calibration mask definition file consists of records of variable length. The first record identifies the format. It should always be be # PIMA PCAL_MASK_GEN v 1.00 2015.05.10 Lines that start with '#', except the first one, are considered comments, and the parser ignores them. Mask definition records consists of 8 words separated by one or more blanks PCAL STA: ssssssss IND_FRQ: aa-bb IND_TONE: xx-yy OFF

where sssssss is the station name, aa is the index of the first IF of the range, bb is the index of the last IF of the range, xx is the index of the first tone in a given IF range and yy is the index of the last tone in a given IF range. Here is an example:

# PIMA PCAL_MASK_GEN  v  1.00 2015.05.10
#
#  Phase calibration mask definition file for VLBI experiment VEPS02
#
#  Last updated on  2015.05.13_12:35:17
#
PCAL  STA:   KUNMING     IND_FRQ:   1-1   IND_TONE:   30-31  OFF
PCAL  STA:   SESHAN25    IND_FRQ:   8-8   IND_TONE:    4-4   OFF
PCAL  STA:   URUMQI      IND_FRQ:   1-16  IND_TONE:    1-1   OFF
The first line identifies the format. The file defines the mask that deselects tones with indices 30–31 of the first IF for station KUNMING, the tone with index 4 for the 8th IF for station SESHAN25, and the first tone in all IFs from the 1st to the 16th for station URUMQI.

PIMA does not accept the mask definition file directly. Task pmge transforms the phase calibration mask definition file into phase calibration mask file. That task requires qualifier mask_gen with value mask definition file. The name of the output file is defined by keyword PCAL_MASK_FILE in the PIMA control file. Example:

pima ru0186_x_pima.cnt pmge mask_gen ru0186_pcal_mask.gen
Comments:

Masking auto- and cross-correlation spectral channels

PIMA allows to mask out specified channels of either auto or cross-spectrum or both. The auto-correlation spectrum is corrupted by the presence of internal RFI generated in the vicinity of the data acquisition system. Usually the internal RFI causes appearance of peaks in the auto-correlation spectrum. As a rule of thumb peaks with the amplitude less than 1.5 of the average amplitude can be safely ignored and peaks with the amplitude greater than the average by a factor of 2 should must be masked out since they noticeably affect the fringe fitting procedure and distort the estimate of the average phase and the amplitude. Peaks with the amplitude in a range [1.2, 2] of the average autocorrelation are in the border line. It is not recommended to mask out auto-correlation at the edge of the IFs, since during data processing PIMA interpolates auto-correlation.

Cross-correlation spectrum can be distorted by external RFI, such as satellite radio. Hardware problem or errors in the hardware setup may cause cause a significant drop in the amplitude or a total loss of signal either at the entire IF, or a range of IFs, or a portion of IFs. If there is no signal in given spectral channels, the SNR will be reduced and weak sources may not be detected. Masking out unwanted noise improves the SNR. Cross-correlation spectrum from narrow-band targets, such as masers or satellites may not have signal beyond edges of the bandwidth even in the absence of hardware failures.

PIMA supports mask file specified by the keyword BANDPASS_MASK_FILE. This file defines four sets of values, 0 or 1, for each phase spectral channels. PIMA multiplies visibilities by the mask, which effectively disables spectral channels that corresponds to mask with value 0. The first mask affects autocorrelations, and three remaining masks affect cross-correlations. The second mask used used only for computation of bandpass, the third mask used for fringe fitting, and the fourth mask is used by task splt for computing visibilities averaged over frequency and time. Usually the second, the third, and the fourth masks are the same, i.e. a common mask for cross-correlation is used. Autocorrelation and cross-correlation masks are usually different since different factors lead to necessity to mask auto-correlation and cross-correlations.

If the mask for a given cross-correlation is zero, the corresponding visibility is replaced with zero. If the mask of a given auto-correlation is zero, the auto-correlation at a given spectral channel is computed by linear interpolation between adjacent channels, or linear extrapolation, if the masked channel is at the edge of the IF. The corresponding cross-correlation is not affected. Very often strong unmasked spurious signals that results in autocorrelation greater than 2–5 of the average level usually affect the cross-correlation as well. Therefore, it is prudent first to mask out strong spurious signals at autocorrelation and then check cross-correlation spectrum.

An analyst may create the mask file by hand, but this is a tedious work. PIMA supports so-called bandpass mask definition files that allows to specify the spectral channels that are to be suppressed in a concise way. The bandpass mask definition file allows to define ranges of the spectral channels that are to be mask out. An analyst edits the bandpass mask definition file, converts it to the bandpass mask, visualizes the cross- and auto- phase and amplitude spectra, and repeat this procedure till a satisfactory result is produced.

A bandpass mask definition file consists of records of variable length. The first record identifies the format. It should always be be # PIMA BPASS_MASK_GEN v 0.90 2009.02.05 Lines that start with '#', except the first one, are considered to be comments, and the parser ignores them. Mask definition records consists of 8 words separated by one or more blanks mmmm STA: ssssssss IND_FRQ: aa-bb IND_CHN: xx-yy ddd

where mmmm is the mask type: one of AUTC (autocorrelation mask), BPAS (bandpass mask), FRNG (fringe fitting mask), SPLT (split mask), CROS (bandpass+fringe_fitting+split masks), ALL (all masks: autocorrelation+bandpass+fringe_fitting+split) sssssss is the station name, aa is the index of the first IF of the range, bb is the index of the last IF of the range, xx is the index of the first spectral channel in a given IF range and yy is the index of the last spectral channel in a given IF range; ddd is disposition: ON or OFF. Station name may be substituted by ALL, what means the definition affects all stations.

The first definition sets the default disposition: ON or OFF. Unless you have really pathological experiment and you have to disable the majority of spectral channels, the first definition is ALL STA ON, what means to enable all spectral channels The definitions are processed consecutively. Each new definition alters the mask defined by priori definitions. Here is an example:

# PIMA BPASS_MASK_GEN  v 0.90 2009.02.05
#
#  Control for bandpass mask generation for VLBI experiment BP192B3
#
#  Created on 2015.11.14_12:33:11
#
ALL    STA:   ALL         ON
#
ALL    STA:   ALL         IND_FRQ:   3-3    IND_CHN: 121-129   OFF
CROS   STA:   KP-VLBA     IND_FRQ:   1-2    IND_CHN:   1-512   OFF
AUTC   STA:   FD-VLBA     IND_FRQ:   4-4    IND_CHN: 431-436   OFF
The first line identifies the format. The first non-comment line sets the initial mask 1. The second non-comment lines disables both autcorrelations and cross-correlations for the IF #3, spectral channels 121 through 129. The third line disables cross-correlation in IFs #1 and #2 (that experiment has 512 spectral channels per IF) for station KP-VLBA. The fourth line disables autocorrelation for FD-VLBA in spectral channels from 431through 436 in IF #4 keeping autocorrelation.

Creation complex bandpass in the inspection mode

Task bpas compute complex band-pass. This task supports 4 modes: INSP (inspection), INIT (initial), ACCUM (accumulation), and FINE. Modes INIT and INSP differs only by the generated output: bpas in INSP mode generates plots, while bpas in INIT mode does not.

Task bpas takes as input result of fringe fitting. In general, it is not required to have fringe fitting results for all observations, although it is desirable. PIMA will find n observations with the highest SNR at each baseline with the reference station and will compute the complex bandpass using these observations. PIMA uses one observation with the highest SNR per baseline in INIT or INSP mode. It may happen that just the observation with the highest SNR is affected by RFI or has another problems. In such a case affected observation can be added into the exclude list.

In a case of dual-polarization observations two bandpasses are computed: for RR polarization and for difference LL minus RR polarization. The second bandpass is called polarization bandpass. The main bandpass is RR for single-polarization RR data, LL for single-polarization LL data, and RR for dual-polarization data.

Wrapper pf.py simplifies generation of the bandpass. It checks exclusion file VVVVV/EEE/EEE_B_bpas_obs.exc. If it does not find such a file, the wrapper creates an empty file with a comment line. It supports option -insp that overrides value of BPS.MODE.

 Usage: pf.py exp band obs [-insp]
When PIMA task bpas is invoked in the INSP mode, PIMA runs a cycle over baselines with the reference station and computes amplitude and phase bandpasses for an observation with the highest SNR among selected observations of each baseline with the referenced station. It displays two plots per baseline: amplitude plot and phase plot. An amplitude plots shows three function: autocorrelation (red), cross-correlation normalized to unity (blue) and the bandpass (green). The phase plot shows residual phase (blue) and phase bandpass (green).

When computing bandpass, PIMA re-runs fringe fitting for the selected observation. Blue lines in the amplitude and phase plots show just residual amplitudes and phases from that fringe fitting. Then a smooth curve is fitted to the residuals. PIMA supports two algorithms: fitting with a smoothing spline of the 3rd degree over n equi-distant knots or with a Legendre polynomial of degree b. The smoothing methods should be the same for phase and amplitude, but the degree of the Legendre polynomial or the number of knots for the spine can be different for amplitude and phase.

Smoothing mode is defined by keyword BPS.INTRP_METHOD. It can take values SPLINE or LEGENDRE. Phase initial bandpass is the smoothed model bandpass with opposite sign. The phase bandpass is normalized to have the mean phase over the band zero. PIMA will unwrap phase if phase bandpass exceeds ±π/2. Amplitude bandpass is normalized to unity. If BPS.NORML: IF, then the amplitude bandpass is analyzed over each IF separately. If BPS.NORML: BAND, then the amplitude bandpass is normalized over the band. Which mode to use? That depends on hardware. If system temperature is measured for entire band, then normalization should be made over the band. If system temperature is measured at each IF (or pairs of IFs) individually, than normalization over IF should be used. Autocorrelation is always normalized over IF.

PIMA task bpas in the inspection mode displays two plots per baseline: amplitude bandpass and phase bandpass. Reading these plots is an important skill. An analyst should identify spikes in autocorrelation amplitudes and mask them out. To identify a spike in autocorrelation, first set color index 3 by hitting C and 3, then move the cursor close to the spike and hit LeftMouse button. Dump file SSSSS/pima/EEE.frq helps to identify an IF/channel indices. Using this information, an analyst adds a record in the mask definition file. There is no firm rule when a spike should be mask out. Usually spikes with amplitudes greater than 2 should be mask out, and those with amplitudes in a range [0.8, 1.2] are kept. After all spikes in autocorrelation are mask out, a mask should be generated from the mask definition file using task bmge. Task bmge requires qualifier mask_gen with the value mask definition file. After than inspection of autocorrelation amplitudes should be repeated. NB: Amplitude of masked autocorrelation is set to zero. If necessary, mask definition file should be edited, mask re-generated, and the procedure repeated.

After cleaning autocorrelation spectrum, cross correlation should be cleaned. Several situations are rather common:

  1. There is no signal in one or more IFs mainly due to hardware error or errors in the setup. These IFs should be masked out entirely.
  2. Cross-correlation amplitude in one or more IF is lower than in others. These IFs may be masked out depending how strongly the amplitude is down. If the amplitude is lower by a factor of 4–6, such an IF brings more noise than signal and probably should be mask out. Masking out an IF with the amplitude less than 1.5 times will degrade the SNR and deteriorate the result. Amplitude loss in a range 1.5–5 is the border line.
  3. A significant portion of an IF has low amplitude. This may be caused by RFI or the receiver bandpass shape. It is recommended to masked out that portion of the bandpass.
  4. Amplitude at the edge of each IF is low. This is the most common situation. This is caused by the hardware. A general recommendation to mask out a portion of the band that is below some threshold in a range of 0.05%ndash;0.2.
A hint: task bpas in the inspection mode processes baselines in the alphabetic order. If a certain station requires heavy editing the bandpass generation file, you can run the task with OBS: num_obs qualifier, there num_obs is the index of the observation with the highest SNR at the baseline of interest.

Task bpas supports keyword BPS.AMP_MIN that defines the threshold on the fringe amplitude as the share of mean amplitude over the IF. Spectral channels with the amplitude threshold below BPS.AMP_MIN are ignored by task bpas. The bandpass to these channels is obtained by extrapolation from those channels that are in use. This may be useful if the bandpass has strong changes at channels with low amplitude, since in that case smoothing with Legendre polynomial and spline may not be robust.

Phase may become too noisy in experiment with high spectral resolution. In that case the residuals should be coherently averaged. The number of spectral channels averaged within an individual segment is defined by keyword. Value 1 stands for no averaging. A balance between averaging and spectral resolution should be maintained. From one hand strong averaging (high value of BPS.MSEG_ACCUM) results in less random noise in phase. From the other hand, averaging reduces spectral resolution and our ability to model the system response as a function of frequency. Scatter greater than 1&ndash2 rad may result in a failure to resolve phase ambiguity across an IF, which will lead to a wrong result. A general guideline is to select such averaging that the scatter of residual phases with respect to a smooth line be in a range of 0.05–0.2 rad for the sources used for bandpass computation.

When the analyst is satisfied with plots that task bpas generates in the inspection, mode, an analyst runs bpas task in the non-interactive mode to generate the final bandpass. There are three modes for computation of the bandpass: INIT, ACCUM, and FINE. PIMA computes the residual visibilities averaged over time with parameters of fringe fitting applies. In the INIT mode PIMA picks up for each baseline the observation with the highest SNR among the observations with the reference stations that are subject of filter OBS, INCLUDE_OBS, and EXCLUDE_OBS keywords. It normalizes the amplitude to have the mean value to unity either over the IF or over the band depending on BPS.NORML keyword. The residual phase is normalized to have mean value and mean rate to zero over the bandwidth regardless of the value of BPS.NORML. PIMA smooths the residual phases and residual amplitudes and the inverts them: flips the sign of phase band pass, replaces the residual amplitude with the quantity reciprocal to that for each spectral channel, and combines them to form array of complex numbers. These quantities are called initial complex bandpass.

Initial bandpass is computed using only one observation per baseline. If an analyst selected BPS.MODE: INIT, the task bpas stops here. If a user selected ACCUM or FINE mode PIMA selects N observations per baseline with the highest SNR beyond that that was used in the INIT mode. It applies the initial bandpass determine residual phases and amplitudes and averages them out. It reverses sign of residual phases, replaces normalized residual amplitudes, combines them in the array of complex numbers and multiplies it by the initial bandpass. The result is called accumulated bandpass. The number of observations per baseline used for computation of the accumulation bandpass is controlled by two parameters: BPS.NOBS_ACCUM and BPS.SNR_MIN_ACCUM. PIMA will select up to BPS.NOBS_ACCUM observations for each baseline with the highest SNR, not counting the observation used for computation of the initial bandpass, provided they SNR is BPS.SNR_MIN_ACCUM or above. The advantage of the accumulation bandpass is that it is unweighted average over N observations, and therefore, it accounts to some degree bandpass variation. If a user selected ACCUM mode PIMA task bpas stops here.

If a user chooses FINE bandpass computation mode, PIMA selects K observations per baseline with the highest SNR beyond that that was used in the INIT mode. It applies accumulation bandpass and forms the system of linear equations for adjustment to parameters of the model for the phase bandpass and logarithm of the amplitude bandpass, i.e. either coefficients of Legendre polynomial or B-spline. It solves the system using weighted least squares with weights proportional to fringe amplitude. Then it computes residual phases and amplitude corrections, computes their statistics, and if the statistics exceed the specified threshold, it discards the observations with the greatest residual and repeats computation until either the statistics become lower than the threshold or the number of rejected observations per baseline reaches the specified limit.

Keyword BPS.SNR_MIN_FINE specifies the minimum SNR. Observations with SNR below that limit are not used by PIMA for bandpass computation in the FINE mode. Keyword BPS.NOBS_FINE specifies the number of observations with the highest SNR per baseline that PIMA selects for bandpass computation in the FINE mode, provided their SNR is no less than BPS.SNR_MIN_FINE. If a given observation has residual statistics above the threshold, PIMA will discard it provided the number of remaining used observations is no less than BPS.MINOBS_FINE. This mechanism prevents rejection of too many observations.

After performing the first iteration of LSQ adjustment, PIMA computes for each IF weighted rms of residual phases and normalized residual amplitudes. Then PIMA finds an observation with maximum phase and maximum amplitude residual. If the rms of phase residual exceeds BPS.PHAS_REJECT radians, that observation is marked for rejection. If the rms of normalized amplitude residuals exceeds BPS.AMPL_REJECT, that observation is marked for rejection. If the number of used observations still exceeds BPS.MINOBS_FINE, the observation is rejected, and the next iteration runs. PIMA maintains two counters of used observations: for the phase bandpass and for the amplitude bandpass. An observation may be rejected for amplitude bandpass but kept for phase bandpass, or vice versus, or rejected for both amplitude and phase bandpasses.

PIMA task bpas prints valuable statistics when DEBUG_LEVEL: 3 or higher. An analyst should always examine it. Lines that starts with "BPASS Removed" are especially important. If there are many rejected observations at a given baseline, especially if their phase or amplitude rms of residuals is large, an analyst should examine these observations: to run a trial fringe fit and look at residuals. Bad observations may skew bandpass evaluation, so it may be necessary to examine residuals with and without applying bandpass (BANDPASS_USE: NO). One of the reasons of computing bandpass in FINE mode is to find observations with large residual phases or residual amplitudes.

It may happen that a fraction of high SNR observations are affected by hardware failure or RFI. If bpas task does not reject them, they skew the estimate of bandpass. There is another mechanism to get rid of such observations for bandpass computation: to put them in the exclude list. PIMA wrapper pf.py automatically checks file VVVVV/EEE/EEE_B_bpas_exc.obs. If it exists, it adds option EXCLUDE_OBS: VVVVV/EEE/EEE_B_bpas_exc.obs, i.e. excludes them from participation in bandpass computation.

PIMA task bpas assumes bandpass is stable. It may happen one or more jumps, f.e. due to power failure. In that case PIMA will reject many observations. At the moment, PIMA does not have a capability to accommodate a jump. A workaround is to process two portions of the experiment separately by specifying observation lists using INCLUDE_OBS_FILE, EXCLUDE_OBS_FILE or OBS keywords. If a given station has more than 2–3 jumps in bandpass, it should be discarded and station staff should be alerted.

If there are many rejected observations an analyst may a) mask out bad channels b) add offending observations to the exclude list; c) exclude a list of observations; d) discard a station; e) ignore it.

In a case of dual-polarization data when POLAR: I is specified, the procedure is repeated twice: first for the RR polarization bandpass second for the LL polarization with respect to the RR polarization data. Therefore, a trial fringe fit with bandpass applied should run three times: with RR polarization, with LL polarization, and with I polarization. If the polarization bandpass was computed perfectly, then SNR at I polarization should be   SNR2RR + SNR2LL SNR reduction 2–5% with respect to the expression above is rather common, though if the SNR at I polarization is more than 10% worse, this indicates a problem that should be investigated.

A general recommendation is to run trial fringe fitting for 5–7 observations that are marked as "removed" in the log file of bpas task with an without bandpass applied in order to familiarize with the data. If fringe plots look satisfactory, the next step: fringe fitting in the fine mode should be done. Otherwise, masking, deselection of bad observations, phase calibration disabling/enabling should be repeated. NB: computation of bandpass should be repeated if a) bandpass or phase cal bandpass mask was changed or b) treatment of phase calibration was changed.

Running fine fringe fitting

Fine fringe fitting is the main task. During coarse fringe fitting, the fine fringe search procedure is disabled in order to speed up the process, and the bandpass was not applied. During fine fringe search this simplification is lifted.

PIMA task frib performs fringe fitting. Wrapper pf.py is called as

 Usage: pf.py exp band fine
PIMA task frib creates two ascii output files defined in keywords FRINGE_FILE and FRIRES_FILE. The first file keeps results of fringe fitting and it is used by other tasks. The latter file with fringe fitting residuals is for informational purposes only.

NB: wrapper pf.py by default overwrites the files with fringe results if it exists. Wrapper option -keep prevents overwriting the file with fringe results and fringe residuals specified in the control file. PIMA tasks that reads results of fringe fitting, f.e. mkdb or splt, processes the fringe file sequentially. If there is more than one record for a given observation, the latest record takes the precedence.

There is a number of parameters that controls fringe fitting.

Export data for astrometry/geodesy solution

Fringe results can be transformed to the form that astrometry/geodesy software Solve can ingest. Task mkdb reads fringe files, fringe residual file, contents of internal PIMA tables that are kept in SSSSS/EEE.pim file, and contents of visibility files, computes scan reference time (SRT), computes a priori path delays on the SRT, computes total group delays and phase delay rates on SRT, sorts them, and writes them into database files in either binary Geo VLBI Format (GVF) or plain ascii (TEXT) format. In a case of dual-base observations it reads two fringe results and fringe residual files for both bands and matches them.

Basic operations of this task are a) splitting the data into output scans with their reference time; b)computation of path delay; c) formatting the output.

PIMA performs baseline-dependent fringe fitting, and it processes observations independently. PIMA finds the fringe reference time (FRT) automatically as a mean weighted epoch among used accumulation periods when FRT_OFFSET: AUTO. In general, the FRT is different even if all stations had the same nominal start and stop time. Often VLBI experiments are scheduled in such a way that all stations have the same stop time but different start time. Geodetic schedules tends to have chaotic start and stop time when different antennas of the network have different start and stop epochs.

PIMA has several way to set the scan reference time. The algorithm is controlled by keyword MKDB.SRT. If it has value SRT_FRT, then PIMA sets the SRT the same as FRT. As a result the number of output scans, i.e. observations with the same epoch, tends to be the same as the number of observations, i.e. each output scan has only one observation. This is usually undesirable.

When MKDB.SRT: MID_SCAN, PIMA consolidates time epochs in order to have as many as possible observations of the same scan to have the same epoch. But when the reference epoch is moved away from the weighted mean epoch, the uncertainty of group delay increases. PIMA has two keywords that controls the interval of time the scan reference time can deviate from the reference time for group delay uncertainty not to grow too much: MKDB.GD_MAX_ADD_ERROR and MKDB.GD_MAX_SCL_ERROR. Keyword MKDB.GD_MAX_ADD_ERROR specifies the tolerance of the absolute increase of the uncertainty in seconds. Keyword MKDB.GD_MAX_SCL_ERROR specifies the tolerance of the increase of the uncertainty as a fraction of the original group delay uncertainty derived by PIMA task frib. PIMA uses the smallest of the these two tolerances. Typical value of MKDB.GD_MAX_ADD_ERROR is 5.D-12, typical value of MKDB.GD_MAX_SCL_ERROR is 0.2. That means that if, for example, the original path delay uncertainty is 40 ps, the tolerance is the smallest of 5 and 0.2*40= 8 ps, i.e. 5 ps. PIMA determines the SRT in such a way that the increase of the uncertainty within the tolerance limit be minimal. If there are observations that cannot be combined to the same SRT, PIMA splits input scans set by task load into several output scans.

PIMA allows a user to compute scan reference time, write them in file and supply the file name as the value of MKDB.SRT. This may be useful for comparison results with other fringe fitting software.

PIMA can write the output in three different formats. The format is controlled by the keyword MKDB.OUTPUT_TYPE. Value TEXT instructs PIMA to generate a plain ascii output file with total group delays, total phase delay rates and many other quantities, one line per observation. Value AMPL instructs PIMA to generate a plain ascii output file fringe amplitude, fringe phase, Tsys, gain, uv baseline projections and other parameters are written in a plain ascii table, one line per used observation. Results in AMPL format are mainly for non-imaging flux density analysis. Value GVF instructs PIMA to generates a database in binary GVF format. VLBI analysis program Post-Solve for geodesy and absolute astrometry accepts GVF as input. Therefore, task mkdb provides an interface between PIMA and Post-Solve.

Keyword MKDB.OUTPUT_NAME controls the name of the output file. Its meaning depends on MKDB.OUTPUT_TYPE. If the output type is TEXT or AMPL, then the value of MKDB.OUTPUT_NAME is the file name. If the output type is GVF, then the value of MKDB.OUTPUT_NAME is the database suffix — a character in lower case. The database name is yyyymmdd_s where yyyy is the year, mm is the month number with heading zero, dd is the day of the month with heading zero and is the suffix specified by MKDB.OUTPUT_NAME. Suffixes a-e are reserved to imported databases converted from MARK3-DBH or vgosdb formats to gvf. Suffixes x-z are resolved for tests.

In order to generate the output in GVF format, PIMA requires some additional information that it cannot find in FITS-IDI files with visibilities. This information is supplied in an the experiment description file. The name of that file is the value of keyword MKDB.DESC_FILE. If the value of MKDB.DESC_FILE is NO, then no information that is supposed to be defined in the experiment description is exported to the output database.

PIMA can put in the output database two frequency bands in the dual-band experiment. The fringe fitting procedure runs twice for a dual-band experiment. The second band can either occupy a range of IFs or a frequency group. PIMA requires to have two control files for the lower and upper bands. Task mkdb should use the control file with the upper band. The file for the upper band of dual-band data should have a reference to the control file file for the lower band. The name of the control file for the lower band is specified in the keyword MKDB.2ND_BAND.

Post-Solve has two slots for group delays, phase delay rates and other quantities: the 1st and the 2nd. Post-Solve assumes the first slot is the the upper frequency and the second one is for the lower frequency but it does not check.

Keyword MKDB.2ND_BAND should have value NO for single band experiment and for the control file for fringe fitting the lower band. It should have the name of the control file for the lower band inside the control file for the upper band of a dual-band experiment.

PIMA needs to know where to write the output database in GVF format. Keyword MKDB.VCAT_CONFIG specifies the configuration file for VLBI database catalogue. That configuration file defines two directories: directory for ascii envelope wrappers (keyword GVF_ENV_DIR) and directory for binary files (keyword GVF_DB_DIR). The same configuration files is supposed to be used by Post-Solve. Post-Solve assumes that configuration file has name $SAVE_DIR/vcat.conf . Therefore, in order Post-Solve to find the database crested by PIMA, the control file should specify $SAVE_DIR/vcat.conf with environment variable SAVE_DIR expanded.

Database file in GVF format can be read with GVH library. The GVH package has routine gvf_transform that can transform from binary representation of the database to the ascii and back to binary form. The ascii representation of a database is human readable. Moreover, Post-Solve can work with both ascii and binary representation, binary representation being more than one order of magnitude faster. Transformation to an ascii representation can be useful for simple editing such as replacement of source name or station name.

Post-Solve classifies observations as good, bad, but recoverable, and bad unrecoverable. The latter category is not visible and post-Solve does not allow to recover them. The common reasons for that: 1) SNR less than the limit specified in the keyword FRIB.SNR_DETECTION of the PIMA control file; 2) lack of phase calibration for a given observation when pcal is anything else than NO; failure in fringe fitting, f.e. because there are less than 3 valid accumulation periods. Since PIMA does not determine whether an observation is really detected or not, if FRIB.SNR_DETECTION is too high, there is a chance that Solve will miss good observations. For this reason, it is recommended to set a rather low FRIB.SNR_DETECTION parameters, f.e. 5.0, and then set the SNR limit in Post-Solve but hitting J. Solve sets the SNR limit temporarily. The observations below the limit are considered unrecoverable until the limit is changed. If the snr limit is lowered, but not below the limit used by PIMA, the observation from bad and unrecoverable becomes bad and recoverable. Reducing the SNR limit in Solve below the limit used by PIMA does not have effect.

A suggested strategy is to set a low FRIB.SNR_DETECTION, 5.0, and after loading the database into post-Solve set higher SNR limit in Solve, 5.8–6.0. After cleaning the database for outliers with SNR limit 5.8–6.0 the limit is lowered to the value used by PIMA. Such a strategy avoids the problem of contamination the dataset with too many outliers which may cause difficulties in initial analysis, since too many outliers, say more than 10% may significantly skew residuals.

Split and export data for imaging

PIMA has a capability to format its results in the form that is suitable for both absolute astrometry/geodesy or imaging. In the latter case PIMA coherently averages the visibilities over time and frequency and applies all necessary calibrations and re-normalizations.

If a given observation, given IF does not system temperature or antenna gain, or have Tsys out of range [10, 10000]K or have zero gain, the visibility for such an observations, such an IF is discarded. Therefore, calibration for system temperature and antenna gain must be performed before running task splt. In contrast, the data without amplitude calibration are still usable for absolute astrometry/geodesy. Flagging data for the time intervals when the antennas were off-sources is essential for deriving a good-quality image. Flagging can also be performed before mkdb, although usually it has only a marginal effect.

Import gain curves

As of 2016, only the NRAO generates visibility data that are fully compliant with FITS-IDI specifications and contain gain curves. Data generated by other correlators do not have this information, and therefore, the gain curves should be imported. Import gain curves is beneficial for processing VLBA data, since the gain curves embedded in the database may not be the best one.

PIMA task prga (PRint GAin) prints gain for each station, each IF. It is recommended to inspect these values.

PIMA supports two formats of gain information: VLBA gain and EVN gain files. The VLBA gain format allows to specify different gain for time ranges, and therefore, this format is preferable.

Gain file specifies gain at the reference elevation, the so-called Degrees Per Flux Unit (DPFU) factor in Jy/K for R and L polarizations for a certain frequency range and a set of coefficients that specify the polynomial that describes the dependence of gain with elevation — that is why it is called "gain curve". If the elevation dependence is not known, than the polynomial has only one coefficient for degree 0: 1.0

PIMA task gean allows to import gain curves. It requires either qualifier vlba_gain or evn_gain. The value of the qualifier is the file name with gains. When the firsts qualifier is vlba_gain PIMA requires the second qualifier gain_band that specifies the band. Supported bands are class="val">90cm, class="val">50cm, class="val">21cm, class="val">18cm, class="val">13cm, class="val">13cmsx, class="val">6cm, class="val">7ghz, class="val">4cm, class="val">4cmsx, class="val">2cm, class="val">1cm, class="val">24ghz, class="val">7mm, class="val">3mm. NB: PIMA does not check whether the band is supported.

PIMA wrapper pf.py does this task as well. It assumes to find files vlba.gains and ivs.gains in directory specified by configuration parameter --stable-share. pf.py wrapper tries both gain files.

Task gean does not report an error, if it does not find a gain for the specific station, specific frequency range, specific time interval. If it does not find gain, the gain is set to zero. If the gain is zero for a given station, given IF, PIMA task splt will not export visibilities for a given station. It is recommended to inspect gain values by running PIMA task prga after importing gains in order to be sure the are correct. Wrapper pf.py runs prga automatically at the end. The results an be found in the log file of task gain.

Flagging visibilities with low amplitude at the beginning or end of a scan.

Data acquisition system often record before and/or after the actual scan time. The field system is supposed to record time stamp of nominal start and nominal stop time and the correlator is supposed to flag accumulation periods that were recorded when the antennas were off-source. However, it may happen that visibility data for a given time have intervals when the antenna were off-source. Usually, this happens at the beginning of the scan. This may happen because the fields system software incorrectly determine on/off time, or did not propagate it to the correlator, or the antenna was off while the fields system software reported the antenna was on. Propagation such "data" poses a serious problem for imaging. Such visibilities should be flagged during imaging stage. If left unflagged they distort an image. PIMA has task onof that analyzes the data, determines accumulation periods with the fringe amplitude at the beginning and/or the end of a scan with fringe amplitude below the threshold and flag them out. This task should run before splt.

PIMA supports two mechanisms for flagging visibility. When PIMA loads the data, checks all visibilities for inconsistencies, such as lack of autocorrelation, wrong source indices, duplicates, etc. It puts visibility indices in a separate file and bars them from loading. These visibilities are considered unrecoverable. PIMA supports keyword TIME_FLAG_FILE that defines a so-called time epoch flag file. The file flag file consists of records in plains ascii that defines the visibility to be flagged. A record consists of four words separated by one or more blanks. The first word is the observation index, the second word is the index of the accumulation period within that observation, and the third word is a flag. The flag is multiplied by the visibility. Flag 0 means the visibility will not be used for further processing. Lines that start with # are considered as comments and discarded by PIMA.

Task onof uses this mechanism to flag out bad accumulation periods. It determines accumulation periods with low amplitude and write their indices and observation indices into the time epoch file. Other tasks, such as frib, splt read this file and flag out visibilities that have corresponding indices.

Task onof does not require qualifiers. Its behavior is determined by a number of keywords of the control file. If Keyword ONOF.GEN_FLAGS_MODE is CREATE, PIMA will ignore the previous contents of the flag file and overwrite it. If ONOF.GEN_FLAGS_MODE is UPDATE, then PIMA will honor input of the flag file specified by the keyword TIME_FLAG_FILE and update it. In this mode PIMA will never reduce the number of flagged visibilities, but it can only increase them.

In order to determine accumulation periods at the beginning or the end of the scan that have to flags, PIMA needs to get a hint which interval to consider as "good". Two keywords, ONOF.KERNEL_START_SHARE and ONOF.KERNEL_END_SHARE determine the so-called kernel interval. The value of these keywords are the offsets or the kernel start and stop time as a share of the total nominal scan length. The share runs from 0 to 1. instance, ONOF.KERNEL_START_SHARE: 0.25, ONOF.KERNEL_END_SHARE: 0.80 specifies the kernel interval that starts at 0.25*scan_length and ends at 0.80*scan_length. However, the length of the kernel interval is limited by the value of ONOF.COHERENT_INTERVAL (in seconds). If parameters ONOF.KERNEL_START_SHARE and ONOF.KERNEL_END_SHARE specify the interval longer than ONOF.COHERENT_INTERVAL, then ONOF.KERNEL_END_SHARE is reduced in such a way that the kernel interval will be close, but not exceeding ONOF.COHERENT_INTERVAL.

PIMA first computes coherently averaged complex visibilities over the kernel interval and then tries visibilities coherently averaged over frequency over and accumulation periods backwards from the start of the kernel interval and forward from the end of the kernel interval. PIMA computes the frequency averaged visibility amplitudes, computes the ratio of the visibility amplitude over the trial accumulation to the amplitude over the kernel interval and tries two criteria: a) if the ratio is less than (1-k*σa), where k is the value of the keyword ONOF.NSIG_THRESHOLD and σa, then the accumulation periods is marked as a candidate for exclusion; b) if the ratio is less than ONOF.AMPL_THRESHOLD, then the accumulation periods is marked as a candidate for exclusion. If ONOF.AMPL_THRESHOLD is zero then the first criterion is disabled. If ONOF.AMPL_THRESHOLD is zero, then the second criterion is disabled.

If PIMA finds k consecutive candidates, where k is the value of keyword ONOF.MIN_LOW_AP, then it flags them and all consecutive visibilities at the beginning or the end of the scan.

Criterion ONOF.AMPL_THRESHOLD is suitable for observations with with high SNR and long accumulation periods. Visibility amplitude computed over one accumulation period has a large scatter and the fluctuations caused by noise may be mistakenly considered as the source being off source.

Criterion ONOF.NSIG_THRESHOLD is suitable for both low SNR and high SNR observations. Value 3 was found satisfactory for most of cases. Task onof is not able to find time interval when antennas were off-source for observations with low SNR, say less than 10, because amplitude fluctuations due to random noise become too large to be distinguished from antennas being off-source.

Running task splt for splitting and exporting data for imaging

Using results of fringe fitting, PIMA performs coherent averaging over time and frequency after rotation phases according to group delays and phase delay rates, applies calibration for system temperature, gain curves, bandpass re-normalization, combines all visibilities of a given source and writes averaged visibilities and their weights into output binary files in FITS format that are suitable for imaging with AIPS or DIFMAP.

Task splt processes the data on source basis. A user can specify the source name that will be processed or to request to process all the sources in a cycle. Keyword SPLT.SOU_NAME controls this behavior. Its value can be either B-name, or J-name or ALL, which means to process all the sources.

Keyword SPLT.FRQ_MSEG specifies the number of spectral channels to be coherently averaged out. A usual choice for processing legacy data is to specify the number of spectral channels in an individual IF. In that cases all spectral channels will be averaged out. PIMA does not average spectral channels across IF boundaries. That means that the maximum value of SPLT.FRQ_MSEG is limited to the number of channels in an IF. Starting from 2014, observations at 512 MHz band became more and more common. Averaging across 512 MHz band may result to image smearing. Therefore, it may appear beneficial to decrease the number of spectral channels that will be averaged. Task splt averages SPLT.FRQ_MSEG spectral channels in one output IF. If SPLT.FRQ_MSEG is less than the number spectral channels in one IF, then the output dataset will have more output IFs than the input dataset.

Keyword SPLT.TIM_MSEG specifies the number of accumulation periods to be coherently averaged out. The interval of time that with SPLT.TIM_MSEG accumulation periods is called segment. A usual choice is to select segment duration 10–20 seconds. DIFMAP allows to averaged data further increasing segment length (DIFMAP task uvaver), but after the data have been averaged, there is no way to undo averaging. In a case when a very large map will be made, SPLT.TIM_MSEG may be reduced in order to avoid image smearing.

One output visibility is a result of coherent averaging over SPLT.FRQ_MSEG*SPLT.TIM_MSEG input visibilities. It should be noted that if SPLT.FRQ_MSEG is not an integer divisor of the number of spectral channels and SPLT.TIM_MSEG is not an integer divisor of the total number of accumulation periods of an observation, the number of used input visibilities can be less at the end of the observation or at the upper part of the spectrum can be less.

Phases of input visibilities are rotated before averaging according to phase delay rate and group delays that are found during fringe fitting. PIMA reads results of fringe fitting from the file specified by the keyword FRINGE_FILE.

Since PIMA performs fringe fitting for each observation independently, in general, fringe reference time is different. As a result, the misclosure of the raw phases from the contribution of group delay and phase delay rates is not zero. If SPLT.STA_BASED: YES or ALL, then PIMA performs a procedure that converts baseline-dependent group delays and phase delay rates to station-based that automatically have zero misclosure and therefore, applying phase rotation from the results of fringe fitting does not change misclosure in original visibilities. PIMA performs this conversion at each scan. It may happen that observations of a given scan have to be split into several subarrays. A subarray is a set of observations that has common baselines with every station within a subarray. For example a station array ABCDRFG may have to be split into two subarrays ABCD and EFG if there are no usable observations at baselines AE,AF,AG, BE,BF,BG,CE,CF,CG,DE,DF,DG . If in this example, there are no usable data at baseline FG, there will be three subarrays: ABCF, EF, EG. PIMA splits the data into subarrays for processing each scan. If a subarray for a given station was already used in the previous scans, PIMA assigns the observations to that subarray. In a case if a new subarray has all the stations that were in one of the previous subarrays, PIMA assigns observations to that subarray. In a case is a new subarray has all the stations that were in one of the previous arrays, plus one or more new station, PIMA extends that subarray, and assigns the observations to that subarray.

In a case if all scans were scheduled at all antennas and fringes were detected at all observations and no observations were excluded, there will be only one subarray. If one of these conditions is violated, PIMA may end up with many subarrays. Since splitting data in many subarrays reduces the number of phase and amplitude misclosures, in general it is undesirable to have many subarrays. PIMA supports a procedure subarray consolidation controlled by the keyword SPLT.SUBARRY_CONSOLIDATION. Value NO means to disable subarray consolidation. Value MIN instructs PIMA to preform minimal subarray consolidation: if all stations of subarray A are present in the subarray B, then the subarray B is consolidated with subarray A. Value MAX instructs to preform maximum subarray consolidation: if a subarray has at least one common station with subarray B, both subarrays are consolidated.

In order to compute station-dependent group delay, phase delay, and group delay correctly, baseline-dependent group delay, phase delay, and group delay should be correct. Fringe fitting provides a wrong result for a non-detection, or an observation affected by RFI. Group delays of non-detections have a uniform distribution over the fringe search window, which is several orders of magnitude larger than the scatter of group delays for normal observations. Therefore, a care should taken in order to block using bad group delays by task splt. It is recommended to process the data with VTD/Post-Solve in order to identify outliers and exclude them as input to the procedure for computing station-based group delay, phase delay and group delay rate using keyword EXCLUDE_OBS_FILE. If SPLT.STA_BASED: YES is used, the observations excluded for computing station-based quantities will remain excluded from being further processed and written in the the output file. However, in general, this approach is too restrictive. When SPLT.STA_BASED: ALL is specified, the filters specified by kewords FRIB.SNR_DETECTION, EXCLUDE_OBS_FILE, INCLUDE_OBS_FILE, and OBS is applied only to the input data of the procedure for computing station-based quantities. All observations between the stations for which station-dependent group delays, phase delay, and group delays are computed are used for further processing, regardless whether they passed the input filter or not. Though if there were no observations at baselines at certain stations, there will be no station-based quatnties for these stations, and therfore, no for the observations with these stations will be used for generating the output. Value ALL is recommended.

Alternatively, when SPLT.STA_BASED: NO, PIMA does not convert baseline-dependent quantities to station-based. Important: phase misclosure is distorted when this option is used. This option is useful when the data are to be processed on a baseline basis.

There are several options to generate the output in a case of dual-polarization data. Keyword SPLT.POLAR specifies which polarizations to put in the output: ALL for all polarizations that present in the data, PAR only for RR or LL polarizations. Other supported values: I, RR, RL, LR, and LL.

PIMA computes averaged visibilities and their weights. The algorithm for weights computation is controlled by keyword SPLT.WEIGHT_TYPE. According to FITS specifications, weight is defined as reciprocal to the fringe amplitude variance. Value ONE forces PIMA to set all weights to 1. Value OBS_SNR instructs PIMA to compute weights on the basis of signal to noise ratio SNR. The segment weight is Ampl/SNR**2, where SNR is the signal to noise ratio over visibilities of a given segment. The segment SNR computed from the SNR over all visibilities used in fringe fitting and scaled by square root of the the ratio of visibilities in the segment to the total number is used visibilities in the observation. When SPLT.WEIGHT_TYPE is OBS_RMS, PIMA computes variance of the fringe amplitude over the observation and assigns weights for all segments reciprocal to this estimate of variance. When SPLT.WEIGHT_TYPE is SEG_RMS, PIMA computes variance of fringe amplitude over visibilities of a given segment and assigns weights reciprocal to this variance.

Method SEG_RMS is the preferable, since it accounts for temporal variation of the variance. However, it requires a sufficient number of segments for computing meaningful variance. When SPLT.WEIGHT_TYPE is AUTO, PIMA uses different ways to use the weight depending on the number of accumulation periods in a given segments. If the number of accumulation periods per segment specified in the keyword SPLT.TIM_MSEG is equal or greater than the threshold (currently 8), the variance is computed over visibilities of a given segment. Otherwise, the variance will be computed over all visibilities of the observation (equivalent to OBS_RMS). SPLT.WEIGHT_TYPE: AUTO is recommended for a general case.

When computing calibrated amplitude PIMA applies two renormalizations unless a user disables it. When SPLT.AUTOCORR_NRML_METHOD: AVERAGED, PIMA normalizes system temperature for masking autocorrelation. The system temperature is measured by integrating the total power over entire IF. When a portion of the bandwidth where Tsys was computed is masked out, the total power is changed. In general, the spectrum of noise is not constant over the band, it is proportional to the autocorrelation. PIMA normalizes autocorrelation to have the average equal to 1 over the nominal IF width. When SPLT.AUTOCORR_NRML_METHOD: AVERAGED, PIMA computes the mean autocorrelation over the used portion of the bandwidth within each IF, which in general is not 1. Then fringe amplitude is divided by the mean autocorrelation. SPLT.AUTOCORR_NRML_METHOD: NO disables applying this renormalization. It is recommended to use SPLT.AUTOCORR_NRML_METHOD: AVERAGED.

When SPLT.BPASS_NRML_METHOD: WEIGHTED, PIMA divides the fringe amplitude by the square root of the product of the square root of bandpass renormalization factor. The representative bandwidth of the intermediate frequency used for re-normalization is specified by the keyword SPLT.BPASS_NRML_RANGE. The value of this keyword is two numbers from 0 to 1 separated by the colon. These number specify the lower and the high part of the representative bandwidth as a share of the total bandwidth. Example: SPLT.BPASS_NRML_RANGE: 0.25:0.80. They define the representative bandwidth as [F_low + Bl*Fw, F_low*Bh*fw]. PIMA computes renormalization factor R = (sum Br/Nr ) / Sum Bt/Nt, where Br — bandpass in the bandwidth is [F_low + Bl*Fw, F_low*Bh*fw], Nr — the number of points in that bandwidth; Bt bandpass in the total bandwidth, Nt the total number of points in the entire IF. Factor R is multiplied by every point of the bandpass and makes its normalized over the representative bandwidth [F_low + Bl*Fw, F_low*Bh*fw]. Usually R > 1.0 For example, if the IF bandwidth is 16 MHz and SPLT.BPASS_NRML_RANGE: 0.25:0.80, then the representative portion of the bandwidth used for renormalization starts at 0.25*16=4.0 MHz and ends at 0.8p*16=12.8 MHz. Thus, the portion [4.0, 12.8] MHz of the total bandwidth [0, 16] MHz is considered representative and the bandpass is normalized to be 1 over the representative portion of the bandwidth. The share of the representative bandwidth depend on quality of hardware. 0.25:0.80 is a good choice for most of the cases.

Compute gain correction

It is rather common from some stations to have some IFs with gain to be wrong by a certain factor. This may be due to unaccounted change in gain curve, or due to systematic error in Tsys. During imaging process gain can be adjusted using amplitude closure. This procedure is called amplitude self-calibration. However, a source should be relatively bright and UV coverage should be rather dense for amplitude self-calibration to produce good results. If the gain is off by a factor that is constant over entire experiment, the gain correction determined from imaging one source can be used as a priori for imaging other sources. This is just that PIMA task gaco (GAin COrrection) does.

PIMA supports keyword SPLT.GAIN_CORR_FILE that specifies so-called gain correction file. This file in plain ascii format defines factors for each station and each IF by which fringe is multiplied when task splt runs.

These factors can be assigned manually or automatically. Task gaco can run in two modes: manual and automatic. In manual mode PIMA expects a qualifier init with the value of the a priori factor. Value 1.0 or 0.0 are usual choices. Task gaco with qualifier init sets all gains to the initial value. Gain correction 1.0 means no correction. Gain correction 0.0 means all IFs all stations should be deselected.

Task gaco writes the gain correction file. If task gaco was invoked in init mode, the gain correction should be edited in order to be useful. Example: station KP-VLBA had fringe amplitude a factor of 8–10 lower in IFs 3 and 4, and a user would like to get rid of them for imaging purposes. Than PIMA task gaco with qualifier init and value 1.0 is called. After that the user edits the gain correction file that the task created ad changes gain correction for KP-VLBA IFs 3 and 4 from 1.0 to 0.0.

In order to to compute gain corrections in the automatic mode, several images should be made first and self-calibrated visibilities be saved. Then PIMA analyzes the ratio of original amplitudes before amplitude self-calibration and after amplitude self-calibration and determines their ratios for each station and each IF using least squares. These gain corrections are equivalent to the factors should by task CORPLT of DIFMAP package. In order to do it, PIMA should find visibilities before and after imaging. PIMA supports convention that a file with visibilities before imaging have name JJJJJJJJJJ_B_uva.fits, where JJJJJJJJJJ is the 10-character long J2000 source name and B is the band defined in the keyword BAND. PIMA task splt created files with averaged visibilities in this format. PIMA expects files with self-calibrated visibilities after imaging to have names in the form of JJJJJJJJJJ_B_uvs.fits.

When used in the automatic mode, PIMA task gaco expects two qualifiers, sou and dir, the first is mandatory and the second is optional. The value of the first qualifier is a comma-separated source list. The value of the second optional qualifier specifies the directory where files with original and self-calibrated visibilities can be found (they should be in the same directory). If the second qualifier is omitted, PIMA will search for visibilities in the same directory where task splt put them: SSSSS/EEE_uvs, where SSSSS is the PIMA scratch directory specified by the keyword EXPER_DIR and EEE is the experiment name specified by the keyword SESS_CODE.

PIMA task gaco used in the automatic mode computes the gain correction file. If a given station observed no sources from the list, the gain correction for that station is set to 1.0. A user may edit the gain correction file, for instance setting zeros for IFs for certain station(s). Setting the gain correction to zero will effectively flag out these IFs for imaging purposes, while these IFs are still available for other tasks, for example, fringe fitting.

PIMA task splt uses gain file, unless SPLT.GAIN_CORR_FILE: NO. It multiplies the calibrated visibility by the product of gain corrections of both stations of a baseline. It writes the used gain corrections into output FITS file in two places: 1) as an ascii table in the HISTORY records of the main table, 2) as a new table in GACO. It is possible to run task taco the second time. PIMA searches for gain correction in the FITS file with calibrated visibilities, applies these corrections as a priori and writes updated total gain correction with respect to a case when no gain correction is applied. Thus, if to run gaco task more than once, the result will be approximately the same.

Use case of preparing the data suitable for imaging

The imaging analysis and absolute astrometry/geodesy pipeline has a common beginning: loading the data; parsing log files, checking logs, checking phase calibration, cleaning phase calibration for spurious tones, checking autocorrelation, checking Tsys; running coarse fringe fitting, computation of the bandpass, running fine fringe fitting. After that point the pipelines diverge. Next task will be selecting good reference sources and running task splt for them. A good reference source is strong, observed at all baselines and has relatively simple structure.

It is first recommended to run splt task for several reference sources with DEBUG_LEVEL: 2 or 3 and with FRIB.SNR_DETECTION: 6.0. It is recommended to investigate the splt log file. Search there for lines with PIMA_SPLT_FITSTA SOU:, for instance with using grep. This line provides the statistics for a subarray. (Remember: a scan may have one or more subarrays that can be later consolidated into one subarray). An analyst should examine the column followed by MaxDev_Gr_Del. This column provides the maximum residual of transforming baseline-dependent group delay to the station-based. Typical value of this residual for a detected source is 50–300 ps. A source with complicated structure may have residual 1 – 2 ns. But a residual, say 1000 ns, indicates that at least one observation in the subarray was a non-detection. Even one non-detection can spoil entire dataset for a given source to a level of uselessness. If you see large maximum residual in the subarray statistics, just look in the preceding lines that start with PIMA_SPLT_FITSTA Sou:. Identify sources with large residuals (say more than 12 &ndash 3 ns), identify their observation indices that can be found in the column followed by OBS:, and add these observation indices to the exclude file. Then run PIMA task splt with specifying this file in keyword EXCLUDE_OBS_FILE once again. Then inspect the log file again.

After inspection shows no subarray with large residuals, the selected reference sources are imaged using phase and amplitude self-calibration. If for some reason an image of one of the reference sources is not satisfactory, another that bad source should be replaced with another source.

When good images were produced for all reference sources, PIMA task gaco is invoked with qualifier sou with the comma-separated list of reference sources. After that the gain correction file is inspected. If some IFs at some stations are to be masked out, corresponding values of the gain correction file are replaced with zeroes.

After that task splt runs over entire dataset by specifying SPLT.SOU_NAME: ALL. A care must be taken to use only detected observations. There are two approaches for cleaning the dataset for non-detections. The first approach is to raise the SNR detection limit defined by keyword FRIB.SNR_DETECTION. Depending on the search window, the detection limit is 5.3–6.0, the wider the window the higher the limit. Another approach is to run full absolute astrometry/geodesy pipeline and exclude those observations that are flagged out by Solve. Solve will flag non-detections and other "bad" observations, such as those affected by RFI, low fringe rate problem, etc. Extraction of the list of observations that is performed by program gvf_db that is a part of Solve package. Usage:

    gvh_db database_name mode
where mode is either 10 for a processing a single-band experiment or the upper band of a dual-band experiment and 20 is for processing lower band of a dual-band experiment. The advantage of the second approach is that detections as weak as 4.8 can be used since Solve will eliminated non-detections and other bad detections. Re-fringing allows to recover weak detections. The disadvantage is that Solve should be installed, and running astrometry analysis requires extra efforts.

After running splt over the entire dataset, the splt log should be examined the same way as we did when we processed reference sources.

Task splt creates calibrated visibilities of all the sources, except those that have too few detections and put then in directory SSSSS/EEE_uvs. Calibrated visibilities are used for imaging with DIFMAP, AIPS or another software. Result of imaging are two files per source: a file with self-calibrated visibilities in FITS format and image in FITS-format. The image contains two tables: a set of Clean components and the binary image that was generated from the table of Clean components.

Automatic imaging

In principle, totally automatic imaging is feasible, but development such a system would require an order of magnitude more efforts than it was infested in PIMA. PIMA provides partially automatic imaging capability. In the framework of the approach implemented in PIMA, a user runs fringe fitting, runs astrometry/geodesy solution with Solve, runs task onof, runs splt for 2–4 reference sources, produces their images manually, runs task gaco, and then runs wrapper script pf.py task map. Task map of pf.py wrapper invokes PIMA task gain, splt, calls DIFMAP in a batch mode, generates images of all sources, and creates pictures of all imaged sources in gif format. pf.py scripts puts calibrated visibilities, images, self-calibrated visibilities, images, pictures of source images, and pictures of scan-averaged flux densities as a function of baseline lengths into directory SSSSS/EEE_uvs, were SSSSS is the PIMA scratch directory specified in the keyword EXPER_DIR and EEE is the experiment name specified in the keyword SESS_CODE. PIMA generates the for each source with enough usable data the following files: where B is the band specified in keyword BAND,

Quality of automatically generated images not always satisfactory. Most common reasons: a) not all visibilities when the antennas were off are filtered out; b) visibilities at some IFs require significant corrections, say more than 50%, c) non-detection visibilities used by splt; d) the source is larger than the default field of view. Therefore, automatically generated images require scrutinizing. Recommended approach: an analyst using utility gqvew screens pictures of images and self-calibrated visibilities and selects those sources which images look suspicious. These selected sources are imaged manually. It is recommended to adhere the same name conventions of output files that PIMA uses.

Re-fringe the data using results of astrometry/geodesy solution

In general, an iteration PIMA → Solve → PIMA is required. Interactive Solve is used for a) setting up parameterization; b) outliers elimination; c) re-weighting; d) setting up constraints. Results of Solve are written in the database during operation "database update" (Cntrl/U). Observation suppression status is kept in the database. This information can be extracted and used for excluding suppressed observations by task splt. Sole suppresses observations with residuals greater than some limit, typically 3–4 σ. There are several reasons why an observation may have a large residual: a) deficiency in the theoretical model; b) low fringe rate that results in PIMA selecting correlation between phase-calibration signal; c) PIMA picking up a local maximum in the Fourier transform of visibilities. If the residual is caused by deficiency in the ionosphere or atmosphere model, such an observation is "bad" for astrometry/geodesy, but good for imaging. If the a priori source position used by PIMA had a large error, say more than 0.5"–1", a quadratic term in residual fringe phase appears, group delay is biased, and the SNR is reduced. This problem can be alleviated if fringe fitting is repeated with corrected a priori source position. PIMA checks the difference between the source position used by the correlator and the a priori used by PIMA that it takes from the supplied catalogue. If the differences exceeds a certain threshold PIMA computes phase correction that compensate the quadratic term. This usually fixes the problem. NB: the dataset should be reloaded in order to change in the a priori source catalogue to take effects. Similarly, if a source position used by the correlator was correct, but the source position in the PIMA catalogue is wrong, f.e. due to a typo or wrong source association, PIMA will apply wrong correction and may spoil data that are good otherwise.

If an observation is suppressed because PIMA picked wrong maximum in the Fourier transform of visibilities, it is possible to correct it. We can predict group delay rather precisely after Solve solution: several nanoseconds at 2 GHz and several hundreds picoseconds at 8 GHz and higher. Therefore, we can guide PIMA where to search for the maximum and fringe affected data once more. This procedure is called re-fringing.

Solve has a special mode that prints residuals and a priori delay computed by VTD. To turn this mode, hit key A at the "Last page" menu and rewind spool file (Find "menu 1" set (C)hange Spooling current: on and hit key ; to rewind the spool file with solution listing, i.e. to purge its previous contents. After that just run LSQ solution by hitting key Q, scroll the listing by hitting blank key two times and leave Solve by hitting key T. Then copy the spool file into the experiment directory under name EEE_B_init.spl, where EEE is the experiment name and B is band. Using information in the residual file, one can construct command line for PIMA with modified keywords FRIB.DELAY_WINDOW_CENTER, FRIB.RATE_WINDOW_CENTER, FRIB.DELAY_WINDOW_WIDTH, FRIB.RATE_WINDOW_WIDTH in such a way that PIMA will search for the maximum within 1–3 ns of the group delay predicted on the basis of Solve solution. Program samb that is a part of Solve package does this for you.

Usage:

 samb -p {pima_control_file} -w {window_semi_width_in_nsec} -s {snr_min) 
      -r {residual_file} -o {output_file}
Parameter pima_control_file is the name of PIMA control file. Parameter window_semi_width_in_nsec is new window for group delay search. Recommended value is 5 times the wrms of residuals. Parameter snr_min is the new SNR limit. Recommended value 4.8. Parameters residual_file is the fike with residuals generated by Solve. Finally, parameter output_file specifies the name of the output command file.

Program samb analyzes the file with Solve residuals, finds outliers marked by character > or R in the 8th column, computes the expected residual group delay delay with respect to the a priori model used by the correlator, which in general is different than a priori models used by VTD plus adjustments found by Solve, and uses this value for FRIB.DELAY_WINDOW_CENTER argument for PIMA command line for re-fringing.

We need to save Solve solution before running PIMA by hitting CNTRL/U key. Then we execute the command line generated by PIMA. Re-fringing may or may not find correct maximum in the Fourier transform of visibility data. The control file generated by samb writes the results in file VVVVV/EEE/EEE_B_refri.fri and residuals in VVVVV/EEE/EEE_B_refri.frr, where VVVVV is the PIMA scratch directory, EEE is the experiment name specified in the keyword SESS_CODE of the PIMA control file, and B is the band name.

Next step is to extract records in the output fringe output and fringe residual files generated by the command file created by samb, and to add those that have SNR greater than the limit specified by samb command to the end the main fringe results and fringe residual files. Remember, process fringe results file consecutively. If there is more than one record corresponding to the same observation, the latest record overrides the previous record(s).

Next step is to create a GVF database using updated files with fringe results and fringe residuals with PIMA task mkdb. There is a caveat. When we updated the database, it stores auto suppression and user suppression flags. These flags store the status of observations before re-fringing. Re-fringing may change observation status: an observation that was considered non-detection in the first fringing, may become detected in the re-fringing. Solve does not allow to change status of an observation marked as non-detection. Program gvf_supr_promote solves this problem. It updates flags "not detected". If an observation was not detected in the first fringing, but detected during re-fringing, the flag "not detected" is cleared and flag "suppressed" is set. Program gvf_supr_promote is a part of Solve. It accepts the full database name, including and extension .env, as an argument. Alternatively, the same operation can be performed with wrapper pu.py. Wrapper pu.py has two arguments: experiment name and band.

Operations samb, PIMA, and update of suppression flags can be performed with wrapper pr.py. Wrapper pr.py requires as the first argument the experiment name, as the second argument low case band, as the third argument the SNR limit. Depending on band, pr.py will select delay window semi-width. The delay semi-width can be overridden with optional argument -delwin. Optional flag -nodb causes the wrapper to update only fringe results without creation of a database. This option is necessary when a low band of a dual-band experiment is re-fringed.

After the database is updated, it should be processed with Solve once more. Some observations that were previously suppressed as outliers (ideally all) can be restored. Observations that were considered as non-detections and therefore were considered as unrecoverable if detected during re-fringing appears as "bad", but recoverable. Solve program ELIM in restoration mode should be executed and the database be updated. Usually, there is no need to make a next iteration, unless an error has been made that should be corrected.

Data analysis pipeline

The recommended pipeline consists of three steps: 1) fringe fitting; 2) astrometry/geodesy; 3) imaging. Step astrometry/geodesy can be used for imaging analysis or can be skipped.

Fringe fitting pipeline

  1. Create PIMA configuration file for the experiment.
  2. Load FITS-IDI data. Command: pf.py EEE B load.
  3. Parse log files. Not needed for processing VLBA data after 2014. Command: pf.py EEE B logs.
  4. Load calibration information into PIMA internal data structure. Not needed for processing VLBA data after 2014. Command: pf.py EEE B gean.
  5. If all phase calibration tones are extracted, then examine phase-cal with PIMA task mppl. Create phase calibration mask definition file. Transform the mask definition file to the phase cal mask with using task pmge. Check phase cal tones for all the stations. If needed, update mask definition file and iterate the process till plots of phase calibration show satisfactory results. If 1 or 2 tones are IF are extracted, check phase calibration with task pcpl. If phase calibration for some stations is too noisy, disable phase cal for these stations with task gean.
  6. Rug coarse fringe fitting. Command: pf.py EEE B coarse.
  7. Run bandpass generation in the inspection mode. Command pf.py exp band bpas -insp To examine cross and auto spectrum. To create bandpass mask definition file. Transform mask definition file into the bandpass mask file with command bmge. Run command pf.py EEE B bpas -insp. again and check that masking bad auto- and cross- correlation channels fixed the problem. Update the band definition file if needed and iterate.
  8. Run bandpass generation in the non-interactive mode. Examine the log. Check observations that are marked as outliers. If needed, update the mask definition file, disable phase calibration, and repeat.
  9. Run fine fringe fitting in fine mode. Command: pf.py EEE B bpas fine.

Pipeline for astrometry/geodesy data analysis

This pipeline runs after the fringe fitting pipeline.
  1. Export the results of fringe fitting into GVF database.
  2. Run astrometry/geodesy solution using VTD/Post-Solve.
  3. Suppress outliers that include but not limited to non-detections using Solve.
  4. Store the GVF database using Solve.
  5. Store the full residual file
  6. Run Solve program samb that generates the control file that calls PIMA for re-fringing suppressed observations; run re-fringe with PIMA; select the observations with SNR above the detection threshold and append fringe results for these observations to the end of fringe file; create a new GVF database version 1 using PIMA task mkdb; propagate auto suppression status to the database. All these steps are executed by wrapper pr.py.
  7. Run VTD/Post-Solve solution once again. Suppress outliers and restore good observations marked as outliers.

Imaging pipeline

It is recommended to run imaging pipeline after fringe fitting pipeline and astrometry/geodesy pipeline. The latter pipeline allows you to effectively filter out non-detections and corrupted observations, f.e. observations where the fringe fitting algorithm found the maximum that corresponds to correlation of phase calibration signal. Though it is possible to skip astrometry/geodesy pipeline, but in that case you need to screen observations for non-detections or observations' where fringe fitting failed. Setting a higher SNR limit in a range 6.0–6.5 seems prudent. NB: even one non-detection that slipped into the dataset may severely distort an image.
  1. Import gain table.
  2. Run PIMA task onof.
  3. Extract indices of suppressed observations in VTD/Post-Solve solution in an ascii to use this file as value of the keyword EXCLUDE_OBS_FILE.
  4. Import antenna gain.
  5. Select 2–5 strong sources that were observed at all baselines. Run task splt for these sources. Check splt logs. If there are observations with large residual group delays, add their indices to the file pointed by keyword EXCLUDE_OBS_FILE, and repeat task splt one more time.
  6. Image selected sources.
  7. Run task gaco for imaged sources. Examine gain correction file and edit it, if necessary.
  8. Run task splt for all the sources. Check splt logs. If there are observations with large residual group delays, add their indices to the file pointed by keyword EXCLUDE_OBS_FILE, and repeat task splt one more time.
  9. Image all the sources. You may want to copy to another directory results of imaging strong calibrator that you run before, since PIMA will overwrite them otherwise.
  10. Create pictures of the brightness distribution of the imaged sources.
The three last three steps are performed by wrapper pf.py EEE B map.

Processing dual-band observations

Dual-band observations are processed separately. Though in some cases it is possible to run fringe fitting over a very wide bandwidth (several GHz), in that case this would be called wide-band fringe fitting. Depending on the correlator setup dual-band data can be be put in one frequency group, f.e. VLBA S/X observations or observations at remote wings of VLBA C-band receivers or be put into two different groups. If the frequency layout is not known, the following parameters should be set FRQ_GRP: 1, BEG_FRQ: 1, END_FRQ: 1 before loading the experiment in PIMA. After that, a user should examine the frequency file SSSSS/EEE.frq created by PIMA task load and create two PIMA control files for the upper and lower bands. The upper band is considered primary band and the low band is considered secondary band. Keywords BAND, FRQ_GRP, BEG_FRQ, END_FRQ should define frequency names frequency indices within the frequency band. The control file of the primary (upper) frequency band should define the name of the PIMA control file for the secondary (lower) frequency band in the keyword MKDB.2ND_BAND. The value of this keyword should be NO in the control file for the lower band.

PIMA task load, gean, pmge, bmge are band-independent; other tasks depends on the band an should be executed with the appropriate control file. All operations in PIMA pipeline, except tasks load, gean, pmge, bmge, and mkdb are performed two times: first for lower band and for upper band. They can run concurrently. Bandpass and phase calibration mask files are common for both bands. Task mkdb should be run for the upper band only. When PIMA finds value of MKDB.2ND_BAND that is the PIMA control file for the lower band, it computes the total observables of two control files and puts them in appropriate slots of GVF database. NB: PIMA does not check which band is upper and which band is lower frequency — an analyst should define it. If to run PIMA task mkdb with the control file for the lower band, PIMA will create the GVF with total observables only for that band. For historical reasons Post-Solve always marks the upper band as "X" and lower band as "S" regardless the frequency range.

Tasks coarse fringe fitting, bandpass generation, and fine fringe fitting is executed two times, for lower and upper band. Task mkdb is executed once for the upper band control file only. The GVF database created in the dual-band mode contains the data for both bands. Using VTD/Post-Solve two bands are processed consecutively, first the lower band marked as "S" (Data type "GS"), then the upper band marked as "X" (Data type "GX"). Two files with residuals are created: the upper band and for the lower band. Then wrapper pr.py is executed for both bands: first for the lower band and then for the upper band. Option -nodb should be used with the wrapper for the lower band. This option prevents creation of a database for the lower band, since such a database would not have the data for the upper band. After wrapper pr.py for the lower band is completed, wrapper pr.py for the upper band is executed. During next VTD/Post-Solve iteration lower band and upper band data are re-analyzed. After that a liner combination of the upper and lower band data (Data type: "G_GXS") are analyzed.

Imaging the upper and lower bands is done separately. Gain control files should be separate for upper and lower bands.

Although GVF format allows to support up to 8 bands, as of 2016.05.05, VTD/Post-Solve supports only two bands. An experiment with more than two frequency bands is processed similarly as a dual-band experiment except MKDB.2ND_BAND keyword that should be NO. In such case the keyword MKDB.OUTPUT_NAME that defines the database suffix should be different for each band. Otherwise, PIMA task mkdb will overwrite a database for a different band.

Auxiliary tools

PIMA provides a number of tools for examining the data.

Antenna log processing tool

Program log_to_antab processes input log files generated by software Field System and writes results in PIMA Antab format.
         Usage: log_to_antab mode log_file antab_file  [year]
There are three mandatory arguments:

Tools for examining data in FITS-IDI format

When you receive the data from the experiment that you intend to analyze, you first need to examine the data. NB: PIMA processes data only in
FITS-IDI format. PIMA provides several utilities that are useful for an initial data check.

Tools for manipulation with data in FITS image format

There is a number of tools for processing image data in FITS image format. NB: these tools will work with the data generated by PIMA, AIPS, and DIFMAP. They may or may not work with data generated by other programs. There are two level specifications: FITS and contents definition. FITS format defines only the data structure at the lower level. This information is not sufficient to parse arbitrary FITS-file without knowledge of contents definition specifications.

There are four contents definitions formats that PIMA deals with:

  1. FITS-IDI — format for correlator output. PIMA reads this data when executes task load. Data in FITS-IDI format contain visibilities and a lot of information that describes the experiment.
  2. FITS-UVA — (UV Averaged) format for calibrated visibilities averaged over time and frequency. PIMA task splt writes the data in this format. FITS-UVA data contains visibilities, frequency table and information about experiment (name, date, etc). It may or may not contain gain a correction table. PIMA adheres file naming convention JJJJJJJJJJ_B_uva.fits for FITS averaged visibility data, where JJJJJJJJJJ is a 10 character long J2000 source name and B is the upper case band name.
  3. FITS-UVS — (UV Self-calibrated) format is the same as FITS-UVA, but it contains self-calibrated visibilities after imaging. Visibility amplitudes and phases are corrected by the imaging process. PIMA adheres file naming convention JJJJJJJJJJ_B_uvs.fits for self-calibrated visibility data.
  4. FITS-MAP — format is for storing images. It contains several tables: Clean component table, frequency table and the gridded image. It also contains some auxiliary information: source name, source position, experiment date, beam size, etc. NB: gridded image is derived from Clean components. It does not bring additional information and is included to facilitate visualization. General purpose FITS viewers will show image in FITS-MAP format.
The following tools are provided:

 


This web page was prepared by Leonid Petrov ()
Last update: 2017.03.22_23:06:14