Memo: Perspectives of VLBI analysis development.

Author: L. Petrov
Last revision: 2002.11.03

Disclaimer: this memo reflects the personal point of view of the author and does not necessarily coincide with the opinions of other members of the Goddard VLBI group.

Abstract
========

The purpose of this memo is to analyze the tendencies of VLBI processing development in the mid-term perspective.

1. Introduction.
================

VLBI, like other space geodetic techniques, is evolving. The project to develop a 1 Gb/sec VLBI recording system, which started in the early 1990s, is coming to completion. The first experiments in 256 Msample/sec mode started in July 2000, and according to Alan Whitney we can expect the first test experiments in 1024 Msample/sec mode between January and July 2003. However, developments in technology can improve results only if developments in analysis match them.

First of all, let us analyze what improvement means. We can consider the result of purposeful activity an improvement if

1) the accuracy of the final results is increased;
2) the precision of the observations becomes better;
3) the data analysis procedure becomes more reliable (the probability of a blunder is reduced);
4) the data analysis procedure becomes more automatic and runs faster;
5) the differences between results obtained by different analysis centers are reduced;
6) the differences between VLBI results and results from other geodetic techniques are reduced.

Of course, it would be nice if all these criteria were satisfied. However, in practice these criteria are to some degree independent and even contradictory. Therefore, we have to assign a weight to each criterion, or select the criterion which is declared the most important and ignore all other considerations. I consider criterion 1 the main objective, criterion 2 subordinate to it, criteria 3-4 somewhat important, and completely ignore criteria 5-6.

2. Overview of development in 1997-2003.
========================================

Development of data analysis includes improvement of observation modeling, improvement of the parameter estimation technique, and investigation of the error sources.

A) Considerable efforts were undertaken in an attempt to improve modeling:

1) refinement of solid Earth tides;
2) refinement of ocean loading;
3) refinement of atmosphere pressure loading;
4) use of mapping functions based on global meteorological assimilation models;
5) hydrology loading;
6) source structure contribution.

Projects 1-2 are finished, projects 3-5 are in the final phase (will be finished within 6 months), and project 6 is at an early phase. All these projects promise only a marginal improvement in the accuracy of source coordinates, site positions and EOP.

B) Improvement in parameter estimation technique:

1) Rigorous algorithm for outlier elimination and additive reweighting;
2) Generalization of the outlier elimination and additive reweighting algorithms for elevation-dependent weights;
3) Investigation of alternative methods of EOP estimation (e.g. direct estimation of an EOP expansion over a set of basis functions, and other approaches).

The procedure of automatic outlier elimination resulted in a significant improvement of precision and accuracy (a simplified sketch of the reweighting step is given below). It was reported that baseline length repeatability improved by 20% with respect to the case when outliers were edited out manually. Unfortunately, the rigid organization of VLBI databases does not allow running the outlier elimination procedure in batch mode. Status: project 1 is completed; projects 2-3 are adjourned due to a shortage of resources.
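To fix the idea of additive reweighting with outlier elimination, a minimal sketch in Python is given below. It is an illustration only, not the algorithm implemented in Solve: the function name, the single session-wide additive noise parameter (instead of, e.g., baseline-dependent ones) and the simple convergence rule are assumptions chosen for compactness, and in the real procedure the residuals are recomputed after re-estimating the parameters at each iteration.

  import numpy as np

  def additive_reweighting(residuals, formal_sigmas, n_sigma=3.0, max_iter=20):
      # residuals     -- postfit residuals of one session (seconds)
      # formal_sigmas -- formal uncertainties of the observables (seconds)
      # An additive variance sigma_add**2 is adjusted until the ratio of the
      # weighted sum of squared residuals to the number of used points
      # (the reduced chi-square) is close to 1; points deviating by more
      # than n_sigma total uncertainties are flagged as outliers.
      residuals = np.asarray(residuals, dtype=float)
      formal_sigmas = np.asarray(formal_sigmas, dtype=float)
      sigma_add = 0.0
      used = np.ones(residuals.size, dtype=bool)
      for _ in range(max_iter):
          sigma_tot = np.sqrt(formal_sigmas**2 + sigma_add**2)
          used = np.abs(residuals) < n_sigma * sigma_tot
          r, s = residuals[used], sigma_tot[used]
          chi2_ndf = np.sum((r / s)**2) / r.size
          if abs(chi2_ndf - 1.0) < 0.01:
              break
          # inflate (or deflate) the additive variance so that chi2/ndf -> 1
          excess_var = (chi2_ndf - 1.0) * np.mean(s**2)
          sigma_add = np.sqrt(max(sigma_add**2 + excess_var, 0.0))
      return sigma_add, used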
C) Automation and software optimization:

1) Implementation of the B3D and B1B3D algorithms;
2) Automation of the data flow: correlator --> data center --> analysis center --> data center;
3) Automation of computing and submission of the EOP in the rapid service mode (OPA and BKG packages);
4) Group delay ambiguity resolution;
5) Automatic processing of log files;
6) Implementation of block algorithms for linear algebra operations (matrix multiplication, matrix inversion, etc.);
7) Use of artificial intelligence technology for solution parameterization.

Projects 1-3 are completed and widely used. Projects 4-5 were completed, but are not widely used, partly due to the conservatism of analysts, partly due to weaknesses of the algorithms, which do not work in 100% of cases. It is rather clear how the algorithms may be improved; it is much less clear how the conservatism can be overcome. Project 6 will be completed within 1-2 months. Project 7 failed and has been abandoned.

D) Improvements in computing normal points:

1) Resolving phase delay ambiguities;
2) Correction of group and phase delays for the presence of spurious signals in the phase calibration (Phase Doctor);
3) Refinement of group and phase delay computation using AP-by-AP information (Fringe Doctor).

Project 1 showed that the precision of Mark-3 technology was not sufficient for resolving phase delay ambiguities with certainty in routine experiments. At the same time, it was shown in numerous examples that phase delay ambiguities can be resolved if a) a high SNR is achieved at both X-band and S-band; b) instrumental errors are at a low level. Project 2 is finished, although Phase Doctor is not used operationally due to difficulties in extracting the system temperature from log files; this problem will be overcome by the end of 2002. Project 3 is at a very early stage.

E) Investigation of instrumental errors:

1) Investigation of the effects of polarization leakage on measurements of group delay;
2) Investigation of the effects of the lack of phase calibration measurements in RDV experiments on the results at ONSALA;
3) Investigation of VLBA versus Mark4 differences;
4) Investigation of Mark3 versus Mark4 differences.

This work is done mainly by correlator personnel. Strange as it may seem, the analysis community does not consider this topic important. Project 1 was adjourned many years ago; project 2 is not yet completed.

3. Problems of VLBI data analysis.
==================================

a) A comparison of 40 sessions recorded in 256 Msample/sec mode with 40 sessions recorded in 56 Msample/sec mode did not show an improvement at the level of 10% or more. How should observations and analysis be organized in order to convert a 4.5-fold increase in the total number of recorded bits into a sizable improvement of accuracy?

b) It has been known for more than 15 years that the formal uncertainties which Calc/Solve produces should be scaled by a factor of 1.3-2.0, despite the use of a rather sophisticated reweighting procedure. The reason has not yet been found.

c) Even a cursory analysis shows that the variations of the adjustments of the clock function have nothing in common with H-maser phase fluctuations, but are a manifestation of systematic errors, presumably of instrumental origin. While clock function variations with characteristic times of more than 3 hours are filtered out by estimating additional parameters, fluctuations at time scales of 1 hour and shorter affect the results as a significant source of noise. "Furry" plots of residuals plus adjustments of the clock function show that this is the rule rather than the exception. What is the origin of these instrumental errors?
How can one reduce them?

d) The VLBI error budget was last investigated and updated in the early 1990s. What are the contributions to group and phase delay in modern 256 Msample/sec observations from thermal noise at the receiver, troposphere noise, troposphere mismodeling, mismodeling of other geophysical parameters, source structure, etc.? We do not know.

e) Unaccounted variations of source flux density and antenna SEFD result in the achieved SNR being within only 50% of the predicted value for a significant number of points, which makes session simulation somewhat of a guessing game. Although the technology of calculating antenna sensitivity and mapping sources has been the subject of intensive and successful development by the astrophysical VLBI community, only a very coarse mapping program is used, and even these results are not completely utilized.

f) The VLBI technique is still not reliable by industrial standards. Data losses of 20% are considered good (would you fly with an airline which routinely cancels every 5th flight?), with a further 2-3% of data discarded during analysis. What is the reason for the data losses? How can one reduce them?

4. How accuracy of VLBI results can be improved.
================================================

The analysis in section 2 showed that the improvement of modeling is approaching saturation. Of course, any model can be improved, but we should not expect that these improvements will significantly affect the accuracy of the results. The recent development of models of atmosphere pressure loading and the isobaric mapping function required gigantic resources (40 Gb of input data, more than 20000 lines of source code for processing), but it is a challenge to show that these refined models provided even a marginal improvement of VLBI results.

Presumably, the most promising activity here is computing the source structure contribution and using it in routine data analysis. However, this task requires even more resources than the use of global meteorological models for loadings and the troposphere mapping function. Thousands of source maps are available. Organizing a database, developing the procedure for computing the source structure contribution, interpolation, extrapolation, and determination of the immovable point in a sequence of maps will require at least several years of systematic work. Preliminary results of O. Sovers give us a hint of the level of improvement we can expect: a reduction of the wrms by several picoseconds in quadrature. Calculation of the source structure contribution can help us to improve the positions of sources, especially sources with significant structure (structure index 3-4), but promises only a very moderate improvement of geodetic results. So far, nobody has proposed any ways of refining the models other than those which are already implemented or in the process of implementation.

Another traditional way to improve results is a refinement of the parameter estimation procedure. The parameter estimation procedure may be changed in two ways: a change in parameterization and a change of the optimization functional. Changes in parameterization may include both the estimation of new exotic parameters, such as antenna axis tilt, antenna bending, non-linear site motion, source proper motion, etc., and modification of existing parameters, for example estimation of a nutation expansion instead of nutation daily offsets, or modeling the clock function by a spline of higher order (see the sketch below).
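As an illustration of the latter kind of parameterization change, the sketch below builds the columns of a least squares design matrix for a station clock modeled as a quadratic polynomial plus a continuous piecewise-linear spline; replacing the linear spline by B-splines of higher order changes only the basis functions. This is a schematic Python example: the function name, the hourly knot interval and the absence of constraints are assumptions made for illustration and do not describe the actual Solve parameterization.

  import numpy as np

  def clock_design_columns(t, knot_step=3600.0):
      # t -- observation epochs in seconds from the session start.
      # Returns the design matrix columns for one station clock:
      # offset, rate, quadratic term, plus linear-spline ("hat") segments
      # with knots every knot_step seconds.
      t = np.asarray(t, dtype=float)
      knots = np.arange(0.0, t.max() + knot_step, knot_step)
      cols = [np.ones_like(t), t, t**2]          # global clock polynomial
      for k in knots:
          # triangular basis function centered on knot k
          cols.append(np.clip(1.0 - np.abs(t - k) / knot_step, 0.0, None))
      return np.column_stack(cols)

  # In practice weak constraints on the spline coefficients are needed,
  # since the spline alone can also represent an offset and a rate and
  # the system would otherwise be nearly rank-deficient.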
Refinements of this kind can remove from the residuals the vestiges of signal which have not been taken out by the estimation procedure. One could expect a significant improvement from pursuing this avenue if the residuals exhibited systematic behavior. We do not see it. The amount of unmodeled signal can be evaluated indirectly by comparing the statistics of the residuals with the statistics of the errors. Unfortunately, today we do not have a good model of Mark-4 VLBI errors.

Modification of the least squares estimator is feasible if we have knowledge about the noise process. If the noise process is "good", for example stationary, and we know some of its characteristics, we can exploit this additional information in building the estimator. But without the development of a refined error model it is not possible to discuss this seriously. It should be noted that sophisticated methods of parameter estimation usually require substantially more computer resources than standard LSQ; therefore, it is unclear how far we can go along this avenue.

The remaining way is to look at the origin of the VLBI observables. Contrary to popular belief, the time delays which are loaded into Solve are not created by the correlator, just as loaves of bread do not grow in a field. They are obtained as the result of a rather sophisticated data analysis procedure which I will call post-correlator data analysis. Traditionally, Solve data analysis and post-correlator data analysis are kept apart. They are done by different teams which rarely communicate and barely understand each other.

One Mark-4 observation is obtained by processing 10-40 Gbits. The correlator reduces this amount to several hundred Kbytes. The post-correlation software analyzes this data set and computes five quantities: single-band delay, multi-band group delay, phase delay rate, multi-band phase, and the amplitude of the coherence function, as well as the estimates of their formal uncertainties, which are further used in Solve analysis as normal points.

Is that procedure perfect? Even a cursory look at the fringe plots shows that it is not. The algorithm of post-correlator analysis has seen little improvement since the beginning of the Mark-3 epoch. It assumes that the phase of the cross-correlation function is affected only by random thermal noise in the receiver and that its standard deviation is reciprocal to the fringe amplitude. This assumption is rather far from reality. The phase as a function of time and frequency within a scan shows rather clear systematic behavior. In many cases it can be easily diagnosed as an unaccounted effect of the ionosphere or troposphere. It is not uncommon for the phase to change by more than one phase turn during the integration time, which severely affects the estimates of the total group delay. These phenomena are known to post-correlator analysts, but nowadays they are out of the scope of Solve analysts. The abundance of unresolved problems, some of which, for example correcting the fringe phase for the ionosphere contribution, are relatively easy, makes this area a very attractive target for analysis improvement.

We have seen that our physical models have reached the level at which their further improvement does not affect VLBI results significantly, and that a refinement of the parameter estimation procedure requires detailed knowledge of the noise process. We believe that there are three main sources of noise in the delays: troposphere noise, measurement noise and instrumental errors.
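To make the role of the per-channel phases more concrete, a minimal sketch is given below of how a multi-band group delay and its formal uncertainty follow from the slope of fringe phase versus frequency, under the assumption criticized above that the phase noise of each channel is purely thermal, i.e. 1/SNR radians. It is not the actual fringing algorithm (which performs a delay and delay-rate search and handles phase-turn ambiguities); the function name and the assumption of already connected channel phases are simplifications for illustration.

  import numpy as np

  def group_delay_from_phases(freqs_hz, phases_rad, snr_per_channel):
      # Estimate the group delay as the slope of fringe phase versus
      # frequency, tau = d(phi) / d(2*pi*f), by a weighted linear fit.
      # Assumes the channel phases are already connected (no 2*pi jumps)
      # and that the phase noise of each channel is 1/SNR radians.
      f = np.asarray(freqs_hz, dtype=float)
      phi = np.asarray(phases_rad, dtype=float)
      w = np.asarray(snr_per_channel, dtype=float)**2    # weights 1/sigma_phi**2
      f_ref = np.average(f, weights=w)                   # weighted mean frequency
      x = 2.0 * np.pi * (f - f_ref)
      tau = np.sum(w * x * (phi - np.average(phi, weights=w))) / np.sum(w * x**2)
      sigma_tau = 1.0 / np.sqrt(np.sum(w * x**2))        # ~ 1/(2*pi*SNR*df_rms)
      return tau, sigma_tau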
Analyzing data at the intra-scan level, i.e. at the level of accumulation periods (AP), promises to improve the estimates of group delays and to help us understand the sources of noise. It promises to solve the above-mentioned problems b, c and d.

5. Evolution of VLBI data analysis.
===================================

We should realize that unless we change our traditional approaches to data analysis, the expectation that an increase in the number of recorded bits will be converted into improved accuracy is baseless. Our efforts to improve VLBI accuracy necessarily require a refinement of the post-correlator procedure. At the same time, there is no need to undo what the correlator analysts do. The fringing software delivers about 1000 values of intermediary quantities, such as amplitudes and phases of the cross-correlation function per AP, per channel, which can be used for a fine computation of delays. These values are currently included in the full experiment output, but they are not put into the database and are not used. So, first we have to learn how to use them.

There are several difficulties. First, we have to overcome a psychological barrier: group delays do not emerge from the correlator, but are the result of a data analysis which starts not from group delays, but from the phases and amplitudes of the cross-correlation function at each accumulation period. Second, the amount of input data increases by three orders of magnitude, and we have to find a way to manage it. The current Mark-3 DBH database handler cannot manipulate the AP-by-AP data, since it does not support a data structure of variable-length arrays. Replacement of the database handler in the analysis software with a modern handler, like the proposed GVH, becomes an imperative. Third, we have to find out how the redundant intra-scan information can be used. Several ways of using intra-scan information are rather obvious:

a) to compute ionosphere-free group and phase delays for each AP or for a segment of APs, or to model the total electron content (TEC) by a spline (a sketch of the ionosphere-free combination is given below);
b) to model the time variation of phase during a scan by some function, for example a linear spline;
c) to apply a reweighting procedure in the evaluation of group and phase delays, and to use the postfit phase residuals as a measure of the formal uncertainty of the group delay instead of the average value of the fringe amplitude (which in turn is proportional to the signal-to-noise ratio (SNR)).

Presumably there are other ways to exploit this information; for example, analysis of the residual phases and amplitudes can in principle allow us to build a model of polarization leakage and to remove (or alleviate) its effect on group delay.

Intra-scan information becomes more valuable the higher the achieved SNR. An increased bit rate can be used in two ways: to increase the total number of scans or to increase the SNR of each individual scan. As simulations showed, the first way is approaching its limit at a sampling rate of 256 Msample/sec, since the relatively slow slewing rates become a limiting factor. Therefore, an upgrade from the 256 Msample/sec to the 1024 Msample/sec rate will increase the total number of scans insignificantly, by about 20%, but will increase the SNR by a factor of 2. Therefore, data analysis technology should be oriented towards exploiting the advantages of high SNR. An increased SNR means that the contribution of random receiver noise to the phase and amplitude for the same integration time is reduced.
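As an illustration of item a) above, the sketch below forms the standard dual-band ionosphere-free combination of group delays, which could be applied AP by AP or to short segments of APs. The function name, the per-AP interface and the nominal X/S effective frequencies are assumptions made for illustration only.

  import numpy as np

  def iono_free_group_delay(tau_x, tau_s, f_x=8.4e9, f_s=2.3e9):
      # The ionospheric contribution to group delay scales as 1/f**2, so
      #   tau_if = (f_x**2 * tau_x - f_s**2 * tau_s) / (f_x**2 - f_s**2)
      # removes it to first order.  tau_x and tau_s may be scalars or
      # arrays of per-AP (or per-segment) X-band and S-band delays.
      tau_x = np.asarray(tau_x, dtype=float)
      tau_s = np.asarray(tau_s, dtype=float)
      return (f_x**2 * tau_x - f_s**2 * tau_s) / (f_x**2 - f_s**2)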
A reduced noise level allows us to investigate unmodeled effects in more detail, to find models for them and, if a model is successful, to convert them into calibrations. Intra-scan data analysis allows us to investigate systematic effects which vary on time scales of minutes, but it is relatively blind to effects with characteristic times of change of ten minutes or longer. As was shown, analysis of the phase calibration amplitudes and phases allows the phase calibration phase to be corrected for the presence of spurious signals. It is desirable that the analysis of phase calibration information be included in the procedure of data analysis and group delay refinement.

Efforts to use intra-scan information and phase-cal phases and amplitudes promise a) to improve the precision of group delay estimation; b) to give more realistic estimates of its uncertainty; c) to help develop a realistic error model; d) to help investigate instrumental errors. Although the improvement of group delay precision is valuable per se, it may also allow us to cross the threshold which hinders phase delay ambiguity resolution. The use of phase delays improves the formal uncertainties of the results roughly by a factor of 3. We should keep this goal in mind in our strategy for developing analysis capabilities.

6. Conclusions.
===============

Increasing the bit rate from 56 Msample/sec to 256 Msample/sec did not result in a noticeable improvement of accuracy. It was shown that in order to exploit the advantages of the hardware upgrade, the analysis concept should be revised. Data analysis should descend to the level of an accumulation period. Progress in other traditional areas, such as improvement of models and refinement of the estimation procedure, is approaching saturation and does not promise a significant improvement in the accuracy of geodetic results. At the same time, data analysis at the intra-scan level is not yet sufficiently explored and promises a breakthrough. The focus of routine analysis should be shifted from problems which are already solved, like automatic outlier elimination, automatic group delay ambiguity resolution and setting the optimal parameterization, to a detailed analysis of intra-scan phase variations and variations of phase-cal phase, and to the investigation of instrumental errors and of ways to reduce their level.