OPENDAP and GDS support for Giovanni
The newest branch of Giovanni supports downloading files that reside on
Opendap and GDS (GrADS Data Server) data servers.
These are notes about implementation issues.
What are Opendap and GDS?
Opendap and
GDS are sets of CGI programs
that reside on a remote http host and enable 1) retrieving meta-information;
2) retrieving a subsection of data. Opendap also provides client software.
Although the authors of opendap believe their client software established
a new paradigm in the data exchange Universe, I am rather skeptical about
its usability.
So, one can consider Opendap and GDS as data subsetters.
Implementation in Giovanni
As of 2009.04.24, Giovanni can read from opendap and gds servers. From
the point of view of Giovanni, opendap and gds are separate protocols,
although technically both use the http data transmission protocol.
Inside Giovanni, URLs of data on an opendap or gds server start with
opendap:// and gds:// respectively, in order to distinguish
them from URLs of files that are fetched directly over ftp and
http without CGI programs.
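The scheme convention above can be illustrated with a minimal Python sketch. It is not Giovanni's actual code: the function name to_transport_url and the example file name are hypothetical, and rewriting the scheme to http:// is an assumption based only on the remark that both protocols travel over http.

```python
from urllib.parse import urlsplit, urlunsplit

# Internal schemes that mark data served through CGI programs.
CGI_SCHEMES = {"opendap", "gds"}


def to_transport_url(url):
    """Return (internal_protocol, url_to_fetch).

    For opendap:// and gds:// URLs the internal protocol name is kept
    and the scheme is rewritten to http (an assumption, see above).
    Any other URL is returned unchanged with protocol None.
    """
    parts = urlsplit(url)
    if parts.scheme in CGI_SCHEMES:
        return parts.scheme, urlunsplit(("http",) + tuple(parts[1:]))
    return None, url


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(to_transport_url("opendap://discette.gsfc.nasa.gov/opendap/2008/example.he5"))
    print(to_transport_url("ftp://example.org/pub/example.hdf"))
```

Keeping the internal protocol name alongside the http URL lets a caller choose the opendap or gds code path while still using an ordinary http client for the transfer.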
opendap inventory record
In the local inventory, the transferProtocol field should be opendap.
Example: ( gdev /tools/gdaac/DEV/cfg/G3/data/TL3ATD_local.002.Inventory )
<dataInventory>
<transferProtocol>opendap</transferProtocol>
<transferHost>discette.gsfc.nasa.gov</transferHost>
<transferDirectory>opendap/2008/TES/TL3ATD.002/YYYY.MM.DD</transferDirectory>
<files>
gds inventory record
In the local inventory, the transferProtocol field should be gds.
Example: ( gdev /tools/gdaac/DEV/cfg/G3/data/CLM10SUBP.002.Inventory )
<dataInventory>
<transferProtocol>gds</transferProtocol>
<transferHost>agdisc.gsfc.nasa.gov</transferHost>
<transferDirectory>dods/</transferDirectory>
<files>
<file>
<fileName>GLDAS_CLM10SUBP_3H.dods[@]</fileName>
<startTime>1979-01-01T00:00:00</startTime>
<endTime>2009-02-05T21:00:00</endTime>
<fileSize>0</fileSize>
<geoBoundingBox>
<south>-60.0</south>
<north>90.0</north>
<west>-180.0</west>
<east>180.0</east>
</geoBoundingBox>
</file>
</files>
</dataInventory>
One can think of GDS as serving a huge file that contains
multi-dimensional arrays with time as the first dimension. Giovanni does
not download such a huge file, but takes only the slice of the array
that corresponds to the requested epoch.
NB: the inventory file should have a VALID endTime.
opendap data descriptor record
The data description file should have two mandatory fields inside the
hdfParameter group, section and fillValue:
<section>/HDFEOS/GRIDS/NadirNightGrid/Data%20Fields/TATM[0:89][0:82][0:14]</section>
<fillValue>-999.0</fillValue>
The parameter name should be followed by the section, specified in the form
[beg1:end1][beg2:end2]... NB: the order of the subsections should be the same
as the order of dimensions in the file description.
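As an illustration of the section syntax, here is a small Python sketch that splits a section string into the parameter name and its index ranges, and computes the resulting array shape. The function names parse_section and section_shape are hypothetical and not taken from Giovanni. The sketch assumes each range is inclusive at both ends, so [0:89] selects 90 elements.

```python
import re

_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def parse_section(section):
    """Split 'name[b1:e1][b2:e2]...' into (name, [(b1, e1), (b2, e2), ...])."""
    first = section.find("[")
    if first < 0:
        raise ValueError("no subsection found in %r" % section)
    name, tail = section[:first], section[first:]
    ranges = [(int(b), int(e)) for b, e in _RANGE.findall(tail)]
    # Reject anything in the tail that is not a well-formed [beg:end] range.
    if "".join("[%d:%d]" % r for r in ranges) != tail:
        raise ValueError("malformed subsection in %r" % section)
    for beg, end in ranges:
        if end < beg:
            raise ValueError("range end precedes its beginning in %r" % section)
    return name, ranges


def section_shape(ranges):
    """Number of elements per dimension, assuming inclusive ranges."""
    return tuple(end - beg + 1 for beg, end in ranges)


if __name__ == "__main__":
    name, ranges = parse_section(
        "/HDFEOS/GRIDS/NadirNightGrid/Data%20Fields/TATM[0:89][0:82][0:14]")
    print(name, ranges, section_shape(ranges))
```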
gds data descriptor record
The data description file should have two mandatory fields inside the
hdfParameter group, section and fillValue. A
dummy time subsection [@] should always be specified as the first subsection.
Example:
<section>avgsurft[@][0:149][0:359]</section>
<fillValue>9.999E20</fillValue>
Giovanni will compute the relevant time index itself and substitute it
for [@].
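The substitution can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual Giovanni code: the names time_index and fill_time_slot are hypothetical, the time index is assumed to be zero-based and counted from the inventory startTime, and the 3-hour step is a guess suggested by "3H" in the example file name.

```python
from datetime import datetime, timedelta


def time_index(start_time, epoch, step):
    """Zero-based index of the time slice that contains `epoch`.

    Assumes slices are evenly spaced by `step` starting at `start_time`.
    """
    offset = epoch - start_time
    if offset < timedelta(0):
        raise ValueError("epoch precedes the start of the data set")
    return offset // step


def fill_time_slot(section, first, last):
    """Replace the dummy [@] subsection with an explicit [first:last] range."""
    if "[@]" not in section:
        raise ValueError("section has no [@] placeholder")
    return section.replace("[@]", "[%d:%d]" % (first, last), 1)


if __name__ == "__main__":
    start = datetime(1979, 1, 1)   # startTime from the inventory example
    step = timedelta(hours=3)      # assumed time step
    first = time_index(start, datetime(1979, 1, 2, 0), step)
    last = time_index(start, datetime(1979, 1, 2, 21), step)
    print(fill_time_slot("avgsurft[@][0:149][0:359]", first, last))
```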
Performance
Neither opendap nor gds supports on-the-fly compression itself, but
if the http server allows on-the-fly compression and
a client requests it, the opendap and/or gds server will send
a compressed data stream, which may significantly reduce download time.
Unfortunately, apache does not enable compression by default. Therefore,
we need to ask the sysadmin of an opendap/gds server to enable honoring
requests for compressed data. This can be achieved in more than one way.
To learn whether a specific http server supports encoding,
run this command:
wget --server-response -O /dev/null http://astrogeo.org |& cat | grep Encoding
If the command prints nothing, then the server is not configured to support
encoding. If the server supports encoding, the command prints this:
Vary: Accept-Encoding
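The same check can be done from Python. The sketch below is an illustration only, with hypothetical function names fetch_headers and advertises_encoding. It sends a request with an Accept-Encoding: gzip header and looks for either a Content-Encoding header or a Vary header that mentions Accept-Encoding in the response.

```python
import urllib.request


def fetch_headers(url, timeout=30):
    """Request `url` asking for gzip and return the response headers as a dict."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


def advertises_encoding(headers):
    """True if the headers suggest the server honors compression requests."""
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    return ("content-encoding" in lowered
            or "accept-encoding" in lowered.get("vary", ""))


if __name__ == "__main__":
    # Needs network access; the URL is the one used in the wget example.
    print(advertises_encoding(fetch_headers("http://astrogeo.org")))
```

Separating the header check from the network call keeps the decision logic testable without contacting a server.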
Notes on implementation
When Giovanni accesses an opendap or gds server, it reads data arrays as a
stream of bytes. It also parses the data description file and creates hdf4
files using information from the data description file and the data arrays
retrieved from the servers. It does not put into the output hdf4 files
all the information that the opendap and/or gds server has. It puts into the
output hdf4 files only the minimum meta-information that is required for
readers to be able to read the file.
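To make the "stream of bytes" step concrete, here is a hedged Python sketch of decoding such a stream and marking fill values. It is not the Giovanni reader: the function names are hypothetical, and the sketch assumes the payload has already been reduced to raw big-endian 32-bit floats. A real server response may carry a header before the array data, which this sketch does not handle.

```python
import struct


def _as_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


def decode_floats(payload, fill_value):
    """Decode big-endian float32 bytes; fill values become None.

    Assumes `payload` holds only array data, with no protocol header.
    """
    if len(payload) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(payload) // 4
    values = struct.unpack(">%df" % count, payload)
    # Compare against the fill value as stored in 32 bits: a value such as
    # 9.999E20 is not exactly representable in float32.
    fill32 = _as_float32(fill_value)
    return [None if value == fill32 else value for value in values]


if __name__ == "__main__":
    sample = struct.pack(">3f", 1.0, -999.0, 2.5)
    print(decode_floats(sample, -999.0))
```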
Giovanni uses the same Python program for reading opendap and gds servers.
The Giovanni opendap/gds reader can optionally compress the output
hdf4 files. As of 2009.04.24, compression is disabled.
Back to Leonid Petrov's discussion page.
Last update: 2009.04.24_16:26:34