OPENDAP and GDS support for Giovanni

Downloading files that reside on Opendap and GDS (GrADS Data Server) data servers is supported in the newest branch of Giovanni. These notes describe implementation issues.

What are Opendap and GDS?

Opendap and GDS are sets of CGI programs that reside on a remote http host and enable 1) retrieving meta-information; 2) retrieving a subsection of data. Opendap also provides client software. Although the authors of Opendap believe their client software established a new paradigm in the data exchange universe, I am rather skeptical about its usability.

So, one can consider Opendap and GDS as data subsetters.

Implementation in Giovanni

As of 2009.04.24, Giovanni can read from opendap and gds servers. From Giovanni's point of view, opendap and gds are separate protocols, although technically both use http for data transmission. Inside Giovanni, URLs of data on opendap and gds servers start with opendap:// and gds:// respectively, in order to distinguish them from URLs of files that are fetched directly by ftp or http without CGI programs.
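
The scheme convention can be illustrated with a minimal Python sketch. The helper name to_http_url and the set SUBSETTER_SCHEMES are hypothetical and are not taken from Giovanni's code; the sketch only shows that an opendap:// or gds:// URL keeps its scheme as a protocol label while the request itself goes over http, and that other URLs pass through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes that Giovanni treats as separate protocols but that travel over http.
SUBSETTER_SCHEMES = {"opendap", "gds"}


def to_http_url(url):
    """Return (protocol, request_url) for a Giovanni-style data URL.

    Hypothetical helper: the internal scheme is kept as the protocol label,
    and the URL actually requested from the server uses http.
    """
    parts = urlsplit(url)
    if parts.scheme in SUBSETTER_SCHEMES:
        # Swap only the scheme; host, path, query and fragment are preserved.
        return parts.scheme, urlunsplit(("http",) + tuple(parts[1:]))
    return parts.scheme, url
```

Keeping the label separate from the request URL lets the caller decide whether the response is a plain file or a subsetter reply that needs decoding.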

opendap inventory record

In the local inventory, the transferProtocol field should be opendap. Example ( gdev /tools/gdaac/DEV/cfg/G3/data/TL3ATD_local.002.Inventory ):

<dataInventory>
<transferProtocol>opendap</transferProtocol>
<transferHost>discette.gsfc.nasa.gov</transferHost>
<transferDirectory>opendap/2008/TES/TL3ATD.002/YYYY.MM.DD</transferDirectory>
        <files>

gds inventory record

In the local inventory, the transferProtocol field should be gds. Example ( gdev /tools/gdaac/DEV/cfg/G3/data/CLM10SUBP.002.Inventory ):

<dataInventory>
<transferProtocol>gds</transferProtocol>
<transferHost>agdisc.gsfc.nasa.gov</transferHost>
<transferDirectory>dods/</transferDirectory>
	<files>
		<file>
			<fileName>GLDAS_CLM10SUBP_3H.dods[@]</fileName>
		        <startTime>1979-01-01T00:00:00</startTime>
		        <endTime>2009-02-05T21:00:00</endTime>
			<fileSize>0</fileSize>
			<geoBoundingBox>
				<south>-60.0</south>
				<north>90.0</north>
				<west>-180.0</west>
				<east>180.0</east>
			</geoBoundingBox>
		</file>
	</files>
</dataInventory>
One can think of GDS as operating on one huge file that contains multi-dimensional arrays whose first dimension is time. Giovanni does not download the whole file; it takes only the slice of the array that corresponds to the requested epoch. NB: the inventory file must have a VALID endTime.
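
The note about a valid endTime can be checked mechanically. The following is a minimal sketch, assuming that "valid" means the value is present, parses in the YYYY-MM-DDThh:mm:ss form used above, and is not earlier than startTime; the function name file_time_ranges is hypothetical and not part of Giovanni.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Timestamp layout used by startTime and endTime in the inventory example.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def file_time_ranges(inventory_xml):
    """Yield (fileName, start, end) for each <file> in an inventory record.

    Raises ValueError if a time is missing or unparsable, or if endTime
    is earlier than startTime (an assumed reading of "valid").
    """
    root = ET.fromstring(inventory_xml)
    for entry in root.iter("file"):
        name = entry.findtext("fileName", default="").strip()
        start_text = entry.findtext("startTime")
        end_text = entry.findtext("endTime")
        if not start_text or not end_text:
            raise ValueError(f"{name}: startTime and endTime are required")
        # strptime itself raises ValueError on a malformed timestamp.
        start = datetime.strptime(start_text.strip(), TIME_FORMAT)
        end = datetime.strptime(end_text.strip(), TIME_FORMAT)
        if end < start:
            raise ValueError(f"{name}: endTime precedes startTime")
        yield name, start, end
```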

opendap data descriptor record

The data description file should have two mandatory fields inside the hdfParameter group, section and fillValue:
        <section>/HDFEOS/GRIDS/NadirNightGrid/Data%20Fields/TATM[0:89][0:82][0:14]</section>
        <fillValue>-999.0</fillValue>
The parameter name should be followed by the section, specified in the form [beg1:end1][beg2:end2]... NB: the order of the subsections should be the same as the order of dimensions in the file description.
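
To make the section syntax concrete, here is a minimal Python sketch that splits a section string into the parameter name and its index ranges. The function names parse_section and subset_shape are hypothetical. The element counts assume that the end index is inclusive, which is how OPeNDAP constraint expressions treat [beg:end]. Only numeric ranges are handled.

```python
import re

# One or more well-formed [beg:end] groups, and a pattern to pick them apart.
_ALL_RANGES = re.compile(r"(?:\[\d+:\d+\])+")
_ONE_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def parse_section(section):
    """Split 'NAME[b1:e1][b2:e2]...' into (NAME, [(b1, e1), (b2, e2), ...])."""
    bracket = section.find("[")
    if bracket < 0:
        raise ValueError("section has no [beg:end] ranges")
    name, tail = section[:bracket], section[bracket:]
    if not _ALL_RANGES.fullmatch(tail):
        raise ValueError(f"malformed ranges in {section!r}")
    ranges = [(int(b), int(e)) for b, e in _ONE_RANGE.findall(tail)]
    return name, ranges


def subset_shape(ranges):
    """Number of elements along each dimension, assuming inclusive end indices."""
    return [end - beg + 1 for beg, end in ranges]
```

For the TATM example above, the three ranges correspond to a subset of 90 x 83 x 15 elements.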

gds data descriptor record

The data description file should have two mandatory fields inside the hdfParameter group, section and fillValue. A dummy time subsection [@] should always be specified as the first subsection. Example:
        <section>avgsurft[@][0:149][0:359]</section>
        <fillValue>9.999E20</fillValue>
Giovanni computes the relevant time index itself and substitutes it for [@].
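
The substitution can be sketched as follows. This is not Giovanni's actual code: the function names are hypothetical, and the sketch assumes that time slices are evenly spaced, that indices are zero-based, and that the step is known to the caller (for the GLDAS example, a 3-hour step is inferred from the "3H" in the file name).

```python
from datetime import datetime, timedelta


def time_index(epoch, start, step):
    """Zero-based index of the time slice that contains `epoch`.

    Assumes evenly spaced slices beginning at `start`, `step` apart.
    """
    if epoch < start:
        raise ValueError("epoch precedes the start of the dataset")
    # Floor division of two timedeltas yields an integer.
    return (epoch - start) // step


def fill_time_placeholder(section, first, last):
    """Replace the dummy [@] with an explicit [first:last] time range."""
    if "[@]" not in section:
        raise ValueError("section has no [@] placeholder")
    return section.replace("[@]", f"[{first}:{last}]", 1)
```

With a start time of 1979-01-01T00:00:00 and a 3-hour step, an epoch of 1979-01-02T00:00:00 maps to index 8.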

Performance

Neither opendap nor gds supports on-the-fly compression itself, but if the http server allows on-the-fly compression and a client requests it, the opendap and/or gds server will send a compressed data stream, which may significantly reduce download time.

Unfortunately, apache does not enable compression by default. Therefore, we need to ask a sysadmin of an opendap/gds server to enable honoring requests for compressed data. This can be achieved in more than one way. To learn whether a specific http server supports encoding, run this command:

wget --server-response -O /dev/null http://astrogeo.org |& cat | grep Encoding
If the command prints nothing, then the server is not configured to support encoding. If the server supports encoding, the command prints this:
  Vary: Accept-Encoding
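
The same check can be made from Python. The sketch below is only an illustration with hypothetical function names: it sends a request with an Accept-Encoding: gzip header and looks for either a Vary: Accept-Encoding or a Content-Encoding header in the reply. Treating either header as evidence of compression support is an assumption of the sketch.

```python
import urllib.request


def advertises_compression(headers):
    """True if response headers suggest the server honors Accept-Encoding.

    `headers` is any mapping of header names to values.
    """
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    return ("content-encoding" in lowered
            or "accept-encoding" in lowered.get("vary", ""))


def probe(url, timeout=30):
    """Request `url` with Accept-Encoding: gzip and inspect the reply headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return advertises_compression(dict(response.headers.items()))
```

Calling probe("http://astrogeo.org") performs the network request; advertises_compression can be used on its own with headers obtained by other means.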

Notes on implementation

When Giovanni accesses an opendap or gds server, it reads data arrays as a stream of bytes. It also parses the data description file and creates hdf4 files using information from the data description file and the data arrays retrieved from the server. It does not put into the output hdf4 files all the information that the opendap and/or gds server has; it puts in only the minimum meta-information that is required for readers to be able to read the file.
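
Reading an array from the byte stream might look like the sketch below. It is not Giovanni's reader. It assumes the reply is a DAP2 .dods response: a text header, a line containing "Data:", then the array as two identical big-endian 32-bit element counts followed by big-endian 32-bit floats. The function name first_float32_array is hypothetical, and only the first array in the stream is decoded. Writing the hdf4 file is left out.

```python
import struct

# Separator between the text header and the binary part of a .dods reply.
DATA_MARKER = b"\nData:\n"


def first_float32_array(payload):
    """Decode the first Float32 array from a DAP2 .dods response body."""
    marker = payload.find(DATA_MARKER)
    if marker < 0:
        raise ValueError("no 'Data:' marker in response")
    body = payload[marker + len(DATA_MARKER):]
    # The element count is sent twice as big-endian unsigned 32-bit integers.
    count, repeat = struct.unpack(">II", body[:8])
    if count != repeat:
        raise ValueError("array length fields disagree")
    # The values follow as big-endian IEEE 754 single-precision floats.
    return list(struct.unpack(f">{count}f", body[8:8 + 4 * count]))
```

The flat list would then be reshaped according to the section in the data description file, and values equal to fillValue treated as missing.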

Giovanni uses the same Python program for reading opendap and gds servers.

The Giovanni opendap/gds reader can optionally enable compression in the output hdf4 files. As of 2009.04.24, compression is disabled.


Back to Leonid Petrov's discussion page.

Last update: 2009.04.24_16:26:34