
Re: Diffraction Issues ...




This is a comment on item 4 from John Quintana's original post

> 4) Data Formats/Transfer:  This is a subject where one is preaching to
>           the converted, but without doubt, the community is crying for
>           some kind of data format standard.  In fact, it has reached the
>           point where a 'bad standard' is better than no standard at all.
>           I perceive that the problem is one of implementation rather than
>           specification.  However, starting with:
>
>           Specification:
>                1) Definitions: We have the International Tables
>                and the IUCr for the basic definitions.  We should use them
>                (at least in the data file) and data should be stored in
>                terms of these definitions.  If the IUCr definitions are not
>                used, then annotations to what is used must be stored in
>                each data file, and should not be so obscure as to confuse
>                the user.
>                2) Utilities:  Utilities must be provided to convert the
>                data to a flat ascii file, and to some other 'standard
>                formats'.   A library of utilities will be created since
>                conversion utilities will be created on a case by case
>                basis for different formats.
>                3) Platforms: UNIX, VAX, PC, Mac, you name it....
>                4) Physical Format: Unimportant provided that the format
>                is binary and routines exist to access the data via keyword
>                etc... on any platform.
>                5) Specification should include definitions for standard set
>                of experiments such as Crystallography, Reciprocal Space
>                Volumes, Powder Diffraction, DAFS.
>                6) Intelligence: Format should include keys for what the
>                relevant independent and dependent variables are so that
>                plotting programs can key off of them to give the user a
>                plot which makes sense.
>                7) Stored data can include: Raw Data, Cooked Data (e.g.
>                converted to absolute units with deadtime etc... taken out),
>                or Parboiled (e.g. only deadtime correction done).
>                8) Format and Utilities must be in the Public Domain
>
>            Implementation:
>                This is always the hard part (converting all those great
>                ideas into code).   The cost of implementation can be
>                greatly reduced if an existing general file specification
>                is used (e.g. HDF) Jon Tischler has some ideas on this.
>
>KC> Unless there are insurmountable problems with applying HDF (CDF, netCDF,
>KC> etc.) to our anticipated experimental data, I don't see how we could
>KC> possibly opt for starting from scratch and expect to do any better.  A
>KC> 'bad' standard that uses up development resources and isn't any better
>KC> than what already exists is as bad as (worse than?) no standard at all.

Yes, I have been trying to come up with an implementation and I have looked
at HDF, netCDF, ISO-8211 (Data Descriptive File for Information
Interchange), and FITS.

The choice was not clear to me until I found out that the newest version of
HDF (3.3r3) now supports the netCDF model within its Scientific Data Sets.
The currently available documentation does not say this (I got hold of some
very preliminary new documentation from NCSA).  With the joining of these
two models (HDF and netCDF), the new HDF libraries look like a winner.

The netCDF standard can store all types of experimental data and has
official methods for labeling each piece of data with the important things
one needs to know (such as the units).  Unfortunately, the pure netCDF
format cannot group data into scans or entries; in netCDF, each file holds
only a single scan or image, so every scan would have to be stored in a
separate file.  HDF, on the other hand, has a great ability to group and
organize data through the use of Vgroups.  With the merging of the two
standards, it becomes easy to organize data in the fashion that users
expect and to include the information needed to plot and analyze the data
at a later time.
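To make the distinction concrete, here is a minimal sketch (all names
invented for illustration) in which plain Python dictionaries stand in for
HDF Vgroups and netCDF variables.  It is not real HDF or netCDF code, just
a picture of the two grouping models:

```python
# Pure netCDF model: a file is one flat set of variables -- effectively
# one scan per file.
netcdf_style_file = {
    "two_theta": [10.0, 10.5, 11.0],   # independent variable
    "counts":    [120, 340, 95],       # dependent variable
}

# HDF-with-Vgroups model: one file holding many named scans, each a
# group ("Vgroup") of related data sets.
hdf_style_file = {
    "scan_001": {"two_theta": [10.0, 10.5, 11.0],
                 "counts":    [120, 340, 95]},
    "scan_002": {"energy": [8970.0, 8980.0, 8990.0],
                 "counts": [15, 2200, 1800]},
}

# A browsing utility can list the scans in one file by group name.
print(sorted(hdf_style_file))
```

In the pure netCDF model the outer level of naming simply does not exist,
which is why each scan would need its own file.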

I have been designing an implementation in HDF 3.3r3 in order to see what
requirements must be imposed to obtain useful data files.  Many of John
Quintana's specifications listed above (2, 3, and 8) are automatically
satisfied by the use of HDF.  I believe that, with the netCDF features now
available in HDF, the other specifications can also be met by using the
Vgroup feature of HDF to organize multiple SDSs.  The remaining work is in
deciding the best method for tagging the stored information.

I will say here that I am trying to make heavy use of attributes (a
feature from netCDF) to identify parts of the data so that plotting and
analysis can be automated.  The presence of a 'units' attribute and the
'udunits.dat' file from netCDF is a wonderful thing for an experimentalist
to see.
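As a sketch of what attribute tagging buys you (again with invented names,
using plain dictionaries in place of netCDF variables and attributes): if
every data set carries 'units' and a descriptive name, a generic plotting
routine can build correct axis labels without knowing anything about the
experiment.

```python
# Hypothetical scan: each variable carries netCDF-style attributes.
scan = {
    "two_theta": {"data": [10.0, 10.5, 11.0],
                  "attributes": {"units": "degrees",
                                 "long_name": "detector angle"}},
    "counts":    {"data": [120, 340, 95],
                  "attributes": {"units": "counts",
                                 "long_name": "detector counts"}},
}

def axis_label(var):
    """Build an axis label purely from the stored attributes."""
    a = scan[var]["attributes"]
    return "%s (%s)" % (a["long_name"], a["units"])

print(axis_label("two_theta"))   # detector angle (degrees)
```

The point is that the file itself, not the plotting program, is the
authority on what the numbers mean.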

With HDF 3.3r3 and a small number of additional standards, it should be
easy to store multiple types of scans (EXAFS, MCA, crystallography,
diffraction, status, CCD, ...) in one data file and to identify the
preferred way to plot and/or analyze each one.  Writing such a standard
data file should be easy to do, and reading and plotting (or analyzing)
such a standard data file should be equally straightforward.
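This is essentially item 6 of the specification above.  A minimal sketch
(invented names; dictionaries standing in for the stored attributes): if
each scan's attributes name its independent and dependent variables, one
generic routine can pick sensible plot axes for any scan type.

```python
# Two hypothetical scans of different types in one file; each names its
# own independent and dependent variables in its attributes.
scans = {
    "exafs_01":  {"attributes": {"independent": "energy",
                                 "dependent":   "absorption"},
                  "energy":     [8970.0, 8980.0, 8990.0],
                  "absorption": [0.1, 1.2, 1.1]},
    "powder_01": {"attributes": {"independent": "two_theta",
                                 "dependent":   "counts"},
                  "two_theta":  [10.0, 10.5],
                  "counts":     [120, 340]},
}

def plot_axes(scan):
    """Return (x, y) arrays keyed off the scan's own attributes."""
    a = scan["attributes"]
    return scan[a["independent"]], scan[a["dependent"]]

# The same routine gives a sensible plot for either scan type.
x, y = plot_axes(scans["powder_01"])
print(x, y)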



Jon Tischler                                        Solid State Division
ORNL, Bldg 3025, MS-6030    internet  zzt@ornl.gov  	TEL  (615) 574-6505
Oak Ridge, TN 37831-6030    bitnet    zzt@ornlstc   	FAX  (615) 574-4143