Motivation
Biomedical statistical data is often published as Excel sheets. These are easy for humans to read, but they cannot be queried efficiently, and they are difficult to integrate with other datasets, which may be in different formats. Our approach is to convert the data into a single data model, RDF. In these biomedical datasets, however, a single statistical value is described along several dimensions, so a simple row-based transformation is not possible. We therefore used the RDF Data Cube vocabulary for the conversion, as it is designed specifically to represent multidimensional statistical data in RDF.
Methodology
Transforming CSV to RDF in a fully automated way is not feasible, because some dimensions may be encoded in the heading or label of a sheet. We therefore introduce a semi-automated approach, implemented as a plug-in for OntoWiki (more information here). Using this plug-in, a CSV file can be converted to RDF using the Data Cube vocabulary. We used the WHO's Global Health Observatory (GHO) dataset as a first use case. It is primarily available as Excel sheets, which we converted to CSV files and then transformed into RDF.
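The core idea, that each data cell becomes one observation described by the dimensions of its row and column, can be sketched in Python. This is only an illustration and not the plug-in's code (the plug-in is an OntoWiki extension): the sample table, its placeholder values, and the function name `cells_to_observations` are assumptions made for this example, and the dimension names are supplied by hand, mirroring the manual configuration step.

```python
import csv
import io

# Hypothetical cross-tab excerpt: rows are countries, columns are diseases,
# and each cell holds a death count. The values are illustrative placeholders.
SAMPLE_CSV = """Country,Tuberculosis,Malaria
Afghanistan,1098,20
Albania,5,0
"""

def cells_to_observations(csv_text, row_dimension, column_dimension, measure):
    """Turn every data cell of a cross-tab into one observation record.

    row_dimension names the first column (e.g. the area),
    column_dimension names the header row (e.g. the disease),
    and measure names the value stored in the cells.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    observations = []
    for row in body:
        row_label = row[0]
        # Pair each cell with the header of the column it sits in.
        for column_label, cell in zip(header[1:], row[1:]):
            if cell.strip() == "":
                continue  # an empty cell carries no observation
            observations.append({
                row_dimension: row_label,
                column_dimension: column_label,
                measure: int(cell),
            })
    return observations

observations = cells_to_observations(SAMPLE_CSV, "refArea", "disease", "death")
for obs in observations:
    print(obs)
```

A two-by-two table yields four records here, one per cell, which is why a plain one-row-per-resource mapping would lose the column dimension.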
The following screenshot displays a GHO table configured for conversion into RDF when imported into the OntoWiki CSV import plug-in:

Results
After converting the GHO data on mortality and burden of disease, classified by country and region, we obtained an RDF dataset containing 3 million triples.
The following is an example of a single statistical item from the GHO dataset, a death value of 1098, represented using the Data Cube vocabulary:
eg:o1 a qb:Observation;
    qb:dataSet eg:dataset-in1;
    eg:refArea Afghanistan;
    eg:contact 3010;
    eg:stat_pop 11605;
    eg:disease Tuberculosis;
    eg:gbdcode W0003;
    eg:death 1098.
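To show how such an observation could be generated programmatically, the sketch below renders one record as Turtle text. It is an illustration under stated assumptions: the `eg:` namespace URI, the helper names `turtle_literal` and `observation_to_turtle`, and the choice to write dimension values as plain string literals are all made up for this example, and the published dataset may model those values as resources instead.

```python
# Prefix declarations. The qb: URI is the Data Cube namespace; the eg: URI
# is a placeholder assumed for this example.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix eg: <http://example.org/gho/> .\n"
)

def turtle_literal(value):
    """Write integers as bare numbers and everything else as a quoted string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def observation_to_turtle(obs_id, dataset_id, properties):
    """Render one qb:Observation with the given property/value pairs."""
    # qb:dataSet is the property spelling defined by the Data Cube vocabulary.
    lines = [f"eg:{obs_id} a qb:Observation ;",
             f"    qb:dataSet eg:{dataset_id} ;"]
    for name, value in properties.items():
        lines.append(f"    eg:{name} {turtle_literal(value)} ;")
    # The last statement ends with a full stop instead of a semicolon.
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)

turtle = observation_to_turtle(
    "o1",
    "dataset-in1",
    {"refArea": "Afghanistan", "disease": "Tuberculosis",
     "gbdcode": "W0003", "death": 1098},
)
print(PREFIXES + "\n" + turtle)
```

Quoting the dimension values keeps the output syntactically valid Turtle; switching to resources would only require emitting prefixed names instead of literals.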
Further Information
- This is a short presentation describing the conversion of CSV files to RDF using SCOVO (Statistical Core Vocabulary) in OntoWiki. SCOVO is a predecessor of the Data Cube vocabulary, and the conversion process is similar for both.
- This is a position paper that was accepted for presentation at the Ontologies in Biomedicine and Life Sciences workshop held in Mannheim (Germany) on September 9-10, 2010.
- This dataset is also part of the LODD datasets. Click here for more details.
Download
To use the plug-in, install OntoWiki via GitHub, then fetch the extension by running «cd ontowiki/extensions» and «git clone git://github.com/AKSW/csvimport.ontowiki.git csvimport». Finally, enable the extension in OntoWiki.
The RDFized GHO files can be downloaded from here. The original data is available here.
Contact
Amrapali J. Zaveri, Johannisgasse 26, Zimmer 5-10, 04103 Leipzig
Information
Last Modification: 2012-05-13 17:06:14 by Amrapali Zaveri