Towards a Cureus Data Repository

Co-authored by John R. Adler & Achim Schweikard

Most scientific articles come with some form of data. For Cureus, our data includes medical images, graphics, drawings and tables. Access to such data enables a reader to verify the content and credibility of an article while also understanding it better. Many types of data, provided they are findable via search, can have value to readers independent of the article with which they were originally connected; medical imagery and genetic sequencing information are cases in point. For example, because modern machine learning methods require large data sets for training, journals are a logical repository for such information. A peer-reviewed article that provides access to its data will therefore often prove much more valuable than the same article without data. Especially for internet-based media with access to large amounts of digital storage, data sharing between authors and readers can and should become standard. For example, instead of showing data for a single representative case, as is typical for traditional journals, an article could make the data for all of its cases available to readers. Readers could then download such enhanced data sets directly in electronic format.

Given the above argument, it would appear highly desirable for Cureus to develop practical methods for structuring data and making it accessible via search. It goes without saying that the additional effort needed on the side of the authors should be minimal while the cost of making data available to others needs to be reasonable. With these objectives in mind, how might the Cureus community go about setting up an article-linked data repository?

Currently, most medical images in Cureus are in JPEG or PNG format. Clearly, our repository should be open to standard medical imaging modalities in common formats. This includes CT, MR, X-ray, ultrasound and other data in DICOM format.

Although all data would be available in electronic format, additional tools would be needed to enable intelligent search. Perhaps the most straightforward way to address this problem would be to add label fields for images in the journal interface, which would let authors tag the image content in various ways. Such label fields could, for example, be modality, anatomy, pathology, entity, etc.
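To make the idea of label fields concrete, here is a minimal sketch in Python of what a single image's labels might look like. The class name, the list of allowed modalities and the example values are all hypothetical; nothing here describes an existing Cureus interface.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary, as a pull-down menu might offer it.
MODALITIES = {"CT", "MR", "X-ray", "ultrasound"}


@dataclass(frozen=True)
class ImageLabels:
    """Author-supplied labels for one image (field names follow the text above)."""
    modality: str
    anatomy: str
    pathology: str
    entity: str

    def __post_init__(self):
        # Reject modalities outside the assumed vocabulary.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")
        # Require every remaining label to be filled in.
        for name in ("anatomy", "pathology", "entity"):
            if not getattr(self, name).strip():
                raise ValueError(f"label field {name!r} must not be empty")


# Illustrative values only.
labels = ImageLabels(modality="CT", anatomy="lung",
                     pathology="nodule", entity="adenocarcinoma")
```

Restricting the modality to a fixed list, rather than accepting free text, is one way to keep labels consistent enough to search across articles.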

Finally, we would need a way to include the data in the peer-review process. Reviewers should be able to check whether the images submitted are adequate in terms of content, space requirements and other conditions. This is necessary, because once published, data sets must remain available without limitations.

Such a repository would require additional effort from the editors and reviewers, from our journal’s programming team and, last but not least, from the authors.

Now, if we decide to go this route, here is what we’d propose as a first step towards our new data repository:

  • We now allow for uploading full DICOM image data sets, and other types of numeric or image data.
  • DICOM image data sets are typically associated with images within the article. While the images remain in their native formats (.jpeg, .png, etc.), the DICOM image sets (3D or even 4D image sets) are made available for download behind the current images.
  • When uploading the data sets, authors must fill in tag fields using pull-down menus. The tags provide information on image modality, anatomic site and entity.
  • If the article is a case report, we allow for uploading image data for the single case in point.
  • If the article is a study, with data for more than one patient, data for all patients can be uploaded.
  • Based on the tags provided by our authors, data sets from all papers across the entire database are included in the journal’s internal search engine. This would allow us to report, for instance, all image data sets available within the database for a specific anatomic site and entity.
  • In addition, data correlations can be made available. This means that, for instance, matching image data and ECG data can be found.
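The tag-based search and the data correlations described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `DataSet` and `TagIndex` names, the identifiers and the example records are hypothetical, and we assume that "matching" data sets are those sharing an article and an anonymized case identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    dataset_id: str  # hypothetical identifier
    article_id: str
    case_id: str     # assumed anonymized per-patient key within an article
    kind: str        # e.g. "image" or "ecg"
    modality: str
    site: str        # anatomic site
    entity: str


class TagIndex:
    """Tiny in-memory index over author-supplied tags."""

    def __init__(self):
        self._all = {}
        self._by_tag = defaultdict(set)
        self._by_case = defaultdict(list)

    def add(self, ds):
        self._all[ds.dataset_id] = ds
        for field in ("modality", "site", "entity"):
            self._by_tag[(field, getattr(ds, field).lower())].add(ds.dataset_id)
        self._by_case[(ds.article_id, ds.case_id)].append(ds.dataset_id)

    def search(self, **tags):
        """Return data sets matching every given tag; empty list if no tags."""
        ids = None
        for field, value in tags.items():
            hits = self._by_tag.get((field, value.lower()), set())
            ids = hits if ids is None else ids & hits
        return [self._all[i] for i in sorted(ids or [])]

    def correlated(self, kind_a, kind_b):
        """Pair data sets of two kinds that belong to the same case."""
        pairs = []
        for ids in self._by_case.values():
            members = [self._all[i] for i in ids]
            for a in members:
                for b in members:
                    if (a.kind == kind_a and b.kind == kind_b
                            and a.dataset_id != b.dataset_id):
                        pairs.append((a.dataset_id, b.dataset_id))
        return pairs


# Illustrative records only.
index = TagIndex()
index.add(DataSet("d1", "a1", "c1", "image", "MR", "heart", "cardiomyopathy"))
index.add(DataSet("d2", "a2", "c1", "image", "CT", "heart", "cardiomyopathy"))
index.add(DataSet("d3", "a1", "c1", "ecg", "ECG", "heart", "cardiomyopathy"))

by_site = index.search(site="heart", entity="cardiomyopathy")
pairs = index.correlated("image", "ecg")
```

Here `by_site` collects data sets from different articles that share an anatomic site and entity, while `pairs` links the image data of one case to the ECG data of the same case. A production search engine would of course need persistent storage and a richer query language.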

We would love to know what the Cureus user community thinks about the above proposal. Leave a comment and let us know!