Exploratory Data Analysis for Disease Pedigrees and Cancer Genetics

Data Analysis, Cancer Genetics

We have developed software packages for pedigree analysis (FAST, CASPAR, PedHunter, IIC) and
cancer genetics (oncotrees, METREX). The purpose of this protocol is to allow us to work on
software problem reports that may contain human subjects data. The data should be
coded, by which we mean that if there are any links between identifiers and names, we do not
possess the links to decode the names. Occasionally our assistance leads to a more formal
research collaboration. This protocol seeks to clarify the guidelines under which we can
provide assistance to users of our human genetics software and to establish a formal
procedure under which we can seek IRB approval for the serendipitous collaborations that
arise from providing that assistance. We cannot predict the sizes of samples or the
diseases studied in the data sets sent to us, so most of the medical aspects of this
protocol are necessarily general. We rely on the data being coded and the collectors of the
data having their own institutional approvals to protect against most risks. The scientific
aspects of investigating problem reports cannot be hypothesis-driven because we cannot guess
what problems will arise. On the engineering side, the basic hypotheses are that: 1) our
software is likely to contain some bugs or other weaknesses, which cannot be easily found
except by having others use the software, and 2) a good way to improve the functionality of
the software is to encourage users to submit problem reports and other suggestions.

This protocol has been in effect since early 2002. The only amendments during that time
were to set up three collaborations, as described in Sections 4.6, 4.7, and 4.8. The
protocol has been quite useful, and no changes to the procedures are proposed.

Inclusion Criteria


User Requests:

When we receive data sets for problem reports and/or new feature requests, all data sets
would be accepted. We would not consider how the data were collected. The reasons for this
are: the medical aspects of the data set are irrelevant to the reason we are receiving the
data; we cannot respond promptly to problem reports if we have to get details on how the
data were collected; and the data are never used by us for any research into the traits being
studied by the researchers who collected the data.


For collaborative studies, we would request details of how the data were collected,
including evidence of approval by a local ethics board. We would submit to the NHGRI IRB
an amendment describing the new collaboration. That amendment would necessarily include a
formal indication that the collaborating research group has permission to collect and
analyze the human data that they present to us (in coded and summarized format). For
collaborators in the United States, that permission would consist of an IRB-approved
protocol or exemption from the collaborator's institution. For collaborators outside the
United States, the permission would be one of the types of agreements currently supported
by the NIH Office of Human Subjects Research.

If we do not see evidence of appropriate permission to collect the data, or if the IRB turns
down our proposed amendment, then we would exclude ourselves from participating in the
proposed collaboration.

There are two other circumstances under which we have declined to collaborate on the
analysis of data sets in the past and may do so in the future. One circumstance was where
we did not feel that the proposed data set could possibly give sufficient statistical
power to detect anything interesting. The other circumstance was where we had an existing
collaboration with one research group, and a competing group asked us to collaborate also.

