PASNP2 proposal: Pan Asian Personal Genome Project (PAPGP)
Summary of the meeting
HUGO Pan Asian Personal Genomics Initiative Proposal for mapping Asian population diversity
HUGO Pan Asian Personal Genomics Initiative consortium
On 15th of March 2010, participants of the Pan Asia SNP (PASNP) consortium held a HUGO meeting session on the next stage of the PASNP project (PASNP 2.0). The main proposal and discussion point was to map Asian population diversity using whole genome sequencing rather than the SNP genotyping chips used in the first phase (2003-2009). The change was motivated by the rapid drop in the cost of sequencing whole human genomes. The critical issues discussed were which experimental approaches should be used, on how many samples, and for what research purposes.
PASNP is a relatively loosely woven consortium, run in a spirit of working together while respecting the technological and cultural gaps among participating groups and people. PASNP 1.0, which mapped genetic diversity, was a huge success in that it overcame numerous hurdles in technology, science, diplomacy, funding, and politics. It gathered over 2000 human samples from more than 71 Asian ethnic groups. The results showed the magnitude of diversity among Asians. They also revealed the southern human migration route linking Africa, India, Southern Asia, and East Asia. However, its drawbacks were the low density of the SNP chips used and the lack of detailed genetic association with phenotypes.
A formal proposal for PASNP 2.0 was presented to open a discussion on the principles, protocols, questions, sample number, and analysis technologies. The proposal aims to add more detailed genetic association information as well as a definitive genomic diversity map for Asians. The meeting was also intended to bring in new participants alongside previous PASNP consortium members.
Major points raised in the session and subsequent meetings were: 1) how to coordinate with other existing large scale genome diversity projects (merging the Pan Arab, Pan Asia, and Pan African projects was suggested), 2) how to design the project so that it sets the right questions, 3) which experimental protocols to choose, and how to choose them as soon as possible so that the project can move on quickly, 4) how much phenotypic/trait information should be collected, 5) how full voluntary consent can be guaranteed from such diverse population groups with heterogeneous political and social backgrounds, 6) how to educate and train researchers and participants in genomics concepts, 7) how to manage the procedural issues, 8) how HUGO and large institutes can facilitate research funding, 9) how fast the project should proceed to remain timely in such a fast moving field, 10) how many samples should be collected to answer which specific questions, and 11) how to invite participation from diverse regional research groups.
The original draft of the PASNP 2.0 proposal suggested using the open, free protocol of the PGP (Personal Genome Project) of George Church at Harvard Medical School. PGP is an open project with very well thought out sampling, phenotyping, analysis, data distribution, and ethics protocols.
The most discussed points were how many samples should be collected and in what fashion. Trio genome samples can reveal the exact genome composition with phase information. However, a trio may be limited to a single family that represents only a small part of an ethnic group. Another common approach is to complement genome sequencing with genotyping chips as a source of diversity information. Low-depth resequencing can even be done at a fraction of the cost of full whole genome sequencing. Yet another approach is to sequence exon regions only to reduce the cost, while increasing the gene diversity covered by using targeted gene sets with possible disease associations.
A related proposal was to use a targeted sequencing approach to study disease associations with thousands of samples in Asia. This way, one can map important genes with clear trait information from the start, making the project practically useful for drug discovery. One objection to this approach is that whole exome sequencing is not much cheaper than whole genome sequencing (WGS), and many people expect that tagged targeted sequencing will become obsolete when the cost of WGS falls to around $1,000 per genome. Even so, such in-depth disease association studies will have to be done in the future in each country for diverse ethnic groups. Also, as George Church pointed out in the follow-up discussion meeting, as long as detailed phenotype information is collected, in-depth disease association can be done later using WGS data.
Regarding the sample number, the consortium chair, Edison Liu, suggested a modular approach so as not to hold up the project. Mapping out all the details of the exact sample number with precise phenotype information may require years rather than months. Therefore, it is wiser to start the project by collecting samples from available ethnic groups and to increase the number gradually in chunks. Some raised the concern that samples collected this way may not represent their ethnic groups sufficiently. A counterargument is that one whole genome contains so much information about its ethnic group that even a small number of WGSs is valuable. If accompanied by low-depth WGSs, this modular approach can be quite practical. An audience member suggested selecting samples from healthy elderly people, as this would raise fewer ethical issues and such samples can give more information on lifetime health. Acquiring useful and necessary trait information is arguably one of the most important aspects of sampling. Some people raised a concern about the difficulty of acquiring extensive phenotype information from subjects, especially those from indigenous ethnic groups.
An interesting development was that researchers from the Arab region showed interest in joining PASNP 2.0. At the discussion meeting on 16th March, researchers from the region turned up to express their interest and ask questions. By joining with them and with the African genome project (H3Africa) led by Charles Rotimi, who presented that project in the session, the PASNP consortium can link Africa, Arabia, and Asia in one international genomics initiative.
The second major point was whether or not the project should set out with clear questions about the research outcomes it aims to achieve. Such a question-driven approach leads to efficient problem solving and to well-defined, meaningful analyses. The alternative is a data-driven approach, in which producing a large number of genome sequences provides researchers all over the world with an important information infrastructure as a foundation for future genomics research in Asia.
One of the most widely agreed issues is the need to educate both researchers and subjects about fast advancing genomics. Next Generation Sequencing (NGS) is still not well known to many researchers in practical terms such as the cost, the most suitable sequencing platforms, the data types, and the research projects to which it applies. Providing training to researchers who are less exposed to NGS will accelerate genetic diversity research in Asia.
On the ethics side of the project, the PGP protocol can be useful for an international genomics project because it makes sure that each subject understands the consequences of donating her/his genetic information. The PGP protocol dictates that every subject must pass an exam on the project and its possible consequences, such as the fact that her/his genome information will be made public and will be used for research in many different ways. The subject must fully and voluntarily consent to the idea that the data are public and that anyone can track her/him down using various publicly available information. George Church cautioned that some participants are not aware that they do not understand what it means to consent to the public use of their genetic information. Also, certain ethnic groups could, many years later, sue the consortium for using the data for purposes other than those they believe they agreed to. The PGP protocol can reduce such ethical and legal issues.
Many participants agreed that the next step of PASNP 2.0 should be practical, and researchers were eager to present the progress of the consortium at the next gathering of its members. A preliminary data presentation on the analysis of some Asian human genome samples is expected before October 2011.
To accelerate the project, members broadly agreed that common procedures and protocols for sequencing and bioinformatics analysis should be determined as soon as possible.
Unlike many large scale international collaborations in genomics that have dedicated project funding secured, the PASNP 2.0 project is based on the contributions of diverse researchers who participate in the consortium voluntarily. Cheap sequencing and the rapid increase in computational analysis capacity are enabling a bottom-up approach to large scale genomics initiatives. PASNP 2.0 can not only provide invaluable whole genome diversity information to science but also suggest a paradigm of international collaboration that accommodates as much diversity as the genomes and cultures it studies.