Genomics and predictive medicine

Progress in understanding the structural and functional organization of the human genome and in deciphering the primary DNA sequence of human cells has given medical genetics previously unattainable capabilities for identifying the causes and mechanisms of inherited and congenital pathology. The integration of genetics into medicine advances together with improvements in molecular genome analysis. Knowledge of the genome and its functions makes it possible to provide a more accurate diagnosis, to predict, to a considerable extent, a person's genetic predisposition to pathology, and to assess the chances of developing a particular disease. This approach became the basis for a new area of medical genetics named predictive medicine. The progress of predictive medicine reflects the rapid development of molecular genetic methods and new capabilities for studying the structure and functions of the genome. Within less than 15 years after the genome was deciphered, medical genetics has travelled a long way from single gene analysis to whole genome studies, from screening of genetic associations


Introduction
This article reviews the history of predictive medicine (PM): its early progress, a brief period of disappointment, its subsequent revival and rapid development, and its near- and long-term prospects.
The review aims to raise awareness among medical professionals, medical students, and specialists working in medical institutions of the need for in-depth knowledge of medical genetics, and to draw their attention to the essential achievements in this relatively new area of molecular medicine.
The deciphering of the human genome in the 2000s predetermined the transition of medicine to the molecular (genomic) level. Genetic tests aimed at finding new methods of diagnosis, prevention, and treatment of hereditary diseases (HDs) and common non-hereditary diseases became widespread. In less than 10 years, over 1,500 genes with mutations responsible for thousands of HDs were identified.
From this perspective, PM became a natural outcome of applying the advances of human genetics to medicine. Given the uniqueness of every person's genome and the realistic possibility of analyzing it at any stage of ontogenesis [6], PM is a priori not only a predictive but also a personalized (individual) and preventive (preemptive) discipline. This consideration clarifies the meaning of its full name: Predictive, Preventive, and Personalized Medicine (PPPM), or 3P-Medicine [7]. Key data on the human genome, mutations, polymorphisms, and HDs are summarized in the monographs [7-10].
In this regard, PM is a field of genomic medicine that focuses on the presymptomatic identification of persons with a high predisposition to HDs and common multifactorial diseases (MFDs) and aims at their prevention, diagnosis, and treatment.

Achievements and disappointments
During the early 2000s, human genome studies made it possible to identify the genes responsible for most HDs and to elucidate rare monogenic forms of common MFDs, including osteoporosis, Parkinson's disease, Alzheimer's disease, certain forms of cardiovascular pathology, and some neuromuscular disorders [1]. At the same time, these studies convincingly demonstrated the insufficient prognostic strength of genetic testing (GT) for MFDs. The missing heritability phenomenon takes its name from these prognostic and informational deficits of GT with regard to MFDs. The causes of this phenomenon sparked a lively discussion in 2009-2013 and have been thoroughly studied (see below) [11][12][13][14].
The conceptual basis and quintessence of PPPM is the genetic passport (GP), an individual DNA database representing a person's unique genetic features and their predisposition to certain genetic disorders and MFDs. At present, the GP only makes it possible to estimate the extent to which a person belongs to the risk group for a particular MFD; it cannot reliably establish whether that person will or will not develop the disease.
It is also important to note that the GP is by no means the complete primary nucleotide sequence of the entire genome, but only those genomic components (fragments) for which a causal relationship with human pathology has been proven. The main causes of this uncertainty are rooted in the complexity of real intergenic and protein-based interactions in the organism and in the difficulty of assessing, without bias, the contribution of epigenetic and exogenous factors to the occurrence of MFDs. The prospects of implementing advanced genomics techniques in PM, and its future as a predecessor of precision medicine, GT, and the current version of the electronic genetic health chart [8], are discussed in the last section of the review. Special attention is given to the analysis of MFDs from the perspective of systems genetics, i.e. the personalized study of MFD pathogenomics at all levels of the unfolding of hereditary information [15].
Therefore, PM has come a long way from single gene testing to whole genome studies, from gene networks and association screening to whole genome sequencing, and from a GP with single gene variations to a genetic health chart based on individual omics studies (genome, transcriptome, and metabolome). Over the past 20 years, PM has not only significantly evolved and successfully endured all the difficulties of establishing and implementing new technologies, but, along with the progress in GT, it has also become an important source of operational information for clinical and preventive medicine.
The history of the emergence and development of PM is closely tied to progress in human genome research. Current PM is the recognized methodology of modern genomic medicine. The clear parallelism between the chronology of advances in genomics and PM is presented in Table 1.
It is worth noting that the concepts of PM and GP came into existence on the Spit of Vasilievsky Island in Saint Petersburg back in the 2000s [7]. Their introduction into genetics has been progressive and continuous [16]. Genetic medicine focusing on genetic/monogenic diseases emerged with the development of molecular methods, whereas the implementation of genomic technologies such as genome-wide association studies (GWAS), next-generation sequencing (NGS), bioinformatics methods, and omics analysis shifted research interests towards multifactorial pathology. The era of genomic medicine and its derivatives, including molecular, predictive, translational, personalized, and precision medicine, began. Information on candidate genes and variants associated with MFDs may be obtained from various databases and international catalogues: OMIM (http://www.omim.org).
The implementation of new technologies (GWAS) for identifying and testing candidate genes associated with MFDs did not increase the effectiveness of predictive testing [18]. The growing number of genes associated with MFDs stimulated the creation of complex gene sets, or SNP panels, which are used in GT for hereditary predisposition.
Despite genetic identity, phenotypic concordance in twin pairs is, with few exceptions (shape of the ears; eye, skin, and hair color), within 70-80%, and within 40-45% for the most common MFDs (diabetes, atherosclerosis, and hypertension). With high probability, this means that the final estimate of familial risk for MFDs will not reach 100% even when all known genetic predisposing factors (alleles) are tested, and cannot exceed the concordance rate of phenotypic traits in identical twins. Therefore, the complex genotype-phenotype interactions during ontogenesis are the main reason for the difficulties in interpreting GT results for MFDs. According to the well-known American geneticist E. Lander, interpreting GT results for MFDs may be likened to the Gordian knot, which could be undone only by cutting it; in the case of PM, this means studying a large number of genomes by NGS in patients with an accurately established diagnosis and detailed results of laboratory and clinical tests [17]. Large collections of biological samples, known as biobanks, are being created to store large amounts of data. It is believed that NGS will make it possible to identify new candidate genes and pathogenetically significant mutations in MFDs and to perform qualitative and quantitative analyses of abnormalities such as copy number variations (CNV). The Patient Centered Outcomes Research Institute (PCORI) in the U.S.A. was created to match patient medical charts containing laboratory test results with personal genome data in order to improve the quality of diagnosis, prevention, and treatment of common diseases. In 2015, the Precision Medicine Program was launched with the participation and official blessing of U.S. President B. Obama. According to Francis Collins, study results obtained on such a large group would provide the proof of concept of precision medicine.
The U.S. projects "All of Us" (https://allofus.nih.gov/news-events) or "Self-Made Medicine" are the ideological continuation of the Precision Medicine Project. They aim to solve the tasks of personalized medicine by searching for genomic and medical data and improving their interpretation. Leading universities are involved in the project, including Baylor College of Medicine, Johns Hopkins University, University of Texas, Washington University, and University of Cambridge. The projects comprise complete health data of volunteers undergoing testing for 59 genes involved in severe HDs.
The European Association for Predictive, Preventive and Personalized Medicine was established in the countries of the European Union in 2015. The society published the program "Personalized Medicine for the European citizen - towards more precise medicine for the diagnosis, treatment and prevention of disease" (2015).
its prognostic value is not high (see below). SNP-based genetic risk calculation is insufficient for predicting complex MFDs, i.e. it does not completely cover the missing heritability. Therefore, it is believed that the search for causal genes of MFDs by GWAS should be complemented by exome sequencing and even whole genome sequencing aimed at identifying rare SNPs [19,20].
With such GT, the accuracy of prognosis increased to 20% for some diseases (prostate cancer) and even to 80% (Crohn's disease). This success is explained by the creation of sophisticated high-density panels that allow simultaneous testing of several hundred predisposition genes or target DNA fragments containing clusters of such genes. The study of predisposition gene panels in search of mutations and disadvantageous polymorphic sites is ongoing.
Panels of MFD predisposition genes are tested in some national laboratories and commercial companies. The gene panels for testing genes related to the detoxification system, cardiovascular pathology, osteoporosis, diabetes mellitus, etc. have undergone significant modifications (Table 2).
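As a rough illustration of the SNP-based genetic risk calculation discussed in this section, the following Python sketch computes an additive risk score as a weighted sum of risk-allele counts. The SNP names, the effect sizes, and the additive log-odds model are assumptions made for illustration only and are not taken from the reviewed studies; in practice, effect sizes would come from GWAS estimates.

```python
import math
from typing import Mapping

# Hypothetical per-SNP effect sizes (log odds ratios); illustrative values only.
EFFECT_SIZES = {
    "snp_a": 0.20,
    "snp_b": 0.10,
    "snp_c": -0.05,
}


def additive_risk_score(genotype: Mapping[str, int],
                        weights: Mapping[str, float] = EFFECT_SIZES) -> float:
    """Return the weighted sum of risk-allele counts over the panel.

    Each genotype value is the number of risk alleles carried (0, 1, or 2).
    SNPs missing from the genotype are treated as 0 copies (an assumption).
    """
    score = 0.0
    for snp, weight in weights.items():
        count = genotype.get(snp, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1, or 2")
        score += weight * count
    return score


def relative_odds(score: float, reference_score: float = 0.0) -> float:
    """Convert a log-odds score into odds relative to a reference score."""
    return math.exp(score - reference_score)


if __name__ == "__main__":
    person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
    s = additive_risk_score(person)
    print(f"score = {s:.2f}, relative odds = {relative_odds(s):.2f}")
```

A score of this kind captures only the additive contribution of the tested common variants. It places a person in a risk group but does not account for gene-gene interactions or epigenetic and environmental factors, which is consistent with the limited prognostic value described above.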

PM and pathogenomics of MFDs
Increasing the number of candidate genes did little to improve GT for inherited predisposition to MFDs. A more promising approach is the comparative analysis of the functional activity of causal genes at different levels (genome, proteome, and metabolome), which makes it possible to understand the dynamics of the pathological process, i.e. the MFD pathogenomics (landscape) [13,22]. No single technology on its own can describe the entire pathological process.
According to the EAOPM program, the PM roadmap includes mass sequencing of genomes as an essential element aimed at elucidating their population, ethnic, social, and even tissue-specific features. Integrative analysis of gene expression and protein-protein interactions makes it possible to form individual omics profiles, which undergo clinical validation, i.e. are compared with the results of clinical and laboratory tests in the same patient. These data make it possible to build integrated gene networks of the damaged organ systems of a patient and to work with them in virtual reality. Patients themselves are therefore both the source of information and the users of PPPM data. The roadmap's focus on working with virtual models of patients rather than with the patients themselves deserves further discussion, as does the term "precision medicine"; "personalized medicine" is a more adequate term.
Analysis of GWAS screening outcomes led to the conclusion that MFDs are not the result of a superposition of disadvantageous allelic variants of many genes. Common polymorphisms of candidate MFD genes are insufficient on their own to explain the onset of MFDs. Indeed, the hypothesis "common diseases - common genes", proposed to explain the missing heritability phenomenon, was rejected. Attempts were made to explain the development of MFDs by rare dominant alleles present in the genome at a frequency below 0.5%, which therefore could not be detected by GWAS. This led to the hypothesis of rare alleles as the causal factors of MFDs (the rare variants - common diseases hypothesis) [24]. However, testing this hypothesis requires a comparative analysis of sequencing results from thousands of genomes of healthy and diseased persons; such studies are currently ongoing in the United Kingdom and in countries of Europe and America. At the same time, theoretical considerations suggest that the rare variant hypothesis is inadequate to explain the etiology of common MFDs [25,26]. According to the hypothesis of S. Hussain, the cause of an MFD is biallelic inactivation of one of the causal genes related to that MFD. It is postulated that the first mutation (recessive allele) is inherited from one of the parents, whereas the other (weak allele) has a somatic origin and emerges in embryogenesis or soon after birth. Epigenetic abnormalities, in particular mutations caused by deamination of methylcytosine to thymine in CpG islands of structural genes, may also play a role. Thus, dominant mutations of structural genes, combinations of unfavorable alleles of several genes of the same metabolic pathway, and homozygous mutations of numerous recessive genes may account for the diversity and high population frequency of various MFDs [26].
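The 0.5% frequency boundary mentioned above for rare alleles can be made concrete with a short Python sketch that computes the minor allele frequency (MAF) of a biallelic variant from genotype counts and flags it as rare. The function names and the example counts are hypothetical; only the 0.5% threshold comes from the text.

```python
RARE_MAF_THRESHOLD = 0.005  # 0.5%, the allele frequency cited in the text


def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Return the minor allele frequency of a biallelic variant.

    Arguments are the numbers of individuals who are homozygous for the
    reference allele, heterozygous, and homozygous for the alternative allele.
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(maf: float, threshold: float = RARE_MAF_THRESHOLD) -> bool:
    """Return True if the allele frequency is below the rare-variant threshold."""
    return maf < threshold


if __name__ == "__main__":
    # Hypothetical cohorts of 1,000 individuals each.
    for counts in [(995, 5, 0), (800, 180, 20)]:
        maf = minor_allele_frequency(*counts)
        print(counts, f"MAF = {maf:.4f}", "rare" if is_rare(maf) else "common")
```

In a cohort of 1,000 people, a variant below this threshold is carried on fewer than ten chromosomes, which illustrates why array-based association studies have little power to detect such alleles and why large-scale sequencing is needed to test the rare variant hypothesis.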
The study of any quantitative trait, as well as of MFD pathogenesis, is impossible without analyzing genome function, i.e. elucidating all stages of the realization of genetic information at the different levels of organization of living matter: molecular (DNA, RNA), biochemical (proteins, functional protein modules, metabolic pathways), cellular, intercellular, etc. [27].
Therefore, an in-depth understanding of MFDs requires a transition from the analysis of individual elements of the pathological process, such as the genome, transcriptome, metabolome, etc., to their integration [22].
The need for natural science to move from analysis to synthesis and from reductionism to holism was justified in 1968 by Karl Ludwig von Bertalanffy, who proposed general systems theory and provided its mathematical foundation. This theory became a foundation of systems biology, an interdisciplinary discipline aimed at studying and modeling complex interrelationships in biological systems from the perspective of the whole (holism/reductionism) [28][29][30]. One of its areas is systems genetics, which studies the unfolding of genetic information and the relationships between the genetic components of metabolic pathways and functional modules in the development of phenotypic traits [22,30]. A search for relationships between causal gene expression and the clinical symptoms of MFDs seems especially promising at the current stage. Refining the structure of genetic networks, their expression and properties, the interactions of causal gene products, and the states of the corresponding metabolic pathways underlies molecular medicine.

PM and systems genetics of MFDs
PM studies essentially result in the integration of the obtained data, their alignment with clinical data, the creation of a bioinformatics model of the disease, and the identification of key trigger mechanisms (molecular drivers), whose use is essential for the prevention and targeted treatment of MFDs [25]. The wide implementation of systems genetics approaches and methods signifies the beginning of a new stage of PM: translational (target) medicine.
A new surge of interest in PM was caused by the rapidly growing effectiveness of whole genome sequencing and the implementation of NGS. Within a short period of time, NGS made it possible to sequence the genomes of 1,500 residents of Europe and to actively pursue the sequencing of millions of genomes of people living in the United Kingdom, North America, and China [22]. It is expected that NGS will solve essential problems in PM, in particular: identify new candidate genes for MFDs and detect new pathogenetically significant mutations; perform qualitative and quantitative analysis of copy number variations (CNV) and assess their contribution to MFD pathogenesis and the individual genetic specificity of MFDs; and create a new genetic classification of diseases.
It is proposed to combine the results of widespread whole genome sequencing with the analysis of causal gene expression in relation to individual clinical data and the results of laboratory tests. Achieving this global goal, which is so important for humanity, implies the following milestones:
- Creation of representative biobanks of DNA and tissue samples from patients with various MFDs
- Identification of candidate genes and determination of MFD genetic profiles (GWAS, exome and genome sequencing)
- Comparative analysis of epigenetic regulation of candidate genes (methylation, spectra of regulatory miRNA, and expression profiles of candidate genes in health and disease)
- Development of computer software and bioinformatics models
- Determination of the main metabolic pathways disrupted in various MFDs
- Search for and identification of MFD biomarkers suitable for early diagnosis, treatment, and unbiased assessment of individual MFD risk
The main strategic principles of PPPM are expected to be proven within the next five years; in ten years, PPPM is expected to enter clinical practice and, possibly, reach the level of personalized medicine.
The Patient Centered Outcomes Research Institute (PCORI) was established in the U.S. in 2013 to align the medical charts of patients with the results of their laboratory tests and personal genome data in order to improve the quality of diagnosis, prevention, and treatment of common diseases. NGS is currently the most important diagnostic method in rare diseases, but its contribution to PM of multifactorial pathology is certain.
Systems genetics opens the true way to overcoming this challenge. Analysis of the expression and epigenetic profiles of causal genes, functional modules, epistasis effects, and gene-gene and protein-protein interactions according to the rules of systems genetics significantly expands the capabilities of MFD diagnostics [19]. Omics technologies and the omics-based integrative approach make it possible to gain in-depth insight into the molecular mechanisms of MFD pathogenomics, identify the main metabolic pathways whose disruption causes disease, and offer personalized treatment. The results of genomic, transcriptomic, and proteomic analysis may provide the basis for electronic health records, the GP, and family charts of reproductive health in the near future [31,32].
The currently available genetic chart of reproductive health (GCRH) [8] is increasingly considered an array of genetic and other laboratory tests aimed at characterizing and predicting reproductive risk in the family. It comprises not only karyotyping of the spouses, but also testing for latent carrier status of over 1,000 autosomal recessive diseases, including familial forms of hypercholesterolemia, hereditary cancers, cardiomyopathies, etc. Testing for the genetic markers of recurrent pregnancy loss, infertility, fetoplacental insufficiency, and thrombophilia is especially significant for the prognosis of reproductive function. To preserve reproductive health in the presence of appropriate clinical indications, GT for hereditary causes of male and female infertility deserves special attention.
Gene panels for selective screening of newborns, spouses, and adults under 25 years are currently under development. Standardization of screening procedure and known gene variant testing is warranted [6].
pathogenomics of typical MFDs such as endometriosis and uterine myoma, in particular, specify the ways of prevention, improve diagnosis, increase the efficacy of personalized treatment, and begin the development of gene therapy methods.
The studies performed convincingly demonstrate that the integrative approach is the most productive and the only unbiased way to understand the pathogenomics of disease. The extent to which the identified causal genes and the products of their expression may be considered biomarkers requires further study. However, one may already conclude that both diseases (endometriosis and uterine myoma) are heterogeneous entities: each comprises clinically similar but pathogenetically diverse diseases, each of which has its own epigenome with the corresponding clinical forms. Specific biomarkers of endometriosis and uterine myoma may be found within these subunits [33].
An essential condition for such a search is the availability of representative groups of relevant patients with an accurate diagnosis. Therefore, organizing DNA biobanks of patients with endometriosis and sequencing their genomes, supplemented by bioinformatic analysis, is a logical and timely measure. Endometriosis is a complex systemic endometrial disease with its own genetic program of development, which regulates and directs the main genetic and epigenetic processes in endometrial cells. The proposed hypothesis suggests that the genetic program of endometriosis development includes at least three critical periods during which the genome of endometrial cells is reprogrammed. Critical period 1 corresponds to the establishment of the female reproductive system; critical period 2 corresponds to the transformation of epithelial cells of the endometrium and peritoneum into mesenchymal stem cells; critical period 3 corresponds to the establishment and development of endometrioid heterotopies [34]. The integrative approach to studying the pathogenomics of external genital endometriosis yielded new data on the genetic and epigenetic factors underlying the onset and progression of the disease. Evidence suggests that endometriosis is not a single nosological entity, but rather an array (bunch) of similar or phenotypically close clinical forms caused by mutations of different genes or by an unfavorable combination of dysfunctional alleles and their epigenetic dysregulation. Each clinical form of endometriosis has its own epigenetic landscape, i.e. specific functional changes in the transcription of causal genes. The equifinality of the pathological process (its final clinical phenotype) is determined by the individual features of the genome and epigenome of affected women.
Therefore, the study of endometriosis pathogenesis based on omics technologies and methods of systems genetics opens prospects for the development of effective methods of prevention, diagnosis, and personalized treatment of endometriosis, a common socially significant disease.
the most common mechanism (80-85%) involves the MED12 gene, whose mutations are associated with increased expression of the early WNT4 gene, which is activated in uterine myometrium stem cells in response to various damaging factors (infections, mechanotransduction). Mutation of the MED12 gene disrupts the metabolic pathways associated with myoblast proliferation and intracellular matrix formation (Wnt/β-catenin, prolactin, and insulin-like growth factor (IGF)). These abnormalities result in the development of multiple moderate-size UM nodes. The second mechanism is characterized by the translocation t(12;14)(q14-q15;q23-q24) and increased activity of the HMGA2 gene, whose product, a polypeptide belonging to the family of DNA-binding non-histone proteins, stimulates cell proliferation and transformation into UM cells. The expression of the HMGA2 gene is activated by hypoxia, xenobiotics, detoxification system abnormalities, and chromosome aberrations. The HMGA2 gene product is the main driver of UM development. HMGA2 overexpression activates the proto-oncogene PLAG1 and the WIF1 gene, an inhibitor of the Wnt/β-catenin pathway. Large solitary UM nodes result from the second epigenetic pathway.
Abundant data point to globally abnormal methylation/demethylation processes in the genome of leiomyoma (LM) cells. This observation indicates a significant contribution of abnormal epigenetic regulation to the pathogenesis of the disease [35]. MicroRNAs play a significant role in epigenetic regulation. Even the first studies of LM showed significant abnormalities in the synthesis profile of regulatory microRNAs, especially let-7, miR-21, miR-93, miR-106b, and miR-200. Many of them (let-7, 200a, 200c, 93, 106b, and 21) regulate proliferation, inflammation, and angiogenesis and control the synthesis of extracellular matrix components and apoptotic processes in LM cells.
The achievements in studying UM pathogenomics open new strategic perspectives for the diagnosis and treatment of this disease. In particular, a new promising strategy of UM pharmacogenetics consists in downregulating the insulin-like growth factor (IGF2BP2 gene) signaling pathway to attenuate HMGA2 gene expression, and in decreasing extracellular matrix growth via suppression of the TGF-β and ACVR1 genes.
Therefore, the use of cutting-edge molecular technologies and methods of systems genetics has made it possible to understand UM pathogenomics and to propose new promising approaches to its treatment.
Significant success has also been achieved in other PM subdisciplines, including both traditional areas (pharmacogenetics, genetics of aging, nutrigenetics, and sports genetics [8]) and relatively recent ones (prenatal GT [6] and hereditary features of sensitivity to COVID-19 infection).
These areas of PM are the focus of the new, second edition of the book "Predictive medicine evolution", which is currently in press.
attempts of presymptomatic diagnostics in monogenic diseases to functional mapping of candidate genes in common MFDs. The implementation of GWAS contributed to the identification of many previously unknown causal MFD genes, but it did not produce a breakthrough in the effectiveness of predictive GT. It became clear that the low effectiveness of PM was caused by missing heritability associated with the imperfections of testing. An approach to PM from the perspective of systems genetics (the study of pathology at different levels of the unfolding of genetic information) significantly strengthened the position of PM and allowed for a better understanding of MFD pathogenomics. The tremendous progress of genomics suggests that, in the foreseeable future, a person may receive even at birth the results of individual genome analysis, covering not only already existing or potential not-yet-manifest pathology but also individual predisposition to common MFDs [34], which is essential for pharmacotherapy.
The results of genome sequencing are proposed to be integrated with the medical record of a newborn, which will lay the foundation for a life-long electronic health record. Current PPPM is still at the early stages of its development. At present, GT allows a given person to determine only their risk group for an MFD; with very few exceptions, it does not allow any unbiased predictions about the occurrence of future diseases. It should be emphasized that clinical studies and tests (biochemical, functional, serological, etc.) can estimate only the current state of the organism, whereas GT, once performed, provides information on the unique features of a person's entire inheritable program [36].
In summary, PM, which emerged in the 2000s, has undergone a significant evolution and preserved its relevance. This discipline gave rise to PPPM, which, along with bioinformatics and systems genetics, will become an essential element of translational (target) medicine. The above-mentioned GP, in one form or another, may turn into an electronic health record integrated with individual genomic data [37]. The implementation of NGS and the creation of integrated genetic and clinical databases pave the way to the precision genomic personalized medicine of the future. The development and implementation of large-scale projects (EPPPM, Precision Medicine) aimed at analyzing correlations between molecular changes in the genome and the features of their phenotypic manifestation, including MFDs, attest to the relevance, bright prospects, and practical significance of PM studies.