Does Protein Similarity of Pluripotency Factors Mean Their Gene Ontology Semantic Similarity?

Document Type : Original Article


1 Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran

2 Department of Biology, Faculty of Science, Ferdowsi University of Mashhad, Mashhad, Iran


Recognizing and predicting the biological function of proteins from their amino acid sequences is a simple method used by many software tools. However, sequence similarity does not always imply similarity of biological function. The aim of this study was to determine the gene ontology (GO) semantic similarity of six pluripotency factors (Oct4, Sox2, c-Myc, Klf4, Lin28 and Nanog) in six species and to evaluate its agreement with protein sequence similarity and phylogenetic distance. c-Myc exhibited a significant correlation between phylogenetic distance and protein similarity. Sox2, Klf4 and Lin28 showed the expected relationship between phylogenetic distance and protein similarity, whereas Nanog and Oct4 did not, because an increase in protein similarity was not accompanied by a decrease in phylogenetic distance. Next, protein or nucleotide similarity was taken as the dependent variable, and GO similarity in the three categories of biological process (BP), molecular function (MF) and cellular component (CC) was taken as the set of independent variables. Under this assumption, regression analysis was carried out to determine the best model for estimating protein and nucleotide similarity. For c-Myc, protein or nucleotide similarity showed a significant regression on GO similarity, and the BP and CC categories were selected by the model as estimators. No significant regression was observed for the other pluripotency factors. This means that, except for c-Myc, the GO similarity of the studied pluripotency factors did not reflect their protein or nucleotide similarity. It is suggested that the related data for the other five pluripotency factors (Oct4, Sox2, Klf4, Lin28 and Nanog) in the six studied species should be reviewed.
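The regression setup described in the abstract can be illustrated with a minimal sketch: sequence similarity is the response, and GO semantic similarity scores are the predictors. This is not the authors' analysis and does not use their data. The function names (`solve_linear_system`, `fit_ols`, `r_squared`) are hypothetical, only the BP and CC predictors are used, and the similarity values are synthetic numbers built to lie exactly on a plane so that the fit is easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, b2, ...] for response ~ predictors.
    """
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(predictors, response, coefs):
    """Coefficient of determination of the fitted model."""
    fitted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], p))
              for p in predictors]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return 1.0 - ss_res / ss_tot


# Synthetic (BP, CC) GO similarity scores for six hypothetical species pairs.
go_similarity = [(0.90, 0.80), (0.70, 0.85), (0.60, 0.50),
                 (0.40, 0.65), (0.30, 0.20), (0.85, 0.40)]
# Synthetic protein similarity, constructed as an exact linear function.
protein_similarity = [0.2 + 0.5 * bp + 0.3 * cc for bp, cc in go_similarity]

coefs = fit_ols(go_similarity, protein_similarity)
r2 = r_squared(go_similarity, protein_similarity, coefs)
print("intercept, BP, CC coefficients:", [round(c, 4) for c in coefs])
print("R^2:", round(r2, 4))
```

With real data the fit would not be exact, and a significance test on the regression (as reported for c-Myc) would be needed before using the model to estimate sequence similarity.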

