Preprint / Version 1

Oncogene Protein Annotation assisted with Machine-learning pipelines

Keywords:

Oncogene protein, machine learning, breast cancer, protein annotation tools

Abstract

Statement sheet

Dear RINarxiv Moderator,

I hereby declare:

1) As an author of the article titled "Oncogene Protein Annotation assisted with Machine-learning pipelines", I state through this email that the article has the following status (choose one):

B. A preprint that has not been submitted to any journal

2) That this article is an original work. Should any element of plagiarism (intentional or unintentional) be found in the future, it is the responsibility of myself and the author team.

Stephen Sugiharto, Siti Lateefa Az Zahra B, Nelson Chandra, Arli Aditya Parikesit

---

Abstract

Breast cancer is one of the major causes of death in women of all ages. Studies have linked the disease to oncogene proteins, which result from mutations in proto-oncogenes. By analyzing the structures, properties, and functions of these proteins, patterns can then be determined. Various analysis techniques are available for protein structure annotation. One assisting technique is the machine learning pipeline, which is known to be applied in many fields, such as technology, and is not restricted to health. Protein annotation tools also make a significant contribution as part of machine learning pipelines.

Published

2021-07-08

Section

Preprints