repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects.

A central challenge in developing powerful computational predictors that identify the biological features or attributes of DNA is finding a suitable way to represent DNA sequences effectively. To facilitate studies of DNA and nucleotides, we developed a Python package called representations of DNAs (repDNA), which generates widely used features reflecting the physicochemical properties and sequence-order effects of DNA and nucleotides. The package provides 15 features in three groups. The first group calculates three nucleic acid composition features, which describe local sequence information by means of k-mers. The second group calculates six autocorrelation features, which describe the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties. The third group calculates six pseudo nucleotide composition features, which represent a DNA sequence as a discrete model or vector while still retaining considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. All of these features can be calculated from both built-in and user-defined properties.
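To make the first two feature groups concrete, the following is a minimal pure-Python sketch of a k-mer composition vector and of a single dinucleotide autocorrelation value. It does not use or reproduce the repDNA API: the function names `kmer_composition` and `dinucleotide_autocorrelation`, the normalization choices, and the toy GC-count property table are assumptions made for illustration only, and the package's own definitions may differ in detail.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def dinucleotide_autocorrelation(seq, prop, lag=1):
    """Mean-centered autocorrelation of one dinucleotide property at a given lag.

    `prop` maps every dinucleotide (e.g. "AC") to a numeric property value.
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values) - lag
    if n <= 0:
        raise ValueError("sequence too short for the requested lag")
    mean = sum(values) / len(values)
    return sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n)
    ) / n


# Toy property (an assumption, not a repDNA built-in): number of G/C bases
# in each dinucleotide. A user-defined property table would have this shape.
TOY_GC_PROPERTY = {
    a + b: float(a in "GC") + float(b in "GC") for a in "ACGT" for b in "ACGT"
}

if __name__ == "__main__":
    sequence = "GACTGAACTGCACTTTGGTTTCATATTATTTGCTC"
    print(len(kmer_composition(sequence, k=2)))  # 16 dinucleotide frequencies
    print(dinucleotide_autocorrelation(sequence, TOY_GC_PROPERTY, lag=2))
```

In this sketch a user-defined property is simply a dictionary keyed by dinucleotide, so swapping in a different property table changes the autocorrelation feature without touching the function itself.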
References in zbMATH (referenced in 12 articles)
- Zhao, Wei; Li, Guang-Ping; Wang, Jun; Zhou, Yuan-Ke; Gao, Yang; Du, Pu-Feng: Predicting protein sub-Golgi locations by combining functional domain enrichment scores with pseudo-amino acid compositions (2019)
- Contreras-Torres, Ernesto: Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou’s PseAAC (2018)
- Sabooh, M. Fazli; Iqbal, Nadeem; Khan, Mukhtaj; Khan, Muslim; Maqbool, H. F.: Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou’s PseKNC (2018)
- Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen: pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach (2016)
- Jiao, Ya-Sen; Du, Pu-Feng: Prediction of Golgi-resident protein types using general form of Chou’s pseudo-amino acid compositions: approaches with minimal redundancy maximal relevance feature selection (2016)
- Jiao, Ya-Sen; Du, Pu-Feng: Predicting Golgi-resident protein types using pseudo amino acid compositions: approaches with positional specific physicochemical properties (2016)
- Yang, Lianping; Zhang, Xiangde; Fu, Haoyue; Yang, Chenhui: An estimator for local analysis of genome based on the minimal absent word (2016)
- Ali, Farman; Hayat, Maqsood: Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition (2015)
- Ju, Zhe; Cao, Jun-Zhe; Gu, Hong: iLM-2L: a two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou’s general PseAAC (2015)
- Kou, Gaoshan; Feng, Yonge: Identify five kinds of simple super-secondary structures with quadratic discriminant algorithm based on the chemical shifts (2015)
- Liu, Guoqing; Xing, Yongqiang; Cai, Lu: Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae (2015)
- Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R.; Barigye, Stephen J.; Cubillán, Néstor; Alvarado, Ysaías J.: Novel 3D bio-macromolecular bilinear descriptors for protein science: predicting protein structural classes (2015)