Statistical inference for phylogenetic trees

From AIRWiki
Title: Statistical inference for phylogenetic trees
Image:Toloverview.jpg

Description: The project will focus on the study, implementation, comparison, and analysis of different statistical inference techniques for phylogenetic trees. Phylogenetic trees (1, 2, 3) are evolutionary trees used to represent the relationships among species that share a common ancestor. Typical inference tasks concern the construction of a tree from DNA sequences, which involves both the choice of the tree topology (i.e., model selection) and the estimation of its parameters (i.e., model fitting). The focus will be on a probabilistic description of the tree, obtained by introducing random variables associated with both the internal nodes and the leaves of the tree.
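To make the probabilistic description concrete, the following is a minimal sketch (not taken from the project material) of how the likelihood of DNA data on a fixed tree can be computed with Felsenstein's pruning algorithm, as described in (1). It assumes the Jukes-Cantor (JC69) substitution model, independent sites, and a uniform root distribution: leaves carry the observed bases, while the states of internal nodes are latent variables that are summed out recursively. The tree encoding, the function names, and the toy alignment are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability that state i becomes j along a branch of length t
    (t measured in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihood(node, site: dict) -> dict:
    """Felsenstein pruning. A node is either a leaf name (str) or a tuple of
    (child, branch_length) pairs. Returns L[s] = P(leaf data below node | node = s)."""
    if isinstance(node, str):
        # Observed leaf: indicator on the observed base.
        return {s: 1.0 if site[node] == s else 0.0 for s in BASES}
    result = {s: 1.0 for s in BASES}
    for child, t in node:
        child_l = partial_likelihood(child, site)
        for s in BASES:
            # Sum out the child's state (latent if the child is an internal node).
            result[s] *= sum(jc69_prob(s, c, t) * child_l[c] for c in BASES)
    return result


def log_likelihood(tree, alignment: dict) -> float:
    """Log-likelihood of an alignment (name -> sequence); sites are independent and
    the root state is drawn from the uniform JC69 stationary distribution."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        root_l = partial_likelihood(tree, site)
        total += math.log(sum(0.25 * root_l[s] for s in BASES))
    return total


if __name__ == "__main__":
    # Hypothetical toy alignment and tree, for illustration only.
    alignment = {"t1": "ACGTTA", "t2": "ACGTTG", "t3": "ACCTTA", "t4": "TCCAGA"}
    tree = (
        ((("t1", 0.05), ("t2", 0.05)), 0.1),
        ("t3", 0.15),
        ("t4", 0.5),
    )
    print(log_likelihood(tree, alignment))
```

In this setting, model fitting would maximize `log_likelihood` over the branch lengths for a fixed topology, and model selection would compare the maximized values across candidate topologies; both steps are left out of the sketch.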

The project will focus on understanding the problem and on implementing different algorithms, so coding in C/C++, Matlab, or R will be required. Since the approach is based on statistical models, the student is expected to be comfortable with notions from probability and statistics courses.

The project can be extended into a master's thesis, following interesting and novel research directions that emerge during the first part of the work. Possible ideas include the proposal and implementation of new algorithms based on recent approaches to phylogenetic inference in the literature, such as (3) and (4). In this case the thesis requires some extra effort to build and consolidate a mathematical background sufficient to understand recent literature, especially in (mathematical) statistics and, for example, in the emerging field of algebraic statistics (5).
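As a small illustration of the algebraic viewpoint behind (4) and (5), the sketch below assumes the two-state symmetric (Cavender-Farris-Neyman) model on the four-leaf tree 12|34, with made-up flip probabilities. If the joint distribution of the leaves is arranged as a 4x4 matrix (a "flattening") whose rows are indexed by leaves 1, 2 and whose columns by leaves 3, 4, the matrix has rank at most 2, so all its 3x3 minors are polynomials that vanish on the model, i.e., phylogenetic invariants. Flattenings along the two other splits generically have full rank, which is what lets such invariants discriminate between topologies. This flattening-rank construction is a standard textbook example and not necessarily the specific invariants studied in (4); all names are hypothetical.

```python
from itertools import combinations, product


def cfn(p: float):
    """Two-state symmetric (CFN) transition matrix with flip probability p."""
    return [[1 - p, p], [p, 1 - p]]


def quartet_distribution(p1, p2, p3, p4, p5):
    """Joint distribution of leaf states (x1, x2, x3, x4) on the quartet 12|34.
    Internal node a joins leaves 1 and 2, internal node b joins leaves 3 and 4,
    and p5 is the flip probability on the internal edge a-b. Root at a, uniform."""
    m1, m2, m3, m4, m5 = map(cfn, (p1, p2, p3, p4, p5))
    return {
        x: sum(0.5 * m1[a][x[0]] * m2[a][x[1]] * m5[a][b] * m3[b][x[2]] * m4[b][x[3]]
               for a in (0, 1) for b in (0, 1))
        for x in product((0, 1), repeat=4)
    }


def flattening(joint, rows, cols):
    """4x4 matrix: row index from the leaves in `rows`, column index from `cols`."""
    matrix = [[0.0] * 4 for _ in range(4)]
    for x, prob in joint.items():
        matrix[2 * x[rows[0]] + x[rows[1]]][2 * x[cols[0]] + x[cols[1]]] += prob
    return matrix


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def max_abs_3x3_minor(matrix):
    """Largest |3x3 minor|; all of them vanish exactly when the rank is <= 2."""
    return max(abs(det3([[matrix[r][c] for c in cs] for r in rs]))
               for rs in combinations(range(4), 3) for cs in combinations(range(4), 3))


if __name__ == "__main__":
    joint = quartet_distribution(0.1, 0.15, 0.2, 0.1, 0.25)  # made-up parameters
    for split, rows, cols in [("12|34", (0, 1), (2, 3)),
                              ("13|24", (0, 2), (1, 3)),
                              ("14|23", (0, 3), (1, 2))]:
        print(split, max_abs_3x3_minor(flattening(joint, rows, cols)))
```

With exact arithmetic the minors for 12|34 are zero; in floating point they are only at rounding-error level, while the other two splits give clearly nonzero values. With data, the joint distribution would be replaced by empirical pattern frequencies, so the invariants hold only approximately.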

Phylogenetic trees have also been proposed for novel applications outside biology, as in (6). Malware (malicious software) is software designed to infiltrate a computer without the owner's informed consent. Malware is often related to earlier programs through evolutionary relationships, i.e., new malware frequently appears as a small mutation of existing software. We are interested in using techniques from phylogenetic trees to create a taxonomy of real-world malware.
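A generic way to turn a collection of code samples into a tree is distance-based clustering. The sketch below is hypothetical and is not the permutation-based method of (6): it represents each sample by its set of byte 3-grams, measures dissimilarity with the Jaccard distance between these sets, and builds a rooted topology with UPGMA (average linkage). The sample byte strings are invented placeholders, not real malware.

```python
from itertools import combinations


def ngrams(data: bytes, n: int = 3) -> set:
    """Set of byte n-grams: a crude, hypothetical similarity feature."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; defined as 0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma(samples: dict, n: int = 3):
    """Rooted topology (nested tuples) from UPGMA / average-linkage clustering.
    Branch lengths are not tracked in this sketch."""
    names = list(samples)
    features = [ngrams(samples[name], n) for name in names]
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {(i, j): jaccard_distance(features[i], features[j])
            for i, j in combinations(range(len(names)), 2)}  # keys always (low, high)
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # Average linkage: size-weighted mean of the two old distances.
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = ((ti, tj), si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Invented placeholder "samples"; real work would start from actual binaries.
    samples = {
        "variant_a": b"push ebp; mov ebp, esp; call decrypt; jmp payload",
        "variant_b": b"push ebp; mov ebp, esp; call decrypt; jmp payload2",
        "other": b"xor eax, eax; int 0x80; ret",
    }
    print(upgma(samples))
```

On these placeholders the two near-identical variants are merged first, giving `('other', ('variant_a', 'variant_b'))`. UPGMA implicitly assumes a roughly constant rate of change along all lineages (a molecular clock); neighbor joining is a common distance-based alternative that drops this assumption.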

Picture taken from http://www.tolweb.org/tree/ and http://www.blogscienze.com

Bibliography

  1. J. Felsenstein (2003). Inferring Phylogenies.
  2. C. Semple and M. Steel (2003). Phylogenetics: The mathematics of phylogenetics.
  3. L. J. Billera, S. P. Holmes, and K. Vogtmann (2001). Geometry of the space of phylogenetic trees. Advances in Applied Mathematics 27, 733-767.
  4. S. N. Evans and T. P. Speed (1993). Invariants of some probability models used in phylogenetic inference. Annals of Statistics 21, 355-377.
  5. L. Pachter and B. Sturmfels (2005). Algebraic Statistics for Computational Biology.
  6. A. Walenstein, E-Md. Karim, A. Lakhotia, and L. Parida (2005). Malware phylogeny generation using permutations of code. Journal in Computer Virology 1(1).
Tutor: MatteoMatteucci (matteo.matteucci@polimi.it), LuigiMalago (malago@elet.polimi.it), StefanoZanero ()
Start: 2009/10/01
Students: 1 - 2
CFU: 5 - 20
Research Area: Machine Learning
Research Topic: Information Geometry, Stochastic Optimization, Evolutionary Computation
Level: Ms
Type: Course, Thesis
Status: Closed