AI Models Shed Light on Parasite Evolution at the 100th Annual ASP Meeting
It was a privilege to present at the 100th Annual Meeting of the American Society of Parasitologists in Winston-Salem, NC, where I shared our lab’s latest advances in integrating artificial intelligence with phylogenetic analysis.
Our talk, titled “AI-driven predictions of phylogenetic trees from zoogeographical data in dactylogyrids (Platyhelminthes: Monogenea),” introduced a novel strategy for using non-phylogenetic data—such as host distribution and climate variables—as external sources of support for phylogenetic inference.
Dactylogyrids are parasitic flatworms (Monogenea) known for their high host specificity. Their limited geographic range, tied closely to their host species, makes them ideal candidates for studying host-parasite coevolution. In collaboration with Aline A. Acosta (UNC Charlotte’s Klein College of Science, Department of Biological Sciences) and Anastasiia Duchenko (UNC Charlotte’s Dept. of Bioinformatics and Genomics), we developed a modified random forest model that learns to predict parasite clades using zoogeographical data. Our working hypothesis: the better a tree’s topology is predicted by ecological data, the more biologically meaningful it may be.
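The working hypothesis can be illustrated with a deliberately small sketch. It is not the modified random forest described above: it swaps in a leave-one-out nearest-neighbour vote on categorical features, using only the Python standard library, to show the idea of scoring candidate trees by how well zoogeographical data predict their clades. All records, clade labels, and function names here are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (host family, river basin) per parasite specimen.
# None of these values come from the study; they are made up for illustration.
FEATURES = [
    ("Pimelodidae", "Parana"),
    ("Pimelodidae", "Parana"),
    ("Pimelodidae", "Amazon"),
    ("Characidae", "Amazon"),
    ("Characidae", "Amazon"),
    ("Characidae", "Orinoco"),
]

# Clade membership of the same specimens under two hypothetical candidate trees.
CLADES_TREE_A = ["c1", "c1", "c1", "c2", "c2", "c2"]
CLADES_TREE_B = ["c1", "c2", "c1", "c2", "c1", "c2"]


def hamming(a, b):
    """Number of categorical features on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(train_x, train_y, query):
    """Majority vote among the training records closest to the query."""
    dists = [hamming(x, query) for x in train_x]
    nearest = min(dists)
    votes = Counter(y for y, d in zip(train_y, dists) if d == nearest)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]


def loo_accuracy(features, clades):
    """Leave-one-out accuracy: hide each specimen, predict its clade from the rest."""
    hits = 0
    for i, query in enumerate(features):
        train_x = features[:i] + features[i + 1:]
        train_y = clades[:i] + clades[i + 1:]
        hits += predict(train_x, train_y, query) == clades[i]
    return hits / len(features)


score_a = loo_accuracy(FEATURES, CLADES_TREE_A)
score_b = loo_accuracy(FEATURES, CLADES_TREE_B)
print(f"tree A: {score_a:.2f}  tree B: {score_b:.2f}")
```

In this toy setup, the clades of tree A follow host and basin closely, while those of tree B cut across them, so tree A receives the higher score. Under the working hypothesis, that difference would count as external support for tree A.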
This presentation builds directly upon our recent publication in Cladistics (Alves et al., 2025, DOI: 10.1111/cla.12610), where we first demonstrated this approach using proteocephalidean tapeworms. There, our model achieved remarkable success—nearly 89% classification accuracy—suggesting that geography and host identity are deeply encoded in the evolutionary signal.
At ASP 2025, we extended this methodology to monogenean genera, including several Neotropical endemics such as Anacanthorus, Cosmetocleithrum, Demidospermus, and Urocleidoides. We also tested the model’s robustness under data perturbation and found that the topological signal remained consistent even with synthetic noise added.
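One simple way to perturb categorical zoogeographical records is sketched below. This is an assumption about what such a test could look like, not the procedure used in the talk: each value is replaced, with a chosen probability, by a random draw from the values observed in the same column. The function names and records are hypothetical.

```python
import random


def perturb(records, rate, seed=0):
    """Replace each value, with probability `rate`, by a random draw from its column."""
    rng = random.Random(seed)
    # Observed values per column, sorted so results are reproducible for a given seed.
    columns = [sorted(set(col)) for col in zip(*records)]
    noisy = []
    for record in records:
        noisy.append(tuple(
            rng.choice(columns[j]) if rng.random() < rate else value
            for j, value in enumerate(record)
        ))
    return noisy


def fraction_changed(original, noisy):
    """Share of individual feature values that differ after perturbation."""
    pairs = [(a, b) for rec_a, rec_b in zip(original, noisy) for a, b in zip(rec_a, rec_b)]
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical host family / river basin records (made-up values).
RECORDS = [
    ("Pimelodidae", "Parana"),
    ("Pimelodidae", "Amazon"),
    ("Characidae", "Amazon"),
    ("Characidae", "Orinoco"),
]

for rate in (0.0, 0.1, 0.3, 0.5):
    noisy = perturb(RECORDS, rate, seed=42)
    print(rate, fraction_changed(RECORDS, noisy))
```

A robustness check would then refit the classifier on each noisy copy and track how quickly its accuracy degrades as the rate increases. Because a random draw can return the original value, the fraction of values actually changed is at most the nominal rate on average.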
By reframing ecological traits as interdependent phylogenetic predictors, we’re opening new doors to how we assess the credibility of evolutionary hypotheses—especially in groups where molecular data may be sparse or noisy.
This work exemplifies how computational tools can enrich evolutionary biology, offering new layers of insight where traditional methods reach their limits. I’m grateful to our collaborators and to the ASP community for the engaging discussions and warm reception.
Read the original study published in Cladistics
Why Are Zoogeographical Data Not Phylogenetic Characters?
Take as an example a flatworm parasite that infects the spiral valves of freshwater stingrays or the guts of teleost fish, and consider what a phylogenetic character is. The DNA sequences of the parasite and its morphological features can be coded and called phylogenetic characters. But what about its host, or the river system in which it is found? Are zoogeographical features also phylogenetic characters?
The question cuts directly to one of the most illuminating boundary cases in the theory of phylogenetic characters. The answer requires applying the heritability criterion carefully.
Morphological features and DNA sequences pass the test unambiguously. They are heritable: they are transmitted from parent to offspring through genetic information. Each observed variation can, in principle, be traced to a unique transformation event in the lineage. A particular spine morphology, attachment organ structure, or nucleotide at a given position is a candidate for a transformation series in the sense of Hennig (1966) and Grant & Kluge (2004).
Host identity and river system present a fundamentally different situation. Grant & Kluge (2004) are explicit on this point in a footnote that is easy to overlook: “for there to be a direct evidential connection between observed variation of any kind and lineage diversification there must be heritability, and that is not a characteristic of stratigraphic data.” The same logical argument extends without modification to zoogeographic data and, more critically, to ecological associations such as host identity.
The flatworm being found in the spiral valve of a freshwater stingray, rather than the gut of a teleost, is an observation about the ecological context of the organism. It is not a property of the organism’s genome that is transmitted from parent to offspring. The parasite’s offspring do not inherit “infects stingray” as a heritable unit in the way they inherit a morphological structure or a nucleotide sequence. What is heritable is the underlying biochemical and developmental machinery that determines host specificity — the receptor molecules, surface proteins, and behavioral repertoires that make infection of a given host possible. But that machinery is encoded in the genome and expressed in the phenotype, and those features are already capturable as morphological or sequence characters. “Host species X” as a character state is at best a coarse proxy for a set of heritable features whose real identity lies elsewhere.
Similarly, “found in the Paraná basin” versus “found in the Orinoco basin” reflects a historical distributional event — a vicariance or dispersal — that affected a population. The river system is a feature of the physical world, not a heritable feature of the organism. Two sister lineages separated by vicariance do not inherit their respective distributions from a parental organism; the distributions are consequences of geological and ecological history, not of transmission of genetic information.
This does not mean that host associations and geographic distributions are evidentially useless in systematics. They are extremely valuable as corroborative sources of information. When a cladogram inferred from morphological and molecular characters of the parasite is congruent with a pattern of host co-speciation or with a known sequence of river capture events, that congruence strengthens confidence in both the phylogenetic hypothesis and the biogeographic or ecological interpretation. This is the logic underlying host-parasite co-phylogenetics and vicariance biogeography. But the key epistemological point is that congruence between a tree and a geographic pattern is a test of the geographic hypothesis against an independently derived phylogenetic hypothesis — it is not the same as treating geography as a character in the construction of that hypothesis.
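The distinction between testing congruence and coding geography as a character can be made concrete with a small sketch. It assumes a tree already inferred from morphological and molecular characters alone, maps river basin onto it afterwards with Fitch parsimony, and asks how often randomly shuffled basin labels fit the tree at least as well. The tree, the basin assignments, and the function names are all hypothetical, and the permutation test is one common choice among several, not a method taken from the talk.

```python
import random

# Hypothetical parasite tree, assumed to come from morphology and sequences only.
TREE = ((("sp1", "sp2"), "sp3"), (("sp4", "sp5"), "sp6"))

# Hypothetical river basin recorded for each terminal (made-up values).
BASIN = {
    "sp1": "Parana", "sp2": "Parana", "sp3": "Parana",
    "sp4": "Orinoco", "sp5": "Orinoco", "sp6": "Orinoco",
}


def fitch_steps(tree, states):
    """Return (state set, minimum number of state changes) for a binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def permutation_p(tree, states, n_perm=999, seed=0):
    """Share of label shufflings that fit the tree as well as the observed labels."""
    rng = random.Random(seed)
    observed = fitch_steps(tree, states)[1]
    taxa = sorted(states)
    values = [states[t] for t in taxa]
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(taxa, values))
        if fitch_steps(tree, shuffled)[1] <= observed:
            as_good += 1
    # Count the observed arrangement itself, as is usual for permutation tests.
    return observed, (as_good + 1) / (n_perm + 1)


steps, p_value = permutation_p(TREE, BASIN)
print(f"observed steps: {steps}, permutation p: {p_value:.3f}")
```

Geography never enters the construction of `TREE` here. It is only mapped onto the finished topology, which is what keeps the comparison a test against an independently derived hypothesis. With only six terminals the test has little power, so the toy data serve to show the mechanics rather than a meaningful result.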
The practical danger of treating host identity or geography as phylogenetic characters is precisely the one Grant & Kluge (2004) identify for non-heritable variation generally: there is no transformation event in the organism’s lineage that one can point to as the causal basis for the “character state.” The variation is real, the history behind it is real, but the connection to lineage diversification through heritability — which is the prerequisite for evidential relevance in phylogenetic inference — is absent or at best indirect.