Carcinogenesis Predictions using DL-Learner

“Obtaining accurate structural alerts for the causes of chemical cancers is a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D. King, S.H. Muggleton, M.J.E. Sternberg in [5])


About Carcinogenesis Prediction

Carcinogenesis (meaning literally, the creation of cancer) is the process by which normal cells are transformed into cancer cells. Cell division is a normal physiological process. However, mutations in DNA that lead to cancer can disrupt the balance between proliferation and programmed cell death. This results in uncontrolled cell division and tumor formation.

Prevention of environmentally-induced cancers is a health issue of unquestionable importance. Almost every sphere of human activity in an industrialised society faces potential chemical hazards of some form. It is estimated that nearly 100,000 chemicals are in use in large amounts every day. A further 500–1000 are added every year. Only a small fraction of these chemicals have been evaluated for toxic effects like carcinogenicity (paragraph taken from [3]).

There have been several approaches, by both humans and machines, to predicting carcinogenicity. It has been shown in [6] that the performance of machine-derived models for carcinogenicity can equal that of human experts. Classifying chemicals is a massive challenge because of the large number and diversity of elements, structures, and tests involved. As outlined in several bioinformatics journal articles [1,3,7], reliable carcinogenesis prediction has not yet been achieved, but Artificial Intelligence approaches were able to learn new toxicological knowledge or confirm existing knowledge. The use of carcinogenicity models, which can be derived by humans and by AI approaches, for setting priorities in the experiments of the U.S. National Toxicology Program (NTP) has been successful [1].

Modelling Carcinogenesis Data in OWL

The NTP makes a set of more than 300 tested compounds available for training by Machine Learning tools. The carcinogenicity of these compounds has been determined using standardised chemical bioassays (exposure of rodents to chemicals). However, obtaining empirical evidence from such bioassays is expensive and usually too slow to cope with the number of chemicals that can have adverse effects on human exposure. This has created a need for models of carcinogenesis and led to the decision of the NTP to make the data available in a suitable form.

For each compound, the chemical structure, structural indicators, and the results of short-term assays were made available. They can be downloaded from the website of the Oxford University Machine Learning Group. The data was made available in Prolog format. It is well known that not every logic program can be converted to an OWL knowledge base and vice versa. However, for the carcinogenesis problem, such a mapping is possible. DL-Learner contains a Prolog parser, which was used to read in the data. Using a set of mapping rules, the Prolog files were converted into an OWL ontology, which is available for download and can be viewed using, e.g., Protégé or OntoWiki. During the transformation process almost no knowledge was lost or added. The resulting ontology contains 142 classes, 19 properties, 22373 instances, and 74567 triples.
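
As an illustration of what a rule-based conversion of this kind can look like, the following minimal Python sketch maps one ground Prolog fact to subject-predicate-object triples. It is not the converter built into DL-Learner. The fact layout atm(Compound, Atom, Element, Type, Charge), the namespace, the property names hasAtom and charge, and the class naming scheme are all assumptions made for this example; the actual mapping rules are not given in the text.

```python
import re

# Hypothetical namespace; the real ontology IRI is not stated in the text.
NS = "http://example.org/carcinogenesis#"

# Matches simple ground facts without nested terms, e.g. "atm(d1, d1_1, c, 22, -0.133)."
FACT = re.compile(r"^\s*([a-z]\w*)\(([^()]*)\)\s*\.\s*$")


def parse_fact(line):
    """Return (predicate, [arguments]) for a ground Prolog fact, or None."""
    match = FACT.match(line)
    if match is None:
        return None
    args = [arg.strip() for arg in match.group(2).split(",")]
    return match.group(1), args


def atom_fact_to_triples(args):
    """Illustrative rule for an assumed 5-ary atom fact:
    (compound, atom, element, atom type, charge)."""
    compound, atom, element, atom_type, charge = args
    return [
        (NS + compound, NS + "hasAtom", NS + atom),                         # object property
        (NS + atom, "rdf:type", NS + f"{element.capitalize()}-{atom_type}"),  # class membership
        (NS + atom, NS + "charge", charge),                                 # datatype property
    ]


# One mapping rule per (predicate name, arity); only one rule is sketched here.
RULES = {("atm", 5): atom_fact_to_triples}


def convert(lines):
    """Apply the matching rule to every parsable fact and collect the triples."""
    triples = []
    for line in lines:
        parsed = parse_fact(line)
        if parsed is None:
            continue  # skip comments, clauses with bodies, and anything unparsable
        predicate, args = parsed
        rule = RULES.get((predicate, len(args)))
        if rule is not None:
            triples.extend(rule(args))
    return triples


triples = convert(["atm(d1, d1_1, c, 22, -0.133)."])
```

Keeping the rules in a table keyed by predicate name and arity means that further fact types can be handled by adding entries rather than changing the conversion loop.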

DL-Learner Prediction Results

We used DL-Learner to learn general models for substances causing cancer. The most sensitive parameter of DL-Learner in this case is noise (which bounds the minimum acceptable training set accuracy of the learned definition). We varied the noise parameter downwards from 35% in steps of one percentage point and determined 32% as the setting that minimizes the 10-fold cross-validation error. The following table summarizes the accuracy of the learned definitions in comparison with other approaches using the same background knowledge (many of those are based on Aleph, a state-of-the-art Inductive Logic Programming system):


approach/tool                          | accuracy (standard deviation) | reference
DL-Learner                             | 67.4% (± 7.9%)                |
Aleph with Ensembles                   | 59.0% to 64.5%                | [2]
Boosted Weak ILP                       | 61.1%                         | [4]
Weak ILP                               | 58.7%                         | [4]
Aleph Deterministic Top-Down 0.7       | 57.9% (± 9.8%)                | [8]
Aleph Randomized Rapid Restarts 0.9    | 57.6% (± 6.4%)                | [8]
Aleph Deterministic Top-Down 0.9       | 56.2% (± 9.0%)                | [8]
Aleph Randomized Rapid Restarts 0.7    | 54.8% (± 9.0%)                | [8]
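
For entries that report a standard deviation, a difference in mean accuracy can be checked with a two-sample t-test computed from the summary statistics alone. The sketch below uses Welch's variant and compares DL-Learner with Aleph Deterministic Top-Down 0.7. Both the choice of Welch's test and the sample size of 10 per approach (one value per cross-validation fold) are assumptions made for this example; they are not stated in the text.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) from means, standard deviations, and sample sizes."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means and standard deviations are taken from the table; n = 10 is assumed.
t, df = welch_t(67.4, 7.9, 10, 57.9, 9.8, 10)
print(f"t = {t:.2f}, df = {df:.1f}")  # prints "t = 2.39, df = 17.2"
```

Under these assumptions the statistic exceeds the two-sided 5% critical value of about 2.11 for 17 degrees of freedom from a standard t table.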

As the table shows, DL-Learner has the highest cross-validation accuracy of the listed approaches. For every approach whose standard deviation was given in the cited article, we tested whether the difference in accuracy is statistically significant using a t-test at a confidence level of 95%. In all of those cases the difference turned out to be statistically significant. Note that the second and third table entries combine several learned rules into one classifier, which increases accuracy but reduces the readability of the result.

The average runtime of DL-Learner for each of the ten folds was 178 seconds on a 2.2 GHz dual-core machine with 2 GB RAM. An additional 36 seconds were needed to read in the ontology (using the OWL API) and prepare it for the approximate reasoner built into DL-Learner.

Overall, DL-Learner reaches an accuracy similar to that of approaches involving additional (possibly task-specific) preprocessing steps. DL-Learner classifies slightly more than 2 out of 3 chemical compounds correctly, which is remarkable considering the difficulty of the problem. However, it is just one step towards more reliable carcinogenicity predictions and could be improved further, for instance through a) more classified compounds in the background knowledge, b) more complete short-term assays for each compound, c) a proper selection of structural indicators by experts or by feature selection AI approaches (which was not yet available for the experiments here), d) the inclusion of more scientific principles («chemical reasoning») developed within chemistry over the last decades, and e) improvements in DL-Learner and other Machine Learning approaches.

As an example, the following definition was obtained when running DL-Learner on the full training set (72% accuracy; for easier readability, negation symbols are moved further out where possible):

This can be phrased in natural language as:

A chemical compound is carcinogenic iff …

… it does not contain a Nitrogen-35, Phosphorus-60, Phosphorus-61, or Titanium-134 atom

… and it has at least three Halide – excluding Halide10 – structures

or the Ames test of the compound is positive and there are at least five atom bonds which are not of bond type 7.

Over various runs, DL-Learner has identified the Ames test and multiple occurrences of halide groups (in particular Ar-Halide) as indicators for carcinogenicity.
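
The natural-language reading of the learned definition can also be written as an executable predicate. The following Python sketch is only an illustration of that reading. It assumes the grouping (no excluded atom and at least three halide structures) or (positive Ames test and at least five bonds not of type 7), and it uses a hypothetical Compound record and a hypothetical naming convention in which every halide structure label contains the word "Halide". Neither is taken from the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    """Hypothetical flat view of a compound, for illustration only."""
    atom_types: set = field(default_factory=set)    # e.g. {"Carbon-22", "Nitrogen-35"}
    structures: list = field(default_factory=list)  # one label per structure occurrence
    bond_types: list = field(default_factory=list)  # one bond type per atom bond
    ames_positive: bool = False


EXCLUDED_ATOMS = {"Nitrogen-35", "Phosphorus-60", "Phosphorus-61", "Titanium-134"}


def is_counted_halide(label):
    """Assumed convention: halide labels contain 'Halide'; Halide10 is excluded."""
    return "Halide" in label and label != "Halide10"


def is_carcinogenic(compound):
    """Evaluate the definition under the grouping assumed above."""
    no_excluded_atom = not (compound.atom_types & EXCLUDED_ATOMS)
    halide_count = sum(1 for s in compound.structures if is_counted_halide(s))
    other_bond_count = sum(1 for b in compound.bond_types if b != 7)
    return (no_excluded_atom and halide_count >= 3) or (
        compound.ames_positive and other_bond_count >= 5
    )


example = Compound(structures=["Halide", "Ar-Halide", "Ar-Halide"])
result = is_carcinogenic(example)  # True: three counted halides, no excluded atom
```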


The Prolog dataset and the OWL ontology are both available for download.


[1] R. Benigni and A. Giuliani, Putting the Predictive Toxicology Challenge Into Perspective: Reflections on the Results, Bioinformatics, 19(10), pp. 1194–1200, 2003.

[2] I. Dutra, D. Page, V. S. Costa, and J. Shavlik, An Empirical Evaluation of Bagging in Inductive Logic Programming, Proceedings of the 12th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, Vol. 2583, pp. 48–65, Springer-Verlag, 2003.

[3] C. Helma, R. D. King, S. Kramer, and A. Srinivasan, The Predictive Toxicology Challenge 2000–2001, Bioinformatics, 17(1), pp. 107–108, 2001.

[4] N. Jiang and S. Colton, Boosting Descriptive ILP for Predictive Learning in Bioinformatics, Proceedings of ILP 2006, Lecture Notes in Computer Science, Vol. 4455, pp. 275–289, Springer, 2006.

[5] A. Srinivasan, R. D. King, S. Muggleton, and M. J. E. Sternberg, Carcinogenesis Predictions Using ILP, Proceedings of the 7th International Workshop on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, Vol. 1297, pp. 273–287, Springer-Verlag, 1997.

[6] A. Srinivasan, R. D. King, and D. W. Bristol, An Assessment of Submissions Made to the Predictive Toxicology Evaluation Challenge, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99), Vol. 1, pp. 270–275, Morgan Kaufmann Publishers, 1999.

[7] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma, Statistical Evaluation of the Predictive Toxicology Challenge 2000–2001, Bioinformatics, 19(10), pp. 1183–1193, 2003.

[8] F. Železný, A. Srinivasan, and D. Page, Lattice-Search Runtime Distributions May Be Heavy-Tailed, Proceedings of the 12th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, Vol. 2583, pp. 333–345, Springer-Verlag, 2003.