(1) Brookhaven National Laboratory, Upton, NY 11973
(2) Department of Applied Mathematics and Statistics, SUNY at Stony Brook, Stony Brook, NY 11794-3600
(3) Department of Pharmacological Sciences, SUNY at Stony Brook, Stony Brook, NY 11794-8651
A number of essential biological functions are controlled by proteins that bind to specific sequences in genomic DNA. In this paper, we present a simplified model that analyzes DNA-protein interactions as if they were mediated exclusively by hydrogen bonds. For this model, we developed an optimized three-stage algorithm for geometric pattern recognition. First, the large number of local energy minima is screened efficiently using a geometric approach to pattern matching based on a square-well potential. Second, a closed-form solution is used for minimization based on a quadratic potential. Third, a Monte Carlo method applied to a modified Lennard-Jones potential ranks DNA sequences in terms of pattern matching. To validate our hypothesis, we used protein structures derived from DNA-protein complexes whose three-dimensional coordinates were established experimentally by X-ray diffraction analysis. All possible DNA sequences to which these proteins could bind were ranked in terms of binding energy. The algorithm predicts the "correct" DNA sequence (i.e., the experimentally determined specific one) when at least two hydrogen bonds per base pair are involved in binding to the protein. This study provides a partial solution to the three-dimensional docking problem and lays a framework for future refinements of the algorithm in which the number of assumptions made in the present analysis is reduced.
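As a rough illustration of the first, screening stage only, the sketch below scores a candidate DNA sequence by counting protein/DNA site pairs whose separation falls inside a square well, then ranks all sequences of a given length by that count. It is a minimal sketch under stated assumptions, not the algorithm used in this work: the function names, the well bounds of 2.5 to 3.5 Å, and the toy geometry (`TOY_OFFSETS`, `toy_sites`, `toy_protein`) are all hypothetical and chosen purely for illustration. The quadratic minimization and Monte Carlo stages are not shown.

```python
import itertools
import math

def square_well_score(protein_sites, dna_sites, r_min=2.5, r_max=3.5):
    """Count protein/DNA site pairs whose distance lies inside the well.

    The well bounds (in angstroms) are assumed values for illustration.
    """
    score = 0
    for p in protein_sites:
        for d in dna_sites:
            if r_min <= math.dist(p, d) <= r_max:
                score += 1
    return score

def rank_sequences(protein_sites, sites_for_sequence, length):
    """Score every DNA sequence of the given length, best score first."""
    scored = []
    for bases in itertools.product("ACGT", repeat=length):
        seq = "".join(bases)
        dna_sites = sites_for_sequence(seq)
        scored.append((square_well_score(protein_sites, dna_sites), seq))
    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored

# Hypothetical toy geometry: each base contributes one site whose x
# coordinate depends on the base type; successive bases are 3.4 A apart in z.
TOY_OFFSETS = {"A": 0.0, "C": 2.0, "G": 4.0, "T": 6.0}

def toy_sites(seq):
    return [(TOY_OFFSETS[b], 0.0, 3.4 * i) for i, b in enumerate(seq)]

# Two hypothetical protein sites placed 3.0 A from a G at position 0
# and from a C at position 1.
toy_protein = [(4.0, 3.0, 0.0), (2.0, 3.0, 3.4)]

ranked = rank_sequences(toy_protein, toy_sites, length=2)
```

In this toy setup, `ranked[0]` is the sequence whose sites fall inside the well for both protein sites. A real screening step would also have to search over relative orientations of the protein and the DNA, which this sketch leaves out.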