CpG islands are regions with a high density of CpG dinucleotides and encompass the promoters of
most genes in vertebrate genomes. In the human genome, ~70% of promoters have
a high frequency of CpG dinucleotides. CFP1 is a CXXC domain-containing protein
and an essential component of the SETD1 histone H3K4 methyltransferase complex.
By recognizing unmodified CpG DNA, CXXC domain proteins direct different
chromatin-modifying activities to various chromatin regions.
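As a rough illustration of what a "high density" of CpG dinucleotides means in sequence terms, the sketch below computes two commonly used summary statistics for a DNA string: the GC fraction and the observed/expected CpG ratio. The function name `cpg_stats` and the example sequences are hypothetical and not taken from the original work. Thresholds for calling a CpG island vary between tools, so none is hard-coded here.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction, CpG count and observed/expected CpG ratio.

    Observed/expected CpG = (number of CpG * length) / (number of C * number of G).
    This is an illustrative helper, not part of any published pipeline.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # "CG" cannot overlap with itself, so str.count gives the full CpG count.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Guard against division by zero when C or G is absent.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {
        "length": n,
        "gc_fraction": gc_fraction,
        "cpg_count": cpg,
        "obs_exp_cpg": obs_exp,
    }


if __name__ == "__main__":
    # Hypothetical toy sequences: one CpG-rich, one CpG-free.
    for name, seq in [("cpg_rich", "CGCGCGCG"), ("cpg_free", "ATATATAT")]:
        print(name, cpg_stats(seq))
```

A CpG-rich sequence yields a ratio well above that of bulk genomic DNA, whereas a sequence without CpG dinucleotides yields zero.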
The CXXC domain of CFP1 consists of two alpha helices and one short 3₁₀ helix
with two long loops linking them. Eight conserved cysteine residues bind two zinc ions to
form two C4-type zinc fingers, with the first three cysteines and the last cysteine binding
one zinc ion and the middle four cysteines binding the other zinc ion. The crescent-shaped
CFP1 CXXC domain is wedged into the major groove of the CpG DNA and
forms extensive interactions with the DNA. The DNA-binding
surface of CFP1 is predominantly positively charged, interacting with the negatively
charged DNA. In addition to electrostatic interactions, a network of hydrogen bonds
between the CXXC domain and DNA, including several water-mediated interactions,
contributes to CFP1-DNA binding. Interestingly, only the middle four nucleotides,
which include the CpG dinucleotide, contribute to CXXC binding.