Each protein chain is represented by six lines:
1: Protein ID
2: Amino acid sequence (test dataset) or selected residues (training dataset)
3: DNA-binding annotations, where 0 indicates a non-DNA-binding residue, 1 indicates a DNA-binding residue, and 2 (test dataset only) marks residues that were not used because their binding information was unavailable
4: Space-separated RAA values for each residue
5: Space-separated putative RSA values for each residue (predicted with ASAquick)
6: Space-separated ECO values for each residue (computed using profiles generated with HHblits)
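A minimal Python sketch of reading this six-line layout is shown below. It is not the dataset authors' tooling. It assumes that records follow each other with no meaningful blank lines in between, that the annotation line carries one digit per residue (with or without spaces), and that every per-residue line has the same length as line 2. The names `ProteinChain`, `parse_chains` and `usable_residues` are hypothetical, and the sample record uses made-up toy values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProteinChain:
    protein_id: str
    sequence: str      # line 2: full sequence (test) or selected residues (training)
    labels: List[int]  # line 3: 0 = non-binding, 1 = DNA-binding, 2 = unused (test only)
    raa: List[float]   # line 4
    rsa: List[float]   # line 5: putative RSA (ASAquick)
    eco: List[float]   # line 6: ECO (HHblits profiles)


def parse_chains(lines: Iterable[str]) -> List[ProteinChain]:
    # Assumption: blank lines carry no meaning, so they are dropped.
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 6 != 0:
        raise ValueError(f"expected a multiple of 6 non-empty lines, got {len(rows)}")
    chains = []
    for i in range(0, len(rows), 6):
        pid, seq, ann, raa, rsa, eco = rows[i:i + 6]
        # Assumption: one annotation digit per residue; spaces, if any, are ignored.
        labels = [int(c) for c in ann.replace(" ", "")]
        chain = ProteinChain(
            protein_id=pid,
            sequence=seq,
            labels=labels,
            raa=[float(x) for x in raa.split()],
            rsa=[float(x) for x in rsa.split()],
            eco=[float(x) for x in eco.split()],
        )
        # Sanity check (assumption): every per-residue line matches the sequence length.
        for name, values in (("labels", chain.labels), ("raa", chain.raa),
                             ("rsa", chain.rsa), ("eco", chain.eco)):
            if len(values) != len(seq):
                raise ValueError(f"{pid}: {name} has {len(values)} values, "
                                 f"sequence has {len(seq)} residues")
        chains.append(chain)
    return chains


def usable_residues(chain: ProteinChain) -> List[int]:
    # Indices of residues with known binding status (label 2 is excluded).
    return [i for i, y in enumerate(chain.labels) if y != 2]


# Toy record with invented values, for illustration only.
SAMPLE = """1ABC_A
MKV
012
0.1 0.2 0.3
0.5 0.4 0.3
1.0 2.0 3.0
"""

if __name__ == "__main__":
    for c in parse_chains(SAMPLE.splitlines()):
        print(c.protein_id, c.labels, usable_residues(c))
```

Filtering out label 2 before training or evaluation mirrors the note that those test residues were not used.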