
The compaction of long genomic DNA to fit into the nucleus of a eukaryotic cell is well established. This compaction results from a fine interplay between two components of the cell: the nuclear matrix and the S/MAR elements of DNA. S/MARs, or Scaffold/Matrix Attachment Regions, are DNA elements that interact with the nuclear matrix and compartmentalize chromatin into structural and functional domains, thereby controlling gene expression. Proteins known as S/MAR binding proteins (S/MARBPs) facilitate this looping of chromatin. Defects in S/MARs have been linked to various diseases, including cancers, inflammatory diseases, facioscapulohumeral dystrophy (FSHD), and infections by viruses such as HIV and HTLV. A genome-wide understanding of these elements is therefore important and holds therapeutic promise.
To construct a comprehensive genome-wide map of human S/MARs, ChIP-Seq data for 10 different S/MAR binding proteins were analyzed, and the binding site coordinates of these proteins were used to prepare a non-redundant S/MAR dataset of the human genome. S/MARs are known to possess features such as an origin of replication (OriC), AT richness, kinked and curved DNA, TG richness, the MAR signature, and Topoisomerase-II sites. These features were then used to identify the S/MAR sequences present in this dataset. Along with the genomic coordinates of human S/MARs, the dataset also revealed S/MAR properties such as length, inter-S/MAR length (the chromatin loop size), nucleotide repeats, abundance of motifs, chromosomal distribution, and genomic context.
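As an illustration only, the sketch below shows one way binding site coordinates from several proteins could be collapsed into a non-redundant set and how inter-S/MAR lengths and AT richness might then be derived. It is not the pipeline used in this work. It assumes that "non-redundant" means merging overlapping or touching intervals, that coordinates are half-open (BED-style), and that the function names `merge_binding_sites`, `inter_region_lengths`, and `at_fraction` are hypothetical helpers introduced here; the example coordinates are made up.

```python
from collections import defaultdict


def merge_binding_sites(sites):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome.

    Assumes half-open coordinates, so [100, 200) and [200, 300) are merged.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous region: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def inter_region_lengths(regions):
    """Gaps between consecutive sorted regions on one chromosome."""
    return [nxt[0] - cur[1] for cur, nxt in zip(regions, regions[1:])]


def at_fraction(seq):
    """Fraction of A and T bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


# Made-up binding sites pooled from several hypothetical ChIP-Seq peak files.
sites = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 1000, 1200),
    ("chr2", 50, 80),
]
smars = merge_binding_sites(sites)
loops = inter_region_lengths(smars["chr1"])
```

Here `smars["chr1"]` holds two merged regions, `(100, 300)` and `(1000, 1200)`, and `loops` holds the single gap of 700 bp between them, which corresponds to the inter-S/MAR length described above. Feature checks such as AT richness would then be applied to the sequence of each merged region; the thresholds for such checks are not given in this text and are left out of the sketch.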