[PWM] HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO) contains transcription factor (TF) binding models represented as classic Position Weight Matrices (PWMs, also known as Position-Specific Scoring Matrices, PSSMs) and precalculated score thresholds.
Dinucleotide PWMs, which provide better TFBS recognition quality, are also available for selected TFs, primarily those with ChIP-Seq data (see the diChIPMunk and HOCOMOCO papers for details).
PWMs in HOCOMOCO v9 were derived from various types of experimental data, integrated with the ChIPMunk motif discovery tool. The HOCOMOCO v10 and v11 updates were based primarily on high-throughput sequencing data.
HOCOMOCO provides mono- and dinucleotide PWMs and precomputed score thresholds for download. To visualize predicted TFBS in a small set of sequences, we provide MoLoTool. A dedicated web interface in BioUML is also suitable for this task. For large-scale analysis, we suggest using command-line tools such as SPRY-SARUS.
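To illustrate how a mononucleotide PWM and a score threshold are typically combined to predict TFBS, here is a minimal Python sketch. The matrix, the threshold value and the function names are hypothetical and chosen only for illustration; they are not taken from HOCOMOCO, and the sketch does not reproduce the behaviour of MoLoTool or SPRY-SARUS. Dinucleotide models are not covered.

```python
# Hypothetical 4-position mononucleotide PWM: one dict of weights per position.
# These weights are illustrative only and do not come from any HOCOMOCO model.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
THRESHOLD = 3.0  # hypothetical score threshold

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def score_window(window, pwm):
    """Sum the position-specific weights of the letters in the window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(seq, pwm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, oriented in (("+", window), ("-", reverse_complement(window))):
            score = score_window(oriented, pwm)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


hits = scan("TTAACGTT", PWM, THRESHOLD)
```

In this toy case only exact matches to AACG pass the threshold, once on the forward strand and once on the reverse strand. In practice, the threshold would be taken from the precomputed thresholds distributed with the chosen model.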
Each model in the collection has a quality rating from A to D, where A represents motifs with the highest confidence; models of A to C quality are suitable for quantitative analyses. A motif of D quality provides only a rough description of a binding pattern and should be used primarily in exploratory studies. Details on quality assignment can be found in the most recent HOCOMOCO paper.
auROC (area under the ROC curve) represents the power of a given model to discriminate true positive ChIP-Seq segments from random noise. The best auROC across all peak sets shows the highest value reachable for the selected transcription factor.
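As a rough illustration of this measure, the area under the ROC curve equals the probability that a randomly chosen positive segment scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way and then takes the best value across peak sets. The scores and peak set names are invented for illustration, and this is not the benchmarking procedure used by HOCOMOCO.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical best-hit model scores per peak set:
# (scores of ChIP-Seq segments, scores of random control segments).
peak_sets = {
    "peakset_1": ([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]),
    "peakset_2": ([0.9, 0.8, 0.7], [0.6, 0.3, 0.2]),
}

auroc_by_set = {name: auroc(pos, neg) for name, (pos, neg) in peak_sets.items()}
best_auroc = max(auroc_by_set.values())
```

The pairwise form is quadratic in the number of segments, which is fine for a small example; for large peak sets a rank-based computation would be the more practical choice.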
Since v11, the alternative binding models of a particular TF are ranked from 0 (the primary model) to 1 and 2 (the alternative models, including single-box variants). The model of rank 0 is the most 'general' variant, with the best weighted performance in the cross-validation benchmark (see the HOCOMOCO paper for details).
CORE COLLECTION: primary binding motifs that robustly represent binding sites across multiple experiments. The CORE collection contains models of ABC quality and 0 rank only.
FULL COLLECTION: core collection plus all the high-quality alternative and lower-reliability binding models built from limited experimental data. The FULL collection contains models of ABCD quality and 0-1-2 ranks.
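The two collection definitions can be expressed as simple filters on quality and rank. The sketch below assumes each model is available as a record with a name, a quality letter and a rank; the record layout and the model names are hypothetical and do not reflect the actual HOCOMOCO file formats.

```python
from collections import namedtuple

# Hypothetical record layout; names are placeholders, not real models.
Model = namedtuple("Model", ["name", "quality", "rank"])

models = [
    Model("TF1_primary", "A", 0),
    Model("TF1_alternative", "B", 1),
    Model("TF2_primary", "C", 0),
    Model("TF3_primary", "D", 0),
    Model("TF4_alternative", "D", 2),
]


def in_core(model):
    """CORE: quality A, B or C and rank 0 only."""
    return model.quality in ("A", "B", "C") and model.rank == 0


def in_full(model):
    """FULL: quality A to D and ranks 0, 1 or 2."""
    return model.quality in ("A", "B", "C", "D") and model.rank in (0, 1, 2)


core_names = [m.name for m in models if in_core(m)]
full_names = [m.name for m in models if in_full(m)]
```

With these placeholder records, the CORE filter keeps only the two primary models of A to C quality, while the FULL filter keeps all five.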