Chapter 1:

Architectural study in individual conditions, i.e. control and eed mutant

Creation of Micro-C contact matrix files

In brief, the pairs file was generated by following the official Micro-C documentation. The .cool and .hic files were then generated with the cooler toolkit and the Juicer pipeline, respectively, both of which are well documented. QC results were good enough to proceed with further analysis.

Loops

Chromatin loops are a fundamental feature of three-dimensional (3D) genome organization, enabling long-range interactions between regulatory elements and target genes. Studying loops in each condition separately provides a foundational understanding of how genome architecture is maintained under normal circumstances and how it is perturbed upon loss of PRC2 activity via eed knockout.

TADs

TADs represent another level of 3D genome organization. These are contiguous genomic regions that have high internal contact frequency and are relatively insulated from neighboring domains. TADs often act as regulatory neighborhoods, ensuring that enhancers interact with appropriate targets within their domain. While TADs are generally stable across cell types and conditions, perturbations in chromatin modifiers can weaken TAD boundaries or affect insulation strength. In the eed knockout background, changes in TAD structure, either through weakened boundaries or disrupted insulation, may reflect a role for PRC2 in maintaining domain integrity or insulating repressed chromatin. Quantifying TAD boundary scores and comparing insulation profiles between conditions allows such changes to be detected.
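As a rough illustration of how an insulation profile can be quantified, the sketch below computes a diamond-style insulation score on a toy dense matrix with NumPy. The function name, the window size, and the toy matrix are assumptions made for this example, not the pipeline used in this chapter; on real data a dedicated implementation (for example the insulation module in cooltools) would be run on the balanced .cool file.

```python
import numpy as np

def diamond_insulation(matrix, window):
    """Mean contact signal in a square 'diamond' straddling each bin.

    For bin i, the diamond covers contacts between the `window` bins
    upstream (i-window .. i-1) and the `window` bins downstream
    (i+1 .. i+window). Bins too close to the matrix edge stay NaN.
    """
    n = matrix.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window):
        diamond = matrix[i - window:i, i + 1:i + window + 1]
        score[i] = diamond.mean()
    return score

# Toy map (hypothetical): two 10-bin domains with strong internal
# contacts (10) and weak contacts between them (1).
toy = np.ones((20, 20))
toy[:10, :10] = 10.0
toy[10:, 10:] = 10.0

ins = diamond_insulation(toy, window=3)
# Normalize to the profile mean; local minima mark candidate boundaries.
log2_ins = np.log2(ins / np.nanmean(ins))
boundary_bin = int(np.nanargmin(log2_ins))
```

In this toy case the minimum falls at the junction between the two domains. Comparing such profiles between control and eed knockout at shared boundary positions is one way to quantify weakened insulation.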

A/B compartments

A/B compartmentalization reflects broad euchromatin (A) and heterochromatin (B) segregation across the genome. A compartments are gene-rich, transcriptionally active, and open, while B compartments are gene-poor, repressed, and compacted. Compartment status is typically inferred through eigenvector decomposition of contact matrices. Because the sign of an eigenvector is arbitrary, it is first oriented with a track such as GC content or gene density, after which positive and negative values correspond to A and B, respectively. Changes in compartment scores between control and eed knockout samples can reveal compartment switching, particularly in regions losing PRC2-mediated repression. Genes moving from B to A compartments may show increased expression, whereas the inverse can indicate secondary repression effects. This is something I tried to explore in my compartment_switch.ipynb notebook. More on that later.
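The eigenvector-based compartment call and the notion of switching can be sketched as follows. This is a minimal example on toy observed/expected matrices, not the contents of compartment_switch.ipynb; the helper names, the checkerboard values, and the GC track are all hypothetical.

```python
import numpy as np

def toy_oe(compartments):
    """Checkerboard O/E map: enriched (2.0) within a compartment type,
    depleted (0.5) between types. Purely illustrative values."""
    s = np.array([1.0 if c == "A" else -1.0 for c in compartments])
    return np.where(np.outer(s, s) > 0, 2.0, 0.5)

def compartment_eigenvector(obs_over_exp, orient_track):
    """Leading eigenvector of the correlation matrix of an O/E map.

    The sign is flipped if needed so that the vector correlates
    positively with `orient_track` (e.g. GC content per bin).
    """
    corr = np.corrcoef(obs_over_exp)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    e1 = eigvecs[:, -1]                       # vector of the largest eigenvalue
    if np.corrcoef(e1, orient_track)[0, 1] < 0:
        e1 = -e1
    return e1

def call_switches(e1_control, e1_mutant):
    """Label each bin by its compartment sign in the two conditions."""
    labels = []
    for c, m in zip(e1_control, e1_mutant):
        before = "A" if c > 0 else "B"
        after = "A" if m > 0 else "B"
        labels.append(f"{before}->{after}")
    return labels

control = toy_oe("AAAABBBB")
mutant = toy_oe("AAAAABBB")            # bin 4 moves from B to A
gc = np.array([0.6] * 4 + [0.4] * 4)   # hypothetical GC content per bin

e1_control = compartment_eigenvector(control, gc)
e1_mutant = compartment_eigenvector(mutant, gc)
labels = call_switches(e1_control, e1_mutant)
```

On real data, bins with eigenvector values close to zero are ambiguous, so a minimum absolute change between conditions is usually a safer criterion for a switch than a sign flip alone.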

Genomic stripes

Stripes, or contact stripes, are unidirectional interaction patterns extending from strong anchors such as CTCF sites or transcription start sites (TSSs). These features are thought to reflect progressive loop extrusion or ongoing transcriptional activity. Stripes often overlap with active genes and can be used as markers of chromatin dynamics.

Although I didn’t focus much on this, it would be interesting to see whether increased stripe strength around derepressed genes reflects heightened transcriptional output, and whether loss of stripe signal indicates impaired extrusion machinery or transcriptional shutdown. # to-do

Genomic fountains

Fountains are recently described, triangular interaction structures in Hi-C maps. They may reflect complex chromatin folding events or paused extrusion phenomena. Need to use this tool to explain them further #to-do

Loop Detection and Quality Control

Loop structures were identified using multiple loop-calling algorithms (Mustache, Peakachu, and cooltools dots) on Micro-C contact maps processed at 5 kb and 10 kb resolution. Each condition was processed separately, and loop calls were filtered based on tool-specific confidence scores and reproducibility between replicates. In most cases this meant a cutoff of p-value < 0.05, followed by visual inspection of the calls by loading the BEDPE file in a genome browser and by pileups. While trans-chromosomal loops were excluded due to lower confidence and resolution limitations, their potential relevance in global nuclear organization is acknowledged. The resulting loop sets were assessed for:

Total number of loops
Loop length distribution
Contact map signal at anchor regions (aggregate peak analysis or pileups)
Overlap with known regulatory regions/chromatin marks
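The first two checks in the list above (loop count and loop length distribution) can be sketched in a few lines of plain Python. The loop records, the bin edges, and the function names below are hypothetical and only illustrate the bookkeeping.

```python
from statistics import median

# Hypothetical loop records in BEDPE column order:
# chrom1, start1, end1, chrom2, start2, end2
loops = [
    ("chr1", 100_000, 105_000, "chr1", 300_000, 305_000),
    ("chr1", 500_000, 510_000, "chr1", 560_000, 570_000),
    ("chr2", 1_000_000, 1_005_000, "chr2", 2_500_000, 2_505_000),
]

def loop_length(loop):
    """Distance between the midpoints of the two anchors (cis loops only)."""
    chrom1, s1, e1, chrom2, s2, e2 = loop
    if chrom1 != chrom2:
        raise ValueError("loop length is undefined for trans contacts")
    return abs((s2 + e2) // 2 - (s1 + e1) // 2)

def summarize(loops, bin_edges=(0, 100_000, 500_000, 2_000_000)):
    """Loop count, median length, and counts per length bin [lo, hi)."""
    lengths = [loop_length(loop) for loop in loops]
    counts = {}
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        counts[f"{lo}-{hi}"] = sum(lo <= x < hi for x in lengths)
    counts[f">={bin_edges[-1]}"] = sum(x >= bin_edges[-1] for x in lengths)
    return {
        "n_loops": len(loops),
        "median_length": median(lengths),
        "length_bins": counts,
    }

summary = summarize(loops)
```

Running the same summary on each tool's output and on each condition makes it easy to spot a caller that is biased toward very short or very long loops.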

Now what makes a loop biologically meaningful?

Not all detected loops are biologically meaningful; many can arise from technical noise, low read coverage, or statistical artifacts. A loop gains biological significance when it meets several criteria: reproducibility across replicates, high contact enrichment relative to local background, and association with known regulatory elements such as promoters, enhancers, CTCF sites, or Polycomb-bound regions. Loops anchored at transcriptionally active genes or regions marked by regulatory histone modifications (e.g., H3K27ac, H3K4me1) often suggest a functional role in gene regulation. Furthermore, loops that correlate with differential gene expression between conditions—such as those gained near upregulated genes in the eed knockout—offer strong evidence of functional relevance. Integrating loop data with epigenomic and transcriptomic layers is therefore essential to distinguish structural noise from regulatory architecture.
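To make "contact enrichment relative to local background" concrete, the sketch below compares a candidate loop pixel with the mean of a surrounding ring of pixels. It is a deliberately simplified, hypothetical version of the donut-style backgrounds used by loop callers and is not the scoring used by any of the tools in this chapter; the function name, the window sizes, and the toy matrix are assumptions.

```python
import numpy as np

def local_enrichment(matrix, i, j, inner=1, outer=3):
    """Ratio of pixel (i, j) to the mean of a surrounding 'donut'.

    The donut is the (2*outer+1)-wide square around the pixel minus the
    (2*inner+1)-wide square at its centre. Assumes the full window lies
    inside the matrix.
    """
    window = matrix[i - outer:i + outer + 1, j - outer:j + outer + 1].astype(float)
    mask = np.ones(window.shape, dtype=bool)
    c = outer  # index of the centre pixel within the window
    mask[c - inner:c + inner + 1, c - inner:c + inner + 1] = False
    background = window[mask].mean()
    return matrix[i, j] / background

# Toy O/E map (hypothetical): flat background of 2.0 with one focal peak.
toy = np.full((40, 40), 2.0)
toy[10, 20] = 12.0

peak_score = local_enrichment(toy, 10, 20)
flat_score = local_enrichment(toy, 25, 30)
```

A pixel that is no stronger than its neighbourhood scores close to 1, while a focal peak scores well above it. The same idea, averaged over many loops, underlies aggregate peak analysis.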

A union of the three tools' calls was preferred in order to retain as much information as possible and to avoid tool-specific biases. A combined list of unique loops from the three tools was created and used downstream.
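One way such a union could be built is sketched below: loops from different tools are treated as the same loop when both anchors fall within a tolerance of each other, and the supporting tools are recorded. The tolerance, the record layout, and the example calls are assumptions for illustration, not the exact procedure used here.

```python
def anchors_match(a, b, tol):
    """True if two loops share a chromosome and both anchor starts
    differ by at most `tol` bp."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tol and abs(a[2] - b[2]) <= tol

def union_loops(calls_by_tool, tol=10_000):
    """Greedy union: keep a loop unless an already-kept loop matches it.

    Each entry of the result holds the first-seen coordinates and the
    set of tools that reported a matching loop.
    """
    merged = []
    for tool, loops in calls_by_tool.items():
        for loop in loops:
            for kept in merged:
                if anchors_match(kept["loop"], loop, tol):
                    kept["tools"].add(tool)
                    break
            else:
                # No kept loop matched, so this is a new unique loop.
                merged.append({"loop": loop, "tools": {tool}})
    return merged

# Hypothetical calls as (chrom, anchor1_start, anchor2_start)
calls = {
    "mustache": [("chr1", 100_000, 300_000), ("chr1", 500_000, 900_000)],
    "peakachu": [("chr1", 105_000, 300_000), ("chr2", 100_000, 300_000)],
    "cooltools_dots": [("chr1", 100_000, 305_000)],
}

merged = union_loops(calls)
```

The greedy pass depends on the order in which tools are listed and is quadratic in the number of loops, which is acceptable for a sketch but would need an interval index on genome-wide call sets. Keeping the set of supporting tools also allows a stricter consensus subset to be extracted later.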

A brief note on the three tools used:

Peakachu is a supervised machine learning-based loop caller trained on known loop datasets, making it sensitive to canonical loop patterns and capable of identifying high-confidence loops in sparse contact maps. In contrast, Mustache is an unsupervised method that detects loops by identifying statistically significant local enrichments using a scale-space representation of the contact map, allowing robust detection across resolutions and interaction distances. Finally, cooltools dots is part of the cooltools suite and tests each pixel for focal enrichment over a locally adjusted, distance-stratified background model, offering a fast and reproducible baseline for loop annotation. It is highly similar to Juicer HiCCUPS but offers more flexibility through a Python API and an open file format.

Why not other tools?

Some other established tools were considered, such as SIP, FitHiC2, Chromosight, and HiCCUPS, along with some newer ones such as Dconnloop and CGLoops. In the end, a decision was made to proceed with the most robust ones; three tools seemed sufficient without being overkill. It remains possible that some real loops are not identified by any of these three tools. To mitigate this loss of information, I am in the process of creating a machine learning model that can classify loops suited for downstream integration. This could also be done the conventional way, by identifying non-duplicate loops.

Why loops as a basic unit of analysis?

Chromatin loops serve as the most direct representation of spatial proximity between distal genomic elements, making them a foundational unit for understanding 3D genome architecture. Unlike broader features such as compartments or TADs, loops represent discrete, high-resolution interactions that often connect enhancers, promoters, and other regulatory elements. These interactions can have direct consequences on gene expression and are highly responsive to changes in chromatin context. For eed knockout, loops provide a sensitive and interpretable readout of structural rewiring - whether through gain, loss, or repositioning. By analyzing loops independently in each condition, we gain insight into how chromatin folding and regulatory connectivity are altered in the absence of Polycomb-mediated repression. Furthermore, loops can be directly associated with specific genes or regulatory loci, allowing for integration with transcriptomic changes and facilitating mechanistic interpretations. In addition, they are relatively easy to link to higher-order structures such as TADs and compartments.

Are certain loop classes differentially regulated upon PRC2 loss?

Differential analysis

An important question in the context of PRC2 inactivation is whether specific classes of loops are differentially regulated. Not all chromatin loops serve the same function—some connect enhancers to promoters, others mark structural boundaries via CTCF and cohesin, while a subset may be Polycomb-associated and contribute to gene repression. Upon eed knockout, Polycomb-bound regions lose H3K27me3 and may undergo structural reorganization. Thus, loops anchored at these regions may be selectively lost, gained, or rewired. I’ll post my findings with plots here shortly #to-do
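The loop classes described above could be assigned by checking which annotations overlap each anchor. The sketch below is a minimal, hypothetical version: the annotation sets, the class names, and the priority order are assumptions for illustration and would need to be replaced by real peak sets (for example H3K27me3, H3K27ac, CTCF) and a considered classification scheme.

```python
from collections import Counter

def overlaps(interval, features):
    """True if `interval` (chrom, start, end) overlaps any feature
    on the same chromosome (half-open coordinates)."""
    chrom, start, end = interval
    return any(c == chrom and s < end and start < e for c, s, e in features)

def classify_loop(anchor1, anchor2, annotations):
    """Assign a coarse class from the annotation sets hit by each anchor.

    Priority order is an assumption of this sketch:
    polycomb > enhancer-promoter > ctcf > other.
    """
    hits1 = {name for name, feats in annotations.items() if overlaps(anchor1, feats)}
    hits2 = {name for name, feats in annotations.items() if overlaps(anchor2, feats)}
    if "polycomb" in hits1 and "polycomb" in hits2:
        return "polycomb"
    if ("enhancer" in hits1 and "promoter" in hits2) or (
        "promoter" in hits1 and "enhancer" in hits2
    ):
        return "enhancer-promoter"
    if "ctcf" in hits1 and "ctcf" in hits2:
        return "ctcf"
    return "other"

# Hypothetical annotation intervals and loops, for illustration only.
annotations = {
    "polycomb": [("chr1", 1_000, 2_000), ("chr1", 50_000, 51_000)],
    "enhancer": [("chr1", 100_000, 101_000)],
    "promoter": [("chr1", 200_000, 201_000)],
    "ctcf": [("chr1", 300_000, 300_500), ("chr1", 400_000, 400_500)],
}
loops = [
    (("chr1", 0, 5_000), ("chr1", 50_000, 55_000)),
    (("chr1", 100_000, 105_000), ("chr1", 200_000, 205_000)),
    (("chr1", 300_000, 305_000), ("chr1", 400_000, 405_000)),
    (("chr1", 600_000, 605_000), ("chr1", 700_000, 705_000)),
]

classes = [classify_loop(a1, a2, annotations) for a1, a2 in loops]
class_counts = Counter(classes)
```

Tabulating these class counts separately for control and eed knockout loops gives a first view of whether Polycomb-anchored loops are lost disproportionately.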

While the architectural landscape of each condition was studied independently to establish a baseline, a key question remains: what specific changes in chromatin topology distinguish the control from the eed knockout, and how reliably can these differences be quantified? This includes assessing differential loop enrichment, TAD boundary shifts, insulation score changes, and compartment switching. Although the current focus has been on per-condition annotation, the ultimate goal is to identify features that are not just present or absent, but differentially organized in response to PRC2 loss. Addressing this requires a shift from descriptive to comparative analysis, which will be explored to some extent in the next chapter.