# Loop Annotation

## Input files

Polaris requires a `.mcool` file as input. You can obtain `.mcool` files in the following ways:

### 1. Download from the 4DN Database

- Visit the [4DN Data Portal](https://data.4dnucleome.org/).
- Search for and download `.mcool` files suitable for your study.

### 2. Convert Files Using cooler

If you have data in formats such as `.pairs` or `.cool`, you can convert them to `.mcool` format using the Python library [cooler](https://cooler.readthedocs.io/en/latest/index.html). Follow these steps:

- **Install cooler**

  Ensure you have installed cooler using the following command:
  ```bash
  pip install cooler
  ```
- **Convert .pairs to .cool**

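   If you are starting with a `.pairs` file (e.g., contact data with columns for chrom1, pos1, chrom2, pos2), use this command to create a `.cool` file:
   ```bash
   cooler cload pairs --assembly <genome_version> -c1 2 -p1 3 -c2 4 -p2 5 <chrom_sizes>:<resolution> <pairs_file> <resolution>.cool
   ```
   Replace `<genome_version>` with the appropriate genome assembly (e.g., hg38), `<chrom_sizes>` with a chromosome sizes file for that assembly, and `<resolution>` with the desired bin size in base pairs. The `-c1`, `-p1`, `-c2`, and `-p2` options take one-based column numbers; the values shown assume the standard 4DN `.pairs` column order (readID, chrom1, pos1, chrom2, pos2), so adjust them if your file differs.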
- **Generate a Multiresolution .mcool File**

   To convert a single-resolution .cool file into a multiresolution .mcool file, use the following command:

   ```bash
   cooler zoomify <input.cool>
   ```

The resulting `.mcool` file can be directly used as input for Polaris.
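
When preparing the `cooler cload pairs` step above, the column options are given as one-based field numbers, which depend on how the `.pairs` file is laid out. The sketch below is one way to read them from the `#columns:` header line that 4DN-style `.pairs` files usually carry. The function name `pairs_column_numbers` is hypothetical, and the sketch assumes the header uses either `chr1`/`chr2` or `chrom1`/`chrom2` as column names.

```python
def pairs_column_numbers(lines):
    """Return one-based field numbers for chrom1, pos1, chrom2, pos2.

    `lines` is any iterable of text lines from an uncompressed .pairs file.
    Assumes a '#columns:' header line is present (an assumption, not a guarantee).
    """
    aliases = {"chr1": "chrom1", "chr2": "chrom2"}  # the two common spellings
    for line in lines:
        if not line.startswith("#"):
            break  # header block is over, no '#columns:' line was seen
        if line.startswith("#columns:"):
            names = [aliases.get(n, n) for n in line.split(":", 1)[1].split()]
            wanted = ("chrom1", "pos1", "chrom2", "pos2")
            return {name: names.index(name) + 1 for name in wanted}
    raise ValueError("no '#columns:' header line found")


if __name__ == "__main__":
    header = [
        "## pairs format v1.0\n",
        "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n",
    ]
    cols = pairs_column_numbers(header)
    # Print the options in the form expected on the cooler command line
    print(f"-c1 {cols['chrom1']} -p1 {cols['pos1']} -c2 {cols['chrom2']} -p2 {cols['pos2']}")
```

For a gzip-compressed file, the same function can be fed from `gzip.open(path, "rt")`.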

## Loop Annotation by Polaris

### Download example data

You can download example data from the [Hugging Face repo of Polaris](https://huggingface.co/rr-ss/Polaris/resolve/main/example/loop_annotation/GM12878_250M.bcool?download=true) by running:

In [None]:
%%bash

wget "https://huggingface.co/rr-ss/Polaris/resolve/main/example/loop_annotation/GM12878_250M.bcool?download=true" -O "./GM12878_250M.bcool"


Polaris provides two methods to generate loop annotations for an input `.mcool` file. Both methods ultimately yield consistent loop results.

### Method 1: polaris loop pred

This is the simplest approach, allowing you to directly predict loops in a single step.
The command below will take approximately 30 seconds, depending on your device, to identify loops in GM12878 data (250M valid read pairs).

In [6]:
%%bash

polaris loop pred -c chr15,chr16,chr17 -i GM12878_250M.bcool -o GM12878_250M_chr151617_loops.bedpe -t 0.5


polaris loop pred START :)
Automatically selected GPU: 0
Analysing chroms: ['chr15', 'chr16', 'chr17']

********score START********


[Analyzing chr17 with 458 submatrices]: 100%|██████████| 3/3 [00:24<00:00,  8.08s/it]


********score FINISHED********
********pool START********


[Runing clustering on chr16]: 100%|██████████| 3/3 [00:04<00:00,  1.37s/it]


********pool FINISHED********

polaris loop pred FINISHED :)
2876 loops saved to GM12878_250M_chr151617_loops.bedpe
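
To get a quick feel for the result, the loops in the output file can be tallied per chromosome. The sketch below assumes the file follows the usual BEDPE layout (tab-separated, with the first six columns being chrom1, start1, end1, chrom2, start2, end2). The exact column layout written by Polaris is not shown here, so treat that as an assumption. The function name `loops_per_chrom` is hypothetical.

```python
import csv
from collections import Counter


def loops_per_chrom(bedpe_lines):
    """Count BEDPE records per chromosome (taken from the first column).

    Rows with fewer than six fields, or whose second field is not an
    integer (for example a header row), are skipped.
    """
    counts = Counter()
    for row in csv.reader(bedpe_lines, delimiter="\t"):
        if len(row) < 6 or not row[1].isdigit():
            continue
        counts[row[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records, for illustration only
    demo = [
        "chr15\t1000000\t1005000\tchr15\t1200000\t1205000\n",
        "chr16\t2000000\t2005000\tchr16\t2300000\t2305000\n",
    ]
    for chrom, n in sorted(loops_per_chrom(demo).items()):
        print(chrom, n)
```

On real output, pass an open file handle instead of the demo list, e.g. `loops_per_chrom(open("GM12878_250M_chr151617_loops.bedpe"))`.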


> **Note:** If you encounter a `CUDA OUT OF MEMORY` error, please:
> - Check your GPU's status and available memory.
> - Reduce the `--batchsize` parameter. (The default value of 128 requires approximately 36 GB of CUDA memory. Setting it to 24 will reduce the requirement to less than 10 GB.)
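
As a rough guide for picking a value in between, the two figures quoted in the note can be joined by a straight line. This is only a back-of-the-envelope sketch: it assumes memory use grows linearly with batch size, which is not confirmed, and it treats "less than 10 GB" as exactly 10 GB to stay on the conservative side. The function name `suggest_batchsize` is hypothetical.

```python
def suggest_batchsize(free_gb, low=(24, 10.0), high=(128, 36.0)):
    """Estimate a --batchsize that fits in `free_gb` of CUDA memory.

    `low` and `high` are (batchsize, GB) pairs taken from the note above.
    Linear interpolation between them is an assumption, not a measured model.
    """
    (b0, m0), (b1, m1) = low, high
    gb_per_item = (m1 - m0) / (b1 - b0)  # slope of the assumed line
    fixed_gb = m0 - b0 * gb_per_item     # intercept of the assumed line
    estimate = int((free_gb - fixed_gb) / gb_per_item)
    # Never go below 1 or above the documented default of 128
    return max(1, min(b1, estimate))


if __name__ == "__main__":
    for gb in (8, 16, 24, 40):
        print(f"{gb} GB free -> try --batchsize {suggest_batchsize(gb)}")
```

If the suggested value still runs out of memory, lower it further; the estimate is a starting point rather than a guarantee.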

### Method 2: polaris loop score and polaris loop pool

This method involves two steps: generating loop scores for each pixel in the contact map and clustering these scores to call loops.


**Step 1: Generate Loop Scores**

Run the following command to calculate the loop score for each pixel in the input contact map and save the result in `GM12878_250M_chr151617_loop_score.bedpe`.

In [7]:
%%bash

polaris loop score -c chr15,chr16,chr17 -i GM12878_250M.bcool -o GM12878_250M_chr151617_loop_score.bedpe -t 0.5


polaris loop score START :) 
Automatically selected GPU: 0
Analysing chroms: ['chr15', 'chr16', 'chr17']


[Analyzing chr17 with 458 submatrices]: 100%|██████████| 3/3 [00:23<00:00,  7.97s/it]



polaris loop score FINISHED :)
Loopscore file saved at GM12878_250M_chr151617_loop_score.bedpe


**Step 2: Call Loops from Loop Candidates**

Use the following command to identify loops by clustering the candidates in the generated loop score file.

In [8]:
%%bash

polaris loop pool -i GM12878_250M_chr151617_loop_score.bedpe  -o GM12878_250M_chr151617_loops_method2.bedpe -t 0.5


polaris loop pool START :) 


[Runing clustering on chr15]: 100%|██████████| 3/3 [00:04<00:00,  1.41s/it]



polaris loop pool FINISHED :)
2876 loops saved to GM12878_250M_chr151617_loops_method2.bedpe


Both methods yield the same number of loops (2876).
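
Matching counts do not by themselves show that the two files contain the same loops. A stricter check is to compare the loop coordinates directly. The sketch below assumes both files use the usual BEDPE layout with the six coordinate fields in the first six tab-separated columns. The function names are hypothetical.

```python
def loop_keys(bedpe_lines):
    """Collect the six coordinate fields of every BEDPE record as a set of tuples."""
    keys = set()
    for line in bedpe_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and header rows (second field is not an integer)
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        keys.add(tuple(fields[:6]))
    return keys


def compare_loops(lines_a, lines_b):
    """Summarise the overlap between two loop sets by exact coordinate match."""
    a, b = loop_keys(lines_a), loop_keys(lines_b)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


if __name__ == "__main__":
    # Made-up records, for illustration only
    method1 = ["chr15\t100\t200\tchr15\t500\t600\t0.91\n"]
    method2 = ["chr15\t100\t200\tchr15\t500\t600\t0.91\n"]
    print(compare_loops(method1, method2))
```

With real data, pass the two open files, e.g. `compare_loops(open("GM12878_250M_chr151617_loops.bedpe"), open("GM12878_250M_chr151617_loops_method2.bedpe"))`. If both `only_a` and `only_b` are zero, the two methods called identical loops.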

We can then perform [Aggregate Peak Analysis](https://github.com/ai4nucleome/Polaris/blob/master/example/APA/APA.ipynb) to visualize these results.

#### ⭐**Experimental function for very large, high-coverage, high-resolution mcool files**

For very large files, the methods above may run out of memory.

For this case, we provide `polaris loop scorelf`, a **function that is still under development**.


You can run the code below:

In [9]:
%%bash

polaris loop scorelf --help

Usage: polaris loop scorelf [OPTIONS]

  *development* Score Pixels for Very Large mcool (>30GB) ...

Options:
  -b, --batchsize INTEGER      Batch size [128]
  -C, --cpu BOOLEAN            Use CPU [False]
  -G, --gpu TEXT               Comma-separated GPU indices [auto select]
  -c, --chrom TEXT             Comma separated chroms [all autosomes]
  -t, --threshold FLOAT        Loop Score Threshold [0.5]
  -s, --sparsity FLOAT         Allowed sparsity of submatrices [0.9]
  -md, --max_distance INTEGER  Max distance (bp) between contact pairs
                               [3000000]
  -r, --resol INTEGER          Resolution [5000]
  --raw BOOLEAN                Raw matrix or balanced matrix
  -i, --input TEXT             Hi-C contact map path  [required]
  -o, --output TEXT            .bedpe file path to save loop candidates
                               [required]
  --help                       Show this message and exit.


In [2]:
%%bash

polaris loop scorelf -c chr15,chr16,chr17 -i GM12878_250M.bcool -o GM12878_250M_chr151617_loop_score.bedpe -t 0.5
polaris loop pool -i GM12878_250M_chr151617_loop_score.bedpe  -o GM12878_250M_chr151617_loops_method2.bedpe -t 0.5



polaris loop scorelf START :) 
Automatically selected GPU: 0
Analysing chroms: ['chr15', 'chr16', 'chr17']


[Analyzing chr17]: 100%|██████████| 3/3 [00:21<00:00,  7.26s/it]



polaris loop scorelf FINISHED :)
Loopscore file saved at GM12878_250M_chr151617_loop_score.bedpe

polaris loop pool START :) 


[Runing clustering on chr16]: 100%|██████████| 3/3 [00:04<00:00,  1.41s/it]



polaris loop pool FINISHED :)
2876 loops saved to GM12878_250M_chr151617_loops_method2.bedpe
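
The two commands above can also be assembled from Python, which is convenient when the same scoring and pooling steps are repeated over several files. The sketch below only builds the argument lists, using flags that appear in the help text and commands shown earlier (`-c`, `-i`, `-o`, `-t`, `-b`, `-r`). The function name `scorelf_then_pool` and the file names in the demo are hypothetical, and the `subprocess` call is left commented out so the sketch runs without Polaris installed.

```python
import shlex


def scorelf_then_pool(mcool, chroms, score_out, loops_out,
                      threshold=0.5, batchsize=128, resol=5000):
    """Return argument lists for `polaris loop scorelf` and `polaris loop pool`."""
    score = [
        "polaris", "loop", "scorelf",
        "-c", ",".join(chroms),   # comma-separated chromosomes
        "-i", mcool,
        "-o", score_out,
        "-t", str(threshold),
        "-b", str(batchsize),
        "-r", str(resol),
    ]
    pool = [
        "polaris", "loop", "pool",
        "-i", score_out,          # pool reads the score file written above
        "-o", loops_out,
        "-t", str(threshold),
    ]
    return score, pool


if __name__ == "__main__":
    cmds = scorelf_then_pool(
        "GM12878_250M.bcool", ["chr15", "chr16", "chr17"],
        "scores.bedpe", "loops.bedpe", batchsize=24,
    )
    for cmd in cmds:
        print(shlex.join(cmd))
        # import subprocess; subprocess.run(cmd, check=True)  # uncomment to execute
```

Passing argument lists to `subprocess.run` rather than a single shell string avoids quoting problems with unusual file names.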
