Deploying ColabFold Locally • JuniorTree

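There is a project called LocalColabFold, but installing all of its dependencies and resolving the conflicts between them on servers in mainland China is a nightmare. The project is here: GitHub - YoshitakaMo/localcolabfold: ColabFold on your local PC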

A good workaround is to run ColabFold in a Docker/Singularity container: Running ColabFold in Docker · sokrypton/ColabFold Wiki · GitHub

Pulling the image

Replace ghcr.io with ghcr.nju.edu.cn to pull through Nanjing University's mirror:

singularity pull docker://ghcr.nju.edu.cn/sokrypton/colabfold:1.5.5-cuda12.2.2

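If you pull several tags, the same substitution can be scripted. This is a minimal sketch: the helper name `to_nju_mirror` is hypothetical, and it assumes the NJU mirror serves exactly the same repository paths as ghcr.io.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite a ghcr.io image reference to the NJU mirror.
# Assumption: ghcr.nju.edu.cn mirrors ghcr.io with identical repository paths.
set -euo pipefail

to_nju_mirror() {
  local ref="$1"
  # Only a leading "ghcr.io/" is rewritten; any other reference passes through unchanged.
  if [[ "$ref" == ghcr.io/* ]]; then
    printf 'ghcr.nju.edu.cn/%s\n' "${ref#ghcr.io/}"
  else
    printf '%s\n' "$ref"
  fi
}

image="ghcr.io/sokrypton/colabfold:1.5.5-cuda12.2.2"
mirrored="$(to_nju_mirror "$image")"
echo "singularity pull docker://${mirrored}"
# To actually pull, run:
# singularity pull "docker://${mirrored}"
```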

Downloading the weights

singularity run -B ./cache:/cache \
  colabfold_1.5.5-cuda12.2.2.sif \
  python -m colabfold.download


This creates a cache directory in the current directory and downloads the weight files into it. The download speed fluctuates a lot.
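Because the download is slow and sometimes flaky, one option is to wrap it in a simple retry loop. This is a minimal sketch: the `retry` helper and the `RETRY_DELAY` variable are hypothetical, and it assumes that re-running `python -m colabfold.download` into the same cache is safe.

```bash
#!/usr/bin/env bash
# Hypothetical retry wrapper for a flaky download; not part of ColabFold.
set -euo pipefail

# retry MAX_ATTEMPTS CMD [ARGS...]: rerun CMD until it succeeds or attempts run out.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${RETRY_DELAY:-30}s" >&2
    sleep "${RETRY_DELAY:-30}"
    attempt=$(( attempt + 1 ))
  done
}

# Assumption: re-running the download into the same ./cache is safe.
# mkdir -p ./cache
# retry 5 singularity run -B ./cache:/cache \
#   colabfold_1.5.5-cuda12.2.2.sif \
#   python -m colabfold.download
```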

Running predictions

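I wrote two scripts, one for quick predictions and one for more thorough ones. Since our lab's machine has two A100s, I also tried to parallelize the runs.

The parallelization logic is simple: put the FASTA files of several proteins in one folder, and the script takes two proteins at a time and sends one to each GPU. The two GPUs never need to work together on one job (no multi-GPU linking), so the overhead is small.

In practice, though, it doesn't seem to work that well.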
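The per-GPU dispatch idea described above can be sketched roughly as follows. This is a hypothetical sketch, not the run_parallel_colabfold.sh used in this post, and it swaps in a slight variant: instead of taking exactly two proteins at a time, each GPU gets its own round-robin queue, so a GPU that finishes early does not wait for the other. Assumptions: `INPUT_DIR` holds one `.fasta` file per protein, `colabfold_batch` inside the image accepts a single FASTA path, and `CUDA_VISIBLE_DEVICES` set on the host is passed into the container by `singularity run --nv` (Singularity's default environment passthrough). All variable and function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical "one queue per GPU" dispatcher; NOT the post's run_parallel_colabfold.sh.
set -eo pipefail
shopt -s nullglob

INPUT_DIR="${INPUT_DIR:-$PWD/protein_data}"
OUTPUT_DIR="${OUTPUT_DIR:-$PWD/output_dir}"
CACHE_DIR="${CACHE_DIR:-$PWD/cache}"
SIF="${SIF:-$PWD/colabfold_1.5.5-cuda12.2.2.sif}"
NUM_GPUS="${NUM_GPUS:-2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would run, 0 = really run it

# Print the files GPU $1 (out of $2 GPUs) should process, assigned round-robin.
assign_round_robin() {
  local gpu="$1" num_gpus="$2"; shift 2
  local files=("$@") i
  for (( i = gpu; i < ${#files[@]}; i += num_gpus )); do
    printf '%s\n' "${files[i]}"
  done
}

# Run the given FASTA files one after another on a single GPU.
run_queue() {
  local gpu="$1"; shift
  local fasta name
  for fasta in "$@"; do
    name="$(basename "${fasta%.*}")"
    if [[ "$DRY_RUN" == 1 ]]; then
      echo "[GPU $gpu] colabfold_batch $fasta -> $OUTPUT_DIR/$name"
    else
      CUDA_VISIBLE_DEVICES="$gpu" singularity run --nv \
        -B "$CACHE_DIR:/cache" -B "$INPUT_DIR:/in" -B "$OUTPUT_DIR:/out" \
        "$SIF" colabfold_batch "/in/$(basename "$fasta")" "/out/$name"
    fi
  done
}

# Start one background queue per GPU, each logging to gpuN.log, then wait.
# Note: a bare `wait` does not report whether individual jobs failed.
dispatch() {
  local num_gpus="$1"; shift
  local g queue
  for (( g = 0; g < num_gpus; g++ )); do
    mapfile -t queue < <(assign_round_robin "$g" "$num_gpus" "$@")
    (( ${#queue[@]} > 0 )) || continue
    run_queue "$g" "${queue[@]}" > "gpu${g}.log" 2>&1 &
  done
  wait
}

if [[ -d "$INPUT_DIR" ]]; then
  mkdir -p "$OUTPUT_DIR"
  dispatch "$NUM_GPUS" "$INPUT_DIR"/*.fasta
fi
```

With the default `DRY_RUN=1`, the gpuN.log files only list the planned commands, which is a cheap way to check the split before occupying both GPUs.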

Running my script looks roughly like this:

bash run_parallel_colabfold.sh \
  --input liueic/protein_data \
  --out liueic/output_dir \
  --cache liueic/colabfold_cache \
  --work-bind liueic/protein_data \
  --sif liueic/colabfold_1.5.5-cuda12.2.2.sif \
  --gpus 1 \
  --tasks-per-gpu 1 \
  --colabfold-args "--num-models 5 --num-recycle 6 --max-seq 256 --max-extra-seq 512 --msa-mode mmseqs2_uniref_env --pair-mode unpaired_paired --model-type auto --stop-at-score 90 --zip"


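Right after starting, the predictions were slow, even though I could see that GPU memory was already allocated. The log shows why: the jobs were waiting on requests to the MSA server: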

[00:23:21] [GPU 0] [] WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
[00:23:21] [GPU 0] [] limited shared resource only capable of processing a few thousand MSAs per day. Please
[00:23:21] [GPU 0] [] submit jobs only from a single IP address. We reserve the right to limit access to the
[00:23:21] [GPU 0] [] server case-by-case when usage exceeds fair use. If you require more MSAs: You can 
[00:23:21] [GPU 0] [] precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`


For high-throughput parallel runs this won't work. A better approach is to host your own MSA server; see: GitHub - sokrypton/ColabFold: Making Protein folding accessible to all!
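Once a self-hosted server is running, ColabFold is pointed at it with the `--host-url` option named in the log above. Assuming the script's `--colabfold-args` value is passed straight through to `colabfold_batch`, the switch could look like this sketch; `MSA_HOST_URL`, `colabfold_args`, and the example address are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append --host-url only when a self-hosted MSA server is configured.
# MSA_HOST_URL is a hypothetical variable; leave it unset to use the public server.
set -euo pipefail

BASE_ARGS="--num-models 5 --num-recycle 6 --max-seq 256 --max-extra-seq 512 --msa-mode mmseqs2_uniref_env --pair-mode unpaired_paired --model-type auto --stop-at-score 90 --zip"

colabfold_args() {
  if [[ -n "${MSA_HOST_URL:-}" ]]; then
    printf '%s --host-url %s\n' "$BASE_ARGS" "$MSA_HOST_URL"
  else
    printf '%s\n' "$BASE_ARGS"   # no --host-url: ColabFold uses its default server
  fi
}

# Example (hypothetical address; other script options as in the command above):
#   MSA_HOST_URL=http://msa-server.example:8080
#   bash run_parallel_colabfold.sh <other options> --colabfold-args "$(colabfold_args)"
colabfold_args
```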

However, the disk requirement is large, so this probably calls for a dedicated server. From the ColabFold README:

First create a directory for the databases on a disk with sufficient storage (940 GB (!)). Depending on where you are, this will take a couple of hours
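Before starting a multi-hour database download, it is worth checking that the target disk really has that much room. A minimal sketch using POSIX `df -Pk`; the `has_free_gb` helper and `DB_DIR` variable are hypothetical, and it treats 1 GB as 1024³ bytes, which is slightly stricter than the README's 940 GB.

```bash
#!/usr/bin/env bash
# Sketch: check free space before downloading the ColabFold MSA databases.
set -euo pipefail

# has_free_gb DIR GB: succeed if the filesystem holding DIR has at least GB GiB available.
has_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # With -P, df prints one data line per filesystem; column 4 is available 1K-blocks.
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  (( avail_kb >= need_gb * 1024 * 1024 ))
}

DB_DIR="${DB_DIR:-$PWD}"
if has_free_gb "$DB_DIR" 940; then
  echo "OK: at least 940 GiB free under $DB_DIR"
else
  echo "Not enough space under $DB_DIR for the ColabFold databases" >&2
fi
```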

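For now, though, even without a self-hosted MSA server this is acceptable: at worst I sleep on it and things are back to normal by morning, since there are only a few hundred proteins (probably?).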