Can not find contig ptg000001l in the gfa file(s). Maybe the gfa file(s) does not match the fasta file #97

Open
zhulicui opened this issue Jan 3, 2025 · 4 comments

zhulicui commented Jan 3, 2025

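Hello Prof. Zeng,
I have run into a problem while using HapHiC, which you developed, and would appreciate your help.
I followed the BWA mapping method and filtering criteria recommended by HapHiC to obtain hic.bam, and I have asm.hic.p_ctg.fa, assembled by hifiasm in Hi-C plus HiFi mode. When I run the command
haphic pipeline asm.hic.p_ctg.fasta HIC.filtered.sort.bam 30 --gfa 'asm.hic.hap1.p_ctg.gfa,asm.hic.hap2.p_ctg.gfa' --correct_nrounds 2 --remove_allelic_links 2
it always fails with the error "Can not find contig ptg000001l in the gfa file(s). Maybe the gfa file(s) does not match the fasta file."
I then tried renaming the contigs in the hap*.p_ctg.gfa files generated directly by hifiasm to match those in asm.hic.p_ctg.fa, but the same error appeared when I reran the command.
I am not sure exactly what went wrong.
[Screenshot: Snipaste_2025-01-03_16-54-51]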

@zhulicui zhulicui closed this as completed Jan 3, 2025
@zhulicui zhulicui changed the title from "hic" to "hic_p_ctg data plot" Jan 3, 2025
@zhulicui zhulicui reopened this Jan 3, 2025
zengxiaofei (Owner) commented Jan 3, 2025

  1. It appears that the input GFA files are not correct; the lines in GFA files generated by hifiasm should start with either "S", "A", or "L". However, the first lines of your GFA files begin with ">".
  2. The contig IDs in *.p_ctg.gfa files should start with "ptg", whereas those in *.hap1.p_ctg.gfa and *.hap2.p_ctg.gfa files should begin with "h1tg" and "h2tg", respectively. I'm not sure what modifications have been made to your GFA files, but both the sequence IDs and the format appear to be incorrect.
  3. The FASTA file and the GFA files you provided do not match. If your input FASTA file is a primary assembly (*.p_ctg.fa), you should provide the corresponding *.p_ctg.gfa file for the --gfa parameter. Conversely, if your input FASTA file is a haplotype-resolved assembly (a concatenation of *.hap1.p_ctg.fa and *.hap2.p_ctg.fa), then both *.hap1.p_ctg.gfa and *.hap2.p_ctg.gfa files should be provided for the --gfa parameter.
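The three points above can be checked before running the pipeline. The following is a minimal sketch, not part of HapHiC: the function names are hypothetical, and it assumes that GFA segment names sit in the second tab-separated column of "S" lines and that only "S", "A", and "L" records are expected, as described above.

```python
def fasta_ids(path):
    """Collect sequence IDs (first word after '>') from a FASTA file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                name = line[1:].split()
                if name:
                    ids.add(name[0])
    return ids


def gfa_segment_ids(path):
    """Return (segment IDs from S lines, first characters of unexpected lines)."""
    ids, unexpected = set(), set()
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            if fields[0] == "S" and len(fields) > 1:
                ids.add(fields[1])
            elif fields[0] not in ("S", "A", "L"):
                # e.g. '>' means the file is really FASTA, not GFA
                unexpected.add(line[0])
    return ids, unexpected


def contigs_missing_from_gfa(fasta_path, gfa_paths):
    """List FASTA contig IDs that have no S line in any of the GFA files."""
    gfa_ids = set()
    for gfa_path in gfa_paths:
        ids, _ = gfa_segment_ids(gfa_path)
        gfa_ids |= ids
    return sorted(fasta_ids(fasta_path) - gfa_ids)
```

An empty list from contigs_missing_from_gfa suggests the FASTA and GFA files belong together; a non-empty set of unexpected leading characters from gfa_segment_ids suggests the GFA file has been altered or is in another format.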

zengxiaofei (Owner) commented

Another issue I have noticed is that the input BAM file was named *.sort.bam. Please be aware that the input BAM file should not be coordinate-sorted. If you followed our method for Hi-C data mapping and filtering, there should be no need for BAM sorting. For more information, refer to our documentation:

You can prepare the BAM file according to your own preferences or requirements, but DO NOT sort it by coordinate. If your BAM file is already sorted by coordinate, you need to resort it by read name (samtools sort -n).
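As a rough pre-check, the sort order recorded in the BAM header can be inspected. The sketch below is an illustration only and the function name is hypothetical. It parses header text (for example the output of samtools view -H) and returns the SO tag of the @HD line, which the SAM specification defines as one of unknown, unsorted, queryname, or coordinate. Note that this tag only reflects what the producing tool recorded; it may be missing or out of date.

```python
def bam_sort_order(header_text):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None
```

If this returns "coordinate", the file would need to be re-sorted by read name (samtools sort -n) as the documentation excerpt above says.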

@zengxiaofei zengxiaofei changed the title from "hic_p_ctg data plot" to "Can not find contig ptg000001l in the gfa file(s). Maybe the gfa file(s) does not match the fasta file" Jan 3, 2025
zengxiaofei (Owner) commented Jan 6, 2025

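> My problem is that the same error already appeared at the very beginning, when I was using the unmodified GFA files, that is, the *.hap1.p_ctg.gfa and *.hap2.p_ctg.gfa files whose lines start with "S h1tg" or "S h2tg", as you described.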

The answer is:

  1. The FASTA file and the GFA files you provided do not match. If your input FASTA file is a primary assembly (*.p_ctg.fa), you should provide the corresponding *.p_ctg.gfa file for the --gfa parameter. Conversely, if your input FASTA file is a haplotype-resolved assembly (a concatenation of *.hap1.p_ctg.fa and *.hap2.p_ctg.fa), then both *.hap1.p_ctg.gfa and *.hap2.p_ctg.gfa files should be provided for the --gfa parameter.

Your current issue is that you provided the FASTA file of the primary assembly (*.p_ctg.fa, haplotype-unresolved, contig IDs ptgxxxxxxl) but the GFA files of the haplotype-resolved assemblies (*.hap*.p_ctg.gfa, contig IDs h1tgxxxxxxl/h2tgxxxxxxl).
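The naming convention makes this kind of mismatch easy to spot from the contig IDs alone. Below is a small sketch under that assumption; the function name and return labels are hypothetical and not part of HapHiC or hifiasm. It classifies a set of contig IDs by the prefixes mentioned in this thread.

```python
def assembly_kind(contig_ids):
    """Guess the hifiasm assembly type from contig ID prefixes."""
    prefixes = set()
    for cid in contig_ids:
        for prefix in ("h1tg", "h2tg", "ptg"):
            if cid.startswith(prefix):
                prefixes.add(prefix)
                break
        else:
            prefixes.add("other")
    if prefixes == {"ptg"}:
        return "primary"
    if prefixes and prefixes <= {"h1tg", "h2tg"}:
        return "haplotype-resolved"
    return "unknown"
```

If the IDs in the FASTA file and the IDs in the GFA file(s) yield different labels, the inputs do not belong together.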

If you are unclear about the meaning of the various GFA files output by hifiasm, please refer to:

> Or should the asm.fa file among my input files be replaced with the p_ctg.gfa file?

FASTA and GFA are two entirely different formats. When I write .fa or .fasta, I mean the FASTA format; .gfa refers to the GFA format. You do not need to modify these files yourself. For the hifiasm output, you only need to convert the GFA file to FASTA and use the result as the input asm.fa for HapHiC. The file required by the --gfa parameter is the original GFA file output by hifiasm, and you should not modify it in any way.
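For illustration, the GFA-to-FASTA conversion can be sketched as below. This is a minimal example with a hypothetical function name, assuming that each "S" line carries the segment name in column 2 and the sequence in column 3; it writes a new FASTA file and leaves the GFA untouched, so the original GFA can still be passed to --gfa.

```python
def gfa_to_fasta(gfa_path, fasta_path):
    """Write every S record of a GFA file as a FASTA entry; return the count."""
    n_written = 0
    with open(gfa_path) as gfa, open(fasta_path, "w") as fasta:
        for line in gfa:
            fields = line.rstrip("\n").split("\t")
            # S lines: record type, segment name, sequence, optional tags
            if fields[0] == "S" and len(fields) >= 3:
                fasta.write(f">{fields[1]}\n{fields[2]}\n")
                n_written += 1
    return n_written
```

For a haplotype-resolved assembly, the same conversion would be applied to each of the two haplotype GFA files and the resulting FASTA files concatenated, matching the pairing described earlier in the thread.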

zhulicui (Author) commented Jan 6, 2025 via email
