Out of memory problem while creating fasta from hvcf #191

Open
sidevshiy opened this issue Jul 24, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@sidevshiy

sidevshiy commented Jul 24, 2024

Description

Hello,

I've encountered a memory allocation issue while running the phg create-fasta-from-hvcf command, which fails with an OutOfMemoryError. The error message says the required array length is too large: the requested size exceeds the maximum value of a 32-bit integer, so an array bigger than Java allows is being requested.

Here is the error message:

Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 931773 is too large
at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:752)
at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:745)
at java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:266)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:246)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:590)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:179)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:91)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:629)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:209)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:91)
at kotlin.text.StringsKt__AppendableKt.appendElement(Appendable.kt:86)
at kotlin.collections.CollectionsKt___CollectionsKt.joinTo(_Collections.kt:3490)
at kotlin.collections.CollectionsKt___CollectionsKt.joinToString(_Collections.kt:3507)
at kotlin.collections.CollectionsKt___CollectionsKt.joinToString$default(_Collections.kt:3506)
at net.maizegenetics.phgv2.cli.CreateFastaFromHvcf.writeCompositeSequence(CreateFastaFromHvcf.kt:224)
at net.maizegenetics.phgv2.cli.CreateFastaFromHvcf.writeSequences(CreateFastaFromHvcf.kt:110)
at net.maizegenetics.phgv2.cli.CreateFastaFromHvcf.buildFastaFromHVCF(CreateFastaFromHvcf.kt:80)
at net.maizegenetics.phgv2.cli.CreateFastaFromHvcf.run(CreateFastaFromHvcf.kt:261)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:279)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:292)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:41)
at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:457)
at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:454)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:474)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:481)
at net.maizegenetics.phgv2.cli.PhgKt.main(Phg.kt:58)

java version "22.0.2" 2024-07-16
Java(TM) SE Runtime Environment (build 22.0.2+9-70)
Java HotSpot(TM) 64-Bit Server VM (build 22.0.2+9-70, mixed mode, sharing)

I attempted to increase the maximum heap size by setting JAVA_OPTS="-Xmx1000g", but this did not resolve the issue. I suspect the issue is related either to the efficiency of the algorithm used or to a limitation within Java itself when handling such large data structures.
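For context on why a larger heap does not help: the first number in the error message, 2147483639, equals Integer.MAX_VALUE - 8, which is the soft maximum array length the JDK uses when growing a buffer such as a StringBuilder. That cap is independent of -Xmx. A minimal sketch of the arithmetic, using a hypothetical helper name that is not part of PHG:

```kotlin
// Soft maximum array length used by the JDK when growing buffers:
// Integer.MAX_VALUE - 8 = 2147483639, the first number in the error message.
const val SOFT_MAX_ARRAY_LENGTH: Long = Int.MAX_VALUE.toLong() - 8

// Hypothetical helper (not a PHG function): true if concatenating strings of
// these lengths would need a backing array beyond the JDK's soft maximum.
// Lengths are summed as Long so the total cannot overflow an Int.
fun exceedsArrayLimit(lengths: List<Int>): Boolean =
    lengths.sumOf { it.toLong() } > SOFT_MAX_ARRAY_LENGTH

fun main() {
    // The two numbers from the error message: current buffer size + next append.
    println(exceedsArrayLimit(listOf(2147483639, 931773)))
    // A total that comfortably fits in a single StringBuilder.
    println(exceedsArrayLimit(listOf(1_000_000, 931_773)))
}
```

Under this reading, any chromosome whose concatenated haplotype sequences approach 2^31 characters would hit the same limit, however much memory is available.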

Any guidance or recommendations would be greatly appreciated.

Thank you!

Expected behavior

No response

PHG version

phg version 2.3.7.144

@sidevshiy sidevshiy added the bug Something isn't working label Jul 24, 2024
@sidevshiy
Author

I believe the problem is the joinToString call in this function:

fun writeCompositeSequence(outputFileWriter: BufferedWriter, haplotypeSequences: List<HaplotypeSequence>) {
    // group the sequences by chromosome
    val sequencesByChr = haplotypeSequences.groupBy { it.refContig }
    for (chr in sequencesByChr.keys.sorted()) {
        outputFileWriter.write(">$chr\n")
        // sort the sequences by startPos
        val sequencesByStartPos = sequencesByChr[chr]!!.sortedBy { it.refStart }
        // merge and output the sequences
        sequencesByStartPos.map { it.sequence }
            .joinToString("") // builds the whole chromosome as a single String
            .chunked(80) // chunking into 80-character lines
            .forEach { outputFileWriter.write(it + "\n") }
    }
}
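One way to avoid building the whole chromosome as a single String is to stream each haplotype sequence to the writer while tracking the current column, so that 80-character lines still span sequence boundaries. The sketch below is an illustration only: it is neither the change the author compiled locally nor the maintainers' patch, neither of which appears in this thread. HapSeq and writeCompositeSequenceStreaming are hypothetical names, and HapSeq carries only the three fields the function above uses.

```kotlin
import java.io.BufferedWriter
import java.io.StringWriter

// Hypothetical stand-in for PHG's haplotype record; only the fields used here.
data class HapSeq(val refContig: String, val refStart: Int, val sequence: String)

// Writes one FASTA record per chromosome without concatenating the sequences.
fun writeCompositeSequenceStreaming(
    writer: BufferedWriter,
    haplotypeSequences: List<HapSeq>,
    lineWidth: Int = 80
) {
    val byChr = haplotypeSequences.groupBy { it.refContig }
    for (chr in byChr.keys.sorted()) {
        writer.write(">$chr\n")
        var column = 0 // characters already written on the current line
        for (hap in byChr.getValue(chr).sortedBy { it.refStart }) {
            val seq = hap.sequence
            var pos = 0
            while (pos < seq.length) {
                // Write as much as fits on the current line, straight from seq.
                val n = minOf(lineWidth - column, seq.length - pos)
                writer.write(seq, pos, n)
                pos += n
                column += n
                if (column == lineWidth) {
                    writer.write("\n")
                    column = 0
                }
            }
        }
        if (column > 0) writer.write("\n") // terminate a partial last line
    }
}

fun main() {
    val haps = listOf(
        HapSeq("chr1", 101, "G".repeat(50)),
        HapSeq("chr1", 1, "A".repeat(100))
    )
    val sw = StringWriter()
    BufferedWriter(sw).use { writeCompositeSequenceStreaming(it, haps) }
    print(sw)
}
```

The largest string held in memory is then a single haplotype sequence rather than a full chromosome, and no StringBuilder has to grow toward the array length limit.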

@zrm22
Collaborator

zrm22 commented Jul 25, 2024

Hello,

Based on the error log I think you are right. It should be a relatively simple fix. We should be able to get out a patch sometime in the next day or so.

Just for information, what species are you working with? Does it have really large chromosomes?

Thanks,

@sidevshiy
Author

Hello,
Thank you! I am working with wheat.
I changed the code there, recompiled phg, and tried again. It worked, so the problem is indeed there!

Thank you for the quick replies!

@zrm22
Collaborator

zrm22 commented Jul 25, 2024

If you want your changes brought into the main repo we do take external Pull Requests! Otherwise we can update it in the next day or so.

@lynnjo
Collaborator

lynnjo commented Oct 4, 2024

@sidevshiy I've coded a fix for this, but am not able to reproduce the original problem. Do you have data you can share for us to test?

Alternatively, you could share your fix (which we'll compare to ours) and we can bring that in.

Or I can create a tar file of our change for you to test.
