Our lab has since 2026 started using LLMs. It wasn't an easy step - there's no lack of ethical concerns here, and we take pride in our programming skills. But the world moves on and it's evolve-or-die: We can't stop the progress. We can only try to move in the least bad direction.
We are delivering increasingly complex bioinformatics pipelines and we are a small number of people, with limited monetary supply and time, against an ecosystem that keeps asking more out of us. Our use of LLMs was born out of necessity to keep up, fixing the long list of trivial problems, so we can focus our attention on the really hard ones. Below is a blurb that all our crates will have, while they also point here for the latest one and further info.
Most users should probably first see if the existing original code works for them, unless they have reason otherwise. The original source may have newer features and it has had more love in terms of fixing bugs. In fact, we aim to replicate bugs if they are present, for the sake of reproducibility! (but then we might have added a few more in the process)
There are however cases when you might prefer this Rust version. We generally agree with this manifesto but more specifically:
- We have had many issues with ensuring that our software works using existing containers (Docker, PodMan, Singularity). One size does not fit all and it eats our resources trying to keep up with every way of delivering software
- Common package managers do not work well. It was great when we had a few Linux distributions with stable procedures, but now there are just too many ecosystems (Homebrew, Conda). Conda has an NP-complete resolver which does not scale. Homebrew is only so-stable. And our dependencies in Python still break. These can no longer be considered professional serious options. Meanwhile, Cargo enables multiple versions of packages to be available, even within the same program(!)
- The future is the web. We deploy software in the web browser, and until now that has meant Javascript. This is a language where even the == operator is broken. Typescript is one step up, but a game changer is the ability to compile Rust code into webassembly, enabling performance and sharing of code with the backend. Translating code to Rust enables new ways of deployment and running code in the browser has especial benefits for science - researchers do not have deep pockets to run servers, so pushing compute to the user enables deployment that otherwise would be impossible
- Old CLI-based utilities are bad for the environment(!). A large amount of compute resources are spent creating and communicating via small files, which we can bypass by using code as libraries. Even better, we can avoid frequent reloading of databases by hoisting this stage, with up to 100x speedups in some cases. Less compute means faster compute and less electricity wasted
- LLM-mediated translations may actually be safer to use than the original code. This article shows that running the same code on different operating systems can give somewhat different answers. This is a gap that Rust+Cargo can reduce. Typesafe interfaces also reduce coding mistakes and error handling, as opposed to typical command-line scripting
But:
- This approach should still be considered experimental. The LLM technology is immature and has sharp corners. But there are opportunities to reap, and the genie is not going back into the bottle. This translation is as much aimed to learn how to improve the technology and get feedback on the results.
- Translations are not endorsed by the original authors unless otherwise noted. Do not send bug reports to the original developers. Use our Github issues page instead.
- Do not trust the benchmarks on this page. They are used to help evaluate the translation. If you want improved performance, you generally have to use this code as a library, and use the additional tricks it offers. We generally accept performance losses in order to reduce our dependency issues
- Check the original Github pages for information about the package. This README is kept sparse on purpose. It is not meant to be the primary source of information
- If you are the author of the original code and wish to move to Rust, you can obtain ownership of this repository and crate. Until then, our commitment is to offer an as-faithful-as-possible translation of a snapshot of your code. If we find serious bugs, we will report them to you. Otherwise we will just replicate them, to ensure comparability across studies that claim to use package XYZ v.666. Think of this like a fancy Ubuntu .deb-package of your software - that is how we treat it
This blurb might be out of date. Go to this page for the latest information and further information about how we approach translation
To avoid confusion with the original software, but aid users to find the translation, we aim to put "-rs" to the original name. Sometimes such crates already exist on Crate.io, in which other versions such as -pure-rs or -compliant-rs are added. Compliant here refers to our practice of trying to retain the original behavior.
Rust takes semantic versioning VERY seriously. But anything 0.x.y is treated as "good luck", aimed for early development. Because we add high-level APIs (affects x.y.0), unintentionally add additional bugs (affects 0.0.z), but also have the original software version (x.y.z) which need not follow semantic versioning, we just can't commit to Rust versioning. We will thus use 0.y.z for all translations. If ever the original author (or appointed person) wish to take over the translation and make it the primary implementation, this would be the time to bump the major Rust crate version.
We will release information about our process once we think it produces good output. Other people may otherwise blindly follow the instructions and fill the internet with poor translations. But in short, we use Claude and Codex along with static analysis software that helps validate the quality. Most important conclusions so far are:
- Static analysis is essential for quality translation
- Claude is borderline unsuitable for translation work as it frequently refuses to follow instructions and produces biased benchmarks. Cleaning up work from Claude can offset its contribution
- Functions should be translated as near to 1-1 as possible to ensure efficient audit; aim to oneshot the full logic right away to avoid time consuming backtracking
- Performance deviation (up or down) vs original software should be treated as a regression until otherwise proven
- Memory use deviation (up or down) vs original software should be treated as a regression until otherwise proven
- Always translate bottom up
- Rely on LLM as little as possible. Treat it as a fallback
- Benchmark on real world data, of increasing size. Use multiple different datasets to stress different code paths. Also test with properly large datasets
We are designing software to aid faithful and efficient translation. You can point your LLM to them and it will figure out how to use them
- tracehash: Helps instrument original + translated code for efficient bug tracking and bit-wise reproducibility. Supports light hash-based comparison for a first pass, and heavy full I/O recording to figure out what is going wrong
- CCC/code-complexity-comparator: Pinpoints unfaithful translation in code by comparing complexity and content of functions, using static analysis.
- gdb-translation-verifier-rs: Run two GDBs (debuggers), step through the code, and halt when the code paths deviate. Theoretically the gold standard internal comparison, and avoids instrumentation
Translation is hard. Below are three screenshots from the CCC, which has a TUI to compare the before-after translation. This is the HDF5 source code, which is in C. Unlike newer languages, C does not enforce any structure, and C projects have frequently came up with "their own ways" of organizing code. But for an outsider, it can look really esoteric. CCC attempts to color code variables the same left vs right to aid comparison. It works well for the first function, but not so well for the second function. For this project, translating functions 1-1 in a good way is near impossible.
At the bottom of this viewer you also see metrics on code complexity. If they deviate, it suggests that logic or optimizations have gone missing. This has caught many bugs, especially related to optimization where LLM frequently ends up trying to optimize "the hot loop" rather than why it ended up there in the first place. Unfortunately this approach relies on translations being 1-1, and one should try to enforce this from the first line of translation.
If you want to try out translation, here are some things to consider:
We will likely disagree on how to do translations but to minimize damage, at least think through the implications before you put up code on the internet. Read this manifesto, which has at least some basic decent guidelines. There are legal matters to understand (at minimum), but long-term, the biggest concerns will be in how the code should be maintained.
- C code can be really hard to translate and it is commonly quite fast already. This is not the best place for a beginner to start
- If you don't know Rust, be sure to have someone senior around who can QC your final result before cargo publishing! (cannot be undone)
- Plan for future maintenance from the start. Avoid translating code bases that are undergoing rapid updates unless you are seriously committed to updating the translation. Check when the last commit was made to the repo to gauge this
Claude will aim to deliver a minimum viable product as soon as possible. Likely because it is easier to test working code, and LLMs heavily depend on testing. But the downside is that it will skip 90% of the features. It will ignore corner cases and introduce bugs. It will replace advanced fast algorithms with brute force algorithms. All of this will come to explode in your face later!
Worst of all, when you later try to fix the code (speed, output equality), the LLM will start to introduce heuristics that only work on your small test data. Claude is especially good at "innovating" in heuristics.
To overcome these problems as much as possible:
- Obtain real-life data for testing on ASAP
- Ask the LLM to do faithful (bitwise equality of outputs) translation, covering all features, from the beginning
- Keep reminding the LLM of this goal (for Claude, every 1h, or more frequently if the code is hard to fix)
- Ask the LLM to produce one Rust function for each input function. This is not always possible, but it makes it so much easier to check correctness later
- Translate bottom up. Use our CCC to figure out the best order to translate, using static analysis. Tell the LLM to keep the ccc_mapping.toml file up to date. In the current workflow we produce stubs for each function to translate, from day 1, before filling functions with code. This appears to limit any LLM creativity
- Instrument the code to check intermediate values, not just the final output. Our Tracehash crate is for this purpose
- Do some manual inspection. Our CCC has a TUI for this
Claude is notoriously bad at performing speed comparisons vs the original code, because it will use every means of finding ways of making translation look better. It can be hard to even convince Claude to be fair. For example, it will compare 8-thread Rust code vs 1-thread C. It will not enable C compiler optimization. It will focus on trivial examples where starting costs dominate. It will disable SIMD. It will focus optimization on irrelevant parts of code. Don't use Claude for this purpose if you can avoid it.
If the LLM gets freedom to add multithreading, it might not do it at the right level, causing major speed problems. Exposed if compared to the original code. Be vigilant if original code lack multithreading.
Also remember that your final tool might run as part of a bigger tool. The caller should be able to control if the multithreading should be hoisted or not. Make it possible to share the Threadpool.
The API generated by LLM is not pretty and it might not fulfill your needs. You especially need to ensure that data can be fed directly and not just via CLI. If the tool loads a database, ensure that the user need only do this once.
Check naming of exposed functions. "Pipeline" is not a good name (as there might be many). Instead of myfunction(x), x.myfunction() is cleaner to export.
If there is a low level API already, then best to add a new high level API on top. It's a strict addition to the translated code (does not mess up verifiability), and the top layer has little impact on performance anyway.
Investigate if you can support better integration with existing Rust ecosystem. E.g., it might be better to use Noodles for parsing FASTA. This is in conflict with keeping the code to the original source, and adds dependencies. Good judgement needed.
To reduce cost, we are investigating how to limit context size (e.g. using agents in Claude) without loss of precision. Input is welcome
- We could just ask our LLM to write the software we need but it would then just go out and copy it from somewhere, without us knowing the origin. This means unclear copyright, and nobody gets attributed for their contribution
- Maintaining a competing software is more work, adds code to maintain, and steals citations. It's a lose-lose scenario
- Written software has been tested, and LLMs tend to produce rather shoddy software. It still cannot compete with hand-crafted code from an engineer. By translating, we can compare our software to a reference to ensure correctness
Software like c2rust can convert C to Rust 100% faithfully, but at a serious cost: It doesn't look like Rust at all! This causes several problems:
- Heavy use of "unsafe" means that the memory safety guarantees of Rust are disabled
- The Rust compiler will not recognize the code. It is designed for "code that looks like Rust", and if it does not, the generated binary will be slower than otherwise
- Ideally we want to promote Rust as the future language of bioinformatics! And to smoothen the transition, we ideally offer code that is as similar to the original as possible, to provide familiarity with its structure. LLMs are needed for idiomatic rewrites
That said, we are looking into ways of reducing LLM usage, as an algorithm can be proven to generate faithful results, while LLMs not really
Don't assume this; it can be much faster if you use the final software as a library, but don't expect a basic CLI replacement to be faster. While Rust is faster than R, python etc, it can be hard to beat well-written C++ code that already includes SIMD or assembly code. But you can get a ton of speed improvement if your tool reads a database and then applies it to input data. If you call the crate as a library (not CLI!), and apply it to a lot of input data, you can reduce the initial loading cost by a ton.
You can put it as a request in this thread and I will see what I have time for. If you are an author of a commonly used tool and want to move to Rust, I am happy to help!
One challenge in translating code to Rust is that floating point (decimal) math is really hard to translate well. If you care about p-values, this is important.
Everyone learn in school that (ab)c = a(bc), but this is not true in computers for floating point. Orders matter. This even worse for addition of decimal numbers. "YOLO" is a very bad approach for translating any code that deals with numbers as once you are off, you will have to search the whole code base for the reordering error. This is hard for a human, and it is hard for an LLM. Verbatim translation for day 1 is the only serious solution (and use tracehash when you undoubtedly will still fail).
This has implications for SIMD, which can really help speed up computations. Instead of (((ab)c)d), a CPU can e.g. compute (abcd) in one swoop. But now the order was changed! There are thus potential tradeoffs to be made when enabling SIMD vs faithful reproduction of the original. SIMD is typically offered as an optional crate feature because of this. If you enable this feature, beware that you might get somewhat different results.
Bioinformaticians, please don't use p-value cutoffs if you can!!! If you include genes at p=0.05, then a gene that got 0.051 once, and then 0.049 after SIMD, this is a huge difference for the results (exaggerated, differences are smaller, but you get the point). If you run multiple steps with cutoffs everywhere, it is better if you can instead weigh by p-value. This is much more robust to numeric differences. i.e., if you want to compute P(x and y), then do this as P(x)P(y). This is better than if P(x<0.05) then check P(y) -- both more a numerical and statistical standpoint. The future will love you.
There are many libraries for graphical user interfaces, some only working on certain operating systems. Faithful reproduction of GUIs is usually not a good idea. Ideally a modern design should also cater to different monitor sizes, and enable adaptive font sizes to aid those with poor eye sight, or needing special readers for aid. We translate GUIs very losely and more work can be done in this domain.
Testing GUIs is also hard. LLMs only work well if they are allowed to write A LOT of tests, and we also lack the equivalent of a "byte per byte" comparison. GUIs are not fun to translate.
Pragmatically, we have picked the Rust Iced library as the default. It results in the least code but is not perfect. Default translations don't look good. Instead, the best trick so far is to include screenshots of the original and let the LLM do some adjustments. Input of GUI translation is most welcome!

