descriptor: fix Taproot tree descriptor formatting#953
Conversation
| while last_depth > item.depth() { | ||
| f.write_str("}")?; | ||
| last_depth -= 1; | ||
| fmt_subtree(leaves, depth + 1, f, fmt_ms)?; | ||
| f.write_str(",")?; | ||
| fmt_subtree(leaves, depth + 1, f, fmt_ms)?; | ||
| f.write_str("}") | ||
| } |
There was a problem hiding this comment.
I think you can solve the problem by moving this while loop to the top of the for loop so that it finishes the current depth before writing the comma.
I checked it against your modified test and it passes.
This way we can avoid adding recursion and the fix is very minimal. I will not speak for @apoelstra , but I have noticed he has mentioned removing recursion and not introducing more to this crate.
There was a problem hiding this comment.
diff:
fn fmt_helper<Pk: MiniscriptKey>(
view: &TapTree<Pk>,
f: &mut fmt::Formatter,
mut fmt_ms: impl FnMut(&mut fmt::Formatter, &Miniscript<Pk, Tap>) -> fmt::Result,
) -> fmt::Result {
let mut last_depth = 0;
for item in view.leaves() {
+ while last_depth > item.depth() {
+ f.write_str("}")?;
+ last_depth -= 1;
+ }
+
if last_depth > 0 {
f.write_str(",")?;
}
while last_depth < item.depth() {
f.write_str("{")?;
last_depth += 1;
}
fmt_ms(f, item.miniscript())?;
- while last_depth > item.depth() {
- f.write_str("}")?;
- last_depth -= 1;
- }
}
while last_depth > 0 {
f.write_str("}")?;
last_depth -= 1;
}
Ok(())
}There was a problem hiding this comment.
Thanks, agreed on avoiding recursion here!
I replaced the recursive formatter with an iterative implementation. It now keeps an explicit stack of open TapTree parents and tracks whether each parent has already emitted its left child. That lets the formatter decide when to emit a comma for the right child and when to close completed right branches, while staying in the existing non-recursive style.
I tried your proposed patch, moving the close-depth loop before comma emission, but it still misses one case: a balanced tree such as {{A,B},{C,D}}. After formatting B, the next leaf C has the same depth, so a depth-only loop does not close the completed {A,B} subtree before emitting the comma. The formatter needs one bit of sibling state per open parent, not only the current depth.
The regression test now covers:
- the original left-heavy case from this PR,
- a right-heavy tree, and
- the balanced equal-depth case above.
I put the formatter/test change in a fixup! descriptor: fix taproot tree formatting commit so it can be squashed into the existing PR commit later. Separately, I added a small README contributor note documenting that recursive algorithms over user-controlled Miniscript, policy, descriptor, or expression trees should be avoided.
There was a problem hiding this comment.
Don't like this one either, sorry 😬
Can we just do regular binary DFS traversal by converting depth counting into depth and child counting? So when the child count is 2 we know to close with '}' and go back up.
This is working for me with the balanced tree test:
fn fmt_helper<Pk: MiniscriptKey>(
view: &TapTree<Pk>,
f: &mut fmt::Formatter,
mut fmt_ms: impl FnMut(&mut fmt::Formatter, &Miniscript<Pk, Tap>) -> fmt::Result,
) -> fmt::Result {
- let mut last_depth = 0;
+ let mut child_counts = Vec::new();
for item in view.leaves() {
- if last_depth > 0 {
+ if !child_counts.is_empty() {
f.write_str(",")?;
}
- while last_depth < item.depth() {
+ while child_counts.len() < item.depth() as usize {
f.write_str("{")?;
- last_depth += 1;
+ child_counts.push(0);
}
fmt_ms(f, item.miniscript())?;
- while last_depth > item.depth() {
+ if let Some(child_count) = child_counts.last_mut() {
+ *child_count += 1;
+ }
+
+ while child_counts.last() == Some(&2) {
f.write_str("}")?;
- last_depth -= 1;
+ child_counts.pop();
+ if let Some(child_count) = child_counts.last_mut() {
+ *child_count += 1;
+ }
}
}
- while last_depth > 0 {
+ while !child_counts.is_empty() {
f.write_str("}")?;
- last_depth -= 1;
+ child_counts.pop();
}
Ok(())
}There was a problem hiding this comment.
Thanks, yes, this is cleaner!
I switched the formatter to the child-count traversal you suggested. It keeps a stack of open binary TapTree nodes and counts how many children each one has already emitted. When the top count reaches 2, the formatter closes that node with } and propagates one completed child to its parent. This keeps the code iterative while mapping directly to the binary DFS traversal.
I kept two small things on top of the proposal:
- the stack is preallocated with
TAPROOT_CONTROL_MAX_NODE_COUNT, matching the maximum Taproot depth we already enforce; and - the formatter keeps the completeness invariant strict. Instead of closing any leftover stack entries at the end, it returns
fmt::Errorif the stored depth/leaf sequence does not describe a complete tree. For validTapTrees the stack is empty after the last leaf because completed subtrees are closed as soon as their second child is seen.
The seen_leaf flag is part of that invariant check rather than the traversal itself. It distinguishes "we have not formatted anything yet" from "we already completed the root tree". Without it, an invalid internal sequence with two depth-0 leaves would format as two adjacent root leaves; with it, any leaf after a completed root returns fmt::Error. It also keeps an empty TapTree from formatting successfully.
The existing expanded tests still pass, including the balanced {{A,B},{C,D}} case. I put this as another fixup! descriptor: fix taproot tree formatting commit so it can be autosquashed with the earlier fixups before merge.
There was a problem hiding this comment.
Can you add a case to the roundtrip test (or a new test) that shows the "two roots" errors on parsing?
Aren't we just try to track whether there are more leafs after we've formatted all child leafs? We don't want to allow it to continue when child_counts is empty cause then it would start a new tree.
Also, converted to while let which looks nicer.
I suspect I'm talking to an LLM so this is going to be my last review!
fn fmt_helper<Pk: MiniscriptKey>(
view: &TapTree<Pk>,
f: &mut fmt::Formatter,
mut fmt_ms: impl FnMut(&mut fmt::Formatter, &Miniscript<Pk, Tap>) -> fmt::Result,
) -> fmt::Result {
let mut child_counts = Vec::with_capacity(TAPROOT_CONTROL_MAX_NODE_COUNT);
- let mut seen_leaf = false;
+ let mut tree_complete = false;
for item in view.leaves() {
let depth = usize::from(item.depth());
- if child_counts.len() > depth || seen_leaf && child_counts.is_empty() {
+ if child_counts.len() > depth || tree_complete {
return Err(fmt::Error);
}
if !child_counts.is_empty() {
f.write_str(",")?;
}
while child_counts.len() < depth {
f.write_str("{")?;
child_counts.push(0);
}
fmt_ms(f, item.miniscript())?;
- seen_leaf = true;
-
if let Some(child_count) = child_counts.last_mut() {
*child_count += 1;
}
- while child_counts.last() == Some(&2) {
+ while let Some(2) = child_counts.last() {
f.write_str("}")?;
child_counts.pop();
if let Some(child_count) = child_counts.last_mut() {
*child_count += 1;
}
}
+
+ if child_counts.is_empty() {
+ tree_complete = true;
+ }
}
- if seen_leaf && child_counts.is_empty() {
+ if tree_complete {
Ok(())
} else {
Err(fmt::Error)
}
}There was a problem hiding this comment.
Thanks, done.
I applied the suggested changes.
I also added a test that artificially constructs an invalid state with two roots. I do not think it is possible to reach this situation through the public API, so the check and the test are safeguards against an internal bug.
Pushed as another fixup! descriptor: fix taproot tree formatting commit, so it will be autosquashed before merge.
| while last_depth > item.depth() { | ||
| f.write_str("}")?; | ||
| last_depth -= 1; | ||
| fmt_subtree(leaves, depth + 1, f, fmt_ms)?; | ||
| f.write_str(",")?; | ||
| fmt_subtree(leaves, depth + 1, f, fmt_ms)?; | ||
| f.write_str("}") | ||
| } |
There was a problem hiding this comment.
Can you add a case to the roundtrip test (or a new test) that shows the "two roots" errors on parsing?
Aren't we just try to track whether there are more leafs after we've formatted all child leafs? We don't want to allow it to continue when child_counts is empty cause then it would start a new tree.
Also, converted to while let which looks nicer.
I suspect I'm talking to an LLM so this is going to be my last review!
fn fmt_helper<Pk: MiniscriptKey>(
view: &TapTree<Pk>,
f: &mut fmt::Formatter,
mut fmt_ms: impl FnMut(&mut fmt::Formatter, &Miniscript<Pk, Tap>) -> fmt::Result,
) -> fmt::Result {
let mut child_counts = Vec::with_capacity(TAPROOT_CONTROL_MAX_NODE_COUNT);
- let mut seen_leaf = false;
+ let mut tree_complete = false;
for item in view.leaves() {
let depth = usize::from(item.depth());
- if child_counts.len() > depth || seen_leaf && child_counts.is_empty() {
+ if child_counts.len() > depth || tree_complete {
return Err(fmt::Error);
}
if !child_counts.is_empty() {
f.write_str(",")?;
}
while child_counts.len() < depth {
f.write_str("{")?;
child_counts.push(0);
}
fmt_ms(f, item.miniscript())?;
- seen_leaf = true;
-
if let Some(child_count) = child_counts.last_mut() {
*child_count += 1;
}
- while child_counts.last() == Some(&2) {
+ while let Some(2) = child_counts.last() {
f.write_str("}")?;
child_counts.pop();
if let Some(child_count) = child_counts.last_mut() {
*child_count += 1;
}
}
+
+ if child_counts.is_empty() {
+ tree_complete = true;
+ }
}
- if seen_leaf && child_counts.is_empty() {
+ if tree_complete {
Ok(())
} else {
Err(fmt::Error)
}
}|
Please squash all the Do not return The bug is real. I can confirm that changing the existing fuzztest like diff --git a/fuzz/fuzz_targets/roundtrip_descriptor.rs b/fuzz/fuzz_targets/roundtrip_descriptor.rs
index de9232a62..2412844b1 100644
--- a/fuzz/fuzz_targets/roundtrip_descriptor.rs
+++ b/fuzz/fuzz_targets/roundtrip_descriptor.rs
@@ -29,7 +29,7 @@ mod tests {
#[test]
fn duplicate_crash() {
- let v = hex::decode_to_vec("abcd").unwrap();
- super::do_test(&v);
+// let v = hex::decode_to_vec("abcd").unwrap();
+ super::do_test(&b"tr(acc0,{{multi_a(3,acc10,acc11,acc12),and_v(v:multi_a(2,acc10,acc11,acc12),after(10))},and_v(v:multi_a(1,acc10,acc11,ac12),after(100))})"[..]);
}
}Results in the The bug does not exhibit on 12.x. It does exhibit on 13.x, so we will need to backport a fix, whatever that is. It also exhibits on my private "overhaul everything to eliminate recursion" branch, so I don't think any changes here will conflict with my work. We should convert the fuzztests to cargo-fuzz and add a daily cronjob to run it, like we've done on rust-bitcoin. |
|
@starius are you writing everything yourself (code and text you post) or is this LLM output? |
|
It's very clearly LLM output without disclosure. We don't have a CONTRIBUTING.md or AGENTS.md in this repo with the rust-bitcoin LLM policy. It hasn't been a problem here until the last few weeks. Maybe we've gotta copy the policy over here too. |
|
And a force push without a post defending the slop. This stinks. |
Done. The branch now has the TapTree formatting fix as a single commit, plus the separate README note about avoiding recursive algorithms.
Fixed. The formatter now validates the private flat TapTree representation with
Added that input as a regression test in
I also bisected the regression. The first bad commit is: 552e844 |
This is a mix. LLM found the bug accidentally and came up with the original fix itself when working on another project. I decided to polish it and to send it here. I thought it is normal.
A squash was requested. How should I had done this without a force push? |
|
|
||
| const MALFORMED_TAPTREE: &str = "MALFORMED_TAPTREE"; | ||
|
|
||
| fn is_complete_taptree<Pk: MiniscriptKey>(view: &TapTree<Pk>) -> bool { |
There was a problem hiding this comment.
What? Why? It's the same as fmt_helper... I gave you the diff on how to fix fmt_helper and the LLM added this?
I don't care about LLM usage as a means to an end, but when you aren't going to review the code yourself and verify the final solution it's just wasting everyone's time.
There was a problem hiding this comment.
Unfortunately the original code you were patching was wrong, since it returned Errors which is forbidden, as noted by Andrew Poelstra after you posted your proposal.
Since we can't return errors, we have to write "MALFORMED_TAPTREE". But if we start printing the tree and discover the error in the middle or in the end, we would print something like {pk(A),MALFORMED_TAPTREE. We can't unprint what we already printed. That is why we need to check upfront if tree is valid. The easiest way to do it is to introduce is_complete_taptree. Unfortunately it duplicates some code with the main formatting loop. We could make a generic waler and call it, but I think it would complicate the code too much. What do you think about this?
The idea to validate the state before printing and to do it in a separate function came from LLM, but it seems correct to me and I considered other alternatives before publishing it.
There was a problem hiding this comment.
Just write to a temporary String in fmt_helper and then f.write either the temp String or "MALFORMED_TAPTREE" based on if tree_complete
There was a problem hiding this comment.
Thanks for the suggestion! Applied.
I also had to modify the signature of fmt_ms, since now it works with String, not with a formatter.
I pushed a fixup commit.
| Library code should avoid recursive algorithms over user-controlled | ||
| Miniscript, policy, descriptor, or expression trees; prefer the explicit tree | ||
| iterators or an explicit stack. | ||
|
|
There was a problem hiding this comment.
Remove since this was the LLM thinking it's being helpful and just sticking it at the end of a readme
There was a problem hiding this comment.
I removed the docs commit. I asked LLM to add it so future contributors do not even try to use recursion. Anyway, it is out of scope of this PR.
just sticking it at the end of a readme
It was sitting in the Contributing section, not in the end or README. IMHO, the right place for such an advice.
| if tree_complete { | ||
| f.write_str(&output) | ||
| } else { | ||
| // The no-script-path case stores no tree (`None`), not an empty `TapTree`. |
There was a problem hiding this comment.
We also fail if the tree is empty. The idea of the comment was to explain why an empty tree is also an invalid state.
I'm going to update the comment to this:
// Malformed private representation, including an empty tree. The
// no-script-path case stores no tree (None), not an empty TapTree.What do you think? Should we keep it or drop it?
There was a problem hiding this comment.
I think we could remove this comment. At best, we could add an explanation about what happens when the tree is malformed or empty in the doc comment of this function.
|
Interesting observation that we are storing @trevarj if you are tired of playing LLM-by-proxy here feel free to walk away. You can let me know by Signal or IRC (or here, of course) and I'll close the PR. But we've gotten some good bug reports out of this even though the algorithmic stuff seems like it's moving slowly (at best). |
I think, we can't construct an empty TapTree via public API.
I think that now the PR is in good shape and fixes a real bug. I'm sorry that I over-relied on LLM initially - I didn't realize that this is inappropriate in this repo. I promise to mark vibe-code clearly and process the feedback more thoughtfully. |
It looks like you can if you construct a
Can you please squash all the fixup commits? It is easier for us to review force-pushes than to keep track of fixup commits.
No worries. We should really have a CONTRIBUTING.md here. |
|
I'm skeptical that we should have this I think it's possible to construct a broken tree using the Then we can assume that our trees are correct everywhere else. |
Commit 552e844 ("tr: flatten TapTree type") changed TapTree from a recursive enum into a depth-preorder leaf list. The formatter only tracked depth changes between adjacent leaves, which loses binary subtree boundaries for some unbalanced trees and can render descriptors that do not parse back. Format the flat representation with an explicit child-count stack so completed binary subtrees are closed at the right time. Add descriptor and fuzz-target regression tests for the broken roundtrip.
|
I removed the validation, squashed all the commits together and rebased. I also added tests for a single-item tree (this one to cover a path in the code) and a test checking the debug output (it is separate from Display and look different, so worth covering). Please take another look. |
trevarj
left a comment
There was a problem hiding this comment.
These changes actually look good to me now, but I can re-review if we decide to do Andrew's suggestion:
We should eliminate that path by having finalize return an Option.
Or if a new issue is opened up that resolves that.
ACK ab10df1
Summary
This PR fixes
TapTreedisplay/debug formatting for unbalanced Taproot trees. SinceTapTreestores leaves as depth-first(depth, leaf)pairs, the formatter must reconstruct binary subtree boundaries from depths. The old formatter only tracked depth changes between adjacent leaves, which could put three leaves in one brace group and produce display output that failed to parse back.The fix formats the flat representation iteratively with an explicit child-count stack, closing each binary subtree once both children have been emitted.
There is no API change.
I also bisected the regression. The first bad commit is 552e844.
Example
This descriptor used to format with the wrong brace structure and then fail to parse back:
After this PR, display preserves the original binary
{{left,right},right}TapTree shape.Tests
The PR adds descriptor-side display roundtrip coverage for left-heavy, right-heavy, and balanced TapTree shapes.
It also adds Andrew's reported descriptor as a regression test in
fuzz/fuzz_targets/roundtrip_descriptor.rsunder#[cfg(test)], socargo test --manifest-path fuzz/Cargo.tomlexercises it without needing the fuzzer.I also added tests for a single-item tree and a test checking the debug output.
🤖 Generated with Codex, cross-validated with Claude