[WIP] Token-based outer attributes handling by Aaron1011 · Pull Request #76130 · rust-lang/rust

Aaron1011 · 2020-08-30T22:04:52Z

Makes progress towards #43081

We now capture tokens for attributes, and remove them from the tokenstream when applying #[cfg] attributes (in addition to modifying the AST). As a result, derive-proc-macros now receive the exact input (instead of the pretty-printed/retokenized input), even when fields/variants get #[cfg]-stripped.
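As a minimal illustration of the user-visible situation described above (not code from this PR), the sketch below derives `Debug` on a struct with a cfg-stripped field. The struct name and fields are hypothetical, and `#[cfg(any())]` is used as an always-false predicate standing in for `#[cfg(FALSE)]`. The derive only ever sees the field that survives stripping.

```rust
// Hypothetical example type; `Point` and its fields are not from the PR.
#[derive(Debug)]
struct Point {
    x: i32,
    // `any()` with no arguments is always false, so this field is removed
    // by cfg-stripping before the derive macro is invoked.
    #[cfg(any())]
    y: i32,
}

fn main() {
    // Only `x` exists after cfg-stripping, so the literal has a single field.
    let p = Point { x: 1 };
    println!("{:?}", p);
}
```

What the PR changes is how the input handed to the derive is produced (captured tokens rather than pretty-printed and retokenized text), not the stripping behaviour shown here.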

Several changes are made in order to accomplish this:

New structs PreexpTokenStream and PreexpTokenTree are added. These are identical to TokenStream and TokenTree, with the exception of an OuterAttributes variant in PreexpTokenTree. This is used to represent captured attributes, allowing the target tokens to be removed by #[cfg]-stripping (and when invoking attribute macros).
Tokens are now attached to Attribute. This allows us to remove prepend_attributes, which required pretty-printing/retokenizing attributes in certain cases.
collect_tokens was rewritten. The implementation is now much simpler: instead of keeping track of nested TokenTree::Delimited at various depths, we collect all tokens into a flat buffer (e.g. we push TokenKind::OpenDelim and TokenKind::CloseDelim). After capturing, we losslessly re-package these tokens back into the normal TokenTree::Delimited structure.
We now store a PreexpTokenStream on AST structs instead of a plain TokenStream. This contains both the attributes and the target itself.
parse_outer_attributes now passes the parsed attributes to a closure, instead of simply returning them. The closure is responsible for parsing the attribute target, and allows us to automatically construct the proper PreexpTokenStream.
HasAttrs now has a visit_tokens method. This is used during #[cfg]-stripping to allow us to remove attribute targets from the PreexpTokenStream.
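The flat-buffer idea in the collect_tokens item can be sketched roughly as follows. This is a simplified model under stated assumptions, not the rustc implementation: `Delim`, `FlatToken`, `Tree`, and `repackage` are hypothetical names, and tokens are reduced to strings. It shows how a flat sequence with explicit open/close markers can be rebuilt into a nested structure with a stack.

```rust
// Hypothetical, simplified stand-ins for rustc's token types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Delim {
    Paren,
    Brace,
}

// Flat form: delimiters are ordinary entries in the buffer.
#[derive(Debug, Clone, PartialEq)]
enum FlatToken {
    Open(Delim),
    Close(Delim),
    Leaf(String),
}

// Nested form: delimiters own their contents.
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Leaf(String),
    Delimited(Delim, Vec<Tree>),
}

/// Rebuilds the nested structure from a flat buffer, using a stack of
/// partially built groups. The bottom frame is the top-level stream.
fn repackage(flat: &[FlatToken]) -> Result<Vec<Tree>, String> {
    let mut stack: Vec<(Option<Delim>, Vec<Tree>)> = vec![(None, Vec::new())];
    for tok in flat {
        match tok {
            FlatToken::Open(d) => stack.push((Some(*d), Vec::new())),
            FlatToken::Close(d) => {
                if stack.len() == 1 {
                    return Err(format!("unmatched closing delimiter {:?}", d));
                }
                let (open, children) = stack.pop().expect("length checked above");
                if open != Some(*d) {
                    return Err(format!("mismatched delimiters: {:?} closed by {:?}", open, d));
                }
                let parent = stack.last_mut().expect("top-level frame is never popped here");
                parent.1.push(Tree::Delimited(*d, children));
            }
            FlatToken::Leaf(text) => {
                let current = stack.last_mut().expect("top-level frame is never popped here");
                current.1.push(Tree::Leaf(text.clone()));
            }
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter at end of buffer".to_string());
    }
    Ok(stack.pop().expect("top-level frame").1)
}

fn main() {
    // Flat form of: S { x ( ) }
    let flat = vec![
        FlatToken::Leaf("S".to_string()),
        FlatToken::Open(Delim::Brace),
        FlatToken::Leaf("x".to_string()),
        FlatToken::Open(Delim::Paren),
        FlatToken::Close(Delim::Paren),
        FlatToken::Close(Delim::Brace),
    ];
    println!("{:?}", repackage(&flat));
}
```

The appeal of this shape is that capturing only needs to remember a start and end position in one buffer; nesting is reconstructed once, after the fact.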

This PR is quite large (though ~1000 lines are tests). I opened it mainly to show how all of the individual pieces fit together, since some of them (e.g. the parse_outer_attributes change) don't make sense if we're not going to land this PR. However, many pieces are mergeable on their own - I plan to split them into their own PRs.

TODO:

Any errors for malformed #[cfg]/#[cfg_attr] are duplicated. This is because we run the same code for both the AST-based attributes and the token-based attributes. We could possibly skip validating the token-based attributes, but I was worried about accidentally skipping gating if the pretty-print/reparse check ever fails. These are deduplicated by the diagnostic infrastructure outside of compiletest, but it would be nice to fix this.
Inner attributes are partially handled - we store them in the PreexpTokenStream alongside outer attributes, which allows us to handle #![cfg(FALSE)] and remove its parent. This should be good enough to handle all stable uses of inner attributes that are observable by proc-macros (currently, just cfg-stripping before #[derive] macros are invoked). Custom inner attributes are not handled.
Parser::parse_stmt_without_recovery does not eat a trailing semicolon, which means we will not capture it in the tokenstream. I need to either refactor statement parsing to eat the semicolon inside parse_outer_attributes_with_tokens, or manually adjust the captured tokens.
I currently only check for the presence/absence of attributes when determining whether to run token-collection logic. If this causes performance problems, we could look for specific attributes (cfg, cfg_attr, derive, or any non-builtin attribute). There are many scenarios where we can skip token collection for some/all AST structs (no derive on an item, no cfg inside an item, no custom attributes on an item, etc.).
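The narrower check floated in the last TODO item might look something like the sketch below. It is only an illustration: `needs_token_collection` and the `BUILTIN_NO_TOKENS` list are hypothetical, attributes are reduced to their names, and the real set of built-in attributes in rustc is much larger.

```rust
/// Hypothetical, deliberately incomplete list of built-in attributes that
/// would not require the target's tokens to be captured.
const BUILTIN_NO_TOKENS: &[&str] = &["inline", "doc", "allow", "warn", "deny", "must_use", "repr"];

/// Returns true if any attribute could observe or modify the target's tokens:
/// `cfg`, `cfg_attr`, `derive`, or anything that is not a known built-in.
fn needs_token_collection(attr_names: &[&str]) -> bool {
    attr_names.iter().any(|name| {
        matches!(*name, "cfg" | "cfg_attr" | "derive") || !BUILTIN_NO_TOKENS.contains(name)
    })
}

fn main() {
    println!("{}", needs_token_collection(&["inline", "doc"]));
    println!("{}", needs_token_collection(&["doc", "derive"]));
    println!("{}", needs_token_collection(&["my_attr_macro"]));
}
```

The trade-off is the usual one: a cheaper check skips more work, but any attribute wrongly classified as harmless would silently lose its captured tokens.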

rust-highfive · 2020-08-30T22:04:56Z

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

Aaron1011 · 2020-08-30T22:05:04Z

r? @petrochenkov

Aaron1011 · 2020-09-04T18:39:20Z

@bors try @rust-timer queue

rust-timer · 2020-09-04T18:39:22Z

Awaiting bors try build completion

bors · 2020-09-04T18:39:39Z

⌛ Trying commit 083792bcf67d1e50afc9ad50500ab011d34b0bee with merge 30ddab741f8c02392c4383445436138f5a6ec53d...

bors · 2020-09-04T19:25:26Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: 30ddab741f8c02392c4383445436138f5a6ec53d (30ddab741f8c02392c4383445436138f5a6ec53d)

rust-timer · 2020-09-04T19:25:28Z

Queued 30ddab741f8c02392c4383445436138f5a6ec53d with parent 80cacd7, future comparison URL.

rust-timer · 2020-09-04T21:22:24Z

Finished benchmarking try commit (30ddab741f8c02392c4383445436138f5a6ec53d): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

Aaron1011 · 2020-09-04T23:03:35Z

@bors try @rust-timer queue

rust-timer · 2020-09-04T23:03:37Z

Awaiting bors try build completion

bors · 2020-09-04T23:03:49Z

⌛ Trying commit 23fd9fbd9d07096ca254592013a42b4cae12ba29 with merge b3535fba76a3fc7202284b7699afff8f8831e461...

bors · 2020-09-04T23:46:27Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: b3535fba76a3fc7202284b7699afff8f8831e461 (b3535fba76a3fc7202284b7699afff8f8831e461)

rust-timer · 2020-09-04T23:46:30Z

Queued b3535fba76a3fc7202284b7699afff8f8831e461 with parent 42d896a, future comparison URL.

rust-timer · 2020-09-05T01:31:58Z

Finished benchmarking try commit (b3535fba76a3fc7202284b7699afff8f8831e461): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

petrochenkov · 2020-09-06T16:57:58Z

@Aaron1011
Apologies, I spent the whole weekend testing LLD, so I didn't have a chance to look at this.
I'll try to do it during the week.

petrochenkov · 2020-09-26T20:46:06Z

We need to trim this PR a bit first.
Right now I don't really understand what happens here because it's too large.

The following changes are improvements on their own and can be landed in separate PRs:

Newly added tests and test cases.
Adding tokens to statements.
Adding tokens to ast::Attributes and eliminating prepend_attrs (at least partially).
Token collection refactoring.

Additionally:

Field / variant / etc span changes are responsible for the majority of the diff in test outputs. Can we keep the old spans in AST and not extend them to comma separators?
parse_outer_attributes and similar changes involving self -> this dominate the parser diff. Can they be factored out into a separate commit (the first one)? The other commits contain some significant back and forth, so I ended up reading the diff for the whole PR instead of the separate commits; it would be better to squash them.

petrochenkov · 2020-09-26T17:40:58Z

compiler/rustc_ast/src/tokenstream.rs

+pub enum PreexpTokenTree {
+    Token(Token),
+    Delimited(DelimSpan, DelimToken, PreexpTokenStream),
+    OuterAttributes(AttributesData),


Inlining AttributesData will improve readability here, IMO.
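For readers unfamiliar with the design, here is a rough sketch of how an OuterAttributes variant lets #[cfg]-stripping drop an attribute target directly from the token structure. It is a toy model, not the PR's code: `PreexpTree`, `Attr`, and `strip_cfg` are hypothetical names, the attribute data is written inline as struct fields (the fields are assumptions), and cfg predicates are assumed to be already evaluated to a bool.

```rust
// Hypothetical, simplified attribute: a cfg with a pre-evaluated predicate,
// or any other attribute identified by name.
#[derive(Debug, Clone, PartialEq)]
enum Attr {
    Cfg(bool),
    Other(String),
}

// Toy counterpart of PreexpTokenTree with the attribute data inlined.
#[derive(Debug, Clone, PartialEq)]
enum PreexpTree {
    Token(String),
    Delimited(Vec<PreexpTree>),
    OuterAttributes { attrs: Vec<Attr>, target: Vec<PreexpTree> },
}

/// Removes every attribute target guarded by a false cfg, recursing into
/// delimited groups and surviving targets. Evaluated cfg attributes are
/// dropped from the targets that remain.
fn strip_cfg(stream: Vec<PreexpTree>) -> Vec<PreexpTree> {
    let mut out = Vec::new();
    for tree in stream {
        match tree {
            PreexpTree::Token(text) => out.push(PreexpTree::Token(text)),
            PreexpTree::Delimited(inner) => out.push(PreexpTree::Delimited(strip_cfg(inner))),
            PreexpTree::OuterAttributes { attrs, target } => {
                if attrs.iter().any(|a| matches!(a, Attr::Cfg(false))) {
                    // The attributes and their target disappear together.
                    continue;
                }
                let attrs: Vec<Attr> = attrs
                    .into_iter()
                    .filter(|a| !matches!(a, Attr::Cfg(_)))
                    .collect();
                out.push(PreexpTree::OuterAttributes { attrs, target: strip_cfg(target) });
            }
        }
    }
    out
}

fn main() {
    // Models: { #[cfg(false)] a  #[cfg(true)] #[doc] b }
    let stream = vec![PreexpTree::Delimited(vec![
        PreexpTree::OuterAttributes {
            attrs: vec![Attr::Cfg(false)],
            target: vec![PreexpTree::Token("a".to_string())],
        },
        PreexpTree::OuterAttributes {
            attrs: vec![Attr::Cfg(true), Attr::Other("doc".to_string())],
            target: vec![PreexpTree::Token("b".to_string())],
        },
    ])];
    println!("{:?}", strip_cfg(stream));
}
```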

Aaron1011 · 2020-09-26T20:49:46Z

> Field / variant / etc span changes are responsible for the majority of the diff in test outputs. Can we keep the old spans in AST and not extend them to comma separators?

We could, but that seems weirdly inconsistent - we would then have a span that doesn't include some of the tokens we capture. However, I could split this out into a separate PR to minimize the diff.

Aaron1011 · 2020-09-26T20:58:39Z

I'll work on splitting things up. I opened this PR mainly to do a Crater run to check that everything fit together.

Split out from rust-lang#76130. This tests our handling of combining derives, derive helper attributes, attribute macros, and `cfg`/`cfg_attr`.

petrochenkov · 2020-09-29T17:05:52Z

With this PR we perform transformations like cfg-expansion on both tokens and AST simultaneously, and then make sure that they are still in sync with the pretty-print-reparse hack.

In the end state, I think, we should treat the AST part as a cache (used for performance only) and always reparse it from tokens after any transformations (which will only be performed on tokens). The reparse hack will then be eliminated.

This is kinda opposite to what we do now by treating the tokens as a "cache" and regenerating them from AST when necessary (via pretty-printing), but unlike the "AST -> tokens", the "tokens -> AST" conversion is always lossless.

Having only tokens without any AST cache should be functionally correct, but it's pretty reasonable to predict that it will be a performance regression because we'll have to parse the same tokens at least twice and maybe more.

Dylan-DPC-zz · 2020-10-16T15:47:39Z

Blocked on #77250

crlf0710 · 2020-11-13T12:55:53Z

Triage: A reminder that #77250 has been merged, so it seems maybe this can continue...

Aaron1011 · 2020-11-13T13:04:06Z

This is currently blocked on further work on statement attributes

Aaron1011 · 2021-01-04T15:30:28Z

Closing in favor of #80689

rust-highfive assigned nikomatsakis Aug 30, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 30, 2020

rust-highfive assigned petrochenkov and unassigned nikomatsakis Aug 30, 2020

Aaron1011 mentioned this pull request Aug 30, 2020

Don't use zip to compare iterators during pretty-print hack #76131

Merged

Aaron1011 added a commit to Aaron1011/rust that referenced this pull request Aug 30, 2020

Factor out StmtKind::MacCall fields into MacCallStmt struct

090b167

In PR rust-lang#76130, I add a fourth field, which makes using a tuple variant somewhat unwieldy.

Aaron1011 mentioned this pull request Aug 30, 2020

Factor out StmtKind::MacCall fields into MacCallStmt struct #76132

Merged

Aaron1011 force-pushed the feature/new-preexp-cfg-tmp branch from 602b066 to 083792b Compare September 3, 2020 17:34

Aaron1011 force-pushed the feature/new-preexp-cfg-tmp branch from 083792b to 23fd9fb Compare September 4, 2020 23:01

petrochenkov reviewed Sep 26, 2020

petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 26, 2020

Aaron1011 mentioned this pull request Sep 26, 2020

Test more attributes in test issue-75930-derive-cfg.rs #77243

Merged

Aaron1011 mentioned this pull request Sep 27, 2020

Rewrite collect_tokens implementations to use a flattened buffer #77250

Merged

gui1117 mentioned this pull request Nov 12, 2020

Add pallet attribute macro to declare pallets paritytech/substrate#6877

Merged

gui1117 mentioned this pull request Nov 13, 2020

Compiler loses location information before calling macros (sometimes) #43081

Closed
