Skip to content

Accept only a single file as grammar#6

Open
kitterion wants to merge 1 commit into
elliottt:mainfrom
kitterion:make-grammar-a-single-label
Open

Accept only a single file as grammar#6
kitterion wants to merge 1 commit into
elliottt:mainfrom
kitterion:make-grammar-a-single-label

Conversation

@kitterion

@kitterion kitterion commented Oct 22, 2023

Copy link
Copy Markdown
Contributor

This might have been the intention considering that only the first element of the label_list is used and the README doesn't have the []:

tree_sitter_cc_library(
name = "hello_parser",
grammar = "grammar.js",
)

The alternative is that the README is wrong and needs to say grammar = ["grammar.js"].

@elliottt elliottt left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me, thanks!

@elliottt

Copy link
Copy Markdown
Owner

I'll have a look at the doctest failures today and ping you when I have a solution.

@elliottt

Copy link
Copy Markdown
Owner

Tests should pass if you rebase on main.

@kitterion kitterion force-pushed the make-grammar-a-single-label branch from 1a2f91c to 9b22187 Compare October 23, 2023 14:38
@kitterion

Copy link
Copy Markdown
Contributor Author

The more I look at this, the less I'm certain that this is a good idea :)
Technically, since grammar.js is just javascript executed by nodejs, it can load just about any file.
But I haven't seen people abusing this wonderful feature, so this is most likely fine. (Until somebody complains 😅)

@elliottt

Copy link
Copy Markdown
Owner

Now that I'm looking at my own use of this again, I do make use of the multiple files per grammar. Specifically, I make use of the require function as the first argument to grammar to import and extend another grammar. Is there still a way to include the dependency on another js source with this change?

@elliottt

Copy link
Copy Markdown
Owner

What do you think about making the second change you suggested in the description, and updating the example to include the singleton list for the grammar file instead?

@kitterion

Copy link
Copy Markdown
Contributor Author

I think in the end the grammar attribute should stay a single file but there should be other ways to specify extra files. This is going to be a bit long but let me explain.

I feel like the current logic is too subtle and needs changing: selecting the first file from the list as the main file is virtually never done in bazel rules. Looking at some prior art:

  • py_binary has srcs which is an arbitrary label_list. It has some heuristics to select the main .py file, but it also has main single file attribute which is "The name of the source file that is the main entry point of the application. This file must also be listed in srcs".
  • sh_binary has srcs which is a label_list, however it must contain only a single file. The rest of the files are specified either via data or via deps (intendended to be used with sh_library).
  • js_binary similarly has data and deps (intendended to be used with js_library) to include extra .js files. The main file is selected via the entry_point attribute.

There needs to be a way to specify extra files, in particular there are extra .js file to extend another grammar, and also there is an extra scanner.c (or scanner.cc) that tree-sitter supports https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners.

Extending another grammar is essentially depending on another grammar so it should probably use a deps attribute. Something like this:

# In one folder
tree_sitter_cc_library(
    name = "c",
    grammar = "grammar.js",
)

# In another folder
tree_sitter_cc_library(
    name = "cpp",
    grammar = "grammar.js",
    deps = ["//c"],
)

Unfortunately, at this point I don't know if this suggestion is viable at all because I don't know enough about node and rules_js. This might require a proper integration with rules_js and a different user api.

In the meantime, while we figure out the best way to depend on other grammars, we can

  • make grammar a single file but add a data attribute as a stopgap alternative to deps. data is a bit ambiguous on tree_sitter_cc_library because it's unclear if it applies to js or cc files. It could conceivably apply to both. This is not a huge issue, documentation probably really helps with this one.
  • do nothing until we figure out a proper way of doing this. This is not ideal, somebody like me will just use a patch, I'm already using various patches, another one is not going to make a difference.
  • something else :)

@elliottt

elliottt commented Nov 1, 2023

Copy link
Copy Markdown
Owner

I like your first suggestion, thanks for the write-up! For the future where we allow depending on other tree_sitter_cc_library rule results, what do you think about introducing a provider that captures the input grammar sources? That way we could require that the arguments to deps all have that provider, and pull the sources in that way?

@kitterion

Copy link
Copy Markdown
Contributor Author

I think that's a great idea. I would wait and see, however, how the deps attribute shapes out. Maybe the best way is to use the JsInfo provider from rules_js to share js bits.
Speaking of a provider, I feel like there's already a need for some sort of provider that exposes the inner bits like the generated c sources or the shared library separately from the static library. As an example, I have a patch that exposes the shared library that I use for my neovim plugin. I didn't get around to upstreaming that yet.

@elliottt

elliottt commented Nov 1, 2023

Copy link
Copy Markdown
Owner

I'm all for adding a provider even if it's not for supporting this specific use-case. Do you already have an implementation on your fork that you'd be interested in upstreaming?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants