Tree-sitter 1.0 Checklist #930
Comments
For anyone who is interested, please let me know if I've left important things off of this list ☝️ . |
Reads like tag queries are not going to be a 1.0 feature? |
An alternative to removing the generated files would be to let a CI bot push them automatically on master. Users can then create mergeable PRs without needing to change any generated files. In this repo https://github.com/neovim/nvim-lspconfig/blob/master/.github/workflows/docgen.yml users commit changes to a configuration and a bot updates the documentation after each push on master. |
I think #516 should also be addressed, even if the function is marked experimental? At least document the behavior. |
I would suggest to reduce implicitness:
|
Also I saw that |
Note that the version numbers in those file names aren’t the same as the 1.0 semver release that @maxbrunsfeld is proposing. If there are any backwards incompatible changes as part of putting together this release, we’d bump the SOVERSION to 1; if not, we’d keep it at 0. More details can be found here. |
@razzeee Tag queries are already done, but you're right that we still need to document them. I envision those mostly being documented in a GitHub-specific context, since there isn't much generally-useful functionality specific to Tags; it's mostly just a convention for tree queries that GitHub is using for code navigation. All of the broadly-useful stuff has been generalized into the query system. I added that to the TODOs around documentation though.
Nevermind, this got fixed. |
So you don't think tags make sense for others? I hoped that it would help move the queries towards the parser, so that multiple projects could consume and improve them.
Understandable. Do I need to be worried about the incremental parsing bit? We moved our parser to use this on a regular basis now and it seemed good, after figuring out why it always got stuck... |
Nice stretch goals would be: |
This is awesome. Currently for Emacs, I have a custom package that compiles the grammar binaries for the 3 major platforms, and distributes them through GitHub Releases, in a single bundle. Having a standard tool for individual language packages to do this on their own would be great. Will the official language repositories start distributing these binaries through GitHub Releases as well? I think some GitHub actions on top of these subcommands would be very helpful for that. |
@ubolonton I might not take on the automation of compilation and storage of binary files (except for wasm) right now. I was mostly planning to use GH releases to store tarballs of generated files like |
Never mind, I see now that this only applies to tree-sitters in this org. |
@WhyNotHugo Yes, to confirm, the plan is not to mandate any particular hosting platform. Those commands will be able to produce the generated artifacts without uploading them as a GitHub release. |
while I agree, feel it's disappointing that it needed that to happen. as there have been other grammars suffering from it. still, thank you ❤️ |
It would be awesome to automate the release process for all official tree-sitter tools, especially tree-sitter-cli, all official bindings (Wasm, Rust, Node.js, Python, Haskell, Ruby), and the Playground with its parsers that live separately, and to keep them all in sync with the core tree-sitter library releases. This would help reduce misunderstandings and situations where something works in one place but not in another. |
I am not sure if this is actually possible - it would be also awesome if generated parser/runtime never segfaults. Showing errors, warnings, exiting - yes, but never segfaulting. |
Obviously the library should never segfault. AFAIK, that's already the case. I think you're referencing tree-sitter/tree-sitter-c#64, which I can't reproduce after stripping out third-party libraries. If anyone is seeing Tree-sitter cause a segfault, and you can reproduce the problem, please report it. |
Add generated bindings for the Zig programming language. It's a successor to the C language. It provides a lot of safety features, like Rust, and perhaps more because of its runtime checks. |
tree-sitter should provide a means to replace its memory allocation functions at runtime. This would allow us to link to tree-sitter as a library instead of embedding it. |
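A minimal sketch of what runtime-replaceable allocation could look like in a C library, to make the request above concrete. Every name here (`demo_set_allocator`, `demo_alloc`, `demo_release`, `counting_malloc`, `counting_free`, `demo_live_blocks`) is hypothetical and chosen for illustration; this is not Tree-sitter's API, only one common way to structure such a hook.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is NOT Tree-sitter's actual API. */

/* Current allocation hooks; they default to the C standard library. */
static void *(*demo_malloc_hook)(size_t) = malloc;
static void (*demo_free_hook)(void *) = free;

/* Replace the hooks at runtime. Passing NULL restores the default. */
void demo_set_allocator(void *(*new_malloc)(size_t),
                        void (*new_free)(void *)) {
  demo_malloc_hook = new_malloc ? new_malloc : malloc;
  demo_free_hook = new_free ? new_free : free;
}

/* Library-internal code allocates only through the hooks. */
void *demo_alloc(size_t size) { return demo_malloc_hook(size); }
void demo_release(void *ptr) { demo_free_hook(ptr); }

/* Example host allocator: wraps malloc/free and counts live blocks. */
static size_t live_blocks = 0;

void *counting_malloc(size_t size) {
  void *ptr = malloc(size);
  if (ptr != NULL) live_blocks++;
  return ptr;
}

void counting_free(void *ptr) {
  if (ptr != NULL) live_blocks--;
  free(ptr);
}

size_t demo_live_blocks(void) { return live_blocks; }
```

A host application would call `demo_set_allocator(counting_malloc, counting_free)` once at startup; from then on, everything the library allocates through `demo_alloc` goes through the host's functions, which is what makes linking against a shared build practical when the host needs its own allocator.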
+1 for better error messages. |
Native Library, WASM parsers: I would love to use wasm in other runtimes. Currently I am only able to use wasm in JS, but I want to use it in wasmer, and I don't want to use the C version because the same parser is run in different runtimes. |
For wasm target, how about Typescript typing is really useful when working with VSCode LSP(Language Server Protocol) |
Suggestion: ESM format In the interest of an evergreen format for 1.0 I’d like to recommend ESM over CJS (e.g. basically just changing Happy to help with this if this is a desirable change! But just a suggestion I’ll leave to the author/maintainers to decide 🙂 |
Yeah, I've been thinking about this too @drwpow. I added this to the list, as well as an item about reducing our coupling to npm in general.
@lambdadog I started work on this issue in #1864. I ended up going with a solution that's specifically tied to wasmtime for now. |
@maxbrunsfeld I worked around the issue of having to check-in build files by running |
maybe this issue Standardized node name need to be released |
The ABI break seemed to be unintentional, but adding a subslot will be useful in the future as a break with version 1.0 of tree-sitter looks to be planned. Ref: tree-sitter/tree-sitter#930 (comment) Bug: https://bugs.gentoo.org/930039 Signed-off-by: Matthew Smith <matthew@gentoo.org>
In the not-too-distant future, I'd like to bump Tree-sitter's version to 1.0, indicating a greater degree of stability and completeness. After that I'd like to regenerate all of the parsers in the tree-sitter github org, and bump them to 1.0 as well. Before doing this, there are several important problems with the framework that I think should be fixed.
Tasks
Unicode character properties - Support ECMAScript unicode property escapes in regexes.
Partial Precedence Orderings - The integer precedence system makes some grammars shockingly difficult to maintain.
  - Update tree-sitter-javascript and tree-sitter-typescript to use this more flexible precedence scheme. Right now, the integer precedence system is making it very difficult to continue development of tree-sitter-typescript in particular, because of the mix of different conflicts between types and expressions.
Grammars with many fields, aliases - By historical accident, generated parsers use too small an integer type (uint8_t) for storing nodes' field and alias information. Parsers with large numbers of fields can cause integer overflows (Tree-sitter generates invalid code for grammars with large numbers of fields and/or aliases #511).
  - Store production_id as a uint16_t (Clean up parse table representation, use 16 bits for production_id #943).
Fix issues with the get_column external scanner API (Fix the behavior of Lexer.get_column #978).
CLI Ergonomics
  - In the parse command, auto-detect UTF-16 files and decode them accordingly. This will help Windows users who currently trip over the suggested echo command in the docs. (feat: add encoding flag and automatically check if a file might be utf16 #2368)
  - Add a GRAMMAR_PATH setting where the CLI will search for grammar modules, instead of relying on node_modules and npm.
Mergeable Git Repos - Make it easier to collaborate on grammars by removing generated files from version control.
  - Add pack and publish subcommands to the Tree-sitter CLI, for uploading tarballs and compiled .wasm files to the GitHub releases API. (Store generated files as GH release artifacts instead of checking them into git repositories #730 (comment))
Documentation
  - The expression/identifier syntax.
  - The tree-sitter-highlight rust crate (just using tree queries directly).
  - The tags.scm queries used for code navigation on GitHub. (Document queries/tags.scm #660)
Stretch Goals
I'm recording these here even though they are a bit less urgent.
Incremental Parsing Perf - Enhance the external scanner API to allow for looser state comparisons, avoiding the catastrophic node-reuse failures seen in the HTML parser (Incremental parsing is ineffective when a new tag is opened tree-sitter-html#23).
  - [...] scanner.c to decide whether to link against a _compare function).
  - Update tree-sitter-html to use this API, improving its incremental performance.
Native Library, WASM parsers - Add a compile-time option to link the C library against a standard WASM engine (V8, wasmtime, or wasmer). When this feature is enabled, allow the native library to load WASM parsers, marshaling the parse table into native memory, and using WASM execution only for the lexing phase. This will make it more useful to distribute parsers as pre-compiled .wasm files, instead of as C code. The performance cost should be small, because all of the expensive parsing operations will still be native. (Add optional WASM feature to the native library, allowing it to run wasm-compiled parsers via wasmtime #1864)