Unexpectedly Low Syntax Match Score in Repo-Level CodeBLEU for Identical Repos #3

@StwayneXG

Description

I was testing the SketchBLEU implementation via CodeBLEU (from validation/evaluation_scripts/codebleu). After a couple of modifications I was able to get it running, but I ran into an issue where the syntax match score is extremely low, even when evaluating two identical repos.


Steps to Reproduce

  1. Clone the repo and set up the environment (conda environment details provided below).

  2. Apply the following fixes to make CodeBLEU run without errors:

    • In CodeS/validation/evaluation_scripts/codebleu/codebleu/syntax_match.py, update the to_str function in the FileOrNode class:

      field_names.append(cursor.current_field_name)  # instead of cursor.field_name
    • In CodeS/validation/evaluation_scripts/codebleu/codebleu/__main__.py, update the main function to properly handle repo_bleu runs:

      def main(
          ref_files: List[str],
          hyp_file: str,
          lang: str,
          weights: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
          repo_bleu: bool = False,
      ) -> None:
          if repo_bleu:
              repo_bleu_score = calc_repobleu(
                  [Path(ref_file) for ref_file in ref_files],
                  [Path(hyp_file)],
                  lang,
                  weights=weights,
              )
              print("Repo-level CodeBLEU score: ", repo_bleu_score)
          else:
              code_bleu_score = calc_codebleu(
                  references,
                  hypothesis,
                  lang,
                  weights=weights,
              )
              ...
  3. Create two dummy repos with identical files:

    tmp.py

    # This script reads from 'input.txt' and writes its content to 'output.txt'
    
    with open('input.txt', 'r') as infile:
        data = infile.read()
    
    with open('output.txt', 'w') as outfile:
        outfile.write(data)

    codebleu.py
    (copied directly from this repo’s codebleu.py)

  4. Run the command:

    python -m codebleu --refs "../repo_1" --hyp "../repo_2" --lang python --repo
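For reference, the fix from step 2 can be made tolerant of different py-tree-sitter versions with a small shim. This is just a sketch; `get_field_name` is a hypothetical helper, not part of CodeS or the codebleu package, and the dummy cursor classes below only stand in for the real `TreeCursor` API shapes:

```python
def get_field_name(cursor):
    """Return the field name of the cursor's current node, trying the
    attribute names used by different py-tree-sitter releases."""
    for attr_name in ("field_name", "current_field_name"):
        if hasattr(cursor, attr_name):
            attr = getattr(cursor, attr_name)
            # Some releases expose this as a method, others as a property.
            return attr() if callable(attr) else attr
    return None


# Dummy stand-ins for the API shapes, for illustration only.
class OldMethodCursor:
    def current_field_name(self):  # older API: method
        return "body"

class NewCursor:
    field_name = "body"            # newer API: attribute

print(get_field_name(OldMethodCursor()))  # body
print(get_field_name(NewCursor()))        # body
```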

Expected Behavior

Since both repos are identical, I expected all scores (including syntax match) to be 1.0 (or very close to it).

Actual Behavior

The output was:

Repo-level CodeBLEU score:  {
    'codebleu': np.float64(0.7503026634382567),
    'ngram_match_score': 1.0,
    'weighted_ngram_match_score': 1.0,
    'syntax_match_score': 0.0012106537530266344,
    'dataflow_match_score': np.float64(1.0)
}

The syntax match score is ~0.001, even though the repos are identical.
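For context on why 1.0 is the expected value: CodeBLEU's syntax match is, roughly, the fraction of reference AST subtrees that also occur in the hypothesis AST, so identical code should match completely. A toy sketch of that idea, using nested tuples in place of tree-sitter nodes (not the library's actual code):

```python
def subtrees(tree):
    """Enumerate all subtrees of a tree given as (label, *children)."""
    out = [tree]
    for child in tree[1:]:
        if isinstance(child, tuple):  # leaves (strings) are not subtrees
            out.extend(subtrees(child))
    return out

def syntax_match(ref, hyp):
    """Fraction of reference subtrees that appear in the hypothesis."""
    ref_subtrees = subtrees(ref)
    hyp_subtrees = subtrees(hyp)
    matched = sum(1 for t in ref_subtrees if t in hyp_subtrees)
    return matched / len(ref_subtrees)

# A toy AST loosely resembling the two `with open(...)` blocks in tmp.py.
ast = ("module", ("with", ("call", "open")), ("with", ("call", "open")))
print(syntax_match(ast, ast))  # 1.0 for identical trees
```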


Environment

Conda environment (relevant libraries):

python 3.11.13
numpy 2.3.3
tree-sitter 0.20.1
types-tree-sitter 0.20.1.20240311
codebleu 0.4.0

Notes

  • This might be related to the tree-sitter API changes (hence the fix needed in syntax_match.py).
  • Possibly the repo-level aggregation is not correctly handling syntax trees across files.
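To illustrate the second hypothesis: if the repo-level aggregation pairs each reference file with the wrong hypothesis file when summing matched-subtree counts, the ratio collapses even for identical repos. The numbers below are purely illustrative, not taken from the codebase:

```python
# Hypothetical per-file (matched_subtrees, total_subtrees) when each
# reference file is compared against its correct counterpart.
per_file = {"tmp.py": (40, 40), "codebleu.py": (960, 960)}

matched = sum(m for m, _ in per_file.values())
total = sum(t for _, t in per_file.values())
print(matched / total)  # 1.0 with correct pairing

# If the files are mis-paired (tmp.py's subtrees searched in
# codebleu.py's tree and vice versa), almost nothing matches:
mispaired_matched = 1 + 1  # only trivial subtrees happen to overlap
print(mispaired_matched / total)  # 0.002
```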

Could you clarify whether this is expected behavior or if the syntax match computation is incorrect at the repo level?
