len_doc and encoding #4

@TheAzouz

Hello,

I would like to point out two issues I ran into while working with the wikIR tool:

  1. The documentation for the len_doc parameter is wrong. It says the default is None (all tokens are collected), but in the code the default is 200. To get all tokens I used --len_doc -1.
  2. It would be good to be able to specify the encoding of the input file and the output file.
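
To make both requests concrete, here is a minimal sketch of the behaviour I have in mind. It is not wikIR's actual code: `build_parser`, `truncate`, `convert`, and the `--input_encoding` / `--output_encoding` flags are hypothetical names I made up for illustration. Only `--len_doc` comes from the tool, and treating a negative value as "no limit" is an assumption based on the `--len_doc -1` workaround above.

```python
import argparse


def build_parser():
    # Hypothetical CLI sketch; flag names other than --len_doc are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the requested options")
    # Default of None matches what the documentation describes (keep all tokens).
    parser.add_argument("--len_doc", type=int, default=None)
    parser.add_argument("--input_encoding", default="utf-8")
    parser.add_argument("--output_encoding", default="utf-8")
    return parser


def truncate(tokens, len_doc):
    # None or a negative value is treated as "no limit" (assumed behaviour).
    if len_doc is None or len_doc < 0:
        return tokens
    return tokens[:len_doc]


def convert(in_path, out_path, len_doc, input_encoding, output_encoding):
    # Read with one encoding, write with another, truncating each line's tokens.
    with open(in_path, encoding=input_encoding) as src, \
            open(out_path, "w", encoding=output_encoding) as dst:
        for line in src:
            dst.write(" ".join(truncate(line.split(), len_doc)) + "\n")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With something like this, the documented default and the real default would agree, and files that are not UTF-8 could be read and written without patching the source.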