```python
def chunk_corpus(corpus: list, chunk_size: int = 64) -> list:
    """
    Chunk the corpus into smaller parts. Run the following command to download
    the required nltk data:
        python -c "import nltk; nltk.download('punkt')"
    @param corpus: the formatted corpus, see README.md
    @param chunk_size: the size of each chunk, i.e., the number of words in each chunk
    @return: chunked corpus, a list
    """
    ...
```
Is the default `chunk_size` of 64 considered best practice? I tried 150: the entity count was the same as with 64, but about 10% more relationships were extracted.
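For context, here is a minimal sketch of what a word-count-based `chunk_corpus` might look like. This is an illustrative approximation only: the real function presumably tokenizes with NLTK's punkt tokenizer (hence the download step in the docstring), while this sketch uses plain `str.split()` to stay dependency-free.

```python
def chunk_corpus(corpus: list, chunk_size: int = 64) -> list:
    """Hypothetical sketch: split each document into fixed-size word chunks."""
    chunks = []
    for document in corpus:
        # The actual implementation likely uses nltk.word_tokenize here;
        # whitespace splitting is a simplification for illustration.
        words = document.split()
        for start in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```

With a larger `chunk_size`, each chunk carries more surrounding context, which plausibly explains why more cross-entity relationships are found while the set of entities stays the same.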
doncat99 changed the title from "chunk" to "setting chunk size of chunk_corpus function" on Sep 13, 2024.