Releases: CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering
0.0.31
What's Changed
- finish test_extract_txt_flow by @jli943 in #198
- Paper Comparison Summary Flow by @CallmeNafiy in #215
- Bump up version to 0.0.31 by @goldmermaid in #232
New Contributors
- @jli943 made their first contribution in #198
- @CallmeNafiy made their first contribution in #215
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.30...0.0.31
0.0.30
What's Changed
- Refinement: setting batch_size for different models by @riboyuan99 in #212
- Add auto splitter advanced for huggingface config by @ZHIHANCHEN03 in #220
- add bug report ISSUE_TEMPLATE by @jojortz in #221
- Add Feature Request and Questions issues by @jojortz in #222
- add Documentation Github Issue Template by @jojortz in #223
- Add summary prompt and bump version to 0.0.30 by @goldmermaid in #224
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.29...0.0.30
0.0.29
What's Changed
- Fix autoflake failure case and update pre-commit to run unittests by @goldmermaid in #218
- Bump up version to 0.0.29 by @goldmermaid in #219
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.28...0.0.29
0.0.28
What's Changed
- added crop labeling example using google multimodal flow using gemini-vision by @boqiny in #207
- Bump up version to 0.0.28 by @goldmermaid in #217
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.27...0.0.28
0.0.27
What's Changed
- Use LLM to Auto Write Example TOC Readme by @goldmermaid in #204
- Added download entry for neuron model with batch_size = 8 and benchmarked neuron models with batch_size = 1,2,4,8 by @riboyuan99 in #201
- Add .gitattributes by @goldmermaid in #205
- Add application folder for flow and tests. by @goldmermaid in #206
- Add op/extract/split Unit Test by @Sdddell in #186
- TransformAzureOpenAI Implementation by @frank-suwen in #208
- Add google workspace email filter uniflow application by @goldmermaid in #209
- Update gmail filter notebook by @goldmermaid in #210
- Create nougat_huggingface_QAs.ipynb by @ZHIHANCHEN03 in #135
- Long text splitter by @ZHIHANCHEN03 in #200
- polish gmail filter notebook by @goldmermaid in #213
- Bump up version to 0.0.27 by @goldmermaid in #214
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.26...0.0.27
0.0.26
What's Changed
- Remove duplicated notebooks by @goldmermaid in #202
- Remove duplicated notebooks by @goldmermaid in #196
- Bump up version to 0.0.26 by @goldmermaid in #203
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.25...0.0.26
0.0.25
What's Changed
- Add a web summary example by @goldmermaid in #190
- Fix html parser duplicate content by @SayaZhang in #191
- Add TransformOp, update its instantiation in ExtractHTMLFlow to add post_extract_op, update notebook by @goldmermaid in #192
- update langchain to nougat to extract pdf in example by @ZHIHANCHEN03 in #175
- Polish Readme with the latest features by @goldmermaid in #194
- Remove 0.1.0-small by @jojortz in #193
- Refactor pipeline class by @goldmermaid in #195
- Bump up version to 0.0.25 by @goldmermaid in #197
Full Changelog: https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/compare/0.0.24...0.0.25
0.0.24
What's Changed
- Unified HTML Extract by @vicshi06 in #180
- Nougat modified to huggingface interface by @CluckRookie in #178
- Modify huggingface_model_json.ipynb by @riboyuan99 in #160
- Add basic expand and reduce op by @goldmermaid in #182
- Add txt_op unit test by @SayaZhang in #177
- fix a bug on inf2.8x with batch_size= 1 or 2 by @CluckRookie in #183
- Refactor model_op by @SayaZhang in #184
- Update requests to mimic a browser request with header by @goldmermaid in #185
- Add Google AI Studio model server. by @goldmermaid in #187
- Add MultiModal model server using gemini vision pro by @goldmermaid in #188
- Improve HTML parser and recursive splitter by @SayaZhang in #181
- Bump up version to 0.0.24 by @goldmermaid in #189
Full Changelog: 0.0.23...0.0.24
0.0.23
What's Changed
- fix a bug AttributeError: dict object has no attribute model_fields by @CluckRookie in #168
- Update rule-based html parser by @SayaZhang in #166
- Refactor extract_txt_flow using unified read_file function by @SayaZhang in #172
- fix: fix s3_op by @SeisSerenata in #169
- Add ThreadPoolExecutor for Azure OpenAI endpoint by @goldmermaid in #173
- Include header in request to get metadata for an IMDSv2-required instance by @riboyuan99 in #171
- Bump up version to 0.0.23 by @goldmermaid in #176
New Contributors
- @riboyuan99 made their first contribution in #171
Full Changelog: https://github.com/CambioML/uniflow-llm-text-data-cleaning-cluster/compare/0.0.22...0.0.23
0.0.22
What's Changed
- Fix gpt4 to gpt-4-1106-preview to allow JSON by @goldmermaid in #156
- update docs for version 0.0.21 by @jojortz in #155
- feat: add sample code for amazon elastic search by @SeisSerenata in #142
- Add @SayaZhang @CluckRookie @SeisSerenata @jojortz as repo codeowners/reviewers by @goldmermaid in #157
- Fix: move import to class init and add RecursiveCharacterSplitter by @SayaZhang in #141
- A function to read file from Amazon S3, URLs, or local paths by @CluckRookie in #162
- Fix check key in Pydantic BaseModel class bug by @goldmermaid in #164
- Bump up version to 0.0.22 by @goldmermaid in #165
Full Changelog: https://github.com/CambioML/uniflow/compare/0.0.21...0.0.22