Business Insights Extraction from Scholarly Articles using Large Language Models (LLMs)

Project Overview

This project uses a hybrid approach, combining Meta's LLaMA 2 13B and LLaMA 2 70B models, to extract actionable business insights from scholarly articles. The dual-model pipeline assigns each task to the model that suits its complexity and runs on an NVIDIA A100 80 GB Tensor Core GPU.

By combining natural language processing (NLP) techniques with careful prompt engineering, our methodology processes long scholarly texts and aims to keep the extraction of insights both fast and precise. The goal is to use LLMs to distill dense academic content into business intelligence.

You can try it on the web: https://insight-extractor.streamlit.app

We do not store the API tokens you enter.

Hybrid Solution Pipeline Explanation

Our hybrid pipeline distributes tasks between the two models so that each task is handled by the model best suited to it:

Start: The process initiates with the user inputting an article in PDF format.
PDF Extraction: The article is preprocessed to extract text and images.
Choosing Critical Sections: The user guides the system to identify critical sections for detailed analysis.
Section Summarization: The chosen sections are summarized to distill their core content.
Enriching the Abstract: The abstract is enriched by integrating insights from the summaries of the critical sections, forming a comprehensive overview.
Insights Extraction: Insights are extracted from the enriched abstract using the more capable model, which handles the complex inference.
Finding a Title: A title is generated for the extracted insights and displayed in the chat interface.
Image Extraction: Images are extracted, and their relevance is evaluated.
End: The pipeline culminates in actionable business insights presented to the user.
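The text stages of the pipeline can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code in orchestration.py: `generate` is a hypothetical stand-in for whatever client calls the models, and the model identifiers and prompts are made up for the sketch. Only the assignment of insights extraction to the larger model comes from the description above; routing the other stages to LLaMA 2 13B is an assumption. PDF and image extraction are left out.

```python
from typing import Callable, Dict, List

# Hypothetical client signature: (model_name, prompt) -> generated text.
Generate = Callable[[str, str], str]

# Assumed model identifiers; replace with the names your serving stack uses.
SMALL_MODEL = "llama-2-13b"
LARGE_MODEL = "llama-2-70b"


def run_pipeline(sections: Dict[str, str], abstract: str,
                 critical: List[str], generate: Generate) -> Dict[str, object]:
    """Sketch of the text stages: summarize, enrich, extract, title."""
    # Section Summarization: one call per user-selected critical section
    # (assumed to run on the smaller model).
    summaries = {
        name: generate(SMALL_MODEL, "Summarize this section:\n" + sections[name])
        for name in critical
    }
    # Enriching the Abstract: fold the section summaries into the abstract.
    joined = "\n".join("{}: {}".format(n, s) for n, s in summaries.items())
    enriched = generate(
        SMALL_MODEL,
        "Rewrite the abstract to include these findings.\n"
        "Abstract:\n" + abstract + "\nSummaries:\n" + joined,
    )
    # Insights Extraction: the larger model handles the complex inference.
    insights = generate(LARGE_MODEL,
                        "List actionable business insights from:\n" + enriched)
    # Finding a Title: a short label for the chat interface.
    title = generate(SMALL_MODEL, "Suggest a short title for:\n" + insights)
    return {"summaries": summaries, "enriched_abstract": enriched,
            "insights": insights, "title": title}
```

Passing `generate` as a parameter keeps the sketch independent of any particular inference backend and lets the flow be exercised with a stub, for example `run_pipeline(sections, abstract, ["Results"], lambda model, prompt: model)`.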

This hybrid design applies the smaller model to simpler tasks and reserves the larger model for complex inference, balancing processing speed against accuracy.
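One simple way to express complexity-based routing is a lookup table from task to model tier. The sketch below assumes the task names, the model identifiers, and every tier assignment except insights extraction, which the pipeline description assigns to the more capable model.

```python
from typing import Dict

# Assumed model identifiers for the two tiers.
MODELS = {"low": "llama-2-13b", "high": "llama-2-70b"}

# Hypothetical complexity labels per pipeline task. Only "insights_extraction"
# is stated to use the more capable model; the other labels are guesses.
TASK_COMPLEXITY: Dict[str, str] = {
    "section_summarization": "low",
    "abstract_enrichment": "low",
    "insights_extraction": "high",
    "title_generation": "low",
    "image_relevance": "low",
}


def model_for(task: str) -> str:
    """Return the model for a task; unknown tasks go to the larger model."""
    return MODELS[TASK_COMPLEXITY.get(task, "high")]
```

Unknown tasks fall back to the larger model, trading speed for safety. Under these assumed assignments, `model_for("title_generation")` returns `"llama-2-13b"`.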

The diagram below illustrates our hybrid solution pipeline and shows which tasks are assigned to the LLaMA 2 13B and LLaMA 2 70B models.

Project Structure

A. Summarization Pipeline

The summarization_pipeline folder contains essential Python scripts:

article_parser.py: Parses articles into sections.
image_processing.py: Extracts images and matches them against the images the model identifies as significant.
orchestration.py: Initializes the LLMs and orchestrates the pipeline based on task complexity.
pdf_section_extractor.py: Converts PDF files into clean, machine-readable text.
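As an illustration of the kind of work article_parser.py performs, here is a heuristic section splitter that operates on text already extracted from a PDF. The heading rules, the `split_sections` and `is_heading` names, and the "Preamble" key are assumptions for this sketch, not the project's actual implementation.

```python
import re
from typing import Dict, List

# Assumed heuristic: a heading is either a common section name or a
# numbered title such as "3 Results" or "4.2 Robustness Checks".
KNOWN = {"abstract", "introduction", "related work", "method", "methods",
         "methodology", "results", "discussion", "conclusion", "conclusions",
         "references"}
NUMBERED = re.compile(r"^\d+(\.\d+)*\.?\s+[A-Z][^.]{0,60}$")


def is_heading(line: str) -> bool:
    text = line.strip()
    return text.lower() in KNOWN or bool(NUMBERED.match(text))


def split_sections(text: str) -> Dict[str, str]:
    """Split plain article text into {heading: body}, in document order."""
    sections: Dict[str, str] = {}
    current = "Preamble"  # text before the first heading (title, authors)
    buffer: List[str] = []
    for line in text.splitlines():
        if is_heading(line):
            # Close the previous section before starting a new one.
            sections[current] = "\n".join(buffer).strip()
            current, buffer = line.strip(), []
        else:
            buffer.append(line)
    sections[current] = "\n".join(buffer).strip()
    # Drop the preamble if the text starts directly with a heading.
    if not sections.get("Preamble"):
        sections.pop("Preamble", None)
    return sections
```

A line-based heuristic like this can misfire (a line beginning "2019 Annual Report" matches the numbered-heading pattern), and repeated headings overwrite earlier ones, so real scholarly PDFs may need layout-aware rules.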

B. Solution Script

solution.py: Runs the full solution pipeline, integrating all the algorithms and model calls defined above.

Getting Started

Ensure Python 3.6+ is installed.
Optionally, create and activate a virtual environment:

    python -m venv .venv
    source ./.venv/bin/activate

Install the requirements.

    pip install -r requirements.txt

License

The project is licensed under the MIT License.

Repository: nusret35/llm_dev
