PubMed Download Pipeline #246

Draft: wants to merge 33 commits into base: main
33 commits
aefeac1
added functions for metadata extraction
star-nox Apr 2, 2024
b833524
completed all download functions
star-nox Apr 3, 2024
8f14cf9
added supabase upsert
star-nox Apr 3, 2024
6daa4df
updated comments
star-nox Apr 3, 2024
c8c9056
added processpool to extractMetadataFromXML()
star-nox Apr 4, 2024
f8a4d23
minor changes
star-nox Apr 6, 2024
84fd182
Merge branch 'main' into pubmed
star-nox Apr 8, 2024
fa83ddf
yielded metadata after collecting 100 articles
star-nox Apr 8, 2024
88a951b
Merge branch 'main' into pubmed
star-nox Apr 9, 2024
64a4142
storing metadata into csv and upserting per XML file
star-nox Apr 11, 2024
24425d4
parallelized metadata update
star-nox Apr 12, 2024
3ddec3b
added minio to requirements.txt
star-nox Apr 12, 2024
e03fbf1
parallelized download
star-nox Apr 12, 2024
8e5a1a0
parallelized upload
star-nox Apr 15, 2024
caac0bf
minor changes
star-nox Apr 17, 2024
fa60508
restricted upload parallelization to 10
star-nox Apr 19, 2024
a61255c
changed starting XML file
star-nox Apr 21, 2024
4d86b85
changed starting XML file
star-nox Apr 22, 2024
63a6cb6
changed start file
star-nox Apr 29, 2024
b17baea
minor comment
star-nox Apr 29, 2024
8f608c5
Merge branch 'pubmed' of https://github.com/UIUC-Chatbot/ai-ta-backen…
star-nox May 1, 2024
4cfc6b8
minor changes in main loop
star-nox May 6, 2024
8b533ba
minor changes
star-nox May 13, 2024
6dfe50b
parallelized processing
star-nox May 14, 2024
e98f3dc
added try-except in getArticleIds()
star-nox May 14, 2024
68057df
deleted csv file
star-nox May 14, 2024
a2a5f22
merged changes
star-nox May 14, 2024
4338931
test comment
star-nox May 14, 2024
048f41d
print test comment
star-nox May 14, 2024
472814e
Commented out prints for speed
KastanDay May 14, 2024
880ae97
Commented out prints for speed
KastanDay May 14, 2024
d635fe8
Merge branch 'pubmed' of https://github.com/UIUC-Chatbot/ai-ta-backen…
star-nox May 15, 2024
304ec5d
parallelized main for loop and added xml filename column
star-nox May 15, 2024
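
Several commits above parallelize the upload step and then cap it ("restricted upload parallelization to 10"). The pipeline code itself is not shown on this page, so the following is only a minimal sketch of one way to bound upload concurrency at 10 workers. It assumes a thread pool; `upload_file` and `upload_all` are hypothetical names, and the real PR may use processes or a different client call.

```python
from concurrent.futures import ThreadPoolExecutor

# Cap taken from the commit message "restricted upload parallelization to 10".
MAX_UPLOAD_WORKERS = 10


def upload_file(path):
  """Hypothetical placeholder for a single object-store upload.

  A real implementation would call the storage client here; this stub only
  returns a marker string so the sketch stays self-contained.
  """
  return f"uploaded:{path}"


def upload_all(paths, max_workers=MAX_UPLOAD_WORKERS):
  """Upload every path with at most `max_workers` uploads in flight."""
  with ThreadPoolExecutor(max_workers=max_workers) as pool:
    # pool.map preserves input order in its results.
    return list(pool.map(upload_file, paths))


if __name__ == "__main__":
  print(upload_all(["article_1.pdf", "article_2.pdf"]))
```

Capping the pool size keeps the number of simultaneous connections to the storage backend fixed regardless of how many files a batch contains.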
12 changes: 12 additions & 0 deletions ai_ta_backend/main.py
@@ -40,6 +40,7 @@
from ai_ta_backend.service.sentry_service import SentryService

from ai_ta_backend.beam.nomic_logging import create_document_map
from ai_ta_backend.utils.pubmed_extraction import extractPubmedData

app = Flask(__name__)
CORS(app)
@@ -379,6 +380,17 @@ def getTopContextsWithMQR(service: RetrievalService, posthog_service: PosthogSer
  response.headers.add('Access-Control-Allow-Origin', '*')
  return response

@app.route('/pubmedExtraction', methods=['GET'])
def pubmedExtraction():
  """
  Extracts metadata and downloads papers from PubMed.
  """
  result = extractPubmedData()

  response = jsonify(result)
  response.headers.add('Access-Control-Allow-Origin', '*')
  return response


def configure(binder: Binder) -> None:
  binder.bind(RetrievalService, to=RetrievalService, scope=RequestScope)
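
The new `/pubmedExtraction` route can be exercised without running the full pipeline by stubbing out `extractPubmedData`. The sketch below is a standalone approximation, not the project's code: it rebuilds the route in a throwaway Flask app, the stub's return value (`{"status": "ok"}`) is an assumption because the real function's return type is not shown in this diff, and `call_endpoint` is a hypothetical helper.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def extractPubmedData():
  """Hypothetical stand-in for ai_ta_backend.utils.pubmed_extraction.extractPubmedData.

  The real function's return value is not visible in this diff; a small
  dict is assumed here so that jsonify has something to serialize.
  """
  return {"status": "ok"}


@app.route('/pubmedExtraction', methods=['GET'])
def pubmedExtraction():
  """Mirror of the route added in the diff, wired to the stub above."""
  result = extractPubmedData()

  response = jsonify(result)
  response.headers.add('Access-Control-Allow-Origin', '*')
  return response


def call_endpoint():
  """Issue a GET against the route using Flask's built-in test client."""
  with app.test_client() as client:
    resp = client.get('/pubmedExtraction')
    return (resp.status_code, resp.get_json(),
            resp.headers.get('Access-Control-Allow-Origin'))


if __name__ == "__main__":
  print(call_endpoint())
```

Because the route calls `extractPubmedData()` synchronously, a request against the real implementation would presumably block until the whole extraction finishes, so a stub like this keeps route-level checks fast.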