Serena branch for metadata from ArXiv #33

Open
wants to merge 79 commits into
base: main
79 commits
d24fc27
generate output files
sy2657 May 18, 2021
2bfb7e6
query5
sy2657 May 19, 2021
299978d
query5
sy2657 May 19, 2021
7ccfaf8
update
sy2657 May 24, 2021
b674391
deduplicate universities, add wikidata links
sy2657 May 25, 2021
d32d2cd
add readings
sy2657 Jun 6, 2021
a204b14
Add files via upload
Zhuohan-Amber Jun 16, 2021
e6c49fb
Add files via upload
sy2657 Jun 16, 2021
f99ffec
Add files via upload
sy2657 Jun 16, 2021
e38c0bb
Add files via upload
sy2657 Jun 16, 2021
d455240
upload data txt files
sy2657 Jun 17, 2021
efd5378
Merge branch 'Serena-branch' of https://github.com/JonathanReeve/data…
sy2657 Jun 17, 2021
b9934b7
functions for querying Arxiv for metadata
sy2657 Jul 23, 2021
e667116
Update query_arxiv_metadata.ipynb
sy2657 Jul 23, 2021
81e53bf
Rename data/19.texts.txt to data/texts/txt/19.texts.txt
sy2657 Aug 23, 2021
e183356
Rename data/20.texts.txt to data/texts/txt/20.texts.txt
sy2657 Aug 23, 2021
6720f55
Rename data/21.texts.txt to data/texts/txt/21.texts.txt
sy2657 Aug 23, 2021
a689dfa
Rename data/22.texts.txt to data/texts/txt/22.texts.txt
sy2657 Aug 23, 2021
6f93a5f
Rename data/23.texts.txt to data/texts/txt/23.texts.txt
sy2657 Aug 23, 2021
08173f3
Rename data/24.texts.txt to data/texts/txt/24.texts.txt
sy2657 Aug 23, 2021
4aa00a0
Rename data/25.texts.txt to data/texts/txt/25.texts.txt
sy2657 Aug 23, 2021
b07e28d
Rename data/26.texts.txt to data/texts/txt/26.texts.txt
sy2657 Aug 23, 2021
14599ea
Rename data/27.texts.txt to data/texts/txt/27.texts.txt
sy2657 Aug 23, 2021
ebc741d
Rename data/29.texts.txt to data/texts/txt/29.texts.txt
sy2657 Aug 23, 2021
0d150fb
Rename data/30.texts.txt to data/texts/txt/30.texts.txt
sy2657 Aug 23, 2021
a2f38f9
Rename data/32.texts.txt to data/texts/txt/32.texts.txt
sy2657 Aug 23, 2021
5b3042c
Rename data/33.texts.txt to data/texts/txt/33.texts.txt
sy2657 Aug 23, 2021
cb4eb7f
Rename data/34.texts.txt to data/texts/txt/34.texts.txt
sy2657 Aug 23, 2021
c2ac1f9
Rename data/35.texts.txt to data/texts/txt/35.texts.txt
sy2657 Aug 23, 2021
87ac647
Rename data/36.texts.txt to data/texts/txt/36.texts.txt
sy2657 Aug 23, 2021
8139dad
Rename data/37.texts.txt to data/texts/txt/37.texts.txt
sy2657 Aug 23, 2021
48762f5
Update and rename data/38.texts.txt to data/texts/txt/38.texts.txt
sy2657 Aug 23, 2021
ef71b8c
Rename data/39.texts.txt to data/texts/txt/39.texts.txt
sy2657 Aug 23, 2021
8ee027b
Rename data/40.texts.txt to data/texts/txt/40.texts.txt
sy2657 Aug 23, 2021
ddfa673
Rename data/41.texts.txt to data/texts/txt/41.texts.txt
sy2657 Aug 23, 2021
75ae001
Rename data/42.texts.txt to data/texts/txt/42.texts.txt
sy2657 Aug 23, 2021
ff78ba8
Rename data/43.texts.txt to data/texts/txt/43.texts.txt
sy2657 Aug 23, 2021
e3991f3
Rename data/44.texts.txt to data/texts/txt/44.texts.txt
sy2657 Aug 23, 2021
398863a
Rename data/45.texts.txt to data/texts/txt/45.texts.txt
sy2657 Aug 23, 2021
ff0818d
Rename data/46.texts.txt to data/texts/txt/46.texts.txt
sy2657 Aug 23, 2021
af4db6e
Rename data/49.texts.txt to data/texts/txt/49.texts.txt
sy2657 Aug 23, 2021
0c83607
Rename data/50.texts.txt to data/texts/txt/50.texts.txt
sy2657 Aug 23, 2021
9c527f3
Rename data/51.texts.txt to data/texts/txt/51.texts.txt
sy2657 Aug 23, 2021
e14a89c
Rename data/53.texts.txt to data/texts/txt/53.texts.txt
sy2657 Aug 23, 2021
35ffabe
Rename data/54.texts.txt to data/texts/txt/54.texts.txt
sy2657 Aug 23, 2021
be6b279
Rename data/55.texts.txt to data/texts/txt/55.texts.txt
sy2657 Aug 23, 2021
37bc802
Rename data/56.texts.txt to data/texts/txt/56.texts.txt
sy2657 Aug 23, 2021
78c0a0d
Rename data/59.texts.txt to data/texts/txt/59.texts.txt
sy2657 Aug 23, 2021
475371e
Rename data/60.texts.txt to data/texts/txt/60.texts.txt
sy2657 Aug 23, 2021
0709cdf
Rename data/61.texts.txt to data/texts/txt/61.texts.txt
sy2657 Aug 23, 2021
286926e
Rename data/63.texts.txt to data/texts/txt/63.texts.txt
sy2657 Aug 23, 2021
86ac321
Rename data/64.texts.txt to data/texts/txt/64.texts.txt
sy2657 Aug 23, 2021
fd2124d
Rename data/65.texts.txt to data/texts/txt/65.texts.txt
sy2657 Aug 23, 2021
59fb24d
Rename data/65texts.txt to data/texts/txt/65texts.txt
sy2657 Aug 23, 2021
e77742e
Rename data/66.texts.txt to data/texts/txt/66.texts.txt
sy2657 Aug 23, 2021
a648ba3
Rename data/67.texts.txt to data/texts/txt/67.texts.txt
sy2657 Aug 23, 2021
a01ec8b
Rename data/68.texts.txt to data/texts/txt/68.texts.txt
sy2657 Aug 23, 2021
3226002
Rename data/69.texts.txt to data/texts/txt/69.texts.txt
sy2657 Aug 23, 2021
a27573e
Rename data/70.texts.txt to data/texts/txt/70.texts.txt
sy2657 Aug 23, 2021
5b0e316
Rename data/71.texts.txt to data/texts/txt/71.texts.txt
sy2657 Aug 23, 2021
f7f84fa
Rename data/72.texts.txt to data/texts/txt/72.texts.txt
sy2657 Aug 23, 2021
d31edf8
Rename data/73.texts.txt to data/texts/txt/73.texts.txt
sy2657 Aug 23, 2021
71653bb
Rename data/74.texts.txt to data/texts/txt/74.texts.txt
sy2657 Aug 23, 2021
b898357
Rename data/76.texts.txt to data/texts/txt/76.texts.txt
sy2657 Aug 23, 2021
38a84c8
Rename data/77.texts.txt to data/texts/txt/77.texts.txt
sy2657 Aug 23, 2021
000d87c
Rename data/80.texts.txt to data/texts/txt/80.texts.txt
sy2657 Aug 23, 2021
58bcf8b
Rename data/81.texts.txt to data/texts/txt/81.texts.txt
sy2657 Aug 23, 2021
a613e9b
Rename data/82.texts.txt to data/texts/txt/82.texts.txt
sy2657 Aug 23, 2021
b82b29f
Rename data/83.texts.txt to data/texts/txt/83.texts.txt
sy2657 Aug 23, 2021
9de4a7e
Rename data/85.texts.txt to data/texts/txt/85.texts.txt
sy2657 Aug 23, 2021
800707e
Rename data/86.texts.txt to data/texts/txt/86.texts.txt
sy2657 Aug 23, 2021
f428938
Rename data/87.texts.txt to data/texts/txt/87.texts.txt
sy2657 Aug 23, 2021
be209a7
Rename data/88.texts.txt to data/texts/txt/88.texts.txt
sy2657 Aug 23, 2021
ac1518b
Rename data/89.texts.txt to data/texts/txt/89.texts.txt
sy2657 Aug 23, 2021
9bbf394
Rename data/90.texts.txt to data/texts/txt/90.texts.txt
sy2657 Aug 23, 2021
d0b205b
Rename data/91.texts.txt to data/texts/txt/91.texts.txt
sy2657 Aug 23, 2021
178e18b
Rename data/93.texts.txt to data/texts/txt/93.texts.txt
sy2657 Aug 23, 2021
c68f007
Rename data/98.texts.txt to data/texts/txt/98.texts.txt
sy2657 Aug 23, 2021
074e20f
Rename data/99.texts.txt to data/texts/txt/99.texts.txt
sy2657 Aug 23, 2021
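
The rename commits above each move one `data/N.texts.txt` file into `data/texts/txt/`. For reference, the same move could probably be done locally in one pass and committed once. The sketch below is only an illustration: `move_text_files` and `TEXT_FILE` are hypothetical names, and the pattern assumes every file is named like `19.texts.txt` or `65texts.txt`, the two forms seen in the commit messages.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Assumed naming scheme: digits, an optional dot, then "texts.txt".
TEXT_FILE = re.compile(r"^\d+\.?texts\.txt$")


def move_text_files(src_dir: Path, dest_dir: Path) -> List[Path]:
    """Move every matching text file from src_dir into dest_dir; return the new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src_dir.iterdir()):
        if path.is_file() and TEXT_FILE.match(path.name):
            target = dest_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

A call such as `move_text_files(Path("data"), Path("data/texts/txt"))` would cover the directories named in the commits; files that do not match the pattern are left in place.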
Empty file added ExtractAlphanumeric.py
Empty file.
78 changes: 56 additions & 22 deletions data/.ipynb_checkpoints/sparql queries-checkpoint.ipynb
@@ -3,7 +3,7 @@
 {
 "cell_type": "code",
 "execution_count": 13,
-"id": "spanish-group",
+"id": "alive-divorce",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -12,8 +12,8 @@
 },
 {
 "cell_type": "code",
-"execution_count": 35,
-"id": "cooked-heating",
+"execution_count": 45,
+"id": "split-wagner",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -23,8 +23,8 @@
 },
 {
 "cell_type": "code",
-"execution_count": 36,
-"id": "atmospheric-feeding",
+"execution_count": 46,
+"id": "federal-trustee",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -34,8 +34,8 @@
 },
 {
 "cell_type": "code",
-"execution_count": 34,
-"id": "confidential-phenomenon",
+"execution_count": 47,
+"id": "latter-apparatus",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -48,7 +48,7 @@
 "\n",
 "query2 = \"\"\"\n",
 "SELECT DISTINCT ?uniName WHERE { \n",
-" ?uni owl:sameAs ?wikidataEntity\n",
+" ?uni owl:sameAs ?wikidataEntity .\n",
 " ?uni ccso:legalName ?uniName\n",
 "}\n",
 "\"\"\"\n",
@@ -64,13 +64,19 @@
 " ?uni ccso:legalName ?uniName .\n",
 " ?uni owl:sameAs ?wikidataEntity\n",
 "}\n",
-"\"\"\""
+"\"\"\"\n",
+"\n",
+"query5 = \"\"\"\n",
+"SELECT DISTINCT ?uniName WHERE {\n",
+" ?deUniversity ccso:legalName ?uniName .\n",
+" ?deUniversity owl:sameAs ?wikidataEntity \n",
+"}\"\"\""
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 19,
-"id": "chicken-crossing",
+"id": "chinese-surname",
 "metadata": {},
 "outputs": [
 {
@@ -92,29 +98,29 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
-"id": "abstract-tobacco",
+"execution_count": 43,
+"id": "worth-sensitivity",
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
-"<Graph identifier=N80d3e852c0524ee7bc23253702bfb729 (<class 'rdflib.graph.Graph'>)>"
+"<rdflib.plugins.sparql.processor.SPARQLResult at 0x23a81f267c0>"
 ]
 },
-"execution_count": 9,
+"execution_count": 43,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"g"
+"g.query(query2)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 29,
-"id": "attempted-people",
+"id": "young-store",
 "metadata": {},
 "outputs": [
 {
@@ -135,17 +141,17 @@
 },
 {
 "cell_type": "code",
-"execution_count": 37,
-"id": "characteristic-round",
+"execution_count": 48,
+"id": "chief-command",
 "metadata": {},
 "outputs": [
 {
 "data": {
 "text/plain": [
-"<rdflib.plugins.sparql.processor.SPARQLResult at 0x23a81ba0280>"
+"<rdflib.plugins.sparql.processor.SPARQLResult at 0x23a81fe5730>"
 ]
 },
-"execution_count": 37,
+"execution_count": 48,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -157,17 +163,45 @@
 {
 "cell_type": "code",
 "execution_count": 12,
-"id": "wired-atlantic",
+"id": "coated-politics",
 "metadata": {},
 "outputs": [],
 "source": [
 "g.serialize(destination='q1_output.txt', format='turtle')"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": 49,
+"id": "behavioral-america",
+"metadata": {},
+"outputs": [],
+"source": [
+"g.serialize(destination='q4_output.txt', format='turtle')"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 44,
+"id": "amazing-labor",
+"metadata": {},
+"outputs": [],
+"source": [
+"g.serialize(destination='q2_output.txt', format='turtle')"
+]
+},
+{
+"cell_type": "markdown",
+"id": "colonial-midnight",
+"metadata": {},
+"source": [
+"https://www.w3.org/TR/sparql11-query/"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "imported-spectrum",
+"id": "russian-threshold",
 "metadata": {},
 "outputs": [],
 "source": []
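
In the notebook diff above, `g.query(query2)` is displayed only as a bare `SPARQLResult` object, and the `g.serialize(...)` cells write the graph `g` itself rather than the rows returned by a SELECT query. If the intent is one output file per query, a minimal sketch along the following lines may help. `write_result_rows`, `sample_rows`, and the `.tsv` file name are hypothetical names introduced here, and the sample values are made up. The sketch relies on the fact that, in rdflib, the result of a SELECT query can be iterated to get tuple-like rows.

```python
import csv
from typing import Iterable, Sequence


def write_result_rows(
    rows: Iterable[Sequence[object]],
    destination: str,
    header: Sequence[str],
) -> int:
    """Write a header plus one tab-separated line per row; return the row count."""
    count = 0
    with open(destination, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        for row in rows:
            # str() turns each bound value (for example an rdflib Literal) into text.
            writer.writerow([str(value) for value in row])
            count += 1
    return count


# Stand-in rows with made-up values; in the notebook one would pass
# g.query(query2) here instead.
sample_rows = [("Example University",), ("Another University",)]
```

With the notebook's objects this might be called as `write_result_rows(g.query(query2), "q2_output.tsv", ["uniName"])`, assuming `g` and `query2` are defined as in the diff.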
44 changes: 44 additions & 0 deletions data/.ipynb_checkpoints/text_manipulation-checkpoint.ipynb
@@ -0,0 +1,44 @@
+{
+"cells": [
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "authorized-banks",
+"metadata": {},
+"outputs": [],
+"source": [
+"# tab -> space\n",
+"\n",
+"inputFile = open('65.texts.txt', 'r') \n",
+"exportFile = open('65texts.txt', 'w')\n",
+"for line in inputFile:\n",
+" new_line = line.replace('\\t', ' ')\n",
+" exportFile.write(new_line) \n",
+"\n",
+"inputFile.close()\n",
+"exportFile.close()"
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.9.1"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 5
+}
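
The tab-to-space cell in `text_manipulation-checkpoint.ipynb` opens and closes both files by hand. A slightly more defensive variant, offered only as a sketch, uses `with` so both files are closed even if a write fails. `tabs_to_spaces` is a hypothetical name, and the UTF-8 encoding is an assumption about the text files rather than something the diff states.

```python
from pathlib import Path


def tabs_to_spaces(source: Path, destination: Path) -> None:
    """Copy source to destination, replacing every tab character with one space."""
    # Encoding is assumed to be UTF-8; adjust if the data files differ.
    with open(source, "r", encoding="utf-8") as infile, \
            open(destination, "w", encoding="utf-8") as outfile:
        for line in infile:
            outfile.write(line.replace("\t", " "))
```

For the file names in the notebook, the call would be `tabs_to_spaces(Path("65.texts.txt"), Path("65texts.txt"))`.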