Merge pull request #21 from tjkessler/dev
2.1.0 Release
tjkessler authored Jan 28, 2019
2 parents d6b6bb9 + 69a8629 commit 7360c08
Showing 56 changed files with 1,515 additions and 633 deletions.
67 changes: 66 additions & 1 deletion README.md
@@ -267,7 +267,72 @@ To view more examples of common ECNet tasks such as hyperparameter optimization

ECNet databases are comma-separated value (CSV) files. Each one records the ID of every data point, an optional explicit sort assignment, strings and groups that identify data points, target values, and input parameters. Row 1 labels the type of each column (ID, explicit sort assignment, string, group, target, or input), row 2 gives the name of each string, group, target, or input, and every subsequent row is one data point.
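
The two-header-row layout can be read with the standard `csv` module. The following is a minimal sketch, not ECNet's own loader: `read_ecnet_style_csv` is a hypothetical helper, and the sample rows are abbreviated from `cp_model_v1.0.csv` (only the first two INPUT columns are kept). Column types other than those in the sample are ignored here.

```python
import csv
import io

# A small in-memory sample in the two-header-row layout described above.
# Rows are abbreviated from cp_model_v1.0.csv (first two INPUT columns only).
SAMPLE = """\
DATAID,ASSIGNMENT,STRING,STRING,TARGET,INPUT,INPUT
DATAID,ASSIGNMENT,Formula,SMILES,Target,nHdNH,MATS2e
UMLCP0001,L,C6E3,CCCCCCOCCOCCOCCO,40.5,0,-0.219635017
UMLCP0007,V,C8E5,CCCCCCCCOCCOCCOCCOCCOCCO,58.6,0,-0.233932211
"""


def read_ecnet_style_csv(handle):
    """Return one dict per data point (hypothetical helper, not ECNet API)."""
    rows = list(csv.reader(handle))
    # Row 1 holds the column types, row 2 the column names, the rest is data
    col_types, col_names, data_rows = rows[0], rows[1], rows[2:]
    points = []
    for row in data_rows:
        point = {'id': None, 'assignment': None,
                 'strings': {}, 'targets': {}, 'inputs': {}}
        for col_type, name, value in zip(col_types, col_names, row):
            if col_type == 'DATAID':
                point['id'] = value
            elif col_type == 'ASSIGNMENT':
                point['assignment'] = value
            elif col_type == 'STRING':
                point['strings'][name] = value
            elif col_type == 'TARGET':
                point['targets'][name] = float(value)
            elif col_type == 'INPUT':
                point['inputs'][name] = float(value)
        points.append(point)
    return points


points = read_ecnet_style_csv(io.StringIO(SAMPLE))
print(points[0]['id'], points[0]['targets'])
```

Keying each value by its row-2 name keeps the type label from row 1 and the column name from row 2 separate, which mirrors how the format itself is organized.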

The [databases](https://github.com/TJKessler/ECNet/tree/master/databases) directory contains databases for cetane number as well as a database template.
The [databases](https://github.com/TJKessler/ECNet/tree/master/databases) directory contains databases for cetane number, cloud point, pour point and yield sooting index, as well as a database template.

You can create an ECNet-formatted database from molecule names or SMILES strings and, optionally, target values. Doing so requires the following programs to be installed:
- [Open Babel](http://openbabel.org/wiki/Main_Page) software
- [Java JRE](https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html) version 6 or above

Supply the names or SMILES strings in a text file, one entry per line:
```
Acetaldehyde
Acetaldehyde dimethyl acetal
Acetic acid
Acetic anhydride
Acetol
Acetone
Acetonitrile
Acetonylacetone
```

If target values are supplied, they must also be in a text file, with the same number of lines as the names or SMILES file:
```
70
147
244
284
295
133
180
376
```
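
Because the two files must line up entry for entry, it can help to check their lengths before building a database. The sketch below is an illustration only: `read_entries` and `pair_names_with_targets` are hypothetical helpers, not part of ECNet, and skipping blank lines is an assumption about how the files should be read.

```python
import os
import tempfile


def read_entries(path):
    """Return the non-empty, stripped lines of a text file."""
    with open(path, 'r') as handle:
        return [line.strip() for line in handle if line.strip()]


def pair_names_with_targets(names_path, targets_path):
    """Pair each name (or SMILES string) with its target value.

    Raises ValueError if the two files differ in length.
    """
    names = read_entries(names_path)
    targets = [float(value) for value in read_entries(targets_path)]
    if len(names) != len(targets):
        raise ValueError(
            '{} names but {} targets'.format(len(names), len(targets))
        )
    return list(zip(names, targets))


# Demonstration with throwaway files holding the first three entries above
with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, 'names.txt')
    targets_path = os.path.join(tmp, 'targets.txt')
    with open(names_path, 'w') as handle:
        handle.write('Acetaldehyde\nAcetaldehyde dimethyl acetal\nAcetic acid\n')
    with open(targets_path, 'w') as handle:
        handle.write('70\n147\n244\n')
    pairs = pair_names_with_targets(names_path, targets_path)

print(pairs)
```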

The database can then be constructed with:
```python
from ecnet.tools import create_db

create_db('names.txt', 'my_database.csv', targets='targets.txt')
```

If SMILES strings are supplied instead of names:
```python
from ecnet.tools import create_db

create_db('smiles.txt', 'my_database.csv', targets='targets.txt', form='smiles')
```

The DATAID column of your database (essentially a Bates number for each molecule) increments starting at 0001:

| DATAID |
|-------- |
| DATAID |
| 0001 |
| 0002 |
| 0003 |

If a prefix is desired for these values, specify it with:
```python
from ecnet.tools import create_db

create_db('names.txt', 'my_database.csv', targets='targets.txt', id_prefix='MOL')
```

| DATAID |
|----------- |
| DATAID |
| MOL0001 |
| MOL0002 |
| MOL0003 |
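
The numbering shown in the two tables above can be reproduced in a few lines. This is a sketch of the scheme only, not ECNet's implementation: `make_ids` is a hypothetical function, and the four-digit zero padding is inferred from the example values.

```python
def make_ids(count, id_prefix=''):
    """Build IDs such as 0001, 0002, ... with an optional prefix.

    Hypothetical illustration; four-digit padding is assumed from the
    tables above.
    """
    return ['{}{:04d}'.format(id_prefix, i) for i in range(1, count + 1)]


print(make_ids(3))          # ['0001', '0002', '0003']
print(make_ids(3, 'MOL'))   # ['MOL0001', 'MOL0002', 'MOL0003']
```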

# Contributing, Reporting Issues and Other Support:

6 changes: 5 additions & 1 deletion databases/README.md
@@ -2,5 +2,9 @@

### Here are brief descriptions of the databases:
- **cn_model_v1.0_full.csv**: cetane number database containing 482 molecules from 11 compound groups, each with an experimental cetane number and over 1600 QSPR descriptors generated using [E-Dragon](http://www.vcclab.org/lab/edragon/)
- **cn_model_v1.0.csv**: cetane number database containing 482 molecules from 11 compound groups, each with an experimental cetane number and 15 QSPR descriptors chosen via the [limit_parameters](https://github.com/TJKessler/ECNet/blob/master/examples/limit_db_parameters.py) server method
- **cn_model_v1.0.csv**: cetane number database containing 482 molecules from 11 compound groups, each with an experimental cetane number and 15 QSPR descriptors chosen via the [limit_parameters](https://github.com/tjkessler/ECNet/blob/master/examples/limit_input_parameters.py) Server method
- **cp_model_v1.0_full.csv**: cloud point database containing 43 molecules, each with an experimental cloud point value and over 1800 QSPR descriptors generated using [PaDEL-Descriptor](http://www.yapcwsoft.com/dd/padeldescriptor/)
- **cp_model_v1.0.csv**: cloud point database containing 43 molecules, each with an experimental cloud point value and 15 QSPR descriptors chosen via the [limit_parameters](https://github.com/tjkessler/ECNet/blob/master/examples/limit_input_parameters.py) Server method
- **pp_model_v1.0_full.csv**: pour point database containing 41 molecules, each with an experimental pour point value and over 1800 QSPR descriptors generated using [PaDEL-Descriptor](http://www.yapcwsoft.com/dd/padeldescriptor/)
- **pp_model_v1.0.csv**: pour point database containing 41 molecules, each with an experimental pour point value and 15 QSPR descriptors chosen via the [limit_parameters](https://github.com/tjkessler/ECNet/blob/master/examples/limit_input_parameters.py) Server method
- **db_template.csv**: ECNet-formatted database template
45 changes: 45 additions & 0 deletions databases/cp_model_v1.0.csv
@@ -0,0 +1,45 @@
DATAID,ASSIGNMENT,STRING,STRING,TARGET,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT,INPUT
DATAID,ASSIGNMENT,Formula,SMILES,Target,nHdNH,MATS2e,SC-5,SdsssP,MATS8v,mindSe,mindSe,AATSC3m,VE3_Dzs,AATSC1v,ETA_dEpsilon_D,TDB9r,ATSC5s,naaNH,TDB9p
UMLCP0001,L,C6E3,CCCCCCOCCOCCOCCO,40.5,0,-0.219635017,0,0,-0.034780067,0,0,-0.339435472,-2.41209002,-5.85592544,0.0348,1.910026466,-5.87414966,0,7.856152845
UMLCP0002,L,C6E4,CCCCCCOCCOCCOCCOCCO,63.8,0,-0.241400656,0,0,0.001959448,0,0,-0.631740798,-3.081262757,-6.14262828,0.03009,2.156481529,-7.813202832,0,8.771095959
UMLCP0003,L,C6E5,CCCCCCOCCOCCOCCOCCOCCO,75,0,-0.257991147,0,0,0.025296109,0,0,-0.889179415,-3.87719352,-6.358674128,0.02648,2.112244238,-9.769132653,0,8.552160777
UMLCP0004,L,C6E6,CCCCCCOCCOCCOCCOCCOCCOCCO,83,0,-0.271041986,0,0,0.041513732,0,0,-1.112386229,-4.769287768,-6.527284383,0.02364,2.248358313,-11.73582766,0,9.038760735
UMLCP0005,L,C8E3,CCCCCCCCOCCOCCOCCO,7,0,-0.195178264,0,0,0.015697116,0,0,-0.168553352,-2.307015717,-5.126223652,0.03002,2.141455432,-4.770833333,0,8.889880537
UMLCP0006,L,C8E4,CCCCCCCCOCCOCCOCCOCCO,38.5,0,-0.216782749,0,0,0.036398415,0,0,-0.396911884,-2.711185365,-5.481633016,0.02648,2.088539856,-6.589586777,0,8.57368948
UMLCP0007,V,C8E5,CCCCCCCCOCCOCCOCCOCCOCCO,58.6,0,-0.233932211,0,0,0.050797588,0,0,-0.613351466,-3.247340834,-5.756170569,0.02367,2.24048913,-8.453173777,0,9.128597862
UMLCP0008,V,C8E6,CCCCCCCCOCCOCCOCCOCCOCCOCCO,72.5,0,-0.247848102,0,0,0.061435852,0,0,-0.811440627,-3.890901476,-5.974602508,0.02138,2.195812694,-10.34688091,0,8.914916161
UMLCP0009,V,C8E8,CCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,96,0,-0.269022769,0,0,0.076167501,0,0,-1.15092226,-5.424975856,-6.300244293,0.01791,2.252694367,-14.19211787,0,9.074850537
UMLCP0010,L,C9E4,CCCCCCCCCOCCOCCOCCOCCO,32,0,-0.206876706,0,0,0.033075899,0,0,-0.315539433,-2.648533393,-5.201743459,0.02497,2.092628402,-6.097800238,0,8.630654197
UMLCP0011,L,C9E5,CCCCCCCCCOCCOCCOCCOCCOCCO,55,0,-0.223994711,0,0,0.04683278,0,0,-0.512853118,-3.081305061,-5.495774924,0.02246,2.230638176,-7.908284024,0,9.128461857
UMLCP0012,L,C9E6,CCCCCCCCCOCCOCCOCCOCCOCCOCCO,75,0,-0.238071422,0,0,0.057237107,0,0,-0.697705327,-3.622282808,-5.731904728,0.0204,2.191042809,-9.758680556,0,8.93302882
UMLCP0013,V,C10E4,CCCCCCCCCCOCCOCCOCCOCCO,19.7,0,-0.198199535,0,0,0.030290854,0,0,-0.250399558,-2.637161531,-4.949040496,0.02361,2.192659435,-5.667159366,0,9.150669062
UMLCP0014,T,C10E5,CCCCCCCCCCOCCOCCOCCOCCOCCO,41.6,0,-0.215159176,0,0,0.043423145,0,0,-0.430038256,-2.980491581,-5.25790708,0.02137,2.227511153,-7.423010381,0,9.216282579
UMLCP0015,L,C10E6,CCCCCCCCCCOCCOCCOCCOCCOCCOCCO,60.3,0,-0.229276444,0,0,0.053557588,0,0,-0.601983012,-3.43080315,-5.508139871,0.0195,2.254406532,-9.228,0,9.266975734
UMLCP0016,L,C10E8,CCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,84.5,0,-0.251374119,0,0,0.068225801,0,0,-0.911370727,-4.592766706,-5.888806686,0.01659,2.293553961,-12.9361823,0,9.341413922
UMLCP0017,V,C10E10,CCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,95,0,-0.267845332,0,0,0.078365739,0,0,-1.173559872,-6.02401207,-6.16467817,0.01442,2.277245077,-16.72839099,0,9.227614613
UMLCP0018,L,C11E4,CCCCCCCCCCCOCCOCCOCCOCCO,10.5,0,-0.190547781,0,0,0.027924663,0,0,-0.197699668,-2.663915498,-4.719747419,0.02239,2.192966678,-5.287353516,0,9.243449815
UMLCP0019,L,C11E5,CCCCCCCCCCCOCCOCCOCCOCCOCCO,37,0,-0.20726037,0,0,0.040461642,0,0,-0.361192952,-2.929176314,-5.039766235,0.02037,2.151700211,-6.98839516,0,8.991217518
UMLCP0020,L,C11E6,CCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,57.5,0,-0.221328616,0,0,0.050308289,0,0,-0.520827603,-3.299234122,-5.30117731,0.01867,2.249991552,-8.74704142,0,9.329594799
UMLCP0021,T,C11E8,CCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,82,0,-0.243628194,0,0,0.064826138,0,0,-0.814385005,-4.300949088,-5.70257991,0.01599,2.287947286,-12.38149811,0,9.389506088
UMLCP0022,L,C12E4,CCCCCCCCCCCCOCCOCCOCCOCCO,6,0,-0.183760194,0,0,0.025890918,0,0,-0.154667277,-2.71990075,-4.510755948,0.02128,2.191828507,-4.950211628,0,9.316353045
UMLCP0023,V,C12E5,CCCCCCCCCCCCOCCOCCOCCOCCOCCO,28.9,0,-0.200164263,0,0,0.037866823,0,0,-0.303509896,-2.916187197,-4.838997178,0.01946,2.154110448,-6.597151205,0,9.075778288
UMLCP0024,T,C12E6,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,51,0,-0.214116714,0,0,0.047419234,0,0,-0.4515645,-3.214909027,-5.109194316,0.01791,2.246520961,-8.309327846,0,9.386361281
UMLCP0025,T,C12E7,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCO,64.7,0,-0.226101147,0,0,0.055228435,0,0,-0.594234865,-3.603476333,-5.335483726,0.01658,2.216035672,-10.07024793,0,9.219194553
UMLCP0026,L,C12E8,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,77.9,0,-0.236493151,0,0,0.061738925,0,0,-0.729424832,-4.071350512,-5.52775825,0.01543,2.282739681,-11.86853186,0,9.432050865
UMLCP0027,L,C12E9,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,87.8,0,-0.245583316,0,0,0.067254252,0,0,-0.856358531,-4.609078197,-5.693147697,0.01443,2.254154078,-13.69607843,0,9.287346606
UMLCP0028,L,C12E10,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,95.5,0,-0.253597897,0,0,0.071989225,0,0,-0.974957469,-5.208222897,-5.836920161,0.01355,2.308578787,-15.54696574,0,9.464681444
UMLCP0029,L,C12E11,CCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,100.3,0,-0.260714785,0,0,0.076100384,0,0,-1.085503446,-5.861431018,-5.963052234,0.01277,2.282931494,-17.41676576,0,9.340613456
UMLCP0030,L,C13E5,CCCCCCCCCCCCCOCCOCCOCCOCCOCCO,27,0,-0.193761035,0,0,0.035575655,0,0,-0.254837982,-2.933470912,-4.653605104,0.01862,2.219843374,-6.243295665,0,9.410576071
UMLCP0031,V,C13E6,CCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,42,0,-0.207548014,0,0,0.044834678,0,0,-0.392094178,-3.168352714,-4.930622429,0.0172,2.243341803,-7.909438776,0,9.436243709
UMLCP0032,T,C13E8,CCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,72.5,0,-0.22990251,0,0,0.058923775,0,0,-0.654664408,-3.892981526,-5.363326203,0.01491,2.277914266,-11.39285714,0,9.469944776
UMLCP0033,L,C14E5,CCCCCCCCCCCCCCOCCOCCOCCOCCOCCO,20,0,-0.187959865,0,0,0.03353863,0,0,-0.213508532,-2.975146007,-4.481889245,0.01785,2.217400613,-5.921875,0,9.457806202
UMLCP0034,L,C14E6,CCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,42.3,0,-0.201544627,0,0,0.042509693,0,0,-0.340751421,-3.152405036,-4.764104635,0.01655,2.185134141,-7.542806183,0,9.25331583
UMLCP0035,T,C14E7,CCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCO,57.6,0,-0.213392839,0,0,0.049997309,0,0,-0.466669497,-3.416073462,-5.003213458,0.01542,2.258187944,-9.223404255,0,9.492691589
UMLCP0036,V,C14E8,CCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,70.5,0,-0.223799025,0,0,0.056346892,0,0,-0.588605726,-3.757162189,-5.208385223,0.01442,2.273594184,-10.95064209,0,9.50474028
UMLCP0037,T,C15E6,CCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,37.5,0,-0.196040653,0,0,0.040407684,0,0,-0.296202933,-3.16161004,-4.608460923,0.01594,2.185412183,-7.205555556,0,9.304758811
UMLCP0038,T,C15E8,CCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,66,0,-0.218133146,0,0,0.053979788,0,0,-0.530011046,-3.656952292,-5.062137375,0.01396,2.229672442,-10.53855399,0,9.383719417
UMLCP0039,T,C16E6,CCCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCO,35.5,0,-0.190979974,0,0,0.038498552,0,0,-0.25737188,-3.191774972,-4.462660352,0.01537,2.234395543,-6.894380853,0,9.551145038
UMLCP0040,L,C16E7,CCCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCO,54,0,-0.202567749,0,0,0.045648257,0,0,-0.368309598,-3.351885628,-4.709875454,0.01439,2.251401296,-8.4968,0,9.558740667
UMLCP0041,L,C16E8,CCCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCO,65,0,-0.212861772,0,0,0.051798289,0,0,-0.477850967,-3.586764511,-4.923871647,0.01353,2.227303557,-10.15368154,0,9.417272789
UMLCP0042,V,C16E9,CCCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,75,0,-0.222054349,0,0,0.057147388,0,0,-0.584326713,-3.889654809,-5.110918656,0.01276,2.278789041,-11.85457064,0,9.571051049
UMLCP0043,L,C16E12,CCCCCCCCCCCCCCCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCOCCO,92,0,-0.244495259,0,0,0.069711236,0,0,-0.878232273,-5.150200945,-5.553161869,0.01089,2.279603979,-17.15037037,0,9.473975298