Announcement - Preparing for v3.0.0 (Breaking changes) #1247

Open
MrLuit opened this issue Aug 19, 2018 · 5 comments

@MrLuit
Owner

MrLuit commented Aug 19, 2018

https://medium.com/@etherscamdb/breaking-api-changes-in-v3-646217a22bac

@sekisanchi
Contributor

I'm afraid you may have made some wrong design decisions by skipping some design points when transforming the dataset into NoSQL.

  1. URI confusion
    Entities described by a URI fall roughly into three categories, depending on whether the whole URI or only a part of it is meant: root domain, sub domain and URL. I have not seen a clear distinction between them in your statement.

  2. Previous ids can and should be preserved, and optionally shown.

  3. All old-format data has to be preserved and accessible through a format overlay that matches the new format, so that chronology is preserved.

Assumptions

  1. The ESDB dataset is going to be a kind of vulnerability activity oracle for Ethereum addresses.

  2. It will be a source for the VirusTotal URI scanner and other aggregated blacklisting services.

  3. Aggregated liability per Eth address will be the most valuable data in the whole dataset in the near future, considering affordable smart contract use cases at the social layer level (above layer 2).

If enough time and resources are available, these should be validated by data modeling in UML.

Does this make sense?

I'm currently using the current API to pull the full dataset (about 5K lines) into a Google Sheet with importjson about once a day.

@sekisanchi
Contributor

sekisanchi commented Aug 20, 2018

Here's a data entry attribute I've been waiting for.

somethingbad.tumblr.com ==> sub bad, root good
sub.badwallet.com ==> sub bad, root bad

I need an additional attribute that distinguishes the two cases above.

Also, how can I identify the scope of the blacklisting in the case below?
somedomain.com/aaaa/bad ==> domain good/bad, upper directory good/bad
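
One way to express the distinction between the two host examples is an explicit flag on each entry that records whether the root domain is itself malicious. The following is only a minimal Python sketch: `HostEntry`, `root_is_bad` and `is_blocked` are hypothetical names, not part of the current ESDB format, and the root domain is supplied by hand because deriving it reliably requires the Public Suffix List.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostEntry:
    host: str          # the reported host, e.g. "sub.badwallet.com"
    root: str          # its root domain, supplied by whoever files the entry
    root_is_bad: bool  # hypothetical attribute: is the root malicious as well?


def is_blocked(hostname: str, entry: HostEntry) -> bool:
    """Return True if hostname falls inside the entry's blacklisted scope."""
    hostname = hostname.lower().rstrip(".")
    if entry.root_is_bad:
        # Root is bad: the root and every host below it are in scope.
        return hostname == entry.root or hostname.endswith("." + entry.root)
    # Root is good (shared hosting): only the reported host and hosts below it.
    return hostname == entry.host or hostname.endswith("." + entry.host)


tumblr = HostEntry("somethingbad.tumblr.com", "tumblr.com", root_is_bad=False)
wallet = HostEntry("sub.badwallet.com", "badwallet.com", root_is_bad=True)
```

With these two entries, `is_blocked("other.tumblr.com", tumblr)` is false while `is_blocked("other.badwallet.com", wallet)` is true, which is the difference the extra attribute is meant to carry.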

@MrLuit
Owner Author

MrLuit commented Aug 20, 2018

First of all, your opinion on this matter is highly appreciated. We will not be pushing these changes to production until we are sure they work out for everyone involved on the project.

To address your concerns about URI / URL classification, I've been thinking about this as well. I propose we work towards the following scam entry structure:

- url: https://malicious.example/scam.php
  scheme: "*://malicious.example/*"
  category: Phishing
  subcategory: MyCrypto
  description: Malicious website
  addresses:
    - "0x0"
    - "0x1"

These changes will also be reflected in the UI.
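
If the `scheme` field is read as a wildcard pattern over protocol, host and path (an assumption, since its matching semantics are not spelled out here), a consumer of the API could test a URL against it roughly as follows. `matches_scheme` is a hypothetical helper, not an existing ESDB function. It matches each URL component separately so that a pattern for one host cannot be satisfied by text in another site's query string.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def matches_scheme(url: str, pattern: str) -> bool:
    """Match url against a pattern shaped like '<protocol>://<host>/<path>'."""
    pat_protocol, _, rest = pattern.partition("://")
    pat_host, _, pat_path = rest.partition("/")
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"  # treat a bare host as the root path
    return (
        fnmatchcase(parts.scheme, pat_protocol)
        and fnmatchcase(host, pat_host.lower())
        and fnmatchcase(path, "/" + pat_path)
    )


pattern = "*://malicious.example/*"
hit = matches_scheme("https://malicious.example/scam.php", pattern)
miss = matches_scheme("https://evil.example/?x=://malicious.example/", pattern)
```

Note that `fnmatchcase` also gives `?` and `[...]` special meaning, so patterns containing those characters would need escaping in a real implementation.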

While I understand compatibility is important, I think having some entries still show the id property while others do not is a bit confusing. Also, chronology will still be preserved through the natural order of the scams.yaml file (which is the same order as the ascending id).
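
For consumers that still need a numeric id, the natural-order argument means one can be derived from an entry's position in the file. A minimal sketch, with the entries already loaded as a list of dicts; `with_legacy_ids` is a hypothetical helper, and the starting value `first_id=0` is an assumption, since the thread does not say where the old ids began:

```python
def with_legacy_ids(entries, first_id=0):
    """Attach a positional `id` to each entry without mutating the input."""
    return [{**entry, "id": first_id + index} for index, entry in enumerate(entries)]


entries = [
    {"url": "https://malicious.example/scam.php", "category": "Phishing"},
    {"url": "https://another.example/", "category": "Phishing"},  # hypothetical entry
]
overlay = with_legacy_ids(entries)
```

Ids derived this way only stay stable if the file is append-only; removing or reordering an entry would shift every later id.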

We will indeed also be providing more integration data (like VirusTotal) through the API in the future.

Please let me know what you think 😄

@sekisanchi
Contributor

Good!
Let me add some insights.
I look at the current entry/record as:

a) scam/phishing evidence
b) a filtering template to detect malicious activities

For a), as evidence I want a snapshot record, like URLscan or phishcheck, in a case such as:
ttt.ppp -> nothing
ttt.ppp/eth -> scam page

Showing only ttt.ppp does not satisfy the need. We could either add a link to the evidence, or specify the path in the URI as in the latter example.

For now I either keep the snapshots locally or search the services above.

For b) I can think of at least 3 types of templates:

  1. root domain
    the whole contents of the ttt.ppp root domain

  2. sub domain
    sss.ttt.ppp

    They rotate URIs/contents and try to escape the filter with staged deployments.
    I guess you are currently assuming the scam applies to the sub-domain only, but that is a security hole they are already targeting with their disgusting staged deployments, like:

    initial:
    ttt.ppp - nothing
    sss.ttt.ppp - scam page

    later:
    sss.ttt.ppp - gone
    ttt.ppp - scam

    In that case, when registering sss.ttt.ppp, we must mark whether ttt.ppp is good or bad, to distinguish such cases from Tumblr/Blogger, for example.

  3. URL
    ttt.ppp/sdjrhf/djrj
    Google Docs/Telegram/Dropbox etc.
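
The third template type, where only one path on an otherwise legitimate shared service is bad, could be checked with a path-prefix match on segment boundaries. This is a rough sketch only: `in_path_scope` is a hypothetical helper, and treating the reported path as a prefix that also covers everything below it is an assumption about the intended scope.

```python
from urllib.parse import urlsplit


def in_path_scope(url: str, host: str, path_prefix: str) -> bool:
    """Return True if url is on host and at or below path_prefix."""
    # Accept bare "host/path" input by giving urlsplit a network-path prefix.
    parts = urlsplit(url if "://" in url else "//" + url)
    if (parts.hostname or "").lower() != host.lower():
        return False
    prefix = "/" + path_prefix.strip("/")
    if prefix == "/":
        return True  # an empty prefix means the whole host is in scope
    path = parts.path.rstrip("/") or "/"
    # Compare on segment boundaries so "/sdjrhf/djrjx" does not match.
    return path == prefix or path.startswith(prefix + "/")


bad = in_path_scope("ttt.ppp/sdjrhf/djrj", "ttt.ppp", "/sdjrhf/djrj")
parent = in_path_scope("https://ttt.ppp/sdjrhf", "ttt.ppp", "/sdjrhf/djrj")
```

Here `bad` is true and `parent` is false, so the upper directory and the rest of the host stay outside the blacklisted scope.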

These are probably fundamental requirements on the dataset. Ideally it should comply with them at every stage of expansion, and the sooner the better, to avoid a big modification or rewrite later.

As for the UI, ESDB is by design a kind of professional tool, and I recommend not fixating on the simplicity and entertainment value that some outsiders may care about.

@MrLuit
Owner Author

MrLuit commented Aug 23, 2018

I will update my PR next week using your feedback, thanks for being involved 😄
