Gather ~2,000-5,000 record corpus of dirty town names to test against. #2

Open
tommeagher opened this issue Feb 5, 2013 · 5 comments

Comments

@tommeagher
Member

No description provided.

@ghost ghost assigned esagara Feb 5, 2013
@esagara
Contributor

esagara commented Feb 6, 2013

I will put this together. I ran a quick query on the voter registration database, and the names seem to be a bit more standardized than I remember. It could be that I cleaned it already. Another option is to pull from a much dirtier data source, such as FEC campaign finance data. That will give us not only variations in words like "mount," "borough" and "township" but also misspellings of town names.
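
As a rough illustration of how such a corpus might be assembled, here is a minimal Python sketch that groups raw town strings by a normalized key, so each key collects the dirty variants seen for it. Everything in it is an assumption for illustration: the `ABBREVIATIONS` map, the `normalize` and `group_variants` helpers, and the sample values are hypothetical and not taken from the voter registration or FEC data.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation map; a real one would be built by inspecting the data.
ABBREVIATIONS = {
    "mt": "mount",
    "boro": "borough",
    "twp": "township",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and expand known abbreviations."""
    tokens = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def group_variants(raw_names):
    """Map each normalized name to the set of raw spellings seen for it."""
    groups = defaultdict(set)
    for raw in raw_names:
        groups[normalize(raw)].add(raw)
    return dict(groups)


# Made-up sample values for illustration only, not real records.
sample = [
    "Mt. Laurel",
    "MOUNT LAUREL",
    "Cherry Hill Twp.",
    "Cherry Hill Township",
]
variants = group_variants(sample)
for key, spellings in sorted(variants.items()):
    print(key, sorted(spellings))
```

This only catches case, punctuation and abbreviation differences. Misspelled town names would land under their own keys and would need some form of fuzzy matching to fold in.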

@tommeagher
Member Author

Any luck?

@tommeagher
Member Author

Added @CarlaAstudillo & @sstirling to this repo.
Obviously, Eric and I didn't get very far with this. But if you two find any compelling use case for something like this, let me know. I'd be interested in resurrecting this. If not, no big deal. Got plenty to keep us busy.

@sstirling

I'm sure we'll stumble across something before long.

@tommeagher
Member Author

Sorry if you got hit with a bunch of alerts, @esagara, @sstirling, @CarlaAstudillo. Just transferred this repo to HackJersey. No obligation for you. Just wanted to make sure you still have access if you want to contribute.
