English (US) | Português (BR)

Querido Diário

Data processing

Within the Querido Diário ecosystem, this repository is responsible for transforming documents and loading the results into the appropriate storage services.

Learn more about the technologies and the history of the project.

How to contribute

Thank you for considering contributing to Querido Diário! 🎉

See CONTRIBUTING.md to find out how.

The Querido Diário documentation can also help you.

Development environment

Setting up the development environment requires the Podman container manager.

On Linux, open a terminal in the repository root directory and run the following commands to build the images and set up the pod and its resource containers:

make build
make setup

For more details about the setup, read "how to set up the development environment".
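
Because the setup depends on Podman and `make` being installed, a small preflight check can save a failed build. The following is a minimal sketch, not part of the repository: `require_tools` is a hypothetical helper name, and it only verifies that the named commands are on `PATH`, not that their versions are suitable.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not provided by the repository):
# returns 0 only if every named tool is found on PATH,
# and reports each missing tool on stderr.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: build and set up only when both tools are available.
# require_tools podman make && make build && make setup
```

The usage line is left commented out so the snippet can be sourced safely; uncomment it from the repository root to run the check followed by the two setup commands.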

How to run

Before running any pipeline, the metadata database (Postgres) must be populated and the documents downloaded to the object storage (MinIO). For this, we can use the scraper repository, following the end-to-end setup documentation.

After running the scrapers, we can run the text extraction pipeline, which populates the search engine (OpenSearch) with the main index (full text of gazettes) and the thematic indexes (gazette excerpts related to specific topics). This is done with the command:

make re-run

By default, this pipeline will process all documents in the database, regardless of whether they have been previously processed. If you want to change this behavior, modify the EXECUTION_MODE environment variable in envvars.
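
Before changing the mode, it can help to see what is currently configured. The sketch below assumes that `envvars` refers to a plain text file with one `KEY=VALUE` assignment per line; that format, the helper name `get_env_value`, and the path in the usage line are all assumptions, so adjust them to the actual setup. The accepted values for `EXECUTION_MODE` are not listed here; check the project documentation for them.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the value assigned to a variable
# in a KEY=VALUE file (assumed format, one assignment per line).
#   $1: path to the file
#   $2: variable name
# If the variable is assigned more than once, the last value is printed.
get_env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Usage (the path is a placeholder, not the real location):
# get_env_value path/to/envvars EXECUTION_MODE
```

Lines written as `export KEY=VALUE` or with surrounding quotes are not handled by this sketch.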

With the extracted texts, we can also run the data aggregation pipeline, which provides the gazette texts in CSV format. To do so, run:

make aggregate-gazettes

The results can be found in the search engine and object storage. Find tips on how to access them in this documentation.

Support

Discord Invite

Join our community channel to discuss projects, ask questions, request help with contributions, and chat about civic innovation in general.

Acknowledgments

The application was initially developed with the people from the software studio Jurema.

This project is maintained by Open Knowledge Brasil and made possible thanks to technical communities, Civic Innovation Ambassadors, volunteers, and financial donors, as well as partner universities, supporting companies, and funders.

Get to know who supports Querido Diário.

Open Knowledge Brasil

Open Knowledge Brasil is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analyses, and data journalism to promote open knowledge in various fields of society.

All the work produced by OKBR is freely available.

License

Code licensed under the MIT License.