English (US) | Português (BR)

Data processing

Within the Querido Diário ecosystem, this repository is responsible for transforming documents and loading them into the appropriate storage systems.

Learn more about the technologies and the history of the project.

Summary

How to contribute
Development environment
How to run
Support
Acknowledgments
Open Knowledge Brasil
License

How to contribute

Thank you for considering contributing to Querido Diário! 🎉

You can find out how in CONTRIBUTING.md!

You can also check the Querido Diário documentation for help.

Development environment

To set up the development environment, you need the Podman container manager.

On a Linux operating system, open a terminal in the repository root directory and run the following commands to build the images and set up the pod and the resource containers:

make build
make setup
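As a convenience, the two commands can be wrapped in a small shell script that first checks that the required tools are on the PATH. This is a minimal sketch, not part of the repository: the function names `check_tools` and `setup_dev_env` are hypothetical, and the script assumes only `podman` and `make` are needed.

```shell
# Hypothetical helper: report any command in the argument list that is
# not on the PATH; returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Build the images and set up the pod only when the assumed
# prerequisites (podman and make) are installed.
setup_dev_env() {
  check_tools podman make || return 1
  make build && make setup
}
```

Source the script from the repository root and call `setup_dev_env`; it stops with a list of missing tools instead of failing partway through the build.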

For more details about the setup, read "how to set up the development environment".

How to run

Before running any pipeline, you need to populate the metadata database (Postgres) and download documents into the object storage (Minio). To do this, we can use the scraper repository, following the end-to-end setup documentation.

After running the scrapers, we can run the text extraction pipeline, which will populate the search engine (Opensearch) with the main index (full text of gazettes) and thematic indexes (gazette excerpts related to specific topics). This is done with the command:

make re-run

By default, this pipeline processes every document in the database, whether or not it has been processed before. To change this behavior, modify the EXECUTION_MODE environment variable in the envvars file.
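If you prefer to change the variable from the command line rather than in an editor, a helper like the one below replaces an existing `EXECUTION_MODE=` line or appends one. It is a hypothetical sketch that assumes envvars contains plain `KEY=VALUE` lines; the accepted values for EXECUTION_MODE are not listed here, so check the envvars file before choosing one.

```shell
# Hypothetical helper: set KEY=VALUE in an env file such as envvars,
# replacing the existing line for KEY or appending one if none exists.
set_env_var() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    tmp=$(mktemp)
    # Rewrite only the line whose name matches KEY; copy the rest as-is.
    awk -v k="$key" -v v="$value" -F '=' '$1 == k { print k "=" v; next } { print }' "$file" > "$tmp"
    # Copy back instead of mv so the file keeps its original permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_env_var envvars EXECUTION_MODE <value>`, where `<value>` is a placeholder for one of the modes the pipeline accepts.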

With the extracted texts, we can also run the data aggregation pipeline, which provides the gazette texts in CSV format. To do so, run:

make aggregate-gazettes

The results can be found in the search engine and object storage. Find tips on how to access them in this documentation.
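One way to confirm that the text extraction pipeline created its indexes is to ask Opensearch for its index list through the `_cat/indices` API. The sketch below assumes the search engine is reachable at `http://localhost:9200` without authentication; that address, the `OPENSEARCH_URL` variable, and the function names are assumptions, so adjust them to your development setup.

```shell
# Base address of the search engine; localhost:9200 is an assumed default.
OPENSEARCH_URL=${OPENSEARCH_URL:-http://localhost:9200}

# Build the URL of the _cat/indices endpoint, which lists every index
# with its document count ("v" adds a header row to the output).
indices_url() {
  printf '%s/_cat/indices?v\n' "${OPENSEARCH_URL%/}"
}

# List the indices; after `make re-run` finishes, the main index and the
# thematic indexes should appear here.
list_indices() {
  curl --silent --show-error --fail "$(indices_url)"
}
```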

Support

Join our community channel to discuss projects, ask questions, request help with contributions, and chat about civic innovation in general.

Acknowledgments

The application was initially developed in collaboration with the people from the Jurema software studio.

This project is maintained by Open Knowledge Brasil and made possible thanks to technical communities, Civic Innovation Ambassadors, volunteers, and financial donors, as well as partner universities, supporting companies, and funders.

Get to know who supports Querido Diário.

Open Knowledge Brasil

Open Knowledge Brasil is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analyses, and data journalism to promote open knowledge in various fields of society.

All the work produced by OKBR is freely available.

License

Code licensed under the MIT License.
