Within the Querido Diário ecosystem, this repository is responsible for transforming documents and loading them into the appropriate storage systems.
Learn more about the technologies and the history of the project.
- How to contribute
- Development environment
- How to run
- Support
- Acknowledgments
- Open Knowledge Brasil
- License
Thank you for considering contributing to Querido Diário! 🎉
You can find out how in CONTRIBUTING.md!
The Querido Diário documentation can also help you get started.
To set up the development environment, you need the Podman container manager.
On a Linux operating system, open a terminal in the repository root directory and run the following commands to build the images and set up the pod and resource containers:
make build
make setup
For more details about the setup, read "how to set up the development environment".
Before running any pipeline, you need to populate the metadata database (Postgres) and download documents to the object storage (Minio). To do this, we can use the scraper repository, following the end-to-end setup documentation.
After running the scrapers, we can run the text extraction pipeline, which populates the search engine (Opensearch) with the main index (the full text of gazettes) and thematic indexes (gazette excerpts related to specific topics). To do so, run:
make re-run
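The split between the main index and the thematic indexes can be sketched in Python. This is a minimal in-memory sketch, not the pipeline's actual code: the `Gazette` and `SearchIndexes` types, the `THEMES` keyword table, and the paragraph-based `split_excerpts` helper are all hypothetical, and plain dictionaries stand in for Opensearch.

```python
from dataclasses import dataclass, field


@dataclass
class Gazette:
    gazette_id: str
    territory_id: str
    date: str
    text: str


@dataclass
class SearchIndexes:
    # Hypothetical in-memory stand-in for the Opensearch indexes.
    main: dict[str, str] = field(default_factory=dict)
    thematic: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


# Hypothetical theme definitions: theme name -> keywords to look for.
THEMES = {
    "education": ["school", "teacher"],
    "health": ["hospital", "vaccine"],
}


def split_excerpts(text: str) -> list[str]:
    # Naive paragraph split; the real pipeline's segmentation may differ.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def index_gazette(gazette: Gazette, indexes: SearchIndexes) -> None:
    # Main index: the full text of the gazette.
    indexes.main[gazette.gazette_id] = gazette.text
    # Thematic indexes: only the excerpts that mention a theme's keywords.
    for theme, keywords in THEMES.items():
        for excerpt in split_excerpts(gazette.text):
            lowered = excerpt.lower()
            if any(keyword in lowered for keyword in keywords):
                indexes.thematic.setdefault(theme, []).append(
                    (gazette.gazette_id, excerpt)
                )
```

For example, indexing one gazette with two paragraphs stores the full text in the main index and only the matching paragraph in the "education" index:

```python
indexes = SearchIndexes()
index_gazette(Gazette("g1", "t1", "2023-01-02", "New school opened.\n\nBudget approved."), indexes)
print(indexes.thematic)  # {'education': [('g1', 'New school opened.')]}
```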
By default, this pipeline processes all documents in the database, regardless of whether they have already been processed. To change this behavior, modify the EXECUTION_MODE environment variable in envvars.
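How an execution mode might decide which documents to process can be sketched as follows. The mode names `ALL` and `UNPROCESSED`, the `GazetteRecord` type with its `processed` flag, and both helper functions are hypothetical; check envvars for the values the pipeline actually accepts.

```python
import os
from dataclasses import dataclass


@dataclass
class GazetteRecord:
    gazette_id: str
    processed: bool  # hypothetical flag: text already extracted and indexed


def current_mode(default: str = "ALL") -> str:
    # Read EXECUTION_MODE from the environment (e.g. as loaded from envvars).
    return os.environ.get("EXECUTION_MODE", default)


def gazettes_to_process(records: list[GazetteRecord], mode: str) -> list[GazetteRecord]:
    # Hypothetical mode names; the real pipeline may accept different values.
    if mode == "ALL":
        # Reprocess every document, whether or not it was processed before.
        return list(records)
    if mode == "UNPROCESSED":
        # Skip documents whose text has already been extracted.
        return [r for r in records if not r.processed]
    raise ValueError(f"unknown EXECUTION_MODE: {mode!r}")
```

Rejecting unknown values with an error, rather than silently falling back to a default, makes a typo in envvars visible before any document is processed.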
With the extracted texts, we can also run the data aggregation pipeline, which makes the gazette texts available in CSV format. To do so, run:
make aggregate-gazettes
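Writing gazette texts to CSV can be sketched with Python's standard `csv` module. The column names (`territory_id`, `date`, `text`), the sort order, and the `gazettes_to_csv` function are assumptions for illustration; the real aggregation pipeline may group and name its output differently.

```python
import csv
import io


def gazettes_to_csv(gazettes: list[dict[str, str]]) -> str:
    """Serialize gazette records to CSV text with a header row."""
    # Hypothetical column set; the real aggregation output may use other fields.
    columns = ["territory_id", "date", "text"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops any record keys that are not in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    # Sort by territory, then date, so the file reads chronologically per city.
    for gazette in sorted(gazettes, key=lambda g: (g["territory_id"], g["date"])):
        writer.writerow(gazette)
    return buffer.getvalue()
```

The `csv` writer quotes fields that contain newlines, so multi-line gazette texts survive a round trip when the output is read back with `csv.DictReader` on a stream opened with `newline=""`.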
The results can be found in the search engine and the object storage. For tips on how to access them, see this documentation.
Join our community channel to discuss projects, ask questions, request help with contributions, and chat about civic innovation in general.
The application was initially developed together with the team at the software studio Jurema.
This project is maintained by Open Knowledge Brasil and made possible thanks to technical communities, Civic Innovation Ambassadors, volunteers, and financial donors, as well as partner universities, supporting companies, and funders.
Get to know who supports Querido Diário.
Open Knowledge Brasil is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analyses, and data journalism to promote open knowledge in various fields of society.
All the work produced by OKBR is freely available.
Code licensed under the MIT License.