This repository contains a Python-based tool for analyzing and processing Dockerfiles. For each Dockerfile it generates a natural-language user question and writes the result as a JSONL record suitable for model training or other downstream applications. The tool uses the Hugging Face Inference API to interact with a language model, producing outputs grounded in the content of each Dockerfile.
Key features:

- Automated Dockerfile Analysis: Parses and validates Dockerfiles before processing.
- Hugging Face Integration: Uses `mistralai/Mistral-7B-Instruct-v0.3` for generating prompts and responses.
- Error Handling & Retry Mechanism: Handles API failures with retry logic and logs failures for later review.
- Logging: Tracks success and failure statistics in both console output and log files.
- JSONL Output: Generates well-structured JSONL files with system-user interactions for each Dockerfile.
The repository is laid out as follows:

```text
.
├── dockerfiles
│   └── sources-gold        # Directory containing input Dockerfiles
├── data
│   └── dockerfiles.jsonl   # Output file storing processed data in JSONL format
├── logs
│   ├── success.log         # Logs filenames successfully processed
│   └── failure.log         # Logs filenames that failed processing
├── .env                    # Environment variables (e.g., API_TOKEN)
├── main.py                 # Main Python script for processing Dockerfiles
├── README.md               # Repository documentation (this file)
└── requirements.txt        # Python dependencies
```
The tool works in four stages (a minimal code sketch follows this list):

- Dockerfile Parsing:
  - Reads Dockerfiles from the `dockerfiles/sources-gold` directory.
  - Validates each file using the `dockerfile` library to ensure compatibility.
- Prompt Generation:
  - Constructs a prompt based on the content of the Dockerfile.
  - Sends the prompt to the Hugging Face Inference API for processing.
- Response Handling:
  - Validates and cleans the model's response.
  - Retries up to a defined limit if the response is invalid or empty.
- Output Generation:
  - Creates a JSONL entry with the Dockerfile content and the generated user question.
  - Logs each file's success or failure into separate log files.
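A minimal sketch of that loop is shown below. It assumes the `dockerfile`, `python-dotenv`, and `huggingface_hub` packages; the prompt wording, retry limit, and helper names are illustrative assumptions and may differ from what `main.py` actually does.

```python
import json
import os
from pathlib import Path
from typing import Optional

import dockerfile                      # Dockerfile parser used for validation
from dotenv import load_dotenv
from huggingface_hub import InferenceClient

load_dotenv()                          # makes API_TOKEN from .env available
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    token=os.getenv("API_TOKEN"),
)
MAX_RETRIES = 3                        # assumed retry limit


def generate_question(content: str) -> Optional[str]:
    """Ask the model for a user-style question; retry if the reply is empty."""
    prompt = (
        "Write a short user request that would lead to this Dockerfile:\n\n"
        + content
    )
    for _ in range(MAX_RETRIES):
        reply = client.text_generation(prompt, max_new_tokens=200).strip()
        if reply:                      # the real validation may be stricter
            return reply
    return None


with open("data/dockerfiles.jsonl", "a", encoding="utf-8") as out, \
     open("logs/success.log", "a") as ok_log, \
     open("logs/failure.log", "a") as fail_log:
    for path in sorted(Path("dockerfiles/sources-gold").iterdir()):
        content = path.read_text(encoding="utf-8")
        try:
            dockerfile.parse_string(content)   # raises if the file is invalid
            question = generate_question(content)
            if question is None:
                raise ValueError("no usable model response")
            entry = {
                "text": (
                    "System: You are a Dockerfile generator.\n\n"
                    f"User: {question}\n\nAssistant: {content}"
                )
            }
            out.write(json.dumps(entry) + "\n")
            ok_log.write(path.name + "\n")
        except Exception as exc:               # parse or API failure
            fail_log.write(f"{path.name}: {exc}\n")
```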
To set up the project you need:

- Python 3.8+
- The project dependencies, installed with:

  ```bash
  pip install -r requirements.txt
  ```

- A `.env` file containing your Hugging Face API token (a quick sanity check is sketched below):

  ```
  API_TOKEN=your_hugging_face_api_token
  ```
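To confirm the token is picked up before running the full pipeline, you can do a quick check with `python-dotenv` (this assumes `main.py` reads the variable through the environment, as the sketch above does):

```python
import os
from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # loads variables from .env in the working directory
print("API_TOKEN found" if os.getenv("API_TOKEN") else "API_TOKEN is missing")
```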
Execute the main script to process Dockerfiles:

```bash
python main.py
```
The script produces:

- Processed Data:
  - Saved in `data/dockerfiles.jsonl` as structured JSONL.
- Logs:
  - Successful files: `logs/success.log`
  - Failed files: `logs/failure.log`
Each line in the JSONL file looks like this:

```json
{
  "text": "System: You are a Dockerfile generator.\n\nUser: Create a Dockerfile using...\n\nAssistant: FROM alpine:3.10\nRUN ..."
}
```
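Because every line is an independent JSON object, the output can be read record by record. A small reader sketch (only the `text` field is taken from the example above; how you consume it is up to you):

```python
import json

with open("data/dockerfiles.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} examples loaded")
print(records[0]["text"][:80])   # preview the first record
```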
Contributions are welcome:

- Fork the repository.
- Create a new branch: `git checkout -b feature-branch`
- Make your changes and commit them: `git commit -m "Add new feature"`
- Push to your branch: `git push origin feature-branch`
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, please create an issue in this repository.