neptun-software/neptun.data.generators

Send scraped data from neptun-scraper to CHATGPT to generate training data for NEPTUN.AI.

Dockerfile Processor

Overview

This repository contains a Python-based tool for analyzing and processing Dockerfiles. It generates user-friendly questions and writes them, together with each Dockerfile's content, as JSONL for training or other applications. The tool uses the Hugging Face Inference API to query a language model about each Dockerfile.


Features

  • Automated Dockerfile Analysis: Parses and validates Dockerfiles for processing.
  • Hugging Face Integration: Uses mistralai/Mistral-7B-Instruct-v0.3 for generating prompts and responses.
  • Error Handling & Retry Mechanism: Handles API failures with retry logic and logs failures for later review.
  • Logging: Tracks success and failure statistics in both console output and log files.
  • JSONL Output: Generates well-structured JSONL files with system-user interactions for each Dockerfile.

File Structure

.
├── dockerfiles
│   └── sources-gold       # Directory containing input Dockerfiles
├── data
│   └── dockerfiles.jsonl  # Output file storing processed data in JSONL format
├── logs
│   ├── success.log        # Logs filenames successfully processed
│   └── failure.log        # Logs filenames that failed processing
├── .env                   # Environment variables (e.g., API_TOKEN)
├── main.py                # Main Python script for processing Dockerfiles
├── README.md              # Repository documentation (this file)
└── requirements.txt       # Python dependencies

How It Works

  1. Dockerfile Parsing:

    • Reads Dockerfiles from the dockerfiles/sources-gold directory.
    • Validates each file with the dockerfile library to ensure it can be parsed.
  2. Prompt Generation:

    • Constructs a prompt based on the content of the Dockerfile.
    • Sends the prompt to the Hugging Face Inference API for processing.
  3. Response Handling:

    • Validates and cleans the model's response.
    • Retries up to a defined limit if the response is invalid or empty.
  4. Output Generation:

    • Creates a JSONL entry with the Dockerfile content and the generated user question.
    • Logs each file's success or failure into separate log files.
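The steps above can be sketched as follows. This is an illustrative outline only: the function names (build_prompt, process_dockerfile), the retry limit, and the prompt wording are assumptions, not the exact code in main.py, and the model call is stubbed out rather than hitting the Hugging Face API.

```python
from typing import Optional

MAX_RETRIES = 3  # illustrative; the real limit is defined in main.py

def build_prompt(dockerfile_text: str) -> str:
    """Wrap the Dockerfile content in an instruction for the model."""
    return (
        "Given the following Dockerfile, write a short question a user "
        "might ask that this Dockerfile answers:\n\n" + dockerfile_text
    )

def process_dockerfile(dockerfile_text: str, query_model) -> Optional[dict]:
    """Query the model (injected as a callable) with retry logic and
    return a JSONL-ready entry, or None if every attempt fails."""
    prompt = build_prompt(dockerfile_text)
    for _ in range(MAX_RETRIES):
        response = query_model(prompt)
        if response and response.strip():
            return {
                "text": (
                    "System: You are a Dockerfile generator.\n\n"
                    f"User: {response.strip()}\n\n"
                    f"Assistant: {dockerfile_text}"
                )
            }
    return None  # the caller would log this file to logs/failure.log

# Example with a stubbed model call in place of the Hugging Face API:
entry = process_dockerfile(
    "FROM alpine:3.10\nRUN apk add --no-cache curl",
    query_model=lambda p: "Create a minimal Alpine image with curl installed.",
)
```

Injecting the model call as a plain callable keeps the retry logic testable without network access; the real script would pass a function that posts the prompt to the Inference API.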

Usage

Prerequisites

  1. Python 3.8+
  2. Install dependencies:
    pip install -r requirements.txt
  3. Set up the .env file with your Hugging Face API token:
    API_TOKEN=your_hugging_face_api_token
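The script reads API_TOKEN from the environment. As a stdlib-only sketch of what that loading amounts to (main.py may rely on a helper package such as python-dotenv instead, so treat this as an assumption):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: reads KEY=value lines, skipping blanks
    and comments, without overwriting variables already set."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
API_TOKEN = os.getenv("API_TOKEN")  # None if the variable is missing
```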
    

Running the Script

Execute the main script to process Dockerfiles:

python main.py

Outputs

  • Processed Data:
    • Saved in data/dockerfiles.jsonl as structured JSONL.
  • Logs:
    • Successful files: logs/success.log
    • Failed files: logs/failure.log

Example JSONL Entry

{
  "text": "System: You are a Dockerfile generator.\n\nUser: Create a Dockerfile using...\n\nAssistant: FROM alpine:3.10\nRUN ..."
}
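Entries in data/dockerfiles.jsonl can be read back one JSON object per line. The split below assumes the three turns are separated by blank lines exactly as in the example above, with only the Assistant turn spanning multiple lines:

```python
import json

# One line of the JSONL file, constructed here for illustration:
line = json.dumps({
    "text": "System: You are a Dockerfile generator.\n\n"
            "User: Create a Dockerfile using Alpine.\n\n"
            "Assistant: FROM alpine:3.10\nRUN apk add --no-cache curl"
})

entry = json.loads(line)
# Split on the first two blank lines only, so the multi-line
# Dockerfile in the Assistant turn stays intact.
system, user, assistant = entry["text"].split("\n\n", 2)
dockerfile = assistant[len("Assistant: "):]
```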

Contributing

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Make your changes and commit them:
    git commit -m "Add new feature"
  4. Push to your branch:
    git push origin feature-branch
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Contact

For questions or feedback, please create an issue in this repository.