Add archival of github starred repos and docs
KennethWussmann committed Mar 6, 2024
1 parent 5861b1d commit 583aaab
Showing 17 changed files with 583 additions and 10 deletions.
3 changes: 2 additions & 1 deletion .dockerignore
@@ -1,4 +1,5 @@
node_modules
README.md
.github
build
build
.env
17 changes: 17 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,17 @@
{
  "editor.formatOnSave": true,
  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "editor.codeActionsOnSave": {
    "source.fixAll.eslint": "explicit"
  },
  "files.exclude": {
    "**/.git": true,
    "**/.svn": true,
    "**/.hg": true,
    "**/CVS": true,
    "**/.DS_Store": true,
    "**/Thumbs.db": true,
    "**/node_modules": true,
    "**/.turbo": true
  }
}
7 changes: 7 additions & 0 deletions LICENSE
@@ -0,0 +1,7 @@
Copyright 2024 Kenneth Wußmann

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
34 changes: 34 additions & 0 deletions README.md
@@ -0,0 +1,34 @@
<div align="center">
<h1><code>github-archive</code></h1>
<p>
<strong>Create mirrors of starred GitHub repos</strong>
</p>
</div>

## In a nutshell

GitHub repos that were previously open-source sometimes get taken down. `github-archive` is a cron job packaged as a Docker image. It polls the repos that you starred on GitHub and automatically creates a mirror of each one in your [Gitea](https://gitea.com/) instance.
Gitea will download the repository and can even mirror wikis, labels, issues, pull requests, releases and milestones. It will also keep the mirror in sync as long as the source is available.

## Getting Started

`github-archive` is a Docker image that you host somewhere and leave running in the background.

```YAML
services:
  github-archive:
    image: ghcr.io/kennethwussmann/github-archive:latest
    restart: always
    environment:
      GITHUB_USER: Username
      GITHUB_PAT: <fill-me>
      GITEA_ORG: github-archive
      GITEA_API_KEY: <fill-me>
      GITEA_URL: https://gitea.example.com/api/v1
```

For more details, see the [Getting Started guide](./docs/getting-started.md) and the [Configuration guide](./docs/configuration.md).

---

Please ensure that the license of the software you want to archive actually permits this use case.
40 changes: 40 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,40 @@
# Configuration

This document outlines the environment variables used to configure the behavior of github-archive.

## Configuration Overview

The application uses the following environment variables for its configuration:

- `GITHUB_USER`: The GitHub username. **Required.**
- `GITHUB_PAT`: Personal Access Token(s) for GitHub, separated by commas (,). Optional.
- `GITEA_ORG`: The Gitea organization name where mirrors will be created. Only one of `GITEA_ORG` or `GITEA_USER` can be specified. The org must already exist; it will not be created. Optional.
- `GITEA_USER`: The Gitea username where mirrors will be created. Only one of `GITEA_ORG` or `GITEA_USER` can be specified. Optional.
- `GITEA_API_KEY`: The API key for Gitea. **Required.**
- `GITEA_URL`: The base URL for the Gitea instance. **Required.**
- `GITEA_CREATE_PUBLIC`: Determines if the repositories should be created as public in Gitea. Defaults to `false`. Acceptable values are `true` or `false`.
- `GITEA_MIGRATION_ITEMS`: Specifies the items to be migrated to Gitea. Defaults to `wiki,labels,issues,pull-requests,releases,milestones`. Separate the items with commas (,). The possible items are:

  - `wiki`
  - `labels`
  - `issues`
  - `pull-requests`
  - `releases`
  - `milestones`

  This variable may be left unset. **By default, all items will be migrated.**

- `GITEA_MIRROR_INTERVAL`: The interval at which the repository should be mirrored. Defaults to `1d` (once a day).
- `CRON_SCHEDULE`: The schedule, in cron format, on which the job creates mirrors of starred repos. Defaults to `0 0 * * *` (runs at midnight every day).

## Rate Limiting

The GitHub API will rate-limit requests at some point. You can specify multiple personal access tokens in `GITHUB_PAT`, separated by commas. The tokens are passed on to Gitea, which rotates them when needed.
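
As an illustration of the comma-separated format, the sketch below splits a raw `GITHUB_PAT` value into individual tokens and joins them again for a single token field. The helper names `parseTokens` and `toAuthToken` are hypothetical and do not appear in the repository, and returning `null` for an empty list is an assumption made for this example.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.

// Split a raw GITHUB_PAT value into trimmed, non-empty tokens.
const parseTokens = (raw: string | undefined): string[] =>
  (raw ?? "")
    .split(",")
    .map((token) => token.trim())
    .filter((token) => token.length > 0);

// Join the tokens back into one comma-separated string.
// Assumption: an empty list is represented as null rather than "".
const toAuthToken = (tokens: string[]): string | null =>
  tokens.length > 0 ? tokens.join(",") : null;
```

Trimming and dropping empty entries keeps a stray space or trailing comma in the environment variable from producing an invalid token.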

## Debugging and Monitoring

- `LOG_LEVEL`: Log level used to control the visibility of messages. The possible values are:
  - `info` (default)
  - `debug`
  - `error`
- `LOG_DESTINATION`: Optionally write logs to files in addition to stdout. Specify a path on a Docker volume.
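
The variables documented above could be read and validated roughly as follows. This is a minimal sketch, not the project's loader: the repository's own config module is not shown in this diff, and the project depends on `zod`, whereas this example uses plain TypeScript to stay dependency-free. The names `ArchiveConfig`, `loadConfig`, `required` and `splitList` are hypothetical, and treating an unset `GITEA_MIGRATION_ITEMS` as "all items" is an assumption based on the documented default.

```typescript
// Hypothetical sketch of reading the documented environment variables.
type MigrationItem =
  | "wiki" | "labels" | "issues" | "pull-requests" | "releases" | "milestones";

const ALL_ITEMS: readonly MigrationItem[] = [
  "wiki", "labels", "issues", "pull-requests", "releases", "milestones",
];

interface ArchiveConfig {
  githubUser: string;
  githubPats: string[];
  giteaOrg?: string;
  giteaUser?: string;
  giteaApiKey: string;
  giteaUrl: string;
  createPublic: boolean;
  migrationItems: MigrationItem[];
  mirrorInterval: string;
  cronSchedule: string;
}

type Env = Record<string, string | undefined>;

// Return a required variable or fail with a descriptive error.
const required = (env: Env, key: string): string => {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable ${key}`);
  return value;
};

// Split a comma-separated value into trimmed, non-empty entries.
const splitList = (raw: string | undefined): string[] =>
  (raw ?? "").split(",").map((s) => s.trim()).filter((s) => s.length > 0);

const isMigrationItem = (value: string): value is MigrationItem =>
  ALL_ITEMS.some((item) => item === value);

const loadConfig = (env: Env): ArchiveConfig => {
  // The docs state that only one of GITEA_ORG or GITEA_USER may be set.
  if (env["GITEA_ORG"] && env["GITEA_USER"]) {
    throw new Error("Set only one of GITEA_ORG or GITEA_USER");
  }
  const requested = splitList(env["GITEA_MIGRATION_ITEMS"]);
  const unknown = requested.filter((item) => !isMigrationItem(item));
  if (unknown.length > 0) {
    throw new Error(`Unknown migration items: ${unknown.join(", ")}`);
  }
  return {
    githubUser: required(env, "GITHUB_USER"),
    githubPats: splitList(env["GITHUB_PAT"]),
    giteaOrg: env["GITEA_ORG"],
    giteaUser: env["GITEA_USER"],
    giteaApiKey: required(env, "GITEA_API_KEY"),
    giteaUrl: required(env, "GITEA_URL"),
    createPublic: env["GITEA_CREATE_PUBLIC"] === "true",
    // Assumption: an unset variable means "migrate everything".
    migrationItems:
      env["GITEA_MIGRATION_ITEMS"] === undefined
        ? [...ALL_ITEMS]
        : requested.filter(isMigrationItem),
    mirrorInterval: env["GITEA_MIRROR_INTERVAL"] ?? "1d",
    cronSchedule: env["CRON_SCHEDULE"] ?? "0 0 * * *",
  };
};
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. Failing at startup on a missing or contradictory variable surfaces misconfiguration before the first scheduled run.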
26 changes: 26 additions & 0 deletions docs/getting-started.md
@@ -0,0 +1,26 @@
# Getting Started

`github-archive` is best hosted in a Docker container using Docker Compose, as in the example below.

**Image:** `ghcr.io/kennethwussmann/github-archive`

**Tags:**

- `latest` - The latest stable release
- `x.x.x` - Specific version (see [Releases](https://github.com/KennethWussmann/github-archive/releases))
- `develop` - Latest up-to-date development build with features still in progress (unstable)

```YAML
services:
  github-archive:
    image: ghcr.io/kennethwussmann/github-archive:latest
    restart: always
    environment:
      GITHUB_USER: Username
      GITHUB_PAT: <fill-me>
      GITEA_ORG: github-archive
      GITEA_API_KEY: <fill-me>
      GITEA_URL: https://gitea.example.com/api/v1
```

This is the minimal configuration needed to get started. For more customization, see the [Configuration Guide](./configuration.md).
5 changes: 5 additions & 0 deletions package.json
@@ -4,6 +4,8 @@
  "description": "",
  "scripts": {
    "build": "tsc --noEmit && ncc build src/start.ts -o build -m",
    "dev": "tsx --watch src/start.ts",
    "start": "tsx src/start.ts",
    "lint": "eslint \"**/*.{ts,tsx}\" --ext .ts",
    "lint:fix": "npm run lint -- --fix",
    "format": "prettier . --write",
@@ -23,13 +25,16 @@
  "author": "",
  "license": "MIT",
  "dependencies": {
    "dotenv": "^16.4.5",
    "node-cron": "^3.0.3",
    "winston": "^3.12.0",
    "winston-daily-rotate-file": "^5.0.0",
    "zod": "^3.22.4"
  },
  "devDependencies": {
    "@types/eslint": "^8.56.5",
    "@types/node": "^20.11.24",
    "@types/node-cron": "^3.0.11",
    "@typescript-eslint/eslint-plugin": "^7.1.1",
    "@typescript-eslint/parser": "^7.1.1",
    "@vercel/ncc": "^0.38.1",
42 changes: 42 additions & 0 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

120 changes: 120 additions & 0 deletions src/api/gitea/giteaApiService.ts
@@ -0,0 +1,120 @@
import { type Logger } from "winston";
import {
  repoListResponse,
  type GiteaRepo,
  type CreateRepoMirrorRequest,
} from "./schema";
import { type GitHubRepo } from "../github/schema";
import { type GiteaMigrationItem } from "../../utils/config";

export class GiteaApiService {
  constructor(
    private readonly logger: Logger,
    private readonly url: string,
    private readonly apiKey: string,
    private readonly org: string | undefined,
    private readonly user: string | undefined,
    private readonly createPublic: boolean,
    private readonly mirrorInterval: string,
    private readonly gitHubPATs: string[],
    private readonly migrationItems: GiteaMigrationItem[],
  ) {
    this.logger.info("MIGRATION ITEMS", { migrationItems });
  }

  // Pages through a Gitea repo listing endpoint until a short page signals the end.
  private getReposFromPath = async (path: string) => {
    let page = 1;
    const limit = 50;
    const allRepos: GiteaRepo[] = [];
    let hasMore = true;

    while (hasMore) {
      const url = `${this.url}${path}?${new URLSearchParams({ limit: limit.toString(), page: page.toString() }).toString()}`;
      const response = await fetch(url, {
        headers: {
          Authorization: `token ${this.apiKey}`,
        },
      });

      if (!response.ok) {
        this.logger.error(`Failed to fetch repos`, {
          status: response.status,
          response: await response.text(),
          url,
        });
        // On error, return whatever was collected so far instead of throwing.
        return allRepos;
      }

      const repos: GiteaRepo[] = repoListResponse.parse(await response.json());

      // Fewer results than the page size means this was the last page.
      if (repos.length < limit) {
        hasMore = false;
      } else {
        page++;
      }

      allRepos.push(...repos);
    }

    this.logger.debug(`Fetched repos`, {
      path,
      repos: allRepos,
    });

    return allRepos;
  };

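  public getRepos = async (): Promise<GiteaRepo[]> => {
    if (this.org) {
      return this.getReposFromPath(`/orgs/${this.org}/repos`);
    } else {
      // Falls back to `/repos` when no org is set. Confirm that this route exists in
      // your Gitea version; the API also offers `/user/repos` for the authenticated user.
      return this.getReposFromPath("/repos");
    }
  };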

  public createRepoMirror = async (sourceRepo: GitHubRepo) => {
    // Mirrors are created under the org when one is configured, otherwise under the user.
    const owner = this.org ?? this.user;
    if (!owner) {
      throw new Error("No owner found");
    }
    const request: CreateRepoMirrorRequest = {
      service: "github",
      mirror: true,
      mirror_interval: this.mirrorInterval,
      clone_addr: sourceRepo.clone_url,
      repo_name: sourceRepo.name,
      repo_owner: owner,
      // All configured PATs are passed on as one comma-separated string.
      auth_token: this.gitHubPATs.join(","),
      description: sourceRepo.description,
      private: !this.createPublic,
      issues: this.migrationItems.some((item) => item === "issues"),
      labels: this.migrationItems.some((item) => item === "labels"),
      milestones: this.migrationItems.some((item) => item === "milestones"),
      releases: this.migrationItems.some((item) => item === "releases"),
      pull_requests: this.migrationItems.some(
        (item) => item === "pull-requests",
      ),
      wiki: this.migrationItems.some((item) => item === "wiki"),
    };
    this.logger.debug(`Creating repo mirror`, {
      sourceRepo,
      request,
    });
    const response = await fetch(`${this.url}/repos/migrate`, {
      method: "POST",
      headers: {
        Authorization: `token ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    });

    // A failed migration is logged but does not throw, so the caller continues.
    if (!response.ok) {
      this.logger.error(`Failed to create repo mirror`, {
        status: response.status,
        response: await response.text(),
        sourceRepo,
      });
    }
  };
}
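
The page loop in `getReposFromPath` can be isolated into a small generic helper, which makes the stop condition easy to test without a network. The sketch below is only an illustration of that pattern: `collectAllPages`, `PageFetcher` and the `maxPages` safety cap are hypothetical names and additions, and the `fetchPage` callback stands in for the authenticated `fetch` call in the file above.

```typescript
// Hypothetical generic helper; not part of the commit.
type PageFetcher<T> = (page: number, limit: number) => Promise<T[]>;

// Collect items page by page until a page comes back shorter than `limit`.
const collectAllPages = async <T>(
  fetchPage: PageFetcher<T>,
  limit = 50,
  maxPages = 1000, // assumption: a safety cap against endless loops
): Promise<T[]> => {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page, limit);
    all.push(...items);
    // A short (or empty) page means the server has nothing further to return.
    if (items.length < limit) break;
  }
  return all;
};
```

A length-based stop like this assumes the server honours the requested `limit`. If a server caps the page size below the requested value, every page looks "short" and the loop ends after the first request, so the requested limit should not exceed the server's maximum.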
29 changes: 29 additions & 0 deletions src/api/gitea/schema.ts
@@ -0,0 +1,29 @@
import { z } from "zod";

const repo = z.object({
  name: z.string(),
});

export const repoListResponse = z.array(repo);

export type GiteaRepo = z.infer<typeof repo>;

const createRepoMirrorRequest = z.object({
  auth_token: z.string().nullable(),
  repo_owner: z.string(),
  repo_name: z.string(),
  clone_addr: z.string(),
  description: z.string().nullable(),
  mirror: z.literal(true),
  mirror_interval: z.string(),
  private: z.boolean(),
  service: z.literal("github"),
  issues: z.boolean(),
  labels: z.boolean(),
  milestones: z.boolean(),
  releases: z.boolean(),
  pull_requests: z.boolean(),
  wiki: z.boolean(),
});

export type CreateRepoMirrorRequest = z.infer<typeof createRepoMirrorRequest>;
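
The six boolean fields of the request schema correspond one-to-one to the configured migration items, with `pull-requests` mapping to the `pull_requests` field. One possible way to derive all flags in a single step is sketched below. `MigrationItem`, `MigrationFlags` and `toMigrationFlags` are hypothetical names for this illustration and are not defined in the commit.

```typescript
// Hypothetical helper; the commit computes each flag inline instead.
type MigrationItem =
  | "wiki" | "labels" | "issues" | "pull-requests" | "releases" | "milestones";

interface MigrationFlags {
  wiki: boolean;
  labels: boolean;
  issues: boolean;
  pull_requests: boolean;
  releases: boolean;
  milestones: boolean;
}

// Turn the configured item list into the boolean fields of the request body.
const toMigrationFlags = (items: readonly MigrationItem[]): MigrationFlags => {
  const selected = new Set<MigrationItem>(items);
  return {
    wiki: selected.has("wiki"),
    labels: selected.has("labels"),
    issues: selected.has("issues"),
    // The config uses a hyphen, the request field an underscore.
    pull_requests: selected.has("pull-requests"),
    releases: selected.has("releases"),
    milestones: selected.has("milestones"),
  };
};
```

The result could be spread into the request object, for example `{ ...toMigrationFlags(items), service: "github" }`, which keeps the hyphen-to-underscore mapping in one place.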