diff --git a/.gitignore b/.gitignore
index c906af6..dbe67c4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -168,3 +168,6 @@ cython_debug/
!doc_gen/Makefile
!doc_gen/static.css
doc_gen/*
+
+# Examples
+!samples/*
diff --git a/README.md b/README.md
index c4edd74..1830c8d 100644
--- a/README.md
+++ b/README.md
@@ -51,31 +51,16 @@ git2s3 start
Sourcing environment variables from an env file
-Environment variables can be sourced using any `plaintext` / `JSON` / `YAML` file.
-The filepath should be provided as an argument during object instantiation.
-
-> _By default, `Git2S3` will look for a `.env` file in the current working directory._
-
-**Examples**
-
-- **CLI**
-```shell
-git2s3 start --env-file "/path/to/env/file"
-```
-
-- **IDE**
-```python
-import git2s3
-backup = git2s3.Git2S3(env_file='/path/to/env/file')
-backup.start()
-```
+> _By default, `Git2S3` will look for a `.env` file in the current working directory._
+> Refer [samples] directory for examples.
Sourcing environment variables from an env file
-Environment variables can be sourced using any plaintext / JSON / YAML file.
-The filepath should be provided as an argument during object instantiation.
-By default, Git2S3 will look for a .env file in the current working directory.
+By default, Git2S3 will look for a .env file in the current working directory.
+Refer samples directory for examples.
-Examples
-CLI
-git2s3 start --env-file "/path/to/env/file"
-IDE
-import git2s3
-backup = git2s3.Git2S3(env_file='/path/to/env/file')
-backup.start()
GIT_API_URL - GitHub API endpoint. Defaults to https://api.github.com/
GIT_OWNER - GitHub profile owner or organization name.
GIT_TOKEN - GitHub token to get ALL repos (including private).
-FIELDS - Fields options to restore. Defaults to all.
+GIT_IGNORE - List of repositories/gists to ignore. Defaults to []
+SOURCE - Source options [repo, gist, wiki] to back up. Defaults to all.
LOG - Log options to log to a file or stdout. Does not apply when custom logger is used
DEBUG - Boolean flag to enable debug level logging. Does not apply when custom logger is used
AWS_PROFILE_NAME - AWS profile name. Uses the CLI config value AWS_DEFAULT_PROFILE by default.
diff --git a/docs/README.md b/docs/README.md
index c4edd74..1830c8d 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -51,31 +51,16 @@ git2s3 start
 - **GIT_API_URL** - GitHub API endpoint. Defaults to `https://api.github.com/`
 - **GIT_OWNER** - GitHub profile owner or organization name.
 - **GIT_TOKEN** - GitHub token to get ALL repos (including private).
-- **FIELDS** - Fields options to restore. Defaults to all.
+- **GIT_IGNORE** - List of repositories/gists to ignore. Defaults to `[]`
+- **SOURCE** - Source options `[repo, gist, wiki]` to back up. Defaults to all.
 - **LOG** - Log options to log to a ``file`` or ``stdout``. _Does not apply when custom logger is used_
 - **DEBUG** - Boolean flag to enable debug level logging. _Does not apply when custom logger is used_
 - **AWS_PROFILE_NAME** - AWS profile name. Uses the CLI config value ``AWS_DEFAULT_PROFILE`` by default.
@@ -162,3 +147,4 @@ Licensed under the [MIT License][license]
 [license]: https://github.com/thevickypedia/git2s3/blob/master/LICENSE
 [runbook]: https://thevickypedia.github.io/git2s3/
 [Boto3 retry configuration]: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#available-retry-modes
+[samples]: https://github.com/thevickypedia/git2s3/tree/main/samples
diff --git a/docs/_sources/README.md.txt b/docs/_sources/README.md.txt
index c4edd74..1830c8d 100644
--- a/docs/_sources/README.md.txt
+++ b/docs/_sources/README.md.txt
@@ -51,31 +51,16 @@ git2s3 start
 Sourcing environment variables from an env file
-Environment variables can be sourced using any `plaintext` / `JSON` / `YAML` file.
-The filepath should be provided as an argument during object instantiation.
-
-> _By default, `Git2S3` will look for a `.env` file in the current working directory._
-
-**Examples**
-
-- **CLI**
-```shell
-git2s3 start --env-file "/path/to/env/file"
-```
-
-- **IDE**
-```python
-import git2s3
-backup = git2s3.Git2S3(env_file='/path/to/env/file')
-backup.start()
-```
+> _By default, `Git2S3` will look for a `.env` file in the current working directory._
+> Refer [samples] directory for examples.
 - **GIT_API_URL** - GitHub API endpoint. Defaults to `https://api.github.com/`
 - **GIT_OWNER** - GitHub profile owner or organization name.
 - **GIT_TOKEN** - GitHub token to get ALL repos (including private).
-- **FIELDS** - Fields options to restore. Defaults to all.
+- **GIT_IGNORE** - List of repositories/gists to ignore. Defaults to `[]`
+- **SOURCE** - Source options `[repo, gist, wiki]` to back up. Defaults to all.
 - **LOG** - Log options to log to a ``file`` or ``stdout``. _Does not apply when custom logger is used_
 - **DEBUG** - Boolean flag to enable debug level logging. _Does not apply when custom logger is used_
 - **AWS_PROFILE_NAME** - AWS profile name. Uses the CLI config value ``AWS_DEFAULT_PROFILE`` by default.
@@ -162,3 +147,4 @@ Licensed under the [MIT License][license]
 [license]: https://github.com/thevickypedia/git2s3/blob/master/LICENSE
 [runbook]: https://thevickypedia.github.io/git2s3/
 [Boto3 retry configuration]: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#available-retry-modes
+[samples]: https://github.com/thevickypedia/git2s3/tree/main/samples
diff --git a/docs/_sources/index.rst.txt b/docs/_sources/index.rst.txt
index c56085c..8ecf322 100644
--- a/docs/_sources/index.rst.txt
+++ b/docs/_sources/index.rst.txt
@@ -14,6 +14,7 @@ Welcome to Git2S3's documentation!
 Git2S3 - Main
 =============
+
 .. automodule:: git2s3.main
 
 S3
@@ -22,19 +23,18 @@ S3
 Squire
 ======
+
 .. automodule:: git2s3.squire
 
 Configuration
 =============
-.. autoclass:: git2s3.config.Field(BaseModel)
-   :members: EnvConfig
-   :exclude-members: _abc_impl, model_config, model_fields
+.. autoclass:: git2s3.config.DataStore(BaseModel)
+   :exclude-members: _abc_impl, model_config, model_fields, model_computed_fields
 
 ====
 
 .. autoclass:: git2s3.config.EnvConfig(BaseSettings)
-   :members: EnvConfig
    :exclude-members: _abc_impl, model_config, model_fields, model_computed_fields
 
 ====
@@ -43,7 +43,12 @@ Configuration
 
 ====
 
-.. autoclass:: git2s3.config.Fields(StrEnum)
+.. autoclass:: git2s3.config.SourceControl(StrEnum)
+
+Exceptions
+==========
+
+.. automodule:: git2s3.exc
 
 Indices and tables
 ==================
diff --git a/docs/genindex.html b/docs/genindex.html
index 7c5dbaa..91d90b0 100644
--- a/docs/genindex.html
+++ b/docs/genindex.html
@@ -50,6 +50,7 @@
Index
| F | G | H
+| I
| L | M | N
@@ -64,7 +65,9 @@ Index
A
-all (git2s3.config.Fields attribute)
+all (git2s3.config.SourceControl attribute)
+ArchiveError
aws_access_key_id (git2s3.config.EnvConfig attribute)
@@ -98,7 +101,9 @@ B
C
@@ -112,13 +117,17 @@ C
D
@@ -144,20 +153,10 @@ E
F
@@ -168,11 +167,18 @@ G
get_all() (git2s3.main.Git2S3 method)
-gist (git2s3.config.Fields attribute)
+gist (git2s3.config.SourceControl attribute)
Git2S3 (class in git2s3.main)
+git2s3.exc
+module
git2s3.main
@@ -195,11 +201,17 @@ G
module
+Git2S3Error
git_api_url (git2s3.config.EnvConfig attribute)
+git_ignore (git2s3.config.EnvConfig attribute)
git_owner (git2s3.config.EnvConfig attribute)
git_token (git2s3.config.EnvConfig attribute)
+GitHubAPIError
@@ -212,6 +224,18 @@ H
+I
+InvalidOwner
+InvalidSource
L
@@ -227,12 +251,12 @@ L
M
-model_computed_fields (git2s3.config.Field attribute)
module
+git2s3.exc
git2s3.main
git2s3.s3
@@ -246,7 +270,7 @@ M
N
@@ -254,13 +278,15 @@ N
P
-private (git2s3.config.Field attribute)
+parse_source() (git2s3.config.EnvConfig class method)
+private (git2s3.config.DataStore attribute)
profile_type() (git2s3.main.Git2S3 method)
@@ -270,7 +296,7 @@ P
R
@@ -278,10 +304,20 @@ R
S
@@ -298,11 +334,15 @@ T
U
@@ -310,7 +350,7 @@ U
W
diff --git a/docs/index.html b/docs/index.html
index 4d0a5b6..3e16ec7 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -96,11 +96,11 @@ Welcome to Git2S3’s documentation!
-get_all(field: Fields) Generator[Dict[str, str]] ¶
+get_all(source: SourceControl) Generator[Dict[str, str]] ¶
Iterate through a target owner/organization to get all available repositories/gists.
Parameters:
-field – Field type to clone.
+source – Source type to clone.
Yields:
Generator[Dict[str, str]] – Yields a dictionary of each repo’s information.
@@ -110,11 +110,11 @@ Welcome to Git2S3’s documentation!
-clone_wiki(field: Field) None ¶
+clone_wiki(datastore: DataStore) None ¶
Clone all the wikis from the repository.
Parameters:
-field – Field model to store repository/gist information.
+datastore – DataStore model to store repository/gist information.
@@ -138,24 +138,37 @@ Welcome to Git2S3’s documentation!
-cloner(func: Callable, field: str) None ¶
+cloner(source: SourceControl) bool ¶
Clones all the repos/gists concurrently.
Parameters:
-func – Function to get all repos/gists.
-field – Field type to clone.
+source – Source type to clone.
+See also
+Clones all the repos/gists concurrently using ThreadPoolExecutor.
+GitHub doesn’t have a rate limit for cloning, so multi-threading is safe.
+This makes it depend on Git installed on the host machine.
References
https://github.com/orgs/community/discussions/44515
+Returns:
+Returns a boolean flag to indicate if any of the threads failed.
+Return type:
+bool
+
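The new `cloner` contract documented in this hunk (fan out over a `ThreadPoolExecutor`, then return a boolean if any thread failed) can be sketched as follows. This is only an illustration under stated assumptions: `clone_all` and `fake_clone` are hypothetical names that do not appear in Git2S3, and the real method presumably invokes Git rather than a stub.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def clone_all(names: Iterable[str], clone_one: Callable[[str], None], workers: int = 4) -> bool:
    """Run ``clone_one`` for every name concurrently; return True if any call raised."""
    any_failed = False
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Map each future back to its name so failures could be logged per item.
        futures = {executor.submit(clone_one, name): name for name in names}
        for future in as_completed(futures):
            if future.exception() is not None:
                any_failed = True
    return any_failed


def fake_clone(name: str) -> None:
    """Hypothetical stand-in for a real clone call; fails for one specific name."""
    if name == "broken":
        raise RuntimeError(f"failed to clone {name}")
```

Returning a flag instead of raising lets the caller decide whether a partial clone should still be archived or uploaded.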
start() None ¶
-Start the cloning process.
+Start the cloning process and upload to S3 once cloning completes successfully.
@@ -215,15 +228,15 @@ Welcome to Git2S3’s documentation!
Returns a reference to the EnvConfig object.
Return type:
-git2s3.squire.field_detector(repo: Dict[str, str], env: EnvConfig) Field ¶
-Detects the type of field to clone and returns the Field model.
+git2s3.squire.source_detector(repo: Dict[str, str], env: EnvConfig) DataStore ¶
+Detects the type of source to clone and returns the DataStore model.
Parameters:
@@ -232,10 +245,10 @@ Welcome to Git2S3’s documentation!
Returns:
-Field model.
+DataStore model.
Return type:
@@ -257,47 +270,58 @@ Welcome to Git2S3’s documentation!
+git2s3.squire.check_file_presence(root: str | os.PathLike) bool ¶
+Get a list of all subdirectories and check for file presence.
+Parameters:
+root – Root directory to check for file presence.
+Returns:
+Returns a bool indicating if files are present in the subdirectories.
+Return type:
bool
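The body of `check_file_presence` is not part of this diff, so the following is only a guess at the behaviour its docstring describes. The helper name `has_files` is hypothetical, and treating files directly under `root` as a match is an assumption.

```python
import os
import tempfile


def has_files(root: str) -> bool:
    """Return True if at least one file exists under ``root`` or any subdirectory."""
    for _dirpath, _dirnames, filenames in os.walk(root):
        if filenames:
            return True
    return False


# Usage: an empty directory tree reports False until a file is created in it.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "owner", "repo"))
    empty_result = has_files(demo_root)
    open(os.path.join(demo_root, "owner", "repo", "README.md"), "w").close()
    filled_result = has_files(demo_root)
```

A check of this kind would let the caller skip the upload step when cloning produced only empty directories.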
+Configuration¶
-class git2s3.config.Field(BaseModel)¶
-Field model to store repository/gist information.
->>> Field
+class git2s3.config.DataStore(BaseModel)¶
+DataStore model to store repository/gist information.
+>>> DataStore
-field: Fields¶
+source: SourceControl¶
-model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}¶
-A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
@@ -324,8 +348,13 @@ Configuration
-fields: Union[Fields, List[Fields]]¶
+git_ignore: List[str]¶
+
+source: Union[SourceControl, List[SourceControl]]¶
@@ -405,9 +434,9 @@ Configuration
-classmethod parse_fields(value: Union[Fields, List[Fields]]) Path ¶
-Validate and parse ‘fields’ to remove ‘all’ from the fields option.
+classmethod parse_source(value: Union[SourceControl, List[SourceControl]]) Path ¶
+Validate and parse ‘source’ to remove ‘all’ from the source option.
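The validation rule can be tried without pydantic. The sketch below follows the `parse_source` hunk in `git2s3/config.py` from this same diff, but it is a stand-in: `SourceControl` is redefined here as a `str`-based `Enum` (the project uses `StrEnum`), and the function is a plain function rather than a `field_validator`.

```python
from enum import Enum
from typing import List, Union


class SourceControl(str, Enum):
    """Stand-in for git2s3.config.SourceControl, redefined for this sketch."""

    all = "all"
    repo = "repo"
    gist = "gist"
    wiki = "wiki"


def parse_source(value: Union[SourceControl, List[SourceControl]]) -> List[SourceControl]:
    """Expand 'all' and require 'repo' to be among the selected sources."""
    everything = [SourceControl.repo, SourceControl.gist, SourceControl.wiki]
    if isinstance(value, list):
        if SourceControl.all in value:
            return everything
        if SourceControl.repo not in value:
            raise ValueError(f"{value!r} must contain {SourceControl.repo.value!r} as a source type")
        return value
    if value == SourceControl.all:
        return everything
    if value == SourceControl.repo:
        return [value]
    raise ValueError(f"Must contain {SourceControl.repo.value!r} as a source type")
```

The stricter check means a configuration such as `gist` alone, which the previous validator did not reject when passed as a list, now fails fast with a `ValueError`.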
@@ -416,6 +445,12 @@ Configuration
/ at the end.
+
+classmethod parse_git_ignore(value: List[str]) List[str] ¶
+
- class Config¶
@@ -461,34 +496,85 @@ Configuration
-class git2s3.config.Fields(StrEnum)¶
-Available fields to clone.
->>> Fields
+class git2s3.config.SourceControl(StrEnum)¶
+Available source control options to clone.
+>>> SourceControl
+Exceptions¶
+exception git2s3.exc.DirectoryExists¶
+Warning: Raised when clone directory already exists.
+exception git2s3.exc.UnsupportedSource¶
+Warning: Raised when source is not supported.
+exception git2s3.exc.Git2S3Error¶
+Exception: Base class for all exceptions.
+exception git2s3.exc.GitHubAPIError¶
+Exception: Raised when failed to fetch repositories from source control.
+exception git2s3.exc.InvalidOwner¶
+Exception: Raised when owner is invalid.
+exception git2s3.exc.InvalidSource¶
+Exception: Raised when source is invalid.
+exception git2s3.exc.ArchiveError¶
+Exception: Raised when failed to archive repositories.
+exception git2s3.exc.UploadError¶
+Exception: Raised when failed to upload file objects to S3.
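The hierarchy listed here determines how callers can catch failures. The classes below are copied from the new `git2s3/exc.py` in this diff so the sketch is self-contained; `classify` is a hypothetical helper, not part of the package.

```python
class DirectoryExists(ResourceWarning):
    """Warning: Raised when clone directory already exists."""


class UnsupportedSource(UserWarning):
    """Warning: Raised when source is not supported."""


class Git2S3Error(Exception):
    """Exception: Base class for all exceptions."""


class GitHubAPIError(Git2S3Error):
    """Exception: Raised when failed to fetch repositories from source control."""


class InvalidOwner(GitHubAPIError):
    """Exception: Raised when owner is invalid."""


class InvalidSource(GitHubAPIError):
    """Exception: Raised when source is invalid."""


class ArchiveError(Git2S3Error):
    """Exception: Raised when failed to archive repositories."""


class UploadError(Git2S3Error):
    """Exception: Raised when failed to upload file objects to S3."""


def classify(error: Exception) -> str:
    """Hypothetical helper: report which handler a caller would reach for an error."""
    try:
        raise error
    except GitHubAPIError:
        return "github-api"
    except Git2S3Error:
        return "git2s3"
```

The two warning classes sit outside `Git2S3Error`, so an `except Git2S3Error` block never sees them. Python's default filters ignore `ResourceWarning`, which is presumably why `main.py` in this diff calls `warnings.simplefilter("always", ...)` for both.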
+Indices and tables¶
@@ -514,6 +600,7 @@ Table of Contents
 - S3
 - Squire
 - Configuration
+- Exceptions
 - Indices and tables
diff --git a/docs/objects.inv b/docs/objects.inv
index 894b6b3..8f4b7ce 100644
Binary files a/docs/objects.inv and b/docs/objects.inv differ
diff --git a/docs/py-modindex.html b/docs/py-modindex.html
index e5429cd..ac9ed9b 100644
--- a/docs/py-modindex.html
+++ b/docs/py-modindex.html
@@ -58,6 +58,11 @@ Python Module Index
git2s3
+git2s3.exc
+ diff --git a/docs/searchindex.js b/docs/searchindex.js index 4ff2ef0..4854808 100644 --- a/docs/searchindex.js +++ b/docs/searchindex.js @@ -1 +1 @@ -Search.setIndex({"docnames": ["README", "index"], "filenames": ["README.md", "index.rst"], "titles": ["Git2S3", "Welcome to Git2S3\u2019s documentation!"], "terms": {"backup": 0, "github": [0, 1], "project": 0, "aw": 0, "s3": 0, "platform": 0, "support": 0, "deploy": 0, "recommend": 0, "instal": 0, "python": 0, "3": 0, "10": 0, "11": 0, "us": [0, 1], "dedic": 0, "virtual": 0, "m": 0, "pip": 0, "initi": 0, "id": 0, "import": 0, "__name__": 0, "__main__": 0, "git": 0, "start": [0, 1], "cli": 0, "help": 0, "usag": 0, "instruct": 0, "sourc": 0, "from": [0, 1], "an": [0, 1], "env": [0, 1], "file": [0, 1], "can": 0, "ani": 0, "plaintext": 0, "json": [0, 1], "yaml": 0, "The": 0, "filepath": 0, "should": 0, "provid": 0, "argument": [0, 1], "dure": 0, "object": [0, 1], "instanti": [0, 1], "By": 0, "default": [0, 1], "look": 0, "current": 0, "work": 0, "directori": 0, "exampl": 0, "path": [0, 1], "env_fil": [0, 1], "_": 0, "api": 0, "url": [0, 1], "endpoint": 0, "http": [0, 1], "com": [0, 1], "owner": [0, 1], "profil": [0, 1], "organ": [0, 1], "name": [0, 1], "token": 0, "get": [0, 1], "all": [0, 1], "repo": [0, 1], "includ": 0, "privat": [0, 1], "field": [0, 1], "option": [0, 1], "restor": 0, "log": [0, 1], "stdout": [0, 1], "doe": 0, "appli": 0, "when": 0, "custom": 0, "logger": [0, 1], "i": 0, "debug": [0, 1], "boolean": 0, "flag": 0, "enabl": 0, "level": 0, "config": [0, 1], "valu": [0, 1], "aws_default_profil": 0, "access": 0, "kei": 0, "aws_access_key_id": [0, 1], "secret": 0, "aws_secret_access_kei": [0, 1], "region": 0, "bucket": 0, "": 0, "aws_default_region": 0, "store": [0, 1], "prefix": 0, "folder": 0, "like": 0, "boto3": 0, "retri": 0, "attempt": 0, "number": [0, 1], "client": 0, "mode": 0, "configur": 0, "docstr": 0, "format": 0, "googl": 0, "style": 0, "convent": 0, "pep": 0, "8": 0, "isort": 0, "requir": 0, 
"gitvers": 0, "revers": 0, "f": 0, "release_not": 0, "rst": 0, "t": 0, "pre": 0, "commit": 0, "ensur": 0, "run": 0, "pytest": 0, "gener": [0, 1], "valid": [0, 1], "hyperlink": 0, "markdown": 0, "wiki": [0, 1], "page": [0, 1], "sphinx": 0, "5": 0, "1": 0, "recommonmark": 0, "org": [0, 1], "thevickypedia": 0, "io": 0, "vignesh": 0, "rao": 0, "under": 0, "mit": 0, "kick": 1, "off": 1, "environ": 1, "variabl": 1, "code": 1, "standard": 1, "releas": 1, "note": 1, "lint": 1, "pypi": 1, "packag": 1, "runbook": 1, "licens": 1, "copyright": 1, "class": 1, "str": 1, "o": 1, "pathlik": 1, "none": 1, "max_per_pag": 1, "int": 1, "100": 1, "clone": 1, "gist": 1, "upload": 1, "keyword": 1, "bring": 1, "your": 1, "own": 1, "maximum": 1, "fetch": 1, "per": 1, "profile_typ": 1, "type": 1, "return": 1, "get_al": 1, "dict": 1, "iter": 1, "through": 1, "target": 1, "avail": 1, "repositori": 1, "paramet": 1, "yield": 1, "dictionari": 1, "each": 1, "inform": 1, "clone_wiki": 1, "model": 1, "worker": 1, "payload": 1, "rais": 1, "except": 1, "If": 1, "thread": 1, "fail": 1, "cloner": 1, "func": 1, "callabl": 1, "concurr": 1, "function": 1, "refer": 1, "commun": 1, "discuss": 1, "44515": 1, "process": 1, "envconfig": 1, "upload_fil": 1, "local_file_path": 1, "s3_file_path": 1, "local": 1, "trigger": 1, "env_load": 1, "filenam": 1, "load": 1, "base": 1, "filetyp": 1, "where": 1, "var": 1, "have": 1, "field_detector": 1, "detect": 1, "default_logg": 1, "consol": 1, "basemodel": 1, "clone_url": 1, "descript": 1, "bool": 1, "model_computed_field": 1, "classvar": 1, "computedfieldinfo": 1, "A": 1, "comput": 1, "correspond": 1, "baseset": 1, "pydant": 1, "git_api_url": 1, "git_own": 1, "git_token": 1, "union": 1, "list": 1, "logopt": 1, "aws_profile_nam": 1, "aws_region_nam": 1, "aws_bucket_nam": 1, "aws_s3_prefix": 1, "boto3_retry_attempt": 1, "boto3_retry_mod": 1, "boto3retrymod": 1, "classmethod": 1, "from_env_fil": 1, "creat": 1, "instanc": 1, "ar": 1, "addit": 1, "featur": 1, "both": 1, 
"system": 1, "session": 1, "parse_field": 1, "pars": 1, "remov": 1, "parse_git_api_url": 1, "strip": 1, "end": 1, "env_prefix": 1, "extra": 1, "allow": 1, "hide_input_in_error": 1, "true": 1, "strenum": 1, "index": 1, "modul": 1, "search": 1}, "objects": {"git2s3.config": [[1, 0, 1, "", "EnvConfig"], [1, 0, 1, "", "Field"], [1, 0, 1, "", "Fields"], [1, 0, 1, "", "LogOptions"]], "git2s3.config.EnvConfig": [[1, 0, 1, "", "Config"], [1, 1, 1, "", "aws_access_key_id"], [1, 1, 1, "", "aws_bucket_name"], [1, 1, 1, "", "aws_profile_name"], [1, 1, 1, "", "aws_region_name"], [1, 1, 1, "", "aws_s3_prefix"], [1, 1, 1, "", "aws_secret_access_key"], [1, 1, 1, "", "boto3_retry_attempts"], [1, 1, 1, "", "boto3_retry_mode"], [1, 1, 1, "", "debug"], [1, 1, 1, "", "fields"], [1, 2, 1, "", "from_env_file"], [1, 1, 1, "", "git_api_url"], [1, 1, 1, "", "git_owner"], [1, 1, 1, "", "git_token"], [1, 1, 1, "", "log"], [1, 2, 1, "", "parse_fields"], [1, 2, 1, "", "parse_git_api_url"]], "git2s3.config.EnvConfig.Config": [[1, 1, 1, "", "env_prefix"], [1, 1, 1, "", "extra"], [1, 1, 1, "", "hide_input_in_errors"]], "git2s3.config.Field": [[1, 1, 1, "", "clone_url"], [1, 1, 1, "", "description"], [1, 1, 1, "", "field"], [1, 1, 1, "", "model_computed_fields"], [1, 1, 1, "", "name"], [1, 1, 1, "", "private"]], "git2s3.config.Fields": [[1, 1, 1, "", "all"], [1, 1, 1, "", "gist"], [1, 1, 1, "", "repo"], [1, 1, 1, "", "wiki"]], "git2s3.config.LogOptions": [[1, 1, 1, "", "file"], [1, 1, 1, "", "stdout"]], "git2s3": [[1, 3, 0, "-", "main"], [1, 3, 0, "-", "s3"], [1, 3, 0, "-", "squire"]], "git2s3.main": [[1, 0, 1, "", "Git2S3"]], "git2s3.main.Git2S3": [[1, 2, 1, "", "clone_wiki"], [1, 2, 1, "", "cloner"], [1, 2, 1, "", "get_all"], [1, 2, 1, "", "profile_type"], [1, 2, 1, "", "start"], [1, 2, 1, "", "worker"]], "git2s3.s3": [[1, 0, 1, "", "Uploader"]], "git2s3.s3.Uploader": [[1, 2, 1, "", "trigger"], [1, 2, 1, "", "upload_file"]], "git2s3.squire": [[1, 4, 1, "", "default_logger"], [1, 4, 1, "", 
"env_loader"], [1, 4, 1, "", "field_detector"]]}, "objtypes": {"0": "py:class", "1": "py:attribute", "2": "py:method", "3": "py:module", "4": "py:function"}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "attribute", "Python attribute"], "2": ["py", "method", "Python method"], "3": ["py", "module", "Python module"], "4": ["py", "function", "Python function"]}, "titleterms": {"git2s3": [0, 1], "kick": 0, "off": 0, "environ": 0, "variabl": 0, "code": 0, "standard": 0, "releas": 0, "note": 0, "lint": 0, "pypi": 0, "packag": 0, "runbook": 0, "licens": 0, "copyright": 0, "welcom": 1, "": 1, "document": 1, "content": 1, "main": 1, "s3": 1, "squir": 1, "configur": 1, "indic": 1, "tabl": 1}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}}) \ No newline at end of file +Search.setIndex({"docnames": ["README", "index"], "filenames": ["README.md", "index.rst"], "titles": ["Git2S3", "Welcome to Git2S3\u2019s documentation!"], "terms": {"backup": 0, "github": [0, 1], "project": 0, "aw": 0, "s3": 0, "platform": 0, "support": [0, 1], "deploy": 0, "recommend": 0, "instal": [0, 1], "python": 0, "3": 0, "10": 0, "11": 0, "us": [0, 1], "dedic": 0, "virtual": 0, "m": 0, "pip": 0, "initi": 0, "id": 0, "import": 0, "__name__": 0, "__main__": 0, "git": [0, 1], "start": [0, 1], "cli": 0, "help": 0, "usag": 0, "instruct": 0, "sourc": [0, 1], "from": [0, 1], "an": [0, 1], "env": [0, 1], "file": [0, 1], "By": 0, "default": [0, 1], "look": 0, "current": 0, "work": 0, "directori": [0, 1], "refer": [0, 1], "sampl": 0, "exampl": 0, "_": 0, "api": 0, "url": [0, 1], "endpoint": 0, "http": [0, 1], "com": [0, 1], "owner": [0, 1], "profil": [0, 1], "organ": [0, 1], "name": [0, 1], "token": 0, "get": [0, 1], "all": [0, 1], "repo": [0, 
1], "includ": 0, "privat": [0, 1], "ignor": 0, "list": [0, 1], "repositori": [0, 1], "gist": [0, 1], "option": [0, 1], "wiki": [0, 1], "back": 0, "up": 0, "log": [0, 1], "stdout": [0, 1], "doe": 0, "appli": 0, "when": [0, 1], "custom": 0, "logger": [0, 1], "i": [0, 1], "debug": [0, 1], "boolean": [0, 1], "flag": [0, 1], "enabl": 0, "level": 0, "config": [0, 1], "valu": [0, 1], "aws_default_profil": 0, "access": 0, "kei": 0, "aws_access_key_id": [0, 1], "secret": 0, "aws_secret_access_kei": [0, 1], "region": 0, "bucket": 0, "": 0, "aws_default_region": 0, "store": [0, 1], "prefix": 0, "folder": 0, "like": 0, "boto3": 0, "retri": 0, "attempt": 0, "number": [0, 1], "client": 0, "mode": 0, "configur": 0, "docstr": 0, "format": 0, "googl": 0, "style": 0, "convent": 0, "pep": 0, "8": 0, "isort": 0, "requir": 0, "gitvers": 0, "revers": 0, "f": 0, "release_not": 0, "rst": 0, "t": [0, 1], "pre": 0, "commit": 0, "ensur": 0, "run": 0, "pytest": 0, "gener": [0, 1], "valid": [0, 1], "hyperlink": 0, "markdown": 0, "page": [0, 1], "sphinx": 0, "5": 0, "1": 0, "recommonmark": 0, "org": [0, 1], "thevickypedia": 0, "io": 0, "vignesh": 0, "rao": 0, "under": 0, "mit": 0, "kick": 1, "off": 1, "environ": 1, "variabl": 1, "code": 1, "standard": 1, "releas": 1, "note": 1, "lint": 1, "pypi": 1, "packag": 1, "runbook": 1, "licens": 1, "copyright": 1, "class": 1, "env_fil": 1, "str": 1, "o": 1, "pathlik": 1, "none": 1, "max_per_pag": 1, "int": 1, "100": 1, "instanti": 1, "object": 1, "clone": 1, "upload": 1, "keyword": 1, "argument": 1, "bring": 1, "your": 1, "own": 1, "maximum": 1, "fetch": 1, "per": 1, "profile_typ": 1, "type": 1, "return": 1, "get_al": 1, "sourcecontrol": 1, "dict": 1, "iter": 1, "through": 1, "target": 1, "avail": 1, "paramet": 1, "yield": 1, "dictionari": 1, "each": 1, "inform": 1, "clone_wiki": 1, "datastor": 1, "model": 1, "worker": 1, "json": 1, "payload": 1, "rais": 1, "If": 1, "thread": 1, "fail": 1, "cloner": 1, "bool": 1, "concurr": 1, "threadpoolexecutor": 1, 
"doesn": 1, "have": 1, "rate": 1, "limit": 1, "so": 1, "multi": 1, "safe": 1, "thi": 1, "make": 1, "depend": 1, "host": 1, "machin": 1, "commun": 1, "discuss": 1, "44515": 1, "ani": 1, "process": 1, "onc": 1, "complet": 1, "successfulli": 1, "envconfig": 1, "upload_fil": 1, "local_file_path": 1, "s3_file_path": 1, "local": 1, "path": 1, "trigger": 1, "env_load": 1, "filenam": 1, "load": 1, "base": 1, "filetyp": 1, "where": 1, "var": 1, "source_detector": 1, "detect": 1, "default_logg": 1, "consol": 1, "check_file_pres": 1, "root": 1, "subdirectori": 1, "check": 1, "presenc": 1, "ar": 1, "present": 1, "basemodel": 1, "clone_url": 1, "descript": 1, "baseset": 1, "pydant": 1, "git_api_url": 1, "git_own": 1, "git_token": 1, "git_ignor": 1, "union": 1, "logopt": 1, "aws_profile_nam": 1, "aws_region_nam": 1, "aws_bucket_nam": 1, "aws_s3_prefix": 1, "boto3_retry_attempt": 1, "boto3_retry_mod": 1, "boto3retrymod": 1, "classmethod": 1, "from_env_fil": 1, "creat": 1, "instanc": 1, "addit": 1, "featur": 1, "both": 1, "system": 1, "session": 1, "parse_sourc": 1, "pars": 1, "remov": 1, "parse_git_api_url": 1, "strip": 1, "end": 1, "parse_git_ignor": 1, "convert": 1, "lowercas": 1, "env_prefix": 1, "extra": 1, "allow": 1, "hide_input_in_error": 1, "true": 1, "strenum": 1, "control": 1, "exc": 1, "directoryexist": 1, "warn": 1, "alreadi": 1, "exist": 1, "unsupportedsourc": 1, "git2s3error": 1, "githubapierror": 1, "invalidown": 1, "invalid": 1, "invalidsourc": 1, "archiveerror": 1, "archiv": 1, "uploaderror": 1, "index": 1, "modul": 1, "search": 1}, "objects": {"git2s3.config": [[1, 0, 1, "", "DataStore"], [1, 0, 1, "", "EnvConfig"], [1, 0, 1, "", "LogOptions"], [1, 0, 1, "", "SourceControl"]], "git2s3.config.DataStore": [[1, 1, 1, "", "clone_url"], [1, 1, 1, "", "description"], [1, 1, 1, "", "name"], [1, 1, 1, "", "private"], [1, 1, 1, "", "source"]], "git2s3.config.EnvConfig": [[1, 0, 1, "", "Config"], [1, 1, 1, "", "aws_access_key_id"], [1, 1, 1, "", "aws_bucket_name"], [1, 1, 
1, "", "aws_profile_name"], [1, 1, 1, "", "aws_region_name"], [1, 1, 1, "", "aws_s3_prefix"], [1, 1, 1, "", "aws_secret_access_key"], [1, 1, 1, "", "boto3_retry_attempts"], [1, 1, 1, "", "boto3_retry_mode"], [1, 1, 1, "", "debug"], [1, 2, 1, "", "from_env_file"], [1, 1, 1, "", "git_api_url"], [1, 1, 1, "", "git_ignore"], [1, 1, 1, "", "git_owner"], [1, 1, 1, "", "git_token"], [1, 1, 1, "", "log"], [1, 2, 1, "", "parse_git_api_url"], [1, 2, 1, "", "parse_git_ignore"], [1, 2, 1, "", "parse_source"], [1, 1, 1, "", "source"]], "git2s3.config.EnvConfig.Config": [[1, 1, 1, "", "env_prefix"], [1, 1, 1, "", "extra"], [1, 1, 1, "", "hide_input_in_errors"]], "git2s3.config.LogOptions": [[1, 1, 1, "", "file"], [1, 1, 1, "", "stdout"]], "git2s3.config.SourceControl": [[1, 1, 1, "", "all"], [1, 1, 1, "", "gist"], [1, 1, 1, "", "repo"], [1, 1, 1, "", "wiki"]], "git2s3": [[1, 3, 0, "-", "exc"], [1, 3, 0, "-", "main"], [1, 3, 0, "-", "s3"], [1, 3, 0, "-", "squire"]], "git2s3.exc": [[1, 4, 1, "", "ArchiveError"], [1, 4, 1, "", "DirectoryExists"], [1, 4, 1, "", "Git2S3Error"], [1, 4, 1, "", "GitHubAPIError"], [1, 4, 1, "", "InvalidOwner"], [1, 4, 1, "", "InvalidSource"], [1, 4, 1, "", "UnsupportedSource"], [1, 4, 1, "", "UploadError"]], "git2s3.main": [[1, 0, 1, "", "Git2S3"]], "git2s3.main.Git2S3": [[1, 2, 1, "", "clone_wiki"], [1, 2, 1, "", "cloner"], [1, 2, 1, "", "get_all"], [1, 2, 1, "", "profile_type"], [1, 2, 1, "", "start"], [1, 2, 1, "", "worker"]], "git2s3.s3": [[1, 0, 1, "", "Uploader"]], "git2s3.s3.Uploader": [[1, 2, 1, "", "trigger"], [1, 2, 1, "", "upload_file"]], "git2s3.squire": [[1, 5, 1, "", "check_file_presence"], [1, 5, 1, "", "default_logger"], [1, 5, 1, "", "env_loader"], [1, 5, 1, "", "source_detector"]]}, "objtypes": {"0": "py:class", "1": "py:attribute", "2": "py:method", "3": "py:module", "4": "py:exception", "5": "py:function"}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "attribute", "Python attribute"], "2": ["py", "method", "Python 
method"], "3": ["py", "module", "Python module"], "4": ["py", "exception", "Python exception"], "5": ["py", "function", "Python function"]}, "titleterms": {"git2s3": [0, 1], "kick": 0, "off": 0, "environ": 0, "variabl": 0, "code": 0, "standard": 0, "releas": 0, "note": 0, "lint": 0, "pypi": 0, "packag": 0, "runbook": 0, "licens": 0, "copyright": 0, "welcom": 1, "": 1, "document": 1, "content": 1, "main": 1, "s3": 1, "squir": 1, "configur": 1, "except": 1, "indic": 1, "tabl": 1}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 6, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx": 56}})
\ No newline at end of file
diff --git a/git2s3/config.py b/git2s3/config.py
index c01e3de..bfaafd7 100644
--- a/git2s3/config.py
+++ b/git2s3/config.py
@@ -1,5 +1,6 @@
 import pathlib
 import sys
+import time
 from typing import List, Optional
 
 from pydantic import BaseModel, DirectoryPath, HttpUrl, field_validator
@@ -25,10 +26,10 @@ class LogOptions(StrEnum):
     file: str = "file"
 
 
-class Fields(StrEnum):
-    """Available fields to clone.
+class SourceControl(StrEnum):
+    """Available source control options to clone.
 
-    >>> Fields
+    >>> SourceControl
 
     """
 
@@ -38,14 +39,14 @@ class Fields(StrEnum):
     wiki: str = "wiki"
 
 
-class Field(BaseModel):
-    """Field model to store repository/gist information.
+class DataStore(BaseModel):
+    """DataStore model to store repository/gist information.
 
-    >>> Field
+    >>> DataStore
 
     """
 
-    field: Fields
+    source: SourceControl
     clone_url: HttpUrl
     name: str
     description: Optional[str] = None
@@ -75,8 +76,9 @@ class EnvConfig(BaseSettings):
     git_api_url: HttpUrl = "https://api.github.com/"
     git_owner: str
     git_token: str
+    git_ignore: List[str] = []
 
-    fields: Fields | List[Fields] = Fields.all
+    source: SourceControl | List[SourceControl] = SourceControl.all
     log: LogOptions = LogOptions.stdout
     debug: bool = False
 
@@ -85,7 +87,7 @@ class EnvConfig(BaseSettings):
     aws_secret_access_key: str | None = None
     aws_region_name: str | None = None
     aws_bucket_name: str | None = None
-    aws_s3_prefix: str = "github"
+    aws_s3_prefix: str = f"github_{int(time.time())}"
 
     boto3_retry_attempts: int = 10
     boto3_retry_mode: Boto3RetryMode = Boto3RetryMode.standard
@@ -106,22 +108,34 @@ def from_env_file(cls, filename: pathlib.Path) -> "EnvConfig":
         """
         return cls(_env_file=filename)
 
-    @field_validator("fields", mode="after", check_fields=True)
-    def parse_fields(cls, value: Fields | List[Fields]) -> DirectoryPath:
-        """Validate and parse 'fields' to remove 'all' from the fields option."""
+    @field_validator("source", mode="after", check_fields=True)
+    def parse_source(cls, value: SourceControl | List[SourceControl]) -> DirectoryPath:
+        """Validate and parse 'source' to remove 'all' from the source option."""
         if isinstance(value, list):
-            if value == [Fields.all] or Fields.all in value:
-                return [Fields.repo, Fields.gist, Fields.wiki]
+            if value == [SourceControl.all] or SourceControl.all in value:
+                return [SourceControl.repo, SourceControl.gist, SourceControl.wiki]
+            if SourceControl.repo not in value:
+                raise ValueError(
+                    f"{value!r} must contain {SourceControl.repo.value!r} as a source type"
+                )
             return value
-        if value == Fields.all:
-            return [Fields.repo, Fields.gist, Fields.wiki]
-        raise ValueError(f"{value!r} is not a valid field type")
+        if value == SourceControl.all:
+            return [SourceControl.repo, SourceControl.gist, SourceControl.wiki]
+        value = [value]
+        if SourceControl.repo in value:
+            return value
+        raise ValueError(f"Must contain {SourceControl.repo.value!r} as a source type")
 
     @field_validator("git_api_url", mode="after", check_fields=True)
     def parse_git_api_url(cls, value: HttpUrl) -> str:
         """Parse git_api_url stripping the ``/`` at the end."""
         return str(value).rstrip("/")
 
+    @field_validator("git_ignore", mode="after", check_fields=True)
+    def parse_git_ignore(cls, value: List[str]) -> List[str]:
+        """Convert all git_ignore values to lowercase."""
+        return [v.lower() for v in value]
+
     class Config:
         """Environment variables configuration."""
 
diff --git a/git2s3/exc.py b/git2s3/exc.py
new file mode 100644
index 0000000..39e3ddc
--- /dev/null
+++ b/git2s3/exc.py
@@ -0,0 +1,30 @@
+class DirectoryExists(ResourceWarning):
+    """Warning: Raised when clone directory already exists."""
+
+
+class UnsupportedSource(UserWarning):
+    """Warning: Raised when source is not supported."""
+
+
+class Git2S3Error(Exception):
+    """Exception: Base class for all exceptions."""
+
+
+class GitHubAPIError(Git2S3Error):
+    """Exception: Raised when failed to fetch repositories from source control."""
+
+
+class InvalidOwner(GitHubAPIError):
+    """Exception: Raised when owner is invalid."""
+
+
+class InvalidSource(GitHubAPIError):
+    """Exception: Raised when source is invalid."""
+
+
+class ArchiveError(Git2S3Error):
+    """Exception: Raised when failed to archive repositories."""
+
+
+class UploadError(Git2S3Error):
+    """Exception: Raised when failed to upload file objects to S3."""
diff --git a/git2s3/main.py b/git2s3/main.py
index 28fe006..3cb6823 100644
--- a/git2s3/main.py
+++ b/git2s3/main.py
@@ -1,5 +1,4 @@
 import logging
-import multiprocessing
 import os
 import secrets
 import shutil
@@ -7,13 +6,14 @@ import warnings
 from collections.abc import Generator
 from concurrent.futures import ThreadPoolExecutor, as_completed
-from typing import Callable, Dict
+from multiprocessing.pool import ThreadPool
+from typing import Dict
 
 import git
 import requests
 from git.exc import GitCommandError
 
-from git2s3 import config, s3, squire
+from git2s3 import config, exc, s3, squire
 
 
 class Git2S3:
@@ -37,9 +37,8 @@ def __init__(
         """Instantiates Git2S3 object to clone all repos/wiki/gists from GitHub and upload to S3."""
         assert 1 <= max_per_page <= 100, "'max_per_page' must be between 1 and 100"
         self.per_page = max_per_page
-        self.src_logger = logger
         self.env = squire.env_loader(env_file)
-        self.logger = self.src_logger or squire.default_logger(self.env)
+        self.logger = logger or squire.default_logger(self.env)
         self.session = requests.Session()
         self.session.headers = {
             "Accept": "application/vnd.github+json",
@@ -47,16 +46,25 @@ def __init__(
             "X-GitHub-Api-Version": "2022-11-28",
             "Content-Type": "application/x-www-form-urlencoded",
         }
+        # fixme: clone might fail with authentication error if git CLI isn't configured
         self.repo = git.Repo()
         self.clone_dir = os.path.join(os.getcwd(), self.env.git_owner)
+        warnings.simplefilter("always", exc.DirectoryExists)
+        warnings.simplefilter("always", exc.UnsupportedSource)
+        if os.path.isdir(self.clone_dir) and os.listdir(self.clone_dir):
+            warnings.warn(
+                "The clone directory is not empty. Deleting the contents to avoid conflicts.",
+                exc.DirectoryExists,
+            )
+            shutil.rmtree(self.clone_dir)
         profile = self.profile_type()
         if profile == "orgs":
-            if config.Fields.gist in self.env.fields:
+            if config.SourceControl.gist in self.env.source:
                 warnings.warn(
-                    f"Gists are not supported for organizations. Removing {config.Fields.gist!r} from the fields.",
-                    UserWarning,
+                    f"Gists are not supported for organizations. Removing {config.SourceControl.gist!r} from source.",
+                    exc.UnsupportedSource,
                 )
-                self.env.fields.remove(config.Fields.gist)
+                self.env.source.remove(config.SourceControl.gist)
         self.base_url = f"{self.env.git_api_url}/{profile}/{self.env.git_owner}"
 
     def profile_type(self) -> str:
@@ -82,28 +90,28 @@ def profile_type(self) -> str:
                 return "users"
         except (requests.RequestException, AssertionError):
             pass
-        raise Exception(
+        raise exc.InvalidOwner(
             f"Failed to get the profile type for {self.env.git_owner}. Please check the owner/organization name."
         )
 
-    def get_all(self, field: config.Fields) -> Generator[Dict[str, str]]:
+    def get_all(self, source: config.SourceControl) -> Generator[Dict[str, str]]:
         """Iterate through a target owner/organization to get all available repositories/gists.
 
         Args:
-            field: Field type to clone.
+            source: Source type to clone.
 
         Yields:
             Generator[Dict[str, str]]:
             Yields a dictionary of each repo's information.
         """
-        if field == config.Fields.repo:
+        if source == config.SourceControl.repo:
             endpoint = f"{self.base_url}/repos"
-        elif field == config.Fields.gist:
+        elif source == config.SourceControl.gist:
             endpoint = f"{self.base_url}/gists"
         else:
             # This won't occur programmatically, but here just in case
-            raise ValueError(
-                f"Invalid field type. Please choose from {config.Fields.repo.value!r} or {config.Fields.gist.value!r}"
+            raise exc.InvalidSource(
+                f"Invalid field type. Please choose from {config.SourceControl.repo!r} or {config.SourceControl.gist!r}"
             )
         idx = 1
         while True:
@@ -115,6 +123,10 @@ def get_all(self, field: config.Fields) -> Generator[Dict[str, str]]:
                 assert response.ok, response.text
             except (requests.RequestException, AssertionError) as error:
                 self.logger.error("Failed to fetch repos on page: %d - %s", idx, error)
+                if idx == 1:
+                    raise exc.GitHubAPIError(
+                        f"Failed to fetch {source.value}s from {self.env.git_owner!r}."
+                    )
                 break
             json_response = response.json()
             if json_response:
@@ -128,22 +140,24 @@ def get_all(self, field: config.Fields) -> Generator[Dict[str, str]]:
                 self.logger.debug("No repos found in page: %d, ending loop.", idx)
                 break
 
-    def clone_wiki(self, field: config.Field) -> None:
+    def clone_wiki(self, datastore: config.DataStore) -> None:
         """Clone all the wikis from the repository.
 
         Args:
-            field: Field model to store repository/gist information.
+            datastore: DataStore model to store repository/gist information.
         """
-        field.field = config.Fields.wiki.value
-        self.logger.debug("Cloning wiki for %s", field.name)
-        wiki_url = str(field.clone_url).replace(".git", ".wiki.git")
-        if field.private:
+        datastore.source = config.SourceControl.wiki.value
+        self.logger.debug("Cloning wiki for %s", datastore.name)
+        wiki_url = str(datastore.clone_url).replace(".git", ".wiki.git")
+        if datastore.private:
             wiki_dest = str(
-                os.path.join(self.clone_dir, field.field, "private", field.name)
+                os.path.join(
+                    self.clone_dir, datastore.source, "private", datastore.name
+                )
             )
         else:
             wiki_dest = str(
-                os.path.join(self.clone_dir, field.field, "public", field.name)
+                os.path.join(self.clone_dir, datastore.source, "public", datastore.name)
             )
         if not os.path.isdir(wiki_dest):
             os.makedirs(wiki_dest)
@@ -165,33 +179,38 @@ def worker(self, repo: Dict[str, str]) -> None:
             Exception:
                 If the thread fails to clone the repository.
         """
-        target = squire.field_detector(repo, self.env)
-        self.logger.info("Cloning %s: %s", target.field, target.name)
-        if target.private:
+        datastore = squire.source_detector(repo, self.env)
+        self.logger.info("Cloning %s: %s", datastore.source, datastore.name)
+        if datastore.private:
             repo_dest = str(
-                os.path.join(self.clone_dir, target.field.value, "private", target.name)
+                os.path.join(
+                    self.clone_dir, datastore.source.value, "private", datastore.name
+                )
             )
         else:
             repo_dest = str(
-                os.path.join(self.clone_dir, target.field.value, "public", target.name)
+                os.path.join(
+                    self.clone_dir, datastore.source.value, "public", datastore.name
+                )
             )
         # only repos have this field anyway
-        if config.Fields.wiki in self.env.fields and repo.get("has_wiki"):
-            # run as daemon and don't care about the output
+        if config.SourceControl.wiki in self.env.source and repo.get("has_wiki"):
+            # run as daemon and don't care about the output for wiki
+            # 'has_wiki' flag will always be true even if there are no files to clone
             threading.Thread(
-                target=self.clone_wiki, args=(target,), daemon=True
+                target=self.clone_wiki, args=(datastore,), daemon=True
             ).start()
         if not os.path.isdir(repo_dest):
             os.makedirs(repo_dest)
         try:
-            self.repo.clone_from(target.clone_url, repo_dest)
+            self.repo.clone_from(datastore.clone_url, repo_dest)
             try:
-                if target.description:
+                if datastore.description:
                     desc_file = os.path.join(
                         repo_dest, f"description_{secrets.token_hex(2)}.txt"
                     )
                     with open(desc_file, "w") as desc:
-                        desc.write(target.description)
+                        desc.write(datastore.description)
                         desc.flush()
             except Exception as warning:
                 # Adding description file is only an added feature, so no need to fail
@@ -200,70 +219,81 @@ def worker(self, repo: Dict[str, str]) -> None:
             if os.path.isfile(f"{repo_dest}.zip"):
                 shutil.rmtree(repo_dest)
             else:
-                self.logger.error("Failed to create a zip file for %s", target.name)
-                raise Exception(f"Failed to create a zip file for {target.name}")
+                self.logger.error("Failed to create a zip file for %s", datastore.name)
+                raise exc.ArchiveError(
+                    f"Failed to create a zip file for {datastore.name!r}"
+                )
         except GitCommandError as error:
             msg = error.stderr or error.stdout or ""
             msg = msg.strip().replace("\n", "").replace("'", "").replace('"', "")
             self.logger.error(msg)
             # Raise an exception to indicate that the thread failed
-            raise Exception(msg)
+            raise exc.Git2S3Error(msg)
 
-    def cloner(self, func: Callable, field: str) -> None:
+    def cloner(self, source: config.SourceControl) -> bool:
         """Clones all the repos/gists concurrently.
 
         Args:
-            func: Function to get all repos/gists.
-            field: Field type to clone.
+            source: Source type to clone.
+
+        See Also:
+            - Clones all the repos/gists concurrently using ThreadPoolExecutor.
+            - GitHub doesn't have a rate limit for cloning, so multi-threading is safe.
+            - This makes it depend on Git installed on the host machine.
 
         References:
             https://github.com/orgs/community/discussions/44515
+
+        Returns:
+            bool:
+            Returns a boolean flag to indicate if any of the threads failed.
         """
-        # Reload logger for child process
-        self.logger = self.src_logger or squire.default_logger(self.env)
         futures = {}
         with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
-            for repo in func(field):
+            for repo in self.get_all(source):
+                identifier = repo.get("name") or repo.get("id")
+                if identifier.lower() in self.env.git_ignore:
+                    self.logger.info("Skipping %s: '%s'", source, identifier)
+                    continue
                 future = executor.submit(self.worker, repo)
-                futures[future] = repo.get("name") or repo.get("id")
+                futures[future] = identifier
+        return_flag = True
         for future in as_completed(futures):
             if future.exception():
                 self.logger.error(
                     "Thread cloning the %s '%s' received an exception: %s",
-                    field,
+                    source,
                     futures[future],
                     future.exception(),
                 )
+                return_flag = False
+        return return_flag
 
     def start(self) -> None:
-        """Start the cloning process."""
+        """Start the cloning process and upload to S3 once cloning completes successfully."""
         self.logger.info("Starting cloning process...")
         # Both processes run concurrently, calling the same function with different arguments
         processes = [
-            multiprocessing.Process(
-                target=self.cloner,
-                args=(
-                    self.get_all,
-                    config.Fields.repo,
-                ),
+            ThreadPool(processes=1).apply_async(
+                self.cloner, args=(config.SourceControl.repo,)
             )
         ]
-        if config.Fields.gist in self.env.fields:
+        if config.SourceControl.gist in self.env.source:
             processes.append(
-                multiprocessing.Process(
-                    target=self.cloner,
-                    args=(
-                        self.get_all,
-                        config.Fields.gist,
-                    ),
+                ThreadPool(processes=1).apply_async(
+                    self.cloner, args=(config.SourceControl.gist,)
                 )
             )
-        for process in processes:
-            process.start()
-        for process in processes:
-            process.join()
+        if not all([process.get() for process in processes]):
+            self.logger.error(
+                "Cloning process did not complete successfully. Skipping S3 backup."
+            )
+            return
         self.logger.info("Cloning process completed.")
-        self.logger.info("Initiating S3 upload process...")
-        s3_upload = s3.Uploader(self.env, self.logger)
-        s3_upload.trigger()
-        self.logger.info("S3 upload process completed.")
+        if squire.check_file_presence(self.clone_dir):
+            self.logger.info("Initiating S3 upload process...")
+            s3_upload = s3.Uploader(self.env, self.logger)
+            s3_upload.trigger()
+            self.logger.info("S3 upload process completed.")
+        else:
+            self.logger.warning("No files found to upload to S3.")
diff --git a/git2s3/s3.py b/git2s3/s3.py
index 122e234..1d4fb13 100644
--- a/git2s3/s3.py
+++ b/git2s3/s3.py
@@ -6,7 +6,7 @@
 from botocore.config import Config
 from botocore.exceptions import BotoCoreError, ClientError
 
-from git2s3.config import EnvConfig
+from git2s3 import config, exc
 
 
 class Uploader:
@@ -20,7 +20,7 @@ class Uploader:
         logger: Logger object.
     """
 
-    def __init__(self, env: EnvConfig, logger: logging.Logger):
+    def __init__(self, env: config.EnvConfig, logger: logging.Logger):
         """Concurrent uploader object to upload files to S3.
 
         References:
@@ -59,7 +59,7 @@ def upload_file(
             self.s3_client.upload_file(local_file_path, self.bucket, s3_file_path)
             self.logger.info("Uploaded '%s' to 's3://%s'", s3_file_path, self.bucket)
         except (FileNotFoundError, BotoCoreError, ClientError) as error:
-            raise Exception(error)
+            raise exc.UploadError(error)
 
     def trigger(self) -> None:
         """Trigger to upload all file objects concurrently to S3."""
diff --git a/git2s3/squire.py b/git2s3/squire.py
index 0aa5865..df151a5 100644
--- a/git2s3/squire.py
+++ b/git2s3/squire.py
@@ -7,61 +7,61 @@
 
 import yaml
 
-from git2s3.config import EnvConfig, Field, Fields, LogOptions
+from git2s3 import config
 
 
-def env_loader(filename: str | os.PathLike) -> EnvConfig:
+def env_loader(filename: str | os.PathLike) -> config.EnvConfig:
     """Loads environment variables based on filetypes.
 
     Args:
         filename: Filename from where env vars have to be loaded.
 
     Returns:
-        EnvConfig:
+        config.EnvConfig:
         Returns a reference to the ``EnvConfig`` object.
     """
     env_file = pathlib.Path(filename)
     if env_file.suffix.lower() == ".json":
         with open(env_file) as stream:
             env_data = json.load(stream)
-        return EnvConfig(**{k.lower(): v for k, v in env_data.items()})
+        return config.EnvConfig(**{k.lower(): v for k, v in env_data.items()})
     elif env_file.suffix.lower() in (".yaml", ".yml"):
         with open(env_file) as stream:
             env_data = yaml.load(stream, yaml.FullLoader)
-        return EnvConfig(**{k.lower(): v for k, v in env_data.items()})
+        return config.EnvConfig(**{k.lower(): v for k, v in env_data.items()})
     elif not env_file.suffix or env_file.suffix.lower() in (
         ".text",
         ".txt",
         "",
     ):
-        return EnvConfig.from_env_file(env_file)
+        return config.EnvConfig.from_env_file(env_file)
     else:
         raise ValueError(
             "\n\tUnsupported format for 'env_file', can be one of (.json, .yaml, .yml, .txt, .text, or null)"
         )
 
 
-def field_detector(repo: Dict[str, str], env: EnvConfig) -> Field:
-    """Detects the type of field to clone and returns the Field model.
+def source_detector(repo: Dict[str, str], env: config.EnvConfig) -> config.DataStore:
+    """Detects the type of source to clone and returns the DataStore model.
 
     Args:
         repo: Repository information as a dict.
         env: Environment configuration.
 
     Returns:
-        Field:
-        Field model.
+        config.DataStore:
+        DataStore model.
     """
     if repo.get("comments_url") == f"{env.git_api_url}/gists/{repo['id']}/comments":
-        return Field(
-            field=Fields.gist,
+        return config.DataStore(
+            source=config.SourceControl.gist,
             clone_url=repo["git_pull_url"],
             name=repo["id"],
             description=repo["description"],
             private=not repo["public"],
         )
-    return Field(
-        field=Fields.repo,
+    return config.DataStore(
+        source=config.SourceControl.repo,
         clone_url=repo["clone_url"],
         name=repo["name"],
         description=repo["description"],
@@ -69,7 +69,7 @@ def field_detector(repo: Dict[str, str], env: EnvConfig) -> Field:
     )
 
 
-def default_logger(env: EnvConfig) -> logging.Logger:
+def default_logger(env: config.EnvConfig) -> logging.Logger:
     """Generates a default console logger.
 
     Args:
@@ -79,7 +79,7 @@ def default_logger(env: EnvConfig) -> logging.Logger:
         logging.Logger:
         Logger object.
     """
-    if env.log == LogOptions.file:
+    if env.log == config.LogOptions.file:
         if not os.path.isdir("logs"):
             os.mkdir("logs")
         logfile: str = datetime.now().strftime(
@@ -100,3 +100,27 @@ def default_logger(env: EnvConfig) -> logging.Logger:
     )
     logger.addHandler(hdlr=handler)
     return logger
+
+
+def check_file_presence(root: str | os.PathLike) -> bool:
+    """Get a list of all subdirectories and check for file presence.
+
+    Args:
+        root: Root directory to check for file presence.
+
+    Returns:
+        bool:
+        Returns a bool indicating if files are present in the subdirectories.
+    """
+    for subdir in [
+        os.path.join(root, subdir)
+        for subdir in os.listdir(root)
+        if os.path.isdir(os.path.join(root, subdir))
+    ]:
+        if [
+            file
+            for file in os.listdir(subdir)
+            if os.path.isfile(os.path.join(subdir, file))
+        ]:
+            return True
+    return False
diff --git a/pyproject.toml b/pyproject.toml
index b94e9f4..9c2dc9d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,7 +8,7 @@ license = { file = "LICENSE" }
 classifiers = [
     "License :: OSI Approved :: MIT License",
     "Programming Language :: Python :: 3",
-    "Development Status :: 2 - Pre-Alpha", # todo: change dev status to "Development Status :: 5 - Production/Stable"
+    "Development Status :: 3 - Alpha", # todo: change dev status to "Development Status :: 5 - Production/Stable"
     "Operating System :: MacOS :: MacOS X",
     "Operating System :: Microsoft :: Windows",
     "Operating System :: POSIX :: Linux",
@@ -22,6 +22,7 @@ packages = ["git2s3"]
 
 [tool.setuptools.dynamic]
 version = {attr = "git2s3.version"}
+dependencies = { file = ["requirements.txt"] }
 
 [project.scripts]
 # sends all the args to commandline function, where the arbitary commands as processed accordingly
diff --git a/requirements.txt b/requirements.txt
index e5821b6..2c68700 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,7 @@
-boto3==1.34.133
-botocore==1.34.133
-GitPython==3.1.43
-pydantic==2.7.4
-pydantic-settings==2.3.4
-PyYAML>=6.0.1
-requests>=2.32.3
+boto3~=1.34.133
+botocore~=1.34.133
+GitPython~=3.1.43
+pydantic~=2.7.4
+pydantic-settings~=2.3.4
+PyYAML~=6.0.1
+requests~=2.32.3
diff --git a/samples/.env b/samples/.env
new file mode 100644
index 0000000..6ea8853
--- /dev/null
+++ b/samples/.env
@@ -0,0 +1,6 @@
+GIT_TOKEN="ghp_321LnHdDexVEwvGv7dC1ktPSJZ3h9Zq7VdZK"
+GIT_OWNER="rustic-monkey"
+AWS_ACCESS_KEY_ID="PT2WG2ECGJ49V5AMFBG8"
+AWS_SECRET_ACCESS_KEY="9cviz4r/KnT3LqSRd0dWt8XAbXt3MVWX9TNX13DU"
+AWS_REGION_NAME="us-west-2"
+AWS_BUCKET_NAME="github-to-s3-backup"
diff --git a/samples/README.md b/samples/README.md
new file mode 100644
index 0000000..ba15424
--- /dev/null
+++ b/samples/README.md
@@ -0,0 +1,34 @@
+## Sample Environment Variables
+
+Environment variables can be sourced using any `plaintext` / `JSON` / `YAML` file.
+
+The filepath should be provided as an argument during object instantiation.
+
+Samples values are randomly generated strings from https://pinetools.com/random-string-generator
+
+> _By default, `Git2S3` will look for a `.env` file in the current working directory._
+> Refer [samples] directory for examples.
+
+### Examples
+
+- PlainText: [.env]
+- JSON: [secrets.json]
+- YAML: [secrets.yaml]
+
+[.env]: .env
+[secrets.json]: secrets.json
+[secrets.yaml]: secrets.yaml
+
+### Usage
+
+- **CLI**
+```shell
+git2s3 start --env "/path/to/env/file"
+```
+
+- **IDE**
+```python
+import git2s3
+backup = git2s3.Git2S3(env_file='/path/to/env/file')
+backup.start()
+```
diff --git a/samples/secrets.json b/samples/secrets.json
new file mode 100644
index 0000000..9c16ebd
--- /dev/null
+++ b/samples/secrets.json
@@ -0,0 +1,8 @@
+{
+  "GIT_TOKEN": "ghp_321LnHdDexVEwvGv7dC1ktPSJZ3h9Zq7VdZK",
+  "GIT_OWNER": "rustic-monkey",
+  "AWS_ACCESS_KEY_ID": "PT2WG2ECGJ49V5AMFBG8",
+  "AWS_SECRET_ACCESS_KEY": "9cviz4r/KnT3LqSRd0dWt8XAbXt3MVWX9TNX13DU",
+  "AWS_REGION_NAME": "us-west-2",
+  "AWS_BUCKET_NAME": "github-to-s3-backup"
+}
diff --git a/samples/secrets.yaml b/samples/secrets.yaml
new file mode 100644
index 0000000..fa75d95
--- /dev/null
+++ b/samples/secrets.yaml
@@ -0,0 +1,6 @@
+GIT_TOKEN: "ghp_321LnHdDexVEwvGv7dC1ktPSJZ3h9Zq7VdZK"
+GIT_OWNER: "rustic-monkey"
+AWS_ACCESS_KEY_ID: "PT2WG2ECGJ49V5AMFBG8"
+AWS_SECRET_ACCESS_KEY: "9cviz4r/KnT3LqSRd0dWt8XAbXt3MVWX9TNX13DU"
+AWS_REGION_NAME: "us-west-2"
+AWS_BUCKET_NAME: "github-to-s3-backup"