fsspark -> fslite
ypriverol committed Sep 22, 2024
1 parent f15b4e8 commit ea15b18
Showing 2 changed files with 19 additions and 14 deletions.
28 changes: 15 additions & 13 deletions README.md
@@ -1,41 +1,43 @@
[![Python application](https://github.com/enriquea/fsspark/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/enriquea/fsspark/actions/workflows/python-app.yml)
[![Python Package using Conda](https://github.com/enriquea/fsspark/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/enriquea/fsspark/actions/workflows/python-package-conda.yml)
[![Python application](https://github.com/enriquea/fslite/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/enriquea/fslite/actions/workflows/python-app.yml)
[![Python Package using Conda](https://github.com/enriquea/fslite/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/enriquea/fslite/actions/workflows/python-package-conda.yml)

# fsspark
# fslite

---

## Feature selection in Spark
### Memory-Efficient, High-Performance Feature Selection Library for Big and Small Datasets

### Description

`fsspark` is a python module to perform feature selection and machine learning based on spark.
Pipelines written using `fsspark` can be divided roughly in four major stages: 1) data pre-processing, 2) univariate
`fslite` is a python module to perform feature selection and machine learning using pre-built FS pipelines.
Pipelines written using `fslite` can be divided roughly in four major stages: 1) data pre-processing, 2) univariate
filters, 3) multivariate filters and 4) machine learning wrapped with cross-validation (**Figure 1**).

`fslite` is based on our previous work [feseR](https://github.com/enriquea/feseR); previously implemented in R and caret package; publication can be found [here](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0189875).

![Feature Selection flowchart](images/fs_workflow.png)
**Figure 1**. Feature selection workflow example implemented in fsspark.
**Figure 1**. Feature selection workflow example implemented in fslite.
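
The first three stages listed in the description can be sketched in plain Python. This is only an illustration of the general idea, not the `fslite` API: the function names (`min_max_scale`, `pearson`, `select_features`), the thresholds, and the specific filters (variance cutoff, label correlation ranking, pairwise correlation pruning) are all assumptions chosen for the example. Stage 4 (machine learning with cross-validation) is left out.

```python
from statistics import mean, pvariance


def min_max_scale(col):
    """Stage 1 (pre-processing): rescale a column to the range [0, 1]."""
    lo, hi = min(col), max(col)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in col]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, labels, min_var=0.01, max_corr=0.9):
    """Hypothetical three-stage filter over a dict of name -> column."""
    # Stage 1: pre-processing.
    scaled = {name: min_max_scale(col) for name, col in features.items()}

    # Stage 2: univariate filters. Drop near-constant features, then
    # rank the rest by absolute correlation with the labels.
    kept = [name for name, col in scaled.items() if pvariance(col) > min_var]
    kept.sort(key=lambda n: abs(pearson(scaled[n], labels)), reverse=True)

    # Stage 3: multivariate filter. Walk the ranking and skip any feature
    # that is highly correlated with one that was already selected.
    selected = []
    for name in kept:
        if all(abs(pearson(scaled[name], scaled[s])) < max_corr for s in selected):
            selected.append(name)
    return selected


features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],  # redundant with "a"
    "c": [5, 5, 5, 5, 5, 5],    # constant
    "d": [3, 1, 4, 1, 5, 9],
}
labels = [0, 0, 0, 1, 1, 1]
print(select_features(features, labels))
```

In this toy input, the constant feature is removed by the univariate stage and only one of the two redundant features survives the multivariate stage.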

### Documentation

The package documentation describes the [data structures](docs/README.data.md) and
[features selection methods](docs/README.methods.md) implemented in `fsspark`.
[features selection methods](docs/README.methods.md) implemented in `fslite`.

### Installation

- pip
```bash
git clone https://github.com/enriquea/fsspark.git
cd fsspark
git clone https://github.com/bigbio/fslite.git
cd fslite
pip install . -r requirements.txt
```

- conda
```bash
git clone https://github.com/enriquea/fsspark.git
cd fsspark
git clone https://github.com/bigbio/fslite.git
cd fslite
conda env create -f environment.yml
conda activate fsspark-venv
conda activate fslite-venv
pip install . -r requirements.txt
```

5 changes: 4 additions & 1 deletion fsspark/tests/test_fsdataframe.py
@@ -107,4 +107,7 @@ def measure_memory_usage(n_samples: int, n_features: int, nan_prob = 0.01) -> fl
plt.show()

# Print results table
print(results_df.to_string(index=False))
print(results_df.to_string(index=False))

# Initialize FSDataFrame with DataFrame having sparse numerical features and insufficient memory for dense matrix
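
The new comment refers to a case where a dense matrix would not fit in memory while a sparse one would. A rough back-of-the-envelope sketch of that comparison follows. It is not part of the commit or of `FSDataFrame`: the helper names, the 8-byte value size, and the 4-byte index size are assumptions, and the sparse estimate follows the usual CSR layout (values, column indices, row pointers).

```python
def dense_bytes(n_samples, n_features, itemsize=8):
    """Bytes needed for a dense n_samples x n_features matrix."""
    return n_samples * n_features * itemsize


def csr_bytes(n_samples, n_features, density, itemsize=8, index_size=4):
    """Approximate bytes for a CSR matrix with the given fraction of non-zeros."""
    nnz = round(n_samples * n_features * density)
    # One value and one column index per non-zero, plus one row pointer
    # per row and a final end pointer.
    return nnz * (itemsize + index_size) + (n_samples + 1) * index_size


def fits_dense(n_samples, n_features, available_bytes, itemsize=8):
    """True if the dense representation fits in the given memory budget."""
    return dense_bytes(n_samples, n_features, itemsize) <= available_bytes


# Hypothetical sizes: 1000 x 1000 values with 1% non-zeros.
print(dense_bytes(1000, 1000))              # 8000000
print(csr_bytes(1000, 1000, density=0.01))  # 124004
print(fits_dense(1000, 1000, available_bytes=4_000_000))  # False
```

A test for the scenario in the comment could use an estimate like this to pick sizes where the dense form exceeds the budget and the sparse form does not.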
