fsspark -> fslite

bigbio · Sep 22, 2024 · ea15b18
1 parent f15b4e8
commit ea15b18
Showing 2 changed files with 19 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -1,41 +1,43 @@
-[![Python application](https://github.com/enriquea/fsspark/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/enriquea/fsspark/actions/workflows/python-app.yml)
-[![Python Package using Conda](https://github.com/enriquea/fsspark/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/enriquea/fsspark/actions/workflows/python-package-conda.yml)
+[![Python application](https://github.com/enriquea/fslite/actions/workflows/python-app.yml/badge.svg?branch=main)](https://github.com/enriquea/fslite/actions/workflows/python-app.yml)
+[![Python Package using Conda](https://github.com/enriquea/fslite/actions/workflows/python-package-conda.yml/badge.svg?branch=main)](https://github.com/enriquea/fslite/actions/workflows/python-package-conda.yml)
 
-# fsspark
+# fslite
 
 ---
 
-## Feature selection in Spark
+### Memory-Efficient, High-Performance Feature Selection Library for Big and Small Datasets
 
 ### Description
 
-`fsspark` is a python module to perform feature selection and machine learning based on spark.
-Pipelines written using `fsspark` can be divided roughly in four major stages: 1) data pre-processing, 2) univariate 
+`fslite` is a Python module for feature selection and machine learning using pre-built feature selection (FS) pipelines.
+Pipelines written using `fslite` can be roughly divided into four major stages: 1) data pre-processing, 2) univariate 
 filters, 3) multivariate filters and 4) machine learning wrapped with cross-validation (**Figure 1**).
 
+`fslite` is based on our previous work [feseR](https://github.com/enriquea/feseR), which was implemented in R using the caret package; the publication can be found [here](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0189875).
+
 ![Feature Selection flowchart](images/fs_workflow.png)
-**Figure 1**. Feature selection workflow example implemented in fsspark.
+**Figure 1**. Feature selection workflow example implemented in fslite.
 
 ### Documentation
 
 The package documentation describes the [data structures](docs/README.data.md) and 
-[features selection methods](docs/README.methods.md) implemented in `fsspark`.
+[feature selection methods](docs/README.methods.md) implemented in `fslite`.
 
 ### Installation
 
 - pip
 ```bash
-git clone https://github.com/enriquea/fsspark.git
-cd fsspark
+git clone https://github.com/bigbio/fslite.git
+cd fslite
 pip install . -r requirements.txt
 ```
 
 - conda
 ```bash
-git clone https://github.com/enriquea/fsspark.git
-cd fsspark
+git clone https://github.com/bigbio/fslite.git
+cd fslite
 conda env create -f environment.yml
-conda activate fsspark-venv
+conda activate fslite-venv
 pip install . -r requirements.txt
 ```
 

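The README describes `fslite` pipelines as four stages: pre-processing, univariate filters, multivariate filters, and machine learning wrapped with cross-validation. The block below is a minimal sketch of that *shape* of pipeline. It uses scikit-learn rather than `fslite`'s own API, which this commit does not show. `CorrelationFilter` is a hypothetical stand-in for a multivariate filter, and the synthetic data, `k=20`, and the `0.9` correlation threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CorrelationFilter(BaseEstimator, TransformerMixin):
    """Hypothetical multivariate filter (not part of fslite): greedily keep a
    feature only if its absolute Pearson correlation with every feature kept
    so far is below `threshold`."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold

    def fit(self, X, y=None):
        corr = np.abs(np.corrcoef(np.asarray(X), rowvar=False))
        keep = []
        for j in range(corr.shape[0]):
            if all(corr[j, k] < self.threshold for k in keep):
                keep.append(j)
        self.keep_ = np.array(keep)
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]


# Synthetic binary-classification data standing in for a real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           n_redundant=10, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                         # 1) pre-processing
    ("univariate", SelectKBest(f_classif, k=20)),        # 2) univariate filter
    ("multivariate", CorrelationFilter(threshold=0.9)),  # 3) multivariate filter
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),  # 4) ML
])

# 4) ...wrapped with 5-fold cross-validation. Every step is re-fitted on each
# training fold, so feature selection never sees the held-out fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the filters inside the cross-validated pipeline, instead of selecting features once on the full dataset, avoids leaking information from the test folds into the selection step.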
diff --git a/fsspark/tests/test_fsdataframe.py b/fsspark/tests/test_fsdataframe.py
@@ -107,4 +107,7 @@ def measure_memory_usage(n_samples: int, n_features: int, nan_prob = 0.01) -> fl
     plt.show()
 
     # Print results table
-    print(results_df.to_string(index=False))
+    print(results_df.to_string(index=False))
+
+    # Initialize FSDataFrame with DataFrame having sparse numerical features and insufficient memory for dense matrix
+
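The new comment in `test_fsdataframe.py` refers to sparse numerical features that would not fit in memory as a dense matrix. A back-of-the-envelope check for that condition is sketched below. The helpers `dense_bytes`, `csr_bytes`, and `fits_dense` are hypothetical and not part of `fsspark`/`fslite`. They assume float64 values and 32-bit integer indices, and they model a CSR layout as values plus column indices plus row pointers (the `data`, `indices`, and `indptr` arrays of `scipy.sparse.csr_matrix`). How `FSDataFrame` actually handles this case is not shown in the commit.

```python
def dense_bytes(n_samples: int, n_features: int, itemsize: int = 8) -> int:
    """Memory for a dense n_samples x n_features matrix (float64 by default)."""
    return n_samples * n_features * itemsize


def csr_bytes(n_samples: int, nnz: int, itemsize: int = 8, index_size: int = 4) -> int:
    """Approximate CSR memory: one value and one column index per stored entry,
    plus (n_samples + 1) row pointers. Assumes 32-bit indices."""
    return nnz * (itemsize + index_size) + (n_samples + 1) * index_size


def fits_dense(n_samples: int, n_features: int, available_bytes: int) -> bool:
    """Hypothetical guard: would a dense float64 copy fit in `available_bytes`?"""
    return dense_bytes(n_samples, n_features) <= available_bytes


n_samples, n_features = 1_000, 100_000
nnz = n_samples * n_features // 100  # assume 1% of entries are non-zero

print(dense_bytes(n_samples, n_features))  # 800000000 bytes (~800 MB)
print(csr_bytes(n_samples, nnz))           # 12004004 bytes (~12 MB)
print(fits_dense(n_samples, n_features, available_bytes=512 * 1024**2))  # False
```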