Dynamic Programming c++ implementation #9

Open
wants to merge 29 commits into base: main
29 commits
bb10165
added Dynp class
npkamath Jan 2, 2024
d13dfc7
reverted extra spaces
npkamath Jan 2, 2024
5bbbaf2
reverted spaces
npkamath Jan 2, 2024
930430a
removed read_input function
npkamath Jan 2, 2024
379897a
Merge remote-tracking branch 'refs/remotes/origin/main'
npkamath Jan 2, 2024
e35cff1
removed read_input function
npkamath Jan 2, 2024
e8ccb7c
Delete PULL_REQUEST_TEMPLATE.md
npkamath Jan 2, 2024
fccca35
refactor: Remove Dynp directory
b-butler Jan 9, 2024
0c64950
fixed namespace, naming, and cleaned up comments
npkamath Jan 12, 2024
fe6bac1
added dynp.py file, fixed constructors, added column_indices for fast…
npkamath Jan 14, 2024
245a92f
fixed index system
npkamath Jan 15, 2024
0af1b08
reorganized class variables and added dynp.py functions and fit
npkamath Jan 15, 2024
66e2104
fit function added with parameter input, removed whitespace with pre…
npkamath Jan 19, 2024
a4cff6f
Merge remote-tracking branch 'upstream/main'
b-butler Jan 24, 2024
9ccf1fe
build: Switch to scikit-build-core.
b-butler Jan 24, 2024
8da0fe0
ci: Add reusable workflow to install system packages
b-butler Jan 24, 2024
943a6c0
ci: Fix package installation by using composite action
b-butler Jan 24, 2024
98ee45c
ci: Correctly specify shell for custom action
b-butler Jan 24, 2024
744b928
ci: Fix action step name formatting
b-butler Jan 24, 2024
b8dd0f2
ci: Remove trailing ":"
b-butler Jan 24, 2024
c7e6920
ci: Update package manager caches before installing
b-butler Jan 24, 2024
7771157
ci: Fix apt-get package names
b-butler Jan 24, 2024
f7287f8
ci: Fix one last package name
b-butler Jan 24, 2024
47aedc0
ci: Yet another package name change
b-butler Jan 24, 2024
4fd3a58
documentation and renaming added
npkamath Feb 26, 2024
4dd27f8
upper triangular restructured and dynp restructured
npkamath Feb 26, 2024
f51ff1c
upper triangular restructured and dynp restructured
npkamath Feb 26, 2024
c0afe6f
conditionals
npkamath Feb 26, 2024
77e3e00
naming and other cpp changes
npkamath Feb 26, 2024
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -8,7 +8,7 @@ endif()

find_package(Eigen3 REQUIRED)
find_package(TBB REQUIRED)
find_package(pybind11 CONFIG REQUIRED)
npkamath marked this conversation as resolved.
add_subdirectory(extern/pybind11)

include_directories(${PROJECT_SOURCE_DIR}/src)
add_subdirectory(src)
46 changes: 46 additions & 0 deletions dupin/detect/dynp.py
@@ -0,0 +1,46 @@
import numpy as np

import _DynP

class DynP:
    """Dynamic Programming class for calculating optimal segmentation

    Attributes:
        data (np.ndarray): Matrix storing the dataset.
        num_bkps (int): Number of breakpoints to detect.
        jump (int): Interval for checking potential breakpoints.
        min_size (int): Minimum size of a segment.
    """

    def __init__(self, data: np.ndarray, num_bkps: int, jump: int, min_size: int):
        """Initializes the DynamicProgramming instance with given parameters."""
        self.dynp = _DynP.DynamicProgramming(data, num_bkps, jump, min_size)
npkamath marked this conversation as resolved.

    def set_num_threads(self, num_threads: int):
Collaborator: In the long term, if additional C++ methods are added, we will most likely want num_threads to be controlled at the level of the whole module. Would preprocessor be a good place to have this? @b-butler, thoughts?

Member: Adding this as a _DynP module function and exporting it to dupin/util.py is probably the best solution, in my opinion.

Author: So I would just move this function to util.py, right? I would just have to import _DynP in util.py?

Member: Yes

        """Sets the number of threads for parallelization.

        Args:
            num_threads (int): The number of threads to use.
        """
        self.dynp.set_threads(num_threads)

    def return_breakpoints(self) -> list:
Member: This should be unnecessary.

Author: Can you clarify which part?

Author: I think you meant return_breakpoints. I removed it and just kept fit.

        """Returns the optimal set of breakpoints after segmentation.

        Returns:
            list: A list of integers representing the breakpoints.
        """
        return self.dynp.return_breakpoints()

    def initialize_cost_matrix(self):
        """Initializes and fills the upper triangular cost matrix for all data segments."""
        self.dynp.initialize_cost_matrix()

    def fit(self) -> list:
        """Calculates the cost matrix and returns the breakpoints.

        Returns:
            list: A list of integers representing the breakpoints.
        """
        return self.dynp.fit()
npkamath marked this conversation as resolved.
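`fit`, as documented above, first fills the cost matrix and then searches for the breakpoints. A minimal pure-Python sketch of that two-step flow follows. It is not the PR's implementation: the cost here is the squared deviation from the segment mean rather than the linear-regression cost in dupin.cpp, segments are half-open, and `build_cost_matrix` plus the exact handling of `jump` and `min_size` are assumptions modeled on common dynamic-programming change-point code. It expects a non-empty signal with `jump >= 1` and `min_size >= 1`.

```python
from functools import lru_cache


def build_cost_matrix(signal):
    """Cost of every half-open segment [start, end) with start < end."""
    n = len(signal)
    cost = {}
    for start in range(n):
        for end in range(start + 1, n + 1):
            window = signal[start:end]
            mean = sum(window) / len(window)
            # Stand-in cost: squared deviation from the segment mean.
            cost[start, end] = sum((x - mean) ** 2 for x in window)
    return cost


def fit(signal, num_bkps, jump=1, min_size=1):
    """Return the breakpoints that minimize the summed segment cost."""
    cost = build_cost_matrix(signal)

    @lru_cache(maxsize=None)
    def seg(start, end, bkps_left):
        # Base case: no breakpoints left, so [start, end) is one segment.
        if bkps_left == 0:
            return cost[start, end], ()
        best = (float("inf"), ())
        # Candidates sit on the jump grid and keep min_size points per side.
        for bkp in range(start + min_size, end - min_size + 1):
            if bkp % jump:
                continue
            tail_cost, tail_bkps = seg(bkp, end, bkps_left - 1)
            total = cost[start, bkp] + tail_cost
            if total < best[0]:
                best = (total, (bkp,) + tail_bkps)
        return best

    return list(seg(0, len(signal), num_bkps)[1])


signal = [0, 0, 0, 0, 5, 5, 5, 5, 9, 9, 9, 9]
print(fit(signal, num_bkps=2))  # [4, 8]
```

Memoizing `seg` on `(start, end, bkps_left)` is what keeps the recursion from re-solving the same sub-segment many times.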



2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -1,5 +1,5 @@
pybind11_add_module(_dupin dupininterface.cpp
dupin.h dupin.cpp
dupin.h dupin.cpp
npkamath marked this conversation as resolved.
)

set_target_properties(_dupin PROPERTIES
14 changes: 6 additions & 8 deletions src/dupin.cpp
@@ -87,7 +87,6 @@ double DynamicProgramming::cost_function(int start, int end) {
void DynamicProgramming::initialize_cost_matrix() {
scale_data();
cost_matrix.initialize(num_timesteps);

tbb::parallel_for(tbb::blocked_range<int>(0, num_timesteps),
[&](const tbb::blocked_range<int> &r) {
for (int i = r.begin(); i < r.end(); ++i) {
@@ -130,7 +129,7 @@ std::pair<double, std::vector<int>> DynamicProgramming::seg(int start, int end,
return best;
}

std::vector<int> DynamicProgramming::return_breakpoints() {
std::vector<int> DynamicProgramming::compute_breakpoints() {
auto result = seg(0, num_timesteps - 1, num_bkps);
std::vector<int> breakpoints = result.second;
std::sort(breakpoints.begin(), breakpoints.end());
npkamath marked this conversation as resolved.
@@ -139,6 +138,11 @@ std::vector<int> DynamicProgramming::return_breakpoints() {
return breakpoints;
}

std::vector<int> DynamicProgramming::fit(){
initialize_cost_matrix();
return compute_breakpoints();
}

void set_parallelization(int num_threads) {
static tbb::global_control gc(tbb::global_control::max_allowed_parallelism,
num_threads);
@@ -157,12 +161,6 @@ DynamicProgramming::getCostMatrix() {
return cost_matrix;
}

void DynamicProgramming::set_num_timesteps(int value) { num_timesteps = value; }

void DynamicProgramming::set_num_parameters(int value) {
num_parameters = value;
}

void DynamicProgramming::setDatum(const Eigen::MatrixXd &value) {
data = value;
}
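`initialize_cost_matrix` above fills an upper-triangular matrix of segment costs. The internals of the PR's `UpperTriangularMatrix` are not visible in this diff, so the following is only a sketch of one common packed layout, written in Python for brevity. The class name `UpperTriangular` and the row-by-row index formula are assumptions, not the PR's code.

```python
class UpperTriangular:
    """Packed storage for entries (i, j) with 0 <= i <= j < size."""

    def __init__(self, size):
        self.size = size
        # Row i holds size - i entries, so the total is size * (size + 1) / 2.
        self.values = [0.0] * (size * (size + 1) // 2)

    def _index(self, i, j):
        if not 0 <= i <= j < self.size:
            raise IndexError(f"({i}, {j}) is outside the upper triangle")
        # Entries stored before row i: size + (size - 1) + ... + (size - i + 1).
        row_start = i * self.size - i * (i - 1) // 2
        return row_start + (j - i)

    def __getitem__(self, key):
        return self.values[self._index(*key)]

    def __setitem__(self, key, value):
        self.values[self._index(*key)] = value


matrix = UpperTriangular(4)
matrix[1, 3] = 2.5
print(matrix[1, 3], len(matrix.values))  # 2.5 10
```

Storing only the cells with `i <= j` roughly halves the memory of a dense square matrix, which matters because the cost matrix grows quadratically with the number of timesteps.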
36 changes: 19 additions & 17 deletions src/dupin.h
@@ -88,16 +88,7 @@ class DynamicProgramming {
Eigen::MatrixXd y; // Dependent variable (labels).
Eigen::VectorXd x; // z Independent variable (time steps).
};

public:
// Default constructor.
DynamicProgramming();

// Parameterized constructor.
DynamicProgramming(const Eigen::MatrixXd &data, int num_bkps_, int jump_,
int min_size_);

// Scales the dataset using min-max normalization.
// Scales the dataset using min-max normalization.
void scale_data();

// Prepares data for linear regression.
@@ -116,17 +107,30 @@ class DynamicProgramming {
// Computes the cost of a specific data segment using linear regression.
double cost_function(int start, int end);

// Recursive function for dynamic programming segmentation.
std::pair<double, std::vector<int>> seg(int start, int end, int num_bkps);


public:
// Default constructor.
DynamicProgramming();

// Parameterized constructor.
DynamicProgramming(const Eigen::MatrixXd &data, int num_bkps_, int jump_,
int min_size_);

// Initializes and fills the cost matrix for all data segments.
void initialize_cost_matrix();

// Recursive function for dynamic programming segmentation.
std::pair<double, std::vector<int>> seg(int start, int end, int num_bkps);

// Sets the number of threads for parallelization.
void set_parallelization(int num_threads);

// Returns the optimal set of breakpoints after segmentation.
std::vector<int> return_breakpoints();
std::vector<int> compute_breakpoints();

// Calculates the cost matrix and returns the breakpoints.
std::vector<int> fit();

// Getter functions for accessing private class members.
int get_num_timesteps();
@@ -136,9 +140,7 @@ class DynamicProgramming {
DynamicProgramming::UpperTriangularMatrix &getCostMatrix();

// Setter functions for modifying private class members.
void set_num_timesteps(int value);
void set_num_parameters(int value);

void setDatum(const Eigen::MatrixXd &value);
void
setCostMatrix(const DynamicProgramming::UpperTriangularMatrix &value);
void setCostMatrix(const DynamicProgramming::UpperTriangularMatrix &value);
};
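The header documents `scale_data` as min-max normalization. As a rough illustration of that technique (written in Python, and assuming per-column scaling with constant columns mapped to zero, neither of which this diff confirms), a sketch could look like this; `min_max_scale` is a hypothetical name.

```python
def min_max_scale(rows):
    """Rescale each column of a row-major table to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - low for col, low in zip(columns, lows)]
    return [
        [
            # A constant column has span 0; map it to 0.0 (an assumption).
            (value - low) / span if span else 0.0
            for value, low, span in zip(row, lows, spans)
        ]
        for row in rows
    ]


data = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
print(min_max_scale(data))  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```

Scaling every feature to a common range keeps a single large-magnitude column from dominating the summed segment cost.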
8 changes: 4 additions & 4 deletions src/dupininterface.cpp
@@ -5,21 +5,21 @@

namespace py = pybind11;

PYBIND11_MODULE(_dupin, m) {
PYBIND11_MODULE(_DynP, m) {
npkamath marked this conversation as resolved.
py::class_<DynamicProgramming>(m, "DynamicProgramming")
.def(py::init<>())
.def_property("data", &DynamicProgramming::getDatum,
&DynamicProgramming::setDatum)
.def_property("cost_matrix", &DynamicProgramming::getCostMatrix,
&DynamicProgramming::setCostMatrix)
.def_property("num_bkps", &DynamicProgramming::get_num_bkps,
&DynamicProgramming::set_num_bkps)
.def("num_bkps", &DynamicProgramming::get_num_bkps)
.def_property("num_timesteps", &DynamicProgramming::get_num_timesteps,
&DynamicProgramming::set_num_timesteps)
.def_property("num_parameters", &DynamicProgramming::get_num_parameters,
&DynamicProgramming::set_num_parameters)
.def("initialize_cost_matrix",
&DynamicProgramming::initialize_cost_matrix)
.def("return_breakpoints", &DynamicProgramming::return_breakpoints)
.def("return_breakpoints", &DynamicProgramming::compute_breakpoints)
.def("fit", &DynamicProgramming::fit)
.def("set_threads", &DynamicProgramming::set_parallelization);
}