Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataClump detection fails when methods are separated by unrelated code #1798

Closed
le-santos opened this issue Jan 6, 2025 · 5 comments
Closed

Comments

@le-santos
Copy link
Contributor

le-santos commented Jan 6, 2025

I came across with a situation where reek's DataClump smell detection stopped warning when a new method is added between existing methods that share similar arguments.

In order to investigate this I wrote a failing test case. However, before start working on some solution, I'd like to confirm: is this the expected behavior or an issue in smell detection?

The documentation states:

a Data Clump occurs when the same two or three items frequently appear together in classes and parameter lists

However, it doesn't really specify wether these methods need to be consecutive in the source code. Could someone clarify this requirement?

Target code

# sample.rb
class Alfa
  def bravo  (echo, foxtrot); end
  def charlie(echo, foxtrot); end
  def dummy(param); end
  def delta  (echo, foxtrot); end
end

Expected behavior

reek --smell DataClump sample.rb

Inspecting 1 file(s):
S

sample.rb -- 1 warning:
  [2, 3, 5]:DataClump: Alfa takes parameters ['echo', 'foxtrot'] to 3 methods

Actual behavior

reek --smell DataClump sample.rb

Inspecting 1 file(s):

.

Nothing to warn.

Ruby and gem versions

  • Ruby 3.4.1
  • Reek 6.3.0
@mvz
Copy link
Collaborator

mvz commented Jan 7, 2025

Hi @le-santos, this detector should work even of there is unrelated code between the offending methods. So this would definitely a bug. Have you checked that it does create a warning when the methods are grouped together? I'm asking because DataClump usually allows clumps of two parameters. So this smell would only be detected if you've lowered the maximum in your configuration.

@mvz
Copy link
Collaborator

mvz commented Jan 7, 2025

@le-santos I just checked this and I was wrong. I have checked your sample with the three methods grouped together and I can confirm that DataClump does detect the clump with two parameters.

@mvz
Copy link
Collaborator

mvz commented Jan 7, 2025

@le-santos in conclusion: Yes, this is a bug and a pull request to fix this would definitely be appreciated.

@le-santos
Copy link
Contributor Author

le-santos commented Jan 7, 2025

Hi @mvz, thanks for the answer!
Analysing the code, I found that the issue is related to these 2 methods in DataClump Detector.

# lib/reek/smell_detectors/data_clump.rb
def candidate_clumps
  candidate_methods.each_cons(max_copies + 1).map do |methods|
    common_argument_names_for(methods)
  end.select do |clump|
    clump.length >= min_clump_size
  end.uniq
end

def common_argument_names_for(methods)
  methods.map(&:arg_names).inject(:&).compact.sort
end

Considering a source code as below, candidate_methods.each_cons will return groups of consecutive methods.

def a(x,y); end
def b(x,y); end
def c(z)  ; end
def d(x,y); end

candidate_methods.each_cons(3) # => [[a, b, c], [b, c, d]]

The common_argument_names_for() applies an intersection of argument names from the methods in each group. In this case, the intersection of arg_names of each method ( [[[:x, :y], [:x, :y], [:z]], [[:x, :y], [:z], [:x, :y]]]) will be an empty array, and we lose track of the existing clump.

An alternative approach would be to use combination instead of each_cons, generating all possible combinations without considering the order of items. By doing this, we can match the methods that have the same clump of args, even if mixed with other methods.

candidate_methods.combination(3) #= [[a, b, c], [a, b, d], [a, c, d], [b, c, d]]

There is one caveat, though: this may have some performance impact, since the number of items to iterate will be larger. Do you think this is a concern here?

@mvz
Copy link
Collaborator

mvz commented Jan 12, 2025

Fixed in #1799.

@mvz mvz closed this as completed Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants