Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: Mesh: Add Directed Forwarding Support #66883

Closed
wants to merge 1 commit into from

Conversation

LingaoM
Copy link
Collaborator

@LingaoM LingaoM commented Dec 22, 2023

Add Directed Forwarding Support

#66480

PTS Case(not passed yet, other passed test case not list):

Directed Forwarding

Directed Forwarding Discovery
  • MESH/NODE/DF/DISC/BV-03-C
  • MESH/NODE/DF/DISC/BV-10-C
Directed Forwarding Establishment
  • MESH/NODE/DF/EST/BV-04-C
Directed Forwarding Echo
  • MESH/NODE/DF/ECHO/BI-04-C(LOOK LIKE PTS BUG)

Directed Forwarding Model

Directed Control State
  • MESH/SR/DFM/DFS/BV-04-C
Forwarding Table State
  • MESH/SR/DFM/FTS/BV-10-C
  • MESH/SR/DFM/FTS/BV-11-C
Directed Control Network Transmit State
  • MESH/SR/DFM/DCNTS/BV-01-C

Test Report:
MDF.zip

@LingaoM LingaoM added this to the future milestone Dec 22, 2023
@LingaoM LingaoM added DNM This PR should not be merged (Do Not Merge) In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on labels Dec 22, 2023
@Thalley Thalley removed their request for review December 22, 2023 13:08
@zephyrbot zephyrbot requested a review from Thalley December 23, 2023 11:04
@LingaoM LingaoM removed the request for review from Thalley December 23, 2023 11:04
@zephyrbot zephyrbot requested a review from Thalley December 23, 2023 11:34
@LingaoM LingaoM force-pushed the add_mdf_support branch 3 times, most recently from 0a2feb4 to 931d83d Compare December 23, 2023 12:20
@LingaoM LingaoM marked this pull request as draft December 23, 2023 12:32
@LingaoM LingaoM removed the DNM This PR should not be merged (Do Not Merge) label Dec 23, 2023
@LingaoM LingaoM force-pushed the add_mdf_support branch 3 times, most recently from 57d2b18 to bcb4ed3 Compare December 25, 2023 04:50
@LingaoM
Copy link
Collaborator Author

LingaoM commented Jan 3, 2024

@PavelVPV @alxelax Except for listed above for several PTS tests not passed, everything else passed, so look forward to your review now.

@LingaoM LingaoM removed the In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on label Jan 3, 2024
@Thalley Thalley removed their request for review January 3, 2024 07:01
@omkar3141
Copy link
Collaborator

omkar3141 commented Jan 3, 2024

@LingaoM Can you please provide complete list of passing or not passing PTS tests? PR description is not clear with respect to what has been passed and what did not.

@LingaoM
Copy link
Collaborator Author

LingaoM commented Jan 3, 2024

@LingaoM Can you please provide complete list of passing or not passing PTS tests? PR description is not clear with respect to what has been passed and what did not.

MDF.zip

@omkar3141
Copy link
Collaborator

@LingaoM Can you please provide complete list of passing or not passing PTS tests? PR description is not clear with respect to what has been passed and what did not.

MDF.zip

Ok. So 126 out of 466 are PASS.

@zephyrbot zephyrbot requested a review from Thalley January 3, 2024 11:17
@LingaoM
Copy link
Collaborator Author

LingaoM commented Jan 3, 2024

@LingaoM Can you please provide complete list of passing or not passing PTS tests? PR description is not clear with respect to what has been passed and what did not.

MDF.zip

Ok. So 126 out of 466 are PASS.

BTW: Only Mesh Directed Forwarding tested.

@LingaoM LingaoM force-pushed the add_mdf_support branch 5 times, most recently from bb735bb to a8709df Compare January 9, 2024 09:27
Add Directed Forwarding Support

Signed-off-by: Lingao Meng <[email protected]>
@Thalley Thalley removed their request for review January 11, 2024 09:03
@alxelax
Copy link
Collaborator

alxelax commented Jan 11, 2024

Hi @LingaoM, I've just started reviewing this PR.
Thanks for your great work. DFW is very complex feature. This is quite huge and might take a couple of days.
While I will be reviewing in details I'd like to emphasize the main architectural disadvantages that are seen already. Mesh-1.1 assumes that every subnetwork is operating with direct-forwarding algorithms independently. How do I understand that? It should be possible to have a scalable number of state machines as well as discovery and forwarding tables:

Figure 4.9: Path Origin State Machine states, events, and transitions 
  1. Could be all states encapsulated within the subnetwork structure?
  2. Could we formalize the state machine that the specification suggests in a more readable manner? I meant Zephyr's State Machine Framework: https://docs.zephyrproject.org/latest/services/smf/index.html
  3. Could we split dfw.c functionality into:
    3.1) server commands receiving and state updating over commands and state bindings;
    3.2) direct forwarding procedures like subclause 3.6.8.2 Directed forwarding procedures describes

Currently dfw.c is enormously huge and this is very difficult to maintain in the future.

If you have any questions or topics for architecture discussion we could continue in Zephyr's Mesh Discord. I expect this PR will have a lot of comments to not pollute it with architecture discussion. :)

__ASSERT_NO_MSG(false);
}

sub->dfw = dfw;
Copy link
Collaborator Author

@LingaoM LingaoM Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could be all states encapsulated within the subnetwork structure?

@alxelax I need to explain to you why DFW states is not placed in struct bt_mesh_subnet, this is because during the process of calling settings_load(), we cannot guarantee the order in which the information of netkey and dfw will be restored.

Mesh Protocol 1.1 requires that the directed forwarding states of fixed paths and subnets needs to be stored.

If you look at my implementation code, you can see that its corresponding storage path is bt/mesh/DFW/Sub/Cfg/xxx, bt/mesh/DFW/Sub/Fw/xxx/xxx/xxx and bt/mesh/DFW/Sub/Dep/xxx/xxx/xxx/xxx.

According to my implementation, when settings_load() is executed, if the dfw information is restored first, only net_idx will be temporarily stored in the struct bt_mesh_dfw_subnet. After waiting for the subnet to be restored, a one-to-one correspondence will be established between the subnet and dfw in BT_MESH_KEY_ADDED .

@LingaoM
Copy link
Collaborator Author

LingaoM commented Jan 16, 2024

Mesh-1.1 assumes that every subnetwork is operating with direct-forwarding algorithms independently. How do I understand that? It should be possible to have a scalable number of state machines as well as discovery and forwarding tables:

My understanding of your mean is to design the discovery table and forwarding table into separate like pools, rather than using the same fixed number of configurations for each subnet, right?

If you have any questions or topics for architecture discussion we could continue in Zephyr's Mesh Discord.

Due to GFW restrictions in mainland China, it may be more convenient for me to communicate using github, thanks.

Copy link
Collaborator

@alxelax alxelax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I agree with your explanation about the first point of my initial list. However, the second and third should be done.

Using Zephyr's state machine framework will reduce complexity of implementation. We will use some already done and tested pattern.
Also, splitting on server and common parts are mandatory. At the moment, we do not know which part of the current implementation will be required for dfw's client implementation. I'd prefer to keep dfw logic in common place rather than client will reuse server functionality.

Also, I think dfw integration into existing mesh code should be redesigned. At the current implementation, I see, some layers take responsibility to trigger one or another dfw algorithm. I do not think this is correct. They should send events to dfw and dfw should decide what to do according to state machine states.

Dfw implementation requires simplification. There are a couple of functions that are enormously long and complex. Also, there are some functions parts of them look similar (look at my comments).

My concentration to the about 3rd thousands line of dfw implementation went down. Obviously I missed a lot of places those should be improved or have bugs.
To be sure the quality of dfw is good enough, it requires significant babblesim testing.
Right now, I recommend keeping it as experimental feature.

@@ -587,6 +587,14 @@ uint8_t bt_mesh_app_key_del(uint16_t app_idx, uint16_t net_idx);
*/
bool bt_mesh_app_key_exists(uint16_t app_idx);

/** @brief Get the binding network key index of the application key.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** @brief Get the binding network key index of the application key.
/** @brief Get the bound subnetwork index by the application index.

Zephyr's Mesh implementation operates subnetwork term instead of network key index.

Why did you add this function as the public API instead of adding it into local cfg.h?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

*
* @param app_idx Application index.
*
* @return The binding network key index if Application is known, BT_MESH_KEY_UNUSED otherwise.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return The binding network key index if Application is known, BT_MESH_KEY_UNUSED otherwise.
* @return The subnetwork index if it is bound to provided application index, BT_MESH_KEY_UNUSED otherwise.

*
* @return The binding network key index if Application is known, BT_MESH_KEY_UNUSED otherwise.
*/
uint16_t bt_mesh_app_binding_net_key_get(uint16_t app_idx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint16_t bt_mesh_app_binding_net_key_get(uint16_t app_idx);
uint16_t bt_mesh_subnet_by_app_idx_get(uint16_t app_idx);


/** Send message with a random delay according to the Access layer transmitting rules. */
bool rnd_delay;
uint8_t send_rel:1, /*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you need to add comment above each bitfield.
This syntax /** is the marker for doxygen to pick up comment into generated documentation. The comment style that you used here will be ignored by doxygen.

@@ -24,6 +24,14 @@
extern "C" {
#endif

/** Security material */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/** Possible message security credentials */
enum bt_mesh_cred {
	BT_MESH_CRED_FLOODING, /**< Flooding security credentials */
	BT_MESH_CRED_FRIEND,       /**< Friendship security credentials */
	BT_MESH_CRED_DIRECTED,  /**< Directed security credentials */
};

Immutable-credentials is an additional tag for security credential, but not a separate type of credentials.
It might be flooding immutable, but might be just flooding.
I'd prefer to not mix these two entities in the single variable.

Btw. BT_MESH_CRED_IMMUTABLE is not analyzed anywhere in the code. Do you need it at all?


}

static void dfw_state_machine_state_set(struct dfw_state_machine *machine,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it is better to use Zephyr's state machine framework here.
Since you've already implemented it on state machine manner, it will not be a complex task to convert it to Zephyr's fsm.

dfw_forwarding_clear(fw);
}

static void dfw_path_reply_delay_expired(struct k_work *work)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very complex function with multirange nested if. Could it be split on simple functions?

return false;
}

int bt_mesh_dfw_path_request(struct bt_mesh_net_rx *rx, struct net_buf_simple *buf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more long and complex function that requires refactoring.

}
}

int bt_mesh_dfw_path_reply(struct bt_mesh_net_rx *rx, struct net_buf_simple *buf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more function for splitting and refactoring.

(void)bt_mesh_dfw_get(sub->net_idx, &state);
net_buf_simple_add_u8(&msg, state);

(void)bt_mesh_dfw_relay_get(sub->net_idx, &state);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do all such kind of functions in dfw have returned values? It is not analyzed and sent in the air as is.

Copy link
Collaborator

@PavelVPV PavelVPV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge work, @LingaoM! We really appreciate it, thanks!

I'm posting my first comments. I've just briefly gone through the interaction between the MDF and the rest of the stack. I didn't check the compliance with the spec and dfw.c code.

I agree with all @alxelax points. We need to remove the dependency on the MDF from the layers. From what I see using callbacks similar to bt_mesh_subnet_cb/bt_mesh_friend_cb/bt_mesh_lpn_cb/etc. could be an option. I wish the stack used a single events callback like we had in nRF5 SDK for Mesh, then it would be less memory consuming and would give more flexibility with regards to public API (bt_mesh_friend_cb and bt_mesh_lpn_cb are public callbacks). I think it would make sense to add such callback for cfg_srv.c to post events like subscription list change, publication state change, etc.

I also agree with @alxelax that we need to use the SMF which is well (I believe) tested and maintained in Zephyr rather than inventing something own. As well as I agree with @alxelax that we should hide all MDF state machine triggers under some API that generate events for MDF (in places where we can't use callbacks to move the code to dfw.c).


config BT_MESH_DFW_STATE_MACHINE_COUNT
int "Max number of Path Origin state Machine"
default 20
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below. use range as 20 is minimum allowed value by spec.

menu "Directed Control"

config BT_MESH_DFW_ENABLED
bool "Directed forwarding enable"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other such options: enable -> enabled (as in other such options), or put Enable in the first place, or enabled by default (we have added by default to few options some time ago to differentiate these options from options that enables support for a feature).

@@ -47,7 +48,7 @@ struct mod_pub_val {
uint8_t retransmit;
uint8_t period;
uint8_t period_div:4,
cred:1;
cred:2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it backward compatible? If cred was stored with value 1 in the old firmware, will it be restored correctly as BT_MESH_CRED_FRIEND when the firmware is updated even without MDF support?

Comment on lines +1617 to +1627
if (!ADDR_RANGE_IN(src, &fw->path_origin)) {
for (i = 0; i < ARRAY_SIZE(fw->dependent_origin); i++) {
if (ADDR_RANGE_IN(src, &fw->dependent_origin[i])) {
break;
}
}

if (i == ARRAY_SIZE(fw->dependent_origin)) {
return false;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is return true; missed somewhere here?

} relay_tables[] = {
#if defined(CONFIG_BT_MESH_RELAY)
{
.inbound_bearer = BT_MESH_NET_IF_ADV,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in this table it is only BT_MESH_NET_IF_ADV.

Comment on lines +902 to +908
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].flooding;

#if defined(CONFIG_BT_MESH_DFW)
if (outbound_security_material == BT_MESH_CRED_DIRECTED) {
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].directed;
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].flooding;
#if defined(CONFIG_BT_MESH_DFW)
if (outbound_security_material == BT_MESH_CRED_DIRECTED) {
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].directed;
}
#endif
#if defined(CONFIG_BT_MESH_DFW)
if (outbound_security_material == BT_MESH_CRED_DIRECTED) {
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].directed;
}
else
#endif
{
cred = &rx->sub->keys[SUBNET_KEY_TX_IDX(rx->sub)].flooding;
}

outbound_bearers |= relay_tables[i].conditions(adv, rx,
&outbound_security_material);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exit earlier if no outbound_bearers selected?

Comment on lines +1036 to +1068
#if defined(CONFIG_BT_MESH_DFW)
if (ctl_op <= TRANS_CTL_OP_PATH_REQ_SOLICITATION &&
ctl_op >= TRANS_CTL_OP_PATH_REQUEST) {
enum bt_mesh_feat_state state = BT_MESH_FEATURE_DISABLED;

if (rx->ctx.cred != BT_MESH_CRED_DIRECTED &&
ctl_op != TRANS_CTL_OP_PATH_ECHO_REPLY) {
return -EINVAL;
}

(void)bt_mesh_dfw_get(rx->ctx.net_idx, &state);
if (state != BT_MESH_FEATURE_ENABLED) {
return -EACCES;
}
}

switch (ctl_op) {
case TRANS_CTL_OP_PATH_REQUEST:
return bt_mesh_dfw_path_request(rx, buf);
case TRANS_CTL_OP_PATH_REPLY:
return bt_mesh_dfw_path_reply(rx, buf);
case TRANS_CTL_OP_PATH_CONFIRM:
return bt_mesh_dfw_path_confirm(rx, buf);
case TRANS_CTL_OP_PATH_ECHO_REQ:
return bt_mesh_dfw_path_echo_request(rx, buf);
case TRANS_CTL_OP_PATH_ECHO_REPLY:
return bt_mesh_dfw_path_echo_reply(rx, buf);
case TRANS_CTL_OP_DEPENDENT_NODE_UPDATE:
return bt_mesh_dfw_dependent_node_update(rx, buf);
case TRANS_CTL_OP_PATH_REQ_SOLICITATION:
return bt_mesh_dfw_path_request_solicitation(rx, buf);
}
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a function in dfw.c and move this code there?

@@ -77,6 +84,43 @@ struct bt_mesh_ctl_friend_sub_confirm {
uint8_t xact;
} __packed;

struct bt_mesh_ctl_path_request {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these structures needs to be accessible outside of dfw.c? Why not define them there?

switch (ctl_op) {
case TRANS_CTL_OP_FRIEND_UPDATE:
if (rx->ctx.cred != BT_MESH_CRED_FRIEND) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess because the rx address can be the lpn's address, but the opcode not necessarily Friend Update or Subscription Confirmation. But what about this:

if ((ctl_op == TRANS_CTL_OP_FRIEND_UPDATE ||
     ctl_op == TRANS_CTL_OP_FRIEND_SUB_CFM) &&
     rx->ctx.cred != BT_MESH_CRED_FRIEND) {
	LOG_WRN("Message from friend with wrong credentials");
	return -EINVAL;
}

this will preserve the error message. Otherwise it could be just:

if (rx->friend_cred) {
	switch (ctl_op) {
	// TRANS_CTL_OP_FRIEND_UPDATE 
	// TRANS_CTL_OP_FRIEND_SUB_CFM
...

Copy link
Collaborator

@PavelVPV PavelVPV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting more comments. Need to go through the state machine.

Comment on lines +5333 to +5335
if (IS_ENABLED(CONFIG_BT_SETTINGS)) {
dfw_cfg_store();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though that this patter always creates the unused function warning when CONFIG_BT_SETTINGS is disabled. Perhaps IS_ENABLED(CONFIG_BT_SETTINGS) should be moved inside dfw_cfg_store()?

{
uint8_t xmit = net_buf_simple_pull_u8(buf);

dfw_cfg.directed_ctl_relay_retransmit = xmit;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check if it actually changed to not trigger settings once again. Same for other states. I see that you already have such functions (e.g. bt_mesh_dfw_ctl_relay_retransmit_set) that avoid extra store. They can be used here.

goto send_status;
}

mod = get_model(elem, &buf->data[3], buf->len - 3);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check that buf->len either 2 or 4. Otherwise, return -EINVAL and abort message processing.

net_buf_simple_add_u8(&msg, status);

/* Directed Publish Policy Get Message */
if (buf->len == 4 || buf->len == 6) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the length is neither 4 nor 6, we can't send message.

uint16_t net_idx;

net_idx = sys_get_le16(buf->data);
if (net_idx == BT_MESH_KEY_UNUSED) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BT_MESH_KEY_UNUSED is an internal value. For this, every index starting from 0x1000 should be prohibited (figure 4.5).

Comment on lines +3479 to +3489
static const char *path_lifetime_to_str(enum bt_mesh_dfw_path_lifetime lifetime)
{
static const char * const strs[] = {
"12 Minuters",
"2 Hours",
"24 Hours",
"10 Days",
};

return strs[lifetime];
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it generate a unused function warning if the logs are disabled? Shouldn't it be wrapped into #if defined(CONFIG_BT_MESH_DFW_LOG_LEVEL_DBG)/#endif?

return -EINVAL;
}

friend_enable = buf->data[6];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prohibited values should be checked

Comment on lines +3148 to +3151
/*
* Ignore for all-directed-forwarding-nodes, all-nodes, and all-relays
* fixed group addresses.
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment probably belongs to the if statement at line 3141

Comment on lines +2741 to +2742
fw_new->echo_intv.ticks *= percent;
fw_new->echo_intv.ticks /= 100;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we can access ticks field of k_timeout_t? I don't see any sign that this field can be treated as public. In fact, the description for k_timeout_t says:

Timeout arguments presented to kernel APIs are stored in this opaque type

Thought I may understand this incorrectly.

Comment on lines +1556 to +1558
if (!dfw) {
__ASSERT_NO_MSG(false);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!dfw) {
__ASSERT_NO_MSG(false);
}
__ASSERT_NO_MSG(!!dfw);

Copy link
Collaborator

@PavelVPV PavelVPV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm done with the review for the current moment.

We also need some documentation page that will describe this feature (how it works, what is supported, configuration and API (dfw srv model, credentials part)).

dfw->unicast_echo_intv = cfg.unicast_echo_intv;
dfw->multicast_echo_intv = cfg.multicast_echo_intv;

LOG_DBG("Restored directed forwarding subnet configuration value");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG_DBG("Restored directed forwarding subnet configuration value");
LOG_DBG("Restored directed forwarding configuration value for subnet %#03x", net_idx);

Comment on lines +911 to +914
val.is_dependent_origin = is_dependent_origin;
val.secondary_count = dependent->secondary_count;

err = settings_delete(path);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either I don't understand something or this should be settings_save_one instead of settings_delete.

dv->forwarding_number = ++dfw->forwarding_number;
dv->metric = 0;

k_work_init_delayable(&dv->timer, dfw_initialization_discovery_expired);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you didn't use Z_WORK_DELAYABLE_INITIALIZER for this timer?

Comment on lines +1801 to +1803
if (fw->path_not_ready) {
fw->path_not_ready = false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it could be just:

Suggested change
if (fw->path_not_ready) {
fw->path_not_ready = false;
}
fw->path_not_ready = false;

Comment on lines +1028 to +1031
(void)bt_rand(&random, sizeof(random));
random %= 2000;
random += ARRAY_INDEX(state_machines, machine) * 2000;
(void)k_work_reschedule(&machine->timer, K_MSEC(random));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our implementation, when device boots up, a model starts publishing after the publication period:

if (mod->pub && mod->pub->addr != BT_MESH_ADDR_UNASSIGNED) {
int32_t ms = bt_mesh_model_pub_period_get(mod);
if (IS_ENABLED(CONFIG_BT_MESH_DFW) && mod->pub->cred == BT_MESH_CRED_DIRECTED) {
(void)bt_mesh_dfw_path_origin_state_machine_start(
bt_mesh_app_binding_net_key_get(mod->pub->key),
NULL, mod->pub->addr, true);
}
if (mod->pub->update && ms > 0) {
/* Delay the first publication after power-up for longer time (section
* 3.7.3.1):
*
* When the publication of a message is the result of a power-up, a state
* transition progress update, or completion of a state transition, multiple
* nodes may be reporting the state change at the same time. To reduce the
* probability of a message collision, these messages should be sent with a
* random delay between 20 and 500 milliseconds.
*/
uint16_t random;
random = !!mod->pub->delayable ? pub_delay_get(RANDOM_DELAY_LONG) : 0;
LOG_DBG("Starting publish timer (period %u ms, delay %u ms)", ms, random);
k_work_schedule(&mod->pub->timer, K_MSEC(ms + random));
}
}

2 seconds even multiplied by the number of state machines may be less than publication period for some models. MDF will monitor the path, but will always fail. Should we change our implementation in the access layer to start publication without waiting for the publication period? We can make it always use 20 to 500ms delay to randomize publications from several models. @omkar3141 do you have any thoughts?

Comment on lines +2446 to +2454
if (atomic_test_and_set_bit(fw->flags, BT_MESH_DFW_FORWARDING_FLAG_ECHO_REPLY)) {
LOG_ERR("Not received Echo Reply");
dfw->update_id++;
dfw_forwarding_clear(fw);

(void)bt_mesh_dfw_path_origin_state_machine_start(dfw->net_idx, NULL,
ctx.addr, false);
return;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct. This depends on the Path Origin State Machine state. If the state machine is in the Path Validation state, then according to Table 4.347, when the path validation fails due to not received echo reply, we should start the new Directed Forwarding Initialization procedure:
image
Table 4.346:
image
End of the section 3.6.8.2.6:
image
But here a new machine instance is started which is only applicable for the Path In Use state:
image
I also see that DFW_STATE_MACHINE_EVENT_PATH_VALID_FAILED event is not used.

Comment on lines +22 to +24
BT_MESH_DFW_ENABLED, /* Indicate the directed enabled or not. */
BT_MESH_DFW_RELAY_ENABLED, /* Indicate the directed relay enabled or not. */
BT_MESH_DFW_FRIEND_ENABLED, /* Indicate the directed friend enabled or not. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These names are confusing when read for example bt_mesh_dfw_set function. Could the postfix be for example _STATE: BT_MESH_DFW_STATE, BT_MESH_DFW_RELAY_STATE, BT_MESH_DFW_FRIEND_STATE?

Comment on lines +1593 to +1598
dfw_table_clear(sub->dfw, true);

dfw_subnet_cfg_clear(sub->dfw);

/* Remove All pending store flags */
atomic_clear(sub->dfw->flags);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All settings API should be called from the settings work (thread).

} relay_tables[] = {
#if defined(CONFIG_BT_MESH_RELAY)
{
.inbound_bearer = BT_MESH_NET_IF_ADV,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if I don't miss anything, now the relay from proxy to adv is broken (which was previously implemented in relay_to_adv function).

@henrikbrixandersen henrikbrixandersen removed this from the future milestone Mar 7, 2024
Copy link

github-actions bot commented May 7, 2024

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label May 7, 2024
@github-actions github-actions bot closed this May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants