prompty: fix parsing of tool_calls when array in arguments (#3820) #18598

GitHub Actions / promptflow-evals test result succeeded Dec 9, 2024 in 0s

All 125 tests pass in 20m 15s

   12 files     12 suites   20m 15s ⏱️
  125 tests   125 ✅ 0 💤 0 ❌
1 500 runs  1 500 ✅ 0 💤 0 ❌

Results for commit 40c84b4.

Annotations

Check notice on line 0 in .github

github-actions / promptflow-evals test result

125 tests found

There are 125 tests, see "Raw output" for the full list of tests.
Raw output
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_batch_timeout_custom
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_batch_timeout_default
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_with_codeclient
tests.evals.unittests.test_batch_run_context.TestBatchRunContext ‑ test_with_pfclient
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator_empty_string
tests.evals.unittests.test_built_in_evaluator.TestBuiltInEvaluators ‑ test_fluency_evaluator_non_string_inputs
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_invalid_citations
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_missing_role
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_normal
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_question_answer_not_paired
tests.evals.unittests.test_chat_evaluator.TestChatEvaluator ‑ test_per_turn_results_aggregation
tests.evals.unittests.test_content_safety_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_missing_role
tests.evals.unittests.test_content_safety_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_normal
tests.evals.unittests.test_content_safety_chat_evaluator.TestChatEvaluator ‑ test_conversation_validation_question_answer_not_paired
tests.evals.unittests.test_content_safety_chat_evaluator.TestChatEvaluator ‑ test_per_turn_results_aggregation
tests.evals.unittests.test_content_safety_defect_rate.TestContentSafetyDefectRate ‑ test_content_safety_defect_rate
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_ensure_service_availability
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_ensure_service_availability_exception_capability_unavailable
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_ensure_service_availability_service_unavailable
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_evaluate_with_rai_service
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_fetch_or_reuse_token
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_fetch_result
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_fetch_result_timeout
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_get_rai_svc_url
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_get_service_discovery_url
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_get_service_discovery_url_exception
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_parse_response
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_rai_subscript_functions
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_submit_request
tests.evals.unittests.test_content_safety_rai_script.TestContentSafetyEvaluator ‑ test_submit_request_not_found
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_end_logs_if_fails
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_end_raises[FAILED-False]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_end_raises[FINISHED-False]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_end_raises[KILLED-False]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_end_raises[WRONG_STATUS-True]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_get_urls
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_lifecycle[200-False]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_lifecycle[200-True]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_lifecycle[401-False]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_lifecycle[401-True]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_local_lifecycle
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_log_artifacts_logs_error[log_artifact-register artifact]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_log_artifacts_logs_error[log_metric-save metrics]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_log_metrics_and_instance_results_logs_error
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_logs_if_not_started[log_artifact-args2-log artifact]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_logs_if_not_started[log_metric-args1-log metric]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_logs_if_not_started[write_properties_to_run_history-args0-write properties]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_run_broken_if_no_tracking_uri
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_run_logs_if_terminated
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_run_name
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_run_with_name
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_start_run_fails
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_starting_started_run[RunStatus.BROKEN]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_starting_started_run[RunStatus.STARTED]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_starting_started_run[RunStatus.TERMINATED]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_write_properties[200]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_write_properties[401]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_write_properties_to_run_history_logs_error
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_wrong_artifact_path[False-True-The path to the artifact is either not a directory or does not exist.]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_wrong_artifact_path[True-False-The run results file was not found, skipping artifacts upload.]
tests.evals.unittests.test_eval_run.TestEvalRun ‑ test_wrong_artifact_path[True-True-The path to the artifact is empty.]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data0-inputs_mapping0-I'm fine]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data1-inputs_mapping1-I'm great]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data2-inputs_mapping2-I'm fine]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data3-inputs_mapping3-I'm fine]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data4-inputs_mapping4-I'm great]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_column_mapping_target[json_data5-inputs_mapping5-I'm fine]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_target_to_data[questions.jsonl-questions_answers.jsonl-expected_columns0-_target_fn]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_apply_target_to_data[questions_ground_truth.jsonl-questions_answers_ground_truth.jsonl-expected_columns1-_target_fn2]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_content_safety_aggregation
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_evaluators_not_a_dict
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_invalid_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_invalid_evaluator_config
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_invalid_jsonl_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_main_entry_guard
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_missing_data
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_missing_required_inputs
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_missing_required_inputs_target
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_output_path[False]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_output_path[True]
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_evaluate_with_errors
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_general_aggregation
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_get_trace_destination
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_label_based_aggregation
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_renaming_column
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_target_raises_on_outputs
tests.evals.unittests.test_evaluate.TestEvaluate ‑ test_wrong_target
tests.evals.unittests.test_jailbreak_simulator.TestSimulator ‑ test_initialization_parity_with_evals
tests.evals.unittests.test_jailbreak_simulator.TestSimulator ‑ test_initialization_with_all_valid_scenarios
tests.evals.unittests.test_jailbreak_simulator.TestSimulator ‑ test_simulator_raises_validation_error_with_unsupported_scenario
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_load_and_run_evaluators
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[BleuScoreEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[ChatEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[CoherenceEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[ContentSafetyChatEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[ContentSafetyEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[F1ScoreEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[FluencyEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[GleuScoreEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[GroundednessEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[HateUnfairnessEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[IndirectAttackEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[MeteorScoreEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[ProtectedMaterialEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[QAEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[RelevanceEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[RougeScoreEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[SelfHarmEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[SexualEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[SimilarityEvaluator]
tests.evals.unittests.test_save_eval.TestSaveEval ‑ test_save_evaluators[ViolenceEvaluator]
tests.evals.unittests.test_simulator.TestSimulator ‑ test_initialization_parity_with_evals
tests.evals.unittests.test_simulator.TestSimulator ‑ test_initialization_with_all_valid_scenarios
tests.evals.unittests.test_simulator.TestSimulator ‑ test_simulator_raises_validation_error_with_unsupported_scenario
tests.evals.unittests.test_synthetic_callback_conv_bot.TestCallbackConversationBot ‑ test_generate_response_with_callback_exception
tests.evals.unittests.test_synthetic_callback_conv_bot.TestCallbackConversationBot ‑ test_generate_response_with_no_callback_response
tests.evals.unittests.test_synthetic_callback_conv_bot.TestCallbackConversationBot ‑ test_generate_response_with_valid_callback
tests.evals.unittests.test_synthetic_conversation_bot.TestConversationBot ‑ test_conversation_bot_initialization_assistant
tests.evals.unittests.test_synthetic_conversation_bot.TestConversationBot ‑ test_conversation_bot_initialization_user
tests.evals.unittests.test_synthetic_conversation_bot.TestConversationBot ‑ test_conversation_bot_initialization_user_invalid_jinja
tests.evals.unittests.test_synthetic_conversation_bot.TestConversationBot ‑ test_generate_response_first_turn_with_starter
tests.evals.unittests.test_synthetic_conversation_bot.TestConversationBot ‑ test_generate_response_with_history_and_role
tests.evals.unittests.test_utils.TestUtils ‑ test_nltk_tokenize