
[Bug]: usage-based-routing-v2 router retry logic doesn't respect retry_after or do backoff causing immediate failure #7669

Open
rob-judith opened this issue Jan 10, 2025 · 2 comments
Labels
bug Something isn't working

Comments


rob-judith commented Jan 10, 2025

What happened?

When using the usage-based-routing-v2 router, the retry logic doesn't apply any backoff for rate limit errors and ignores all router retry settings such as retry_after. All retries fire immediately and fail because the deployment is still rate limited (RateLimitError). The expected behavior is exponential backoff (as mentioned in the docs here), or using the information in the RateLimitError header to schedule the retry (mentioned here under the retry_after setting). The previous behavior in v1.55.12 appears to use the rate limit header to set the wait to 60 seconds; I've included logs for both the good and the bad behavior. Testing different versions, the immediate-retry behavior first appears in v1.56.2.

The test configuration is shown below. I've attached the logs for v1.57.5 (incorrect behavior) in the relevant log output field, and the logs for v1.55.12 (expected behavior). The command run was: litellm --config ./scratch/test.yaml --detailed_debug

litellm_settings:
  turn_off_message_logging: true
router_settings:
  enable_precall_checks: True
  routing_strategy: usage-based-routing-v2
  num_retries: 4
  retry_after: 30
  timeout: 300
model_list:
  - model_name: test
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
      aws_region_name: us-east-1
      timeout: 15
    rpm: 1
  - model_name: test
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
      aws_region_name: us-west-2
      timeout: 15
    rpm: 1

Test code:

import asyncio

import openai

client = openai.AsyncOpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

async def main():
    # Four requests against two deployments with rpm: 1 each, so the later
    # requests hit the router's rate limit and should be retried with backoff.
    for i in range(4):
        y = await client.chat.completions.create(
            model="test",
            messages=[
                {
                    "role": "user",
                    "content": "this is a test request, write a short poem"
                }
            ],
        )
        print(y)

asyncio.run(main())
> RateLimitError: Error code: 429 - {'error': {'message': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}\nReceived Model Group=test\nAvailable Model Group Fallbacks=None", 'type': 'throttling_error', 'param': None, 'code': '429'}}
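As a client-side stopgap, the calls can be wrapped with explicit backoff so the proxy's immediate retries don't surface as hard failures. This is a minimal sketch of that workaround (the helper name and delay values are my own, not part of LiteLLM or the OpenAI SDK; in practice you'd narrow the except clause to openai.RateLimitError):

```python
import asyncio

async def call_with_backoff(fn, max_retries=4, base_delay=2.0):
    """Retry an async call with exponential backoff between attempts."""
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except Exception:  # narrow to openai.RateLimitError in real use
            if attempt == max_retries:
                raise
            # Wait base_delay * 2**attempt (2s, 4s, 8s, ...) before retrying,
            # instead of failing instantly like the router currently does.
            await asyncio.sleep(base_delay * (2 ** attempt))
```

Usage would look like `await call_with_backoff(lambda: client.chat.completions.create(model="test", messages=...))`.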

At a minimum I would expect the retry logic to follow the retry_after flag in the config. Ideally it would respect the information in the RateLimitError and retry after the indicated time.
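Concretely, the wait before each retry would combine exponential backoff with the server-provided hint, something like the following (a sketch of the policy I'd expect, not LiteLLM's actual code; the function name and the 30-second floor taken from my retry_after setting are assumptions):

```python
def expected_retry_wait(attempt, retry_after_header=None, min_wait=30):
    """Seconds to wait before retry `attempt` (0-based): exponential backoff
    starting at the configured retry_after floor, overridden by the server's
    Retry-After header when one is present (but never below the floor)."""
    if retry_after_header is not None:
        return max(float(retry_after_header), min_wait)
    return min_wait * (2 ** attempt)  # 30s, 60s, 120s, ...
```

With retry_after: 30 from the config above, this would never produce the zero-delay retries seen in v1.57.5.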

Let me know if you need any more information, and how we can help.

Relevant log output

Attached as comments because I got errors that the comment was too long.

Are you a ML Ops Team?

Yes

What LiteLLM version are you on ?

v1.57.5

Twitter / LinkedIn details

@rob-judith rob-judith added the bug Something isn't working label Jan 10, 2025

rob-judith commented Jan 10, 2025

v1.55.12 logs with expected behavior.

INFO:     Started server process [36206]
INFO:     Waiting for application startup.
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3010 - litellm.proxy.proxy_server.py::startup() - CHECKING PREMIUM USER - False
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: litellm_license.py:98 - litellm.proxy.auth.litellm_license.py::is_premium() - ENTERING 'IS_PREMIUM' - LiteLLM License=None
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: litellm_license.py:107 - litellm.proxy.auth.litellm_license.py::is_premium() - Updated 'self.license_str' - None
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3021 - worker_config: {"model": null, "alias": null, "api_base": null, "api_version": "2024-07-01-preview", "debug": false, "detailed_debug": true, "temperature": null, "max_tokens": null, "request_timeout": null, "max_budget": null, "telemetry": true, "drop_params": false, "add_function_to_prompt": false, "headers": null, "save": false, "config": "./scratch/test.yaml", "use_queue": false}
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:1252 - loaded config={
    "litellm_settings": {
        "turn_off_message_logging": true
    },
    "router_settings": {
        "enable_precall_checks": true,
        "routing_strategy": "usage-based-routing-v2",
        "num_retries": 4,
        "retry_after": 30,
        "timeout": 300
    },
    "model_list": [
        {
            "model_name": "test",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                "aws_region_name": "us-east-1",
                "timeout": 15
            },
            "rpm": 1
        },
        {
            "model_name": "test",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                "aws_region_name": "us-west-2",
                "timeout": 15
            },
            "rpm": 1
        }
    ]
}
�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:1697 - �[94m setting litellm.turn_off_message_logging=True�[0m
�[92m16:20:29 - LiteLLM:DEBUG�[0m: utils.py:1892 - bedrock/anthropic.claude-3-haiku-20240307-v1:0 added to model cost map
�[92m16:20:29 - LiteLLM:DEBUG�[0m: utils.py:1892 - bedrock/anthropic.claude-3-haiku-20240307-v1:0 added to model cost map
�[92m16:20:29 - LiteLLM Router:DEBUG�[0m: router.py:3923 - 
Initialized Model List ['test', 'test']
�[92m16:20:29 - LiteLLM Router:INFO�[0m: router.py:605 - Routing strategy: usage-based-routing-v2
�[92m16:20:29 - LiteLLM Router:DEBUG�[0m: router.py:497 - Intialized router with Routing strategy: usage-based-routing-v2

Routing enable_pre_call_checks: False

Routing fallbacks: None

Routing content fallbacks: None

Routing context window fallbacks: None

Router Redis Caching=None

�[92m16:20:29 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3086 - prisma_client: None
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3238 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:417 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:423 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:589 - [PROXY]returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
�[92m16:20:35 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - current: None
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:20:35 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {}
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:2865 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '3abd901f-7a25-4cc6-a882-fdde367717b5', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db3a23170>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, 'num_retries': 4, 'litellm_trace_id': '2fded341-2d0f-49c0-86de-c75871d6d81e', 'mock_timeout': None}
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:2887 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, num_retries - 4
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '3abd901f-7a25-4cc6-a882-fdde367717b5', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db3a23170>, 'stream': False, 'litellm_trace_id': '2fded341-2d0f-49c0-86de-c75871d6d81e', 'mock_timeout': None}
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: cooldown_handlers.py:234 - retrieve cooldown models: []
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:5249 - async cooldown deployments: []
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:5252 - cooldown_deployments: []
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: router.py:5531 - cooldown deployments: []
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - Token Counter - using generic token counter, for model=
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
�[92m16:20:35 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:347 - input_tokens=17
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - returning picked lowest tpm/rpm deployment.
�[92m16:20:35 - LiteLLM Router:INFO�[0m: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1} for model: test
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - 

�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mRequest to litellm:�[0m
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mlitellm.acompletion(rpm=1, timeout=15.0, aws_region_name='us-east-1', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'api_base': None, 'caching_groups': None}, litellm_call_id='3abd901f-7a25-4cc6-a882-fdde367717b5', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db3a23170>, stream=False, litellm_trace_id='2fded341-2d0f-49c0-86de-c75871d6d81e', mock_timeout=None, model_info={'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, max_retries=0)�[0m
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - 

�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
�[92m16:20:35 - LiteLLM:DEBUG�[0m: caching_handler.py:212 - CACHE RESULT: None
�[92m16:20:35 - LiteLLM:INFO�[0m: utils.py:2699 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:2702 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-east-1'}
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:2705 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
�[92m16:20:35 - LiteLLM:DEBUG�[0m: utils.py:275 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-east-1'}
�[92m16:20:35 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {'stream': False, 'aws_region_name': 'us-east-1'}
�[92m16:20:35 - LiteLLM:DEBUG�[0m: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-east-1
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
�[92m16:20:36 - LiteLLM:DEBUG�[0m: litellm_logging.py:524 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T162035Z', 'X-Amz-Security-Token': 'IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIQCWW9G1xjheP/+g2tPELBJxCwfjW6CvSc5/z3y/Uh48DgIgeA83Eh7aNv2m8sJU2AKM21ptm3JjFJ6jrPhhRCvnFXYqmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDLyEL6hn1dpDhjdBdCrsAoQrzNmmx8+O+8Nwy7FjaxYBVtYCHb14JVBXbZE4ZzwN/WoAocvand7gZyWGDQoDPMGGEbrW1bLdbsrFPIrEJ4SDUmDNMAgOEg0JmJEFNKzMxRE3+MFcP80MBf6vSjj/X6wEIgO8nqIj0XaLjHFL7z2QgeWt87wL9Diik50fWaH6awGYnektnAHZ98KOykD4yAZQs1MOB9O8s2tz7M4aW30VLxLBkww9QE7Y0v5WkTb9F6SXZ4BlDLnIzvXXnpL0w+VIeYX5Pjm7YGp5C70g98ko425sEWytw7+0AXAQaMDxhGHQHqiIME04D8rAdeJ1LGCfpzuAqk/shZDDgIb8I7gxZoRELNWOYNKIF7u1vx8pkPAj6RAIEJrYPFNCt3OMTFMM4VOZvYmz30GWQ9NxpOSZ1hYfK4HhhhqnWqa6BedBCuD6m9qpphRjFlJ4wJT6QfQB2hy5sSZIqV1CpwjM7icguTBuiXUPtEeEIbow1JGFvAY6pgGrbf3I7MHjLeGuy1F78P39eSUCdkGKVq7TekNHVGntR2WvXpOBiduQHyL2F/coM/Z4iVXKxE/LIs/6xFtczKT27/RC3jPU7g8ixlE8MrulRmjS/sttbdNd0AdvovgYM9224QV+oQd8EytlRPR3DzxD774NYGyd0AG/02QWxRtB4o/sP6UHTlzBV4EWsVRIozJvbQ608ssEU5J/kfE5JaBZXgp2Ujtu', 'Authorization': 'AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5WT7DBIEL/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=e83e20e975d34395c084816437ba2d50b854c6f16e351eaef3a3a292a5c6d537', 'Content-Length': '174'}}
�[92m16:20:36 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92m

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIQCWW9G1xjheP/+g2tPELBJxCwfjW6CvSc5/z3y/Uh48DgIgeA83Eh7aNv2m8sJU2AKM21ptm3JjFJ6jrPhhRCvnFXYqmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDLyEL6hn1dpDhjdBdCrsAoQrzNmmx8+O+8Nwy7FjaxYBVtYCHb14JVBXbZE4ZzwN/WoAocvand7gZyWGDQoDPMGGEbrW1bLdbsrFPIrEJ4SDUmDNMAgOEg0JmJEFNKzMxRE3+MFcP80MBf6vSjj/X6wEIgO8nqIj0XaLjHFL7z2QgeWt87wL9Diik50fWaH6awGYnektnAHZ98KOykD4yAZQs1MOB9O8s2tz7M4aW30VLxLBkww9QE7Y0v5WkTb9F6SXZ4BlDLnIzvXXnpL0w+VIeYX5Pjm7YGp5C70g98ko425sEWytw7+0AXAQaMDxhGHQHqiIME04D8rAdeJ1LGCfpzuAqk/shZDDgIb8I7gxZoRELNWOYNKIF7u1vx8pkPAj6RAIEJrYPFNCt3OMTFMM4VOZvYmz30GWQ9NxpOSZ1hYfK4HhhhqnWqa6BedBCuD6m9qpphRjFlJ4wJT6QfQB2hy5sSZIqV1CpwjM7icguTBuiXUPtEeEIbow1JGFvAY6pgGrbf3I7MHjLeGuy1F78P39eSUCdkGKVq7TekNHVGntR2WvXpOBiduQHyL2F/coM/Z4iVXKxE/LIs/6xFtczKT27/RC3jPU7g8ixlE8MrulRmjS/sttbdNd0AdvovgYM9224QV+oQd8EytlRPR3DzxD774NYGyd0AG/02QWxRtB4o/sP6UH********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5WT7DBIEL/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=e83e20e975d34395c084********************************************' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'
�[0m

�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - RAW RESPONSE:
{"metrics":{"latencyMs":901},"output":{"message":{"content":[{"text":"Here's a short poem for you:\n\nWhispers in the gentle breeze,\nPetals dance, a symphony.\nMoments fleeting, yet profound,\nNature's beauty all around."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":46,"totalTokens":63}}


�[92m16:20:37 - LiteLLM:DEBUG�[0m: main.py:5317 - raw model_response: {"metrics":{"latencyMs":901},"output":{"message":{"content":[{"text":"Here's a short poem for you:\n\nWhispers in the gentle breeze,\nPetals dance, a symphony.\nMoments fleeting, yet profound,\nNature's beauty all around."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":46,"totalTokens":63}}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: None 
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.75e-05
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db3a23170>>
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Success Call: Cache_hit=None
�[92m16:20:37 - LiteLLM Router:INFO�[0m: router.py:943 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0)�[32m 200 OK�[0m
�[92m16:20:37 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1437.052 
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:2630 - Async Response: ModelResponse(id='chatcmpl-a14d18c9-9e4d-4693-b0a6-6907612fbae3', created=1736526037, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Here's a short poem for you:\n\nWhispers in the gentle breeze,\nPetals dance, a symphony.\nMoments fleeting, yet profound,\nNature's beauty all around.", role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=46, prompt_tokens=17, total_tokens=63, completion_tokens_details=None, prompt_tokens_details=None))
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.75e-05
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Async Success Call, cache_hit=None
�[92m16:20:37 - LiteLLM:DEBUG�[0m: litellm_logging.py:970 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:20:37 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1437.052 
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.75e-05
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 63, 'current_rpm': 1}, precise_minute: 2025-01-10-16-20

#------------------------------------------------------------#
#                                                            #
#           'It would help me if you could add...'            #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#

 Thank you for using LiteLLM! - Krrish & Ishaan



Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new


LiteLLM: Proxy initialized with Config, Set models:
    test
    test
INFO:     127.0.0.1:39420 - "POST /chat/completions HTTP/1.1" 200 OK
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3238 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:417 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:423 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:589 - [PROXY]returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
�[92m16:20:37 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - current: {'current_requests': 0, 'current_tpm': 63, 'current_rpm': 1}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:20:37 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {}
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:2865 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '670ce3fe-4d41-43b3-8a99-2c041232fdf4', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475310>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, 'num_retries': 4, 'litellm_trace_id': '3ca5ed38-c71c-48ac-adbb-d5333cc8e8cc', 'mock_timeout': None}
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:2887 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, num_retries - 4
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '670ce3fe-4d41-43b3-8a99-2c041232fdf4', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475310>, 'stream': False, 'litellm_trace_id': '3ca5ed38-c71c-48ac-adbb-d5333cc8e8cc', 'mock_timeout': None}
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: cooldown_handlers.py:234 - retrieve cooldown models: []
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:5249 - async cooldown deployments: []
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:5252 - cooldown_deployments: []
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: router.py:5531 - cooldown deployments: []
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Token Counter - using generic token counter, for model=
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
�[92m16:20:37 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:347 - input_tokens=17
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - returning picked lowest tpm/rpm deployment.
�[92m16:20:37 - LiteLLM Router:INFO�[0m: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1} for model: test
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - 

�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mRequest to litellm:�[0m
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mlitellm.acompletion(rpm=1, timeout=15.0, aws_region_name='us-west-2', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None}, litellm_call_id='670ce3fe-4d41-43b3-8a99-2c041232fdf4', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475310>, stream=False, litellm_trace_id='3ca5ed38-c71c-48ac-adbb-d5333cc8e8cc', mock_timeout=None, model_info={'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, max_retries=0)�[0m
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - 

�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
�[92m16:20:37 - LiteLLM:DEBUG�[0m: caching_handler.py:212 - CACHE RESULT: None
�[92m16:20:37 - LiteLLM:INFO�[0m: utils.py:2699 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:2702 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-west-2'}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:2705 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-west-2'}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {'stream': False, 'aws_region_name': 'us-west-2'}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-west-2
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
�[92m16:20:37 - LiteLLM:DEBUG�[0m: litellm_logging.py:524 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T162037Z', 'X-Amz-Security-Token': 'IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJGMEQCIH0Jix5AVHvpMJD3krIC5MvLHJoF3AapKQbB46lGf+dJAiAXCn7nmc6UNjNPsk7LZySXcINqmFigJAMA0WPJahEC1yqYAwip//////////8BEAAaDDIxMTEyNTY0OTAxOSIMQalltcrRRiRsCG2UKuwCIJ4tJ1GJl5DJEyGlV2sUzrnv3EcOMVlcWroE9R8OrPk4yz80AW0yzJjeUp/kMCfG8ZdSMr2ZAFwXB662PgJegbnJEWFpfE21yHzTeYxyrkX1TuFT+7YXLGoiZhAa06aWhxxGU0TFq/X9EIzj4tYxkvkqts+rxG6YIVyPSi1E1RqaDPskmVCYe5JDE3/E1c98J+YjUM8DwMJQUAnFRC6X5mMI1Cg7LT3M5EF1gL66cV1GdexgF0EgBel3ER2f43zTBwCuB2hIs8nAttlZwGnb/EpR4LMmMEUR3iH+YAUdGaDB7d+HHr3uv/9GOn95xURB556ae9z4A/AOgwp+0fGEKHROwm3cPm3fckRCAYpvcXQLhsjdFbRAFIzUuQaTJHy8W8nYzKhHH9SG/QbnFqk5djUtSV2Ltrz/lN5OezxbeK6wYOw1RsQ9UE5PGJnafSVoqpUiUtQwIs0S3uycgOz7p2njHZoCERonMJMN4TDVkYW8BjqnAZ7f3lgXOdOHa0SyefBWGEXj8UdZ9zZzMocEpOIn1muICkYjBVDjt0yhkwbeSH3uBuENHtdMJ47VitlUznCRpBG2xCDVcpbjNi9iOcJU+55Y+4fbf9BWq5G9fKyvVWg7YckG5iKonI5P28zZyIX/qXWZCLOugPNWyO8Q3qntof24GbYAVUZE0MFV6tT9hz/fwytHiZIC9xCuIfxMZdgDArChkgY8yEKH', 'Authorization': 'AWS4-HMAC-SHA256 Credential=ASIATCKARXZ55DUDQ4LI/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=dda79d0dcb6d0aaf446f5f14a4b81c28ce7158bed1b2a8fc6d457b538bc4989a', 'Content-Length': '174'}}
�[92m16:20:37 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92m

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJGMEQCIH0Jix5AVHvpMJD3krIC5MvLHJoF3AapKQbB46lGf+dJAiAXCn7nmc6UNjNPsk7LZySXcINqmFigJAMA0WPJahEC1yqYAwip//////////8BEAAaDDIxMTEyNTY0OTAxOSIMQalltcrRRiRsCG2UKuwCIJ4tJ1GJl5DJEyGlV2sUzrnv3EcOMVlcWroE9R8OrPk4yz80AW0yzJjeUp/kMCfG8ZdSMr2ZAFwXB662PgJegbnJEWFpfE21yHzTeYxyrkX1TuFT+7YXLGoiZhAa06aWhxxGU0TFq/X9EIzj4tYxkvkqts+rxG6YIVyPSi1E1RqaDPskmVCYe5JDE3/E1c98J+YjUM8DwMJQUAnFRC6X5mMI1Cg7LT3M5EF1gL66cV1GdexgF0EgBel3ER2f43zTBwCuB2hIs8nAttlZwGnb/EpR4LMmMEUR3iH+YAUdGaDB7d+HHr3uv/9GOn95xURB556ae9z4A/AOgwp+0fGEKHROwm3cPm3fckRCAYpvcXQLhsjdFbRAFIzUuQaTJHy8W8nYzKhHH9SG/QbnFqk5djUtSV2Ltrz/lN5OezxbeK6wYOw1RsQ9UE5PGJnafSVoqpUiUtQwIs0S3uycgOz7p2njHZoCERonMJMN4TDVkYW8BjqnAZ7f3lgXOdOHa0SyefBWGEXj8UdZ9zZzMocEpOIn1muICkYjBVDjt0yhkwbeSH3uBuENHtdMJ47VitlUznCRpBG2xCDVcpbjNi9iOcJU+55Y+4fbf9BWq5G9fKyvVWg7YckG5iKonI5P28zZyIX/qXWZCLOugPNWyO8Q3qntof24GbYAVUZE********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIATCKARXZ55DUDQ4LI/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=dda79d0dcb6d0aaf446f********************************************' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'
�[0m

�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - RAW RESPONSE:
{"metrics":{"latencyMs":783},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nWhispers of the wind,\nCarrying secrets that transcend.\nNature's symphony unfolds,\nRevealing tales yet untold."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":42,"totalTokens":59}}


�[92m16:20:38 - LiteLLM:DEBUG�[0m: main.py:5317 - raw model_response: {"metrics":{"latencyMs":783},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nWhispers of the wind,\nCarrying secrets that transcend.\nNature's symphony unfolds,\nRevealing tales yet untold."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":42,"totalTokens":59}}
�[92m16:20:38 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: None 
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.25e-05
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475310>>
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Success Call: Cache_hit=None
�[92m16:20:38 - LiteLLM Router:INFO�[0m: router.py:943 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0)�[32m 200 OK�[0m
�[92m16:20:38 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1316.362 
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:2630 - Async Response: ModelResponse(id='chatcmpl-0b1b2cfa-83d8-4cf2-8102-382fee1fba05', created=1736526038, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Here is a short poem for you:\n\nWhispers of the wind,\nCarrying secrets that transcend.\nNature's symphony unfolds,\nRevealing tales yet untold.", role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=42, prompt_tokens=17, total_tokens=59, completion_tokens_details=None, prompt_tokens_details=None))
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.25e-05
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Async Success Call, cache_hit=None
�[92m16:20:38 - LiteLLM:DEBUG�[0m: litellm_logging.py:970 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:20:38 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1316.362 
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 5.25e-05
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 122, 'current_rpm': 2}, precise_minute: 2025-01-10-16-20
INFO:     127.0.0.1:39420 - "POST /chat/completions HTTP/1.1" 200 OK
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3238 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:417 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:423 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:589 - [PROXY]returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
�[92m16:20:38 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - current: {'current_requests': 0, 'current_tpm': 122, 'current_rpm': 2}
�[92m16:20:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:20:38 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {}
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:2865 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, 'num_retries': 4, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}
16:20:38 - LiteLLM Router:DEBUG: router.py:2887 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, num_retries - 4
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:20:38 - LiteLLM Router:DEBUG: router.py:5249 - async cooldown deployments: []
16:20:38 - LiteLLM Router:DEBUG: router.py:5252 - cooldown_deployments: []
16:20:38 - LiteLLM Router:DEBUG: router.py:5531 - cooldown deployments: []
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM:DEBUG: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:20:38 - LiteLLM:DEBUG: utils.py:275 - Token Counter - using generic token counter, for model=
16:20:38 - LiteLLM:DEBUG: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:20:38 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:347 - input_tokens=17
16:20:38 - LiteLLM:DEBUG: utils.py:275 - returning picked lowest tpm/rpm deployment.
16:20:38 - LiteLLM Router:INFO: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1} for model: test
�[92m16:20:38 - LiteLLM:DEBUG�[0m: litellm_logging.py:1810 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:182 - Attempting to add 0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4 to cooldown list
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:117 - percent fails for deployment = 0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4, percent fails = 0.0, num successes = 1, num fails = 0
16:20:38 - LiteLLM Router:INFO: router.py:955 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0) Exception litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: [('0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', {'exception_received': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'status_code': '429', 'timestamp': 1736526038.3674307, 'cooldown_time': 5})]
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 122, 'current_rpm': 2}
16:20:38 - LiteLLM:DEBUG: cooldown_callbacks.py:33 - In router_cooldown_event_callback - updating prometheus
16:20:38 - LiteLLM:DEBUG: get_api_base.py:63 - Error occurred in getting api base - litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=test
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 
'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}]}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: [('0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', {'exception_received': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'status_code': '429', 'timestamp': 1736526038.3674307, 'cooldown_time': 5})]
16:20:38 - LiteLLM Router:DEBUG: router.py:5249 - async cooldown deployments: ['0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4']
16:20:38 - LiteLLM Router:DEBUG: router.py:5252 - cooldown_deployments: ['0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4']
16:20:38 - LiteLLM Router:DEBUG: router.py:5531 - cooldown deployments: ['0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4']
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM:DEBUG: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:20:38 - LiteLLM:DEBUG: utils.py:275 - Token Counter - using generic token counter, for model=
16:20:38 - LiteLLM:DEBUG: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:20:38 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:347 - input_tokens=17
16:20:38 - LiteLLM:DEBUG: utils.py:275 - returning picked lowest tpm/rpm deployment.
16:20:38 - LiteLLM Router:INFO: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1} for model: test
�[92m16:20:38 - LiteLLM:DEBUG�[0m: litellm_logging.py:1810 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:182 - Attempting to add 116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47 to cooldown list
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:117 - percent fails for deployment = 116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47, percent fails = 0.0, num successes = 1, num fails = 0
16:20:38 - LiteLLM Router:INFO: router.py:955 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0) Exception litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1
�[92m16:20:38 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:20:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: [('116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', {'exception_received': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'status_code': '429', 'timestamp': 1736526038.3721502, 'cooldown_time': 5}), ('0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', {'exception_received': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'status_code': '429', 'timestamp': 1736526038.3674307, 'cooldown_time': 5})]
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:20:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 122, 'current_rpm': 2}
16:20:38 - LiteLLM:DEBUG: cooldown_callbacks.py:33 - In router_cooldown_event_callback - updating prometheus
16:20:38 - LiteLLM:DEBUG: get_api_base.py:63 - Error occurred in getting api base - litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=test
 Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers
�[92m16:21:38 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'api_base': None, 'caching_groups': None, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 
'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': 'litellm.RateLimitError: Deployment over defined rpm limit=1. 
current usage=1', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 
'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'api_base': None, 'caching_groups': None}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}]}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}
�[92m16:21:38 - LiteLLM Router:DEBUG�[0m: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:21:38 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:21:38 - LiteLLM Router:DEBUG: router.py:5249 - async cooldown deployments: []
16:21:38 - LiteLLM Router:DEBUG: router.py:5252 - cooldown_deployments: []
16:21:38 - LiteLLM Router:DEBUG: router.py:5531 - cooldown deployments: []
�[92m16:21:38 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - Token Counter - using generic token counter, for model=
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
�[92m16:21:38 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:347 - input_tokens=17
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - returning picked lowest tpm/rpm deployment.
�[92m16:21:38 - LiteLLM Router:INFO�[0m: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1} for model: test
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - 

16:21:38 - LiteLLM:DEBUG: utils.py:275 - Request to litellm:
�[92m16:21:38 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mlitellm.acompletion(rpm=1, timeout=15.0, aws_region_name='us-east-1', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'api_base': None, 'caching_groups': None, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': 'litellm.RateLimitError: Deployment over defined rpm limit=1. current usage=1', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 
'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': 'litellm.RateLimitError: Deployment over defined rpm limit=1. 
current usage=1', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 
'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'api_base': None, 'caching_groups': None}, 'litellm_call_id': '01fb92e2-3e08-4f33-9c61-9c7bebf5620a', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, 'model': 'test', 'stream': False, 'litellm_trace_id': '281c08c9-3929-435b-ad24-cd7e60bf83ad', 'mock_timeout': None}]}, litellm_call_id='01fb92e2-3e08-4f33-9c61-9c7bebf5620a', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>, stream=False, litellm_trace_id='281c08c9-3929-435b-ad24-cd7e60bf83ad', mock_timeout=None, model_info={'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, max_retries=0)�[0m
16:21:38 - LiteLLM:DEBUG: utils.py:275 - 

16:21:38 - LiteLLM:DEBUG: utils.py:275 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
16:21:38 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
16:21:38 - LiteLLM:INFO: utils.py:2699 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
16:21:38 - LiteLLM:DEBUG: utils.py:2702 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-east-1'}
16:21:38 - LiteLLM:DEBUG: utils.py:2705 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
16:21:38 - LiteLLM:DEBUG: utils.py:275 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-east-1'}
16:21:38 - LiteLLM:DEBUG: litellm_logging.py:404 - self.optional_params: {'stream': False, 'aws_region_name': 'us-east-1'}
16:21:38 - LiteLLM:DEBUG: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-east-1
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
16:21:38 - LiteLLM:DEBUG: litellm_logging.py:524 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T162138Z', 'X-Amz-Security-Token': '*****', 'Authorization': 'AWS4-HMAC-SHA256 Credential=*****/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=*****', 'Content-Length': '174'}}
16:21:38 - LiteLLM:DEBUG: utils.py:275 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: *****' -H 'Authorization: AWS4-HMAC-SHA256 Credential=*****/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=*****' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'

16:21:39 - LiteLLM:DEBUG: utils.py:275 - RAW RESPONSE:
{"metrics":{"latencyMs":814},"output":{"message":{"content":[{"text":"Here's a short poem for your test request:\n\nWhispers of the wind,\nDancing on the autumn leaves.\nA moment in time, stilled."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":37,"totalTokens":54}}


16:21:39 - LiteLLM:DEBUG: main.py:5317 - raw model_response: {"metrics":{"latencyMs":814},"output":{"message":{"content":[{"text":"Here's a short poem for your test request:\n\nWhispers of the wind,\nDancing on the autumn leaves.\nA moment in time, stilled."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":37,"totalTokens":54}}
16:21:39 - LiteLLM:DEBUG: cost_calculator.py:576 - completion_response response ms: None 
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 4.6250000000000006e-05
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750dab55dbb0>>
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Logging Details LiteLLM-Success Call: Cache_hit=None
16:21:39 - LiteLLM Router:INFO: router.py:943 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0) 200 OK
16:21:39 - LiteLLM:DEBUG: cost_calculator.py:576 - completion_response response ms: 1203.655 
16:21:39 - LiteLLM Router:DEBUG: router.py:2630 - Async Response: ModelResponse(id='chatcmpl-b652421c-b32b-45e0-be3b-9d3009b6d876', created=1736526099, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Here's a short poem for your test request:\n\nWhispers of the wind,\nDancing on the autumn leaves.\nA moment in time, stilled.", role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=37, prompt_tokens=17, total_tokens=54, completion_tokens_details=None, prompt_tokens_details=None))
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 4.6250000000000006e-05
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Logging Details LiteLLM-Async Success Call, cache_hit=None
16:21:39 - LiteLLM:DEBUG: litellm_logging.py:970 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
16:21:39 - LiteLLM:DEBUG: cost_calculator.py:576 - completion_response response ms: 1203.655 
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 4.6250000000000006e-05
16:21:39 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
16:21:39 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
16:21:39 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
16:21:39 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 54, 'current_rpm': 1}, precise_minute: 2025-01-10-16-21
INFO:     127.0.0.1:39420 - "POST /chat/completions HTTP/1.1" 200 OK
16:21:39 - LiteLLM Proxy:DEBUG: proxy_server.py:3238 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
16:21:39 - LiteLLM Proxy:DEBUG: litellm_pre_call_utils.py:417 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:21:39 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:423 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:21:39 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:589 - [PROXY]returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
16:21:39 - LiteLLM Proxy:DEBUG: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

16:21:39 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
16:21:39 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - current: {'current_requests': 0, 'current_tpm': 54, 'current_rpm': 1}
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
16:21:39 - LiteLLM:DEBUG: litellm_logging.py:404 - self.optional_params: {}
�[92m16:21:39 - LiteLLM Router:DEBUG�[0m: router.py:2865 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '7dd36634-60d9-432e-aede-3d95812fd0d2', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475970>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, 'num_retries': 4, 'litellm_trace_id': '957f1064-57b4-47c8-b1f1-77f92712416f', 'mock_timeout': None}
16:21:39 - LiteLLM Router:DEBUG: router.py:2887 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x750db42e2900>>, num_retries - 4
�[92m16:21:39 - LiteLLM Router:DEBUG�[0m: router.py:848 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '7dd36634-60d9-432e-aede-3d95812fd0d2', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475970>, 'stream': False, 'litellm_trace_id': '957f1064-57b4-47c8-b1f1-77f92712416f', 'mock_timeout': None}
16:21:39 - LiteLLM Router:DEBUG: router.py:5188 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:21:39 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:21:39 - LiteLLM Router:DEBUG: router.py:5249 - async cooldown deployments: []
16:21:39 - LiteLLM Router:DEBUG: router.py:5252 - cooldown_deployments: []
16:21:39 - LiteLLM Router:DEBUG: router.py:5531 - cooldown deployments: []
16:21:39 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:434 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '116a724433feaddb4b4ff3a149e92eb6f3ab29540b3f257ccbdd5addf11d4a47', 'db_model': False}, 'rpm': 1}, {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1}]
16:21:39 - LiteLLM:DEBUG: utils.py:275 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:21:39 - LiteLLM:DEBUG: utils.py:275 - Token Counter - using generic token counter, for model=
16:21:39 - LiteLLM:DEBUG: utils.py:275 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:21:39 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:347 - input_tokens=17
16:21:39 - LiteLLM:DEBUG: utils.py:275 - returning picked lowest tpm/rpm deployment.
16:21:39 - LiteLLM Router:INFO: router.py:5355 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 1, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'rpm': 1} for model: test
16:21:39 - LiteLLM:DEBUG: utils.py:275 - 

�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mRequest to litellm:�[0m
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92mlitellm.acompletion(rpm=1, timeout=15.0, aws_region_name='us-west-2', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.55.12', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, 'api_base': None, 'caching_groups': None}, litellm_call_id='7dd36634-60d9-432e-aede-3d95812fd0d2', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475970>, stream=False, litellm_trace_id='957f1064-57b4-47c8-b1f1-77f92712416f', mock_timeout=None, model_info={'id': '0e92b760908db2649c85a28e1e22909633b2adec158b46d9e38ba2bb254c61b4', 'db_model': False}, max_retries=0)�[0m
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - 

�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
�[92m16:21:39 - LiteLLM:DEBUG�[0m: caching_handler.py:212 - CACHE RESULT: None
�[92m16:21:39 - LiteLLM:INFO�[0m: utils.py:2699 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:2702 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-west-2'}
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:2705 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-west-2'}
�[92m16:21:39 - LiteLLM:DEBUG�[0m: litellm_logging.py:404 - self.optional_params: {'stream': False, 'aws_region_name': 'us-west-2'}
�[92m16:21:39 - LiteLLM:DEBUG�[0m: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-west-2
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
�[92m16:21:39 - LiteLLM:DEBUG�[0m: litellm_logging.py:524 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T162139Z', 'X-Amz-Security-Token': 'IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIFJO/kSgjDAwG/CIFEJvcAhGygsa7cQwjIfDZCpwE2obAiEA5Z5xQzAunni5zoYcG55v/WIwMgif+tcVQYl9byKZGm0qmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDI/wJXwvJzvx0VxMyCrsArxBHEQbFJfdsmkt7Ibf9I21bLgV6ExpzqpRtjikIWPE4CAktEKNIPURrf74zfDTfkfQyNc1QdGcI5BFjJj99IfuIQhOn85NJfQLfX4RyTTFeNKQ4ITPbzcwHCwtaz1NpiR5w88TccNK1o24cezrzyV2R3wBUfOuD7h6WFDdGWicrgfEPvIBKI2nI2GD2hJ0xE1bXCXVgC/UCgnxW3Z2yk5TzBTsn27ZmsptpcbQkhBYhtiYkdGq9URPmPWutzpRZcoCXP1/PGS+6/xvOwFrEBw5h72wMmqRj3LaZNF+MP5RBXakidLMSmq1FBLLr4bNfN2yDCnzqAUS89K6MJG51hQ4/oki/4ZiSbYcguOcoux6+5uxMgGRjpfn8rCZOopniqAT7s3Fhvn14taVDVlOVMeMT77hyh/8EQdLGVIjuF0quqPVPJVhsEzG3Q6EJTxCUs43VahXkfSjliKgkvFV3bUbEWv2dYpzy8zMOt4wk5KFvAY6pgFIWnfejfTq11NcKj3OrTqH8KXM1CUWT+CwzMK4HbQaBonPmzjmEemj7y6g9eL2EJ+OXXhPqiKhglP1NPIHc3AxMKmSu4i2x4hPT+K/B8Rkudg4jtzVsXVj16u9WV6D0/50uo/+dW2dXNqVV6Q9S845pA1avZMf6dJBaIZOAxev5I9FveTFEUuoyafgP6YzJuwKjox1dt15p/S2/uVZOCPVBt2Y2tfG', 'Authorization': 'AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5TBNOHDKQ/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=d92c955cb70d7ca89488d53743f6d84c010ec99fd86d17093882ae274b3c20eb', 'Content-Length': '174'}}
�[92m16:21:39 - LiteLLM:DEBUG�[0m: utils.py:275 - �[92m

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIFJO/kSgjDAwG/CIFEJvcAhGygsa7cQwjIfDZCpwE2obAiEA5Z5xQzAunni5zoYcG55v/WIwMgif+tcVQYl9byKZGm0qmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDI/wJXwvJzvx0VxMyCrsArxBHEQbFJfdsmkt7Ibf9I21bLgV6ExpzqpRtjikIWPE4CAktEKNIPURrf74zfDTfkfQyNc1QdGcI5BFjJj99IfuIQhOn85NJfQLfX4RyTTFeNKQ4ITPbzcwHCwtaz1NpiR5w88TccNK1o24cezrzyV2R3wBUfOuD7h6WFDdGWicrgfEPvIBKI2nI2GD2hJ0xE1bXCXVgC/UCgnxW3Z2yk5TzBTsn27ZmsptpcbQkhBYhtiYkdGq9URPmPWutzpRZcoCXP1/PGS+6/xvOwFrEBw5h72wMmqRj3LaZNF+MP5RBXakidLMSmq1FBLLr4bNfN2yDCnzqAUS89K6MJG51hQ4/oki/4ZiSbYcguOcoux6+5uxMgGRjpfn8rCZOopniqAT7s3Fhvn14taVDVlOVMeMT77hyh/8EQdLGVIjuF0quqPVPJVhsEzG3Q6EJTxCUs43VahXkfSjliKgkvFV3bUbEWv2dYpzy8zMOt4wk5KFvAY6pgFIWnfejfTq11NcKj3OrTqH8KXM1CUWT+CwzMK4HbQaBonPmzjmEemj7y6g9eL2EJ+OXXhPqiKhglP1NPIHc3AxMKmSu4i2x4hPT+K/B8Rkudg4jtzVsXVj16u9WV6D0/50uo/+dW2dXNqVV6Q9S845pA1avZMf6dJBaIZOAxev5I9FveTF********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5TBNOHDKQ/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=d92c955cb70d7ca89488********************************************' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'
�[0m

�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - RAW RESPONSE:
{"metrics":{"latencyMs":836},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nGentle whispers, soft and fair,\nCaress the soul without a care.\nMoments fleeting, yet profound,\nEchoes of beauty all around."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":49,"totalTokens":66}}


�[92m16:21:40 - LiteLLM:DEBUG�[0m: main.py:5317 - raw model_response: {"metrics":{"latencyMs":836},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nGentle whispers, soft and fair,\nCaress the soul without a care.\nMoments fleeting, yet profound,\nEchoes of beauty all around."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":49,"totalTokens":66}}
�[92m16:21:40 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: None 
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.125000000000001e-05
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x750db0475970>>
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Success Call: Cache_hit=None
�[92m16:21:40 - LiteLLM Router:INFO�[0m: router.py:943 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0)�[32m 200 OK�[0m
�[92m16:21:40 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1361.269 
�[92m16:21:40 - LiteLLM Router:DEBUG�[0m: router.py:2630 - Async Response: ModelResponse(id='chatcmpl-aaee94bc-8b0c-48e5-bf58-3a318ed0bd00', created=1736526100, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='Here is a short poem for you:\n\nGentle whispers, soft and fair,\nCaress the soul without a care.\nMoments fleeting, yet profound,\nEchoes of beauty all around.', role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=49, prompt_tokens=17, total_tokens=66, completion_tokens_details=None, prompt_tokens_details=None))
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.125000000000001e-05
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Logging Details LiteLLM-Async Success Call, cache_hit=None
�[92m16:21:40 - LiteLLM:DEBUG�[0m: litellm_logging.py:970 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x750db42e2900>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x750db46f1760>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x750db3a22030>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x750db46f1cd0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x750db46f1d30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x750db46f1d60>, <litellm._service_logger.ServiceLogging object at 0x750db46f1dc0>]
�[92m16:21:40 - LiteLLM:DEBUG�[0m: cost_calculator.py:576 - completion_response response ms: 1361.269 
�[92m16:21:40 - LiteLLM:DEBUG�[0m: utils.py:275 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.125000000000001e-05
�[92m16:21:40 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
�[92m16:21:40 - LiteLLM Proxy:DEBUG�[0m: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
�[92m16:21:40 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
�[92m16:21:40 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 120, 'current_rpm': 2}, precise_minute: 2025-01-10-16-21
INFO:     127.0.0.1:39420 - "POST /chat/completions HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Waiting for application shutdown.
�[92m16:22:41 - LiteLLM Proxy:INFO�[0m: proxy_server.py:8783 - Shutting down LiteLLM Proxy Server
INFO:     Application shutdown complete.
INFO:     Finished server process [36206]

@rob-judith (Author)

Logs for v1.57.5 with the unexpected behavior (truncated because the full output wouldn't fit):
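For reference, the per-retry delay I'd expect (per the docs linked above) is sketched below as a hypothetical helper. `expected_retry_delay`, `base`, and `cap` are my own names for illustration, not LiteLLM's actual implementation:

```python
import random

def expected_retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Hypothetical sketch of the delay before retry number `attempt`
    (0-indexed) after a RateLimitError: honor a server-supplied
    retry-after value when present, otherwise exponential backoff."""
    if retry_after is not None:
        # e.g. the 60s wait Bedrock's rate-limit response suggests,
        # which v1.55.12 appears to respect
        return float(retry_after)
    delay = min(cap, base * (2 ** attempt))
    # jitter in [delay/2, delay) to avoid retrying in lockstep
    return delay * (0.5 + random.random() / 2)
```

In v1.57.5 the observed behavior is equivalent to this function always returning 0, regardless of `retry_after` or the attempt number.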

/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
  warnings.warn(message, UserWarning)
INFO:     Started server process [30780]
INFO:     Waiting for application startup.
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:438 - litellm.proxy.proxy_server.py::startup() - CHECKING PREMIUM USER - False
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: litellm_license.py:98 - litellm.proxy.auth.litellm_license.py::is_premium() - ENTERING 'IS_PREMIUM' - LiteLLM License=None
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: litellm_license.py:107 - litellm.proxy.auth.litellm_license.py::is_premium() - Updated 'self.license_str' - None
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:449 - worker_config: {"model": null, "alias": null, "api_base": null, "api_version": "2024-07-01-preview", "debug": false, "detailed_debug": true, "temperature": null, "max_tokens": null, "request_timeout": null, "max_budget": null, "telemetry": true, "drop_params": false, "add_function_to_prompt": false, "headers": null, "save": false, "config": "./scratch/test.yaml", "use_queue": false}
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:1437 - loaded config={
    "litellm_settings": {
        "turn_off_message_logging": true
    },
    "router_settings": {
        "enable_precall_checks": true,
        "routing_strategy": "usage-based-routing-v2",
        "num_retries": 4,
        "retry_after": 30,
        "timeout": 300
    },
    "model_list": [
        {
            "model_name": "test",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                "aws_region_name": "us-east-1",
                "timeout": 15,
                "rpm": 2
            }
        },
        {
            "model_name": "test",
            "litellm_params": {
                "model": "bedrock/anthropic.claude-3-haiku-20240307-v1:0",
                "aws_region_name": "us-west-2",
                "timeout": 15,
                "rpm": 2
            }
        }
    ]
}
�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:1894 - �[94m setting litellm.turn_off_message_logging=True�[0m
�[92m16:15:40 - LiteLLM:DEBUG�[0m: utils.py:1957 - bedrock/anthropic.claude-3-haiku-20240307-v1:0 added to model cost map
�[92m16:15:40 - LiteLLM:DEBUG�[0m: utils.py:1957 - bedrock/anthropic.claude-3-haiku-20240307-v1:0 added to model cost map
�[92m16:15:40 - LiteLLM Router:DEBUG�[0m: router.py:4085 - 
Initialized Model List ['test', 'test']
�[92m16:15:40 - LiteLLM Router:INFO�[0m: router.py:610 - Routing strategy: usage-based-routing-v2
�[92m16:15:40 - LiteLLM Router:DEBUG�[0m: router.py:502 - Intialized router with Routing strategy: usage-based-routing-v2

Routing enable_pre_call_checks: False

Routing fallbacks: None

Routing content fallbacks: None

Routing context window fallbacks: None

Router Redis Caching=None

�[92m16:15:40 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:514 - prisma_client: None
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: proxy_server.py:3361 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:447 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:453 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:604 - [PROXY] returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
�[92m16:15:42 - LiteLLM Proxy:DEBUG�[0m: parallel_request_limiter.py:48 - current: None
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
�[92m16:15:42 - LiteLLM:DEBUG�[0m: litellm_logging.py:361 - self.optional_params: {}
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:3026 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '529dcbbb-d522-49f0-8af5-d614f9c68f4e', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x73734c8f13d0>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, 'num_retries': 4, 'litellm_trace_id': '0c213f26-a67b-4bb8-b58b-0cf4c2929afe', 'mock_timeout': None}
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:3048 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, num_retries - 4
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '529dcbbb-d522-49f0-8af5-d614f9c68f4e', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x73734c8f13d0>, 'stream': False, 'litellm_trace_id': '0c213f26-a67b-4bb8-b58b-0cf4c2929afe', 'mock_timeout': None}
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: cooldown_handlers.py:234 - retrieve cooldown models: []
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:5443 - async cooldown deployments: []
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:5446 - cooldown_deployments: []
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: router.py:5725 - cooldown deployments: []
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - Token Counter - using generic token counter, for model=
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
�[92m16:15:42 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:403 - input_tokens=17
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - returning picked lowest tpm/rpm deployment.
�[92m16:15:42 - LiteLLM Router:INFO�[0m: router.py:5549 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}} for model: test
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - 

�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - �[92mRequest to litellm:�[0m
�[92m16:15:42 - LiteLLM:DEBUG�[0m: utils.py:284 - �[92mlitellm.acompletion(rpm=2, timeout=15.0, aws_region_name='us-east-1', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 
'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}, 'api_base': None, 'caching_groups': None}, litellm_call_id='529dcbbb-d522-49f0-8af5-d614f9c68f4e', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x73734c8f13d0>, stream=False, litellm_trace_id='0c213f26-a67b-4bb8-b58b-0cf4c2929afe', mock_timeout=None, model_info={'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}, max_retries=0)�[0m
16:15:42 - LiteLLM:DEBUG: utils.py:284 - 

16:15:42 - LiteLLM:DEBUG: utils.py:284 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
16:15:42 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
16:15:42 - LiteLLM:INFO: utils.py:2784 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
16:15:42 - LiteLLM:DEBUG: utils.py:2787 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-east-1'}
16:15:42 - LiteLLM:DEBUG: utils.py:2790 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
16:15:42 - LiteLLM:DEBUG: utils.py:284 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-east-1'}
16:15:42 - LiteLLM:DEBUG: litellm_logging.py:361 - self.optional_params: {'stream': False, 'aws_region_name': 'us-east-1'}
16:15:42 - LiteLLM:DEBUG: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-east-1
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
�[92m16:15:42 - LiteLLM:DEBUG�[0m: litellm_logging.py:481 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T161542Z', 'X-Amz-Security-Token': 'IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJIMEYCIQDOylmpat+sCVCTGnRXbi+MEd00uLFUAMSJ+Lb24X7ZrAIhAIdPqGFd7MTnEbJdfJM85yYvT5OqDkFA/IfOfC7wqFG/KpgDCKn//////////wEQABoMMjExMTI1NjQ5MDE5IgwBAiQBGKKhd3pQAqsq7AKWHHhCJ77wfDMWLdi7yB8iMq56NFC03KwYzp3Zzy5k7+2mhKd2eypN4Od9GaOinCPwVCqzkKnXYB0Q7+QelwAu/nW1qAUup4+k5J25y98aJd6ZWjVjlTSDlBnn9BBNsUQlS+HZ4Aw0ho2ji+fasdNM6EaRd1WxaRRRUeZkipgdnxWLs1Z1tMClN8u+O5N9oUGCN93IYMDFSCE6BPwPT+JM9L6kpuAda6G5N1SEgh1PGTfUcL2NM5J5qfr9VCyUob2Hf1e6oRaq9SlVeVTKoma1LBCOFoPgrFx6finwLCTH0znBpNHoSK0ljYCZyM21O6DcEXt9R6CCZS2HoB6a8tU29umEqAd1lf0ir63L97hk88A1ok7/nY1wXQzBnmXZo5yl6QpMyIi5iKvmWK5oLuA81e7PUL/i6lPt/z6Q82ubPKI7cMWhztVyhQzFBbBolFsJ90R4v4X+5EHkUB1lWDvn9P3pJp3n9YglgP27MK6PhbwGOqUBhlfLmXtSG897N+Nkk41anTzpxYR5yRC4yu9LA9UpF/k5+pJUfe9n1WqRZmGrAjz1DCBWejLPFMXBiAaFHACrtYsaXB4y2nVEkLEWJAp+CR+RoCMdx3rMV73/bNai6VnSbYMcTgNiXM84VXtI2UlJoG541BDgM9au+2B93vKPYbCwCynnH9O+erhc8JnMrZyyO9SEVea2JgcyzRrEvcUsa0fxST5d', 'Authorization': 'AWS4-HMAC-SHA256 Credential=ASIATCKARXZ54T2GMBOV/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=c41decf4425fd13d5b57d0a0f64ef2d9e002adbd6d8c44243e0d07988b657b36', 'Content-Length': '174'}}
16:15:42 - LiteLLM:DEBUG: utils.py:284 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJIMEYCIQDOylmpat+sCVCTGnRXbi+MEd00uLFUAMSJ+Lb24X7ZrAIhAIdPqGFd7MTnEbJdfJM85yYvT5OqDkFA/IfOfC7wqFG/KpgDCKn//////////wEQABoMMjExMTI1NjQ5MDE5IgwBAiQBGKKhd3pQAqsq7AKWHHhCJ77wfDMWLdi7yB8iMq56NFC03KwYzp3Zzy5k7+2mhKd2eypN4Od9GaOinCPwVCqzkKnXYB0Q7+QelwAu/nW1qAUup4+k5J25y98aJd6ZWjVjlTSDlBnn9BBNsUQlS+HZ4Aw0ho2ji+fasdNM6EaRd1WxaRRRUeZkipgdnxWLs1Z1tMClN8u+O5N9oUGCN93IYMDFSCE6BPwPT+JM9L6kpuAda6G5N1SEgh1PGTfUcL2NM5J5qfr9VCyUob2Hf1e6oRaq9SlVeVTKoma1LBCOFoPgrFx6finwLCTH0znBpNHoSK0ljYCZyM21O6DcEXt9R6CCZS2HoB6a8tU29umEqAd1lf0ir63L97hk88A1ok7/nY1wXQzBnmXZo5yl6QpMyIi5iKvmWK5oLuA81e7PUL/i6lPt/z6Q82ubPKI7cMWhztVyhQzFBbBolFsJ90R4v4X+5EHkUB1lWDvn9P3pJp3n9YglgP27MK6PhbwGOqUBhlfLmXtSG897N+Nkk41anTzpxYR5yRC4yu9LA9UpF/k5+pJUfe9n1WqRZmGrAjz1DCBWejLPFMXBiAaFHACrtYsaXB4y2nVEkLEWJAp+CR+RoCMdx3rMV73/bNai6VnSbYMcTgNiXM84VXtI2UlJoG541BDgM9au+2B93vKPYbCwCynn********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIATCKARXZ54T2GMBOV/20250110/us-east-1/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=c41decf4425fd13d5b57********************************************' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'

16:15:44 - LiteLLM:DEBUG: utils.py:284 - RAW RESPONSE:
{"metrics":{"latencyMs":2386},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nBeneath the starry skies so clear,\nA moment of peace, a moment to hear,\nThe whispers of the gentle breeze,\nAs it dances through the ancient trees.\n\nIn this quiet respite, time stands still,\nAllowing the soul a chance to fulfill,\nThe yearning for solace, for calm reflection,\nA brief escape from life's constant direction.\n\nMay these words bring a sense of serenity,\nA reminder to pause, to find tranquility,\nIn the beauty that surrounds us each day,\nIf only we take a moment to stop and sway."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":144,"totalTokens":161}}


�[92m16:15:44 - LiteLLM:DEBUG�[0m: main.py:5257 - raw model_response: {"metrics":{"latencyMs":2386},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nBeneath the starry skies so clear,\nA moment of peace, a moment to hear,\nThe whispers of the gentle breeze,\nAs it dances through the ancient trees.\n\nIn this quiet respite, time stands still,\nAllowing the soul a chance to fulfill,\nThe yearning for solace, for calm reflection,\nA brief escape from life's constant direction.\n\nMay these words bring a sense of serenity,\nA reminder to pause, to find tranquility,\nIn the beauty that surrounds us each day,\nIf only we take a moment to stop and sway."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":144,"totalTokens":161}}
16:15:44 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: None 
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 0.00018
16:15:44 - LiteLLM:DEBUG: litellm_logging.py:919 - Logging Details LiteLLM-Success Call: Cache_hit=None
16:15:44 - LiteLLM Router:INFO: router.py:958 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0) 200 OK
16:15:44 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 2927.151 
�[92m16:15:44 - LiteLLM Router:DEBUG�[0m: router.py:2791 - Async Response: ModelResponse(id='chatcmpl-9844f58a-8d1e-4e9f-8528-f88d260a037e', created=1736525744, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Here is a short poem for you:\n\nBeneath the starry skies so clear,\nA moment of peace, a moment to hear,\nThe whispers of the gentle breeze,\nAs it dances through the ancient trees.\n\nIn this quiet respite, time stands still,\nAllowing the soul a chance to fulfill,\nThe yearning for solace, for calm reflection,\nA brief escape from life's constant direction.\n\nMay these words bring a sense of serenity,\nA reminder to pause, to find tranquility,\nIn the beauty that surrounds us each day,\nIf only we take a moment to stop and sway.", role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=144, prompt_tokens=17, total_tokens=161, completion_tokens_details=None, prompt_tokens_details=None))
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 0.00018
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x73734c8f13d0>>
16:15:44 - LiteLLM:DEBUG: litellm_logging.py:919 - Logging Details LiteLLM-Success Call: Cache_hit=None
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Logging Details LiteLLM-Async Success Call, cache_hit=None
16:15:44 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 2927.151 
16:15:44 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 2927.151 
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 0.00018
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 0.00018
16:15:44 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
16:15:44 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
16:15:44 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
16:15:44 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 161, 'current_rpm': 1}, precise_minute: 2025-01-10-16-15
16:15:44 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 161, 'current_rpm': 1}, precise_minute: 2025-01-10-16-15

#------------------------------------------------------------#
#                                                            #
#              'I don't like how this works...'               #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#

 Thank you for using LiteLLM! - Krrish & Ishaan



Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new


LiteLLM: Proxy initialized with Config, Set models:
    test
    test
INFO:     127.0.0.1:47686 - "POST /chat/completions HTTP/1.1" 200 OK
16:15:44 - LiteLLM Proxy:DEBUG: proxy_server.py:3361 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:15:44 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:447 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:15:44 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:453 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:15:44 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:604 - [PROXY] returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
16:15:44 - LiteLLM Proxy:DEBUG: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

16:15:44 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
16:15:44 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - current: {'current_requests': 0, 'current_tpm': 161, 'current_rpm': 1}
�[92m16:15:44 - LiteLLM:DEBUG�[0m: utils.py:284 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:44 - LiteLLM:DEBUG: litellm_logging.py:361 - self.optional_params: {}
�[92m16:15:44 - LiteLLM Router:DEBUG�[0m: router.py:3026 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '627d9479-faec-48af-a361-9fd28c7643f4', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737348f0b1a0>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, 'num_retries': 4, 'litellm_trace_id': '0345a7ed-bd02-43e6-a5a1-370b9cc2c1e3', 'mock_timeout': None}
16:15:44 - LiteLLM Router:DEBUG: router.py:3048 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, num_retries - 4
�[92m16:15:44 - LiteLLM Router:DEBUG�[0m: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '627d9479-faec-48af-a361-9fd28c7643f4', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737348f0b1a0>, 'stream': False, 'litellm_trace_id': '0345a7ed-bd02-43e6-a5a1-370b9cc2c1e3', 'mock_timeout': None}
�[92m16:15:44 - LiteLLM Router:DEBUG�[0m: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:44 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:44 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:44 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:44 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
�[92m16:15:44 - LiteLLM Router:DEBUG�[0m: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:44 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:44 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:44 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:44 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:44 - LiteLLM Router:INFO: router.py:5549 - get_available_deployment for model: test, Selected deployment: {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}} for model: test
16:15:44 - LiteLLM:DEBUG: utils.py:284 - 

16:15:44 - LiteLLM:DEBUG: utils.py:284 - Request to litellm:
�[92m16:15:44 - LiteLLM:DEBUG�[0m: utils.py:284 - �[92mlitellm.acompletion(rpm=2, timeout=15.0, aws_region_name='us-west-2', model='bedrock/anthropic.claude-3-haiku-20240307-v1:0', messages=[{'role': 'user', 'content': 'this is a test request, write a short poem'}], caching=False, client=None, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, metadata={'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 
'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'deployment': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0', 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}, 'api_base': None, 'caching_groups': None}, litellm_call_id='627d9479-faec-48af-a361-9fd28c7643f4', litellm_logging_obj=<litellm.litellm_core_utils.litellm_logging.Logging object at 0x737348f0b1a0>, stream=False, litellm_trace_id='0345a7ed-bd02-43e6-a5a1-370b9cc2c1e3', mock_timeout=None, model_info={'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}, max_retries=0)�[0m
16:15:44 - LiteLLM:DEBUG: utils.py:284 - 

16:15:44 - LiteLLM:DEBUG: utils.py:284 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
16:15:44 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
16:15:44 - LiteLLM:INFO: utils.py:2784 - 
LiteLLM completion() model= anthropic.claude-3-haiku-20240307-v1:0; provider = bedrock
16:15:44 - LiteLLM:DEBUG: utils.py:2787 - 
LiteLLM: Params passed to completion() {'model': 'anthropic.claude-3-haiku-20240307-v1:0', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': False, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'bedrock', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': None, 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'aws_region_name': 'us-west-2'}
16:15:44 - LiteLLM:DEBUG: utils.py:2790 - 
LiteLLM: Non-Default params passed to completion() {'stream': False, 'max_retries': 0}
16:15:44 - LiteLLM:DEBUG: utils.py:284 - Final returned optional params: {'stream': False, 'aws_region_name': 'us-west-2'}
16:15:44 - LiteLLM:DEBUG: litellm_logging.py:361 - self.optional_params: {'stream': False, 'aws_region_name': 'us-west-2'}
16:15:44 - LiteLLM:DEBUG: base_aws_llm.py:122 - in get credentials
aws_access_key_id=None
aws_secret_access_key=None
aws_session_token=None
aws_region_name=us-west-2
aws_session_name=None
aws_profile_name=None
aws_role_name=None
aws_web_identity_token=None
aws_sts_endpoint=None
�[92m16:15:45 - LiteLLM:DEBUG�[0m: litellm_logging.py:481 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}', 'api_base': 'https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse', 'headers': {'Content-Type': 'application/json', 'X-Amz-Date': '20250110T161544Z', 'X-Amz-Security-Token': 'IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIQDdA0U95Pm9SrqtbbRHTd+ezUaYPmuYJzGxw+F0ccdUeAIgDB/6SgcCq+uh6T4Nytuy2Y+e5Rqm+/XfXOZVVdz9QdUqmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDHRT6RIYN/cUY3qhyyrsApi7Fn+naO9WbX0GZrrafSnVJkCw9mii7KQwsB6XH62H+E54KQUoL2SXmEXKSYDQ4e6zZAdtHoSKlWQzONOY1WuHYcQzbl9blHatUKXWnb6UQ05H3Th6jT9G3nAchBW2jM35qFjKcGKuyVRFm3HY0wIjuXtGCWEK5t3Fp/iLVUN/V8JeC/EJylJ72qB38Hkr2oH2hByh/rvqKY7P50f6GyDiMphQ3/Gc0RwvVb0aKz2VJx2TvMHIFninKrYb6is4R8m5fhPwLGx0aJEFxDLw5A2f+fum/I6oagS2m+29xuXBfE1Hv0CVGCwRhaxc6oJBNdUUpD5qlPZb8EsQEevU3S22doQsW8HJiQ2LmS8tEbViwViH8ZHn64+dSmGYQ5NuGilMUMlgs6Qk/6CE/yTGwnI8Zm0VOPy6DiEkRAdwdUpY5lndwjwT+PZB0C6STztI7dLnhioxPkAVYupbQMVNfd86MO6RX9VuttgFXjMwsY+FvAY6pgF74rT3d94A3hSRfI2w8tSxNqqRto1T3PBy+fBcePW8nhXqksdtcZPvULOuC8MuMV/V1IYkd1PLdpB6FEceAualYr4809w4LFmFn/ILcDJ7xW1PMgTMHWZ1PJ13vuCa7PvpE7324QQP7YnP/OGMGcKi0+dtJ9YlrE0JnyMqSQ15Qc+FvRNd3x4gZqlZ+KilfGUp2Asm0jwVfDiUU8KDJfiqdnPFtI8X', 'Authorization': 'AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5ZJDFRTMH/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=310611d81fd8a171f757dccc3dfd71174098b8eefb425ded88f43c5818cf8d75', 'Content-Length': '174'}}
16:15:45 - LiteLLM:DEBUG: utils.py:284 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://bedrock-runtime.us-west-2.amazonaws.com/model/anthropic.claude-3-haiku-20240307-v1:0/converse \
-H 'Content-Type: *****' -H 'X-Amz-Date: *****' -H 'X-Amz-Security-Token: IQoJb3JpZ2luX2VjEMD//////////wEaCXVzLWVhc3QtMiJHMEUCIQDdA0U95Pm9SrqtbbRHTd+ezUaYPmuYJzGxw+F0ccdUeAIgDB/6SgcCq+uh6T4Nytuy2Y+e5Rqm+/XfXOZVVdz9QdUqmAMIqf//////////ARAAGgwyMTExMjU2NDkwMTkiDHRT6RIYN/cUY3qhyyrsApi7Fn+naO9WbX0GZrrafSnVJkCw9mii7KQwsB6XH62H+E54KQUoL2SXmEXKSYDQ4e6zZAdtHoSKlWQzONOY1WuHYcQzbl9blHatUKXWnb6UQ05H3Th6jT9G3nAchBW2jM35qFjKcGKuyVRFm3HY0wIjuXtGCWEK5t3Fp/iLVUN/V8JeC/EJylJ72qB38Hkr2oH2hByh/rvqKY7P50f6GyDiMphQ3/Gc0RwvVb0aKz2VJx2TvMHIFninKrYb6is4R8m5fhPwLGx0aJEFxDLw5A2f+fum/I6oagS2m+29xuXBfE1Hv0CVGCwRhaxc6oJBNdUUpD5qlPZb8EsQEevU3S22doQsW8HJiQ2LmS8tEbViwViH8ZHn64+dSmGYQ5NuGilMUMlgs6Qk/6CE/yTGwnI8Zm0VOPy6DiEkRAdwdUpY5lndwjwT+PZB0C6STztI7dLnhioxPkAVYupbQMVNfd86MO6RX9VuttgFXjMwsY+FvAY6pgF74rT3d94A3hSRfI2w8tSxNqqRto1T3PBy+fBcePW8nhXqksdtcZPvULOuC8MuMV/V1IYkd1PLdpB6FEceAualYr4809w4LFmFn/ILcDJ7xW1PMgTMHWZ1PJ13vuCa7PvpE7324QQP7YnP/OGMGcKi0+dtJ9YlrE0JnyMqSQ15Qc+FvRNd********************************************' -H 'Authorization: AWS4-HMAC-SHA256 Credential=ASIATCKARXZ5ZJDFRTMH/20250110/us-west-2/bedrock/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-security-token, Signature=310611d81fd8a171f757********************************************' -H 'Content-Length: *****' \
-d '{"messages": [{"role": "user", "content": [{"text": "this is a test request, write a short poem"}]}], "additionalModelRequestFields": {}, "system": [], "inferenceConfig": {}}'

16:15:46 - LiteLLM:DEBUG: utils.py:284 - RAW RESPONSE:
{"metrics":{"latencyMs":954},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nA gentle breeze, a peaceful day,\nNature's canvas, colors at play.\nFlowers bloom, a symphony divine,\nMoment's grace, a gift so sublime."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":50,"totalTokens":67}}


16:15:46 - LiteLLM:DEBUG: main.py:5257 - raw model_response: {"metrics":{"latencyMs":954},"output":{"message":{"content":[{"text":"Here is a short poem for you:\n\nA gentle breeze, a peaceful day,\nNature's canvas, colors at play.\nFlowers bloom, a symphony divine,\nMoment's grace, a gift so sublime."}],"role":"assistant"}},"stopReason":"end_turn","usage":{"inputTokens":17,"outputTokens":50,"totalTokens":67}}
16:15:46 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: None
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.25e-05
16:15:46 - LiteLLM Router:INFO: router.py:958 - litellm.acompletion(model=bedrock/anthropic.claude-3-haiku-20240307-v1:0) 200 OK
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:919 - Logging Details LiteLLM-Success Call: Cache_hit=None
16:15:46 - LiteLLM Router:DEBUG: router.py:2791 - Async Response: ModelResponse(id='chatcmpl-8b2dec72-3671-48b1-874f-89c95088e33b', created=1736525746, model='anthropic.claude-3-haiku-20240307-v1:0', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content="Here is a short poem for you:\n\nA gentle breeze, a peaceful day,\nNature's canvas, colors at play.\nFlowers bloom, a symphony divine,\nMoment's grace, a gift so sublime.", role='assistant', tool_calls=None, function_call=None))], usage=Usage(completion_tokens=50, prompt_tokens=17, total_tokens=67, completion_tokens_details=None, prompt_tokens_details=None))
16:15:46 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 1512.0430000000001
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737348f0b1a0>>
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.25e-05
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:919 - Logging Details LiteLLM-Success Call: Cache_hit=None
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Logging Details LiteLLM-Async Success Call, cache_hit=None
16:15:46 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 1512.0430000000001
16:15:46 - LiteLLM:DEBUG: cost_calculator.py:582 - completion_response response ms: 1512.0430000000001
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.25e-05
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Returned custom cost for model=anthropic.claude-3-haiku-20240307-v1:0 - prompt_tokens_cost_usd_dollar: 4.25e-06, completion_tokens_cost_usd_dollar: 6.25e-05
16:15:46 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
16:15:46 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`={}
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}, precise_minute: 2025-01-10-16-15
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in success call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}, precise_minute: 2025-01-10-16-15
INFO:     127.0.0.1:47686 - "POST /chat/completions HTTP/1.1" 200 OK
16:15:46 - LiteLLM Proxy:DEBUG: proxy_server.py:3361 - Request received by LiteLLM:
{
    "messages": [
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    "model": "test"
}
�[92m16:15:46 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:447 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'authorization': 'Bearer anything', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'})
�[92m16:15:46 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:453 - receiving data: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}}
�[92m16:15:46 - LiteLLM Proxy:DEBUG�[0m: litellm_pre_call_utils.py:604 - [PROXY] returned data from litellm_pre_call_utils: {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test', 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 
'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': ''}}
16:15:46 - LiteLLM Proxy:DEBUG: utils.py:88 - Inside Proxy Logging Pre-call hook!
NoneType: None

16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Pre-Call Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - current: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:361 - self.optional_params: {}
�[92m16:15:46 - LiteLLM Router:DEBUG�[0m: router.py:3026 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 
'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test'}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'stream': False, 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, 'num_retries': 4, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:3048 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x73734cab3890>>, num_retries - 4
�[92m16:15:46 - LiteLLM Router:DEBUG�[0m: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:46 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:46 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:1769 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM Router:INFO: router.py:970 - litellm.acompletion(model=None) Exception litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
�[92m16:15:46 - LiteLLM Router:DEBUG�[0m: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 
'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}]}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:46 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:46 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:1769 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM Router:INFO: router.py:970 - litellm.acompletion(model=None) Exception litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
�[92m16:15:46 - LiteLLM Router:DEBUG�[0m: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 
'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 
'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}]}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:46 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:46 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:1769 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM Router:INFO: router.py:970 - litellm.acompletion(model=None) Exception litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
16:15:46 - LiteLLM Router:DEBUG: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 
'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 
'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. 
Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}]}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:46 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:46 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:1769 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM Router:INFO: router.py:970 - litellm.acompletion(model=None) Exception litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
16:15:46 - LiteLLM Router:DEBUG: router.py:863 - Inside _acompletion()- model: test; kwargs: {'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 
'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2, 'previous_models': [{'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 
'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 
'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. 
Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': 
'3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}, {'exception_type': 'RateLimitError', 'exception_string': "litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}", 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'body': {'messages': [{'role': 'user', 'content': 'this is a test request, write a short poem'}], 'model': 'test'}}, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': 'anything', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': None, 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key_end_user_id': None, 'user_api_key': 'anything', 
'user_api_end_user_max_budget': None, 'litellm_api_version': '1.57.5', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_model_max_budget': {}, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'AsyncOpenAI/Python 1.59.5', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.59.5', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.12.8', 'x-stainless-async': 'async:asyncio', 'x-stainless-retry-count': '0', 'content-length': '106'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'test', 'model_group_size': 2}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'model': 'test', 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}]}, 'litellm_call_id': '86fa13ba-d83f-4589-86f5-2f5b3cf0092c', 'litellm_logging_obj': <litellm.litellm_core_utils.litellm_logging.Logging object at 0x737349076690>, 'stream': False, 'litellm_trace_id': '77aac02b-6876-4445-9d82-af57acf0edaa', 'mock_timeout': None}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5443 - async cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5446 - cooldown_deployments: []
16:15:46 - LiteLLM Router:DEBUG: router.py:5725 - cooldown deployments: []
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:447 - get_available_deployments - Usage Based. model_group: test, healthy_deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - token_counter messages received: [{'role': 'user', 'content': 'this is a test request, write a short poem'}]
16:15:46 - LiteLLM:DEBUG: utils.py:284 - Token Counter - using generic token counter, for model=
16:15:46 - LiteLLM:DEBUG: utils.py:284 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
16:15:46 - LiteLLM Router:DEBUG: lowest_tpm_rpm_v2.py:403 - input_tokens=17
16:15:46 - LiteLLM:DEBUG: utils.py:284 - returning picked lowest tpm/rpm deployment.
16:15:46 - LiteLLM:DEBUG: litellm_logging.py:1769 - Logging Details LiteLLM-Failure Call: [<bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x73734cab3890>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x73734d139430>, <litellm.router_strategy.lowest_tpm_rpm_v2.LowestTPMLoggingHandler_v2 object at 0x73734c9c00e0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x73734d154ad0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x73734d154b30>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x73734d154b60>, <litellm._service_logger.ServiceLogging object at 0x73734d154bc0>]
16:15:46 - LiteLLM Router:INFO: router.py:970 - litellm.acompletion(model=None) Exception litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
16:15:46 - LiteLLM Router:DEBUG: router.py:5382 - initial list of deployments: [{'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-east-1', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': 'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68', 'db_model': False}}, {'model_name': 'test', 'litellm_params': {'rpm': 2, 'timeout': 15.0, 'aws_region_name': 'us-west-2', 'model': 'bedrock/anthropic.claude-3-haiku-20240307-v1:0'}, 'model_info': {'id': '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665', 'db_model': False}}]
16:15:46 - LiteLLM Router:DEBUG: cooldown_handlers.py:234 - retrieve cooldown models: []
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - Inside Max Parallel Request Failure Hook
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - user_api_key: anything
16:15:46 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - updated_value in failure call: {'current_requests': 0, 'current_tpm': 228, 'current_rpm': 2}
16:15:46 - LiteLLM Router:DEBUG: router.py:2794 - Traceback
Traceback (most recent call last):
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router_strategy/lowest_tpm_rpm_v2.py", line 492, in async_get_available_deployments
    assert deployment is not None
           ^^^^^^^^^^^^^^^^^^^^^^
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 2788, in async_function_with_fallbacks
    response = await self.async_function_with_retries(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 3150, in async_function_with_retries
    raise original_exception
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 3056, in async_function_with_retries
    response = await self.make_call(original_function, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 3159, in make_call
    response = await response
               ^^^^^^^^^^^^^^
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 975, in _acompletion
    raise e
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 868, in _acompletion
    deployment = await self.async_get_available_deployment(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 5583, in async_get_available_deployment
    raise e
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router.py", line 5492, in async_get_available_deployment
    await self.lowesttpm_logger_v2.async_get_available_deployments(
  File "/home/robjudith/iml/products/idp/litellm-proxy/venv/lib/python3.12/site-packages/litellm/router_strategy/lowest_tpm_rpm_v2.py", line 542, in async_get_available_deployments
    raise litellm.RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}} LiteLLM Retried: 3 times, LiteLLM Max Retries: 4

16:15:46 - LiteLLM Router:INFO: router.py:2815 - Trying to fallback b/w models
16:15:46 - LiteLLM Proxy:ERROR: proxy_server.py:3552 - litellm.proxy.proxy_server.chat_completion(): Exception occured - litellm.RateLimitError: No deployments available for selected model. 12345 Passed model=test. Deployments={'a91f829622dcf80aafb32d934a542f7a588e6760202d51ba410adfd19d91af68': {'current_tpm': 161, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}, '000f98c1833a05c728268082e48292827cdf996e67bfe578d0673c0e447ba665': {'current_tpm': 67, 'tpm_limit': inf, 'current_rpm': 1, 'rpm_limit': 2}}
Received Model Group=test
Available Model Group Fallbacks=None LiteLLM Retried: 3 times, LiteLLM Max Retries: 4

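For comparison, the behavior the docs describe — wait `retry_after` (or the rate-limit header value) when the error carries one, otherwise fall back to exponential backoff — can be sketched roughly like this. This is an illustrative sketch, not LiteLLM's actual internals; the function name and the `retry_after` attribute lookup are assumptions:

```python
import time


def retry_with_backoff(call, max_retries=4, base_delay=1.0, max_delay=60.0):
    """Retry `call`, honoring a retry_after hint on the raised exception
    when present, otherwise sleeping with exponential backoff.
    (Hypothetical sketch of the expected router behavior.)"""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as e:
            if attempt == max_retries:
                raise  # retries exhausted, surface the original error
            # Prefer a server/router-provided hint (e.g. a Retry-After value)
            delay = getattr(e, "retry_after", None)
            if delay is None:
                # Fall back to capped exponential backoff: 1s, 2s, 4s, ...
                delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay)
```

The bug reported here is that the v1.56.2+ router effectively uses `delay = 0` on every attempt, so all four retries fire back-to-back against deployments that are still over their `rpm` limit.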