When switching to Anthropic models, the output is always cut off at 1024 tokens unless I explicitly set the limit to 4096. This behaviour is annoying because I have to regenerate with the maximum available output (4096 tokens), which makes inference inefficient.
For Anthropic models, the default should be 4096 unless the user sets a lower value. Many of my jobs have long inputs, so when the output is capped at 1024 tokens by default, I lose money on the first run, since the long input has to be paid for again when I regenerate.
This is specific to Anthropic.
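To make the request concrete, here is a rough sketch of the default I have in mind. It is not taken from the actual codebase: `resolve_max_tokens`, the constant names, and the `"anthropic"` provider string are all hypothetical, and the 4096 and 1024 values are just the numbers from this report, so the real limits may differ per model.

```python
from typing import Optional

# Numbers taken from this report; actual limits may vary by model (assumption).
ANTHROPIC_MAX_OUTPUT_TOKENS = 4096
FALLBACK_DEFAULT_MAX_TOKENS = 1024


def resolve_max_tokens(provider: str, user_max_tokens: Optional[int] = None) -> int:
    """Hypothetical helper: pick the output token limit to send with a request."""
    if user_max_tokens is not None and user_max_tokens <= 0:
        raise ValueError("user_max_tokens must be a positive integer")

    if provider.lower() != "anthropic":
        # Other providers keep whatever behaviour they have today (assumed here
        # to be a 1024 default when the user sets nothing).
        if user_max_tokens is None:
            return FALLBACK_DEFAULT_MAX_TOKENS
        return user_max_tokens

    if user_max_tokens is None:
        # Proposed change: default to the full 4096 instead of 1024.
        return ANTHROPIC_MAX_OUTPUT_TOKENS

    # An explicit user setting wins, but never exceeds the 4096 ceiling.
    return min(user_max_tokens, ANTHROPIC_MAX_OUTPUT_TOKENS)
```

With this, `resolve_max_tokens("anthropic")` would give 4096, while `resolve_max_tokens("anthropic", 512)` would still respect the lower value the user asked for.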