An exploration of the core architectural and statistical challenges of turning a large language model framework into a live financial crime detection system at scale. This session focuses on balancing multilingual context windows against API costs and practical throughput limits.