In AI, the most common question is: "How long does it take to train a model?" For Taiwan AI Cloud, however, fine-tuning is never a one-off training run; it is a long-term commitment to quality management and continuous evolution.
The first two articles in this series covered process and model selection. This article dives into a real-world scenario: the extremely demanding task of anti-money laundering (AML), and how we evolved an LLM from a machine that relies on surface-level semantic cues into a financial expert capable of multi-step reasoning.
Field Report: Why "Memorizing Answers" Doesn't Work in the Financial Industry
Traditional AML systems excel at parsing structured data (such as transfer amounts) but struggle with unstructured data (such as news reports and court announcements). We want AI to sift through lengthy news articles and determine: Does this involve illegal activity? Who is involved? What are their relationships?
This is not simple keyword matching; it requires reasoning.
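To make the task concrete, here is a minimal illustration of the kind of input and structured output involved. The article text, field names, and schema are invented for this sketch; the production format is not specified in this series.

```python
import json

# Hypothetical news snippet the model must reason over (invented example).
article = (
    "Prosecutors indicted Mr. A, a branch manager, for wiring NT$30 million "
    "through shell accounts opened by his associate Ms. B."
)

# A structured answer the model is expected to produce after multi-step
# reasoning: is the activity illegal, who is involved, and how are they related.
expected_output = {
    "illegal_activity": True,
    "persons": ["Mr. A", "Ms. B"],
    "relations": [
        {"from": "Mr. A", "to": "Ms. B", "type": "associate"}
    ],
}

print(json.dumps(expected_output, ensure_ascii=False, indent=2))
```

None of these answers can be found by keyword matching alone; linking "shell accounts" to "Ms. B" and back to "Mr. A" requires the model to follow the chain of events in the text.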
The Practical Path: An Evolution Driven by "Why"
In the AML case, we experienced five key technological leaps, each aimed at solving the most practical pain points in implementation:
- Validating feasibility with LoRA: We started with the lightest-weight option, LoRA fine-tuning, and were pleasantly surprised to find the model could understand the questions. That gave the team confidence the path was viable.
- Introducing distillation: To let an 8B model learn how a 70B model reasons, we do not teach it the answers; we teach it how the large model thinks. We extract the large model's **Chain of Thought (CoT)** and feed it to the small model, so it internalizes the abstract logic rather than just the final labels.
- CoT data augmentation, teaching the model how to think: We found the model sometimes answered correctly but invented nonsensical justifications. So we raised the quality of the reasoning chains, upgrading the model from learning "tokens" to learning "steps".
- Self-Refine, learning to self-correct: We introduced a "draft first, then check, then correct" loop, so the model learns to spot its own reasoning blind spots during training, markedly improving the rigor of its logic.
- GRPO, making every reasoning step stable: This was the final key. Instead of absolute scores, GRPO (Group Relative Policy Optimization) gives "relative rewards": among a group of sampled reasoning paths (four in our setup), it tells the model which is the most reasonable. The result is a model that is not just occasionally strong but consistently reliable.
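The core of the GRPO step above is the group-relative reward: each sampled reasoning path is scored not on its own, but against the other paths in its group. A minimal sketch, with made-up reward values for a group of four paths:

```python
from statistics import mean, stdev

# Made-up reward scores for four reasoning paths sampled from one prompt.
group_rewards = [0.2, 0.9, 0.4, 0.7]

# GRPO normalizes rewards within the group: paths above the group mean
# get a positive advantage (reinforced), paths below get a negative one.
mu, sigma = mean(group_rewards), stdev(group_rewards)
advantages = [(r - mu) / sigma for r in group_rewards]

best = max(range(len(advantages)), key=advantages.__getitem__)
print(f"best path: {best}, advantages: {[round(a, 2) for a in advantages]}")
```

Because the advantages are relative, they always sum to zero within the group: the model is pushed toward the better paths and away from the worse ones, without needing a calibrated absolute score for "a good reasoning path".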
In short: good models are not "trained," they are "managed."
Through this case study, we have summarized three key success factors for enterprise-level fine-tuning:
- Data determines the ceiling: the raw source matters less than how cleaning and labeling encode the "task definition".
- Choosing the right model family: Gemma for fast reasoning, Phi for resource efficiency, and DeepSeek-R1 for strong logic. Choosing the right brain makes all the difference.
- Evaluation goes beyond loss: In financial scenarios we care more about latency, consistency, and the "LLM as a judge" score, because these are the true business metrics.
Conclusion: Establish a "growing" pipeline
Fine-tuning is a marathon. Once the data pipeline, automated evaluation, and computing infrastructure are connected, the model is no longer a static file but a "digital asset" that is continuously updated with business data and grows more accurate with use.
Taiwan AI Cloud provides not only computing power but also this field-proven engineering experience, helping Taiwanese companies build their own digital resilience in the AI era.