With a foundational understanding of synthetic data's role from the previous chapter, we now turn to the 'how': the specific techniques used to generate synthetic text. This chapter provides a practical overview of these methods.
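You will learn to generate text with algorithmic and rule-based methods, expand datasets through back-translation, diversify text with paraphrasing models, produce samples with LLMs, guide generation through effective prompt design, and apply data masking and perturbation.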
The chapter includes a hands-on exercise where you'll use an LLM API to generate text, putting these techniques into practice. By progressing through these sections, you will build a toolkit for producing synthetic text tailored to various LLM development needs.
2.1 Algorithmic and Rule-Based Text Creation
2.2 Leveraging Back-Translation for Data Expansion
2.3 Employing Paraphrasing Models to Diversify Text
2.4 Using LLMs for Synthetic Sample Generation
2.5 Guiding Generation with Effective Prompt Design
2.6 Methods for Data Masking and Perturbation
2.7 Hands-on Practical: Text Generation with an LLM API
© 2025 ApX Machine Learning