Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that typically arises in early childhood, with core symptoms manifesting predominantly in language and social interaction. Although traditional diagnostic tools such as the Autism Diagnostic Observation Schedule-2 (ADOS-2) are widely used, their accessibility and objectivity are often limited for younger children and in primary healthcare settings. Because of its strong quantitative capabilities, relatively low cost, and high sensitivity to early speech anomalies in infants, speech recognition technology has emerged as a promising avenue for ASD diagnostic support. This paper systematically reviews recent research on the application of speech recognition to early screening across age groups, diagnosis of comorbid emotional issues, assessment of disease severity, and multimodal data integration. The findings show that acoustic features such as fundamental frequency, speech rate, and pauses can effectively distinguish individuals with ASD from typically developing children and can also help identify comorbidities such as anxiety, depression, and ADHD. Combining speech with other modalities (e.g., neuroimaging, physiological signals, and behavioral data) can further improve diagnostic accuracy. Nevertheless, challenges persist, including inadequate data diversity, limited applicability across dialects and age groups, confounding effects of comorbid conditions, and privacy concerns.