Fine-scale Population Structure and Demographic History of Han Chinese Inferred from Haplotype Network of 111,000 Genomes

By Ao Lan, Kang Kang, Senwei Tang, Xiaoli Wu, Lizhong Wang, Teng Li, Haoyi Weng, Junjie Deng, WeGene Research Team, Qiang Zheng, Xiaotian Yao, Gang Chen

Posted 03 Jul 2020
bioRxiv DOI: 10.1101/2020.07.03.166413

Han Chinese is the most populous ethnic group in the world, with a complex substructure that mirrors its cultural diversification. Previous studies have characterized the genetic polymorphism spectrum of Han Chinese, but high-resolution investigations that reveal its fine-scale substructure and trace the genetic imprints of its demographic history are still lacking. Here we construct a haplotype network consisting of 111,000 genome-wide genotyped Han Chinese individuals from direct-to-consumer genetic testing and over 1.3 billion identity-by-descent (IBD) links. We observed a clear separation of northern and southern Han Chinese and captured 5 subclusters and 17 sub-subclusters in hierarchical clustering of the haplotype network, corresponding to geography (especially mountain ranges), immigration waves, and clans with cultural-linguistic segregation. We inferred differentiated split histories and founder effects for the Cantonese, Hakka, and Minnan-Chaoshanese clans in southern China, and also revealed more recent demographic events within the past few centuries, such as Zou Xikou and Chuang Guandong. The shifts in composition between the native and current residents of four major metropolises (Beijing, Shanghai, Guangzhou, and Shenzhen) imply a rapidly vanishing genetic barrier between subpopulations. Our study yields a fine-scale population structure of Han Chinese and provides insights into the nation's genetic and cultural-linguistic diversity.

### Competing Interest Statement

The authors AL, KK, ST, XW, LW, TL, HW, JD, QZ, XY, and GC work for WeGene (Shenzhen Zaozhidao Technology Co. Ltd. or Shenzhen WeGene Clinical Laboratory).

