This paper presents the first fully integrated analysis of multimodal news frames. A standardized content analysis of text and images in newspaper articles from Brazil, Germany, India, South Africa, and the United States covering the United Nations (UN) Climate Change Conferences of 2010–2013 was conducted using a subset of photo-illustrated articles (n = 432) as well as the entire conference coverage (n = 1,311). In the photo-illustrated articles, four overarching multimodal frames were identified: global warming victims, civil society demands, political negotiations, and sustainable energy. The distribution of these global frames across the five countries is relatively similar, and a comparison of frames emerging from the national subsets also reveals a strong element of cross-national frame convergence. This convergence is explained by the news production context at global staged political events, which features uniform media access rules and similar information supplies, as well as strong interaction between journalists from different countries and between journalists and other actors. Event-related frame convergence across vastly different contexts is interpreted as one mechanism that facilitates truly transnational media debate, which in turn can potentially serve to legitimize global political decisions. In conclusion, perspectives for future qualitative and quantitative multimodal framing research are discussed.