Planning Large Language Models for Enhancing Spatial Cognition and Decision-Making Abilities

News

[2026-06-05] PlanBench official test data is publicly released on Hugging Face, with code available on GitHub.
[2026-06-04] PlanBench-V is available on arXiv.
[2025-05-20] PlanGPT-VL is open-sourced on ModelScope, with an online demo available for testing.
[2025-05-09] PlanGPT-1.5 is accepted by ACL 2025 Industry Track as an Oral Presentation.

PlanGPT-1/1.5

PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval

Abstract

In the field of urban planning, general-purpose large language models often fall short of meeting the specific needs of planners. Tasks such as generating urban planning texts, retrieving relevant information, and evaluating planning documents present unique challenges. To enhance the efficiency of urban professionals and overcome these obstacles, we introduce PlanGPT, the first specialized language model tailored for urban and spatial planning. Through collaborative efforts with institutions like the China Academy of Urban Planning and Design, PlanGPT is developed using a customized local database retrieval framework, industry-based foundational model fine-tuning, and advanced tool capabilities. Empirical testing demonstrates that PlanGPT achieves state-of-the-art performance, delivering high-quality responses that accurately adapt to the intricacies of urban planning.
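The abstract mentions a customized local database retrieval framework but gives no implementation details. As a rough illustration of the general idea only (retrieve relevant local passages, then place them in the prompt), here is a minimal sketch using plain term-frequency cosine similarity. The function names, the toy passages, and the prompt layout are all hypothetical and are not taken from PlanGPT.

```python
import math
import re
from collections import Counter

# Toy local "database" of planning passages (illustrative text, not real data).
LOCAL_DB = [
    "Residential land use zoning sets limits on floor area ratio and building height.",
    "The green space system plan defines parks and ecological corridors.",
    "Road network planning classifies arterial and collector roads.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that places the retrieved passages before the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("What limits apply to residential zoning?", LOCAL_DB))
```

A production system would more likely use dense embeddings and a vector index. The lexical similarity here only keeps the sketch dependency-free.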

Technical Architecture

Figure 1: PlanGPT-1/1.5 Architecture.

PlanGPT-1.5: Building on PlanGPT-1, this version incorporates key engineering techniques for practical use in the urban planning industry, including insights from real-world use cases, methods to further mitigate hallucinations, and data synthesis techniques that reduce manual annotation costs. The paper was accepted as an Oral presentation at the ACL 2025 Industry Track; one of the four reviewers rated it 9/10 and highlighted the value of PlanGPT for industry-oriented large models.

The data synthesis techniques behind PlanGPT-1.5 are detailed in FANNO.

“The paper describes a real-life implementation of an LLM-based assistant tailored to a specific domain and highlights the importance of tailoring each component to obtain good usable results. It can serve as a reference for carrying out similar adaptations in other domains and use cases.”

📅 Release Date: September 28, 2023

PlanBench Planning Knowledge Benchmark

A Comprehensive Benchmark for Evaluating Urban Planning Capabilities in Large Language Models

Abstract

Urban planning is a highly interdisciplinary and practice-oriented field. It requires more than simple recall of knowledge: it also demands complex situational judgment, policy understanding, spatial logical reasoning, and value assessment. Planning texts are characterized by dense terminology, complex structures, and long reasoning chains. A dedicated benchmark can help improve how well large models adapt to planning tasks in the following respects:

  • Deconstruction of planning texts (e.g., regulation breakdown, indicator interpretation)
  • Multi-level spatial governance logic (national - city - community)
  • Situational policy judgment and plan generation (e.g., site selection, land allocation, industry recommendations)

Text-based benchmarks serve as the linguistic foundation for "multimodal urban intelligence." In subsequent integrations with maps, charts, and spatial models, text comprehension capabilities are fundamental for achieving the three-dimensional linkage of "text-image-policy."

Technical Architecture

Figure 2: PlanBench-Text Architecture.

📅 Release Date: May 19, 2025

PlanBench Planning Visual Recognition Benchmark

Multimodal Multi-image Understanding for Evaluating Multimodal Large Language Models

Abstract

National spatial planning maps visually present the concepts, goals, strategies, and specific measures of spatial planning, and serve as a guide for coordinating spatial development, protection, and utilization activities. They are crucial for planning decisions and are also important tools for public participation in, and oversight of, planning implementation. Planning is a highly interdisciplinary and specialized task: understanding a planning map requires both grasping its detailed elements (symbols, legends, geographic features) and analyzing them comprehensively in light of the relevant policies. This combination makes planning maps hard to interpret. Given the rapid development of multimodal large language models (MLLMs), we have established a benchmark for national spatial planning maps to evaluate how well MLLMs understand them. Our contributions are as follows:

(1) Data: We constructed the Spatial Planning Map Database (SPMD), featuring diverse image content and high-quality annotations provided by experts in the field of planning.
(2) Framework: We proposed a comprehensive framework grounded in the planning discipline that measures MLLMs' understanding of planning maps along four dimensions (perception, reasoning, association, and application), which together comprise eight subcategories.
(3) Experiments: By constructing question-answer tasks from an authoritative question bank (China's Registered Urban Planner Qualification Examination), we significantly reduced the proportion of "hallucination-style normative citations" produced by models.
(4) Results: All models performed worst in the application dimension, with Qwen2.5-VL-32B-Instruct achieving the highest overall score across all four dimensions.
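To make the four-dimension framework concrete, the sketch below shows one way per-dimension accuracy and an overall score could be computed from graded answers. The record layout, the function names, and the use of an unweighted mean for the overall score are assumptions made for illustration; the benchmark's actual scoring rules are not stated here. The sample records are toy values, not benchmark results.

```python
from collections import defaultdict

# The four dimensions named in the framework description above.
DIMENSIONS = ("perception", "reasoning", "association", "application")


def score_by_dimension(records):
    """Compute accuracy per dimension from (dimension, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for dimension, is_correct in records:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        totals[dimension] += 1
        correct[dimension] += int(is_correct)
    # Only report dimensions that have at least one graded answer.
    return {d: correct[d] / totals[d] for d in DIMENSIONS if totals[d]}


def overall_score(per_dimension):
    """Unweighted mean over dimensions (an assumed aggregation rule)."""
    if not per_dimension:
        raise ValueError("no scored dimensions")
    return sum(per_dimension.values()) / len(per_dimension)


if __name__ == "__main__":
    # Toy graded answers for a hypothetical model; values are illustrative only.
    toy_records = [
        ("perception", True), ("perception", True),
        ("reasoning", True), ("reasoning", False),
        ("association", True), ("association", False),
        ("application", False), ("application", False),
    ]
    scores = score_by_dimension(toy_records)
    print(scores)
    print("weakest dimension:", min(scores, key=scores.get))
    print("overall:", overall_score(scores))
```

An unweighted mean treats every dimension equally regardless of how many questions it contains. A question-weighted average would be the alternative if the dimensions differ greatly in size.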

Technical Architecture

Figure 3: PlanBench-V Architecture.

📅 Release Date: May 19, 2025

PlanGPT-VL

PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models

Abstract

Despite the critical importance of urban planning maps to professionals and educators, existing vision-language models (VLMs) often struggle to interpret and evaluate these specialized maps. Planning maps visualize key information such as land use, infrastructure layout, and functional zoning, and reading them requires domain-specific knowledge that general VLMs typically lack. To address this issue, we developed PlanGPT-VL, the first domain-specific vision-language model designed for urban planning maps. It features three major innovations: (1) the PlanAnno-V framework for generating high-quality visual question-answering data for planning maps; (2) a keypoint reasoning mechanism that reduces model hallucinations through structured verification; and (3) the PlanBench-V evaluation benchmark, the first comprehensive testing standard for assessing understanding of planning maps. Experimental results show that, compared with open-source and commercial VLMs, PlanGPT-VL achieves an average performance improvement of 59.2% on specialized planning tasks. Notably, although it is a lightweight model with only 7 billion parameters, its performance rivals that of larger models with over 72 billion parameters, giving urban planners a reliable and factually accurate tool for professional map analysis.
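The abstract does not describe how keypoint reasoning is implemented. As a loose illustration of what structured verification could look like, the sketch below keeps only those answer claims whose cited evidence appears among keypoints extracted from the map. The `Claim` type, the `verify_claims` function, and the sample keypoints are hypothetical and should not be read as the PlanGPT-VL mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement in a draft answer plus the keypoints it relies on."""
    text: str
    evidence: tuple[str, ...]


def verify_claims(claims, keypoints):
    """Split claims into (supported, unsupported).

    A claim counts as supported only if it cites at least one keypoint and
    every keypoint it cites was actually extracted from the map.
    """
    supported, unsupported = [], []
    for claim in claims:
        if claim.evidence and all(e in keypoints for e in claim.evidence):
            supported.append(claim)
        else:
            unsupported.append(claim)
    return supported, unsupported


if __name__ == "__main__":
    # Hypothetical keypoints read from a map legend and layout.
    extracted = {"legend:residential", "legend:green-space", "road:ring-road"}
    draft = [
        Claim("Residential land is concentrated inside the ring road.",
              ("legend:residential", "road:ring-road")),
        Claim("A metro line crosses the industrial zone.",
              ("legend:industrial", "rail:metro")),
        Claim("The plan is widely praised.", ()),
    ]
    kept, flagged = verify_claims(draft, extracted)
    for claim in flagged:
        print("unsupported:", claim.text)
```

Requiring every claim to cite explicit evidence makes unsupported statements easy to flag, at the cost of depending on how complete the extracted keypoint set is.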

Technical Architecture

Figure 4: PlanGPT-VL Architecture.

📅 Release Date: May 19, 2025