Interview with Clayton Miller by Lada Hensen Centnerová

 

 

Dr. Clayton Miller is an Associate Professor of Urban Analytics based at Singapore Management University (SMU) in the College of Integrative Studies (CIS). His research focuses on characterizing human experiences in both indoor and outdoor environments, particularly in relation to comfort, health, and productivity, through the convergence of architecture, engineering, data science, and human-computer interaction. Dr. Miller's team leads the development of the open-source smartwatch-based Cozie wellness and comfort data collection project, as well as the ASHRAE Great Energy Predictor III and Cool, Quiet City Kaggle competitions, and the SpaceMatch activity-based workstation optimization platform. He is the creator of the YouTube online course titled "Data Science for Construction, Architecture, and Engineering," which was formerly an edX course with over 43,000 participants worldwide.

 

 

LHC: You were a keynote speaker for the Digitization theme at CLIMA 2022 in Rotterdam. Your background is in ‘architectural engineering’ and you recently joined Singapore Management University (SMU). What is your main research focus?

CM: Every four or five years I move to a new academic environment. I find this essential for tackling complex problems. My early work focused on buildings, but when designing cities, we are simplifying human behavior. Much of my recent research uses wearable technologies - smartwatches, smartphones and micro-surveys - to better understand how people actually use buildings and urban spaces. At SMU’s Urban Institute, I now collaborate with colleagues from social sciences, history and economics to explore these dynamics further.

LHC: In 2020 you launched an open online course on the edX platform titled ‘Data Science for Construction, Architecture and Engineering’, which attracted over 38 000 participants. Could you share some key lessons learned from this course, as you did in your paper [1]?

CM: Around 2012, during my PhD, I began using Python. Compared to Excel, Python felt like moving from a microwave meal to a chef’s kitchen. Excel is accessible and intuitive, but Python opens up vast possibilities - machine learning, modeling, data visualization. However, generic Python courses can be boring, and there weren’t any courses tailored to the built environment.

In 2014, I started offering workshops at conferences. Once I became a professor, I wanted to scale that effort, which led to the creation of the edX course. We launched it in April 2020 - perfect timing, as millions were at home during lockdowns. The response was overwhelming.

LHC: I’ve watched some of your course videos on YouTube. Each video has a distinct theme, but they seem quite advanced for beginners.

CM: That’s intentional. The course is designed to get learners up and running quickly using high-level code. Many technical details are abstracted at first - you start by loading and visualizing data, which is highly motivating. Once learners see what’s possible, they’re encouraged to revisit the fundamentals and deepen their understanding. The course resonated strongly with professionals in the built environment, and many have told me it inspired them to continue learning. That was exactly the goal.

LHC: Fast forward five years - AI tools like Copilot and ChatGPT are now widely used, even for writing Python code. How has this changed the learning landscape?

CM: Since 2023, the way people learn data analysis has completely transformed. Today, you can ask large language models (LLMs) to explain every line of code - or even write it for you.

LHC: You are a co-author of the 2023 paper “How good is the advice from ChatGPT for building science? Comparison of four scenarios” [2]. One of your conclusions is that ChatGPT can generate sophisticated research outcomes. How do you see it now, in late 2025?

CM: I often tell students to treat ChatGPT like a smart team member - but one that offers opinions, not guarantees. When coding, the output can contain errors, sometimes even absurd ones.

Sure, students might get through university by copy-pasting from ChatGPT, but in professional practice, engineers are legally and ethically responsible. You must treat AI advice like input from a colleague - valuable, but not faultless.

When we wrote that paper in early 2023, ChatGPT had just launched. Now, organizations are increasingly protective of their digital assets, making it harder for LLMs to access high-quality data. This shift means LLM providers may soon need to pay for access to reliable sources.

LHC: Several versions of ChatGPT have been released. Do you think its capabilities are improving exponentially?

CM: I recently read an article about how LLM-generated content is being recycled online and then used to retrain the models. That’s problematic. These feedback loops can degrade quality over time.

I tell my students: your value lies in your judgment. If a building is too hot, no one’s going to sue OpenAI. There’s a great book called Prediction Machines - still relevant five years on. It explains the crucial difference between prediction and judgment. AI excels at prediction, but judgment is what professionals are paid for.

LHC: Can you give an example of prediction vs. judgment?

CM: A classic example from the book is weather forecasting. AI might predict an 80% chance of rain. But deciding whether to carry an umbrella is a judgment call. Maybe I just styled my hair and want to stay dry, or maybe I’ve just left the gym and don’t care.

This personal nuance is also present in the built environment. AI can’t make those individualized decisions.

LHC: So, you’re saying that AI is just a prediction machine and can’t make judgments?

CM: Exactly. AI struggles with judgment and likely will continue to. Take self-driving cars: billions have been invested over the past decade, and while progress has been made, it’s still fraught with challenges.

Driving isn’t inherently difficult - I got my first license at 14 growing up on a farm in Nebraska. But driving involves judgment. Imagine a giraffe crossing the road. A human would adapt instantly. AI might fail because it’s never seen that scenario before. That’s the crux: judgment is context-dependent and deeply human.

LHC: Since you mention cars: Zoltan Nagy, whom I interviewed in the previous issue (5/2025), used electric cars as an example of data sharing for training AI models. Do you agree?

CM: Absolutely. Self-driving cars are a great example, especially because much of their technology relies on computer vision - arguably the most widespread application of AI today. Back in 2009, researchers at Princeton developed a dataset called ImageNet to address a challenge similar to what we face in the built environment: the need for large, diverse datasets to drive innovation.

They scraped the web for millions of images and labeled them - “cat,” “dog,” “black cat running,” and so on - creating a hierarchical structure of labels. They used Amazon Mechanical Turk to crowdsource this labeling effort, engaging thousands of people globally. This initiative led to machine learning competitions and helped catalyze the rise of deep learning.

Today, computer vision is foundational to autonomous vehicles. So, the question is: can we replicate this approach in the built environment? Can we just scrape the web?

LHC: I imagine it’s not as straightforward, given the volume of images uploaded to the internet by millions of people every day compared to building data.

CM: Exactly. While we do have access to building management systems or energy modeling outputs, the scale is nowhere near comparable. And labeling this data is far more complex. You can’t simply hire crowd workers to tag HVAC components in BIM models - you need domain experts.

This challenge isn’t unique to our field. Medicine and law face similar issues. Even when data exists, unlocking its value requires expert interpretation. So the question becomes: how can we replicate the success of the computer vision community?

LHC: You’re one of the initiators of the Building Data Genome Directory - an open, comprehensive data-sharing platform for building performance research [3]. Is this part of the solution?

CM: Yes, that’s one of our efforts to “unsilo” data - much of which is trapped in proprietary systems. But labeling remains a major hurdle. If a funding agency were to support expert labeling, it would be a game-changer.

The reality is that open data doesn’t generate the same excitement as LLMs. We’re stuck in a chicken-and-egg problem: we need hype to attract funding, but we can’t generate excitement without large datasets.

Interestingly, the medical field has made significant progress. Despite the sensitivity of medical data, companies are investing heavily - not just in acquiring raw data, but also in hiring doctors to label it. This has unlocked commercial opportunities, such as training models on radiology scans for lung cancer detection.

In our field, the question remains: if a company collects and labels building data, will it lead to a commercial breakthrough? Who should fund this? Building owners? But who owns enough buildings to make it viable?

LHC: Well, what about the European Union? They regularly update the EPBD, and the latest version includes the Smart Readiness Indicator (SRI).

CM: The EU is definitely ahead in terms of policy frameworks that could generate useful data. But buildings go through multiple phases - design, construction, operation - and AI applications vary across these stages.

For example, digital fabrication has clear commercial value and is seeing more AI integration. But when it comes to open data, we’re still far from a breakthrough. After five years of organizing competitions, we’ve built datasets, but they’re not yet large enough to catalyze real change.

That’s why we shifted our approach. Instead of relying on building operators for data, we developed Cozie [4], an open-source platform that uses smartwatches to collect micro-survey data (Figure 1). The idea is to crowdsource subjective human feedback on indoor comfort, rather than relying solely on sensor data. You can read about this concept in our paper on humans as sensors [5].

Figure 1. Training/testing data prediction objectives structure for the Cool, Quiet City Competition

 

We organized the city-scale collection of more than 10,000 micro-survey responses from 100 participants across Singapore [6]. This dataset is richer than anything previously available - and we didn’t need permission from building owners. Combined with tools like Street View imagery, we’re now able to analyze buildings and cities in a scalable way. Instead of breaking silos, we’re building new datasets from the ground up.

LHC: This kind of urban informatics seems to represent a new research paradigm - complementing traditional lab and field measurements.

CM: Exactly. I recently discussed this with Stefano Schiavon and Gail Brager from UC Berkeley. They focus more on traditional building physics research, while we’re exploring patterns and context through data-driven methods.

Both approaches are essential. They should inform and strengthen each other.

LHC: To conclude, do you have a message for our readers?

CM: The theme of my career - and my life - is stepping outside your comfort zone. It may sound cliché, but it’s powerful.

Treat AI as a colleague: a smart advisor, not a replacement. Learn how to interact with it. Expand your boundaries, embrace new technologies, and see them as opportunities to grow.

References

[1]     C. Miller and C. Tan, “Data science skills for the built environment: Lessons learned from a massive open online Python course for construction, architecture, and engineering,” 2024, doi: 10.1051/e3sconf/202456206001.

[2]     A. Rysanek, Z. Nagy, C. Miller, and A. Demir Dilsiz, “How good is the advice from ChatGPT for building science? Comparison of four scenarios,” J. Phys. Conf. Ser., vol. 2600, p. 082006, 2023, doi: 10.1088/1742-6596/2600/8/082006.

[3]     X. Jin et al., “The Building Data Genome Directory – An open, comprehensive data sharing platform for building performance research,” J. Phys. Conf. Ser., vol. 2600, no. 3, 2023, doi: 10.1088/1742-6596/2600/3/032003.

[4]     C. Miller et al., “Introducing the Cool, Quiet City Competition: Predicting Smartwatch-Reported Heat and Noise with Digital Twin Metrics,” in Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, 2023, pp. 298–299, doi: 10.1145/3600100.3626269.

[5]     P. Jayathissa, M. Quintana, M. Abdelrahman, and C. Miller, “Humans-as-a-Sensor for Buildings—Intensive Longitudinal Indoor Comfort Models,” Buildings, vol. 10, no. 10, 2020, doi: 10.3390/buildings10100174.

[6]     C. Miller et al., “The Cool, Quiet City machine learning competition: Overview and results,” 2025, doi: 10.13140/RG.2.2.25717.90087.

Pages 62-65
