Investing in Data Quality for High-Impact Entrepreneurship Research

Authors:
R. Markku Maula, Aalto University
Tomasz Mickiewicz, Aston University
Silvio Vismara, University of Bergamo & IMT Lucca
Johan Wiklund, Whitman School of Management, Syracuse University

 

Journal:
Entrepreneurship Theory and Practice (forthcoming)

 

Summary:
High-impact entrepreneurship research depends on data quality, and the 5I framework—Invest, Integrate, Innovate, Incentivize, Impact—provides researchers, reviewers, editors, and institutions with practical guidance for navigating today's shifting data landscape and building unique, rigorous datasets.

 

Research Questions:
1. What are the key data quality challenges facing entrepreneurship research today, including emerging threats such as AI-generated fake survey responses?
2. How can researchers escape familiar data trade-offs by investing in richer designs, integrating across sources and levels, and innovating with new methods and multimodal data?
3. What collective, institutional, and open-science approaches can incentivize higher data quality and maximize the credible impact of entrepreneurship research?

 

What we know:
Entrepreneurship research faces persistent data quality challenges, including declining survey response rates, convenience samples, and measurement limitations in secondary databases. It also faces a new threat: AI language models can now generate fake survey responses that pass all plausibility checks, fundamentally undermining survey research. At the same time, multimodal digital data, geolocation-linked datasets, and platform-based behavioral traces create unprecedented opportunities. Scholars, journal editors, institutions, and policymakers all have a stake in ensuring that entrepreneurship research rests on credible data.

 

Novel Findings:
The paper introduces the 5I framework as a new organizing structure for data quality, covering Invest (building fit-for-purpose datasets), Integrate (linking levels and sources to reduce fragmentation), Innovate (leveraging new methods and multimodal data), Incentivize (sharing costs and building collective infrastructure), and Impact (maximizing transparency and reproducibility). It provides the first systematic treatment of LLM-generated fake survey responses as a threat to entrepreneurship research, and introduces federated learning as an emerging privacy-preserving solution for collaborative data analysis.

 

Novel Methodology:
The editorial synthesizes a broad range of innovative data approaches—geolocation-linked datasets, platform-based digital traces, AI-augmented large-scale text analysis, synthetic agents, longitudinal experimental designs, and federated learning—and evaluates their applicability and limitations for entrepreneurship research questions.

 

Implications for Practice:
Researchers should treat data collection as a foundational intellectual investment, not a preliminary step. Investing in multi-wave designs, building partnerships with organizations that provide unique data access, and adopting transparent documentation workflows lead to both stronger publications and faster review processes.

 

Implications for Policy:
Evidence-based entrepreneurship policy depends on sustained investment in high-quality data infrastructure. Policymakers should enable access to administrative data under appropriate privacy protections, support long-term data collection efforts, and establish stable and transparent terms for platform data access to protect scientific independence.

 

Implications for Society:
High-quality, responsibly managed data produce findings that better inform decisions affecting entrepreneurs, workers, and communities. Collective data infrastructure and open-science practices can ensure that high-quality empirical entrepreneurship research is not concentrated in well-resourced institutions but is available to scholars globally.

 

Implications for Research:
The 5I framework equips authors, reviewers, editors, and institutions with actionable, role-specific guidance for improving data quality. The editorial calls for stronger AI disclosure standards, higher reproducibility requirements, and greater investment in collaborative data infrastructure—and positions federated learning, multimodal data analysis, and field experiments as priority methodological frontiers.

 

Full Citation:
Maula, M., Mickiewicz, T., Vismara, S., and Wiklund, J. (forthcoming). Investing in data quality for high-impact entrepreneurship research. Entrepreneurship Theory and Practice.

 

Abstract:
High-impact entrepreneurship research stands or falls with data quality. Yet research design and data collection choices often force researchers into trade-offs among relevance, validity, and replicability. Reliance on existing databases constrains the questions we can study, while primary data collection to address new questions often struggles to deliver high-quality, large, and representative samples. Increasingly, the most tangible contributions come from unique, high-quality data that answer novel, important questions. We present a 5I framework (Invest, Integrate, Innovate, Incentivize, Impact), offering guidance for authors, reviewers, and editors to navigate these trade-offs and build unique datasets that enable relevant, valid, and replicable research.

