Seminar: Miriam Hurtado Bodell, Linköping University

EVENT

Date: 19 May 2021, 1:00 PM - 2:00 PM
Venue: Online

From Documents to Data: a Framework for Total Corpus Quality

When: 19 May 2021, 13:00-14:00
Where: This seminar is given online. E-mail Dan Hedlin if you want to attend.

Abstract

As digitized large-scale textual corpora and novel methodologies become increasingly available, researchers are rediscovering the potential of textual sources for inquiries into social and cultural phenomena. Yet while textual corpora show great promise to enrich our knowledge of the social, empirical research faces the challenge of avoiding “garbage in, garbage out” problems: our scientific inferences are only as good as the quality of the data we analyze. This paper argues that evaluating the quality of a processed machine-readable corpus is pivotal for later social science inquiries. The paper proposes a framework of total corpus quality, which identifies three dimensions that affect the potential of large corpora for research. Our conceptual framework helps to diagnose and understand errors in studies based on large-scale textual analyses.

Last updated: March 25, 2021
Page editor: Richard Hager
Source: Department of Statistics

Contacts

Visiting address
Statistiska institutionen
Universitetsvägen 10B, plan 7
Stockholm

Postal address
Statistiska institutionen
Stockholms universitet
SE - 106 91 STOCKHOLM

Fax 08 - 16 7511

Opening hours

The Department is open Monday - Friday, 7:50 AM – 5:05 PM throughout the academic year.

We belong to the Faculty of Social Sciences
