
CoNSSA: Corpus of Novels of the Spanish Silver Age

CoNSSA, Text+ and TextGrid Repository

This corpus was previously published on GitHub and Zenodo (DOI).

As part of the activities of the Text+ consortium within the German National Research Data Infrastructure (NFDI), this new version of the corpus is now available in TextGrid Repository.

In contrast to the version published on GitHub and Zenodo, this new version (2.0.0) also contains:

  1. An improved modeling of works, editions and texts in the TEI Header, following the FRBR model
  2. Data from further editions exported from the K10plus catalog
  3. A description of each work using library classification systems such as the Regensburger Verbundklassifikation (RVK), the Basic Classification (or Basisklassifikation, BK), and the Göttinger Online-Klassifikation (GOK). In this way, we apply to research data the same classification systems that are used to describe primary and secondary literature in library catalogs
  4. References for works and authors to Wikidata, VIAF, and the authority files of the German-speaking area (GND) and of the Spanish National Library (BNE)

Why publish this corpus in TextGrid Repository if it was already available in GitHub and Zenodo?

Some of the reasons, in brief:

  1. Persistent identifiers for each document
  2. Repository awarded with the CoreTrustSeal
  3. Repository for XML TEI with specific functions (transformation to HTML or plain text, creation of Table of Contents for each text)
  4. Search functions (see next section)
  5. Filtering functions through metadata
  6. Links to GND
  7. Combination with further corpora published in TextGrid Repository
  8. Download options (Shelf)
  9. User-friendly analysis through tools such as Voyant Tools
  10. Automatic annotation with tools from the CLARIN Switchboard
  11. Options for manual annotation
  12. Integration of the corpus in future developments
  13. Further visibility and harvesting options through other portals (re3data, OpenAIRE, CLARIN Virtual Language Observatory)

Searching in TextGrid Repository

The following searches are possible in TextGrid Repository:

  • Search for words:
    • Madrid
    • dictador
  • Further options for searches are available:
    • Españ*
    • mujeres~, hombres~
  • Search for authors (with complete name, partial name or GND-ID):
    • work.agent.value: Benito Pérez Galdós
    • work.agent.value: Galdós
    • work.agent.id:"gnd:118641573"
  • Search for gender:
    • work.subject.id.value: authorGender AND work.subject.value: female
  • Search for year of publication:
    • published in: work.dateOfCreation.value:1900
    • published after: work.dateOfCreation.value:>1901
    • published before: work.dateOfCreation.value:<1901
    • published between: work.dateOfCreation.value:>1900 work.dateOfCreation.value:<1910

These searches can be combined into fairly complex queries that draw on information about the author, the edition and the text. For example, the following query should find all texts written by women and published between 1890 and 1900 in which the root Españ appears:

  • work.subject.id.value: authorGender AND work.subject.value: female AND work.dateOfCreation.value:>1890 work.dateOfCreation.value:<1900 AND Españ*
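Combined queries like the one above can also be assembled programmatically. The following is a minimal sketch in Python that only builds the query string from the field names shown in this section. The helper names (`gender_clause`, `year_range_clause`, `build_query`) are hypothetical and not part of TextGrid Repository, and the percent-encoding step is shown on the assumption that the query is to be placed in a URL.

```python
from urllib.parse import quote


def gender_clause(gender: str) -> str:
    """Clause filtering by author gender (hypothetical helper)."""
    return f"work.subject.id.value: authorGender AND work.subject.value: {gender}"


def year_range_clause(after: int, before: int) -> str:
    """Clause for works created after `after` and before `before`.

    Mirrors the 'published between' pattern from the list above,
    where the two conditions are separated by a space.
    """
    return f"work.dateOfCreation.value:>{after} work.dateOfCreation.value:<{before}"


def build_query(*clauses: str) -> str:
    """Join individual clauses with AND (hypothetical helper)."""
    return " AND ".join(clauses)


query = build_query(
    gender_clause("female"),
    year_range_clause(1890, 1900),
    "Españ*",  # wildcard search on the root Españ
)
print(query)

# Percent-encode the query in case it is passed as a URL parameter.
encoded_query = quote(query)
print(encoded_query)
```

Keeping each clause in its own function makes it easy to swap the gender value, the year range or the search term without retyping the field names.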

Description of the corpus

A full description of the corpus can be found online in chapters 3.1 and 3.2 of the following publication (Open Access):

In addition, an article written in Spanish about the main characteristics of the corpus is available online (Open Access):

History of the corpus

The corpus was compiled as part of the PhD of José Calvo Tello at the University of Würzburg (Germany). It was part of the project Computational Literary Genre Stylistics (CLiGS), led by Prof. Dr. Christof Schöch. The project was located at the Professorship of Prof. Dr. Fotis Jannidis.

The goal of the project was to analyze the Spanish novel and its subgenres (adventure, erotic, realistic novel, etc.) in the so-called Silver Age period (1880-1939).

Current version

Because of the changes mentioned above, the corpus is now at version 2.0.0. Implementing the FRBR model in TEI moved much of the metadata to a different location in the TEI Header, so the XPaths used to extract this information must be updated. Following Semantic Versioning, this incompatibility with the previous XPaths requires a new major version, so the major version number changes from 1 to 2.
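To illustrate why relocated metadata breaks existing extraction scripts, here is a minimal sketch in Python using the standard library. The TEI fragment and both paths are hypothetical examples and do not reproduce the actual CoNSSA header or its previous layout. Only the TEI namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# The TEI namespace must be declared for path lookups to match.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI document (not the CoNSSA header).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example title</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_text(root, path):
    """Return the text of the first element matching `path`, or None."""
    node = root.find(path, TEI_NS)
    return node.text if node is not None else None


root = ET.fromstring(SAMPLE)

# Hypothetical path that does not match the structure above.
stale_path = "tei:teiHeader/tei:fileDesc/tei:title"
# Path that matches where the element actually sits in the sample.
current_path = "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"

print(extract_text(root, stale_path))
print(extract_text(root, current_path))
```

A path that no longer matches returns nothing rather than raising an error, so scripts written against version 1 should be checked against the new header rather than assumed to work.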


Citation Suggestion for this Object
TextGrid Repository (2022). README.md. CoNSSA: Corpus of Novels of the Spanish Silver Age (version 2.0.0). https://hdl.handle.net/21.T11991/0000-001C-30B9-B