IPM Text Encoding Part 2: Indexes of people
Posted by: pgooch 11 years, 3 months ago
In Part 1, I gave an overview of the process we are using to automate the structural and semantic XML markup of the IPM calendars. In this post, let's have a look at how we are dealing with the metadata about Person entities mentioned in the calendar entries.
Initially, we attempted to identify family relations directly from the narrative of the inquisition, a typical example being:
the estate passed to John son of Richard heir of Robert son of Eleanor
which was fairly straightforward to process. However, the general case proved too complex, for example:
Richard, late earl of Arundel, father of Richard his father, and his male heirs by Eleanor daughter of Henry of Lancaster, senior, late earl of Lancaster
And this would only have worked on a calendar-by-calendar basis: cross-document coreference of person names is hard, particularly without any training data available!
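To make the contrast between the two examples concrete, here is a minimal sketch of the kind of pattern matching that copes with the simple chain. It is not the code we used; the function name `parse_chain` and the list of relation words are assumptions made up for this illustration.

```python
import re

# Relation words to split on; this list is an assumption for illustration.
RELATION = re.compile(r"\s+(son|daughter|heir|wife|widow) of\s+")

def parse_chain(phrase):
    """Split 'A son of B heir of C' into (person, relation, person) triples."""
    # With one capture group, re.split yields [name, rel, name, rel, name, ...]
    parts = RELATION.split(phrase.strip())
    names, relations = parts[0::2], parts[1::2]
    return [(names[i], relations[i] + "_of", names[i + 1])
            for i in range(len(relations))]

print(parse_chain("John son of Richard heir of Robert son of Eleanor"))
# [('John', 'son_of', 'Richard'), ('Richard', 'heir_of', 'Robert'), ('Robert', 'son_of', 'Eleanor')]
```

Run on the Arundel sentence above, the same function returns a single triple whose first "name" is most of the sentence, which shows why this style of matching does not generalise.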
Instead, we are making use of the hard work done by the volume indexer in disambiguating, de-duplicating and rationalising (where possible) the various references to people and their descendants in each inquisition. Both the structure of each index entry and the levels of indentation used are employed to identify the nature of the relationship between one index entry and another.
Let's look at a concrete example:
Bourchier (de Bourchier, Bourghchier, de Bourgchier, Burghchier), Lord Bourchier, 211
    Lady Bourchier, 640
    Bartholomew, knight, 560
        Elizabeth daughter of, see Robessart
    William, knight, 596, 712-15
    William, knight, 560
        Eleanor daughter of, 560
        Henry son of, 560
        John son of, 560
        Thomas son of, 560
        William son of, 560
Examples of the conventions used across the indexes are as follows:
- Text in parentheses indicates either an alternative spelling of a surname when it follows a surname, or a person's birth name when it follows a given name
- Indentation indicates that the person shares the same surname as the most recent person entry above the current entry that also has a lower level of indentation than the current person
- Family relations ‘son of’, ‘wife of’ etc. indicate a relation with the most recent person entry above the current entry that also has a lower level of indentation than the current person, unless the family relation is immediately followed by another person (e.g. ‘Isabel wife of’ vs ‘Isabel wife of Richard’)
- Where the relation is inverted (e.g. ‘his wife’), the relationship is to the most recently mentioned person in the same entry.
Clearly, the real index is much more complex than this, and the levels of indentation have to be tracked (as they snake in and out!), but this gives you an idea of how the approach works.
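The indentation rules above can be sketched with a simple stack. This is only an illustration under stated assumptions, not our GATE pipeline: the names `parse_index`, `HEADING` and `RELATED`, the two-space indent and the indentation of the sample lines are all made up for the example, and the ‘Isabel wife of Richard’ and ‘his wife’ cases are deliberately left out.

```python
import re

INDENT = 2  # assumption: two spaces per indentation level
# Heading line: surname, optional variants in parentheses, then a comma.
HEADING = re.compile(r"^(?P<surname>[^,(]+?)\s*(?:\((?P<variants>[^)]*)\))?,")
# Sub-entry whose first field ends in a bare relation, e.g. "Henry son of".
RELATED = re.compile(r"^(?P<forename>.+) (?P<rel>son|daughter|wife) of$")

def parse_index(lines):
    """Return one dict per index line, with inherited surname and relations."""
    stack = []    # open entries, each less indented than the next
    entries = []
    for raw in lines:
        text = raw.strip()
        level = (len(raw) - len(raw.lstrip(" "))) // INDENT
        # Discard entries that are not less indented than the current line.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        parent = stack[-1] if stack else None
        entry = {"id": len(entries) + 1, "level": level, "relations": []}
        if parent is None:
            m = HEADING.match(text)
            if m is None:
                raise ValueError("unrecognised heading line: " + text)
            entry["surname"] = m["surname"]
            entry["variants"] = [v.strip() for v in (m["variants"] or "").split(",") if v.strip()]
            entry["forename"] = None
        else:
            # Surname and variants come from the nearest less-indented entry.
            entry["surname"] = parent["surname"]
            entry["variants"] = parent["variants"]
            first = text.split(",")[0]
            m = RELATED.match(first)
            if m:
                entry["forename"] = m["forename"]
                entry["relations"].append((m["rel"] + "_of", parent["id"]))
            else:
                entry["forename"] = first
        entries.append(entry)
        stack.append(entry)
    return entries

SAMPLE = [
    "Bourchier (de Bourchier, Bourghchier), Lord Bourchier, 211",
    "  Bartholomew, knight, 560",
    "    Elizabeth daughter of, see Robessart",
    "  William, knight, 560",
    "    Eleanor daughter of, 560",
    "    Henry son of, 560",
]
for e in parse_index(SAMPLE):
    print(e["id"], e["surname"], e["forename"], e["relations"])
```

Popping the stack until the top entry is less indented than the current line is what lets the indentation move in and out without losing track of which entry a ‘son of’ refers to.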
Here's what a portion of the processed index looks like, as visualised in a custom index-processing pipeline that we developed in GATE:
The metadata generated for the second 'William, knight' entry is shown in the pop-up. The entry has inherited the surname and surname variant spelling information from the parent 'Lord Bourchier' entry. The values of the 'has_daughter' and 'has_son' fields contain, respectively, the internal identifiers of the 'Eleanor', 'Henry', 'John', 'Thomas', 'William' entries that follow. Similarly, these entries will have 'daughter_of' and 'son_of' pointers back to the internal identifier for 'William, knight'.
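The two-way pointers described here can be derived mechanically once one direction is known. The following is a rough sketch rather than our actual implementation; the dictionary layout and the function name `add_inverse_relations` are hypothetical.

```python
# Mapping from a child-side relation to its parent-side counterpart.
INVERSE = {"son_of": "has_son", "daughter_of": "has_daughter"}

def add_inverse_relations(people):
    """people maps id -> {"relations": [(key, target_id), ...]}; edited in place."""
    for pid, person in list(people.items()):
        # Iterate over a copy so appending to other entries is safe.
        for key, target in list(person["relations"]):
            inverse = INVERSE.get(key)
            if inverse and (inverse, pid) not in people[target]["relations"]:
                people[target]["relations"].append((inverse, pid))
    return people

people = {
    344936: {"forename": "William", "relations": []},
    344938: {"forename": "Eleanor", "relations": [("daughter_of", 344936)]},
    344940: {"forename": "Henry", "relations": [("son_of", 344936)]},
}
add_inverse_relations(people)
print(people[344936]["relations"])
# [('has_daughter', 344938), ('has_son', 344940)]
```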
From this representation, an intermediate XML file can be exported from which a topic map can be created:
<Person>
  <ID>344936</ID>
  <Inquisition>560</Inquisition>
  <Gender>male</Gender>
  <IndexEntry>Bourchier, William</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>William</Forename>
  <RoleName>knight</RoleName>
  <Relation key="has_daughter" value="344938"/>
  <Relation key="has_son" value="344944"/>
  <Relation key="has_son" value="344946"/>
  <Relation key="has_son" value="344940"/>
  <Relation key="has_son" value="344942"/>
</Person>
<Person>
  <ID>344938</ID>
  <Inquisition>560</Inquisition>
  <Gender>female</Gender>
  <IndexEntry>Bourchier, Eleanor</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>Eleanor</Forename>
  <Relation key="daughter_of" value="344936"/>
</Person>
<Person>
  <ID>344940</ID>
  <Inquisition>560</Inquisition>
  <Gender>male</Gender>
  <IndexEntry>Bourchier, Henry</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>Henry</Forename>
  <Relation key="son_of" value="344936"/>
</Person>
<Person>
  <ID>344942</ID>
  <Inquisition>560</Inquisition>
  <Gender>male</Gender>
  <IndexEntry>Bourchier, John</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>John</Forename>
  <Relation key="son_of" value="344936"/>
</Person>
<Person>
  <ID>344944</ID>
  <Inquisition>560</Inquisition>
  <Gender>male</Gender>
  <IndexEntry>Bourchier, Thomas</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>Thomas</Forename>
  <Relation key="son_of" value="344936"/>
</Person>
<Person>
  <ID>344946</ID>
  <Inquisition>560</Inquisition>
  <Gender>male</Gender>
  <IndexEntry>Bourchier, William</IndexEntry>
  <Surname>Bourchier</Surname>
  <SurnameVariant>De Bourchier</SurnameVariant>
  <SurnameVariant>Bourghchier</SurnameVariant>
  <SurnameVariant>De Bourgchier</SurnameVariant>
  <SurnameVariant>Burghchier</SurnameVariant>
  <Forename>William</Forename>
  <Relation key="son_of" value="344936"/>
</Person>
This XML is then checked and corrected by our research team before being transformed into an XML topic map representation that can be imported directly into the EATS system.
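For readers curious about the export step, a record like the ones above could be produced along the following lines. This is a sketch only, using Python's standard library rather than our GATE-based exporter: the element names are taken from the listing, but the input dictionary and the function name `person_to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

def person_to_xml(person):
    """Build a <Person> element from a plain dict (hypothetical structure)."""
    elem = ET.Element("Person")
    # Single-valued fields, in the order used in the listing.
    for tag, key in [("ID", "id"), ("Inquisition", "inquisition"),
                     ("Gender", "gender"), ("IndexEntry", "index_entry"),
                     ("Surname", "surname")]:
        ET.SubElement(elem, tag).text = str(person[key])
    for variant in person.get("variants", []):
        ET.SubElement(elem, "SurnameVariant").text = variant
    ET.SubElement(elem, "Forename").text = person["forename"]
    if person.get("role"):
        ET.SubElement(elem, "RoleName").text = person["role"]
    # One <Relation> per pointer to another entry's internal identifier.
    for key, target in person.get("relations", []):
        ET.SubElement(elem, "Relation", key=key, value=str(target))
    return elem

eleanor = {
    "id": 344938, "inquisition": 560, "gender": "female",
    "index_entry": "Bourchier, Eleanor", "surname": "Bourchier",
    "variants": ["De Bourchier", "Bourghchier"], "forename": "Eleanor",
    "relations": [("daughter_of", 344936)],
}
elem = person_to_xml(eleanor)
print(ET.tostring(elem, encoding="unicode"))
```

Keeping each relation as a key/value pair pointing at an identifier, rather than nesting one person inside another, means each record can be checked and corrected on its own.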