Is there a way to include Psalm superscriptions in a UsfmFileTextCorpus that uses English versification? #101

Open
robertsonbrinker opened this issue Mar 15, 2024 · 5 comments

Comments

@robertsonbrinker

When I load an English Bible into a UsfmFileTextCorpus and then look at the Psalms, I notice that the superscriptions are not included in the corpus. For example, using the Berean Standard Bible and versification from Berean Bible: Free Downloads, when printing out Psalm 3 there is no superscription.

from machine.corpora import UsfmFileTextCorpus
from machine.scripture import Versification

# Load the versification file that ships with the BSB USX release.
targetVersification = Versification.load("./resources/bsb_usx/release/versification.vrs", fallback_name="web")
corpus = UsfmFileTextCorpus("./resources/bsb_usfm", versification=targetVersification, include_markers=True)

# Print only the rows for Psalm 3 (book 019, chapter 003).
for row in corpus:
    if row.ref.bbbcccvvvs[:6] == "019003":
        print(row)

What printed:
PSA 3:1 - O LORD, how my foes have increased! How many rise up against me!
PSA 3:2 - Many say of me, “God will not deliver him.” Selah
PSA 3:3 - But You, O LORD, are a shield around me, my glory, and the One who lifts my head.
PSA 3:4 - To the LORD I cry aloud, and He answers me from His holy mountain. Selah
PSA 3:5 - I lie down and sleep; I wake again, for the LORD sustains me.
PSA 3:6 - I will not fear the myriads set against me on every side.
PSA 3:7 - Arise, O LORD! Save me, O my God! Strike all my enemies on the jaw; break the teeth of the wicked.
PSA 3:8 - Salvation belongs to the LORD; may Your blessing be on Your people. Selah

Is there a way to access or include Psalm superscriptions in a corpus that uses English versification when using machine?

@johnml1135
Collaborator

Here is the USFM from the link provided:

\c 3
\s1 Deliver Me, O LORD!
\r (2 Samuel 15:13–29)
\b
\d A Psalm of David, when he fled from his son Absalom. 
\b
\q1 
\v 1 O LORD, how my foes have increased! 
\q2 How many rise up against me! 
\q1 
\v 2 Many say of me, 
\q2 “God will not deliver him.” 
\qr Selah \f + \fr 3:2 \ft Selah or Interlude is probably a musical or literary term; here and throughout the Psalms.\f* 
\q1 
\v 3 But You, O LORD, are a shield around me, 
\q2 my glory, and the One who lifts my head. 
\q1 
\v 4 To the LORD I cry aloud, 
\q2 and He answers me from His holy mountain. 
\qr Selah 
\q1 
\v 5 I lie down and sleep; 
\q2 I wake again, for the LORD sustains me. 
\q1 
\v 6 I will not fear the myriads 
\q2 set against me on every side. 
\b
\q1 
\v 7 Arise, O LORD! 
\q2 Save me, O my God! 
\q1 Strike all my enemies on the jaw; 
\q2 break the teeth of the wicked. 
\q1 
\v 8 Salvation belongs to the LORD; 
\q2 may Your blessing be on Your people. 
\qr Selah 

As per the USFM documentation, superscript text is marked with \sup, which does not appear in this text. Are you referring to footnotes or something else? Currently only the primary verse text is extracted; everything else (headers, footnotes, etc.) is discarded when the USFM is processed.

@robertsonbrinker
Author

Superscription might be the wrong word. I'm referring to the text that precedes the main body of a Psalm, which is not assigned a verse number in the English versification but is in some other schemes.

For example:

  • For Psalm 3, "A Psalm of David, when he fled from his son Absalom."
  • For Psalm 51, "For the director of music. A psalm of David. When the prophet Nathan came to him after David had committed adultery with Bathsheba."
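
To make the numbering difference concrete, here is a minimal sketch of how verse numbers shift once the superscription counts as one or more verses, as it does in the original (Hebrew) numbering. The offsets table and the function name `english_to_original` are hypothetical and hand-entered for these two Psalms only; a real .vrs file is the authoritative source for such mappings.

```python
from typing import Dict, Tuple

# Hypothetical, hand-entered table: how many verses the superscription
# occupies in the original numbering. Psalms not listed are assumed to
# have no numbered superscription.
SUPERSCRIPTION_VERSES: Dict[int, int] = {3: 1, 51: 2}


def english_to_original(psalm: int, verse: int) -> Tuple[int, int]:
    """Shift an English verse number past the numbered superscription."""
    return psalm, verse + SUPERSCRIPTION_VERSES.get(psalm, 0)


print(english_to_original(3, 1))   # English PSA 3:1
print(english_to_original(51, 1))  # English PSA 51:1
```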

In many USFM or USX files, this kind of text is carried by a marker other than \v, such as \d. When using machine to parse a USFM/USX file, is there any way to access this text without changing the USFM files? It sounds like there may not be.

@johnml1135
Collaborator

Yes, we call those "non-Scripture portions" and have an issue open in SILNLP here. We are planning to implement this over the next few months (no promised timeline). We (that is, @ddaspit) still need to work out exactly how to assign unique labels for referencing the text before and around Scripture.

@ddaspit
Contributor

ddaspit commented Mar 20, 2024

As @johnml1135 said, we don't currently support extracting non-verse text using the corpus classes in Machine. We are planning on adding support for this. In the meantime, Machine does provide an underlying USFM parser that you could use to implement custom text extraction. Here is a short tutorial on how to use the parser classes.
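
As a stopgap that does not rely on Machine at all, the \d lines can also be read straight from the USFM text. The sketch below is an assumption-laden illustration, not Machine's parser API: the helper name `extract_descriptive_titles` is hypothetical, and it assumes each \c and \d marker starts its own line, as in the Psalm 3 excerpt quoted earlier in this thread.

```python
import re
from typing import Dict

# Hypothetical helper, not part of Machine. Assumes one marker per line.
CHAPTER_RE = re.compile(r"^\\c\s+(\d+)")     # chapter marker, e.g. "\c 3"
TITLE_RE = re.compile(r"^\\d\s+(.*)")        # descriptive title paragraph
FOOTNOTE_RE = re.compile(r"\\f\s.*?\\f\*")   # inline footnote, "\f ... \f*"


def extract_descriptive_titles(usfm: str) -> Dict[int, str]:
    """Map each chapter number to the text of its \\d paragraph(s)."""
    titles: Dict[int, str] = {}
    chapter = 0
    for raw_line in usfm.splitlines():
        line = raw_line.strip()
        chapter_match = CHAPTER_RE.match(line)
        if chapter_match:
            chapter = int(chapter_match.group(1))
            continue
        title_match = TITLE_RE.match(line)
        if title_match:
            # Drop any inline footnotes, then join multiple \d lines.
            text = FOOTNOTE_RE.sub("", title_match.group(1)).strip()
            titles[chapter] = (titles.get(chapter, "") + " " + text).strip()
    return titles


sample = r"""\c 3
\s1 Deliver Me, O LORD!
\r (2 Samuel 15:13–29)
\b
\d A Psalm of David, when he fled from his son Absalom.
\b
\q1
\v 1 O LORD, how my foes have increased!
"""

print(extract_descriptive_titles(sample))
```

Run against the text of a Psalms USFM file, this yields one entry per chapter that has a \d paragraph. It ignores character styles and other inline markup, so a real parser is still the better choice for anything beyond a quick lookup.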

@robertsonbrinker
Author

Sounds good. Thank you for the example of how to customize the parser; this is helpful.


3 participants