Discover TEI-encoded documents from GitHub public repositories.
Last indexed | Repository | Description | Languages | Matching files |
---|---|---|---|---|
30 May 2020 05:30 UTC | Tech-Leaderboard/nips_scraper | Scrape from https://papers.nips.cc/ | eng, som, spa, sqi, dan, por | 7234 |
30 May 2020 05:30 UTC | MorielV/Digital-Humanities---Ass2 | parsing song lyrics in python. | heb | 7202 |
30 May 2020 05:30 UTC | ananana/scientific_authorship_data | - | eng | 7195 |
30 May 2020 05:30 UTC | providedh/ACDH_Salzburg_recipes | Parser for the XML Recipes | deu | 7038 |
30 May 2020 05:30 UTC | giladax/digi-proj-GUI | - | heb | 6854 |
30 May 2020 05:30 UTC | srophe/persons | Public Repository for Syriaca persons projects, including authorities, hagiography, and prosopography | syr, eng, ara, fra, deu, lat, rus, ita, ell, grc | 6711 |
30 May 2020 05:30 UTC | uvalib/ead-utils | Tools used to process and ingest EAD xml finding aids into the repository and solr. | eng, spa, fra | 6687 |
25 Feb 2023 09:45 UTC | pruizf/disco | Diachronic Spanish Sonnet Corpus. Canonical and minor authors in Spanish (Europe, America and Asia): 15th to 19th century | spa | 6616 |
30 May 2020 05:30 UTC | centre-for-humanities-computing/grundtvig-data | Data repository for all data related to the grundtvig center | dan | 6551 |
09 Feb 2023 08:49 UTC | telota/jean_paul_briefe | Data of the digital edition "Jean Paul – Sämtliche Briefe digital" | - | 6437 |
11 Mar 2023 13:02 UTC | christopheparisse/evalang | Shared data for the Evalang project | fra, eng | 6403 |
11 Feb 2023 04:49 UTC | BetaMasaheft/Places | Any place mentioned in the catalogue | eng, orm, ara, gez, amh, lat, oro, fra, som, grc, deu, heb, ita, swa, tir, rus, spa, swe, roh, nor | 6309 |
30 May 2020 05:30 UTC | OpenGreekAndLatin/Teubner1-grc-dev | Raw OCR of out-of-copyright Teubner editions | - | 6089 |
30 May 2020 05:30 UTC | EAGLE-BPN/Inscriptions-from-Dacia | draft website for inscriptions contributed to EAGLE from UBB Cluj Napoca | ara, eng, fra, deu, grc, ell, heb, ita, lat | 5904 |
30 May 2020 05:30 UTC | MariaBarrett/Datamining-project | - | eng | 5524 |
30 May 2020 05:30 UTC | utda/text | - | - | 5513 |
21 Mar 2023 20:45 UTC | ISicily/ISicily | EpiDoc files for the I.Sicily project | eng, ita, grc, lat, heb, phn, xpu, osc, xly, scx, sxc | 5431 |
03 Apr 2023 02:51 UTC | peterwebster/henson | Master data store for the Hensley Henson Journals project, and issue tracker. The application code is kept elsewhere. | - | 5349 |
30 May 2020 05:30 UTC | blumenbach/blumenbach-tei | Blumenbach TEI database | deu, eng, fra, nld, dan, ita, rus | 5318 |
30 May 2020 05:30 UTC | sros-UNED/disco | Diachronic Spanish Sonnet Corpus. Canonical and minor authors in Spanish (Europe and America): 15th to 19th century | - | 5289 |
30 May 2020 05:30 UTC | OpenGreekAndLatin/Teubner2-grc-dev | - | - | 5236 |
14 Mar 2021 20:38 UTC | svakulenk0/uva-lsdp-course | Language, Speech and Dialogue Processing course @ University of Amsterdam 2021 | nld, eng | 5198 |
30 May 2020 05:30 UTC | sims-mss/openn-xml | TEI-XML files from OPenn | - | 5174 |
27 Jan 2023 11:45 UTC | bncolorado/CorpusSonetosSigloDeOro | Corpus of Spanish Golden-Age Sonnets (with metrical annotation) / Corpus de Sonetos del Siglo de Oro (con anotación métrica) | - | 5078 |
30 May 2020 05:30 UTC | ldkhanh/CSCI-544-ClassProject | Spanish Poetry Generation using Recurrent Neural Networks | - | 5077 |
30 May 2020 05:30 UTC | stevenly/pix2poem | Spanish Poetry Generation using Recurrent Neural Networks | - | 5077 |
30 Mar 2023 09:47 UTC | BetaMasaheft/Works | Ethiopian Literature edited in TEI | eng, gez, ara, lat, grc, amh, syr, kat, pal, ita, heb, cop, tir, deu, fra | 5022 |
05 Apr 2023 15:48 UTC | cbeta-git/xml-p5a | CBETA XML P5a version | eng, zho, pli, san, x-unknown | 4869 |
20 Feb 2023 20:44 UTC | cbeta-org/xml-p5 | CBETA XML P5 version | eng, zho, pli, san, x-unknown | 4866 |
15 Jun 2022 11:42 UTC | DILA-edu/word-segment | - | - | 4830 |
30 May 2020 05:30 UTC | thsh77/textbase | A collection of markdown texts | - | 4801 |
30 May 2020 05:30 UTC | OpenGreekAndLatin/septuagint-dev | Machine-corrected version of Henry Barclay Swete's Septuagint. | ell, lat | 4736 |
11 Apr 2023 10:46 UTC | 84000/data-tei | TEI files of the translations | san, bod, zho, eng, pli, lat, jpn | 4719 |
30 May 2020 05:30 UTC | Chenlisk/xml-p5a-new | New version of xml-p5a converted for "CB校註" | eng, zho, san, pli, x-unknown | 4718 |
30 May 2020 05:30 UTC | cbeta-git/xml-p5a-2018 | CBETA XML P5a version (2013 - 2018) | eng, zho, san, pli, x-unknown | 4717 |
30 May 2020 05:30 UTC | cbeta-org/xml-p5-2018 | CBETA XML P5 version (2013 - 2018) | eng, zho, san, pli, x-unknown | 4717 |
29 Mar 2023 11:45 UTC | DARIAH-ERIC/lexicalresources | Data space of the DARIAH Lexical Resources Working Group | por, lat, fra, spa, eng, deu, bar, mix, gsw, swe, kat, nds, nor, slv, gmh, mig, miy, miz, smd, pit, pie, ine, ell, pol, bul, ibe, rus, ara, und, dan, chu, grc | 4706 |
09 Nov 2022 17:49 UTC | whitmanarchive/whitman-correspondence | Data Repo \| Whitman Correspondence TEI | - | 4604 |
27 Dec 2021 20:40 UTC | newtfire/newtfire-site | - | eng, ita, fra, lat | 4507 |
30 May 2020 05:30 UTC | 84000/data | 84000 XML data files | bod, san, zho, pli, eng, lat, jpn | 4442 |
30 May 2020 05:30 UTC | heavenchou/xml_p4 | CBETA XML P4 | pli, san, eng, zho | 4429 |
15 Dec 2022 19:44 UTC | Antonomaz/Corpus | Collection of mazarinades encoded in XML-TEI. | fra | 4422 |
30 May 2020 05:30 UTC | utkdigitalinitiatives/tdh-migration | TEI migration from P2 SGML/XML to P5. | - | 4403 |
30 May 2020 05:30 UTC | cbeta-git/xml_p4 | CBETA XML P4 | pli, san, eng, zho | 4399 |
30 May 2020 05:30 UTC | JamesWolfe753/Patrologia-Latina-Corrected | - | grc, lat, heb | 4241 |
20 Sep 2020 16:32 UTC | pminhtam/entity-fishing-custom | - | eng, fra, deu, spa, ita, pol | 4239 |
30 May 2020 05:30 UTC | grasshoff/vorlesung2019 | - | eng, fra, dan, ita, spa, ces | 4202 |
30 May 2020 05:30 UTC | martinmueller39/TCP2ESTC | Experimental relabeling of TCP texts by decade with alignment to ESTC, including four decades at 40-year intervals | eng, unk | 4139 |
30 May 2020 05:30 UTC | WesScivetti/Phonesthemes-Project | - | eng | 4052 |
24 Nov 2021 01:39 UTC | antonkarl/iceErrorCorpus | An Icelandic Error corpus, annotated for mistakes related to spelling, grammar, and other issues. | - | 4046 |