The article focuses on the interoperability of research data in the "Slovenian Linguistic Atlas" (SLA) and VerbaAlpina and compares them with other dialectological and geolinguistic projects, namely the "Slavic Linguistic Atlas" (OLA, <https://www.slavatlas.org/>) and the "European Linguistic Atlas" (ALE).
1. Introduction
Slovenian dialects are part of many international linguistic and interdisciplinary projects, such as VerbaAlpina, the "Slavic Linguistic Atlas" (OLA) and the "European Linguistic Atlas" (ALE). Because data collection and recording in these projects were not coordinated from the outset, problems may arise when the data are interpreted and presented in other joint projects.
Interoperability between the Slovenian dialect data collected for the "Slovenian Linguistic Atlas" (SLA) and other regional, national and international dialect projects covering Slovenian dialects could be achieved through:
- a common transcription system (national transcription systems should be translated/reorganized into IPA),
- a common database for long-term preservation of data, accessible to all project partners willing to contribute their material, maps and cartography,
- common e-maps, accessible to all project partners,
- common guidelines for the presentation of dialect data on the maps (symbols, words, lines, polygons),
- common programming tools for the mapping of dialect material.
2. Transcription
2.1. Transcription system in SLA
The "Slovenian Linguistic Atlas" (SLA) is a long-term Slovenian dialectological and geolinguistic project launched in 1934. The first two volumes were published in 2011 (SLA 1) and 2016 (SLA 2); the third volume (agricultural tasks and tools, alpine pasture, etc.) will be published in 2023. SLA 1 and SLA 2 have also been published electronically: in HTML format on the dictionary portal of the Fran Ramovš Institute of the Slovenian Language (<https://fran.si/iskanje?FilteredDictionaryIds=204&View=1&Query=%2A>) and as PDF files (<https://doi.org/10.3986/9789612543570> for SLA 1 and <https://doi.org/10.3986/9789612548797> for SLA 2). The PDF versions are also available on the website of the Dialectology Section of the ISJFR.
Several methods of transcription have been used over the decades during which dialect material has been collected for the SLA. The most recent changes sought to harmonise the Slovenian national phonetic transcription (based on the Slovenian standard orthographic transcription) with the one used in the "Slavic Linguistic Atlas" (OLA) (cf. K. Kenda-Jež in (Škofic, et al. 2016), pp. 27–31, <https://omp.zrc-sazu.si/zalozba/catalog/view/1121/4763/1628>). 1
(Special) characters for vowels in SLA:
(Special) characters for sonorants in SLA:
(Special) characters for obstruents in SLA:
"Since the SLA recordings did not consistently follow this agreement and the new transcription system only gradually came into force, it was not possible to harmonise the recordings of all material without a time-consuming re-examination on site." ((Škofic, et al. 2016b), p. 454) Besides inconsistencies in the recording of vowel quality and quantity, diphthongs and consonant variants, the recording of accentuation/intonation was the biggest problem. Despite these shortcomings, it was decided to publish the dialect material in its originally transcribed form in order to avoid errors in the transliteration process. A retroactive harmonisation of the recorded dialect data would, moreover, require experimental phonetic studies of the Slovenian dialects. The lack of research in this area is also one of the reasons why transcription in the International Phonetic Alphabet (IPA) has not yet been introduced in Slovenian dialectology.
A comparison between Slovenian (SLA), Slavic (OLA) and IPA transcription reveals some significant differences in the recording of phonetic variants of Slovenian dialects, e.g.:
| SLA | OLA | IPA |
| --- | --- | --- |
| Close vowels | | |
| i | i | i |
| ü | | y |
| ɨ | ɪ | |
| y | | |
| u | u | u |
| | ɨ | |
| | ʉ | |
| ů | | |
| ẹ | ẹ/e | e |
| ḙ | e | |
| Accent and quantity | | |
| | : | ː |
| | : | ː |
| Ṽ | | |
| / | V: | Vː |
| Vˑ | V | Vˑ |
| | V | V |
| Velars | | |
| k | k | k |
| k’ | k’ | kʲ |
| g | g | ɡ |
| g’ | g’ | ɡʲ |
| x | x | x |
| x’ | x’ | xʲ |
| γ | γ | ɣ |
| ǥ | ǥ | |
| ǵ | ǵ | |
| γx | | |
| h | ĥ/h | ħ/ʕ |
| h | | |
| q | ʔ | ʔ |
| ʔ | ʔ | ʔ |

Comparison between SLA, OLA and IPA phonetic transcriptions
It is therefore difficult to convert Slovenian dialect material into IPA without some simplification and without further experimental phonetic studies, e.g. SLA prnca (quilted blanket) (ḙ is a neutral front/anterior vowel between close-mid and open-mid, is closed, anterior, tense, clear vowel i) ≠ IPA péːrnica, SLA cìẹǥu (brick) (ǥ is a fricative/spirantized g) ≠ IPA cìegu, SLA špa (glass) ( is an extra short ascending accent on vowel i) ≠ IPA ˈʃipa, etc.
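Why conversion cannot be fully automatic can be illustrated with a minimal sketch of a character-level SLA-to-IPA lookup. The mapping below is an assumption built only from the velar rows of the comparison table above (IPA palatalization written with superscript ʲ), and the function name `sla_to_ipa` is hypothetical. Rather than guessing, it copies SLA symbols for which the table gives no IPA value unchanged and reports them, which is exactly where simplification or further phonetic research would be needed.

```python
# Hypothetical SLA -> IPA lookup, limited to the velar rows of the
# comparison table; None marks SLA symbols with no IPA counterpart there.
SLA_TO_IPA = {
    "k": "k",
    "k’": "kʲ",   # palatalized k (SLA/OLA mark it with a right single quote)
    "g": "ɡ",     # IPA uses the single-storey g (U+0261)
    "g’": "ɡʲ",
    "x": "x",
    "x’": "xʲ",
    "γ": "ɣ",
    "ǥ": None,    # fricative/spirantized g: IPA cell empty in the table
    "ǵ": None,
    "γx": None,
    "q": "ʔ",
    "ʔ": "ʔ",
}


def sla_to_ipa(word: str) -> tuple[str, list[str]]:
    """Transliterate greedily, trying the longest SLA symbol first.

    Characters outside the table are copied unchanged. Symbols that the
    table explicitly leaves without an IPA value are also copied, but
    collected in a second list so that a phonetician can resolve them.
    """
    longest = max(len(key) for key in SLA_TO_IPA)
    out: list[str] = []
    unresolved: list[str] = []
    i = 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in SLA_TO_IPA:
                ipa = SLA_TO_IPA[chunk]
                if ipa is None:
                    unresolved.append(chunk)
                    out.append(chunk)
                else:
                    out.append(ipa)
                i += len(chunk)
                break
        else:  # no table entry starts here: copy one character
            out.append(word[i])
            i += 1
    return "".join(out), unresolved


print(sla_to_ipa("cìẹǥu"))
```

For cìẹǥu ('brick') the sketch returns the form unchanged and flags ǥ, mirroring the mismatch with IPA cìegu noted above. A production converter would also need rules for vowels, accents and quantity, which the table shows to be even less regular.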
2.1.1. From phonetic to orthographic/standardised transcription in SLA
Generalizing the data (and thereby reducing the excessive number of symbols) makes the lexical maps clearer and more meaningful. In the morphological analysis, the phonetically abstracted lexemes (abstracted according to the phonetic rules of each dialect, subdialect or even local dialect) are followed by their Proto-Slavic transposition and by their word-formational predecessors or foreign-language sources (e.g. various Italian, Friulian, German, Hungarian or Croatian geolects, chronolects and sociolects). The orthographic transcription in the SLA is thus based on the morphonological analysis of dialect lexemes, e.g. SLA V455.01 alpine meadow (Germ. die Alm, Slov. planina):
planína (alpine meadow) < *poln-in-a ← PSl. *poln-ъ ‘plane, without trees’
plánina (alpine meadow) < *poln-in-a ← PSl. *poln-ъ ‘plane, without trees’
Other lexemes for the concept (alpine meadow) in Slovenian dialects:
pašnik (alpine meadow) < *paš-ьn-ik-ъ ← *paš-ьn-ъ ← *paš-a (< *pas-j-a) ← *pas-ti ‘to pasture, to shepherd’
pašnjak (alpine meadow) < *paš-ьn-jak-ъ
spašnjak (alpine meadow) < *sъ-paš-ьn-jak-ъ ← *sъ(n)- ‘together with’ + *pas-ti
pasovnik (alpine meadow) < *pas-ov-ьn-ik-ъ
gora (alpine meadow) < *gor-a ‘mountain’, ‘mountain forest’
velika gora (alpine meadow) < *vel-ik-a-j-a gor-a ← *vel-ik-ъ ‘big’ + *gor-a
grič (alpine meadow) < *grič-ь/*gъrič-ь ‘something like a mountain’, ‘small mountain’
breg (alpine meadow) < *berg-ъ ‘elevation, rising ground, hill, brink’
visoki breg (alpine meadow) < *vysok-ъ-j-ь berg-ъ ← *vysok-ъ ‘high’ + *berg-ъ
hrib (alpine meadow) < *xrib-ъ ‘crest, broad ridge’
hriber (alpine meadow) < *xrib-r-ъ
vrh (alpine meadow) < *vьrx-ъ ‘top, peak’
snežnik (alpine meadow) < *sněž-ьn-ik-ъ ← *sněž-ьn-ъ ← *sněg-ъ ‘snow’
gmajna (alpine meadow) < *(gmajn)-a ← MHG. gemeine ‘common community estate’
komunja (alpine meadow) < *(komuń)-a ← Friul. comugne ‘common estate’
paskulič (alpine meadow) < *(paskul)-i-ь ← Friul. pascul ‘pasture’
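The generalization step described above can be pictured as grouping the orthographic lexemes for one concept by the root that the morphonological analysis assigns to them; each group then needs only one symbol outline on the map. The sketch below hard-codes the roots from the 'alpine meadow' list (loanwords keyed by their source word). The dictionary layout and the function name `group_by_root` are illustrative assumptions, not the SLA database format.

```python
from collections import defaultdict

# SLA V455.01 'alpine meadow': lexeme -> root from the morphonological
# analysis above (loanwords are keyed by their source word).
ROOTS = {
    "planína": "*poln-", "plánina": "*poln-",
    "pašnik": "*pas-", "pašnjak": "*pas-",
    "spašnjak": "*pas-", "pasovnik": "*pas-",
    "gora": "*gor-", "velika gora": "*gor-",
    "grič": "*grič-",
    "breg": "*berg-", "visoki breg": "*berg-",
    "hrib": "*xrib-", "hriber": "*xrib-",
    "vrh": "*vьrx-",
    "snežnik": "*sněg-",
    "gmajna": "MHG gemeine",
    "komunja": "Friul. comugne",
    "paskulič": "Friul. pascul",
}


def group_by_root(roots: dict[str, str]) -> dict[str, list[str]]:
    """Collect lexemes under their root, keeping the order of first mention."""
    groups: dict[str, list[str]] = defaultdict(list)
    for lexeme, root in roots.items():
        groups[root].append(lexeme)
    return dict(groups)


groups = group_by_root(ROOTS)
for root, lexemes in groups.items():
    print(f"{root:16} {', '.join(lexemes)}")
```

Under these assumptions, the 18 lexemes collapse into 11 root groups, which is the kind of reduction that keeps a lexical map readable.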
In non-linguistic (e.g. Slovenian ethnological) literature, other types of orthographic transcription can be found, mostly based on a simplified phonetic transcription that does not take into account various regular and irregular dialectal sound changes. For example, both plòh and pòh (with the dialectal development lo > o > o) for ʻwashboard’ (Standard Slovenian perilnik) should be transcribed orthographically as ploh according to their morphonological analysis:
ploh (washboard) < *(plox)-ъ ← MHG bloch, bloc, Bavarian ploch ʻdeal board, log’ // < *plox-ъ ʻflat, level piece of wood’, cf. Proto-Slavic *plosk-ъ
2.2. Transcription system in OLA
Work on the "Slavic Linguistic Atlas" ("Общеславянский лингвистический атлас", OLA) goes back to a decision of the 4th "International Congress of Slavists" (Moscow, September 1958). The OLA Commission is made up of representatives of the individual Slavic national centers, including one for Slovenian, the only Slavic language in the Alpine region, which is represented by ZRC SAZU ("Scientific Research Center of the Slovenian Academy of Sciences and Arts").
The OLA questionnaire has 3454 questions, and the number of research points (local dialects) is 850, including 25 Slovenian local dialects; the dialect material for the OLA was collected mainly between 1965 and 1975. The OLA is published in two series: the first, lexical and word-formational series covers vocabulary, word formation and semantics; the second, grammatical series covers phonetics and phonology as well as morphology and syntax. One of the OLA subcommissions is responsible for the generalizing transcription of the field material: it prepares the morphonological analysis of the Slavic dialect material, eliminates predictable phonological differences and prepares the general legend concepts for the lexical and word-formational maps (see <http://ola.zrc-sazu.si/OLB15ENG-uvod.htm> and <https://www.slavatlas.org/>).
The OLA transcribes dialect material in the Slavic phonetic transcription, which is typeset with the ZRCola font.
(Special) characters for vowels in OLA:
(Special) characters for consonants in OLA:
2.2.1. From an index to a legend in OLA
The orthographic transcription in the OLA is also based on the morphonological analysis of dialect lexemes, e.g. OLA 1901 кудрявый ‘curly (hair)’ (Germ. gelockt, Slov. kodrast) in Slovenian dialects:
kodrast (curly (hair)) < *kǫdr-ast-ъ ← Proto-Slavic *kǫdr-ь, *kǫder-ъ ‘piece of spinning material’ (OLA 16 k ːdrast, OLA 21 ˈkundrḁsti) 2
kodrav (curly (hair)) < *kǫdr-av-ъ (OLA 17 ˈkuːdra, OLA 149 ˈkọndravi)
skodran (curly (hair)) < *sъ-kǫdr-a-n-ъ (OLA 17 skudˈrḁn) ← *sъ(n)- ‘together with’ + *kǫdr-ь, *kǫder-ъ
kuštrast (curly (hair)) < *kuStr-ast-ъ ← Proto-Slavic *kustr-ъ / *kustr-a (S > š) (OLA 16 kːštrast) ← Proto-Slavic *kust-ъ ‘bush, bundle, tuft of hair’
rodljast (curly (hair)) < *rǫd-ьlj-ast-ъ ← Proto-Slavic *rǫd-ъ ‘curly’ (OLA 15 ˈruːdĺast)
zaročen (curly (hair)) < *za-rǫd-j-en-ъ ← Proto-Slavic *za- ‘behind’ + *rǫd-ъ ‘curly’ (OLA 6 zaˈručen)
zafrjen (curly (hair)) < *za-xv-j-en-ъ ← Proto-Slavic *za- ‘behind’ + onomatopoeic *xvъr (OLA 4 zaˈfərjen)
zafrkočen (curly (hair)) < *za-fъrk-ǫt-j-en / *za-xv-k-ǫt-j-en (OLA 20 zafˈkọčen)
kravžast (curly (hair)) < *(kraž)-ast-ъ ← Germ. Bav. krausen ‘curl’ (OLA 147 ʔʀàːžast)
kravžljast (curly (hair)) < *(kražlj)-ast-ъ ← Germ. Bav. krausel ‘curl’ (Germ. Kräusel ‘curl’) (OLA 13 kˈrḁːžlast, OLA 14 kràːžlast, OLA 19 kˈražlast)
skravžan (curly (hair)) < *sъ-(kraž)-a-n-ъ ← *sъ(n)- ‘together with’ + Germ. Bav. krausen ‘curl’ (OLA 7 skràːžan, OLA 8 skràːžen, OLA 9 skràːžan, OLA 10 skˈraːžan, OLA 18 skˈraːžan, OLA 147 sʔʀàːžan)
skravžljan (curly (hair)) < *sъ-(kražlj)-a-n-ъ ← *sъ(n)- ‘together with’ + Germ. Bav. krausel ‘curl’ (Germ. Kräusel ‘curl’) (OLA 146 skràžlan, OLA 13 skˈrọžlan, OLA 16 skrːžlen, OLA 19 skˈražlan, OLA 148 sʔràːžlan)
ričast (curly (hair)) < *(rič)-ast-ъ ← Ital. riccio ‘curly’ (OLA 11 ˈriːčast)
ricast (curly (hair)) < *(ric)-ast-ъ ← Ital. Ven. rizo (for Ital. riccio) ‘curly’ (OLA 3 ˈricəst, OLA 5 ˈriːcast, OLA 12 ˈriːcast)
ricotast (curly (hair)) < *(ricot)-ast-ъ ← Friul. riçot (*rizot) ‘curl’ (OLA 2 ricóːtast)
čafarunast (curly (hair)) < *(ćavelun)-ast-ъ ← Friul. cjavelon ‘long-haired’ (OLA 1 čafaˈrunest)
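Moving from an index of attestations to a map legend means inverting the data: the legend lists which points carry each lemma, while the map needs, for each point, the symbols to draw there. The sketch below uses a subset of the OLA 1901 attestations listed above. The slot names L1, L2, P1, P2 are borrowed from the OLA database description in section 3.2, but filling them in listing order, and the function name `points_to_symbols`, are purely illustrative assumptions.

```python
from collections import defaultdict

# Subset of OLA 1901 'curly (hair)': legend lemma -> OLA survey points.
ATTESTATIONS = {
    "kodrast": [16, 21],
    "kodrav": [17, 149],
    "skodran": [17],
    "kuštrast": [16],
    "skravžan": [7, 8, 9, 10, 18, 147],
    "skravžljan": [146, 13, 16, 19, 148],
    "ričast": [11],
    "ricast": [3, 5, 12],
}

# Assumed symbol positions around a point, named after the OLA columns.
SLOTS = ["L1", "L2", "P1", "P2"]


def points_to_symbols(attestations: dict[str, list[int]]) -> dict[int, dict[str, str]]:
    """Invert lemma -> points into point -> {slot: lemma}, sorted by point."""
    by_point: dict[int, list[str]] = defaultdict(list)
    for lemma, points in attestations.items():
        for point in points:
            by_point[point].append(lemma)
    layout: dict[int, dict[str, str]] = {}
    for point in sorted(by_point):
        lemmas = by_point[point]
        if len(lemmas) > len(SLOTS):
            raise ValueError(f"point {point}: more lemmas than symbol slots")
        layout[point] = dict(zip(SLOTS, lemmas))
    return layout


layout = points_to_symbols(ATTESTATIONS)
print(layout[16])
```

Point 16 illustrates why several symbol positions per point are needed: three different lemmas (kodrast, kuštrast, skravžljan) are attested there.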
2.3. Transcription system in ALE
The origins of the "European Linguistic Atlas" ("Atlas Linguarum Europae", ALE) project (<https://en.wikipedia.org/wiki/Atlas_Linguarum_Europae> and <http://www.lingv.ro/ALE.html>) date back to 1965, when the creation of a European linguistic atlas was proposed at the "International Geolinguistics Congress" in Marburg. In 1970 the project was placed under the auspices of UNESCO; it is the first and so far the only linguistic research project of this scale. The ALE questionnaire has 546 questions, and the number of research points (local dialects) is 2631, including 9 Slovenian local dialects.
For the transcription of dialect material, the ALE uses an adapted international phonetic transcription:3
(Special) characters for vowels in ALE:
(Special) characters for consonants in ALE:
(Special) diacritics in ALE:
The orthographic transcription in the ALE depends on the transcription traditions of the individual languages involved in the project, e.g. ALE ‘potato’ (Germ. Grundbirne, Erdapfel, Kartoffel, Slov. krompir) at Austrian and Slovenian survey points:
Austrian German dialect lexemes for potato:
Erdapfel (potato) ← German Erde (< Proto-Germanic *erþō ‛earth’) + German Apfel ‛apple’ (< Proto-Germanic *aplu- ‛apple’)
Erdbirne (potato) ← German Erde (< Proto-Germanic *erþō ‛earth’) + German Birne ‛pear’ (< OHG bira, pira, MHG bir, bire ← Latin pirum ‛pear’)
Fletzbirne (potato) ← German Fletz (< MHG vletze ‛level, ground’) + German Birne (< OHG bira, pira, MHG bir, bire ← Latin pirum ‛pear’)
Grundbirne (potato) ← German Grund (< MHG grunt ‛ground’) + German Birne (< OHG bira, pira, MHG bir, bire ← Latin pirum ‛pear’)
Rübe (potato) ← German Rübe (< Proto-Germanic *rāpā ‛turnip’)
Kropfrübe (potato) ← German Kropf ‛crop’ (< Proto-Germanic *kruppa- ‛boil, bulge’) + Rübe (< Proto-Germanic *rāpā ‛turnip’)
Kastanie (potato) ← Latin castanea < Greek kástanon ‛chestnut’
Erdbohne (potato) ← German Erde (< Proto-Germanic *erþō ‛earth’) + German Bohne ‛horsebean’
Bodennudel (potato) ← German Boden (< Proto-Germanic *buþmaz ‛bottom, ground’) + German Nudel ‛noodle’ (< German Knödel ‛dumpling’)
Erdnudel (potato) ← German Erde (< Proto-Germanic *erþō ‛earth’) + German Nudel ‛noodle’ (< German Knödel ‛dumpling’)
Slovenian dialect lexemes for potato:
krompir ← German Grundbirne
repica < Proto-Slavic *rěp-ic-a < *rěp-a ‛turnip’
bob < Proto-Slavic *bob-ъ ‛bean, Latin Vicia faba’
hruške (Npl) < Proto-Slavic *gruš-a, *kruš-a (Nsg) ‛pear’
3. Database
All the above-mentioned projects use different databases for storing and processing the collected data. The manuscripts of the collected dialect material are mainly kept in the traditional project collections of the participating research centers and are to be digitized gradually.
3.1. SLA Slovenian Linguistic Atlas
The SLA database SlovarRed was designed in Microsoft Access with many related subcharts (tables): survey points with their codes, geographical names and geographical coordinates, the names of the recorders, the years of recording, the medium of the recording (notebook, card, PDF file), the reliability of the recording, the answer in phonetic transcription, etc. SlovarRed can be linked to the scans of the traditional archive notes and to ArcGIS. 4
One of the SlovarRed subcharts, used for preparing map legends, is the list of symbols and their numerical codes (SIMBola font).
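The relational idea behind SlovarRed (separate tables for survey points, recordings and symbol codes, joined when a map legend is prepared) can be sketched with Python's built-in `sqlite3` module. SlovarRed itself is a Microsoft Access database; the table and column names below are hypothetical, loosely modelled on the fields listed above, and the inserted rows are placeholders, not SLA data.

```python
import sqlite3

# Hypothetical layout loosely modelled on the SlovarRed subcharts.
SCHEMA = """
CREATE TABLE survey_point (
    code      TEXT PRIMARY KEY,      -- survey point code
    name      TEXT NOT NULL,         -- geographical name
    latitude  REAL,
    longitude REAL
);
CREATE TABLE symbol (
    code        INTEGER PRIMARY KEY, -- numerical code of a SIMBola glyph
    description TEXT
);
CREATE TABLE recording (
    id          INTEGER PRIMARY KEY,
    point_code  TEXT NOT NULL REFERENCES survey_point(code),
    recorder    TEXT,
    year        INTEGER,
    medium      TEXT CHECK (medium IN ('notebook', 'card', 'pdf')),
    reliability TEXT,
    answer      TEXT NOT NULL,       -- answer in phonetic transcription
    symbol_code INTEGER REFERENCES symbol(code)
);
"""


def legend_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Join recordings with their survey points and symbol codes."""
    return conn.execute(
        """SELECT p.code, p.name, r.answer, s.code
           FROM recording AS r
           JOIN survey_point AS p ON p.code = r.point_code
           LEFT JOIN symbol AS s ON s.code = r.symbol_code
           ORDER BY p.code"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows only; not SLA data.
conn.execute("INSERT INTO survey_point VALUES ('T1', 'Test point', NULL, NULL)")
conn.execute("INSERT INTO symbol VALUES (1, 'placeholder symbol')")
conn.execute(
    "INSERT INTO recording (point_code, answer, medium, symbol_code) "
    "VALUES ('T1', 'test answer', 'pdf', 1)"
)
print(legend_rows(conn))
```

Keeping coordinates in the survey-point table rather than in each recording means a GIS link only has to be maintained in one place.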
3.2. OLA
The database for the "Slavic Linguistic Atlas" is kept in Microsoft Excel. It contains the survey-point data, such as geographical names, survey-point codes/numbers and the name of the country (column Пункт, ‘point’), the answer in phonetic transcription (Материал, ‘material’), the numerical code of the symbol and its position on the printed map (L1, L2, P1, P2), the commentary (Морфонология, ‘morphonology’), etc. The database is not connected to a GIS.
3.3. ALE
The database for the "European Linguistic Atlas" is also kept in Microsoft Excel, with columns for: A) survey point (codes for country, language and survey point), B) name of language, C) answer in orthographic transcription, D) answer in phonetic transcription, E) commentary (e.g. etymology), F) symbol, G) numerical code of the symbol (SIMBola). The database is not connected to a GIS.
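Since the OLA and ALE spreadsheets store similar information under different column headers, one step towards interoperability would be a project-neutral record that both can be converted into. The sketch below is a hypothetical mapping: the Russian OLA headers and the ALE column letters follow the descriptions in sections 3.2 and 3.3, while the record type `DialectRecord`, the key "symbol" for the OLA symbol-code column (whose header is not given) and the conversion functions are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialectRecord:
    """Hypothetical project-neutral record for exchanging atlas answers."""
    project: str
    point: str
    phonetic: str
    orthographic: Optional[str]
    symbol_code: Optional[int]
    comment: Optional[str]


def from_ola(row: dict) -> DialectRecord:
    # Пункт, Материал and Морфонология follow the OLA sheet; "symbol"
    # stands in for the symbol-code column, whose header is assumed.
    return DialectRecord(
        project="OLA",
        point=str(row["Пункт"]),
        phonetic=row["Материал"],
        orthographic=None,  # no orthographic column is listed for the OLA sheet
        symbol_code=row.get("symbol"),
        comment=row.get("Морфонология"),
    )


def from_ale(row: dict) -> DialectRecord:
    # ALE columns: A point, C orthographic, D phonetic, E commentary, G symbol code.
    code = row.get("G")
    return DialectRecord(
        project="ALE",
        point=str(row["A"]),
        phonetic=row["D"],
        orthographic=row.get("C"),
        symbol_code=int(code) if code is not None else None,
        comment=row.get("E"),
    )


# OLA point 11 for 'curly (hair)', taken from section 2.2.1.
print(from_ola({"Пункт": 11, "Материал": "ˈriːčast", "Морфонология": "*(rič)-ast-ъ"}))
```

Once both sheets are mapped to the same record, answers from the two atlases for the same survey point can be compared or exported together, e.g. to a GIS.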
4. Maps
All three projects, SLA, OLA and ALE, use symbols to represent dialect material on the maps. However, the programming tools they use to place symbols on the map are very different.
4.1. SLA
The SLA is an interdisciplinary project that involves not only linguistics (linguists complete the database by collecting dialect material and prepare the synchronic-diachronic linguistic analysis and the geolinguistic presentation of dialect lexemes) but also geography, i.e. cartography, spatial data analysis and GIS. A specialist from the Institute of Anthropological and Spatial Studies at ZRC SAZU georeferenced all the data, while cartographers from the Anton Melik Geographical Institute at ZRC SAZU prepared suitable cartographic bases for the geolinguistic presentation of the dialect material, i.e. electronic maps, which are accessible to all project members.
Common guidelines for the presentation of dialect data on the maps have been agreed. »A special hierarchy of symbols has been formulated in the Slovenian Linguistic Atlas, based on the deliberate and established practice of the "Slavic Linguistic Atlas". The symbols were mainly chosen by the authors of the maps themselves, in line with the agreement that the outline of the symbols should indicate the root of the lexeme and the inner part the word-formational construction of the lexemes. The choice of symbol was based on the morphological analysis«5 (see the map for ʻalpine meadow’ below). Lines on the maps are used to represent morphological and phonetic features (see the map for SLA V573.01 gnoj ʻmanure’ in SLA 2.1, 163), while words and lines together are used only when there is no lexical differentiation apart from phonetic differences. Colours represent semantic differentiation (see the map for SLA V153B.01 ʻpocket knife’ in SLA 2.1, 131).
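The symbol hierarchy (outline for the root, inner part for the word-formational construction) can be expressed as a simple assignment rule. In the sketch below, the shape and fill names are hypothetical placeholders, since the real SLA and OLA symbols come from the SIMBola font, and the word-formational patterns are abbreviated from the 'alpine meadow' analysis in section 2.1.1; the function `assign_symbols` is illustrative only.

```python
# Hypothetical inventories; real symbols are SIMBola glyphs.
OUTLINES = ["circle", "square", "triangle", "diamond"]
FILLS = ["empty", "full", "left half", "right half", "dot"]

# lexeme -> (root, word-formational pattern), abbreviated from section 2.1.1
LEXEMES = {
    "pašnik": ("*pas-", "-ьn-ik-ъ"),
    "pašnjak": ("*pas-", "-ьn-jak-ъ"),
    "spašnjak": ("*pas-", "sъ- + -ьn-jak-ъ"),
    "pasovnik": ("*pas-", "-ov-ьn-ik-ъ"),
    "gora": ("*gor-", "-a"),
    "velika gora": ("*gor-", "velik- + -a"),
}


def assign_symbols(lexemes: dict[str, tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Give lexemes with the same root the same outline and each
    word-formational pattern within a root its own fill."""
    outline_of: dict[str, str] = {}
    fill_of: dict[tuple[str, str], str] = {}
    result: dict[str, tuple[str, str]] = {}
    for lexeme, (root, pattern) in lexemes.items():
        if root not in outline_of:
            if len(outline_of) == len(OUTLINES):
                raise ValueError("not enough outlines for all roots")
            outline_of[root] = OUTLINES[len(outline_of)]
        key = (root, pattern)
        if key not in fill_of:
            used = sum(1 for r, _ in fill_of if r == root)
            if used == len(FILLS):
                raise ValueError(f"not enough fills for root {root}")
            fill_of[key] = FILLS[used]
        result[lexeme] = (outline_of[root], fill_of[key])
    return result


for lexeme, symbol in assign_symbols(LEXEMES).items():
    print(lexeme, symbol)
```

With this rule all *pas- derivatives share the circle outline and differ only inside, so a reader can see at a glance that they belong to one lexical family. In practice the map authors choose the symbols themselves; an automatic rule like this could at most propose a first draft.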
4.2. OLA
The mapping with symbols in the SLA follows the principles of the international OLA project. In contrast to the SLA, however, the OLA project has in recent years used the program MapOLA, developed at the Jazykovedný ústav Ľudovíta Štúra (Ľ. Štúr Institute of Linguistics) of the Slovak Academy of Sciences. MapOLA does not use a GIS with georeferenced survey points but places symbols directly into the PDF of a blank map.
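The difference between the two approaches matters for interoperability: symbols placed into a PDF are tied to one base map, whereas georeferenced points in an open format can be reused by any GIS. As one possible alternative, the sketch below serializes survey points with their symbol codes as GeoJSON (RFC 7946), a format that desktop GIS packages such as QGIS and ArcGIS can import. The input layout and the function name `to_geojson` are assumptions, and the coordinates in the example are made up.

```python
import json


def to_geojson(points: list[dict]) -> str:
    """Serialize survey points as a GeoJSON FeatureCollection.

    Each input dict is assumed to have 'code', 'lon', 'lat' and 'symbol'
    keys. GeoJSON orders coordinates as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [p["lon"], p["lat"]]},
            "properties": {"code": p["code"], "symbol": p["symbol"]},
        }
        for p in points
    ]
    return json.dumps(
        {"type": "FeatureCollection", "features": features},
        ensure_ascii=False,
    )


# Hypothetical survey point with made-up coordinates.
print(to_geojson([{"code": "T1", "lon": 14.5, "lat": 46.05, "symbol": 1}]))
```

A file like this can be styled by symbol code in the GIS, so the same points could serve the SLA, OLA and ALE maps with different legends.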
4.3. ALE
The ALE project has gone through various types of mapping over the last decades. Fascicles 5 to 7 of volume I ((Poligrafico 1997), (Poligrafico 2002), (Poligrafico 2007)) were printed by the Istituto Poligrafico, and the maps were produced by cartographers in Rome on the basis of legends prepared by the authors of the commentaries. The authors themselves could not prepare maps for their commentaries (apart from drawing symbols by hand on a printed blank map to test a selection of symbols). In recent years, geolinguists and computer scientists of the Romanian Academy (the president of the ALE is Prof. Dr. Nicolae Saramandu of Bucharest) have developed a mapping procedure using the program Surfer 8, which draws the symbols electronically on the blank map (PDF) once a complex legend for each participating country and language has been created in Excel. However, the linguists still cannot draw their maps themselves, so the team is looking for a better solution.
5. Conclusion
Future cooperation between the linguists and language technologists of all project partners is essential. Information technologists can update the databases technically and develop programming tools for electronic (interactive) linguistic atlases. Information technologies also make it possible to incorporate the analysed dialect material into various language and dictionary portals (such as <https://www.fran.si>). The involvement of citizens in the electronic collection of dialect material through crowdsourcing is important, too. Information technologies thus make it possible to link the linguistic knowledge presented on electronic linguistic maps with other open electronic sources and to make it more widely accessible via the Internet. We (i.e. Slovenian dialectologists) should also reach a joint agreement with the "VerbaAlpina" project on unifying transcriptions, databases and mapping tools in order to link the different projects efficiently. It would be very beneficial if the project partners could prepare a common tool that would also be useful for other geolinguistic projects (regional, national, international, etc.). This would make it easier to integrate Slovenian material into this international geolinguistic project, which aims, among other things, to synthesise research on the linguistically, culturally, politically and otherwise fragmented Alpine region.
And finally: does the SLA meet the FAIR requirements? It is findable (the easiest and best-known way is to search the Fran portal, i.e. the dictionaries of the ISJFR ZRC SAZU), it is accessible, and it aims to be interoperable, since it is highly reusable. Thus, it is just »FAIR«.
Bibliography
- Mladinska knjiga 2004 = Mladinska knjiga (2004): Slovenski etnološki leksikon (Slovenian ethnological lexicon), Ljubljana, Mladinska knjiga, 410.
- Poligrafico 1997 = Poligrafico (1997): Atlas Linguarum Europae, volume I: fifth fascicle, maps and commentaries, Roma, Poligrafico.
- Poligrafico 2002 = Poligrafico (2002): Atlas Linguarum Europae, volume I: sixth fascicle, maps and commentaries, Roma, Poligrafico.
- Poligrafico 2007 = Poligrafico (2007): Atlas Linguarum Europae, volume I: seventh fascicle, maps and commentaries, Roma, Poligrafico.
- Škofic, et al. 2016 = Škofic, Jožica, et al. (2016): Slovenski lingvistični atlas. 2, Kmetija, 1 Atlas (SLA 2.1) (Slovenian Linguistic Atlas 2, Farm, 1 Atlas), Ljubljana, Založba ZRC, ZRC SAZU.
- Škofic, et al. 2016b = Škofic, Jožica, et al. (2016): Slovenski lingvistični atlas. 2, Kmetija, 2 Komentarji (SLA 2.2) (Slovenian Linguistic Atlas 2, Farm, 2 Commentaries), Ljubljana, Založba ZRC, ZRC SAZU.
- SLA 1 = Škofic, Jožica (ed.) (2011): Slovenian Linguistic Atlas 1 – Man (body, diseases, family), vol. 1, Ljubljana, Založba ZRC, ZRC SAZU, <https://doi.org/10.3986/9789612543570>.
- SLA 2 = Škofic, Jožica (ed.) (2016): Slovenian Linguistic Atlas 2 – Farm, vol. 2, Ljubljana, Založba ZRC, ZRC SAZU, <https://doi.org/10.3986/9789612548797>.
- Weijnen 1973 = Weijnen, Antonius (1973): Atlas Linguarum Europae. Premier Questionnaire: onomasiologie, vocabulaire fondamental, Nimègue, Secrétariat de la Rédaction de l'A.L.E.
- Варбот/Вендина/Шалаева 2020 = Варбот, Ж. Ж. / Вендина, Т. И. / Шалаева, Т. В. (eds.) (2020): Общеславянский лингвистический атлас (ОЛА). Серия лексико-словообразовательная. Выпуск 12. Личные черты человека (Slavic Linguistic Atlas (OLA), Lexical and Word-formational Series. Volume 12, Personal characteristics), Международный комитет славистов, Комиссия Общеславянского лингвистического атласа, Российская академия наук (РАН), Институт русского языка им. В. В. Виноградова РАН, Институт славяноведения РАН (International Committee of Slavists, The Slavic Linguistic Atlas Commission, Russian Academy of Sciences (RAS), V. V. Vinogradov Institute of the Russian Language RAS, Institute of Slavic Studies RAS), Москва - Санкт-Петербург (Moscow - Saint Petersburg).