Ambition

1. Progressing the state of the art in OER use 

Quick facts:

  • The term Open Educational Resources (OER) was first introduced by UNESCO
  • The OER movement has been successful in promoting the creation and publication of OER
  • Various new models have appeared, from open repositories of multimedia content to recent MOOCs
  • There are no comprehensive data available about various OER sites
  • There has been, and still is, a lively debate within the OER community about which metadata and standards to use.

Challenge:

Several obstacles have prevented OER from reaching their full potential, including the quality of materials, the quality of service, and the problems of finding, assessing and reconfiguring learning materials. There are thousands of OER sites in the world, but no common practice or common entry points to ease the use and reuse of OER. Studies such as “Beyond OER” find that OER are available but are not frequently used. Five main barriers have been identified: 1) lack of institutional support, 2) lack of technological tools for sharing and adapting resources, 3) lack of skills and time among users, 4) lack of quality or fitness of OER, and 5) personal issues such as lack of trust and time.

Solution:

The X5gon project will approach these fundamental problems of OER use from a different perspective. It intends to leverage the power of data-driven AI technologies that can derive commonalities from a variety of data. The goal is to create a comprehensive analytics and modelling framework with simple scripts and snippets that can be easily integrated into OER sites to give access to the full richness of the OER landscape. This will include content understanding and structuring, cross-lingual and multilingual support, and learning analytics. X5gon intends to offer these snippets, together with analytics services, to OER sites for free and in a transparent mode, so that each OER site is automatically connected to other OER sites through content and users.
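As a minimal sketch of what such a lightweight integration might report, the snippet below assembles a usage event for an OER resource. The field names and the endpoint are illustrative assumptions, not the project's actual API:

```python
import json

# Placeholder endpoint; an embedded snippet would POST events here.
# This URL is a hypothetical example, not a real service.
ANALYTICS_ENDPOINT = "https://analytics.example.org/event"

def build_activity_event(site_id, resource_url, user_token, action):
    """Assemble a minimal usage event for an OER resource."""
    return {
        "site": site_id,           # which OER site the snippet runs on
        "resource": resource_url,  # the OER item being used
        "user": user_token,        # anonymised visitor token
        "action": action,          # e.g. "view", "download"
    }

event = build_activity_event("oer-site-42",
                             "https://oer.example/course/1",
                             "anon-7f3a", "view")
payload = json.dumps(event)  # this JSON would be sent to the analytics service
print(payload)
```

The point of the sketch is that an OER site only embeds a small script; all content understanding and analytics happen centrally, on the collected events.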

2. Progressing the state of the art in cross-lingual text processing

Quick facts:

  • Cross-lingual text processing is currently a very active research area.
  • Linking extracted semantic triples to an existing non-linguistic knowledge base is a classic problem of computational linguistics.
  • It is often covered under the topic of text mining and is related to Relation Extraction (RE).
  • The main techniques for RE are rule-based, pattern-based or learning-based; the most popular learning-based methods are either feature-based or kernel-based.
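To make the rule/pattern-based family of RE methods concrete, here is a toy extractor that matches hand-written lexical patterns and emits (subject, relation, object) triples. Real systems use far richer patterns or learned models; the patterns and relation names below are illustrative only:

```python
import re

# Hand-written lexical patterns mapping a surface form to a relation name.
PATTERNS = [
    # "X is located in Y"  ->  (X, locatedIn, Y)
    (re.compile(r"(\w[\w ]*?) is located in (\w[\w ]*)"), "locatedIn"),
    # "X was founded by Y"  ->  (X, foundedBy, Y)
    (re.compile(r"(\w[\w ]*?) was founded by (\w[\w ]*)"), "foundedBy"),
]

def extract_triples(sentence):
    """Return (subject, relation, object) triples matched by the patterns."""
    triples = []
    for pattern, relation in PATTERNS:
        for subj, obj in pattern.findall(sentence):
            triples.append((subj.strip(), relation, obj.strip()))
    return triples

triples = extract_triples("UNESCO is located in Paris")
```

Extracted triples like these can then be linked to entries in a non-linguistic knowledge base, which is the classic linking problem mentioned above.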

Solution:

Cross-linguality in X5gon will be supported by services developed in two FP7 EU projects, XLike and xLiMe. xLiMe provides services for cross-lingual and cross-media comparison, while XLike handles cross-lingual text processing. In addition to the statistical machine learning used in XLike, the xLiMe approach incorporates shallow and deep linguistic information.

3. Progressing state of the art in ASR, TTS and MT

Quick facts:

  • Spoken language translation (SLT) and machine translation of text (MT) are nowadays of great relevance in our increasingly globalized world, both from a social and an economic point of view.
  • Research laboratories all across the world are reporting steady progress in the performance of these systems, measured in evaluation campaigns such as the International Workshop on Spoken Language Translation (IWSLT) or the Workshop on Machine Translation (WMT).
  • But the transition of these systems into applications competing in the market is still slow, and acceptance of machine translation technology remains low.

Challenge:

Many applications would nonetheless benefit further from machine translation. Application scenarios considered in research institutions range from portable translators for tourists to translation services for international organizations and media surveillance. There has been significant progress in the field: the introduction of phrase-based models around 2000, grammar-based approaches that matured at the end of the 2000s, and the increased use of advanced machine learning methods. However, most of the work has focused on a narrow set of language pairs, often motivated by the needs of intelligence agencies in the United States; on narrow domains, such as news stories; and on a rather static application of bulk translation of a fixed set of documents using a fixed amount of training data. There has been rather limited research on domain adaptation, dynamic adaptation, robustness in the face of noisy data, the use of semantics, and the integration of machine translation systems into real work environments.

Solution:

Our X5gon project will apply technologies for multilingual ASR, TTS and MT systems developed in FP7 transLectures and H2020 TraMOOC, in which data-rich EU languages contribute to the development and adaptation of systems for “data-scarce” EU languages (languages for which little support is available). In doing so, the lack of suitable data resources for many European languages can be overcome through the development of language technologies. The goal is to translate not only texts but multimodal content as well. For video-recorded materials, high-quality transcriptions and translations will be produced using lecture-adapted ASR and MT models.

4. Progressing state of the art in mining educational data logs

Quick facts:

Educational Data Mining is an emerging discipline concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students and the settings in which they learn. Until recently, it was not possible to analyse much of students’ activity while learning, and the analysis of educational data was therefore limited to tests in closed experimental setups. However, MOOCs have provided an opportunity to gather large collections of data from students interacting with MOOC contents.

Challenge:

Another area of investigation has been the effect of social networks on learning. Learning behaviour can be derived from social network activity and used to predict student performance. Other relevant factors in a student’s profile, such as age, can be predicted from the content they visit. Some studies have combined this implicit information with explicit answers given by users in questionnaires, or with other sources of user-annotated data such as categories or tags.

Solution:

In X5gon, we will build on JSI’s experience in developing scalable mining and analytics services on logs. JSI and its spin-off Quintelligence are the developers of QMiner [Quint2012], which is used by the New York Times and Bloomberg to analyse web logs and handles tens of millions of events per day, each event containing a complete description of the visited web page (text, metadata), the user (demographic data) and the context (time, user’s location).
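The kind of aggregation such a log-analytics service performs can be sketched as follows. This is an illustrative toy in plain Python, not QMiner's actual API; the event fields mirror the description above (page, user, time):

```python
from collections import Counter
from datetime import datetime

# Toy stream of log events; a real service ingests millions of these per day.
events = [
    {"user": "u1", "page": "/course/intro",    "time": datetime(2017, 1, 5, 9, 0)},
    {"user": "u2", "page": "/course/intro",    "time": datetime(2017, 1, 5, 9, 5)},
    {"user": "u1", "page": "/course/lesson-2", "time": datetime(2017, 1, 5, 9, 30)},
]

def activity_counts(events, key):
    """Count events grouped by the given field, e.g. 'user' or 'page'."""
    return Counter(e[key] for e in events)

by_user = activity_counts(events, "user")  # how active each student is
by_page = activity_counts(events, "page")  # how popular each resource is
```

Per-user and per-page counts like these are the raw material for the learning-analytics models described above (activity profiles, resource popularity, performance prediction).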

5. Progressing state of the art in OER automatic quality assurance

Quick facts:

The quality assurance of OER, although recognized as an essential problem, is not nearly as advanced as it should be. Quality assessment corresponds to the ability to filter, recommend and evaluate educational resources based on their pedagogical value, that is, on their capacity to achieve the desired goals or learning outcomes. The issue is not new, but it is still very poorly addressed today. It is also somewhat specific to open education, and solutions from other open fields do not transfer readily: in open research, quality assessment can follow similar rules as in the general framework, since research papers are peer reviewed (although finding a correct financial model, green vs. gold open access, remains an issue). The quantity of open educational resources produced, and still to be produced, is immense; yet it is still expected that all the material will be evaluated for quality by human experts.

Solution:

By providing technology-based solutions in which:

  • the producing teacher can easily prepare OER and distribute them without extra effort (for example, without having to compile complex metadata herself),
  • the learning teacher can easily find the resources she requires, adapted to the course she is preparing,
  • teachers get a clear view of the quality factors associated with the resources, using automatically generated indicators whose correlation with quality is accepted by all, without going through a complex and frustrating refereeing process, and
  • decision makers and governing bodies are satisfied that the resources available to teachers and educators are assessed,

we firmly believe that we are addressing all five barriers.
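One simple way automatically generated indicators could be combined into a single quality score is a weighted sum, sketched below. The indicator names and weights are illustrative assumptions, not a validated model:

```python
# Hypothetical per-resource quality indicators, each normalised to [0, 1].
# Names and weights are illustrative, not the project's actual model.
WEIGHTS = {
    "metadata_completeness": 0.3,  # fraction of metadata fields filled in
    "usage_rate": 0.4,             # normalised view/download activity
    "peer_rating": 0.3,            # normalised explicit user ratings
}

def quality_score(indicators):
    """Weighted combination of automatically generated indicators."""
    return sum(WEIGHTS[name] * value for name, value in indicators.items())

score = quality_score({
    "metadata_completeness": 1.0,
    "usage_rate": 0.5,
    "peer_rating": 0.8,
})
```

Such a score would let resources be filtered and ranked without manual refereeing; the weights themselves would need to be validated against human quality judgements before such indicators could be "accepted by all".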