Publications

A survey on narrative extraction from textual data

Published in Artificial Intelligence Review (AIRE), 2023

Narratives are present in many forms of human expression and can be understood as a fundamental way of communication between people. Computational understanding of the underlying story of a narrative, however, may be a rather complex task for both linguists and computational linguists. Such a task can be approached using natural language processing techniques to automatically extract narratives from texts. In this paper, we present an in-depth survey of narrative extraction from text, providing a roadmap to the study of this area as a whole as a means to consolidate a view on this line of research. We aim to fill the current gap by identifying important research efforts at the crossroads between linguists and computer scientists. In particular, we highlight the importance and complexity of the annotation process as a crucial step for the training stage. Next, we detail methods and approaches regarding the identification and extraction of narrative components, their linkage and the understanding of likely inherent relationships, before detailing formal narrative representation structures as an intermediate step for visualization and data exploration purposes. We then move on to aspects of the narrative evaluation task, and conclude this survey by highlighting important open issues in the domain of narrative extraction from texts that are yet to be explored.

Sexist Hate Speech: Identifying Potential Online Verbal Violence Instances

Published in Computational Processing of the Portuguese Language (PROPOR22), 2022

Online communication provides space for content dissemination and opinion sharing. However, the limit between opinion and offense might be exceeded, characterizing hate speech. Moreover, its automatic detection is challenging, and approaches focused on the Portuguese language are scarce. This paper proposes an interface between linguistic concepts and computational interventions to support hate speech detection. We applied a Natural Language Processing pipeline involving topic modeling and semantic role labeling, allowing a semi-automatic identification of hate speech. We also discuss how such speech qualifies as a type of verbal violence, widespread on social networks, that reinforces a sexist stereotype. Finally, we use Twitter data to analyze information that resulted in virtual attacks against a specific person. As an achievement, this work validates the use of linguistic features to annotate data either as hate speech or not. It also proposes using fallacies as an additional feature to identify potentially intolerant discourse.

Brat2Viz: a Tool and Pipeline for Visualizing Narratives from Annotated Texts

Published in 4th International Workshop on Narrative Extraction from Texts (Text2Story 2021), associated with the 43rd European Conference on Information Retrieval (ECIR2021), 2021

Narrative extraction from text is a complex task that starts by identifying a set of narrative elements (actors, events, times) and the semantic links between them (temporal, referential, semantic roles). The outcome is a structure or set of structures which can then be represented graphically, thus opening room for further and alternative exploration of the plot. Such visualization can also be useful during the ongoing annotation process. Manual annotation of narratives can be a complex effort, and the possibility offered by the Brat annotation tool of annotating directly on the text does not seem sufficiently helpful. In this paper, we propose Brat2Viz, a tool and a pipeline that displays visualizations of narrative information annotated in Brat. Brat2Viz reads the annotation file of Brat, produces an intermediate representation in the declarative language DRS (Discourse Representation Structure), and from this obtains the visualization. Currently, we make available two visualization schemes: MSC (Message Sequence Chart) and Knowledge Graphs. The modularity of the pipeline enables future extension to new annotation sources, different annotation schemes, and alternative visualizations or representations. We illustrate the pipeline using examples from a European Portuguese news corpus.

TLS-Covid19: A New Annotated Corpus for Timeline Summarization

Published in Advances in Information Retrieval (ECIR21), 2021

The rise of social media and the explosion of digital news in the web sphere have created new challenges in extracting knowledge and making sense of published information. Automated timeline generation appears in this context as a promising answer to help users deal with this information overload problem. Formally, Timeline Summarization (TLS) can be defined as a subtask of Multi-Document Summarization (MDS) conceived to highlight the most important information during the development of a story over time by summarizing long-lasting events in a chronologically ordered fashion. As opposed to traditional MDS, TLS has a limited number of publicly available datasets. In this paper, we propose the TLS-Covid19 dataset, a novel corpus for the Portuguese and English languages. Our aim is to provide a new, larger and multilingual TLS annotated dataset that could foster timeline summarization evaluation research and, at the same time, enable the study of news coverage about the COVID-19 pandemic. TLS-Covid19 consists of 178 curated topics related to the COVID-19 outbreak, with associated news articles covering almost the entire year of 2020 and their respective reference timelines as gold standard. As a final outcome, we conduct an experimental study on the proposed dataset over two extreme baseline methods. All the resources are publicly available at https://github.com/LIAAD/tls-covid19.

Extraction and Use of Structured and Unstructured Features for the Recommendation of Urban Resources

Published in Computational Processing of the Portuguese Language 2020, 2020

Urban Computing is concerned with the exploration and understanding of urban systems using the data those systems generate. The objective of this paper is to describe an approach for analyzing information expressed in social networks to support the recommendation of urban resources. This process considers different structured and unstructured features, such as a resource's location, review polarity, and user profile reliability. We use text and Web mining techniques to extract those features and then apply traditional recommendation algorithms to different feature combinations to identify whether they provide better results. The results were compared, and we found that, for neighborhood algorithms, the proposed approach performed better than traditional methods.

Detecting Group Beliefs Related to 2018’s Brazilian Elections in Tweets: A Combined Study on Modeling Topics and Sentiment Analysis

Published in Workshop on Digital Humanities and Natural Language Processing, 2020

The 2018 Brazilian presidential elections highlighted the influence of alternative media and social networks, such as Twitter. In this work, we analyze politically motivated discourses related to the second round of the Brazilian elections. To verify whether similar discourses reinforce group engagement with personal beliefs, we collected a set of tweets related to political hashtags at that moment. To this end, we combined a topic modeling approach with opinion mining techniques to analyze the politically motivated discourses. Using SentiLex-PT, a Portuguese sentiment lexicon, we extracted from the dataset the top 5 most frequent groups of words related to opinions. Applying a bag-of-words model, we computed the cosine similarity between each opinion and the observed groups. This study allowed us to observe an exacerbated use of passionate discourses in the digital political scenario as a form of appreciation of and engagement with the groups that convey similar beliefs.
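
The bag-of-words cosine similarity mentioned in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the whitespace tokenization, the function names and the example strings are all assumptions.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count token occurrences."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing tokens, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical inputs: one opinion and one group of frequent words.
opinion = bag_of_words("great candidate great hope")
group = bag_of_words("great hope change")
score = cosine_similarity(opinion, group)
```

A score near 1 means the opinion uses much the same vocabulary as the group, while 0 means they share no words.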

Mapeamento Sistemático: Ambientes Virtuais de Aprendizagem Ubíquos

Published in 5ª SENID, 2018

New technologies represent innovation opportunities in many contexts. One specific context is the educational environment, where varied forms of knowledge presentation and new teaching methodologies combined with technology can improve student performance. This paper presents a systematic mapping of the use of ubiquitous computing in virtual learning environments. The work covers the period between 2007 and 2017 and returns the works considered most relevant after applying the systematic mapping method.

Utilização de Classificadores Bayesianos para Predição de Afinidade Entre Personagens Literários

Published in XIV Simpósio de Informática do Centro Universitário Franciscano – SIRC 2017, 2017

Bayesian classifiers use statistics to assign an object to a given class based on the probability of the object belonging to that class. In this paper, we use machine learning techniques to extract relations between the entities named in the same literary work. Applying these classifiers to a test model, we arrived at a result that indicates the potential of Bayesian classifiers as tools for text summarization.
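
As a rough illustration of how a Bayesian classifier assigns a class by probability, here is a minimal multinomial naive Bayes with add-one smoothing. It is a sketch under assumptions, not the classifier used in the paper: the class name, the "affinity"/"hostility" labels and the interaction words used as features are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)     # token counts per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Unnormalized log posterior of each class for the given tokens."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denominator = sum(counts.values()) + vocab_size
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                # Smoothed log likelihood; unseen tokens get count 0 + 1.
                score += math.log((counts[token] + 1) / denominator)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical training data: words describing interactions of a character pair.
documents = [
    ["helps", "embraces", "praises"],
    ["fights", "insults"],
    ["helps", "defends"],
    ["insults", "betrays"],
]
labels = ["affinity", "hostility", "affinity", "hostility"]
model = NaiveBayes().fit(documents, labels)
prediction = model.predict(["helps"])
```

Working in log space avoids numerical underflow when many token probabilities are multiplied together.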

JMP: Uma Solução para Definição de Topologias Mininet Utilizando JSON

Published in XIV Simpósio de Informática do Centro Universitário Franciscano – SIRC 2017, 2017

The fast growth and popularization of computer networks has created large and complex groups of interconnected equipment. These groups are illustrated by network topologies. To guarantee good instantiations of network topologies and to test new network technologies, network emulation, such as that offered by the Mininet platform, may be used. This paper presents JMP, a solution that adds an abstraction layer for creating Mininet topologies using JSON, making it easier to use the Mininet emulator to create prototyping environments.
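
The idea of describing a topology in JSON and turning it into topology-building calls can be sketched as below. The JSON schema (`hosts`, `switches`, `links`) is hypothetical and is not JMP's actual format. `RecordingTopo` is a stand-in that only borrows the `addHost`, `addSwitch` and `addLink` method names from Mininet's `Topo` class, so the example runs without Mininet installed.

```python
import json


class RecordingTopo:
    """Stand-in that records calls instead of building a real topology."""

    def __init__(self):
        self.calls = []

    def addHost(self, name):
        self.calls.append(("host", name))

    def addSwitch(self, name):
        self.calls.append(("switch", name))

    def addLink(self, node1, node2):
        self.calls.append(("link", node1, node2))


def build_topology(spec_text, topo):
    """Read a JSON description (hypothetical schema) and populate `topo`."""
    spec = json.loads(spec_text)
    nodes = set()
    for name in spec.get("hosts", []):
        topo.addHost(name)
        nodes.add(name)
    for name in spec.get("switches", []):
        topo.addSwitch(name)
        nodes.add(name)
    for node1, node2 in spec.get("links", []):
        # Reject links whose endpoints were never declared.
        if node1 not in nodes or node2 not in nodes:
            raise ValueError(f"link refers to an undeclared node: {node1}-{node2}")
        topo.addLink(node1, node2)
    return topo


SPEC = '{"hosts": ["h1", "h2"], "switches": ["s1"], "links": [["h1", "s1"], ["h2", "s1"]]}'
topo = build_topology(SPEC, RecordingTopo())
```

Validating link endpoints before adding them catches typos in the description early, before any emulation starts.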

SIMPsON: Interface SQUID Para Gerenciamento de Perfis

Published in XIV Simpósio de Informática do Centro Universitário Franciscano – SIRC 2017, 2017

Access control in computer networks is increasingly necessary and common in companies and educational institutions, which need tools that can manage and monitor Internet access quickly and intuitively. To that end, we present the SIMPsON tool, which integrates the Squid tool into an intuitive graphical interface.

Estudo de Caso: Análise do Método C4.5 na Predição do Papel de Jogadores de League of Legends

Published in Encontro Anual de Tecnologia da Informação (EATI), 2017

Multiplayer Online Battle Arena (MOBA) games have been growing in popularity since mid-2005. In this context, League of Legends (LoL) has become a prominent title in the MOBA universe. This paper analyzes the possibility of detecting the role assumed by players during matches, using statistics and characteristics of the best players in the official ranking. By running several configurations of the C4.5 algorithm, we try to detect gaming patterns and understand the generated results.
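
C4.5 chooses the attribute to split on by its gain ratio, which can be sketched as follows. This covers only the split criterion for discrete attributes, not the full algorithm (no continuous attributes, no pruning). The attribute names, values and role labels are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by split info."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Expected entropy that remains after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    information_gain = entropy(labels) - remainder
    # Split info penalizes attributes that fragment the data into many parts.
    split_info = -sum(
        (len(p) / total) * math.log2(len(p) / total) for p in partitions.values()
    )
    return information_gain / split_info if split_info > 0 else 0.0


# Hypothetical player statistics, discretized into "high" and "low".
rows = [
    {"farm": "high", "wards": "low"},
    {"farm": "high", "wards": "low"},
    {"farm": "low", "wards": "high"},
    {"farm": "low", "wards": "low"},
]
roles = ["carry", "carry", "support", "support"]
best_attribute = max(["farm", "wards"], key=lambda a: gain_ratio(rows, roles, a))
```

In this toy data `farm` separates the two roles perfectly, so it is the attribute a tree would split on first.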

Máquina de Turing Reversível: Um Estudo de Caso

Published in IV WORKSHOP-ESCOLA DE INFORMÁTICA TEÓRICA, 2017

This paper presents the concept of a Reversible Turing Machine and a case study based on the model devised by Charles Bennett in 1973. To exemplify this theoretical model of computation, a machine was built following the guidelines presented in the article “Logical Reversibility of Computation”, using the Java programming language. The machine presented in this article receives an input from a standard Turing Machine and converts the transition functions so that they are reversible, meaning that any action taken can be undone.

Brenda Salenave Santana
