Stance detection on social media: State of the art and trends

2023-01-06

ALDayel, A., & Magdy, W. (2021). Stance detection on social media: State of the art and trends. Information Processing & Management, 58(4), 102597. https://doi.org/10.1016/j.ipm.2021.102597

“we survey the work on stance detection across those communities and present an exhaustive review of stance detection techniques on social media, including the task definition, different types of targets in stance detection, features set used, and various machine learning approaches applied.” (ALDayel & Magdy, 2021, p. 1)

“In addition, we explore the emerging trends and different applications of stance detection on social media, including opinion mining and prediction and recently using it for fake news detection.” (ALDayel & Magdy, 2021, p. 1)

“The huge dependency of users on these platforms as their main source of communication allows researchers to study different aspects of online human behavior, including the public stance toward various social and political aspects” (ALDayel & Magdy, 2021, p. 1)

“The nature of these issues is usually controversial, wherein people express opposing opinions toward differentiable points. Social issues such as abortion, climate change, and feminism have been heavily used as target topics for stance detection on social media (Mohammad, Kiritchenko, Sobhani, Zhu, & Cherry, 2016a).” (ALDayel & Magdy, 2021, p. 1)

“Unlike sentiment analysis that can be applied to a piece of text without having a specific target, stance detection required a given target for analysis to measure the user’s viewpoint toward it.” (ALDayel & Magdy, 2021, p. 1)

“The stance has been used in various research as a means to link linguistic forms and social identities that have the capability to better understand the background of people with polarized stances (Bassiouney, 2015).” (ALDayel & Magdy, 2021, p. 2)

“In contrast, social media discussions on a given topic are more scattered; however sometimes they could be linked over a given hashtag” (ALDayel & Magdy, 2021, p. 2)

“We map out the terrain of existing research on stance detection and synthesize[] how this relates to existing theoretical orientations. We provide a [thorough] comparison of stance and sentiment for socio-political opinion mining and show the orthogonal relation of sentiment polarity and stance.” (ALDayel & Magdy, 2021, p. 2)

“We provide a broader overview of stance detection methods, covering work that has been published in multiple research domains, including NLP, computational social science, and Web science.” (ALDayel & Magdy, 2021, p. 2)

“We examine the modeling of stances using text, networks, and contextual features that have been overlooked by previous surveys and give a clear summary on which are the most effective features and machine learning approaches.” (ALDayel & Magdy, 2021, p. 2)

“We show the different applications of stance detection on social media and provide a comprehensive discussion on the current trends and future directions of this field of study.” (ALDayel & Magdy, 2021, p. 2)

“We provide a comprehensive list to the existing resources for stance detection in the literature, including the datasets used, which of them are publicly available, the used machine learning methods, and extracted features.” (ALDayel & Magdy, 2021, p. 2)

“[John W. Du Bois] argue[s] that stance-taking (i.e., a person taking a polarized stance toward a given subject) is a subjective and inter-subjective phenomenon in which the stance-taking process is affected by personal opinions and non-personal factors such as cultural norms.” (ALDayel & Magdy, 2021, p. 3)

“Stance taking is a complex process related to different personal, cultural, and social aspects. For instance, the political stance-taking process depends on experiential behavior, as stated by McKendrick and Webb (2014).” (ALDayel & Magdy, 2021, p. 3)

“Mainly, stance detection aims to infer the embedded viewpoint from the writer’s text by linking the stance to three factors, namely, linguistic acts, social interactions, and individual identity. Using the linguistic features in stance detection is usually associated with attributes such as adjectives, adverbs, and lexical items (Jaffe, 2009).” (ALDayel & Magdy, 2021, p. 3)

“As stated in [Du Bois’s] stance triangle shown in Fig. 2, the process of taking a stance is based on three factors. These factors are (1) evaluating objects, (2) positioning the main subject (the self), and (3) aligning with other subjects (i.e., other social actors).” (ALDayel & Magdy, 2021, p. 3)

“Furthermore, from a sociolinguistic perspective (Jaffe, 2009), it has been argued that there is no completely neutral stance, as people tend to position themselves through their texts to be in favor of or against the object of evaluation.” (ALDayel & Magdy, 2021, p. 3)

“Another work by Chauhan, Kumar, and Ekbal (2019) used sentiment as an auxiliary task to predict the stance.” (ALDayel & Magdy, 2021, p. 4)

“Unlike sentiment analysis, stance detection mainly focuses on identifying a person’s standpoint or view toward an object of evaluation, either to be in favor of (supporting) or against (opposing) the topic. This is usually inferred by a mixture of signals besides the linguistic cues, such as the person’s feelings, personality traits, and cultural aspects (Biber & Finegan, 1988).” (ALDayel & Magdy, 2021, p. 4)

“stance detection can take an additional paradigm by leveraging non-textual features such as social attributes through networks and contextual features to infer the user’s stance (Aldayel & Magdy, 2019a; Darwish et al., 2017a; Lahoti, Garimella, & Gionis, 2018).” (ALDayel & Magdy, 2021, p. 4)

“For stance detection, a clear target 𝐺 must be defined in advance, to assess the overall attitude toward this target.” (ALDayel & Magdy, 2021, p. 4)

“According to a more relaxed definition, a text 𝑇 entails a stance to a target 𝐺, (𝑇 ⟶ stance to 𝐺), if the stance to a target can be inferred from the given text.” (ALDayel & Magdy, 2021, p. 5)

“The structure of social media extends the possibility to represent the social actor by using a variety of network features such as profile information (including age and gender)” (ALDayel & Magdy, 2021, p. 5)

“To define the interplay between sentiment and stance, several studies have demonstrated that it is insufficient to use sentiment as the only dependent factor to interpret a user’s stance” (ALDayel & Magdy, 2021, p. 5)
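
Note to self: a toy illustration of why sentiment alone is a weak proxy for stance. The sketch below maps positive sentiment to "favor" and negative sentiment to "against", then scores that baseline on four hand-written examples. The examples, their labels, the target, and the function name are all invented for illustration; none of them come from the survey or from any dataset.

```python
# Toy, hand-written examples (NOT from any dataset).
# Each row: (text, sentiment, stance toward the assumed target "climate action").
EXAMPLES = [
    ("So angry that lawmakers keep ignoring climate change", "neg", "favor"),
    ("Loved the turnout at the climate march today", "pos", "favor"),
    ("Great news, the carbon tax bill was defeated", "pos", "against"),
    ("This climate alarmism is exhausting", "neg", "against"),
]

# Naive baseline: treat sentiment polarity as if it were the stance.
SENTIMENT_TO_STANCE = {"pos": "favor", "neg": "against"}


def sentiment_baseline_accuracy(rows):
    """Share of rows where sentiment polarity happens to match the stance."""
    hits = sum(1 for _, sentiment, stance in rows
               if SENTIMENT_TO_STANCE[sentiment] == stance)
    return hits / len(rows)


print(sentiment_baseline_accuracy(EXAMPLES))  # 0.5 on this toy set
```

On this deliberately balanced toy set the baseline is no better than chance, which is the point of the "orthogonal relation" the authors describe: a negative-sentiment text can still support the target, and vice versa.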

“From literature, stance detection has multiple forms, which can be categorized: (1) according to the target type, whether it is a single target, multi-related-targets, or claim-based targets; or (2) according to the classification task itself, where it is a detection of an existing stance or prediction of a future stance.” (ALDayel & Magdy, 2021, p. 6)

“Usually the classification task will have three classes: support, against, and none, where none is usually a text that is irrelevant to the target, or an objective statement that does not carry a given stance (Allaway & McKeown, 2020; Darwish et al., 2017b; Mohammad et al., 2016a).” (ALDayel & Magdy, 2021, p. 6)

“The user level: in which the objective is to predict the stance of a user toward a given topic. Different users attributes can be incorporated along with the text in their posts” (ALDayel & Magdy, 2021, p. 6)

“It is more common in these studies to find the classification includes only the two polarized classes: support and against, since they usually assume that users will have a given stance even if their posts are neutral” (ALDayel & Magdy, 2021, p. 6)

“Multiple posts from the same user might have different stance classes, usually a combination of a given polarized stance (either support or against) and the neutral one; and in some rare cases they can have [a small] number of posts of the opposite stance (e.g. when they show opposition to a certain action from an entity they support).” (ALDayel & Magdy, 2021, p. 6)
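
Note to self: one simple way to go from post-level labels to a user-level stance, given the observation above that a user's posts mix one polarized class with neutral posts and the occasional opposite post. This is a minimal sketch under my own assumptions (majority vote over polarized posts, ties left undecided); the survey does not prescribe an aggregation rule, and the function name is hypothetical.

```python
from collections import Counter


def user_stance(post_labels):
    """Aggregate post-level labels into a single polarized user-level stance.

    Neutral ("none") posts are ignored, following the assumption reported in
    the survey that user-level studies expect a polarized stance.
    Returns "support", "against", or None when no majority exists.
    """
    polarized = Counter(label for label in post_labels
                        if label in ("support", "against"))
    if not polarized:
        return None  # the user only wrote neutral posts
    (top_label, top_count), *rest = polarized.most_common()
    if rest and rest[0][1] == top_count:
        return None  # tie between support and against
    return top_label


posts = ["support", "none", "none", "against", "support"]
print(user_stance(posts))  # support
```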

“The [most] common form of stance detection on social media is the target-specific stance detection. Most of the previous studies focused on inferring the stance for a set of predefined targets” (ALDayel & Magdy, 2021, p. 6)

“A separate stance classification model must be built for each target (G) separately. This is the basic practice, even for benchmark datasets such as SemEval 2016, which covers multiple topics. Most of the published work on this dataset has trained a separate model for each topic (target) separately” (ALDayel & Magdy, 2021, p. 6)
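
Note to self: what "one model per target" looks like in practice. The sketch below groups labeled rows by target and fits an independent classifier for each group. The classifier is a tiny hand-rolled multinomial Naive Bayes with add-one smoothing, chosen only so the example has no dependencies; it is my stand-in, not a method from the survey. The class name, function name, and the four training rows are invented.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Minimal multinomial Naive Bayes over whitespace tokens (illustrative)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(self.label_counts[label] / n_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            return score

        return max(self.label_counts, key=log_score)


def train_per_target(rows):
    """rows: iterable of (target, text, label). Returns {target: fitted model}."""
    grouped = defaultdict(lambda: ([], []))
    for target, text, label in rows:
        grouped[target][0].append(text)
        grouped[target][1].append(label)
    return {target: TinyNB().fit(texts, labels)
            for target, (texts, labels) in grouped.items()}


# Invented toy rows; a real setup would use an annotated dataset per target.
ROWS = [
    ("atheism", "faith guides my life", "against"),
    ("atheism", "religion is a myth", "support"),
    ("climate", "we must cut emissions now", "support"),
    ("climate", "warming is a hoax", "against"),
]
models = train_per_target(ROWS)
print(models["climate"].predict("cut emissions"))  # support
```

Each model only ever sees text about its own target, which is why this setup needs labeled data for every new target.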

“In some special cases, one stance detection model can be applied on multiple related targets (Sobhani, Inkpen, & Zhu, 2017). The main assumption behind this kind of stance detection is that when a person gives their stance for one target, it gives implicit indication about their stance toward the other related targets.” (ALDayel & Magdy, 2021, p. 7)

“In claim-based or open-domain stance detection, the target of the analysis is not an explicit entity, as is the case in the ones discussed earlier. The target here is a claim in a piece of news, and the objective is to detect the stance in the comments to this news, whether they are confirming the news or challenging its validity. The prediction labels tend to take the form of confirming the claim or denying it.” (ALDayel & Magdy, 2021, p. 7)

“Claim-based stance detection is considered a key method to analyze the veracity of [misinformation]. In this task the stance of the replies is used to predict the veracity of a claim” (ALDayel & Magdy, 2021, p. 7)
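
Note to self: the crudest possible version of "reply stance as a veracity signal". The sketch computes the share of confirming replies among the replies that take a side. This is only a naive heuristic of my own; the systems covered by the survey would feed reply stances into a trained veracity model rather than threshold a ratio. The label names ("confirm", "deny", anything else ignored) and the function name are assumptions.

```python
def confirmation_ratio(reply_stances):
    """Fraction of side-taking replies that confirm the claim.

    Replies with any label other than "confirm" or "deny" are ignored.
    Returns None when no reply takes a side.
    """
    confirm = reply_stances.count("confirm")
    deny = reply_stances.count("deny")
    if confirm + deny == 0:
        return None
    return confirm / (confirm + deny)


replies = ["confirm", "deny", "deny", "comment"]
ratio = confirmation_ratio(replies)
print(ratio)  # one confirming reply out of three side-taking replies
```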

“Another possible categorization to the stance detection task can be framed according to the status of the stance to be modeled, either being (1) an existing stance expressed in text; or (2) an unexpressed stance that might have not occurred yet.” (ALDayel & Magdy, 2021, p. 7)

“Stance prediction aims to infer the stances of social media users with no explicit expression of these stances online. It is also used to predict stances on events that have not occurred yet.” (ALDayel & Magdy, 2021, p. 7)

“At the micro level, stance prediction means the estimation of an individual user’s standpoint toward a target or event in advance (pre-event), indicating the likelihood that the user will be in favor of or against the target event.” (ALDayel & Magdy, 2021, p. 7)

“At the macro level, the public opinion toward an event is inferred, and the research tend[s] to address this level of prediction as an aggregation of micro predictions” (ALDayel & Magdy, 2021, p. 7)
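
Note to self: "macro as an aggregation of micro predictions" can be sketched in a few lines. The code assumes each micro prediction is a per-user probability of being in favor, and reports two aggregates: the share of users predicted in favor and the mean probability. The 0.5 threshold, the function name, and the sample values are my own choices for illustration.

```python
def macro_stance(micro_probs, threshold=0.5):
    """Aggregate per-user P(favor) values into macro-level estimates.

    micro_probs: dict mapping user id -> predicted probability of "favor".
    """
    n_users = len(micro_probs)
    favor_share = sum(1 for p in micro_probs.values() if p >= threshold) / n_users
    mean_favor_prob = sum(micro_probs.values()) / n_users
    return {"favor_share": favor_share, "mean_favor_prob": mean_favor_prob}


# Invented micro-level predictions for four users.
predictions = {"u1": 0.9, "u2": 0.2, "u3": 0.7, "u4": 0.6}
summary = macro_stance(predictions)
print(summary["favor_share"])  # 0.75
```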

<img alt="" data-attachment-key="LSGC73HW" data-annotation="%7B%22attachmentURI%22%3A%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FT6QJ5ICY%22%2C%22annotationKey%22%3A%22TQZB5ATF%22%2C%22color%22%3A%22%23ffd400%22%2C%22pageLabel%22%3A%228%22%2C%22position%22%3A%7B%22pageIndex%22%3A7%2C%22rects%22%3A%5B%5B62.344%2C602.052%2C473.437%2C701.896%5D%5D%7D%2C%22citationItem%22%3A%7B%22uris%22%3A%5B%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FLBHECS77%22%5D%2C%22locator%22%3A%228%22%7D%7D" width="685" height="166" src="attachments/LSGC73HW.png" ztype="zimage">
(ALDayel & Magdy, 2021, p. 8)

“In debate forums, Qiu et al. (2015) have proposed a micro-level stance prediction model based on user behavior toward new events in which they have not participated. In this study, this kind of user was referred to as a cold-start user, which is a well-known term commonly used in recommendation systems.” (ALDayel & Magdy, 2021, p. 8)

“These signals can be categorized into two main types, namely, (1) content signals, such as the text of the tweets, and (2) network signals, such as the users’ connections and interactions on their social networks.” (ALDayel & Magdy, 2021, p. 8)

“As illustrated in Fig. 4, the content features can be categorized into two general types, namely, the linguistic features and users’ vocabulary. The first type of features are related to the text’s linguistic features that help in inferring the stance. The other type concerns modeling a user’s stance based on the user’s general choice of vocabulary.” (ALDayel & Magdy, 2021, p. 8)

“In the existing literature, the stance detection work that is concerned with using textual cues to detect stances includes textual features, sentiment polarity, and latent semantics.” (ALDayel & Magdy, 2021, p. 8)

“Another kind of feature is the latent semantics feature that aims to reduce the dimension of a given input, such as mapping the sentences according to a predefined set of topics (i.e., topic modeling)” (ALDayel & Magdy, 2021, p. 9)

“There are a considerable number of studies that represent stance detection based on the users’ vocabulary. The hypothesis is that individuals with same stance tend to use the same vocabulary choices to express their points of view (Darwish et al., 2020).” (ALDayel & Magdy, 2021, p. 9)

“The focus of these studies is mainly to disentangle the topic from the viewpoint where the vocabulary is not only linked to the topic but also individual attitudes and characteristics (Beigman Klebanov et al., 2010).” (ALDayel & Magdy, 2021, p. 9)

“Social media provides a special kind of social data due to the structure of its platforms, wherein users can be characterized based on their social connections and interactions. Many existing works used network features to gauge the similarity between the users.” (ALDayel & Magdy, 2021, p. 9)

“The network features that have been used to learn the users’ representations on social media can be grouped under two categories, namely, users’ behavioral data (Cignarella et al., 2020; Darwish et al., 2017a, 2020; Thonet, Cabanac, Boughanem, & Pinel-Sauvagnat, 2017) and users’ meta-data attributes (Pennacchiotti & Popescu, 2011).” (ALDayel & Magdy, 2021, p. 9)

“The application of users’ behavioral data to identify the stance is motivated by the notion of homophily, based on the social phenomenon according to which ‘individuals associate with similar ones’ (Bessi et al., 2016). When it comes to social media, the similarity between users is considered a core property that helps in inferring stances.” (ALDayel & Magdy, 2021, p. 9)

“The interaction elements have been used to define the similarity between the users. One of the elements that has been extensively used to infer Twitter users’ stances is the retweet” (ALDayel & Magdy, 2021, p. 9)
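
Note to self: a concrete way to read "retweets define user similarity" under the homophily assumption. The sketch compares two users by the Jaccard overlap of the sets of accounts they retweeted. Jaccard is my choice for illustration; the survey does not name a specific similarity measure here, and the account handles are made up.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # no evidence at all: treat as dissimilar
    return len(a & b) / len(a | b)


# Invented data: which accounts each user has retweeted.
retweeted = {
    "u1": {"@a", "@b", "@c"},
    "u2": {"@b", "@c", "@d"},
    "u3": {"@x"},
}
print(jaccard(retweeted["u1"], retweeted["u2"]))  # 0.5
print(jaccard(retweeted["u1"], retweeted["u3"]))  # 0.0
```

Under homophily, u1 and u2 would be more likely to share a stance than u1 and u3.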

“In a recent study by Aldayel and Magdy (2019a), three types of network features to model stances on social media were defined. These network features are the (1) interaction network, (2) preferences network, and (3) connection network. The interaction network represents the users’ direct interactions with other users, in the sense of retweets, mentions, and replies. This type of network provides the best performance score of the stance detection model in [comparison] with two other networks. The preference network is the network of users in the tweets they like. This network allows the detection of stances for users who may have limited posting or interaction behaviors online. Finally, the connection network includes the friends and followers of the users. The three types of networks provide the best performance in comparison with content features, Table 2.” (ALDayel & Magdy, 2021, p. 9)
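
Note to self: how the three network types could be turned into one sparse feature vector per user. The sketch prefixes each account by the network it came from (`int:` for retweets, mentions, and replies; `pref:` for authors of liked tweets; `conn:` for friends and followers). The record layout, key names, and prefixes are hypothetical; Aldayel and Magdy (2019a) may encode these networks differently.

```python
from collections import Counter


def network_features(user):
    """Build a sparse bag-of-accounts feature vector from a user record.

    user: dict with optional list-valued keys "retweets", "mentions",
    "replies", "liked_authors", "friends", "followers" (assumed layout).
    """
    feats = Counter()
    # Interaction network: direct interactions with other accounts.
    for kind in ("retweets", "mentions", "replies"):
        feats.update(f"int:{account}" for account in user.get(kind, []))
    # Preference network: authors of the tweets the user liked.
    feats.update(f"pref:{account}" for account in user.get("liked_authors", []))
    # Connection network: friends and followers.
    for kind in ("friends", "followers"):
        feats.update(f"conn:{account}" for account in user.get(kind, []))
    return feats


# Invented user record.
user = {
    "retweets": ["@a", "@a"],
    "mentions": ["@b"],
    "liked_authors": ["@c"],
    "friends": ["@a"],
}
features = network_features(user)
print(features["int:@a"])  # 2
```

Keeping the prefixes separate lets a downstream model weigh an account differently depending on whether the user retweeted, liked, or merely follows it.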

“In studies by Darwish et al. (2018), Magdy et al. (2016), a user’s similarity component was utilized as the features. The similarity was calculated based on the interaction elements of a given tweet. These interaction elements included mentions, retweets, and replies, as well as website links (URLs) and hashtags used by users in their tweets.” (ALDayel & Magdy, 2021, p. 9)

“[A] study by Trabelsi and Zaïane (2018) used heterophily instead of homophily to measure the dissimilarity between the users to model the stance. The hypothesis is based on the tendency of users to reply to the opposed viewpoint. They used a rebuttal variable to model the users’ interactions and denote if the replies attack the previous author’s parent post. The value of the rebuttal depended on the degree of opposition between the viewpoint of the parent post and tweet” (ALDayel & Magdy, 2021, p. 9)

“Most of the studies on stance detection have focused only on one type of features, either content (the majority of studies) or network. However, there were few studies that directly compared the effectiveness of both sets of features.” (ALDayel & Magdy, 2021, p. 10)

“However, it was also demonstrated by some of these studies, that combining both network and content feature might lead to the best performance” (ALDayel & Magdy, 2021, p. 10)

“In this approach, a stance dataset is annotated using a predefined set of labels, usually two (pro/against) or three (pro/against/none) labels.” (ALDayel & Magdy, 2021, p. 10)

“To address the scarcity of the labeled data for each target in the stance detection task, some studies in this field attempted to incorporate unconstrained supervised methods, including transfer learning, weak-supervision, and distant supervision methods for stance detection.” (ALDayel & Magdy, 2021, p. 10)

“The study by Zhang et al. (2020) used SemEval stance task A and B along with new dataset ‘Trade Policy’ to construct eight cross-target stance detection sub tasks based on splitting the task into two groups: Women’s Right (Feminist Movement, Legalization of Abortion) and American Politics ([Hillary] Clinton, Donald [Trump], Trade Policy). In their study they used knowledge-aware memory component to incorporate the external knowledge into BiLSTM.” (ALDayel & Magdy, 2021, p. 11)

“Recently, attention has been devoted toward building unsupervised stance detection models, where clustering techniques are primarily used with a focus on the user and topic representation on the social media platform” (ALDayel & Magdy, 2021, p. 11)

“The work of Trabelsi and Zaïane (2018) proposed an unsupervised model using the clustering model at the author and topic levels. In this study, six topics collected from two online debate forums, namely, 4Forums and CreateDebate were used. Their clustering model leveraged both the content and interaction networks of the users (i.e., retweets and replies).” (ALDayel & Magdy, 2021, p. 11)

“The other study by Darwish et al. (2020) used a clustering technique to create an initial set of stance partition for annotation. They used unlabeled tweets related to three topics, Kavanaugh, Trump, and Erdogan. Their findings showed that using retweets as a feature provided the best performance score upon implementing clustering algorithm (DBSCAN), which surpassed the supervised method when using the fast-text and SVM models. Their findings are considered greatly motivational for the use [of] unsupervised methods for stance classification in the future.” (ALDayel & Magdy, 2021, p. 11)
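
Note to self: to make "DBSCAN over retweet features" concrete, here is a dependency-free toy version. It represents each user by the set of accounts they retweeted, uses Jaccard distance between those sets, and runs a textbook DBSCAN. This is not the pipeline of Darwish et al. (2020); the distance measure, the `eps` and `min_pts` values, and the seven users are assumptions picked so the example stays small.

```python
def jaccard_distance(a, b):
    """1 - Jaccard similarity between two sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(points, eps, min_pts, dist):
    """Textbook DBSCAN. points: dict name -> item. Returns name -> cluster id.

    Cluster ids start at 0; -1 marks noise.
    """
    names = list(points)
    labels = {}
    cluster = -1

    def neighbors(p):
        # The neighborhood includes p itself (distance 0).
        return [q for q in names if dist(points[p], points[q]) <= eps]

    for p in names:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:
            labels[p] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in nbrs if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # noise reached from a core point: border
            if q in labels:
                continue
            labels[q] = cluster
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:
                queue.extend(q_nbrs)  # q is a core point: keep expanding
    return labels


# Invented retweet sets: two camps plus one isolated user.
users = {
    "u1": {"@a", "@b", "@c"}, "u2": {"@a", "@b"}, "u3": {"@b", "@c"},
    "u4": {"@x", "@y", "@z"}, "u5": {"@x", "@y"}, "u6": {"@y", "@z"},
    "u7": {"@q"},
}
clusters = dbscan(users, eps=0.5, min_pts=3, dist=jaccard_distance)
print(clusters)
```

The clusters themselves carry no stance label; as the quote says, they give an initial partition that still has to be annotated.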

“The most recent study by Ammar et al. (2021) introduced embeddings representations of users’ tweets to enhance stance detection model. They used the users tweets embeddings along with hierarchy clustering to provide a framework to analyze fine-grained polarization between groups tweets related to Turkish election.” (ALDayel & Magdy, 2021, p. 11)

“Particularly, for projecting the users similarity, they used the Uniform Manifold Approximation and Projection (UMAP) algorithm along with hierarchical density based clustering ([HDBSCAN]) to cluster the projected user vectors.” (ALDayel & Magdy, 2021, p. 11)

“The use of stance detection has proven beneficial as a social sensing technique to measure public support related to social, religious, and political topics.” (ALDayel & Magdy, 2021, p. 11)

“Another line of study uses stance detection to analyze the public viewpoint on social aspects.” (ALDayel & Magdy, 2021, p. 12)

“On the other hand, stance detection has been used to analyze the attitude related to disruptive events (Darwish et al., 2018; Demszky et al., 2019).” (ALDayel & Magdy, 2021, p. 12)

“Homophily is the social phenomenon that concerns with people’s tendency to connect with ‘like-minded friends’. An echo-chamber is the cascade of certain information among groups of people. This social behavior has been magnified in social media structures wherein certain beliefs are amplified within close circles of communication. Consequently, people are exposed to content in consent with the same opinions that they hold. As a result, this reinforces social media users’ view biases and blinds the users from other sides of information.” (ALDayel & Magdy, 2021, p. 12)

“Therefore, stance detection has been used to help in measuring and alleviating the problems that result from the polarization on social media.” (ALDayel & Magdy, 2021, p. 12)

“As discussed in Section 3.2.3, stance in comments toward the news is measured to detect if these comments are confirming or denying the news, which, in turn, is used to detect if the news is a rumor or authentic.” (ALDayel & Magdy, 2021, p. 12)

“The Fake News Challenge initiative (FNC-1) adopted this approach and proposed a stance detection task to estimate the stance of articles toward a given headline (i.e., claim).” (ALDayel & Magdy, 2021, p. 12)

“In the classification tasks the datasets further categorized as: target-specific, multi-target and claim-based stance dataset. For the stance prediction datasets, they are further categorized as macro and micro predictions” (ALDayel & Magdy, 2021, p. 13)

“Most of the target specific stance detection dataset in social media are English sources. There are two distinct stance datasets that covers non-English stance in social media. The first dataset is the MultistanceCat dataset (Taulé et al., 2018), which contains tweets related to Catalan Referendum in Spanish and Catalan. The dataset provides a multi-modeling to the stance in social media by incorporating the information included in the link along with the text of the tweet. The other dataset is the ‘SardiStance’ which is related to Sardines movement in Italian tweets. This dataset has been introduce[d] as part of EVALITA2020 task (Cignarella et al., 2020). This task provides two variations of data based on two subtasks (a) Textual Stance Detection and (b) Contextual Stance Detection.” (ALDayel & Magdy, 2021, p. 13)

“Claim-based datasets: In this kind of stance dataset the object of evaluation is the source of information instead of social actor.” (ALDayel & Magdy, 2021, p. 13)

“Multi-related-targets: The two datasets that have multi-related-targets stance annotations are the Trump vs. Hillary dataset (Darwish et al., 2017b) and Multi-targets dataset (Sobhani et al., 2017). In Trump vs. Hillary dataset, each tweet is stance annotated for the two candidates in the same time such as (supporting Hillary and Against Trump).” (ALDayel & Magdy, 2021, p. 13)

“Stance prediction datasets: As a result of the lack of the benchmarks datasets in this kind of stance detection, the researchers tend to build their own datasets as illustrated in Table 4.” (ALDayel & Magdy, 2021, p. 13)

“Social media provides a rich source for social studies to analyze public opinion, especially on controversial issues. While sentiment analysis has been used for decades to analyze public opinion on products and services, stance detection comes as the correspondent solution for analyzing public opinion on political and social topics, where sentiment analysis fails to reflect support.” (ALDayel & Magdy, 2021, p. 13)

“It is worth noticing that small amount of work used stance to analyze the social issues in comparison with political topics. This is due to the controversial nature of the political topics which facilitates the data collection for the stance detection.” (ALDayel & Magdy, 2021, p. 13)

<img alt="" data-attachment-key="BUA43MQ8" data-annotation="%7B%22attachmentURI%22%3A%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FT6QJ5ICY%22%2C%22annotationKey%22%3A%22TQYDXZNW%22%2C%22color%22%3A%22%23ffd400%22%2C%22pageLabel%22%3A%2214%22%2C%22position%22%3A%7B%22pageIndex%22%3A13%2C%22rects%22%3A%5B%5B30.557000000000002%2C408.61032572837644%2C523.6051242427922%2C696.8410000000001%5D%5D%7D%2C%22citationItem%22%3A%7B%22uris%22%3A%5B%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FLBHECS77%22%5D%2C%22locator%22%3A%2214%22%7D%7D" width="822" height="481" src="/attachments/BUA43MQ8.png" ztype="zimage"> (ALDayel & Magdy, 2021, p. 14)

<img alt="" data-attachment-key="ZAVYH2LX" data-annotation="%7B%22attachmentURI%22%3A%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FT6QJ5ICY%22%2C%22annotationKey%22%3A%22Z9JTRXNH%22%2C%22color%22%3A%22%23ffd400%22%2C%22pageLabel%22%3A%2214%22%2C%22position%22%3A%7B%22pageIndex%22%3A13%2C%22rects%22%3A%5B%5B33.448%2C289.4777718292869%2C520.3016218816389%2C401.59000000000003%5D%5D%7D%2C%22citationItem%22%3A%7B%22uris%22%3A%5B%22http%3A%2F%2Fzotero.org%2Fusers%2F8071752%2Fitems%2FLBHECS77%22%5D%2C%22locator%22%3A%2214%22%7D%7D" width="811" height="187" src="attachments/ZAVYH2LX.png" ztype="zimage">

(ALDayel & Magdy, 2021, p. 14)

“Stance detection has been mostly approached using classification-based algorithm. This is mostly applied using supervised learning algorithms with huge dependency on human-annotated data. Consequently, techniques such as transfer learning and unsupervised learning have been used to resolve the scarcity of the annotated data but with less attention from researchers compared to supervised methods.” (ALDayel & Magdy, 2021, p. 14)

“This scarcity [is] reflected by the need to enrich the data with information related to the object of interest. For instance, to detect stances related to climate change, information related to global warming [is] considered beneficial for stance detection in-order to cover the complete aspect of the topic.” (ALDayel & Magdy, 2021, p. 14)

“The main goal behind this kind of stance detection is to [predict] the unexpressed views and to infer people[’s] standpoints toward an event in advance (pre-event).” (ALDayel & Magdy, 2021, p. 14)

“In this kind of studies, the dataset contains pre-event and post-event posts annotated with the user’s stance before and after the event consequently. Thereby, predicting the stances is based on the user’s past behavior which can be extracted from network features along with the post’s content (Himelboim, McCreery, & Smith, 2013).” (ALDayel & Magdy, 2021, p. 14)

“there is also large amount of work from the social computing and computational social science communities that showed the effectiveness of using social interactions and network feature for stance detection and prediction.” (ALDayel & Magdy, 2021, p. 14)

“Several studies demonstrated the consistent improvement on the overall performance of stance detection models when using the network features instead of just using content of post only. This kind of stance modeling draws more emphasis on the crucial role of users online behavior and social attributes in the stance detection models.” (ALDayel & Magdy, 2021, p. 15)

“While network features are more effective, but they are computationally expensive, since they require collecting additional information about users, which might not be highly practical in some cases. The need for further investigation in this direction is required to understand how to reach more effective, but at the same time highly efficient stance detection models that utilize social attributes and network information.” (ALDayel & Magdy, 2021, p. 15)

“In addition, most of the existing datasets are mainly focusing on social, political and religious issues. Having new datasets that covers additional domains would be of importance to explore the stance detection task for these new domains, such as the recent dataset of financial domain (Conforti et al., 2020).” (ALDayel & Magdy, 2021, p. 15)

“In general, as stance detection has been used heavily as social sensing technique on social media to study societal aspects ranging from politic, religion and social topics, this urges the need to incorporate more robust modeling of the stance on social [media] by using network features for a more accurate analysis.” (ALDayel & Magdy, 2021, p. 15)