How does STILINGUE's sentiment analysis work?

Updated January 22, 2024 20:48

Introduction

Understand how Artificial Intelligence analyzes the posts collected in your research.

For sentiment analysis, the STILINGUE Artificial Intelligence engine uses three classifications: Positive, Negative, or Neutral. The classification is semantic: it takes into account the current meaning of words. The process also classifies emojis according to their recurrent usage patterns on social media. STILINGUE currently has a database of 1100 registered emojis, updated monthly, which takes into account those most commonly used by social media users.

What is the accuracy of our sentiment classification?

Accuracy indicates the overall performance of the AI model responsible for classification: it is the share of posts the model classified correctly out of the total number of posts. Currently, the accuracy for sentiment classification in PT-BR posts is between 75% and 82%.

STILINGUE understands that a meaningful accuracy figure depends less on the raw number of correct predictions than on the diversity of the examples evaluated. The most important aspect is therefore to seek a good variety of examples when measuring the quality of the Artificial Intelligence responsible for sentiment classification, because accuracy is a dynamic metric: its result differs depending on the analyzed sample.

By analyzing different samples, it is possible to establish the accuracy range the AI engine most commonly achieves in sentiment classification. The value of this range may be higher or lower, because it depends not only on the sample used but also on the fact that Brazilian Portuguese (PT-BR) is a living language in which words are redefined daily. A term once considered positive may take on a negative connotation.
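To make the idea of an accuracy range concrete, here is a minimal sketch of how accuracy could be measured on several labelled samples and then reported as a range. The function names and the sample data are hypothetical illustrations, not STILINGUE's actual evaluation code.

```python
def accuracy(predicted, expected):
    """Share of posts whose predicted label matches the reference label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def accuracy_range(samples):
    """Lowest and highest accuracy observed across several samples."""
    scores = [accuracy(predicted, expected) for predicted, expected in samples]
    return min(scores), max(scores)


# Hypothetical samples: (labels from the model, labels from human reviewers).
sample_a = (
    ["Positive", "Negative", "Neutral", "Positive"],
    ["Positive", "Negative", "Neutral", "Negative"],
)  # 3 of 4 correct
sample_b = (
    ["Neutral", "Neutral", "Negative", "Positive", "Positive"],
    ["Neutral", "Positive", "Negative", "Positive", "Positive"],
)  # 4 of 5 correct

low, high = accuracy_range([sample_a, sample_b])
print(f"accuracy range: {low:.0%} to {high:.0%}")  # accuracy range: 75% to 80%
```

The same model scores differently on each sample, which is why a range built from varied samples is more informative than a single number.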
Campaigns, commemorative dates, and crises can also give an expression a new meaning, depending on the context in which it is used. Whenever a word or expression gains a new meaning, or a new term is coined, the STILINGUE Artificial Intelligence engine needs to be trained; for this reason, it undergoes continuous improvement.

Another important point is that sentiment classification can vary with factors such as the terms used in your Search Configuration and the current context. Below are other elements the AI considers when classifying the sentiment of collected posts:

1. Emojis and their classifications

As explained earlier, the STILINGUE library contains more than a thousand emojis, mapped from user posts on social media. Based on this sample, the AI assigns each emoji a polarity: Negative, Positive, or Neutral.

The polarity of an emoji is defined by its frequency of use on social media. If an emoji is not used frequently, it is considered Neutral. If this changes over time and the emoji starts being used more critically, its polarity becomes Negative.

If a post contains only emojis, without any text, the polarity of the post is determined by the polarity of its emojis.

2. Laughter and its classifications

For posts that contain only laughter-indicating language, such as kkkkk, hehehehe, hahahaha, hehehe, and others, the content goes through a process called normalization, so that all variations of laughter are understood in the same way. This process is similar to the normalization of abbreviated terms; for more information, refer to the FAQ.

Although normalization makes it easier to identify a text as laughter, it is important to note that this category of interaction is not considered by the STILINGUE Artificial Intelligence engine when conducting sentiment analysis.
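The laughter normalization described above can be illustrated with a minimal sketch. The regular expression, the canonical form "haha", and the function names are assumptions made for this example; the article does not describe how the platform implements this step.

```python
import re

# Hypothetical pattern covering the variants listed in the article:
# runs of "k" (kkkkk) and repeated "ha"/"he"-style syllables (hahahaha, hehehe).
LAUGHTER = re.compile(r"\b(?:k{3,}|(?:h[aeiou]){2,}h?)\b", re.IGNORECASE)

# Hypothetical canonical form that every laughter variant is mapped to.
CANONICAL_LAUGHTER = "haha"


def normalize_laughter(text):
    """Replace every laughter variant with one canonical form."""
    return LAUGHTER.sub(CANONICAL_LAUGHTER, text)


def is_laughter_only(text):
    """True when the post contains laughter and no other words."""
    if not LAUGHTER.search(text):
        return False
    remainder = LAUGHTER.sub("", text)
    return re.search(r"\w", remainder) is None


print(normalize_laughter("hehehehe, adorei"))  # haha, adorei
print(is_laughter_only("hahahaha kkkkk!"))     # True
print(is_laughter_only("kkkkk que produto ruim"))  # False
```

In this sketch, a post for which `is_laughter_only` returns True would simply be left without an automatic polarity, since laughter on its own does not count toward sentiment.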
In other words, a message with laughter will not be automatically classified as positive, neutral, or negative. This is because laughter cannot always be considered a positive sentiment; it is often used ironically or in neutral and negative contexts. Therefore, the AI cannot determine the polarity of a post with laughter if no other textual element provides context for that post.

3. Abbreviations and their classifications

When a term is written on social media as an abbreviation or acronym, the platform uses normalization to restore it to its full form. After this step, sentiment classification is performed based on the context presented in the post. This is the same normalization used for laughter; it is one of the internal text-processing stages and is not visible within STILINGUE. See the following example:

Text collected: "amg, eu amei isso, sqn"
Processed text: "amiga, eu amei isso, só que não."
Text that appears on the platform: "amg, eu amei isso, sqn"

This normalization is maintained by the Machine Teaching team (responsible for enhancing the STILINGUE Artificial Intelligence engine), based on an extensive and constantly updated dictionary of terms. The text shown on the platform is not altered: the abbreviated term is replaced by its full version automatically during processing, and sentiment classification is automatic as well. The abbreviation "sqn" ("só que não", roughly "not really") is ironic, so a post containing it is initially classified as Negative on the platform, according to the context of the publication.

4. Ambiguity and its classifications

When a collected publication contains a term that may be ambiguous, the Artificial Intelligence assigns that term the polarity that, most of the time, makes the most sense for the word. Consider the example below:

A clothing brand has a Children's section, and an interaction is collected that contains the word "Children". The word will be processed as having negative polarity.
However, in this context the word does not refer to "Children" in the sense of childishness. In such cases, where an ambiguous word has a polarity different from the one the system usually assigns, it is recommended that users add the term to the Term Library under the desired polarity so that it is classified accordingly.

For more information, visit the discussion on the subject at our community or the videos on our channel. 😃