The problem of measurement and its consequences for models
In social models, we usually represent quantities of some abstract construct (e.g. happiness, opinion, stubbornness...). This comes so naturally to us that we often do not even question the process. However, entities like opinions do not appear in numerical form by themselves; they are turned into numbers through the process of measurement.
One of the biggest problems of measurement in the social sciences is that we want to measure complex entities which go beyond the data. For instance, we may collect data about smiling and laughing; however, we are usually not interested in the data per se, but in what it tells us about some underlying construct, such as happiness. This results in two main problems.
The first is related to ordinality. Even if we assume no error in our data, the relationship between the data and the construct we want to measure is usually non-linear. This is a well-known problem in the social sciences (even if it is still often forgotten), where tools such as non-parametric statistics are used to handle it. However, many models, especially in social simulation, completely neglect this aspect and assume that all variables are measured on an interval scale, i.e. that the numbers carry more than merely ordinal information.
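To see why this matters, here is a toy illustration (all numbers are made up): suppose the observed questionnaire score is a monotone but non-linear function of the latent construct. Individual rankings survive the transformation, but comparisons of group means may not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent "happiness" (arbitrary units) for two groups.
# Group B has the higher latent mean, but a more polarised distribution.
latent_a = rng.normal(0.5, 0.1, 10_000)
latent_b = np.where(rng.random(10_000) < 0.5,
                    rng.normal(3.0, 0.1, 10_000),
                    rng.normal(-0.5, 0.1, 10_000))

# A monotone but non-linear response curve mapping the latent construct
# onto a bounded 0-1 questionnaire score (the curve itself is made up).
def observed(latent):
    return 1.0 / (1.0 + np.exp(-2.0 * latent))

print(f"latent means:   A={latent_a.mean():.2f}  B={latent_b.mean():.2f}")
print(f"observed means: A={observed(latent_a).mean():.2f}  "
      f"B={observed(latent_b).mean():.2f}")
# B is higher on the latent scale, yet lower on the observed scale:
# averaging the scores treated them as interval data, which they are not.
```

Any monotone transformation preserves the order of individual scores, which is why ordinal (non-parametric) methods remain safe; treating the scores as interval data, as most simulation models implicitly do, does not.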
The second problem relates to operationalization. There is ample evidence that the same construct can be operationalized in many different ways, often producing very different results. This suggests that some of the phenomena we would like to quantify are more complex than expected and cannot be summarized in a single number.
While these are very complex problems and no simple solution exists for now, I believe we could strongly improve the situation by producing models which closely mimic the behaviour of the data. This could be achieved, for example, by deriving models from experiments, or from sources of dynamic data.
While this will not directly inform us about the underlying constructs, it will give us a much better understanding of how observables relate to each other over time. Furthermore, since such an approach would be focused on the data (instead of the construct), we would not have to worry much about the problems mentioned above.
Conversation with Emanuele
The role of the Deffuant model in the measurement process I proposed is just conceptual. With the bounded confidence model at hand we have a notion of equilibrium: two agents (at a distance lower than the confidence bound) will not update their opinions if they already hold the same one. So we can say that the LLM agent we use as an opinometer and the human agent are at equilibrium, and we can read off the internal numerical value of the attitude from the LLM agent. However, you can in principle also calibrate a new LLM agent by instructing it with the update rules of the Deffuant model and then putting it in interaction with an already calibrated one…
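For concreteness, a minimal sketch of the update rule and of the equilibrium notion I am referring to (the parameter values are only illustrative, not calibrated ones):

```python
def deffuant_step(x_i, x_j, eps=0.2, mu=0.5):
    """One pairwise interaction of the Deffuant bounded-confidence model.

    If the two opinions are within the confidence bound eps, both agents
    move a fraction mu toward each other; otherwise nothing happens.
    """
    if abs(x_i - x_j) < eps:
        x_i, x_j = x_i + mu * (x_j - x_i), x_j + mu * (x_i - x_j)
    return x_i, x_j

# Equilibrium in the sense used above: agents with identical opinions
# (and, trivially, agents farther apart than eps) no longer update.
assert deffuant_step(0.40, 0.40) == (0.40, 0.40)  # equal -> no change
assert deffuant_step(0.10, 0.90) == (0.10, 0.90)  # too far -> no change
print(deffuant_step(0.40, 0.50))                  # within eps -> converge
```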
Concerning standardization, I completely agree with you on the limits, not only because of the time-dependency of social constructs but also because they are context-dependent. Continuing with the opinion dynamics example, what is considered extremist about an issue in one social context (corresponding to 1 on a scale between 0 and 1) could be moderate in another (thus 0.5). The gun control issue in the US and in Europe is a good example. Thus, in my opinion, what can be standardized is the operationalization, but not "the scale", let's say. Our thermometers are in Celsius, theirs in Fahrenheit. The point, in the context of this discussion on the use of generative AI: can we use an agent trained on data coming from both contexts to convert from one scale to the other?
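To make the thermometer analogy a bit more concrete: if the same statements could be scored in both contexts, a monotone conversion between the two scales could be estimated by quantile matching. A toy sketch (everything here, including the data, is synthetic and purely illustrative):

```python
import numpy as np

# Hypothetical: the same 200 statements scored on a 0-1 "extremism"
# scale in two contexts (say, US and Europe); purely synthetic data.
rng = np.random.default_rng(1)
scores_us = rng.beta(2, 5, 200)
scores_eu = np.clip(0.5 * scores_us + 0.1 + rng.normal(0, 0.02, 200), 0, 1)

def convert(x, src=scores_us, dst=scores_eu):
    """Map a score from the source scale onto the destination scale by
    matching quantiles: the data-driven analogue of a Celsius-to-
    Fahrenheit conversion table."""
    q = (src <= x).mean()        # quantile of x on the source scale
    return np.quantile(dst, q)

print(convert(0.8))  # an "extremist" US score lands near the middle (~0.5)
```

An agent trained on data from both contexts would, in effect, have to learn exactly this kind of mapping.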
The idea of the agent + interaction is very interesting! If I understood correctly, we may start by supposing perfect Deffuant interactions and, to make it simple, fix the confidence bound to a chosen number (let's say 0.2). Then we can test different interactions between the agent and a real person. If the person doesn't change opinion, we know their distance was bigger than 0.2; otherwise it is below. In this way, we can estimate the person's real position.
Did I understand correctly?
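If so, here is a minimal sketch of the procedure under these assumptions; the `responds` function is a stand-in for the real human-agent interaction, with the hidden "true" opinion hard-coded only for the demo:

```python
EPS = 0.2  # assumed, fixed confidence bound

def responds(probe, true_opinion=0.37):
    """Stand-in oracle for the real experiment: does the person update
    their opinion when interacting with an agent placed at `probe`?
    true_opinion is hidden in the real setting; 0.37 is just for the demo."""
    return abs(probe - true_opinion) < EPS

def locate(tol=1e-3):
    # 1) Coarse scan: grid spacing below 2*EPS guarantees at least one hit.
    grid = [i * EPS for i in range(int(1 / EPS) + 1)]
    hit = next(q for q in grid if responds(q))
    # 2) Bisect on the upper boundary where the response switches off:
    #    at that boundary |p - q| == EPS, hence p = q_boundary - EPS.
    #    (Probes may fall outside [0, 1]; harmless for the toy oracle.)
    lo, hi = hit, hit + 2 * EPS   # response is True at lo, False at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if responds(mid) else (lo, mid)
    return (lo + hi) / 2 - EPS

print(round(locate(), 3))  # ~0.37
```

In a real experiment the responses would be noisy, so each probe point would have to be repeated several times, but the logic stays the same.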
Regarding converting from one scale to another, I think it may be possible geographically. In that case, the LLM would become our standard of measurement. It's a pretty wild idea, but I think it may theoretically work.
Time would be a much bigger problem (at least, I think; I don't know much about LLMs), mostly because we would need to update the LLM. For simplicity, let's suppose we build now (in 2023) an LLM (let's call it L1) which is able to put information in context for every location and every year. So if I pass it a Tweet from Italy in 2016, L1 would be able to provide a number on a desired scale (e.g. the left-right scale).
The problem, to me, arises if we want to repeat the analysis in 2025. At that point we would have to update the LLM to integrate information from the intervening years, so that it can also handle data from 2024. This new LLM (let's call it L2) now needs to provide exactly the same results as L1 for all data up to 2023, while also being able to process the data from 2024.
And I think this may be very very challenging... even if it may be technically possible.
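To make the requirement concrete, one possibility would be to keep a frozen benchmark of pre-2023 items and refuse to adopt L2 unless it reproduces L1's numbers on them. A sketch (the item ids, the `score_*` pipelines and the tolerance are all hypothetical placeholders):

```python
# Sketch of a frozen regression benchmark between model versions.
# `score_l1` / `score_l2` stand for whatever text -> number pipeline
# L1 and L2 expose; items and tolerance are hypothetical too.
BENCHMARK = [
    ("tweet_it_2016_001", "..."),  # frozen pre-2023 items (texts elided)
    ("tweet_it_2016_002", "..."),
]
TOLERANCE = 0.05  # arbitrary: maximum acceptable drift on a 0-1 scale

def drifted_items(score_l1, score_l2, items=BENCHMARK, tol=TOLERANCE):
    """Return the ids of items on which L2 drifts from L1 beyond tol."""
    return [item_id for item_id, text in items
            if abs(score_l1(text) - score_l2(text)) > tol]

# If drifted_items(l1.score, l2.score) is non-empty, L2 is not a valid
# replacement for L1: either retrain, or re-anchor its scale first.
```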
Do you have any information on reliability, or something similar, regarding how LLMs convert text into a measurement?