Recent reinforcement learning studies extensively explore the interplay between cooperative and competitive behaviour in mixed environments. Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. As a consequence, purely rational agents often struggle to maintain cooperation. A prevalent approach to induce cooperative behaviour is to assign additional rewards based on other agents' well-being. However, this approach suffers from the issue of multi-agent credit assignment, which can hinder performance. This issue is efficiently alleviated in cooperative setting with such state-of-the-art algorithms as QMIX and COMA. Still, when applied to mixed environments, these algorithms may result in unfair allocation of rewards. We propose BAROCCO, an extension of these algorithms capable to balance individual and social incentives. The mechanism behind BAROCCO is to train two distinct but interwoven components that jointly affect agents' decisions. We experimentally confirm the advantages of BAROCCO.
In social choice there often arises a conflict between the majority principle (the search for a candidate that is as good as possible for as many voters as possible), and the protection of minority rights (choosing a candidate that is not overly bad for particular individuals or groups). In a context where the latter is our main concern, veto-based rules -- giving individuals or groups the ability to strike off certain candidates from the list -- are a natural and effective way of ensuring that no minority is left with an outcome they find untenable. However, such rules often fail to be anonymous, or impose specific restrictions on the number of voters and candidates. These issues can be addressed by considering the proportional veto core -- the solution to a cooperative game where every coalition is given the power to veto a number of candidates proportional to its size. However, the naive algorithm for the veto core is exponential, and the only known rules for selecting from the veto core, with an arbitrary number of voters, violate either anonymity or neutrality. In this paper we present a polynomial time algorithm for computing the veto core and present a neutral and anonymous algorithm for selecting a candidate from it. We also show that a pessimist can manipulate the veto core in polynomial time.
Do individuals consider bribery as an acceptable behavior? We use a newly-designed game to study if—and under which conditions—bystanders are willing to express disapproval for bribing behavior through costly punishment. We manipulate two key dimensions: the benefits accrued by corrupt actors and the externality imposed on idle victims. We show that on average bystanders were unresponsive nearly half of the time they witnessed bribery. We also find that context specificity matters, as bystanders were more willing to punish when bribing caused them a disadvantageous inequity with respect to corrupt actors, even if bribing enhanced overall welfare. In an additional experiment testing whether social norms play any role in punishment decisions, we find that norms did not align with the observed bystanders’ behavior. This further supports our main result that bystanders did not react to bribery due to a concern for the social norm, but rather for their own comparative disadvantage relative to corrupt actors.
We consider the problem of electing a committee of k candidates, subject to constraints as to which committees are admissible for constitutional, conventional, or practical reasons. In our framework, the candidates are given labels as an abstraction of a politician’s religion, a film’s genre, a song’s language, or other attribute, and the election outcome is constrained by interval constraints (constraints of the form “Between 3 and 5 candidates with label X”) and dominance constraints (“At least as many candidates with label X as with label Y”). The goal is to select a committee that is as good as possible among those that satisfy the constraints. The difficulty is that in the standard social choice framework we do not have a quantifiable notion of “goodness”, only a voting rule that tells us which committee is the best. In this paper we argue how the logic underlying separable and best-k rules can be extended into an ordering of committees from best to worst, and study the question of how to select the best valid committee with respect to this order. The problem is NP-hard, but we show the existence of a polynomial time solution in the case of tree-like constraints, and a fixed-parameter tractable algorithm for the general case.
Based on micro-level data on reported household earnings, expenditures and assets, provided by the Russian Longitudinal Monitoring Survey (RLMS) for the period 2000–2013, it is found that households with workers in the public sector receive lower earnings than households with members employed in the private sector but enjoy the same level of consumption. Controlling for the reported level of earnings, private households do not show a significantly higher probability of possessing summer cottages (dachas), cars and computers, or living in better housing conditions, or having a higher level of monetary savings. The differences in assets cannot be reconciled with the sizeable expenditure-income gap found. The precautionary motives of workers are not able to reconcile these discrepancies either: neither attitude to risk, nor risk itself, differ between individuals employed in the private and public sectors. It is hypothesized that employees continue working in the public sector despite their low rate of official pay, because of unreported income they receive, or bribes.
We study ex-post fairness and efficiency in the object allocation problem. A matching is individually fair if it minimizes the number of envying agents, we call it minimal envy matching, and conditional on being minimal envy also minimizes the number of envying agents in a reduced problem, we call it minimal envy-2 matching. A matching is socially fair if supported by the majority of agents against any other matching – popular matching. We observe that when a popular matching exists it is equivalent to a minimal envy-2 matching. We show the equivalence between global and local popularity: a matching is popular if and only if in no group of size up to 3 agents a majority can benefit by exchanging objects, keeping the remaining matching fixed. We algorithmically show that an arbitrary matching is path-connected with a popular matching by locally-popular exchanges in small groups. Thus a corresponding market converges to a popular matching. Popular matchings often do not exist. We define most popular matching as a matching that is popular among the largest (by cardinality) subset of agents. We show that a matching is minimal envy-2 if and only if it is minimal envy and most popular, and propose a polynomial-time algorithm to find a Pareto efficient minimal envy-2 matching.
This work proves the existence of a coalition structure that is both Nash-stable and strongly permutation-stable for classes of coalition partition games. Among them are games where player’s payoff are Aumann-Drze value or the weighted value. These results are obtained by adapting the definition of a potential function for coalition partition games.
Recent experimental evidence reveals that information is often avoided by decision makers in order to create and exploit a so-called ‘moral wiggle room’, which reduces the psychological and moral costs associated with selfish behavior. Despite the relevance of this phenomenon for corrupt practices from both a legal and a moral point of view, it has hitherto never been examined in a corruption context. We test for information avoidance in a framed public procurement experiment, in which a public official receives bribes from two competing firms and often faces a tradeoff between maximizing bribes and citizen welfare. In a treatment where officials have the option to remain ignorant about the implications of their actions for citizens, we find practically no evidence of information avoidance. We discuss possible reasons for the absence of willful ignorance in our experiment.
This paper is devoted to the study of multicriteria cooperative games with vector payoffs and coalition partition. An imputation based on the concept of the Owen value is proposed. The definition of a stable coalition partition for bi-criteria games is formulated. A three-player cooperative game with the 0-1 characteristic function is considered and stability conditions of a coalition partition are established.
We give a criterion of positivity for the value of a matrix game. We use the criterion to investigate conditions of positivity of value for two classes of games: payoff matrices with all elements outside of the main diagonal having the same sign and symmetric payoff matrices.
We are dealing with a search game where one searcher looks for two mobile objects on a graph. The searcher distributes his searching resource so as to maximize the probability of detecting at least one of the mobile objects. Each mobile object minimizes its own probability of being found. In this problem the Nash equilibrium, i.e. the optimal transition probabilities of the mobile objects and the optimal values of the searcher’s resource, was found. The value of the game in a single-stage search game with non-exponential payoff functions was found.
This note shows that the egalitarian Dutta and Ray (1989) solution for transferable utility games is self-antidual on the class of exact partition games. By applying a careful antiduality analysis, we derive several new axiomatic characterizations. Moreover, we point out an error in earlier work on antiduality and repair and strengthen several related characterizations on the class of convex games.
We consider fair allocation of indivisible items under additive utilities. We show that there exists a strongly polynomial-time algorithm that always computes an allocation satisfying Pareto optimality and proportionality up to one item even if the utilities are mixed and the agents have asymmetric weights. The result does not hold if either of Pareto optimality or PROP1 is replaced with slightly stronger concepts.
This paper studies independence of higher claims and independence of irrelevant claims on the domain of bargaining problems with claims. Independence of higher claims requires that the payoff of an agent does not depend on the higher claim of another agent. Independence of irrelevant claims states that the payoffs should not change when the claims decrease but remain higher than the payoffs. Interestingly, in conjunction with standard axioms from bargaining theory, these properties characterize a new constrained Nash solution, a constrained Kalai–Smorodinsky solution, and a constrained Kalai solution.
The bibliography is published in onne tion with the ninetieth anniversary of the professor Robert John (Yisrael) Aumann. On the eighth of June the outstanding Israeli Ameri an game theorist Robert John (Yisrael) Aumann elebrates his 90-anniversary. A professor at the Center for the Study of Rationality in the Hebrew University of Jerusalem in Israel, he re eived in 2005 the Nobel Memorial Prize in E onomi S ien es for his work on oni t and ooperation through game-theory analysis. He shared the prize with Thomas S helling. At the Hebrew University of Jerusalem Prof. Aumann reated a ourishing game theory s hool whi h is one of the best in the world.
This paper axiomatically studies the equal split-off set (cf. Branzei et al. (Banach Center Publ 71:39–46, 2006)) as a solution for cooperative games with transferable utility which extends the well-known Dutta and Ray (Econometrica 57:615–635, 1989) solution for convex games. By deriving several characterizations, we explore consistency of the equal split-off set on the domains of exact partition games and arbitrary games.
Positive-Unlabeled (PU) learning is an analog to supervised binary classification for the case when only the positive sample is clean, while the negative sample is contaminated with latent instances of positive class and hence can be considered as an unlabeled mixture. The objectives are to classify the unlabeled sample and train an unbiased positive-negative classifier, which generally requires to identify the mixing proportions of positives and negatives first. Recently, unbiased risk estimation framework has achieved state-of-the-art performance in PU learning. This approach, however, exhibits two major bottlenecks. First, the mixing proportions are assumed to be identified, i.e. known in the domain or estimated with additional methods. Second, the approach relies on the classifier being a neural network. In this paper, we propose DEDPUL, a method that solves PU Learning without the aforementioned issues. The mechanism behind DEDPUL is to apply a computationally cheap postprocessing procedure to the predictions of any classifier trained to distinguish positive and unlabeled data. Instead of assuming the proportions to be identified, DEDPUL estimates them alongside with classifying unlabeled sample. Experiments show that DEDPUL outperforms the current state-of-the-art in both proportion estimation and PU Classification and is flexible in the choice of the classifier.