Statistical inference using rank-based post-stratified samples in a finite population
Abstract
In this paper, we consider statistical inference based on post-stratified samples from a finite population. We first select a simple random sample (SRS) of size n and identify the population ranks of its units. Conditioning on these population ranks, we construct probability mass functions of the sample ranks of the n units in a larger sample of size \(M > n\). The n units in the SRS are then post-stratified into d classes using the conditional sample ranks. The sample ranks are constructed with two different conditional distributions, leading to two different sampling designs. The first design uses a conditional distribution given the n ordered population ranks. The second design uses a conditional distribution given a single (marginal) unordered population rank. The paper introduces unbiased estimators for the population mean, total, and their variances based on the post-stratified samples from these two designs. The conditional distributions of the sample ranks are used to construct Rao–Blackwell estimators for the population mean and total. We show that the Rao–Blackwell estimators outperform the same estimators constructed from a systematic sample.
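The second (marginal) design described above can be illustrated with a small sketch. It is only one plausible reading of the abstract, not the paper's derivation. It assumes that a unit with population rank k is placed in a sample of size M whose other M - 1 units are drawn by simple random sampling without replacement from the remaining N - 1 units, so that its sample rank follows a shifted hypergeometric distribution. The function names and the rule that maps a sample rank to one of d classes are hypothetical.

```python
import math


def marginal_sample_rank_pmf(N, M, k):
    """P(R = r), r = 1..M, for the sample rank R of a unit with population rank k.

    Assumption (not taken from the paper): the other M - 1 sample units are an
    SRS without replacement from the remaining N - 1 population units, so
    R - 1 is hypergeometric (number of sampled units ranked below k).
    """
    total = math.comb(N - 1, M - 1)
    pmf = {}
    for r in range(1, M + 1):
        below = math.comb(k - 1, r - 1)   # ways to pick r - 1 units ranked below k
        above = math.comb(N - k, M - r)   # ways to pick M - r units ranked above k
        pmf[r] = below * above / total
    return pmf


def post_stratum(r, M, d):
    """Hypothetical rule: map sample rank r in 1..M to a class in 1..d (ceil(r * d / M))."""
    return (r * d + M - 1) // M


def class_probabilities(N, M, d, k):
    """Probability that a unit with population rank k falls in each of the d classes."""
    probs = {h: 0.0 for h in range(1, d + 1)}
    for r, p in marginal_sample_rank_pmf(N, M, k).items():
        probs[post_stratum(r, M, d)] += p
    return probs
```

For example, `class_probabilities(N=100, M=6, d=3, k=10)` puts most of its mass on the first class, since a unit ranked 10th out of 100 is likely to be among the smallest in a sample of six.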
Comments on: Deville and Särndal’s calibration: revisiting a 25 years old successful optimization problem
Abstract
We provide a brief discussion of the development of model calibration techniques and optimal calibration estimation in survey sampling, their relation to Deville and Särndal’s calibration, and applications of model calibration to missing data problems for robust inference.
Testing equality of a large number of densities under mixing conditions
Abstract
In certain settings, such as microarray data, the sampling information is formed by a large number of possibly dependent small data sets. In special applications, for example in order to perform clustering, the researcher aims to verify whether all data sets have a common distribution. For this reason we propose a formal test for the null hypothesis that all data sets come from a single distribution. The asymptotic setting is that in which the number of small data sets goes to infinity, while the sample size remains fixed. The asymptotic null distribution of the proposed test is derived under mixing conditions on the sequence of small data sets, and the power properties of our test under two reasonable fixed alternatives are investigated. A simulation study is conducted, showing that the test respects the nominal level, and that it has a power which tends to 1 when the number of data sets tends to infinity. An illustration involving microarray data is provided.
On the convenience of heteroscedasticity in highly multivariate disease mapping
Abstract
Highly multivariate disease mapping has recently been proposed as an enhancement of traditional multivariate studies, making it possible to perform the joint analysis of a large number of diseases. This line of research has considerable potential, since it integrates the information of many diseases into a single model, yielding richer and more accurate risk maps. In this paper we show that some of the proposals already put forward in this area exhibit particular problems when applied to small regions of study. Specifically, the homoscedasticity of these proposals may produce evident misfits and distorted risk maps. We propose two new models to deal with the variance-adaptivity problem in multivariate disease mapping studies and give some theoretical insights on their interpretation.
Likelihood-based tests for a class of misspecified finite mixture models for ordinal categorical data
Abstract
The main purpose of this paper is to apply likelihood-based hypothesis testing procedures to a class of latent variable models for ordinal responses that allow for uncertain answers (Colombi et al. in Scand J Stat, 2018. https://doi.org/10.1111/sjos.12366). As these models rest on assumptions needed to describe different respondent behaviors, it is essential to discuss inferential issues without assuming that the tested model is correctly specified. By adapting the works of White (Econometrica 50(1):1–25, 1982) and Vuong (Econometrica 57(2):307–333, 1989), we are able to compare nested models under misspecification and then contrast the limiting distributions of the Wald, Lagrange multiplier/score and likelihood ratio statistics with the classical asymptotic chi-square distribution to show the consequences of ignoring misspecification.