Comments on Games with Words: When should an effect be called significant?

GamesWithWords (2010-12-16 23:05):

Hah. I missed that 2 in the numerator. OK, those numbers will have to be fixed.

<i>What I'm saying is that if you keep collecting data, you will eventually reject the null.</i>

From reading your previous comments and your blog, I think you've confused observed effect size with expected effect size. Our hypotheses are really about expected effect size. Certainly, whenever you actually measure an effect, it's only very rarely going to be exactly 0 (though this can and does happen). But that doesn't mean that the correct prediction isn't 0.

If it helps, think of the limit of an asymptote. At any finite point, you won't have reached the asymptote, but at infinity, you do. Infinity works differently, and our hypotheses are really about samples that are infinitely large.

tal (talyarkoni.org), 2010-12-16 21:51:
<i>Is that not correct?</i>

The pooled sd is multiplied by root 2/n for the two-sample test, so the denominator is larger and t is smaller.

<i>So let's say I measure the heights of men and women. For some reason, I get a spurious effect where the women are taller than the men (funny sample). You're saying I'm guaranteed to replicate this if I test enough subjects?

Direction is going to matter.</i>

Sorry, that was careless wording. I did add the appropriate caveat one paragraph up: "you're 100% guaranteed to replicate it eventually (<i>assuming it's a 'real' effect, of course--and if it's not, you don't want to replicate it</i>)."

<i>Unless you run out of subjects and can't test any more. Again, we're not conducting a census. We're not interested -- at least, I'm not interested -- in knowing the state of the world at the present moment. We're interested in understanding the mechanics that generated the world and the present moment.</i>

But you're defining 'understanding the mechanics' as 'being able to reject the null'. What I'm saying is that if you keep collecting data, you will eventually reject the null, so the implication is that if you want to understand the mechanics, you just need to collect more data. But then, if it's just about rejecting the null, why do the study at all? You're 100% certain to end up rejecting it.

Now, if what you're saying is that you care about being able to reject the null in a <i>reasonably-sized sample</i>, that's fine with me, but that's basically an effect size claim. You're saying that you don't care about your hypothesis if it takes a million people to reject the null, but you do care about it if you can do it in 20.
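[Editor's note: the denominator point above — that the two-sample standard error is the pooled sd times root 2/n — can be checked numerically. This is an illustrative Python sketch, plugging in the IQ numbers used elsewhere in this thread (mean difference of 8 points, sd of 15, n = 15 per sample); it is not code from either commenter.]

```python
import math

# Numbers from the thread's IQ example (assumed): an 8-point mean
# difference, sd = 15 in each sample, n = 15 observations per sample.
diff, s, n = 8.0, 15.0, 15

t_one = diff / (s / math.sqrt(n))        # one-sample: se = s / sqrt(n)
t_two = diff / (s * math.sqrt(2.0 / n))  # two-sample: se = s * sqrt(2/n)

print(round(t_one, 2))  # 2.07 -- the one-sample value quoted in the thread
print(round(t_two, 2))  # 1.46 -- smaller, because the denominator is larger
```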
I agree with that, and that's exactly why I think we should be explicit about effect sizes.

Anyway, I'm spending too much time on this discussion again (though I've found it interesting and useful!), so the last word is yours.

GamesWithWords (2010-12-16 19:20):

@Tal -- For some reason, your comment went to spam again. This is annoying.

<i>You haven't accounted for the variance pooling for independent samples.</i>

As I said, I'm assuming that the standard deviation is 15 for both samples. Thus the pooled standard deviation is

sqrt((15*15 + 15*15)/2) = sqrt(2*15*15/2) = 15

Is that not correct?

<i>you can replicate any non-zero effect with 100% probability if you just keep collecting subjects.</i>

So let's say I measure the heights of men and women. For some reason, I get a spurious effect where the women are taller than the men (funny sample). You're saying I'm guaranteed to replicate this if I test enough subjects?

Direction is going to matter.

<i>you could have made the original p value anything you wanted just by collecting more subjects</i>

Unless you run out of subjects and can't test any more. Again, we're not conducting a census. We're not interested -- at least, I'm not interested -- in knowing the state of the world at the present moment. We're interested in understanding the mechanics that generated the world and the present moment.

tal (talyarkoni.org), 2010-12-16 13:10:
A few quick comments:

<i>Assuming the standard deviation is 15 (which is how IQ tests are normalized), then that should give you a t-value of 8 / (15 * 1/15^.5) = 2.07 and a p-value just under .05.</i>

I think you're confusing one-sample and two-sample t-tests. The t value above is correct for a one-sample t-test; you haven't accounted for the variance pooling for independent samples. But you're then computing the replication power for a two-sample t-test, which is why you get 29%. Re-run with the right numbers and you get 50%.

<i>In practice, though, other people are only going to follow up on your effect if they can replicate it at the standard p=.05 level. What can we do to improve the chances of replicability?</i>

Well, again, if all you care about is rejecting the null once again, the answer is very simple: keep collecting more subjects, and you're 100% guaranteed to replicate it eventually (assuming it's a 'real' effect, of course--and if it's not, you don't <i>want</i> to replicate it).

<i>What I get from all this is that if you want a result that you and others will be able to replicate, you're going to want the p-value in your original experiment to have been lower than p=.05.</i>

I don't really understand the point of talking about it this way. It doesn't make sense to think of replication likelihood as depending on the original p value, because you could have made the original p value anything you wanted just by collecting more subjects. So all you're really saying here is "if you want to increase the likelihood of rejecting the null, collect more subjects."
And if you want to know by how much you should increase the sample, take the effect size and plug it into a power calculation.

Of course, this just gets us back to my original point: if all you care about is rejecting the null, then you can replicate <i>any</i> non-zero effect with 100% probability if you just keep collecting subjects. It's just not an interesting problem. If you really think that the goal of empirical studies is just to reject a null of zero effect, then you can save yourself a lot of work by not doing anything at all, because for any question that you'd actually care about, you're guaranteed that the null is false a priori.
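[Editor's note: tal's two closing claims — that any fixed nonzero effect yields rejection given enough subjects, and that the required sample size follows from the effect size via a power calculation — can be sketched with a normal-approximation calculation. This is illustrative Python, not code from the thread; the effect sizes (d = 0.1, 0.5, 0.05) are made up for illustration, and the z values correspond to alpha = .05 two-sided and 80% power.]

```python
import math

def p_two_sided(d, n):
    """Approximate two-sided p-value for a one-sample test of a
    standardized effect d observed at sample size n (normal approximation,
    using math.erf for the standard normal CDF)."""
    z = abs(d) * math.sqrt(n)
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * (1 - phi)

# For any fixed nonzero effect, p falls toward 0 as n grows:
for n in (25, 100, 400, 1600):
    print(n, round(p_two_sided(0.1, n), 4))

def n_for_power(d, z_alpha=1.96, z_beta=0.84):
    """Sample size for the target power; note it scales as 1/d^2."""
    return math.ceil(((z_alpha + z_beta) / d) ** 2)

print(n_for_power(0.5))   # a "large" effect needs only a few dozen subjects
print(n_for_power(0.05))  # a tiny effect needs thousands
```

Both halves make the same point as the comment: rejecting the null is only a matter of sample size, so the substantive question is the effect size that determines how large that sample must be.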