Category Archives: Data analysis

grid paper using Stata

I just wrote a program to make grids using Stata! Not much use for many people, but if you have a friend who is a knitter and needs a grid template, well this may come in handy 🙂

This is an ado file, so just: 1) copy the text below the image into a text editor; 2) save it as “mkgrid.ado” (the file name has to match the program name); 3) put it in your Stata personal directory (typing sysdir in Stata shows where that is)

The following grid was generated using the command:
mkgrid 60 80 "44 162 95" vthin label alt
[Image: grid_60x80]


/**
Program to produce a y by x grid
Carl Higgs 16 April 2016

Input parameters (arguments):
y number of vertical spaces
x number of horizontal spaces
colour line colour
width line width
labels whether numbered axes are desired (if so, write "label" here, or leave blank)
alt if you have specified having labels, you might want them to alternate: type "alt"

e.g.
mkgrid 60 80
mkgrid 60 80 black
mkgrid 60 80 black vthin label
mkgrid 60 80 black vthin label alt
**/
* capture program drop mkgrid
program mkgrid, rclass
    version 13.1
    args y x colour width labels alt
    di "y is `y' and x is `x'"

    * Warning: this clears any data currently in memory
    drop _all

    * Build a placeholder dataset as long as the longer axis; the markers are
    * invisible, so only the grid lines end up being drawn
    loc obs = max(`y',`x')
    set obs `obs'
    if `y'==`obs' {
        egen y = fill(1(1)`y')
        gen x = `x'
    }
    else {
        gen y = `y'
        egen x = fill(1(1)`x')
    }

    * Aspect ratio (height/width) so that the grid cells come out square
    loc aspect = `y'/`x'

    if "`labels'"=="label" {
        * Labelled grid: numbered axes, alternating if "alt" was given
        tw sc y x, ms(i) ///
            graphregion(fcolor(white) lcolor(white)) scheme(s2color) ///
            xlab(0(1)`x',grid nogextend gmin glc("`colour'") `alt' labsize(vsmall) glwidth("`width'")) ///
            ylab(0(1)`y',grid nogextend gmin glc("`colour'") `alt' angle(h) labsize(vsmall) glwidth("`width'")) ///
            ytitle("") ///
            xtitle("") ///
            aspect(`aspect') ///
            xsca(noextend) ysca(noextend) ///
            name(grid_`y'x`x',replace)
    }

    else {
        * Unlabelled grid: both axes switched off
        tw sc y x, ms(i) ///
            graphregion(fcolor(white) lcolor(white)) scheme(s2color) ///
            xlab(0(1)`x',grid nogextend gmin glc("`colour'") glwidth("`width'")) ///
            ylab(0(1)`y',grid nogextend gmin glc("`colour'") glwidth("`width'")) ///
            ytitle("") ///
            xtitle("") ///
            aspect(`aspect') ///
            xsca(off) ysca(off) ///
            name(grid_`y'x`x',replace)
    }

end

A 5×5 grid:
[Image: grid_5x5]

A 2496×738 grid. This is towards the limits of what this code can do, and the artifacts of electrical glitch spirits start to surface:
[Image: grid_2496x738]

LaTeX to Excel

Converting LaTeX tables to Excel is a pain; here are some steps I found useful for reformatting and removing persistent hidden whitespace characters:

  • removing the \\ symbols and importing the LaTeX text document into Excel with the ampersand symbol (&) set as the delimiter (a simple and good idea from this blog post – thank you Ever Barbero!)
  • replacing intervening table code (e.g. \multicolumn{5}{c}) and remaining curly brackets with nothing
  • replacing sub-/super-script codes (e.g. ^2) and other symbol codes (e.g. \chi) with their equivalent Unicode characters (e.g. χ²)
  • using the ‘TRIM(cell reference)’ formula in Excel to remove space characters, except for single spaces between words/values. On its own I found this did not work satisfactorily. Through trial and error I found that wrapping it as TRIM(CLEAN(cell reference)) removes hidden characters which TRIM alone may not. I had hidden spaces screwing up the alignment of table items. Expanding this formula to reference the table's contents (actually as IF(ISBLANK(cell reference),"",TRIM(CLEAN(cell reference))), so empty cells didn't return as errors) and then pasting the results as values over the top solved this problem.
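If you would rather script the clean-up than click through Excel, the steps above can be roughly sketched in Python. This is only a minimal sketch under my own assumptions: the function names, the small symbol map, and the sample table are all made up for illustration, and it assumes one table row per line with no escaped ampersands (\&) inside cells.

```python
import csv
import io
import re

# Assumed (hypothetical) mapping of LaTeX codes to Unicode; extend as needed.
SYMBOLS = {r"\chi": "χ", "^2": "²"}


def clean_cell(cell):
    """Strip table code, braces and hidden whitespace from one cell."""
    # Drop \multicolumn{n}{alignment} wrappers, keeping the cell text itself.
    cell = re.sub(r"\\multicolumn\{\d+\}\{[^}]*\}", "", cell)
    for code, char in SYMBOLS.items():
        cell = cell.replace(code, char)
    # Remove remaining curly brackets and math-mode dollar signs (assumption:
    # the table contains no literal dollar amounts).
    cell = cell.replace("{", "").replace("}", "").replace("$", "")
    # Roughly what TRIM(CLEAN(...)) does: turn control characters and
    # non-breaking spaces into spaces, then collapse runs of whitespace.
    cell = re.sub(r"[\x00-\x1f\u00a0]", " ", cell)
    return " ".join(cell.split())


def latex_to_rows(latex):
    """Split LaTeX table text into rows of cleaned cells."""
    rows = []
    for line in latex.splitlines():
        if "&" not in line:
            continue  # skip \begin{tabular}, \hline and other non-data lines
        line = line.replace(r"\hline", "").replace("\\\\", "")
        rows.append([clean_cell(cell) for cell in line.split("&")])
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = r"""
\begin{tabular}{lcc}
\hline
Variable & \multicolumn{1}{c}{$\chi^2$} &  P-value \\
Age      & 8.43   & 0.0037 \\
\hline
\end{tabular}
"""

print(rows_to_csv(latex_to_rows(sample)))
```

Saving the printed text with a .csv extension gives a file Excel opens without the import dialog. Anything fancier than a plain tabular (multi-line rows, nested macros) would need more than this.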

Whilst writing this post I found a StackOverflow post covering this and a few other tips, which would have saved me some trial and error time if I had found it sooner.

Table of two-sided P-values for the chi-squared distribution (d.f. = 1)

Frustrated at being unable to find a detailed chi-squared distribution table for only 1 degree of freedom, as used for results of the Mantel-Haenszel chi-squared test, so that you can quickly look up the approximate two-sided P-value for your test statistic? I was, so I delved into Excel to construct the following table – colour coded for glory! – using the formula “CHISQ.DIST.RT(x,deg_freedom)” for results from 0.000 (P-value = 1.00) to 15.975 (P-value = 0.000064) in increments of 0.025. Critical values are in bold.

To use it, cross-reference your Mantel-Haenszel chi-squared result first with the whole numbers at the top of the columns and then with the approximate decimal part down the side to find the corresponding approximate P-value. For example, for an MH χ² of 8.43 you would cross-reference 8 at the top with 0.450 on the side to find the two-sided P-value of 0.0037.
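If you want to check a table lookup without Excel, here is a minimal Python sketch of the same calculation. The function name is my own, and it only handles the 1 degree of freedom case, where the chi-squared statistic is a squared standard normal and the right-tail probability reduces to the complementary error function.

```python
import math


def chi2_df1_p_value(x):
    """Right-tail probability of a chi-squared statistic with 1 d.f."""
    # With 1 d.f., X = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# A critical value, the worked example, and the last value in the table.
for stat in (3.841, 8.43, 15.975):
    print(f"{stat:7.3f} -> P = {chi2_df1_p_value(stat):.6f}")
```

For 8.43 this comes out at roughly 0.0037, matching the table lookup described above.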

Enjoy the rainbow of probability! It can be downloaded by clicking on the tools icon “>>” in the upper right-hand corner and selecting “download”.

Download the PDF file.

You can also make your own table of areas in the upper tail of the standard normal distribution (one-sided P-values) in Excel using the formula “=1-NORMSDIST(‘Z-score array’!D5)”, where ‘Z-score array’ refers to a separate worksheet and D5 is an arbitrary example cell, one of many in an array of z-score values from 0 to 3.99 (or higher; I went to 4.49, but that’s a pretty small area in the tail there…), arranged with x.x down the left column and the second decimal place (-.-x) across the top row. Ha, or you could just look it up in a book, hey? 🙂
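The same upper-tail table can be sketched in Python. Again this is only an illustration with made-up function names, laid out the way the Excel version is described: one row per x.x value and one column per second decimal place.

```python
import math


def upper_tail(z):
    """Area in the upper tail of the standard normal, i.e. 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_table(max_tenths=39):
    """Rows for z = 0.0, 0.1, ... and columns for second decimals 0 to 9."""
    return [
        [upper_tail((tenths * 10 + hundredths) / 100) for hundredths in range(10)]
        for tenths in range(max_tenths + 1)
    ]


table = z_table()

# Print the header row and the first few rows of the table.
print("  z  " + " ".join(f"  0.0{h}" for h in range(10)))
for tenths, row in enumerate(table[:5]):
    print(f"{tenths / 10:4.1f} " + " ".join(f"{area:.4f}" for area in row))
```

Here table[19][6] holds the area above z = 1.96, which is close to 0.025. Passing max_tenths=44 extends the table to 4.49.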

Download the PDF file.

Keyboard shortcuts for Stata/IC 13

Here is a consolidated list of keyboard shortcut commands for Stata, as well as some directions on how to form custom macro strings. I have formed this list from various Stata help files as well as my own trial and error:

Tab: completes a variable’s name in the command line sequence, as far as ambiguity allows (if there are similarly named variables)
Esc: clears the command sequence

Move cursor to various panes:
Ctrl+Tab: reverse cycles through the various panes
Ctrl+Shift+Tab: forward cycles through the various panes
Ctrl + 1: command pane
Ctrl + 2: output pane
Ctrl + 3: review log
Ctrl + 4: variable pane (also: pressing space bar whilst a variable is selected in the variable pane inserts this variable into the command line sequence)
Ctrl + 5: properties pane
Ctrl + 7: viewer window
Ctrl + 8: data window

Function keys:

F1: enters the command ‘help advice’
F2: enters the command ‘describe’
F3: opens up a ‘find’ tab in the output pane to search for specific text
F7: inserts the command ‘save’ into the command sequence
F8: inserts the command ‘use’ into the command sequence

Alt+F4: quits the program (I think this may be a general Windows shortcut)

Customise function keys
Entering a command of the form global F[number] "[commands]" (leave out the square brackets) assigns a command sequence to a function key, for example global F4 "list;", with the semicolon indicating that the command is entered, as if you had pressed Enter. This is a special form of global macro – see below.

Custom macros
Entering a command of the form global [macro] [commands] saves a sequence of words which you can later recall by prefixing the macro name with $, that is $[macro]. You can get quite fancy with this it seems, so if interested read more in the PDF file “Programming Stata”: http://www.stata.com/manuals13/u18.pdf

________________

Coming from a background of using design software where shortcut usage is common, I find it surprising that a command-heavy program such as Stata doesn’t make keyboard shortcut information more readily accessible. Ideally, shortcuts would be displayed when hovering the cursor over a menu command, or shown next to the command. Commands entered via shortcuts would still be recorded in the command log, so from a workflow perspective I can’t understand why this hasn’t been implemented in a mature program which many people use.