HEP-ex Postdoc Success | Sourav's (2024)

%%HTML<style type="text/css">table.dataframe td, table.dataframe th { border: 1px black solid !important; color: black !important;}</style>

HEP-ex Postdoc Success | Sourav's (1)

As an experimental High Energy Physics (hep-ex) grad student, I often wonder which university/national lab should I choose for doing a postdoc to increase my odds of getting a faculty position, if I plan to stay in academia. But unlike other sub-fields in Physics, we have huge world-wide collaborations for hep-ex experiments like the Large Hadron Collider. In such collaborative environment, it is not very clear if it really matters where one does his/her postdoc, in terms of finding an academic faculty (research scientist) position. It might not be hard to convince oneself that there is actually no such correlation between a postdoc’s affiliation and possibility of finding an academic job (faculty position) eventually. This has prompted me to put this hypothesis to test. So, let’s explore here whether such a correlation between a postdoc’s affiliation and future success in finding an academic faculty position in hep-ex exists.

import reimport pandas as pdimport numpy as npimport matplotlib.pyplot as plt%matplotlib inlinefrom sklearn.linear_model import LinearRegressionimport statsmodels.api as smfrom sklearn import linear_model

Data collection

hepexrumor (https://sites.google.com/site/hepexrumor/) is a popular unofficial site which has latest rumors about the hep-ex jobs (in the US and ouside). I parse this website for getting the job rumors from 2005-2019. For this short study, I did not consider temporal variation in job patterns and combined the data of all the years.

I use the latest affiliation of a postdoc while applying for job. I only consider the postdocs who cleared the short-list round for a job as the total candidate pool, with a presumptuous assumption that postdocs not clearing the shortlist were not serious candidates for the job.

Parsing hepexrumor:

hepexjobsite = 'https://sites.google.com/site/hepexrumor/'year = {2005: '2005-rumor' , 2006: '2006-rumor' , 2007: '2007-rumor' , 2008: '2008-rumor' , 2009: '2009-rumor-1', 2010: '2009-rumor' , 2011: '2011-rumors' , 2012: '2012-rumors' , 2013: '2013-rumors' , 2014: '2014-rumors' , 2015: '2015-rumors' , 2016: '2016-rumors' , 2017: '2016-2017' , 2018: '2018-rumors' , 2019: '2019-rumors' }df = {}for i in range(2005,2020): p = pd.read_html(hepexjobsite+year[i]) print(i, len(p)) if (i < 2016 ): tUS = p[3].iloc[1:] tUS.columns = p[3].iloc[0] else: tnonUS = p[4].iloc[1:] tnonUS.columns = p[4].iloc[0] tnonUS = tnonUS.drop(columns=['Field']) tUS = p[5].iloc[1:] tUS.columns = p[5].iloc[0] tUS = tUS.drop(columns=['Field']) tUS.append(tnonUS, ignore_index=True) tUS.columns = ["Institution", "Short List", "Offers"] df[i] = tUS
2005 42006 42007 42008 42009 42010 42011 42012 42013 42014 42015 42016 62017 62018 62019 6
df[2017].head()
Institution Short List Offers
1 Nebraska Jamie Antonelli (Ohio State) [CMS]Clemens Lan... Frank Golf (accepted)
2 Wilson Fellowship Joseph Zennamo (Chicago) [MicroBooNE, SBND]Mi... Minerba Betancourt (accepted)Nhan Tran (accep...
3 Alabama Carl Pfendner (Ohio State) [ARA, EVA] NaN
4 Cornell (accelerators) NaN NaN
5 Brookhaven John Alison (Chicago) [ATLAS]Viviana Cavalier... Viviana Cavaliere (accepted)

Data cleaning

There is ambiguity associated to the names of some of the universities and labs, like Fermilab is listed as ‘Fermilab’ in some places and ‘FNAL’ elsewhere. The function below removes this ambiguity by replacing the ambiguous names to a standard name for the organizations:

def UniNameAmbiguityFix(dfk): Uni_name_ambiguity = {'Argonne': 'ANL', 'Boston University': 'Boston U', 'BU': 'Boston U', 'Brown University': 'Brown', 'Cal Tech': 'Caltech', 'Carnegie': 'Carnegie Mellon', 'Colorado State University': 'Colorado State', 'Fermilab': 'FNAL', 'FNAL/Chicago': 'FNAL', 'Industry/Fermilab': 'FNAL', 'Chicago/FNAL': 'FNAL', 'Göttingen': 'Gottingen', 'Imperial': 'Imperial College London', 'Indiana': 'Indiana University', 'KSU': 'Kansas State', 'Los Alamos': 'LANL', 'LBL': 'LBNL', 'MSU': 'Michigan State', 'Northeastern University': 'Northeastern', 'Northwestern University': 'Northwestern', 'OSU': 'Ohio State', 'SUNY Stony Brook': 'Stony Brook', 'Texas A&M': 'TAMU', 'Triumf': 'TRIUMF', 'U Chicago': 'UChicago', 'Chicago': 'UChicago', 'University of Chicago': 'UChicago', 'Berkeley': 'UC Berkeley', 'University of Colorado Boulder': 'UC Boulder', 'CU Boulder': 'UC Boulder', 'Colorado': 'UC Boulder', 'Davis': 'UC Davis', 'Irvine': 'UC Irvine', 'UCSD': 'UC San Diego', 'UCSB': 'UC Santa Barbara', 'UCSC': 'UC Santa Cruz', 'UIC': 'University of Illinois Chicago', 'University of Illinois Urbana-Champaign': 'UIUC', 'University of North Carolina': 'UNC', 'University of Pennsylvania': 'UPenn', 'University of Texas Austin': 'UT Austin', 'Florida': 'University of Florida', 'Geneva': 'University of Geneva', 'Hawaii': 'University of Hawaii', 'Maryland': 'University of Maryland', 'Michigan': 'University of Michigan', 'Minnesota': 'University of Minnesota', 'Sheffield': 'University of Sheffield', 'Victoria': 'University of Victoria', 'Virginia': 'University of Virginia', 'Washington': 'University of Washington', 'University of Wisconsin Madison': 'UW Madison', 'Wisconsin': 'UW Madison', 'UW': 'UW Madison', 'UW-Madison': 'UW Madison'} Uni_name_ambiguity.keys() dfk = dfk.replace({'Affiliation': Uni_name_ambiguity}) dfk = dfk.groupby(['Applicant', 'Affiliation'])['Attempts'].sum().reset_index() return dfk

Extracting data about job interviews performances of postdocs

Extracting tables for applicant job performance (along with their latest affiliation at the time of job application) from tables for job results.

ApplicantTable = {}for i in range(2005, 2020): attempt = df[i]['Short List'].str.split("\)", expand=True) attempt = attempt.unstack() attempt = attempt.str.split(r"\[.*?\]").str.join('') attempt = attempt.str.strip() attempt = attempt.value_counts() attempt = attempt.to_frame() attempt.reset_index(level=0, inplace=True) attempt.columns = ['Applicant', 'Attempts'] attemptTable = attempt['Applicant'].str.split('(', expand=True) attemptTable.columns = ['Applicant', 'Affiliation'] attemptTable['Attempts'] = attempt['Attempts'] attemptTable = attemptTable.iloc[1:] indexDrop = attemptTable[attemptTable['Applicant'].str.contains("\)" or "\(" or "[" or "]")].index attemptTable.drop(indexDrop , inplace=True) attemptTable.Affiliation.str.strip() attemptTable = UniNameAmbiguityFix(attemptTable) offerTable = df[i]['Offers'].str.split(r"\(.*?\)", expand=True) offerTable = offerTable.unstack() offerTable = offerTable.str.strip() offerTable = offerTable.value_counts() offerTable = offerTable.to_frame() offerTable.reset_index(level=0, inplace=True) offerTable.columns = ['Applicant', 'Offers'] offerTable['Applicant'] = offerTable['Applicant'].str.replace(u'† \xa0', u'') offerTable = offerTable.iloc[1:] attemptTable.Applicant = attemptTable.Applicant.str.strip() offerTable.Applicant = offerTable.Applicant.str.strip() ApplicantTable[i] = attemptTable.merge(offerTable, how='left', left_on='Applicant', right_on='Applicant') ApplicantTable[i] = ApplicantTable[i].fillna(0) ApplicantTable[i].Offers = ApplicantTable[i].Offers.astype(int) #applicants with no affiliations listed are dropped ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Affiliation'].str.strip() == ""].index , inplace=True) #blank applicant dropped ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Applicant'].str.strip() == ""].index , inplace=True) #theory or non-hep jobs to be dropped ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Applicant'].str.lower().str.contains('theory')].index , inplace=True) ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Applicant'].str.lower().str.contains('hep')].index , inplace=True) ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Affiliation'] == 'IAS'].index , inplace=True) ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Affiliation'] == 'theory'].index , inplace=True) #other misc. cleaning ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Affiliation'] == 'notes below'].index , inplace=True) ApplicantTable[i].drop(ApplicantTable[i][ApplicantTable[i]['Affiliation'] == 'Ultralytics'].index , inplace=True) ApplicantTable[i] = ApplicantTable[i].sort_values(by=['Offers', 'Attempts'], ascending=False) ApplicantTable[i]
ApplicantTable[2015].head()
Applicant Affiliation Attempts Offers
67 Joshua Spitz MIT 7 2
5 Alex Himmel Duke 5 2
77 Laura Fields Northwestern 4 2
90 Matt Wetstein UChicago 4 2
12 Andrzej Szelc Yale 2 2

Combining data of all the years.

ApplicantTableAllYears = pd.concat(ApplicantTable, ignore_index=True)ApplicantTableAllYears = ApplicantTableAllYears.groupby(['Applicant', 'Affiliation'])['Attempts', 'Offers'].sum().reset_index()ApplicantTableAllYears = ApplicantTableAllYears.sort_values(by=['Offers', 'Attempts'], ascending=False)ApplicantTableAllYears.head()
Applicant Affiliation Attempts Offers
220 Florencia Canelli FNAL 15 7
571 Sabine Lammers Columbia 7 5
86 Ben Kilminster Ohio State 8 4
115 Carter Hall SLAC 5 4
215 Eva Halkiadakis Rochester 9 3

I define a success as getting at least one job offer, ie assign an applicant success = 1. With no offers at all, I define the (short-listed) candidate to be unsuccessful, ie assign the applicant success = 0.

ApplicantTableAllYears['Success'] = (ApplicantTableAllYears['Offers'] > 0).astype(int)ApplicantTableAllYears.head()
Applicant Affiliation Attempts Offers Success
220 Florencia Canelli FNAL 15 7 1
571 Sabine Lammers Columbia 7 5 1
86 Ben Kilminster Ohio State 8 4 1
115 Carter Hall SLAC 5 4 1
215 Eva Halkiadakis Rochester 9 3 1

University Metric

In order to understand if there is any role of a university/lab in the success of its postdoc in finding a permanent job in academia, we define a few metrics to quantify the track record of a university/lab in producing successful postdocs (postdocs who could find permanent jobs immediately after finishing their current postdoc at that university/lab).

For our positive hypothesis, we assume that every university/affiliation develops some qualities in its postdocs, which influences their employability in academia. Then the rate at which its postdocs get job offers every year (academic cycle) can be modelled by Poisson distribution:

$$ P(k\ job\ offers\ per\ year\ from\ university\ u) = \frac{\lambda_u^{k} e^{-\lambda_u}}{k!} $$

where the rate parameter $\lambda_u$ encoding those qualities which influence the overall employability of postdocs from university/lab $u$. Here k can theoretically range from 0, 1, 2, .., total no. of hepex job positions available globally for that year.Here, we made three assumptions:* Total number of jobs applied by all the postdocs from a university/lab in a year is very large.* All postdocs of a university/lab are of similar academic calibre when they start looking for permanent jobs, which the universities may ensure during their postdoc recruitment process and then through adequate research/academic training of their postdoctoral researchers throughout the term of their affiliation.* Success or failure of a postdoc in one job application does not affect the outcomes of other job application for that postdoc or other postdocs of that university in any way. (In reality, if one postdoc gets a job, that job becomes unavailable to other postdocs).

With these three assumptions, $\lambda_u$ becomes an indicator of the contribution of a university/lab in the overall success of its postdoctoral fellows.

Average no. of successful offers/year is a metric for estimating the rate at which postdocs of a university can crack hepex job interviews, as it is an unbiased estimator of $\lambda_u$.

Average no. of successful offers/year, however, does not take into account the no. of the postdoc fellows in a university/lab, but the size of a research group often inluences the skills of its members. The no. of postdocs a university hires varies from year to year based on various factors like - funding secured by the university/lab, no. of projects, no. of professors, etc.

Since we assume that each postdoc’s outcomes are independent of each other from the same university/lab, so model university’s role for each postdoc success as independent poisson process. This assumption makes the rate $\lambda_u$ as:

$\lambda_u$ = $\Sigma_{i_u = 0}^{N}$ $\lambda_{i_u}$ where N is the total no. of postdocs in a university/lab

Here, $i_u$ is the i-th postdoc of the university/lab u for a given year. Now, since we also assume all the postdocs of the same university/lab are at par in academic/research calibre when applying for permanent jobs, we assume the rates $\lambda_{u_i}$ for each candidate as identical to others. Therefore, $\lambda_{u_i} = \lambda^{indiv}_{u}$ (constant).

$\lambda_u = \Sigma_{i_u = 0}^{N}$ $\lambda_{i_u} = N\lambda^{indiv}_{u}$

Therefore, $\lambda^{indiv}_{u} = \frac{\lambda_u}{N}$

Although, N (no. of postdocs applying for jobs) varies every year based on many factors such as funding available to the university/lab, no. of projects the university/lab is working on, no. of principal investigators working at the time etc.

To average out these variations, we use the average no. of postdocs applying to jobs/year from a university/lab as an estimator of N.

Now we can define the university metric (estimator of $\lambda^{indiv}_{u}$):

Offers/candidate = $\frac{Average\ no.\ of\ successful\ offers\ per\ year}{Average\ no.\ of\ postdocs\ applying\ to\ jobs\ per\ year} = \frac{Total\ no.\ of\ successful\ offers\ from\ 2005-19}{Total\ no.\ of\ postdocs\ applying\ to\ jobs\ from\ 2005-19}$

UniversityTableAllYears = ApplicantTableAllYears.drop(columns=['Applicant', 'Attempts'])UniversityTableAllYears['Failure'] = (UniversityTableAllYears['Offers'] == 0).astype(int)UniversityTableAllYears = UniversityTableAllYears.groupby(['Affiliation'])['Offers', 'Success', 'Failure'].sum().reset_index()UniversityTableAllYears['Offers/candidate'] = UniversityTableAllYears['Offers']*1./(UniversityTableAllYears['Success'] + UniversityTableAllYears['Failure'])UniversityTableAllYears.columns = ['Affiliation', 'Total Offers', 'Total successful candidates', 'Total unsuccessful candidates', 'Offers/candidate']UniversityTableAllYears = UniversityTableAllYears.sort_values(by=['Offers/candidate'], ascending=False)UniversityTableAllYears.head()
Affiliation Total Offers Total successful candidates Total unsuccessful candidates Offers/candidate
24 Columbia 19 12 3 1.266667
99 Stony Brook 5 3 1 1.250000
108 Toronto 6 4 1 1.200000
124 UPenn 12 7 3 1.200000
101 Syracuse 1 1 0 1.000000

Candidates with at least one offer are counted as successful, while ones with no offer are counted as unsuccessful candidates.

plt.style.use('ggplot')u_total_success = UniversityTableAllYears.sort_values(by=['Total successful candidates'], ascending=False)x_pos = [i for i, _ in enumerate(u_total_success['Affiliation'].iloc[:5])]plt.bar(x_pos, u_total_success['Total successful candidates'].iloc[:5], color='green')plt.xlabel("Postdoc affiliation")plt.ylabel("Total successful candidates")plt.title("Universities/labs which produced largest number of successful candidates (from 2005-2019)")plt.xticks(x_pos, u_total_success['Affiliation'].iloc[:5])plt.show()

HEP-ex Postdoc Success | Sourav's (2)

FNAL (Fermilab) has a huge particle physics group especially during the tevatron days! :)

plt.style.use('ggplot')x_pos = [i for i, _ in enumerate(UniversityTableAllYears['Affiliation'].iloc[:5])]plt.bar(x_pos, UniversityTableAllYears['Offers/candidate'].iloc[:5], color='green')plt.xlabel("Postdoc affiliation")plt.ylabel("Avg. offer per candidate")plt.title("Universities/labs which have highest offers per candidate (from 2005-2019)")plt.xticks(x_pos, UniversityTableAllYears['Affiliation'].iloc[:5])plt.show()

HEP-ex Postdoc Success | Sourav's (3)

def checkmodeling(uniname): uni_offers = [] UniversityTable = {} for i in range(2005,2020): UniversityTable[i] = ApplicantTable[i].sort_values(by=['Offers', 'Attempts'], ascending=False) UniversityTable[i] = UniversityTable[i].groupby(['Applicant', 'Affiliation'])['Attempts', 'Offers'].sum().reset_index() UniversityTable[i]['Success'] = (UniversityTable[i]['Offers'] > 0).astype(int) UniversityTable[i] = UniversityTable[i].drop(columns=['Applicant', 'Attempts']) UniversityTable[i]['Failure'] = (UniversityTable[i]['Offers'] == 0).astype(int) UniversityTable[i] = UniversityTable[i].groupby(['Affiliation'])['Offers', 'Success', 'Failure'].sum().reset_index() d = UniversityTable[i] o = d[d['Affiliation'] == uniname]['Offers'] if (len(o.values)!=0): uni_offers.append(int(o)) uni_offers = np.array(uni_offers) def factorial (n): if (n > 0): return n*factorial(n-1) else: return 1 def poisson(k, lamb): """poisson pdf, parameter lamb is the fit parameter""" return (lamb**k/factorial(k)) * np.exp(-lamb) lamb = uni_offers.mean() uni_offers.sort() p = [poisson(_, lamb) for _ in range(uni_offers.max()+1)] binboundary = np.array(range(-1,uni_offers.max()+1)) + 0.5 plt.hist(uni_offers, bins=binboundary, normed=True, alpha=0.5,histtype='stepfilled', color='steelblue', edgecolor='none'); plt.plot(range(uni_offers.max()+1), p, 'ro-', label='Poiss(%.2f)'%lamb) plt.xlabel("offers per year") plt.ylabel("Arbitrary units") plt.title("offers/year to %s postdocs (from 2005-2019)"%uniname) plt.legend() plt.show()

Let’s check for some universities/labs how well (badly) does the Poisson modeling of the offers per year work..

uninames = ['Columbia', 'FNAL', 'CERN', 'Stony Brook', 'UPenn'][checkmodeling(uniname) for uniname in uninames]
/home/sourav/anaconda3/envs/ML/lib/python3.6/site-packages/ipykernel_launcher.py:32: MatplotlibDeprecationWarning: The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.

HEP-ex Postdoc Success | Sourav's (4)

HEP-ex Postdoc Success | Sourav's (5)

HEP-ex Postdoc Success | Sourav's (6)

HEP-ex Postdoc Success | Sourav's (7)

HEP-ex Postdoc Success | Sourav's (8)

[None, None, None, None, None]

Postdoc Metrics

We can define individual success of a postdoc using postdoc metric 1:

Success odds = $\frac{total\ offers}{total\ rejections}$ (for a postdoc)

postdoc metric 2 is the binary form of postdoc metric 1:

Success = 1 if (success odds > 0) else 0

ie, if a postdoc got at least one job offer, that postdoc is counted as successful.

Adding success odds to the table:

ApplicantTableAllYears['Success odds'] = ApplicantTableAllYears['Offers']/(ApplicantTableAllYears['Attempts'] - ApplicantTableAllYears['Offers'])ApplicantTableAllYears = ApplicantTableAllYears[~ApplicantTableAllYears.isin([np.nan, np.inf, -np.inf]).any(1)]ApplicantTableAllYears.head()
Applicant Affiliation Attempts Offers Success Success odds
220 Florencia Canelli FNAL 15 7 1 0.875
571 Sabine Lammers Columbia 7 5 1 2.500
86 Ben Kilminster Ohio State 8 4 1 1.000
115 Carter Hall SLAC 5 4 1 4.000
215 Eva Halkiadakis Rochester 9 3 1 0.500

Checking the distribution of success odds:

plt.hist(ApplicantTableAllYears['Success odds'], bins=20)plt.xlabel("success odds")plt.ylabel("no. of postdocs")plt.title("postdocs from all uni/labs included (from 2005-2019)")plt.show()

HEP-ex Postdoc Success | Sourav's (9)

Success odds distributions mostly 0 (no offers) and a peak at 1 (no. of offers = no. of rejections) per candidate.

UniApplicantTableAllYear = ApplicantTableAllYears.merge(UniversityTableAllYears[['Affiliation', 'Offers/candidate', 'Total successful candidates', 'Total unsuccessful candidates']], how='left', left_on='Affiliation', right_on='Affiliation')UniApplicantTableAllYear[UniApplicantTableAllYear['Success']==0].head()
Applicant Affiliation Attempts Offers Success Success odds Offers/candidate Total successful candidates Total unsuccessful candidates
191 David Lopes Pegna Princeton 11 0 0 0.0 0.384615 5 8
192 Sasha Telnov Princeton 9 0 0 0.0 0.384615 5 8
193 Christopher Backhouse Caltech 8 0 0 0.0 0.600000 3 7
194 Corrinne Mills Harvard 7 0 0 0.0 0.600000 8 7
195 Jesse Wodin SLAC 6 0 0 0.0 0.888889 9 9

Postdoc metrics vs. university metric

Postdoc metric 1 (success odds) vs. university metric (offers/candidate)

plt.scatter(UniApplicantTableAllYear['Offers/candidate'], UniApplicantTableAllYear['Success odds'], marker = '.')plt.xlabel('University metric: offers per candidate')plt.ylabel('Postdoc metric: success odds')
Text(0, 0.5, 'Postdoc metric: success odds')

HEP-ex Postdoc Success | Sourav's (10)

Pearson correlation:

correlation = UniApplicantTableAllYear[['Offers/candidate', 'Success odds']]correlation.corr()
Offers/candidate Success odds
Offers/candidate 1.000000 0.343269
Success odds 0.343269 1.000000

Since there are other factors contributing to a postdocs success, the variation of the median of success odds w.r.t offers/candidate is useful in understanding the effect of university on postdoc’s success.

bp = UniApplicantTableAllYear.boxplot(column='Success odds',by='Offers/candidate')bp.set_xlabel('offers per candidate')bp.set_ylabel('success odds')bp.set_title('')bp
<matplotlib.axes._subplots.AxesSubplot at 0x7f34126a33c8>

HEP-ex Postdoc Success | Sourav's (11)

hom*oscedasticity doesn’t hold very well here

x = UniApplicantTableAllYear['Offers/candidate'].valuesy = UniApplicantTableAllYear['Success odds'].valuesx = x.reshape(-1, 1)y = y.reshape(-1, 1)regr = LinearRegression()regr.fit(x, y)plt.scatter(UniApplicantTableAllYear['Offers/candidate'], UniApplicantTableAllYear ['Success odds'], alpha=0.5, color='blue', marker='.', label='data')plt.plot(x, regr.predict(x), color='black', linewidth=3)plt.xlabel('offers per candidae')plt.ylabel('success odds')plt.legend()plt.show()

HEP-ex Postdoc Success | Sourav's (12)

UniApplicantTableAllYearLog = UniApplicantTableAllYearUniApplicantTableAllYearLog['Success logit'] = np.log(UniApplicantTableAllYearLog['Success odds'])UniApplicantTableAllYearLog = UniApplicantTableAllYearLog[~UniApplicantTableAllYearLog.isin([np.nan, np.inf, -np.inf]).any(1)]UniApplicantTableAllYearLog.head()
/home/sourav/.local/lib/python3.6/site-packages/pandas/core/series.py:853: RuntimeWarning: divide by zero encountered in log result = getattr(ufunc, method)(*inputs, **kwargs)
Applicant Affiliation Attempts Offers Success Success odds Offers/candidate Total successful candidates Total unsuccessful candidates Success logit
0 Florencia Canelli FNAL 15 7 1 0.875 0.909091 41 25 -0.133531
1 Sabine Lammers Columbia 7 5 1 2.500 1.266667 12 3 0.916291
2 Ben Kilminster Ohio State 8 4 1 1.000 0.900000 4 6 0.000000
3 Carter Hall SLAC 5 4 1 4.000 0.888889 9 9 1.386294
4 Eva Halkiadakis Rochester 9 3 1 0.500 0.857143 4 3 -0.693147
bp = UniApplicantTableAllYearLog.boxplot(column='Success logit',by='Offers/candidate')bp.set_xlabel('offers per candidate')bp.set_ylabel('success logit')bp.set_title('')bp
<matplotlib.axes._subplots.AxesSubplot at 0x7f341284c978>

HEP-ex Postdoc Success | Sourav's (13)

hom*oscedasticity better with success logit.

x = UniApplicantTableAllYearLog['Offers/candidate'].valueslogy = UniApplicantTableAllYearLog['Success logit'].valuesx = x.reshape(-1, 1)logy = logy.reshape(-1, 1)#adding column of 1 to estimate slope and interceptUniApplicantTableAllYearLog['const'] = 1regrOLSlog = sm.OLS(UniApplicantTableAllYearLog['Success logit'], UniApplicantTableAllYearLog[['Offers/candidate', 'const']]).fit()regrlog = LinearRegression()regrlog.fit(x, logy)plt.scatter(UniApplicantTableAllYearLog['Offers/candidate'], UniApplicantTableAllYearLog['Success logit'], alpha=0.5, color='blue', marker='.', label='data')plt.plot(x, regrlog.predict(x), color='black', linewidth=3)plt.xlabel('offers per candidate')plt.ylabel('success logit')plt.legend()plt.show()
/home/sourav/anaconda3/envs/ML/lib/python3.6/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.Try using .loc[row_indexer,col_indexer] = value insteadSee the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

HEP-ex Postdoc Success | Sourav's (14)

## slope of the regressionslope = regrlog.coef_[0][0]## intercept of the regressionintercept = regrlog.intercept_[0]## R^2 valuersq = regrlog.score(x, logy)slope, intercept, rsq
(0.4155776204426254, -0.8826111233324048, 0.018266248086799997)
print(regrOLSlog.summary())
 OLS Regression Results ==============================================================================Dep. Variable: Success logit R-squared: 0.018Model: OLS Adj. R-squared: 0.013Method: Least Squares F-statistic: 3.517Date: Sat, 12 Oct 2019 Prob (F-statistic): 0.0623Time: 08:18:13 Log-Likelihood: -212.57No. Observations: 191 AIC: 429.1Df Residuals: 189 BIC: 435.6Df Model: 1 Covariance Type: nonrobust ==================================================================================== coef std err t P>|t| [0.025 0.975]------------------------------------------------------------------------------------Offers/candidate 0.4156 0.222 1.875 0.062 -0.022 0.853const -0.8826 0.176 -5.011 0.000 -1.230 -0.535==============================================================================Omnibus: 7.067 Durbin-Watson: 0.269Prob(Omnibus): 0.029 Jarque-Bera (JB): 4.580Skew: -0.220 Prob(JB): 0.101Kurtosis: 2.382 Cond. No. 6.60==============================================================================Warnings:[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

p-value of 0.062 for the slope of the linear regression suggests that the dependence of success logit on offers/candidate is NOT statistically significant with 95% CL. So, the role of university in the success of its postdocs cannot be established with statistical significance using this pair of university and postdoc metrics.

Postdoc metric 2 (success) vs. university metric (offers/candidate)

UniApplicantTableAllYear.head()
Applicant Affiliation Attempts Offers Success Success odds Offers/candidate Total successful candidates Total unsuccessful candidates Success logit
0 Florencia Canelli FNAL 15 7 1 0.875 0.909091 41 25 -0.133531
1 Sabine Lammers Columbia 7 5 1 2.500 1.266667 12 3 0.916291
2 Ben Kilminster Ohio State 8 4 1 1.000 0.900000 4 6 0.000000
3 Carter Hall SLAC 5 4 1 4.000 0.888889 9 9 1.386294
4 Eva Halkiadakis Rochester 9 3 1 0.500 0.857143 4 3 -0.693147
t = UniApplicantTableAllYear[['Offers/candidate', 'Success']]t = t.sort_values(by=['Offers/candidate'], ascending=False)tsuccess = t[t['Success'] == 1]tfailure = t[t['Success'] == 0]bins=8plt.hist(tsuccess['Offers/candidate'], bins, alpha=0.3, label='Success')plt.hist(tfailure['Offers/candidate'], bins, alpha=0.3, label='Failure')plt.xlabel('Offers/candidate')plt.ylabel('no. of postdocs')plt.legend(loc='best')plt.show()

HEP-ex Postdoc Success | Sourav's (15)

logisticRegr = linear_model.LogisticRegression(C=1e5, solver='lbfgs')logisticRegr.fit(UniApplicantTableAllYear[['Offers/candidate']], UniApplicantTableAllYear['Success'])logisticRegr.score(UniApplicantTableAllYear[['Offers/candidate']], UniApplicantTableAllYear['Success'])
0.6867030965391621
print(logisticRegr.coef_[0][0], logisticRegr.intercept_[0])
3.105024677567668 -2.582806084094838
#xLR = np.arange(UniApplicantTableAllYear['Offers/candidate'].min(), # UniApplicantTableAllYear['Offers/candidate'].max(), 0.01)xLR = np.arange(-0.5, 2, 0.01)xLR = xLR.reshape(-1,1)
from scipy.special import expitlogistic = expit(xLR * logisticRegr.coef_ + logisticRegr.intercept_).ravel()plt.scatter(UniApplicantTableAllYear['Offers/candidate'], UniApplicantTableAllYear['Success'], marker='.', alpha = 0.2, color='green')plt.plot(xLR, logistic, color='blue', linewidth=3)plt.xlabel('offers per candidate')plt.ylabel('success')plt.legend()plt.show()
No handles with labels found to put in legend.

HEP-ex Postdoc Success | Sourav's (16)

offers per candidate metric is not very good in discriminating postdocs into successful (at least one job offer) and unsuccessful candidates, as the accuracy is ~69%.

Summary

  • Statistically significant relationship between Offers/candidate, the university metric, and the postdoc metrics could not be established.
  • We could not establish, with statisical significance, if the affiliation of a experimental high energy physics (hepex) postdoc is an indicator of future success in finding permanent academic position.
  • temporal variations in hep-ex job market not taken into account
  • US and non-US jobs to be treated separately
  • should separate the study into energy, intensity and cosmic frontiers, as the job trends and funding are different for each
  • Look into other indicators of postdoc success like research productivity, etc.—
HEP-ex Postdoc Success | Sourav's (2024)

References

Top Articles
Latest Posts
Article information

Author: Saturnina Altenwerth DVM

Last Updated:

Views: 6105

Rating: 4.3 / 5 (64 voted)

Reviews: 87% of readers found this page helpful

Author information

Name: Saturnina Altenwerth DVM

Birthday: 1992-08-21

Address: Apt. 237 662 Haag Mills, East Verenaport, MO 57071-5493

Phone: +331850833384

Job: District Real-Estate Architect

Hobby: Skateboarding, Taxidermy, Air sports, Painting, Knife making, Letterboxing, Inline skating

Introduction: My name is Saturnina Altenwerth DVM, I am a witty, perfect, combative, beautiful, determined, fancy, determined person who loves writing and wants to share my knowledge and understanding with you.